P1.5.5 — Test Corpus Parity Harness — Audit (Step 1)

Branch: feature/p1-5-5-parity-harness
Worktree: .worktrees/claude/p1-5-5-parity-harness
Base SHA: 0150dcd1 (origin/main, post-R86 κ Wave 5)
Wave: R87 κ Wave 6
Round: R87 (continuing κ Phase 1)
Author tier: T3 executor (autonomous mandate, T0 dated 2026-05-07)

§1. Task framing

P1.5.5 ships a parity harness that runs a fixed event corpus through two distinct rulesets (an “old” ruleset and a candidate “new” ruleset) and sorts every event into one of five deterministic buckets according to how the two outcomes compare. It does NOT replace either ruleset with the other; it only measures the divergence.

The output ParityReport is the gating artifact for P1.5.2 (Migration). A candidate upgrade is admissible iff:

both_admit_diverge is empty (no event that both rulesets admit produces a different mutation set under the new ruleset), AND
The set old_admit_new_reject ∪ old_reject_new_admit is a subset of a pre-declared scope (the upgrade author committed in advance to which admission boundaries would shift, and only those are allowed to shift).
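
The two admissibility conditions above can be sketched as a single predicate. This is a minimal sketch, not the harness implementation: the `Buckets` shape and the `isAdmissible` name are hypothetical, and the declared scope is assumed to be a set of event ids.

```typescript
// Hypothetical shapes for illustration only; the real ParityReport is not defined in this audit.
type EventId = string;

interface Buckets {
  both_admit_diverge: readonly EventId[];
  old_admit_new_reject: readonly EventId[];
  old_reject_new_admit: readonly EventId[];
}

// Assumption: the declared scope is a set of event ids allowed to change admission status.
function isAdmissible(buckets: Buckets, scope: ReadonlySet<EventId>): boolean {
  // Condition 1: no event admitted by both rulesets may differ in its mutation set.
  if (buckets.both_admit_diverge.length !== 0) return false;
  // Condition 2: every admission-boundary shift must fall inside the declared scope.
  for (const id of buckets.old_admit_new_reject) {
    if (!scope.has(id)) return false;
  }
  for (const id of buckets.old_reject_new_admit) {
    if (!scope.has(id)) return false;
  }
  return true;
}
```

Checking the two shifted buckets one after the other is equivalent to testing their union against the scope, and it avoids building an intermediate set.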

The harness is the parity gate. Without it, a Phase 1 ruleset upgrade has no mechanism to prove “non-breaking”, and θ consensus cannot vote on a fork because the fork’s RULE_UPGRADE divergence event has no proof of bounded scope.

§2. Existing surface (read-only inventory)

§2.1. The κ rule engine — `src/domains/rules/engine.ts` (P1.3.1, 729 lines)

Public surface at base 0150dcd1:

Export	Kind	Shape
`MAX_INTEGER_OPS`	const	`10_000` (per-rule node-visit cap)
`MAX_CALL_DEPTH`	const	`16` (FuncCall nesting cap)
`MAX_ARG_COUNT`	const	`8` (per-call arity cap)
`Category`	type	`'Admission' | 'StateTransition' | 'Consequence' | 'Promotion'`
`CATEGORY_ORDER`	const	the 4-tuple, frozen
`Mutation`	interface	`{ kind: 'set'|'emit'|'apply'; target: string; field: string; old_value?: unknown; new_value: unknown }`
`BudgetTracker`	interface	`{ integer_ops; call_depth; current_arg_count }`
`Context`	interface	readonly `{ event; state; rule_version; epoch; bindings; budget }`
`RuleResult`	type	`{ status: 'admitted'; mutations: Mutation[] } | { status: 'rejected'; reason: string }`
`TransitionResult`	interface	`{ all_mutations: Mutation[]; per_category_results: Map<Category, RuleResult[]> }`
`CategorizedRule`	interface	`{ rule: RuleNode; category: Category }`
`RuleRegistry`	interface	`{ getAll(): readonly CategorizedRule[] }`
`RuleBudgetExceeded`	class	typed Error
`evaluate(rule, context)`	fn	per-rule evaluator
`evaluateExpr(expr, context)`	fn	recursive walker
`executeRuleset(registry, event, state, rule_version, epoch)`	fn	orchestrator returning `TransitionResult`

Critical for the harness — executeRuleset semantics (engine.ts:682–729):

Iterates CATEGORY_ORDER (Admission → StateTransition → Consequence → Promotion).
Within each category, sorts rules ASCII-alpha by name (asciiCompareByName, engine.ts:480).
Each rule runs in a fresh Context with fresh BudgetTracker (per-rule).
all_mutations is the FLATTENED stream of admitted-rule mutations in execution order.
per_category_results records every rule’s outcome (admitted-or-rejected) keyed by its category.
Determinism contract: identical (registry, event, state, rule_version, epoch) produces bit-identical all_mutations. This is exactly the property the harness relies on to compute hash-stable effect sets.
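
The ordering rules listed above can be illustrated in isolation. The sketch below is not the engine's code: `NamedRule`, `asciiCompare`, and `orderForExecution` are hypothetical stand-ins (the audit does not show the body of asciiCompareByName), and the rule name is assumed to be a plain string.

```typescript
type Category = 'Admission' | 'StateTransition' | 'Consequence' | 'Promotion';

const CATEGORY_ORDER: readonly Category[] = [
  'Admission',
  'StateTransition',
  'Consequence',
  'Promotion',
];

// Hypothetical simplified rule shape: only the fields that affect ordering.
interface NamedRule {
  name: string;
  category: Category;
}

// Stand-in comparator: relational operators on strings compare UTF-16 code
// units, so no locale is involved and uppercase ASCII sorts before lowercase.
function asciiCompare(a: string, b: string): number {
  return a < b ? -1 : a > b ? 1 : 0;
}

// Returns the rules in the order the bullets above describe:
// category by category, name-sorted within each category.
function orderForExecution(rules: readonly NamedRule[]): NamedRule[] {
  const ordered: NamedRule[] = [];
  for (const category of CATEGORY_ORDER) {
    const inCategory = rules.filter((r) => r.category === category);
    inCategory.sort((x, y) => asciiCompare(x.name, y.name));
    ordered.push(...inCategory);
  }
  return ordered;
}
```

Because the result depends only on the rule names and categories, two hosts given the same rule set produce the same execution order, which is the precondition for a stable flattened mutation stream.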

The harness does NOT need access to evaluate / evaluateExpr directly — it only consumes the public executeRuleset outputs.

§2.2. The canonical serializer — `src/domains/rules/canonical.ts` (P1.5.4, 311 lines)

Public surface:

Export	Kind	Notes
`canonicalize(value): string`	fn	byte-identical JSON for any reachable input
`byteLength(value): number`	fn	UTF-8 byte length
`CanonicalSerializationError`	class	thrown for unrepresentable inputs

Properties used by the harness:

Single-line, no whitespace between tokens.
Object keys sorted by UTF-16 code unit comparison (locale-independent).
bigint → decimal-string toString form, no n suffix.
Mutation.new_value is unknown; canonicalize handles bigint/string/boolean/null/array/plain-object/integer-number values recursively.
Throws CanonicalSerializationError for undefined, function, symbol, non-integer number, non-plain object, reference cycle.
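
To make these properties concrete, here is a reduced serializer with the same observable behaviour. It is a sketch only and is not the code in canonical.ts: `canonicalizeSketch` and `SketchSerializationError` are hypothetical names, and emitting a bigint as a quoted decimal string is an assumption about the exact output form.

```typescript
class SketchSerializationError extends Error {}

function canonicalizeSketch(value: unknown, seen: Set<object> = new Set()): string {
  if (value === null) return 'null';
  if (typeof value === 'string') return JSON.stringify(value);
  if (typeof value === 'boolean') return value ? 'true' : 'false';
  // Assumption: bigint is written as a quoted decimal string, without the n suffix.
  if (typeof value === 'bigint') return JSON.stringify(value.toString());
  if (typeof value === 'number') {
    if (!Number.isInteger(value)) throw new SketchSerializationError('non-integer number');
    return String(value);
  }
  if (typeof value !== 'object') {
    // undefined, function, symbol
    throw new SketchSerializationError('unsupported type: ' + typeof value);
  }
  if (seen.has(value)) throw new SketchSerializationError('reference cycle');
  seen.add(value);
  let out: string;
  if (Array.isArray(value)) {
    out = '[' + value.map((v) => canonicalizeSketch(v, seen)).join(',') + ']';
  } else {
    const proto = Object.getPrototypeOf(value);
    if (proto !== Object.prototype && proto !== null) {
      throw new SketchSerializationError('non-plain object');
    }
    const record = value as Record<string, unknown>;
    // Default sort() orders strings by UTF-16 code unit, independent of locale.
    const keys = Object.keys(record).sort();
    out = '{' + keys.map((k) => JSON.stringify(k) + ':' + canonicalizeSketch(record[k], seen)).join(',') + '}';
  }
  // Only ancestors stay in `seen`, so a value shared by two siblings is not a cycle.
  seen.delete(value);
  return out;
}
```

Map and Set instances fail the plain-object check, so in this sketch they are rejected rather than silently serialized as empty objects.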

The harness uses canonicalize(mutations) as the input to SHA-256. Because the engine guarantees mutation order is stable across runs (alpha-by-name within category), the canonical bytes are stable too — and the SHA-256 digest is a deterministic effect-set fingerprint.
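
The fingerprint step can be sketched as follows. `effectHash` is a hypothetical helper name; it takes the already-canonical string as input so the sketch does not depend on canonical.ts, and it uses the named `createHash` import from `node:crypto`.

```typescript
import { createHash } from 'node:crypto';

// Hypothetical helper: hashes a canonical string (the output of canonicalize)
// and returns the SHA-256 digest as 64 lowercase hex characters.
function effectHash(canonicalMutations: string): string {
  return createHash('sha256').update(canonicalMutations, 'utf8').digest('hex');
}

// Order matters: the same mutations in a different order give a different digest,
// which is why the engine's stable execution order is a precondition.
const sameOrder = effectHash('[1,2]') === effectHash('[1,2]'); // true
const swapped = effectHash('[1,2]') === effectHash('[2,1]');   // false
```

The digest is a function of the bytes only, so any instability in mutation order or key order upstream would surface as a spurious divergence between the two rulesets.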

§2.3. The version hash module — `src/domains/rules/versioning.ts` (P1.5.1, 433 lines)

Public surface relevant to the harness:

Export	Kind	Notes
`ENGINE_VERSION`	const	`'kappa-engine/1-0-0'`
`VERSION_HASH_PREFIX`	const	`'sha256:'`
`VERSION_HASH_HEX_LENGTH`	const	`64`
`VERSION_HASH_TOTAL_LENGTH`	const	`71`
`VersionHashError`	class	input-shape error
`computeVersionHash(ruleset, v?)`	fn	SHA-256 entry
`verifyRuleVersion(exp, act)`	fn	constant-time hex compare
`stripLocations(value)`	fn	recursive `location` removal
`canonicalizeRuleset(ruleset)`	fn	strip + sort + canonicalize

The harness will reuse computeVersionHash to stamp the ParityReport with both rulesets’ version hashes (so a downstream consumer can verify the report was generated against the expected pair). The per-event effect hash uses canonicalize directly (NOT computeVersionHash): the engine version is already carried by the report-level old_version_hash / new_version_hash stamps, so it does not need to be folded into every per-event hash.

§2.4. The determinism scanner — `src/domains/rules/determinism.ts` (302 lines)

Public surface:

inspectFunctionForbidden(fn): readonly string[] — regex scan for forbidden tokens against fn.toString().
assertNoForbiddenOps(fn, opts?) — throws on non-empty hits.
assertDeterministic(fn, args, opts?) — N-run equality check.
deepEqualDeterministic(a, b) — bigint-aware deep equality.
DeterminismError — typed error.

Forbidden tokens (FORBIDDEN_PATTERNS, determinism.ts:56–72):

Math.*, Date.*, new Date
setTimeout, setInterval, setImmediate
fetch, XMLHttpRequest
require('fs'), from 'fs' (or node:fs)
crypto.* (member access pattern; named imports survive)
process.hrtime, process.nextTick
await
async function, async (
\d+\.\d+ float literal (negative lookbehind on digits + n)
[native code]

The harness body MUST scan clean against inspectFunctionForbidden. Companion: the file is also subject to the rule-engine corpus self-scan at src/__tests__/domains/rules/determinism.test.ts:833 (Group 12), which re-applies the same patterns to every .ts file under src/domains/rules/ after comment stripping. Comments may freely cite forbidden tokens; only the post-strip code body is checked.
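
The scan technique itself is simple to sketch: match a table of patterns against the function's source text. The block below is illustrative only. `SKETCH_PATTERNS` is a reduced, hypothetical subset, its regexes are guesses at the general shape rather than the actual FORBIDDEN_PATTERNS entries, and `inspectSketch` is not the real inspectFunctionForbidden.

```typescript
// Reduced, hypothetical pattern table; the real list in determinism.ts is longer.
const SKETCH_PATTERNS: ReadonlyArray<{ label: string; pattern: RegExp }> = [
  { label: 'Math.*', pattern: /\bMath\./ },
  { label: 'Date.*', pattern: /\bDate\./ },
  { label: 'new Date', pattern: /\bnew\s+Date\b/ },
  { label: 'await', pattern: /\bawait\b/ },
  { label: 'crypto.*', pattern: /\bcrypto\./ },
];

// Returns the labels of every pattern that matches the function's source text.
// None of the regexes use the g flag, so test() carries no state between calls.
function inspectSketch(fn: (...args: never[]) => unknown): readonly string[] {
  const source = fn.toString();
  const hits: string[] = [];
  for (const { label, pattern } of SKETCH_PATTERNS) {
    if (pattern.test(source)) hits.push(label);
  }
  return hits;
}

// A helper written with Math.max is flagged; the ternary form scans clean.
const flagged = inspectSketch((a: number, b: number) => Math.max(a, b));
const clean = inspectSketch((a: number, b: number) => (a > b ? a : b));
```

A textual scan like this sees only the body of the function it is given, not the functions that body calls, which is one reason the file-level corpus self-scan exists as a companion check.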

§2.5. The registry — `src/domains/rules/registry.ts` (P1.2.4, 513 lines)

Implements RuleRegistry interface from engine. The harness does NOT depend on this module at compile time — it accepts readonly CategorizedRule[] (the shape engine.RuleRegistry.getAll() returns) so test fixtures can construct rulesets without round-tripping through DSL parsing.

The harness’s input shape is two readonly CategorizedRule[] arrays, not two RuleRegistry instances. This keeps the harness decoupled from the loader / DSL / parser / validator stack — all of which are P1.2.x modules that may evolve independently. Tests pass the data directly.
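
Since the engine's orchestrator takes a RuleRegistry, a plain array needs a thin wrapper before it can be passed in. A minimal sketch of such a wrapper follows; `asRegistry` is a hypothetical name, and the `RuleNode` stand-in here carries only a name, whereas the real interface in parser.ts has more fields.

```typescript
// Simplified stand-ins for the engine types named above.
type Category = 'Admission' | 'StateTransition' | 'Consequence' | 'Promotion';

interface RuleNode {
  name: string; // assumption: the real RuleNode has additional fields
}

interface CategorizedRule {
  rule: RuleNode;
  category: Category;
}

interface RuleRegistry {
  getAll(): readonly CategorizedRule[];
}

// Hypothetical adapter: wraps a plain rule array in the RuleRegistry shape.
// The array is copied and frozen so later changes by the caller cannot leak in.
function asRegistry(rules: readonly CategorizedRule[]): RuleRegistry {
  const frozen = Object.freeze([...rules]);
  return { getAll: () => frozen };
}
```

With an adapter of this kind, a test fixture can build both rulesets as literal arrays and never touch the loader, DSL, parser, or validator.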

§2.6. The parser — `src/domains/rules/parser.ts` (P1.2.2, 1000+ lines)

Public surface (subset relevant to harness):

Export	Kind
`RuleNode`	interface (top-level rule)
`Expression`	union type
`Location`	interface
`parse(input): ParseResult`	fn

The harness imports RuleNode (engine input shape via CategorizedRule) but does NOT call parse. Test fixtures may parse DSL strings or build AST nodes directly — both work; the engine doesn’t care about provenance.

§2.7. Test layout convention

At base 0150dcd1:

src/__tests__/domains/rules/
├── bps-constants.test.ts      (P1.1.3)
├── builtins.test.ts           (P1.3.2)
├── canonical.test.ts          (P1.5.4)
├── determinism.test.ts        (P1.1.2)
├── engine.test.ts             (P1.3.1)
├── integer-math.test.ts       (P1.1.1)
├── lexer.test.ts              (P1.2.1)
├── parser.test.ts             (P1.2.2)
├── policy-gate.test.ts        (P1.3.4)
├── registry.test.ts           (P1.2.4)
├── state-access.test.ts       (P1.3.3)
├── validator.test.ts          (P1.2.3)
└── versioning.test.ts         (P1.5.1)

Per CLAUDE.md §9.1, the canonical test directory is src/__tests__/, NOT src/domains/rules/__tests__/. The task prompt’s literal path string (src/domains/rules/__tests__/parity-harness.test.ts) follows the donor-style colocated convention used in some early Phase 0 spec drafts; the project convention applied across all 13 sibling κ tests is src/__tests__/domains/rules/<name>.test.ts. The dispatcher prompt explicitly authorizes this adjustment (“match the project test-file convention used by R86 siblings: src/__tests__/domains/rules/parity-harness.test.ts”).

§2.8. Existing event/effect/mutation types

There is no canonical Event type in src/domains/rules/ at base 0150dcd1. The engine accepts event: Readonly<Record<string, unknown>> — a plain record. The harness must define a thin Event type carrying:

A stable EventId (string, for the details_by_event map keys + bucket arrays).
The event payload — itself a Readonly<Record<string, unknown>>, which is passed through unchanged to executeRuleset(registry, event, state, ...).
The state snapshot to run the event against (the engine takes both event and state as inputs; a parity corpus must carry the state for each event so two rulesets see the same input pair).
The rule_version and epoch fields the engine consumes.

The harness will define this Event type locally and export it. Future consumers (P1.5.2 migration runner, P1.4.2 conflict resolver) can re-export it from there.
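
One possible shape for this type is sketched below. The audit fixes only what the type must carry, not its field names or types, so everything here is an assumption: the field names `id`, `payload`, and `state`, the `string` and `number` types for rule_version and epoch, and the helper `engineArgs`. The sketch calls the type `ParityEvent` rather than `Event` only to avoid colliding with the DOM global of that name.

```typescript
type EventId = string;

// Hypothetical corpus entry: one event plus everything needed to replay it.
interface ParityEvent {
  readonly id: EventId;                                  // key for details_by_event and bucket arrays
  readonly payload: Readonly<Record<string, unknown>>;   // passed unchanged as `event`
  readonly state: Readonly<Record<string, unknown>>;     // assumption: state is a plain record
  readonly rule_version: string;                         // assumption: type not stated in the audit
  readonly epoch: number;                                // assumption: type not stated in the audit
}

// Hypothetical helper: the argument tuple that follows `registry`
// in executeRuleset(registry, event, state, rule_version, epoch).
function engineArgs(e: ParityEvent) {
  return [e.payload, e.state, e.rule_version, e.epoch] as const;
}

const sample: ParityEvent = {
  id: 'evt-001',
  payload: { kind: 'transfer', amount: 5 },
  state: { balance: 10 },
  rule_version: 'v1',
  epoch: 0,
};
```

Carrying the state inside each corpus entry means both rulesets are evaluated against the same (event, state) pair by construction, with no shared mutable state between runs.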

§3. Sibling artifact references

The 5-step audit / contract / packet / verification documents shipped by the sibling κ tasks set the structural template for this audit:

docs/audits/p1-5-1-version-hash-audit.md — version hash audit (companion file)
docs/audits/p1-5-4-canonical-audit.md — canonical audit (consumer of identical pattern)
docs/audits/p1-3-1-engine-audit.md — engine audit (the harness’s primary upstream)
docs/audits/p1-2-4-registry-audit.md — registry audit (the harness’s loader-level peer)

The structure used here mirrors p1-5-1-version-hash-audit.md since the harness is the layered consumer of both the engine’s outputs and the canonicalizer.

§4. Out-of-scope (deferred)

Out of scope for P1.5.5:

Migration application — turning a parity report into an actual ruleset upgrade. This is P1.5.2 (Wave 7). The harness only produces the report.
fork_id minting for divergent paths. P1.5.5 produces the divergence set; the fork machinery is ι (Phase 5).
Live ruleset diffing UI — the harness ships TS APIs, not a CLI.
Reading rulesets from disk — the harness body never touches fs. Test corpus is shipped as in-process data (DEFAULT_CORPUS).
Worker parallelization — explicitly forbidden by the task prompt (“serial is simpler”). Thread scheduling is non-deterministic; consensus cannot tolerate that.
Timing assertions inside the harness body — the perf assertion lives in the test file (uses Date.now() in tests, not in harness body). The harness body never reads wall-clock time.

§5. Affordances

What the harness can rely on:

executeRuleset is deterministic. Two calls with identical inputs produce bit-identical all_mutations (per engine.ts:682–729 invariant).
canonicalize is deterministic. Two calls with identical inputs produce byte-identical strings (per canonical.ts:297–300).
createHash('sha256') is deterministic by construction (NIST FIPS 180-4). It’s a named import in versioning.ts; the harness imports the same way.
The 4-category × ASCII-alpha-name execution order is stable across all hosts.
The engine never throws unbounded errors — every per-rule error is caught at the rule boundary in executeRuleset and converted to {status: 'rejected', reason: string}. The harness never has to try/catch around executeRuleset.

§6. Constraints

What the harness must NOT do:

No Math.* — would corrupt determinism. Use comparison operators directly for max/min: a > b ? a : b.
No Date.* — would tie output to wall clock.
No async / await — every API is synchronous. The harness is sync to keep the call graph determinism-checkable by inspectFunctionForbidden.
No worker threads — task prompt forbids this explicitly. Loop is serial.
No fs access inside harness body — the corpus is in-process data.
No short-circuit on first divergence — the entire report is the value. The harness walks every event regardless of how many diverge.
crypto.<member> is forbidden by the corpus self-scan; we use a named import (import { createHash } from 'node:crypto') just like versioning.ts:72. The token literal crypto.createHash never appears in the source body.
No [native code] literal — it would match the corpus self-scan.
No float literals — would match the regex \d+\.\d+ after comment stripping.
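
As a small illustration of the Math-free style these constraints call for, max and min can be written with comparisons only. The helper names below are hypothetical and are not taken from the harness.

```typescript
// Comparison-only replacements for Math.max / Math.min on two integers.
function maxOf(a: number, b: number): number {
  return a > b ? a : b;
}

function minOf(a: number, b: number): number {
  return a < b ? a : b;
}

// Example use: the largest value in a list of non-negative integers.
// Returns 0 for an empty list, since the running maximum starts at 0.
function longest(lengths: readonly number[]): number {
  let best = 0;
  for (const n of lengths) {
    best = maxOf(best, n);
  }
  return best;
}
```

None of these bodies contain a forbidden token or a float literal, so helpers in this style stay compatible with the scanner.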

§7. Risks

Risk	Severity	Mitigation
Determinism scanner hits `crypto.createHash` if we wrote the call as `crypto.createHash(...)`	High	Use named import: `import { createHash } from 'node:crypto'` (versioning.ts:72 pattern).
Effect-hash captures non-canonical `Map`/`Set` shapes from `Mutation.new_value`	Med	The engine’s `Mutation` shape is `{kind, target, field, new_value}` — `new_value` is `unknown`. Tests must avoid Map/Set values; if any rule were ever to emit one, `canonicalize` throws. The harness does not catch — it propagates the error and the report fails to build. This is the correct semantics: a non-canonicalizable mutation is itself a determinism violation.
Performance: 10000-event corpus must complete in <5s	Low	The engine runs ~µs per simple rule; 10000×N (rules) calls fit comfortably even with bigint and SHA-256. The test asserts wall-time using `Date.now()` inside the test, not the harness body.
Default corpus design — 100 events feels like a lot to hand-curate	Low	The corpus is structured, not random — 5 events per (admission category × shape) family covers the matrix. Detailed taxonomy ships in the packet (Step 3).
`details_by_event` Map uses `EventId` (string) keys — could leak insertion order if Map iteration is exercised	Low	Map iteration order in ECMA-262 is stable insertion order. Insertion order is the corpus iteration order, which is stable. So Map iteration is deterministic. The harness only WRITES to the map; consumers may iterate, but the write order is corpus order.
Comment-strip in corpus self-scan misses block comments containing `*/`-then-something	Low	Same risk shipped in 12 sibling files; `stripComments` is sufficient for our straight prose comments. We will not embed regex literals or odd block-comment content.
TypeScript noUncheckedIndexedAccess + `[]`-array-access patterns	Low	Sibling files use `arr[i]!` patterns; we mirror them. ESLint enforces this.
`RuleNode` import path (`./parser.js`) requires extension at runtime ESM	None	Project standard — every sibling test does this. tsconfig is `module: NodeNext`.
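
The Map-ordering row above relies on a guarantee that is easy to demonstrate. In the sketch below, `writeOrder` is a hypothetical helper that writes entries in the given order and reads the keys back.

```typescript
// Hypothetical helper: records one entry per id, in the order given,
// then returns the keys in Map iteration order.
function writeOrder(ids: readonly string[]): string[] {
  const details = new Map<string, number>();
  ids.forEach((id, index) => {
    details.set(id, index);
  });
  return Array.from(details.keys());
}

// Iteration order is insertion order, not sorted order.
const corpusOrder = writeOrder(['evt-b', 'evt-a', 'evt-c']); // evt-b, evt-a, evt-c

// Setting an existing key again updates the value but keeps its original position.
const withRepeat = writeOrder(['x', 'y', 'x']); // x, y
```

The second case is worth noting for corpus design: if two events shared an EventId, the later one would overwrite the earlier entry in place rather than append a new one.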

§8. Acceptance criteria — restated from task prompt §P1.5.5

AC#	Statement	Source
AC1	`runParity({old_ruleset, new_ruleset, corpus, declared_divergence_scope}): ParityReport` exists.	task-prompts §P1.5.5
AC2	Per event: `old_hash = SHA-256(canonical(old_result.mutations))`, `new_hash = SHA-256(canonical(new_result.mutations))`.	task-prompts §P1.5.5
AC3	5-bucket categorization: `both_admit_same`, `both_admit_diverge`, `old_admit_new_reject`, `old_reject_new_admit`, `both_reject`.	task-prompts §P1.5.5
AC4	`pass = (both_admit_diverge.length === 0) AND ((old_admit_new_reject ∪ old_reject_new_admit) ⊆ scope)`.	task-prompts §P1.5.5
AC5	`details_by_event: Map<EventId, {old_result, new_result, old_hash, new_hash}>`.	task-prompts §P1.5.5
AC6	`DEFAULT_CORPUS` exported with ≥100 events covering admission/state-transition/consequence/promotion/governance/identity/fork.	task-prompts §P1.5.5
AC7	Determinism: identical inputs → identical report bytes.	task-prompts §P1.5.5
AC8	Performance: 10000 events < 5 seconds.	task-prompts §P1.5.5
AC9	Determinism scanner clean: `inspectFunctionForbidden(runParity)` returns `[]`.	implicit (κ corpus self-scan §6 in determinism.test.ts:833)
AC10	`npm run build && npm run lint && npm test` all green.	dispatcher prompt §gate
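
The bucket assignment in AC3 can be sketched as a pure function of the two per-event outcomes. The audit does not specify how an event-level admit/reject status is derived from a TransitionResult, so this sketch assumes each side has already been reduced to an `admitted` flag plus its effect hash; `SideOutcome` and `classify` are hypothetical names.

```typescript
type Bucket =
  | 'both_admit_same'
  | 'both_admit_diverge'
  | 'old_admit_new_reject'
  | 'old_reject_new_admit'
  | 'both_reject';

// Assumption: each ruleset's outcome for one event is summarized as
// an event-level admitted flag and the SHA-256 hex digest of its mutations.
interface SideOutcome {
  admitted: boolean;
  hash: string;
}

// Every (old, new) pair lands in exactly one of the five buckets.
function classify(oldSide: SideOutcome, newSide: SideOutcome): Bucket {
  if (oldSide.admitted && newSide.admitted) {
    // Hashes are compared only when both sides admit.
    return oldSide.hash === newSide.hash ? 'both_admit_same' : 'both_admit_diverge';
  }
  if (oldSide.admitted) return 'old_admit_new_reject';
  if (newSide.admitted) return 'old_reject_new_admit';
  return 'both_reject';
}
```

In this sketch the hash is ignored whenever either side rejects, so two rejections with different reasons still count as both_reject.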

§9. References

Spec source: docs/guides/implementation/task-prompts/p1.1-kappa-rule-engine.md §P1.5.5 (lines 2750–2933).
Concept doc: docs/3-world/physics/laws/rule-engine.md §Test corpus parity requirement.
Sibling audits: docs/audits/p1-3-1-engine-audit.md, docs/audits/p1-5-1-version-hash-audit.md, docs/audits/p1-5-4-canonical-audit.md, docs/audits/p1-2-4-registry-audit.md.
Live code at base 0150dcd1:
- src/domains/rules/engine.ts — P1.3.1 evaluator (729 lines)
- src/domains/rules/canonical.ts — P1.5.4 serializer (311 lines)
- src/domains/rules/versioning.ts — P1.5.1 hash (433 lines)
- src/domains/rules/determinism.ts — P1.1.2 scanner (302 lines)
- src/domains/rules/registry.ts — P1.2.4 loader (513 lines)
- src/domains/rules/parser.ts — P1.2.2 AST (1000+ lines)

Step 1 / 5. Audit complete. Next step: behavioral contract.