P1.5.5 — Test Corpus Parity Harness — Audit (Step 1)

Branch: feature/p1-5-5-parity-harness
Worktree: .worktrees/claude/p1-5-5-parity-harness
Base SHA: 0150dcd1 (origin/main, post-R86 κ Wave 5)
Wave: R87 κ Wave 6
Round: R87 (continuing κ Phase 1)
Author tier: T3 executor (autonomous mandate, T0 dated 2026-05-07)


§1. Task framing

P1.5.5 ships a parity harness that runs a fixed event corpus through two distinct rulesets (an “old” ruleset + a candidate “new” ruleset) and produces a deterministic 5-bucket categorization of the divergences found. It does NOT replace either ruleset with the other; it only measures the divergence.

The output ParityReport is the gating artifact for P1.5.2 (Migration). A candidate upgrade is admissible iff:

  1. both_admit_diverge is empty (no event is admitted by both rulesets with differing mutation sets), AND
  2. The set old_admit_new_reject ∪ old_reject_new_admit is a subset of a pre-declared scope (the upgrade author committed in advance to which admission boundaries would shift, and only those are allowed to shift).
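The two conditions above can be sketched as a single predicate. This is a minimal illustration, not the harness implementation: ParityBuckets and isAdmissible are hypothetical names, and the declared scope is assumed to be a set of EventId strings (the same element type as the buckets, as the subset check implies).

```typescript
// Hypothetical minimal slice of the report: only the fields the gate reads.
type EventId = string;

interface ParityBuckets {
  both_admit_diverge: readonly EventId[];
  old_admit_new_reject: readonly EventId[];
  old_reject_new_admit: readonly EventId[];
}

// Returns true iff both admissibility conditions hold.
// Assumption: the pre-declared scope is a set of event ids.
function isAdmissible(
  buckets: ParityBuckets,
  declaredScope: ReadonlySet<EventId>,
): boolean {
  // Condition 1: no event admitted by both rulesets with different effects.
  if (buckets.both_admit_diverge.length !== 0) return false;
  // Condition 2: every admission-boundary shift was declared in advance.
  for (const id of buckets.old_admit_new_reject) {
    if (!declaredScope.has(id)) return false;
  }
  for (const id of buckets.old_reject_new_admit) {
    if (!declaredScope.has(id)) return false;
  }
  return true;
}
```

Both shift buckets are checked against the same scope, so an undeclared shift in either direction fails the gate.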

The harness is the parity gate. Without it, a Phase 1 ruleset upgrade has no mechanism to prove “non-breaking”, and θ consensus cannot vote on a fork because the fork’s RULE_UPGRADE divergence event has no proof of bounded scope.


§2. Existing surface (read-only inventory)

§2.1. The κ rule engine — src/domains/rules/engine.ts (P1.3.1, 729 lines)

Public surface at base 0150dcd1:

| Export | Kind | Shape |
| --- | --- | --- |
| MAX_INTEGER_OPS | const | 10_000 (per-rule node-visit cap) |
| MAX_CALL_DEPTH | const | 16 (FuncCall nesting cap) |
| MAX_ARG_COUNT | const | 8 (per-call arity cap) |
| Category | type | 'Admission' \| 'StateTransition' \| 'Consequence' \| 'Promotion' |
| CATEGORY_ORDER | const | the 4-tuple, frozen |
| Mutation | interface | { kind: 'set'\|'emit'\|'apply'; target: string; field: string; old_value?: unknown; new_value: unknown } |
| BudgetTracker | interface | { integer_ops; call_depth; current_arg_count } |
| Context | interface | readonly { event; state; rule_version; epoch; bindings; budget } |
| RuleResult | type | { status: 'admitted'; mutations: Mutation[] } \| { status: 'rejected'; reason: string } |
| TransitionResult | interface | { all_mutations: Mutation[]; per_category_results: Map<Category, RuleResult[]> } |
| CategorizedRule | interface | { rule: RuleNode; category: Category } |
| RuleRegistry | interface | { getAll(): readonly CategorizedRule[] } |
| RuleBudgetExceeded | class | typed Error |
| evaluate(rule, context) | fn | per-rule evaluator |
| evaluateExpr(expr, context) | fn | recursive walker |
| executeRuleset(registry, event, state, rule_version, epoch) | fn | orchestrator returning TransitionResult |

Critical for the harness — executeRuleset semantics (engine.ts:682–729):

  • Iterates CATEGORY_ORDER (Admission → StateTransition → Consequence → Promotion).
  • Within each category, sorts rules ASCII-alpha by name (asciiCompareByName, engine.ts:480).
  • Each rule runs in a fresh Context with fresh BudgetTracker (per-rule).
  • all_mutations is the FLATTENED stream of admitted-rule mutations in execution order.
  • per_category_results records every rule’s outcome (admitted-or-rejected) keyed by its category.
  • Determinism contract: identical (registry, event, state, rule_version, epoch) produces bit-identical all_mutations. This is exactly the property the harness relies on to compute hash-stable effect sets.
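The ordering rules in the list above can be illustrated with a small sketch. It is not the engine's code: NamedRule and executionOrder are hypothetical, and the comparator is a stand-in that assumes asciiCompareByName is a plain, locale-independent string comparison.

```typescript
type Category = 'Admission' | 'StateTransition' | 'Consequence' | 'Promotion';

const CATEGORY_ORDER: readonly Category[] = [
  'Admission',
  'StateTransition',
  'Consequence',
  'Promotion',
];

// Hypothetical reduced rule shape: only what the ordering needs.
interface NamedRule {
  name: string;
  category: Category;
}

// Stand-in comparator: code-unit comparison, never locale-aware.
function asciiCompareByName(a: NamedRule, b: NamedRule): number {
  if (a.name < b.name) return -1;
  if (a.name > b.name) return 1;
  return 0;
}

// Category order first, then ASCII-alpha by name within each category.
function executionOrder(rules: readonly NamedRule[]): NamedRule[] {
  const ordered: NamedRule[] = [];
  for (const category of CATEGORY_ORDER) {
    const inCategory = rules.filter((r) => r.category === category);
    inCategory.sort(asciiCompareByName);
    for (const r of inCategory) ordered.push(r);
  }
  return ordered;
}
```

Because neither step depends on the input array order or on the host locale, the same rule set always yields the same execution sequence, which is what keeps all_mutations stable.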

The harness does NOT need access to evaluate / evaluateExpr directly — it only consumes the public executeRuleset outputs.

§2.2. The canonical serializer — src/domains/rules/canonical.ts (P1.5.4, 311 lines)

Public surface:

| Export | Kind | Notes |
| --- | --- | --- |
| canonicalize(value): string | fn | byte-identical JSON for any reachable input |
| byteLength(value): number | fn | UTF-8 byte length |
| CanonicalSerializationError | class | thrown for unrepresentable inputs |

Properties used by the harness:

  • Single-line, no whitespace between tokens.
  • Object keys sorted by UTF-16 code unit comparison (locale-independent).
  • bigint → decimal-string toString form, no n suffix.
  • Mutation.new_value is unknown; canonicalize handles bigint/string/boolean/null/array/plain-object/integer-number recursively.
  • Throws CanonicalSerializationError for undefined, function, symbol, non-integer number, non-plain object, reference cycle.

The harness uses canonicalize(mutations) as the input to SHA-256. Because the engine guarantees mutation order is stable across runs (alpha-by-name within category), the canonical bytes are stable too — and the SHA-256 digest is a deterministic effect-set fingerprint.
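A minimal sketch of the effect-set fingerprint follows. The real canonicalize lives in canonical.ts and is not reproduced here; canonicalSketch is a simplified stand-in covering only the properties listed above, and emitting bigint as a quoted decimal string is an assumption about the exact output form. effectHash is a hypothetical helper name.

```typescript
import { createHash } from 'node:crypto';

// Simplified stand-in for canonicalize (NOT the real canonical.ts).
function canonicalSketch(value: unknown): string {
  if (value === null) return 'null';
  if (typeof value === 'boolean') return value ? 'true' : 'false';
  if (typeof value === 'string') return JSON.stringify(value);
  // Assumption: bigint is emitted as a quoted decimal string, no n suffix.
  if (typeof value === 'bigint') return JSON.stringify(value.toString());
  if (typeof value === 'number') {
    if (!Number.isInteger(value)) throw new Error('non-integer number');
    return String(value);
  }
  if (Array.isArray(value)) {
    return '[' + value.map((v) => canonicalSketch(v)).join(',') + ']';
  }
  if (typeof value === 'object' && Object.getPrototypeOf(value) === Object.prototype) {
    const record = value as Record<string, unknown>;
    // Default sort compares UTF-16 code units, so it is locale-independent.
    const keys = Object.keys(record).sort();
    const parts = keys.map((k) => JSON.stringify(k) + ':' + canonicalSketch(record[k]));
    return '{' + parts.join(',') + '}';
  }
  // undefined, function, symbol, Map, Set and other non-plain objects land here.
  throw new Error('unrepresentable value');
}

// SHA-256 over the canonical bytes of the mutation stream, as lowercase hex.
function effectHash(mutations: readonly unknown[]): string {
  return createHash('sha256').update(canonicalSketch(mutations), 'utf8').digest('hex');
}
```

Key order inside a mutation object does not affect the digest, but the order of mutations in the array does, which is why the engine's stable execution order matters.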

§2.3. The version hash module — src/domains/rules/versioning.ts (P1.5.1, 433 lines)

Public surface relevant to the harness:

| Export | Kind | Notes |
| --- | --- | --- |
| ENGINE_VERSION | const | 'kappa-engine/1-0-0' |
| VERSION_HASH_PREFIX | const | 'sha256:' |
| VERSION_HASH_HEX_LENGTH | const | 64 |
| VERSION_HASH_TOTAL_LENGTH | const | 71 |
| VersionHashError | class | input-shape error |
| computeVersionHash(ruleset, v?) | fn | SHA-256 entry |
| verifyRuleVersion(exp, act) | fn | constant-time hex compare |
| stripLocations(value) | fn | recursive location removal |
| canonicalizeRuleset(ruleset) | fn | strip + sort + canonicalize |

The harness will reuse computeVersionHash to stamp the ParityReport with both rulesets’ version hashes (so a downstream consumer can verify the report was generated against the expected pair). The per-event effect hash uses canonicalize directly (NOT computeVersionHash) — the engine version is already encoded in the report’s old_version_hash / new_version_hash stamp, not in every per-event hash.
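The three length constants fit together as prefix length plus hex length (7 + 64 = 71). The sketch below shows a shape check a downstream consumer might run on the report's version-hash stamps. looksLikeVersionHash is a hypothetical helper, not part of versioning.ts, and the lowercase-hex requirement is an assumption.

```typescript
// Values taken from the table above; redeclared here so the sketch is self-contained.
const VERSION_HASH_PREFIX = 'sha256:';
const VERSION_HASH_HEX_LENGTH = 64;
const VERSION_HASH_TOTAL_LENGTH = VERSION_HASH_PREFIX.length + VERSION_HASH_HEX_LENGTH;

// Hypothetical shape check: prefix, total length, and hex body.
// Assumption: the hex digits are lowercase.
function looksLikeVersionHash(s: string): boolean {
  if (s.length !== VERSION_HASH_TOTAL_LENGTH) return false;
  if (!s.startsWith(VERSION_HASH_PREFIX)) return false;
  return /^[0-9a-f]+$/.test(s.slice(VERSION_HASH_PREFIX.length));
}
```

A shape check like this only validates format; confirming that a stamp matches an expected ruleset is the job of verifyRuleVersion.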

§2.4. The determinism scanner — src/domains/rules/determinism.ts (302 lines)

Public surface:

  • inspectFunctionForbidden(fn): readonly string[] — regex scan for forbidden tokens against fn.toString().
  • assertNoForbiddenOps(fn, opts?) — throws on non-empty hits.
  • assertDeterministic(fn, args, opts?) — N-run equality check.
  • deepEqualDeterministic(a, b) — bigint-aware deep equality.
  • DeterminismError — typed error.

Forbidden tokens (FORBIDDEN_PATTERNS, determinism.ts:56–72):

  • Math.*, Date.*, new Date
  • setTimeout, setInterval, setImmediate
  • fetch, XMLHttpRequest
  • require('fs'), from 'fs' (or node:fs)
  • crypto.* (member access pattern; named imports survive)
  • process.hrtime, process.nextTick
  • await
  • async function, async (
  • \d+\.\d+ float literal (negative lookbehind on digits + n)
  • [native code]

The harness body MUST scan clean against inspectFunctionForbidden. Companion: the file is also subject to the rule-engine corpus self-scan at src/__tests__/domains/rules/determinism.test.ts:833 (Group 12), which re-applies the same patterns to every .ts file under src/domains/rules/ after comment stripping. Comments may freely cite forbidden tokens; only the post-strip code body is checked.
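The scanning approach described above (regex tests against fn.toString()) can be sketched as follows. SKETCH_PATTERNS is an illustrative subset with hypothetical regexes, not the real FORBIDDEN_PATTERNS list, and inspectSketch is a stand-in for inspectFunctionForbidden.

```typescript
// Illustrative subset only; the real pattern list is longer and may differ.
const SKETCH_PATTERNS: ReadonlyArray<{ label: string; pattern: RegExp }> = [
  { label: 'Math.*', pattern: /\bMath\./ },
  { label: 'Date.*', pattern: /\bDate\./ },
  { label: 'new Date', pattern: /\bnew\s+Date\b/ },
  { label: 'await', pattern: /\bawait\b/ },
];

// Returns the labels of every pattern that matches the function's source text.
function inspectSketch(fn: (...args: never[]) => unknown): string[] {
  const source = fn.toString();
  const hits: string[] = [];
  for (const { label, pattern } of SKETCH_PATTERNS) {
    if (pattern.test(source)) hits.push(label);
  }
  return hits;
}

// A clean function: comparison operators only.
function pickLarger(a: bigint, b: bigint): bigint {
  return a > b ? a : b;
}

// A function the scan should flag.
function roundDown(a: number): number {
  return Math.floor(a);
}
```

Here inspectSketch(pickLarger) yields an empty array, while inspectSketch(roundDown) reports the Math.* label. The scan is textual, so it flags the token wherever it appears in the function body, whether or not that code path ever runs.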

§2.5. The registry — src/domains/rules/registry.ts (P1.2.4, 513 lines)

Implements RuleRegistry interface from engine. The harness does NOT depend on this module at compile time — it accepts readonly CategorizedRule[] (the shape engine.RuleRegistry.getAll() returns) so test fixtures can construct rulesets without round-tripping through DSL parsing.

The harness’s input shape is two readonly CategorizedRule[] arrays, not two RuleRegistry instances. This keeps the harness decoupled from the loader / DSL / parser / validator stack — all of which are P1.2.x modules that may evolve independently. Tests pass the data directly.

§2.6. The parser — src/domains/rules/parser.ts (P1.2.2, 1000+ lines)

Public surface (subset relevant to harness):

| Export | Kind |
| --- | --- |
| RuleNode | interface (top-level rule) |
| Expression | union type |
| Location | interface |
| parse(input): ParseResult | fn |

The harness imports RuleNode (engine input shape via CategorizedRule) but does NOT call parse. Test fixtures may parse DSL strings or build AST nodes directly — both work; the engine doesn’t care about provenance.

§2.7. Test layout convention

At base 0150dcd1:

src/__tests__/domains/rules/
├── bps-constants.test.ts      (P1.1.3)
├── builtins.test.ts           (P1.3.2)
├── canonical.test.ts          (P1.5.4)
├── determinism.test.ts        (P1.1.2)
├── engine.test.ts             (P1.3.1)
├── integer-math.test.ts       (P1.1.1)
├── lexer.test.ts              (P1.2.1)
├── parser.test.ts             (P1.2.2)
├── policy-gate.test.ts        (P1.3.4)
├── registry.test.ts           (P1.2.4)
├── state-access.test.ts       (P1.3.3)
├── validator.test.ts          (P1.2.3)
└── versioning.test.ts         (P1.5.1)

Per CLAUDE.md §9.1, the canonical test directory is src/__tests__/, NOT src/domains/rules/__tests__/. The task prompt’s literal path string (src/domains/rules/__tests__/parity-harness.test.ts) is the donor-style colocated convention used in some early Phase 0 spec drafts; the project convention applied across all 13 sibling κ tests is src/__tests__/domains/rules/<name>.test.ts. The dispatcher prompt explicitly authorizes this adjustment (“match the project test-file convention used by R86 siblings: src/__tests__/domains/rules/parity-harness.test.ts”).

§2.8. Existing event/effect/mutation types

There is no canonical Event type in src/domains/rules/ at base 0150dcd1. The engine accepts event: Readonly<Record<string, unknown>> — a plain record. The harness must define a thin Event type carrying:

  • A stable EventId (string, for the details_by_event map keys + bucket arrays).
  • The event payload — itself a Readonly<Record<string, unknown>>, which is passed through unchanged to executeRuleset(registry, event, state, ...).
  • The state snapshot to run the event against (the engine takes both event and state as inputs; a parity corpus must carry the state for each event so two rulesets see the same input pair).
  • The rule_version and epoch fields the engine consumes.

The harness will define this Event type locally and export it. Future consumers (P1.5.2 migration runner, P1.4.2 conflict resolver) can re-export it from there.
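One possible shape for this type is sketched below. The field names and the concrete types of state, rule_version, and epoch are assumptions (the audit only fixes which pieces must be carried), and the type is named ParityEvent here to avoid colliding with the DOM global Event in a standalone snippet.

```typescript
type EventId = string;

// Hypothetical corpus entry: everything executeRuleset needs for one run.
interface ParityEvent {
  // Stable key for details_by_event and the bucket arrays.
  readonly id: EventId;
  // Passed through unchanged as the engine's event argument.
  readonly payload: Readonly<Record<string, unknown>>;
  // State snapshot, so both rulesets see the same (event, state) pair.
  readonly state: Readonly<Record<string, unknown>>;
  // Assumed types; the engine's actual parameter types are not shown in this audit.
  readonly rule_version: string;
  readonly epoch: bigint;
}

// Example entry built as in-process data, with no parsing or disk access.
const sampleEvent: ParityEvent = {
  id: 'evt-0001',
  payload: { type: 'transfer', amount: BigInt(5) },
  state: { balance: BigInt(10) },
  rule_version: 'sha256:placeholder',
  epoch: BigInt(1),
};
```

Carrying the state snapshot inside each entry keeps the corpus self-describing: the harness can feed the identical input pair to both rulesets without any external lookup.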


§3. Sibling artifact references

The 5-step audit / contract / packet / verification documents shipped by the sibling κ tasks set the structural template for this audit:

  • docs/audits/p1-5-1-version-hash-audit.md — version hash audit (companion file)
  • docs/audits/p1-5-4-canonical-audit.md — canonical audit (consumer of identical pattern)
  • docs/audits/p1-3-1-engine-audit.md — engine audit (the harness’s primary upstream)
  • docs/audits/p1-2-4-registry-audit.md — registry audit (the harness’s loader-level peer)

The structure used here mirrors p1-5-1-version-hash-audit.md since the harness is the layered consumer of both the engine’s outputs and the canonicalizer.


§4. Out-of-scope (deferred)

Out of scope for P1.5.5:

  • Migration application — turning a parity report into an actual ruleset upgrade. This is P1.5.2 (Wave 7). The harness only produces the report.
  • fork_id minting for divergent paths. P1.5.5 produces the divergence set; the fork machinery is ι (Phase 5).
  • Live ruleset diffing UI — the harness ships TS APIs, not a CLI.
  • Reading rulesets from disk — the harness body never touches fs. Test corpus is shipped as in-process data (DEFAULT_CORPUS).
  • Worker parallelization — explicitly forbidden by the task prompt (“serial is simpler”). Thread scheduling is non-deterministic; consensus cannot tolerate that.
  • Timing assertions inside the harness body — the perf assertion lives in the test file (uses Date.now() in tests, not in harness body). The harness body never reads wall-clock time.

§5. Affordances

What the harness can rely on:

  1. executeRuleset is deterministic. Two calls with identical inputs produce bit-identical all_mutations (per engine.ts:682–729 invariant).
  2. canonicalize is deterministic. Two calls with identical inputs produce byte-identical strings (per canonical.ts:297–300).
  3. createHash('sha256') is deterministic by construction (NIST FIPS 180-4). It’s a named import in versioning.ts; the harness imports the same way.
  4. The 4-category × ASCII-alpha-name execution order is stable across all hosts.
  5. The engine never throws unbounded errors — every per-rule error is caught at the rule boundary in executeRuleset and converted to {status: 'rejected', reason: string}. The harness never has to try/catch around executeRuleset.

§6. Constraints

What the harness must NOT do:

  1. No Math.* — would corrupt determinism. Use comparison operators directly for max/min: a > b ? a : b.
  2. No Date.* — would tie output to wall clock.
  3. No async / await — every API is synchronous. The harness is sync to keep the call graph determinism-checkable by inspectFunctionForbidden.
  4. No worker threads — task prompt forbids this explicitly. Loop is serial.
  5. No fs access inside harness body — the corpus is in-process data.
  6. No short-circuit on first divergence — the entire report is the value. The harness walks every event regardless of how many diverge.
  7. crypto.<member> is forbidden by the corpus self-scan; we use a named import (import { createHash } from 'node:crypto') just like versioning.ts:72. The token literal crypto.createHash never appears in the source body.
  8. No [native code] literal — it would match the corpus self-scan.
  9. No float literals — would match the regex \d+\.\d+ after comment stripping.
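Constraints 1 and 7 in the list above can be illustrated together. This is a small sketch with hypothetical helper names (maxOf, minOf, sha256Hex); only the named-import form of createHash is taken from the text.

```typescript
// Named import: the hashing call below never uses member access on the module.
import { createHash } from 'node:crypto';

// max/min via comparison operators instead of the Math namespace.
function maxOf(a: bigint, b: bigint): bigint {
  return a > b ? a : b;
}

function minOf(a: bigint, b: bigint): bigint {
  return a < b ? a : b;
}

// SHA-256 of a UTF-8 string, returned as lowercase hex.
function sha256Hex(text: string): string {
  return createHash('sha256').update(text, 'utf8').digest('hex');
}
```

All three helpers are synchronous and free of wall-clock reads, so they stay within the remaining constraints as well.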

§7. Risks

| Risk | Severity | Mitigation |
| --- | --- | --- |
| Determinism scanner hits crypto.createHash if we wrote the call as crypto.createHash(...) | High | Use named import: import { createHash } from 'node:crypto' (versioning.ts:72 pattern). |
| Effect-hash captures non-canonical Map/Set shapes from Mutation.new_value | Med | The engine’s Mutation shape is {kind, target, field, new_value}, and new_value is unknown. Tests must avoid Map/Set values; if any rule were ever to emit one, canonicalize throws. The harness does not catch the error; it propagates and the report fails to build. This is the correct semantics: a non-canonicalizable mutation is itself a determinism violation. |
| Performance: 10000-event corpus must complete in <5s | Low | The engine runs ~µs per simple rule; 10000×N (rules) calls fit comfortably even with bigint and SHA-256. The test asserts wall-time using Date.now() inside the test, not the harness body. |
| Default corpus design: 100 events feels like a lot to hand-curate | Low | The corpus is structured, not random: 5 events per (admission category × shape) family covers the matrix. Detailed taxonomy ships in the packet (Step 3). |
| details_by_event Map uses EventId (string) keys, which could leak insertion order if Map iteration is exercised | Low | Map iteration order in ECMA-262 is stable insertion order. Insertion order is the corpus iteration order, which is stable. So Map iteration is deterministic. The harness only WRITES to the map; consumers may iterate, but the write order is corpus order. |
| Comment-strip in corpus self-scan misses block comments containing */-then-something | Low | Same risk shipped in 12 sibling files; stripComments is sufficient for our straight prose comments. We will not embed regex literals or odd block-comment content. |
| TypeScript noUncheckedIndexedAccess + []-array-access patterns | Low | Sibling files use arr[i]! patterns; we mirror them. Eslint enforces. |
| RuleNode import path (./parser.js) requires extension at runtime ESM | None | Project standard; every sibling test does this. tsconfig is module: NodeNext. |
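The Map-ordering mitigation in the table above rests on Map iterating in insertion order. A minimal sketch, with a hypothetical buildDetails helper and a placeholder value type standing in for the real per-event details:

```typescript
type EventId = string;

// Walks the corpus serially and writes one entry per event, in corpus order.
function buildDetails(corpusIds: readonly EventId[]): Map<EventId, { index: number }> {
  const details = new Map<EventId, { index: number }>();
  for (let i = 0; i < corpusIds.length; i++) {
    // Non-null assertion mirrors the arr[i]! pattern used under noUncheckedIndexedAccess.
    details.set(corpusIds[i]!, { index: i });
  }
  return details;
}

// Keys come back in write order, not sorted order.
const keysInOrder = [...buildDetails(['e3', 'e1', 'e2']).keys()];
// keysInOrder is ['e3', 'e1', 'e2']
```

As long as the corpus array itself is in a fixed order, any consumer iterating the map sees the same sequence on every run.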

§8. Acceptance criteria — restated from task prompt §P1.5.5

| AC# | Statement | Source |
| --- | --- | --- |
| AC1 | runParity({old_ruleset, new_ruleset, corpus, declared_divergence_scope}): ParityReport exists. | task-prompts §P1.5.5 |
| AC2 | Per event: old_hash = SHA-256(canonical(old_result.mutations)), new_hash = SHA-256(canonical(new_result.mutations)). | task-prompts §P1.5.5 |
| AC3 | 5-bucket categorization: both_admit_same, both_admit_diverge, old_admit_new_reject, old_reject_new_admit, both_reject. | task-prompts §P1.5.5 |
| AC4 | pass = (both_admit_diverge.length === 0) AND ((old_admit_new_reject ∪ old_reject_new_admit) ⊆ scope). | task-prompts §P1.5.5 |
| AC5 | details_by_event: Map<EventId, {old_result, new_result, old_hash, new_hash}>. | task-prompts §P1.5.5 |
| AC6 | DEFAULT_CORPUS exported with ≥100 events covering admission/state-transition/consequence/promotion/governance/identity/fork. | task-prompts §P1.5.5 |
| AC7 | Determinism: identical inputs → identical report bytes. | task-prompts §P1.5.5 |
| AC8 | Performance: 10000 events < 5 seconds. | task-prompts §P1.5.5 |
| AC9 | Determinism scanner clean: inspectFunctionForbidden(runParity) returns []. | implicit (κ corpus self-scan §6 in determinism.test.ts:833) |
| AC10 | npm run build && npm run lint && npm test all green. | dispatcher prompt §gate |
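The 5-bucket categorization in AC3 can be sketched as a pure classification over per-event outcomes. This is an illustration under stated assumptions: Outcome is a hypothetical per-event summary, and how an event-level admitted flag is derived from a TransitionResult is not specified in this audit, so it is taken as given here.

```typescript
type EventId = string;

type Bucket =
  | 'both_admit_same'
  | 'both_admit_diverge'
  | 'old_admit_new_reject'
  | 'old_reject_new_admit'
  | 'both_reject';

// Hypothetical per-event summary for one ruleset.
// Assumption: admitted is an event-level verdict computed elsewhere.
interface Outcome {
  admitted: boolean;
  hash: string; // SHA-256 hex of the canonical mutation stream
}

interface ComparedEvent {
  id: EventId;
  old_outcome: Outcome;
  new_outcome: Outcome;
}

// Exactly one bucket per event; hashes are compared only when both sides admit.
function classify(oldOutcome: Outcome, newOutcome: Outcome): Bucket {
  if (oldOutcome.admitted && newOutcome.admitted) {
    return oldOutcome.hash === newOutcome.hash ? 'both_admit_same' : 'both_admit_diverge';
  }
  if (oldOutcome.admitted) return 'old_admit_new_reject';
  if (newOutcome.admitted) return 'old_reject_new_admit';
  return 'both_reject';
}

// Serial walk over every event; no short-circuit on the first divergence.
function bucketize(events: readonly ComparedEvent[]): Record<Bucket, EventId[]> {
  const buckets: Record<Bucket, EventId[]> = {
    both_admit_same: [],
    both_admit_diverge: [],
    old_admit_new_reject: [],
    old_reject_new_admit: [],
    both_reject: [],
  };
  for (const e of events) {
    buckets[classify(e.old_outcome, e.new_outcome)].push(e.id);
  }
  return buckets;
}
```

Each bucket array ends up in corpus order because events are appended as they are visited, which supports the AC7 requirement that identical inputs yield an identical report.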

§9. References

  • Spec source: docs/guides/implementation/task-prompts/p1.1-kappa-rule-engine.md §P1.5.5 (lines 2750–2933).
  • Concept doc: docs/3-world/physics/laws/rule-engine.md §Test corpus parity requirement.
  • Sibling audits: docs/audits/p1-3-1-engine-audit.md, docs/audits/p1-5-1-version-hash-audit.md, docs/audits/p1-5-4-canonical-audit.md, docs/audits/p1-2-4-registry-audit.md.
  • Live code at base 0150dcd1:
    • src/domains/rules/engine.ts — P1.3.1 evaluator (729 lines)
    • src/domains/rules/canonical.ts — P1.5.4 serializer (311 lines)
    • src/domains/rules/versioning.ts — P1.5.1 hash (433 lines)
    • src/domains/rules/determinism.ts — P1.1.2 scanner (302 lines)
    • src/domains/rules/registry.ts — P1.2.4 loader (513 lines)
    • src/domains/rules/parser.ts — P1.2.2 AST (1000+ lines)

Step 1 / 5. Audit complete. Next step: behavioral contract.



Colibri — documentation-first MCP runtime. Apache 2.0 + Commons Clause.
