P1.5.5 — Test Corpus Parity Harness — Audit (Step 1)
Branch: feature/p1-5-5-parity-harness
Worktree: .worktrees/claude/p1-5-5-parity-harness
Base SHA: 0150dcd1 (origin/main, post-R86 κ Wave 5)
Wave: R87 κ Wave 6
Round: R87 (continuing κ Phase 1)
Author tier: T3 executor (autonomous mandate, T0 dated 2026-05-07)
§1. Task framing
P1.5.5 ships a parity harness that runs a fixed event corpus through two distinct rulesets (an “old” ruleset + a candidate “new” ruleset) and produces a deterministic 5-bucket categorization of the divergences found. It does NOT replace either ruleset with the other; it only measures the divergence.
The output ParityReport is the gating artifact for P1.5.2 (Migration). A
candidate upgrade is admissible iff:
- `both_admit_diverge` is empty (no rule still admits the same event but produces a different mutation set), AND
- The set `old_admit_new_reject ∪ old_reject_new_admit` is a subset of a pre-declared scope (the upgrade author committed in advance to which admission boundaries would shift, and only those are allowed to shift).
The harness is the parity gate. Without it, a Phase 1 ruleset upgrade has no
mechanism to prove “non-breaking”, and θ consensus cannot vote on a fork because
the fork’s RULE_UPGRADE divergence event has no proof of bounded scope.
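The admissibility rule above can be sketched as a small predicate. This is a minimal sketch, not the shipped gate: the `BucketSummary` shape, the `isAdmissible` name, and the representation of the declared scope as a set of event ids are all assumptions made for illustration.

```typescript
// Hypothetical types: only the three bucket names come from this audit.
type EventId = string;

interface BucketSummary {
  both_admit_diverge: readonly EventId[];
  old_admit_new_reject: readonly EventId[];
  old_reject_new_admit: readonly EventId[];
}

// Admissible iff no event is admitted by both rulesets with different
// mutations, and every admission-boundary shift lies inside the declared scope.
// ASSUMPTION: the scope is modeled as a set of event ids.
function isAdmissible(
  buckets: BucketSummary,
  declaredScope: ReadonlySet<EventId>,
): boolean {
  if (buckets.both_admit_diverge.length !== 0) return false;
  for (const id of buckets.old_admit_new_reject) {
    if (!declaredScope.has(id)) return false;
  }
  for (const id of buckets.old_reject_new_admit) {
    if (!declaredScope.has(id)) return false;
  }
  return true;
}
```

Checking the two shift buckets one after the other is equivalent to testing their union against the scope, and it avoids building an intermediate set.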
§2. Existing surface (read-only inventory)
§2.1. The κ rule engine — src/domains/rules/engine.ts (P1.3.1, 729 lines)
Public surface at base 0150dcd1:
| Export | Kind | Shape |
|---|---|---|
| `MAX_INTEGER_OPS` | const | `10_000` (per-rule node-visit cap) |
| `MAX_CALL_DEPTH` | const | `16` (FuncCall nesting cap) |
| `MAX_ARG_COUNT` | const | `8` (per-call arity cap) |
| `Category` | type | `'Admission' \| 'StateTransition' \| 'Consequence' \| 'Promotion'` |
| `CATEGORY_ORDER` | const | the 4-tuple, frozen |
| `Mutation` | interface | `{ kind: 'set'\|'emit'\|'apply'; target: string; field: string; old_value?: unknown; new_value: unknown }` |
| `BudgetTracker` | interface | `{ integer_ops; call_depth; current_arg_count }` |
| `Context` | interface | readonly `{ event; state; rule_version; epoch; bindings; budget }` |
| `RuleResult` | type | `{ status: 'admitted'; mutations: Mutation[] } \| { status: 'rejected'; reason: string }` |
| `TransitionResult` | interface | `{ all_mutations: Mutation[]; per_category_results: Map<Category, RuleResult[]> }` |
| `CategorizedRule` | interface | `{ rule: RuleNode; category: Category }` |
| `RuleRegistry` | interface | `{ getAll(): readonly CategorizedRule[] }` |
| `RuleBudgetExceeded` | class | typed Error |
| `evaluate(rule, context)` | fn | per-rule evaluator |
| `evaluateExpr(expr, context)` | fn | recursive walker |
| `executeRuleset(registry, event, state, rule_version, epoch)` | fn | orchestrator returning `TransitionResult` |
Critical for the harness: `executeRuleset` semantics (engine.ts:682–729):
- Iterates `CATEGORY_ORDER` (Admission → StateTransition → Consequence → Promotion).
- Within each category, sorts rules ASCII-alpha by name (`asciiCompareByName`, engine.ts:480).
- Each rule runs in a fresh Context with fresh BudgetTracker (per-rule).
- `all_mutations` is the FLATTENED stream of admitted-rule mutations in execution order.
- `per_category_results` records every rule’s outcome (admitted-or-rejected) keyed by its category.
- Determinism contract: identical `(registry, event, state, rule_version, epoch)` produces bit-identical `all_mutations`. This is exactly the property the harness relies on to compute hash-stable effect sets.
The harness does NOT need access to evaluate / evaluateExpr directly — it
only consumes the public executeRuleset outputs.
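The execution order described above (category first, then ASCII-alpha by name) can be illustrated with a standalone sketch. It is not the engine's code: `NamedRule`, `asciiCompare`, and `executionOrder` are hypothetical names, and the sketch assumes each rule exposes a plain `name` string.

```typescript
type Category = 'Admission' | 'StateTransition' | 'Consequence' | 'Promotion';

const CATEGORY_ORDER: readonly Category[] = [
  'Admission',
  'StateTransition',
  'Consequence',
  'Promotion',
];

// Hypothetical minimal rule shape; the real CategorizedRule wraps a RuleNode.
interface NamedRule {
  name: string;
  category: Category;
}

// Code-unit comparison: locale-independent, and plain ASCII order for ASCII names.
function asciiCompare(a: string, b: string): number {
  return a < b ? -1 : a > b ? 1 : 0;
}

// Returns rules in the order a category-then-name orchestrator would visit them.
function executionOrder(rules: readonly NamedRule[]): NamedRule[] {
  const ordered: NamedRule[] = [];
  for (const category of CATEGORY_ORDER) {
    const inCategory = rules.filter((r) => r.category === category);
    inCategory.sort((x, y) => asciiCompare(x.name, y.name));
    for (const r of inCategory) ordered.push(r);
  }
  return ordered;
}
```

Because uppercase ASCII letters sort before lowercase ones, a rule named `B` runs before a rule named `a` in the same category.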
§2.2. The canonical serializer — src/domains/rules/canonical.ts (P1.5.4, 311 lines)
Public surface:
| Export | Kind | Notes |
|---|---|---|
| `canonicalize(value): string` | fn | byte-identical JSON for any reachable input |
| `byteLength(value): number` | fn | UTF-8 byte length |
| `CanonicalSerializationError` | class | thrown for unrepresentable inputs |
Properties used by the harness:
- Single-line, no whitespace between tokens.
- Object keys sorted by UTF-16 code unit comparison (locale-independent).
- `bigint` → decimal-string `toString` form, no `n` suffix.
- `Mutation.new_value` is `unknown`; canonical handles bigint/string/boolean/null/array/plain-object/integer-number recursively.
- Throws `CanonicalSerializationError` for `undefined`, function, symbol, non-integer number, non-plain object, reference cycle.
The harness uses canonicalize(mutations) as the input to SHA-256. Because the
engine guarantees mutation order is stable across runs (alpha-by-name within
category), the canonical bytes are stable too — and the SHA-256 digest is a
deterministic effect-set fingerprint.
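To make the fingerprint idea concrete, here is a minimal sketch of hashing a mutation list. `canonicalSketch` is a simplified stand-in written for this example, not the P1.5.4 `canonicalize`; how bigint values are quoted and the choice of a bare hex digest are assumptions. Cycle detection is omitted for brevity.

```typescript
import { createHash } from 'node:crypto';

// Simplified stand-in for canonicalize (ASSUMPTION: not the real serializer).
function canonicalSketch(value: unknown): string {
  if (value === null) return 'null';
  // ASSUMPTION: bigint is emitted as a quoted decimal string.
  if (typeof value === 'bigint') return JSON.stringify(value.toString());
  if (typeof value === 'string' || typeof value === 'boolean') {
    return JSON.stringify(value);
  }
  if (typeof value === 'number') {
    if (!Number.isInteger(value)) throw new TypeError('non-integer number');
    return String(value);
  }
  if (Array.isArray(value)) {
    return '[' + value.map((v) => canonicalSketch(v)).join(',') + ']';
  }
  if (typeof value === 'object' && Object.getPrototypeOf(value) === Object.prototype) {
    const record = value as Record<string, unknown>;
    const keys = Object.keys(record).sort(); // default sort is UTF-16 code unit order
    const parts = keys.map((k) => JSON.stringify(k) + ':' + canonicalSketch(record[k]));
    return '{' + parts.join(',') + '}';
  }
  throw new TypeError('unrepresentable value'); // undefined, function, symbol, Map, ...
}

// SHA-256 over the canonical bytes; named import, as in versioning.ts.
function effectHash(mutations: readonly unknown[]): string {
  return createHash('sha256').update(canonicalSketch(mutations), 'utf8').digest('hex');
}
```

Two mutation lists that differ only in object key order produce the same digest, while any change to a value or to the order of the mutations changes it.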
§2.3. The version hash module — src/domains/rules/versioning.ts (P1.5.1, 433 lines)
Public surface relevant to the harness:
| Export | Kind | Notes |
|---|---|---|
| `ENGINE_VERSION` | const | `'kappa-engine/1-0-0'` |
| `VERSION_HASH_PREFIX` | const | `'sha256:'` |
| `VERSION_HASH_HEX_LENGTH` | const | 64 |
| `VERSION_HASH_TOTAL_LENGTH` | const | 71 |
| `VersionHashError` | class | input-shape error |
| `computeVersionHash(ruleset, v?)` | fn | SHA-256 entry |
| `verifyRuleVersion(exp, act)` | fn | constant-time hex compare |
| `stripLocations(value)` | fn | recursive location removal |
| `canonicalizeRuleset(ruleset)` | fn | strip + sort + canonicalize |
The harness will reuse computeVersionHash to stamp the ParityReport with
both rulesets’ version hashes (so a downstream consumer can verify the report
was generated against the expected pair). The per-event effect hash uses
canonicalize directly (NOT computeVersionHash) — the engine version is
already encoded in the report’s old_version_hash / new_version_hash stamp,
not in every per-event hash.
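The three length constants in the table are related by simple arithmetic: the 7-character prefix plus 64 hex characters gives 71. A consumer of the report stamp could sanity-check a hash string along these lines. `looksLikeVersionHash` is a hypothetical helper, and the lowercase-hex requirement is an assumption, not something this audit confirms.

```typescript
// Values restated from the versioning.ts surface table.
const VERSION_HASH_PREFIX = 'sha256:';
const VERSION_HASH_HEX_LENGTH = 64;
// 7 (prefix) + 64 (hex digits) = 71
const VERSION_HASH_TOTAL_LENGTH = VERSION_HASH_PREFIX.length + VERSION_HASH_HEX_LENGTH;

// Hypothetical shape check. ASSUMPTION: the hex part is lowercase.
function looksLikeVersionHash(candidate: string): boolean {
  if (candidate.length !== VERSION_HASH_TOTAL_LENGTH) return false;
  if (!candidate.startsWith(VERSION_HASH_PREFIX)) return false;
  return /^[0-9a-f]+$/.test(candidate.slice(VERSION_HASH_PREFIX.length));
}
```

This only validates the shape of the stamp; comparing an expected hash against an actual one is the job of `verifyRuleVersion`.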
§2.4. The determinism scanner — src/domains/rules/determinism.ts (302 lines)
Public surface:
- `inspectFunctionForbidden(fn): readonly string[]`: regex scan for forbidden tokens against `fn.toString()`.
- `assertNoForbiddenOps(fn, opts?)`: throws on non-empty hits.
- `assertDeterministic(fn, args, opts?)`: N-run equality check.
- `deepEqualDeterministic(a, b)`: bigint-aware deep equality.
- `DeterminismError`: typed error.

Forbidden tokens (`FORBIDDEN_PATTERNS`, determinism.ts:56–72):
- `Math.*`, `Date.*`, `new Date`
- `setTimeout`, `setInterval`, `setImmediate`
- `fetch`, `XMLHttpRequest`
- `require('fs')`, `from 'fs'` (or `node:fs`)
- `crypto.*` (member access pattern; named imports survive)
- `process.hrtime`, `process.nextTick`
- `await`
- `async function`, `async (`
- `\d+\.\d+` float literal (negative lookbehind on digits + `n`)
- `[native code]`
The harness body MUST scan clean against inspectFunctionForbidden.
Companion: the file is also subject to the rule-engine corpus self-scan at
src/__tests__/domains/rules/determinism.test.ts:833 (Group 12), which
re-applies the same patterns to every .ts file under src/domains/rules/
after comment stripping. Comments may freely cite forbidden tokens; only the
post-strip code body is checked.
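The scanning approach (regex tests against a function's source text) can be sketched as follows. This is an illustration of the technique only: `SKETCH_PATTERNS` is a small hand-picked subset written for this example, and `scanSketch` is a hypothetical name, not the real `inspectFunctionForbidden`.

```typescript
// Illustrative subset; the real list is FORBIDDEN_PATTERNS in determinism.ts.
const SKETCH_PATTERNS: ReadonlyArray<readonly [string, RegExp]> = [
  ['Math.*', /\bMath\./],
  ['Date.*', /\bDate\./],
  ['new Date', /\bnew\s+Date\b/],
  ['await', /\bawait\b/],
  ['crypto.*', /\bcrypto\./],
];

// Returns the labels of every pattern that matches the function's source text.
// The regexes carry no `g` flag, so `test` keeps no state between calls.
function scanSketch(fn: (...args: never[]) => unknown): string[] {
  const source = fn.toString();
  const hits: string[] = [];
  for (const [label, pattern] of SKETCH_PATTERNS) {
    if (pattern.test(source)) hits.push(label);
  }
  return hits;
}
```

A function that computes a maximum with a conditional expression scans clean, while one that calls `Math.max` yields a `Math.*` hit. A textual scan like this cannot see through indirection, which is one reason the corpus self-scan also covers whole source files.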
§2.5. The registry — src/domains/rules/registry.ts (P1.2.4, 513 lines)
Implements RuleRegistry interface from engine. The harness does NOT depend
on this module at compile time — it accepts readonly CategorizedRule[] (the
shape engine.RuleRegistry.getAll() returns) so test fixtures can construct
rulesets without round-tripping through DSL parsing.
The harness’s input shape is two readonly CategorizedRule[] arrays, not
two RuleRegistry instances. This keeps the harness decoupled from the
loader / DSL / parser / validator stack — all of which are P1.2.x modules
that may evolve independently. Tests pass the data directly.
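Since `executeRuleset` takes a registry while the harness takes plain arrays, some thin adapter between the two is implied. One way it could look is sketched below; `asRegistry` and the `...Like` placeholder types are assumptions for this example, standing in for the engine's `CategorizedRule` and `RuleRegistry`.

```typescript
// Placeholder shapes standing in for the engine's exported types.
interface CategorizedRuleLike {
  readonly rule: unknown;
  readonly category: string;
}

interface RuleRegistryLike {
  getAll(): readonly CategorizedRuleLike[];
}

// Hypothetical adapter: wraps a fixture array in the minimal registry shape.
// The array is copied and frozen so later edits to the fixture cannot leak in.
function asRegistry(rules: readonly CategorizedRuleLike[]): RuleRegistryLike {
  const snapshot: readonly CategorizedRuleLike[] = Object.freeze([...rules]);
  return { getAll: () => snapshot };
}
```

A test fixture can then build rules as data and hand them over without touching the loader, DSL, parser, or validator.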
§2.6. The parser — src/domains/rules/parser.ts (P1.2.2, 1000+ lines)
Public surface (subset relevant to harness):
| Export | Kind |
|---|---|
| `RuleNode` | interface (top-level rule) |
| `Expression` | union type |
| `Location` | interface |
| `parse(input): ParseResult` | fn |
The harness imports RuleNode (engine input shape via CategorizedRule) but
does NOT call parse. Test fixtures may parse DSL strings or build AST nodes
directly — both work; the engine doesn’t care about provenance.
§2.7. Test layout convention
At base 0150dcd1:
src/__tests__/domains/rules/
├── bps-constants.test.ts (P1.1.3)
├── builtins.test.ts (P1.3.2)
├── canonical.test.ts (P1.5.4)
├── determinism.test.ts (P1.1.2)
├── engine.test.ts (P1.3.1)
├── integer-math.test.ts (P1.1.1)
├── lexer.test.ts (P1.2.1)
├── parser.test.ts (P1.2.2)
├── policy-gate.test.ts (P1.3.4)
├── registry.test.ts (P1.2.4)
├── state-access.test.ts (P1.3.3)
├── validator.test.ts (P1.2.3)
└── versioning.test.ts (P1.5.1)
Per CLAUDE.md §9.1, the canonical test directory is src/__tests__/, NOT
src/domains/rules/__tests__/. The task prompt’s literal path string
(src/domains/rules/__tests__/parity-harness.test.ts) is the donor-style
colocated convention used in some early Phase 0 spec drafts; the project
convention applied across all 13 sibling κ tests is
`src/__tests__/domains/rules/<name>.test.ts`. The dispatcher prompt explicitly authorizes this
adjustment (“match the project test-file convention used by R86 siblings:
src/__tests__/domains/rules/parity-harness.test.ts”).
§2.8. Existing event/effect/mutation types
There is no canonical Event type in src/domains/rules/ at base 0150dcd1.
The engine accepts event: Readonly<Record<string, unknown>> — a plain
record. The harness must define a thin Event type carrying:
- A stable `EventId` (string, for the `details_by_event` map keys + bucket arrays).
- The event payload, itself a `Readonly<Record<string, unknown>>`, which is passed through unchanged to `executeRuleset(registry, event, state, ...)`.
- The `state` snapshot to run the event against (the engine takes both `event` and `state` as inputs; a parity corpus must carry the state for each event so two rulesets see the same input pair).
- The `rule_version` and `epoch` fields the engine consumes.
The harness will define this Event type locally and export it. Future
consumers (P1.5.2 migration runner, P1.4.2 conflict resolver) can re-export it
from there.
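The four requirements above suggest a type along the following lines. This is a sketch under assumptions: the name `ParityEvent` (chosen here to avoid clashing with the DOM `Event` global), the field names, and the concrete types of `state`, `rule_version`, and `epoch` are not fixed by this audit.

```typescript
type EventId = string;

// Hypothetical corpus entry; field names and types are assumptions.
interface ParityEvent {
  readonly id: EventId;
  // Passed through unchanged to executeRuleset as `event`.
  readonly payload: Readonly<Record<string, unknown>>;
  // Snapshot both rulesets run against (ASSUMPTION: a plain record).
  readonly state: Readonly<Record<string, unknown>>;
  // ASSUMPTION: string and integer number; the engine may use other types.
  readonly rule_version: string;
  readonly epoch: number;
}

// Example entry with made-up values.
const sampleParityEvent: ParityEvent = {
  id: 'evt-001',
  payload: { type: 'transfer', amount: 5 },
  state: { balance: 10 },
  rule_version: 'example-version',
  epoch: 1,
};
```

Carrying `state` on each entry is what lets the old and new rulesets see the same `(event, state)` pair without any shared mutable fixture.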
§3. Sibling artifact references
The 5-step audit / contract / packet / verification documents shipped by the sibling κ tasks set the structural template for this audit:
- `docs/audits/p1-5-1-version-hash-audit.md`: version hash audit (companion file)
- `docs/audits/p1-5-4-canonical-audit.md`: canonical audit (consumer of identical pattern)
- `docs/audits/p1-3-1-engine-audit.md`: engine audit (the harness’s primary upstream)
- `docs/audits/p1-2-4-registry-audit.md`: registry audit (the harness’s loader-level peer)

The structure used here mirrors p1-5-1-version-hash-audit.md since the harness
is the layered consumer of both the engine’s outputs and the canonicalizer.
§4. Out-of-scope (deferred)
Out of scope for P1.5.5:
- Migration application: turning a parity report into an actual ruleset upgrade. This is P1.5.2 (Wave 7). The harness only produces the report.
- `fork_id` minting for divergent paths. P1.5.5 produces the divergence set; the fork machinery is ι (Phase 5).
- Live ruleset diffing UI: the harness ships TS APIs, not a CLI.
- Reading rulesets from disk: the harness body never touches fs. Test corpus is shipped as in-process data (`DEFAULT_CORPUS`).
- Worker parallelization: explicitly forbidden by the task prompt (“serial is simpler”). Thread scheduling is non-deterministic; consensus cannot tolerate that.
- Timing assertions inside the harness body: the perf assertion lives in the test file (uses `Date.now()` in tests, not in harness body). The harness body never reads wall-clock time.
§5. Affordances
What the harness can rely on:
- `executeRuleset` is deterministic. Two calls with identical inputs produce bit-identical `all_mutations` (per engine.ts:682–729 invariant).
- `canonicalize` is deterministic. Two calls with identical inputs produce byte-identical strings (per canonical.ts:297–300).
- `createHash('sha256')` is deterministic by construction (NIST FIPS 180-4). It’s a named import in versioning.ts; the harness imports the same way.
- The 4-category × ASCII-alpha-name execution order is stable across all hosts.
- The engine never throws unbounded errors: every per-rule error is caught at the rule boundary in `executeRuleset` and converted to `{status: 'rejected', reason: string}`. The harness never has to try/catch around `executeRuleset`.
§6. Constraints
What the harness must NOT do:
- No `Math.*`: the determinism scanner forbids every `Math` member. Use comparison operators directly for max/min: `a > b ? a : b`.
- No `Date.*`: would tie output to wall clock.
- No async / await: every API is synchronous. The harness is sync to keep the call graph determinism-checkable by `inspectFunctionForbidden`.
- No worker threads: task prompt forbids this explicitly. Loop is serial.
- No fs access inside harness body: the corpus is in-process data.
- No short-circuit on first divergence: the entire report is the value. The harness walks every event regardless of how many diverge.
- `crypto.<member>` is forbidden by the corpus self-scan; we use a named import (`import { createHash } from 'node:crypto'`) just like versioning.ts:72. The token literal `crypto.createHash` never appears in the source body.
- No `[native code]` literal: it would match the corpus self-scan.
- No float literals: they would match the regex `\d+\.\d+` after comment stripping.
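The max/min constraint and the serial, no-short-circuit loop can be combined in one small sketch. The helper names (`maxOf`, `minOf`, `largestEffectSet`) are hypothetical and exist only to show code that stays clear of the forbidden tokens.

```typescript
// Max and min via comparison operators, with no Math member access.
function maxOf(a: number, b: number): number {
  return a > b ? a : b;
}

function minOf(a: number, b: number): number {
  return a < b ? a : b;
}

// Serial walk over per-event mutation counts: every entry is visited,
// there is no early exit, and no timers, awaits, or workers are involved.
function largestEffectSet(mutationCounts: readonly number[]): number {
  let largest = 0;
  for (const count of mutationCounts) {
    largest = maxOf(largest, count);
  }
  return largest;
}
```

All numeric literals here are integers, so the body also stays clear of the float-literal pattern.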
§7. Risks
| Risk | Severity | Mitigation |
|---|---|---|
| Determinism scanner hits `crypto.createHash` if we wrote the call as `crypto.createHash(...)` | High | Use named import: `import { createHash } from 'node:crypto'` (versioning.ts:72 pattern). |
| Effect-hash captures non-canonical Map/Set shapes from `Mutation.new_value` | Med | The engine’s Mutation shape is `{kind, target, field, new_value}`; `new_value` is `unknown`. Tests must avoid Map/Set values; if any rule were ever to emit one, `canonicalize` throws. The harness does not catch; it propagates the error and the report fails to build. This is the correct semantics: a non-canonicalizable mutation is itself a determinism violation. |
| Performance: 10000-event corpus must complete in <5s | Low | The engine runs ~µs per simple rule; 10000×N (rules) calls fit comfortably even with bigint and SHA-256. The test asserts wall-time using `Date.now()` inside the test, not the harness body. |
| Default corpus design: 100 events feels like a lot to hand-curate | Low | The corpus is structured, not random: 5 events per (admission category × shape) family covers the matrix. Detailed taxonomy ships in the packet (Step 3). |
| `details_by_event` Map uses `EventId` (string) keys; could leak insertion order if Map iteration is exercised | Low | Map iteration order in ECMA-262 is stable insertion order. Insertion order is the corpus iteration order, which is stable. So Map iteration is deterministic. The harness only WRITES to the map; consumers may iterate, but the write order is corpus order. |
| Comment-strip in corpus self-scan misses block comments containing `*/`-then-something | Low | Same risk shipped in 12 sibling files; `stripComments` is sufficient for our straight prose comments. We will not embed regex literals or odd block-comment content. |
| TypeScript `noUncheckedIndexedAccess` + `[]`-array-access patterns | Low | Sibling files use `arr[i]!` patterns; we mirror them. ESLint enforces. |
| `RuleNode` import path (`./parser.js`) requires extension at runtime ESM | None | Project standard; every sibling test does this. tsconfig is `module: NodeNext`. |
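The Map-ordering mitigation in the table can be checked with a short sketch. The function name and the stored value shape are made up for this example; only the insertion-order behaviour of `Map` is the point.

```typescript
type EventId = string;

// Writes one entry per corpus id, then reads the keys back.
// Map iterates in insertion order, so the keys come back in corpus order.
function writeInCorpusOrder(corpusIds: readonly EventId[]): EventId[] {
  const details = new Map<EventId, { position: number }>();
  let position = 0;
  for (const id of corpusIds) {
    // Setting an existing key updates the value but keeps its original slot.
    details.set(id, { position });
    position += 1;
  }
  return Array.from(details.keys());
}
```

The keys are never sorted: a corpus listed as `evt-003, evt-001, evt-002` iterates in exactly that order. A duplicated id would occupy a single slot at its first position, which is one reason a corpus would want unique event ids.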
§8. Acceptance criteria — restated from task prompt §P1.5.5
| AC# | Statement | Source |
|---|---|---|
| AC1 | `runParity({old_ruleset, new_ruleset, corpus, declared_divergence_scope}): ParityReport` exists. | task-prompts §P1.5.5 |
| AC2 | Per event: `old_hash = SHA-256(canonical(old_result.mutations))`, `new_hash = SHA-256(canonical(new_result.mutations))`. | task-prompts §P1.5.5 |
| AC3 | 5-bucket categorization: `both_admit_same`, `both_admit_diverge`, `old_admit_new_reject`, `old_reject_new_admit`, `both_reject`. | task-prompts §P1.5.5 |
| AC4 | `pass = (both_admit_diverge.length === 0) AND ((old_admit_new_reject ∪ old_reject_new_admit) ⊆ scope)`. | task-prompts §P1.5.5 |
| AC5 | `details_by_event: Map<EventId, {old_result, new_result, old_hash, new_hash}>`. | task-prompts §P1.5.5 |
| AC6 | `DEFAULT_CORPUS` exported with ≥100 events covering admission/state-transition/consequence/promotion/governance/identity/fork. | task-prompts §P1.5.5 |
| AC7 | Determinism: identical inputs → identical report bytes. | task-prompts §P1.5.5 |
| AC8 | Performance: 10000 events < 5 seconds. | task-prompts §P1.5.5 |
| AC9 | Determinism scanner clean: `inspectFunctionForbidden(runParity)` returns `[]`. | implicit (κ corpus self-scan §6 in determinism.test.ts:833) |
| AC10 | `npm run build && npm run lint && npm test` all green. | dispatcher prompt §gate |
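The 5-bucket categorization in AC3 can be sketched as a pure function over per-event outcomes. This is one possible reading, not the contract: how an "admitted" flag is derived from a ruleset's result is not specified in this audit, so the `Outcome` shape with precomputed booleans is an assumption, as are the `classify` and `bucketize` names.

```typescript
type Bucket =
  | 'both_admit_same'
  | 'both_admit_diverge'
  | 'old_admit_new_reject'
  | 'old_reject_new_admit'
  | 'both_reject';

// Hypothetical per-event summary; the admitted flags are assumed precomputed.
interface Outcome {
  id: string;
  oldAdmitted: boolean;
  newAdmitted: boolean;
  oldHash: string;
  newHash: string;
}

// Hashes only matter when both rulesets admit the event.
function classify(o: Outcome): Bucket {
  if (o.oldAdmitted && o.newAdmitted) {
    return o.oldHash === o.newHash ? 'both_admit_same' : 'both_admit_diverge';
  }
  if (o.oldAdmitted) return 'old_admit_new_reject';
  if (o.newAdmitted) return 'old_reject_new_admit';
  return 'both_reject';
}

// Serial walk with no early exit; ids land in buckets in corpus order.
function bucketize(outcomes: readonly Outcome[]): Record<Bucket, string[]> {
  const buckets: Record<Bucket, string[]> = {
    both_admit_same: [],
    both_admit_diverge: [],
    old_admit_new_reject: [],
    old_reject_new_admit: [],
    both_reject: [],
  };
  for (const o of outcomes) {
    buckets[classify(o)].push(o.id);
  }
  return buckets;
}
```

Every event lands in exactly one bucket, so the five array lengths always sum to the corpus size, which makes a cheap invariant for a test to assert.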
§9. References
- Spec source: `docs/guides/implementation/task-prompts/p1.1-kappa-rule-engine.md` §P1.5.5 (lines 2750–2933).
- Concept doc: `docs/3-world/physics/laws/rule-engine.md` §Test corpus parity requirement.
- Sibling audits: `docs/audits/p1-3-1-engine-audit.md`, `docs/audits/p1-5-1-version-hash-audit.md`, `docs/audits/p1-5-4-canonical-audit.md`, `docs/audits/p1-2-4-registry-audit.md`.
- Live code at base `0150dcd1`:
  - `src/domains/rules/engine.ts`: P1.3.1 evaluator (729 lines)
  - `src/domains/rules/canonical.ts`: P1.5.4 serializer (311 lines)
  - `src/domains/rules/versioning.ts`: P1.5.1 hash (433 lines)
  - `src/domains/rules/determinism.ts`: P1.1.2 scanner (302 lines)
  - `src/domains/rules/registry.ts`: P1.2.4 loader (513 lines)
  - `src/domains/rules/parser.ts`: P1.2.2 AST (1000+ lines)

Step 1 / 5. Audit complete. Next step: behavioral contract.