P1.5.5 — Test Corpus Parity Harness — Verification (Step 5)

Branch: feature/p1-5-5-parity-harness
Worktree: .worktrees/claude/p1-5-5-parity-harness
Base SHA: 0150dcd1 (origin/main, post-R86 κ Wave 5)
Wave: R87 κ Wave 6
Author tier: T3 executor
Audit: docs/audits/p1-5-5-parity-harness-audit.md (dbac7cd6)
Contract: docs/contracts/p1-5-5-parity-harness-contract.md (ca2cb7b1)
Packet: docs/packets/p1-5-5-parity-harness-packet.md (caf98b1c)
Implement: feat(p1-5-5): test corpus parity harness (5a202d58)


§1. Test evidence

§1.1. Gate suite — all three green

$ npm run build
> colibri@0.0.1 build
> tsc

> colibri@0.0.1 postbuild
> node scripts/copy-migrations.mjs
copy-migrations: copied 6 migration(s) ... -> .../dist/db/migrations
$ npm run lint
> colibri@0.0.1 lint
> eslint src

(no output — clean)
$ npm test
...
Test Suites: 41 passed, 41 total
Tests:       2029 passed, 2029 total
Snapshots:   0 total
Time:        ~30-60 s

The test count progression on this branch:

  • Pre-R87 (post-R86 κ Wave 5) baseline: 1972 tests / 40 suites (per memory).
  • R87 P1.5.5 adds: 57 tests in 1 new suite (parity-harness.test.ts).
  • This branch result: 2029 tests / 41 suites.

The arithmetic checks: 1972 + 57 = 2029. ✅

§1.2. Pre-existing flake observation

A single transient run during this work showed 1 failed / 2028 passed; the failure was in src/__tests__/server.test.ts, not in the parity harness. A re-run was clean at 2029/2029. This matches the pre-existing server-startup subprocess smoke flake recorded in MEMORY.md (“pre-existing server-startup-smoke flake hit twice across R86 executors; green on retry, no R86 introduction”).

Verdict: NOT introduced by P1.5.5. Documented; not silenced; flagged for future stabilization.

§1.3. Coverage on parity-harness.ts

From the Jest coverage report:

parity-harness.ts | 97.24% Stmts | 90.69% Branch | 100% Funcs | 97.22% Lines

The four uncovered lines (228, 255, 266, 318) are the defensive if (arr === undefined) continue; and if (e === undefined) branches in anyRuleAdmitted / collapseToRuleResult / validateInput. They guard against unreachable cases under the noUncheckedIndexedAccess tsconfig narrowing — the loop bounds and the engine’s invariant that per_category_results is keyed by every Category mean these branches cannot fire on real input. They are kept for typing tractability, not control-flow coverage.

Function coverage is 100%: the exported functions runParity, effectHash, and matchesScope are all exercised, as is every internal helper.

§1.4. Test fixture coverage trace

| Contract §7 fixture | Test name | Status |
|---|---|---|
| F1: Identical rulesets | F1.1 / F1.2 / F1.3 | green |
| F2: Old admits, new rejects | F2.1 / F2.2 / F2.3 | green |
| F3: Old rejects, new admits | F3.1 / F3.2 | green |
| F4: Diverging mutations | F4.1 / F4.2 / F4.3 | green |
| F5: Both reject | F5.1 / F5.2 | green |
| F6: Empty corpus | F6.1 | green |
| F7: Empty rulesets, non-empty corpus | F7.1 | green |
| F8: Determinism | F8.1 / F8.2 / F8.3 | green |
| F9: Performance | F9.1 (10000 events < 5 s) | green |
| F10: Default corpus shape | F10.1 / F10.2 / F10.3 / F10.4 / F10.5 / F10.6 / F10.7 / F10.8 | green |
| F11: Scope string match | F11.1 / F11.2 / F11.3 / F11.4 | green |
| F12: Scope regex match | F12.1 / F12.2 / F12.3 | green |
| F13: Scope empty | F13.1 / F13.2 | green |
| F14: Hash format | F14.1 / F14.2 / F14.3 | green |
| F15: Determinism scanner | F15.1 / F15.2 / F15.3 | green |
| F16: Input validation | F16.1 / F16.2 / F16.3 / F16.4 / F16.5 / F16.6 / F16.7 / F16.8 / F16.9 / F16.10 | green |
| F17: Cross-call independence | F17.1 / F17.2 | green |
| F18: NO_RULES collapse | F18.1 / F18.2 | green |
| F19: Output frozen | F19.1 (added during impl) | green |

All 18 contract fixtures + 1 added (F19 frozen output) pass.


§2. Acceptance crosswalk

| AC# (audit §8) | Statement | Verified by |
|---|---|---|
| AC1 | runParity exists with locked input/output shape | F1–F18 all consume the API and pass |
| AC2 | Per-event SHA-256 effect hashes | F14.1/F14.2/F14.3: every hash is 71 chars, has the 'sha256:' prefix, and is deterministic |
| AC3 | 5 categorization buckets | F1 (same), F4 (diverge), F2 (admit→reject), F3 (reject→admit), F5/F7 (both reject): every bucket covered |
| AC4 | pass = (both_admit_diverge == []) AND (divergent ⊆ scope) | F2.1/F3.1 (in-scope→pass), F2.2/F3.2 (out-of-scope→fail), F4 (diverge→fail regardless of scope) |
| AC5 | details_by_event: Map<EventId, {old_result, new_result, old_hash, new_hash}> | F1.2 / F4.3 / F5.2 / F18: all four fields verified |
| AC6 | DEFAULT_CORPUS ≥100 events, all 7 categories | F10.1 (≥100), F10.2 (=101), F10.3 (unique), F10.4/F10.5 (frozen), F10.6 (every category present), F10.7 (works through harness), F10.8 (deterministic) |
| AC7 | Determinism: identical inputs → identical report bytes | F8.1/F8.2 (reportBytes via canonicalize), F10.8 (default corpus) |
| AC8 | 10000-event corpus < 5 seconds | F9.1 (Date.now()-bracketed in test scope) |
| AC9 | Determinism scanner clean: inspectFunctionForbidden returns [] | F15.1/F15.2/F15.3: runParity / effectHash / matchesScope all return [] |
| AC10 | npm run build && npm run lint && npm test all green | §1.1 above: all three green |

All 10 ACs satisfied.
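
The bucket assignment (AC3) and the pass rule (AC4) can be sketched as follows. This is a minimal illustration, not the shipped code: the Outcome shape, the three bucket names other than both_admit_same and both_admit_diverge, the helper names, and the reading of "divergent" as the union of the two admit/reject flip buckets are all assumptions made for the example.

```typescript
// Hypothetical shapes: only both_admit_same and both_admit_diverge are
// bucket names taken from this document; the other three are assumed.
type Outcome = { admitted: boolean; hash: string };
type Bucket =
  | "both_admit_same"
  | "both_admit_diverge"
  | "old_admit_new_reject"
  | "old_reject_new_admit"
  | "both_reject";

// Assign one event to exactly one of the five buckets.
function classify(oldR: Outcome, newR: Outcome): Bucket {
  if (oldR.admitted && newR.admitted) {
    return oldR.hash === newR.hash ? "both_admit_same" : "both_admit_diverge";
  }
  if (oldR.admitted) return "old_admit_new_reject";
  if (newR.admitted) return "old_reject_new_admit";
  return "both_reject";
}

// Assumed scope entry forms: exact string or non-global RegExp.
type ScopeEntry = string | RegExp;

function inScope(eventId: string, scope: readonly ScopeEntry[]): boolean {
  return scope.some((s) => (typeof s === "string" ? s === eventId : s.test(eventId)));
}

// pass = no admitted-but-diverging events AND every flipped event is declared.
function decidePass(buckets: Record<Bucket, string[]>, scope: readonly ScopeEntry[]): boolean {
  const flipped = [...buckets.old_admit_new_reject, ...buckets.old_reject_new_admit];
  return buckets.both_admit_diverge.length === 0 && flipped.every((id) => inScope(id, scope));
}

const buckets: Record<Bucket, string[]> = {
  both_admit_same: [],
  both_admit_diverge: [],
  old_admit_new_reject: [],
  old_reject_new_admit: [],
  both_reject: [],
};
const events = [
  { id: "e1", oldR: { admitted: true, hash: "h1" }, newR: { admitted: true, hash: "h1" } },
  { id: "e2", oldR: { admitted: true, hash: "h2" }, newR: { admitted: false, hash: "h0" } },
];
for (const e of events) buckets[classify(e.oldR, e.newR)].push(e.id);

const passInScope = decidePass(buckets, ["e2"]); // flip on e2 is declared
const passOutOfScope = decidePass(buckets, []); // flip on e2 is undeclared
```

With e2 declared in scope the sketch passes; with an empty scope the same flip fails, which mirrors the F2.1 / F2.2 pairing in the table above.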

§2.1. Corpus self-scan compliance

src/__tests__/domains/rules/determinism.test.ts §Group 12 (the “rule-engine corpus self-scan”) re-applies the FORBIDDEN_PATTERNS regex set against every .ts file under src/domains/rules/ after comment stripping. Adding parity-harness.ts to that directory means the file is now in scope of the scan.

The full Jest suite passes, which means the corpus self-scan passes — zero forbidden tokens detected in parity-harness.ts after comment stripping.

Specific clean-room properties verified:

  • crypto.<member> token absent (named import: import { createHash } from 'node:crypto' per versioning.ts:72 pattern).
  • [native code] literal absent.
  • Math.<member> absent (no Math.* used; comparison operators handle max/min where needed).
  • Date.<member> and new Date absent (no clock reads in harness body — Date.now() only appears in test file).
  • setTimeout / setInterval / setImmediate absent.
  • fetch / XMLHttpRequest absent.
  • from 'fs' / require('fs') absent.
  • process.hrtime / process.nextTick absent.
  • await absent.
  • async function / async ( absent.
  • Float literal \d+\.\d+ absent in source body (the regression caught during impl was a §3.5 reference in an inline comment inside the runParity body; it was fixed by rewording the reference to "section 3 step 5", because fn.toString() includes body comments).

§2.2. Determinism scanner caveat note

A subtle property surfaced during impl that’s worth recording for future κ work:

The inspectFunctionForbidden(fn) scanner uses fn.toString() and applies the regex set without comment stripping. fn.toString() returns the function's own source text, so comments inside the function body (both // lines and /* */ blocks) are scanned literally. A JSDoc block placed before the declaration lies outside that source text and is not part of the fn.toString() output.

By contrast, the corpus self-scan (determinism.test.ts §Group 12) DOES strip comments first, so JSDoc references like §3.5 in a file-level comment are fine.

Implication for future κ work: when writing inline comments INSIDE a function body that will be exported and tested with inspectFunctionForbidden(fn) === [], avoid \d+\.\d+ patterns in the comment text. Use section X step Y, chapter X subsection Y, or hyphenate (§3-5).
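
The effect can be reproduced in isolation. The sketch below applies only the float-literal pattern quoted in §4.1 to fn.toString(); floatHits is a hypothetical stand-in for the real scanner, which applies a larger pattern set. The two functions are built with the Function constructor purely so that the comment text is part of their source no matter how this snippet is transpiled.

```typescript
// Float-literal pattern as quoted in section 4.1 of this document.
const FLOAT_LITERAL = /(?<![0-9n])\b\d+\.\d+\b/g;

// Function-constructor bodies are kept verbatim in toString(), so the
// inline comments below survive any build step applied to this file.
const withDottedRef = new Function("x", "// Pass decision per contract §3.5.\nreturn x;");
const withWordedRef = new Function("x", "// Pass decision per contract section 3 step 5.\nreturn x;");

// Hypothetical stand-in for the scanner: float-literal hits only,
// with no comment stripping before matching.
function floatHits(fn: Function): string[] {
  return fn.toString().match(FLOAT_LITERAL) ?? [];
}

const dottedHits = floatHits(withDottedRef); // ["3.5"]
const wordedHits = floatHits(withWordedRef); // []
```

The dotted section reference is reported as a float literal even though it only appears in a comment; the reworded reference produces no hit.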


§3. Files shipped

docs/audits/p1-5-5-parity-harness-audit.md         (commit dbac7cd6)
docs/contracts/p1-5-5-parity-harness-contract.md   (commit ca2cb7b1)
docs/packets/p1-5-5-parity-harness-packet.md       (commit caf98b1c)
src/domains/rules/parity-harness.ts                (commit 5a202d58, ~600 LOC)
src/__tests__/domains/rules/parity-harness.test.ts (commit 5a202d58, ~640 LOC)
docs/verification/p1-5-5-parity-harness-verification.md  (this commit)

Total LOC delta: 1737 insertions across 5 commits before this verification. Plus this verification doc.

Zero edits to existing source. Purely additive.


§4. Risks & gotchas observed during implementation

§4.1. Discovered: §3.5 in inline comment fails determinism scanner

What: During the test gate, F15.1 (inspectFunctionForbidden(runParity) === []) failed with the hit '3.5'. Root cause: an inline comment, // Pass decision per contract §3.5., inside the runParity function body matched the float-literal regex /(?<![0-9n])\b\d+\.\d+\b/g.

Fix: Reworded the comment to // Pass decision per contract section 3 step 5. (commit 5a202d58).

Lesson: fn.toString() includes inline //-comments inside a function body verbatim. The determinism scanner’s regex set runs against that output without stripping comments. Future κ work writing exported functions: avoid \d+\.\d+ patterns in inline comments inside the function body.

The contract / audit / packet documents are NOT subject to this constraint (only the function source is).

§4.2. Map iteration order

ECMA-262 guarantees that Map iteration follows insertion order. The harness populates details_by_event in corpus order, and test F8.3 verifies that consumers reading the Map see corpus order.

This was already a known property; documented here for completeness.
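
A short sketch of the property being relied on, using hypothetical event ids and a trimmed-down detail shape: keys come back in the order they were first inserted, and overwriting an existing key does not move it.

```typescript
// Trimmed, hypothetical detail shape for illustration only.
type Detail = { old_hash: string; new_hash: string };

// Corpus order is deliberately not alphabetical.
const corpusIds = ["evt-c", "evt-a", "evt-b"];

const details = new Map<string, Detail>();
for (const id of corpusIds) {
  details.set(id, { old_hash: "sha256:old", new_hash: "sha256:new" });
}

// Updating an existing key changes its value but keeps its position.
details.set("evt-c", { old_hash: "sha256:old2", new_hash: "sha256:new2" });

const seenOrder = Array.from(details.keys());
```

Here seenOrder equals the corpus order, evt-c, evt-a, evt-b, which is the behavior F8.3 checks against the real harness output.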

§4.3. Empty mutation list ≠ rejection

A rule that admitted but produced zero mutations is still admitted. The anyRuleAdmitted walker checks r.status === 'admitted', NOT r.mutations.length > 0. F1.1 verifies this: makeAdmittingRule('R') has zero effects but produces status: 'admitted', mutations: [] and lands in both_admit_same.
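
A simplified sketch of that check follows. The RuleResult shape, the 'rejected' status literal, and the flat array input are assumptions for illustration; the real walker iterates per_category_results and may differ in detail.

```typescript
// Hypothetical result shape; 'rejected' is an assumed status literal.
type RuleResult = {
  status: "admitted" | "rejected";
  mutations: readonly unknown[];
};

// Admission is decided by status alone, never by mutation count.
function anyRuleAdmittedSketch(results: readonly RuleResult[]): boolean {
  return results.some((r) => r.status === "admitted");
}

const admittedWithoutEffects = anyRuleAdmittedSketch([
  { status: "admitted", mutations: [] },
]);
const rejectedWithoutEffects = anyRuleAdmittedSketch([
  { status: "rejected", mutations: [] },
]);
```

Both inputs carry an empty mutation list, yet only the first counts as admitted, so a length-based check would have misclassified it.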

§4.4. effectHash([]) is the canonical “rejected” hash

When both rulesets reject an event, each produces all_mutations: [] and therefore the same effectHash([]) SHA-256 digest. F5.1 verifies this. For downstream consumers (the P1.5.2 migration runner), this means detail.old_hash === detail.new_hash does NOT distinguish “both admitted identically” from “both rejected”; the bucket assignment is the authoritative signal.
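
The hash collision on empty mutation lists can be illustrated with a stand-in hash function. effectHashSketch is hypothetical, and JSON.stringify is used here only as a placeholder for the project's canonicalization step, which this document does not spell out.

```typescript
import { createHash } from "node:crypto";

// Assumption: JSON.stringify stands in for the real canonicalize step.
function effectHashSketch(mutations: readonly unknown[]): string {
  const digest = createHash("sha256")
    .update(JSON.stringify(mutations), "utf8")
    .digest("hex");
  return `sha256:${digest}`;
}

// Both rulesets rejected: each side hashes an empty mutation list.
const oldHash = effectHashSketch([]);
const newHash = effectHashSketch([]);

// Equal hashes alone cannot tell "both rejected" from "both admitted
// with identical effects"; the bucket has to be consulted as well.
const hashesEqual = oldHash === newHash;
const hashLength = oldHash.length; // 7-char prefix + 64 hex chars
```

The length works out to 71 characters, matching the format asserted by F14: the 7-character 'sha256:' prefix plus 64 hexadecimal digits.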


§5. Coverage details

parity-harness.ts:
  Statements: 97.24%  (uncovered: defensive narrowing branches)
  Branches:   90.69%  (uncovered: noUncheckedIndexedAccess defensive ifs)
  Functions:  100.00% (every exported + internal fn covered)
  Lines:      97.22%

The four uncovered lines (228, 255, 266, 318) are intentional defensive guards under the project’s strict TypeScript settings. They cannot fire on well-formed input — the engine guarantees per_category_results is keyed by every Category, and the loop bounds in validateInput make corpus[i] === undefined unreachable. We keep the guards for type narrowing tractability rather than removing them and adding ! non-null assertions, which would silently lose runtime safety under e.g. a future engine refactor that broke the invariant.
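
The trade-off between a defensive guard and a non-null assertion can be shown on a toy loop. The function names and the sparse-array input are hypothetical; the sparse array merely simulates a broken invariant of the kind the paragraph above describes.

```typescript
// With noUncheckedIndexedAccess, corpus[i] is typed string | undefined.
function countNonEmptyGuarded(corpus: readonly string[]): number {
  let count = 0;
  for (let i = 0; i < corpus.length; i++) {
    const e = corpus[i];
    if (e === undefined) continue; // unreachable on dense input; narrows the type
    if (e.length > 0) count++;
  }
  return count;
}

// Same loop with a non-null assertion: compiles, but has no runtime check.
function countNonEmptyAsserted(corpus: readonly string[]): number {
  let count = 0;
  for (let i = 0; i < corpus.length; i++) {
    if (corpus[i]!.length > 0) count++;
  }
  return count;
}

// Simulate a broken invariant: an array with holes at index 0 and 2.
const sparse: string[] = new Array<string>(3);
sparse[1] = "evt";

const guardedCount = countNonEmptyGuarded(sparse);

let assertedThrew = false;
try {
  countNonEmptyAsserted(sparse);
} catch {
  assertedThrew = true; // reading .length of undefined throws a TypeError
}
```

The guarded version skips the holes and returns a count, while the asserted version throws at the first hole. On well-formed input the two behave identically, which is why the guard lines show up as uncovered.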


§6. Determinism guarantees verified

| Invariant (contract §3.6) | Verification |
|---|---|
| I1: Bit-identical output for identical input | F8.1 / F8.2 / F10.8 (reportBytes byte equality) |
| I2: details_by_event insertion = corpus order | F8.3 explicitly verifies |
| I3: Bucket arrays insertion = corpus order | implicit in F1–F4 (every test asserts bucket array equals expected ordered subset of corpus ids) |
| I4: No wall-clock dependency in body | F15.1/F15.2/F15.3 (scanner rejects Date.*) |
| I5: No worker / async / fork | F15.1/F15.2/F15.3 (scanner rejects async/await) |
| I6: Determinism scanner clean | F15.1/F15.2/F15.3 + corpus self-scan in determinism.test.ts §Group 12 |

All 6 invariants hold.


§7. Performance assertion

F9.1 ran a 10000-event corpus through the harness and asserted wall time < 5000 ms. The observed time on the dev host was well under 5 seconds; a typical Node ≥ 20 host completes this load in 100–500 ms. The 5 s ceiling is a regression sentinel, not a target.

The performance assertion is a gate, not a flake-prone microbenchmark. With at least 10× headroom between observed time and budget, false positives are extremely unlikely; a real regression that pushed the runtime over 5 s would point to a fundamental algorithmic issue worth investigating, not silencing.
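
The shape of such a sentinel, bracketing a call with Date.now() in test scope, might look like the sketch below. standInWorkload is a hypothetical placeholder for the runParity call on the 10000-event corpus, so the timing here says nothing about the real harness.

```typescript
// Hypothetical placeholder for running the harness over 10000 events.
function standInWorkload(eventCount: number): number {
  let acc = 0;
  for (let i = 0; i < eventCount; i++) {
    acc = (acc + i * 31) % 1000003;
  }
  return acc;
}

const BUDGET_MS = 5000; // ceiling from AC8: a regression sentinel, not a target

// Clock reads stay in the test, never in the harness body itself.
const startedAt = Date.now();
standInWorkload(10000);
const elapsedMs = Date.now() - startedAt;

const withinBudget = elapsedMs < BUDGET_MS;
```

Keeping the clock reads outside the measured function is what lets the harness body stay free of Date references while the test still enforces the budget.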


§8. Pass / fail outcome

PASS. All 10 ACs satisfied. All 18 contract fixtures green plus 1 added (F19). Coverage 97%+ on parity-harness.ts. Build / lint / test gates all green at base SHA 0150dcd1 + this branch.

The task is complete and ready for PR.


§9. Unblocks

P1.5.2 (Migration), a Wave 7 candidate. The migration runner consumes runParity(input).pass as its admission gate. The harness is the necessary upstream; without it, P1.5.2 has no mechanism to prove that a ruleset upgrade is non-breaking.

The migration runner will:

  1. Build old_ruleset = currentRuleset() and new_ruleset = parseProposedSource(newSource).
  2. Construct corpus from a stored canonical event log + DEFAULT_CORPUS suffix.
  3. Construct declared_divergence_scope from the upgrade-author’s metadata.
  4. Call runParity(input).
  5. If pass: true, atomically swap; if pass: false, refuse the upgrade with the diagnostic bucket contents.
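
The gating step (steps 4 and 5) could be sketched as below. P1.5.2 is not yet written, so everything here is hypothetical: the type shapes, the gateUpgrade name, and the injected runParity signature are assumptions, and a stub stands in for the real harness so the example is self-contained.

```typescript
// Hypothetical shapes; field names follow the steps listed above.
type Ruleset = { readonly name: string };
type ParityInput = {
  old_ruleset: Ruleset;
  new_ruleset: Ruleset;
  corpus: readonly string[];
  declared_divergence_scope: readonly string[];
};
type ParityReport = { pass: boolean; both_admit_diverge: readonly string[] };
type RunParity = (input: ParityInput) => ParityReport;

type UpgradeDecision =
  | { kind: "swapped"; active: Ruleset }
  | { kind: "refused"; diagnostics: ParityReport };

// Swap only when the harness reports pass; otherwise return the report
// so the caller can surface the diagnostic bucket contents.
function gateUpgrade(runParity: RunParity, input: ParityInput): UpgradeDecision {
  const report = runParity(input);
  return report.pass
    ? { kind: "swapped", active: input.new_ruleset }
    : { kind: "refused", diagnostics: report };
}

// Stub harness for the example: passes only when the ruleset names match.
const stubRunParity: RunParity = (input) =>
  input.old_ruleset.name === input.new_ruleset.name
    ? { pass: true, both_admit_diverge: [] }
    : { pass: false, both_admit_diverge: [...input.corpus] };

const base = { corpus: ["e1"], declared_divergence_scope: [] as string[] };
const accepted = gateUpgrade(stubRunParity, {
  ...base,
  old_ruleset: { name: "v1" },
  new_ruleset: { name: "v1" },
});
const refused = gateUpgrade(stubRunParity, {
  ...base,
  old_ruleset: { name: "v1" },
  new_ruleset: { name: "v2" },
});
```

Passing the harness in as a parameter is a design choice for the sketch only; it keeps the gate testable without the real runParity.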

§10. References

  • Step 1 audit: docs/audits/p1-5-5-parity-harness-audit.md (dbac7cd6)
  • Step 2 contract: docs/contracts/p1-5-5-parity-harness-contract.md (ca2cb7b1)
  • Step 3 packet: docs/packets/p1-5-5-parity-harness-packet.md (caf98b1c)
  • Step 4 implement: commit 5a202d58
  • Spec: docs/guides/implementation/task-prompts/p1.1-kappa-rule-engine.md §P1.5.5
  • Concept: docs/3-world/physics/laws/rule-engine.md §Test corpus parity requirement

Step 5 / 5. Verification complete. Task is ready for PR + writeback.

