P1.5.5 — Test Corpus Parity Harness — Behavioral Contract (Step 2)

Branch: feature/p1-5-5-parity-harness
Worktree: .worktrees/claude/p1-5-5-parity-harness
Base SHA: 0150dcd1
Wave: R87 κ Wave 6
Author tier: T3 executor
Audit upstream: docs/audits/p1-5-5-parity-harness-audit.md (committed at dbac7cd6)


§1. Module identity

File: src/domains/rules/parity-harness.ts

A pure synchronous module that runs a fixed event corpus through two κ rule engine evaluations (an “old” ruleset vs. a candidate “new” ruleset) and produces a deterministic 5-bucket categorization report.

The module is the gating artifact for P1.5.2 (Migration). A candidate upgrade is admissible iff its ParityReport.pass === true.

The harness is a pure data transformation. No I/O, no DB, no network, no env reads, no console output, no async, no clock, no RNG. Two callers on any host (Node ≥ 20, any platform, any locale) produce byte-identical ParityReport outputs for byte-identical inputs.


§2. Public surface (LOCKED)

// Type aliases
export type EventId = string;
export type EventIdPattern = string | RegExp;

// The harness's own Event shape
export interface ParityEvent {
  readonly id: EventId;
  readonly event: Readonly<Record<string, unknown>>;
  readonly state: Readonly<Record<string, unknown>>;
  readonly rule_version: string;
  readonly epoch: bigint;
}

// Re-exports for ergonomics — the harness consumer rarely imports from engine
export type { CategorizedRule, RuleResult, Mutation, TransitionResult } from './engine.js';

// Input
export interface ParityInput {
  readonly old_ruleset: readonly CategorizedRule[];
  readonly new_ruleset: readonly CategorizedRule[];
  readonly corpus: readonly ParityEvent[];
  readonly declared_divergence_scope: readonly EventIdPattern[];
}

// Per-event detail
export interface ParityEventDetail {
  readonly old_result: RuleResult;
  readonly new_result: RuleResult;
  readonly old_hash: string;          // 71 chars: 'sha256:' + 64 hex
  readonly new_hash: string;          // 71 chars: 'sha256:' + 64 hex
}

// Output
export interface ParityReport {
  readonly both_admit_same: readonly EventId[];
  readonly both_admit_diverge: readonly EventId[];
  readonly old_admit_new_reject: readonly EventId[];
  readonly old_reject_new_admit: readonly EventId[];
  readonly both_reject: readonly EventId[];
  readonly pass: boolean;
  readonly details_by_event: ReadonlyMap<EventId, ParityEventDetail>;
}

// Errors
export class ParityHarnessError extends Error { /* see §5 */ }

// Entry point
export function runParity(input: ParityInput): ParityReport;

// Default corpus
export const DEFAULT_CORPUS: readonly ParityEvent[];

// Helper: hash a mutation list (also used internally)
export const EFFECT_HASH_PREFIX: 'sha256:';
export const EFFECT_HASH_HEX_LENGTH: 64;
export const EFFECT_HASH_TOTAL_LENGTH: 71;
export function effectHash(mutations: readonly Mutation[]): string;

// Helper: pattern match against EventIdPattern[]
export function matchesScope(id: EventId, patterns: readonly EventIdPattern[]): boolean;

The EventIdPattern accepts both literal strings (matched by ===) and RegExps (matched by .test()). This lets a migration author write a scope as:

declared_divergence_scope: [
  'admit/commitment/large-amount',  // literal id
  /^reject\/dispute\/v2-/,           // regex prefix
]

§3. Behavioral contract

§3.1. runParity(input) algorithm

function runParity(input):
  validate(input)                      // §5 errors
  buckets = { both_admit_same: [], both_admit_diverge: [],
              old_admit_new_reject: [], old_reject_new_admit: [],
              both_reject: [] }
  details = new Map()
  oldRegistry = wrapAsRegistry(input.old_ruleset)
  newRegistry = wrapAsRegistry(input.new_ruleset)

  for event in input.corpus:
    old_t = executeRuleset(oldRegistry, event.event, event.state,
                           event.rule_version, event.epoch)
    new_t = executeRuleset(newRegistry, event.event, event.state,
                           event.rule_version, event.epoch)

    old_admitted = anyRuleAdmitted(old_t)
    new_admitted = anyRuleAdmitted(new_t)
    old_hash = effectHash(old_t.all_mutations)
    new_hash = effectHash(new_t.all_mutations)
    old_result = collapseToRuleResult(old_t)
    new_result = collapseToRuleResult(new_t)

    if old_admitted and new_admitted:
      if old_hash == new_hash: buckets.both_admit_same.push(event.id)
      else:                    buckets.both_admit_diverge.push(event.id)
    elif old_admitted and not new_admitted:
      buckets.old_admit_new_reject.push(event.id)
    elif not old_admitted and new_admitted:
      buckets.old_reject_new_admit.push(event.id)
    else:
      buckets.both_reject.push(event.id)

    details.set(event.id, { old_result, new_result, old_hash, new_hash })

  pass = (buckets.both_admit_diverge.length == 0) AND
         allDivergentInScope(buckets, input.declared_divergence_scope)

  return frozen(buckets, pass, details)
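The per-event branch of the loop above can be sketched in TypeScript as a small pure function. This is an illustration only: the names classify and BucketName are assumptions made for this sketch and are not part of the locked public surface.

```typescript
// Hypothetical helper: maps one event's admission flags and effect hashes
// to the name of the bucket its id would be pushed into.
type BucketName =
  | 'both_admit_same'
  | 'both_admit_diverge'
  | 'old_admit_new_reject'
  | 'old_reject_new_admit'
  | 'both_reject';

function classify(
  oldAdmitted: boolean,
  newAdmitted: boolean,
  oldHash: string,
  newHash: string,
): BucketName {
  if (oldAdmitted && newAdmitted) {
    // Hashes only matter when both sides admit.
    return oldHash === newHash ? 'both_admit_same' : 'both_admit_diverge';
  }
  if (oldAdmitted) return 'old_admit_new_reject';
  if (newAdmitted) return 'old_reject_new_admit';
  return 'both_reject';
}

const sameEffects = classify(true, true, 'sha256:aa', 'sha256:aa');      // 'both_admit_same'
const differentEffects = classify(true, true, 'sha256:aa', 'sha256:bb'); // 'both_admit_diverge'
const newRejects = classify(true, false, 'sha256:aa', 'sha256:bb');      // 'old_admit_new_reject'
const bothReject = classify(false, false, 'sha256:aa', 'sha256:bb');     // 'both_reject'
```

The hash strings in the usage lines are shortened placeholders, not real 71-character digests. Note that the hashes are ignored whenever at least one side rejects.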

§3.2. Bucket semantics — admission detection

A TransitionResult from executeRuleset is “admitted” iff at least one rule in any category produced status ‘admitted’. Otherwise it is “rejected”.

This rule is precise:

  • A ruleset with zero rules → all_mutations: [], per_category_results has empty arrays for every category. Both rulesets producing empty output land in both_reject with old_hash === new_hash (the SHA-256 of canonical([])). This is the correct semantics: an event that no rule matches is “rejected by silence”.
  • A ruleset with one rule that admitted but produced zero mutations → status ‘admitted’, all_mutations: []. Empty mutation list does NOT imply rejection. The anyRuleAdmitted check walks per_category_results.get(c) for c in CATEGORY_ORDER and returns true on the first status === 'admitted'.
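A minimal sketch of the admission check described in the two cases above. The Category values and the simplified RuleResult and TransitionResult shapes are stand-ins assumed for illustration; the real definitions live in engine.ts and may differ.

```typescript
// Hypothetical, simplified stand-ins for the engine types.
type Category = 'admission' | 'state_transition' | 'consequence';
const CATEGORY_ORDER: readonly Category[] = ['admission', 'state_transition', 'consequence'];

type RuleResult =
  | { status: 'admitted'; mutations: readonly unknown[] }
  | { status: 'rejected'; reason: string };

interface TransitionResult {
  per_category_results: ReadonlyMap<Category, readonly RuleResult[]>;
  all_mutations: readonly unknown[];
}

// Returns true on the first rule, in any category, whose status is 'admitted'.
function anyRuleAdmitted(t: TransitionResult): boolean {
  for (const c of CATEGORY_ORDER) {
    for (const r of t.per_category_results.get(c) ?? []) {
      if (r.status === 'admitted') return true;
    }
  }
  return false;
}

// Zero rules: every category is empty, so the event is "rejected by silence".
const silent: TransitionResult = {
  per_category_results: new Map<Category, readonly RuleResult[]>([
    ['admission', []],
    ['state_transition', []],
    ['consequence', []],
  ]),
  all_mutations: [],
};

// One rule admitted but produced no mutations: still admitted.
const admittedNoEffects: TransitionResult = {
  per_category_results: new Map<Category, readonly RuleResult[]>([
    ['admission', [{ status: 'admitted', mutations: [] }]],
  ]),
  all_mutations: [],
};

const silentAdmitted = anyRuleAdmitted(silent);               // false
const noEffectsAdmitted = anyRuleAdmitted(admittedNoEffects); // true
```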

§3.3. effectHash(mutations) algorithm

function effectHash(mutations):
  body = canonicalize(mutations)        // P1.5.4
  hash = createHash('sha256')
  hash.update(body, 'utf8')
  return EFFECT_HASH_PREFIX + hash.digest('hex')

Output: exactly 71 characters — 'sha256:' + 64 lowercase hex.

The hash is computed over Mutation[], not over the full TransitionResult. The all_mutations array IS the deterministic “effect set” for an event under a ruleset; rejected-rule reasons and per-category result arrays carry diagnostic context but are NOT part of the effect set.

This is the same pattern as versioning.ts:357 computeVersionHash (named createHash import + canonicalize + utf8 + hex). It uses no member-access form crypto.<x> so the rule-engine corpus self-scan sees no forbidden token in the harness body.
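A runnable sketch of the same shape, under stated assumptions: canonicalizeSketch below is a stand-in written for this example (it sorts object keys recursively and does not handle bigint), not the P1.5.4 canonicalize, and the { op, path, value } mutation shape is hypothetical.

```typescript
import { createHash } from 'node:crypto';

const EFFECT_HASH_PREFIX = 'sha256:';

// Stand-in serializer (assumption): sorts object keys recursively so that
// key order cannot change the digest. The real canonicalize may differ.
function canonicalizeSketch(value: unknown): string {
  if (Array.isArray(value)) {
    return '[' + value.map((v) => canonicalizeSketch(v)).join(',') + ']';
  }
  if (value !== null && typeof value === 'object') {
    const obj = value as Record<string, unknown>;
    const keys = Object.keys(obj).sort();
    return '{' + keys.map((k) => JSON.stringify(k) + ':' + canonicalizeSketch(obj[k])).join(',') + '}';
  }
  const s: string | undefined = JSON.stringify(value);
  return s === undefined ? 'null' : s;
}

// Named createHash import, utf8 input, hex output, fixed prefix.
function effectHash(mutations: readonly unknown[]): string {
  const body = canonicalizeSketch(mutations);
  const hash = createHash('sha256');
  hash.update(body, 'utf8');
  return EFFECT_HASH_PREFIX + hash.digest('hex');
}

const emptyHash = effectHash([]);
const hashA = effectHash([{ op: 'set', path: 'balance', value: 10 }]);
const hashB = effectHash([{ value: 10, path: 'balance', op: 'set' }]); // same keys, different order
```

With this stand-in serializer, hashA and hashB are equal because key order is normalized before hashing, and every result is 7 prefix characters plus 64 hex characters.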

§3.4. matchesScope(id, patterns) algorithm

function matchesScope(id, patterns):
  for p in patterns:
    if typeof p === 'string':
      if id === p: return true
    elif p instanceof RegExp:
      if p.test(id): return true
  return false

Matching is OR across patterns: the first pattern that matches returns true. An empty patterns array means the scope is empty, so every divergent event fails the subset check. This is intended: declaring zero allowed divergences means no upgrade with divergence may pass.
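The pseudocode translates almost line for line into TypeScript. The sketch below is illustrative; the id 'reject/dispute/v2-timeout' is a made-up example chosen to match the regex from §2.

```typescript
type EventId = string;
type EventIdPattern = string | RegExp;

function matchesScope(id: EventId, patterns: readonly EventIdPattern[]): boolean {
  for (const p of patterns) {
    if (typeof p === 'string') {
      if (id === p) return true;          // literal: strict equality
    } else if (p instanceof RegExp) {
      if (p.test(id)) return true;        // regex: .test() semantics
    }
  }
  return false;                           // no pattern matched
}

const scope: readonly EventIdPattern[] = [
  'admit/commitment/large-amount',
  /^reject\/dispute\/v2-/,
];

const literalHit = matchesScope('admit/commitment/large-amount', scope);   // true
const regexHit = matchesScope('reject/dispute/v2-timeout', scope);         // true
const miss = matchesScope('admit/commitment/small-amount', scope);         // false
const emptyScopeHit = matchesScope('admit/commitment/large-amount', []);   // false
```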

§3.5. pass decision

function decidePass(buckets, scope):
  if buckets.both_admit_diverge.length > 0: return false
  for id in concat(buckets.old_admit_new_reject, buckets.old_reject_new_admit):
    if not matchesScope(id, scope): return false
  return true

Note: pass is independent of buckets.both_admit_same and buckets.both_reject. These two buckets are non-divergent: both rulesets agree on the outcome.
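A self-contained sketch of the pass decision, with a local copy of matchesScope so the block runs on its own. The Buckets interface name and the sample ids are assumptions made for illustration.

```typescript
type EventId = string;
type EventIdPattern = string | RegExp;

interface Buckets {
  readonly both_admit_same: readonly EventId[];
  readonly both_admit_diverge: readonly EventId[];
  readonly old_admit_new_reject: readonly EventId[];
  readonly old_reject_new_admit: readonly EventId[];
  readonly both_reject: readonly EventId[];
}

function matchesScope(id: EventId, patterns: readonly EventIdPattern[]): boolean {
  return patterns.some((p) => (typeof p === 'string' ? id === p : p.test(id)));
}

function decidePass(buckets: Buckets, scope: readonly EventIdPattern[]): boolean {
  // Divergent effects between two admitting rulesets can never be declared away.
  if (buckets.both_admit_diverge.length > 0) return false;
  // Every admit/reject flip must be covered by the declared scope.
  const flipped = buckets.old_admit_new_reject.concat(buckets.old_reject_new_admit);
  return flipped.every((id) => matchesScope(id, scope));
}

const sample: Buckets = {
  both_admit_same: ['admit/commitment-create/with-effects'],
  both_admit_diverge: [],
  old_admit_new_reject: ['reject/dispute/v2-timeout'],
  old_reject_new_admit: [],
  both_reject: [],
};

const coveredByScope = decidePass(sample, [/^reject\/dispute\/v2-/]); // true
const emptyScopeFails = decidePass(sample, []);                       // false
const divergeFails = decidePass(
  { ...sample, both_admit_diverge: ['admit/commitment/large-amount'] },
  [/.*/],
);                                                                    // false, even with a catch-all scope
```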

§3.6. Determinism guarantees (LOCKED)

I1. Bit-identical output for identical input. Two calls runParity(x) and runParity(x) (same x) produce JSON.stringify-equivalent reports with details_by_event Maps that iterate in identical insertion order.

I2. Insertion order is corpus order. details_by_event is populated in the order events are walked (input.corpus[0], input.corpus[1], …). Map iteration in ECMA-262 is stable insertion order, so consumers see corpus order on for…of.

I3. Bucket arrays are insertion order. Each bucket is filled left-to-right as events stream through the corpus. The arrays preserve corpus order within their bucket.

I4. No wall-clock dependency in the body. runParity, effectHash, matchesScope, wrapAsRegistry, anyRuleAdmitted, collapseToRuleResult — none of these read Date.now() or process.hrtime. The performance assertion is delegated to the test file.

I5. No worker / async / fork. The harness is a single synchronous loop. No Promise, no setImmediate, no Worker. Behaviour is independent of host thread scheduling.

I6. Determinism scanner clean. inspectFunctionForbidden(runParity) returns []. Same for effectHash, matchesScope. The corpus self-scan in determinism.test.ts §Group 12 re-applies the regex set against the entire parity-harness.ts source and finds zero hits.
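I2 and I3 rest on the insertion-order behaviour of Map and of Array.prototype.push. The short sketch below illustrates that behaviour with a hypothetical three-id corpus; the hash values are placeholders, not real digests.

```typescript
// Hypothetical corpus ids, deliberately not in alphabetical order.
const corpusIds: readonly string[] = [
  'fork/merge/clean',
  'admit/commitment-create/with-effects',
  'identity/update/noop',
];

const details = new Map<string, { old_hash: string; new_hash: string }>();
for (const id of corpusIds) {
  details.set(id, { old_hash: 'placeholder', new_hash: 'placeholder' });
}

// Overwriting an existing key updates the value but keeps its position.
details.set('fork/merge/clean', { old_hash: 'updated', new_hash: 'updated' });

// Iteration follows insertion (corpus) order, not sort order.
const iterated = Array.from(details.keys());
const sameAsCorpusOrder = iterated.join('|') === corpusIds.join('|'); // true
```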

§3.7. Performance contract

P1: For a corpus of 10000 events, two rulesets of ≤ 50 rules each, on Node ≥ 20 / any host with non-degraded CPU (≥ 1 GHz), the wall-clock duration of runParity(input) is < 5000 ms.

The test file enforces this with Date.now() ticks bracketing the call. The harness body itself never reads the clock. A failure here surfaces a real performance regression; we do not silently retry.

The expected Phase 1 steady state is well under the budget: typical wall-clock time for the same load is around 100 to 500 ms on the dev hosts. The 5 s ceiling exists as a regression sentinel, not as a target.
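A sketch of how a test file might bracket the call with Date.now(), keeping the clock read outside the harness body. The helper name timeSync and the stand-in workload are assumptions; a real test would pass a closure that calls runParity on the 10000-event input.

```typescript
// Hypothetical test-side helper: times one synchronous call.
function timeSync<T>(fn: () => T): { value: T; elapsedMs: number } {
  const start = Date.now();
  const value = fn();
  return { value, elapsedMs: Date.now() - start };
}

const PERF_BUDGET_MS = 5000;

// Stand-in workload (assumption); replace with () => runParity(input) in a real test.
const timed = timeSync(() => {
  let sum = 0;
  for (let i = 0; i < 10000; i++) sum += i;
  return sum;
});

const workloadResult = timed.value;                    // 49995000
const withinBudget = timed.elapsedMs < PERF_BUDGET_MS; // single measurement, no retry
```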


§4. Type definitions — discussion

§4.1. Why a fresh ParityEvent shape rather than reusing engine Event?

The engine’s executeRuleset takes 5 separate args: registry, event, state, rule_version, epoch. A parity corpus must carry all five for each event so two rulesets can be re-run identically. The harness packages them into one record with a stable id for bucket / map indexing.

The Event name is intentionally NOT used here — Event is a built-in DOM type in lib.dom.d.ts that bleeds into Node’s type space when the project’s tsconfig includes browser-flavored libs. We disambiguate as ParityEvent.

§4.2. Why CategorizedRule[] instead of RuleRegistry?

Audit §2.5 — keeps the harness decoupled from the loader stack. Tests construct rulesets directly without DSL parsing, sidestepping P1.2.1–P1.2.4 churn. The harness wraps the array on entry:

function wrapAsRegistry(rules: readonly CategorizedRule[]): RuleRegistry {
  return { getAll: () => rules };
}

The wrap is a 2-line zero-cost shim. It also makes the harness reusable with the live RuleRegistry instance: a caller that has one in hand can pass registry.getAll().

§4.3. Why RuleResult (not TransitionResult) in details_by_event?

The task prompt’s details_by_event shape is Map<EventId, {old_result: RuleResult, new_result: RuleResult, old_hash, new_hash}>. But executeRuleset returns TransitionResult (which has per_category_results: Map<Category, RuleResult[]> and all_mutations).

We collapse the TransitionResult to a single RuleResult:

function collapseToRuleResult(t: TransitionResult): RuleResult {
  // First admitted rule across CATEGORY_ORDER wins; otherwise the first
  // rejection reason wins; otherwise NO_RULES.
  for (const c of CATEGORY_ORDER) {
    // `?? []` guards the undefined that Map.get's type allows.
    for (const r of t.per_category_results.get(c) ?? []) {
      if (r.status === 'admitted') {
        // Carry the full effect set, not just this rule's mutations.
        return { status: 'admitted', mutations: t.all_mutations };
      }
    }
  }
  for (const c of CATEGORY_ORDER) {
    for (const r of t.per_category_results.get(c) ?? []) {
      if (r.status === 'rejected') {
        return { status: 'rejected', reason: r.reason };
      }
    }
  }
  return { status: 'rejected', reason: 'NO_RULES' };
}

Important property: if any rule admitted, the collapsed RuleResult.mutations is the FULL t.all_mutations (not just the mutations of one rule). This is the effect set that the hash digests.

§4.4. EventIdPattern regex semantics

Regexes are matched with .test(id) — anchor as needed in the source:

[/^foo/, 'literal-id', /bar$/]

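Regex objects are accepted as-is; we do not clone or re-construct them. A stateful regex (one with the g or y flag, whose lastIndex changes between .test() calls) can give different answers for the same id on successive calls. Avoiding such patterns is the caller's responsibility; we document this gotcha in JSDoc.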
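The gotcha is easy to reproduce with standard RegExp behaviour. The sketch below calls .test() repeatedly on the same id, once with a g-flagged pattern and once without.

```typescript
const sampleId = 'admit/commitment/large-amount';

// With the g flag, .test() resumes from lastIndex and updates it.
const stateful = /^admit\//g;
const first = stateful.test(sampleId);   // true; lastIndex is now 6
const second = stateful.test(sampleId);  // false; search started at index 6, lastIndex resets to 0
const third = stateful.test(sampleId);   // true again

// Without g or y, lastIndex is ignored and the answer is stable.
const stateless = /^admit\//;
const stable = [stateless.test(sampleId), stateless.test(sampleId), stateless.test(sampleId)];
```

A scope that contains a g-flagged pattern could therefore match an id on one call and miss it on the next, which is why scope patterns should be written without the g and y flags.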

§4.5. Frozen output

The returned ParityReport:

  • The report object is frozen with Object.freeze.
  • Bucket arrays are frozen with Object.freeze (typed readonly EventId[]).
  • details_by_event is the live Map. ReadonlyMap is the type, but Map.prototype.set/delete/clear still exist at runtime. A consumer who bypasses TypeScript can violate the contract; we accept this, following the same pattern as RuleRegistry.byType in registry.ts:452.
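The difference between the frozen parts and the live Map can be seen in a reduced sketch. ReportSketch is a cut-down shape assumed for this example (one bucket, placeholder hashes), not the real ParityReport.

```typescript
interface ReportSketch {
  readonly both_reject: readonly string[];
  readonly pass: boolean;
  readonly details_by_event: ReadonlyMap<string, { old_hash: string }>;
}

const report: ReportSketch = Object.freeze({
  both_reject: Object.freeze(['fork/merge/clean']),
  pass: true,
  details_by_event: new Map<string, { old_hash: string }>([
    ['fork/merge/clean', { old_hash: 'placeholder' }],
  ]),
});

const topFrozen = Object.isFrozen(report);                // true
const bucketFrozen = Object.isFrozen(report.both_reject); // true

// Casting away readonly on a frozen array compiles, but push throws at runtime.
let pushThrew = false;
try {
  (report.both_reject as string[]).push('rogue');
} catch {
  pushThrew = true;
}

// Casting ReadonlyMap back to Map also compiles, and the write succeeds.
(report.details_by_event as Map<string, { old_hash: string }>).set('rogue', { old_hash: 'placeholder' });
const mapSizeAfterWrite = report.details_by_event.size; // 2
```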

§5. Error contract

ParityHarnessError extends Error. Subclass name: 'ParityHarnessError'. Thrown from runParity for input-shape violations:

| Condition | Message format |
| --- | --- |
| input is not an object | 'input must be an object' |
| input.old_ruleset is not an array | 'old_ruleset must be an array of CategorizedRule' |
| input.new_ruleset is not an array | 'new_ruleset must be an array of CategorizedRule' |
| input.corpus is not an array | 'corpus must be an array of ParityEvent' |
| input.declared_divergence_scope is not an array | 'declared_divergence_scope must be an array of EventIdPattern' |
| corpus[i].id not a non-empty string | 'corpus[i].id must be a non-empty string (i={i})' |
| Duplicate corpus[i].id | 'corpus contains duplicate event id "{id}" at index {i} (first seen at {j})' |
| corpus[i].rule_version not a string | 'corpus[i].rule_version must be a string (i={i})' |
| corpus[i].epoch not a bigint | 'corpus[i].epoch must be a bigint (i={i})' |

Errors thrown by upstream modules (e.g. CanonicalSerializationError from canonicalize) propagate. The harness does not wrap them. Rationale: a non-canonicalizable mutation is a determinism violation in the engine layer and the caller needs the typed upstream error to debug.

runParity does NOT validate event.event / event.state / individual ruleset entries — those propagate to executeRuleset, which already has its own typed error path through errorToRejection (engine.ts:459). The harness consumes executeRuleset’s output as-is.
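A partial sketch of the validation path, covering only the two corpus id conditions from the table above; the remaining conditions would follow the same pattern. The helper names validateCorpusIds and messageOf are assumptions for this example, and the class body is one plausible way to satisfy "Subclass name: 'ParityHarnessError'".

```typescript
class ParityHarnessError extends Error {
  constructor(message: string) {
    super(message);
    this.name = 'ParityHarnessError';
    // Keeps instanceof working even when compiled to an ES5 target.
    Object.setPrototypeOf(this, ParityHarnessError.prototype);
  }
}

// Hypothetical helper: checks non-empty string ids and duplicates.
function validateCorpusIds(corpus: readonly { id: unknown }[]): void {
  const firstSeen = new Map<string, number>();
  corpus.forEach((entry, i) => {
    const id = entry.id;
    if (typeof id !== 'string' || id.length === 0) {
      throw new ParityHarnessError(`corpus[i].id must be a non-empty string (i=${i})`);
    }
    const j = firstSeen.get(id);
    if (j !== undefined) {
      throw new ParityHarnessError(
        `corpus contains duplicate event id "${id}" at index ${i} (first seen at ${j})`,
      );
    }
    firstSeen.set(id, i);
  });
}

// Hypothetical helper: returns the ParityHarnessError message, or null if nothing was thrown.
function messageOf(fn: () => void): string | null {
  try {
    fn();
    return null;
  } catch (e) {
    if (e instanceof ParityHarnessError) return e.message;
    throw e; // upstream errors propagate unwrapped
  }
}

const dupMessage = messageOf(() =>
  validateCorpusIds([{ id: 'a/b/c' }, { id: 'x/y/z' }, { id: 'a/b/c' }]),
); // 'corpus contains duplicate event id "a/b/c" at index 2 (first seen at 0)'
const emptyIdMessage = messageOf(() => validateCorpusIds([{ id: '' }]));
// 'corpus[i].id must be a non-empty string (i=0)'
const okMessage = messageOf(() => validateCorpusIds([{ id: 'a/b/c' }])); // null
```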


§6. Default corpus (DEFAULT_CORPUS)

Exported readonly ParityEvent[] with 101 events covering the 7 categories called out in the task prompt:

| Category | Events | Coverage |
| --- | --- | --- |
| Admission | 24 | 6 transition types × 4 shapes (admit-clean, admit-with-effects, reject-by-guard, no-match) |
| StateTransition | 18 | 6 transition types × 3 shapes |
| Consequence | 12 | REPUTATION_DECAY × 12 numeric shapes |
| Promotion | 8 | bare-name rules × 2 admit + 2 reject + 4 NO_MATCH |
| Governance | 12 | GOVERNANCE_PROPOSE/VOTE × 6 shapes (admit, reject, vote-tied, vote-passed, vote-failed, withdrawn) |
| Identity | 12 | IDENTITY_CREATE/UPDATE × 6 shapes (create-new, create-duplicate, update-name, update-key, update-noop, update-stale) |
| Fork | 15 | FORK_CREATE/MERGE × varied (genesis, branch, merge-clean, merge-conflict, ascendant, divergent, etc.) |
| Total | 101 | |

Every event has:

  • A unique stable id of the form <category>/<shape>/<sub-shape> (e.g. admit/commitment-create/with-effects).
  • A non-empty event record (the event payload).
  • A non-empty state record (the state snapshot).
  • A rule_version (string).
  • An epoch (bigint).

The corpus is frozen (Object.freeze applied to the array and to every event record). Tests verify the freeze.

The corpus is hand-curated (NOT runtime-generated). This eliminates classes of bugs where a runtime generator could change between Node versions, breaking determinism. The corpus literal is part of the source and gets the same Git history as the harness body.
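A sketch of the kind of shape check a test could run over a frozen corpus literal. The record type is cut down (rule_version and epoch are omitted), and the two sample entries, their payloads, and the helper names entry and checkCorpus are all assumptions for illustration, not excerpts from DEFAULT_CORPUS.

```typescript
// Simplified record (assumption): rule_version and epoch omitted for brevity.
interface CorpusEntrySketch {
  readonly id: string;
  readonly event: Readonly<Record<string, unknown>>;
  readonly state: Readonly<Record<string, unknown>>;
}

// Hypothetical constructor that freezes each record as it is built.
function entry(
  id: string,
  event: Record<string, unknown>,
  state: Record<string, unknown>,
): CorpusEntrySketch {
  return Object.freeze({ id, event, state });
}

const SAMPLE_CORPUS: readonly CorpusEntrySketch[] = Object.freeze([
  entry('admit/commitment-create/with-effects', { type: 'COMMITMENT_CREATE' }, { balance: 10 }),
  entry('fork/merge/clean', { type: 'FORK_MERGE' }, { head: 'main' }),
]);

// <category>/<shape>/<sub-shape>: exactly three non-empty, slash-separated parts.
const ID_SHAPE = /^[^/]+\/[^/]+\/[^/]+$/;

function checkCorpus(corpus: readonly CorpusEntrySketch[]): string[] {
  const problems: string[] = [];
  const seen = new Set<string>();
  if (!Object.isFrozen(corpus)) problems.push('corpus array is not frozen');
  corpus.forEach((e, i) => {
    if (!Object.isFrozen(e)) problems.push(`entry ${i} is not frozen`);
    if (!ID_SHAPE.test(e.id)) problems.push(`entry ${i} has a malformed id`);
    if (seen.has(e.id)) problems.push(`entry ${i} duplicates id "${e.id}"`);
    seen.add(e.id);
  });
  return problems;
}

const sampleProblems = checkCorpus(SAMPLE_CORPUS);                                  // []
const unfrozenProblems = checkCorpus([{ id: 'bad id', event: {}, state: {} }]);     // 3 problems
```

Object.freeze is shallow, so in this sketch the nested event and state payloads remain mutable unless they are frozen separately.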


§7. Test acceptance criteria

The test file src/__tests__/domains/rules/parity-harness.test.ts covers every AC from §8 of the audit, plus the determinism + perf properties.

§7.1. Test fixtures

| Fixture | Purpose |
| --- | --- |
| F1: Identical rulesets | All events land in both_admit_same (or both_reject for non-matching events); pass: true; both_admit_diverge: []. |
| F2: Old admits, new rejects | old_admit_new_reject populated; pass: true if scope covers all, else false. |
| F3: Old rejects, new admits | old_reject_new_admit populated; same scope-driven pass logic. |
| F4: Diverging mutations | Both rulesets admit but produce different mutations → both_admit_diverge populated; pass: false regardless of scope. |
| F5: Both reject | Both rulesets reject the same event → both_reject. Hashes still computed; both equal effectHash([]). |
| F6: Empty corpus | Returns the 5 empty buckets, pass: true (vacuously), empty Map. |
| F7: Empty rulesets, non-empty corpus | Every event lands in both_reject; pass: true. |
| F8: Determinism | Run harness twice, compare report bytes via canonicalize. Identical. |
| F9: Performance | 10000 events × 2 small rulesets, wall-time < 5000ms. |
| F10: Default corpus shape | DEFAULT_CORPUS.length >= 100, all ids unique, all frozen, every category represented. |
| F11: Scope matching, string | Literal-id scope works. |
| F12: Scope matching, regex | Regex scope works (.test() semantics). |
| F13: Scope matching, empty | Empty scope ⇒ any divergence fails pass. |
| F14: Hash format | All hashes start 'sha256:' and are exactly 71 chars long. |
| F15: Determinism scanner | inspectFunctionForbidden(runParity) returns []; same for effectHash and matchesScope. |
| F16: Input validation errors | Each of the 9 conditions in §5 throws ParityHarnessError. |
| F17: Cross-call independence | Calling runParity does NOT mutate the input arrays/maps. |
| F18: Rule-engine corpus self-scan inclusion | The harness file is one of the files scanned by determinism.test.ts §Group 12; the test will fail if any forbidden token slips in. (No new test needed; the existing scan suffices.) |

§7.2. Coverage target

Branch coverage on parity-harness.ts: 100%. We achieve this by:

  • 5 buckets each populated by at least one event in F1–F4.
  • All 9 input-validation paths (one per row of the §5 table) covered in F16.
  • The unique NO_RULES path covered by an empty old ruleset that still produces a non-admitted TransitionResult from executeRuleset (an empty-rules registry produces TransitionResult with empty arrays — collapse falls through both for-loops to NO_RULES).
  • Both string and regex scope branches covered.

§8. Out-of-contract (deferred)

  • Persisted parity reports (DB write) — P1.5.2.
  • Comparing reports across multiple parity runs (regression detection) — out of P1.5.x.
  • Parallel corpus partitioning — explicitly excluded.
  • A prettyPrint(report) helper — out of contract; tests can read the Map directly.

§9. Acceptance crosswalk

| AC# (audit §8) | Satisfied by |
| --- | --- |
| AC1 (runParity exists) | §2 public surface + §3.1 algorithm |
| AC2 (per-event SHA-256) | §3.3 effectHash + F14 |
| AC3 (5 buckets) | §3.1 + §3.2 + F1–F5 |
| AC4 (pass condition) | §3.5 decidePass + F2/F3/F4/F11–F13 |
| AC5 (details_by_event shape) | §2 ParityEventDetail + F1 verifies |
| AC6 (DEFAULT_CORPUS ≥100, all categories) | §6 default corpus + F10 |
| AC7 (deterministic) | §3.6 I1–I6 + F8 |
| AC8 (perf <5s for 10k) | §3.7 + F9 |
| AC9 (determinism scanner clean) | §3.6 I6 + F15 + F18 |
| AC10 (all gates green) | run by npm run build && npm run lint && npm test per dispatcher prompt |

§10. References

  • Step 1 audit: docs/audits/p1-5-5-parity-harness-audit.md (this PR)
  • Spec source: docs/guides/implementation/task-prompts/p1.1-kappa-rule-engine.md §P1.5.5
  • Concept doc: docs/3-world/physics/laws/rule-engine.md §Test corpus parity requirement
  • Sibling contracts: docs/contracts/p1-5-1-version-hash-contract.md, docs/contracts/p1-5-4-canonical-contract.md
  • Live code at base 0150dcd1: src/domains/rules/{engine,canonical,versioning,determinism,registry,parser}.ts

Step 2 / 5. Behavioral contract complete. Next step: execution packet.



Colibri — documentation-first MCP runtime. Apache 2.0 + Commons Clause.
