P1.5.8 — Verification (Step 5 of 5)

Captures the test evidence and the parity matrix for the cross-model parity suite shipped in step 4.

1. Test count delta

Anchor	Count
Baseline (main @ `6cfd269b`)	3353
After P1.5.8 (HEAD)	3464
Delta	+111

The packet projected ~108 net new tests; the implementation lands at +111 (which includes 3 driver-registry sanity tests added in the implement step to defend the helper contract).

2. Gate results

Gate	Result
`npm run build`	green — `tsc` exits 0; `postbuild` copies migrations
`npm run lint`	green — `eslint src` exits 0; zero warnings
`npm test` (full)	green — 3464 / 3464 pass; 78 / 78 suites pass; 46.6 s wall-clock
`npm test` (parity-only)	green — 111 / 111 pass; 5.8 s wall-clock

2.1 Flake notes

A single full-suite run earlier in the verification cycle hit three flakes (consensus/parity-harness.test.ts › G7.1, reputation/tokens.test.ts › ..., and db/migrations/009-model-candidates.test.ts › ...). All three are pre-existing (documented in the dispatch slice as well-known and retry-clean) and unrelated to this slice. A clean retry succeeded (3464 / 3464). The parity suite ran clean on every attempt across the verification cycle (5 invocations: 1 in isolation under npx jest parity.test.ts + 4 inside the full suite).

3. Parity matrix — invariants × adapters

The suite runs 8 contract blocks (26 tests in total) against each of the 4 adapters, plus 4 cross-cutting tests and 3 driver-registry sanity tests: 26 × 4 + 4 + 3 = 111 tests. Each contract block contains multiple internal expect assertions.

Invariant block	Description	Claude	Kimi	Codex	OpenAI
P1 — shape parity	6 spec fields + types + no extras (3 tests)	yes	yes	yes	yes
P2 — determinism	Two identical fixtures → equal results (1 test)	yes	yes	yes	yes
P3 — token accounting	prompt/completion tokens + missing usage degrades (3 tests)	yes	yes	yes	yes
P4 — stop-reason mapping	success + tool-use + always-string (3 tests)	yes	yes	yes	yes
P5 — tool-use mapping	content shape + input object + multi-tool + empty-tools (4 tests)	yes	yes	yes	yes
P6 — error mapping	401 + 500-retries-exhausted + network + missing-key + Error subclass (5 tests)	yes	yes	yes	yes
P7 — latency	non-negative + finite + populated (3 tests)	yes	yes	yes	yes
P8 — injection seams	fetchFn + logger + delayFn + apiKey (4 tests)	yes	yes	yes	yes

Per driver = 26 tests · 4 drivers = 104 driver-parity tests.
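The P1 row ("6 spec fields + types + no extras") amounts to an exact-shape check. A minimal sketch of that idea follows, assuming a hypothetical `shapeViolations` helper; it is not taken from parity-helpers.ts, and the two field names used below are only an illustrative subset of the six spec fields, which this document does not enumerate.

```typescript
// Hypothetical helper (not from the repository): lists every way `value`
// deviates from `expected`, where `expected` maps each allowed key to the
// `typeof` string its value must have.
function shapeViolations(
  value: Record<string, unknown>,
  expected: Record<string, string>,
): string[] {
  const has = (obj: object, key: string): boolean =>
    Object.prototype.hasOwnProperty.call(obj, key);
  const problems: string[] = [];

  // Every expected key must be present with the right primitive type.
  for (const key of Object.keys(expected)) {
    if (!has(value, key)) problems.push(`missing: ${key}`);
    else if (typeof value[key] !== expected[key]) problems.push(`wrong type: ${key}`);
  }
  // No key outside the expected set is allowed ("no extras").
  for (const key of Object.keys(value)) {
    if (!has(expected, key)) problems.push(`extra: ${key}`);
  }
  return problems;
}

// Illustrative subset only; the real contract specifies six fields.
const expectedShape = { stopReason: 'string', latencyMs: 'number' };

const ok = shapeViolations({ stopReason: 'end_turn', latencyMs: 12 }, expectedShape);
const bad = shapeViolations({ stopReason: 42, debug: true }, expectedShape);
console.log(ok, bad);
```

Returning a list of violations rather than a boolean keeps a failing assertion readable: the test output names the offending field instead of reporting only that the shapes differ.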

Cross-cutting (4 tests over all 4 adapters jointly)

#	Test	Verdict
C1	All 4 adapters return structurally equal CompletionResult shape	pass
C2	All 4 adapters yield identical token counts	pass
C3	All 4 adapters return string-typed stopReason	pass
C4	All 4 adapters emit JSON-stringified tool_use[] array	pass

Driver-registry sanity (3 tests)

#	Test	Verdict
R1	Exactly 4 drivers registered	pass
R2	Driver names = `['claude', 'codex', 'kimi', 'openai']`	pass
R3	Every driver exposes the parity-contract surface	pass

Total: 104 + 4 + 3 = 111 tests, all green.
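The arithmetic behind that total can be rechecked mechanically. The sketch below copies the per-block counts from the parity matrix and the baseline from section 1; the variable names are illustrative and do not come from the suite.

```typescript
// Per-block test counts copied from the parity matrix (P1..P8).
const testsPerBlock: Record<string, number> = {
  P1: 3, P2: 1, P3: 3, P4: 3, P5: 4, P6: 5, P7: 3, P8: 4,
};
const adapters = ['claude', 'codex', 'kimi', 'openai'];

// Sum the block counts to get the per-driver figure, then scale by adapters.
const perDriver = Object.keys(testsPerBlock)
  .reduce((sum, block) => sum + testsPerBlock[block], 0);
const driverParity = perDriver * adapters.length;

const crossCutting = 4;   // C1..C4
const registrySanity = 3; // R1..R3
const total = driverParity + crossCutting + registrySanity;

const baseline = 3353;    // main @ 6cfd269b
const afterSlice = baseline + total;

console.log({ perDriver, driverParity, total, afterSlice });
```

The computed values match the figures reported above: 26 per driver, 104 driver-parity tests, 111 new tests, and 3464 in the full suite.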

4. Stop-reason value divergence (documented, asserted)

Per contract §P4.2, OpenAI’s adapter passes finish_reason through verbatim (openai.ts:441-443), while Claude, Kimi, and Codex normalize to the Anthropic vocabulary ('end_turn', 'tool_use', 'max_tokens').

The cross-cutting test C3 asserts this explicitly:

const reasonValues = reasons.map((r) => r.reason).sort();
expect(reasonValues).toEqual(['end_turn', 'end_turn', 'end_turn', 'stop']);

That is, 3 of 4 adapters produce 'end_turn'; 1 (OpenAI) produces 'stop'. The shape invariant (P1: stopReason is a string) is uniformly maintained — only the value normalization diverges, and the suite asserts that divergence explicitly.
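The divergence can be pictured as two mapping strategies applied to the same wire value. This is a minimal sketch under stated assumptions: `normalizeStopReason`, `passThroughStopReason`, and the mapping table are hypothetical and are not copied from the adapters, and the input vocabulary ('stop', 'tool_calls', 'length') assumes an OpenAI-style finish_reason on the wire.

```typescript
type AnthropicStopReason = 'end_turn' | 'tool_use' | 'max_tokens';

// Hypothetical mapping table; the real adapters may map differently.
const FINISH_REASON_MAP = new Map<string, AnthropicStopReason>([
  ['stop', 'end_turn'],
  ['tool_calls', 'tool_use'],
  ['length', 'max_tokens'],
]);

// Normalizing strategy (the behavior described for Claude, Kimi, and Codex).
// Unknown values fall through unchanged so the result is always a string.
function normalizeStopReason(finishReason: string): string {
  return FINISH_REASON_MAP.get(finishReason) ?? finishReason;
}

// Pass-through strategy (the behavior described for OpenAI).
function passThroughStopReason(finishReason: string): string {
  return finishReason;
}

// Three normalizing adapters and one pass-through adapter, same wire value.
const sketchedReasons = [
  normalizeStopReason('stop'),
  normalizeStopReason('stop'),
  normalizeStopReason('stop'),
  passThroughStopReason('stop'),
].sort();

console.log(sketchedReasons);
```

Under these assumptions the sorted values reproduce the array that C3 expects, and both strategies satisfy the always-string invariant.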

5. Sibling-race compliance

src/domains/router/fallback.ts is UNTOUCHED. Verified by:

git diff --stat HEAD -- src/domains/router/fallback.ts
(empty)

This satisfies the dispatch override’s CRITICAL constraint.

Adapter source files are also untouched: claude.ts, kimi.ts, codex.ts, openai.ts — all unchanged from base 6cfd269b.

6. Files touched (final)

File	Status	LoC
`docs/audits/p1-5-8-parity-audit.md`	new	174
`docs/contracts/p1-5-8-parity-contract.md`	new	211
`docs/packets/p1-5-8-parity-packet.md`	new	288
`src/__tests__/domains/router/parity-helpers.ts`	new	596
`src/__tests__/domains/router/parity.test.ts`	new	809
`docs/verification/p1-5-8-parity-verification.md`	new	(this file)

Production code touched: zero files.

7. Kimi-flake decision

Per the dispatch override §“Optional clarifications”:

Optional: fix the kimi.test.ts ● injection seams › 7. latency measurement: 50ms delay → latencyMs >= 50 flake introduced in P1.5.2 W3. … if the staging file’s slice doesn’t authorize touching the existing adapter test files, leave the flake and document in verification doc; the parity suite is the right successor.

Decision: deferred. Rationale:

Editing src/__tests__/domains/router/adapters/kimi.test.ts is out-of-slice scope (P1.5.2 territory).
The parity suite’s P7.1 test asserts result.latencyMs >= 0 instead of >= delay, so the kimi-class flake CANNOT recur in the parity suite by construction.
The kimi adapter test continues to assert >= 50 against a 50 ms delay, which remains the documented brittle pattern. A future round could relax that to >= 45 or inject a deterministic clock.

Net effect: the parity suite is the right successor to that assertion. The historical test is left as-is per the override.
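For the deterministic clock option mentioned above, one possible shape is sketched here. It assumes a hypothetical `measureLatency` helper and fake clock; neither exists in the adapters as far as this document shows, and the real adapters would need a clock seam alongside the existing delayFn seam for this to apply.

```typescript
// Hypothetical seam: a clock is any function returning milliseconds.
type Clock = () => number;

// Measures how long `work` takes according to the injected clock.
function measureLatency<T>(now: Clock, work: () => T): { result: T; latencyMs: number } {
  const start = now();
  const result = work();
  return { result, latencyMs: now() - start };
}

// Fake clock that only moves when told to, so elapsed time is exact.
function makeFakeClock(startMs = 0) {
  let current = startMs;
  return {
    now: (): number => current,
    advance: (ms: number): void => { current += ms; },
  };
}

const clock = makeFakeClock(1000);
const measured = measureLatency(clock.now, () => {
  clock.advance(50); // stands in for the 50 ms delay, with no real timer involved
  return 'done';
});

console.log(measured.latencyMs);
```

With the clock under test control, an assertion such as latencyMs === 50 no longer depends on timer scheduling, which is the property the >= 50 assertion against a real delay lacks.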

8. Branch + commit log

Branch: feature/p1-5-8-parity
Worktree: .worktrees/claude/p1-5-8-parity
Base: origin/main @ 6cfd269b

Step	Commit
1. Audit	`audit(p1-5-8-parity): inventory cross-model parity surface`
2. Contract	`contract(p1-5-8-parity): behavioral contract for cross-model parity`
3. Packet	`packet(p1-5-8-parity): execution plan`
4. Implement	`feat(p1-5-8-parity): cross-model parity test suite (4 adapters)`
5. Verify	(this commit) — `verify(p1-5-8-parity): test evidence + parity matrix`

9. Acceptance — slice criteria

From the dispatch override §“Test gate”:

npm run build && npm run lint && npm test — all gates green.
Baseline 3353 tests preserved; +111 new tests; 3464 total.
No regression on existing tests (flakes resolve on retry).
No mutation to src/domains/router/fallback.ts.
Parity matrix covers all 4 adapters × 8 invariant blocks.

10. Closing

P1.5.8 ships 111 parity tests spanning shape, determinism, tokens, stop-reason, tool-use mapping, error mapping, latency, and injection seams across all four δ adapters. The router’s CompletionFn boundary is now shown by test to be interchangeable across Claude / Kimi / Codex / OpenAI at the shape level: adapters are shape-compatible at the result level, error-class discriminants are uniform by shape, and tool-use translation produces identical Anthropic-shaped content from divergent wire shapes. The one documented value-level divergence is OpenAI’s pass-through stopReason (section 4).

The slice ships file-disjoint from sibling P1.5.10 (ζ integration modifies fallback.ts; this slice does not). Both can merge sequentially without conflict.