P1.5.8 — Contract (Step 2 of 5)

Behavioral contract for the cross-model parity suite. Pins each parity invariant to a planned test name. Step 3 (packet) translates these into an execution plan; Step 4 (implement) wires the file.

1. Parity invariants

For each adapter A ∈ {claude, kimi, codex, openai}:

P1 — Shape parity (golden path)

Invariant	Description
P1.1	`(adapter, prompt) → CompletionResult` returns an object with the 6 fields from `claude.ts:134-141`: `content`, `model`, `promptTokens`, `completionTokens`, `latencyMs`, `stopReason`.
P1.2	Each field has the spec-defined type: `content` string, `model` string, `promptTokens` non-negative integer, `completionTokens` non-negative integer, `latencyMs` non-negative finite number, `stopReason` string.
P1.3	The returned object has NO extra enumerable own properties beyond the 6 specified.
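
The three shape invariants can be checked by one helper. The following is a minimal sketch, assuming a local `CompletionResult` type that mirrors the six fields listed in P1.1; `shapeViolations` and `isCount` are hypothetical names, not existing helpers.

```typescript
// Local mirror of the six fields named in P1.1 (assumption: the real type is declared in claude.ts).
interface CompletionResult {
  content: string;
  model: string;
  promptTokens: number;
  completionTokens: number;
  latencyMs: number;
  stopReason: string;
}

const EXPECTED_KEYS: readonly (keyof CompletionResult)[] = [
  'content', 'model', 'promptTokens', 'completionTokens', 'latencyMs', 'stopReason',
];

// True for non-negative integers only.
const isCount = (v: unknown): boolean =>
  typeof v === 'number' && Number.isInteger(v) && v >= 0;

// Hypothetical helper: returns the violated checks; an empty array means P1.1-P1.3 hold.
function shapeViolations(value: unknown): string[] {
  if (typeof value !== 'object' || value === null) return ['not an object'];
  const r = value as Record<string, unknown>;
  const problems: string[] = [];

  // P1.1 and P1.3: exactly the six keys, nothing missing and nothing extra.
  const actual = Object.keys(r).sort().join(',');
  const expected = EXPECTED_KEYS.slice().sort().join(',');
  if (actual !== expected) problems.push('keys');

  // P1.2: per-field types.
  for (const k of ['content', 'model', 'stopReason']) {
    if (typeof r[k] !== 'string') problems.push(k);
  }
  for (const k of ['promptTokens', 'completionTokens']) {
    if (!isCount(r[k])) problems.push(k);
  }
  const latency = r['latencyMs'];
  if (typeof latency !== 'number' || !Number.isFinite(latency) || latency < 0) {
    problems.push('latencyMs');
  }
  return problems;
}
```

Returning a list of violations rather than a boolean keeps failure messages specific when the same check runs against four adapters.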

P2 — Determinism parity

Invariant	Description
P2.1	Calling an adapter twice with an identical (prompt, mocked HTTP response) pair yields two `CompletionResult`s with equal `content`, `model`, `promptTokens`, `completionTokens`, and `stopReason`. `latencyMs` is excluded because it is wall-clock-derived.
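
The determinism comparison can be sketched as a field-by-field check that skips `latencyMs`. This assumes a local mirror of `CompletionResult`; `STABLE_FIELDS` and `sameIgnoringLatency` are hypothetical names.

```typescript
// Local mirror of the six fields named in P1.1 (assumption: the real type is declared in claude.ts).
interface CompletionResult {
  content: string;
  model: string;
  promptTokens: number;
  completionTokens: number;
  latencyMs: number;
  stopReason: string;
}

// Every field except the wall-clock-derived latencyMs.
const STABLE_FIELDS = [
  'content', 'model', 'promptTokens', 'completionTokens', 'stopReason',
] as const;

// All stable fields are primitives, so strict equality is sufficient.
function sameIgnoringLatency(a: CompletionResult, b: CompletionResult): boolean {
  return STABLE_FIELDS.every((k) => a[k] === b[k]);
}
```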

P3 — Token-accounting parity

Invariant	Description
P3.1	`promptTokens` is populated from the wire’s native location (Anthropic `usage.input_tokens`; Kimi/Codex/OpenAI `usage.prompt_tokens`); given matching mocked counts, all 4 adapters yield numerically equal `promptTokens`.
P3.2	Same for `completionTokens` (Anthropic `output_tokens`; OpenAI-shape `completion_tokens`).
P3.3	A missing usage block degrades to `promptTokens: 0` and `completionTokens: 0` (never `NaN`, never a throw).
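
The two wire locations and the zero fallback can be illustrated as follows. This is a sketch of the behavior the invariants describe, not the adapters' actual code; `toCount`, `tokensFromAnthropic`, and `tokensFromOpenAiShape` are hypothetical names.

```typescript
interface TokenCounts {
  promptTokens: number;
  completionTokens: number;
}

// Coerce a wire value to a non-negative integer, falling back to 0 (P3.3).
function toCount(v: unknown): number {
  return typeof v === 'number' && Number.isInteger(v) && v >= 0 ? v : 0;
}

// Anthropic wire shape: usage.input_tokens / usage.output_tokens.
function tokensFromAnthropic(
  body: { usage?: { input_tokens?: unknown; output_tokens?: unknown } },
): TokenCounts {
  return {
    promptTokens: toCount(body.usage?.input_tokens),
    completionTokens: toCount(body.usage?.output_tokens),
  };
}

// OpenAI wire shape (Kimi/Codex/OpenAI): usage.prompt_tokens / usage.completion_tokens.
function tokensFromOpenAiShape(
  body: { usage?: { prompt_tokens?: unknown; completion_tokens?: unknown } },
): TokenCounts {
  return {
    promptTokens: toCount(body.usage?.prompt_tokens),
    completionTokens: toCount(body.usage?.completion_tokens),
  };
}
```

Given matching mocked counts, both extractors return numerically equal results, which is the property P3.1 and P3.2 assert across adapters.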

P4 — Stop-reason normalization parity

Invariant	Description
P4.1	Claude `stop_reason: 'end_turn'` → `'end_turn'`.
P4.2	OpenAI-shape `finish_reason: 'stop'` → normalized to `'end_turn'` by Kimi + Codex adapters. (OpenAI adapter passes `'stop'` through unchanged — see `openai.ts:441-443`.)
P4.3	OpenAI-shape `finish_reason: 'tool_calls'` → `'tool_use'` (Kimi + Codex). (OpenAI passes through unchanged.)
P4.4	OpenAI-shape `finish_reason: 'length'` → `'max_tokens'` (Kimi + Codex); OpenAI passes through.
P4.5	Missing `finish_reason` → the adapter's default (`'unknown'`). All 4 adapters tolerate this and return a string; none throws.

Note: The OpenAI adapter (openai.ts:441-443) passes finish_reason through verbatim, so P4.2, P4.3, and P4.4 do not yield one normalized vocabulary across all four adapters. The parity suite treats this as a documented, expected divergence and tests it explicitly: Kimi and Codex normalize to the Anthropic vocabulary, while OpenAI passes values through. The shape invariant from P1 (stopReason is a string) still holds for all four; only the value mapping differs.
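
The two behaviors described in P4 can be sketched side by side. The mapping values come from P4.2 to P4.5; the function names are hypothetical, and passing unmapped values through in the normalizing variant is an assumption rather than something this contract specifies.

```typescript
// Normalizing variant (Kimi + Codex per P4.2-P4.4).
function normalizeFinishReason(finishReason: unknown): string {
  if (typeof finishReason !== 'string') return 'unknown'; // P4.5
  switch (finishReason) {
    case 'stop':
      return 'end_turn';
    case 'tool_calls':
      return 'tool_use';
    case 'length':
      return 'max_tokens';
    default:
      return finishReason; // assumption: unmapped values pass through unchanged
  }
}

// Pass-through variant (OpenAI adapter per the note above).
function passThroughFinishReason(finishReason: unknown): string {
  return typeof finishReason === 'string' ? finishReason : 'unknown'; // P4.5
}
```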

P5 — Tool-use mapping parity

Invariant	Description
P5.1	When mocked with a tool-use response, every adapter returns `CompletionResult` whose `content` is a JSON-stringified array of Anthropic-shape `tool_use` blocks: `{type: 'tool_use', id, name, input}`.
P5.2	The `input` field is a parsed object (not a JSON string).
P5.3	Multiple tool calls in one response produce multiple `tool_use` blocks in `content`, preserving order.
P5.4	An empty `tools` array degrades to a plain (non-tool-use) call: `tools` key omitted from the request body.
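
For the OpenAI-shape adapters, P5.1 to P5.3 amount to converting the `tool_calls` array of a chat-completion message into Anthropic-shape blocks. A minimal sketch, assuming the standard OpenAI `tool_calls` entry shape (`id`, `type: 'function'`, `function.name`, `function.arguments` as a JSON string); `toToolUseContent` is a hypothetical name.

```typescript
// One entry of an OpenAI-shape message.tool_calls array.
interface OpenAiToolCall {
  id: string;
  type: 'function';
  function: { name: string; arguments: string }; // arguments is a JSON string on the wire
}

// Anthropic-shape block as described in P5.1.
interface ToolUseBlock {
  type: 'tool_use';
  id: string;
  name: string;
  input: unknown;
}

// Converts tool calls to the JSON-stringified block array stored in `content`.
// Array.prototype.map preserves order (P5.3). Malformed arguments would make JSON.parse throw;
// how the real adapters handle that case is not covered here.
function toToolUseContent(toolCalls: OpenAiToolCall[]): string {
  const blocks = toolCalls.map((call): ToolUseBlock => ({
    type: 'tool_use',
    id: call.id,
    name: call.function.name,
    input: JSON.parse(call.function.arguments) as unknown, // P5.2: parsed object, not a string
  }));
  return JSON.stringify(blocks);
}
```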

P6 — Error mapping parity

Invariant	Description
P6.1	HTTP 401 → adapter-specific error class with terminal code (`ANTHROPIC_API_ERROR` / `KIMI_API_ERROR` / `CODEX_API_ERROR` / `OPENAI_API_ERROR`).
P6.2	HTTP 500 → after retries, adapter-specific error class with the adapter's `*_RETRIES_EXHAUSTED` code.
P6.3	Network-level fetch error → adapter-specific error class with the adapter's `*_API_ERROR` code and `status: undefined`.
P6.4	Missing API key → adapter-specific `*ConfigError` (`ANTHROPIC_CONFIG_ERROR` / `KIMI_CONFIG_ERROR` / `CODEX_CONFIG_ERROR` / `OPENAI_CONFIG_ERROR`).
P6.5	All four adapter error classes extend `Error` and have a `code` discriminant property.
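
P6.5 lets the suite assert on errors without knowing each adapter's class in detail. The sketch below uses a hypothetical stand-in class (`ExampleApiError`) and a hypothetical `hasCode` helper; the real error classes live in the adapter modules and may carry more fields.

```typescript
// Hypothetical stand-in for one adapter's error class.
class ExampleApiError extends Error {
  readonly code: string;
  readonly status: number | undefined;

  constructor(message: string, code: string, status?: number) {
    super(message);
    this.name = 'ExampleApiError';
    this.code = code;
    this.status = status;
  }
}

// Checks the two properties P6.5 guarantees: extends Error, and carries a `code` discriminant.
function hasCode(err: unknown, code: string): boolean {
  return err instanceof Error && (err as Error & { code?: unknown }).code === code;
}
```

A network-level failure (P6.3) would be modelled here as `new ExampleApiError('fetch failed', 'KIMI_API_ERROR')`, leaving `status` undefined.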

P7 — Latency parity

Invariant	Description
P7.1	Every successful result has `latencyMs >= 0`. No strictly positive lower bound is asserted, because the kimi latency flake demonstrates timer imprecision.
P7.2	`latencyMs` is a finite number (`Number.isFinite(result.latencyMs)`).
P7.3	`latencyMs` is populated even on the success path with zero-delay mocks (i.e. the adapter records latency from request-start to response-parse, not from request-start to mock-resolve).
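
One way to obtain a latency value that always satisfies P7.1 and P7.2 is to measure around the work with an injectable clock. This is an illustrative sketch only: `timed` is a hypothetical helper, and clamping to zero is an assumption, not documented adapter behavior.

```typescript
type Clock = () => number;

// Measures the elapsed time of `work` in milliseconds using the supplied clock.
function timed<T>(work: () => T, now: Clock = Date.now): { value: T; latencyMs: number } {
  const start = now();
  const value = work();
  const elapsed = now() - start;
  // Clamp (assumption) so a clock adjustment can never yield a negative or non-finite value.
  const latencyMs = Number.isFinite(elapsed) && elapsed > 0 ? elapsed : 0;
  return { value, latencyMs };
}
```

With a zero-delay mock the measured value may legitimately be 0, which is why the suite asserts `>= 0` and finiteness rather than a positive lower bound.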

P8 — Injection seam parity

Invariant	Description
P8.1	All four adapters accept `fetchFn` and use it exclusively (no global `fetch` calls during tests).
P8.2	All four adapters accept `logger` and route their log lines through it (success path logs an info line with adapter tag, tokens, latency).
P8.3	All four adapters accept `delayFn` and call it for retry sleeps (verifiable by counting `delayFn` calls under a 500 → 500 → 500 → 200 fixture).
P8.4	All four adapters accept `apiKey` and prefer it over the env var.
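
The `delayFn` count check in P8.3 can be sketched against a stand-in retry loop. Everything here is hypothetical: `callWithRetry` is not the adapters' implementation, and the retry limit and backoff values are illustrative.

```typescript
interface MiniResponse {
  status: number;
}
type FetchFn = () => Promise<MiniResponse>;
type DelayFn = (ms: number) => Promise<void>;

// Hypothetical retry loop: retries on 5xx, sleeping via the injected delayFn between attempts.
async function callWithRetry(
  fetchFn: FetchFn,
  delayFn: DelayFn,
  maxRetries = 3,
): Promise<MiniResponse> {
  for (let attempt = 0; ; attempt += 1) {
    const res = await fetchFn();
    if (res.status < 500) return res;
    if (attempt >= maxRetries) throw new Error('EXAMPLE_RETRIES_EXHAUSTED');
    await delayFn(100 * 2 ** attempt); // illustrative backoff values
  }
}

// Fixture: returns the given statuses in order, repeating the last one when exhausted.
function makeStatusSequence(statuses: number[]): FetchFn {
  let index = 0;
  return () => {
    const status = statuses[Math.min(index, statuses.length - 1)] ?? 500;
    index += 1;
    return Promise.resolve({ status });
  };
}

// Instant no-op delay that records each requested sleep.
function makeRecordingDelay(): { delayFn: DelayFn; calls: number[] } {
  const calls: number[] = [];
  const delayFn: DelayFn = (ms) => {
    calls.push(ms);
    return Promise.resolve();
  };
  return { delayFn, calls };
}
```

Under the 500 → 500 → 500 → 200 fixture, this loop calls `delayFn` three times and then returns the 200 response, so the assertion is on `calls.length`.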

2. Test surface (planned)

The parity suite ships as ONE file: src/__tests__/domains/router/parity.test.ts.

A second helper file may be needed for shared fixtures: src/__tests__/domains/router/parity-helpers.ts.

2.1 Adapter-driver abstraction

Each adapter is exercised through a ParityDriver interface that hides the per-adapter signature divergence:

interface ParityDriver {
  readonly name: 'claude' | 'kimi' | 'codex' | 'openai';
  callPlain: (prompt: string, opts: ParityCallOptions) => Promise<CompletionResult>;
  callWithTools: (prompt: string, tools: AnthropicTool[], opts: ParityCallOptions) => Promise<CompletionResult>;
  makeSuccessResponse: () => unknown;        // adapter-native HTTP body
  makeToolUseResponse: () => unknown;        // tool-call wire shape
  makeMissingUsageResponse: () => unknown;   // missing usage block
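  parseErrorClass: new (...args: never[]) => Error;  // adapter-specific error class (a custom subclass does not satisfy the built-in ErrorConstructor type)
  configErrorClass: new (...args: never[]) => Error; // adapter-specific config error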
  apiErrorCode: string;                       // 'ANTHROPIC_API_ERROR' / etc.
  retriesExhaustedCode: string;               // '_RETRIES_EXHAUSTED' variant
  configErrorCode: string;                    // '_CONFIG_ERROR' variant
}

The shared mock-fetch helper follows the makeMockFetch(responses) pattern already used by the existing per-adapter suites.
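
A queue-backed mock of this kind might look like the sketch below. The shape is an assumption: the existing `makeMockFetch` in the per-adapter suites may have a different signature and return type, and the `MiniResponse`, `MockResponseSpec`, and `RecordedCall` types are hypothetical.

```typescript
interface MockResponseSpec {
  status: number;
  body: unknown;
}
interface RecordedCall {
  url: string;
  body: unknown; // parsed JSON request body, or undefined when none was sent
}
interface MiniResponse {
  ok: boolean;
  status: number;
  json: () => Promise<unknown>;
}

// Serves the queued responses in order and records every request for later assertions.
function makeMockFetch(responses: MockResponseSpec[]) {
  const queue = responses.slice();
  const calls: RecordedCall[] = [];

  const fetchFn = (url: string, init?: { body?: string }): Promise<MiniResponse> => {
    const raw = init?.body;
    calls.push({ url, body: raw === undefined ? undefined : JSON.parse(raw) });
    const next = queue.shift();
    if (next === undefined) {
      return Promise.reject(new Error('mock fetch: no responses left'));
    }
    return Promise.resolve({
      ok: next.status >= 200 && next.status < 300,
      status: next.status,
      json: () => Promise.resolve(next.body),
    });
  };

  return { fetchFn, calls };
}
```

Recording the parsed request body makes checks such as P5.4 (no `tools` key in the request) a plain property assertion on `calls`.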

2.2 Test matrix

The suite runs describe.each(drivers) over the 4 drivers, with these test blocks per driver:

#	Block	Invariants asserted
1	P1 — shape parity (golden path)	P1.1, P1.2, P1.3
2	P2 — determinism	P2.1
3	P3 — token accounting	P3.1, P3.2, P3.3
4	P4 — stop-reason parity	P4.1, P4.2, P4.3, P4.4, P4.5
5	P5 — tool-use mapping	P5.1, P5.2, P5.3, P5.4
6	P6 — error mapping (401/500/net)	P6.1, P6.2, P6.3, P6.4, P6.5
7	P7 — latency	P7.1, P7.2, P7.3
8	P8 — injection seams (fetch/log/delay/key)	P8.1, P8.2, P8.3, P8.4

4 adapters × 8 blocks = 32 driver-parity test cases. A block may contain several expect assertions; the final test count is recorded in the verification doc.

2.3 Cross-cutting tests (single-block, all 4 adapters in one test)

#	Test name	Invariant span
C1	`all 4 adapters return structurally equal CompletionResult shape on success`	P1.1, P1.2, P1.3 jointly
C2	`all 4 adapters yield identical token counts given equivalent mocked usage`	P3.1, P3.2 jointly
C3	`all 4 adapters return a string-typed stopReason given a 'stop' / 'end_turn' fixture`	P4 envelope
C4	`all 4 adapters emit JSON-stringified tool_use array on tool-use response`	P5.1 jointly

Each of these 4 cross-cutting tests runs joint assertions over all 4 adapters inside a single test() block.

2.4 Helper function — adapter-driver registry

const DRIVERS: ParityDriver[] = [
  makeClaudeDriver(),
  makeKimiDriver(),
  makeCodexDriver(),
  makeOpenAiDriver(),
];

Each make*Driver returns a frozen object satisfying the interface above.
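
A frozen driver factory can be sketched as follows, using a reduced, hypothetical `DriverMeta` type in place of the full `ParityDriver` interface. `ExampleKimiError` is a stand-in for the adapter's real error class.

```typescript
// Accepts any class whose instances are Errors, whatever its constructor signature.
type ErrorClass = new (...args: never[]) => Error;

// Reduced, hypothetical subset of the ParityDriver fields.
interface DriverMeta {
  readonly name: 'claude' | 'kimi' | 'codex' | 'openai';
  readonly apiErrorCode: string;
  readonly errorClass: ErrorClass;
}

// Stand-in for the adapter's real error class.
class ExampleKimiError extends Error {
  readonly code: string;

  constructor(message: string, code: string) {
    super(message);
    this.code = code;
  }
}

function makeKimiDriverMeta(): DriverMeta {
  const driver: DriverMeta = {
    name: 'kimi',
    apiErrorCode: 'KIMI_API_ERROR',
    errorClass: ExampleKimiError,
  };
  // Object.freeze is shallow: it protects the driver's own properties, not nested objects.
  return Object.freeze(driver);
}
```

Freezing guards against one test block mutating a driver that later blocks share through the `DRIVERS` array.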

3. Determinism + isolation

Every fetch call is mocked via injected fetchFn; no global stubbing.
delayFn is an instant no-op; retry tests never sleep.
logger is silent (records into an in-memory array for assertions that demand it).
apiKey is a fixed dummy string per adapter, passed through the options bag so that it takes precedence over any *_API_KEY env var; tests never depend on the environment.
Tests do NOT call resetRouterStats() because they do not exercise the router’s cost layer — they call adapters directly.
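
The silent logger and instant delay described above can be sketched as small test doubles. The logger method names (`info`, `warn`, `error`) are an assumption about the adapters' logger interface, and `makeRecordingLogger` and `noDelay` are hypothetical names.

```typescript
interface LogEntry {
  level: 'info' | 'warn' | 'error';
  message: string;
  meta?: unknown;
}

interface RecordingLogger {
  info: (message: string, meta?: unknown) => void;
  warn: (message: string, meta?: unknown) => void;
  error: (message: string, meta?: unknown) => void;
  entries: LogEntry[];
}

// Prints nothing; stores every log line in memory so tests can assert on it (P8.2).
function makeRecordingLogger(): RecordingLogger {
  const entries: LogEntry[] = [];
  const record = (level: LogEntry['level']) =>
    (message: string, meta?: unknown): void => {
      entries.push({ level, message, meta });
    };
  return { info: record('info'), warn: record('warn'), error: record('error'), entries };
}

// Instant no-op delay: resolves immediately, so retry tests never sleep.
const noDelay = (_ms: number): Promise<void> => Promise.resolve();
```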

4. Untouched files (CRITICAL)

src/domains/router/fallback.ts — sibling P1.5.10 territory.
src/domains/router/adapters/*.ts — Wave 3 closed.
src/domains/integrations/claude.ts — Phase 0 closed.
src/domains/router/tools.ts — P1.5.7 closed; sibling P1.5.10 may touch.

5. Files created

File	Role
`src/__tests__/domains/router/parity.test.ts`	Main parity test suite
`src/__tests__/domains/router/parity-helpers.ts`	Shared fixtures + driver impls

No production code is modified.

6. Acceptance gate

npm run build && npm run lint && npm test MUST pass. Pre-existing known flakes (kimi latency, consensus G7.1, reputation race, server startup) must pass cleanly on retry, matching the baseline of 3353 tests at 6cfd269b.

Expected test count delta: +32 driver-parity, +4 cross-cutting, and a few N-of-1 fixture-shape tests, for approximately +40 tests and a total of about 3393. The final count is recorded in Step 5 (verification).