P1.5.8 — Contract (Step 2 of 5)

Behavioral contract for the cross-model parity suite. Pins each parity invariant to a planned test name. Step 3 (packet) translates these into an execution plan; Step 4 (implement) wires the file.

1. Parity invariants

For each adapter A ∈ {claude, kimi, codex, openai}:

P1 — Shape parity (golden path)

Invariant Description
P1.1 Calling an adapter with a prompt returns a CompletionResult carrying the 6 fields from claude.ts:134-141: content, model, promptTokens, completionTokens, latencyMs, stopReason.
P1.2 Each field has the spec-defined type: content string, model string, promptTokens non-negative integer, completionTokens non-negative integer, latencyMs non-negative finite number, stopReason string.
P1.3 The returned object has NO extra enumerable own properties beyond the 6 specified.
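
The three P1 invariants can be checked by one small helper. What follows is a minimal sketch rather than the suite's actual code: the CompletionResult interface is reconstructed from the six fields listed in P1.1, and checkShape is a hypothetical name.

```typescript
// Assumed shape, reconstructed from the six fields listed in P1.1.
interface CompletionResult {
  content: string;
  model: string;
  promptTokens: number;
  completionTokens: number;
  latencyMs: number;
  stopReason: string;
}

const STRING_FIELDS = ['content', 'model', 'stopReason'];
const COUNT_FIELDS = ['promptTokens', 'completionTokens'];
const ALL_FIELDS = [...STRING_FIELDS, ...COUNT_FIELDS, 'latencyMs'];

// Hypothetical helper: returns the list of violations; an empty list means P1.1-P1.3 hold.
function checkShape(value: unknown): string[] {
  if (typeof value !== 'object' || value === null) return ['result is not an object'];
  const obj = value as Record<string, unknown>;
  const problems: string[] = [];

  // P1.1 and P1.3: the enumerable own keys are exactly the six expected ones.
  const actual = Object.keys(obj).sort().join(',');
  const expected = [...ALL_FIELDS].sort().join(',');
  if (actual !== expected) problems.push(`key set is [${actual}]`);

  // P1.2: per-field types.
  for (const field of STRING_FIELDS) {
    if (typeof obj[field] !== 'string') problems.push(`${field} must be a string`);
  }
  for (const field of COUNT_FIELDS) {
    const n = obj[field];
    if (typeof n !== 'number' || !Number.isInteger(n) || n < 0) {
      problems.push(`${field} must be a non-negative integer`);
    }
  }
  const latency = obj['latencyMs'];
  if (typeof latency !== 'number' || !Number.isFinite(latency) || latency < 0) {
    problems.push('latencyMs must be a non-negative finite number');
  }
  return problems;
}

// Illustrative value only; not taken from any real adapter response.
const sample: CompletionResult = {
  content: 'hello',
  model: 'example-model',
  promptTokens: 3,
  completionTokens: 5,
  latencyMs: 0.4,
  stopReason: 'end_turn',
};
```

Returning a list of violations instead of throwing on the first one lets a single test report every shape problem at once.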

P2 — Determinism parity

Invariant Description
P2.1 Calling an adapter twice with identical (prompt, mocked HTTP response) yields two CompletionResults with structurally equal content, model, promptTokens, completionTokens, and stopReason. latencyMs is excluded because it is derived from the wall clock.
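
One way to express the P2.1 comparison is to check only the five stable fields. This is a sketch under the assumption that CompletionResult has the six fields from P1.1; sameExceptLatency is a hypothetical helper name.

```typescript
// Assumed shape, reconstructed from the six fields listed in P1.1.
interface CompletionResult {
  content: string;
  model: string;
  promptTokens: number;
  completionTokens: number;
  latencyMs: number;
  stopReason: string;
}

// Every field except latencyMs, which depends on the wall clock.
const STABLE_FIELDS = [
  'content', 'model', 'promptTokens', 'completionTokens', 'stopReason',
] as const;

// Hypothetical helper: all stable fields are primitives, so === is a structural comparison.
function sameExceptLatency(a: CompletionResult, b: CompletionResult): boolean {
  return STABLE_FIELDS.every((field) => a[field] === b[field]);
}

// Illustrative values only.
const first: CompletionResult = {
  content: 'hello',
  model: 'example-model',
  promptTokens: 3,
  completionTokens: 5,
  latencyMs: 12.5,
  stopReason: 'end_turn',
};
const second: CompletionResult = { ...first, latencyMs: 3.1 };
```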

P3 — Token-accounting parity

Invariant Description
P3.1 promptTokens is populated from the wire’s native location (Anthropic usage.input_tokens; Kimi/Codex/OpenAI usage.prompt_tokens); given matching mocked counts, all 4 adapters yield numerically equal promptTokens.
P3.2 Same for completionTokens (Anthropic output_tokens; OpenAI-shape completion_tokens).
P3.3 A missing usage block degrades to promptTokens = 0 and completionTokens = 0 (never NaN, never a thrown error).
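
The token-accounting rules above can be sketched as a single reader that knows both wire shapes. The field names are the ones cited in P3.1 and P3.2; readUsage, asCount, and the WireShape type are hypothetical and only illustrate the 0/0 fallback from P3.3.

```typescript
type WireShape = 'anthropic' | 'openai';

interface TokenCounts {
  promptTokens: number;
  completionTokens: number;
}

// Anything that is not a non-negative integer degrades to 0 (P3.3).
function asCount(value: unknown): number {
  return typeof value === 'number' && Number.isInteger(value) && value >= 0 ? value : 0;
}

// Hypothetical reader: pulls counts from the shape-specific usage fields.
function readUsage(body: unknown, shape: WireShape): TokenCounts {
  const usage =
    typeof body === 'object' && body !== null ? (body as { usage?: unknown }).usage : undefined;
  const u = (typeof usage === 'object' && usage !== null ? usage : {}) as Record<string, unknown>;
  return shape === 'anthropic'
    ? { promptTokens: asCount(u['input_tokens']), completionTokens: asCount(u['output_tokens']) }
    : { promptTokens: asCount(u['prompt_tokens']), completionTokens: asCount(u['completion_tokens']) };
}

// Matching mocked counts on both shapes should produce equal results.
const fromAnthropic = readUsage({ usage: { input_tokens: 12, output_tokens: 7 } }, 'anthropic');
const fromOpenAi = readUsage({ usage: { prompt_tokens: 12, completion_tokens: 7 } }, 'openai');
const fromMissing = readUsage({}, 'openai');
```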

P4 — Stop-reason normalization parity

Invariant Description
P4.1 Claude stop_reason: 'end_turn' → 'end_turn'.
P4.2 OpenAI-shape finish_reason: 'stop' → normalized to 'end_turn' by Kimi + Codex adapters. (OpenAI adapter passes 'stop' through unchanged — see openai.ts:441-443.)
P4.3 OpenAI-shape finish_reason: 'tool_calls' → 'tool_use' (Kimi + Codex). (OpenAI passes through unchanged.)
P4.4 OpenAI-shape finish_reason: 'length' → 'max_tokens' (Kimi + Codex); OpenAI passes through.
P4.5 Missing finish_reason → adapter-specific default ('unknown'). All 4 adapters tolerate this and return a string, never throw.

Note: The OpenAI adapter (openai.ts:441-443) passes finish_reason through verbatim, so P4.2, P4.3, and P4.4 do not yield one normalized vocabulary across all four adapters. Kimi and Codex normalize to the Anthropic vocabulary; OpenAI passes the wire value through. The shape invariant (P1: stopReason is a string) still holds for all four; only the value mapping diverges. The parity suite treats this as a documented, expected divergence and tests it explicitly.
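
The expected value per adapter can be written down as a small lookup, which is one way a test could encode the documented divergence. This is a sketch: the three mappings come from P4.2 to P4.4 and the 'unknown' default from P4.5, while expectedStopReason is a hypothetical helper and passing unmapped values through unchanged is an assumption.

```typescript
type AdapterName = 'claude' | 'kimi' | 'codex' | 'openai';

// OpenAI-shape finish_reason values and their Anthropic equivalents (P4.2-P4.4).
const OPENAI_TO_ANTHROPIC = new Map<string, string>([
  ['stop', 'end_turn'],
  ['tool_calls', 'tool_use'],
  ['length', 'max_tokens'],
]);

// Hypothetical helper: the stopReason a test would expect for a given wire value.
function expectedStopReason(adapter: AdapterName, wireValue: string | undefined): string {
  if (wireValue === undefined) return 'unknown'; // P4.5 default
  if (adapter === 'kimi' || adapter === 'codex') {
    // Assumption: values outside the table are passed through unchanged.
    return OPENAI_TO_ANTHROPIC.get(wireValue) ?? wireValue;
  }
  return wireValue; // claude (P4.1) and openai (documented pass-through)
}
```

For the same 'stop' fixture, this yields 'end_turn' for Kimi and Codex but 'stop' for OpenAI, which is exactly the divergence the note describes.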

P5 — Tool-use mapping parity

Invariant Description
P5.1 When mocked with a tool-use response, every adapter returns CompletionResult whose content is a JSON-stringified array of Anthropic-shape tool_use blocks: {type: 'tool_use', id, name, input}.
P5.2 The input field is a parsed object (not a JSON string).
P5.3 Multiple tool calls in one response produce multiple tool_use blocks in content, preserving order.
P5.4 An empty tools array degrades to a plain (non-tool-use) call: tools key omitted from the request body.
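
For the OpenAI-shape adapters, P5.1 to P5.3 amount to converting the wire's tool_calls entries into Anthropic-shape blocks. The sketch below assumes the standard OpenAI chat-completions tool-call shape, where function.arguments is a JSON-encoded string; toToolUseContent is a hypothetical name and is not taken from the adapters.

```typescript
// OpenAI-shape tool call as it appears on the wire.
interface OpenAiToolCall {
  id: string;
  type: 'function';
  function: { name: string; arguments: string }; // arguments is a JSON string
}

// Anthropic-shape block named in P5.1.
interface ToolUseBlock {
  type: 'tool_use';
  id: string;
  name: string;
  input: unknown; // parsed value, per P5.2
}

// Hypothetical converter: preserves order (P5.3) and parses arguments (P5.2).
function toToolUseContent(calls: OpenAiToolCall[]): string {
  const blocks = calls.map((call): ToolUseBlock => ({
    type: 'tool_use',
    id: call.id,
    name: call.function.name,
    input: JSON.parse(call.function.arguments),
  }));
  return JSON.stringify(blocks); // content is a JSON-stringified array (P5.1)
}

// Illustrative fixture only.
const content = toToolUseContent([
  { id: 'call_a', type: 'function', function: { name: 'get_weather', arguments: '{"city":"Oslo"}' } },
  { id: 'call_b', type: 'function', function: { name: 'get_time', arguments: '{}' } },
]);
```

A test can then JSON.parse the content string and assert on the resulting array, which keeps the assertion independent of key order in the serialized form.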

P6 — Error mapping parity

Invariant Description
P6.1 HTTP 401 → adapter-specific error class with terminal code (ANTHROPIC_API_ERROR / KIMI_API_ERROR / CODEX_API_ERROR / OPENAI_API_ERROR).
P6.2 HTTP 500 → after retries, adapter-specific error class with _RETRIES_EXHAUSTED code.
P6.3 Network-level fetch error → adapter-specific error class with _API_ERROR code, status: undefined.
P6.4 Missing API key → adapter-specific *ConfigError (ANTHROPIC_CONFIG_ERROR / KIMI_CONFIG_ERROR / CODEX_CONFIG_ERROR / OPENAI_CONFIG_ERROR).
P6.5 All four adapter error classes extend Error and have a code discriminant property.
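
P6.5 can be asserted with a type guard that works for any of the four error classes. The class below is a hypothetical stand-in (the real adapters define their own); only the KIMI_API_ERROR code string comes from P6.1.

```typescript
// Hypothetical stand-in for an adapter-specific error class.
class ExampleApiError extends Error {
  readonly code: string;
  readonly status: number | undefined;

  constructor(code: string, message: string, status?: number) {
    super(message);
    this.name = 'ExampleApiError';
    this.code = code;
    this.status = status; // stays undefined for network-level failures (P6.3)
  }
}

// P6.5: the value is an Error and carries a string code discriminant.
function hasCode(err: unknown): err is Error & { code: string } {
  if (!(err instanceof Error)) return false;
  const code: unknown = (err as Error & { code?: unknown }).code;
  return typeof code === 'string';
}

// Illustrative instances for the 401 case (P6.1) and the network case (P6.3).
const unauthorized = new ExampleApiError('KIMI_API_ERROR', 'HTTP 401', 401);
const networkDown = new ExampleApiError('KIMI_API_ERROR', 'fetch failed');
```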

P7 — Latency parity

Invariant Description
P7.1 Every successful result has latencyMs >= 0. No stricter (positive) lower bound is asserted, because the kimi flake demonstrates timer imprecision.
P7.2 latencyMs is a finite number (Number.isFinite(result.latencyMs)).
P7.3 latencyMs is populated even on the success path with zero-delay mocks (i.e. the adapter records latency from request-start to response-parse, not from request-start to mock-resolve).

P8 — Injection seam parity

Invariant Description
P8.1 All four adapters accept fetchFn and use it exclusively (no global fetch calls during tests).
P8.2 All four adapters accept logger and route their log lines through it (success path logs an info line with adapter tag, tokens, latency).
P8.3 All four adapters accept delayFn and call it for retry sleeps (verifiable by counting delayFn calls under a 500 → 500 → 500 → 200 fixture).
P8.4 All four adapters accept apiKey and prefer it over the env var.
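
The delayFn count in P8.3 can be illustrated with a toy retry loop. Everything here is an assumption made for the sketch: MockResponse and the function types are minimal stand-ins rather than real fetch types, makeQueueFetch is a hypothetical queue-backed mock, and the retry policy (retry on status >= 500, linear delay) is invented for illustration.

```typescript
// Minimal stand-ins so the sketch needs no DOM or Node fetch types.
interface MockResponse {
  status: number;
}
type FetchFn = () => Promise<MockResponse>;
type DelayFn = (ms: number) => Promise<void>;

// Hypothetical mock: serves the given statuses in order and counts calls.
function makeQueueFetch(statuses: number[]): { fetchFn: FetchFn; callCount: () => number } {
  const queue = statuses.slice();
  let calls = 0;
  const fetchFn: FetchFn = async () => {
    calls += 1;
    const status = queue.shift();
    if (status === undefined) throw new Error('mock fetch queue exhausted');
    return { status };
  };
  return { fetchFn, callCount: () => calls };
}

// Toy retry loop (assumed policy): retry on 5xx, sleeping via the injected delayFn.
async function callWithRetry(
  fetchFn: FetchFn,
  delayFn: DelayFn,
  maxRetries: number,
): Promise<MockResponse> {
  for (let attempt = 0; ; attempt += 1) {
    const res = await fetchFn();
    if (res.status < 500) return res;
    if (attempt >= maxRetries) throw new Error('retries exhausted');
    await delayFn(100 * (attempt + 1)); // never a real sleep when delayFn is a no-op
  }
}
```

Under a 500 → 500 → 500 → 200 fixture with maxRetries set to 3, this loop makes four fetch calls and three delayFn calls, so a recording delayFn gives the test something concrete to count.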

2. Test surface (planned)

The parity suite ships as ONE file: src/__tests__/domains/router/parity.test.ts.

A second helper file may be needed for shared fixtures: src/__tests__/domains/router/parity-helpers.ts.

2.1 Adapter-driver abstraction

Each adapter is exercised through a ParityDriver interface that hides the per-adapter signature divergence:

interface ParityDriver {
  readonly name: 'claude' | 'kimi' | 'codex' | 'openai';
  callPlain: (prompt: string, opts: ParityCallOptions) => Promise<CompletionResult>;
  callWithTools: (prompt: string, tools: AnthropicTool[], opts: ParityCallOptions) => Promise<CompletionResult>;
  makeSuccessResponse: () => unknown;        // adapter-native HTTP body
  makeToolUseResponse: () => unknown;        // tool-call wire shape
  makeMissingUsageResponse: () => unknown;   // missing usage block
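  parseErrorClass: new (...args: any[]) => Error;   // adapter-specific error class (ErrorConstructor also requires a call signature, which class subclasses lack)
  configErrorClass: new (...args: any[]) => Error;  // adapter-specific config error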
  apiErrorCode: string;                       // 'ANTHROPIC_API_ERROR' / etc.
  retriesExhaustedCode: string;               // '_RETRIES_EXHAUSTED' variant
  configErrorCode: string;                    // '_CONFIG_ERROR' variant
}

The shared mock-fetch helper is reused from the existing per-adapter suites’ pattern (makeMockFetch(responses)).

2.2 Test matrix

The suite runs describe.each(drivers) over the 4 drivers, with these test blocks per driver:

# | Block | Invariants asserted
1 | P1 — shape parity (golden path) | P1.1, P1.2, P1.3
2 | P2 — determinism | P2.1
3 | P3 — token accounting | P3.1, P3.2, P3.3
4 | P4 — stop-reason parity | P4.1, P4.2, P4.3, P4.4, P4.5
5 | P5 — tool-use mapping | P5.1, P5.2, P5.3, P5.4
6 | P6 — error mapping (401/500/net) | P6.1, P6.2, P6.3, P6.4, P6.5
7 | P7 — latency | P7.1, P7.2, P7.3
8 | P8 — injection seams (fetch/log/delay/key) | P8.1, P8.2, P8.3, P8.4

4 adapters × 8 blocks = 32 driver-parity test cases. Some blocks contain multiple expect-assertions internally; the test count surfaces in the verification doc.
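
The 4 × 8 arithmetic can be made concrete by enumerating the matrix. This is only an illustration of the count: the case-name format is made up, and in the suite itself the cases come from describe.each(drivers) as described above.

```typescript
const ADAPTERS = ['claude', 'kimi', 'codex', 'openai'] as const;
const BLOCKS = ['P1', 'P2', 'P3', 'P4', 'P5', 'P6', 'P7', 'P8'] as const;

// One case per (adapter, block) pair; the "adapter / block" label is hypothetical.
function driverParityCases(): string[] {
  const names: string[] = [];
  for (const adapter of ADAPTERS) {
    for (const block of BLOCKS) {
      names.push(`${adapter} / ${block}`);
    }
  }
  return names;
}

const caseCount = driverParityCases().length; // 4 adapters × 8 blocks = 32
```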

2.3 Cross-cutting tests (single-block, all 4 adapters in one test)

# | Test name | Invariant span
C1 | all 4 adapters return structurally equal CompletionResult shape on success | P1.1, P1.2, P1.3 jointly
C2 | all 4 adapters yield identical token counts given equivalent mocked usage | P3.1, P3.2 jointly
C3 | all 4 adapters return a string-typed stopReason given a 'stop' / 'end_turn' fixture | P4 envelope
C4 | all 4 adapters emit JSON-stringified tool_use array on tool-use response | P5.1 jointly

That gives 4 cross-cutting tests, each running joint assertions over all 4 adapters inside a single test() block.

2.4 Helper function — adapter-driver registry

const DRIVERS: ParityDriver[] = [
  makeClaudeDriver(),
  makeKimiDriver(),
  makeCodexDriver(),
  makeOpenAiDriver(),
];

Each make*Driver returns a frozen object satisfying the interface above.
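
A frozen driver can be produced with Object.freeze. The sketch below uses a pared-down stand-in for ParityDriver with only two of its fields; MiniDriver and makeMiniDriver are hypothetical names, and the error codes are the ones listed in P6.1.

```typescript
type AdapterName = 'claude' | 'kimi' | 'codex' | 'openai';

// Pared-down stand-in for ParityDriver (two fields only).
interface MiniDriver {
  readonly name: AdapterName;
  readonly apiErrorCode: string;
}

// Hypothetical factory: freezing guards against one test mutating shared driver state.
function makeMiniDriver(name: AdapterName, apiErrorCode: string): Readonly<MiniDriver> {
  return Object.freeze({ name, apiErrorCode });
}

const MINI_DRIVERS: ReadonlyArray<MiniDriver> = Object.freeze([
  makeMiniDriver('claude', 'ANTHROPIC_API_ERROR'),
  makeMiniDriver('kimi', 'KIMI_API_ERROR'),
  makeMiniDriver('codex', 'CODEX_API_ERROR'),
  makeMiniDriver('openai', 'OPENAI_API_ERROR'),
]);
```

Object.freeze is shallow, so any nested fixture objects a driver exposes would need their own freeze or should be produced fresh by a factory method on each call.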

3. Determinism + isolation

  • Every fetch call is mocked via injected fetchFn; no global stubbing.
  • delayFn is an instant no-op; retry tests never sleep.
  • logger is silent (records into an in-memory array for assertions that demand it).
  • apiKey is a fixed dummy string per adapter; no env var reads (the *_API_KEY env is overridden via the options bag).
  • Tests do NOT call resetRouterStats() because they do not exercise the router’s cost layer — they call adapters directly.

4. Untouched files (CRITICAL)

  • src/domains/router/fallback.ts — sibling P1.5.10 territory.
  • src/domains/router/adapters/*.ts — Wave 3 closed.
  • src/domains/integrations/claude.ts — Phase 0 closed.
  • src/domains/router/tools.ts — P1.5.7 closed; sibling P1.5.10 may touch.

5. Files created

File Role
src/__tests__/domains/router/parity.test.ts Main parity test suite
src/__tests__/domains/router/parity-helpers.ts Shared fixtures + driver impls

No production code is modified.

6. Acceptance gate

npm run build && npm run lint && npm test MUST pass. Pre-existing known flakes (kimi latency, consensus G7.1, reputation race, server startup) must come back clean on retry, matching the baseline of 3353 tests at 6cfd269b.

Expected test count delta: +32 driver-parity + 4 cross-cutting + a few N-of-1 fixture-shape tests = approximately +40 tests, yielding 3393. Final count recorded in Step 5 (verification).



Colibri — documentation-first MCP runtime. Apache 2.0 + Commons Clause.
