P1.5.8 — Audit (Step 1 of 5)

Round: R92 Wave 7 parallel slice 1/2. Sibling: P1.5.10 ζ integration (file-disjoint — modifies src/domains/router/fallback.ts; this slice MUST NOT touch that file). Base: origin/main @ 6cfd269b (post-P1.5.7 #258 merge).

1. Scope statement

Inventory the four δ adapters (Claude, Kimi, Codex, OpenAI), their shared CompletionResult contract, and the existing per-adapter test surface, in order to identify which parity invariants are already covered per-adapter and which require a new cross-model suite.

The deliverable of P1.5.8 is a single integration test file (or test subdir) that exercises ALL four adapters through the same fixtures and asserts structural equality of their CompletionResult shape across equivalent mocked responses. That gives the router a load-bearing parity proof: adapters are interchangeable at the CompletionFn boundary.

2. Adapters in scope (4)

| Adapter | Public entry points | Source file | API protocol | Test file |
| --- | --- | --- | --- | --- |
| Claude | createCompletion · createCompletionWithTools | src/domains/integrations/claude.ts | Anthropic Messages | src/__tests__/domains/integrations/claude.test.ts |
| Kimi | createKimiCompletion · createKimiCompletionWithTools | src/domains/router/adapters/kimi.ts | OpenAI Chat Completions | src/__tests__/domains/router/adapters/kimi.test.ts |
| Codex | createCodexCompletion · createCodexCompletionWithTools | src/domains/router/adapters/codex.ts | OpenAI Chat Completions | src/__tests__/domains/router/adapters/codex.test.ts |
| OpenAI | createOpenAiCompletion · createOpenAiCompletionWithTools | src/domains/router/adapters/openai.ts | OpenAI Chat Completions | src/__tests__/domains/router/adapters/openai.test.ts |

All four return CompletionResult (re-exported from claude.ts:134-141):

interface CompletionResult {
  readonly content: string;
  readonly model: string;
  readonly promptTokens: number;
  readonly completionTokens: number;
  readonly latencyMs: number;
  readonly stopReason: string;
}

This shape is structurally identical across all four adapters by design (P1.5.2 / P1.5.3 / P1.5.4 each cite “shape parity with claude.ts” as an invariant — see kimi.ts:24-43, codex.ts:11-37, openai.ts:11-79). The parity invariant is therefore declared by every adapter individually; P1.5.8’s contribution is to verify it under one fixture set, so the router’s CompletionFn boundary is provably interchangeable.
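As a rough illustration of what "shape parity" could mean at runtime, the sketch below checks a value against the six fields quoted above. The helper name `isCompletionResult` is hypothetical and does not come from the codebase, and treating extra keys as a parity failure is an assumption rather than a stated rule.

```typescript
// Redeclared locally so the sketch is self-contained; mirrors the interface quoted above.
interface CompletionResult {
  readonly content: string;
  readonly model: string;
  readonly promptTokens: number;
  readonly completionTokens: number;
  readonly latencyMs: number;
  readonly stopReason: string;
}

const STRING_FIELDS = ["content", "model", "stopReason"] as const;
const NUMBER_FIELDS = ["promptTokens", "completionTokens", "latencyMs"] as const;

// Hypothetical helper (not part of the audited code): runtime check for the six-field shape.
function isCompletionResult(value: unknown): value is CompletionResult {
  if (typeof value !== "object" || value === null) return false;
  const record = value as Record<string, unknown>;
  const stringsOk = STRING_FIELDS.every((key) => typeof record[key] === "string");
  const numbersOk = NUMBER_FIELDS.every(
    (key) => typeof record[key] === "number" && Number.isFinite(record[key]),
  );
  // Assumption: an extra key counts as a shape difference, so exactly six keys are required.
  const noExtras = Object.keys(record).length === STRING_FIELDS.length + NUMBER_FIELDS.length;
  return stringsOk && numbersOk && noExtras;
}
```

A guard like this catches a drifted field type (for example a token count returned as a string) that a compile-time interface alone would not catch once the value has passed through `JSON.parse`.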

3. Shared types

Sourced from src/domains/integrations/claude.ts:

  • CompletionResult (lines 134-141) — return shape; identical across all 4 adapters.
  • AnthropicTool (lines 95-99) — tool descriptor; structurally re-exported by every adapter (openai.ts:182-186 aliases it as OpenAiTool, but it is the same shape).

Per-adapter CompletionOptions shapes:

  • Claude: CompletionOptions — no baseUrl.
  • Kimi: KimiCompletionOptions — adds baseUrl.
  • Codex: CodexCompletionOptions — adds baseUrl.
  • OpenAI: OpenAiCompletionOptions — adds baseUrl.

Each adapter accepts injectable seams: fetchFn, logger, delayFn, apiKey. This is the injection point P1.5.8 will use.
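A minimal sketch of how the `fetchFn` seam might be driven from a test follows. Only the four seam names (`fetchFn`, `logger`, `delayFn`, `apiKey`) come from this audit; the helper `makeMockFetch`, the `MockResponse` stand-in, and the logger and `delayFn` signatures are assumptions made for illustration, and a real suite would more likely return a genuine fetch `Response`.

```typescript
// Assumption: the adapters only need status, ok, json() and text() from the response.
interface MockResponse {
  readonly ok: boolean;
  readonly status: number;
  json(): Promise<unknown>;
  text(): Promise<string>;
}

interface RecordedCall {
  readonly url: string;
  readonly init: unknown;
}

type MockFetchFn = (url: string, init?: unknown) => Promise<MockResponse>;

// Hypothetical helper: returns a canned response and records every call for later assertions.
function makeMockFetch(
  status: number,
  body: unknown,
): { fetchFn: MockFetchFn; calls: RecordedCall[] } {
  const calls: RecordedCall[] = [];
  const payload = JSON.stringify(body);
  const fetchFn: MockFetchFn = (url, init) => {
    calls.push({ url, init }); // recorded synchronously, before the promise resolves
    return Promise.resolve({
      ok: status >= 200 && status < 300,
      status,
      json: () => Promise.resolve(JSON.parse(payload) as unknown),
      text: () => Promise.resolve(payload),
    });
  };
  return { fetchFn, calls };
}

// Options bag using the four seam names from the audit; value shapes are assumed.
const mock = makeMockFetch(200, { ok: true });
const seams = {
  fetchFn: mock.fetchFn,
  apiKey: "test-key",
  delayFn: (_ms: number) => Promise.resolve(), // skips real waiting between retries
  logger: { info: () => {}, warn: () => {}, error: () => {} },
};
```

Because every seam is injected, no test built this way needs network access or real timers, which is consistent with the "no real network calls" rule in §9.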

4. Existing test coverage (per-adapter, pre-P1.5.8)

Each adapter has its own dedicated test file with hand-crafted fixtures:

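| Adapter | Tests (file LoC) | Determinism | Tool-use | 401 error | 500 error | Retry/timeout |
| --- | --- | --- | --- | --- | --- | --- |
| Claude | ~580 LoC | yes | yes | yes | yes | yes |
| Kimi | ~620 LoC | yes | yes | yes | yes | yes |
| Codex | ~590 LoC | yes | yes | yes | yes | yes |
| OpenAI | ~610 LoC | yes | yes | yes | yes | yes |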
Gap: every adapter has its own assertion fixtures. There is no test that runs the same fixture through all four and asserts the CompletionResult shape is structurally equal. That gap is what P1.5.8 closes.
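One way to picture the missing cross-adapter assertion is to reduce each result to a sorted "key:type" signature and compare all four against a reference. In the sketch below the results are hand-written stand-ins; in the real suite each would come from calling the adapter with an injected `fetchFn`. The names `shapeSignature` and `findShapeMismatches` are hypothetical.

```typescript
interface CompletionResult {
  readonly content: string;
  readonly model: string;
  readonly promptTokens: number;
  readonly completionTokens: number;
  readonly latencyMs: number;
  readonly stopReason: string;
}

type AdapterName = "claude" | "kimi" | "codex" | "openai";

// Hypothetical helper: sorted "key:type" pairs, so key order does not affect the comparison.
function shapeSignature(result: object): string {
  const record = result as Record<string, unknown>;
  return Object.keys(record)
    .map((key) => `${key}:${typeof record[key]}`)
    .sort()
    .join(",");
}

// Returns the adapters whose result shape differs from Claude's (used here as the reference).
function findShapeMismatches(all: Record<AdapterName, CompletionResult>): AdapterName[] {
  const reference = shapeSignature(all.claude);
  return (Object.keys(all) as AdapterName[]).filter(
    (name) => shapeSignature(all[name]) !== reference,
  );
}

// Stand-in results for illustration only; values are invented, not captured from the adapters.
const base = { content: "hello", promptTokens: 11, completionTokens: 7, stopReason: "end_turn" };
const results: Record<AdapterName, CompletionResult> = {
  claude: { ...base, model: "claude-fixture", latencyMs: 3 },
  kimi: { ...base, model: "kimi-fixture", latencyMs: 5 },
  codex: { ...base, model: "codex-fixture", latencyMs: 4 },
  openai: { ...base, model: "openai-fixture", latencyMs: 2 },
};
```

Comparing signatures rather than values keeps the check meaningful even though `model` and `latencyMs` legitimately differ between adapters.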

5. Router boundary tests (already exist)

  • src/__tests__/domains/router/fallback.test.ts — exercises routeRequest through completionFn / completionFnRegistry injection. Each test injects a CompletionFn stub, not a real adapter. Confirms that the router contract is shape-agnostic but does NOT prove that the four real adapters all SATISFY the shape with equivalent mocked HTTP responses.
  • src/__tests__/domains/router/tools.test.ts — MCP tool surface tests for router_score, router_call, router_fallback, router_stats.

6. Sibling-race constraint

P1.5.10 ζ integration runs in parallel and modifies:

  • src/domains/router/fallback.ts (to emit ζ trail events per router call)
  • src/domains/router/tools.ts (to ensure the 4 MCP tools emit shape)

The slice override therefore forbids this slice from touching either of those files. Parity tests may IMPORT from them (e.g. import routeRequest from ../../../domains/router/fallback.js) but not edit them.

7. Pre-existing flakes (acknowledged, not in scope)

  • kimi.test.ts § injection seams › 7. latency measurement: 50ms delay → latencyMs >= 50 — timer imprecision under load. The override states: optional fix, may leave with note.
  • consensus/parity-harness.test.ts › G7.1 — perf budget flake (10000 iterations < 5s). Pre-existing R89 Phase B issue; not in scope.
  • server.test.ts › startup chain — pre-existing pre-R75 flake; not in scope.
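The kimi latency flake comes from asserting a lower bound against a wall-clock measurement. A parity-level check can sidestep that by asserting only that the value is finite and non-negative, and by leaving `latencyMs` out of determinism comparisons. The helpers below are a hypothetical sketch of that idea, assuming every other field in the result is a primitive.

```typescript
// Accepts any finite, non-negative latency; deliberately does NOT compare against a delay.
function isSaneLatency(latencyMs: number): boolean {
  return Number.isFinite(latencyMs) && latencyMs >= 0;
}

// Returns a shallow copy with the wall-clock field removed.
function withoutLatency(result: object): Record<string, unknown> {
  const copy: Record<string, unknown> = { ...(result as Record<string, unknown>) };
  delete copy["latencyMs"];
  return copy;
}

// Field-by-field comparison of two results, ignoring latencyMs.
// Assumption: remaining fields are primitives, so strict equality is sufficient.
function sameIgnoringLatency(a: object, b: object): boolean {
  const left = withoutLatency(a);
  const right = withoutLatency(b);
  const keys = Object.keys(left).sort();
  if (keys.join(",") !== Object.keys(right).sort().join(",")) return false;
  return keys.every((key) => left[key] === right[key]);
}
```

With this approach a run that measures 49.7 ms for a nominal 50 ms delay still passes, which is the case the existing per-adapter assertion trips over.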

8. What the parity suite must prove

For each of the 4 adapters, given a uniform set of mocked HTTP responses expressed in the adapter’s native wire shape:

  1. Shape parity — every successful CompletionResult has the 6 CompletionResult fields listed in §2, each with the correct type.
  2. Determinism — same fixture twice → structurally equal result (excluding latencyMs which is wall-clock).
  3. Token-accounting parity — promptTokens and completionTokens are populated from the wire’s native location (Anthropic usage.input_tokens / usage.output_tokens; OpenAI/Kimi/Codex usage.prompt_tokens / usage.completion_tokens).
  4. Stop-reason parity — Anthropic vocabulary on the result side (end_turn / tool_use / max_tokens / etc.) regardless of the underlying wire vocabulary.
  5. Tool-use mapping parity — when a tool-use response is mocked, the result’s content is a JSON-stringified Anthropic-shape tool_use[] array.
  6. Error mapping parity — 401 / 500 / network errors raise the adapter-specific error class with the expected code discriminant.
  7. Latency parity — every result has a finite non-negative latencyMs. (Suite uses >= 0 not >= delay to dodge the kimi flake.)
  8. Injection-seam parity — every adapter accepts fetchFn, logger, delayFn, apiKey from the options bag (all four are tested via their existing per-adapter suites; the parity suite ASSUMES this and uses the four uniformly).
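Invariants 3 and 4 imply that the suite needs an expected value computed from each wire shape. The sketch below shows one way a test helper could derive it. The usage field names come from the list above; the finish-reason mapping table, the zero default for missing counts, and the pass-through for unmapped values are assumptions, since the adapters' actual mapping tables are not reproduced in this audit.

```typescript
type WireProtocol = "anthropic-messages" | "openai-chat-completions";

interface NormalizedUsage {
  readonly promptTokens: number;
  readonly completionTokens: number;
}

// Assumption: a missing or non-numeric count is treated as 0 in this sketch.
function toCount(value: unknown): number {
  return typeof value === "number" && Number.isFinite(value) ? value : 0;
}

// Reads token counts from the protocol's native usage keys.
function usageFromWire(
  protocol: WireProtocol,
  body: { usage?: Record<string, unknown> },
): NormalizedUsage {
  const usage = body.usage ?? {};
  if (protocol === "anthropic-messages") {
    return {
      promptTokens: toCount(usage["input_tokens"]),
      completionTokens: toCount(usage["output_tokens"]),
    };
  }
  return {
    promptTokens: toCount(usage["prompt_tokens"]),
    completionTokens: toCount(usage["completion_tokens"]),
  };
}

// Assumed mapping from OpenAI-style finish_reason values to Anthropic stop-reason vocabulary.
const OPENAI_TO_ANTHROPIC_STOP: Record<string, string> = {
  stop: "end_turn",
  length: "max_tokens",
  tool_calls: "tool_use",
};

// Anthropic values pass through; unmapped OpenAI values also pass through (assumption).
function stopReasonFromWire(protocol: WireProtocol, wireValue: string): string {
  if (protocol === "anthropic-messages") return wireValue;
  return OPENAI_TO_ANTHROPIC_STOP[wireValue] ?? wireValue;
}
```

Keeping this expectation logic in the test helper, separate from the adapters, means the suite verifies the adapters against an independent statement of the contract instead of against their own code.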

9. Out of scope

  • ANY edit to src/domains/router/fallback.ts or src/domains/router/tools.ts (sibling P1.5.10 territory; see §6).
  • ANY edit to adapter source files (sibling Wave 3 territory; P1.5.2/3/4 closed).
  • ζ Decision Trail recording (P1.5.10 scope).
  • Real network calls (forbidden — all fetchFn injected).
  • Multi-run flake-detection loop (the prompt’s “5 repeat runs” is a manual local check, not a test-loop construct).
  • Wire-byte parity (different providers wrap tokens differently; we test structural parity, not byte parity).

10. Path forward

Per CLAUDE.md §6, next steps are:

  • Step 2 (contract) — pin parity invariants precisely as acceptance criteria, mapping each to a planned test name.
  • Step 3 (packet) — execution plan: test file layout, fixture registry, mocked-fetch helper, the 4-adapter × N-invariant matrix.
  • Step 4 (implement) — write src/__tests__/domains/router/parity.test.ts.
  • Step 5 (verify) — record test count delta + parity matrix.
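To make the Step 3 packet more concrete, the sketch below shows one possible layout for a fixture registry and the adapter × invariant matrix. The adapter-to-protocol mapping follows the table in §2; everything else (type names, fixture name, the invariant subset, the test-name format, and the trimmed wire bodies) is hypothetical and would be pinned down in Steps 2 and 3.

```typescript
type AdapterName = "claude" | "kimi" | "codex" | "openai";
type WireProtocol = "anthropic-messages" | "openai-chat-completions";

// From the §2 table: only Claude speaks Anthropic Messages.
const ADAPTER_PROTOCOL: Record<AdapterName, WireProtocol> = {
  claude: "anthropic-messages",
  kimi: "openai-chat-completions",
  codex: "openai-chat-completions",
  openai: "openai-chat-completions",
};

// One logical fixture carries the same response in both wire shapes.
interface ParityFixture {
  readonly name: string;
  readonly status: number;
  readonly wire: Record<WireProtocol, unknown>;
}

const FIXTURES: readonly ParityFixture[] = [
  {
    name: "plain-text",
    status: 200,
    wire: {
      "anthropic-messages": {
        type: "message",
        role: "assistant",
        model: "fixture-model",
        content: [{ type: "text", text: "hello" }],
        stop_reason: "end_turn",
        usage: { input_tokens: 11, output_tokens: 7 },
      },
      "openai-chat-completions": {
        object: "chat.completion",
        model: "fixture-model",
        choices: [
          { index: 0, message: { role: "assistant", content: "hello" }, finish_reason: "stop" },
        ],
        usage: { prompt_tokens: 11, completion_tokens: 7, total_tokens: 18 },
      },
    },
  },
];

// Illustrative subset of the §8 invariants.
const INVARIANTS = ["shape", "determinism", "token-accounting", "stop-reason", "latency"] as const;

interface MatrixCell {
  readonly adapter: AdapterName;
  readonly invariant: string;
  readonly testName: string;
  readonly wireBody: unknown;
}

// Expands adapters × fixtures × invariants into one cell per planned test.
function buildMatrix(fixtures: readonly ParityFixture[]): MatrixCell[] {
  const cells: MatrixCell[] = [];
  for (const adapter of Object.keys(ADAPTER_PROTOCOL) as AdapterName[]) {
    for (const fixture of fixtures) {
      for (const invariant of INVARIANTS) {
        cells.push({
          adapter,
          invariant,
          testName: `${adapter} / ${fixture.name} / ${invariant}`,
          wireBody: fixture.wire[ADAPTER_PROTOCOL[adapter]],
        });
      }
    }
  }
  return cells;
}
```

With one fixture and five invariants this yields 4 × 1 × 5 = 20 cells, and the same list could double as the parity matrix recorded in Step 5.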

Colibri — documentation-first MCP runtime. Apache 2.0 + Commons Clause.