P1.5.6 — Cost Accounting — Behavioral Contract

Round: R92 Wave 5 of 7
Branch: feature/p1-5-6-cost
Base: origin/main @ c284ad22
Audit: p1-5-6-cost-audit.md

1. Public surface (new exports from `src/domains/router/cost.ts`)

/* Per-call cost computation. Pure: candidate snapshot drives the cost
 * per-1k-tokens lookup. Returns USD as a JS `number` with 4-decimal
 * internal precision (the API surface presents 2 decimals — that's a
 * P1.5.7 router_stats formatting concern, not a cost.ts concern). */
export function computeCostUsd(
  modelId: ModelId,
  promptTokens: number,
  completionTokens: number,
  candidatesSnapshot?: ReadonlyArray<ModelCandidate>,
): number;

/* Per-model aggregate snapshot used by `router_stats` (P1.5.7) and by
 * the ζ decision trail (P1.5.10). Frozen at both levels. */
export interface RouterStats {
  readonly calls_total: number;
  readonly successes: number;
  readonly failures: number;
  readonly avg_cost_usd: number;
  readonly p50_latency_ms: number;
  readonly success_rate: number;
}

/* Per-call record-keeping hook. Called once per attempted adapter
 * call — both success and failure paths. NOT called for circuit-open
 * skips or NoAdapter pre-flight bailouts (those never reach the
 * adapter). */
export interface RouterCallRecord {
  readonly promptTokens: number;
  readonly completionTokens: number;
  readonly latencyMs: number;
  readonly success: boolean;
  /** Optional snapshot for cost lookup; usually flows from RouteOptions. */
  readonly candidatesSnapshot?: ReadonlyArray<ModelCandidate>;
}

export function recordRouterCall(
  modelId: ModelId,
  record: RouterCallRecord,
): void;

/* Reads the full router-stats snapshot. Frozen at both levels. */
export function getRouterStats(): {
  readonly models: Readonly<Record<ModelId, RouterStats>>;
};

/* Resets one model (with arg) or every model (no arg). */
export function resetRouterStats(modelId?: ModelId): void;

/* Internal ring-buffer constant, exported for tests. */
export const ROUTER_LATENCY_RING_SIZE = 1000 as const;

2. `RouteResult` extension (additive)

export interface RouteResult {
  readonly model: ModelId;
  readonly content: string;
  readonly finishReason: string;
  readonly promptTokens: number;
  readonly completionTokens: number;
  readonly latencyMs: number;
  // P1.5.6 additions (append-only):
  readonly costUsd: number;
  readonly modelsAttempted: ReadonlyArray<ModelId>;
}

The two new fields are append-only. Existing Phase 0 destructuring sites that bind only { model, content, finishReason, promptTokens, completionTokens, latencyMs } continue to compile, because a destructuring pattern may bind any subset of an object's properties and ignores the rest.
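As a minimal sketch of the compatibility claim above, the following shows a Phase 0 style consumer that still compiles after the two fields are appended. `ModelId` is simplified to `string`, and the `summarise` function and `sample` value are hypothetical, not taken from the codebase.

```typescript
// Assumption: ModelId is simplified to string for this sketch.
type ModelId = string;

interface RouteResult {
  readonly model: ModelId;
  readonly content: string;
  readonly finishReason: string;
  readonly promptTokens: number;
  readonly completionTokens: number;
  readonly latencyMs: number;
  // P1.5.6 additions:
  readonly costUsd: number;
  readonly modelsAttempted: ReadonlyArray<ModelId>;
}

// Hypothetical Phase 0 consumer: binds only the six original fields.
function summarise(result: RouteResult): string {
  const { model, content, finishReason, promptTokens, completionTokens, latencyMs } = result;
  const totalTokens = promptTokens + completionTokens;
  return `${model} ${finishReason} ${totalTokens}t ${latencyMs}ms ${content.length}ch`;
}

// Illustrative value only; real results come from routeRequest.
const sample: RouteResult = Object.freeze({
  model: "kimi",
  content: "hello",
  finishReason: "stop",
  promptTokens: 12,
  completionTokens: 8,
  latencyMs: 140,
  costUsd: 0.0006,
  modelsAttempted: Object.freeze(["kimi"]),
});
```

Code that constructs a RouteResult literal is a different case: it has to supply costUsd and modelsAttempted, since both are required properties.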

3. Cost-computation invariants

ID	Invariant	Verified by
I-COST-1	`computeCostUsd(modelId, p, c, snap)` reads `snap` for `m where m.model_id === modelId` and uses its `cost_bps_per_kilotoken`.	`cost.test.ts` “reads cost from snapshot row”
I-COST-2	When the snapshot lacks `modelId`: returns `0`.	`cost.test.ts` “returns 0 when modelId absent”
I-COST-3	When `snap` is `undefined`: returns `0`.	`cost.test.ts` “returns 0 when snapshot omitted”
I-COST-4	When `cost_bps_per_kilotoken === 0`: returns `0`.	`cost.test.ts` “returns 0 for free tier”
I-COST-5	When `promptTokens + completionTokens === 0`: returns `0`.	`cost.test.ts` “returns 0 for zero-token call”
I-COST-6	Formula: `Number(((BigInt(p + c) * BigInt(cost_bps)) / 1000n)) / 10000`	`cost.test.ts` golden-vector cases
I-COST-7	Integer overflow safe — bigint inner math.	`cost.test.ts` “handles large token counts” (e.g. 10^9 tokens × 1000 bps)
I-COST-8	Result is a JS `number`, never `NaN`, never `Infinity`. Negative tokens or negative bps coerce to `0` (defence in depth).	`cost.test.ts` “negative tokens returns 0”
I-COST-9	Deterministic: identical `(modelId, p, c, snap)` → identical USD.	`cost.test.ts` “identical inputs produce identical output across 100 invocations”
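The invariants above can be sketched as follows. This is a minimal illustration, not the shipped cost.ts: `ModelId` is simplified to `string`, `ModelCandidate` is reduced to the two fields the lookup needs, and I-COST-8 is read as "any negative or non-integer input makes the whole result 0", which is an assumption about the intended coercion.

```typescript
// Assumption: minimal stand-ins for the real ModelId and ModelCandidate types.
type ModelId = string;
interface ModelCandidate {
  readonly model_id: ModelId;
  readonly cost_bps_per_kilotoken: number;
}

function computeCostUsd(
  modelId: ModelId,
  promptTokens: number,
  completionTokens: number,
  candidatesSnapshot?: ReadonlyArray<ModelCandidate>,
): number {
  // I-COST-3 / I-COST-2: no snapshot, or no row for this model, means zero cost.
  const row = candidatesSnapshot?.find((m) => m.model_id === modelId);
  if (row === undefined) return 0;

  const tokens = promptTokens + completionTokens;
  const bps = row.cost_bps_per_kilotoken;
  // I-COST-8 (assumed reading): reject negative and non-integer inputs up front.
  // This also keeps BigInt() from throwing on fractions, NaN, or Infinity.
  if (!Number.isInteger(tokens) || !Number.isInteger(bps)) return 0;
  if (promptTokens < 0 || completionTokens < 0 || bps < 0) return 0;

  // I-COST-6 / I-COST-7: integer math in bps space, then one conversion to USD.
  const totalBps = (BigInt(tokens) * BigInt(bps)) / BigInt(1000);
  return Number(totalBps) / 10000;
}
```

Because the inner division is integer division, any call whose tokens × bps product is below 1000 truncates to 0 bps under this formula; for example, a single token at 300 bps costs 0.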

4. Aggregate invariants

ID	Invariant	Verified by
I-AGG-1	`calls_total === successes + failures` after every call.	`cost.test.ts` “calls_total accounting”
I-AGG-2	`success_rate === successes / calls_total` (or `0` when `calls_total === 0`).	`cost.test.ts` “success_rate at 80%”
I-AGG-3	`avg_cost_usd === Σ(per_call_cost_usd) / successes` (or `0` when `successes === 0`). Failed calls contribute `0` to the sum (since their `promptTokens + completionTokens === 0`). The mean denominator is `successes`, not `calls_total`.	`cost.test.ts` “avg_cost_usd reflects actual sums”
I-AGG-4	`p50_latency_ms` is the median of `latencies` populated so far (or `0` when no calls). For an even count, median = lower-of-two (the n/2-1 index, zero-indexed in sorted order). For odd count, median = middle element.	`cost.test.ts` “p50 for 5 calls” + “p50 for 6 calls (even)”
I-AGG-5	The latency ring buffer is bounded at `ROUTER_LATENCY_RING_SIZE = 1000`. After the 1001st call, the oldest latency is overwritten; `p50` is over the most recent 1000.	`cost.test.ts` “ring buffer bound”
I-AGG-6	Per-model isolation: `recordRouterCall('claude', …)` does not mutate the stats for any other `ModelId`.	`cost.test.ts` “kimi stats don’t contaminate claude”
I-AGG-7	`resetRouterStats('claude')` clears only the `'claude'` entry; other models retain their state.	`cost.test.ts` “single-model reset”
I-AGG-8	`resetRouterStats()` clears every model’s state.	`cost.test.ts` “reset all”
I-AGG-9	`getRouterStats()` returns a frozen `{ models }` object with frozen per-model entries. Attempting to mutate the result throws a `TypeError` in strict mode (ES modules are always strict).	`cost.test.ts` “result is frozen”
I-AGG-10	Failed calls (`success: false`) contribute to `failures` and `calls_total` but NOT to `avg_cost_usd` (since cost is 0 for failed calls and the denominator is `successes`). Their `latencyMs` IS recorded into the ring buffer (so p50 reflects total round-trip behavior including failures).	`cost.test.ts` “failure-only model has p50 over failures”
I-AGG-11	The aggregate sum-of-costs is stored as `bigint` (in bps space); the divide to USD happens once at `getRouterStats` time, not on every `recordRouterCall`.	`cost.test.ts` “no float drift across 10000 calls”
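One way the aggregate invariants could fit together is sketched below. It is a simplified illustration under stated assumptions: `ModelId` is `string`, the `Accumulator` shape and the names `recordCall`, `lowerMedian`, and `snapshotStats` are hypothetical, and `recordCall` takes the per-call cost in bps directly instead of deriving it from a snapshot.

```typescript
type ModelId = string; // assumption: simplified
const ROUTER_LATENCY_RING_SIZE = 1000;

interface RouterStats {
  readonly calls_total: number;
  readonly successes: number;
  readonly failures: number;
  readonly avg_cost_usd: number;
  readonly p50_latency_ms: number;
  readonly success_rate: number;
}

// Hypothetical internal accumulator; the field names are illustrative.
interface Accumulator {
  successes: number;
  failures: number;
  costBpsSum: bigint; // I-AGG-11: costs are summed in bps space
  latencies: number[]; // ring buffer, never longer than ROUTER_LATENCY_RING_SIZE
  nextSlot: number; // slot the next latency overwrites once the ring is full
}

const accumulators = new Map<ModelId, Accumulator>();

// Simplified stand-in for recordRouterCall: takes the call's cost in bps directly.
function recordCall(modelId: ModelId, costBps: bigint, latencyMs: number, success: boolean): void {
  let acc = accumulators.get(modelId);
  if (acc === undefined) {
    acc = { successes: 0, failures: 0, costBpsSum: BigInt(0), latencies: [], nextSlot: 0 };
    accumulators.set(modelId, acc);
  }
  if (success) {
    acc.successes += 1;
    acc.costBpsSum += costBps;
  } else {
    acc.failures += 1; // I-AGG-10: failures add no cost
  }
  // I-AGG-5 / I-AGG-10: every attempt's latency enters the ring, oldest overwritten first.
  if (acc.latencies.length < ROUTER_LATENCY_RING_SIZE) {
    acc.latencies.push(latencyMs);
  } else {
    acc.latencies[acc.nextSlot] = latencyMs;
    acc.nextSlot = (acc.nextSlot + 1) % ROUTER_LATENCY_RING_SIZE;
  }
}

// I-AGG-4: lower median. floor((n - 1) / 2) is n/2 - 1 for even n and the middle for odd n.
function lowerMedian(values: ReadonlyArray<number>): number {
  if (values.length === 0) return 0;
  const sorted = values.slice().sort((a, b) => a - b);
  return sorted[Math.floor((sorted.length - 1) / 2)] ?? 0;
}

function snapshotStats(): { readonly models: Readonly<Record<ModelId, RouterStats>> } {
  const models: Record<ModelId, RouterStats> = {};
  accumulators.forEach((acc, modelId) => {
    const calls = acc.successes + acc.failures;
    models[modelId] = Object.freeze({
      calls_total: calls,
      successes: acc.successes,
      failures: acc.failures,
      // Single bps-to-USD divide at read time; assumes the sum is exactly representable as a double.
      avg_cost_usd: acc.successes === 0 ? 0 : Number(acc.costBpsSum) / (10000 * acc.successes),
      p50_latency_ms: lowerMedian(acc.latencies),
      success_rate: calls === 0 ? 0 : acc.successes / calls,
    });
  });
  return Object.freeze({ models: Object.freeze(models) });
}
```

Keeping the running sum as a `bigint` and dividing once at read time is what lets the average avoid accumulated floating-point error, which is the property the "no float drift" test targets.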

5. `modelsAttempted` invariants

ID	Invariant	Verified by
I-MA-1	`modelsAttempted` is a frozen `ReadonlyArray<ModelId>`.	`fallback.test.ts` cost block
I-MA-2	On single-attempt success: `modelsAttempted.length === 1` and `modelsAttempted[0] === winner`.	`fallback.test.ts` cost block “single-attempt happy path”
I-MA-3	On cascade (A fails, B succeeds): `modelsAttempted === [A, B]` in chain-walk order.	`fallback.test.ts` cost block “cascade”
I-MA-4	Order = `orderedChain(scores)` order (NOT alphabetical, NOT scoring-input order).	`fallback.test.ts` cost block “order matches chain walk”
I-MA-5	On `FallbackChainExhaustedError`: `modelsAttempted` is NOT exposed (no `RouteResult` is produced). The error’s `attempts[]` already carries the visit list — `modelsAttempted` is only on the success path.	(Verified by `RouteResult` being unreachable on the failure path.)

6. `routeRequest` body invariants (post-P1.5.6)

The P1.5.5 invariants I1–I8 and I10–I19 (from p1-5-5-fallback-cb-contract.md §6) are preserved verbatim. I9 is restated below; P1.5.6 leaves it unchanged as well:

ID	P1.5.5 statement	P1.5.6 augmentation
I9	`getCircuitBreakerState()` returns a frozen snapshot.	(Unchanged.)

The dispatch packet’s explicit allowance overrides the P1.5.5 prohibition “No costUsd / modelsAttempted fields on RouteResult”. That prohibition was a temporal gate (“not in P1.5.5 — wait for W5”), and W5 is P1.5.6, which is this slice.

New P1.5.6 invariants on routeRequest:

ID	Invariant	Verified by
I-RR-20	The success path calls `recordRouterCall(winner, {…, success: true, candidatesSnapshot: options.candidatesSnapshot})` exactly once before returning.	`fallback.test.ts` cost block “happy path increments successes”
I-RR-21	Every adapter-call failure (caught in the `catch (err)` block) calls `recordRouterCall(modelId, {…, success: false})` exactly once.	`fallback.test.ts` cost block “cascade failure increments failures”
I-RR-22	`CircuitOpenError` and `NoAdapterError` pre-flight bailouts do NOT call `recordRouterCall`.	`fallback.test.ts` cost block “circuit-open does not record”
I-RR-23	The successful `RouteResult.costUsd` is `computeCostUsd(winner, upstream.promptTokens, upstream.completionTokens, options.candidatesSnapshot)`.	`fallback.test.ts` cost block “costUsd reflects token math”
I-RR-24	`RouteResult.modelsAttempted` lists every `modelId` the chain walk visited, in walk order, including the winner. Length matches `attempts.length + 1` on success.	`fallback.test.ts` cost block “modelsAttempted lists chain walk”
I-RR-25	`RouteResult` remains frozen (I16 preserved); the new fields are read-only.	`fallback.test.ts` cost block “RouteResult is frozen”
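The recording and `modelsAttempted` rules above can be illustrated with a deliberately simplified chain walk. Everything here is a sketch under assumptions: adapters are synchronous (the real ones are async and raced against a timeout), `walkChain` and `recordLog` are hypothetical names, `recordLog` stands in for `recordRouterCall`, and circuit-open or missing-adapter pre-flight checks are left out entirely.

```typescript
type ModelId = string; // assumption: simplified
interface Upstream {
  readonly content: string;
  readonly promptTokens: number;
  readonly completionTokens: number;
}
// Hypothetical synchronous adapter shape, used only to keep the sketch short.
type Adapter = (prompt: string) => Upstream;

interface Attempt {
  readonly model: ModelId;
  readonly error: string;
}
interface WalkResult {
  readonly model: ModelId;
  readonly content: string;
  readonly modelsAttempted: ReadonlyArray<ModelId>;
}

// Stand-in for recordRouterCall: logs (modelId, success) so a test can count calls.
const recordLog: Array<{ modelId: ModelId; success: boolean }> = [];

function walkChain(
  chain: ReadonlyArray<ModelId>,
  adapters: Readonly<Record<ModelId, Adapter>>,
  prompt: string,
): WalkResult {
  const attempts: Attempt[] = [];
  for (const modelId of chain) {
    try {
      const upstream = adapters[modelId](prompt);
      // I-RR-20: record the success exactly once before returning.
      recordLog.push({ modelId, success: true });
      // I-RR-24: failed attempts in walk order, then the winner.
      const modelsAttempted = Object.freeze(attempts.map((a) => a.model).concat(modelId));
      return Object.freeze({ model: modelId, content: upstream.content, modelsAttempted });
    } catch (err) {
      // I-RR-21: record each adapter failure exactly once.
      recordLog.push({ modelId, success: false });
      attempts.push(Object.freeze({ model: modelId, error: String(err) }));
    }
  }
  // I-MA-5: no result object exists on the exhausted path, so modelsAttempted is never exposed.
  throw new Error(`chain exhausted after ${attempts.length} attempts`);
}
```

In this sketch the success-path array always has length `attempts.length + 1`, which is the relationship I-RR-24 states.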

7. Failure-latency measurement

For failed attempts (the catch block in routeRequest):

const attemptStart = (options.nowFn ?? Date.now)();
try {
  const upstream = await raceWithTimeout(adapter(prompt, upstreamOptions), modelId, timeoutMs);
  ...
} catch (err) {
  recordFailure(modelId, nowOpts);
  const measuredMs = (options.nowFn ?? Date.now)() - attemptStart;
  recordRouterCall(modelId, {
    promptTokens: 0,
    completionTokens: 0,
    latencyMs: measuredMs,
    success: false,
    candidatesSnapshot: options.candidatesSnapshot,
  });
  attempts.push(Object.freeze({ model: modelId, error: normaliseError(err) }));
}

attemptStart is captured inside the for loop, immediately before the adapter call.
measuredMs flows through the same nowFn injection seam that drives the circuit breaker, so tests using fake clocks see consistent timing.
The failure path measures the time actually spent on the failed call (wall-clock time unless nowFn is injected). Reading upstream.latencyMs is not an option, because a failed call produces no upstream result.
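To show how the nowFn seam makes failure latency testable, here is a small sketch with a fake clock. `measureAttempt` is a hypothetical helper written for this illustration, and it times a synchronous callback, whereas the real adapter call is awaited.

```typescript
// Hypothetical helper: times one attempt through an injectable clock.
function measureAttempt(
  run: () => void,
  nowFn: () => number = Date.now,
): { ok: boolean; latencyMs: number } {
  const attemptStart = nowFn();
  try {
    run();
    return { ok: true, latencyMs: nowFn() - attemptStart };
  } catch {
    // Same clock on the failure path, so a fake clock yields a deterministic latency.
    return { ok: false, latencyMs: nowFn() - attemptStart };
  }
}

// Fake clock: time only moves when the test advances it.
let fakeNowMs = 1000;
const fakeNow = (): number => fakeNowMs;

const failedAttempt = measureAttempt(() => {
  fakeNowMs += 250; // simulate 250 ms spent inside the adapter
  throw new Error("adapter failed");
}, fakeNow);
```

With the fake clock the measured failure latency is fully determined by how far the test advances `fakeNowMs`, independent of real elapsed time.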

8. Snapshot-flow contract

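computeCostUsd and recordRouterCall accept candidatesSnapshot as a function argument. For pricing data they DO NOT read from process / DB / global state; the only module-level state involved is the aggregate store that recordRouterCall updates. The snapshot flows from: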

The MCP tool layer (P1.5.7) → reads mcp_model_candidates once per request → injects into RouteOptions.candidatesSnapshot.
routeRequest → forwards options.candidatesSnapshot to recordRouterCall on each call.
recordRouterCall → forwards to computeCostUsd for the cost lookup.

This preserves the existing P1.5.1 contract (“scoring is pure — DB read happens at the call site, not in the module”). The cost module follows the same pattern.

If options.candidatesSnapshot is omitted (no MCP tool layer, no test injection), cost lookup returns 0 and aggregates count the call but record 0 USD. This matches the dispatch packet’s behavior for the Phase-0-compat 'claude' ModelId path.
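The flow described above can be sketched as a call site that reads the candidates once and passes the snapshot down. The names `lookupCostBps` and `handleToolCall` are hypothetical, as is the injected `readCandidates` function that stands in for the mcp_model_candidates read; the types are reduced to what the sketch needs.

```typescript
type ModelId = string; // assumption: simplified
interface ModelCandidate {
  readonly model_id: ModelId;
  readonly cost_bps_per_kilotoken: number;
}
interface RouteOptions {
  readonly candidatesSnapshot?: ReadonlyArray<ModelCandidate>;
}

// Pure lookup: everything it needs arrives as arguments.
function lookupCostBps(modelId: ModelId, snapshot?: ReadonlyArray<ModelCandidate>): number {
  const row = snapshot?.find((m) => m.model_id === modelId);
  return row === undefined ? 0 : row.cost_bps_per_kilotoken;
}

// Hypothetical tool-layer entry point: the only place that touches the data source.
function handleToolCall(
  readCandidates: () => ReadonlyArray<ModelCandidate>,
  chain: ReadonlyArray<ModelId>,
): number[] {
  // One read per request; the frozen copy is what flows through RouteOptions.
  const options: RouteOptions = { candidatesSnapshot: Object.freeze(readCandidates().slice()) };
  return chain.map((modelId) => lookupCostBps(modelId, options.candidatesSnapshot));
}
```

Omitting the snapshot in this sketch makes every lookup return 0, mirroring the fallback described in the preceding paragraph.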

9. `index.ts` barrel addition

// src/domains/router/index.ts
export * from './scoring.js';
export * from './fallback.js';
export * from './adapters/codex.js';
export * from './adapters/kimi.js';
export * from './adapters/openai.js';
export * from './cost.js';

The barrel re-export is unconditional. Downstream tests in __tests__/domains/router/cost.test.ts import from '../../../domains/router/cost.js' directly; the barrel surfaces the symbols for any caller importing from '../router/index.js'.

10. Test plan (deliverables)

`cost.test.ts`

Golden-vector computeCostUsd:
- 1500 tokens @ 300 bps → 0.0450 USD
- 0 tokens → 0
- Missing snapshot → 0
- Missing modelId in snapshot → 0
- 1_000_000 tokens @ 50 bps → 5.0 USD
- Negative tokens → 0
Ring-buffer bound: 1500 calls → p50 over last 1000.
Success rate: 80 successes + 20 failures → 0.8.
avg_cost_usd: 3 successful calls with costs 0.10, 0.20, 0.30 → avg 0.20.
Per-model isolation: kimi calls don’t affect claude stats.
resetRouterStats('claude') clears claude only.
resetRouterStats() clears everything.
p50 odd / even count cases.
Frozen result.
Failure-only model has nonzero failures, zero successes, zero avg_cost_usd.

`fallback.test.ts` (new describe block: “RouteResult cost + modelsAttempted”)

Happy path: costUsd set, modelsAttempted === [winner].
Cascade: costUsd reflects winner cost, modelsAttempted === [A, B].
Failed-only path: FallbackChainExhaustedError thrown — modelsAttempted never exposed (success path only).
RouteResult frozen.

11. Out-of-scope per dispatch

MCP tool registration (router_stats) → P1.5.7.
ζ decision-trail recording of cost → P1.5.10.
Cost parity tests across arbiters → P1.5.8.
DB persistence of stats → Phase 2+.
fallbackDepth field on RouteResult → not in dispatch (deferred).

12. Risk surfaces (carried from audit §12)

The audit’s §12 risks are mitigated as described there. Additional contract-level guard: the aggregates module exports nothing that lets a caller poke at internal mutable state. The only mutation entry point is recordRouterCall; the only read entry point is getRouterStats (which deep-copies/freezes). External assertions against mutability rely on Object.isFrozen checks.
