P1.5.6 — Cost Accounting — Audit

Round: R92 Wave 5 of 7
Branch: feature/p1-5-6-cost
Base: origin/main @ c284ad22 (post-P1.5.5 #256: fallback + CB + adapter re-exports)
Dispatch: docs/guides/implementation/task-prompts/p1.5-delta-router-graduation.md §P1.5.6
Depends on: P1.5.5 (N-member fallback + CB), landed at #256
Unblocks: P1.5.7 (router_* MCP tools; router_stats reads getRouterStats() from this slice)

1. Goal

Add per-call token → USD translation and per-model in-memory aggregates (calls, latency p50, success rate, average cost) so the P1.5.7 router_stats MCP tool and the ζ Decision Trail (P1.5.10) have real cost / performance data to surface.

Two append-only fields land on RouteResult:

costUsd: number — the cost of the successful model’s call, computed from (promptTokens + completionTokens) × cost_bps_per_kilotoken / 1000 divided by 10000 at the API surface.
modelsAttempted: ReadonlyArray<ModelId> — every model the chain walk visited (success + every failed attempt, in chain order).

No public signature of routeRequest, RouteOptions, FallbackChainExhaustedError, FallbackAttempt, CompletionFn, CompletionFnOptions, or ScoringFn changes. The two new RouteResult fields are append-only — destructuring callers using only the Phase 0 set (model / content / finishReason / promptTokens / completionTokens / latencyMs) continue to compile.
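
To illustrate the append-only claim, here is a minimal sketch of a Phase 0 style caller compiled against the extended shape. The `RouteResult` below is a simplified stand-in for the real interface in `fallback.ts` (`ModelId` is reduced to `string`), and `summarize` and `sample` are hypothetical names used only for this example.

```typescript
// Simplified stand-in for the real types in fallback.ts (assumption).
type ModelId = string;

interface RouteResult {
  // Phase 0 fields, unchanged.
  readonly model: ModelId;
  readonly content: string;
  readonly finishReason: string;
  readonly promptTokens: number;
  readonly completionTokens: number;
  readonly latencyMs: number;
  // Appended in P1.5.6.
  readonly costUsd: number;
  readonly modelsAttempted: ReadonlyArray<ModelId>;
}

// Hypothetical Phase 0 caller: destructures only the original six fields
// and never mentions costUsd or modelsAttempted.
function summarize(result: RouteResult): string {
  const { model, content, finishReason, promptTokens, completionTokens, latencyMs } = result;
  return `${model}:${finishReason}:${promptTokens + completionTokens}:${latencyMs}:${content.length}`;
}

const sample: RouteResult = Object.freeze({
  model: "claude-sonnet-3-5",
  content: "hello",
  finishReason: "stop",
  promptTokens: 1000,
  completionTokens: 500,
  latencyMs: 120,
  costUsd: 0.045,
  modelsAttempted: Object.freeze(["claude-sonnet-3-5"]),
});
```

Because object destructuring ignores properties it does not name, the caller above is unaffected by the two new fields.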

2. Surface inventory

2.1. Files to create

Path	Purpose
`src/domains/router/cost.ts`	`computeCostUsd(modelId, promptTokens, completionTokens) → number`. In-memory aggregates: `recordRouterCall(modelId, { promptTokens, completionTokens, latencyMs, success })`, `getRouterStats() → { models: Record<ModelId, RouterStats> }`, `resetRouterStats(modelId?)`. Bounded latency ring buffer (1000 slots / model). Bigint-bps math; single divide-by-10000 to USD at the API edge. Pure-ish module — `candidatesSnapshot` injectable so tests don’t read the DB; production callers in `fallback.ts` provide the DB-derived snapshot.
`src/__tests__/domains/router/cost.test.ts`	Golden-vector tests for `computeCostUsd`; ring-buffer bound (1500 calls → p50 over last 1000); success-rate + avg-cost rollup; per-model isolation; reset semantics (single + all).

2.2. Files to modify

Path	Current state	Change
`src/domains/router/fallback.ts`	P1.5.5 ships an N-member cascade. `RouteResult` has exactly 6 fields (`model / content / finishReason / promptTokens / completionTokens / latencyMs`). The chain walk records `recordSuccess` / `recordFailure` on the circuit breaker but does NOT call any cost-aggregate hook (the module does not exist yet).	Append `costUsd: number` and `modelsAttempted: ReadonlyArray<ModelId>` to `RouteResult`. On every successful attempt, call `recordRouterCall(modelId, {promptTokens, completionTokens, latencyMs, success: true})`; populate `costUsd` from `computeCostUsd(winner, prompt, completion)`. On every failed attempt (including timeouts but not `CircuitOpenError` / `NoAdapterError` — those are NOT adapter-call failures), call `recordRouterCall(modelId, {promptTokens: 0, completionTokens: 0, latencyMs: measured, success: false})` with the measured wall-clock since the attempt started. Track `modelsAttempted` across the chain walk; include in the final frozen `RouteResult`.
`src/domains/router/index.ts`	Re-exports `./scoring.js`, `./fallback.js`, and the 3 adapter modules.	Add `export * from './cost.js';` for the new module.
`src/__tests__/domains/router/fallback.test.ts`	1011 lines covering happy path, scoring integration, upstream forwarding, failure wrapping, cascade, timeout, CB, NoAdapter, error precedence. No assertions about `costUsd` or `modelsAttempted`.	Add a small `costUsd + modelsAttempted` describe block — one happy-path test and one cascade test verifying the new fields land. Pre-existing tests continue to pass (additive).

2.3. Files NOT modified

src/domains/router/scoring.ts — cost reads from the candidate snapshot the same way scoring does; signatures unchanged.
src/domains/router/circuit.ts — CB state machine independent of cost aggregates.
src/domains/router/adapters/{kimi,codex,openai}.ts — adapter surface unchanged (adapters already report promptTokens + completionTokens + latencyMs in CompletionResult).
src/db/migrations/009_model_candidates.sql — already seeds cost_bps_per_kilotoken for all 8 candidates.
src/server.ts — no MCP tool registration this round (router_* tools ship in P1.5.7).
src/domains/integrations/claude.ts — CompletionResult shape already carries the needed token counts.

3. Exports that MUST be preserved (signatures unchanged)

From src/domains/router/fallback.ts:

routeRequest(prompt, options?) → Promise<RouteResult> — body changes (records aggregates, tracks modelsAttempted, builds costUsd), signature byte-identical.
FallbackChainExhaustedError — class shape unchanged.
RouteOptions — interface unchanged.
RouteResult — APPENDED ONLY (costUsd: number, modelsAttempted: ReadonlyArray<ModelId>). Existing fields keep identical types.
FallbackAttempt — unchanged.
CompletionFn / CompletionFnOptions / ScoringFn — unchanged.
CIRCUIT_COOLDOWN_MS, CIRCUIT_FAILURE_THRESHOLD, getCircuitBreakerState, resetCircuitBreaker, CircuitState, RouterTimeoutError, CircuitOpenError, NoAdapterError, ROUTER_PHASE_0_SHAPE — unchanged.

The dispatch packet explicitly allows adding costUsd + modelsAttempted to RouteResult — they are the only new RouteResult fields landing in P1.5.6. fallbackDepth is not in scope (it does not appear in the dispatch packet’s “FILES TO MODIFY” section under RouteResult).

4. `CompletionResult` token surface (unchanged from W3)

src/domains/integrations/claude.ts:134-141:

export interface CompletionResult {
  readonly content: string;
  readonly model: string;
  readonly promptTokens: number;
  readonly completionTokens: number;
  readonly latencyMs: number;
  readonly stopReason: string;
}

The four adapters (Claude/Kimi/Codex/OpenAI) all return this shape. promptTokens and completionTokens are integers; their sum is the basis of computeCostUsd.

5. `mcp_model_candidates.cost_bps_per_kilotoken` (P1.5.9, migration 009)

cost_bps_per_kilotoken INTEGER NOT NULL CHECK (cost_bps_per_kilotoken >= 0)

Seeded (8 rows, bps per 1000 tokens):

model_id	cost_bps_per_kilotoken	enabled
claude-sonnet-3-5	300	1
claude-haiku-3-5	80	0
gpt-4o	250	0
gpt-4o-mini	15	0
gemini-1-5-pro	125	0
llama-3-3-70b	50	0
mixtral-8x22b	60	0
kimi-k2	120	0

The dispatch packet’s prompt section names 'claude' as a ModelId value. The migration does NOT seed a 'claude' row — only 'claude-sonnet-3-5', 'claude-haiku-3-5' etc. The 'claude' literal in the union is the abstract / forward-compat ModelId that the P0.5.2 fallback test suite asserts. Cost lookup for 'claude' therefore needs a defensive resolution: prefer an exact match in the snapshot; if none, fall back to 0n and return 0 (USD). This is the documented behavior in the contract — production callers always provide the snapshot; tests that exercise the Phase-0-compat 'claude' path get a zero cost (deterministic, frozen).
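
The defensive resolution described above could look roughly like the sketch below. The snapshot shape (a plain record from `model_id` to `cost_bps_per_kilotoken`) and the name `resolveCostBps` are assumptions made for illustration; the real `candidatesSnapshot` type may differ. Treating a malformed value as zero is also an assumption, since the migration's CHECK constraint should already rule out negative values.

```typescript
type ModelId = string;

// Hypothetical snapshot shape: model_id -> cost_bps_per_kilotoken.
type CandidatesSnapshot = Readonly<Record<ModelId, number>>;

// Returns the bps rate for an exact model_id match, or 0 (as bigint) when
// the snapshot is absent or has no row for the model (e.g. the abstract 'claude').
function resolveCostBps(modelId: ModelId, snapshot?: CandidatesSnapshot): bigint {
  if (snapshot === undefined) return BigInt(0);
  if (!Object.prototype.hasOwnProperty.call(snapshot, modelId)) return BigInt(0);
  const bps = snapshot[modelId];
  // Assumption: non-integer or negative values are treated as "no cost data".
  if (!Number.isInteger(bps) || bps < 0) return BigInt(0);
  return BigInt(bps);
}

const seeded: CandidatesSnapshot = {
  "claude-sonnet-3-5": 300,
  "claude-haiku-3-5": 80,
  "kimi-k2": 120,
};
```

With this sketch, `resolveCostBps("claude-sonnet-3-5", seeded)` yields the seeded rate, while `resolveCostBps("claude", seeded)` and `resolveCostBps("claude")` both yield zero.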

6. Cost formula (per dispatch packet)

costUsd = ((promptTokens + completionTokens) * cost_bps_per_kilotoken / 1000) / 10000

Worked example: with cost_bps_per_kilotoken = 300 (Claude Sonnet) and promptTokens + completionTokens = 1500:

inner_bps_int = (1500 * 300) / 1000 = 450 bps      (integer-bps space)
costUsd       = 450 / 10000        = 0.0450 USD

The internal arithmetic uses bigint to avoid float drift. The /1000 step is integer division on bigint (truncation toward zero). The /10000 step is the only floating-point divide and happens at the API surface — it converts the integer bps quantity to a JS number with 4-decimal internal precision (router_stats will round to 2 decimals at presentation time in P1.5.7).

Edge cases:

promptTokens + completionTokens === 0 → costUsd = 0.
cost_bps_per_kilotoken === 0 → costUsd = 0 (free-tier models, or a model with no row in the snapshot).
Tokens × bps overflow: never reachable. Even at Number.MAX_SAFE_INTEGER tokens × Number.MAX_SAFE_INTEGER bps, bigint cannot overflow.
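
The formula and edge cases above can be sketched as follows. This is an illustration only: `costUsdFromBps` is a hypothetical helper that takes the bps rate directly instead of looking it up in a snapshot, and it assumes token counts and the rate are non-negative integers (`BigInt()` throws a `RangeError` on non-integer numbers).

```typescript
// Hypothetical helper: integer-bps arithmetic with one float divide at the edge.
function costUsdFromBps(
  promptTokens: number,
  completionTokens: number,
  costBpsPerKilotoken: number,
): number {
  const tokens = BigInt(promptTokens) + BigInt(completionTokens);
  // bigint division truncates toward zero, matching the /1000 step above.
  const innerBps = (tokens * BigInt(costBpsPerKilotoken)) / BigInt(1000);
  // The only floating-point operation: integer bps -> USD.
  return Number(innerBps) / 10000;
}

const worked = costUsdFromBps(1000, 500, 300); // 1500 tokens at 300 bps/kilotoken
const tiny = costUsdFromBps(5, 5, 15);         // 10 tokens at 15 bps/kilotoken
```

Here `worked` is 0.045, matching the worked example. One consequence of truncating at the /1000 step is that `tiny` is 0: any call whose tokens × bps product is below 1000 rounds down to zero cost.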

7. Aggregates schema (in-memory)

interface RouterStats {
  readonly calls_total: number;     // successes + failures
  readonly successes: number;
  readonly failures: number;
  readonly avg_cost_usd: number;    // mean over successful calls only
  readonly p50_latency_ms: number;  // median over the last <=1000 latencies (success + failure)
  readonly success_rate: number;    // successes / calls_total
}

State per ModelId:

interface MutableAgg {
  calls_total: number;
  successes: number;
  failures: number;
  total_cost_bps: bigint;          // sum of per-call cost in bps; divide at edge for avg
  latencies: number[];             // bounded ring buffer; capacity 1000
  ringHead: number;                // next write index
  ringFilled: boolean;             // true once latencies.length === 1000
}

The ring buffer:

Capacity = 1000 slots per model.
Writes via ringHead: write, then advance modulo 1000.
Once ringFilled === true, the buffer holds the most recent 1000 entries.
p50 is computed from a sorted copy of the live entries — never mutates the buffer.

Memory ceiling: 9 ModelIds × 1000 numbers × 8 bytes ≈ 72 KB total. Well below any meaningful threshold.
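
A minimal sketch of the per-model state, ring-buffer write, p50 read, and rollup described in this section follows. The function names (`newAgg`, `record`, `p50`, `toStats`) are hypothetical, and the source does not say how the median is defined for an even number of samples; averaging the two middle values is an assumption here.

```typescript
const RING_CAPACITY = 1000;

interface MutableAgg {
  calls_total: number;
  successes: number;
  failures: number;
  total_cost_bps: bigint;
  latencies: number[];
  ringHead: number;
  ringFilled: boolean;
}

function newAgg(): MutableAgg {
  return {
    calls_total: 0, successes: 0, failures: 0,
    total_cost_bps: BigInt(0), latencies: [], ringHead: 0, ringFilled: false,
  };
}

// Record one call: counters, cost (successes only), and one ring-buffer slot.
function record(agg: MutableAgg, latencyMs: number, success: boolean, costBps: bigint): void {
  agg.calls_total += 1;
  if (success) {
    agg.successes += 1;
    agg.total_cost_bps += costBps;
  } else {
    agg.failures += 1;
  }
  agg.latencies[agg.ringHead] = latencyMs;            // write
  agg.ringHead = (agg.ringHead + 1) % RING_CAPACITY;  // then advance
  if (agg.ringHead === 0) agg.ringFilled = true;      // wrapped: buffer is full
}

// Median over the populated prefix only; slice() copies, so the buffer is not mutated.
function p50(agg: MutableAgg): number {
  const live = agg.latencies.slice(0, agg.ringFilled ? RING_CAPACITY : agg.ringHead);
  if (live.length === 0) return 0;
  live.sort((a, b) => a - b);
  const mid = Math.floor(live.length / 2);
  // Assumption: even-length median is the mean of the two middle values.
  return live.length % 2 === 1 ? live[mid] : (live[mid - 1] + live[mid]) / 2;
}

function toStats(agg: MutableAgg) {
  return {
    calls_total: agg.calls_total,
    successes: agg.successes,
    failures: agg.failures,
    // Mean over successful calls only; bps -> USD conversion happens here, at the edge.
    avg_cost_usd: agg.successes === 0 ? 0 : Number(agg.total_cost_bps) / (10000 * agg.successes),
    p50_latency_ms: p50(agg),
    success_rate: agg.calls_total === 0 ? 0 : agg.successes / agg.calls_total,
  };
}
```

After 1500 recorded calls the buffer holds only the most recent 1000 latencies, so older samples no longer influence `p50_latency_ms`, while the counters and `total_cost_bps` still reflect all 1500 calls.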

8. `modelsAttempted` semantics

The chain walk in routeRequest already maintains an attempts: FallbackAttempt[] array. modelsAttempted is derived from this plus the winner:

Each iteration that records a FallbackAttempt (failure path, including CircuitOpenError / NoAdapterError / RouterTimeoutError) contributes its modelId.
The successful iteration’s modelId is appended last.
Order = chain walk order = orderedChain(scores) order.

On FallbackChainExhaustedError: no RouteResult is produced (the error is thrown). modelsAttempted is only visible on the success path. The dispatch packet specifies “list of all models actually called (success + fail), in chain order” — that list lives on the successful RouteResult only.

On a single-attempt success: modelsAttempted.length === 1.
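
The derivation above can be sketched in a few lines. The `FallbackAttempt` shape here is a simplified stand-in (only `modelId` matters for this purpose), and `buildModelsAttempted` is a hypothetical helper name rather than part of the documented surface.

```typescript
type ModelId = string;

// Simplified stand-in for the real FallbackAttempt (assumption).
interface FallbackAttempt {
  readonly modelId: ModelId;
  readonly error: string;
}

// Failed attempts in chain order, then the winner last; frozen like RouteResult.
function buildModelsAttempted(
  attempts: ReadonlyArray<FallbackAttempt>,
  winner: ModelId,
): ReadonlyArray<ModelId> {
  return Object.freeze(attempts.map((a) => a.modelId).concat(winner));
}

const cascade = buildModelsAttempted(
  [
    { modelId: "kimi-k2", error: "CircuitOpenError" },
    { modelId: "gpt-4o", error: "RouterTimeoutError" },
  ],
  "claude-sonnet-3-5",
);
const single = buildModelsAttempted([], "claude-sonnet-3-5");
```

In this sketch `cascade` lists the two failed models followed by the winner, and `single` has length 1.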

9. Aggregate update sites in `routeRequest`

Two call sites for recordRouterCall, plus one path that deliberately makes no call:

Successful attempt (just before return Object.freeze({...})):

recordRouterCall(modelId, {
  promptTokens: upstream.promptTokens,
  completionTokens: upstream.completionTokens,
  latencyMs: upstream.latencyMs,
  success: true,
  candidatesSnapshot: options.candidatesSnapshot,
});

Adapter-call failure (inside catch (err) block):

recordRouterCall(modelId, {
  promptTokens: 0,
  completionTokens: 0,
  latencyMs: measuredMs,   // Date.now() - attemptStart
  success: false,
  candidatesSnapshot: options.candidatesSnapshot,
});

measuredMs is the wall-clock time elapsed since Date.now() was captured at the top of the loop body. For tests that inject nowFn, the same clock-injection seam applies here.

Pre-flight short-circuit (CircuitOpenError, NoAdapterError): no aggregate update. These attempts never reach the upstream adapter — the model neither succeeded nor failed in any externally observable sense. The chain walk records the attempt for telemetry (FallbackAttempt) but cost / stats are unaffected.
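
To show how the three paths fit together, here is a reduced sketch of the chain walk. It is synchronous for brevity (the real `routeRequest` is async), omits token counts and the snapshot, models the pre-flight checks as a returned reason instead of thrown `CircuitOpenError` / `NoAdapterError`, and returns `null` where the real code throws `FallbackChainExhaustedError`. All names other than those already used in this document are hypothetical.

```typescript
type ModelId = string;
type PreflightBlock = "circuit-open" | "no-adapter" | null;

interface RecordedCall {
  readonly modelId: ModelId;
  readonly latencyMs: number;
  readonly success: boolean;
}

function walkChain(
  chain: ReadonlyArray<ModelId>,
  preflight: (modelId: ModelId) => PreflightBlock,
  callAdapter: (modelId: ModelId) => { latencyMs: number },
  nowFn: () => number,
  recordRouterCall: (call: RecordedCall) => void,
): ModelId | null {
  for (let i = 0; i < chain.length; i++) {
    const modelId = chain[i];
    const attemptStart = nowFn(); // captured at the top of the loop body
    // Pre-flight short-circuit: the adapter is never reached, so no aggregate update.
    if (preflight(modelId) !== null) continue;
    try {
      const upstream = callAdapter(modelId);
      // Successful attempt: record the adapter-reported latency.
      recordRouterCall({ modelId, latencyMs: upstream.latencyMs, success: true });
      return modelId;
    } catch (err) {
      // Adapter-call failure: record the measured wall-clock for this attempt.
      recordRouterCall({ modelId, latencyMs: nowFn() - attemptStart, success: false });
    }
  }
  return null; // the real code throws FallbackChainExhaustedError here
}
```

Passing `nowFn` and `recordRouterCall` as parameters is a choice made for this sketch so the behavior can be exercised without module-level state; the document's design instead imports `recordRouterCall` from `./cost.js`.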

10. Forbiddens (dispatch §FORBIDDENS)

No floating-point accumulation. All in-loop math is bigint bps; the single Number(...) / 10000 happens at the API edge.
No unbounded memory growth. Ring buffer at 1000 slots/model is mandatory.
No AMS_* env vars.
No MCP tool registration in this round (P1.5.7 scope).
No existing-RouteResult-field type changes. costUsd + modelsAttempted are appended.
No edits to the main checkout.
No DB read at module load. Cost lookup is per-call, snapshot-driven (the snapshot lives in RouteOptions.candidatesSnapshot, propagated from the call site).

11. Concept-doc + ADR anchors

docs/3-world/social/llm.md §”The candidate table” + §”Phase 1.5 candidate cohort” — cost_bps_per_kilotoken semantics.
docs/architecture/decisions/ADR-005-multi-model-defer.md §Decision (step 4) — “Cost accounting layer reads from candidate table, computes per-call cost, emits aggregates.”
docs/architecture/decisions/ADR-004-tool-surface.md — router_stats is the P1.5.7 surface that exposes getRouterStats().
docs/contracts/p1-5-5-fallback-cb-contract.md §6 — invariants I1–I19 from P1.5.5; this round preserves I1–I8 and I10–I19, and augments I9 (which forbade new RouteResult fields) per the explicit dispatch-packet exemption.

12. Risks & mitigations

Risk	Mitigation
Cost lookup for `'claude'` abstract `ModelId` returns 0 USD on Phase-0-compat path	Documented in contract §3; tests exercise both paths (real Sonnet row + zero-cost compat path).
`recordRouterCall` becomes a hot path under sustained load	Module is process-local, no I/O, no allocation beyond the ring-buffer slot. Worst-case work per call: 1 Map lookup + 6 number adds + 1 bigint add + 1 array index.
Ring buffer p50 drift if buffer is short-filled	`p50` reads from `latencies.slice(0, ringFilled ? 1000 : ringHead)` — i.e. only the populated prefix. With `<1000` calls, p50 is computed over the actual count, not over 1000 phantom zeros.
Test pollution across modules — `getRouterStats` reads module-level state	`cost.test.ts` calls `resetRouterStats()` in `beforeEach` + `afterEach`. The `fallback.test.ts` block that exercises the new fields does the same. Jest worker isolation guarantees per-worker module instances — no inter-worker leakage.
ESM `.js` import paths	All new imports use the `.js` suffix (project’s documented ESM-output rule). The `cost.ts` module imports `ModelId` from `./scoring.js`; the `fallback.ts` patch imports `recordRouterCall` / `computeCostUsd` from `./cost.js`.
`candidatesSnapshot` absent → cost = 0	Documented. In production the snapshot is always provided by the call site. Tests covering Phase-0-compat continue to get cost = 0 — pre-existing tests asserting only on the Phase 0 fields continue to pass.

13. Acceptance criteria (carried from dispatch packet §Acceptance)

computeCostUsd(modelId, promptTokens, completionTokens, snapshot?) → number reads cost_bps_per_kilotoken from the snapshot; returns USD with 4-decimal internal precision.
Per-model aggregates: calls_total, successes, failures, avg_cost_usd, p50_latency_ms, success_rate.
RouteResult appends costUsd: number + modelsAttempted: ReadonlyArray<ModelId> (Phase 0 callers compile).
p50 computed over a bounded ring buffer (1000 latencies/model).
getRouterStats() exported for P1.5.7.
resetRouterStats(modelId?) exported for tests + operator use.
Integer-bps math; single divide-by-10000 at the edge.
npm run build && npm run lint && npm test green.

14. Out-of-scope (deferred per dispatch)

router_stats MCP tool registration → P1.5.7.
Parity tests for cost across arbiters → P1.5.8.
ζ Decision Trail recording of cost → P1.5.10.
DB-backed persistence of aggregates → Phase 2+ (not in any Phase 1.5 slice).