P1.5.6 — Cost Accounting — Audit
Round: R92 Wave 5 of 7
Branch: feature/p1-5-6-cost
Base: origin/main @ c284ad22 (post-P1.5.5 #256 — fallback + CB + adapter re-exports)
Dispatch: docs/guides/implementation/task-prompts/p1.5-delta-router-graduation.md §P1.5.6
Depends on: P1.5.5 (N-member fallback + CB) — landed at #256
Unblocks: P1.5.7 (router_* MCP tools — router_stats reads getRouterStats() from this slice)
1. Goal
Add per-call token → USD translation and per-model in-memory aggregates (calls, latency p50, success rate, average cost) so the P1.5.7 router_stats MCP tool and the ζ Decision Trail (P1.5.10) have real cost / performance data to surface.
Two append-only fields land on RouteResult:
- `costUsd: number`: the cost of the successful model's call, computed from `(promptTokens + completionTokens) × cost_bps_per_kilotoken / 1000`, divided by 10000 at the API surface.
- `modelsAttempted: ReadonlyArray<ModelId>`: every model the chain walk visited (success + every failed attempt, in chain order).
No public signature of routeRequest, RouteOptions, FallbackChainExhaustedError, FallbackAttempt, CompletionFn, CompletionFnOptions, or ScoringFn changes. The two new RouteResult fields are append-only — destructuring callers using only the Phase 0 set (model / content / finishReason / promptTokens / completionTokens / latencyMs) continue to compile.
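As an illustration of the append-only claim, the sketch below puts the extended shape next to a caller that destructures only Phase 0 fields. The `ModelId` alias and the `finishReason` type are simplified stand-ins chosen for this example, not the real declarations in fallback.ts, and `summarize` is a hypothetical caller.

```typescript
// Simplified stand-in; the real ModelId union lives in scoring.ts.
type ModelId = string;

interface RouteResult {
  // Phase 0 fields (types assumed for illustration).
  readonly model: ModelId;
  readonly content: string;
  readonly finishReason: string;
  readonly promptTokens: number;
  readonly completionTokens: number;
  readonly latencyMs: number;
  // Appended in P1.5.6.
  readonly costUsd: number;
  readonly modelsAttempted: ReadonlyArray<ModelId>;
}

// Hypothetical caller written against the Phase 0 field set. It never names
// the new fields, so appending them does not affect it.
function summarize(result: RouteResult): string {
  const { model, promptTokens, completionTokens, latencyMs } = result;
  return `${model}: ${promptTokens + completionTokens} tokens in ${latencyMs} ms`;
}

const sample: RouteResult = Object.freeze({
  model: "claude-sonnet-3-5",
  content: "ok",
  finishReason: "stop",
  promptTokens: 1000,
  completionTokens: 500,
  latencyMs: 42,
  costUsd: 0.045,
  modelsAttempted: Object.freeze(["kimi-k2", "claude-sonnet-3-5"]),
});
```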
2. Surface inventory
2.1. Files to create
| Path | Purpose |
|---|---|
| `src/domains/router/cost.ts` | `computeCostUsd(modelId, promptTokens, completionTokens) → number`. In-memory aggregates: `recordRouterCall(modelId, { promptTokens, completionTokens, latencyMs, success })`, `getRouterStats() → { models: Record<ModelId, RouterStats> }`, `resetRouterStats(modelId?)`. Bounded latency ring buffer (1000 slots / model). Bigint-bps math; single divide-by-10000 to USD at the API edge. Pure-ish module: `candidatesSnapshot` is injectable so tests don't read the DB; production callers in fallback.ts provide the DB-derived snapshot. |
| `src/__tests__/domains/router/cost.test.ts` | Golden-vector tests for `computeCostUsd`; ring-buffer bound (1500 calls → p50 over last 1000); success-rate + avg-cost rollup; per-model isolation; reset semantics (single + all). |
2.2. Files to modify
| Path | Current state | Change |
|---|---|---|
| `src/domains/router/fallback.ts` | P1.5.5 ships an N-member cascade. `RouteResult` has exactly 6 fields (model / content / finishReason / promptTokens / completionTokens / latencyMs). The chain walk records `recordSuccess` / `recordFailure` on the circuit breaker but does NOT call any cost-aggregate hook (the module does not exist yet). | Append `costUsd: number` and `modelsAttempted: ReadonlyArray<ModelId>` to `RouteResult`. On every successful attempt, call `recordRouterCall(modelId, {promptTokens, completionTokens, latencyMs, success: true})`; populate `costUsd` from `computeCostUsd(winner, prompt, completion)`. On every failed attempt (including timeouts, but not `CircuitOpenError` / `NoAdapterError`, which are NOT adapter-call failures), call `recordRouterCall(modelId, {promptTokens: 0, completionTokens: 0, latencyMs: measured, success: false})` with the measured wall-clock time since the attempt started. Track `modelsAttempted` across the chain walk; include it in the final frozen `RouteResult`. |
| `src/domains/router/index.ts` | Re-exports `./scoring.js`, `./fallback.js`, and the 3 adapter modules. | Add `export * from './cost.js';` for the new module. |
| `src/__tests__/domains/router/fallback.test.ts` | 1011 lines covering happy path, scoring integration, upstream forwarding, failure wrapping, cascade, timeout, CB, NoAdapter, error precedence. No assertions about `costUsd` or `modelsAttempted`. | Add a small `costUsd` + `modelsAttempted` describe block: one happy-path test and one cascade test verifying the new fields land. Pre-existing tests continue to pass (additive). |
2.3. Files NOT modified
- `src/domains/router/scoring.ts`: cost reads from the candidate snapshot the same way scoring does; signatures unchanged.
- `src/domains/router/circuit.ts`: CB state machine is independent of cost aggregates.
- `src/domains/router/adapters/{kimi,codex,openai}.ts`: adapter surface unchanged (adapters already report `promptTokens` + `completionTokens` + `latencyMs` in `CompletionResult`).
- `src/db/migrations/009_model_candidates.sql`: already seeds `cost_bps_per_kilotoken` for all 8 candidates.
- `src/server.ts`: no MCP tool registration this round (`router_*` tools ship in P1.5.7).
- `src/domains/integrations/claude.ts`: `CompletionResult` shape already carries the needed token counts.
3. Exports that MUST be preserved (signatures unchanged)
From src/domains/router/fallback.ts:
- `routeRequest(prompt, options?) → Promise<RouteResult>`: body changes (records aggregates, tracks `modelsAttempted`, builds `costUsd`), signature byte-identical.
- `FallbackChainExhaustedError`: class shape unchanged.
- `RouteOptions`: interface unchanged.
- `RouteResult`: APPENDED ONLY (`costUsd: number`, `modelsAttempted: ReadonlyArray<ModelId>`). Existing fields keep identical types.
- `FallbackAttempt`: unchanged.
- `CompletionFn` / `CompletionFnOptions` / `ScoringFn`: unchanged.
- `CIRCUIT_COOLDOWN_MS`, `CIRCUIT_FAILURE_THRESHOLD`, `getCircuitBreakerState`, `resetCircuitBreaker`, `CircuitState`, `RouterTimeoutError`, `CircuitOpenError`, `NoAdapterError`, `ROUTER_PHASE_0_SHAPE`: unchanged.
The dispatch packet explicitly allows adding costUsd + modelsAttempted to RouteResult — they are the only new RouteResult fields landing in P1.5.6. fallbackDepth is not in scope (it does not appear in the dispatch packet’s “FILES TO MODIFY” section under RouteResult).
4. CompletionResult token surface (unchanged from W3)
src/domains/integrations/claude.ts:134-141:

    export interface CompletionResult {
      readonly content: string;
      readonly model: string;
      readonly promptTokens: number;
      readonly completionTokens: number;
      readonly latencyMs: number;
      readonly stopReason: string;
    }

The four adapters (Claude/Kimi/Codex/OpenAI) all return this shape. promptTokens and completionTokens are integers; their sum is the basis of computeCostUsd.
5. mcp_model_candidates.cost_bps_per_kilotoken (P1.5.9, migration 009)
cost_bps_per_kilotoken INTEGER NOT NULL CHECK (cost_bps_per_kilotoken >= 0)
Seeded (8 rows, bps per 1000 tokens):
| model_id | cost_bps_per_kilotoken | enabled |
|---|---|---|
| claude-sonnet-3-5 | 300 | 1 |
| claude-haiku-3-5 | 80 | 0 |
| gpt-4o | 250 | 0 |
| gpt-4o-mini | 15 | 0 |
| gemini-1-5-pro | 125 | 0 |
| llama-3-3-70b | 50 | 0 |
| mixtral-8x22b | 60 | 0 |
| kimi-k2 | 120 | 0 |
The dispatch packet’s prompt section names 'claude' as a ModelId value. The migration does NOT seed a 'claude' row — only 'claude-sonnet-3-5', 'claude-haiku-3-5' etc. The 'claude' literal in the union is the abstract / forward-compat ModelId that the P0.5.2 fallback test suite asserts. Cost lookup for 'claude' therefore needs a defensive resolution: prefer an exact match in the snapshot; if none, fall back to 0n and return 0 (USD). This is the documented behavior in the contract — production callers always provide the snapshot; tests that exercise the Phase-0-compat 'claude' path get a zero cost (deterministic, frozen).
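The defensive resolution described above might look like the following sketch. The `CandidateRow` shape and the name `lookupCostBps` are hypothetical and introduced only for illustration; the exact-match-else-`0n` behavior is the part taken from this audit.

```typescript
type ModelId = string; // stand-in for the real union in scoring.ts

// Hypothetical snapshot row: just the two columns this lookup needs.
interface CandidateRow {
  readonly model_id: string;
  readonly cost_bps_per_kilotoken: number;
}

// Exact match on model_id; otherwise 0n, which later converts to 0 USD.
function lookupCostBps(
  modelId: ModelId,
  snapshot: ReadonlyArray<CandidateRow> = [],
): bigint {
  const row = snapshot.find((r) => r.model_id === modelId);
  return row === undefined ? 0n : BigInt(row.cost_bps_per_kilotoken);
}

const seeded: CandidateRow[] = [
  { model_id: "claude-sonnet-3-5", cost_bps_per_kilotoken: 300 },
  { model_id: "kimi-k2", cost_bps_per_kilotoken: 120 },
];

lookupCostBps("claude-sonnet-3-5", seeded); // 300n
lookupCostBps("claude", seeded);            // 0n: no 'claude' row is seeded
lookupCostBps("kimi-k2");                   // 0n: no snapshot supplied
```

There is deliberately no prefix or alias matching here: `'claude'` does not resolve to a Sonnet or Haiku row, which keeps the Phase-0-compat path deterministic.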
6. Cost formula (per dispatch packet)
costUsd = ((promptTokens + completionTokens) * cost_bps_per_kilotoken / 1000) / 10000
Worked-out: with cost_bps_per_kilotoken = 300 (Claude Sonnet) and promptTokens + completionTokens = 1500:
inner_bps_int = (1500 * 300) / 1000 = 450 bps (integer-bps space)
costUsd = 450 / 10000 = 0.0450 USD
The internal arithmetic uses bigint to avoid float drift. The /1000 step is integer division on bigint (truncation toward zero). The /10000 step is the only floating-point divide and happens at the API surface — it converts the integer bps quantity to a JS number with 4-decimal internal precision (router_stats will round to 2 decimals at presentation time in P1.5.7).
Edge cases:
- `promptTokens + completionTokens === 0` → `costUsd = 0`.
- `cost_bps_per_kilotoken === 0` → `costUsd = 0` (gemini-like free tiers, or unseeded model).
- Tokens × bps overflow: never reachable. Even at `Number.MAX_SAFE_INTEGER` tokens × `Number.MAX_SAFE_INTEGER` bps, bigint cannot overflow.
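The formula and edge cases above can be sketched as two small functions. The names `costBpsForCall` and `bpsToUsd` are hypothetical helpers for this example, not the module's actual exports, and the sketch assumes integer token counts (`BigInt()` throws on a non-integer number).

```typescript
// Integer-bps cost for one call: (tokens * bps_per_kilotoken) / 1000.
// bigint division truncates toward zero.
function costBpsForCall(
  promptTokens: number,
  completionTokens: number,
  costBpsPerKilotoken: number,
): bigint {
  const tokens = BigInt(promptTokens) + BigInt(completionTokens);
  return (tokens * BigInt(costBpsPerKilotoken)) / 1000n;
}

// The only floating-point step: integer bps to USD at the API edge.
function bpsToUsd(bps: bigint): number {
  return Number(bps) / 10000;
}

bpsToUsd(costBpsForCall(1000, 500, 300)); // 0.045 (the worked example)
bpsToUsd(costBpsForCall(0, 0, 300));      // 0: zero tokens
bpsToUsd(costBpsForCall(1000, 500, 0));   // 0: zero rate
bpsToUsd(costBpsForCall(3, 0, 300));      // 0: 900 / 1000 truncates to 0 bps
```

One consequence of the integer `/1000` step, visible in the last line, is that any call whose `tokens × bps` product is below 1000 is recorded as costing 0 bps.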
7. Aggregates schema (in-memory)

    interface RouterStats {
      readonly calls_total: number;     // successes + failures
      readonly successes: number;
      readonly failures: number;
      readonly avg_cost_usd: number;    // mean over successful calls only
      readonly p50_latency_ms: number;  // median over the last <=1000 latencies (success + failure)
      readonly success_rate: number;    // successes / calls_total
    }

State per ModelId:

    interface MutableAgg {
      calls_total: number;
      successes: number;
      failures: number;
      total_cost_bps: bigint;  // sum of per-call cost in bps; divide at edge for avg
      latencies: number[];     // bounded ring buffer; capacity 1000
      ringHead: number;        // next write index
      ringFilled: boolean;     // true once latencies.length === 1000
    }

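To show how the mutable counters could map onto the two ratio fields of `RouterStats`, here is a minimal sketch. `AggCounters` and `ratios` are hypothetical names, and returning 0 when a denominator is 0 is an assumption of this example; the audit does not specify the empty-state values.

```typescript
// Subset of the per-model state needed for the two ratio fields.
interface AggCounters {
  calls_total: number;
  successes: number;
  total_cost_bps: bigint;
}

// success_rate = successes / calls_total
// avg_cost_usd = mean over successful calls only, converted from bps once
// Assumption: both are 0 when their denominator is 0.
function ratios(agg: AggCounters): { success_rate: number; avg_cost_usd: number } {
  const success_rate = agg.calls_total === 0 ? 0 : agg.successes / agg.calls_total;
  const avg_cost_usd =
    agg.successes === 0 ? 0 : Number(agg.total_cost_bps) / 10000 / agg.successes;
  return { success_rate, avg_cost_usd };
}

// Three successes at 450 bps each plus one failure.
const rollup = ratios({ calls_total: 4, successes: 3, total_cost_bps: 1350n });
// rollup.success_rate is 0.75; rollup.avg_cost_usd is approximately 0.045
```

Because failures are recorded with zero tokens, they add nothing to `total_cost_bps`, so dividing by `successes` rather than `calls_total` matches the "successful calls only" definition.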
The ring buffer:
- Capacity = `1000` slots per model.
- Writes via `ringHead`: write, then advance modulo 1000.
- Once `ringFilled === true`, the buffer holds the most recent 1000 entries.
- `p50` is computed from a sorted copy of the live entries; it never mutates the buffer.
Memory ceiling: 9 ModelIds × 1000 numbers × 8 bytes ≈ 72 KB total. Well below any meaningful threshold.
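A minimal sketch of the ring buffer and the p50 read, under stated assumptions: `createRing`, `pushLatency`, and `p50` are hypothetical names, an empty buffer reports 0, and an even-sized sample uses the mean of the two middle values. None of those three choices is fixed by this audit.

```typescript
const RING_CAPACITY = 1000;

interface LatencyRing {
  latencies: number[];
  ringHead: number;    // next write index
  ringFilled: boolean; // true once the buffer has wrapped
}

function createRing(): LatencyRing {
  return { latencies: [], ringHead: 0, ringFilled: false };
}

// Write at ringHead, then advance modulo capacity.
function pushLatency(ring: LatencyRing, latencyMs: number): void {
  ring.latencies[ring.ringHead] = latencyMs;
  ring.ringHead = (ring.ringHead + 1) % RING_CAPACITY;
  if (ring.ringHead === 0) ring.ringFilled = true;
}

// Median of the live entries, computed on a sorted copy so the buffer
// itself is never reordered.
function p50(ring: LatencyRing): number {
  const live = ring.latencies.slice(0, ring.ringFilled ? RING_CAPACITY : ring.ringHead);
  if (live.length === 0) return 0; // assumption: 0 when nothing is recorded
  live.sort((a, b) => a - b);
  const mid = Math.floor(live.length / 2);
  // Assumption: even-sized samples average the two middle values.
  return live.length % 2 === 1 ? live[mid] : (live[mid - 1] + live[mid]) / 2;
}
```

Sorting a copy costs O(n log n) per read with n ≤ 1000, which keeps the write path at a single array store while reads stay cheap enough for an on-demand stats call.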
8. modelsAttempted semantics
The chain walk in routeRequest already maintains an attempts: FallbackAttempt[] array. modelsAttempted is derived from this plus the winner:
- Each iteration that records a `FallbackAttempt` (failure path, including `CircuitOpenError` / `NoAdapterError` / `RouterTimeoutError`) contributes its `modelId`.
- The successful iteration's `modelId` is appended last.
- Order = chain walk order = `orderedChain(scores)` order.
On FallbackChainExhaustedError: no RouteResult is produced (the error is thrown). modelsAttempted is only visible on the success path. The dispatch packet specifies “list of all models actually called (success + fail), in chain order” — that list lives on the successful RouteResult only.
On a single-attempt success: modelsAttempted.length === 1.
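The derivation described in this section can be sketched as a single function. `AttemptLike` is a hypothetical stand-in that assumes each recorded attempt exposes a `modelId` field, and `buildModelsAttempted` is a name invented for this example.

```typescript
type ModelId = string; // stand-in for the real union in scoring.ts

// Only the field this derivation needs; the real FallbackAttempt carries more.
interface AttemptLike {
  readonly modelId: ModelId;
}

// Failed attempts in chain order, with the winner appended last.
function buildModelsAttempted(
  attempts: ReadonlyArray<AttemptLike>,
  winner: ModelId,
): ReadonlyArray<ModelId> {
  return Object.freeze([...attempts.map((a) => a.modelId), winner]);
}

buildModelsAttempted([], "claude-sonnet-3-5").length; // 1: single-attempt success
buildModelsAttempted(
  [{ modelId: "kimi-k2" }, { modelId: "gpt-4o" }],
  "claude-sonnet-3-5",
); // ["kimi-k2", "gpt-4o", "claude-sonnet-3-5"]
```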
9. Aggregate update sites in routeRequest
Three sites in the chain walk matter for `recordRouterCall`; two call it and one deliberately does not:
- Successful attempt (just before `return Object.freeze({...})`):

      recordRouterCall(modelId, {
        promptTokens: upstream.promptTokens,
        completionTokens: upstream.completionTokens,
        latencyMs: upstream.latencyMs,
        success: true,
        candidatesSnapshot: options.candidatesSnapshot,
      });

- Adapter-call failure (inside the `catch (err)` block):

      recordRouterCall(modelId, {
        promptTokens: 0,
        completionTokens: 0,
        latencyMs: measuredMs, // Date.now() - attemptStart
        success: false,
        candidatesSnapshot: options.candidatesSnapshot,
      });

  `measuredMs` is the wall-clock time since `Date.now()` was captured at the top of the loop body. For tests using an injected `nowFn`, the clock injection seam also flows here.
- Pre-flight short-circuit (`CircuitOpenError`, `NoAdapterError`): no aggregate update. These attempts never reach the upstream adapter, so the model neither succeeded nor failed in any externally observable sense. The chain walk records the attempt for telemetry (`FallbackAttempt`) but cost / stats are unaffected.
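The placement of the aggregate updates can be illustrated with a synchronous, heavily simplified walk. Everything here is a local stand-in: `walkChain`, `canAttempt`, and the recording stub are hypothetical, the real `routeRequest` is async, and the pre-flight check is modeled as a boolean rather than thrown `CircuitOpenError` / `NoAdapterError` instances.

```typescript
type ModelId = string; // stand-in for the real union in scoring.ts

interface CallRecord {
  promptTokens: number;
  completionTokens: number;
  latencyMs: number;
  success: boolean;
}
interface Upstream {
  promptTokens: number;
  completionTokens: number;
  latencyMs: number;
}

// Stub aggregate hook that just remembers what it was given.
const recorded: Array<[ModelId, CallRecord]> = [];
function recordRouterCall(modelId: ModelId, rec: CallRecord): void {
  recorded.push([modelId, rec]);
}

function walkChain(
  chain: ReadonlyArray<ModelId>,
  canAttempt: (modelId: ModelId) => boolean, // false = circuit open or no adapter
  call: (modelId: ModelId) => Upstream,
  nowFn: () => number = Date.now,
): ModelId | undefined {
  for (const modelId of chain) {
    // Pre-flight short-circuit: the adapter is never reached, so no update.
    if (!canAttempt(modelId)) continue;
    const attemptStart = nowFn();
    try {
      const upstream = call(modelId);
      // Successful attempt: record the adapter-reported numbers.
      recordRouterCall(modelId, {
        promptTokens: upstream.promptTokens,
        completionTokens: upstream.completionTokens,
        latencyMs: upstream.latencyMs,
        success: true,
      });
      return modelId;
    } catch {
      // Adapter-call failure: zero tokens, measured wall-clock latency.
      recordRouterCall(modelId, {
        promptTokens: 0,
        completionTokens: 0,
        latencyMs: nowFn() - attemptStart,
        success: false,
      });
    }
  }
  return undefined; // the real code throws FallbackChainExhaustedError here
}
```

Passing `nowFn` as a parameter mirrors the clock injection seam mentioned above, so a test can make the failure latency deterministic.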
10. Forbiddens (dispatch §FORBIDDENS)
- No floating-point accumulation. All in-loop math is `bigint` bps; the single `Number(...) / 10000` happens at the API edge.
- No unbounded memory growth. Ring buffer at 1000 slots/model is mandatory.
- No `AMS_*` env vars.
- No MCP tool registration in this round (P1.5.7 scope).
- No existing-`RouteResult`-field type changes. `costUsd` + `modelsAttempted` are appended.
- No edits to the main checkout.
- No DB read at module load. Cost lookup is per-call, snapshot-driven (the snapshot lives in `RouteOptions.candidatesSnapshot`, propagated from the call site).
11. Concept-doc + ADR anchors
- `docs/3-world/social/llm.md` §"The candidate table" + §"Phase 1.5 candidate cohort": `cost_bps_per_kilotoken` semantics.
- `docs/architecture/decisions/ADR-005-multi-model-defer.md` §Decision (step 4): "Cost accounting layer reads from candidate table, computes per-call cost, emits aggregates."
- `docs/architecture/decisions/ADR-004-tool-surface.md`: `router_stats` is the P1.5.7 surface that exposes `getRouterStats()`.
- `docs/contracts/p1-5-5-fallback-cb-contract.md` §6: invariants I1–I19 from P1.5.5; this round preserves I1–I8 and I10–I19, and augments I9 (which forbade new `RouteResult` fields) per the explicit dispatch-packet exemption.
12. Risks & mitigations
| Risk | Mitigation |
|---|---|
| Cost lookup for `'claude'` abstract ModelId returns 0 USD on Phase-0-compat path | Documented in contract §3; tests exercise both paths (real Sonnet row + zero-cost compat path). |
| `recordRouterCall` becomes a hot path under sustained load | Module is process-local, no I/O, no allocation beyond the ring-buffer slot. Worst-case work per call: 1 Map lookup + 6 number adds + 1 bigint add + 1 array index. |
| Ring buffer p50 drift if buffer is short-filled | p50 reads from `latencies.slice(0, ringFilled ? 1000 : ringHead)`, i.e. only the populated prefix. With <1000 calls, p50 is computed over the actual count, not over 1000 phantom zeros. |
| Test pollution across modules: `getRouterStats` reads module-level state | cost.test.ts calls `resetRouterStats()` in `beforeEach` + `afterEach`. The fallback.test.ts block that exercises the new fields does the same. Jest worker isolation guarantees per-worker module instances, so there is no inter-worker leakage. |
| ESM `.js` import paths | All new imports use the `.js` suffix (project's documented ESM-output rule). The cost.ts module imports `ModelId` from `./scoring.js`; the fallback.ts patch imports `recordRouterCall` / `computeCostUsd` from `./cost.js`. |
| `candidatesSnapshot` absent → cost = 0 | Documented. In production the snapshot is always provided by the call site. Tests covering Phase-0-compat continue to get cost = 0, and pre-existing tests asserting only on the Phase 0 fields continue to pass. |
13. Acceptance criteria (carried from dispatch packet §Acceptance)
- `computeCostUsd(modelId, promptTokens, completionTokens, snapshot?) → number` reads `cost_bps_per_kilotoken` from the snapshot; returns USD with 4-decimal internal precision.
- Per-model aggregates: `calls_total`, `successes`, `failures`, `avg_cost_usd`, `p50_latency_ms`, `success_rate`.
- `RouteResult` appends `costUsd: number` + `modelsAttempted: ReadonlyArray<ModelId>` (Phase 0 callers compile).
- p50 computed over a bounded ring buffer (1000 latencies/model).
- `getRouterStats()` exported for P1.5.7.
- `resetRouterStats(modelId?)` exported for tests + operator use.
- Integer-bps math; single divide-by-10000 at the edge.
- `npm run build && npm run lint && npm test` green.
14. Out-of-scope (deferred per dispatch)
- `router_stats` MCP tool registration → P1.5.7.
- Parity tests for cost across arbiters → P1.5.8.
- ζ Decision Trail recording of cost → P1.5.10.
- DB-backed persistence of aggregates → Phase 2+ (not in any Phase 1.5 slice).