P1.5.6 — Cost Accounting — Behavioral Contract
Round: R92 Wave 5 of 7
Branch: feature/p1-5-6-cost
Base: origin/main @ c284ad22
Audit: p1-5-6-cost-audit.md
1. Public surface (new exports from src/domains/router/cost.ts)
/* Per-call cost computation. Pure: candidate snapshot drives the cost
* per-1k-tokens lookup. Returns USD as a JS `number` with 4-decimal
* internal precision (the API surface presents 2 decimals — that's a
* P1.5.7 router_stats formatting concern, not a cost.ts concern). */
export function computeCostUsd(
modelId: ModelId,
promptTokens: number,
completionTokens: number,
candidatesSnapshot?: ReadonlyArray<ModelCandidate>,
): number;
/* Per-model aggregate snapshot used by `router_stats` (P1.5.7) and by
* the ζ decision trail (P1.5.10). Frozen at both levels. */
export interface RouterStats {
readonly calls_total: number;
readonly successes: number;
readonly failures: number;
readonly avg_cost_usd: number;
readonly p50_latency_ms: number;
readonly success_rate: number;
}
/* Per-call record-keeping hook. Called once per attempted adapter
* call — both success and failure paths. NOT called for circuit-open
* skips or NoAdapter pre-flight bailouts (those never reach the
* adapter). */
export interface RouterCallRecord {
readonly promptTokens: number;
readonly completionTokens: number;
readonly latencyMs: number;
readonly success: boolean;
/** Optional snapshot for cost lookup; usually flows from RouteOptions. */
readonly candidatesSnapshot?: ReadonlyArray<ModelCandidate>;
}
export function recordRouterCall(
modelId: ModelId,
record: RouterCallRecord,
): void;
/* Reads the full router-stats snapshot. Frozen at both levels. */
export function getRouterStats(): {
readonly models: Readonly<Record<ModelId, RouterStats>>;
};
/* Resets one model (with arg) or every model (no arg). */
export function resetRouterStats(modelId?: ModelId): void;
/* Internal ring buffer constants — exported for tests. */
export const ROUTER_LATENCY_RING_SIZE = 1000 as const;
2. RouteResult extension (additive)
export interface RouteResult {
readonly model: ModelId;
readonly content: string;
readonly finishReason: string;
readonly promptTokens: number;
readonly completionTokens: number;
readonly latencyMs: number;
// P1.5.6 additions (append-only):
readonly costUsd: number;
readonly modelsAttempted: ReadonlyArray<ModelId>;
}
The two new fields are append-only. Existing Phase 0 destructuring sites that bind only { model, content, finishReason, promptTokens, completionTokens, latencyMs } continue to compile (TypeScript structural typing accepts extra fields).
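A minimal sketch of why the extension is source-compatible. It assumes ModelId can be modelled as a plain string (the real type is defined elsewhere in the router domain), and the field values are made up for illustration:

```typescript
// Assumption: ModelId is a string alias here, for illustration only.
type ModelId = string;

interface RouteResult {
  readonly model: ModelId;
  readonly content: string;
  readonly finishReason: string;
  readonly promptTokens: number;
  readonly completionTokens: number;
  readonly latencyMs: number;
  // P1.5.6 additions (append-only):
  readonly costUsd: number;
  readonly modelsAttempted: ReadonlyArray<ModelId>;
}

// Hypothetical result value; the numbers are illustrative.
const result: RouteResult = Object.freeze({
  model: "kimi",
  content: "hello",
  finishReason: "stop",
  promptTokens: 1000,
  completionTokens: 500,
  latencyMs: 42,
  costUsd: 0.045,
  modelsAttempted: Object.freeze(["kimi"]),
});

// A Phase 0 style call site: it binds only the original six fields and
// never mentions costUsd or modelsAttempted, so it still type-checks.
function summarise(r: RouteResult): string {
  const { model, content, finishReason, promptTokens, completionTokens, latencyMs } = r;
  return `${model}:${finishReason}:${promptTokens + completionTokens}:${latencyMs}:${content.length}`;
}

const summary = summarise(result);
```

Destructuring picks properties by name, so adding properties to the interface does not affect sites that read a subset of them.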
3. Cost-computation invariants
| ID | Invariant | Verified by |
|---|---|---|
| I-COST-1 | computeCostUsd(modelId, p, c, snap) reads snap for m where m.model_id === modelId and uses its cost_bps_per_kilotoken. | cost.test.ts “reads cost from snapshot row” |
| I-COST-2 | When the snapshot lacks modelId: returns 0. | cost.test.ts “returns 0 when modelId absent” |
| I-COST-3 | When snap is undefined: returns 0. | cost.test.ts “returns 0 when snapshot omitted” |
| I-COST-4 | When cost_bps_per_kilotoken === 0: returns 0. | cost.test.ts “returns 0 for free tier” |
| I-COST-5 | When promptTokens + completionTokens === 0: returns 0. | cost.test.ts “returns 0 for zero-token call” |
| I-COST-6 | Formula: Number(((BigInt(p + c) * BigInt(cost_bps)) / 1000n)) / 10000 | cost.test.ts golden-vector cases |
| I-COST-7 | Integer overflow safe: bigint inner math. | cost.test.ts “handles large token counts” (e.g. 10^9 tokens × 1000 bps) |
| I-COST-8 | Result is a JS number, never NaN, never Infinity. Negative tokens or negative bps coerce to 0 (defence in depth). | cost.test.ts “negative tokens returns 0” |
| I-COST-9 | Deterministic: identical (modelId, p, c, snap) → identical USD. | cost.test.ts “identical inputs produce identical output across 100 invocations” |
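The invariants above can be sketched as a single pure function. This is an illustration under stated assumptions, not the shipped cost.ts: ModelId is modelled as a string, ModelCandidate is reduced to the two fields I-COST-1 names, and returning 0 for non-integer inputs is this sketch's own choice (BigInt() throws on fractions, NaN and Infinity).

```typescript
// Assumptions: ModelId is a string; ModelCandidate has at least these two fields.
type ModelId = string;
interface ModelCandidate {
  readonly model_id: ModelId;
  readonly cost_bps_per_kilotoken: number;
}

function computeCostUsd(
  modelId: ModelId,
  promptTokens: number,
  completionTokens: number,
  candidatesSnapshot?: ReadonlyArray<ModelCandidate>,
): number {
  // I-COST-3: no snapshot means no price information.
  if (candidatesSnapshot === undefined) return 0;
  // I-COST-1 / I-COST-2: look up the row for this model.
  const row = candidatesSnapshot.find((m) => m.model_id === modelId);
  if (row === undefined) return 0;

  const tokens = promptTokens + completionTokens;
  const bps = row.cost_bps_per_kilotoken;
  // Sketch-only guard: BigInt() would throw on non-integers, so return 0 instead.
  if (!Number.isSafeInteger(tokens) || !Number.isSafeInteger(bps)) return 0;
  // I-COST-8: negative inputs yield 0.
  if (promptTokens < 0 || completionTokens < 0 || bps < 0) return 0;

  // I-COST-6 / I-COST-7: multiply in bigint space, then scale down.
  // I-COST-4 / I-COST-5 need no special case: a zero factor yields 0.
  const scaled = (BigInt(tokens) * BigInt(bps)) / BigInt(1000);
  return Number(scaled) / 10000;
}

// Illustrative snapshot; the price is made up.
const snapshot: ReadonlyArray<ModelCandidate> = [
  { model_id: "kimi", cost_bps_per_kilotoken: 300 },
];
const golden = computeCostUsd("kimi", 1000, 500, snapshot); // 1500 tokens @ 300 bps
```

Because the bigint division truncates, any call whose cost is below one ten-thousandth of a dollar comes out as 0 under this formula.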
4. Aggregate invariants
| ID | Invariant | Verified by |
|---|---|---|
| I-AGG-1 | calls_total === successes + failures after every call. | cost.test.ts “calls_total accounting” |
| I-AGG-2 | success_rate === successes / calls_total (or 0 when calls_total === 0). | cost.test.ts “success_rate at 80%” |
| I-AGG-3 | avg_cost_usd === Σ(per_call_cost_usd) / successes (or 0 when successes === 0). Failed calls contribute 0 to the sum (since their promptTokens + completionTokens === 0). The mean denominator is successes, not calls_total. | cost.test.ts “avg_cost_usd reflects actual sums” |
| I-AGG-4 | p50_latency_ms is the median of latencies populated so far (or 0 when no calls). For an even count, median = lower-of-two (the n/2-1 index, zero-indexed in sorted order). For odd count, median = middle element. | cost.test.ts “p50 for 5 calls” + “p50 for 6 calls (even)” |
| I-AGG-5 | The latency ring buffer is bounded at ROUTER_LATENCY_RING_SIZE = 1000. After the 1001st call, the oldest latency is overwritten; p50 is over the most recent 1000. | cost.test.ts “ring buffer bound” |
| I-AGG-6 | Per-model isolation: recordRouterCall('claude', …) does not mutate the stats for any other ModelId. | cost.test.ts “kimi stats don’t contaminate claude” |
| I-AGG-7 | resetRouterStats('claude') clears only the 'claude' entry; other models retain their state. | cost.test.ts “single-model reset” |
| I-AGG-8 | resetRouterStats() clears every model’s state. | cost.test.ts “reset all” |
| I-AGG-9 | getRouterStats() returns a frozen { models } object with frozen per-model entries. Mutating the result is a TypeError under strict mode. | cost.test.ts “result is frozen” |
| I-AGG-10 | Failed calls (success: false) contribute to failures and calls_total but NOT to avg_cost_usd (since cost is 0 for failed calls and the denominator is successes). Their latencyMs IS recorded into the ring buffer (so p50 reflects total round-trip behavior including failures). | cost.test.ts “failure-only model has p50 over failures” |
| I-AGG-11 | The aggregate sum-of-costs is stored as bigint (in bps space); the divide to USD happens once at getRouterStats time, not on every recordRouterCall. | cost.test.ts “no float drift across 10000 calls” |
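A rough sketch of one per-model aggregate that satisfies the table above. The class name, its fields, and the costUnits parameter are hypothetical; costUnits is assumed to be the bigint from I-COST-6 before the final divide by 10000, which is one reading of "bps space".

```typescript
const RING_SIZE = 1000; // mirrors ROUTER_LATENCY_RING_SIZE

// Hypothetical internal shape; the contract fixes only the public RouterStats.
class ModelAggregate {
  private successes = 0;
  private failures = 0;
  private costSum = BigInt(0); // I-AGG-11: summed in integer space
  private readonly ring: number[] = [];
  private next = 0; // slot to overwrite once the ring is full

  record(latencyMs: number, success: boolean, costUnits: bigint): void {
    if (success) {
      this.successes += 1;
      this.costSum += costUnits;
    } else {
      this.failures += 1; // I-AGG-10: failures add no cost
    }
    // I-AGG-5: bounded ring; once full, the oldest latency is overwritten.
    if (this.ring.length < RING_SIZE) {
      this.ring.push(latencyMs);
    } else {
      this.ring[this.next] = latencyMs;
      this.next = (this.next + 1) % RING_SIZE;
    }
  }

  snapshot() {
    const calls = this.successes + this.failures; // I-AGG-1
    const sorted = [...this.ring].sort((a, b) => a - b);
    // I-AGG-4: floor((n - 1) / 2) is the middle index for odd n and n/2 - 1 for even n.
    const p50 = sorted.length === 0 ? 0 : sorted[Math.floor((sorted.length - 1) / 2)];
    return Object.freeze({
      calls_total: calls,
      successes: this.successes,
      failures: this.failures,
      // I-AGG-3: one divide at read time; the denominator is successes.
      avg_cost_usd: this.successes === 0 ? 0 : Number(this.costSum) / (10000 * this.successes),
      p50_latency_ms: p50,
      success_rate: calls === 0 ? 0 : this.successes / calls,
    });
  }
}

// Three successes costing 0.10, 0.20 and 0.30 USD, plus one failure.
const agg = new ModelAggregate();
agg.record(10, true, BigInt(1000));
agg.record(20, true, BigInt(2000));
agg.record(30, true, BigInt(3000));
agg.record(40, false, BigInt(0));
const stats = agg.snapshot();
```

Keeping the running sum as a bigint and dividing once at read time avoids accumulating floating-point error across many small additions.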
5. modelsAttempted invariants
| ID | Invariant | Verified by |
|---|---|---|
| I-MA-1 | modelsAttempted is a frozen ReadonlyArray<ModelId>. | fallback.test.ts cost block |
| I-MA-2 | On single-attempt success: modelsAttempted.length === 1 and modelsAttempted[0] === winner. | fallback.test.ts cost block “single-attempt happy path” |
| I-MA-3 | On cascade (A fails, B succeeds): modelsAttempted === [A, B] in chain-walk order. | fallback.test.ts cost block “cascade” |
| I-MA-4 | Order = orderedChain(scores) order (NOT alphabetical, NOT scoring-input order). | fallback.test.ts cost block “order matches chain walk” |
| I-MA-5 | On FallbackChainExhaustedError: modelsAttempted is NOT exposed (no RouteResult is produced). The error’s attempts[] already carries the visit list; modelsAttempted exists only on the success path. | (Verified by RouteResult being unreachable on the failure path.) |
6. routeRequest body invariants (post-P1.5.6)
The P1.5.5 invariants I1–I8 and I10–I19 (from p1-5-5-fallback-cb-contract.md §6) are preserved verbatim. I9 is restated below for completeness; P1.5.6 leaves its statement unchanged:
| ID | P1.5.5 statement | P1.5.6 augmentation |
|---|---|---|
| I9 | getCircuitBreakerState() returns a frozen snapshot. | (Unchanged.) |
The dispatch packet’s explicit allowance overrides the P1.5.5 prohibition “No costUsd / modelsAttempted fields on RouteResult”. That prohibition was a temporal gate (“not in P1.5.5; wait for W5”), and W5 = P1.5.6, which is this slice.
New P1.5.6 invariants on routeRequest:
| ID | Invariant | Verified by |
|---|---|---|
| I-RR-20 | The success path calls recordRouterCall(winner, {…, success: true, candidatesSnapshot: options.candidatesSnapshot}) exactly once before returning. | fallback.test.ts cost block “happy path increments successes” |
| I-RR-21 | Every adapter-call failure (caught in the catch (err) block) calls recordRouterCall(modelId, {…, success: false}) exactly once. | fallback.test.ts cost block “cascade failure increments failures” |
| I-RR-22 | CircuitOpenError and NoAdapterError pre-flight bailouts do NOT call recordRouterCall. | fallback.test.ts cost block “circuit-open does not record” |
| I-RR-23 | The successful RouteResult.costUsd is computeCostUsd(winner, upstream.promptTokens, upstream.completionTokens, options.candidatesSnapshot). | fallback.test.ts cost block “costUsd reflects token math” |
| I-RR-24 | RouteResult.modelsAttempted lists every modelId the chain walk visited, in walk order, including the winner. Length matches attempts.length + 1 on success. | fallback.test.ts cost block “modelsAttempted lists chain walk” |
| I-RR-25 | RouteResult remains frozen (I16 preserved); the new fields are read-only. | fallback.test.ts cost block “RouteResult is frozen” |
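To make I-RR-20, I-RR-21 and I-RR-24 concrete, here is a simplified chain walk. It is a sketch under assumptions: adapters are synchronous (the real ones are awaited), the recorded array stands in for recordRouterCall, a plain Error stands in for FallbackChainExhaustedError, and circuit-breaker and cost handling are left out. walkChain and ChainEntry are hypothetical names.

```typescript
type ModelId = string;
interface Upstream { content: string; promptTokens: number; completionTokens: number }
// Simplification: synchronous adapters. The real adapters are async.
interface ChainEntry { readonly modelId: ModelId; readonly call: (prompt: string) => Upstream }

// Stand-in for the side effect of recordRouterCall.
const recorded: Array<{ modelId: ModelId; success: boolean }> = [];

function walkChain(chain: ReadonlyArray<ChainEntry>, prompt: string) {
  const attempts: Array<{ model: ModelId; error: string }> = [];
  for (const { modelId, call } of chain) {
    try {
      const upstream = call(prompt);
      recorded.push({ modelId, success: true }); // I-RR-20: once, before returning
      return Object.freeze({
        model: modelId,
        content: upstream.content,
        // I-RR-24: failed visits in walk order, then the winner.
        modelsAttempted: Object.freeze([...attempts.map((a) => a.model), modelId]),
      });
    } catch (err) {
      recorded.push({ modelId, success: false }); // I-RR-21: once per failed attempt
      attempts.push({ model: modelId, error: String(err) });
    }
  }
  // Stand-in for FallbackChainExhaustedError: no result, so no modelsAttempted.
  throw new Error("chain exhausted after " + attempts.length + " attempts");
}

// Cascade: the first model fails, the second succeeds.
const routed = walkChain(
  [
    { modelId: "claude", call: () => { throw new Error("upstream 500"); } },
    { modelId: "kimi", call: () => ({ content: "ok", promptTokens: 10, completionTokens: 5 }) },
  ],
  "hello",
);
```

On this cascade the sketch records one failure and one success, and modelsAttempted has attempts.length + 1 entries, matching the length rule in I-RR-24.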
7. Failure-latency measurement
For failed attempts (the catch block in routeRequest):
const attemptStart = (options.nowFn ?? Date.now)();
try {
const upstream = await raceWithTimeout(adapter(prompt, upstreamOptions), modelId, timeoutMs);
...
} catch (err) {
recordFailure(modelId, nowOpts);
const measuredMs = (options.nowFn ?? Date.now)() - attemptStart;
recordRouterCall(modelId, {
promptTokens: 0,
completionTokens: 0,
latencyMs: measuredMs,
success: false,
candidatesSnapshot: options.candidatesSnapshot,
});
attempts.push(Object.freeze({ model: modelId, error: normaliseError(err) }));
}
- attemptStart is captured inside the for loop, before the adapter call.
- measuredMs flows through the same nowFn injection seam that drives the circuit breaker, so tests using fake clocks see consistent timing.
- The failure path measures the time spent on the failed call directly. upstream.latencyMs cannot be used here because no upstream value exists when the adapter call throws.
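The injection seam described above can be exercised with a fake clock along these lines. makeFakeClock and measureFailedAttempt are hypothetical helpers written for this illustration, and the adapter is synchronous for brevity:

```typescript
// Hypothetical fake clock: time moves only when advance() is called.
function makeFakeClock(startMs: number) {
  let now = startMs;
  return {
    nowFn: (): number => now,
    advance: (ms: number): void => {
      now += ms;
    },
  };
}

// Mirrors the catch-block timing: read the clock before the call and again
// after it throws. Falls back to Date.now when no clock is injected.
function measureFailedAttempt(
  call: () => void,
  nowFn: () => number = Date.now,
): number | undefined {
  const attemptStart = nowFn();
  try {
    call();
    return undefined; // success path is outside the scope of this sketch
  } catch {
    return nowFn() - attemptStart;
  }
}

const clock = makeFakeClock(1000);
const measuredMs = measureFailedAttempt(() => {
  clock.advance(250); // pretend the adapter spent 250 ms before failing
  throw new Error("upstream timeout");
}, clock.nowFn);
```

Because both reads go through the same nowFn, a test can assert an exact latency instead of tolerating wall-clock jitter.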
8. Snapshot-flow contract
computeCostUsd and recordRouterCall accept candidatesSnapshot as a function argument. They DO NOT read from process / DB / global state. The snapshot flows from:
- The MCP tool layer (P1.5.7) → reads mcp_model_candidates once per request → injects into RouteOptions.candidatesSnapshot.
- routeRequest → forwards options.candidatesSnapshot to recordRouterCall on each call.
- recordRouterCall → forwards to computeCostUsd for the cost lookup.
This preserves the existing P1.5.1 contract (“scoring is pure — DB read happens at the call site, not in the module”). The cost module follows the same pattern.
If options.candidatesSnapshot is omitted (no MCP tool layer, no test injection), cost lookup returns 0 and aggregates count the call but record 0 USD. This matches the dispatch packet’s behavior for the Phase-0-compat 'claude' ModelId path.
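The flow in this section can be pictured with three stand-in functions. This is a deliberately reduced sketch: costFor, record and route are hypothetical substitutes for computeCostUsd, recordRouterCall and routeRequest, the cost math uses plain numbers instead of bigint, and the snapshot literal replaces the real mcp_model_candidates read.

```typescript
type ModelId = string;
interface ModelCandidate {
  readonly model_id: ModelId;
  readonly cost_bps_per_kilotoken: number;
}
interface RouteOptions {
  readonly candidatesSnapshot?: ReadonlyArray<ModelCandidate>;
}

// Stand-in for computeCostUsd: pure, priced only from the argument it is given.
function costFor(modelId: ModelId, tokens: number, snap?: ReadonlyArray<ModelCandidate>): number {
  const row = snap?.find((m) => m.model_id === modelId);
  return row === undefined ? 0 : (tokens * row.cost_bps_per_kilotoken) / 1000 / 10000;
}

const totals = { calls: 0, usd: 0 };

// Stand-in for recordRouterCall: receives the snapshot, never fetches it.
function record(modelId: ModelId, tokens: number, snap?: ReadonlyArray<ModelCandidate>): void {
  totals.calls += 1;
  totals.usd += costFor(modelId, tokens, snap);
}

// Stand-in for routeRequest: forwards whatever the caller injected.
function route(modelId: ModelId, tokens: number, options: RouteOptions = {}): void {
  record(modelId, tokens, options.candidatesSnapshot);
}

// The caller owns the read; here a literal replaces the database query.
const injected: ReadonlyArray<ModelCandidate> = [
  { model_id: "kimi", cost_bps_per_kilotoken: 300 },
];
route("kimi", 1500, { candidatesSnapshot: injected }); // priced from the snapshot
route("claude", 1500); // no snapshot: the call is counted at 0 USD
```

After these two calls the aggregate has counted both calls but accumulated cost only for the one that carried a snapshot.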
9. index.ts barrel addition
// src/domains/router/index.ts
export * from './scoring.js';
export * from './fallback.js';
export * from './adapters/codex.js';
export * from './adapters/kimi.js';
export * from './adapters/openai.js';
export * from './cost.js';
The barrel re-export is unconditional. Downstream tests in __tests__/domains/router/cost.test.ts import from '../../../domains/router/cost.js' directly; the barrel surfaces the symbols for any caller importing from '../router/index.js'.
10. Test plan (deliverables)
cost.test.ts
- Golden-vector computeCostUsd:
  - 1500 tokens @ 300 bps → 0.0450 USD
  - 0 tokens → 0
  - Missing snapshot → 0
  - Missing modelId in snapshot → 0
  - 1_000_000 tokens @ 50 bps → 5.0 USD (per I-COST-6: 1_000_000 × 50 / 1000 / 10000)
  - Negative tokens → 0
- Ring-buffer bound: 1500 calls → p50 over last 1000.
- Success rate: 80 successes + 20 failures → 0.8.
- avg_cost_usd: 3 successful calls with costs 0.10, 0.20, 0.30 → avg 0.20.
- Per-model isolation: kimi calls don’t affect claude stats.
- resetRouterStats('claude') clears claude only.
- resetRouterStats() clears everything.
- p50 odd / even count cases.
- Frozen result.
- Failure-only model has nonzero failures, zero successes, zero avg_cost_usd.
fallback.test.ts (new describe block: “RouteResult cost + modelsAttempted”)
- Happy path: costUsd set, modelsAttempted === [winner].
- Cascade: costUsd reflects winner cost, modelsAttempted === [A, B].
- Failed-only path: FallbackChainExhaustedError thrown; modelsAttempted never exposed (success path only).
- RouteResult frozen.
11. Out-of-scope per dispatch
- MCP tool registration (router_stats) → P1.5.7.
- ζ decision-trail recording of cost → P1.5.10.
- Cost parity tests across arbiters → P1.5.8.
- DB persistence of stats → Phase 2+.
- fallbackDepth field on RouteResult → not in dispatch (deferred).
12. Risk surfaces (carried from audit §12)
The audit’s §12 risks are mitigated as described there. Additional contract-level guard: the aggregates module exports nothing that lets a caller poke at internal mutable state. The only mutation entry point is recordRouterCall; the only read entry point is getRouterStats (which deep-copies/freezes). External assertions against mutability rely on Object.isFrozen checks.
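One way to picture the copy-then-freeze read path and the Object.isFrozen assertions is the sketch below. The internal record, Counters, getSnapshot and recordSuccess are hypothetical names with a reduced field set; the real module exposes getRouterStats and recordRouterCall.

```typescript
type ModelId = string;
interface Counters { successes: number; failures: number }

// Hypothetical module-private state; callers never receive a reference to it.
const internal: Record<ModelId, Counters> = {
  claude: { successes: 8, failures: 2 },
};

// Stand-in for the single mutation entry point.
function recordSuccess(id: ModelId): void {
  const counters = internal[id];
  if (counters !== undefined) counters.successes += 1;
}

// Stand-in for the single read entry point: copy each entry, then freeze
// the copies and both containing objects.
function getSnapshot() {
  const models: Record<ModelId, Readonly<Counters>> = {};
  Object.keys(internal).forEach((id) => {
    const counters = internal[id];
    if (counters === undefined) return;
    models[id] = Object.freeze({ successes: counters.successes, failures: counters.failures });
  });
  return Object.freeze({ models: Object.freeze(models) });
}

const snap = getSnapshot();
const entry = snap.models["claude"];
const allFrozen =
  Object.isFrozen(snap) &&
  Object.isFrozen(snap.models) &&
  entry !== undefined &&
  Object.isFrozen(entry);

// Later mutations go through the entry point and do not leak into old snapshots.
recordSuccess("claude");
const successesInOldSnapshot = entry === undefined ? -1 : entry.successes;
```

In strict mode, which ES modules always use, assigning to a property of a frozen object throws a TypeError; the sketch checks Object.isFrozen instead so that it behaves the same in either mode.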