P1.5.6 — Cost Accounting — Verification
Round: R92 Wave 5 of 7
Branch: feature/p1-5-6-cost
Base: origin/main @ c284ad22
Audit: p1-5-6-cost-audit.md
Contract: p1-5-6-cost-contract.md
Packet: p1-5-6-cost-packet.md
1. Gates
| Gate | Command | Result |
|---|---|---|
| Build | npm run build | PASS: tsc compiles clean; the copy-migrations.mjs post-step is also clean. |
| Lint | npm run lint | PASS: zero ESLint findings. |
| Tests | npm test | PASS: 3325 / 3325 across 76 suites. |
Baseline on main c284ad22 = 3275 tests. Final = 3325. Delta: +50 tests across two files:
- src/__tests__/domains/router/cost.test.ts (new): 41 tests.
- src/__tests__/domains/router/fallback.test.ts (extended): +9 tests (53 → 62).

Zero regressions in pre-existing P1.5.5 fallback / circuit / scoring / adapter coverage.
2. Worked cost example (real numbers from the suite)
The “cascade: costUsd reflects winner row, not failed-attempt row” test (fallback.test.ts:1198) exercises the live formula end-to-end:
Setup:
scoring = forced to gpt-4o (score 0.9) → claude (score 0.4)
gpt-4o = fails (Error: "gpt-4o down")
claude = succeeds with 1000 prompt tokens + 500 completion tokens
snapshot = [
{ model_id: 'claude', cost_bps_per_kilotoken: 300, ... },
{ model_id: 'gpt-4o', cost_bps_per_kilotoken: 250, ... },
]
Chain walk:
attempt 1: gpt-4o → adapter call throws → recordRouterCall('gpt-4o', {success:false, ...})
attempt 2: claude → adapter returns {promptTokens:1000, completionTokens:500} → success
Cost computation for the winning claude attempt:
totalTokens = 1000 + 500 = 1500
bps_per_kilo = 300 (claude row from snapshot)
inner_bps_int = BigInt(1500) * BigInt(300) / 1000n
= 450_000n / 1000n
= 450n (bps, integer-bigint)
costUsd = Number(450n) / 10000
= 0.045 USD
Returned RouteResult fields (new in P1.5.6):
costUsd = 0.045
modelsAttempted = ['gpt-4o', 'claude'] (chain-walk order)
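The chain walk and winner-row cost above can be sketched as follows. This is a minimal illustration, not the code in fallback.ts: the names walkChain, costUsdFor, SnapshotRow, and the synchronous Adapter type are hypothetical, and the real router presumably awaits an async adapter and records aggregates on each attempt.

```typescript
// Hypothetical shapes for illustration; the real types live in the router module.
type ModelId = string;
interface SnapshotRow { model_id: ModelId; cost_bps_per_kilotoken: number }
interface AdapterResult { promptTokens: number; completionTokens: number }
type Adapter = (modelId: ModelId) => AdapterResult; // synchronous stand-in for an adapter call

// Integer-bigint cost: tokens * bps-per-kilotoken / 1000, converted to USD at the edge.
function costUsdFor(totalTokens: number, bpsPerKilo: number): number {
  const bps = (BigInt(totalTokens) * BigInt(bpsPerKilo)) / BigInt(1000);
  return Number(bps) / 10000;
}

function walkChain(chain: ModelId[], snapshot: SnapshotRow[], call: Adapter) {
  const modelsAttempted: ModelId[] = [];
  for (const modelId of chain) {
    modelsAttempted.push(modelId); // recorded in chain-walk order, failures included
    try {
      const r = call(modelId);
      // Cost is read from the winner's snapshot row, never from a failed attempt's row.
      const row = snapshot.find((s) => s.model_id === modelId);
      const costUsd = row
        ? costUsdFor(r.promptTokens + r.completionTokens, row.cost_bps_per_kilotoken)
        : 0;
      return Object.freeze({
        model: modelId,
        costUsd,
        modelsAttempted: Object.freeze(modelsAttempted),
      });
    } catch {
      // Failed attempt: fall through to the next model in the chain.
    }
  }
  throw new Error("fallback chain exhausted");
}

const snapshot: SnapshotRow[] = [
  { model_id: "claude", cost_bps_per_kilotoken: 300 },
  { model_id: "gpt-4o", cost_bps_per_kilotoken: 250 },
];

const result = walkChain(["gpt-4o", "claude"], snapshot, (id) => {
  if (id === "gpt-4o") throw new Error("gpt-4o down");
  return { promptTokens: 1000, completionTokens: 500 };
});

console.log(result.costUsd, result.modelsAttempted);
```

With the setup from the test, the sketch yields costUsd 0.045 (claude's 300 bps row, not gpt-4o's 250) and modelsAttempted in the order gpt-4o, claude.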
A second worked example from the same suite (cost.test.ts:107):
1500 tokens @ 300 bps/kilotoken (Claude Sonnet seed value):
(1500 * 300) / 1000 = 450 bps → 0.045 USD per call
10000 such calls aggregated:
total_cost_bps_int = 4_500_000n
avg_cost_usd = Number(4_500_000n) / 10000 / 10000
= 450 / 10000
= 0.045 USD (no float drift; verified via toBeCloseTo(0.045, 10))
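To illustrate why the aggregate stays exact, the sketch below accumulates per-call cost as a bigint and converts to USD only once, at the read edge. Variable names are illustrative; only the 450 bps per-call value and the two divisors come from the example above. A naive float sum is kept alongside purely for comparison, and its exact value is not asserted here.

```typescript
const CALLS = 10000;

// Per-call cost in basis points: (1500 tokens * 300 bps/kilotoken) / 1000 = 450 bps.
const perCallBps = (BigInt(1500) * BigInt(300)) / BigInt(1000);

let totalCostBpsInt = BigInt(0); // exact integer accumulator
let naiveFloatSumUsd = 0;        // comparison only: repeated float addition

for (let i = 0; i < CALLS; i++) {
  totalCostBpsInt += perCallBps;
  naiveFloatSumUsd += 0.045;
}

// Single conversion at the read edge: bps -> USD (/ 10000), then average over calls.
const avgCostUsd = Number(totalCostBpsInt) / 10000 / CALLS;
const naiveAvgUsd = naiveFloatSumUsd / CALLS;

console.log({ totalCostBpsInt: totalCostBpsInt.toString(), avgCostUsd, naiveAvgUsd });
```

The bigint path performs 10000 exact integer additions and two float divisions in total, so rounding can only enter at the final conversion.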
A third worked example showing the cost-tier difference:
1500 tokens @ 15 bps/kilotoken (gpt-4o-mini seed value):
(1500 * 15) / 1000 = integer-bigint: 22500n / 1000n = 22n bps
costUsd = Number(22n) / 10000
= 0.0022 USD
Note: bigint integer division truncates, so 22.5 bps becomes 22 bps. The
contract documents this as the expected behavior; the presentation
layer rounds to 2 decimals if needed (P1.5.7's router_stats).
3. Invariant verification matrix
3.1 Cost-computation invariants (contract §3)
| ID | Statement | Test |
|---|---|---|
| I-COST-1 | Reads snap row for modelId. | cost.test.ts “1500 tokens at 300 bps/kilotoken → 0.045 USD” (verifies the lookup). |
| I-COST-2 | Missing modelId → 0. | cost.test.ts “missing modelId in snapshot → 0 USD”. |
| I-COST-3 | Missing snapshot → 0. | cost.test.ts “snapshot omitted → 0 USD”. |
| I-COST-4 | Zero-cost row → 0. | cost.test.ts “zero-cost row (free tier) → 0 USD”. |
| I-COST-5 | Zero tokens → 0. | cost.test.ts “0 prompt + 0 completion → 0 USD”. |
| I-COST-6 | Formula. | cost.test.ts golden-vector cases × 4. |
| I-COST-7 | Bigint overflow safety. | cost.test.ts “handles 10^9 tokens × 1000 bps”. |
| I-COST-8 | Never NaN/Infinity; negative tokens → 0. | cost.test.ts “result is never NaN nor Infinity” + “negative tokens → 0 USD”. |
| I-COST-9 | Deterministic. | cost.test.ts “deterministic — identical input → identical output across 100 invocations”. |
All 9 cost-computation invariants verified.
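The guard behavior in I-COST-1 through I-COST-8 can be sketched in one function. The report only shows the call as computeCostUsd(winner, ...), so the parameter list here, the SnapshotRow type, and the name computeCostUsdSketch are assumptions, as is the choice to reject negative prompt or completion counts individually.

```typescript
type ModelId = string;
interface SnapshotRow { model_id: ModelId; cost_bps_per_kilotoken: number }

// Hypothetical signature; the real computeCostUsd arguments are not shown in this report.
function computeCostUsdSketch(
  modelId: ModelId,
  promptTokens: number,
  completionTokens: number,
  snapshot?: ReadonlyArray<SnapshotRow>,
): number {
  if (!snapshot) return 0;                                   // I-COST-3: missing snapshot
  const row = snapshot.find((r) => r.model_id === modelId);  // I-COST-1: look up the model's row
  if (!row) return 0;                                        // I-COST-2: missing modelId
  if (promptTokens < 0 || completionTokens < 0) return 0;    // I-COST-8: negative tokens
  const totalTokens = promptTokens + completionTokens;
  const rate = row.cost_bps_per_kilotoken;
  // BigInt() throws on non-integers, so NaN, Infinity, and fractions are rejected up front.
  if (!Number.isSafeInteger(totalTokens) || !Number.isSafeInteger(rate)) return 0;
  if (totalTokens === 0 || rate <= 0) return 0;              // I-COST-4 / I-COST-5
  // I-COST-6: integer formula; I-COST-7: the product is computed in bigint, not float.
  const bps = (BigInt(totalTokens) * BigInt(rate)) / BigInt(1000);
  return Number(bps) / 10000;
}

const seed: SnapshotRow[] = [
  { model_id: "claude", cost_bps_per_kilotoken: 300 },
  { model_id: "gpt-4o-mini", cost_bps_per_kilotoken: 15 },
];

console.log(computeCostUsdSketch("claude", 1000, 500, seed));      // 1500 tokens at 300 bps
console.log(computeCostUsdSketch("gpt-4o-mini", 1000, 500, seed)); // 1500 tokens at 15 bps
console.log(computeCostUsdSketch("unknown", 1000, 500, seed));     // model missing from snapshot
```

Because every early return yields 0 and the only arithmetic on the float side is a single division of a finite integer, the result cannot be NaN or Infinity under these assumptions.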
3.2 Aggregate invariants (contract §4)
| ID | Statement | Test |
|---|---|---|
| I-AGG-1 | calls_total === successes + failures. | cost.test.ts “calls_total === successes + failures (8 + 2)”. |
| I-AGG-2 | success_rate === successes / calls_total. | cost.test.ts “success_rate at 80%”. |
| I-AGG-3 | avg_cost_usd === Σ(cost) / successes. | cost.test.ts “avg_cost_usd reflects actual USD sum” + “avg_cost_usd computed over successes only”. |
| I-AGG-4 | p50 median; lower-of-two for even. | cost.test.ts “p50 over 5 calls (odd)” + “p50 over 6 calls (even)”. |
| I-AGG-5 | Ring buffer bounded at 1000. | cost.test.ts “ring buffer bound: 1500 calls → p50 over the last 1000” + “ring buffer respects ROUTER_LATENCY_RING_SIZE exactly”. |
| I-AGG-6 | Per-model isolation. | cost.test.ts “kimi stats do not contaminate claude stats” + “different models scored independently”. |
| I-AGG-7 | resetRouterStats('claude') clears one model. | cost.test.ts “resetRouterStats(modelId) clears only that model”. |
| I-AGG-8 | resetRouterStats() clears all. | cost.test.ts “resetRouterStats() clears all models”. |
| I-AGG-9 | Result is frozen. | cost.test.ts “outer result is frozen” + “models object is frozen” + “per-model entries are frozen” + “attempting to mutate the result is a TypeError”. |
| I-AGG-10 | Failures contribute latency but not cost. | cost.test.ts “failure-only model has p50 over failures” + “avg_cost_usd === 0 when no successes”. |
| I-AGG-11 | No float drift across 10000 calls. | cost.test.ts “10000 successful calls accumulate without float drift”. |
All 11 aggregate invariants verified.
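The aggregate rules in the table can be illustrated with a small in-memory sketch. It is not the cost.ts implementation: record, stats, and the Agg shape are hypothetical names, and the bounded buffer uses Array.shift for brevity where a real ring buffer would more likely overwrite by index.

```typescript
const RING_SIZE = 1000; // mirrors ROUTER_LATENCY_RING_SIZE from the report

interface Agg { successes: number; failures: number; totalCostBpsInt: bigint; latencies: number[] }
const aggs = new Map<string, Agg>(); // one entry per model keeps stats isolated (I-AGG-6)

function record(modelId: string, call: { success: boolean; latencyMs: number; costBps?: bigint }): void {
  let a = aggs.get(modelId);
  if (!a) {
    a = { successes: 0, failures: 0, totalCostBpsInt: BigInt(0), latencies: [] };
    aggs.set(modelId, a);
  }
  if (call.success) {
    a.successes++;
    a.totalCostBpsInt += call.costBps ?? BigInt(0); // cost accrues on successes only (I-AGG-10)
  } else {
    a.failures++;
  }
  a.latencies.push(call.latencyMs);                          // failures still contribute latency
  if (a.latencies.length > RING_SIZE) a.latencies.shift();   // drop the oldest sample (I-AGG-5)
}

function stats(modelId: string) {
  const a = aggs.get(modelId);
  if (!a) return undefined;
  const callsTotal = a.successes + a.failures;               // I-AGG-1
  const sorted = [...a.latencies].sort((x, y) => x - y);
  // Lower-of-two median: index floor((n - 1) / 2) picks the lower middle for even n (I-AGG-4).
  const p50 = sorted.length > 0 ? sorted[Math.floor((sorted.length - 1) / 2)] : 0;
  return Object.freeze({                                     // I-AGG-9
    calls_total: callsTotal,
    successes: a.successes,
    failures: a.failures,
    success_rate: callsTotal > 0 ? a.successes / callsTotal : 0,
    avg_cost_usd: a.successes > 0 ? Number(a.totalCostBpsInt) / 10000 / a.successes : 0,
    p50_latency_ms: p50,
  });
}

// 8 successes at 450 bps plus 2 failures for one model; 6 latency samples for another.
for (let i = 0; i < 8; i++) record("claude", { success: true, latencyMs: 100, costBps: BigInt(450) });
for (let i = 0; i < 2; i++) record("claude", { success: false, latencyMs: 900 });
[10, 20, 30, 40, 50, 60].forEach((ms) => record("kimi", { success: false, latencyMs: ms }));

console.log(stats("claude"), stats("kimi"));
```

In this run, claude reports 10 calls at an 80% success rate with the average cost taken over the 8 successes, and kimi's even-length sample resolves to the lower middle value.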
3.3 modelsAttempted invariants (contract §5)
| ID | Statement | Test |
|---|---|---|
| I-MA-1 | Frozen ReadonlyArray<ModelId>. | fallback.test.ts “RouteResult costUsd + modelsAttempted are frozen”. |
| I-MA-2 | Single-attempt success → [winner]. | fallback.test.ts “happy path: modelsAttempted === [winner]”. |
| I-MA-3 | Cascade → [A, B] in chain order. | fallback.test.ts “cascade: modelsAttempted lists both attempts in chain order”. |
| I-MA-4 | Chain-walk order. | (Implicit in I-MA-3.) |
| I-MA-5 | Not exposed on FallbackChainExhaustedError. | (Verified by RouteResult type: the error path returns no result.) |
All 5 modelsAttempted invariants verified.
3.4 routeRequest-body invariants (contract §6, P1.5.5 I1–I19 carry-over + P1.5.6 additions)
P1.5.5 invariants I1–I19 untouched. New P1.5.6 invariants:
| ID | Statement | Test |
|---|---|---|
| I-RR-20 | Success → recordRouterCall(winner, {success: true}) once. | fallback.test.ts “successful call increments getRouterStats successes”. |
| I-RR-21 | Failure → recordRouterCall(modelId, {success: false}) once. | fallback.test.ts “failed cascade increments getRouterStats failures”. |
| I-RR-22 | CB-open / NoAdapter → no recordRouterCall. | fallback.test.ts “circuit-open skip does NOT contribute to modelsAttempted” (verifies modelsAttempted exclusion, which is the more comprehensive invariant: if the model contributed to modelsAttempted it would also be in the aggregates). |
| I-RR-23 | costUsd === computeCostUsd(winner, ...). | fallback.test.ts “happy path: RouteResult.costUsd set from candidate snapshot” + “cascade: costUsd reflects winner row, not failed-attempt row”. |
| I-RR-24 | modelsAttempted lists chain walk. | fallback.test.ts “cascade: modelsAttempted lists both attempts in chain order”. |
| I-RR-25 | RouteResult remains frozen incl. new fields. | fallback.test.ts “RouteResult costUsd + modelsAttempted are frozen”. |
All 6 new routeRequest-body invariants verified.
4. Append-only verification (Phase 0 compatibility)
The “Phase 0 callers destructuring only the original fields still compile” test (fallback.test.ts:1252) is a runtime + compile-time check that the original Phase 0 destructuring pattern continues to work without modification:
const { model, content, finishReason, promptTokens, completionTokens, latencyMs } = result;
TypeScript accepts the extra costUsd + modelsAttempted fields on the source object (structural typing) and the runtime binding is correct. Zero existing fields removed or renamed. Type widening: none.
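The append-only property can be seen in a short sketch. The six original field names and the two new ones come from this report; the field types, the Phase0RouteResult interface name, and the sample values are assumptions made for illustration.

```typescript
// Hypothetical field types; only the field names are taken from the report.
interface Phase0RouteResult {
  model: string;
  content: string;
  finishReason: string;
  promptTokens: number;
  completionTokens: number;
  latencyMs: number;
}

// P1.5.6 shape: the original six fields plus two appended ones.
interface RouteResult extends Phase0RouteResult {
  readonly costUsd: number;
  readonly modelsAttempted: ReadonlyArray<string>;
}

const result: RouteResult = Object.freeze({
  model: "claude",
  content: "hello",
  finishReason: "stop",
  promptTokens: 1000,
  completionTokens: 500,
  latencyMs: 42,
  costUsd: 0.045,
  modelsAttempted: Object.freeze(["gpt-4o", "claude"]),
});

// Phase 0 call sites destructure only the original fields; the extra fields are ignored.
const { model, content, finishReason, promptTokens, completionTokens, latencyMs } = result;

// A function typed against the Phase 0 shape accepts the wider object (structural typing).
function summarize(r: Phase0RouteResult): string {
  return `${r.model}:${r.promptTokens + r.completionTokens}:${r.finishReason}`;
}

console.log(summarize(result), content, latencyMs, model, finishReason, promptTokens, completionTokens);
```

Since RouteResult only adds members, any value of the new type remains assignable wherever the Phase 0 shape was expected.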
5. Test-count delta evidence
Base (origin/main @ c284ad22) : 3275 tests
Post P1.5.6 implementation (this branch) : 3325 tests
Delta : +50 tests
Per-file breakdown:
cost.test.ts (new) : +41
fallback.test.ts (cost block appended) : +9
All other tests : 0 changes
6. File-by-file change summary
| File | Net LOC | Description |
|---|---|---|
| src/domains/router/cost.ts (new) | +371 | Per-call cost calc + in-memory aggregates + ring buffer. |
| src/__tests__/domains/router/cost.test.ts (new) | +461 | 41 golden-vector + aggregate + ring-buffer + reset + freeze tests. |
| src/domains/router/fallback.ts (modify) | +73 / -12 | Append costUsd + modelsAttempted to RouteResult; wire recordRouterCall into success / failure paths; track modelsAttempted across walk. |
| src/__tests__/domains/router/fallback.test.ts (modify) | +220 / 0 | Cost + modelsAttempted describe block (9 tests). |
| src/domains/router/index.ts (modify) | +9 / -2 | Barrel re-export of ./cost.js + header comment update. |
| Total | +1134 / -14 | Sum of the per-file rows above. |
7. Forbidden checks
| Forbidden | Status |
|---|---|
| Floating-point accumulation | OK: all per-call sums are bigint (total_cost_bps_int); a single Number / 10000 / successes conversion at the read edge in getRouterStats. |
| Unbounded memory growth | OK: ring buffer fixed at ROUTER_LATENCY_RING_SIZE = 1000 per model; verified by “ring buffer respects ROUTER_LATENCY_RING_SIZE exactly”. |
| AMS_* env vars | OK: module reads no environment variables. |
| MCP tool registration | OK: src/server.ts untouched; no new tools registered. |
| RouteResult field type changes | OK: existing 6 fields preserved byte-identically; only 2 new fields appended. |
| Main-checkout edits | OK: all work in .worktrees/claude/p1-5-6-cost. |
| routeRequest signature break | OK: signature byte-identical; only the result-object shape grows additively. |
8. Pre-flight readings consulted
- CLAUDE.md (project + worktree copy)
- src/domains/router/fallback.ts (P1.5.5 RouteResult shape)
- src/domains/router/cost.ts (new module, this slice)
- src/domains/router/circuit.ts (P1.5.5 CB module; referenced for aggregate-exclusion invariant)
- src/domains/router/scoring.ts (P1.5.1: ModelCandidate type, ModelId union, cost_bps_per_kilotoken field)
- src/domains/integrations/claude.ts (W3: CompletionResult shape supplying promptTokens + completionTokens)
- src/db/migrations/009_model_candidates.sql (P1.5.9: cost_bps_per_kilotoken column + seed values)
- docs/3-world/social/llm.md §Candidate table
- docs/architecture/decisions/ADR-005-multi-model-defer.md §Implementation step 4
- docs/guides/implementation/task-prompts/p1.5-delta-router-graduation.md §P1.5.6
- docs/contracts/p1-5-5-fallback-cb-contract.md §6 (invariant carry-over)

9. Deviations from staging-file slice section
None of substance. Minor refinements:
- The staging file’s prompt section called for RouterStats = { calls_total, successes, failures, avg_cost_usd, p50_latency_ms, success_rate }. Implementation matches byte-for-byte.
- The dispatch packet hinted at success_rate: successes / (successes + failures). Implementation uses successes / calls_total, which is mathematically equivalent (calls_total === successes + failures per I-AGG-1) but more robust against future record types.
- The “exact 1 × recordRouterCall per successful adapter call” invariant (contract I-RR-20) is verified via the aggregate counts, not via a spy. The spy-pattern path would require mocking the cost module, which adds complexity for no additional confidence (the in-memory aggregates ARE the spy).

10. Confirmation: P1.5.5 fallback tests still pass
The pre-existing P1.5.5 fallback test suite (53 tests: all 17 of the original Phase-0 tests plus the cascade / CB / timeout / shape-flip coverage) continues to pass with no changes to its assertions. The new “cost + modelsAttempted” describe block is appended at the end of the file; no existing assertion was modified. Verified by running the suite in isolation: 62 / 62 pass (53 pre-existing + 9 new).
11. Gate-tail evidence (most-recent run)
Test Suites: 76 passed, 76 total
Tests: 3325 passed, 3325 total
Time: 34.864 s
12. Out-of-scope reminder (carried forward)
- router_stats MCP tool → P1.5.7.
- Cost parity tests across arbiters → P1.5.8.
- ζ Decision Trail recording of cost → P1.5.10.
- DB persistence of stats → Phase 2+.
- fallbackDepth on RouteResult → not in this dispatch.