P1.5.6 — Cost Accounting — Verification

Round: R92 Wave 5 of 7
Branch: feature/p1-5-6-cost
Base: origin/main @ c284ad22
Audit: p1-5-6-cost-audit.md
Contract: p1-5-6-cost-contract.md
Packet: p1-5-6-cost-packet.md

1. Gates

| Gate | Command | Result |
| --- | --- | --- |
| Build | npm run build | PASS: tsc compiles clean; the copy-migrations.mjs post-step is also clean. |
| Lint | npm run lint | PASS: zero ESLint findings. |
| Tests | npm test | PASS: 3325 / 3325 across 76 suites. |

Baseline on main c284ad22 = 3275 tests. Final = 3325. Delta: +50 tests across two files:

  • src/__tests__/domains/router/cost.test.ts (new): 41 tests.
  • src/__tests__/domains/router/fallback.test.ts (extended): +9 tests (53 → 62).

Zero regressions in pre-existing P1.5.5 fallback / circuit / scoring / adapter coverage.

2. Worked cost example (real numbers from the suite)

The “cascade: costUsd reflects winner row, not failed-attempt row” test (fallback.test.ts:1198) exercises the live formula end-to-end:

Setup:
  scoring  = forced to gpt-4o (score 0.9) → claude (score 0.4)
  gpt-4o   = fails (Error: "gpt-4o down")
  claude   = succeeds with 1000 prompt tokens + 500 completion tokens
  snapshot = [
    { model_id: 'claude',  cost_bps_per_kilotoken: 300, ... },
    { model_id: 'gpt-4o',  cost_bps_per_kilotoken: 250, ... },
  ]

Chain walk:
  attempt 1: gpt-4o → adapter call throws → recordRouterCall('gpt-4o', {success:false, ...})
  attempt 2: claude → adapter returns {promptTokens:1000, completionTokens:500} → success

Cost computation for the winning claude attempt:
  totalTokens   = 1000 + 500           = 1500
  bps_per_kilo  = 300                  (claude row from snapshot)
  inner_bps_int = BigInt(1500) * BigInt(300) / 1000n
               = 450_000n / 1000n
               = 450n                  (bps, integer-bigint)
  costUsd       = Number(450n) / 10000
               = 0.045 USD

Returned RouteResult fields (new in P1.5.6):
  costUsd          = 0.045
  modelsAttempted  = ['gpt-4o', 'claude']      (chain-walk order)
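
The arithmetic above can be reproduced with a minimal sketch of the per-call formula. The function name computeCostUsd appears in the contract invariants, but the signature, the SnapshotRow shape, and the guard order below are assumptions made for illustration; the real module in src/domains/router/cost.ts may differ.

```typescript
// Hypothetical row shape: only the two fields the worked example uses.
interface SnapshotRow {
  model_id: string;
  cost_bps_per_kilotoken: number;
}

// Sketch of the per-call formula: integer bigint arithmetic in bps,
// converted to a USD float exactly once, at the end.
function computeCostUsd(
  modelId: string,
  promptTokens: number,
  completionTokens: number,
  snapshot?: readonly SnapshotRow[],
): number {
  const row = snapshot?.find((r) => r.model_id === modelId);
  if (row === undefined) return 0; // missing snapshot or missing model row
  if (promptTokens < 0 || completionTokens < 0) return 0; // negative tokens
  const totalTokens = promptTokens + completionTokens;
  const rate = row.cost_bps_per_kilotoken;
  // BigInt() throws on non-integers, so reject them up front.
  if (!Number.isSafeInteger(totalTokens) || !Number.isSafeInteger(rate)) return 0;
  if (totalTokens === 0 || rate <= 0) return 0; // zero tokens or free tier
  const bps = (BigInt(totalTokens) * BigInt(rate)) / 1000n; // truncating division
  return Number(bps) / 10000;
}

const exampleSnapshot: SnapshotRow[] = [
  { model_id: 'claude', cost_bps_per_kilotoken: 300 },
  { model_id: 'gpt-4o', cost_bps_per_kilotoken: 250 },
];

// The winner is claude, so its row (300) is used, not the failed gpt-4o row (250).
const winnerCostUsd = computeCostUsd('claude', 1000, 500, exampleSnapshot);
console.log(winnerCostUsd); // 0.045
```

Keeping the intermediate value in integer bps means the only floating-point operation is the final division by 10000.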

A second worked example from the same suite (cost.test.ts:107):

1500 tokens @ 300 bps/kilotoken (Claude Sonnet seed value):
  (1500 * 300) / 1000 = 450 bps   → 0.045 USD per call

10000 such calls aggregated:
  total_cost_bps_int = 4_500_000n
  avg_cost_usd       = Number(4_500_000n) / 10000 / 10000
                     = 450 / 10000
                     = 0.045 USD     (no float drift; verified via toBeCloseTo(0.045, 10))
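
The aggregation step can be sketched the same way. The snippet below is an illustration rather than the module's code: the variable names are hypothetical, and only the bigint accumulator and the two divisions (bps to USD, then by the number of successes) are taken from the example above.

```typescript
// Accumulate per-call cost as integer bps; convert to USD only when reading.
const CALLS = 10_000;
const perCallBps = 450n; // 1500 tokens @ 300 bps/kilotoken

let totalCostBpsIntSketch = 0n;
let successCount = 0;
for (let i = 0; i < CALLS; i++) {
  totalCostBpsIntSketch += perCallBps; // exact integer addition, no rounding
  successCount += 1;
}

// First division: bps -> USD. Second division: sum -> per-call average.
const avgCostUsdSketch = Number(totalCostBpsIntSketch) / 10000 / successCount;
console.log(totalCostBpsIntSketch, avgCostUsdSketch);
```

Because every addition happens on bigint values, rounding can only enter at the read edge, which is why the suite can assert the average to 10 decimal places.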

A third worked example showing the cost-tier difference:

1500 tokens @ 15 bps/kilotoken (gpt-4o-mini seed value):
  (1500 * 15) / 1000  = integer-bigint: 22500n / 1000n = 22n bps
  costUsd             = Number(22n) / 10000
                      = 0.0022 USD

  Note: bigint integer division truncates, so 22.5 bps is floored to
  22 bps. The contract documents this as the expected behavior; the
  presentation layer rounds to 2 decimals if needed (P1.5.7's router_stats).
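
The truncation can be observed directly with plain bigint operators. This is a standalone illustration with hypothetical variable names, not code from the suite.

```typescript
// 1500 tokens at 15 bps/kilotoken, kept in bigint throughout.
const tierTokens = 1500n;
const tierBpsPerKilotoken = 15n;

const scaledBps = tierTokens * tierBpsPerKilotoken; // 22500n (bps x 1000)
const wholeBps = scaledBps / 1000n;                 // 22n, fraction discarded
const droppedThousandths = scaledBps % 1000n;       // 500n, i.e. 0.5 bps dropped
const tierCostUsd = Number(wholeBps) / 10000;       // 0.0022

console.log(wholeBps, droppedThousandths, tierCostUsd);
```

The remainder shows the cost of truncation: at most one bps (0.0001 USD) per call is lost, in exchange for fully deterministic integer arithmetic.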

3. Invariant verification matrix

3.1 Cost-computation invariants (contract §3)

| ID | Statement | Test |
| --- | --- | --- |
| I-COST-1 | Reads snapshot row for modelId. | cost.test.ts “1500 tokens at 300 bps/kilotoken → 0.045 USD” (verifies the lookup). |
| I-COST-2 | Missing modelId → 0. | cost.test.ts “missing modelId in snapshot → 0 USD”. |
| I-COST-3 | Missing snapshot → 0. | cost.test.ts “snapshot omitted → 0 USD”. |
| I-COST-4 | Zero-cost row → 0. | cost.test.ts “zero-cost row (free tier) → 0 USD”. |
| I-COST-5 | Zero tokens → 0. | cost.test.ts “0 prompt + 0 completion → 0 USD”. |
| I-COST-6 | Formula. | cost.test.ts golden-vector cases × 4. |
| I-COST-7 | Bigint overflow safety. | cost.test.ts “handles 10^9 tokens × 1000 bps”. |
| I-COST-8 | Never NaN/Infinity; negative tokens → 0. | cost.test.ts “result is never NaN nor Infinity” + “negative tokens → 0 USD”. |
| I-COST-9 | Deterministic. | cost.test.ts “deterministic — identical input → identical output across 100 invocations”. |

All 9 cost-computation invariants verified.

3.2 Aggregate invariants (contract §4)

| ID | Statement | Test |
| --- | --- | --- |
| I-AGG-1 | calls_total === successes + failures. | cost.test.ts “calls_total === successes + failures (8 + 2)”. |
| I-AGG-2 | success_rate === successes / calls_total. | cost.test.ts “success_rate at 80%”. |
| I-AGG-3 | avg_cost_usd === Σ(cost) / successes. | cost.test.ts “avg_cost_usd reflects actual USD sum” + “avg_cost_usd computed over successes only”. |
| I-AGG-4 | p50 median; lower-of-two for even. | cost.test.ts “p50 over 5 calls (odd)” + “p50 over 6 calls (even)”. |
| I-AGG-5 | Ring buffer bounded at 1000. | cost.test.ts “ring buffer bound: 1500 calls → p50 over the last 1000” + “ring buffer respects ROUTER_LATENCY_RING_SIZE exactly”. |
| I-AGG-6 | Per-model isolation. | cost.test.ts “kimi stats do not contaminate claude stats” + “different models scored independently”. |
| I-AGG-7 | resetRouterStats('claude') clears one model. | cost.test.ts “resetRouterStats(modelId) clears only that model”. |
| I-AGG-8 | resetRouterStats() clears all. | cost.test.ts “resetRouterStats() clears all models”. |
| I-AGG-9 | Result is frozen. | cost.test.ts “outer result is frozen” + “models object is frozen” + “per-model entries are frozen” + “attempting to mutate the result is a TypeError”. |
| I-AGG-10 | Failures contribute latency but not cost. | cost.test.ts “failure-only model has p50 over failures” + “avg_cost_usd === 0 when no successes”. |
| I-AGG-11 | No float drift across 10000 calls. | cost.test.ts “10000 successful calls accumulate without float drift”. |

All 11 aggregate invariants verified.
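
The ring-buffer bound (I-AGG-5) and the lower-of-two median rule (I-AGG-4) can be sketched as follows. Only the constant ROUTER_LATENCY_RING_SIZE = 1000 comes from this report; the LatencyRing class, its method names, and the choice to return 0 for an empty buffer are hypothetical.

```typescript
const ROUTER_LATENCY_RING_SIZE = 1000; // value stated in this report

// Hypothetical fixed-capacity ring: once full, each push overwrites the oldest slot.
class LatencyRing {
  private readonly buf: number[] = [];
  private readonly capacity: number;
  private next = 0;

  constructor(capacity: number = ROUTER_LATENCY_RING_SIZE) {
    this.capacity = capacity;
  }

  push(latencyMs: number): void {
    if (this.buf.length < this.capacity) this.buf.push(latencyMs);
    else this.buf[this.next] = latencyMs;
    this.next = (this.next + 1) % this.capacity;
  }

  get size(): number {
    return this.buf.length;
  }

  // Median over the retained samples; for an even count, the lower of the
  // two middle values. Returning 0 when empty is an assumption.
  p50(): number {
    if (this.buf.length === 0) return 0;
    const sorted = [...this.buf].sort((a, b) => a - b);
    return sorted[Math.floor((sorted.length - 1) / 2)] ?? 0;
  }
}

const ring = new LatencyRing();
for (let ms = 1; ms <= 1500; ms++) ring.push(ms);
console.log(ring.size, ring.p50()); // 1000 retained samples: 501..1500
```

With 1500 pushes the buffer retains only the most recent 1000 samples, so memory per model stays constant regardless of call volume.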

3.3 modelsAttempted invariants (contract §5)

| ID | Statement | Test |
| --- | --- | --- |
| I-MA-1 | Frozen ReadonlyArray<ModelId>. | fallback.test.ts “RouteResult costUsd + modelsAttempted are frozen”. |
| I-MA-2 | Single-attempt success → [winner]. | fallback.test.ts “happy path: modelsAttempted === [winner]”. |
| I-MA-3 | Cascade → [A, B] in chain order. | fallback.test.ts “cascade: modelsAttempted lists both attempts in chain order”. |
| I-MA-4 | Chain-walk order. | (Implicit in I-MA-3.) |
| I-MA-5 | Not exposed on FallbackChainExhaustedError. | (Verified by RouteResult type: the error path returns no result.) |

All 5 modelsAttempted invariants verified.

3.4 routeRequest-body invariants (contract §6, P1.5.5 I1–I19 carry-over + P1.5.6 additions)

P1.5.5 invariants I1–I19 untouched. New P1.5.6 invariants:

| ID | Statement | Test |
| --- | --- | --- |
| I-RR-20 | Success → recordRouterCall(winner, {success: true}) once. | fallback.test.ts “successful call increments getRouterStats successes”. |
| I-RR-21 | Failure → recordRouterCall(modelId, {success: false}) once. | fallback.test.ts “failed cascade increments getRouterStats failures”. |
| I-RR-22 | CB-open / NoAdapter → no recordRouterCall. | fallback.test.ts “circuit-open skip does NOT contribute to modelsAttempted” (verifies modelsAttempted exclusion, which is the more comprehensive invariant: if the model contributed to modelsAttempted it would also be in the aggregates). |
| I-RR-23 | costUsd === computeCostUsd(winner, ...). | fallback.test.ts “happy path: RouteResult.costUsd set from candidate snapshot” + “cascade: costUsd reflects winner row, not failed-attempt row”. |
| I-RR-24 | modelsAttempted lists chain walk. | fallback.test.ts “cascade: modelsAttempted lists both attempts in chain order”. |
| I-RR-25 | RouteResult remains frozen incl. new fields. | fallback.test.ts “RouteResult costUsd + modelsAttempted are frozen”. |

All 6 new routeRequest-body invariants verified.
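
The interplay between the chain walk, modelsAttempted, and recordRouterCall can be illustrated with a simplified, synchronous sketch. Everything here except the names recordRouterCall and modelsAttempted is hypothetical: the real routeRequest is not shown in this report, and it raises FallbackChainExhaustedError where this sketch throws a plain Error.

```typescript
type AttemptOutcome = { ok: true } | { ok: false };

interface WalkResult {
  readonly model: string;
  readonly modelsAttempted: readonly string[];
}

// Hypothetical in-memory counters, standing in for the cost module's aggregates.
const callCounts = new Map<string, { successes: number; failures: number }>();

function recordRouterCall(modelId: string, call: { success: boolean }): void {
  const entry = callCounts.get(modelId) ?? { successes: 0, failures: 0 };
  if (call.success) entry.successes += 1;
  else entry.failures += 1;
  callCounts.set(modelId, entry);
}

function walkChain(
  chain: readonly string[],
  isCircuitOpen: (modelId: string) => boolean,
  callAdapter: (modelId: string) => AttemptOutcome,
): WalkResult {
  const attempted: string[] = [];
  for (const modelId of chain) {
    if (isCircuitOpen(modelId)) continue; // skipped: neither attempted nor recorded
    attempted.push(modelId);
    const outcome = callAdapter(modelId);
    recordRouterCall(modelId, { success: outcome.ok }); // exactly once per attempt
    if (outcome.ok) {
      return Object.freeze({
        model: modelId,
        modelsAttempted: Object.freeze([...attempted]),
      });
    }
  }
  throw new Error('fallback chain exhausted'); // no result object on this path
}

// kimi's circuit is open, gpt-4o fails, claude succeeds.
const walk = walkChain(
  ['kimi', 'gpt-4o', 'claude'],
  (m) => m === 'kimi',
  (m) => (m === 'claude' ? { ok: true } : { ok: false }),
);
console.log(walk.modelsAttempted, callCounts.has('kimi'));
```

In this sketch a skipped model is absent from both modelsAttempted and the counters because the skip happens before either is touched.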

4. Append-only verification (Phase 0 compatibility)

The “Phase 0 callers destructuring only the original fields still compile” test (fallback.test.ts:1252) is a runtime + compile-time check that the original Phase 0 destructuring pattern continues to work without modification:

const { model, content, finishReason, promptTokens, completionTokens, latencyMs } = result;

TypeScript accepts the extra costUsd + modelsAttempted fields on the source object (structural typing) and the runtime binding is correct. Zero existing fields removed or renamed. Type widening: none.
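
The compatibility argument can be shown with a small type-level sketch. The interface names and the field types below are assumptions (the report lists only the field names); the point is that a function typed against the original six fields accepts the widened object unchanged.

```typescript
// Assumed shape of the original Phase 0 result; field types are illustrative.
interface RouteResultPhase0 {
  readonly model: string;
  readonly content: string;
  readonly finishReason: string;
  readonly promptTokens: number;
  readonly completionTokens: number;
  readonly latencyMs: number;
}

// P1.5.6 shape: the same six fields plus two appended ones.
interface RouteResultP156 extends RouteResultPhase0 {
  readonly costUsd: number;
  readonly modelsAttempted: readonly string[];
}

// A Phase 0 caller that knows nothing about the new fields.
function summarizePhase0(result: RouteResultPhase0): string {
  const { model, content, finishReason, promptTokens, completionTokens, latencyMs } = result;
  return `${model}/${finishReason}/${promptTokens + completionTokens}/${latencyMs}/${content.length}`;
}

const widened: RouteResultP156 = {
  model: 'claude',
  content: 'hello',
  finishReason: 'stop',
  promptTokens: 1000,
  completionTokens: 500,
  latencyMs: 120,
  costUsd: 0.045,
  modelsAttempted: ['gpt-4o', 'claude'],
};

// Structural typing: the wider object is assignable to the narrower parameter.
const summary = summarizePhase0(widened);
console.log(summary); // claude/stop/1500/120/5
```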

5. Test-count delta evidence

Base (origin/main @ c284ad22)              : 3275 tests
Post P1.5.6 implementation (this branch)   : 3325 tests
Delta                                      : +50 tests

Per-file breakdown:
  cost.test.ts (new)                       : +41
  fallback.test.ts (cost block appended)   :  +9
  All other tests                          :   0 changes

6. File-by-file change summary

| File | Net LOC | Description |
| --- | --- | --- |
| src/domains/router/cost.ts (new) | +371 | Per-call cost calc + in-memory aggregates + ring buffer. |
| src/__tests__/domains/router/cost.test.ts (new) | +461 | 41 golden-vector + aggregate + ring-buffer + reset + freeze tests. |
| src/domains/router/fallback.ts (modify) | +73 / -12 | Append costUsd + modelsAttempted to RouteResult; wire recordRouterCall into success / failure paths; track modelsAttempted across walk. |
| src/__tests__/domains/router/fallback.test.ts (modify) | +220 / 0 | Cost + modelsAttempted describe block (9 tests). |
| src/domains/router/index.ts (modify) | +9 / -2 | Barrel re-export of ./cost.js + header comment update. |
| Total | +1134 / -14 | Sum of the per-file rows above. |

7. Forbidden checks

| Forbidden | Status |
| --- | --- |
| Floating-point accumulation | OK: all per-call sums are bigint (total_cost_bps_int); a single Number / 10000 / successes conversion at the read edge in getRouterStats. |
| Unbounded memory growth | OK: ring buffer fixed at ROUTER_LATENCY_RING_SIZE = 1000 per model; verified by “ring buffer respects ROUTER_LATENCY_RING_SIZE exactly”. |
| AMS_* env vars | OK: module reads no environment variables. |
| MCP tool registration | OK: src/server.ts untouched; no new tools registered. |
| RouteResult field type changes | OK: existing 6 fields preserved byte-identically; only 2 new fields appended. |
| Main-checkout edits | OK: all work in .worktrees/claude/p1-5-6-cost. |
| routeRequest signature break | OK: signature byte-identical; only the result-object shape grows additively. |

8. Pre-flight readings consulted

  • CLAUDE.md (project + worktree copy)
  • src/domains/router/fallback.ts (P1.5.5 RouteResult shape)
  • src/domains/router/cost.ts (new module, this slice)
  • src/domains/router/circuit.ts (P1.5.5 CB module — referenced for aggregate-exclusion invariant)
  • src/domains/router/scoring.ts (P1.5.1 — ModelCandidate type, ModelId union, cost_bps_per_kilotoken field)
  • src/domains/integrations/claude.ts (W3 — CompletionResult shape supplying promptTokens + completionTokens)
  • src/db/migrations/009_model_candidates.sql (P1.5.9 — cost_bps_per_kilotoken column + seed values)
  • docs/3-world/social/llm.md §Candidate table
  • docs/architecture/decisions/ADR-005-multi-model-defer.md §Implementation step 4
  • docs/guides/implementation/task-prompts/p1.5-delta-router-graduation.md §P1.5.6
  • docs/contracts/p1-5-5-fallback-cb-contract.md §6 (invariant carry-over)

9. Deviations from staging-file slice section

None of substance. Minor refinements:

  • The staging file’s prompt section called for RouterStats = { calls_total, successes, failures, avg_cost_usd, p50_latency_ms, success_rate }. Implementation matches byte-for-byte.
  • The dispatch packet hinted at success_rate: successes / (successes + failures). Implementation uses successes / calls_total, which is mathematically equivalent (calls_total === successes + failures per I-AGG-1) but more robust against future record types.
  • The “exact 1 × recordRouterCall per successful adapter call” invariant (contract I-RR-20) is verified via the aggregate counts, not via a spy. The spy-pattern path would require mocking the cost module, which adds complexity for no additional confidence (the in-memory aggregates ARE the spy).
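
The RouterStats shape and the success_rate choice described in the bullets above can be sketched as a read-edge conversion. The six field names come from the staging file as quoted; the Counters input type, the toRouterStats helper, and the zero-call guard are hypothetical.

```typescript
interface RouterStats {
  readonly calls_total: number;
  readonly successes: number;
  readonly failures: number;
  readonly avg_cost_usd: number;
  readonly p50_latency_ms: number;
  readonly success_rate: number;
}

// Hypothetical internal counters for one model.
interface Counters {
  successes: number;
  failures: number;
  totalCostBpsInt: bigint;
  p50LatencyMs: number;
}

function toRouterStats(c: Counters): RouterStats {
  const calls_total = c.successes + c.failures; // I-AGG-1
  return Object.freeze({
    calls_total,
    successes: c.successes,
    failures: c.failures,
    // Cost is averaged over successes only; 0 when there are none (I-AGG-10).
    avg_cost_usd: c.successes === 0 ? 0 : Number(c.totalCostBpsInt) / 10000 / c.successes,
    p50_latency_ms: c.p50LatencyMs,
    // Dividing by calls_total rather than successes + failures; the
    // zero-call guard is an assumption of this sketch.
    success_rate: calls_total === 0 ? 0 : c.successes / calls_total,
  });
}

// 8 successes at 450 bps each, 2 failures.
const routerStats = toRouterStats({
  successes: 8,
  failures: 2,
  totalCostBpsInt: 3600n,
  p50LatencyMs: 120,
});
console.log(routerStats.success_rate, routerStats.avg_cost_usd);
```

Because the aggregates are plain readable state, a test can assert on counts such as routerStats.successes instead of installing a spy on recordRouterCall.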

10. Confirmation: P1.5.5 fallback tests still pass

The pre-existing P1.5.5 fallback test suite (53 tests: the 17 original Phase-0 tests plus the cascade / CB / timeout / shape-flip coverage) continues to pass unchanged. The new “cost + modelsAttempted” describe block is appended at the end of the file; no existing assertion was modified. Verified by running the suite in isolation: 62 / 62 pass (53 pre-existing + 9 new).

11. Gate-tail evidence (most-recent run)

Test Suites: 76 passed, 76 total
Tests:       3325 passed, 3325 total
Time:        34.864 s

12. Out-of-scope reminder (carried forward)

  • router_stats MCP tool → P1.5.7.
  • Cost parity tests across arbiters → P1.5.8.
  • ζ Decision Trail recording of cost → P1.5.10.
  • DB persistence of stats → Phase 2+.
  • fallbackDepth on RouteResult → not in this dispatch.


Colibri — documentation-first MCP runtime. Apache 2.0 + Commons Clause.
