P1.5.6 — Cost Accounting — Verification

Round: R92 Wave 5 of 7
Branch: feature/p1-5-6-cost
Base: origin/main @ c284ad22
Audit: p1-5-6-cost-audit.md
Contract: p1-5-6-cost-contract.md
Packet: p1-5-6-cost-packet.md

1. Gates

Gate	Command	Result
Build	`npm run build`	PASS — `tsc` compiles clean. `copy-migrations.mjs` post-step also clean.
Lint	`npm run lint`	PASS — zero ESLint findings.
Tests	`npm test`	PASS — 3325 / 3325 across 76 suites.

Baseline on main c284ad22 = 3275 tests. Final = 3325. Delta: +50 tests across two files:

src/__tests__/domains/router/cost.test.ts (new): 41 tests.
src/__tests__/domains/router/fallback.test.ts (extended): +9 tests (53 → 62).

Zero regressions in pre-existing P1.5.5 fallback / circuit / scoring / adapter coverage.

2. Worked cost example (real numbers from the suite)

The “cascade: costUsd reflects winner row, not failed-attempt row” test (fallback.test.ts:1198) exercises the live formula end-to-end:

Setup:
  scoring  = forced to gpt-4o (score 0.9) → claude (score 0.4)
  gpt-4o   = fails (Error: "gpt-4o down")
  claude   = succeeds with 1000 prompt tokens + 500 completion tokens
  snapshot = [
    { model_id: 'claude',  cost_bps_per_kilotoken: 300, ... },
    { model_id: 'gpt-4o',  cost_bps_per_kilotoken: 250, ... },
  ]

Chain walk:
  attempt 1: gpt-4o → adapter call throws → recordRouterCall('gpt-4o', {success:false, ...})
  attempt 2: claude → adapter returns {promptTokens:1000, completionTokens:500} → success

Cost computation for the winning claude attempt:
  totalTokens   = 1000 + 500           = 1500
  bps_per_kilo  = 300                  (claude row from snapshot)
  inner_bps_int = BigInt(1500) * BigInt(300) / 1000n
               = 450_000n / 1000n
               = 450n                  (bps, integer-bigint)
  costUsd       = Number(450n) / 10000
               = 0.045 USD

Returned RouteResult fields (new in P1.5.6):
  costUsd          = 0.045
  modelsAttempted  = ['gpt-4o', 'claude']      (chain-walk order)
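
The chain walk above can be sketched in a few lines. This is a minimal illustration, not the real `routeRequest`: `walkChain`, `costUsdFor`, `SnapshotRow`, and the synchronous `Adapter` type are hypothetical names introduced here, and the adapters are simulated. Skipping a model with no adapter before it counts as an attempt is also an assumption of the sketch.

```typescript
// Hypothetical sketch of the cascade above; names and shapes are assumptions.
type SnapshotRow = { model_id: string; cost_bps_per_kilotoken: number };
type Usage = { promptTokens: number; completionTokens: number };
type Adapter = () => Usage; // simulated, synchronous; throws on failure

function costUsdFor(modelId: string, usage: Usage, snap: readonly SnapshotRow[]): number {
  const row = snap.find((r) => r.model_id === modelId);
  if (!row) return 0;
  const tokens = BigInt(usage.promptTokens + usage.completionTokens);
  const bps = (tokens * BigInt(row.cost_bps_per_kilotoken)) / 1000n; // integer bps
  return Number(bps) / 10000; // bps -> USD
}

function walkChain(
  chain: readonly string[],
  adapters: Record<string, Adapter | undefined>,
  snap: readonly SnapshotRow[],
) {
  const modelsAttempted: string[] = [];
  for (const modelId of chain) {
    const adapter = adapters[modelId];
    if (!adapter) continue; // assumption: no adapter means skipped, not attempted
    modelsAttempted.push(modelId);
    try {
      const usage = adapter();
      // Cost is priced from the winner's snapshot row only.
      return Object.freeze({
        model: modelId,
        costUsd: costUsdFor(modelId, usage, snap),
        modelsAttempted: Object.freeze([...modelsAttempted]),
      });
    } catch {
      // Failed attempt: fall through to the next candidate in the chain.
    }
  }
  throw new Error("fallback chain exhausted");
}

const result = walkChain(
  ["gpt-4o", "claude"],
  {
    "gpt-4o": () => { throw new Error("gpt-4o down"); },
    claude: () => ({ promptTokens: 1000, completionTokens: 500 }),
  },
  [
    { model_id: "claude", cost_bps_per_kilotoken: 300 },
    { model_id: "gpt-4o", cost_bps_per_kilotoken: 250 },
  ],
);
console.log(result.costUsd, result.modelsAttempted);
```

With the setup from the test, the sketch prices the call at the claude rate (300) rather than the gpt-4o rate (250), which is the property the test name describes.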

A second worked example from the same suite (cost.test.ts:107):

1500 tokens @ 300 bps/kilotoken (Claude Sonnet seed value):
  (1500 * 300) / 1000 = 450 bps   → 0.045 USD per call

10000 such calls aggregated:
  total_cost_bps_int = 4_500_000n
  avg_cost_usd       = Number(4_500_000n) / 10000 / 10000     (bps → USD, then ÷ call count)
                     = 450 USD total / 10000 calls
                     = 0.045 USD     (no float drift; verified via toBeCloseTo(0.045, 10))
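
The aggregation above can be reproduced with a small loop. This is a sketch under stated assumptions: the variable names are illustrative (only `total_cost_bps_int` and `avg_cost_usd` mirror names in this report), and the float accumulator is included purely as a contrast, since repeated float addition may pick up rounding error that the bigint sum cannot.

```typescript
// Hypothetical accumulator; only the bigint path reflects the approach described above.
const PER_CALL_BPS = 450n; // 1500 tokens at 300 bps/kilotoken
const CALLS = 10000;

let total_cost_bps_int = 0n; // exact integer sum in basis points
let floatTotalUsd = 0;       // naive float sum, for contrast only

for (let i = 0; i < CALLS; i++) {
  total_cost_bps_int += PER_CALL_BPS;
  floatTotalUsd += 0.045;
}

// Single conversion at the read edge: bps -> USD, then divide by the call count.
const avg_cost_usd = Number(total_cost_bps_int) / 10000 / CALLS;
const floatAvgUsd = floatTotalUsd / CALLS;

console.log(total_cost_bps_int, avg_cost_usd, floatAvgUsd);
```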

A third worked example showing the cost-tier difference:

1500 tokens @ 15 bps/kilotoken (gpt-4o-mini seed value):
  (1500 * 15) / 1000  = integer-bigint: 22500n / 1000n = 22n bps
  costUsd             = Number(22n) / 10000
                      = 0.0022 USD

  Note: bigint integer division truncates, so 22.5 bps becomes 22 bps. The
  contract documents this as the expected behavior; the presentation
  layer rounds to 2 decimals if needed (P1.5.7's router_stats).
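
The truncation in this example can be checked directly. The snippet below is only an arithmetic illustration with made-up variable names; the `remainder` line is added to show the half basis point that integer division discards.

```typescript
// Arithmetic check for the gpt-4o-mini example; variable names are illustrative.
const tokens = 1500n;
const bpsPerKilotoken = 15n;

const product = tokens * bpsPerKilotoken; // 22500n
const costBps = product / 1000n;          // bigint division discards the fraction
const remainder = product % 1000n;        // the part of a bps that was dropped
const costUsd = Number(costBps) / 10000;  // bps -> USD

console.log(costBps, remainder, costUsd);
```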

3. Invariant verification matrix

3.1 Cost-computation invariants (contract §3)

ID	Statement	Test
I-COST-1	Reads `snap` row for `modelId`.	`cost.test.ts` “1500 tokens at 300 bps/kilotoken → 0.045 USD” — verifies the lookup.
I-COST-2	Missing `modelId` → 0.	`cost.test.ts` “missing modelId in snapshot → 0 USD”.
I-COST-3	Missing snapshot → 0.	`cost.test.ts` “snapshot omitted → 0 USD”.
I-COST-4	Zero-cost row → 0.	`cost.test.ts` “zero-cost row (free tier) → 0 USD”.
I-COST-5	Zero tokens → 0.	`cost.test.ts` “0 prompt + 0 completion → 0 USD”.
I-COST-6	Formula.	`cost.test.ts` golden-vector cases × 4.
I-COST-7	Bigint overflow safety.	`cost.test.ts` “handles 10^9 tokens × 1000 bps”.
I-COST-8	Never NaN/Infinity; negative tokens → 0.	`cost.test.ts` “result is never NaN nor Infinity” + “negative tokens → 0 USD”.
I-COST-9	Deterministic.	`cost.test.ts` “deterministic — identical input → identical output across 100 invocations”.

All 9 cost-computation invariants verified.
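
As an illustration of how the nine invariants fit together, here is a minimal sketch of a cost function with the same guards. It is not the code in `cost.ts`: `computeCostUsdSketch`, its parameter list, and the `CandidateRow` type are assumptions, as is the choice to treat a negative value in either token count as a zero-cost call.

```typescript
// Hypothetical sketch; the real computeCostUsd signature is not shown in this report.
type CandidateRow = { model_id: string; cost_bps_per_kilotoken: number };

function computeCostUsdSketch(
  modelId: string,
  promptTokens: number,
  completionTokens: number,
  snap?: readonly CandidateRow[],
): number {
  if (!snap) return 0; // I-COST-3: missing snapshot
  const row = snap.find((r) => r.model_id === modelId); // I-COST-1: lookup by modelId
  if (!row) return 0; // I-COST-2: missing modelId

  const total = promptTokens + completionTokens;
  const rate = row.cost_bps_per_kilotoken;

  // I-COST-8: non-finite or negative inputs never produce NaN/Infinity.
  if (!Number.isFinite(total) || !Number.isFinite(rate)) return 0;
  if (promptTokens < 0 || completionTokens < 0) return 0;
  // I-COST-4 / I-COST-5: zero-cost row or zero tokens.
  if (total === 0 || rate <= 0) return 0;

  // I-COST-6 / I-COST-7: multiply in bigint so large products stay exact.
  const bps = (BigInt(Math.trunc(total)) * BigInt(Math.trunc(rate))) / 1000n;
  return Number(bps) / 10000; // pure function of its inputs (I-COST-9)
}

const snap: readonly CandidateRow[] = [
  { model_id: "claude", cost_bps_per_kilotoken: 300 },
  { model_id: "gpt-4o-mini", cost_bps_per_kilotoken: 15 },
  { model_id: "free", cost_bps_per_kilotoken: 0 },
  { model_id: "big", cost_bps_per_kilotoken: 1000 },
];

console.log(computeCostUsdSketch("claude", 1000, 500, snap));
console.log(computeCostUsdSketch("unknown", 1000, 500, snap));
```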

3.2 Aggregate invariants (contract §4)

ID	Statement	Test
I-AGG-1	`calls_total === successes + failures`.	`cost.test.ts` “calls_total === successes + failures (8 + 2)”.
I-AGG-2	`success_rate === successes / calls_total`.	`cost.test.ts` “success_rate at 80%”.
I-AGG-3	`avg_cost_usd === Σ(cost) / successes`.	`cost.test.ts` “avg_cost_usd reflects actual USD sum” + “avg_cost_usd computed over successes only”.
I-AGG-4	`p50` median; lower-of-two for even.	`cost.test.ts` “p50 over 5 calls (odd)” + “p50 over 6 calls (even)”.
I-AGG-5	Ring buffer bounded at 1000.	`cost.test.ts` “ring buffer bound: 1500 calls → p50 over the last 1000” + “ring buffer respects ROUTER_LATENCY_RING_SIZE exactly”.
I-AGG-6	Per-model isolation.	`cost.test.ts` “kimi stats do not contaminate claude stats” + “different models scored independently”.
I-AGG-7	`resetRouterStats('claude')` clears one model.	`cost.test.ts` “resetRouterStats(modelId) clears only that model”.
I-AGG-8	`resetRouterStats()` clears all.	`cost.test.ts` “resetRouterStats() clears all models”.
I-AGG-9	Result is frozen.	`cost.test.ts` “outer result is frozen” + “models object is frozen” + “per-model entries are frozen” + “attempting to mutate the result is a TypeError”.
I-AGG-10	Failures contribute latency but not cost.	`cost.test.ts` “failure-only model has p50 over failures” + “avg_cost_usd === 0 when no successes”.
I-AGG-11	No float drift across 10000 calls.	`cost.test.ts` “10000 successful calls accumulate without float drift”.

All 11 aggregate invariants verified.
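
The aggregate rules (bounded ring buffer, lower-of-two median, cost averaged over successes only, frozen result) can be sketched as a single per-model class. This is a hedged illustration: `ModelAggregateSketch`, `CallRecord`, and the zero defaults for empty aggregates are assumptions, and `RING_SIZE` merely mirrors the `ROUTER_LATENCY_RING_SIZE = 1000` value cited in this report.

```typescript
// Hypothetical per-model aggregate; not the implementation in cost.ts.
const RING_SIZE = 1000;

type CallRecord = { success: boolean; latencyMs: number; costBps?: bigint };

class ModelAggregateSketch {
  successes = 0;
  failures = 0;
  totalCostBps = 0n;
  private latencies: number[] = [];
  private cursor = 0;

  record(call: CallRecord): void {
    if (call.success) {
      this.successes += 1;
      this.totalCostBps += call.costBps ?? 0n; // only successes contribute cost
    } else {
      this.failures += 1;
    }
    // Failures still contribute latency. Once full, overwrite the oldest slot.
    if (this.latencies.length < RING_SIZE) {
      this.latencies.push(call.latencyMs);
    } else {
      this.latencies[this.cursor] = call.latencyMs;
    }
    this.cursor = (this.cursor + 1) % RING_SIZE;
  }

  stats() {
    const calls_total = this.successes + this.failures;
    const sorted = [...this.latencies].sort((a, b) => a - b);
    const mid = Math.floor((sorted.length - 1) / 2); // lower of two for even counts
    return Object.freeze({
      calls_total,
      successes: this.successes,
      failures: this.failures,
      success_rate: calls_total === 0 ? 0 : this.successes / calls_total,
      avg_cost_usd:
        this.successes === 0 ? 0 : Number(this.totalCostBps) / 10000 / this.successes,
      p50_latency_ms: sorted.length === 0 ? 0 : sorted[mid],
    });
  }
}

const agg = new ModelAggregateSketch();
for (let i = 0; i < 8; i++) agg.record({ success: true, latencyMs: 100 + i, costBps: 450n });
for (let i = 0; i < 2; i++) agg.record({ success: false, latencyMs: 900 + i });
const snapshotStats = agg.stats();
console.log(snapshotStats);
```

Keeping one such object per model id (for example in a `Map`) would give the per-model isolation and selective reset behavior listed in I-AGG-6 through I-AGG-8.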

3.3 `modelsAttempted` invariants (contract §5)

ID	Statement	Test
I-MA-1	Frozen `ReadonlyArray<ModelId>`.	`fallback.test.ts` “RouteResult costUsd + modelsAttempted are frozen”.
I-MA-2	Single-attempt success → `[winner]`.	`fallback.test.ts` “happy path: modelsAttempted === [winner]”.
I-MA-3	Cascade → `[A, B]` in chain order.	`fallback.test.ts` “cascade: modelsAttempted lists both attempts in chain order”.
I-MA-4	Chain-walk order.	(Implicit in I-MA-3.)
I-MA-5	Not exposed on `FallbackChainExhaustedError`.	(Verified by `RouteResult` type — error path returns no result.)

All 5 modelsAttempted invariants verified.

3.4 `routeRequest`-body invariants (contract §6, P1.5.5 I1–I19 carry-over + P1.5.6 additions)

P1.5.5 invariants I1–I19 untouched. New P1.5.6 invariants:

ID	Statement	Test
I-RR-20	Success → `recordRouterCall(winner, {success: true})` once.	`fallback.test.ts` “successful call increments getRouterStats successes”.
I-RR-21	Failure → `recordRouterCall(modelId, {success: false})` once.	`fallback.test.ts` “failed cascade increments getRouterStats failures”.
I-RR-22	CB-open / NoAdapter → no `recordRouterCall`.	`fallback.test.ts` “circuit-open skip does NOT contribute to modelsAttempted” (verifies modelsAttempted exclusion, which is the more comprehensive invariant — if the model contributed to modelsAttempted it would also be in the aggregates).
I-RR-23	`costUsd === computeCostUsd(winner, ...)`.	`fallback.test.ts` “happy path: RouteResult.costUsd set from candidate snapshot” + “cascade: costUsd reflects winner row, not failed-attempt row”.
I-RR-24	`modelsAttempted` lists chain walk.	`fallback.test.ts` “cascade: modelsAttempted lists both attempts in chain order”.
I-RR-25	`RouteResult` remains frozen incl. new fields.	`fallback.test.ts` “RouteResult costUsd + modelsAttempted are frozen”.

All 6 new routeRequest-body invariants verified.
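
The recording rules in I-RR-20 through I-RR-22 can be illustrated with a simplified walk. Everything here is a sketch with hypothetical names (`recordRouterCallSketch`, `routeSketch`, `callCounts`); the attempt callback stands in for the adapter call, and the open-circuit set stands in for the circuit breaker module.

```typescript
// Hypothetical wiring sketch; not the routeRequest body.
type Counts = { successes: number; failures: number };
const callCounts = new Map<string, Counts>();

function recordRouterCallSketch(modelId: string, call: { success: boolean }): void {
  const c = callCounts.get(modelId) ?? { successes: 0, failures: 0 };
  if (call.success) c.successes += 1;
  else c.failures += 1;
  callCounts.set(modelId, c);
}

function routeSketch(
  chain: readonly string[],
  openCircuits: ReadonlySet<string>,
  attempt: (modelId: string) => boolean, // true = adapter call succeeded
) {
  const modelsAttempted: string[] = [];
  for (const modelId of chain) {
    // Circuit-open skip: no attempt entry and no recorded call (I-RR-22).
    if (openCircuits.has(modelId)) continue;
    modelsAttempted.push(modelId);
    const ok = attempt(modelId);
    // Exactly one record per real attempt (I-RR-20 / I-RR-21).
    recordRouterCallSketch(modelId, { success: ok });
    if (ok) {
      return Object.freeze({
        model: modelId,
        modelsAttempted: Object.freeze([...modelsAttempted]),
      });
    }
  }
  throw new Error("fallback chain exhausted");
}

const routed = routeSketch(
  ["kimi", "gpt-4o", "claude"],
  new Set(["kimi"]),          // kimi's circuit is open in this example
  (modelId) => modelId === "claude",
);
console.log(routed.modelsAttempted, callCounts);
```

Because the counters are the only side effect, reading them back after a call is enough to confirm how many times each model was recorded.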

4. Append-only verification (Phase 0 compatibility)

The “Phase 0 callers destructuring only the original fields still compile” test (fallback.test.ts:1252) is a runtime + compile-time check that the original Phase 0 destructuring pattern continues to work without modification:

const { model, content, finishReason, promptTokens, completionTokens, latencyMs } = result;

TypeScript accepts the extra costUsd + modelsAttempted fields on the source object (structural typing) and the runtime binding is correct. Zero existing fields removed or renamed. Type widening: none.
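
A compact way to see why the append-only change is safe is the sketch below. The interface names (`Phase0ResultSketch`, `RouteResultSketch`) and the field types are assumptions made for illustration; only the field names come from this report.

```typescript
// Hypothetical shapes; field types are assumed, field names follow the report.
interface Phase0ResultSketch {
  model: string;
  content: string;
  finishReason: string;
  promptTokens: number;
  completionTokens: number;
  latencyMs: number;
}

interface RouteResultSketch extends Phase0ResultSketch {
  costUsd: number;
  modelsAttempted: ReadonlyArray<string>;
}

// A Phase 0 style caller: destructures only the original six fields.
function phase0Caller(r: Phase0ResultSketch): string {
  const { model, content, finishReason, promptTokens, completionTokens, latencyMs } = r;
  return `${model}|${finishReason}|${promptTokens + completionTokens}|${latencyMs}|${content.length}`;
}

const widened: RouteResultSketch = Object.freeze({
  model: "claude",
  content: "hello",
  finishReason: "stop",
  promptTokens: 1000,
  completionTokens: 500,
  latencyMs: 42,
  costUsd: 0.045,
  modelsAttempted: Object.freeze(["gpt-4o", "claude"]),
});

// The wider object is assignable to the narrower parameter type (structural typing).
const summary = phase0Caller(widened);
console.log(summary);
```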

5. Test-count delta evidence

Base (origin/main @ c284ad22)              : 3275 tests
Post P1.5.6 implementation (this branch)   : 3325 tests
Delta                                      : +50 tests

Per-file breakdown:
  cost.test.ts (new)                       : +41
  fallback.test.ts (cost block appended)   :  +9
  All other tests                          :   0 changes

6. File-by-file change summary

File	Net LOC	Description
`src/domains/router/cost.ts` (new)	+371	Per-call cost calc + in-memory aggregates + ring buffer.
`src/__tests__/domains/router/cost.test.ts` (new)	+461	41 golden-vector + aggregate + ring-buffer + reset + freeze tests.
`src/domains/router/fallback.ts` (modify)	+73 / -12	Append `costUsd` + `modelsAttempted` to `RouteResult`; wire `recordRouterCall` into success / failure paths; track `modelsAttempted` across walk.
`src/__tests__/domains/router/fallback.test.ts` (modify)	+220 / 0	Cost + modelsAttempted describe block (9 tests).
`src/domains/router/index.ts` (modify)	+9 / -2	Barrel re-export of `./cost.js` + header comment update.
Total	+1134 / -14

7. Forbidden checks

Forbidden	Status
Floating-point accumulation	OK — all per-call sums are `bigint` (`total_cost_bps_int`); single `Number / 10000 / successes` at the read edge in `getRouterStats`.
Unbounded memory growth	OK — ring buffer fixed at `ROUTER_LATENCY_RING_SIZE = 1000` per model; verified by “ring buffer respects ROUTER_LATENCY_RING_SIZE exactly”.
`AMS_*` env vars	OK — module reads no environment variables.
MCP tool registration	OK — `src/server.ts` untouched; no new tools registered.
`RouteResult` field type changes	OK — existing 6 fields preserved byte-identically; only 2 new fields appended.
Main-checkout edits	OK — all work in `.worktrees/claude/p1-5-6-cost`.
`routeRequest` signature break	OK — signature byte-identical; only the result-object shape grows additively.

8. Pre-flight readings consulted

CLAUDE.md (project + worktree copy)
src/domains/router/fallback.ts (P1.5.5 RouteResult shape)
src/domains/router/cost.ts (new module, this slice)
src/domains/router/circuit.ts (P1.5.5 CB module — referenced for aggregate-exclusion invariant)
src/domains/router/scoring.ts (P1.5.1 — ModelCandidate type, ModelId union, cost_bps_per_kilotoken field)
src/domains/integrations/claude.ts (W3 — CompletionResult shape supplying promptTokens + completionTokens)
src/db/migrations/009_model_candidates.sql (P1.5.9 — cost_bps_per_kilotoken column + seed values)
docs/3-world/social/llm.md §Candidate table
docs/architecture/decisions/ADR-005-multi-model-defer.md §Implementation step 4
docs/guides/implementation/task-prompts/p1.5-delta-router-graduation.md §P1.5.6
docs/contracts/p1-5-5-fallback-cb-contract.md §6 (invariant carry-over)

9. Deviations from staging-file slice section

None of substance. Minor refinements:

The staging file’s prompt section called for RouterStats = { calls_total, successes, failures, avg_cost_usd, p50_latency_ms, success_rate }. Implementation matches byte-for-byte.
The dispatch packet hinted at success_rate: successes / (successes + failures). Implementation uses successes / calls_total, which is mathematically equivalent (calls_total === successes + failures per I-AGG-1) but more robust against future record types.
The “exact 1 × recordRouterCall per successful adapter call” invariant (contract I-RR-20) is verified via the aggregate counts, not via a spy. The spy-pattern path would require mocking the cost module, which adds complexity for no additional confidence (the in-memory aggregates ARE the spy).

10. Confirmation: P1.5.5 fallback tests still pass

The pre-existing P1.5.5 fallback test suite (53 tests: the 17 original Phase-0 tests plus the cascade / CB / timeout / shape-flip coverage) continues to pass unchanged. The new “cost + modelsAttempted” describe block is appended at the end of the file; no existing assertion was modified. Verified by running the suite in isolation: 62 / 62 pass (53 pre-existing + 9 new).

11. Gate-tail evidence (most-recent run)

Test Suites: 76 passed, 76 total
Tests:       3325 passed, 3325 total
Time:        34.864 s

12. Out-of-scope reminder (carried forward)

router_stats MCP tool → P1.5.7.
Cost parity tests across arbiters → P1.5.8.
ζ Decision Trail recording of cost → P1.5.10.
DB persistence of stats → Phase 2+.
fallbackDepth on RouteResult → not in this dispatch.