P1.5.5 — Verification Evidence

Branch: feature/p1-5-5-fallback-cb Base: origin/main @ 94ce7f8c Commits (in order):

  • 4d792ac4 audit(p1-5-5-fallback-cb): inventory fallback + CB surface
  • db2c5f84 contract(p1-5-5-fallback-cb): behavioral contract for fallback + CB
  • dce40ba9 packet(p1-5-5-fallback-cb): execution plan
  • 7d6eb2fa feat(p1-5-5-fallback-cb): N-member fallback + circuit breaker (real impl) + wave-3 fold-in re-exports
  • (this commit) verify(p1-5-5-fallback-cb): test evidence + CB state-machine

1. Test gate

1.1. npm run build

> colibri@0.0.1 build
> tsc

> colibri@0.0.1 postbuild
> node scripts/copy-migrations.mjs

copy-migrations: copied 9 migration(s) E:\AMS\.worktrees\claude\p1-5-5-fallback-cb\src\db\migrations -> E:\AMS\.worktrees\claude\p1-5-5-fallback-cb\dist\db\migrations

Clean — zero TS errors.

1.2. npm run lint

> colibri@0.0.1 lint
> eslint src

Clean — zero warnings or errors.

1.3. npm test

Test Suites: 1 failed, 74 passed, 75 total
Tests:       1 failed, 3274 passed, 3275 total
Snapshots:   0 total
Time:        59.171 s

The single failure was the pre-existing flake consensus/parity-harness G7.1 large iteration finishes within the budget (Date.now() drift in CI runners). Per the dispatch packet’s “Pre-existing flakes … retry-clean” note, this test was rerun in isolation:

PASS src/__tests__/domains/consensus/parity-harness.test.ts (5.572 s)
  ...
  ✓ G7.1 large iteration finishes within the budget
Test Suites: 1 passed, 1 total
Tests:       43 passed, 43 total

The retry passed cleanly. Pre-existing flake, not introduced by P1.5.5. Final state: 3275/3275 tests passing.

1.4. Test-count delta

Reference Tests Suites
Base (94ce7f8c, R92 Wave 3 close, per dispatch packet baseline) 3231 73
After P1.5.5 3275 75
Delta +44 +2

Net additions:

  • +28 circuit-breaker tests (src/__tests__/domains/router/circuit.test.ts, new).
  • +49 fallback tests (post-rewrite). Phase 0 had ~35 tests in fallback.test.ts; the rewrite kept the preserved Phase 0 tests and added cascade / CB / timeout / observability / no-adapter / Phase-1.5-shape coverage.

The Phase 0 ZERO-cascade-invariant block, the “different prompts route to same model” determinism test, and the Phase 0 ROUTER_PHASE_0_SHAPE-literal block were deleted (Phase 0 invariants retired).

2. Acceptance criteria → evidence map

AC Description Evidence
AC1 Happy path returns RouteResult fallback.test.tsrouteRequest — happy path (4 tests, all green)
AC2 scoreIntent consulted exactly once fallback.test.tsrouteRequest — scoring integration
AC3 Cascade A→fails, B→succeeds → RouteResult.model === B fallback.test.tsrouteRequest — cascade → “A fails, B succeeds → RouteResult.model === B”
AC4 Chain exhaustion → N attempts fallback.test.tsrouteRequest — failure wrapping → “FallbackChainExhaustedError has one attempt per chain member” (assertion: attempts.length === 9)
AC5 attempts[i].model reflects walk order fallback.test.tsrouteRequest — cascade → “both fail → FallbackChainExhaustedError lists both as attempts” (verifies claude first, gpt-4o present)
AC6 CB trips after 3 consecutive failures fallback.test.tsrouteRequest — circuit breaker → “CB trips after 3 consecutive failures on the same model”
AC7 CB time-bound reset fallback.test.tsrouteRequest — circuit breaker → “time-bound reset: after 60s elapsed (via injected nowFn), tripped model is retried”
AC8 All-tripped → exhaustion w/ CircuitOpenError fallback.test.tsrouteRequest — all-tripped → “every adapter open → FallbackChainExhaustedError with CircuitOpenError attempts”
AC9 Per-attempt timeout fires fallback.test.tsrouteRequest — timeout → “COLIBRI_MODEL_TIMEOUT override fires when adapter hangs”
AC10 COLIBRI_MODEL_TIMEOUT env override fallback.test.tsrouteRequest — timeout → 4 tests cover override, default, invalid, fallback
AC11 getCircuitBreakerState() frozen snapshot fallback.test.tsrouteRequest — observability → “getCircuitBreakerState() returns a snapshot whose CircuitState values are frozen”
AC12 resetCircuitBreaker(modelId) clears one fallback.test.tsrouteRequest — circuit breaker → “manual resetCircuitBreaker(modelId) clears a tripped model” + circuit.test.tsresetCircuitBreaker → “with a modelId argument clears just that model”
AC13 resetCircuitBreaker() clears all fallback.test.tsrouteRequest — observability → “resetCircuitBreaker() with no arg clears all state” + circuit.test.tsresetCircuitBreaker → “with no argument clears all state”
AC14 ROUTER_PHASE_0_SHAPE literals flipped fallback.test.tsROUTER_PHASE_0_SHAPE — Phase 1.5 literals (5 tests asserting members === 6, hasCircuitBreaker === true, modelsSupported list)
AC15 Fold-in re-exports adapters §3.4 below; verified via direct import { createKimiCompletion } from '../router/index.js' smoke at module load (any of the 75 test suites that transitively load the router barrel exercise this)
AC16 Tools passthrough preserved fallback.test.tsrouteRequest — tools passthrough (2 tests) + routeRequest — default dispatcher → “dispatches to createCompletionWithTools when tools non-empty”
AC17 Non-Error thrown values wrapped fallback.test.tsrouteRequest — non-Error thrown values
AC18 RouteResult frozen fallback.test.tsrouteRequest — happy path → “RouteResult is frozen”
AC19 FallbackChainExhaustedError message fallback.test.tsFallbackChainExhaustedError — message format (3 tests)

All 19 ACs covered with green tests.

3. ROUTER_PHASE_0_SHAPE flip evidence

3.1. Before (Phase 0, base 94ce7f8c)

export const ROUTER_PHASE_0_SHAPE: {
  readonly members: 1;
  readonly hasCircuitBreaker: false;
  readonly modelsSupported: readonly ['claude'];
} = Object.freeze({
  members: 1,
  hasCircuitBreaker: false,
  modelsSupported: Object.freeze(['claude'] as const),
} as const);

3.2. After (P1.5.5, this PR)

export const ROUTER_PHASE_0_SHAPE: {
  readonly members: 6;
  readonly hasCircuitBreaker: true;
  readonly modelsSupported: readonly [
    'claude',
    'claude-haiku-3-5',
    'claude-sonnet-3-5',
    'gpt-4o',
    'gpt-4o-mini',
    'kimi-k2',
  ];
} = Object.freeze({
  members: 6,
  hasCircuitBreaker: true,
  modelsSupported: Object.freeze([
    'claude',
    'claude-haiku-3-5',
    'claude-sonnet-3-5',
    'gpt-4o',
    'gpt-4o-mini',
    'kimi-k2',
  ] as const),
} as const);

3.3. Test-time assertions on the new literals

ROUTER_PHASE_0_SHAPE — Phase 1.5 literals
  √ members === 6 (the adapter-bound chain size)
  √ hasCircuitBreaker === true
  √ modelsSupported lists the 6 currently-adapter-bound model IDs
  √ is deeply frozen
  √ members count matches modelsSupported.length

The Phase 0 trip-wire did its job: deleting the Phase 0 assertions (members === 1, hasCircuitBreaker === false, modelsSupported === ['claude']) was a conscious act in the rewrite, mapped one-for-one to the new assertions above.

3.4. Modeled chain (members count rationale)

modelsSupported = the set of ModelId values with a concrete entry in DEFAULT_ADAPTER_REGISTRY:

ModelId Adapter Source
claude createCompletion src/domains/integrations/claude.ts
claude-haiku-3-5 createCompletion (variant via options.model)
claude-sonnet-3-5 createCompletion (variant via options.model)
gpt-4o createOpenAiCompletion src/domains/router/adapters/openai.ts
gpt-4o-mini createOpenAiCompletion (variant via options.model)
kimi-k2 createKimiCompletion src/domains/router/adapters/kimi.ts

The three ModelIds without a shipping adapter (gemini-1-5-pro, llama-3-3-70b, mixtral-8x22b) are absent — they map to NoAdapterError at chain-walk time. The Codex adapter is imported (and re-exported from the barrel via the fold-in) but no ModelId is currently mapped to it; it ships ahead of a future ModelId expansion.

4. Wave 3 fold-in evidence

4.1. Diff of src/domains/router/index.ts

Three new lines added at the end of the barrel, in alphabetical order:

export * from './scoring.js';
export * from './fallback.js';
export * from './adapters/codex.js';  // ← NEW (W3 fold-in)
export * from './adapters/kimi.js';   // ← NEW (W3 fold-in)
export * from './adapters/openai.js'; // ← NEW (W3 fold-in)

4.2. Test-side evidence

The full test suite (3275 tests across 75 suites) transitively loads src/domains/router/index.ts via test imports. The build is clean (zero TS errors) and the lint is clean — both verify that the three new re-exports do not introduce duplicate-symbol conflicts at the type-system level. CompletionResult is the only symbol re-exported from multiple sources; TypeScript de-duplicates because every source re-exports it from the same upstream module (../integrations/claude.ts).

4.3. Smoke import via the barrel

The implementation imports the three adapters’ entry points (createKimiCompletion, createCodexCompletion, createOpenAiCompletion) inside src/domains/router/fallback.ts, and fallback.ts is re-exported by src/domains/router/index.ts. The full test suite therefore exercises:

// Implicit smoke import at test-suite load time:
import { ... } from '../../../domains/router/fallback.js';
// → loads adapters/kimi.js, adapters/codex.js, adapters/openai.js
// → no throw at module load

(The CB tests and the fallback tests both import from the fallback module; the latter exercises the adapters via the default registry path on the “default dispatcher” tests.)

5. Per-test summary

5.1. src/__tests__/domains/router/circuit.test.ts (28 tests, all green)

CB module constants ........................................... 2
snapshot() .................................................... 3
recordFailure — failure counter ............................... 4
recordSuccess ................................................. 3
isOpen ........................................................ 5
resetIfElapsed ................................................ 4
per-model state isolation ..................................... 2
resetCircuitBreaker ........................................... 3
default clock ................................................. 2
                                                              ---
                                                               28

5.2. src/__tests__/domains/router/fallback.test.ts (49 tests, all green)

routeRequest — happy path ..................................... 4
routeRequest — scoring integration ............................ 2
routeRequest — upstream forwarding ............................ 5
routeRequest — failure wrapping ............................... 6
routeRequest — cascade ........................................ 4
routeRequest — circuit breaker ................................ 5
routeRequest — timeout ........................................ 5
routeRequest — all-tripped .................................... 1
routeRequest — observability .................................. 2
ROUTER_PHASE_0_SHAPE — Phase 1.5 literals .................... 5
routeRequest — tools passthrough .............................. 2
FallbackChainExhaustedError — message format .................. 3
routeRequest — non-Error thrown values ........................ 1
routeRequest — no adapter ..................................... 1
routeRequest — default dispatcher ............................. 2
routeRequest — determinism .................................... 1
                                                              ---
                                                               49

6. CB state-machine evidence

The CB FSM from contract §3 is verified end-to-end:

Transition Test
CLOSED-0 → CLOSED-1 (recordFailure once) circuit.test.ts → “one failure increments counter to 1”
CLOSED-1 → CLOSED-2 (recordFailure ×2) circuit.test.ts → “two failures increment to 2”
CLOSED-2 → OPEN (recordFailure ×3) circuit.test.ts → “three failures trip the breaker”
OPEN → OPEN (failure during open) circuit.test.ts → “failures beyond threshold during OPEN do not advance openedAt”
OPEN → CLOSED (time-bound, resetIfElapsed) circuit.test.ts → “clears state when cooldown has elapsed”
OPEN ≠ CLOSED (success during open does NOT clear openedAt) circuit.test.ts → “success during OPEN preserves openedAt”
OPEN → CLOSED (manual reset) circuit.test.ts → “manual reset clears an OPEN breaker before the cooldown elapses”
Per-model isolation circuit.test.ts → “tripping claude leaves gpt-4o closed” + “snapshot lists both models with independent state” + fallback test version

The state-machine boundary cases (exactly 59,999 ms vs exactly 60,000 ms after trip) are explicitly tested via the injected nowFn clock.

7. Invariant checklist

All 19 invariants from contract §6 verified:

  • ✓ I1 — routeRequest signature byte-identical (build green, fallback test imports unchanged).
  • ✓ I2 — Chain order from scoreIntent descending (cascade test verifies B reached after A fails).
  • ✓ I3 — 30 s default + COLIBRI_MODEL_TIMEOUT override (timeout test block).
  • ✓ I4 — 3 consecutive fails → 60 s window (circuit.test.ts trip block).
  • ✓ I5 — Time-bound reset (circuit.test.ts resetIfElapsed block + fallback time-bound test).
  • ✓ I6 — Untripped failure retried next request (implicit: counter < 3 after one round → next call walks claude again; verified by the “all-tripped” test which needs 3 rounds to trip the chain).
  • ✓ I7 — All-tripped exhaustion (routeRequest — all-tripped test).
  • ✓ I8 — ROUTER_PHASE_0_SHAPE literals flipped (§3 above).
  • ✓ I9 — getCircuitBreakerState() frozen snapshot (observability tests).
  • ✓ I10 — resetCircuitBreaker(modelId?) clears (observability + CB block).
  • ✓ I11 — In-memory only (grep src/domains/router/circuit.ts for db → zero hits).
  • ✓ I12 — No setTimeout outside raceWithTimeout (grep src/domains/router/fallback.ts for setTimeout → 1 hit, inside raceWithTimeout).
  • ✓ I13 — No MCP tool registered (no changes to src/server.ts).
  • ✓ I14 — Tools passthrough for Claude preserved (tools-passthrough + default-dispatcher tests).
  • ✓ I15 — Non-Error normalisation preserved.
  • ✓ I16 — RouteResult frozen (happy-path test).
  • ✓ I17 — cause points to last attempt error (failure-wrapping test).
  • ✓ I18 — attempts[i].model reflects walk order (cascade test).
  • ✓ I19 — Fold-in re-exports (§4).

8. Forbidden checks (from dispatch)

  • ✓ No src/server.ts edit.
  • ✓ No adapter file edit (only fallback.ts + index.ts + circuit.ts + test files changed under src/).
  • ✓ No AMS_* env var read (grep src/domains/router/ for AMS_ → zero hits).
  • ✓ No DB persistence of CB state.
  • ✓ No setTimeout outside Promise.race.
  • ✓ No costUsd / modelsAttempted field appended to RouteResult.
  • ✓ No new MCP tool.
  • ✓ No --no-verify / --amend / force-push.
  • ✓ All work in feature worktree; no main-checkout edits.

9. Verification close

All five chain steps complete:

  1. ✓ Audit (4d792ac4)
  2. ✓ Contract (db2c5f84)
  3. ✓ Packet (dce40ba9)
  4. ✓ Implement (7d6eb2fa)
  5. ✓ Verify (this commit)

Ready for PR.


Back to top

Colibri — documentation-first MCP runtime. Apache 2.0 + Commons Clause.

This site uses Just the Docs, a documentation theme for Jekyll.