P1.5.5 — N-member Fallback Chain + Circuit Breaker — Behavioral Contract

1. Module surface

1.1. src/domains/router/circuit.ts (new)

export const CIRCUIT_FAILURE_THRESHOLD: 3;
export const CIRCUIT_COOLDOWN_MS: 60_000;

export interface CircuitState {
  readonly failures: number;
  readonly openedAt: number | null;
}

export interface CircuitBreakerOptions {
  readonly nowFn?: () => number; // injectable clock; default Date.now
}

export function recordFailure(modelId: ModelId, options?: CircuitBreakerOptions): void;
export function recordSuccess(modelId: ModelId): void;
export function isOpen(modelId: ModelId, options?: CircuitBreakerOptions): boolean;
export function resetIfElapsed(modelId: ModelId, options?: CircuitBreakerOptions): void;
export function resetCircuitBreaker(modelId?: ModelId): void;
export function snapshot(): ReadonlyMap<ModelId, CircuitState>;
export function getCircuitBreakerState(): ReadonlyMap<ModelId, CircuitState>; // alias of snapshot()

The module owns a private Map<ModelId, CircuitState> initialised lazily on first access. All mutations are confined to the module — callers never write to the map directly.
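One way to keep the map private while still offering a read-only view is to hand out a copy. The sketch below is illustrative only: `getStateMap` and `writeState` are hypothetical helpers (not part of the contract), and `ModelId` is narrowed to `string` so the block stands alone.

```typescript
// Minimal sketch under stated assumptions; not the module's actual body.
type ModelId = string;

interface CircuitState {
  readonly failures: number;
  readonly openedAt: number | null;
}

// Private to the module: created lazily, never handed out directly.
let stateMap: Map<ModelId, CircuitState> | undefined;

function getStateMap(): Map<ModelId, CircuitState> {
  if (stateMap === undefined) stateMap = new Map<ModelId, CircuitState>();
  return stateMap;
}

// Hypothetical write path standing in for recordFailure / recordSuccess.
function writeState(modelId: ModelId, next: CircuitState): void {
  getStateMap().set(modelId, Object.freeze({ ...next }));
}

// Returns a copy, so mutating the result cannot reach the private map.
function snapshot(): ReadonlyMap<ModelId, CircuitState> {
  return new Map(getStateMap());
}

writeState('claude', { failures: 1, openedAt: null });
const snap = snapshot();
// ReadonlyMap is a compile-time view only; a cast can still mutate the copy...
(snap as Map<ModelId, CircuitState>).delete('claude');
// ...but the module's own map is unaffected.
console.log(snapshot().get('claude')); // { failures: 1, openedAt: null }
```

Copying on every call costs O(n) in the number of tracked models, which is small here since the chain has only a handful of members.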

1.2. src/domains/router/fallback.ts (modified body)

All existing exports preserved byte-identical in name/signature:

  • routeRequest(prompt, options?) → Promise<RouteResult>
  • FallbackChainExhaustedError
  • RouteOptions, RouteResult, FallbackAttempt, CompletionFn, CompletionFnOptions, ScoringFn
  • ROUTER_PHASE_0_SHAPE (the name; the literal values change)

New exports (P1.5.5 additions; consumed by P1.5.7 + tests):

// Re-exports of circuit.ts members at the fallback module level so
// the public δ surface stays a single import path.
export {
  CIRCUIT_FAILURE_THRESHOLD,
  CIRCUIT_COOLDOWN_MS,
  getCircuitBreakerState,
  resetCircuitBreaker,
} from './circuit.js';
export type { CircuitState } from './circuit.js';

// New error types raised by the chain walk (NOT raised on top-level
// success path — callers see them only when inspecting attempts[]).
export class RouterTimeoutError extends Error { readonly code = 'ROUTER_TIMEOUT' as const; ... }
export class CircuitOpenError extends Error { readonly code = 'CIRCUIT_OPEN' as const; ... }
export class NoAdapterError extends Error { readonly code = 'NO_ADAPTER' as const; ... }

These three internal-failure-mode errors join AnthropicApiError / AnthropicConfigError / KimiApiError / etc. as legal values inside FallbackAttempt.error. They are not part of the top-level thrown surface, which remains FallbackChainExhaustedError (and Error for non-Error throwables, normalized at attempt time).

1.3. src/domains/router/index.ts (Wave 3 fold-in)

Added at the end of the barrel, in alphabetical order:

export * from './adapters/codex.js';
export * from './adapters/kimi.js';
export * from './adapters/openai.js';

The pre-existing export * from './scoring.js' and export * from './fallback.js' lines remain unchanged. Order matters only for symbol-conflict resolution; alphabetical was specified by the dispatch packet.

Symbol-conflict note: All four adapter modules (claude / kimi / codex / openai) re-export CompletionResult from ../integrations/claude.ts (or, in openai’s case, also CompletionResult directly). With export * from multiple modules that all re-export the same symbol from the same upstream, TypeScript de-duplicates the re-export and the symbol resolves identically — no conflict. The same is true for AnthropicTool (kimi/codex re-export it; openai does not). The CHANGED literals in ROUTER_PHASE_0_SHAPE re-exported via fallback.js are picked up unconditionally; downstream consumers see the Phase 1.5 shape.

2. ROUTER_PHASE_0_SHAPE flip

Phase 1.5 literal:

export const ROUTER_PHASE_0_SHAPE: {
  readonly members: 6;
  readonly hasCircuitBreaker: true;
  readonly modelsSupported: readonly [
    'claude',
    'claude-haiku-3-5',
    'claude-sonnet-3-5',
    'gpt-4o',
    'gpt-4o-mini',
    'kimi-k2',
  ];
} = Object.freeze({
  members: 6,
  hasCircuitBreaker: true,
  modelsSupported: Object.freeze([
    'claude',
    'claude-haiku-3-5',
    'claude-sonnet-3-5',
    'gpt-4o',
    'gpt-4o-mini',
    'kimi-k2',
  ] as const),
} as const);

The modelsSupported array is the set of ModelIds for which the default adapter registry has a concrete CompletionFn. members equals modelsSupported.length by construction.
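One possible way to make the "by construction" property hold mechanically is to derive `members` from the array rather than writing the number twice. This is a sketch, not the contract's literal: `MODELS_SUPPORTED` and `SHAPE_SKETCH` are hypothetical names.

```typescript
// Sketch only: the contract above writes both values out as literals.
const MODELS_SUPPORTED = Object.freeze([
  'claude',
  'claude-haiku-3-5',
  'claude-sonnet-3-5',
  'gpt-4o',
  'gpt-4o-mini',
  'kimi-k2',
] as const);

const SHAPE_SKETCH = Object.freeze({
  members: MODELS_SUPPORTED.length, // derived, so it cannot drift from the list
  hasCircuitBreaker: true,
  modelsSupported: MODELS_SUPPORTED,
});

console.log(SHAPE_SKETCH.members); // 6
```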

Six entries — not nine — because three ModelId values ('gemini-1-5-pro', 'llama-3-3-70b', 'mixtral-8x22b') lack adapters and Codex is wired into the registry but does not yet correspond to a ModelId value (registry key by abstract router ID). The flipped marker tracks “what the chain can actually call”, not “every ModelId in the union”.

3. Circuit breaker FSM

┌──────────┐  recordFailure ×3   ┌─────────────┐  resetIfElapsed (60s passed)  ┌──────────┐
│  CLOSED  │ ──────────────────▶ │   OPEN      │ ─────────────────────────────▶│  CLOSED  │
│ (fail<3) │                     │ (opened60s) │                                │ (fail=0) │
└──────────┘                     └─────────────┘                                └──────────┘
     ▲                                  │
     │                                  │ isOpen()                              ┌──────────┐
     │                                  └───────────────────────────────────────│  SKIP    │
     │                                                                          └──────────┘
     │ recordSuccess                                                                  │
     └────────────────────────────────────────────────────────────────────────────────┘
                              (zero failures; openedAt unchanged)

3.1. State transitions

| Trigger | Pre-state | Post-state | Notes |
|---|---|---|---|
| recordFailure from {failures: 0, openedAt: null} | CLOSED-0 | {failures: 1, openedAt: null} | Counter increment only. |
| recordFailure from {failures: 1} | CLOSED-1 | {failures: 2, openedAt: null} | Counter increment only. |
| recordFailure from {failures: 2} | CLOSED-2 | {failures: 3, openedAt: now} | Counter reaches threshold ⇒ trip. |
| recordFailure from {failures: 3, openedAt: t0} | OPEN | {failures: 4, openedAt: t0} | openedAt is NOT updated on further failures during open. The cooldown anchors to the FIRST trip. |
| recordSuccess from anything | * | {failures: 0, openedAt: unchanged} | Counter reset only. openedAt preserved; open-state remains time-bound. |
| resetIfElapsed(t1) with t1 - openedAt >= 60_000 | OPEN | {failures: 0, openedAt: null} | Time-bound clear. Always called BEFORE isOpen in the chain walk. |
| resetIfElapsed(t1) with t1 - openedAt < 60_000 | OPEN | (unchanged) | No-op. |
| resetCircuitBreaker(modelId) | * | {failures: 0, openedAt: null} | Manual clear, single model. |
| resetCircuitBreaker() | * (all) | (empty map) | Manual clear, all models. |
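The transitions can be read as pure functions from one state to the next. The sketch below walks the failure and reset rows with a hand-advanced clock; `afterFailure`, `afterSuccess`, and `afterResetIfElapsed` are hypothetical names chosen for illustration, and the real module mutates its private map rather than returning new states.

```typescript
interface CircuitState {
  readonly failures: number;
  readonly openedAt: number | null;
}

const THRESHOLD = 3;        // mirrors CIRCUIT_FAILURE_THRESHOLD
const COOLDOWN_MS = 60_000; // mirrors CIRCUIT_COOLDOWN_MS
const CLOSED: CircuitState = Object.freeze({ failures: 0, openedAt: null });

// recordFailure rows: increment; stamp openedAt only on the step that reaches the threshold.
function afterFailure(s: CircuitState, now: number): CircuitState {
  const failures = s.failures + 1;
  return { failures, openedAt: failures === THRESHOLD ? now : s.openedAt };
}

// recordSuccess row: counter reset only, openedAt preserved.
function afterSuccess(s: CircuitState): CircuitState {
  return { failures: 0, openedAt: s.openedAt };
}

// resetIfElapsed rows: clear once the cooldown has fully elapsed, otherwise no-op.
function afterResetIfElapsed(s: CircuitState, now: number): CircuitState {
  if (s.openedAt !== null && now - s.openedAt >= COOLDOWN_MS) return CLOSED;
  return s;
}

let s: CircuitState = CLOSED;
s = afterFailure(s, 1_000); // { failures: 1, openedAt: null }
s = afterFailure(s, 2_000); // { failures: 2, openedAt: null }
s = afterFailure(s, 3_000); // { failures: 3, openedAt: 3000 }, tripped
s = afterFailure(s, 4_000); // { failures: 4, openedAt: 3000 }, anchor unchanged

const stillOpen = afterResetIfElapsed(s, 62_999); // 59_999 ms elapsed: unchanged
const cleared = afterResetIfElapsed(s, 63_000);   // 60_000 ms elapsed: cleared
console.log(stillOpen, cleared);
```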

3.2. isOpen predicate

function isOpen(modelId, { nowFn = Date.now } = {}): boolean {
  const state = stateMap.get(modelId);
  if (!state || state.openedAt === null) return false;
  return (nowFn() - state.openedAt) < CIRCUIT_COOLDOWN_MS;
}

Once resetIfElapsed has cleared an elapsed cooldown, state.openedAt is null and isOpen returns false. The chain walk therefore always calls resetIfElapsed first, then isOpen, then the adapter attempt.
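A small walk-through with a fake clock may help show what each call contributes. It is a sketch under assumptions: `ModelId` is narrowed to `string`, the map is a local `states` variable, and the clock is a module-level `nowFn` rather than an options argument.

```typescript
type ModelId = string;
interface CircuitState {
  readonly failures: number;
  readonly openedAt: number | null;
}

const THRESHOLD = 3;
const COOLDOWN_MS = 60_000;
const states = new Map<ModelId, CircuitState>();
let fakeNow = 0; // hand-advanced clock playing the role of nowFn
const nowFn = (): number => fakeNow;

function recordFailure(id: ModelId): void {
  const prev: CircuitState = states.get(id) ?? { failures: 0, openedAt: null };
  const failures = prev.failures + 1;
  states.set(id, { failures, openedAt: failures === THRESHOLD ? nowFn() : prev.openedAt });
}

function resetIfElapsed(id: ModelId): void {
  const st = states.get(id);
  if (st && st.openedAt !== null && nowFn() - st.openedAt >= COOLDOWN_MS) {
    states.set(id, { failures: 0, openedAt: null });
  }
}

function isOpen(id: ModelId): boolean {
  const st = states.get(id);
  if (!st || st.openedAt === null) return false;
  return nowFn() - st.openedAt < COOLDOWN_MS;
}

for (let i = 0; i < 3; i++) recordFailure('kimi-k2'); // trips at t = 0
console.log(isOpen('kimi-k2')); // true

fakeNow = 60_000; // the cooldown has just elapsed
console.log(isOpen('kimi-k2')); // false, from the time comparison alone
console.log(states.get('kimi-k2')); // { failures: 3, openedAt: 0 }, stale until reset

resetIfElapsed('kimi-k2');
console.log(states.get('kimi-k2')); // { failures: 0, openedAt: null }
```

In this sketch isOpen already reports false once the window has passed, so the reset's job is to clear the stale counter and anchor. Without it, the next failure would move the counter from 3 to 4 and never pass through the threshold again, which is one reading of why the walk resets first.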

3.3. Invariants

  • I-CB-1 (Initialization): Before any call, snapshot() is an empty map.
  • I-CB-2 (Threshold): A model trips iff failures reaches exactly CIRCUIT_FAILURE_THRESHOLD. Two failures + one success + one failure does NOT trip.
  • I-CB-3 (Cooldown anchor): openedAt is set on the trip and not advanced by subsequent failures during the open window.
  • I-CB-4 (Time-bound reset): An elapsed cooldown clears state. A successful call during open does NOT clear openedAt because isOpen blocked the attempt — recordSuccess is never called for a skipped model.
  • I-CB-5 (Per-model): State is keyed by ModelId. Two models share no state.
  • I-CB-6 (Memory-only): No DB write, no file write, no process-shared state. Process exit clears.
  • I-CB-7 (Clock injection): nowFn is the only clock seam. All clock reads route through it; Date.now is used only when nowFn is absent.

4. Chain-walk semantics

4.1. Algorithm

async routeRequest(prompt, options = {}):
    1. scoring = options.scoringFn ?? scoreIntent
    2. decision = scoring(prompt, options)
    3. chainOrder = orderedChain(decision.scores)
    4. attempts: FallbackAttempt[] = []
    5. timeoutMs = readTimeoutEnv()
    6. perModelFn = options.completionFnRegistry ?? {}
    7. globalFn = options.completionFn  // legacy / Phase 0 test compat
    8. for modelId of chainOrder:
        a. resetIfElapsed(modelId)
        b. if isOpen(modelId):
             attempts.push({ model: modelId, error: new CircuitOpenError(modelId) })
             continue
        c. adapter = perModelFn[modelId] ?? globalFn ?? defaultAdapterFor(modelId, options.tools)
        d. if adapter is undefined:
             attempts.push({ model: modelId, error: new NoAdapterError(modelId) })
             continue
        e. try:
             upstream = await raceWithTimeout(
                 adapter(prompt, projectUpstreamOptions(options, modelId)),
                 timeoutMs,
             )
             recordSuccess(modelId)
             return freeze({
                 model: modelId,
                 content: upstream.content,
                 finishReason: upstream.stopReason,
                 promptTokens: upstream.promptTokens,
                 completionTokens: upstream.completionTokens,
                 latencyMs: upstream.latencyMs,
             })
           catch err:
             recordFailure(modelId)
             attempts.push({ model: modelId, error: normalize(err) })
    9. throw new FallbackChainExhaustedError(attempts)
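The control flow of step 8 can be sketched in a reduced form. This is not the real routeRequest: the timeout race and breaker bookkeeping are left out, `openModels` is a hypothetical stand-in for the isOpen check, `ChainExhausted` stands in for FallbackChainExhaustedError, and plain Error objects replace the typed attempt errors.

```typescript
type ModelId = string;
type Adapter = (prompt: string) => Promise<{ content: string }>;
interface Attempt {
  readonly model: ModelId;
  readonly error: Error;
}

class ChainExhausted extends Error {
  readonly attempts: readonly Attempt[];
  constructor(attempts: readonly Attempt[]) {
    super(`all ${attempts.length} chain members failed`);
    this.name = 'ChainExhausted';
    this.attempts = attempts;
  }
}

async function walkChain(
  prompt: string,
  chainOrder: readonly ModelId[],
  registry: Readonly<Record<ModelId, Adapter | undefined>>,
  openModels: ReadonlySet<ModelId>,
): Promise<{ model: ModelId; content: string }> {
  const attempts: Attempt[] = [];
  for (const modelId of chainOrder) {
    if (openModels.has(modelId)) {
      // Step 8b: skipped members still leave a trace in attempts.
      attempts.push({ model: modelId, error: new Error(`circuit open: ${modelId}`) });
      continue;
    }
    const adapter = registry[modelId];
    if (adapter === undefined) {
      // Step 8d: no adapter registered for this member.
      attempts.push({ model: modelId, error: new Error(`no adapter: ${modelId}`) });
      continue;
    }
    try {
      const upstream = await adapter(prompt); // step 8e, minus the timeout race
      return Object.freeze({ model: modelId, content: upstream.content });
    } catch (err) {
      attempts.push({ model: modelId, error: err instanceof Error ? err : new Error(String(err)) });
    }
  }
  throw new ChainExhausted(attempts); // step 9
}

const registrySketch: Record<ModelId, Adapter | undefined> = {
  a: async () => { throw new Error('upstream 500'); },
  b: async (p) => ({ content: `echo: ${p}` }),
};

walkChain('hi', ['tripped', 'missing', 'a', 'b'], registrySketch, new Set(['tripped']))
  .then((r) => console.log(r)); // { model: 'b', content: 'echo: hi' }
```

Every skipped or failed member appends exactly one attempt, which is what lets the exhausted path report attempts.length equal to the chain length.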

4.2. orderedChain(scores)

Sort entries by score descending. Ties broken by ASCII ascending on model_id (the scoring layer already enforces this for the winner; we re-apply the comparator for the full ordering). Returns a ReadonlyArray<ModelId> whose first element matches decision.winner.

function orderedChain(scores: Readonly<Record<ModelId, number>>): readonly ModelId[] {
  return (Object.keys(scores) as ModelId[])
    .sort((a, b) => {
      const da = scores[a] ?? 0;
      const db = scores[b] ?? 0;
      if (da !== db) return db - da; // descending
      return a < b ? -1 : a > b ? 1 : 0; // ASCII asc tie-break
    });
}
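A worked example of the comparator, with `ModelId` narrowed to four members so the block is self-contained (the scores are made-up values for illustration):

```typescript
type ModelId = 'claude' | 'claude-haiku-3-5' | 'gpt-4o' | 'kimi-k2';

function orderedChain(scores: Readonly<Record<ModelId, number>>): readonly ModelId[] {
  return (Object.keys(scores) as ModelId[]).sort((a, b) => {
    const da = scores[a] ?? 0;
    const db = scores[b] ?? 0;
    if (da !== db) return db - da;     // higher score first
    return a < b ? -1 : a > b ? 1 : 0; // ASCII ascending on ties
  });
}

const chain = orderedChain({
  'gpt-4o': 0.4,
  'claude': 0.9,
  'kimi-k2': 0.4,
  'claude-haiku-3-5': 0.9,
});
console.log(chain); // [ 'claude', 'claude-haiku-3-5', 'gpt-4o', 'kimi-k2' ]
```

Both 0.9 entries sort ahead of both 0.4 entries, and within each tie the shorter common-prefix string ('claude') or the lower first letter ('g' before 'k') wins.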

4.3. defaultAdapterFor(modelId, tools)

Static adapter registry keyed by ModelId. Built once at module load.

const REGISTRY: Partial<Record<ModelId, CompletionFn>> = {
  'claude':              (p, o) => createCompletion(p, o),
  'claude-sonnet-3-5':   (p, o) => createCompletion(p, o),
  'claude-haiku-3-5':    (p, o) => createCompletion(p, o),
  'kimi-k2':             (p, o) => createKimiCompletion(p, o),
  'gpt-4o':              (p, o) => createOpenAiCompletion(p, o),
  'gpt-4o-mini':         (p, o) => createOpenAiCompletion(p, o),
};

When tools is non-empty AND the resolved modelId is a Claude variant (claude / claude-sonnet-3-5 / claude-haiku-3-5), the lookup dispatches to createCompletionWithTools instead of createCompletion. For non-Claude modelIds with tools, the registry returns the plain entry (tools forwarding across non-Claude adapters is P1.5.6+ scope per audit §4). A warning is logged so an operator knows the tools were dropped, but the call still proceeds.

Returns undefined for ModelIds without a registered adapter. The chain walk records that as a NoAdapterError attempt and moves on.
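The three outcomes (tools-aware Claude entry, plain entry with a warning, undefined) could be sketched as follows. The adapter functions here are stubs, `ToolSketch` is a hypothetical stand-in for the real tool type, and the `warn` parameter is an assumed logging seam that the real two-argument signature does not have.

```typescript
type ModelId =
  | 'claude' | 'claude-sonnet-3-5' | 'claude-haiku-3-5'
  | 'kimi-k2' | 'gpt-4o' | 'gemini-1-5-pro';
interface ToolSketch { readonly name: string }
type CompletionFn = (prompt: string) => Promise<string>;

// Hypothetical stubs standing in for the real adapter entry points.
const createCompletion: CompletionFn = async (p) => `claude:${p}`;
const createKimiCompletion: CompletionFn = async (p) => `kimi:${p}`;
const createOpenAiCompletion: CompletionFn = async (p) => `openai:${p}`;
const createCompletionWithTools = (tools: readonly ToolSketch[]): CompletionFn =>
  async (p) => `claude+${tools.length}tools:${p}`;

const CLAUDE_VARIANTS: ReadonlySet<ModelId> = new Set<ModelId>([
  'claude', 'claude-sonnet-3-5', 'claude-haiku-3-5',
]);

const REGISTRY_SKETCH: Partial<Record<ModelId, CompletionFn>> = {
  'claude': createCompletion,
  'claude-sonnet-3-5': createCompletion,
  'claude-haiku-3-5': createCompletion,
  'kimi-k2': createKimiCompletion,
  'gpt-4o': createOpenAiCompletion,
};

function defaultAdapterFor(
  modelId: ModelId,
  tools?: readonly ToolSketch[],
  warn: (msg: string) => void = console.warn, // assumed seam, for illustration
): CompletionFn | undefined {
  const entry = REGISTRY_SKETCH[modelId];
  if (entry === undefined) return undefined; // the walk records NoAdapterError
  if (tools === undefined || tools.length === 0) return entry;
  if (CLAUDE_VARIANTS.has(modelId)) return createCompletionWithTools(tools);
  warn(`tools dropped for model='${modelId}'`); // non-Claude: proceed without tools
  return entry;
}

const dropped: string[] = [];
const tools: readonly ToolSketch[] = [{ name: 'search' }];
console.log(defaultAdapterFor('gemini-1-5-pro') === undefined); // true
console.log(defaultAdapterFor('kimi-k2', tools, (m) => dropped.push(m)) === createKimiCompletion); // true
console.log(dropped.length); // 1
```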

4.4. projectUpstreamOptions(options, modelId)

Same as Phase 0, with one conditional change: when options.model is absent AND the resolved modelId is the abstract 'claude' (or a Claude variant), no model field is emitted (adapter default applies). For non-Claude adapters, the abstract modelId is emitted as options.model so the adapter has a hint about which provider variant to use. This is conservative — it preserves Phase 0 behavior for the Claude path while letting non-Claude adapters select a sensible model from the abstract ID.

Other fields (maxTokens, systemPrompt, apiKey, fetchFn, logger, delayFn) are forwarded byte-identically to Phase 0.
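The conditional on the model field might look like the sketch below. Two things here are assumptions rather than contract: a caller-supplied options.model is forwarded unchanged (the text only covers the absent case), and a Claude variant is recognised by a simple name check in the hypothetical `isClaudeId` helper. Only two of the forwarded fields are shown.

```typescript
interface RouteOptionsSketch {
  model?: string;
  maxTokens?: number;
  systemPrompt?: string;
}
interface UpstreamOptionsSketch {
  model?: string;
  maxTokens?: number;
  systemPrompt?: string;
}

// Hypothetical helper: treats 'claude' and any 'claude-*' ID as a Claude variant.
const isClaudeId = (id: string): boolean => id === 'claude' || id.startsWith('claude-');

function projectUpstreamOptions(options: RouteOptionsSketch, modelId: string): UpstreamOptionsSketch {
  const out: UpstreamOptionsSketch = {};
  if (options.maxTokens !== undefined) out.maxTokens = options.maxTokens;
  if (options.systemPrompt !== undefined) out.systemPrompt = options.systemPrompt;
  if (options.model !== undefined) {
    out.model = options.model; // ASSUMPTION: an explicit caller choice is passed through
  } else if (!isClaudeId(modelId)) {
    out.model = modelId;       // non-Claude: the abstract ID becomes the hint
  }                            // Claude with no explicit model: field omitted
  return out;
}

console.log(projectUpstreamOptions({ maxTokens: 256 }, 'claude'));  // { maxTokens: 256 }
console.log(projectUpstreamOptions({ maxTokens: 256 }, 'kimi-k2')); // { maxTokens: 256, model: 'kimi-k2' }
```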

4.5. raceWithTimeout(promise, timeoutMs)

async function raceWithTimeout<T>(p: Promise<T>, ms: number): Promise<T> {
  let timerId: ReturnType<typeof setTimeout> | undefined;
  const timeoutPromise = new Promise<never>((_, reject) => {
    timerId = setTimeout(() => reject(new RouterTimeoutError(ms)), ms);
  });
  try {
    return await Promise.race([p, timeoutPromise]);
  } finally {
    if (timerId !== undefined) clearTimeout(timerId);
  }
}

setTimeout is inside the Promise.race guard — the only setTimeout in fallback.ts. The dispatch packet forbids any setTimeout outside this guard.

AbortController note: the dispatch packet PR-body bullet calls for an AbortController to cancel the adapter call on timeout, but the current adapters (createCompletion, createKimiCompletion, createCodexCompletion, createOpenAiCompletion) do NOT take a signal option. Wiring AbortController through to the adapters is a W3+ change that would require modifying every adapter, and the Forbiddens (§8) prohibit adapter edits. So P1.5.5 ships with Promise.race semantics only: the router stops waiting at the timeout, and the slow adapter promise settles later with its result ignored. This satisfies “no leaks at the router boundary” (clearTimeout is always called in finally) while leaving the upstream socket lifecycle to the adapter, which already manages it via its own retry / fetch lifecycle. This is a conscious deferral; a follow-up task (P1.5.6 or later) may extend adapters to accept a signal.
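The practical effect of racing without cancellation can be observed with a deliberately slow stub. In this sketch `slowAdapter` is hypothetical and the delays (10 ms timeout, 50 ms adapter) are arbitrary small values chosen so the example finishes quickly.

```typescript
class RouterTimeoutError extends Error {
  readonly code = 'ROUTER_TIMEOUT' as const;
  readonly timeoutMs: number;
  constructor(timeoutMs: number) {
    super(`router attempt timed out after ${timeoutMs} ms`);
    this.name = 'RouterTimeoutError';
    this.timeoutMs = timeoutMs;
  }
}

async function raceWithTimeout<T>(p: Promise<T>, ms: number): Promise<T> {
  let timerId: ReturnType<typeof setTimeout> | undefined;
  const timeoutPromise = new Promise<never>((_, reject) => {
    timerId = setTimeout(() => reject(new RouterTimeoutError(ms)), ms);
  });
  try {
    return await Promise.race([p, timeoutPromise]);
  } finally {
    if (timerId !== undefined) clearTimeout(timerId); // the timer never outlives the race
  }
}

let adapterFinished = false;
// Hypothetical slow adapter: settles after 50 ms regardless of the race outcome.
const slowAdapter = (): Promise<string> =>
  new Promise<string>((resolve) => {
    setTimeout(() => {
      adapterFinished = true;
      resolve('late');
    }, 50);
  });

const raced = raceWithTimeout(slowAdapter(), 10).then(
  (v) => `resolved: ${v}`,
  (e: unknown) => (e instanceof RouterTimeoutError ? `timed out (${e.code})` : 'other error'),
);

raced.then((outcome) => {
  console.log(outcome, adapterFinished); // timed out (ROUTER_TIMEOUT) false
});
setTimeout(() => console.log(adapterFinished), 100); // true: the adapter still ran to completion
```

The router moves on at 10 ms, while the adapter's own work continues until 50 ms. That leftover work is what a future signal option would let the router cancel.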

4.6. readTimeoutEnv()

function readTimeoutEnv(): number {
  const raw = process.env['COLIBRI_MODEL_TIMEOUT'];
  if (raw === undefined || raw === '') return 30_000;
  const parsed = Number.parseInt(raw, 10);
  if (!Number.isFinite(parsed) || parsed <= 0) return 30_000;
  return parsed;
}

Read on every routeRequest call. The env var is per-process, but Jest tests mutate process.env per test, so a fresh read avoids stale values.
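The parsing rules are easiest to see on concrete inputs. The sketch below lifts the same logic into a hypothetical `parseTimeout(raw)` helper that takes the raw string as an argument, so it can be exercised without touching process.env.

```typescript
const DEFAULT_TIMEOUT_MS = 30_000;

// Same rules as readTimeoutEnv, with the env lookup factored out.
function parseTimeout(raw: string | undefined): number {
  if (raw === undefined || raw === '') return DEFAULT_TIMEOUT_MS;
  const parsed = Number.parseInt(raw, 10);
  if (!Number.isFinite(parsed) || parsed <= 0) return DEFAULT_TIMEOUT_MS;
  return parsed;
}

const cases: ReadonlyArray<readonly [string | undefined, number]> = [
  [undefined, 30_000], // unset: default
  ['', 30_000],        // empty: default
  ['5000', 5_000],     // plain integer: used as-is
  ['0', 30_000],       // non-positive: default
  ['-5', 30_000],      // non-positive: default
  ['abc', 30_000],     // NaN: default
  ['1500ms', 1_500],   // parseInt stops at the first non-digit
  ['2.9', 2],          // parseInt truncates the fraction
];

for (const [raw, expected] of cases) {
  console.log(String(raw), '->', parseTimeout(raw), '(expected', expected, ')');
}
```

The last two rows are worth noting: a value with a unit suffix or a fractional part is accepted in truncated form rather than rejected, which follows from using parseInt.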

4.7. normalize(err)

function normalize(err: unknown): Error {
  return err instanceof Error ? err : new Error(String(err));
}

Same as Phase 0 — preserves AC17.
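A few sample inputs illustrate the two branches (the values are arbitrary):

```typescript
function normalize(err: unknown): Error {
  return err instanceof Error ? err : new Error(String(err));
}

const original = new TypeError('bad input');
console.log(normalize(original) === original); // true: Error instances pass through by identity
console.log(normalize('boom').message);        // boom
console.log(normalize(404).message);           // 404
console.log(normalize(undefined).message);     // undefined
console.log(normalize({ code: 1 }).message);   // [object Object]
```

Because Error subclasses are returned as-is, typed adapter errors keep their class and fields inside attempts; only non-Error throwables lose structure, as the last line shows.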

5. New error classes (internal to chain walk)

5.1. RouterTimeoutError

export class RouterTimeoutError extends Error {
  readonly code = 'ROUTER_TIMEOUT' as const;
  readonly timeoutMs: number;
  constructor(timeoutMs: number) {
    super(`δ router attempt timed out after ${timeoutMs} ms`);
    this.name = 'RouterTimeoutError';
    this.timeoutMs = timeoutMs;
  }
}

Raised when Promise.race settles via the timer branch. Wrapped into FallbackAttempt.error. Treated as a failure (counts toward CB threshold).

5.2. CircuitOpenError

export class CircuitOpenError extends Error {
  readonly code = 'CIRCUIT_OPEN' as const;
  readonly modelId: ModelId;
  constructor(modelId: ModelId) {
    super(`δ circuit open for model='${modelId}'`);
    this.name = 'CircuitOpenError';
    this.modelId = modelId;
  }
}

Raised inline when isOpen(modelId) === true. Appended to attempts so the all-tripped path still produces a non-empty array. Does NOT call recordFailure (the breaker is already tripped).

5.3. NoAdapterError

export class NoAdapterError extends Error {
  readonly code = 'NO_ADAPTER' as const;
  readonly modelId: ModelId;
  constructor(modelId: ModelId) {
    super(`δ no adapter registered for model='${modelId}'`);
    this.name = 'NoAdapterError';
    this.modelId = modelId;
  }
}

Raised inline when the registry returns undefined for a chain member. Appended to attempts; not counted toward CB (the breaker tracks adapter failures, not registry absence).
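A consumer inspecting attempts might branch on the `code` discriminant rather than on class identity. The sketch below is one possible shape: `kindOf` and `countsTowardBreaker` are hypothetical helpers, and the sample errors are plain Error objects tagged with a code instead of the real classes.

```typescript
type AttemptKind = 'timeout' | 'skipped-open' | 'no-adapter' | 'adapter-failure';

// Classify an attempt error by its code field; anything else is an adapter failure.
function kindOf(error: Error): AttemptKind {
  const code = (error as Error & { code?: unknown }).code;
  switch (code) {
    case 'ROUTER_TIMEOUT': return 'timeout';
    case 'CIRCUIT_OPEN': return 'skipped-open';
    case 'NO_ADAPTER': return 'no-adapter';
    default: return 'adapter-failure';
  }
}

// Per 5.1 to 5.3: timeouts and adapter failures count toward the breaker;
// open-circuit skips and missing adapters do not.
function countsTowardBreaker(error: Error): boolean {
  const kind = kindOf(error);
  return kind === 'timeout' || kind === 'adapter-failure';
}

const sampleAttempts: Error[] = [
  Object.assign(new Error('circuit open'), { code: 'CIRCUIT_OPEN' }),
  Object.assign(new Error('no adapter'), { code: 'NO_ADAPTER' }),
  Object.assign(new Error('timed out'), { code: 'ROUTER_TIMEOUT' }),
  new Error('upstream 500'),
];

console.log(sampleAttempts.map((e) => kindOf(e)));
// [ 'skipped-open', 'no-adapter', 'timeout', 'adapter-failure' ]
console.log(sampleAttempts.map((e) => countsTowardBreaker(e)));
// [ false, false, true, true ]
```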

6. Invariants (top-level, P1.5.5)

| ID | Invariant | Replaces |
|---|---|---|
| I1 | routeRequest signature byte-identical to Phase 0. Return shape unchanged. | (preserved from P0.5.2) |
| I2 | Chain order derived from scoreIntent descending; ASCII tie-break. | (new) |
| I3 | Per-attempt timeout: 30s default, configurable via COLIBRI_MODEL_TIMEOUT. | (new) |
| I4 | 3 consecutive failures on a modelId open a 60s cooldown. | replaces P0 I8 (no CB) |
| I5 | Cooldown is time-bound. resetIfElapsed clears state when 60s passed. | (new) |
| I6 | A model that fails but is NOT tripped is retried in the next request. | (new) |
| I7 | All chain members tripped or failed ⇒ FallbackChainExhaustedError(attempts) with attempts.length === N. | replaces P0 I5 |
| I8 | ROUTER_PHASE_0_SHAPE.members = N, .hasCircuitBreaker = true, .modelsSupported = readonly [...]. | flips P0 I11 |
| I9 | getCircuitBreakerState() returns a frozen snapshot for observability. | (new) |
| I10 | resetCircuitBreaker(modelId?) clears state for one or all models. | (new) |
| I11 | No DB persistence of CB state; in-memory only. | (new) |
| I12 | No setTimeout outside Promise.race guard. | (new) |
| I13 | No new MCP tool registered (P1.5.7 scope). | (preserved from P0 I9) |
| I14 | Tools passthrough preserved for Claude path. | (preserved from P0 I13) |
| I15 | Non-Error thrown values normalized via new Error(String(err)). | (preserved from P0 AC17) |
| I16 | RouteResult is frozen, with same field shape as Phase 0. | (preserved from P0 AC4) |
| I17 | FallbackChainExhaustedError.cause points to last attempt’s error. | (preserved from P0) |
| I18 | attempts[i].model matches the attempt order (chain order). | (new; Phase 0 had only 1 attempt) |
| I19 | Wave 3 fold-in: src/domains/router/index.ts re-exports ./adapters/{codex,kimi,openai}.js. | (new) |

7. Acceptance criteria

| AC | Description | Test target |
|---|---|---|
| AC1 | Happy path: scoring puts claude first, adapter succeeds → RouteResult{model:'claude'}. | preserved |
| AC2 | scoreIntent consulted exactly once per routeRequest. | preserved |
| AC3 | Cascade: A fails, B succeeds → RouteResult{model:'B'}. Both adapters called. | NEW |
| AC4 | Chain exhaustion: every adapter fails → FallbackChainExhaustedError with attempts.length === N. | NEW (replaces P0 AC8) |
| AC5 | attempts[i].model reflects walk order. | NEW |
| AC6 | CB trip: 3 consecutive failures on model X → 4th call skips X. | NEW |
| AC7 | CB time-bound reset: after 60s elapsed (via injected nowFn), model X reattempted. | NEW |
| AC8 | All-tripped path: every model open → FallbackChainExhaustedError with attempts[i].error instanceof CircuitOpenError. | NEW |
| AC9 | Per-attempt timeout: adapter hangs > 30s → RouterTimeoutError recorded, next chain member tried. | NEW |
| AC10 | COLIBRI_MODEL_TIMEOUT env var override → custom timeout applied. | NEW |
| AC11 | getCircuitBreakerState() returns frozen snapshot. | NEW |
| AC12 | resetCircuitBreaker(modelId) clears single-model state. | NEW |
| AC13 | resetCircuitBreaker() (no arg) clears all state. | NEW |
| AC14 | ROUTER_PHASE_0_SHAPE.members === N (≥4), .hasCircuitBreaker === true, .modelsSupported.length === N. | flipped |
| AC15 | Wave 3 fold-in: index.ts re-exports createKimiCompletion, createCodexCompletion, createOpenAiCompletion (smoke import). | NEW |
| AC16 | Tools passthrough preserved for Claude. | preserved |
| AC17 | Non-Error thrown values wrapped (preserved AC17). | preserved |
| AC18 | RouteResult is frozen (preserved). | preserved |
| AC19 | FallbackChainExhaustedError.message mentions count and last attempt error message. | preserved/extended |
8. Forbiddens

  • No MCP tool registration.
  • No DB persistence of CB state.
  • No setTimeout outside Promise.race.
  • No adapter file edits.
  • No costUsd / modelsAttempted fields appended to RouteResult (P1.5.6 scope).
  • No AMS_* env var reads. COLIBRI_MODEL_TIMEOUT only.
  • No main-checkout edits.

9. Contract close

Behavioral contract complete. Ready to write the execution packet (Step 3).


Colibri — documentation-first MCP runtime. Apache 2.0 + Commons Clause.
