P1.5 — δ Model Router Graduation — Agent Prompts

Phase 1.5 plan. Do not execute in Phase 0. The prompts below are copy-paste-ready for R91+ executors once Phase 1 κ Rule Engine is shipped and all four ADR-005 §Implementation trigger conditions hold:

  1. Phase 1 κ Rule Engine complete (scoring weights live in κ rule bodies, not hardcoded).
  2. ≥2 other model providers with stable tool-use APIs.
  3. ≥100 sustained calls/day to justify adapter work.
  4. Human (T0) authorization for the round.

Target round: R91–R100 per docs/5-time/roadmap.md §Phase 1.5. Canonical spec: docs/3-world/social/llm.md.

Phase 0 interface is frozen. Every sub-task below replaces bodies, widens ModelId, or appends modules. No sub-task changes the exported signature of scoreIntent, routeRequest, FallbackChainExhaustedError, RouteOptions, RouteResult, IntentScore, or ScoreContext. The ROUTER_PHASE_0_SHAPE marker is flipped in P1.5.5 as the explicit Phase 0→1.5 transition signal.

Namespace: COLIBRI_* only. Donor AMS_* references from the heritage P0.5 prompt file (p0.5-delta-router.md) are NOT carried forward into Phase 1.5.

Group summary

Task ID  Title                                Depends on                   Effort  Unblocks
P1.5.1   Real 7-dim intent scoring            P1.5.9; Phase 1 κ complete   M       P1.5.2–P1.5.4
P1.5.2   Adapter: Kimi K2                     P1.5.1                       M       P1.5.5, P1.5.8
P1.5.3   Adapter: Codex                       P1.5.1                       M       P1.5.5, P1.5.8
P1.5.4   Adapter: OpenAI (GPT-4o family)      P1.5.1                       M       P1.5.5, P1.5.8
P1.5.5   N-member fallback + circuit breaker  P1.5.2, P1.5.3, P1.5.4       L       P1.5.6, P1.5.7
P1.5.6   Cost accounting                      P1.5.5                       M       P1.5.7
P1.5.7   router_* MCP tools (4 tools)         P1.5.6                       M       P1.5.8, P1.5.10
P1.5.8   Cross-model parity test suite        P1.5.7                       L       production activation
P1.5.9   Model candidates table population    Phase 0 schema present       S       P1.5.1
P1.5.10  ζ decision-trail integration         P1.5.7                       M       production activation

Ship order: P1.5.9 → P1.5.1 → (P1.5.2 ‖ P1.5.3 ‖ P1.5.4) → P1.5.5 → P1.5.6 → P1.5.7 → (P1.5.8 ‖ P1.5.10).


P1.5.1 — Real 7-dimension Intent Scoring

Spec source: docs/3-world/social/llm.md §Target shape → Scoring formula
ADR anchor: ADR-005 §Implementation step 1
Worktree: feature/p1-5-1-scoring
Branch command: git worktree add .worktrees/claude/p1-5-1-scoring -b feature/p1-5-1-scoring origin/main
Estimated effort: M (3–5 hours)
Depends on: P1.5.9 (candidate table seeded); Phase 1 κ Rule Engine complete (weights live in κ rule bodies)
Unblocks: P1.5.2, P1.5.3, P1.5.4 (adapters can be wired when scoring selects them)

Files to modify

  • src/domains/router/scoring.ts — replace the Phase 0 constant body with the 7-dimension formula. Keep all exports; widen ModelId.
  • src/__tests__/domains/router/scoring.test.ts — drop the “winner is always ‘claude’” assertions; add the 7-dimension golden-path test vector.

Files to create

  • src/domains/router/scoring-weights.ts — κ-facing weight lookup (read-only shim over the κ rule engine’s model.scoring.weights.* rules; until Phase 1 κ is shipped this file errors at import with a pointer to the trigger condition).
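
One possible shape for the shim, sketched with the κ rule query injected as a seam (the queryKappaRule signature is hypothetical; the real lookup lands with the Phase 1 κ Rule Engine):

```typescript
// Sketch only: the κ query seam (queryKappaRule) is a placeholder until
// the Phase 1 κ Rule Engine ships its real lookup API.
export class KappaRulesUnavailableError extends Error {
  constructor(detail: string) {
    super(`κ scoring weights unavailable: ${detail} ` +
      `(see ADR-005 §Implementation, trigger condition 1)`);
    this.name = 'KappaRulesUnavailableError';
  }
}

const DIMENSIONS = [
  'task_domain_match', 'context_window_fit', 'cost_efficiency',
  'latency_fit', 'reliability', 'skill_match', 'operator_preference',
] as const;

export type ScoringWeightSet =
  Readonly<Record<(typeof DIMENSIONS)[number], number>>; // values in bps, 0–10000

type KappaQueryFn = (ruleKey: string, ruleVersionHash: string) => number | undefined;

export function loadScoringWeights(
  ruleVersionHash: string,
  queryKappaRule?: KappaQueryFn, // injectable for tests
): ScoringWeightSet {
  if (!queryKappaRule) {
    throw new KappaRulesUnavailableError('Phase 1 κ Rule Engine not yet shipped');
  }
  const entries = DIMENSIONS.map((dim) => {
    const bps = queryKappaRule(`model.scoring.weights.${dim}`, ruleVersionHash);
    if (bps === undefined || bps < 0 || bps > 10_000) {
      throw new KappaRulesUnavailableError(`bad weight for ${dim}: ${bps}`);
    }
    return [dim, bps] as const;
  });
  return Object.freeze(Object.fromEntries(entries)) as ScoringWeightSet;
}
```

The read-only freeze keeps callers from mutating weights between scoring calls, which would break the determinism obligation.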

Acceptance criteria

  • scoreIntent(prompt, context) → IntentScore keeps the same signature as Phase 0; body is replaced, not rewritten around a new export.
  • ModelId is widened from 'claude' to 'claude' | 'kimi' | 'codex' | 'openai' | 'gemini' | 'llama' | 'mixtral' (at minimum — matches the P1.5.9 cohort).
  • 7 dimensions implemented per the concept doc §Target shape table: task_domain_match, context_window_fit, cost_efficiency, latency_fit, reliability, skill_match, operator_preference.
  • Each dimension’s normalized input is bounded [0, 1]; the score accumulates as int64 over bps-encoded weights with final scaling to [0, 1].
  • Tie-break: (a) higher reliability; (b) lower cost; (c) alphabetical on model_id.
  • Deterministic: same (prompt, context, rule_version_hash, candidate_snapshot) → same winner. Deterministic across ≥10 repeat calls in test.
  • Golden-path test vector — code review task of 50KB PR with 5s deadline — reproduces the worked example from the concept doc (Sonnet 0.87, GPT-4o 0.79, Haiku 0.58; Sonnet wins).
  • No I/O outside the κ rule lookup; no randomness; no time-dependent state.
  • IntentScore shape unchanged: { scores: Readonly<Record<ModelId, number>>, winner: ModelId }.
  • npm run build && npm run lint && npm test green.
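
The tie-break and determinism criteria above can be sketched as a comparator; the CandidateStats field names here are illustrative, not the shipped shape:

```typescript
// Tie-break per the concept doc: (a) higher reliability, (b) lower cost,
// (c) alphabetical model_id. Field names are illustrative.
interface CandidateStats {
  model_id: string;
  score: number;                 // final [0, 1] score
  reliability: number;           // successRateLast100 snapshot
  cost_bps_per_kilotoken: number;
}

// Code-point comparison, not localeCompare: locale settings must never
// change the winner (determinism is load-bearing for θ consensus).
function alphaCompare(a: string, b: string): number {
  return a < b ? -1 : a > b ? 1 : 0;
}

export function pickWinner(candidates: readonly CandidateStats[]): string {
  if (candidates.length === 0) throw new Error('no enabled candidates');
  const sorted = [...candidates].sort((a, b) =>
    b.score - a.score ||                                    // primary: score desc
    b.reliability - a.reliability ||                        // (a) reliability desc
    a.cost_bps_per_kilotoken - b.cost_bps_per_kilotoken ||  // (b) cost asc
    alphaCompare(a.model_id, b.model_id),                   // (c) alphabetical
  );
  return sorted[0].model_id;
}
```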

Pre-flight reading

  • CLAUDE.md — execution rules.
  • docs/3-world/social/llm.md §Target shape.
  • ADR-005 §Implementation + §Phase 1.5 upgrade path.
  • Current Phase 0 stub: src/domains/router/scoring.ts (the header invariants I1–I7 become the Phase 1.5 test obligations for forward-compat).

Ready-to-paste agent prompt

You are a Phase 1.5 builder agent for Colibri (R91+).

TASK: P1.5.1 — Real 7-dimension Intent Scoring
Replace the Phase 0 constant-returns-claude scorer with the 7-dimension
weighted formula from the δ concept doc. Preserve the Phase 0 export
signatures verbatim; widen ModelId additively.

PRE-FLIGHT READING:
1. CLAUDE.md (execution rules)
2. docs/3-world/social/llm.md §Target shape (7-dim table + tie-break)
3. docs/architecture/decisions/ADR-005-multi-model-defer.md §Implementation
4. src/domains/router/scoring.ts (Phase 0 header invariants I1–I7 become
   forward-compat obligations — read them now)

WORKTREE SETUP:
git fetch origin
git worktree add .worktrees/claude/p1-5-1-scoring -b feature/p1-5-1-scoring origin/main
cd .worktrees/claude/p1-5-1-scoring

FILES TO MODIFY:
- src/domains/router/scoring.ts
  * Keep ALL exports with same names and signatures.
  * Widen ModelId = 'claude' | 'kimi' | 'codex' | 'openai' | 'gemini' | 'llama' | 'mixtral' (at minimum).
  * Replace PHASE_0_CLAUDE_WINNER constant with a per-call scoring body.
  * Scoring body:
    1. Look up weights via scoring-weights.ts (κ rule query).
    2. Enumerate enabled candidates from mcp_model_candidates table.
    3. For each candidate m compute:
         domain = (task.domain ∈ m.profile) ? 1 : 0
         window = min(1, m.context_window_tokens / estimated_prompt_tokens)
         cost   = 1 − (m.cost_bps_per_kilotoken / max_cost_bps) clamped [0,1]
         latency= 1 − (m.p50_latency_ms / task.deadline_ms) clamped [0,1]
         reliab = successRateLast100(m.model_id)
         skill  = |m.strengths ∩ task.skill_req| / |task.skill_req|
         pref   = context.operator_preference[m.model_id] ?? 0.5
         score  = Σ (weight_i_bps × input_i_bps) / 1e8   (int64 math)
    4. Tie-break: higher reliability, lower cost, alphabetical model_id.
    5. Return frozen { scores, winner }.
  * Keep Object.freeze on both levels (I6 obligation from Phase 0 header).
  * Remain pure w.r.t. randomness and wall clock (I4/I5 obligations).

FILES TO CREATE:
- src/domains/router/scoring-weights.ts
  * Thin read-only shim over κ rule lookup.
  * Exports: loadScoringWeights(ruleVersionHash): ScoringWeightSet
  * ScoringWeightSet keys match the 7 dimensions; values are bps (0–10000).
  * If κ is unreachable, throw KappaRulesUnavailableError with a pointer to
    the trigger condition in ADR-005 §Implementation.

- src/__tests__/domains/router/scoring.test.ts (REWRITE)
  * Drop: "winner is always 'claude'" assertions.
  * Keep: determinism, output-freezing, signature-stability property tests.
  * Add: golden-path test vector reproducing the concept doc worked example:
    Task {domain: 'code_review', tokens: 12_000, deadline_ms: 5_000,
          skill: ['code_review']}
    Expected:
      Claude Sonnet ≈ 0.87 (winner)
      GPT-4o       ≈ 0.79
      Claude Haiku ≈ 0.58
  * Add: tie-break test — two candidates with equal score, verify ordering.
  * Add: property test — same (prompt, context, rule_version, snapshot) → same winner across 100 invocations.

ACCEPTANCE CRITERIA (headline):
✓ All Phase 0 exports keep their signatures (ModelId widens additively)
✓ 7 dimensions implemented with weights from κ (not hardcoded)
✓ Tie-break order matches concept doc (reliability, cost, alphabetical)
✓ Deterministic, pure, frozen output
✓ Golden-path vector reproduces the concept doc example

SUCCESS CHECK:
cd .worktrees/claude/p1-5-1-scoring && npm run build && npm run lint && npm test

WRITEBACK (after success):
task_update(task_id="P1.5.1", status="done", progress=100)
thought_record(task_id="P1.5.1", branch="feature/p1-5-1-scoring",
  commit_sha=<sha>, tests_run=["npm run build","npm run lint","npm test"],
  summary="Replaced Phase 0 constant scorer with 7-dim formula per δ concept doc. Golden-path vector matches. ModelId widened additively. κ rule-driven weights via scoring-weights.ts.")

FORBIDDENS:
✗ No change to exported type names or signatures
✗ No hardcoded weights (must come from κ)
✗ No randomness, no wall-clock reads, no I/O outside κ lookup
✗ No adapter imports in scoring.ts (scoring is pure — adapters come in P1.5.2–4)
✗ Do not edit main checkout

Verification checklist (for reviewer agent)

  • scoreIntent signature byte-identical to Phase 0 (tsc --noEmit against old callers works).
  • ModelId widened; no member removed.
  • 7 dimensions implemented exactly (names + formulas match concept doc).
  • Golden-path vector passes.
  • Tie-break order correct.
  • Determinism test passes.
  • No randomness / no time reads / no I/O beyond κ lookup.
  • npm run build && npm run lint && npm test green.

Writeback template

task_update:
  task_id: P1.5.1
  status: done
  progress: 100

thought_record:
  task_id: P1.5.1
  branch: feature/p1-5-1-scoring
  commit_sha: <sha>
  tests_run: ["npm run build", "npm run lint", "npm test"]
  summary: "Real 7-dimension intent scoring landed. Replaced PHASE_0_CLAUDE_WINNER constant with weighted score (task_domain_match 0.20, context_window_fit 0.15, cost_efficiency 0.15, latency_fit 0.15, reliability 0.15, skill_match 0.15, operator_preference 0.05). Weights sourced from κ via scoring-weights.ts. Tie-break: reliability, cost, alphabetical. Golden-path vector matches concept doc worked example (Sonnet 0.87 winner, GPT-4o 0.79, Haiku 0.58)."
  blockers: []

Common gotchas

  • ModelId widening is additive, not replacement. Old callers that exhaustively switch on winner still get type-safety for the new members. Never remove 'claude'.
  • Weights from κ, not from a const. If you find yourself writing const WEIGHTS = { task_domain_match: 0.20, ... } you’ve skipped the κ path. That violates the trigger condition and makes Phase 1.5 ungovernable by π.
  • Integer math for determinism. Floating-point summation in different orders gives different last-bit results. Encode weights as bps (0–10000), accumulate as int64, divide once at the end.
  • Reliability pull-through is deterministic. successRateLast100() must read a frozen snapshot for the scoring call — taking a live DB read mid-score lets two concurrent scores produce different winners, violating I5.
  • Tie-break is load-bearing for θ consensus. Two arbiters with the same inputs MUST arrive at the same model pick. Never swap the tie-break order.

P1.5.2 — Adapter: Kimi K2

Spec source: docs/3-world/social/llm.md §Phase 1.5 candidate cohort
ADR anchor: ADR-005 §Implementation step 2
Worktree: feature/p1-5-2-kimi-adapter
Branch command: git worktree add .worktrees/claude/p1-5-2-kimi-adapter -b feature/p1-5-2-kimi-adapter origin/main
Estimated effort: M (3–5 hours)
Depends on: P1.5.1 (scoring can select kimi); src/domains/integrations/claude.ts (reference adapter shape)
Unblocks: P1.5.5 (chain includes Kimi), P1.5.8 (parity test)

Files to create

  • src/domains/router/adapters/kimi.ts — Kimi K2 wrapper implementing CompletionFn from src/domains/router/fallback.ts.
  • src/__tests__/domains/router/adapters/kimi.test.ts — 5–10 parity tests (shape match vs Claude adapter).

Files to modify

  • src/domains/router/index.ts — add export * from './adapters/kimi.js';.

Acceptance criteria

  • createKimiCompletion(prompt, options) → Promise<CompletionResult> has the same return shape as createCompletion from src/domains/integrations/claude.ts.
  • createKimiCompletionWithTools(prompt, tools, options) has the same return shape as createCompletionWithTools.
  • Reads COLIBRI_KIMI_API_KEY env var (missing key → KimiConfigError with a friendly message).
  • Reads COLIBRI_KIMI_BASE_URL (defaults to the Kimi documented endpoint).
  • Maps Kimi’s tool-use response shape into Anthropic-SDK-compatible AnthropicTool tool-call shape (cross-model parity is the whole point).
  • Injection seams: fetchFn, logger, delayFn are pluggable via options, matching the Phase 0 Claude adapter.
  • Errors map: KimiApiError extends Error, shape matches AnthropicApiError.
  • 5–10 parity tests: given the same mock prompt/response pair, Kimi adapter and Claude adapter produce structurally equal CompletionResults (content, stopReason, promptTokens, completionTokens, latencyMs fields all present).
  • No MCP tool registration in this PR (tool surface lands in P1.5.7).
  • npm run build && npm run lint && npm test green.

Pre-flight reading

  • CLAUDE.md — execution rules.
  • src/domains/integrations/claude.ts — reference adapter; Phase 1.5 adapters mirror its surface exactly.
  • docs/3-world/social/llm.md §Phase 1.5 candidate cohort (Kimi K2 row: window 200k, balanced latency, medium cost, CN/EN parity).
  • Kimi K2 API docs (current at adapter-time; do not hard-code a version URL in source).

Ready-to-paste agent prompt

You are a Phase 1.5 builder agent for Colibri (R91+).

TASK: P1.5.2 — Adapter: Kimi K2
Ship a Kimi K2 completion wrapper with the same surface as the Phase 0
Claude adapter, so the δ router fallback chain can swap between them
without special-casing.

PRE-FLIGHT READING:
1. CLAUDE.md
2. src/domains/integrations/claude.ts (the adapter shape to mirror)
3. src/domains/router/fallback.ts (shows how CompletionFn is consumed)
4. docs/3-world/social/llm.md §Phase 1.5 candidate cohort (Kimi row)
5. Kimi K2 API documentation (latest)

WORKTREE SETUP:
git fetch origin
git worktree add .worktrees/claude/p1-5-2-kimi-adapter -b feature/p1-5-2-kimi-adapter origin/main
cd .worktrees/claude/p1-5-2-kimi-adapter

FILES TO CREATE:
- src/domains/router/adapters/kimi.ts
  * Exports:
    - createKimiCompletion(prompt, options?): Promise<CompletionResult>
    - createKimiCompletionWithTools(prompt, tools, options?): Promise<CompletionResult>
    - class KimiApiError extends Error
    - class KimiConfigError extends Error
  * Env:
    - COLIBRI_KIMI_API_KEY — required; validation at call time (matches the
      Design Invariant 5 pattern from src/config.ts).
    - COLIBRI_KIMI_BASE_URL — optional; defaults to the documented Kimi endpoint.
  * Injection seams: options may provide { fetchFn, logger, delayFn } matching
    the Claude adapter.
  * Response mapping:
    - Kimi content → CompletionResult.content
    - Kimi finish_reason → CompletionResult.stopReason
    - Kimi usage.prompt_tokens → CompletionResult.promptTokens
    - Kimi usage.completion_tokens → CompletionResult.completionTokens
    - Measure wall-clock latency locally → CompletionResult.latencyMs
  * Tool-use mapping: translate Kimi's tool_calls shape to the
    AnthropicTool-shaped array the router expects. Unknown tool names → skip +
    log via options.logger, never throw.

- src/__tests__/domains/router/adapters/kimi.test.ts
  * 5–10 parity tests:
    1. Happy path: mock Kimi returns "hello" → CompletionResult.content === "hello"
    2. Token accounting: promptTokens + completionTokens mirror Claude adapter test
    3. Error: 401 from Kimi → KimiApiError thrown with status 401
    4. Config: missing COLIBRI_KIMI_API_KEY → KimiConfigError
    5. Tool call: mock tool_calls → response shape matches Claude adapter's
    6. Injection seam: fetchFn override reaches the adapter
    7. Latency measurement: mock 50ms delay → latencyMs >= 50
    (remaining 3 parity checks at author's discretion)

FILES TO MODIFY:
- src/domains/router/index.ts
  * Add: export * from './adapters/kimi.js';

ACCEPTANCE CRITERIA (headline):
✓ Same return shape as Claude adapter
✓ Kimi env namespace is COLIBRI_KIMI_*, not AMS_*
✓ Tool-use mapping preserves Anthropic-shape expected by router
✓ 5–10 parity tests pass
✓ No MCP tool registered

SUCCESS CHECK:
cd .worktrees/claude/p1-5-2-kimi-adapter && npm run build && npm run lint && npm test

WRITEBACK (after success):
task_update(task_id="P1.5.2", status="done", progress=100)
thought_record(task_id="P1.5.2", branch="feature/p1-5-2-kimi-adapter",
  commit_sha=<sha>, tests_run=["npm run build","npm run lint","npm test"],
  summary="Kimi K2 adapter ships with surface parity to the Phase 0 Claude adapter. COLIBRI_KIMI_API_KEY + COLIBRI_KIMI_BASE_URL env. Tool-call response maps to AnthropicTool shape. 7 parity tests pass.")

FORBIDDENS:
✗ No AMS_* env vars. Ever.
✗ No MCP tool registration (Phase 1.5 tool surface lands in P1.5.7)
✗ Do not leak Kimi-specific error shapes into CompletionResult callers
✗ Do not edit main checkout

Verification checklist (for reviewer agent)

  • CompletionResult return shape identical to Claude adapter.
  • Env var namespace is COLIBRI_KIMI_*.
  • Tool-use response shape matches Anthropic-SDK tool shape.
  • 5–10 parity tests present and passing.
  • Call-time env validation (not module-load-time).
  • No MCP tool registered.
  • npm run build && npm run lint && npm test green.

Writeback template

task_update:
  task_id: P1.5.2
  status: done
  progress: 100

thought_record:
  task_id: P1.5.2
  branch: feature/p1-5-2-kimi-adapter
  commit_sha: <sha>
  tests_run: ["npm run build", "npm run lint", "npm test"]
  summary: "Kimi K2 adapter ships with surface parity to the Phase 0 Claude adapter. Env: COLIBRI_KIMI_API_KEY + COLIBRI_KIMI_BASE_URL. Tool-use response maps into AnthropicTool-shape array expected by the router. 5–10 parity tests green."
  blockers: []

Common gotchas

  • Parity is structural, not bit-identical. Different providers format content differently; what must match is the CompletionResult field set and their types. Content diff is expected.
  • Env validation is call-time, not import-time. Module import must not throw on a missing key; createKimiCompletion throws when called without one. Matches the ANTHROPIC_API_KEY pattern (CLAUDE.md §T0 decision 3).
  • Tool-use mapping is the hard part. Kimi’s tool response shape differs from Claude’s. Your adapter translates; the router stays provider-agnostic.
  • Don’t hardcode a model version string. Kimi K2’s version evolves; take it from options.model with a documented default from the candidate table (P1.5.9), not from a const KIMI_VERSION = ... in the adapter.
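
The call-time-validation and injection-seam pattern reads roughly like this; the endpoint path, the placeholder base URL, and the response field names (choices, usage.*) are assumptions to verify against the current Kimi K2 API docs:

```typescript
declare const process: { env: Record<string, string | undefined> }; // Node global

export class KimiConfigError extends Error {}

export interface CompletionResultShape {
  content: string;
  stopReason: string;
  promptTokens: number;
  completionTokens: number;
  latencyMs: number;
}

type FetchLike = (url: string, init: object) => Promise<{ json(): Promise<any> }>;

export interface KimiOptions {
  fetchFn?: FetchLike;                      // injection seam for tests
  baseUrl?: string;
  env?: Record<string, string | undefined>; // injection seam for tests
}

export async function createKimiCompletion(
  prompt: string,
  options: KimiOptions = {},
): Promise<CompletionResultShape> {
  // Call-time validation: importing this module never throws on a missing key.
  const env = options.env ?? process.env;
  const apiKey = env.COLIBRI_KIMI_API_KEY;
  if (!apiKey) throw new KimiConfigError('COLIBRI_KIMI_API_KEY is not set');
  // Placeholder default; the real default is the documented Kimi endpoint.
  const baseUrl = options.baseUrl ?? env.COLIBRI_KIMI_BASE_URL ?? 'https://kimi.example.invalid/v1';
  const fetchFn = options.fetchFn ?? (globalThis.fetch as unknown as FetchLike);
  const start = Date.now();
  const res = await fetchFn(`${baseUrl}/chat/completions`, {
    method: 'POST',
    headers: { Authorization: `Bearer ${apiKey}`, 'Content-Type': 'application/json' },
    body: JSON.stringify({ messages: [{ role: 'user', content: prompt }] }),
  });
  const body = await res.json();
  return {
    content: body.choices[0].message.content,
    stopReason: body.choices[0].finish_reason,   // normalize per router vocabulary
    promptTokens: body.usage.prompt_tokens,
    completionTokens: body.usage.completion_tokens,
    latencyMs: Date.now() - start,               // measured locally (wall clock)
  };
}
```

The env seam is what lets the "missing key" parity test run without mutating process.env.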

P1.5.3 — Adapter: Codex

Spec source: docs/3-world/social/llm.md §Target shape (spec-only — Codex appears in the cohort under “others” family)
ADR anchor: ADR-005 §Implementation step 2
Worktree: feature/p1-5-3-codex-adapter
Branch command: git worktree add .worktrees/claude/p1-5-3-codex-adapter -b feature/p1-5-3-codex-adapter origin/main
Estimated effort: M (3–5 hours)
Depends on: P1.5.1 (scoring can select codex)
Unblocks: P1.5.5, P1.5.8

Files to create

  • src/domains/router/adapters/codex.ts — Codex wrapper implementing CompletionFn.
  • src/__tests__/domains/router/adapters/codex.test.ts — parity tests.

Files to modify

  • src/domains/router/index.ts — add Codex re-export.

Acceptance criteria

  • createCodexCompletion(prompt, options) → Promise<CompletionResult> matches the Claude adapter shape.
  • createCodexCompletionWithTools matches.
  • Reads COLIBRI_CODEX_API_KEY (call-time validation).
  • Reads COLIBRI_CODEX_BASE_URL (default documented endpoint).
  • Translates Codex’s tool-call response into Anthropic-SDK tool-shape.
  • Injection seams fetchFn, logger, delayFn present.
  • CodexApiError + CodexConfigError extend Error with shape parity to KimiApiError/KimiConfigError.
  • 5–10 parity tests.
  • No MCP tool registration.
  • npm run build && npm run lint && npm test green.

Pre-flight reading

  • CLAUDE.md.
  • src/domains/router/adapters/kimi.ts (recently landed — the second adapter inherits its structure).
  • src/domains/integrations/claude.ts.
  • Codex API docs (current).

Ready-to-paste agent prompt

You are a Phase 1.5 builder agent for Colibri (R91+).

TASK: P1.5.3 — Adapter: Codex
Ship a Codex completion wrapper with the same surface as the Kimi and
Claude adapters. The structural template is the Kimi adapter (P1.5.2) —
fork it and re-target the provider.

PRE-FLIGHT READING:
1. CLAUDE.md
2. src/domains/router/adapters/kimi.ts (structural template)
3. src/domains/integrations/claude.ts (original surface)
4. Codex API documentation (current)

WORKTREE SETUP:
git fetch origin
git worktree add .worktrees/claude/p1-5-3-codex-adapter -b feature/p1-5-3-codex-adapter origin/main
cd .worktrees/claude/p1-5-3-codex-adapter

FILES TO CREATE:
- src/domains/router/adapters/codex.ts
  * Exports:
    - createCodexCompletion(prompt, options?)
    - createCodexCompletionWithTools(prompt, tools, options?)
    - class CodexApiError extends Error
    - class CodexConfigError extends Error
  * Env: COLIBRI_CODEX_API_KEY, COLIBRI_CODEX_BASE_URL (optional).
  * Translate Codex's tool_calls response into AnthropicTool shape.
  * Injection seams: fetchFn / logger / delayFn.

- src/__tests__/domains/router/adapters/codex.test.ts
  * 5–10 parity tests (same cases as Kimi adapter tests, re-aimed at Codex).

FILES TO MODIFY:
- src/domains/router/index.ts
  * Add: export * from './adapters/codex.js';

ACCEPTANCE CRITERIA:
✓ Same return shape as Kimi / Claude adapters
✓ COLIBRI_CODEX_* env namespace
✓ Tool-use mapping to AnthropicTool shape
✓ 5–10 parity tests pass
✓ No MCP tool registered

SUCCESS CHECK:
cd .worktrees/claude/p1-5-3-codex-adapter && npm run build && npm run lint && npm test

WRITEBACK:
task_update(task_id="P1.5.3", status="done", progress=100)
thought_record(task_id="P1.5.3", ...
  summary="Codex adapter ships with surface parity to Kimi + Claude. COLIBRI_CODEX_* env. Tool-call mapping preserves router's provider-agnostic contract.")

FORBIDDENS:
✗ No AMS_* env vars
✗ No MCP tool registration
✗ Do not duplicate adapter logic — share with Kimi only via types, not imports
✗ Do not edit main checkout

Verification checklist

  • CompletionResult shape parity with Kimi + Claude adapters.
  • COLIBRI_CODEX_* env namespace.
  • Parity tests present.
  • No MCP tool registered.
  • Gates green.

Writeback template

task_update:
  task_id: P1.5.3
  status: done
  progress: 100

thought_record:
  task_id: P1.5.3
  branch: feature/p1-5-3-codex-adapter
  commit_sha: <sha>
  tests_run: ["npm run build", "npm run lint", "npm test"]
  summary: "Codex adapter ships with surface parity to Kimi + Claude adapters. Env: COLIBRI_CODEX_API_KEY + COLIBRI_CODEX_BASE_URL. Tool-use response translated into AnthropicTool shape. 5–10 parity tests green."
  blockers: []

Common gotchas

  • Template-copy, not framework-extract. Two adapters do not justify a shared base class; three don’t either. Copy, change the provider bits, ship.
  • Finish-reason vocabulary. Codex’s finish reasons differ from Anthropic’s; normalise to the CompletionResult.stopReason string set the router expects.
  • Call-time env validation, not import-time. Same rule as Kimi.
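
A sketch of the finish-reason normalization; both the provider-side strings and the target stopReason vocabulary are assumptions to confirm against the Codex docs and the Claude adapter:

```typescript
// Normalize provider finish reasons into the router's stopReason set.
// The mapping entries here are illustrative, not the verified vocabulary.
const STOP_REASON_MAP: Record<string, string> = {
  stop: 'end_turn',
  length: 'max_tokens',
  tool_calls: 'tool_use',
  function_call: 'tool_use',
};

export function normalizeStopReason(providerReason: string): string {
  // Conservative default: unknown reasons degrade to a plain end of turn
  // rather than throwing, so the router stays provider-agnostic.
  return STOP_REASON_MAP[providerReason] ?? 'end_turn';
}
```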

P1.5.4 — Adapter: OpenAI (GPT-4o family)

Spec source: docs/3-world/social/llm.md §Phase 1.5 candidate cohort (GPT-4o + GPT-4o mini rows)
ADR anchor: ADR-005 §Implementation step 2
Worktree: feature/p1-5-4-openai-adapter
Branch command: git worktree add .worktrees/claude/p1-5-4-openai-adapter -b feature/p1-5-4-openai-adapter origin/main
Estimated effort: M (3–5 hours)
Depends on: P1.5.1
Unblocks: P1.5.5, P1.5.8

Files to create

  • src/domains/router/adapters/openai.ts.
  • src/__tests__/domains/router/adapters/openai.test.ts.

Files to modify

  • src/domains/router/index.ts.

Acceptance criteria

  • createOpenAiCompletion(prompt, options) + createOpenAiCompletionWithTools(prompt, tools, options) match the Claude / Kimi / Codex adapter shapes.
  • Handles both GPT-4o and GPT-4o mini under a single adapter; the specific model id is taken from options.model and defaults to the candidate table’s GPT-4o row.
  • Reads COLIBRI_OPENAI_API_KEY (call-time validation).
  • Reads COLIBRI_OPENAI_BASE_URL (default https://api.openai.com/v1).
  • Maps OpenAI’s function_call / tool_calls response into AnthropicTool shape expected by the router.
  • Injection seams present.
  • OpenAiApiError + OpenAiConfigError.
  • 5–10 parity tests.
  • No MCP tool registration.
  • Gates green.

Pre-flight reading

  • CLAUDE.md.
  • src/domains/router/adapters/kimi.ts + codex.ts (structural templates).
  • src/domains/integrations/claude.ts.
  • OpenAI API docs (current — chat completions + tool calling).

Ready-to-paste agent prompt

You are a Phase 1.5 builder agent for Colibri (R91+).

TASK: P1.5.4 — Adapter: OpenAI (GPT-4o family)
Ship an OpenAI adapter covering GPT-4o and GPT-4o mini under a single
module. Model id is selected via options.model with a default from the
candidate table.

PRE-FLIGHT READING:
1. CLAUDE.md
2. src/domains/router/adapters/kimi.ts (structural template)
3. src/domains/router/adapters/codex.ts (second template)
4. OpenAI chat completion + tool calling docs (current)

WORKTREE SETUP:
git fetch origin
git worktree add .worktrees/claude/p1-5-4-openai-adapter -b feature/p1-5-4-openai-adapter origin/main
cd .worktrees/claude/p1-5-4-openai-adapter

FILES TO CREATE:
- src/domains/router/adapters/openai.ts
  * Exports: createOpenAiCompletion, createOpenAiCompletionWithTools,
    OpenAiApiError, OpenAiConfigError.
  * Env: COLIBRI_OPENAI_API_KEY + COLIBRI_OPENAI_BASE_URL.
  * Handles gpt-4o and gpt-4o-mini under one adapter; options.model selects.
  * Maps OpenAI's tool_calls to AnthropicTool shape.

- src/__tests__/domains/router/adapters/openai.test.ts
  * 5–10 parity tests, including one "options.model switches between 4o and 4o-mini".

FILES TO MODIFY:
- src/domains/router/index.ts
  * Add: export * from './adapters/openai.js';

ACCEPTANCE CRITERIA:
✓ Surface parity with Kimi / Codex / Claude adapters
✓ GPT-4o + GPT-4o mini both reachable through options.model
✓ COLIBRI_OPENAI_* env namespace
✓ Tool-use mapping to AnthropicTool shape
✓ 5–10 parity tests

SUCCESS CHECK:
cd .worktrees/claude/p1-5-4-openai-adapter && npm run build && npm run lint && npm test

WRITEBACK:
task_update(task_id="P1.5.4", status="done", progress=100)
thought_record(task_id="P1.5.4", ...
  summary="OpenAI adapter ships with GPT-4o + GPT-4o mini under one module. COLIBRI_OPENAI_* env. Tool-call response translates into AnthropicTool shape.")

FORBIDDENS:
✗ No AMS_* env vars
✗ No MCP tool registration
✗ Do not split into two files per GPT model; one adapter handles both
✗ Do not edit main checkout

Verification checklist

  • Shape parity.
  • Both GPT-4o and GPT-4o mini reachable via options.model.
  • COLIBRI_OPENAI_* env namespace.
  • Tool-use mapping correct.
  • Gates green.

Writeback template

task_update:
  task_id: P1.5.4
  status: done
  progress: 100

thought_record:
  task_id: P1.5.4
  branch: feature/p1-5-4-openai-adapter
  commit_sha: <sha>
  tests_run: ["npm run build", "npm run lint", "npm test"]
  summary: "OpenAI adapter ships with GPT-4o + GPT-4o mini under a single module. Env: COLIBRI_OPENAI_API_KEY + COLIBRI_OPENAI_BASE_URL. tool_calls response mapped into AnthropicTool shape. options.model selects the specific GPT-4o variant. 5–10 parity tests green."
  blockers: []

Common gotchas

  • function_call vs tool_calls. OpenAI has both legacy and new tool-calling shapes; your adapter accepts both on the response side and emits the new form upstream.
  • Separate base URL. Azure OpenAI and OpenAI proper diverge; the base URL env var lets operators point at their Azure deployment without forking the adapter.
  • Streaming out of scope. Phase 1.5 ships non-streaming first; streaming is a later round.
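
Accepting both response shapes can look like this; the target tool_use shape is assumed to match what the Claude adapter already emits:

```typescript
// Accept both OpenAI tool-calling shapes (legacy function_call and current
// tool_calls) and emit one Anthropic-style tool_use array upstream.
interface AnthropicToolUse {
  type: 'tool_use';
  id: string;
  name: string;
  input: unknown;
}

export function mapOpenAiToolCalls(message: {
  tool_calls?: Array<{ id: string; function: { name: string; arguments: string } }>;
  function_call?: { name: string; arguments: string };
}): AnthropicToolUse[] {
  if (message.tool_calls) {
    return message.tool_calls.map((tc) => ({
      type: 'tool_use',
      id: tc.id,
      name: tc.function.name,
      input: JSON.parse(tc.function.arguments), // OpenAI sends arguments as a JSON string
    }));
  }
  if (message.function_call) {
    // Legacy single-call shape carries no id; synthesize one deterministically.
    return [{
      type: 'tool_use',
      id: `legacy_${message.function_call.name}`,
      name: message.function_call.name,
      input: JSON.parse(message.function_call.arguments),
    }];
  }
  return [];
}
```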

P1.5.5 — N-member Fallback Chain + Circuit Breaker

Spec source: docs/3-world/social/llm.md §Fallback chain + P0.5.2 heritage prompt (circuit-breaker semantics)
ADR anchor: ADR-005 §Implementation step 3
Worktree: feature/p1-5-5-fallback-cb
Branch command: git worktree add .worktrees/claude/p1-5-5-fallback-cb -b feature/p1-5-5-fallback-cb origin/main
Estimated effort: L (6–10 hours)
Depends on: P1.5.2, P1.5.3, P1.5.4 (chain has members to try)
Unblocks: P1.5.6, P1.5.7

Files to modify

  • src/domains/router/fallback.ts — replace single-call body with N-member cascade + circuit-breaker gating. Flip ROUTER_PHASE_0_SHAPE literals (members: N, hasCircuitBreaker: true, modelsSupported: readonly [...] widened).
  • src/__tests__/domains/router/fallback.test.ts — drop ROUTER_PHASE_0_SHAPE.members === 1 assertions; add cascade + circuit-breaker coverage.

Files to create

  • src/domains/router/circuit.ts — in-memory circuit-breaker state.
  • src/__tests__/domains/router/circuit.test.ts — trip + reset tests (use fake timers).

Acceptance criteria

  • routeRequest(prompt, options) keeps its signature and return shape (fields are appended in P1.5.6 — costUsd, modelsAttempted — not in this PR).
  • Chain order comes from scoreIntent().scores sorted descending.
  • Per-attempt timeout: 30s default, configurable via COLIBRI_MODEL_TIMEOUT.
  • Circuit breaker: 3 consecutive failures on the same model_id trip a 60s unavailability window; during that window the model is skipped without counting against the request.
  • Circuit reset: after 60s elapsed, the counter resets to 0 on the first new call for that model. Reset is per-model, time-bound — not success-bound.
  • A model that fails an attempt but is NOT tripped still gets retried in the next request.
  • If every chain member is tripped or fails → FallbackChainExhaustedError with attempts of length > 1; attempts[i].model matches the attempt order.
  • ROUTER_PHASE_0_SHAPE literal values flip to { members: N, hasCircuitBreaker: true, modelsSupported: [...] }. Phase 0 tests asserting the old literals fail deliberately; replace them with Phase 1.5 shape assertions.
  • New getCircuitBreakerState() export returns a snapshot of all models’ CB state for observability.
  • New resetCircuitBreaker(modelId?) export for manual reset (called by P1.5.7’s router_fallback tool).
  • npm run build && npm run lint && npm test green.
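
The trip-and-reset semantics above can be sketched with an injectable clock (module-level state here is illustrative; the shipped circuit.ts may structure it differently):

```typescript
// In-memory circuit breaker: 3 consecutive failures open a 60s window;
// reset is time-bound and lazy; the clock is a parameter so fake timers work.
const CIRCUIT_FAILURE_THRESHOLD = 3;
const CIRCUIT_COOLDOWN_MS = 60_000;

interface CircuitState { failures: number; openedAt: number | null }
const circuits = new Map<string, CircuitState>();

function stateFor(modelId: string): CircuitState {
  let s = circuits.get(modelId);
  if (!s) { s = { failures: 0, openedAt: null }; circuits.set(modelId, s); }
  return s;
}

export function recordFailure(modelId: string, now = Date.now()): void {
  const s = stateFor(modelId);
  s.failures += 1;
  if (s.failures >= CIRCUIT_FAILURE_THRESHOLD && s.openedAt === null) s.openedAt = now;
}

export function recordSuccess(modelId: string): void {
  // Success clears the counter but never an already-open cooldown window.
  stateFor(modelId).failures = 0;
}

export function isOpen(modelId: string, now = Date.now()): boolean {
  const s = stateFor(modelId);
  if (s.openedAt === null) return false;
  if (now - s.openedAt >= CIRCUIT_COOLDOWN_MS) {
    // Window elapsed: lazily clear on the first check after expiry.
    s.failures = 0;
    s.openedAt = null;
    return false;
  }
  return true;
}
```

The lazy clear inside isOpen is what makes the reset "per-model, time-bound" without any background timer.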

Pre-flight reading

  • CLAUDE.md.
  • Current Phase 0 src/domains/router/fallback.ts — the signatures that must not change.
  • Heritage docs/guides/implementation/task-prompts/p0.5-delta-router.md §P0.5.2 (circuit-breaker semantics come from here; namespace must flip to COLIBRI_*).
  • docs/3-world/social/llm.md §Fallback chain.

Ready-to-paste agent prompt

You are a Phase 1.5 builder agent for Colibri (R91+).

TASK: P1.5.5 — N-member Fallback Chain + Circuit Breaker
Replace the Phase 0 single-call body of routeRequest with an N-member
cascade driven by scoreIntent, gated by an in-memory circuit breaker.
This is the core graduation from the Phase 0 stub to a production router.

PRE-FLIGHT READING:
1. CLAUDE.md
2. src/domains/router/fallback.ts (signatures to preserve; ROUTER_PHASE_0_SHAPE to flip)
3. src/domains/router/scoring.ts (after P1.5.1 — provides the chain order)
4. src/domains/router/adapters/{claude,kimi,codex,openai}.ts (adapters to dispatch to)
5. docs/3-world/social/llm.md §Fallback chain
6. docs/guides/implementation/task-prompts/p0.5-delta-router.md §P0.5.2 (CB semantics — swap AMS_* → COLIBRI_*)

WORKTREE SETUP:
git fetch origin
git worktree add .worktrees/claude/p1-5-5-fallback-cb -b feature/p1-5-5-fallback-cb origin/main
cd .worktrees/claude/p1-5-5-fallback-cb

FILES TO CREATE:
- src/domains/router/circuit.ts
  * In-memory Map<ModelId, { failures: number, openedAt: number | null }>.
  * Exports:
    - recordFailure(modelId): bumps failures; at 3 sets openedAt = now
    - recordSuccess(modelId): clears the failure counter (note: recordSuccess
      may be called before the cooldown elapses; the open-state cooldown is
      time-bound, so a success under an open breaker still clears the counter
      but does not bypass the cooldown window)
    - isOpen(modelId, now = Date.now()): true if openedAt != null AND (now - openedAt) < 60_000
    - resetIfElapsed(modelId, now): clears state if the 60s window has passed
    - snapshot(): readonly state for observability
    - resetCircuitBreaker(modelId?): manual reset
  * Constants:
    - CIRCUIT_FAILURE_THRESHOLD = 3
    - CIRCUIT_COOLDOWN_MS = 60_000
  * Clock is injectable (nowFn in options) so fake timers work in tests.

- src/__tests__/domains/router/circuit.test.ts
  * 3 consecutive failures → isOpen true
  * 2 failures + 1 success → isOpen false; counter reset
  * isOpen persists for 60s (fake timer)
  * After 60s: first check after elapsed clears state
  * resetCircuitBreaker(modelId) manual clears
  * snapshot() returns frozen view
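The circuit.ts spec above can be sketched as follows. This is a hedged, dependency-free illustration: the real module must use the repo's ModelId type and honor the injectable-clock option exactly as the spec states; ModelId as plain string is an assumption here.

```typescript
// Illustrative sketch of circuit.ts (names from the spec above; ModelId
// simplified to string for the sketch).
type ModelId = string;

interface CircuitEntry { failures: number; openedAt: number | null }

const CIRCUIT_FAILURE_THRESHOLD = 3;
const CIRCUIT_COOLDOWN_MS = 60_000;

const state = new Map<ModelId, CircuitEntry>();

function entry(id: ModelId): CircuitEntry {
  let e = state.get(id);
  if (!e) { e = { failures: 0, openedAt: null }; state.set(id, e); }
  return e;
}

export function recordFailure(id: ModelId, now = Date.now()): void {
  const e = entry(id);
  e.failures += 1;
  if (e.failures >= CIRCUIT_FAILURE_THRESHOLD && e.openedAt === null) e.openedAt = now;
}

export function recordSuccess(id: ModelId): void {
  // Clears the failure counter only; an open breaker stays open until the
  // time-bound cooldown elapses (success does not bypass the window).
  entry(id).failures = 0;
}

export function resetIfElapsed(id: ModelId, now = Date.now()): void {
  const e = entry(id);
  if (e.openedAt !== null && now - e.openedAt >= CIRCUIT_COOLDOWN_MS) {
    e.failures = 0;
    e.openedAt = null;
  }
}

export function isOpen(id: ModelId, now = Date.now()): boolean {
  resetIfElapsed(id, now);
  const e = entry(id);
  return e.openedAt !== null && now - e.openedAt < CIRCUIT_COOLDOWN_MS;
}

export function resetCircuitBreaker(id?: ModelId): void {
  if (id) state.delete(id); else state.clear();
}

export function snapshot(): Readonly<Record<ModelId, Readonly<CircuitEntry>>> {
  const out: Record<ModelId, CircuitEntry> = {};
  for (const [k, v] of state) out[k] = { ...v };
  return Object.freeze(out);
}
```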

FILES TO MODIFY:
- src/domains/router/fallback.ts
  * KEEP exports: routeRequest, FallbackChainExhaustedError, RouteOptions,
    RouteResult, FallbackAttempt, CompletionFn, CompletionFnOptions, ScoringFn.
  * FLIP ROUTER_PHASE_0_SHAPE literals:
      members: <N> (number of candidates currently enabled)
      hasCircuitBreaker: true
      modelsSupported: readonly ['claude', 'kimi', 'codex', 'openai', 'gemini', 'llama', 'mixtral'] (or current set)
  * Replace body of routeRequest:
    1. scoring = options.scoringFn ?? scoreIntent
    2. decision = scoring(prompt, options)
    3. chainOrder = keys of decision.scores sorted by score desc
    4. attempts: FallbackAttempt[] = []
    5. for each modelId in chainOrder:
       - if isOpen(modelId) → skip (log via options.logger)
       - resolve adapter via adapter registry (see below)
       - try adapter(prompt, projectUpstreamOptions(options, modelId)) with 30s timeout
       - on success: recordSuccess(modelId); return RouteResult{model: modelId, ...}
       - on error/timeout: recordFailure(modelId); attempts.push({model, error}); continue
    6. if attempts.length > 0 && no success → throw FallbackChainExhaustedError(attempts)
    7. if every candidate was open (no attempts made) → throw FallbackChainExhaustedError with
       a synthetic "all-open" attempts array (one entry per skipped candidate).
  * Adapter registry is a static Record<ModelId, CompletionFn> wired in this file
    from the imports (createCompletion / createKimiCompletion / createCodexCompletion /
    createOpenAiCompletion / ...). options.completionFn override still works for tests.
  * Per-attempt timeout: Promise.race(adapter, timeoutPromise(COLIBRI_MODEL_TIMEOUT default 30_000)).

- src/__tests__/domains/router/fallback.test.ts
  * REMOVE: "ROUTER_PHASE_0_SHAPE.members === 1" assertion.
  * REMOVE: "winner always 'claude'" assertions (scoring test already covers).
  * ADD: cascade — model A fails, model B succeeds → RouteResult.model === B
  * ADD: chain exhaustion — all fail → FallbackChainExhaustedError with N attempts
  * ADD: circuit open skip — model A tripped → not tried even if first in chainOrder
  * ADD: timeout — adapter hangs > 30s → treated as failure; next model tried
  * ADD: ROUTER_PHASE_0_SHAPE new-literal assertion (members === N, hasCircuitBreaker true)
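The cascade steps listed for routeRequest can be sketched like this. Types are simplified stand-ins (the real signature is the frozen Phase 0 one), and the 30s Promise.race guard is elided to a comment:

```typescript
// Hedged sketch of the new routeRequest body; step numbers match the list above.
type ModelId = string;
type Adapter = (prompt: string) => Promise<{ content: string }>;

interface Attempt { model: ModelId; error: Error }
class FallbackChainExhaustedError extends Error {
  constructor(public attempts: Attempt[]) { super(`all ${attempts.length} candidates failed`); }
}

async function routeRequest(
  prompt: string,
  scores: Record<ModelId, number>,            // decision.scores from scoreIntent
  registry: Record<ModelId, Adapter>,         // static adapter registry
  isOpen: (m: ModelId) => boolean,
  recordFailure: (m: ModelId) => void,
  recordSuccess: (m: ModelId) => void,
): Promise<{ model: ModelId; content: string }> {
  // Step 3: chain order = candidates sorted by score, descending.
  const chainOrder = Object.keys(scores).sort((a, b) => scores[b] - scores[a]);
  const attempts: Attempt[] = [];
  const skipped: Attempt[] = [];               // synthetic "all-open" entries
  for (const modelId of chainOrder) {
    if (isOpen(modelId)) {                     // step 5: skip tripped breakers
      skipped.push({ model: modelId, error: new Error("circuit open") });
      continue;
    }
    try {
      // Real code wraps this call in the 30s Promise.race timeout guard.
      const result = await registry[modelId](prompt);
      recordSuccess(modelId);
      return { model: modelId, content: result.content };
    } catch (err) {
      recordFailure(modelId);
      attempts.push({ model: modelId, error: err as Error });
    }
  }
  // Steps 6–7: exhausted chain; if nothing was tried, throw the all-open set.
  throw new FallbackChainExhaustedError(attempts.length > 0 ? attempts : skipped);
}
```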

ACCEPTANCE CRITERIA (headline):
✓ routeRequest signature + return-shape unchanged (cost + modelsAttempted fields come in P1.5.6)
✓ Chain order from scoreIntent (descending)
✓ 30s per-attempt timeout
✓ CB: 3 fails → 60s open; time-bound reset
✓ All-fail → FallbackChainExhaustedError with N attempts
✓ ROUTER_PHASE_0_SHAPE literals flipped

SUCCESS CHECK:
cd .worktrees/claude/p1-5-5-fallback-cb && npm run build && npm run lint && npm test

WRITEBACK:
task_update(task_id="P1.5.5", status="done", progress=100)
thought_record(task_id="P1.5.5", ...
  summary="N-member fallback with CB shipped. scoreIntent drives chain order. 3-fail / 60s CB. 30s COLIBRI_MODEL_TIMEOUT per attempt. ROUTER_PHASE_0_SHAPE flipped to signal Phase 1.5 boundary.")

FORBIDDENS:
✗ No AMS_* env vars. COLIBRI_MODEL_TIMEOUT only.
✗ No setTimeout outside the Promise.race guard
✗ No MCP tool registration (tools ship in P1.5.7)
✗ Do not change exported signatures
✗ Do not persist CB state to DB — in-memory only this round
✗ Do not edit main checkout

Verification checklist

  • Signatures of routeRequest, FallbackChainExhaustedError, RouteOptions, RouteResult unchanged.
  • ROUTER_PHASE_0_SHAPE literals flipped to Phase 1.5 values.
  • Chain order matches scoreIntent descending.
  • CB semantics correct: 3-fail / 60s / time-bound reset / per-model.
  • Per-attempt 30s timeout via COLIBRI_MODEL_TIMEOUT.
  • All-tripped path still throws FallbackChainExhaustedError.
  • snapshot() + resetCircuitBreaker() exports present (for P1.5.7).
  • Gates green.

Writeback template

task_update:
  task_id: P1.5.5
  status: done
  progress: 100

thought_record:
  task_id: P1.5.5
  branch: feature/p1-5-5-fallback-cb
  commit_sha: <sha>
  tests_run: ["npm run build", "npm run lint", "npm test"]
  summary: "N-member fallback + circuit breaker shipped. Chain order driven by scoreIntent. 3-consecutive-failure trip opens a 60s window (time-bound reset, per-model). Per-attempt 30s timeout via COLIBRI_MODEL_TIMEOUT. ROUTER_PHASE_0_SHAPE literals flipped to signal the Phase 0→1.5 boundary intentionally."
  blockers: []

Common gotchas

  • Open-state reset is time-bound, not success-bound. A success clears the failure counter while the breaker is closed, but once the breaker opens, the 60s cooldown holds the model out of the chain regardless of what the other candidates are doing. Only after the 60s elapses can the model be tried again.
  • COLIBRI_MODEL_TIMEOUT, not AMS_MODEL_TIMEOUT. Donor namespace is forbidden.
  • Promise.race leaks if adapter resolves after timeout. Wrap the adapter in an AbortController and cancel on timeout; without it you’ll leak open sockets in long-running processes.
  • ROUTER_PHASE_0_SHAPE flip is the trip-wire. If you find yourself leaving members: 1 “to keep old tests passing”, you’ve missed the whole point of the marker. Flip it and rewrite the assertion.
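The leak-free timeout guard flagged in the Promise.race gotcha can be sketched as below. COLIBRI_MODEL_TIMEOUT and the signal-accepting adapter signature are assumptions taken from this document, not a confirmed implementation:

```typescript
// Per-attempt timeout that both aborts the in-flight call and clears the timer.
async function withTimeout<T>(
  run: (signal: AbortSignal) => Promise<T>,
  timeoutMs = Number(process.env.COLIBRI_MODEL_TIMEOUT ?? 30_000),
): Promise<T> {
  const controller = new AbortController();
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => {
      controller.abort();                     // cancel the in-flight request
      reject(new Error(`model call timed out after ${timeoutMs}ms`));
    }, timeoutMs);
  });
  try {
    return await Promise.race([run(controller.signal), timeout]);
  } finally {
    clearTimeout(timer);                      // never leave the timer pending
  }
}
```
Passing the AbortSignal down to the adapter's fetch call is what prevents the socket leak; the race alone only abandons the promise.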

P1.5.6 — Cost Accounting

Spec source: docs/3-world/social/llm.md §Candidate table (cost_bps_per_kilotoken) ADR anchor: ADR-005 §Implementation step 4 Worktree: feature/p1-5-6-cost Branch command: git worktree add .worktrees/claude/p1-5-6-cost -b feature/p1-5-6-cost origin/main Estimated effort: M (3–5 hours) Depends on: P1.5.5 Unblocks: P1.5.7

Files to create

  • src/domains/router/cost.ts — per-call token → USD translator + in-memory aggregates (calls_total, avg_cost_usd, p50_latency_ms, success_rate per ModelId).
  • src/__tests__/domains/router/cost.test.ts — token-to-USD math + aggregate rollup tests.

Files to modify

  • src/domains/router/fallback.ts — append costUsd: number and modelsAttempted: ReadonlyArray<ModelId> fields to RouteResult; emit aggregate updates per successful and failed call.
  • src/domains/router/index.ts — re-export cost module.

Acceptance criteria

  • computeCostUsd(modelId, promptTokens, completionTokens) → number reads cost_bps_per_kilotoken from the candidate table and computes USD via integer-bps math (2-decimal presentation).
  • Per-model aggregates tracked in-memory: calls_total, successes, failures, avg_cost_usd, p50_latency_ms, success_rate.
  • RouteResult gains costUsd: number and modelsAttempted: ReadonlyArray<ModelId> (append-only; Phase 0 callers continue to work).
  • p50 computed over a bounded ring buffer (last 1000 latencies per model; fixed memory).
  • getRouterStats() → { models: Record<ModelId, RouterStats> } export for P1.5.7’s router_stats tool.
  • resetRouterStats(modelId?) export for test harness + operator use.
  • Monetary math uses integer cents (bps derived) throughout; final conversion to USD is a single divide at the edge.
  • npm run build && npm run lint && npm test green.
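The bps → USD arithmetic can be illustrated with a minimal sketch. The rate map here is a hypothetical stand-in (the real code reads the cached mcp_model_candidates snapshot, and the seeded values come from the concept doc):

```typescript
// Hypothetical rate table; NOT the seeded values.
const COST_BPS_PER_KILOTOKEN: Record<string, number> = {
  "claude-sonnet-3-5": 300,
};

function computeCostUsd(modelId: string, promptTokens: number, completionTokens: number): number {
  const bps = COST_BPS_PER_KILOTOKEN[modelId];
  if (bps === undefined) throw new Error(`unknown model: ${modelId}`);
  // In-loop math stays in integer bps·token units; the single conversion to
  // USD happens at the edge: ÷1000 (tokens → kilotokens), ÷10_000 (bps → USD).
  const bpsTokenUnits = (promptTokens + completionTokens) * bps;
  return bpsTokenUnits / 1000 / 10_000;
}
```
Worked example: 1500 prompt + 500 completion tokens at 300 bps/kilotoken is 2 kilotokens × 300 bps = 600 bps = $0.06.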

Pre-flight reading

  • CLAUDE.md.
  • src/domains/router/fallback.ts (the RouteResult shape to append to).
  • docs/3-world/social/llm.md §Candidate table.
  • src/db/schema.sql — mcp_model_candidates schema (columns named in concept doc).

Ready-to-paste agent prompt

You are a Phase 1.5 builder agent for Colibri (R91+).

TASK: P1.5.6 — Cost Accounting
Add per-call token → USD translation and per-model aggregates (calls,
latency p50, success rate) so the router_stats tool and the ζ decision
trail have real cost and performance data to surface.

PRE-FLIGHT READING:
1. CLAUDE.md
2. src/domains/router/fallback.ts (RouteResult shape)
3. docs/3-world/social/llm.md §Candidate table (cost_bps_per_kilotoken)
4. src/db/schema.sql (mcp_model_candidates schema)

WORKTREE SETUP:
git fetch origin
git worktree add .worktrees/claude/p1-5-6-cost -b feature/p1-5-6-cost origin/main
cd .worktrees/claude/p1-5-6-cost

FILES TO CREATE:
- src/domains/router/cost.ts
  * computeCostUsd(modelId, promptTokens, completionTokens): number
    - Reads m.cost_bps_per_kilotoken from mcp_model_candidates (cached snapshot)
    - Formula: ((promptTokens + completionTokens) * cost_bps_per_kilotoken / 1000) / 10000 (bps → USD)
    - Returns number with 4-decimal internal precision; 2-decimal presentation in router_stats
  * recordRouterCall(modelId, { promptTokens, completionTokens, latencyMs, success })
    - Updates in-memory aggregates map.
    - latency ring buffer bounded at 1000 per model; overwrite oldest.
  * getRouterStats(): { models: Record<ModelId, RouterStats> }
    - RouterStats = { calls_total, successes, failures, avg_cost_usd, p50_latency_ms, success_rate }
  * resetRouterStats(modelId?): manual reset.

- src/__tests__/domains/router/cost.test.ts
  * computeCostUsd: known (promptTokens, completionTokens, bps/kilotoken) → expected USD
  * Ring buffer bound: 1500 calls → p50 is over the last 1000
  * success_rate: 80 successes + 20 failures → 0.8
  * avg_cost_usd reflects actual USD sums
  * Per-model isolation: kimi stats don't contaminate claude stats
  * resetRouterStats(modelId) clears only that model
  * resetRouterStats() clears all
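The bounded latency ring buffer and p50 readout can be sketched as below. The lower-median (nearest-rank) choice for p50 is an assumption; pick whichever convention the repo's existing stats code uses:

```typescript
// Fixed-memory latency ring: capacity slots per model, overwrite oldest.
class LatencyRing {
  private buf: number[] = [];
  private next = 0;
  constructor(private capacity = 1000) {}

  push(latencyMs: number): void {
    if (this.buf.length < this.capacity) {
      this.buf.push(latencyMs);
    } else {
      this.buf[this.next] = latencyMs;        // overwrite the oldest slot
    }
    this.next = (this.next + 1) % this.capacity;
  }

  p50(): number {
    if (this.buf.length === 0) return 0;
    const sorted = [...this.buf].sort((a, b) => a - b);
    return sorted[Math.floor((sorted.length - 1) / 2)];  // lower median
  }

  size(): number { return this.buf.length; }
}
```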

FILES TO MODIFY:
- src/domains/router/fallback.ts
  * Append to RouteResult:
      readonly costUsd: number;
      readonly modelsAttempted: ReadonlyArray<ModelId>;
  * On success: call recordRouterCall(winner, {promptTokens, completionTokens, latencyMs, success: true})
  * On every failed attempt: call recordRouterCall(attempt.model, { promptTokens: 0, completionTokens: 0, latencyMs: <measured>, success: false })
  * On throw: aggregates still updated for every attempted model.
  * costUsd on RouteResult = computeCostUsd(winner, promptTokens, completionTokens).
  * modelsAttempted = list of all models actually called (success + fail), in chain order.

- src/domains/router/index.ts
  * Add: export * from './cost.js';

ACCEPTANCE CRITERIA (headline):
✓ RouteResult gets costUsd + modelsAttempted (append-only)
✓ Per-model aggregates correct
✓ Integer-bps math with single USD divide at edge
✓ Ring buffer bounded
✓ P1.5.7 can build router_stats on top of this

SUCCESS CHECK:
cd .worktrees/claude/p1-5-6-cost && npm run build && npm run lint && npm test

WRITEBACK:
task_update(task_id="P1.5.6", status="done", progress=100)
thought_record(task_id="P1.5.6", ...
  summary="Per-call cost accounting + per-model aggregates shipped. RouteResult gets costUsd + modelsAttempted. Bounded latency ring buffer. Integer-bps math throughout.")

FORBIDDENS:
✗ No floating-point accumulation
✗ No unbounded memory growth (ring buffer is mandatory)
✗ No AMS_* env vars
✗ No MCP tool registration (tool ships in P1.5.7)
✗ Do not change existing RouteResult field types (only APPEND)
✗ Do not edit main checkout

Verification checklist

  • computeCostUsd matches known test vectors.
  • Per-model aggregates isolated.
  • Ring buffer bounded at 1000/model.
  • RouteResult gets only appended fields.
  • Gates green.

Writeback template

task_update:
  task_id: P1.5.6
  status: done
  progress: 100

thought_record:
  task_id: P1.5.6
  branch: feature/p1-5-6-cost
  commit_sha: <sha>
  tests_run: ["npm run build", "npm run lint", "npm test"]
  summary: "Cost accounting shipped. computeCostUsd(modelId, promptTokens, completionTokens) reads from mcp_model_candidates; integer-bps math with single final USD divide. In-memory aggregates per ModelId: calls_total, successes, failures, avg_cost_usd, p50_latency_ms (1000-slot ring buffer), success_rate. RouteResult gains costUsd + modelsAttempted as append-only fields."
  blockers: []

Common gotchas

  • Append-only rule on RouteResult. Phase 0 callers destructure { model, content, finishReason, promptTokens, completionTokens, latencyMs }. Breaking any of those names silently breaks downstream. Only add new fields.
  • Ring buffer, not unbounded array. An unbounded latency log runs the process out of memory in a 24-hour smoke test. 1000 slots per model is a hard ceiling.
  • USD math at the edge. All in-loop math is integer bps. The only divide-by-ten-thousand happens at the API surface.

P1.5.7 — router_* MCP Tools (4 tools)

Spec source: ADR-005 §Decision (the 4 router tools) ADR anchor: ADR-004 R75 Wave H amendment (tool surface add) Worktree: feature/p1-5-7-router-tools Branch command: git worktree add .worktrees/claude/p1-5-7-router-tools -b feature/p1-5-7-router-tools origin/main Estimated effort: M (3–5 hours) Depends on: P1.5.6 (stats aggregates ready) Unblocks: P1.5.8, P1.5.10

Files to create

  • src/domains/router/tools.ts — 4 MCP tool handlers with Zod schemas: router_score, router_call, router_fallback, router_stats.
  • src/__tests__/domains/router/tools.test.ts — per-tool handler tests + schema validation + ζ emission verification.

Files to modify

  • src/server.ts — register the 4 new tools via the existing middleware wrapper (inlined 5-stage, per CLAUDE.md §9.1).
  • src/domains/router/index.ts — re-export tools module.

Acceptance criteria

  • router_score(prompt, context?) returns { scores: Record<ModelId, number>, winner: ModelId, rule_version_hash: string }. Zod input schema enforces prompt non-empty.
  • router_call(prompt, options?) wraps routeRequest and returns the full RouteResult including costUsd, modelsAttempted. Zod schema mirrors RouteOptions.
  • router_fallback(model_id?, reset?) inspects the circuit snapshot(); if reset: true and a model_id is provided, calls resetCircuitBreaker(model_id). Returns { circuitState: Record<ModelId, CircuitState> }.
  • router_stats() wraps getRouterStats() and returns the { models: Record<ModelId, RouterStats> } shape.
  • All 4 tools emit a thought_record (type 'decision') with the decision-trail shape from P1.5.10 (the canonical shape and shared helper land in that sub-task; P1.5.7 references them).
  • All 4 tools registered via src/server.ts — the 14-tool count grows to 18.
  • Schema validation errors return MCP InvalidParams with a Zod-issue translation.
  • Call-time auth errors (missing adapter env vars) surface as FallbackChainExhaustedError → MCP error with attempts array in data.
  • npm run build && npm run lint && npm test green.

Pre-flight reading

  • CLAUDE.md.
  • src/server.ts (existing tool-registration pattern).
  • src/domains/router/fallback.ts (after P1.5.5/6 land).
  • src/domains/router/cost.ts (after P1.5.6).
  • ADR-004.
  • Existing β tool handlers (src/domains/tasks/tools.ts) — Zod + MCP handler pattern.

Ready-to-paste agent prompt

You are a Phase 1.5 builder agent for Colibri (R91+).

TASK: P1.5.7 — router_* MCP Tools
Register the 4 Phase 1.5 router tools on the MCP surface: router_score,
router_call, router_fallback, router_stats. Grow the shipped tool surface
from 14 to 18.

PRE-FLIGHT READING:
1. CLAUDE.md
2. src/server.ts (tool registration pattern)
3. src/domains/tasks/tools.ts (existing Zod + MCP handler pattern)
4. src/domains/router/{scoring,fallback,cost,circuit}.ts (after P1.5.1–6 land)
5. docs/architecture/decisions/ADR-004-tool-surface.md
6. docs/3-world/social/llm.md §Decision-trail recording (emission shape)

WORKTREE SETUP:
git fetch origin
git worktree add .worktrees/claude/p1-5-7-router-tools -b feature/p1-5-7-router-tools origin/main
cd .worktrees/claude/p1-5-7-router-tools

FILES TO CREATE:
- src/domains/router/tools.ts
  * Export 4 MCP tool factories: { router_score, router_call, router_fallback, router_stats }
  * Each returns { name, description, inputSchema (Zod), handler }
  * router_score:
    - Input: { prompt: z.string().min(1), context: z.record(z.any()).optional() }
    - Handler: call scoreIntent(prompt, context); emit thought_record; return { scores, winner, rule_version_hash }
  * router_call:
    - Input: mirrors RouteOptions (maxTokens, systemPrompt, model, apiKey NOT accepted from MCP, tools, etc.)
    - Handler: call routeRequest(prompt, options); emit thought_record; return RouteResult
  * router_fallback:
    - Input: { model_id: z.string().optional(), reset: z.boolean().optional() }
    - Handler: if reset && model_id → resetCircuitBreaker(model_id). Return snapshot().
  * router_stats:
    - Input: {} (no params)
    - Handler: return getRouterStats()

- src/__tests__/domains/router/tools.test.ts
  * One test per tool: Zod validation passes on valid input
  * One test per tool: Zod rejects bad input with MCP-style error
  * router_call on failure: response surfaces FallbackChainExhaustedError
    with attempts in data
  * Each tool emits a thought_record of type 'decision' with routing-decision shape
  * router_fallback reset=true clears the CB state
  * router_stats empty-state case returns 0-count per model
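The tool-factory shape can be sketched as below. This sketch is dependency-free: a plain validate() stands in for the Zod schema's .parse, and InvalidParamsError stands in for the MCP InvalidParams translation; the real handlers use Zod and the server's middleware wrapper:

```typescript
// Minimal stand-in for a router_* tool factory (Zod replaced by a hand check).
interface ToolDef<I, O> {
  name: string;
  description: string;
  validate: (input: unknown) => I;            // Zod schema .parse in the real code
  handler: (input: I) => Promise<O>;
}

class InvalidParamsError extends Error {}

function makeRouterScoreTool(
  scoreIntent: (prompt: string) => { scores: Record<string, number>; winner: string },
): ToolDef<{ prompt: string }, { scores: Record<string, number>; winner: string }> {
  return {
    name: "router_score",
    description: "Score a prompt across the enabled model candidates.",
    validate: (input) => {
      const p = (input as { prompt?: unknown })?.prompt;
      if (typeof p !== "string" || p.length === 0) {
        throw new InvalidParamsError("prompt must be a non-empty string");
      }
      return { prompt: p };
    },
    async handler({ prompt }) {
      return scoreIntent(prompt);             // real handler also emits thought_record
    },
  };
}
```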

FILES TO MODIFY:
- src/server.ts
  * Import the 4 tool factories.
  * Register them via the existing middleware wrapper.
  * Tool count moves from 14 to 18 (ADR-004 R75 Wave H amendment noted tools=14; P1.5.7 amends upward).

- src/domains/router/index.ts
  * Add: export * from './tools.js';

ACCEPTANCE CRITERIA (headline):
✓ 4 tools registered: router_score, router_call, router_fallback, router_stats
✓ Zod validation on every input
✓ Every handler emits thought_record ('decision' type) with routing-decision shape
✓ apiKey NOT accepted from MCP (secrets only from env)
✓ Tool count 14 → 18

SUCCESS CHECK:
cd .worktrees/claude/p1-5-7-router-tools && npm run build && npm run lint && npm test

WRITEBACK:
task_update(task_id="P1.5.7", status="done", progress=100)
thought_record(task_id="P1.5.7", ...
  summary="4 router_* MCP tools registered. Phase 1.5 tool surface += 4 → 18 total. Every call emits thought_record (routing-decision shape per concept doc).")

FORBIDDENS:
✗ Do not accept apiKey from MCP input (secrets come from env, never from tool params)
✗ Do not accept completionFn / scoringFn / fetchFn / logger / delayFn from MCP input (injection seams are for tests only)
✗ No AMS_* env vars
✗ Do not break any of the existing 14 tools
✗ Do not edit main checkout

Verification checklist

  • 4 tools registered on MCP surface.
  • Zod schemas match the concept doc’s router_* input shapes.
  • Injection seams (apiKey, completionFn, etc.) not accepted via MCP input.
  • Every tool emits a thought_record with routing-decision shape.
  • server_info-style introspection (if present) shows the new tool count.
  • Gates green.

Writeback template

task_update:
  task_id: P1.5.7
  status: done
  progress: 100

thought_record:
  task_id: P1.5.7
  branch: feature/p1-5-7-router-tools
  commit_sha: <sha>
  tests_run: ["npm run build", "npm run lint", "npm test"]
  summary: "4 router_* MCP tools registered: router_score, router_call, router_fallback, router_stats. Zod-validated; every call emits a thought_record of type 'decision' with routing-decision shape per δ concept doc. Tool surface grows from 14 to 18. apiKey + injection seams blocked at the MCP boundary."
  blockers: []

Common gotchas

  • apiKey MUST NOT be accepted from MCP input. Secrets come from COLIBRI_*_API_KEY env vars only. Accepting an apiKey field in the tool schema lets a caller exfiltrate the server’s keys or inject their own — a trust-boundary violation.
  • Injection seams are for tests only. completionFn, scoringFn, fetchFn, logger, delayFn are pluggable via direct function calls from test code. They are not in the MCP Zod schema.
  • The thought_record is load-bearing. Without it, the ζ chain has gaps — and because the Merkle root depends on the chain being gapless, a missed emission fails the next audit_verify_chain.
  • Tool count drift. Pre-Phase-1.5 canon says “14 shipped tools” in many places. A sweep-pass round after Phase 1.5 ships will update those references; this sub-task only adds the code.

P1.5.8 — Cross-Model Parity Test Suite

Spec source: ADR-005 §Implementation step 5 (“Add cross-model parity tests”) ADR anchor: ADR-005 §What Phase 1.5 adds Worktree: feature/p1-5-8-parity Branch command: git worktree add .worktrees/claude/p1-5-8-parity -b feature/p1-5-8-parity origin/main Estimated effort: L (6–10 hours) Depends on: P1.5.7 Unblocks: production activation

Files to create

  • src/__tests__/domains/router/parity.test.ts — integration suite running every adapter through the same contract fixtures.

Acceptance criteria

  • For every adapter (claude, kimi, codex, openai): determinism — same mock response on same input yields structurally equal CompletionResult.
  • For every adapter: tool_use — mock tool-call response produces an Anthropic-shaped tool-call entry in the response.
  • For every adapter: error_mapping — 401/403/500/timeout all route to FallbackChainExhaustedError attempt entries when invoked via router_call.
  • cost_accounting_shape — after running a matrix of 10 successes + 2 failures per adapter, router_stats() reports per-model calls_total, avg_cost_usd, p50_latency_ms, success_rate for each one.
  • circuit_breaker_cross_model — tripping Claude’s CB routes the next request through the second-scoring model; reciprocally tripping Kimi’s CB routes away from Kimi.
  • zeta_logging_shape — every call emits a thought_record with type: 'routing_decision' and the full shape from the concept doc.
  • rule_version_hash_constancy — same κ rule version → same rule_version_hash in two consecutive calls.
  • Test suite uses injected mocks only (no live network; fetchFn overridden on every adapter).
  • Suite runs under Jest ESM without flake across 5 repeat runs in local dev.
  • npm run build && npm run lint && npm test green.

Pre-flight reading

  • CLAUDE.md.
  • All 4 adapter source files + their tests.
  • src/domains/router/tools.ts (P1.5.7).
  • docs/3-world/social/llm.md §Decision-trail recording.

Ready-to-paste agent prompt

You are a Phase 1.5 builder agent for Colibri (R91+).

TASK: P1.5.8 — Cross-Model Parity Test Suite
Write a single integration file that runs every Phase 1.5 adapter through
the same set of contract fixtures. No live network; all mocked via
injected fetchFn.

PRE-FLIGHT READING:
1. CLAUDE.md
2. src/domains/router/adapters/{claude,kimi,codex,openai}.ts
3. src/domains/router/tools.ts
4. src/domains/router/{scoring,fallback,cost,circuit}.ts
5. docs/3-world/social/llm.md §Decision-trail recording

WORKTREE SETUP:
git fetch origin
git worktree add .worktrees/claude/p1-5-8-parity -b feature/p1-5-8-parity origin/main
cd .worktrees/claude/p1-5-8-parity

FILES TO CREATE:
- src/__tests__/domains/router/parity.test.ts
  * Test matrix: 4 adapters × 7 scenarios = 28 parity tests
  * Scenarios:
    1. determinism: same (prompt, mocked response) → structurally equal CompletionResult
    2. tool_use: mock tool-call response → AnthropicTool-shaped entry on response
    3. error_401: routes to FallbackChainExhaustedError attempt when invoked via router_call
    4. error_500: same
    5. timeout: adapter hangs > 30s → failure attempt
    6. cost_shape: after 10 successes + 2 failures, router_stats has the expected aggregates
    7. zeta_emission: every call emits a thought_record with type 'routing_decision' and full shape
  * Cross-cutting tests:
    - circuit_breaker_cross_model: trip claude CB → next call routes to runner-up
    - rule_version_hash_constancy: same κ version → same hash across calls
  * No live network. Every adapter call intercepted via options.fetchFn = mock.
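The 4 × 7 matrix above can be generated rather than hand-written; a sketch of the cartesian product the suite iterates (adapter and scenario names taken from this prompt):

```typescript
const ADAPTERS = ["claude", "kimi", "codex", "openai"] as const;
const SCENARIOS = [
  "determinism", "tool_use", "error_401", "error_500",
  "timeout", "cost_shape", "zeta_emission",
] as const;

// Cartesian product: one entry per (adapter, scenario) pair → 28 cases.
function parityMatrix(): Array<{ adapter: string; scenario: string }> {
  return ADAPTERS.flatMap((adapter) =>
    SCENARIOS.map((scenario) => ({ adapter, scenario })),
  );
}
```
In Jest, feeding parityMatrix() into test.each keeps every adapter on the identical contract fixture and makes a missing pair a visible count mismatch.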

ACCEPTANCE CRITERIA (headline):
✓ 28 per-adapter parity tests + 2 cross-cutting tests
✓ All mocked; no network I/O
✓ 5-run-in-a-row determinism check (no flake)
✓ ζ emission shape matches concept doc exactly

SUCCESS CHECK:
cd .worktrees/claude/p1-5-8-parity && npm run build && npm run lint && npm test

WRITEBACK:
task_update(task_id="P1.5.8", status="done", progress=100)
thought_record(task_id="P1.5.8", ...
  summary="Cross-model parity suite shipped. 4 adapters × 7 scenarios + 2 cross-cutting tests. All mocked; no network. Tests pass 5 runs in a row with zero flake.")

FORBIDDENS:
✗ No live network calls in tests
✗ No AMS_* env vars
✗ Do not use real time (all timeouts via fake timers or injected delayFn)
✗ Do not edit main checkout

Verification checklist

  • 28 per-adapter tests + 2 cross-cutting tests.
  • Zero live network.
  • Fake timers for the timeout scenarios.
  • ζ emission shape matches concept doc.
  • 5 repeat runs, no flake.
  • Gates green.

Writeback template

task_update:
  task_id: P1.5.8
  status: done
  progress: 100

thought_record:
  task_id: P1.5.8
  branch: feature/p1-5-8-parity
  commit_sha: <sha>
  tests_run: ["npm run build", "npm run lint", "npm test"]
  summary: "Cross-model parity suite shipped. 4 adapters × 7 scenarios (determinism, tool_use, error 401/500, timeout, cost shape, ζ emission) + 2 cross-cutting tests (cross-model CB, rule-version-hash constancy). All mocked via injected fetchFn + fake timers. 5 consecutive runs, zero flake."
  blockers: []

Common gotchas

  • No real time. Every timeout test uses fake timers or an injected delayFn. Wall-clock-dependent tests are the single biggest source of CI flake.
  • Inject fetchFn, don’t monkey-patch global fetch. The adapters already accept options.fetchFn. Use it; globally stubbing fetch leaks across test files.
  • Assertion on structural equality, not on content. Different providers produce different wording for the same prompt. Assert shape, not string contents.

P1.5.9 — Model Candidates Table Population

Spec source: docs/3-world/social/llm.md §Candidate table + §Phase 1.5 candidate cohort ADR anchor: Phase 1.5 precondition for P1.5.1 scoring Worktree: feature/p1-5-9-candidates Branch command: git worktree add .worktrees/claude/p1-5-9-candidates -b feature/p1-5-9-candidates origin/main Estimated effort: S (1–2 hours) Depends on: Phase 0 schema (mcp_model_candidates table columns present) Unblocks: P1.5.1 (scoring reads candidates)

Files to create

  • src/db/migrations/NNN-model-candidates-seed.sql — seed 7 additional rows alongside the existing Claude Sonnet row.

Files to modify

  • src/db/schema.sql — only if a column named in the concept doc’s table is missing (check while building); add it as part of this migration.

Acceptance criteria

  • Migration file named with the next sequential number after the final Phase 0 migration.
  • 8 total rows in mcp_model_candidates post-migration: Claude 3.5 Sonnet (already present), Claude 3.5 Haiku, GPT-4o, GPT-4o mini, Gemini 1.5 Pro, Llama 3.3 70B, Mixtral 8x22B, Kimi K2.
  • Columns per row: model_id, provider, context_window_tokens, latency_tier (fast | balanced | slow), cost_bps_per_kilotoken, domain_fit_profile (bitmask), enabled (boolean, default false for the 7 new rows so they do NOT activate until each adapter lands).
  • Column values for the 7 new rows match the concept doc §Phase 1.5 candidate cohort table exactly (window, cost tier, latency tier, strengths-as-bitmask).
  • Migration is idempotent (INSERT OR IGNORE on primary key).
  • Migration is reversible — a DOWN section deletes the 7 new rows.
  • npm run build && npm run lint && npm test green.

Pre-flight reading

  • CLAUDE.md.
  • src/db/schema.sql.
  • src/db/migrations/ directory — numbering convention.
  • docs/3-world/social/llm.md §Candidate table + §Phase 1.5 candidate cohort.

Ready-to-paste agent prompt

You are a Phase 1.5 builder agent for Colibri (R91+).

TASK: P1.5.9 — Model Candidates Table Population
Seed the 7 additional candidate rows alongside the existing Claude Sonnet
row. All 7 ship as enabled=false so the candidate activates only when its
adapter and parity tests land.

PRE-FLIGHT READING:
1. CLAUDE.md
2. src/db/schema.sql (mcp_model_candidates table)
3. src/db/migrations/ (numbering convention)
4. docs/3-world/social/llm.md §Candidate table + §Phase 1.5 candidate cohort

WORKTREE SETUP:
git fetch origin
git worktree add .worktrees/claude/p1-5-9-candidates -b feature/p1-5-9-candidates origin/main
cd .worktrees/claude/p1-5-9-candidates

FILES TO CREATE:
- src/db/migrations/NNN-model-candidates-seed.sql
  * UP section: INSERT OR IGNORE 7 rows:
    - 'claude-haiku-3-5'    | 'anthropic' | 200_000 | 'fast'     | <low bps>    | <triage bitmask> | 0
    - 'gpt-4o'              | 'openai'    | 128_000 | 'balanced' | <high bps>   | <general bitmask>| 0
    - 'gpt-4o-mini'         | 'openai'    | 128_000 | 'fast'     | <low bps>    | <triage bitmask> | 0
    - 'gemini-1-5-pro'      | 'google'    | 1_000_000 | 'slow'   | <medium bps> | <longctx bitmask>| 0
    - 'llama-3-3-70b'       | 'meta'      | 128_000 | 'balanced' | <low bps>    | <selfhost bitmask>| 0
    - 'mixtral-8x22b'       | 'mistral'   | 64_000  | 'fast'     | <low bps>    | <openweight bitmask>| 0
    - 'kimi-k2'             | 'moonshot'  | 200_000 | 'balanced' | <medium bps> | <cn-en-parity bitmask>| 0
  * DOWN section: DELETE those 7 rows by model_id.
  * Use indicative cost_bps values from the concept doc "indicative" ranges — the live post-task callback refreshes them.

FILES TO MODIFY:
- src/db/schema.sql (ONLY if a column is missing from the concept doc)

ACCEPTANCE CRITERIA (headline):
✓ 8 total candidates post-migration
✓ All 7 new rows ship enabled=false (activate per-adapter)
✓ Idempotent (INSERT OR IGNORE)
✓ Reversible (DOWN removes rows)
✓ Column values match concept doc cohort table

SUCCESS CHECK:
cd .worktrees/claude/p1-5-9-candidates && npm run build && npm run lint && npm test

WRITEBACK:
task_update(task_id="P1.5.9", status="done", progress=100)
thought_record(task_id="P1.5.9", ...
  summary="7 candidate rows seeded alongside the Phase 0 Claude Sonnet row. All new rows enabled=false until their adapter + parity tests land. Migration is idempotent + reversible.")

FORBIDDENS:
✗ Do not set enabled=true on rows whose adapters don't exist yet
✗ Do not invent cost/latency values — use concept doc indicative ranges
✗ Do not skip the DOWN section
✗ Do not edit main checkout

Verification checklist

  • 8 rows post-migration.
  • 7 new rows ship enabled = false.
  • Values match concept doc §Phase 1.5 candidate cohort.
  • INSERT OR IGNORE + DOWN block present.
  • Gates green.

Writeback template

task_update:
  task_id: P1.5.9
  status: done
  progress: 100

thought_record:
  task_id: P1.5.9
  branch: feature/p1-5-9-candidates
  commit_sha: <sha>
  tests_run: ["npm run build", "npm run lint", "npm test"]
  summary: "Model candidates seeded: 7 additional rows alongside the Phase 0 Claude Sonnet row. Claude 3.5 Haiku, GPT-4o, GPT-4o mini, Gemini 1.5 Pro, Llama 3.3 70B, Mixtral 8x22B, Kimi K2. All new rows ship enabled=false; each flips true when its adapter + parity tests land. Migration idempotent + reversible."
  blockers: []

Common gotchas

  • enabled = false on the new rows is intentional. Activation is gated by the adapter landing. Flipping enabled before the adapter exists makes scoreIntent pick a model that routeRequest can’t call → instant FallbackChainExhaustedError.
  • Indicative costs, not committed costs. Real live costs come from the post-task callback that updates the table. The seed is a plausible starting point, not an SLA.
  • Bitmask semantics come from ξ. The domain_fit_profile bitmask layout is defined in docs/3-world/social/identity.md; re-read before setting the bits.

P1.5.10 — ζ Decision-Trail Integration

Spec source: docs/3-world/social/llm.md §Decision-trail recording ADR anchor: ADR-005 §Consequences “the thought chain logs every call” Worktree: feature/p1-5-10-zeta Branch command: git worktree add .worktrees/claude/p1-5-10-zeta -b feature/p1-5-10-zeta origin/main Estimated effort: M (3–5 hours) Depends on: P1.5.7 (router tools emit; this sub-task tightens the shape) Unblocks: production activation

Files to modify

  • src/domains/router/fallback.ts — emit a ζ thought_record of type 'decision' carrying the full routing_decision payload on every routeRequest call (success and failure).
  • src/domains/router/tools.ts — ensure the 4 MCP tool handlers also emit the shape (some may already from P1.5.7).

Files to create

  • src/domains/router/trail.ts — shared helper emitRoutingDecision(record) that writes a ζ thought_record with the exact JSON shape from the concept doc.
  • src/__tests__/domains/router/trail.test.ts — shape-match tests against the JSON template.

Acceptance criteria

  • The emitted record has these fields, exactly: type: 'routing_decision', routing_mode ('single' | 'ensemble' | 'pipeline' | 'fail'), chosen_model_id, candidates_considered (array of ModelId), scores (Record<ModelId, number>), fallback_attempts (number), rule_version_hash ("rv:sha256:..."), decision_hash (SHA-256 hex of inputs || chosen).
  • decision_hash input concatenation is deterministic: JSON-canonical of { prompt, context, rule_version_hash, candidates_considered } + chosen model id.
  • routing_mode = 'fail' is emitted on FallbackChainExhaustedError.
  • routing_mode = 'single' is emitted on every successful Phase 1.5 call (ensemble + pipeline modes remain spec-only until a later round).
  • fallback_attempts counts the number of tried candidates before the winner (or the total attempt count on fail).
  • Record is chained via the existing ζ thought_record hash-chain (reads previous_hash from the last record in the session, stores the new hash).
  • Emission failure does NOT swallow the original result — the router still returns the RouteResult (log the ζ emission error via options.logger).
  • Shape-match tests compare against a canonical JSON fixture of the concept-doc example.
  • npm run build && npm run lint && npm test green.
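The decision_hash criterion above can be sketched as follows, assuming Node's built-in crypto module. The function names (canonicalJson, computeDecisionHash) and the exact input shape are taken from this spec's description; treat the implementation details as a sketch, not the shipped code.

```typescript
import { createHash } from "node:crypto";

// Canonical JSON: recursively sort object keys so identical inputs always
// serialize to the same byte sequence. Plain JSON.stringify does not
// guarantee key order, so it cannot be used for a verifiable hash.
function canonicalJson(value: unknown): string {
  if (Array.isArray(value)) {
    return `[${value.map(canonicalJson).join(",")}]`;
  }
  if (value !== null && typeof value === "object") {
    const obj = value as Record<string, unknown>;
    return `{${Object.keys(obj)
      .sort()
      .map((k) => `${JSON.stringify(k)}:${canonicalJson(obj[k])}`)
      .join(",")}}`;
  }
  return JSON.stringify(value);
}

// decision_hash = SHA-256 hex of canonical-JSON inputs || chosen model id.
function computeDecisionHash(
  inputs: {
    prompt: string;
    context: unknown;
    rule_version_hash: string;
    candidates_considered: string[];
  },
  chosenModelId: string,
): string {
  return createHash("sha256")
    .update(canonicalJson(inputs) + chosenModelId)
    .digest("hex");
}
```

With sorted keys, two arbiters constructing the same inputs in different property orders still agree on the hash.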

Pre-flight reading

  • CLAUDE.md.
  • src/domains/trail/ (ζ directory — hash-chain invariants).
  • docs/3-world/social/llm.md §Decision-trail recording.
  • src/domains/router/fallback.ts + tools.ts (where emission hooks in).

Ready-to-paste agent prompt

You are a Phase 1.5 builder agent for Colibri (R91+).

TASK: P1.5.10 — ζ Decision-Trail Integration
Tighten the routing-decision ζ emission to the exact shape from the
concept doc. Every router call (success + fail) emits a thought_record
of type 'decision' with the routing_decision payload.

PRE-FLIGHT READING:
1. CLAUDE.md
2. src/domains/trail/ (hash-chain invariants — previous_hash threading)
3. docs/3-world/social/llm.md §Decision-trail recording
4. src/domains/router/fallback.ts + tools.ts (emission points)

WORKTREE SETUP:
git fetch origin
git worktree add .worktrees/claude/p1-5-10-zeta -b feature/p1-5-10-zeta origin/main
cd .worktrees/claude/p1-5-10-zeta

FILES TO CREATE:
- src/domains/router/trail.ts
  * emitRoutingDecision(record: RoutingDecisionRecord): void
    - record shape matches the concept doc exactly:
      {
        type: 'routing_decision',
        routing_mode: 'single' | 'ensemble' | 'pipeline' | 'fail',
        chosen_model_id: string,
        candidates_considered: string[],
        scores: Record<string, number>,
        fallback_attempts: number,
        rule_version_hash: string,  // "rv:sha256:..."
        decision_hash: string        // SHA-256 hex of canonical inputs || chosen
      }
    - Writes into the ζ chain via the existing thought_record API.
  * computeDecisionHash(inputs, chosenModelId): string
    - Canonical-JSON the inputs object { prompt, context, rule_version_hash, candidates_considered }
    - Concatenate with chosen_model_id
    - SHA-256 → hex string

- src/__tests__/domains/router/trail.test.ts
  * Shape match against a golden fixture copied from the concept doc:
    {
      type: "routing_decision",
      routing_mode: "single",
      chosen_model_id: "claude-sonnet-3.5",
      candidates_considered: ["claude-sonnet-3.5", "gpt-4o", "claude-haiku-3.5"],
      scores: {"claude-sonnet-3.5": 0.87, "gpt-4o": 0.79, "claude-haiku-3.5": 0.58},
      fallback_attempts: 0,
      rule_version_hash: "rv:sha256:...",
      decision_hash: "SHA-256(...)"
    }
  * Determinism: two identical inputs → same decision_hash
  * fail case: FallbackChainExhaustedError emits routing_mode='fail'
  * fallback_attempts counts correctly when cascade happens

FILES TO MODIFY:
- src/domains/router/fallback.ts
  * After computing the winner (or catching the final exception), call
    emitRoutingDecision with the appropriate record.
  * On success: routing_mode='single'; fallback_attempts = modelsAttempted.length - 1
  * On fail: routing_mode='fail'; fallback_attempts = attempts.length
  * Emission errors MUST NOT override the original return/throw — catch and log via options.logger.

- src/domains/router/tools.ts
  * Ensure router_score, router_call, router_fallback, router_stats all emit
    via emitRoutingDecision where relevant.
  * router_stats may use routing_mode='pipeline' or a dedicated shape — keep
    the base shape consistent.

ACCEPTANCE CRITERIA (headline):
✓ Shape matches concept doc §Decision-trail recording exactly
✓ decision_hash is SHA-256 of canonical-JSON inputs || chosen
✓ 'fail' emitted on exhaustion
✓ Chain integrity preserved (previous_hash threading)
✓ Emission errors logged, not thrown

SUCCESS CHECK:
cd .worktrees/claude/p1-5-10-zeta && npm run build && npm run lint && npm test

WRITEBACK:
task_update(task_id="P1.5.10", status="done", progress=100)
thought_record(task_id="P1.5.10", ...
  summary="ζ decision-trail integration shipped. Every router call emits a thought_record ('routing_decision' type) with the exact shape from the concept doc. decision_hash is SHA-256 of canonical-JSON inputs + chosen model. fail-mode emission on FallbackChainExhaustedError. Hash-chain integrity preserved. Emission errors are logged, not thrown.")

FORBIDDENS:
✗ Do not invent new fields in the routing_decision record
✗ Do not swallow the RouteResult on emission failure
✗ Do not skip the hash chain — every record threads previous_hash
✗ No AMS_* env vars
✗ Do not edit main checkout

Verification checklist

  • Record shape matches concept doc exactly (field names + types).
  • decision_hash is deterministic over canonical-JSON inputs.
  • Success and fail paths both emit.
  • Hash-chain integrity preserved.
  • Emission errors do not override the router result.
  • Gates green.

Writeback template

task_update:
  task_id: P1.5.10
  status: done
  progress: 100

thought_record:
  task_id: P1.5.10
  branch: feature/p1-5-10-zeta
  commit_sha: <sha>
  tests_run: ["npm run build", "npm run lint", "npm test"]
  summary: "ζ decision-trail integration shipped. emitRoutingDecision() writes a thought_record of type 'decision' with the exact routing_decision shape from the δ concept doc: type, routing_mode, chosen_model_id, candidates_considered, scores, fallback_attempts, rule_version_hash, decision_hash. decision_hash is SHA-256 over canonical-JSON inputs || chosen. Success → routing_mode='single'; FallbackChainExhaustedError → routing_mode='fail'. Hash-chain integrity preserved. Emission failures logged, not thrown."
  blockers: []

Common gotchas

  • Canonical JSON matters. Two arbiters with the same inputs MUST produce the same decision_hash. JavaScript JSON.stringify is NOT canonical (key order undefined). Use a canonical-JSON encoder (sorted keys, no trailing whitespace) or accept that decision_hash becomes unverifiable.
  • Don’t swallow the result on emission error. If ζ is momentarily unavailable, the caller still gets its RouteResult; the ζ layer logs the miss and a later audit may be weaker, but the execution layer is not blocked.
  • routing_mode = 'fail' is load-bearing. Without it an audit cannot distinguish a deliberate failure from a missing record. Emit on every failure path.

Next group

Phase 1.5 activation kicks off subsequent Phase 2 work (λ reputation's per-model quality data depends on P1.5.10's per-call ζ records). Phase 2 prompts will land when Phase 1.5 closes.


Colibri — documentation-first MCP runtime. Apache 2.0 + Commons Clause.
