# P0.5 — δ Model Router — Agent Prompts

⚠ Phase 0 stubs shipped in R75 Wave I per ADR-005 §Decision. This task group graduated from “spec-only” to “library stubs” on 2026-04-18. Phase 0 delivered: constant scoring (always Claude) in src/domains/router/scoring.ts (PR #149) + single-member fallback chain in src/domains/router/fallback.ts (PR #150). No MCP tools; library-only. The donor prompt below is preserved for Phase 1.5 when δ graduates to real multi-model routing; do not run it against Phase 0 — it references AMS_MODEL_1..8 env vars (donor namespace, not supported) and an 8-model fallback chain (Phase 1.5 scope). COLIBRI equivalents are TBD when δ is revived in Phase 1.5.

- Canonical spec (Phase 0 shape, reconciled): task-breakdown.md §P0.5
- Decision record: ADR-005
- Wave I landing commits: PR #149 (P0.5.1 scoring) · PR #150 (P0.5.2 fallback)


<details markdown="1">
<summary>Heritage prompt content (Phase 1.5 starting point) — click to expand</summary>

> Everything below this line was written for a donor model-router design and is kept as a Phase 1.5 starting point. It references `AMS_MODEL_1..8` env vars (donor namespace) and an 8-model fallback chain (donor algorithm). Both must be re-earned against Colibri's `COLIBRI_MODEL_*` namespace and Phase 1.5 scope when δ lands. Do not execute any of the sub-tasks below in Phase 0.

## Group summary

| Task ID | Title | Depends on | Effort | Unblocks |
|---------|-------|------------|--------|----------|
| P0.5.1 | Intent Scoring Matrix | P0.2.2 | M | P0.5.2 |
| P0.5.2 | 8-Model Fallback Chain | P0.5.1 | M | production routing |

---

## P0.5.1 — Intent Scoring Matrix

**Spec source:** [task-breakdown.md §P0.5.1](/AMS/guides/implementation/task-breakdown.html)
**Extraction reference:** `docs/reference/extractions/delta-model-router-extraction.md`
**Worktree:** `feature/p0-5-1-scoring`
**Branch command:** `git worktree add .worktrees/claude/p0-5-1-scoring -b feature/p0-5-1-scoring origin/main`
**Estimated effort:** M (Medium — 2-3 hours)
**Depends on:** P0.2.2 (Database for storing scoring rules/cache)
**Unblocks:** P0.5.2 (Fallback chain uses scoring to pick model order)

### Files to create

- `src/domains/router/scoring.ts` — Intent scoring algorithm
- `tests/domains/router/scoring.test.ts` — Deterministic scoring tests

### Acceptance criteria

- [ ] `scoreIntent(prompt, context)` → `{ scores: Record<ModelId, number>, winner: ModelId }`
- [ ] Scoring factors: prompt length, complexity keywords, context size, tool requirements
- [ ] All scores in range [0, 100] (integer)
- [ ] Deterministic: same input always returns same winner
- [ ] No external API calls in scoring (pure function)
- [ ] Test: 10 sample prompts with expected model winners

### Pre-flight reading

- `CLAUDE.md` — worktree rules
- `docs/guides/implementation/task-breakdown.md` §P0.5.1 — full spec
- `docs/reference/extractions/delta-model-router-extraction.md` (scoring section)
- `docs/reference/greek-vocabulary.md` — δ (delta) concept description

### Ready-to-paste agent prompt

```text
You are a Phase 0 builder agent for Colibri.

TASK: P0.5.1 — Intent Scoring Matrix
Implement deterministic intent scoring for model selection without API calls.

FILES TO READ FIRST:
1. CLAUDE.md (execution rules)
2. docs/guides/implementation/task-breakdown.md §P0.5.1
3. docs/reference/extractions/delta-model-router-extraction.md (scoring section)
4. src/config.ts (model list configuration)

WORKTREE SETUP:
git fetch origin
git worktree add .worktrees/claude/p0-5-1-scoring -b feature/p0-5-1-scoring origin/main
cd .worktrees/claude/p0-5-1-scoring

FILES TO CREATE:
- src/domains/router/scoring.ts
  * scoreIntent(prompt: string, context: {toolCount?: number, complexity?: string}): {scores: Record<string, number>, winner: string}
  * Scoring factors (all integers, range [0, 100]):
    - Prompt length: short (0-100 chars) → small models get +20; long (>1000 chars) → large models get +30
    - Complexity keywords: ["analyze", "reason", "plan", "synthesize"] → +25 for capable models
    - Context size: large context (>5000 chars) → strong-context models +20
    - Tool requirements: count of unique tool mentions → +10 per tool for multi-tool capable models
  * Models (from config): claude-opus, claude-sonnet, claude-haiku, etc.
  * Winner: model with highest score
  * Pure function: no DB calls, no randomness, no API calls
  * Tie-breaking: if tied, prefer model by alphabetical order (deterministic)
- tests/domains/router/scoring.test.ts
  * Test 10 sample prompts with expected winners:
    1. Short simple task → haiku (small, fast)
    2. Long complex analysis → opus (capable)
    3. Multi-tool prompt → sonnet (balanced)
    4. Reasoning chain → opus
    5. Code completion → sonnet
    ... (5 more representative prompts)
  * Test determinism: same input twice → same winner
  * Test score ranges [0, 100]
  * Test tie-breaking (same score → alphabetical)

ACCEPTANCE CRITERIA (headline):
✓ scoreIntent returns {scores: Record<ModelId, number>, winner: ModelId}
✓ Factors: prompt length, complexity keywords, context size, tool count
✓ Scores [0, 100], deterministic, pure (no API calls)
✓ 10 sample prompts with verified winners

SUCCESS CHECK:
cd .worktrees/claude/p0-5-1-scoring && npm test && npm run lint

WRITEBACK (after success):
task_update(task_id="P0.5.1", status="done", progress=100)
thought_record(task_id="P0.5.1", branch="feature/p0-5-1-scoring", commit_sha=, tests_run=["npm test","npm run lint"], summary="Implemented deterministic intent scoring with factors: prompt length, complexity keywords, context size, tool requirements. 10 sample prompts verify model selection.")

FORBIDDENS:
✗ No external API calls (pure function only)
✗ No randomness (deterministic always)
✗ No hardcoding model lists (read from config)
✗ Do not edit main checkout

NEXT: P0.5.2 — 8-Model Fallback Chain (uses scoring to order fallback attempts)
```

### Verification checklist (for reviewer agent)

- [ ] scoreIntent is pure (no API calls, no DB queries)
- [ ] All scores integers in [0, 100]
- [ ] Same input twice → same winner (deterministic)
- [ ] Tie-breaking is deterministic (alphabetical, not random)
- [ ] 10 sample prompts with verified expected winners
- [ ] Complexity keywords and tool counting working
- [ ] npm test and npm run lint pass

### Writeback template

```yaml
task_update:
  task_id: P0.5.1
  status: done
  progress: 100
thought_record:
  task_id: P0.5.1
  branch: feature/p0-5-1-scoring
  commit_sha:
  tests_run: ["npm test", "npm run lint"]
  summary: "Implemented deterministic intent scoring with four factors: prompt length (small prompts favor haiku/sonnet, long favor opus), complexity keywords (analyze/reason/plan favor capable models, +25 bonus), context size (large context favors strong-context models), tool requirements (+10 per unique tool for multi-tool capable models). Winner is highest score; ties broken alphabetically. Pure function with no API calls. 10 representative sample prompts verify correct model selection."
  blockers: []
```

### Common gotchas

- **No API calls in scoring** — if you call Claude to score an intent, you've already committed to paying for a model call. The whole point of scoring is to decide which model to use BEFORE calling any API. Keep it lightweight.
- **Determinism is critical** — the router must be predictable. Same prompt always routes to same model. Use deterministic tie-breaking (alphabetical, not random).
- **Complexity keywords are case-insensitive** — "Analyze", "ANALYZE", "analyze" should all trigger the bonus. Use `.toLowerCase()` before matching.
- **Tool requirements count** — tools are marked in context.tools or extracted from prompt. Count unique tool names, not mentions. "call_tool x twice" still counts as 1 tool.

---

## P0.5.2 — 8-Model Fallback Chain

**Spec source:** [task-breakdown.md §P0.5.2](/AMS/guides/implementation/task-breakdown.html)
**Extraction reference:** `docs/reference/extractions/delta-model-router-extraction.md`
**Worktree:** `feature/p0-5-2-fallback`
**Branch command:** `git worktree add .worktrees/claude/p0-5-2-fallback -b feature/p0-5-2-fallback origin/main`
**Estimated effort:** M (Medium — 2-3 hours)
**Depends on:** P0.5.1 (Uses scoring to order chain)
**Unblocks:** Production routing (resilience to model outages)

### Files to create

- `src/domains/router/fallback.ts` — Fallback chain orchestration + circuit breaker
- `tests/domains/router/fallback.test.ts` — Fallback sequence + circuit breaker tests

### Acceptance criteria

- [ ] 8 model slots configured via env vars: `AMS_MODEL_1` through `AMS_MODEL_8`
- [ ] `routeRequest(prompt, context)` tries models in priority order
- [ ] On model error / timeout: tries next model in chain
- [ ] If all 8 fail: throws
  `AllModelsFailedError` with per-model error log
- [ ] Circuit breaker: model marked unavailable for 60s after 3 consecutive failures
- [ ] Test: mock models 1-7 failing → verify model 8 is used

### Pre-flight reading

- `CLAUDE.md` — execution rules
- `docs/guides/implementation/task-breakdown.md` §P0.5.2 — full spec
- `docs/reference/extractions/delta-model-router-extraction.md` (fallback section)
- `src/domains/router/scoring.ts` — scoring output for model ordering

### Ready-to-paste agent prompt

```text
You are a Phase 0 builder agent for Colibri.

TASK: P0.5.2 — 8-Model Fallback Chain
Implement resilient routing with fallback attempts and circuit breaker.

FILES TO READ FIRST:
1. CLAUDE.md (execution rules)
2. docs/guides/implementation/task-breakdown.md §P0.5.2
3. docs/reference/extractions/delta-model-router-extraction.md (fallback section)
4. src/domains/router/scoring.ts (model ordering)
5. src/config.ts (model configuration)

WORKTREE SETUP:
git fetch origin
git worktree add .worktrees/claude/p0-5-2-fallback -b feature/p0-5-2-fallback origin/main
cd .worktrees/claude/p0-5-2-fallback

FILES TO CREATE:
- src/domains/router/fallback.ts
  * routeRequest(prompt: string, context: any): Promise<{model: string, result: any}>
  * Priority order from scoring (or default AMS_MODEL_1..8 order)
  * Try each model in sequence:
    1. Check circuit breaker: if model unavailable for 60s, skip
    2. Call model with timeout (30s default, or AMS_MODEL_TIMEOUT)
    3. On success: return {model, result}
    4. On error/timeout: log error, try next model
  * Circuit breaker:
    - Track failures per model (3 consecutive failures = trip)
    - Unavailable for 60s; then reset counter
    - Use in-memory state or DB (recommendation: start with in-memory)
  * If all 8 models fail:
    - Throw AllModelsFailedError with per-model error log
    - Error object: {models: [{name, status, error, timeout?}]}
  * Export functions:
    - routeRequest(prompt, context)
    - getCircuitBreakerState() → debug info
    - resetCircuitBreaker(modelId) → manual reset
- tests/domains/router/fallback.test.ts
  * Test happy path: first model succeeds
  * Test fallback: model 1 fails, model 2 succeeds
  * Test all fail: all 8 models fail → AllModelsFailedError thrown
  * Test circuit breaker: 3 failures → model marked unavailable
  * Test circuit breaker reset after 60s (use fake timers for speed)
  * Test timeout: model 1 times out (>30s), model 2 used
  * Test per-model error logging in AllModelsFailedError

ACCEPTANCE CRITERIA (headline):
✓ 8 model slots via AMS_MODEL_1..8 env vars
✓ routeRequest tries models in order, fallback on error
✓ Circuit breaker: 3 failures → 60s unavailable
✓ AllModelsFailedError with per-model log if all fail
✓ Test models 1-7 fail → model 8 used

SUCCESS CHECK:
cd .worktrees/claude/p0-5-2-fallback && npm test && npm run lint

WRITEBACK (after success):
task_update(task_id="P0.5.2", status="done", progress=100)
thought_record(task_id="P0.5.2", branch="feature/p0-5-2-fallback", commit_sha=, tests_run=["npm test","npm run lint"], summary="Implemented 8-model fallback chain with circuit breaker. routeRequest tries models in priority order with 30s timeout. Circuit breaker marks model unavailable after 3 consecutive failures for 60s. AllModelsFailedError thrown with per-model error log if all fail.")

FORBIDDENS:
✗ Do not have circuit breaker state survive process exit yet (in-memory ok for P0.5.2)
✗ Do not skip timeout (30s default) on model calls
✗ Do not throw immediately on first failure (always try all 8)
✗ Do not edit main checkout

NEXT: P0.6.1 — Skill Schema (agents spawn with router routing their tasks)
Production routing complete after this task.
```

### Verification checklist (for reviewer agent)

- [ ] 8 model slots read from AMS_MODEL_1..8 env vars
- [ ] routeRequest calls scoring to determine priority order (or uses env order)
- [ ] Each model attempt has 30s timeout
- [ ] Circuit breaker tracks failures per model
- [ ] 3 consecutive failures → mark unavailable for 60s
- [ ] AllModelsFailedError includes per-model error details
- [ ] Test covers fallback scenario (models 1-7 fail, 8 succeeds)
- [ ] Test circuit breaker trip and reset
- [ ] npm test and npm run lint pass

### Writeback template

```yaml
task_update:
  task_id: P0.5.2
  status: done
  progress: 100
thought_record:
  task_id: P0.5.2
  branch: feature/p0-5-2-fallback
  commit_sha:
  tests_run: ["npm test", "npm run lint"]
  summary: "Implemented 8-model fallback chain with circuit breaker resilience. routeRequest(prompt, context) tries models in priority order (from AMS_MODEL_1..8 env vars), with 30s timeout per attempt. On error/timeout, tries next model. Circuit breaker marks model unavailable for 60s after 3 consecutive failures. If all 8 models fail, throws AllModelsFailedError with per-model error log including status, error message, and timeout info. Handles transient outages gracefully."
  blockers: []
```

### Common gotchas

- **Circuit breaker state is in-memory for now** — P0.5.2 can use in-memory state (a Map of {modelId → {failureCount, unavailableUntil}}). Later (if needed) this could move to the DB for persistence across restarts, but for P0.5.2 in-memory is fine. Note: restarts will reset circuit state.
- **Timeout is per-attempt, not total** — if model 1 takes 30s and times out, you still get another 30s for model 2. Total runtime could be 8 × 30s in the worst case (all timeouts). This is acceptable for a phase that is not yet optimizing latency.
- **Error logging must include per-model details** — when all 8 fail, the AllModelsFailedError must include {models: [{name, status, error, timeout?}]} so debugging tools can see which models were tried and why they failed.
- **Reset the circuit breaker after 60s, not on next success** — if model 1 has 3 failures and is marked unavailable, it stays unavailable for 60s even if model 2 succeeds in the meantime. This prevents rapid re-attempts against a broken model.

---

## Next group

[p0.6-epsilon-skills.md](/AMS/guides/implementation/task-prompts/p0.6-epsilon-skills.html) — ε Skill Registry (3 tasks: Skill Schema, Skill CRUD+Discovery, Agent Spawning)

[Back to task-prompts index](/AMS/guides/implementation/task-prompts/)

</details>
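The heritage P0.5.1 spec fully determines the scoring behavior, so it can be sketched as a small pure function. This is a non-normative sketch, not the shipped Phase 0 stub: the hardcoded model/capability table and the `ScoringContext` field names are illustration-only assumptions (the spec requires reading the model list from `src/config.ts`, not hardcoding it), while the factor weights, the [0, 100] clamp, and the alphabetical tie-break follow the spec text.

```typescript
// Sketch of P0.5.1 intent scoring. ASSUMPTION: the capability table below is
// hardcoded for illustration; the real task must read models from src/config.ts.
type ModelId = string;

interface ScoringContext {
  toolCount?: number;    // count of unique tools required (assumed field name)
  contextText?: string;  // attached context, used for the size factor (assumed)
}

const MODELS: Record<ModelId, {
  small: boolean; capable: boolean; multiTool: boolean; strongContext: boolean;
}> = {
  "claude-haiku":  { small: true,  capable: false, multiTool: false, strongContext: false },
  "claude-sonnet": { small: true,  capable: false, multiTool: true,  strongContext: true  },
  "claude-opus":   { small: false, capable: true,  multiTool: true,  strongContext: true  },
};

const COMPLEXITY_KEYWORDS = ["analyze", "reason", "plan", "synthesize"];

export function scoreIntent(prompt: string, context: ScoringContext = {}):
    { scores: Record<ModelId, number>; winner: ModelId } {
  const lower = prompt.toLowerCase(); // keywords match case-insensitively
  const hasKeyword = COMPLEXITY_KEYWORDS.some((k) => lower.includes(k));
  const toolCount = context.toolCount ?? 0;
  const contextSize = context.contextText?.length ?? 0;

  const scores: Record<ModelId, number> = {};
  for (const [id, caps] of Object.entries(MODELS)) {
    let s = 0;
    if (prompt.length <= 100 && caps.small) s += 20;        // short prompt → small models
    if (prompt.length > 1000 && !caps.small) s += 30;       // long prompt → large models
    if (hasKeyword && caps.capable) s += 25;                // complexity keywords
    if (contextSize > 5000 && caps.strongContext) s += 20;  // large context
    if (caps.multiTool) s += 10 * toolCount;                // +10 per unique tool
    scores[id] = Math.min(100, s);                          // clamp into [0, 100]
  }

  // Deterministic tie-break: walk keys alphabetically and replace the best
  // only on a strictly greater score, so the first name wins ties.
  const winner = Object.keys(scores).sort()
    .reduce((best, id) => (scores[id] > scores[best] ? id : best));
  return { scores, winner };
}
```

The strictly-greater comparison over sorted keys is what makes routing predictable: two models with equal scores always resolve to the same winner, with no randomness and no API calls anywhere in the function.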
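The P0.5.2 fallback loop plus circuit breaker from the heritage spec can likewise be sketched in a few dozen lines. Assumptions are marked: the injected `chain` and `callModel` parameters stand in for the `AMS_MODEL_1..8` ordering and the real per-model API call, the error entries omit the spec's `status` and `timeout?` fields, and the 30s per-attempt timeout is left as a comment rather than implemented.

```typescript
// Sketch of the P0.5.2 fallback chain with an in-memory circuit breaker.
// ASSUMPTION: chain/callModel are injected here so the sketch is self-contained;
// the real task reads AMS_MODEL_1..8 and orders the chain via scoring.
type CallModel = (model: string, prompt: string) => Promise<unknown>;

interface BreakerEntry { failureCount: number; unavailableUntil: number }

export class AllModelsFailedError extends Error {
  // Simplified entries; the spec's full shape is {name, status, error, timeout?}.
  constructor(public models: { name: string; error: string }[]) {
    super("all models failed");
  }
}

const breaker = new Map<string, BreakerEntry>(); // in-memory is fine for P0.5.2
const TRIP_AFTER = 3;       // consecutive failures before the breaker trips
const COOLDOWN_MS = 60_000; // unavailable window after tripping

function isAvailable(model: string, now: number): boolean {
  const e = breaker.get(model);
  return !e || e.unavailableUntil <= now;
}

function recordFailure(model: string, now: number): void {
  const e = breaker.get(model) ?? { failureCount: 0, unavailableUntil: 0 };
  e.failureCount += 1;
  if (e.failureCount >= TRIP_AFTER) {
    e.unavailableUntil = now + COOLDOWN_MS; // trip: skip this model for 60s
    e.failureCount = 0;                     // counter restarts after cooldown
  }
  breaker.set(model, e);
}

export async function routeRequest(
  prompt: string,
  chain: string[],      // e.g. the values of AMS_MODEL_1..8, in priority order
  callModel: CallModel,
): Promise<{ model: string; result: unknown }> {
  const errors: { name: string; error: string }[] = [];
  for (const model of chain) {
    const now = Date.now();
    if (!isAvailable(model, now)) {
      errors.push({ name: model, error: "circuit open" }); // skipped, still logged
      continue;
    }
    try {
      // Real code wraps this call in the 30s (AMS_MODEL_TIMEOUT) timeout.
      const result = await callModel(model, prompt);
      breaker.delete(model); // success clears the consecutive-failure count
      return { model, result };
    } catch (err) {
      recordFailure(model, now);
      errors.push({ name: model, error: String(err) }); // log, then try next
    }
  }
  throw new AllModelsFailedError(errors); // per-model log for debugging
}
```

Note that a success on one model never resets another model's cooldown: only the 60s expiry reopens a tripped breaker, matching the "reset after 60s, not on next success" gotcha.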


Colibri — documentation-first MCP runtime. Apache 2.0 + Commons Clause.
