⚠ Phase 0 stubs shipped in R75 Wave I per ADR-005 §Decision. This task group graduated from "spec-only" to "library stubs" on 2026-04-18. Phase 0 delivered constant scoring (always Claude) in src/domains/router/scoring.ts (PR #149) and a single-member fallback chain in src/domains/router/fallback.ts (PR #150). No MCP tools; library-only. The donor prompt below is preserved for Phase 1.5, when δ graduates to real multi-model routing. Do not run it against Phase 0: it references AMS_MODEL_1..8 env vars (donor namespace, not supported) and an 8-model fallback chain (Phase 1.5 scope). COLIBRI equivalents are TBD when δ is revived in Phase 1.5.
**Heritage prompt content (Phase 1.5 starting point)**
> Everything below this line was written for a donor model-router design and is kept as a Phase 1.5 starting point. It references `AMS_MODEL_1..8` env vars (donor namespace) and an 8-model fallback chain (donor algorithm). Both must be re-earned against Colibri's `COLIBRI_MODEL_*` namespace and Phase 1.5 scope when δ lands. Do not execute any of the sub-tasks below in Phase 0.
## Group summary
| Task ID | Title | Depends on | Effort | Unblocks |
|---------|-------|------------|--------|----------|
| P0.5.1 | Intent Scoring Matrix | P0.2.2 | M | P0.5.2 |
| P0.5.2 | 8-Model Fallback Chain | P0.5.1 | M | production routing |
---
## P0.5.1 — Intent Scoring Matrix
**Spec source:** [task-breakdown.md §P0.5.1](/AMS/guides/implementation/task-breakdown.html)
**Extraction reference:** `docs/reference/extractions/delta-model-router-extraction.md`
**Worktree:** `feature/p0-5-1-scoring`
**Branch command:** `git worktree add .worktrees/claude/p0-5-1-scoring -b feature/p0-5-1-scoring origin/main`
**Estimated effort:** M (Medium — 2-3 hours)
**Depends on:** P0.2.2 (Database for storing scoring rules/cache)
**Unblocks:** P0.5.2 (Fallback chain uses scoring to pick model order)
### Files to create
- `src/domains/router/scoring.ts` — Intent scoring algorithm
- `tests/domains/router/scoring.test.ts` — Deterministic scoring tests
### Acceptance criteria
- [ ] `scoreIntent(prompt, context)` → `{ scores: Record<ModelId, number>, winner: ModelId }`
- [ ] Scoring factors: prompt length, complexity keywords, context size, tool requirements
- [ ] All scores in range [0, 100] (integer)
- [ ] Deterministic: same input always returns same winner
- [ ] No external API calls in scoring (pure function)
- [ ] Test: 10 sample prompts with expected model winners
### Pre-flight reading
- `CLAUDE.md` — worktree rules
- `docs/guides/implementation/task-breakdown.md` §P0.5.1 — full spec
- `docs/reference/extractions/delta-model-router-extraction.md` (scoring section)
- `docs/reference/greek-vocabulary.md` — δ (delta) concept description
### Ready-to-paste agent prompt
```text
You are a Phase 0 builder agent for Colibri.
TASK: P0.5.1 — Intent Scoring Matrix
Implement deterministic intent scoring for model selection without API calls.
FILES TO READ FIRST:
1. CLAUDE.md (execution rules)
2. docs/guides/implementation/task-breakdown.md §P0.5.1
3. docs/reference/extractions/delta-model-router-extraction.md (scoring section)
4. src/config.ts (model list configuration)
WORKTREE SETUP:
git fetch origin
git worktree add .worktrees/claude/p0-5-1-scoring -b feature/p0-5-1-scoring origin/main
cd .worktrees/claude/p0-5-1-scoring
FILES TO CREATE:
- src/domains/router/scoring.ts
* scoreIntent(prompt: string, context: {toolCount?: number, complexity?: string}): {scores: Record<string, number>, winner: string}
* Scoring factors (all integers, range [0, 100]):
- Prompt length: short (0-100 chars) → small models get +20; long (>1000 chars) → large models get +30
- Complexity keywords: ["analyze", "reason", "plan", "synthesize"] → +25 for capable models
- Context size: large context (>5000 chars) → strong-context models +20
- Tool requirements: count of unique tool mentions → +10 per tool for multi-tool capable models
* Models (from config): claude-opus, claude-sonnet, claude-haiku, etc.
* Winner: model with highest score
* Pure function: no DB calls, no randomness, no API calls
* Tie-breaking: if tied, prefer model by alphabetical order (deterministic)
- tests/domains/router/scoring.test.ts
* Test 10 sample prompts with expected winners:
1. Short simple task → haiku (small, fast)
2. Long complex analysis → opus (capable)
3. Multi-tool prompt → sonnet (balanced)
4. Reasoning chain → opus
5. Code completion → sonnet
... (5 more representative prompts)
* Test determinism: same input twice → same winner
* Test score ranges [0, 100]
* Test tie-breaking (same score → alphabetical)
ACCEPTANCE CRITERIA (headline):
✓ scoreIntent returns {scores: Record<ModelId, number>, winner: ModelId}
✓ Factors: prompt length, complexity keywords, context size, tool count
✓ Scores [0, 100], deterministic, pure (no API calls)
✓ 10 sample prompts with verified winners
SUCCESS CHECK:
cd .worktrees/claude/p0-5-1-scoring && npm test && npm run lint
WRITEBACK (after success):
task_update(task_id="P0.5.1", status="done", progress=100)
thought_record(task_id="P0.5.1", branch="feature/p0-5-1-scoring",
commit_sha=, tests_run=["npm test","npm run lint"],
summary="Implemented deterministic intent scoring with factors: prompt length, complexity keywords, context size, tool requirements. 10 sample prompts verify model selection.")
FORBIDDENS:
✗ No external API calls (pure function only)
✗ No randomness (deterministic always)
✗ No hardcoding model lists (read from config)
✗ Do not edit main checkout
NEXT:
P0.5.2 — 8-Model Fallback Chain (uses scoring to order fallback attempts)
```
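The scoring contract above can be sketched as a pure function. This is a minimal illustration, not the implementation: the `MODELS` array is hardcoded here only so the sketch is self-contained (the real task must read the model list from `src/config.ts`, per the forbiddens), the `contextText` field is an assumed shape for the context-size check, and the per-model bonus assignments are one plausible reading of the factor table.

```typescript
type ModelId = string;

interface IntentContext {
  toolCount?: number;
  complexity?: string;
  contextText?: string; // assumption: raw context string used for the size check
}

// Stand-in for the config-driven model list (hardcoded here for illustration only;
// the real implementation must read this from src/config.ts).
const MODELS: ModelId[] = ["claude-haiku", "claude-opus", "claude-sonnet"];

const COMPLEXITY_KEYWORDS = ["analyze", "reason", "plan", "synthesize"];

export function scoreIntent(
  prompt: string,
  context: IntentContext = {}
): { scores: Record<ModelId, number>; winner: ModelId } {
  const scores: Record<ModelId, number> = {};
  for (const model of MODELS) scores[model] = 0;

  // Factor 1: prompt length — short favors the small model, long favors the large one
  if (prompt.length <= 100) {
    scores["claude-haiku"] += 20;
  } else if (prompt.length > 1000) {
    scores["claude-opus"] += 30;
  }

  // Factor 2: complexity keywords (case-insensitive match)
  const lower = prompt.toLowerCase();
  if (COMPLEXITY_KEYWORDS.some((k) => lower.includes(k))) {
    scores["claude-opus"] += 25;
  }

  // Factor 3: context size — large context favors strong-context models
  if ((context.contextText?.length ?? 0) > 5000) {
    scores["claude-opus"] += 20;
  }

  // Factor 4: tool requirements — +10 per unique tool for multi-tool capable models
  scores["claude-sonnet"] += (context.toolCount ?? 0) * 10;

  // All scores are integers clamped to [0, 100]
  for (const m of MODELS) {
    scores[m] = Math.max(0, Math.min(100, Math.round(scores[m])));
  }

  // Winner: highest score; ties break alphabetically for determinism
  const winner = [...MODELS]
    .sort()
    .reduce((best, m) => (scores[m] > scores[best] ? m : best));
  return { scores, winner };
}
```

Note the tie-break: iterating models in sorted order and only replacing the current best on a strictly greater score makes ties resolve to the alphabetically first model, with no randomness anywhere in the function.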
### Verification checklist (for reviewer agent)
- [ ] scoreIntent is pure (no API calls, no DB queries)
- [ ] All scores integers in [0, 100]
- [ ] Same input twice → same winner (deterministic)
- [ ] Tie-breaking is deterministic (alphabetical, not random)
- [ ] 10 sample prompts with verified expected winners
- [ ] Complexity keywords and tool counting working
- [ ] npm test and npm run lint pass
### Writeback template
```yaml
task_update:
task_id: P0.5.1
status: done
progress: 100
thought_record:
task_id: P0.5.1
branch: feature/p0-5-1-scoring
commit_sha:
tests_run: ["npm test", "npm run lint"]
summary: "Implemented deterministic intent scoring with four factors: prompt length (small prompts favor haiku/sonnet, long favor opus), complexity keywords (analyze/reason/plan favor capable models, +25 bonus), context size (large context favors strong-context models), tool requirements (+10 per unique tool for multi-tool capable models). Winner is highest score; ties broken alphabetically. Pure function with no API calls. 10 representative sample prompts verify correct model selection."
blockers: []
```
### Common gotchas
- **No API calls in scoring** — if you call Claude to score an intent, you've already committed to paying for a model call. The whole point of scoring is to decide which model to use BEFORE calling any API. Keep it lightweight.
- **Determinism is critical** — the router must be predictable. Same prompt always routes to same model. Use deterministic tie-breaking (alphabetical, not random).
- **Complexity keywords are case-insensitive** — "Analyze", "ANALYZE", "analyze" should all trigger the bonus. Use .toLowerCase() before matching.
- **Tool requirements count** — tools are marked in context.tools or extracted from the prompt. Count unique tool names, not mentions: invoking the same tool twice still counts as one tool.
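The unique-tool-counting gotcha can be illustrated with a small helper. The `tool:NAME` token format is an assumption made only for this sketch; the real marker format comes from the spec and extraction doc.

```typescript
// Count unique tool names mentioned in a prompt.
// Assumption (for illustration only): tools appear as `tool:NAME` tokens.
export function countUniqueTools(prompt: string): number {
  const matches = prompt.toLowerCase().match(/tool:([a-z0-9_-]+)/g) ?? [];
  // A Set deduplicates repeated mentions of the same tool
  return new Set(matches).size;
}
```

Deduplicating through a `Set` is what makes "same tool mentioned twice" count once, which is the behavior the gotcha calls for.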
---
## P0.5.2 — 8-Model Fallback Chain
**Spec source:** [task-breakdown.md §P0.5.2](/AMS/guides/implementation/task-breakdown.html)
**Extraction reference:** `docs/reference/extractions/delta-model-router-extraction.md`
**Worktree:** `feature/p0-5-2-fallback`
**Branch command:** `git worktree add .worktrees/claude/p0-5-2-fallback -b feature/p0-5-2-fallback origin/main`
**Estimated effort:** M (Medium — 2-3 hours)
**Depends on:** P0.5.1 (Uses scoring to order chain)
**Unblocks:** Production routing (resilience to model outages)
### Files to create
- `src/domains/router/fallback.ts` — Fallback chain orchestration + circuit breaker
- `tests/domains/router/fallback.test.ts` — Fallback sequence + circuit breaker tests
### Acceptance criteria
- [ ] 8 model slots configured via env vars: `AMS_MODEL_1` through `AMS_MODEL_8`
- [ ] `routeRequest(prompt, context)` tries models in priority order
- [ ] On model error / timeout: tries next model in chain
- [ ] If all 8 fail: throws `AllModelsFailedError` with per-model error log
- [ ] Circuit breaker: model marked unavailable for 60s after 3 consecutive failures
- [ ] Test: mock models 1-7 failing → verify model 8 is used
### Pre-flight reading
- `CLAUDE.md` — execution rules
- `docs/guides/implementation/task-breakdown.md` §P0.5.2 — full spec
- `docs/reference/extractions/delta-model-router-extraction.md` (fallback section)
- `src/domains/router/scoring.ts` — scoring output for model ordering
### Ready-to-paste agent prompt
```text
You are a Phase 0 builder agent for Colibri.
TASK: P0.5.2 — 8-Model Fallback Chain
Implement resilient routing with fallback attempts and circuit breaker.
FILES TO READ FIRST:
1. CLAUDE.md (execution rules)
2. docs/guides/implementation/task-breakdown.md §P0.5.2
3. docs/reference/extractions/delta-model-router-extraction.md (fallback section)
4. src/domains/router/scoring.ts (model ordering)
5. src/config.ts (model configuration)
WORKTREE SETUP:
git fetch origin
git worktree add .worktrees/claude/p0-5-2-fallback -b feature/p0-5-2-fallback origin/main
cd .worktrees/claude/p0-5-2-fallback
FILES TO CREATE:
- src/domains/router/fallback.ts
* routeRequest(prompt: string, context: any): Promise<{model: string, result: any}>
* Priority order from scoring (or default AMS_MODEL_1..8 order)
* Try each model in sequence:
1. Check circuit breaker: if model unavailable for 60s, skip
2. Call model with timeout (30s default, or AMS_MODEL_TIMEOUT)
3. On success: return {model, result}
4. On error/timeout: log error, try next model
* Circuit breaker:
- Track failures per model (3 consecutive failures = trip)
- Unavailable for 60s; then reset counter
- Use in-memory state or DB (recommendation: start with in-memory)
* If all 8 models fail:
- Throw AllModelsFailedError with per-model error log
- Error object: {models: [{name, status, error, timeout?}]}
* Export functions:
- routeRequest(prompt, context)
- getCircuitBreakerState() → debug info
- resetCircuitBreaker(modelId) → manual reset
- tests/domains/router/fallback.test.ts
* Test happy path: first model succeeds
* Test fallback: model 1 fails, model 2 succeeds
* Test all fail: all 8 models fail → AllModelsFailedError thrown
* Test circuit breaker: 3 failures → model marked unavailable
* Test circuit breaker reset after 60s (use fake timers for speed)
* Test timeout: model 1 times out (>30s), model 2 used
* Test per-model error logging in AllModelsFailedError
ACCEPTANCE CRITERIA (headline):
✓ 8 model slots via AMS_MODEL_1..8 env vars
✓ routeRequest tries models in order, fallback on error
✓ Circuit breaker: 3 failures → 60s unavailable
✓ AllModelsFailedError with per-model log if all fail
✓ Test models 1-7 fail → model 8 used
SUCCESS CHECK:
cd .worktrees/claude/p0-5-2-fallback && npm test && npm run lint
WRITEBACK (after success):
task_update(task_id="P0.5.2", status="done", progress=100)
thought_record(task_id="P0.5.2", branch="feature/p0-5-2-fallback",
commit_sha=, tests_run=["npm test","npm run lint"],
summary="Implemented 8-model fallback chain with circuit breaker. routeRequest tries models in priority order with 30s timeout. Circuit breaker marks model unavailable after 3 consecutive failures for 60s. AllModelsFailedError thrown with per-model error log if all fail.")
FORBIDDENS:
✗ Do not have circuit breaker state survive process exit yet (in-memory ok for P0.5.2)
✗ Do not skip timeout (30s default) on model calls
✗ Do not throw immediately on first failure (always try all 8)
✗ Do not edit main checkout
NEXT:
P0.6.1 — Skill Schema (agents spawn with router routing their tasks)
Production routing complete after this task.
```
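The fallback loop and circuit breaker described above can be sketched as follows. This is a hedged illustration under stated assumptions: the caller resolves the `AMS_MODEL_1..8` env vars into an ordered list of `{name, call}` entries, the per-attempt timeout wrapper is elided (marked in a comment), and the breaker state is the in-memory Map the task recommends.

```typescript
type ModelCall = (prompt: string) => Promise<unknown>;

interface BreakerEntry { failureCount: number; unavailableUntil: number }

export class AllModelsFailedError extends Error {
  constructor(public models: { name: string; error: string }[]) {
    super("All models failed");
    this.name = "AllModelsFailedError";
    // Keep instanceof working when the compile target is ES5
    Object.setPrototypeOf(this, AllModelsFailedError.prototype);
  }
}

const TRIP_THRESHOLD = 3;   // consecutive failures before the breaker trips
const COOLDOWN_MS = 60_000; // model stays unavailable for 60s after tripping

// In-memory breaker state, as P0.5.2 recommends (resets on process restart)
const breaker = new Map<string, BreakerEntry>();

function isAvailable(model: string, now = Date.now()): boolean {
  const e = breaker.get(model);
  return !e || e.unavailableUntil <= now;
}

function recordFailure(model: string, now = Date.now()): void {
  const e = breaker.get(model) ?? { failureCount: 0, unavailableUntil: 0 };
  e.failureCount += 1;
  if (e.failureCount >= TRIP_THRESHOLD) {
    e.unavailableUntil = now + COOLDOWN_MS; // trip: skip this model for 60s
    e.failureCount = 0;                     // counter resets when cooldown starts
  }
  breaker.set(model, e);
}

export async function routeRequest(
  prompt: string,
  // Assumption: caller resolves AMS_MODEL_1..8 (or scoring order) into this list
  models: { name: string; call: ModelCall }[]
): Promise<{ model: string; result: unknown }> {
  const errors: { name: string; error: string }[] = [];
  for (const m of models) {
    if (!isAvailable(m.name)) {
      errors.push({ name: m.name, error: "circuit open" });
      continue;
    }
    try {
      // Real implementation wraps this call in the 30s per-attempt timeout
      const result = await m.call(prompt);
      breaker.delete(m.name); // success clears this model's failure count
      return { model: m.name, result };
    } catch (err) {
      recordFailure(m.name);
      errors.push({ name: m.name, error: String(err) });
    }
  }
  throw new AllModelsFailedError(errors);
}
```

Note that a success only clears the breaker entry for the model that succeeded; a tripped model stays unavailable for the full 60s cooldown, matching the "reset after 60s, not on next success" gotcha below.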
### Verification checklist (for reviewer agent)
- [ ] 8 model slots read from AMS_MODEL_1..8 env vars
- [ ] routeRequest calls scoring to determine priority order (or uses env order)
- [ ] Each model attempt has 30s timeout
- [ ] Circuit breaker tracks failures per model
- [ ] 3 consecutive failures → mark unavailable for 60s
- [ ] AllModelsFailedError includes per-model error details
- [ ] Test covers fallback scenario (models 1-7 fail, 8 succeeds)
- [ ] Test circuit breaker trip and reset
- [ ] npm test and npm run lint pass
### Writeback template
```yaml
task_update:
task_id: P0.5.2
status: done
progress: 100
thought_record:
task_id: P0.5.2
branch: feature/p0-5-2-fallback
commit_sha:
tests_run: ["npm test", "npm run lint"]
summary: "Implemented 8-model fallback chain with circuit breaker resilience. routeRequest(prompt, context) tries models in priority order (from AMS_MODEL_1..8 env vars), with 30s timeout per attempt. On error/timeout, tries next model. Circuit breaker marks model unavailable for 60s after 3 consecutive failures. If all 8 models fail, throws AllModelsFailedError with per-model error log including status, error message, and timeout info. Handles transient outages gracefully."
blockers: []
```
### Common gotchas
- **Circuit breaker state is in-memory for now** — P0.5.2 can use in-memory state (a Map of {modelId → {failureCount, unavailableUntil}}). Later, if needed, this could move to the DB for persistence across restarts, but for P0.5.2 in-memory is fine. Note: restarts will reset circuit state.
- **Timeout is per-attempt, not total** — if model 1 takes 30s and times out, you still get another 30s for model 2. Total runtime could be 8*30s in the worst case (all timeouts). This is acceptable for a phase that's not yet optimizing latency.
- **Error logging must include per-model details** — when all 8 fail, the AllModelsFailedError must include {models: [{name, status, error, timeout?}]} so debugging tools can see which models were tried and why they failed.
- **Reset circuit breaker after 60s, not "on next success"** — if model 1 has 3 failures and is marked unavailable, it stays unavailable for 60s even if model 2 succeeds in the meantime. This prevents rapid re-attempts against a broken model.
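The per-attempt timeout gotcha can be sketched with a small `Promise.race`-style wrapper. This is an illustration only; the timeout value would come from `AMS_MODEL_TIMEOUT` (assumed here to be milliseconds, defaulting to 30s per the spec).

```typescript
// Wrap a single model call with a per-attempt timeout.
// Each attempt gets its own budget, so a chain of timeouts can take up to
// attempts × timeout in the worst case, as the gotcha above notes.
function withTimeout<T>(p: Promise<T>, ms: number): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(
      () => reject(new Error(`timed out after ${ms}ms`)),
      ms
    );
    p.then(
      (v) => { clearTimeout(timer); resolve(v); },
      (e) => { clearTimeout(timer); reject(e); }
    );
  });
}
```

A timed-out attempt rejects like any other model error, so the fallback loop treats it uniformly: log the error, move to the next model, and the next attempt starts with a fresh timeout budget.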
---
## Next group
[p0.6-epsilon-skills.md](/AMS/guides/implementation/task-prompts/p0.6-epsilon-skills.html) — ε Skill Registry (3 tasks: Skill Schema, Skill CRUD+Discovery, Agent Spawning)
[Back to task-prompts index](/AMS/guides/implementation/task-prompts/)