ADR-005: Multi-Model Routing Deferred to Phase 1.5

Status: Accepted Date: 2026-04-09 Round: R73 Supersedes: None Superseded by: None


Context

The human asked at R73 kickoff whether Colibri should route tasks across multiple models (Claude, Kimi, Codex, GPT-class) from the start, or begin single-model. The δ Model Router concept already has a full spec: intent scoring across 8 model candidates, fallback chain, retry policy, cost/latency balancing.

However:

  • Phase 0 is not started. No TypeScript exists. Even a single-model integration is new work.
  • The M2 laptop dev environment is primarily configured for Claude Code and the Anthropic API.
  • Multi-model routing requires token economics, quality calibration, and fallback testing — none of which have data yet.
  • The other models (Kimi K2, Codex, GPT-4/5 class) each have their own tool-use quirks that require adapter layers.

The question is not “can Colibri route?” — the spec is clear. The question is “when?”


Decision

Phase 0 ships Claude-only.

  • Router interface is present. δ tools (router_score, router_select, router_call, router_fallback, router_stats) exist in the Phase 0 tool surface (see ADR-004) as thin stubs that always resolve to Claude.
  • Scoring function returns a constant. router_score returns {claude: 1.0, all_others: 0.0} for every intent.
  • Fallback chain has one member. If Claude fails, the call fails — no cascade.
  • Adapter layer is single-target. router_call wraps the Anthropic SDK only.

Multi-model routing lands in Phase 1.5 (round R85 or so, after Phase 1 κ Rule Engine is complete but before Phase 2 λ Reputation).

What Phase 1.5 adds

Item Phase 0 Phase 1.5
Scoring function Constant → Claude Intent-driven scoring across N models
Adapter layer Anthropic SDK only Anthropic + Kimi + Codex + OpenAI (at minimum)
Fallback chain 1 member N members with cost/latency ranking
Quality calibration None Per-intent model accuracy tracking
Cost accounting None Per-call token count + USD estimate
Router tests Single target Full cross-model parity suite

Consequences

Positive

  • Phase 0 ships sooner. Not being gated on multi-model means the runtime can actually run faster.
  • Scoring function is testable in isolation. When Phase 1.5 lands, the scoring logic slots into an already-present router interface.
  • No breaking change at Phase 1.5. The interface is stable from Phase 0; Phase 1.5 only changes the scorer and adapter.
  • Dev environment stays simple. Only Anthropic API keys needed during Phase 0.

Negative

  • Vendor lock. Phase 0 depends on Claude being available. Mitigation: the adapter layer is isolated; swapping Claude for any other single model is a one-file change.
  • Cost exposure. Every call goes to Claude; no cheaper-model offloading for low-stakes calls. Mitigation: Phase 0 call volume is small; cost is bounded.
  • Quality drift if a single model regresses. No N-model voting. Mitigation: the thought chain logs every call, so regressions are diagnosable.

Neutral

  • The Phase 1.5 milestone is now a hard dependency of Phase 2 λ Reputation (which needs per-model quality data).

Implementation

Phase 0 router stub (P0.5)

The P0.5 task group implements:

  • P0.5.1router_score stub that returns constant vector
  • P0.5.2router_call wrapping Anthropic SDK

That’s the entire router in Phase 0. router_select, router_fallback, router_stats are present as no-op tools that log their calls for future use.

Phase 1.5 upgrade path

When Phase 1.5 begins:

  1. Write the scoring function in src/domains/router/scoring.ts — replace the constant
  2. Add adapter files: src/domains/router/adapters/kimi.ts, codex.ts, openai.ts
  3. Wire fallback chain in src/domains/router/fallback.ts
  4. Add cost accounting to router_stats
  5. Add cross-model parity tests

Phase 1.5 does not change the router tool interface. Existing callers continue to work unchanged.

How do we know when to trigger Phase 1.5?

Phase 1.5 begins when:

  • Phase 1 κ Rule Engine is complete (scoring rules must be deterministic)
  • At least two other model providers have stable APIs with tool-use support
  • There is a concrete call volume (≥100 calls/day sustained) to justify the adapter work
  • The human authorizes the round

Without all four, Phase 1.5 stays deferred.


Alternatives Considered

Alternative A — Multi-model from day one

Rejected. Phase 0 has zero code. Adding multi-model adapters before the runtime runs at all is feature creep. The risk of never shipping is higher than the cost of single-model Phase 0.

Alternative B — Single-model but “pluggable” from day one

Considered and partially adopted. The router interface is pluggable from day one — the scoring function is a stub, not an absence. An executor in Phase 1.5 replaces the stub without touching the interface.

Alternative C — Start with Claude and Kimi together, defer others

Rejected. Two is as much adapter work as four from a testing perspective, and Kimi’s tool-use behavior differs enough from Claude’s that cross-model parity tests would need to be non-trivial in Phase 0. Pushing to Phase 1.5 and adding several models at once amortizes the testing cost.

Alternative D — Use an existing abstraction library (LangChain, LlamaIndex, etc.)

Rejected. Those libraries don’t expose the per-call audit and thought-chain integration that Colibri needs. Writing the adapter in-house is fewer lines and fewer abstractions than adopting and then working around a third-party library.


Verification

This decision is verified if and only if:

  • Phase 0 exit criteria do not include multi-model routing
  • The P0.5 task group delivers exactly the Claude-only stub
  • docs/reference/mcp-tools-phase-0.md documents the δ router tools as “Claude-only stubs, multi-model deferred to Phase 1.5”
  • docs/roadmap/ has a Phase 1.5 milestone with the upgrade-path content from this ADR
  • At round R80 (Phase 0 completion), Sigma verifies that no multi-model code exists in src/domains/router/

References

  • docs/concepts/δ-model-router.md — Full model router spec
  • docs/reference/extractions/delta-model-router-extraction.md — Router algorithm extraction
  • docs/guides/implementation/task-prompts/p0.5-delta-router.md — P0.5 per-task agent prompt
  • ADR-004 — Tool surface boundaries (the router is counted)
  • docs/roadmap/R73-R80-plan.md — Where Phase 1.5 sits in the overall plan

R75 Wave I — Phase 0 stubs shipped (2026-04-18)

Status of this ADR: Accepted (unchanged). Wave I implemented exactly what the §Decision section required — Phase 0 ships δ as library stubs, not as absence.

Landed artefacts:

File PR Contents
src/domains/router/scoring.ts #149 Constant-returns-claude scoring ({ claude: 1.0 } for every input). Signature matches the Phase 1.5 target so the activation is a formula replacement, not an interface rewrite.
src/domains/router/fallback.ts #150 Single-member fallback chain (Claude). On Claude failure the call fails — no cascade, no circuit breaker in Phase 0.
src/domains/router/index.ts #149/#150 barrel Exports both modules.
src/__tests__/domains/router/scoring.test.ts #149 Determinism + shape tests against the constant scorer.
src/__tests__/domains/router/fallback.test.ts #150 Single-member chain behaviour tests.

What did NOT land (still Phase 1.5):

  • Real scoring factors (prompt length, complexity keywords, context size, tool requirements)
  • Multi-member fallback chain
  • Circuit breaker (3-failure trip, 60s unavailability window)
  • Adapter layer for non-Claude providers
  • Any δ-facing MCP tool — router_* tools remain deferred to Phase 1.5

Graduation: docs/3-world/social/llm.md moved colibri_code: none → colibri_code: partial at Wave I close. The 8/15 concept graduation state after Wave I: α β γ δ ε ζ η ν shipped at partial; θ ι κ λ μ ξ π remain colibri_code: none.

This ADR remains the authoritative Phase 0 / Phase 1.5 boundary for δ. Any future addition to src/domains/router/ that goes beyond constant scoring + single-member fallback must either (a) be labelled Phase 1.5 work and open a new round, or (b) propose an amendment to this ADR.

Phase 1.5 graduation prompts: see docs/guides/implementation/task-prompts/p1.5-delta-router-graduation.md for the 10-sub-task plan (P1.5.1–P1.5.10) activating at R91+.


R73 unification corpus. Answers “do we need multi-model routing now?” — not yet; plan it, don’t ship it. Wave I postscript: stubs shipped per §Decision; Phase 0 closed at 28/28.


Back to top

Colibri — documentation-first MCP runtime. Apache 2.0 + Commons Clause.

This site uses Just the Docs, a documentation theme for Jekyll.