δ Model Router — Algorithm Extraction
Algorithmic content extracted from AMS src/claude/router/smart.js for Colibri implementation reference.
8-Model Matrix
Models registered in src/config/claude.js (as of R45):
| Model ID | Generation | Class | Alias |
|---|---|---|---|
| claude-opus-4-6 | Claude 4 | Most capable, complex reasoning | opus |
| claude-sonnet-4-6 | Claude 4 | Balanced speed/capability | sonnet, default |
| claude-haiku-4-5-20251001 | Claude 4 | Fastest | haiku |
| claude-3-5-sonnet-20241022 | Claude 3.5 | Previous-gen balanced | — |
| claude-3-5-haiku-20241022 | Claude 3.5 | Previous-gen fast | — |
| claude-3-opus-20240229 | Claude 3 | Previous-gen capable | — |
| claude-3-sonnet-20240229 | Claude 3 | Previous-gen balanced | — |
| claude-3-haiku-20240307 | Claude 3 | Previous-gen compact | — |
The costPer1kTokens field is a routing weight (lower = preferred for cost-sensitive routing), not a live billing rate.
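A registry entry might look like the following sketch. Only costPer1kTokens and contextWindow are named in this document; every other field name here is an assumption for illustration, not the actual src/config/claude.js shape:

```javascript
// Hypothetical registry entry. Field names other than costPer1kTokens and
// contextWindow are illustrative assumptions.
const MODELS = {
  'claude-sonnet-4-6': {
    generation: 'Claude 4',
    aliases: ['sonnet', 'default'],
    strengths: ['coding', 'speed'],       // matched against task signals (+0.10 each)
    contextWindow: 200000,                // tokens; used by the 90% overflow check
    costPer1kTokens: 0.003,               // routing weight, NOT a live billing rate
    supportsExtendedThinking: false,
  },
};
```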
Scoring Heuristic (calculateModelScore)
The router uses a heuristic accumulator in src/claude/router/smart.js. Each candidate model is scored and the highest wins.
Important: The simplified quality*0.40 + cost*0.30 + latency*0.20 + load*0.10 formula appears in design docs but is NOT the implementation. The actual algorithm:
score = 0.5 (base score for all candidates)
+ 0.30 if model is the PRIMARY pick for the detected intent
+ 0.15 if model is in the FALLBACK list for the detected intent
+ 0.10 per matched model strength (coding, speed, reasoning)
e.g., task has code → model has "coding" strength → +0.10
+ 0.20 if task complexity == "very_high" AND model supports extended thinking
+ 0.15 if request contains code AND model excels at coding
+ cost_rank_bonus if priority == "cost"
additive bonus based on cost weight rank (lower cost = higher bonus)
+ 0.20 if priority == "speed" AND model has "speed" strength
- 0.50 if estimated_token_count > 0.90 * model.contextWindow
(hard penalty for near-overflow)
score = clamp(score, 0.0, 1.0)
Default fallback: If no model scores above 0.0, select claude-3-5-sonnet-20241022.
Weighted Scoring Summary Table
| Factor | Addend | Condition |
|---|---|---|
| Base | +0.50 | Always |
| Intent: primary | +0.30 | Model is top pick for detected intent |
| Intent: fallback | +0.15 | Model is secondary pick for intent |
| Strength match | +0.10 per match | Model strength matches task signal |
| Complexity bonus | +0.20 | Complexity = very_high AND extended thinking supported |
| Code bonus | +0.15 | Request has code AND model excels at coding |
| Cost preference | +variable | priority = "cost"; rank-based additive |
| Speed preference | +0.20 | priority = "speed" AND model has speed strength |
| Context overflow | −0.50 | Estimated tokens > 90% of model context window |
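The accumulator described above can be sketched as follows. This is a reconstruction from the documented weights, not the actual smart.js code; the task/model field names and the shape of the cost-rank bonus are assumptions:

```javascript
// Illustrative cost-rank bonus: rank 0 = cheapest model, bonus shrinks with rank.
// The real rank-based additive in smart.js is not documented here.
function costRankBonus(model) {
  return Math.max(0, 0.15 - 0.05 * model.costRank);
}

// Sketch of calculateModelScore from the weights documented above.
function calculateModelScore(model, task, intentPrefs) {
  let score = 0.5; // base score for every candidate

  if (intentPrefs.primary === model.id) score += 0.30;        // primary pick for intent
  else if (intentPrefs.fallbacks.includes(model.id)) score += 0.15;

  for (const strength of model.strengths) {
    if (task.signals.includes(strength)) score += 0.10;       // per matched strength
  }

  if (task.complexity === 'very_high' && model.supportsExtendedThinking) score += 0.20;
  if (task.hasCode && model.strengths.includes('coding')) score += 0.15;

  if (task.priority === 'cost') score += costRankBonus(model);
  if (task.priority === 'speed' && model.strengths.includes('speed')) score += 0.20;

  if (task.estimatedTokens > 0.90 * model.contextWindow) score -= 0.50; // near-overflow

  return Math.min(1.0, Math.max(0.0, score));                 // clamp to [0, 1]
}
```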
Fallback Chain
When the selected (highest-scoring) model fails:
Step 1: RETRY
- Same model, exponential backoff
- Max 2 retry attempts
- Backoff: 100ms, 200ms

Step 2: ESCALATE (intra-provider)
- Next-highest-scoring model from the SAME provider (Anthropic)
- Re-run scoring, exclude the failed model, pick the top result

Step 3: CROSS-PROVIDER
- Highest-scoring model from a DIFFERENT provider
- (Currently Anthropic-only; cross-provider routing is designed for future providers)

Step 4: MANUAL RETRY — if all candidates fail:
- Queue the request for manual retry
- Log the failure with the model list and error reasons
- Return an error to the caller with a retry_after hint
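The retry/escalate steps above can be sketched as one loop over score-ordered candidates. This is an illustrative reconstruction; the actual smart.js error handling, and the retry_after value, are assumptions:

```javascript
// Sketch of steps 1-2 and 4 of the fallback chain: retry each candidate up to
// 2 extra times with 100ms/200ms backoff, then escalate to the next candidate.
async function dispatchWithFallback(candidates, call) {
  const errors = [];
  for (const model of candidates) {            // sorted by score, best first
    for (let attempt = 0; attempt <= 2; attempt++) {
      try {
        return await call(model);
      } catch (err) {
        errors.push({ model: model.id, error: err.message });
        if (attempt < 2) {
          // Exponential backoff: 100ms after the first failure, 200ms after the second.
          await new Promise((resolve) => setTimeout(resolve, 100 * 2 ** attempt));
        }
      }
    }
    // ESCALATE: fall through to the next-highest-scoring model.
  }
  // MANUAL RETRY: all candidates failed; surface the per-model errors.
  throw Object.assign(new Error('all models failed'), {
    errors,
    retry_after: 60, // illustrative hint value
  });
}
```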
Intent Mapping
The router maintains an intent-to-model preference table. Detected intent determines which models receive the +0.30 primary bonus:
| Task Type / Intent | Preferred Model | Rationale |
|---|---|---|
| Complex reasoning, long-form analysis | Opus class | Extended thinking support |
| Balanced task, code writing | Sonnet class | Speed/capability tradeoff |
| Simple retrieval, fast response | Haiku class | Lowest latency |
| Previous-gen compatibility | 3.5 Sonnet/Haiku | Stable outputs |
| Cost-sensitive batch work | Haiku class | Lowest cost weight |
Intent is detected from request context: task type field, presence of code blocks, token count estimate, complexity flag.
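A minimal sketch of this detection and preference table, using the signals listed above; the intent names, thresholds, and class-level mapping are assumptions rather than the actual smart.js heuristics:

```javascript
// Hypothetical intent detection from the documented request signals:
// complexity flag, code blocks, priority, token estimate.
function detectIntent(request) {
  if (request.complexity === 'very_high') return 'complex_reasoning';
  if (request.hasCodeBlocks) return 'code_writing';
  if (request.priority === 'cost') return 'cost_sensitive_batch';
  if (request.estimatedTokens < 500) return 'fast_response';
  return 'balanced';
}

// Intent → model-class preferences granting the +0.30 primary / +0.15 fallback bonus.
const INTENT_PREFERENCES = {
  complex_reasoning:    { primary: 'opus',   fallbacks: ['sonnet'] },
  code_writing:         { primary: 'sonnet', fallbacks: ['opus'] },
  fast_response:        { primary: 'haiku',  fallbacks: ['sonnet'] },
  cost_sensitive_batch: { primary: 'haiku',  fallbacks: ['sonnet'] },
  balanced:             { primary: 'sonnet', fallbacks: ['haiku'] },
};
```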
Context Window Check
Hard penalty applied before final selection:
estimated_tokens = len(prompt) / 4 (heuristic: ~4 chars per token)
+ len(context_snapshot) / 4
+ expected_output_tokens (default: 4096)
if estimated_tokens > 0.90 * model.contextWindow:
score -= 0.50
# This effectively eliminates the model from selection
# unless ALL candidates would exceed their windows
Context window limits by model class:
- Claude 4 Opus/Sonnet: 200K tokens
- Claude 4 Haiku: 200K tokens
- Claude 3.5 Sonnet/Haiku: 200K tokens
- Claude 3 Opus: 200K tokens
- Claude 3 Sonnet/Haiku: 200K tokens
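The estimate and penalty above translate directly; this sketch assumes the documented ~4 chars/token heuristic and 4096-token default output budget:

```javascript
// Token estimate per the documented heuristic: ~4 characters per token,
// plus a default expected-output budget of 4096 tokens.
const DEFAULT_EXPECTED_OUTPUT = 4096;

function estimateTokens(prompt, contextSnapshot, expectedOutput = DEFAULT_EXPECTED_OUTPUT) {
  return Math.ceil(prompt.length / 4) +
         Math.ceil(contextSnapshot.length / 4) +
         expectedOutput;
}

// Hard penalty when the estimate exceeds 90% of the model's context window.
function overflowPenalty(estimatedTokens, contextWindow) {
  return estimatedTokens > 0.90 * contextWindow ? -0.5 : 0;
}
```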
6-Mode Execution Vocabulary
Design vocabulary only — NOT implemented as runtime constants.
The tool-chain engine (src/claude/tool-chain/engine.js) uses its own separate ChainMode enum (SEQUENTIAL | PARALLEL | CONDITIONAL | LOOP | DAG) for internal pipeline composition. The 6 modes below are design-level dispatch concepts, not runtime code:
| Mode | Planned behavior |
|---|---|
| SINGLE | Route to one model, wait for the complete response |
| PARALLEL | Send to N models simultaneously, return first or best response |
| CHAINED | Model A output feeds into Model B (e.g., draft → review) |
| SWARM | Multiple agents work on decomposed sub-tasks |
| PLAN | Model generates plan, human/PM approves, then execute |
| COWORK | Two models collaborate on shared context |
Context Sharing (Multi-Model Routes)
When multiple models participate (CHAINED, PARALLEL, COWORK):
shared_context = {
conversation_history: [...serialized messages],
task_metadata: { id, phase, priority, type },
memory_snapshots: [...relevant memory_short_term entries],
model_formatting: { per-model instruction prefixes }
}
Before dispatch to each model:
trim shared_context to fit model.contextWindow
apply model-specific formatting from surface adapters in src/claude/
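The trim step can be sketched as below. The drop-oldest-first strategy, the 90% headroom, and the field names are assumptions for illustration; the actual trimming in smart.js is not documented here:

```javascript
// Hypothetical context trim before dispatch: drop the oldest conversation
// messages until the estimated token total fits within 90% of the window.
function trimContext(sharedContext, contextWindow, estimate = (s) => Math.ceil(s.length / 4)) {
  const history = [...sharedContext.conversation_history]; // copy; original untouched
  const budget = 0.9 * contextWindow;                      // mirror the 90% overflow check
  let total = history.reduce((sum, msg) => sum + estimate(msg), 0);
  while (history.length > 1 && total > budget) {
    total -= estimate(history.shift());                    // drop oldest first
  }
  return { ...sharedContext, conversation_history: history };
}
```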
See Also
- [[concepts/δ-model-router δ Model Router]] — concept overview
- [[architecture/router Router Architecture]] — implementation details
- [[extractions/nu-integrations-extraction ν Integrations Extraction]] — Claude API sub-modules
- [[reference/intelligence-layer Intelligence Layer]] — full intelligence component map
- [[reference/claude-core Claude Core]] — core sub-module architecture