δ Model Router — Algorithm Extraction

Algorithmic content extracted from AMS src/claude/router/smart.js for Colibri implementation reference.

8-Model Matrix

Models registered in src/config/claude.js (as of R45):

| Model ID | Generation | Class | Alias |
|---|---|---|---|
| claude-opus-4-6 | Claude 4 | Most capable, complex reasoning | opus |
| claude-sonnet-4-6 | Claude 4 | Balanced speed/capability | sonnet (default) |
| claude-haiku-4-5-20251001 | Claude 4 | Fastest | haiku |
| claude-3-5-sonnet-20241022 | Claude 3.5 | Previous-gen balanced | |
| claude-3-5-haiku-20241022 | Claude 3.5 | Previous-gen fast | |
| claude-3-opus-20240229 | Claude 3 | Previous-gen capable | |
| claude-3-sonnet-20240229 | Claude 3 | Previous-gen balanced | |
| claude-3-haiku-20240307 | Claude 3 | Previous-gen compact | |

The costPer1kTokens field is a routing weight (lower = preferred for cost-sensitive routing), not a live billing rate.
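A minimal sketch of what such a registry might look like. Field names `costPer1kTokens` and `contextWindow` come from this document; the specific weight values, `strengths` arrays, and the `resolveAlias` helper are illustrative assumptions, not the actual `src/config/claude.js` contents.

```javascript
// Hypothetical registry sketch. costPer1kTokens is a relative
// routing weight (lower = preferred when priority is "cost"),
// not a live billing rate. Weight values here are made up.
const MODELS = [
  { id: "claude-opus-4-6",           alias: "opus",   costPer1kTokens: 15, contextWindow: 200000, strengths: ["reasoning"] },
  { id: "claude-sonnet-4-6",         alias: "sonnet", costPer1kTokens: 3,  contextWindow: 200000, strengths: ["coding"] },
  { id: "claude-haiku-4-5-20251001", alias: "haiku",  costPer1kTokens: 1,  contextWindow: 200000, strengths: ["speed"] },
];

// Resolve a short alias ("opus", "sonnet", "haiku") to a full model ID.
function resolveAlias(alias) {
  const entry = MODELS.find((m) => m.alias === alias);
  return entry ? entry.id : null;
}
```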

Scoring Heuristic (calculateModelScore)

The router uses a heuristic accumulator in src/claude/router/smart.js. Each candidate model is scored and the highest wins.

Important: The simplified quality*0.40 + cost*0.30 + latency*0.20 + load*0.10 formula appears in design docs but is NOT the implementation. The actual algorithm:

score = 0.5   (base score for all candidates)

+ 0.30   if model is the PRIMARY pick for the detected intent
+ 0.15   if model is in the FALLBACK list for the detected intent

+ 0.10   per matched model strength (coding, speed, reasoning)
         e.g., task has code → model has "coding" strength → +0.10

+ 0.20   if task complexity == "very_high" AND model supports extended thinking

+ 0.15   if request contains code AND model excels at coding

+ cost_rank_bonus   if priority == "cost"
                    additive bonus based on cost weight rank (lower cost = higher bonus)

+ 0.20   if priority == "speed" AND model has "speed" strength

- 0.50   if estimated_token_count > 0.90 * model.contextWindow
          (hard penalty for near-overflow)

score = clamp(score, 0.0, 1.0)

Default fallback: If no model scores above 0.0, select claude-3-5-sonnet-20241022.
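The accumulator above can be sketched in JavaScript. The bonus values and clamp come from this document; the shapes of the `model`, `task`, and `intentTable` objects are illustrative assumptions, not the actual structures in `src/claude/router/smart.js`.

```javascript
// Sketch of the scoring accumulator described above.
function calculateModelScore(model, task, intentTable) {
  let score = 0.5; // base score for every candidate

  // Intent bonuses: primary pick beats fallback-list membership.
  const intent = intentTable[task.intent] || { primary: null, fallback: [] };
  if (model.id === intent.primary) score += 0.30;
  else if (intent.fallback.includes(model.id)) score += 0.15;

  // +0.10 per model strength that matches a task signal.
  for (const s of model.strengths) {
    if (task.signals.includes(s)) score += 0.10;
  }

  if (task.complexity === "very_high" && model.extendedThinking) score += 0.20;
  if (task.hasCode && model.strengths.includes("coding")) score += 0.15;

  // Rank-based cost bonus (precomputed per model; lower cost = larger bonus).
  if (task.priority === "cost") score += model.costRankBonus;
  if (task.priority === "speed" && model.strengths.includes("speed")) score += 0.20;

  // Hard penalty for near-overflow of the context window.
  if (task.estimatedTokens > 0.9 * model.contextWindow) score -= 0.50;

  return Math.min(1, Math.max(0, score)); // clamp to [0, 1]
}
```

For example, a primary-intent model with one matching strength would score 0.5 + 0.30 + 0.10 = 0.90 before clamping.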

Weighted Scoring Summary Table

| Factor | Addend | Condition |
|---|---|---|
| Base | +0.50 | Always |
| Intent: primary | +0.30 | Model is top pick for detected intent |
| Intent: fallback | +0.15 | Model is secondary pick for intent |
| Strength match | +0.10 per match | Model strength matches task signal |
| Complexity bonus | +0.20 | Complexity = very_high AND extended thinking supported |
| Code bonus | +0.15 | Request has code AND model excels at coding |
| Cost preference | +variable | priority = "cost"; rank-based additive |
| Speed preference | +0.20 | priority = "speed" AND model has speed strength |
| Context overflow | −0.50 | Estimated tokens > 90% of model context window |

Fallback Chain

When the selected (highest-scoring) model fails:

Step 1: RETRY
  Same model, exponential backoff
  Max 2 retry attempts
  Backoff: 100ms, 200ms

Step 2: ESCALATE (intra-provider)
  Next-highest-scoring model from the SAME provider (Anthropic)
  Re-run scoring, exclude failed model, pick top result

Step 3: CROSS-PROVIDER
  Highest-scoring model from a DIFFERENT provider
  (Currently Anthropic-only; cross-provider is designed for future providers)

Step 4: MANUAL RETRY
  If all candidates fail:
  Queue request for manual retry
  Log failure with model list and error reasons
  Return error to caller with retry_after hint
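Steps 1 and 2 can be sketched as a single loop over score-ranked candidates: retry each model with exponential backoff, then escalate to the next candidate. The `callModel` callback and `retry_after` value are illustrative assumptions; the real engine also distinguishes cross-provider and manual-retry stages.

```javascript
// Sketch of RETRY + ESCALATE, assuming `ranked` is already sorted by
// score (highest first) with failed models excluded on re-entry.
async function dispatchWithFallback(ranked, callModel) {
  const delays = [100, 200]; // backoff before each retry (max 2 retries)
  for (const model of ranked) {
    for (let attempt = 0; attempt <= delays.length; attempt++) {
      try {
        return await callModel(model); // success: return first good response
      } catch (err) {
        if (attempt < delays.length) {
          await new Promise((r) => setTimeout(r, delays[attempt]));
        }
        // Retries exhausted: fall through to the next candidate (ESCALATE).
      }
    }
  }
  // All candidates failed: surface an error with a retry hint (Step 4).
  const e = new Error("all candidates failed");
  e.retry_after = 30; // seconds; illustrative value
  throw e;
}
```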

Intent Mapping

The router maintains an intent-to-model preference table. Detected intent determines which models receive the +0.30 primary bonus:

| Task Type / Intent | Preferred Model | Rationale |
|---|---|---|
| Complex reasoning, long-form analysis | Opus class | Extended thinking support |
| Balanced task, code writing | Sonnet class | Speed/capability tradeoff |
| Simple retrieval, fast response | Haiku class | Lowest latency |
| Previous-gen compatibility | 3.5 Sonnet/Haiku | Stable outputs |
| Cost-sensitive batch work | Haiku class | Lowest cost weight |

Intent is detected from request context: task type field, presence of code blocks, token count estimate, complexity flag.
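A detector over those signals, plus the preference table that feeds the +0.30/+0.15 bonuses, might look like the following. The intent names, request fields, and primary/fallback picks are illustrative assumptions consistent with the table above, not the actual router tables.

```javascript
// Illustrative intent detection from the request-context signals
// listed above: complexity flag, code presence, priority, token estimate.
function detectIntent(request) {
  if (request.complexity === "very_high") return "complex_reasoning";
  if (request.hasCode) return "code_writing";
  if (request.priority === "cost") return "cost_sensitive";
  if ((request.estimatedTokens || 0) < 500) return "fast_response";
  return "balanced";
}

// Intent -> model preference table (primary gets +0.30, fallback +0.15).
const INTENT_PREFERENCES = {
  complex_reasoning: { primary: "claude-opus-4-6", fallback: ["claude-sonnet-4-6"] },
  code_writing:      { primary: "claude-sonnet-4-6", fallback: ["claude-opus-4-6"] },
  fast_response:     { primary: "claude-haiku-4-5-20251001", fallback: ["claude-3-5-haiku-20241022"] },
  cost_sensitive:    { primary: "claude-haiku-4-5-20251001", fallback: ["claude-3-haiku-20240307"] },
  balanced:          { primary: "claude-sonnet-4-6", fallback: ["claude-3-5-sonnet-20241022"] },
};
```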

Context Window Check

Hard penalty applied before final selection:

estimated_tokens = len(prompt) / 4   (heuristic: ~4 chars per token)
                 + len(context_snapshot) / 4
                 + expected_output_tokens (default: 4096)

if estimated_tokens > 0.90 * model.contextWindow:
    score -= 0.50
    # This effectively eliminates the model from selection
    # unless ALL candidates would exceed their windows

Context window limits by model class:

  • Claude 4 Opus/Sonnet: 200K tokens
  • Claude 4 Haiku: 200K tokens
  • Claude 3.5 Sonnet/Haiku: 200K tokens
  • Claude 3 Opus: 200K tokens
  • Claude 3 Sonnet/Haiku: 200K tokens
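The estimate and penalty above translate directly to code. This is a sketch of the same arithmetic; the function names are assumptions.

```javascript
// ~4 chars per token heuristic, plus a default output reservation.
const DEFAULT_OUTPUT_TOKENS = 4096;

function estimateTokens(prompt, contextSnapshot, expectedOutput = DEFAULT_OUTPUT_TOKENS) {
  return Math.ceil(prompt.length / 4) +
         Math.ceil(contextSnapshot.length / 4) +
         expectedOutput;
}

// Returns the score adjustment: -0.5 when the estimate exceeds
// 90% of the model's context window, otherwise 0.
function overflowPenalty(estimatedTokens, contextWindow) {
  return estimatedTokens > 0.9 * contextWindow ? -0.5 : 0;
}
```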

Execution Mode Vocabulary (6 Modes)

Design vocabulary only — NOT implemented as runtime constants.

The tool-chain engine (src/claude/tool-chain/engine.js) uses its own separate ChainMode enum (SEQUENTIAL | PARALLEL | CONDITIONAL | LOOP | DAG) for internal pipeline composition. The 6 modes below are design-level dispatch concepts, not runtime code:

| Mode | Planned behavior |
|---|---|
| SINGLE | Route to one model, wait for complete response |
| PARALLEL | Send to N models simultaneously, return first or best response |
| CHAINED | Model A output feeds into Model B (e.g., draft → review) |
| SWARM | Multiple agents work on decomposed sub-tasks |
| PLAN | Model generates plan, human/PM approves, then execute |
| COWORK | Two models collaborate on shared context |
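Since these are design concepts rather than runtime code, the following is a speculative sketch of how the first three modes could dispatch; the `ExecutionMode` constant and `call` callback are assumptions, and this is deliberately not the tool-chain engine's `ChainMode` enum.

```javascript
// Design-level sketch only: not implemented in the current runtime.
const ExecutionMode = Object.freeze({
  SINGLE: "single", PARALLEL: "parallel", CHAINED: "chained",
  SWARM: "swarm", PLAN: "plan", COWORK: "cowork",
});

async function dispatch(mode, models, call) {
  switch (mode) {
    case ExecutionMode.SINGLE:
      return call(models[0]);               // one model, full response
    case ExecutionMode.PARALLEL:
      return Promise.any(models.map((m) => call(m))); // first success wins
    case ExecutionMode.CHAINED: {
      let out = null;
      for (const m of models) out = await call(m, out); // A's output feeds B
      return out;
    }
    default:
      throw new Error("mode not implemented in this sketch: " + mode);
  }
}
```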

Context Sharing (Multi-Model Routes)

When multiple models participate (CHAINED, PARALLEL, COWORK):

shared_context = {
  conversation_history: [...serialized messages],
  task_metadata: { id, phase, priority, type },
  memory_snapshots: [...relevant memory_short_term entries],
  model_formatting: { per-model instruction prefixes }
}

Before dispatch to each model:
  trim shared_context to fit model.contextWindow
  apply model-specific formatting from surface adapters in src/claude/
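The trim step can be sketched as dropping the oldest conversation messages until the serialized context fits. The budget math (4 chars/token, an output reservation) reuses the heuristics from the context-window section; the function name and drop-oldest-first policy are assumptions.

```javascript
// Illustrative trim: shrink conversation_history until the serialized
// shared context fits under the model's window, keeping newest messages.
function trimSharedContext(sharedContext, contextWindow, reserveTokens = 4096) {
  const budget = contextWindow - reserveTokens; // leave room for output
  const history = [...sharedContext.conversation_history];
  const size = () =>
    Math.ceil(JSON.stringify({ ...sharedContext, conversation_history: history }).length / 4);
  while (history.length > 1 && size() > budget) {
    history.shift(); // drop the oldest message first
  }
  return { ...sharedContext, conversation_history: history };
}
```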

See Also

  • [[concepts/δ-model-router δ Model Router]] — concept overview
  • [[architecture/router Router Architecture]] — implementation details
  • [[extractions/nu-integrations-extraction ν Integrations Extraction]] — Claude API sub-modules
  • [[reference/intelligence-layer Intelligence Layer]] — full intelligence component map
  • [[reference/claude-core Claude Core]] — core sub-module architecture


Colibri — documentation-first MCP runtime. Apache 2.0 + Commons Clause.
