R76.P2 Audit — δ Phase 1.5 Graduation Plan

Inventory of inputs for authoring the Phase 1.5 δ task-prompts file at docs/guides/implementation/task-prompts/p1.5-delta-router-graduation.md.

Scope: planning content only. No src/ changes. No interface changes. This round writes prompts for executors who will act at R91+ per docs/5-time/roadmap.md §Phase 1.5 (R91–R100).


1. Shipped Phase 0 δ stubs (the “do not break” baseline)

Wave I of R75 landed δ as library stubs per ADR-005 §Decision. Phase 1.5 replaces the body of these modules without changing their exported interfaces. Every signature below is frozen.

src/domains/router/scoring.ts (PR #149)

  • Exports: scoreIntent(prompt, context) → IntentScore, types ModelId, ScoreContext, IntentScore.
  • Phase 0 body: returns a frozen constant { scores: { claude: 1.0 }, winner: 'claude' } regardless of input.
  • Phase 0 invariants I1–I7 written into the file header:
    • I1–I3: winner is always 'claude'; no other score keys exist.
    • I4: pure function — no I/O, randomness, time-dependent state.
    • I5: deterministic.
    • I6: output is deeply frozen (both levels).
    • I7: interface is forward-compatible — ModelId widens additively, ScoreContext keeps its open index signature, IntentScore shape stays.
  • Phase 1.5 upgrade note in the file header: “Replace PHASE_0_CLAUDE_WINNER and the body of scoreIntent with the real scoring matrix. Widen ModelId. Callers continue to work unchanged.”

src/domains/router/fallback.ts (PR #150)

  • Exports: routeRequest(prompt, options) → Promise<RouteResult>, FallbackChainExhaustedError, ROUTER_PHASE_0_SHAPE, types RouteOptions, RouteResult, FallbackAttempt, CompletionFn, CompletionFnOptions, ScoringFn.
  • Phase 0 body: single upstream call; on failure throws FallbackChainExhaustedError with attempts.length === 1. No retry, no circuit breaker, no timer, no env var reads.
  • ROUTER_PHASE_0_SHAPE marker: literal-typed constant { members: 1, hasCircuitBreaker: false, modelsSupported: ['claude'] } whose value is asserted by tests. Phase 1.5 must flip the literals as part of the transition so the shape assertion fails loudly until the impl + marker both graduate.
  • Injection seams already present: completionFn, scoringFn, fetchFn, logger, delayFn are pluggable via RouteOptions — Phase 1.5 adapters hook in here.

src/domains/router/index.ts

  • Barrel that re-exports both modules. Phase 1.5 appends adapters/* exports.

Existing Phase 0 tests

  • src/__tests__/domains/router/scoring.test.ts — determinism + shape tests.
  • src/__tests__/domains/router/fallback.test.ts — single-member chain + ROUTER_PHASE_0_SHAPE assertion + error shape + injection-seam tests.
  • Total Phase 0 tests on δ: 75 (scoring + fallback combined — counted from passing suite at e455b00a).

Phase 1.5 constraint: every existing test must continue to pass, except (a) the ROUTER_PHASE_0_SHAPE literal assertion, which flips intentionally; (b) tests asserting winner === 'claude' for arbitrary inputs, which become property-based “winner ∈ enabled candidates” assertions.


2. ADR-005 §Implementation + §Phase 1.5 upgrade path

docs/architecture/decisions/ADR-005-multi-model-defer.md is the authority. Key obligations the Phase 1.5 plan must honour:

2.1 Trigger conditions (all four must hold before Phase 1.5 starts)

  1. Phase 1 κ Rule Engine is complete — scoring rules must be deterministic, and κ is where the per-dimension weights live (7 dimensions, weights stored as bps; see §3 below). Without κ, scoring policy is hardcoded, which violates “routing policy is governed, not coded” (δ concept doc §Target shape).
  2. ≥2 other model providers with stable tool-use APIs.
  3. ≥100 sustained calls/day to justify adapter work.
  4. Human (T0) authorization for the round.

2.2 Phase 1.5 upgrade path (from the ADR itself)

  1. Write the scoring function in src/domains/router/scoring.ts — replace the constant.
  2. Add adapter files: src/domains/router/adapters/kimi.ts, codex.ts, openai.ts.
  3. Wire fallback chain in src/domains/router/fallback.ts.
  4. Add cost accounting to router_stats.
  5. Add cross-model parity tests.

Phase 1.5 does not change the router tool interface. Existing callers continue to work unchanged.

2.3 Postscript obligations (Wave I section)

The ADR’s Wave I postscript lists what explicitly does NOT land in Phase 0 and therefore is Phase 1.5 scope:

  • Real scoring factors (prompt length, complexity keywords, context size, tool requirements).
  • Multi-member fallback chain.
  • Circuit breaker (3-failure trip, 60s unavailability window).
  • Adapter layer for non-Claude providers.
  • Any δ-facing MCP tool — router_* tools remain deferred.

3. 7-dimension scoring formula (δ concept doc)

From docs/3-world/social/llm.md §Target shape → scoring algorithm:

score(m, T) = Σ ( weight_i × normalized_input_i(m, T) )   for i ∈ 7 dimensions

Each normalized input is bounded [0, 1]; weights sum to 1.0 (stored as bps, 0–10000; accumulates as int64; final scaling is deterministic).

Dimension Formula Default weight
task_domain_match 1 if task.domain ∈ model.profile else 0 0.20
context_window_fit min(1, model.window / task.tokens) 0.15
cost_efficiency 1 − (model.cost_per_1k / max_cost) clamped [0,1] 0.15
latency_fit 1 − (model.p50_ms / task.deadline_ms) clamped 0.15
reliability success rate over last 100 invocations 0.15
skill_match overlap of model strengths with task skill requirement 0.15
operator_preference operator-supplied override weight 0.05

Tie-break order: (a) higher reliability; (b) lower cost; (c) alphabetical on model_id.

3.1 Worked routing example from the concept doc

Task: “Code review of 50KB pull request, response budget ≤ 5s.”

Model domain window cost latency reliability skill pref score
Claude Sonnet 1.0 1.00 0.55 0.80 0.96 1.0 0.5 0.87
GPT-4o 1.0 1.00 0.55 0.20 0.92 1.0 0.5 0.79
Claude Haiku 0.8 1.00 0.90 0.95 0.75 0.7 0.5 0.58

Sonnet wins. Haiku is fastest and cheapest, but its reliability drag loses it. This example becomes the golden-path test vector in P1.5.1’s acceptance.

3.2 Feedback loop (also spec-only for Phase 1.5)

new_weight = old_weight × (1 + 0.05 × feedback_delta)
new_weight = clamp(new_weight, 0.5, 1.5)

Feedback delta ∈ [−1, +1] from post-task outcome. Clamp prevents runaway. Feedback surface is λ reputation, not δ — δ reads, does not write, so Phase 1.5 ships the read path only.


4. Candidate cohort (δ concept doc §Phase 1.5 candidate cohort)

Eight initial candidates, covering four provider families plus a self-host and a Chinese/English-parity option:

Model Window Cost / 1k (bps) p50 Strengths
Claude 3.5 Sonnet 200k high balanced code review, structured output
Claude 3.5 Haiku 200k low fast classification, triage
GPT-4o 128k high balanced multimodal, general reasoning
GPT-4o mini 128k low fast cheap fallback
Gemini 1.5 Pro 1M medium slow huge-context synthesis
Llama 3.3 70B 128k low varies self-host, data sovereignty
Mixtral 8x22B 64k low fast permissive open-weight
Kimi K2 200k medium balanced Chinese/English parity

Costs and latencies are indicative; Phase 1.5 populates the live mcp_model_candidates table from fresh values updated via a post-task callback.

4.1 Candidate table schema (schema-ready in Phase 0, populated in Phase 1.5)

From the δ concept doc §Candidate table:

  • model_id (e.g. "claude-opus-4-6")
  • provider
  • context_window_tokens
  • latency_tier (fast | balanced | slow)
  • cost_bps_per_kilotoken
  • domain_fit_profile (bitmask over the 8 ξ domains)
  • enabled (boolean)

Phase 0 has a single row (Claude Sonnet). Phase 1.5 populates the other seven.


5. Tool surface Phase 1.5 adds (router_* family)

From ADR-004 R75 Wave H amendment + the δ concept doc §Phase 0 posture: the 14-tool Phase 0 surface has zero δ-facing tools. Phase 1.5 introduces the router_* family. The roadmap file (docs/5-time/roadmap.md §Phase 1.5) lists three target tools (router_score_task, router_call, router_feedback); the ADR-005 Decision listing includes a fourth (router_stats). This plan uses four Phase 1.5 tools (matching ADR-005), with router_score_task/router_score treated as the same function under either name:

Tool Purpose Input shape Output shape
router_score evaluate a prompt, return ranked models {prompt, context?} {scores: Record<ModelId, number>, winner, rule_version_hash}
router_call execute a prompt with the selected model (and cascade) {prompt, options?} {model, content, finishReason, tokens, latencyMs, costUsd, modelsAttempted}
router_fallback inspect / manually trigger a fallback cascade {model_id?, reset?} {circuitState: Record<ModelId, CircuitState>}
router_stats per-model call stats + cost aggregate {} {models: Record<ModelId, {calls_total, avg_cost_usd, p50_latency_ms, success_rate}>}

All four tools emit a ζ thought_record of thought_type = 'decision' bearing the routing-decision shape (see §6).


6. ζ decision-trail shape Phase 1.5 must emit

From the δ concept doc §Decision-trail recording:

{
  "type": "routing_decision",
  "routing_mode": "single" | "ensemble" | "pipeline" | "fail",
  "chosen_model_id": "claude-sonnet-3.5",
  "candidates_considered": ["claude-sonnet-3.5", "gpt-4o", "claude-haiku-3.5"],
  "scores": {"claude-sonnet-3.5": 0.87, "gpt-4o": 0.79, "claude-haiku-3.5": 0.58},
  "fallback_attempts": 0,
  "rule_version_hash": "rv:sha256:...",
  "decision_hash": "SHA-256(inputs || chosen)"
}

decision_hash is load-bearing for θ consensus (Phase 2+ work): arbiters include routing-decision hashes in the Merkle root, so two arbiters with the same κ rule version and the same task context must arrive at the same pick. δ is deterministic by construction.


7. Circuit-breaker semantics (from P0.5.2 donor notes, to honour in Phase 1.5)

From docs/guides/implementation/task-prompts/p0.5-delta-router.md (heritage body, P0.5.2 donor prompt):

  • Trip condition: 3 consecutive failures on the same model.
  • Unavailability window: 60s from the 3rd failure.
  • Reset rule: after 60s elapsed, counter resets to zero on first new call. (Not “on next success anywhere” — the reset is per-model + time-bound.)
  • State storage: in-memory for the initial Phase 1.5 ship; a later round may migrate to DB if durability across restart becomes a requirement.
  • Per-attempt timeout: 30s (configurable via COLIBRI_MODEL_TIMEOUT), not a total chain budget.

Namespace note: donor text uses AMS_MODEL_*. Phase 1.5 uses COLIBRI_MODEL_* per S17 §3 (env var namespace). Every reference in the new prompts file must be COLIBRI_*.


8. File inventory for the new prompts doc

8.1 Canonical template to mimic

docs/guides/implementation/task-prompts/p0.9-nu-integrations.md — most recently shaped spec-ready prompt file (R74.5 style, namespace-correct). Each sub-task block follows this layout:

  • Short header with spec source, extraction reference, worktree branch, branch command, estimated effort, depends-on, unblocks.
  • Files to create list.
  • Acceptance criteria checkbox list (6–10 items).
  • Pre-flight reading list.
  • Ready-to-paste agent prompt inside a text code fence.
  • Verification checklist (for reviewer agent) checkbox list.
  • Writeback template YAML block.
  • Common gotchas bullet list.

8.2 Files to touch in this round (R76.P2)

Path Action
docs/guides/implementation/task-prompts/p1.5-delta-router-graduation.md create — 10 P1.5.x sub-task prompts
docs/guides/implementation/task-prompts/index.md edit — append file entry under a new Phase 1.5 section heading
docs/architecture/decisions/ADR-005-multi-model-defer.md edit — one-line postscript pointer at bottom of R75 Wave I section
docs/audits/r76-p2-delta-15-planning-audit.md create — this file
docs/contracts/r76-p2-delta-15-planning-contract.md create
docs/packets/r76-p2-delta-15-planning-packet.md create
docs/verification/r76-p2-delta-15-planning-verification.md create

No src/ files. No test files. No concept docs. No roadmap edits (P3’s job).

8.3 Files explicitly NOT in scope

  • src/domains/router/* — frozen until R91+.
  • src/__tests__/domains/router/* — frozen until R91+.
  • docs/3-world/social/llm.md — concept doc already lists Phase 1.5 shape; no change.
  • docs/5-time/roadmap.md — roadmap already references R91–R100 Phase 1.5; P3 owns any edit.
  • docs/guides/implementation/task-breakdown.md — Phase 0 task list, not Phase 1.5.

9. Risks + mitigations

Risk Mitigation
Plan drifts from ADR-005 upgrade path Every sub-task cites ADR-005 §Implementation or §Phase 1.5 upgrade path line.
Plan drifts from 7-dimension formula P1.5.1 sub-task embeds the exact table from the concept doc.
Candidate cohort drifts from the 8-model table P1.5.9 sub-task embeds the exact table from the concept doc.
router_* tool surface drifts from ADR-005 P1.5.7 sub-task lists exactly 4 tools with the ADR-005 shape.
Phase 0 interface broken by Phase 1.5 work Every sub-task “Forbiddens” block cites the frozen signature.
ζ decision-trail shape drifts P1.5.10 embeds the exact JSON shape from the concept doc.
COLIBRI_* vs AMS_* namespace regression Every env var reference in new prompts uses COLIBRI_MODEL_*.
Circuit-breaker semantics drift P1.5.5 embeds the exact “3 failures / 60s / 30s timeout / per-model / time-bound reset” rule set.

10. Definition of done for R76.P2

  • docs/guides/implementation/task-prompts/p1.5-delta-router-graduation.md exists with 10 sub-task entries (P1.5.1–P1.5.10).
  • Every entry has: depends-on + input + output + acceptance criteria + effort + agent prompt.
  • docs/guides/implementation/task-prompts/index.md lists the new file under a Phase 1.5 section.
  • ADR-005 R75 Wave I postscript ends with a one-line pointer to the new file.
  • 4 chain docs (audit + contract + packet + verification) land under docs/{audits,contracts,packets,verification}/r76-p2-delta-15-planning-*.md.
  • npm run build && npm run lint && npm test green (docs-only round — no-op against the shipped test count).

Written 2026-04-18 as step 1 of the R76.P2 5-step chain.


Back to top

Colibri — documentation-first MCP runtime. Apache 2.0 + Commons Clause.

This site uses Just the Docs, a documentation theme for Jekyll.