R76.P2 Audit — δ Phase 1.5 Graduation Plan
Inventory of inputs for authoring the Phase 1.5 δ task-prompts file at
docs/guides/implementation/task-prompts/p1.5-delta-router-graduation.md.
Scope: planning content only. No src/ changes. No interface changes. This
round writes prompts for executors who will act at R91+ per
docs/5-time/roadmap.md §Phase 1.5 (R91–R100).
1. Shipped Phase 0 δ stubs (the “do not break” baseline)
Wave I of R75 landed δ as library stubs per ADR-005 §Decision. Phase 1.5 replaces the body of these modules without changing their exported interfaces. Every signature below is frozen.
src/domains/router/scoring.ts (PR #149)
- Exports:
scoreIntent(prompt, context) → IntentScore, typesModelId,ScoreContext,IntentScore. - Phase 0 body: returns a frozen constant
{ scores: { claude: 1.0 }, winner: 'claude' }regardless of input. - Phase 0 invariants I1–I7 written into the file header:
- I1–I3: winner is always
'claude'; no other score keys exist. - I4: pure function — no I/O, randomness, time-dependent state.
- I5: deterministic.
- I6: output is deeply frozen (both levels).
- I7: interface is forward-compatible —
ModelIdwidens additively,ScoreContextkeeps its open index signature,IntentScoreshape stays.
- I1–I3: winner is always
- Phase 1.5 upgrade note in the file header: “Replace
PHASE_0_CLAUDE_WINNERand the body ofscoreIntentwith the real scoring matrix. WidenModelId. Callers continue to work unchanged.”
src/domains/router/fallback.ts (PR #150)
- Exports:
routeRequest(prompt, options) → Promise<RouteResult>,FallbackChainExhaustedError,ROUTER_PHASE_0_SHAPE, typesRouteOptions,RouteResult,FallbackAttempt,CompletionFn,CompletionFnOptions,ScoringFn. - Phase 0 body: single upstream call; on failure throws
FallbackChainExhaustedErrorwithattempts.length === 1. No retry, no circuit breaker, no timer, no env var reads. ROUTER_PHASE_0_SHAPEmarker: literal-typed constant{ members: 1, hasCircuitBreaker: false, modelsSupported: ['claude'] }whose value is asserted by tests. Phase 1.5 must flip the literals as part of the transition so the shape assertion fails loudly until the impl + marker both graduate.- Injection seams already present:
completionFn,scoringFn,fetchFn,logger,delayFnare pluggable viaRouteOptions— Phase 1.5 adapters hook in here.
src/domains/router/index.ts
- Barrel that re-exports both modules. Phase 1.5 appends
adapters/*exports.
Existing Phase 0 tests
src/__tests__/domains/router/scoring.test.ts— determinism + shape tests.src/__tests__/domains/router/fallback.test.ts— single-member chain +ROUTER_PHASE_0_SHAPEassertion + error shape + injection-seam tests.- Total Phase 0 tests on δ: 75 (scoring + fallback combined — counted from
passing suite at
e455b00a).
Phase 1.5 constraint: every existing test must continue to pass, except
(a) the ROUTER_PHASE_0_SHAPE literal assertion, which flips intentionally;
(b) tests asserting winner === 'claude' for arbitrary inputs, which become
property-based “winner ∈ enabled candidates” assertions.
2. ADR-005 §Implementation + §Phase 1.5 upgrade path
docs/architecture/decisions/ADR-005-multi-model-defer.md is the authority.
Key obligations the Phase 1.5 plan must honour:
2.1 Trigger conditions (all four must hold before Phase 1.5 starts)
- Phase 1 κ Rule Engine is complete — scoring rules must be deterministic,
and κ is where the per-dimension weights live (7 dimensions, weights stored
as
bps; see §3 below). Without κ, scoring policy is hardcoded, which violates “routing policy is governed, not coded” (δ concept doc §Target shape). - ≥2 other model providers with stable tool-use APIs.
- ≥100 sustained calls/day to justify adapter work.
- Human (T0) authorization for the round.
2.2 Phase 1.5 upgrade path (from the ADR itself)
- Write the scoring function in
src/domains/router/scoring.ts— replace the constant. - Add adapter files:
src/domains/router/adapters/kimi.ts,codex.ts,openai.ts. - Wire fallback chain in
src/domains/router/fallback.ts. - Add cost accounting to
router_stats. - Add cross-model parity tests.
Phase 1.5 does not change the router tool interface. Existing callers continue to work unchanged.
2.3 Postscript obligations (Wave I section)
The ADR’s Wave I postscript lists what explicitly does NOT land in Phase 0 and therefore is Phase 1.5 scope:
- Real scoring factors (prompt length, complexity keywords, context size, tool requirements).
- Multi-member fallback chain.
- Circuit breaker (3-failure trip, 60s unavailability window).
- Adapter layer for non-Claude providers.
- Any δ-facing MCP tool —
router_*tools remain deferred.
3. 7-dimension scoring formula (δ concept doc)
From docs/3-world/social/llm.md §Target shape → scoring algorithm:
score(m, T) = Σ ( weight_i × normalized_input_i(m, T) ) for i ∈ 7 dimensions
Each normalized input is bounded [0, 1]; weights sum to 1.0 (stored as
bps, 0–10000; accumulates as int64; final scaling is deterministic).
| Dimension | Formula | Default weight |
|---|---|---|
task_domain_match |
1 if task.domain ∈ model.profile else 0 |
0.20 |
context_window_fit |
min(1, model.window / task.tokens) |
0.15 |
cost_efficiency |
1 − (model.cost_per_1k / max_cost) clamped [0,1] |
0.15 |
latency_fit |
1 − (model.p50_ms / task.deadline_ms) clamped |
0.15 |
reliability |
success rate over last 100 invocations | 0.15 |
skill_match |
overlap of model strengths with task skill requirement | 0.15 |
operator_preference |
operator-supplied override weight | 0.05 |
Tie-break order: (a) higher reliability; (b) lower cost; (c) alphabetical on
model_id.
3.1 Worked routing example from the concept doc
Task: “Code review of 50KB pull request, response budget ≤ 5s.”
| Model | domain | window | cost | latency | reliability | skill | pref | score |
|---|---|---|---|---|---|---|---|---|
| Claude Sonnet | 1.0 | 1.00 | 0.55 | 0.80 | 0.96 | 1.0 | 0.5 | 0.87 |
| GPT-4o | 1.0 | 1.00 | 0.55 | 0.20 | 0.92 | 1.0 | 0.5 | 0.79 |
| Claude Haiku | 0.8 | 1.00 | 0.90 | 0.95 | 0.75 | 0.7 | 0.5 | 0.58 |
Sonnet wins. Haiku is fastest and cheapest, but its reliability drag loses it. This example becomes the golden-path test vector in P1.5.1’s acceptance.
3.2 Feedback loop (also spec-only for Phase 1.5)
new_weight = old_weight × (1 + 0.05 × feedback_delta)
new_weight = clamp(new_weight, 0.5, 1.5)
Feedback delta ∈ [−1, +1] from post-task outcome. Clamp prevents runaway.
Feedback surface is λ reputation, not δ — δ reads, does not write, so Phase 1.5
ships the read path only.
4. Candidate cohort (δ concept doc §Phase 1.5 candidate cohort)
Eight initial candidates, covering four provider families plus a self-host and a Chinese/English-parity option:
| Model | Window | Cost / 1k (bps) | p50 | Strengths |
|---|---|---|---|---|
| Claude 3.5 Sonnet | 200k | high | balanced | code review, structured output |
| Claude 3.5 Haiku | 200k | low | fast | classification, triage |
| GPT-4o | 128k | high | balanced | multimodal, general reasoning |
| GPT-4o mini | 128k | low | fast | cheap fallback |
| Gemini 1.5 Pro | 1M | medium | slow | huge-context synthesis |
| Llama 3.3 70B | 128k | low | varies | self-host, data sovereignty |
| Mixtral 8x22B | 64k | low | fast | permissive open-weight |
| Kimi K2 | 200k | medium | balanced | Chinese/English parity |
Costs and latencies are indicative; Phase 1.5 populates the live
mcp_model_candidates table from fresh values updated via a post-task
callback.
4.1 Candidate table schema (schema-ready in Phase 0, populated in Phase 1.5)
From the δ concept doc §Candidate table:
model_id(e.g."claude-opus-4-6")providercontext_window_tokenslatency_tier(fast | balanced | slow)cost_bps_per_kilotokendomain_fit_profile(bitmask over the 8 ξ domains)enabled(boolean)
Phase 0 has a single row (Claude Sonnet). Phase 1.5 populates the other seven.
5. Tool surface Phase 1.5 adds (router_* family)
From ADR-004 R75 Wave H amendment + the δ concept doc §Phase 0 posture: the
14-tool Phase 0 surface has zero δ-facing tools. Phase 1.5 introduces the
router_* family. The roadmap file
(docs/5-time/roadmap.md §Phase 1.5) lists three target tools
(router_score_task, router_call, router_feedback); the ADR-005 Decision
listing includes a fourth (router_stats). This plan uses four Phase 1.5 tools
(matching ADR-005), with router_score_task/router_score treated as the same
function under either name:
| Tool | Purpose | Input shape | Output shape |
|---|---|---|---|
router_score |
evaluate a prompt, return ranked models | {prompt, context?} |
{scores: Record<ModelId, number>, winner, rule_version_hash} |
router_call |
execute a prompt with the selected model (and cascade) | {prompt, options?} |
{model, content, finishReason, tokens, latencyMs, costUsd, modelsAttempted} |
router_fallback |
inspect / manually trigger a fallback cascade | {model_id?, reset?} |
{circuitState: Record<ModelId, CircuitState>} |
router_stats |
per-model call stats + cost aggregate | {} |
{models: Record<ModelId, {calls_total, avg_cost_usd, p50_latency_ms, success_rate}>} |
All four tools emit a ζ thought_record of thought_type = 'decision'
bearing the routing-decision shape (see §6).
6. ζ decision-trail shape Phase 1.5 must emit
From the δ concept doc §Decision-trail recording:
{
"type": "routing_decision",
"routing_mode": "single" | "ensemble" | "pipeline" | "fail",
"chosen_model_id": "claude-sonnet-3.5",
"candidates_considered": ["claude-sonnet-3.5", "gpt-4o", "claude-haiku-3.5"],
"scores": {"claude-sonnet-3.5": 0.87, "gpt-4o": 0.79, "claude-haiku-3.5": 0.58},
"fallback_attempts": 0,
"rule_version_hash": "rv:sha256:...",
"decision_hash": "SHA-256(inputs || chosen)"
}
decision_hash is load-bearing for θ consensus (Phase 2+ work): arbiters
include routing-decision hashes in the Merkle root, so two arbiters with the
same κ rule version and the same task context must arrive at the same pick.
δ is deterministic by construction.
7. Circuit-breaker semantics (from P0.5.2 donor notes, to honour in Phase 1.5)
From docs/guides/implementation/task-prompts/p0.5-delta-router.md
(heritage body, P0.5.2 donor prompt):
- Trip condition: 3 consecutive failures on the same model.
- Unavailability window: 60s from the 3rd failure.
- Reset rule: after 60s elapsed, counter resets to zero on first new call. (Not “on next success anywhere” — the reset is per-model + time-bound.)
- State storage: in-memory for the initial Phase 1.5 ship; a later round may migrate to DB if durability across restart becomes a requirement.
- Per-attempt timeout: 30s (configurable via
COLIBRI_MODEL_TIMEOUT), not a total chain budget.
Namespace note: donor text uses AMS_MODEL_*. Phase 1.5 uses
COLIBRI_MODEL_* per S17 §3 (env var namespace). Every reference in the new
prompts file must be COLIBRI_*.
8. File inventory for the new prompts doc
8.1 Canonical template to mimic
docs/guides/implementation/task-prompts/p0.9-nu-integrations.md — most
recently shaped spec-ready prompt file (R74.5 style, namespace-correct). Each
sub-task block follows this layout:
- Short header with spec source, extraction reference, worktree branch, branch command, estimated effort, depends-on, unblocks.
- Files to create list.
- Acceptance criteria checkbox list (6–10 items).
- Pre-flight reading list.
- Ready-to-paste agent prompt inside a
textcode fence. - Verification checklist (for reviewer agent) checkbox list.
- Writeback template YAML block.
- Common gotchas bullet list.
8.2 Files to touch in this round (R76.P2)
| Path | Action |
|---|---|
docs/guides/implementation/task-prompts/p1.5-delta-router-graduation.md |
create — 10 P1.5.x sub-task prompts |
docs/guides/implementation/task-prompts/index.md |
edit — append file entry under a new Phase 1.5 section heading |
docs/architecture/decisions/ADR-005-multi-model-defer.md |
edit — one-line postscript pointer at bottom of R75 Wave I section |
docs/audits/r76-p2-delta-15-planning-audit.md |
create — this file |
docs/contracts/r76-p2-delta-15-planning-contract.md |
create |
docs/packets/r76-p2-delta-15-planning-packet.md |
create |
docs/verification/r76-p2-delta-15-planning-verification.md |
create |
No src/ files. No test files. No concept docs. No roadmap edits (P3’s job).
8.3 Files explicitly NOT in scope
src/domains/router/*— frozen until R91+.src/__tests__/domains/router/*— frozen until R91+.docs/3-world/social/llm.md— concept doc already lists Phase 1.5 shape; no change.docs/5-time/roadmap.md— roadmap already references R91–R100 Phase 1.5; P3 owns any edit.docs/guides/implementation/task-breakdown.md— Phase 0 task list, not Phase 1.5.
9. Risks + mitigations
| Risk | Mitigation |
|---|---|
| Plan drifts from ADR-005 upgrade path | Every sub-task cites ADR-005 §Implementation or §Phase 1.5 upgrade path line. |
| Plan drifts from 7-dimension formula | P1.5.1 sub-task embeds the exact table from the concept doc. |
| Candidate cohort drifts from the 8-model table | P1.5.9 sub-task embeds the exact table from the concept doc. |
router_* tool surface drifts from ADR-005 |
P1.5.7 sub-task lists exactly 4 tools with the ADR-005 shape. |
| Phase 0 interface broken by Phase 1.5 work | Every sub-task “Forbiddens” block cites the frozen signature. |
| ζ decision-trail shape drifts | P1.5.10 embeds the exact JSON shape from the concept doc. |
COLIBRI_* vs AMS_* namespace regression |
Every env var reference in new prompts uses COLIBRI_MODEL_*. |
| Circuit-breaker semantics drift | P1.5.5 embeds the exact “3 failures / 60s / 30s timeout / per-model / time-bound reset” rule set. |
10. Definition of done for R76.P2
docs/guides/implementation/task-prompts/p1.5-delta-router-graduation.mdexists with 10 sub-task entries (P1.5.1–P1.5.10).- Every entry has: depends-on + input + output + acceptance criteria + effort + agent prompt.
docs/guides/implementation/task-prompts/index.mdlists the new file under a Phase 1.5 section.- ADR-005 R75 Wave I postscript ends with a one-line pointer to the new file.
- 4 chain docs (audit + contract + packet + verification) land under
docs/{audits,contracts,packets,verification}/r76-p2-delta-15-planning-*.md. npm run build && npm run lint && npm testgreen (docs-only round — no-op against the shipped test count).
Written 2026-04-18 as step 1 of the R76.P2 5-step chain.