R76.P2 Audit — δ Phase 1.5 Graduation Plan

Inventory of inputs for authoring the Phase 1.5 δ task-prompts file at docs/guides/implementation/task-prompts/p1.5-delta-router-graduation.md.

Scope: planning content only. No src/ changes. No interface changes. This round writes prompts for executors who will act at R91+ per docs/5-time/roadmap.md §Phase 1.5 (R91–R100).

1. Shipped Phase 0 δ stubs (the “do not break” baseline)

Wave I of R75 landed δ as library stubs per ADR-005 §Decision. Phase 1.5 replaces the body of these modules without changing their exported interfaces. Every signature below is frozen.

`src/domains/router/scoring.ts` (PR #149)

Exports: scoreIntent(prompt, context) → IntentScore, types ModelId, ScoreContext, IntentScore.
Phase 0 body: returns a frozen constant { scores: { claude: 1.0 }, winner: 'claude' } regardless of input.
Phase 0 invariants I1–I7 written into the file header:
- I1–I3: winner is always 'claude'; no other score keys exist.
- I4: pure function — no I/O, randomness, time-dependent state.
- I5: deterministic.
- I6: output is deeply frozen (both levels).
- I7: interface is forward-compatible — ModelId widens additively, ScoreContext keeps its open index signature, IntentScore shape stays.
Phase 1.5 upgrade note in the file header: “Replace PHASE_0_CLAUDE_WINNER and the body of scoreIntent with the real scoring matrix. Widen ModelId. Callers continue to work unchanged.”

`src/domains/router/fallback.ts` (PR #150)

Exports: routeRequest(prompt, options) → Promise<RouteResult>, FallbackChainExhaustedError, ROUTER_PHASE_0_SHAPE, types RouteOptions, RouteResult, FallbackAttempt, CompletionFn, CompletionFnOptions, ScoringFn.
Phase 0 body: single upstream call; on failure throws FallbackChainExhaustedError with attempts.length === 1. No retry, no circuit breaker, no timer, no env var reads.
ROUTER_PHASE_0_SHAPE marker: literal-typed constant { members: 1, hasCircuitBreaker: false, modelsSupported: ['claude'] } whose value is asserted by tests. Phase 1.5 must flip the literals as part of the transition so the shape assertion fails loudly until the impl + marker both graduate.
Injection seams already present: completionFn, scoringFn, fetchFn, logger, delayFn are pluggable via RouteOptions — Phase 1.5 adapters hook in here.

`src/domains/router/index.ts`

Barrel that re-exports both modules. Phase 1.5 appends adapters/* exports.

Existing Phase 0 tests

src/__tests__/domains/router/scoring.test.ts — determinism + shape tests.
src/__tests__/domains/router/fallback.test.ts — single-member chain + ROUTER_PHASE_0_SHAPE assertion + error shape + injection-seam tests.
Total Phase 0 tests on δ: 75 (scoring + fallback combined — counted from passing suite at e455b00a).

Phase 1.5 constraint: every existing test must continue to pass, except (a) the ROUTER_PHASE_0_SHAPE literal assertion, which flips intentionally; (b) tests asserting winner === 'claude' for arbitrary inputs, which become property-based “winner ∈ enabled candidates” assertions.

2. ADR-005 §Implementation + §Phase 1.5 upgrade path

docs/architecture/decisions/ADR-005-multi-model-defer.md is the authority. Key obligations the Phase 1.5 plan must honour:

2.1 Trigger conditions (all four must hold before Phase 1.5 starts)

Phase 1 κ Rule Engine is complete — scoring rules must be deterministic, and κ is where the per-dimension weights live (7 dimensions, weights stored as bps; see §3 below). Without κ, scoring policy is hardcoded, which violates “routing policy is governed, not coded” (δ concept doc §Target shape).
≥2 other model providers with stable tool-use APIs.
≥100 sustained calls/day to justify adapter work.
Human (T0) authorization for the round.

2.2 Phase 1.5 upgrade path (from the ADR itself)

Write the scoring function in src/domains/router/scoring.ts — replace the constant.
Add adapter files: src/domains/router/adapters/kimi.ts, codex.ts, openai.ts.
Wire fallback chain in src/domains/router/fallback.ts.
Add cost accounting to router_stats.
Add cross-model parity tests.

Phase 1.5 does not change the router tool interface. Existing callers continue to work unchanged.

2.3 Postscript obligations (Wave I section)

The ADR’s Wave I postscript lists what explicitly does NOT land in Phase 0 and therefore is Phase 1.5 scope:

Real scoring factors (prompt length, complexity keywords, context size, tool requirements).
Multi-member fallback chain.
Circuit breaker (3-failure trip, 60s unavailability window).
Adapter layer for non-Claude providers.
Any δ-facing MCP tool — router_* tools remain deferred.

3. 7-dimension scoring formula (δ concept doc)

From docs/3-world/social/llm.md §Target shape → scoring algorithm:

score(m, T) = Σ ( weight_i × normalized_input_i(m, T) )   for i ∈ 7 dimensions

Each normalized input is bounded [0, 1]; weights sum to 1.0 (stored as bps, 0–10000; accumulates as int64; final scaling is deterministic).

Dimension	Formula	Default weight
`task_domain_match`	`1` if `task.domain ∈ model.profile` else `0`	`0.20`
`context_window_fit`	`min(1, model.window / task.tokens)`	`0.15`
`cost_efficiency`	`1 − (model.cost_per_1k / max_cost)` clamped `[0,1]`	`0.15`
`latency_fit`	`1 − (model.p50_ms / task.deadline_ms)` clamped	`0.15`
`reliability`	success rate over last 100 invocations	`0.15`
`skill_match`	overlap of model strengths with task skill requirement	`0.15`
`operator_preference`	operator-supplied override weight	`0.05`

Tie-break order: (a) higher reliability; (b) lower cost; (c) alphabetical on model_id.

3.1 Worked routing example from the concept doc

Task: “Code review of 50KB pull request, response budget ≤ 5s.”

Model	domain	window	cost	latency	reliability	skill	pref	score
Claude Sonnet	1.0	1.00	0.55	0.80	0.96	1.0	0.5	0.87
GPT-4o	1.0	1.00	0.55	0.20	0.92	1.0	0.5	0.79
Claude Haiku	0.8	1.00	0.90	0.95	0.75	0.7	0.5	0.58

Sonnet wins. Haiku is fastest and cheapest, but its reliability drag loses it. This example becomes the golden-path test vector in P1.5.1’s acceptance.

3.2 Feedback loop (also spec-only for Phase 1.5)

new_weight = old_weight × (1 + 0.05 × feedback_delta)
new_weight = clamp(new_weight, 0.5, 1.5)

Feedback delta ∈ [−1, +1] from post-task outcome. Clamp prevents runaway. Feedback surface is λ reputation, not δ — δ reads, does not write, so Phase 1.5 ships the read path only.

4. Candidate cohort (δ concept doc §Phase 1.5 candidate cohort)

Eight initial candidates, covering four provider families plus a self-host and a Chinese/English-parity option:

Model	Window	Cost / 1k (bps)	p50	Strengths
Claude 3.5 Sonnet	200k	high	balanced	code review, structured output
Claude 3.5 Haiku	200k	low	fast	classification, triage
GPT-4o	128k	high	balanced	multimodal, general reasoning
GPT-4o mini	128k	low	fast	cheap fallback
Gemini 1.5 Pro	1M	medium	slow	huge-context synthesis
Llama 3.3 70B	128k	low	varies	self-host, data sovereignty
Mixtral 8x22B	64k	low	fast	permissive open-weight
Kimi K2	200k	medium	balanced	Chinese/English parity

Costs and latencies are indicative; Phase 1.5 populates the live mcp_model_candidates table from fresh values updated via a post-task callback.

4.1 Candidate table schema (schema-ready in Phase 0, populated in Phase 1.5)

From the δ concept doc §Candidate table:

model_id (e.g. "claude-opus-4-6")
provider
context_window_tokens
latency_tier (fast | balanced | slow)
cost_bps_per_kilotoken
domain_fit_profile (bitmask over the 8 ξ domains)
enabled (boolean)

Phase 0 has a single row (Claude Sonnet). Phase 1.5 populates the other seven.

5. Tool surface Phase 1.5 adds (`router_*` family)

From ADR-004 R75 Wave H amendment + the δ concept doc §Phase 0 posture: the 14-tool Phase 0 surface has zero δ-facing tools. Phase 1.5 introduces the router_* family. The roadmap file (docs/5-time/roadmap.md §Phase 1.5) lists three target tools (router_score_task, router_call, router_feedback); the ADR-005 Decision listing includes a fourth (router_stats). This plan uses four Phase 1.5 tools (matching ADR-005), with router_score_task/router_score treated as the same function under either name:

Tool	Purpose	Input shape	Output shape
`router_score`	evaluate a prompt, return ranked models	`{prompt, context?}`	`{scores: Record<ModelId, number>, winner, rule_version_hash}`
`router_call`	execute a prompt with the selected model (and cascade)	`{prompt, options?}`	`{model, content, finishReason, tokens, latencyMs, costUsd, modelsAttempted}`
`router_fallback`	inspect / manually trigger a fallback cascade	`{model_id?, reset?}`	`{circuitState: Record<ModelId, CircuitState>}`
`router_stats`	per-model call stats + cost aggregate	`{}`	`{models: Record<ModelId, {calls_total, avg_cost_usd, p50_latency_ms, success_rate}>}`

All four tools emit a ζ thought_record of thought_type = 'decision' bearing the routing-decision shape (see §6).

6. ζ decision-trail shape Phase 1.5 must emit

From the δ concept doc §Decision-trail recording:

{
  "type": "routing_decision",
  "routing_mode": "single" | "ensemble" | "pipeline" | "fail",
  "chosen_model_id": "claude-sonnet-3.5",
  "candidates_considered": ["claude-sonnet-3.5", "gpt-4o", "claude-haiku-3.5"],
  "scores": {"claude-sonnet-3.5": 0.87, "gpt-4o": 0.79, "claude-haiku-3.5": 0.58},
  "fallback_attempts": 0,
  "rule_version_hash": "rv:sha256:...",
  "decision_hash": "SHA-256(inputs || chosen)"
}

decision_hash is load-bearing for θ consensus (Phase 2+ work): arbiters include routing-decision hashes in the Merkle root, so two arbiters with the same κ rule version and the same task context must arrive at the same pick. δ is deterministic by construction.

7. Circuit-breaker semantics (from P0.5.2 donor notes, to honour in Phase 1.5)

From docs/guides/implementation/task-prompts/p0.5-delta-router.md (heritage body, P0.5.2 donor prompt):

Trip condition: 3 consecutive failures on the same model.
Unavailability window: 60s from the 3rd failure.
Reset rule: after 60s elapsed, counter resets to zero on first new call. (Not “on next success anywhere” — the reset is per-model + time-bound.)
State storage: in-memory for the initial Phase 1.5 ship; a later round may migrate to DB if durability across restart becomes a requirement.
Per-attempt timeout: 30s (configurable via COLIBRI_MODEL_TIMEOUT), not a total chain budget.

Namespace note: donor text uses AMS_MODEL_*. Phase 1.5 uses COLIBRI_MODEL_* per S17 §3 (env var namespace). Every reference in the new prompts file must be COLIBRI_*.

8. File inventory for the new prompts doc

8.1 Canonical template to mimic

docs/guides/implementation/task-prompts/p0.9-nu-integrations.md — most recently shaped spec-ready prompt file (R74.5 style, namespace-correct). Each sub-task block follows this layout:

Short header with spec source, extraction reference, worktree branch, branch command, estimated effort, depends-on, unblocks.
Files to create list.
Acceptance criteria checkbox list (6–10 items).
Pre-flight reading list.
Ready-to-paste agent prompt inside a text code fence.
Verification checklist (for reviewer agent) checkbox list.
Writeback template YAML block.
Common gotchas bullet list.

8.2 Files to touch in this round (R76.P2)

Path	Action
`docs/guides/implementation/task-prompts/p1.5-delta-router-graduation.md`	create — 10 P1.5.x sub-task prompts
`docs/guides/implementation/task-prompts/index.md`	edit — append file entry under a new Phase 1.5 section heading
`docs/architecture/decisions/ADR-005-multi-model-defer.md`	edit — one-line postscript pointer at bottom of R75 Wave I section
`docs/audits/r76-p2-delta-15-planning-audit.md`	create — this file
`docs/contracts/r76-p2-delta-15-planning-contract.md`	create
`docs/packets/r76-p2-delta-15-planning-packet.md`	create
`docs/verification/r76-p2-delta-15-planning-verification.md`	create

No src/ files. No test files. No concept docs. No roadmap edits (P3’s job).

8.3 Files explicitly NOT in scope

src/domains/router/* — frozen until R91+.
src/__tests__/domains/router/* — frozen until R91+.
docs/3-world/social/llm.md — concept doc already lists Phase 1.5 shape; no change.
docs/5-time/roadmap.md — roadmap already references R91–R100 Phase 1.5; P3 owns any edit.
docs/guides/implementation/task-breakdown.md — Phase 0 task list, not Phase 1.5.

9. Risks + mitigations

Risk	Mitigation
Plan drifts from ADR-005 upgrade path	Every sub-task cites ADR-005 §Implementation or §Phase 1.5 upgrade path line.
Plan drifts from 7-dimension formula	P1.5.1 sub-task embeds the exact table from the concept doc.
Candidate cohort drifts from the 8-model table	P1.5.9 sub-task embeds the exact table from the concept doc.
`router_*` tool surface drifts from ADR-005	P1.5.7 sub-task lists exactly 4 tools with the ADR-005 shape.
Phase 0 interface broken by Phase 1.5 work	Every sub-task “Forbiddens” block cites the frozen signature.
ζ decision-trail shape drifts	P1.5.10 embeds the exact JSON shape from the concept doc.
`COLIBRI_` vs `AMS_` namespace regression	Every env var reference in new prompts uses `COLIBRI_MODEL_*`.
Circuit-breaker semantics drift	P1.5.5 embeds the exact “3 failures / 60s / 30s timeout / per-model / time-bound reset” rule set.

10. Definition of done for R76.P2

docs/guides/implementation/task-prompts/p1.5-delta-router-graduation.md exists with 10 sub-task entries (P1.5.1–P1.5.10).
Every entry has: depends-on + input + output + acceptance criteria + effort + agent prompt.
docs/guides/implementation/task-prompts/index.md lists the new file under a Phase 1.5 section.
ADR-005 R75 Wave I postscript ends with a one-line pointer to the new file.
4 chain docs (audit + contract + packet + verification) land under docs/{audits,contracts,packets,verification}/r76-p2-delta-15-planning-*.md.
npm run build && npm run lint && npm test green (docs-only round — no-op against the shipped test count).

Written 2026-04-18 as step 1 of the R76.P2 5-step chain.