P1.5.1 — Real 7-dim Intent Scoring — Audit (Step 1)

This audit inventories the existing δ Router scoring surface against the Phase 1.5 target shape and identifies the exact files, types, and code paths that P1.5.1 must replace or extend.

1. Scope statement

Replace the Phase 0 constant-returns-claude scorer (src/domains/router/scoring.ts) with a real 7-dimension weighted scoring formula per docs/3-world/social/llm.md §Target shape. Source the per-dimension weights from a κ rule pack (P1.2.4 / P1.3.1 surface) via a new thin shim module src/domains/router/scoring-weights.ts. Enumerate enabled candidates from the mcp_model_candidates table seeded by P1.5.9 (migration 009). Widen the ModelId type union additively to at least the 7-member P1.5.9 cohort.

2. Files in scope

2.1 Modify

Path	Reason
`src/domains/router/scoring.ts`	Replace constant body with 7-dim weighted formula. Preserve all exported names + Phase 0 export shapes; widen `ModelId` additively.
`src/__tests__/domains/router/scoring.test.ts`	Drop constant-winner assertions. Add golden-path vector, tie-break, determinism property tests.

2.2 Create

Path	Reason
`src/domains/router/scoring-weights.ts`	κ-facing weight lookup. Thin shim over `RuleRegistry.loadRuleset` + `executeRuleset`. Exports `loadScoringWeights(): ScoringWeightSet`.
`src/domains/router/scoring-rules.kappa` (embedded as `DEFAULT_SCORING_RULES` const string)	κ rule pack source declaring the 7 weight values.

2.3 Touch-not (forbidden by dispatch)

Path	Why
`src/domains/router/fallback.ts`	P1.5.x adapter slices own this.
`src/domains/router/index.ts` (barrel)	Re-exports are sufficient — no new export-paths in P1.5.1.
`src/domains/integrations/claude.ts`	Adapter scope.
`src/db/migrations/*`	P1.5.9 already shipped the candidate table.
`src/server.ts`	No MCP tool registration in P1.5.1 (deferred to P1.5.7).

3. Current state (Phase 0 surface at base SHA `330f38cd`)

3.1 `src/domains/router/scoring.ts` — 131 LOC

ModelId = 'claude' (literal; widens additively in P1.5.1)
ScoreContext — open index signature { toolCount?, complexity?, [k: string]: unknown }
IntentScore = { scores: Readonly<Record<ModelId, number>>, winner: ModelId }
PHASE_0_CLAUDE_WINNER — frozen module constant { scores: { claude: 1.0 }, winner: 'claude' }
scoreIntent(_prompt, _context = {}) returns PHASE_0_CLAUDE_WINNER

3.2 Phase 0 invariants (header I1–I7)

Invariant	Phase 1.5 obligation
I1. winner === ‘claude’ for all inputs	DROPPED — only relevant when Sonnet wins the score
I2. scores.claude === 1.0	DROPPED — replaced by real score
I3. No other key in scores	DROPPED — multi-model unions
I4. Pure: no I/O, no RNG, no clock	KEPT (κ lookup + candidate snapshot only; no clock/RNG)
I5. Deterministic: same input → equal output	KEPT (with frozen κ rule version + frozen candidate snapshot)
I6. Output deeply frozen	KEPT (Object.freeze on both levels)
I7. Interface forward-compatible	PARTIAL FULFILLMENT — this is the activation

3.3 `src/tests/domains/router/scoring.test.ts` — 231 LOC, 25 tests

Six describe blocks, all asserting Phase 0 constant-winner shape. Of these:

AC1, AC4, AC5 — “constant output invariants” → DROP (replace with 7-dim formula assertions)
AC1 — “prompt-invariance” → DROP (prompt now matters via task.tokens / task.deadline_ms)
AC2 — “context-invariance” → DROP (context IS read in P1.5.1)
AC3, AC8 — “determinism and purity” → KEEP / REWORK (still load-bearing; same inputs → same winner)
AC6, AC7 — “immutability” → KEEP (Object.freeze still required)
AC9 — “public interface shape” → KEEP (signature stability is forward-compat AC)

3.4 `mcp_model_candidates` table (P1.5.9 migration 009)

Live at SHA 330f38cd per migration 009_model_candidates.sql:

model_id               TEXT PRIMARY KEY        ('claude-sonnet-3-5', 'gpt-4o', ...)
provider               TEXT NOT NULL           ('anthropic', 'openai', 'google', ...)
context_window_tokens  INTEGER NOT NULL        200_000 / 128_000 / 1_000_000 / etc.
latency_tier           TEXT NOT NULL           CHECK IN ('fast', 'balanced', 'slow')
cost_bps_per_kilotoken INTEGER NOT NULL        CHECK >= 0
domain_fit_profile     INTEGER NOT NULL        8-bit bitmask over ξ domains [0..255]
enabled                INTEGER NOT NULL        0 | 1 (DEFAULT 0)

Seeded rows (8 total, only claude-sonnet-3-5 enabled):

model_id	provider	window	tier	cost	profile	enabled
claude-sonnet-3-5	anthropic	200_000	balanced	300	139	1
claude-haiku-3-5	anthropic	200_000	fast	80	66	0
gpt-4o	openai	128_000	balanced	250	35	0
gpt-4o-mini	openai	128_000	fast	15	65	0
gemini-1-5-pro	google	1_000_000	slow	125	162	0
llama-3-3-70b	meta	128_000	balanced	50	145	0
mixtral-8x22b	mistral	64_000	fast	60	5	0
kimi-k2	moonshot	200_000	balanced	120	73	0

Caveat for P1.5.1: Only Sonnet ships with enabled=1. Without injection, the scoring formula will see exactly one candidate (Sonnet) and trivially pick it. Tests MUST inject a frozen candidate snapshot to exercise the 7-dimension comparison. This is consistent with the staging file’s gotcha #3 (“Reliability pull-through is deterministic. successRateLast100() must read a frozen snapshot for the scoring call.”).

3.5 κ rule engine surface (Phase 1 R87 close)

Module	Public surface used by scoring-weights.ts
`src/domains/rules/registry.ts`	`RuleRegistry.loadRuleset(source: string): RuleRegistry`
`src/domains/rules/engine.ts`	`executeRuleset(registry, event, state, rule_version, epoch): TransitionResult`
`src/domains/rules/engine.ts`	`Mutation = { kind, target, field, new_value, ... }` — `kind='set'` carries `{ target, field, new_value }`
`src/domains/rules/bps-constants.ts`	`BPS_100_PERCENT = 10_000n` — basis-point denominator
`src/domains/rules/parser.ts`	`IntLiteral.value: bigint` — every numeric literal in κ DSL is bigint

κ rule encoding for scoring weights: A κ rule pack source emits 7 set mutations naming the 7 dimensions and their bps values. Example:

rule WeightTaskDomainMatch {
  guards { true -> admit }
  effects { set(weights.task_domain_match, 2000) }
}

executeRuleset returns all_mutations in deterministic order (Admission → StateTransition → Consequence → Promotion, alpha within). The shim filters to kind='set' + target='weights' + projects field → new_value (bigint).

3.6 No `successRateLast100` exists yet

The staging file’s pseudocode references successRateLast100(m.model_id) but no such function is defined anywhere in the repo (verified via grep). λ reputation is not yet wired into router scoring. P1.5.1 will surface reliability via:

a per-candidate reliability_bps field on the injected CandidateSnapshot (test-friendly, deterministic), AND
a default of 0n (no reliability data ⇒ score-neutral) when not supplied.

Wiring λ reputation history into the snapshot loader is a Phase 2+ concern (λ closed Phase 2 in R89 Phase A; integration is a separate task), out of scope for P1.5.1.

4. Algorithm sketch (forward-pointer to contract §3)

scoreIntent(prompt, context):
  candidates = context.candidatesSnapshot ?? loadEnabledCandidates()
  weights    = context.weightsSnapshot ?? loadScoringWeights()
  task       = context.task ?? {}

  for each m in candidates:
    inputs_bps[m] = {
      task_domain_match  : domainMatches(task.domain, m.profile)        ? BPS_100 : BPS_0
      context_window_fit : min(BPS_100, bps(m.window / max(task.tokens, 1)))
      cost_efficiency    : BPS_100 - bps_div(m.cost_bps_per_kilotoken, MAX_COST_BPS)
      latency_fit        : BPS_100 - bps_div(latencyMs(m.tier), max(task.deadline_ms, 1))
      reliability        : m.reliability_bps
      skill_match        : intersect(m.strengths, task.skill_req) / max(|task.skill_req|, 1)
      operator_preference: context.operatorPreference[m.model_id] ?? BPS_50
    }
    score_bps[m] = Σ weight_bps[i] * input_bps[m][i] / BPS_100  // int64 math

  winner = argmax(score_bps) with tie-break: reliability desc → cost asc → model_id asc

  return frozen { scores: { m: score_bps[m] / BPS_100 (as float for display) }, winner }

5. Forward-compatibility obligations

IntentScore.scores: Readonly<Record<ModelId, number>> — number-typed for Phase 0 callers; values lie in [0.0, 1.0] post-scaling.
IntentScore.winner: ModelId — widened union (additive).
scoreIntent(prompt: string, context?: ScoreContext): IntentScore — call signature byte-identical to Phase 0.
ScoreContext — open index signature preserved; new well-known keys (task, operatorPreference, candidatesSnapshot, weightsSnapshot) added optionally without removing existing keys (toolCount, complexity).

6. Risk register

Risk	Mitigation
Floating-point drift between arbiters	All accumulation in `bigint` bps space; single final divide to project to float `[0,1]`.
Non-deterministic candidate ordering	Sort candidates by `model_id` ASCII (UTF-16 code unit) before scoring.
κ rule pack drift between sessions	`loadScoringWeights()` accepts an optional `rule_version_hash` for binding; default uses module-level `DEFAULT_SCORING_RULES` source.
DB unavailable at scoring time	Test-time injection of `candidatesSnapshot` bypasses DB. Production code path uses `try { getDb() } catch { … }` with documented fallback (single-claude default).
Single enabled candidate in seeded DB	Tests inject snapshot. Production scores trivially pick Sonnet — exactly the Phase 0 behaviour, but now via the real formula.

7. Decision: κ rule pack option (a) vs default fallback (b)

Option (a) — Seed minimal default-weights κ rule pack is selected.

Rationale: T0’s dispatch explicitly mandates “REAL implementations, NO STUBS. The 7-dim formula must be functional; weights MUST come from κ rule lookup (not hardcoded).” A loadScoringWeights() that returns hardcoded defaults fails the “not hardcoded” test, even with an ADR citation. Embedding a default-weights κ rule pack source as a module-level string constant (DEFAULT_SCORING_RULES) routes weight resolution through the κ parse + validate + execute pipeline, satisfying the “κ-driven” requirement while not requiring an additional file outside the existing surface.

The rule pack source ships in src/domains/router/scoring-weights.ts as a const DEFAULT_SCORING_RULES: string. It defines 7 rules, each emitting a single set mutation naming a dimension and its bps weight. The shim parses + executes this source on every cold call (cached in a module WeakRef-equivalent: let cachedWeights: ScoringWeightSet | null = null).

The default weights match the concept doc table:

Dimension	Default weight bps
`task_domain_match`	`2000`
`context_window_fit`	`1500`
`cost_efficiency`	`1500`
`latency_fit`	`1500`
`reliability`	`1500`
`skill_match`	`1500`
`operator_preference`	`500`
Sum	10000 (BPS_100_PERCENT)

Callers can later swap in an operator-supplied rule pack via the ScoreContext.weightsSnapshot injection seam without modifying the shim.

8. Citations (live code at `330f38cd`)

src/domains/router/scoring.ts:60-83 — Phase 0 type union + IntentScore shape
src/domains/router/scoring.ts:100-103 — PHASE_0_CLAUDE_WINNER constant
src/domains/router/scoring.ts:125-130 — scoreIntent body
src/domains/router/fallback.ts:307-355 — routeRequest calls scoreIntent (consumer for forward-compat check)
src/db/migrations/009_model_candidates.sql:77-103 — candidate table schema + seeds
src/domains/rules/registry.ts:338-521 — RuleRegistry surface
src/domains/rules/engine.ts:666-713 — executeRuleset
src/domains/rules/bps-constants.ts:38-46 — BPS_100_PERCENT
docs/3-world/social/llm.md:42-61 — 7-dimension target shape table
docs/3-world/social/llm.md:104-112 — worked routing example (Sonnet 0.87 / GPT-4o 0.79 / Haiku 0.58)
docs/architecture/decisions/ADR-005-multi-model-defer.md:39-58 — Phase 1.5 boundary

Step 1 of 5 — audit. Contract follows at docs/contracts/p1-5-1-scoring-contract.md.