P1.5.1 — Real 7-dim Intent Scoring — Audit (Step 1)
This audit inventories the existing δ Router scoring surface against the Phase 1.5 target shape and identifies the exact files, types, and code paths that P1.5.1 must replace or extend.
1. Scope statement
Replace the Phase 0 constant-returns-claude scorer (src/domains/router/scoring.ts)
with a real 7-dimension weighted scoring formula per
docs/3-world/social/llm.md §Target shape. Source
the per-dimension weights from a κ rule pack (P1.2.4 / P1.3.1 surface) via a
new thin shim module src/domains/router/scoring-weights.ts. Enumerate
enabled candidates from the mcp_model_candidates table seeded by P1.5.9
(migration 009). Widen the ModelId type union additively to at least the
7-member P1.5.9 cohort.
2. Files in scope
2.1 Modify
| Path | Reason |
|---|---|
src/domains/router/scoring.ts |
Replace constant body with 7-dim weighted formula. Preserve all exported names + Phase 0 export shapes; widen ModelId additively. |
src/__tests__/domains/router/scoring.test.ts |
Drop constant-winner assertions. Add golden-path vector, tie-break, determinism property tests. |
2.2 Create
| Path | Reason |
|---|---|
src/domains/router/scoring-weights.ts |
κ-facing weight lookup. Thin shim over RuleRegistry.loadRuleset + executeRuleset. Exports loadScoringWeights(): ScoringWeightSet. |
src/domains/router/scoring-rules.kappa (embedded as DEFAULT_SCORING_RULES const string) |
κ rule pack source declaring the 7 weight values. |
2.3 Touch-not (forbidden by dispatch)
| Path | Why |
|---|---|
src/domains/router/fallback.ts |
P1.5.x adapter slices own this. |
src/domains/router/index.ts (barrel) |
Re-exports are sufficient — no new export-paths in P1.5.1. |
src/domains/integrations/claude.ts |
Adapter scope. |
src/db/migrations/* |
P1.5.9 already shipped the candidate table. |
src/server.ts |
No MCP tool registration in P1.5.1 (deferred to P1.5.7). |
3. Current state (Phase 0 surface at base SHA 330f38cd)
3.1 src/domains/router/scoring.ts — 131 LOC
ModelId = 'claude'(literal; widens additively in P1.5.1)ScoreContext— open index signature{ toolCount?, complexity?, [k: string]: unknown }IntentScore = { scores: Readonly<Record<ModelId, number>>, winner: ModelId }PHASE_0_CLAUDE_WINNER— frozen module constant{ scores: { claude: 1.0 }, winner: 'claude' }scoreIntent(_prompt, _context = {})returnsPHASE_0_CLAUDE_WINNER
3.2 Phase 0 invariants (header I1–I7)
| Invariant | Phase 1.5 obligation |
|---|---|
| I1. winner === ‘claude’ for all inputs | DROPPED — only relevant when Sonnet wins the score |
| I2. scores.claude === 1.0 | DROPPED — replaced by real score |
| I3. No other key in scores | DROPPED — multi-model unions |
| I4. Pure: no I/O, no RNG, no clock | KEPT (κ lookup + candidate snapshot only; no clock/RNG) |
| I5. Deterministic: same input → equal output | KEPT (with frozen κ rule version + frozen candidate snapshot) |
| I6. Output deeply frozen | KEPT (Object.freeze on both levels) |
| I7. Interface forward-compatible | PARTIAL FULFILLMENT — this is the activation |
3.3 src/__tests__/domains/router/scoring.test.ts — 231 LOC, 25 tests
Six describe blocks, all asserting Phase 0 constant-winner shape. Of these:
- AC1, AC4, AC5 — “constant output invariants” → DROP (replace with 7-dim formula assertions)
- AC1 — “prompt-invariance” → DROP (prompt now matters via task.tokens / task.deadline_ms)
- AC2 — “context-invariance” → DROP (context IS read in P1.5.1)
- AC3, AC8 — “determinism and purity” → KEEP / REWORK (still load-bearing; same inputs → same winner)
- AC6, AC7 — “immutability” → KEEP (Object.freeze still required)
- AC9 — “public interface shape” → KEEP (signature stability is forward-compat AC)
3.4 mcp_model_candidates table (P1.5.9 migration 009)
Live at SHA 330f38cd per migration 009_model_candidates.sql:
model_id TEXT PRIMARY KEY ('claude-sonnet-3-5', 'gpt-4o', ...)
provider TEXT NOT NULL ('anthropic', 'openai', 'google', ...)
context_window_tokens INTEGER NOT NULL 200_000 / 128_000 / 1_000_000 / etc.
latency_tier TEXT NOT NULL CHECK IN ('fast', 'balanced', 'slow')
cost_bps_per_kilotoken INTEGER NOT NULL CHECK >= 0
domain_fit_profile INTEGER NOT NULL 8-bit bitmask over ξ domains [0..255]
enabled INTEGER NOT NULL 0 | 1 (DEFAULT 0)
Seeded rows (8 total, only claude-sonnet-3-5 enabled):
| model_id | provider | window | tier | cost | profile | enabled |
|---|---|---|---|---|---|---|
| claude-sonnet-3-5 | anthropic | 200_000 | balanced | 300 | 139 | 1 |
| claude-haiku-3-5 | anthropic | 200_000 | fast | 80 | 66 | 0 |
| gpt-4o | openai | 128_000 | balanced | 250 | 35 | 0 |
| gpt-4o-mini | openai | 128_000 | fast | 15 | 65 | 0 |
| gemini-1-5-pro | 1_000_000 | slow | 125 | 162 | 0 | |
| llama-3-3-70b | meta | 128_000 | balanced | 50 | 145 | 0 |
| mixtral-8x22b | mistral | 64_000 | fast | 60 | 5 | 0 |
| kimi-k2 | moonshot | 200_000 | balanced | 120 | 73 | 0 |
Caveat for P1.5.1: Only Sonnet ships with enabled=1. Without injection,
the scoring formula will see exactly one candidate (Sonnet) and trivially
pick it. Tests MUST inject a frozen candidate snapshot to exercise the
7-dimension comparison. This is consistent with the staging file’s
gotcha #3 (“Reliability pull-through is deterministic. successRateLast100()
must read a frozen snapshot for the scoring call.”).
3.5 κ rule engine surface (Phase 1 R87 close)
| Module | Public surface used by scoring-weights.ts |
|---|---|
src/domains/rules/registry.ts |
RuleRegistry.loadRuleset(source: string): RuleRegistry |
src/domains/rules/engine.ts |
executeRuleset(registry, event, state, rule_version, epoch): TransitionResult |
src/domains/rules/engine.ts |
Mutation = { kind, target, field, new_value, ... } — kind='set' carries { target, field, new_value } |
src/domains/rules/bps-constants.ts |
BPS_100_PERCENT = 10_000n — basis-point denominator |
src/domains/rules/parser.ts |
IntLiteral.value: bigint — every numeric literal in κ DSL is bigint |
κ rule encoding for scoring weights: A κ rule pack source emits 7 set
mutations naming the 7 dimensions and their bps values. Example:
rule WeightTaskDomainMatch {
guards { true -> admit }
effects { set(weights.task_domain_match, 2000) }
}
executeRuleset returns all_mutations in deterministic order
(Admission → StateTransition → Consequence → Promotion, alpha within).
The shim filters to kind='set' + target='weights' + projects
field → new_value (bigint).
3.6 No successRateLast100 exists yet
The staging file’s pseudocode references successRateLast100(m.model_id) but
no such function is defined anywhere in the repo (verified via grep). λ
reputation is not yet wired into router scoring. P1.5.1 will surface
reliability via:
- a per-candidate
reliability_bpsfield on the injectedCandidateSnapshot(test-friendly, deterministic), AND - a default of
0n(no reliability data ⇒ score-neutral) when not supplied.
Wiring λ reputation history into the snapshot loader is a Phase 2+ concern (λ closed Phase 2 in R89 Phase A; integration is a separate task), out of scope for P1.5.1.
4. Algorithm sketch (forward-pointer to contract §3)
scoreIntent(prompt, context):
candidates = context.candidatesSnapshot ?? loadEnabledCandidates()
weights = context.weightsSnapshot ?? loadScoringWeights()
task = context.task ?? {}
for each m in candidates:
inputs_bps[m] = {
task_domain_match : domainMatches(task.domain, m.profile) ? BPS_100 : BPS_0
context_window_fit : min(BPS_100, bps(m.window / max(task.tokens, 1)))
cost_efficiency : BPS_100 - bps_div(m.cost_bps_per_kilotoken, MAX_COST_BPS)
latency_fit : BPS_100 - bps_div(latencyMs(m.tier), max(task.deadline_ms, 1))
reliability : m.reliability_bps
skill_match : intersect(m.strengths, task.skill_req) / max(|task.skill_req|, 1)
operator_preference: context.operatorPreference[m.model_id] ?? BPS_50
}
score_bps[m] = Σ weight_bps[i] * input_bps[m][i] / BPS_100 // int64 math
winner = argmax(score_bps) with tie-break: reliability desc → cost asc → model_id asc
return frozen { scores: { m: score_bps[m] / BPS_100 (as float for display) }, winner }
5. Forward-compatibility obligations
IntentScore.scores: Readonly<Record<ModelId, number>>— number-typed for Phase 0 callers; values lie in[0.0, 1.0]post-scaling.IntentScore.winner: ModelId— widened union (additive).scoreIntent(prompt: string, context?: ScoreContext): IntentScore— call signature byte-identical to Phase 0.ScoreContext— open index signature preserved; new well-known keys (task,operatorPreference,candidatesSnapshot,weightsSnapshot) added optionally without removing existing keys (toolCount,complexity).
6. Risk register
| Risk | Mitigation |
|---|---|
| Floating-point drift between arbiters | All accumulation in bigint bps space; single final divide to project to float [0,1]. |
| Non-deterministic candidate ordering | Sort candidates by model_id ASCII (UTF-16 code unit) before scoring. |
| κ rule pack drift between sessions | loadScoringWeights() accepts an optional rule_version_hash for binding; default uses module-level DEFAULT_SCORING_RULES source. |
| DB unavailable at scoring time | Test-time injection of candidatesSnapshot bypasses DB. Production code path uses try { getDb() } catch { … } with documented fallback (single-claude default). |
| Single enabled candidate in seeded DB | Tests inject snapshot. Production scores trivially pick Sonnet — exactly the Phase 0 behaviour, but now via the real formula. |
7. Decision: κ rule pack option (a) vs default fallback (b)
Option (a) — Seed minimal default-weights κ rule pack is selected.
Rationale: T0’s dispatch explicitly mandates “REAL implementations, NO STUBS.
The 7-dim formula must be functional; weights MUST come from κ rule lookup
(not hardcoded).” A loadScoringWeights() that returns hardcoded defaults
fails the “not hardcoded” test, even with an ADR citation. Embedding a
default-weights κ rule pack source as a module-level string constant
(DEFAULT_SCORING_RULES) routes weight resolution through the κ parse +
validate + execute pipeline, satisfying the “κ-driven” requirement while
not requiring an additional file outside the existing surface.
The rule pack source ships in src/domains/router/scoring-weights.ts as
a const DEFAULT_SCORING_RULES: string. It defines 7 rules, each emitting
a single set mutation naming a dimension and its bps weight. The shim
parses + executes this source on every cold call (cached in a module
WeakRef-equivalent: let cachedWeights: ScoringWeightSet | null = null).
The default weights match the concept doc table:
| Dimension | Default weight bps |
|---|---|
task_domain_match |
2000 |
context_window_fit |
1500 |
cost_efficiency |
1500 |
latency_fit |
1500 |
reliability |
1500 |
skill_match |
1500 |
operator_preference |
500 |
| Sum | 10000 (BPS_100_PERCENT) |
Callers can later swap in an operator-supplied rule pack via the
ScoreContext.weightsSnapshot injection seam without modifying the shim.
8. Citations (live code at 330f38cd)
src/domains/router/scoring.ts:60-83— Phase 0 type union + IntentScore shapesrc/domains/router/scoring.ts:100-103—PHASE_0_CLAUDE_WINNERconstantsrc/domains/router/scoring.ts:125-130—scoreIntentbodysrc/domains/router/fallback.ts:307-355—routeRequestcallsscoreIntent(consumer for forward-compat check)src/db/migrations/009_model_candidates.sql:77-103— candidate table schema + seedssrc/domains/rules/registry.ts:338-521— RuleRegistry surfacesrc/domains/rules/engine.ts:666-713— executeRulesetsrc/domains/rules/bps-constants.ts:38-46— BPS_100_PERCENTdocs/3-world/social/llm.md:42-61— 7-dimension target shape tabledocs/3-world/social/llm.md:104-112— worked routing example (Sonnet 0.87 / GPT-4o 0.79 / Haiku 0.58)docs/architecture/decisions/ADR-005-multi-model-defer.md:39-58— Phase 1.5 boundary
Step 1 of 5 — audit. Contract follows at docs/contracts/p1-5-1-scoring-contract.md.