P1.5.1 — Real 7-dim Intent Scoring — Audit (Step 1)

This audit inventories the existing δ Router scoring surface against the Phase 1.5 target shape and identifies the exact files, types, and code paths that P1.5.1 must replace or extend.

1. Scope statement

Replace the Phase 0 constant-returns-claude scorer (src/domains/router/scoring.ts) with a real 7-dimension weighted scoring formula per docs/3-world/social/llm.md §Target shape. Source the per-dimension weights from a κ rule pack (P1.2.4 / P1.3.1 surface) via a new thin shim module src/domains/router/scoring-weights.ts. Enumerate enabled candidates from the mcp_model_candidates table seeded by P1.5.9 (migration 009). Widen the ModelId type union additively to at least the 7-member P1.5.9 cohort.

2. Files in scope

2.1 Modify

Path Reason
src/domains/router/scoring.ts Replace constant body with 7-dim weighted formula. Preserve all exported names + Phase 0 export shapes; widen ModelId additively.
src/__tests__/domains/router/scoring.test.ts Drop constant-winner assertions. Add golden-path vector, tie-break, determinism property tests.

2.2 Create

Path Reason
src/domains/router/scoring-weights.ts κ-facing weight lookup. Thin shim over RuleRegistry.loadRuleset + executeRuleset. Exports loadScoringWeights(): ScoringWeightSet.
src/domains/router/scoring-rules.kappa (embedded as DEFAULT_SCORING_RULES const string) κ rule pack source declaring the 7 weight values.

2.3 Touch-not (forbidden by dispatch)

Path Why
src/domains/router/fallback.ts P1.5.x adapter slices own this.
src/domains/router/index.ts (barrel) Re-exports are sufficient — no new export-paths in P1.5.1.
src/domains/integrations/claude.ts Adapter scope.
src/db/migrations/* P1.5.9 already shipped the candidate table.
src/server.ts No MCP tool registration in P1.5.1 (deferred to P1.5.7).

3. Current state (Phase 0 surface at base SHA 330f38cd)

3.1 src/domains/router/scoring.ts — 131 LOC

  • ModelId = 'claude' (literal; widens additively in P1.5.1)
  • ScoreContext — open index signature { toolCount?, complexity?, [k: string]: unknown }
  • IntentScore = { scores: Readonly<Record<ModelId, number>>, winner: ModelId }
  • PHASE_0_CLAUDE_WINNER — frozen module constant { scores: { claude: 1.0 }, winner: 'claude' }
  • scoreIntent(_prompt, _context = {}) returns PHASE_0_CLAUDE_WINNER

3.2 Phase 0 invariants (header I1–I7)

Invariant Phase 1.5 obligation
I1. winner === ‘claude’ for all inputs DROPPED — only relevant when Sonnet wins the score
I2. scores.claude === 1.0 DROPPED — replaced by real score
I3. No other key in scores DROPPED — multi-model unions
I4. Pure: no I/O, no RNG, no clock KEPT (κ lookup + candidate snapshot only; no clock/RNG)
I5. Deterministic: same input → equal output KEPT (with frozen κ rule version + frozen candidate snapshot)
I6. Output deeply frozen KEPT (Object.freeze on both levels)
I7. Interface forward-compatible PARTIAL FULFILLMENT — this is the activation

3.3 src/__tests__/domains/router/scoring.test.ts — 231 LOC, 25 tests

Six describe blocks, all asserting Phase 0 constant-winner shape. Of these:

  • AC1, AC4, AC5 — “constant output invariants” → DROP (replace with 7-dim formula assertions)
  • AC1 — “prompt-invariance” → DROP (prompt now matters via task.tokens / task.deadline_ms)
  • AC2 — “context-invariance” → DROP (context IS read in P1.5.1)
  • AC3, AC8 — “determinism and purity” → KEEP / REWORK (still load-bearing; same inputs → same winner)
  • AC6, AC7 — “immutability” → KEEP (Object.freeze still required)
  • AC9 — “public interface shape” → KEEP (signature stability is forward-compat AC)

3.4 mcp_model_candidates table (P1.5.9 migration 009)

Live at SHA 330f38cd per migration 009_model_candidates.sql:

model_id               TEXT PRIMARY KEY        ('claude-sonnet-3-5', 'gpt-4o', ...)
provider               TEXT NOT NULL           ('anthropic', 'openai', 'google', ...)
context_window_tokens  INTEGER NOT NULL        200_000 / 128_000 / 1_000_000 / etc.
latency_tier           TEXT NOT NULL           CHECK IN ('fast', 'balanced', 'slow')
cost_bps_per_kilotoken INTEGER NOT NULL        CHECK >= 0
domain_fit_profile     INTEGER NOT NULL        8-bit bitmask over ξ domains [0..255]
enabled                INTEGER NOT NULL        0 | 1 (DEFAULT 0)

Seeded rows (8 total, only claude-sonnet-3-5 enabled):

model_id provider window tier cost profile enabled
claude-sonnet-3-5 anthropic 200_000 balanced 300 139 1
claude-haiku-3-5 anthropic 200_000 fast 80 66 0
gpt-4o openai 128_000 balanced 250 35 0
gpt-4o-mini openai 128_000 fast 15 65 0
gemini-1-5-pro google 1_000_000 slow 125 162 0
llama-3-3-70b meta 128_000 balanced 50 145 0
mixtral-8x22b mistral 64_000 fast 60 5 0
kimi-k2 moonshot 200_000 balanced 120 73 0

Caveat for P1.5.1: Only Sonnet ships with enabled=1. Without injection, the scoring formula will see exactly one candidate (Sonnet) and trivially pick it. Tests MUST inject a frozen candidate snapshot to exercise the 7-dimension comparison. This is consistent with the staging file’s gotcha #3 (“Reliability pull-through is deterministic. successRateLast100() must read a frozen snapshot for the scoring call.”).

3.5 κ rule engine surface (Phase 1 R87 close)

Module Public surface used by scoring-weights.ts
src/domains/rules/registry.ts RuleRegistry.loadRuleset(source: string): RuleRegistry
src/domains/rules/engine.ts executeRuleset(registry, event, state, rule_version, epoch): TransitionResult
src/domains/rules/engine.ts Mutation = { kind, target, field, new_value, ... }kind='set' carries { target, field, new_value }
src/domains/rules/bps-constants.ts BPS_100_PERCENT = 10_000n — basis-point denominator
src/domains/rules/parser.ts IntLiteral.value: bigint — every numeric literal in κ DSL is bigint

κ rule encoding for scoring weights: A κ rule pack source emits 7 set mutations naming the 7 dimensions and their bps values. Example:

rule WeightTaskDomainMatch {
  guards { true -> admit }
  effects { set(weights.task_domain_match, 2000) }
}

executeRuleset returns all_mutations in deterministic order (Admission → StateTransition → Consequence → Promotion, alpha within). The shim filters to kind='set' + target='weights' + projects field → new_value (bigint).

3.6 No successRateLast100 exists yet

The staging file’s pseudocode references successRateLast100(m.model_id) but no such function is defined anywhere in the repo (verified via grep). λ reputation is not yet wired into router scoring. P1.5.1 will surface reliability via:

  • a per-candidate reliability_bps field on the injected CandidateSnapshot (test-friendly, deterministic), AND
  • a default of 0n (no reliability data ⇒ score-neutral) when not supplied.

Wiring λ reputation history into the snapshot loader is a Phase 2+ concern (λ closed Phase 2 in R89 Phase A; integration is a separate task), out of scope for P1.5.1.

4. Algorithm sketch (forward-pointer to contract §3)

scoreIntent(prompt, context):
  candidates = context.candidatesSnapshot ?? loadEnabledCandidates()
  weights    = context.weightsSnapshot ?? loadScoringWeights()
  task       = context.task ?? {}

  for each m in candidates:
    inputs_bps[m] = {
      task_domain_match  : domainMatches(task.domain, m.profile)        ? BPS_100 : BPS_0
      context_window_fit : min(BPS_100, bps(m.window / max(task.tokens, 1)))
      cost_efficiency    : BPS_100 - bps_div(m.cost_bps_per_kilotoken, MAX_COST_BPS)
      latency_fit        : BPS_100 - bps_div(latencyMs(m.tier), max(task.deadline_ms, 1))
      reliability        : m.reliability_bps
      skill_match        : intersect(m.strengths, task.skill_req) / max(|task.skill_req|, 1)
      operator_preference: context.operatorPreference[m.model_id] ?? BPS_50
    }
    score_bps[m] = Σ weight_bps[i] * input_bps[m][i] / BPS_100  // int64 math

  winner = argmax(score_bps) with tie-break: reliability desc → cost asc → model_id asc

  return frozen { scores: { m: score_bps[m] / BPS_100 (as float for display) }, winner }

5. Forward-compatibility obligations

  • IntentScore.scores: Readonly<Record<ModelId, number>> — number-typed for Phase 0 callers; values lie in [0.0, 1.0] post-scaling.
  • IntentScore.winner: ModelId — widened union (additive).
  • scoreIntent(prompt: string, context?: ScoreContext): IntentScore — call signature byte-identical to Phase 0.
  • ScoreContext — open index signature preserved; new well-known keys (task, operatorPreference, candidatesSnapshot, weightsSnapshot) added optionally without removing existing keys (toolCount, complexity).

6. Risk register

Risk Mitigation
Floating-point drift between arbiters All accumulation in bigint bps space; single final divide to project to float [0,1].
Non-deterministic candidate ordering Sort candidates by model_id ASCII (UTF-16 code unit) before scoring.
κ rule pack drift between sessions loadScoringWeights() accepts an optional rule_version_hash for binding; default uses module-level DEFAULT_SCORING_RULES source.
DB unavailable at scoring time Test-time injection of candidatesSnapshot bypasses DB. Production code path uses try { getDb() } catch { … } with documented fallback (single-claude default).
Single enabled candidate in seeded DB Tests inject snapshot. Production scores trivially pick Sonnet — exactly the Phase 0 behaviour, but now via the real formula.

7. Decision: κ rule pack option (a) vs default fallback (b)

Option (a) — Seed minimal default-weights κ rule pack is selected.

Rationale: T0’s dispatch explicitly mandates “REAL implementations, NO STUBS. The 7-dim formula must be functional; weights MUST come from κ rule lookup (not hardcoded).” A loadScoringWeights() that returns hardcoded defaults fails the “not hardcoded” test, even with an ADR citation. Embedding a default-weights κ rule pack source as a module-level string constant (DEFAULT_SCORING_RULES) routes weight resolution through the κ parse + validate + execute pipeline, satisfying the “κ-driven” requirement while not requiring an additional file outside the existing surface.

The rule pack source ships in src/domains/router/scoring-weights.ts as a const DEFAULT_SCORING_RULES: string. It defines 7 rules, each emitting a single set mutation naming a dimension and its bps weight. The shim parses + executes this source on every cold call (cached in a module WeakRef-equivalent: let cachedWeights: ScoringWeightSet | null = null).

The default weights match the concept doc table:

Dimension Default weight bps
task_domain_match 2000
context_window_fit 1500
cost_efficiency 1500
latency_fit 1500
reliability 1500
skill_match 1500
operator_preference 500
Sum 10000 (BPS_100_PERCENT)

Callers can later swap in an operator-supplied rule pack via the ScoreContext.weightsSnapshot injection seam without modifying the shim.

8. Citations (live code at 330f38cd)

  • src/domains/router/scoring.ts:60-83 — Phase 0 type union + IntentScore shape
  • src/domains/router/scoring.ts:100-103PHASE_0_CLAUDE_WINNER constant
  • src/domains/router/scoring.ts:125-130scoreIntent body
  • src/domains/router/fallback.ts:307-355routeRequest calls scoreIntent (consumer for forward-compat check)
  • src/db/migrations/009_model_candidates.sql:77-103 — candidate table schema + seeds
  • src/domains/rules/registry.ts:338-521 — RuleRegistry surface
  • src/domains/rules/engine.ts:666-713 — executeRuleset
  • src/domains/rules/bps-constants.ts:38-46 — BPS_100_PERCENT
  • docs/3-world/social/llm.md:42-61 — 7-dimension target shape table
  • docs/3-world/social/llm.md:104-112 — worked routing example (Sonnet 0.87 / GPT-4o 0.79 / Haiku 0.58)
  • docs/architecture/decisions/ADR-005-multi-model-defer.md:39-58 — Phase 1.5 boundary

Step 1 of 5 — audit. Contract follows at docs/contracts/p1-5-1-scoring-contract.md.


Back to top

Colibri — documentation-first MCP runtime. Apache 2.0 + Commons Clause.

This site uses Just the Docs, a documentation theme for Jekyll.