P1.5.1 — Real 7-dim Intent Scoring — Behavioral Contract (Step 2)
This contract defines the behavioral surface that src/domains/router/scoring.ts
and src/domains/router/scoring-weights.ts MUST satisfy after P1.5.1
graduation. Each clause maps to a verification test in Step 5.
1. Public surface (locked)
1.1 Module: src/domains/router/scoring.ts
export type ModelId =
| 'claude' // legacy abstract alias (Phase 0 / fallback test compat)
| 'claude-sonnet-3-5'
| 'claude-haiku-3-5'
| 'gpt-4o'
| 'gpt-4o-mini'
| 'gemini-1-5-pro'
| 'llama-3-3-70b'
| 'mixtral-8x22b'
| 'kimi-k2';
export interface ScoreContext {
// Phase 0 (kept for compat)
readonly toolCount?: number;
readonly complexity?: string;
// Phase 1.5 (new well-known keys)
readonly task?: TaskShape;
readonly operatorPreference?: Readonly<Partial<Record<ModelId, number>>>;
readonly candidatesSnapshot?: ReadonlyArray<ModelCandidate>;
readonly weightsSnapshot?: ScoringWeightSet;
// Open extension
readonly [key: string]: unknown;
}
export interface TaskShape {
readonly domain?: string; // e.g. 'code_review'
readonly tokens?: number; // estimated input size
readonly deadline_ms?: number; // request budget
readonly skill?: readonly string[]; // required skill tags
}
export interface ModelCandidate {
readonly model_id: ModelId;
readonly provider: string;
readonly context_window_tokens: number;
readonly latency_tier: 'fast' | 'balanced' | 'slow';
readonly cost_bps_per_kilotoken: number;
readonly domain_fit_profile: number; // 8-bit bitmask
readonly enabled: boolean;
readonly reliability_bps?: bigint; // λ pull-through; default 0n
readonly strengths?: readonly string[];// skill-set declaration; default []
readonly task_domains?: readonly string[]; // task-domain whitelist; default []
}
export interface IntentScore {
readonly scores: Readonly<Record<ModelId, number>>;
readonly winner: ModelId;
}
export function scoreIntent(
prompt: string,
context?: ScoreContext,
): IntentScore;
Signature stability: scoreIntent(prompt: string, context?: ScoreContext)
=> IntentScore is byte-identical to Phase 0 modulo widened ModelId. The
'claude' literal stays in the union so existing callers (the P0.5.2
fallback module + its 50+ tests) keep compiling.
1.2 Module: src/domains/router/scoring-weights.ts (NEW)
export interface ScoringWeightSet {
readonly task_domain_match: bigint; // bps
readonly context_window_fit: bigint;
readonly cost_efficiency: bigint;
readonly latency_fit: bigint;
readonly reliability: bigint;
readonly skill_match: bigint;
readonly operator_preference: bigint;
}
export const SCORING_DIMENSIONS: readonly (keyof ScoringWeightSet)[];
export const DEFAULT_SCORING_RULES: string;
export const DEFAULT_RULE_VERSION_HASH: string;
export class KappaRulesUnavailableError extends Error {
readonly code: 'KAPPA_RULES_UNAVAILABLE';
}
export function loadScoringWeights(
source?: string,
): ScoringWeightSet;
Pure module: no DB, no env, no I/O. Source defaults to
DEFAULT_SCORING_RULES (a module-level κ rule pack source string).
2. Algorithm (locked)
2.1 Inputs
| Input | Source | Type |
|---|---|---|
prompt |
argument | string |
task |
context.task |
TaskShape | undefined |
candidates |
context.candidatesSnapshot ?? DB query |
readonly ModelCandidate[] |
weights |
context.weightsSnapshot ?? loadScoringWeights() |
ScoringWeightSet |
operatorPreference |
context.operatorPreference ?? {} |
Record<ModelId, number> |
2.2 Per-dimension input formulae (per concept doc)
All seven dimensions normalize to a bps value in [0, BPS_100_PERCENT] (i.e.
[0n, 10_000n]).
| Dimension | Input formula (bps, bigint) |
|---|---|
task_domain_match |
BPS_100_PERCENT iff task.domain ∈ candidate.task_domains; else 0n |
context_window_fit |
min(BPS_100_PERCENT, (candidate.context_window_tokens / max(task.tokens, 1)) * BPS_100_PERCENT) |
cost_efficiency |
BPS_100_PERCENT - (candidate.cost_bps_per_kilotoken * BPS_100_PERCENT) / max(MAX_COST_BPS, 1), clamped [0, BPS_100_PERCENT] |
latency_fit |
BPS_100_PERCENT - (latencyMs(candidate.tier) * BPS_100_PERCENT) / max(task.deadline_ms, 1), clamped [0, BPS_100_PERCENT] |
reliability |
candidate.reliability_bps ?? 0n, clamped [0, BPS_100_PERCENT] |
skill_match |
(intersect(candidate.strengths, task.skill) * BPS_100_PERCENT) / max(task.skill.length, 1) |
operator_preference |
bps(operatorPreference[candidate.model_id] ?? 0.5), where input float ∈ [0, 1] → bps ∈ [0, 10_000n] |
Constants:
MAX_COST_BPS = 1000n(10% per 1k tokens; per concept doc’s “indicative cost” range)latencyMs(tier):fast → 1000,balanced → 4000,slow → 9000(per concept doc’s latency tiers)
Token-count default: when task.tokens is undefined, fall back to
max(1, prompt.length / 4) (4 chars/token approximation) so the dimension
remains computable for raw-prompt callers.
2.3 Score accumulation (integer math)
for each candidate m sorted by m.model_id ASCII asc:
score_bps_int64 = 0n
for each dim ∈ SCORING_DIMENSIONS (declared order):
weight = weights[dim] // bigint, [0n, 10000n]
input = inputs_bps[m][dim] // bigint, [0n, 10000n]
score_bps_int64 += safe_mul(weight, input) // int64-guarded; product ≤ 10^8
score_per_10000[m] = score_bps_int64 / BPS_100_PERCENT
// score_per_10000 lies in [0n, 10000n] when Σ weights ≤ 10000
scores_float[m] = Number(score_per_10000[m]) / 10000 // single divide to [0,1]
Invariant: Σ weights = 10000n (BPS_100_PERCENT) by construction.
loadScoringWeights() validates this and throws KappaRulesUnavailableError
when the sum drifts. Therefore score_per_10000[m] ∈ [0n, 10000n] and
scores_float[m] ∈ [0.0, 1.0].
2.4 Tie-break
When two or more candidates produce identical score_per_10000 values,
ranking is broken by, in order:
- Higher reliability_bps (descending)
- Lower cost_bps_per_kilotoken (ascending)
- Alphabetical model_id (ASCII / UTF-16 code unit, ascending)
If all three are equal, the candidates are indistinguishable — the lowest ASCII model_id wins (a function of clause 3).
2.5 Output
{
scores: Object.freeze({ [model_id]: scores_float[model_id], ... }),
winner: lowest-ranked-key-after-tie-break,
}
Frozen at both levels. Numeric values in [0.0, 1.0]. winner ∈
ModelId union.
2.6 Empty-candidate fallback
If candidates.length === 0 (no enabled rows in the DB and no injected
snapshot), scoreIntent returns the Phase 0 frozen constant:
{ scores: { claude: 1.0 }, winner: 'claude' }
This preserves the forward-compat invariant that winner: ModelId is
non-null and that the router-fallback test suite (which exercises
routeRequest end-to-end with no injection) continues to assert
result.model === 'claude'.
3. κ rule-pack contract
3.1 DEFAULT_SCORING_RULES grammar
rule WeightTaskDomainMatch { guards { true -> admit } effects { set(weights.task_domain_match, 2000) } }
rule WeightContextWindowFit { guards { true -> admit } effects { set(weights.context_window_fit, 1500) } }
rule WeightCostEfficiency { guards { true -> admit } effects { set(weights.cost_efficiency, 1500) } }
rule WeightLatencyFit { guards { true -> admit } effects { set(weights.latency_fit, 1500) } }
rule WeightReliability { guards { true -> admit } effects { set(weights.reliability, 1500) } }
rule WeightSkillMatch { guards { true -> admit } effects { set(weights.skill_match, 1500) } }
rule WeightOperatorPref { guards { true -> admit } effects { set(weights.operator_preference, 500) } }
Sum invariant: 2000 + 1500 + 1500 + 1500 + 1500 + 1500 + 500 = 10000.
3.2 Required mutations (after executeRuleset)
The RuleRegistry.loadRuleset(DEFAULT_SCORING_RULES) → executeRuleset(...)
→ all_mutations stream MUST contain exactly 7 mutations, each:
kind === 'set'target === 'weights'field∈ the 7 dimension namesnew_valueis abigintin[0n, 10000n]
The shim filters mutations by these predicates, projects to
ScoringWeightSet, validates sum-to-10000, freezes, returns.
3.3 Failure modes
| Cause | Thrown error |
|---|---|
| Parse error in source | KappaRulesUnavailableError wrapping RulesetParseError |
| Validation error | KappaRulesUnavailableError wrapping RulesetValidationError |
| Ambiguous ruleset | KappaRulesUnavailableError wrapping AmbiguousRulesetError |
| Missing dimension after execute | KappaRulesUnavailableError |
| Sum ≠ 10000n | KappaRulesUnavailableError |
The class extends Error, exposes a code: 'KAPPA_RULES_UNAVAILABLE'
constant + a cause field pointing to the underlying κ error.
3.4 Determinism
loadScoringWeights(DEFAULT_SCORING_RULES) MUST return structurally equal
output across N invocations (caching is permitted but not required). The
underlying κ engine is deterministic by construction (P1.3.1) so the shim
inherits this.
4. Determinism contract (κ-grade)
The scoreIntent function MUST be deterministic across:
- 100 sequential invocations with the same
(prompt, context)→ same winner + samescoresvalues - Two arbiters with identical
(prompt, context, rule_version_hash, candidatesSnapshot)→ bit-identicalscores(load-bearing for θ consensus per concept doc §”Decision-trail recording”)
Forbidden inside scoreIntent:
Math.random,Date.now,crypto.*,setTimeout,setIntervalawait/async(sync function only)fetch,XMLHttpRequest,node:fs- Floating-point arithmetic for accumulation (single final divide to float is fine)
Permitted:
- Synchronous bigint arithmetic via
safe_mul/bps_mul Number()cast for final float projectionObject.freeze- Iteration over readonly arrays / objects
5. Acceptance criteria (mapped to verification tests)
| AC | Statement | Verification test |
|---|---|---|
| AC1 | scoreIntent(p, c) returns frozen IntentScore for any well-formed input |
test('output frozen at both levels') |
| AC2 | ModelId widened to 9 members; 'claude' retained |
type-level check in scoring.test.ts |
| AC3 | Golden-path vector: code-review task ⇒ Sonnet wins at ≈ 0.87 | test('golden-path: code-review 12k tokens 5s') |
| AC4 | GPT-4o ≈ 0.79 on golden-path | same test |
| AC5 | Haiku ≈ 0.58 on golden-path | same test |
| AC6 | Tie-break: equal scores ⇒ higher reliability wins | test('tie-break: reliability beats cost') |
| AC7 | Tie-break: equal reliability ⇒ lower cost wins | test('tie-break: cost when reliability tied') |
| AC8 | Tie-break: equal cost ⇒ alphabetical model_id wins | test('tie-break: alphabetical fallback') |
| AC9 | Determinism: same input × 100 ⇒ same winner | test('determinism: 100 invocations') |
| AC10 | Determinism: scores bit-identical across invocations | same test |
| AC11 | Weights sourced from κ (not constants in scoring.ts) | test('weights come from κ rule pack') |
| AC12 | loadScoringWeights() returns 7 dimensions with bps values summing to 10000 |
test('default weights sum to 10000') |
| AC13 | Forward-compat: scoreIntent('text') works (no context, no candidates) |
test('forward-compat: no-context call') |
| AC14 | Forward-compat: signature stable | type-level check (compiles vs. P0.5.2 fallback) |
| AC15 | Empty candidate snapshot ⇒ Phase 0 frozen constant | test('empty snapshot returns Phase 0 default') |
| AC16 | Invalid κ rule pack ⇒ KappaRulesUnavailableError |
test('invalid κ source throws KappaRulesUnavailableError') |
| AC17 | IntentScore.scores is frozen against mutation |
test('scores frozen against mutation') |
| AC18 | IntentScore.winner is read-only |
test('winner read-only') |
6. Forbidden in P1.5.1
- Editing main checkout
- Pushing to main
- Force-push /
--no-verify/--amend - Touching files outside
src/domains/router/scoring.ts,src/domains/router/scoring-weights.ts, tests, + the 5 chain docs - Hardcoding weights as object literals in
scoring.ts - Hardcoding the cohort (must read from
mcp_model_candidatesOR accept injected snapshot) - Removing
'claude'fromModelId - Implementing any adapter (P1.5.2/3/4 scope)
- Registering MCP tools (P1.5.7 scope)
- Wall-clock reads, randomness, file I/O beyond κ + candidate table
- New env vars (no
COLIBRI_*reads in P1.5.1)
7. Non-goals
- λ reputation history wired into
reliability_bps(Phase 2+ integration) - Per-call cost telemetry / metric emission (P1.5.6+)
- Operator-supplied κ rule packs loaded from disk (Phase 1.5+)
- Ensemble / pipeline routing modes (Phase 1.5+ adapter slices)
- ζ Decision Trail records of routing decisions (P1.5.5+)
8. Verification (test commands)
cd .worktrees/claude/p1-5-1-scoring
npm run build && npm run lint && npm test
Expected: build green, lint green, test count ≥ 3117 with the golden-path
vector + tie-break + determinism + κ-load tests passing. Zero regressions
against 330f38cd.
Step 2 of 5 — contract. Packet follows at docs/packets/p1-5-1-scoring-packet.md.