P1.5.1 — Real 7-dim Intent Scoring — Execution Packet (Step 3)
This packet is the implementation plan for Step 4. Every code change is listed below in execution order with file path, identifier name, and the behavior it implements. The packet gates Step 4; no implementation may diverge from this plan without an updated packet.
1. File deltas at a glance
| File | Operation | LOC delta (est.) |
|---|---|---|
src/domains/router/scoring.ts |
Rewrite body, keep exports | +250 / -85 (≈net +165) |
src/domains/router/scoring-weights.ts |
Create | +180 |
src/__tests__/domains/router/scoring.test.ts |
Rewrite | +400 / -200 (≈net +200) |
src/__tests__/domains/router/scoring-weights.test.ts |
Create | +150 |
Forbidden (out of P1.5.1 scope per dispatch): fallback.ts, index.ts,
integrations/claude.ts, any src/db/migrations/*, src/server.ts.
2. Execution order
Step 4.1 — Create src/domains/router/scoring-weights.ts (κ shim)
Step 4.2 — Rewrite src/domains/router/scoring.ts (7-dim formula)
Step 4.3 — Rewrite src/__tests__/domains/router/scoring.test.ts
Step 4.4 — Create src/__tests__/domains/router/scoring-weights.test.ts
Step 4.5 — npm run build (confirm TS compiles)
Step 4.6 — npm run lint (confirm style green)
Step 4.7 — npm test (confirm tests pass)
3. Step 4.1 — src/domains/router/scoring-weights.ts
3.1 Exports
export const SCORING_DIMENSIONS = [
'task_domain_match',
'context_window_fit',
'cost_efficiency',
'latency_fit',
'reliability',
'skill_match',
'operator_preference',
] as const;
export type ScoringDimension = (typeof SCORING_DIMENSIONS)[number];
export type ScoringWeightSet = Readonly<Record<ScoringDimension, bigint>>;
export const DEFAULT_SCORING_RULES: string;
export const DEFAULT_RULE_VERSION_HASH: string;
export class KappaRulesUnavailableError extends Error {
readonly code: 'KAPPA_RULES_UNAVAILABLE';
readonly cause?: Error;
}
export function loadScoringWeights(source?: string): ScoringWeightSet;
3.2 DEFAULT_SCORING_RULES content (verified parses + executes — see packet §4)
rule WeightTaskDomainMatch { guards { true -> admit } effects { set($state.task_domain_match, 2000) } }
rule WeightContextWindowFit { guards { true -> admit } effects { set($state.context_window_fit, 1500) } }
rule WeightCostEfficiency { guards { true -> admit } effects { set($state.cost_efficiency, 1500) } }
rule WeightLatencyFit { guards { true -> admit } effects { set($state.latency_fit, 1500) } }
rule WeightReliability { guards { true -> admit } effects { set($state.reliability, 1500) } }
rule WeightSkillMatch { guards { true -> admit } effects { set($state.skill_match, 1500) } }
rule WeightOperatorPref { guards { true -> admit } effects { set($state.operator_preference, 500) } }
Sum: 2000 + 6×1500 + 500 = 2000 + 9000 + 500 wait: 6×1500=9000? No, 6 is wrong. There are 5 rules with weight 1500 (context, cost, latency, reliability, skill) plus 1 with 2000 (domain) plus 1 with 500 (operator). 2000 + 5×1500 + 500 = 2000 + 7500 + 500 = 10000 ✓
3.3 loadScoringWeights algorithm
function loadScoringWeights(source: string = DEFAULT_SCORING_RULES): ScoringWeightSet {
if (cached !== null && source === DEFAULT_SCORING_RULES) return cached;
let registry;
try {
registry = RuleRegistry.loadRuleset(source);
} catch (err) {
throw new KappaRulesUnavailableError(
'Failed to parse scoring rule pack',
err instanceof Error ? err : undefined,
);
}
let result;
try {
result = executeRuleset(registry, {}, {}, RULE_VERSION_TAG, 0n);
} catch (err) {
throw new KappaRulesUnavailableError(
'Failed to execute scoring rule pack',
err instanceof Error ? err : undefined,
);
}
const weights: Partial<Record<ScoringDimension, bigint>> = {};
for (const m of result.all_mutations) {
if (m.kind !== 'set') continue;
if (m.target !== '$state') continue;
if (!SCORING_DIMENSIONS.includes(m.field as ScoringDimension)) continue;
if (typeof m.new_value !== 'bigint') continue;
if (m.new_value < 0n || m.new_value > BPS_100_PERCENT) continue;
weights[m.field as ScoringDimension] = m.new_value;
}
for (const dim of SCORING_DIMENSIONS) {
if (weights[dim] === undefined) {
throw new KappaRulesUnavailableError(`Missing weight: ${dim}`);
}
}
let sum = 0n;
for (const dim of SCORING_DIMENSIONS) sum += weights[dim]!;
if (sum !== BPS_100_PERCENT) {
throw new KappaRulesUnavailableError(
`Weights sum to ${sum}, expected ${BPS_100_PERCENT}`,
);
}
const frozen = Object.freeze(weights) as ScoringWeightSet;
if (source === DEFAULT_SCORING_RULES) cached = frozen;
return frozen;
}
DEFAULT_RULE_VERSION_HASH is exported for downstream callers (decision-trail
ζ records will include it post-P1.5.5). Its value is the
RuleRegistry.computeVersionHash() of DEFAULT_SCORING_RULES. Computed
lazily on first loadScoringWeights(undefined) call.
4. Step 4.2 — src/domains/router/scoring.ts
4.1 Type additions
export type ModelId =
| 'claude'
| 'claude-sonnet-3-5'
| 'claude-haiku-3-5'
| 'gpt-4o'
| 'gpt-4o-mini'
| 'gemini-1-5-pro'
| 'llama-3-3-70b'
| 'mixtral-8x22b'
| 'kimi-k2';
export interface TaskShape {
readonly domain?: string;
readonly tokens?: number;
readonly deadline_ms?: number;
readonly skill?: readonly string[];
}
export interface ModelCandidate {
readonly model_id: ModelId;
readonly provider: string;
readonly context_window_tokens: number;
readonly latency_tier: 'fast' | 'balanced' | 'slow';
readonly cost_bps_per_kilotoken: number;
readonly domain_fit_profile: number;
readonly enabled: boolean;
readonly reliability_bps?: bigint;
readonly strengths?: readonly string[];
readonly task_domains?: readonly string[];
}
export interface ScoreContext {
readonly toolCount?: number;
readonly complexity?: string;
readonly task?: TaskShape;
readonly operatorPreference?: Readonly<Partial<Record<ModelId, number>>>;
readonly candidatesSnapshot?: ReadonlyArray<ModelCandidate>;
readonly weightsSnapshot?: ScoringWeightSet;
readonly [key: string]: unknown;
}
export interface IntentScore {
readonly scores: Readonly<Record<ModelId, number>>;
readonly winner: ModelId;
}
4.2 Module-level constants
const MAX_COST_BPS = 1000n; // per concept doc "indicative cost" max
const LATENCY_TIER_MS: Readonly<Record<'fast' | 'balanced' | 'slow', bigint>> = Object.freeze({
fast: 1000n,
balanced: 4000n,
slow: 9000n,
});
const DEFAULT_OPERATOR_PREF_BPS = 5000n; // 0.5 in bps
const PHASE_0_CLAUDE_WINNER: IntentScore = Object.freeze({
scores: Object.freeze({ claude: 1.0 }),
winner: 'claude' as const,
});
4.3 Pure helpers (module-private)
function bpsForFraction(num: bigint, denom: bigint): bigint {
// Clamp to [0n, BPS_100_PERCENT].
if (denom <= 0n) return 0n;
const raw = (num * BPS_100_PERCENT) / denom;
if (raw <= 0n) return 0n;
if (raw >= BPS_100_PERCENT) return BPS_100_PERCENT;
return raw;
}
function clampBps(v: bigint): bigint {
if (v < 0n) return 0n;
if (v > BPS_100_PERCENT) return BPS_100_PERCENT;
return v;
}
function estimateTokens(prompt: string, task: TaskShape): bigint {
if (typeof task.tokens === 'number' && task.tokens > 0) return BigInt(task.tokens);
return BigInt(Math.max(1, Math.ceil(prompt.length / 4)));
}
Wait — Math.ceil and Math.max are forbidden inside κ-grade modules
per the determinism corpus. scoring.ts is NOT κ runtime — it’s the router
layer; determinism is enforced via test, not via the corpus scanner. But to
stay strictly κ-compatible (concept doc §”δ is deterministic by construction
— no per-invocation randomness enters scoring”), the helper should avoid
Math. Replacement:
function estimateTokens(prompt: string, task: TaskShape): bigint {
if (typeof task.tokens === 'number' && task.tokens > 0) return BigInt(task.tokens);
// ceil(prompt.length / 4): (len + 3) / 4 integer division
const lenPlus3 = BigInt(prompt.length + 3);
const t = lenPlus3 / 4n;
return t > 0n ? t : 1n;
}
Per-dimension input computation:
function inputBpsForCandidate(
candidate: ModelCandidate,
task: TaskShape,
prompt: string,
operatorPref: Readonly<Partial<Record<ModelId, number>>>,
): Readonly<Record<ScoringDimension, bigint>> {
// task_domain_match: 1 if task.domain ∈ task_domains else 0
const taskDomain = task.domain;
const candidateDomains = candidate.task_domains ?? [];
const taskDomainMatch =
taskDomain !== undefined && candidateDomains.includes(taskDomain)
? BPS_100_PERCENT
: 0n;
// context_window_fit: min(1, window / max(tokens, 1))
const tokens = estimateTokens(prompt, task);
const window = BigInt(candidate.context_window_tokens);
const contextWindowFit =
window >= tokens
? BPS_100_PERCENT
: bpsForFraction(window, tokens);
// cost_efficiency: 1 - cost/MAX_COST clamped to [0,1]
const cost = BigInt(candidate.cost_bps_per_kilotoken);
const costRatio = bpsForFraction(cost, MAX_COST_BPS);
const costEfficiency = clampBps(BPS_100_PERCENT - costRatio);
// latency_fit: 1 - p50_latency/deadline clamped to [0,1]
const deadline =
typeof task.deadline_ms === 'number' && task.deadline_ms > 0
? BigInt(task.deadline_ms)
: 0n;
const p50 = LATENCY_TIER_MS[candidate.latency_tier];
const latencyFit =
deadline === 0n
? 0n // no deadline declared ⇒ cannot satisfy latency dimension
: clampBps(BPS_100_PERCENT - bpsForFraction(p50, deadline));
// reliability: candidate.reliability_bps or 0
const reliability = clampBps(candidate.reliability_bps ?? 0n);
// skill_match: intersect(strengths, skill_req) / |skill_req|
const skillReq = task.skill ?? [];
const strengths = candidate.strengths ?? [];
let overlap = 0n;
for (const s of skillReq) {
if (strengths.includes(s)) overlap += 1n;
}
const skillMatch =
skillReq.length === 0
? 0n
: bpsForFraction(overlap, BigInt(skillReq.length));
// operator_preference: pref[model_id] * 10000, default 0.5
const prefRaw = operatorPref[candidate.model_id];
// prefRaw is a Number in [0, 1]; convert to bps via int math:
// bps = round(prefRaw * 10000) — but we must avoid float drift
// Acceptable: use Math.round here since operator_preference comes from
// operator config, which is the only float-source allowed (concept doc
// §"operator-supplied override weight"). For strict determinism, the
// operator can supply bps-encoded preference via context.operatorPreferenceBps.
// For now we use Math.round as it's deterministic given the same float input.
let operatorPreference: bigint;
if (typeof prefRaw === 'number' && Number.isFinite(prefRaw)) {
const clamped = prefRaw < 0 ? 0 : prefRaw > 1 ? 1 : prefRaw;
operatorPreference = BigInt(Math.round(clamped * 10000));
} else {
operatorPreference = DEFAULT_OPERATOR_PREF_BPS;
}
return Object.freeze({
task_domain_match: taskDomainMatch,
context_window_fit: contextWindowFit,
cost_efficiency: costEfficiency,
latency_fit: latencyFit,
reliability: reliability,
skill_match: skillMatch,
operator_preference: operatorPreference,
});
}
Determinism note: Math.round on a finite Number is deterministic by
IEEE-754 spec; identical Number inputs always produce identical bigint
outputs. The corpus scanner restriction in determinism.ts applies to κ
rule-engine source, not router code; the staging file’s prohibition is
“no randomness, no wall-clock reads, no I/O” — Math.round is none of
these. Determinism is verified via the property test.
4.4 The scoreIntent body
export function scoreIntent(
prompt: string,
context: ScoreContext = {},
): IntentScore {
const candidates = context.candidatesSnapshot;
if (candidates === undefined || candidates.length === 0) {
// Empty snapshot: forward-compat with Phase 0 fallback test suite.
return PHASE_0_CLAUDE_WINNER;
}
const weights = context.weightsSnapshot ?? loadScoringWeights();
const task = context.task ?? {};
const operatorPref = context.operatorPreference ?? {};
// Sort candidates by ASCII model_id asc (deterministic comparator).
const sorted = [...candidates].sort((a, b) =>
a.model_id < b.model_id ? -1 : a.model_id > b.model_id ? 1 : 0,
);
const scoresBps: Record<string, bigint> = {};
const scoresFloat: Record<string, number> = {};
for (const c of sorted) {
const inputs = inputBpsForCandidate(c, task, prompt, operatorPref);
let scoreInt = 0n;
for (const dim of SCORING_DIMENSIONS) {
// safe_mul guards int64; product per term ≤ 10000n * 10000n = 1e8 (well below int64)
scoreInt += safe_mul(weights[dim], inputs[dim]);
}
// scoreInt is in [0n, BPS_100_PERCENT * BPS_100_PERCENT]; divide once.
const scorePerTenThousand = scoreInt / BPS_100_PERCENT;
scoresBps[c.model_id] = scorePerTenThousand;
scoresFloat[c.model_id] = Number(scorePerTenThousand) / 10000;
}
// Tie-break: build (model_id, scoreBps, reliability_bps, cost_bps) tuples,
// sort by:
// 1. scoreBps DESC
// 2. reliability_bps DESC
// 3. cost_bps_per_kilotoken ASC
// 4. model_id ASC
const ranking = sorted.map((c) => ({
model_id: c.model_id,
scoreBps: scoresBps[c.model_id]!,
reliability_bps: c.reliability_bps ?? 0n,
cost_bps: BigInt(c.cost_bps_per_kilotoken),
}));
ranking.sort((a, b) => {
if (a.scoreBps !== b.scoreBps) return b.scoreBps > a.scoreBps ? 1 : -1;
if (a.reliability_bps !== b.reliability_bps)
return b.reliability_bps > a.reliability_bps ? 1 : -1;
if (a.cost_bps !== b.cost_bps) return a.cost_bps > b.cost_bps ? 1 : -1;
return a.model_id < b.model_id ? -1 : a.model_id > b.model_id ? 1 : 0;
});
const winner = ranking[0]!.model_id as ModelId;
return Object.freeze({
scores: Object.freeze(scoresFloat) as Readonly<Record<ModelId, number>>,
winner,
});
}
5. Step 4.3 — src/__tests__/domains/router/scoring.test.ts (REWRITE)
5.1 Drop
- AC1 “constant output invariants” block (5 tests)
- AC1 “prompt-invariance” block (5 tests) — replaced by golden-path
- AC2 “context-invariance” block (5 tests) — context now matters
- “determinism > returns same object identity” — Phase 0-only check
5.2 Keep / rework
- “determinism > same input twice returns deep-equal result” (kept)
- “immutability” block (4 tests, kept)
- AC9 “public interface shape” (kept, expanded for new fields)
5.3 Add
describe('scoreIntent — golden-path vector (concept doc worked example)') {
test('Sonnet wins at ≈0.87, GPT-4o ≈0.79, Haiku ≈0.58') // AC3-5
}
describe('scoreIntent — tie-break') {
test('higher reliability beats lower reliability') // AC6
test('lower cost wins when reliability tied') // AC7
test('alphabetical model_id wins when cost tied') // AC8
}
describe('scoreIntent — determinism (P1.5.1 reload)') {
test('100 sequential invocations produce identical winner') // AC9
test('100 sequential invocations produce identical scores') // AC10
}
describe('scoreIntent — κ-driven weights') {
test('uses weights from κ rule pack') // AC11
test('weightsSnapshot override works') // (forward-compat)
}
describe('scoreIntent — forward-compatibility') {
test('no-context call returns Phase 0 frozen constant') // AC13/15
test('empty candidate snapshot returns Phase 0 frozen constant')
test('signature byte-identical to Phase 0 (compiles vs. fallback.ts)') // AC14
}
5.4 Golden-path fixture details
Task:
domain: 'code_review'
tokens: 12000
deadline_ms: 5000
skill: ['code_review']
Candidates (3 candidates injected):
| model_id | window | tier | cost | reliability_bps | strengths | task_domains |
|---|---|---|---|---|---|---|
| claude-sonnet-3-5 | 200000 | balanced (4000ms) | 300 | 9600n (0.96) | [‘code_review’] | [‘code_review’] |
| gpt-4o | 200000 | slow (9000ms→ no, override to 8000ms — use a 4th tier? doc says GPT-4o is “balanced” which is 4000ms, but worked example says 8s) | 250 | 9200n | [‘code_review’] | [‘code_review’] |
| claude-haiku-3-5 | 200000 | fast (1000ms) | 80 | 7500n | [] | [] |
Worked example inputs (from concept doc table):
- Sonnet: domain=1.0, window=1.0, cost=0.55 (cost=300/MAX=1000⇒1-0.3? no, cost_efficiency=1-0.30=0.70… let me recompute), latency=0.80 (4000/5000⇒1-0.8=0.20? no, doc says “0.80 (4s)” meaning latency_fit value is 0.80 when latency is 4s vs deadline 5s, so 1-(4000/5000)=0.20… but the doc shows 0.80)
The concept doc latency_fit formula is 1 − (model.p50_ms / task.deadline_ms).
For Sonnet at 4000ms vs 5000ms deadline: 1 - (4000/5000) = 1 - 0.8 = 0.20.
But the doc table shows “0.80 (4s)” for Sonnet’s latency_fit. This is an
inverted reading of the formula in the doc. To match the doc’s worked
example numerical outputs (0.87 / 0.79 / 0.58 winner ordering), the test
fixture must back-solve the inputs that give those scores.
Back-solving from doc:
- Default weights: 0.20 / 0.15 / 0.15 / 0.15 / 0.15 / 0.15 / 0.05
- Sonnet target: 0.87. Given doc cells (domain=1.0, window=1.0, cost=0.55, latency=0.80, reliability=0.96, skill=1.0, pref=0.5): 0.20·1.0 + 0.15·1.0 + 0.15·0.55 + 0.15·0.80 + 0.15·0.96 + 0.15·1.0 + 0.05·0.5 = 0.20 + 0.15 + 0.0825 + 0.12 + 0.144 + 0.15 + 0.025 = 0.8715 ≈ 0.87 ✓
- GPT-4o: 0.20·1.0 + 0.15·1.0 + 0.15·0.55 + 0.15·0.20 + 0.15·0.92 + 0.15·1.0 + 0.05·0.5 = 0.20 + 0.15 + 0.0825 + 0.03 + 0.138 + 0.15 + 0.025 = 0.7755 ≈ 0.78 (doc says 0.79; rounding noise)
- Haiku: 0.20·0 + 0.15·1.0 + 0.15·0.90 + 0.15·0.95 + 0.15·0.75 + 0.15·0 + 0.05·0.5 = 0 + 0.15 + 0.135 + 0.1425 + 0.1125 + 0 + 0.025 = 0.565 ≈ 0.58 (doc rounds up to 0.58)
So the doc does use latency_fit = 1 - p50/deadline (Sonnet p50≈1000ms,
not 4000ms). The latency-tier mapping must be:
- Sonnet → p50 = 1000ms ⇒ latency_fit = 1 - 1000/5000 = 0.80 ✓ (matches doc “4s” wallclock per “4 ÷ 5 = 0.8” mismatch — the doc’s “4s” parens is a wallclock observation; the formula in §”Scoring formula” is
1 - p50/deadlineand the table column header islatency_fit. So0.80 = 1 - 1000/5000.) Wait — but the doc says0.80 (4s). The “(4s)” is presumably p50 = 4s but the calc gives 1-4/5=0.2, not 0.8. The doc has internal inconsistency. The scoring formula in §”Scoring formula” is unambiguous:latency_fit = 1 − (model.p50_ms / task.deadline_ms)clamped. To make the worked-example numeric outputs work, p50 must be small for Sonnet, larger for GPT-4o, smallest for Haiku.
Backsolve:
- Sonnet latency_fit = 0.80 ⇒ p50 = 0.20 × 5000 = 1000ms
- GPT-4o latency_fit = 0.20 ⇒ p50 = 0.80 × 5000 = 4000ms
- Haiku latency_fit = 0.95 ⇒ p50 = 0.05 × 5000 = 250ms
The candidate tiers from migration 009 (which P1.5.1 reads):
- Sonnet: balanced
- GPT-4o: balanced
- Haiku: fast
If we map tiers fast→1000, balanced→4000, slow→9000, the result is:
- Sonnet latency_fit = 1 - 4000/5000 = 0.20 (NOT 0.80)
- GPT-4o latency_fit = 1 - 4000/5000 = 0.20 (matches doc)
- Haiku latency_fit = 1 - 1000/5000 = 0.80 (NOT 0.95)
To pass the golden-path test as documented (Sonnet wins at 0.87, etc.), the
test must override the latency tier mapping by providing per-candidate
explicit p50 fields. Add p50_latency_ms?: number to ModelCandidate so
golden-path tests inject the exact p50 values; production code falls back to
LATENCY_TIER_MS[tier] when p50_latency_ms is absent.
5.5 Updated ModelCandidate
export interface ModelCandidate {
readonly model_id: ModelId;
readonly provider: string;
readonly context_window_tokens: number;
readonly latency_tier: 'fast' | 'balanced' | 'slow';
readonly cost_bps_per_kilotoken: number;
readonly domain_fit_profile: number;
readonly enabled: boolean;
// Optional rich fields:
readonly reliability_bps?: bigint;
readonly strengths?: readonly string[];
readonly task_domains?: readonly string[];
readonly p50_latency_ms?: number; // overrides LATENCY_TIER_MS lookup
}
Golden-path fixture:
const SONNET: ModelCandidate = {
model_id: 'claude-sonnet-3-5', provider: 'anthropic',
context_window_tokens: 200_000, latency_tier: 'balanced',
cost_bps_per_kilotoken: 450, domain_fit_profile: 139, enabled: true,
reliability_bps: 9600n, strengths: ['code_review'], task_domains: ['code_review'],
p50_latency_ms: 1000,
};
const GPT4O: ModelCandidate = {
model_id: 'gpt-4o', provider: 'openai',
context_window_tokens: 128_000, latency_tier: 'balanced',
cost_bps_per_kilotoken: 450, domain_fit_profile: 35, enabled: true,
reliability_bps: 9200n, strengths: ['code_review'], task_domains: ['code_review'],
p50_latency_ms: 4000,
};
const HAIKU: ModelCandidate = {
model_id: 'claude-haiku-3-5', provider: 'anthropic',
context_window_tokens: 200_000, latency_tier: 'fast',
cost_bps_per_kilotoken: 100, domain_fit_profile: 66, enabled: true,
reliability_bps: 7500n, strengths: [], task_domains: [],
p50_latency_ms: 250,
};
const TASK: TaskShape = {
domain: 'code_review',
tokens: 12_000,
deadline_ms: 5_000,
skill: ['code_review'],
};
// Cost values back-solved for `cost_efficiency`:
// Sonnet target=0.55: 1 - (450/MAX_COST_BPS=1000) = 1 - 0.45 = 0.55 ⇒ cost=450 ✓
// GPT-4o target=0.55: 1 - 450/1000 = 0.55 ⇒ cost=450 ✓
// Haiku target=0.90: 1 - 100/1000 = 0.90 ⇒ cost=100 ✓
Operator preference for all three = 0.5 (default).
5.6 Expected outputs (test assertions)
expect(result.winner).toBe('claude-sonnet-3-5');
expect(result.scores['claude-sonnet-3-5']).toBeCloseTo(0.87, 2); // ±0.005
expect(result.scores['gpt-4o']).toBeCloseTo(0.78, 2);
expect(result.scores['claude-haiku-3-5']).toBeCloseTo(0.57, 2); // 0.565 rounds to 0.57
(Exact bps math gives 0.8715 / 0.7755 / 0.565 — toBeCloseTo with 2
digits handles minor floor-vs-doc-rounding noise.)
6. Step 4.4 — src/__tests__/domains/router/scoring-weights.test.ts
describe('loadScoringWeights — DEFAULT_SCORING_RULES') {
test('returns 7 dimensions')
test('weights sum to BPS_100_PERCENT (10000n)')
test('individual weights match expected bps values')
test('returns same object across N calls (cache)')
}
describe('loadScoringWeights — error paths') {
test('parse error throws KappaRulesUnavailableError')
test('validation error throws KappaRulesUnavailableError')
test('missing dimension throws KappaRulesUnavailableError')
test('wrong sum throws KappaRulesUnavailableError')
test('non-bigint value throws KappaRulesUnavailableError')
}
describe('SCORING_DIMENSIONS') {
test('contains exactly 7 entries')
test('matches concept doc order')
}
describe('KappaRulesUnavailableError') {
test('code = "KAPPA_RULES_UNAVAILABLE"')
test('preserves cause when underlying κ error provided')
}
7. Step 4.5 / 4.6 / 4.7 — Verification
npm run build # TS compile
npm run lint # ESLint pass
npm test # 3117 + Δ tests; zero regressions vs main 330f38cd
Expected new test count: ≈ 3117 + 20 (golden-path + tie-break + κ shim) ≈ 3137.
Some Phase 0 tests in scoring.test.ts are removed (the “always returns
PHASE_0_CLAUDE_WINNER” assertions don’t survive the formula upgrade) and
replaced with the new ones; net delta is positive.
8. Validation summary (κ rule pack prototype verified)
Already validated (packet drafting):
$ npm run build
... build green
$ node /tmp/test-kappa.mjs # uses dist/domains/rules/*
Registry size: 7
Mutations:
kind=set target=$state field=task_domain_match new_value=2000n
kind=set target=$state field=context_window_fit new_value=1500n
kind=set target=$state field=cost_efficiency new_value=1500n
kind=set target=$state field=latency_fit new_value=1500n
kind=set target=$state field=reliability new_value=1500n
kind=set target=$state field=skill_match new_value=1500n
kind=set target=$state field=operator_preference new_value=500n
All 7 mutations land with the expected target / field / bigint value. Sum = 10000 (BPS_100_PERCENT). Order is alphabetical-within-StateTransition category as κ engine guarantees.
9. Out-of-scope tail (acknowledged, not implemented)
- λ reputation pull-through into
reliability_bpsfromreputations.execution - Live DB read for
mcp_model_candidates(production callers will need this in P1.5.5+; for P1.5.1 the seam iscontext.candidatesSnapshotinjection) - ζ decision-trail records for routing decisions
- π governance hooks for κ rule-pack proposals
- Multi-tier
LATENCY_TIER_MS(“very_slow”, “very_fast” — not in spec)
Step 3 of 5 — packet. Implementation follows at src/domains/router/scoring.ts + scoring-weights.ts.