P1.5.1 — Real 7-dim Intent Scoring — Behavioral Contract (Step 2)

This contract defines the behavioral surface that src/domains/router/scoring.ts and src/domains/router/scoring-weights.ts MUST satisfy after P1.5.1 graduation. Each clause maps to a verification test in Step 5.

1. Public surface (locked)

1.1 Module: `src/domains/router/scoring.ts`

export type ModelId =
  | 'claude'                  // legacy abstract alias (Phase 0 / fallback test compat)
  | 'claude-sonnet-3-5'
  | 'claude-haiku-3-5'
  | 'gpt-4o'
  | 'gpt-4o-mini'
  | 'gemini-1-5-pro'
  | 'llama-3-3-70b'
  | 'mixtral-8x22b'
  | 'kimi-k2';

export interface ScoreContext {
  // Phase 0 (kept for compat)
  readonly toolCount?: number;
  readonly complexity?: string;

  // Phase 1.5 (new well-known keys)
  readonly task?: TaskShape;
  readonly operatorPreference?: Readonly<Partial<Record<ModelId, number>>>;
  readonly candidatesSnapshot?: ReadonlyArray<ModelCandidate>;
  readonly weightsSnapshot?: ScoringWeightSet;

  // Open extension
  readonly [key: string]: unknown;
}

export interface TaskShape {
  readonly domain?: string;             // e.g. 'code_review'
  readonly tokens?: number;             // estimated input size
  readonly deadline_ms?: number;        // request budget
  readonly skill?: readonly string[];   // required skill tags
}

export interface ModelCandidate {
  readonly model_id: ModelId;
  readonly provider: string;
  readonly context_window_tokens: number;
  readonly latency_tier: 'fast' | 'balanced' | 'slow';
  readonly cost_bps_per_kilotoken: number;
  readonly domain_fit_profile: number;   // 8-bit bitmask
  readonly enabled: boolean;
  readonly reliability_bps?: bigint;     // λ pull-through; default 0n
  readonly strengths?: readonly string[];// skill-set declaration; default []
  readonly task_domains?: readonly string[]; // task-domain whitelist; default []
}

export interface IntentScore {
  readonly scores: Readonly<Record<ModelId, number>>;
  readonly winner: ModelId;
}

export function scoreIntent(
  prompt: string,
  context?: ScoreContext,
): IntentScore;

Signature stability: scoreIntent(prompt: string, context?: ScoreContext) => IntentScore is byte-identical to Phase 0 modulo widened ModelId. The 'claude' literal stays in the union so existing callers (the P0.5.2 fallback module + its 50+ tests) keep compiling.

1.2 Module: `src/domains/router/scoring-weights.ts` (NEW)

export interface ScoringWeightSet {
  readonly task_domain_match: bigint;   // bps
  readonly context_window_fit: bigint;
  readonly cost_efficiency: bigint;
  readonly latency_fit: bigint;
  readonly reliability: bigint;
  readonly skill_match: bigint;
  readonly operator_preference: bigint;
}

export const SCORING_DIMENSIONS: readonly (keyof ScoringWeightSet)[];
export const DEFAULT_SCORING_RULES: string;
export const DEFAULT_RULE_VERSION_HASH: string;

export class KappaRulesUnavailableError extends Error {
  readonly code: 'KAPPA_RULES_UNAVAILABLE';
}

export function loadScoringWeights(
  source?: string,
): ScoringWeightSet;

Pure module: no DB, no env, no I/O. Source defaults to DEFAULT_SCORING_RULES (a module-level κ rule pack source string).

2. Algorithm (locked)

2.1 Inputs

Input	Source	Type
`prompt`	argument	`string`
`task`	`context.task`	`TaskShape \| undefined`
`candidates`	`context.candidatesSnapshot` ?? DB query	`readonly ModelCandidate[]`
`weights`	`context.weightsSnapshot` ?? `loadScoringWeights()`	`ScoringWeightSet`
`operatorPreference`	`context.operatorPreference` ?? `{}`	`Record<ModelId, number>`

2.2 Per-dimension input formulae (per concept doc)

All seven dimensions normalize to a bps value in [0, BPS_100_PERCENT] (i.e. [0n, 10_000n]).

Dimension	Input formula (bps, bigint)
`task_domain_match`	`BPS_100_PERCENT` iff `task.domain ∈ candidate.task_domains`; else `0n`
`context_window_fit`	`min(BPS_100_PERCENT, (candidate.context_window_tokens / max(task.tokens, 1)) * BPS_100_PERCENT)`
`cost_efficiency`	`BPS_100_PERCENT - (candidate.cost_bps_per_kilotoken * BPS_100_PERCENT) / max(MAX_COST_BPS, 1)`, clamped `[0, BPS_100_PERCENT]`
`latency_fit`	`BPS_100_PERCENT - (latencyMs(candidate.tier) * BPS_100_PERCENT) / max(task.deadline_ms, 1)`, clamped `[0, BPS_100_PERCENT]`
`reliability`	`candidate.reliability_bps ?? 0n`, clamped `[0, BPS_100_PERCENT]`
`skill_match`	`(intersect(candidate.strengths, task.skill) * BPS_100_PERCENT) / max(task.skill.length, 1)`
`operator_preference`	`bps(operatorPreference[candidate.model_id] ?? 0.5)`, where input float ∈ `[0, 1]` → bps ∈ `[0, 10_000n]`

Constants:

MAX_COST_BPS = 1000n (10% per 1k tokens; per concept doc’s “indicative cost” range)
latencyMs(tier): fast → 1000, balanced → 4000, slow → 9000 (per concept doc’s latency tiers)

Token-count default: when task.tokens is undefined, fall back to max(1, prompt.length / 4) (4 chars/token approximation) so the dimension remains computable for raw-prompt callers.

2.3 Score accumulation (integer math)

for each candidate m sorted by m.model_id ASCII asc:
  score_bps_int64 = 0n
  for each dim ∈ SCORING_DIMENSIONS (declared order):
    weight = weights[dim]              // bigint, [0n, 10000n]
    input  = inputs_bps[m][dim]        // bigint, [0n, 10000n]
    score_bps_int64 += safe_mul(weight, input)   // int64-guarded; product ≤ 10^8
  score_per_10000[m] = score_bps_int64 / BPS_100_PERCENT
  // score_per_10000 lies in [0n, 10000n] when Σ weights ≤ 10000

scores_float[m] = Number(score_per_10000[m]) / 10000  // single divide to [0,1]

Invariant: Σ weights = 10000n (BPS_100_PERCENT) by construction. loadScoringWeights() validates this and throws KappaRulesUnavailableError when the sum drifts. Therefore score_per_10000[m] ∈ [0n, 10000n] and scores_float[m] ∈ [0.0, 1.0].

2.4 Tie-break

When two or more candidates produce identical score_per_10000 values, ranking is broken by, in order:

Higher reliability_bps (descending)
Lower cost_bps_per_kilotoken (ascending)
Alphabetical model_id (ASCII / UTF-16 code unit, ascending)

If all three are equal, the candidates are indistinguishable — the lowest ASCII model_id wins (a function of clause 3).

2.5 Output

{
  scores: Object.freeze({ [model_id]: scores_float[model_id], ... }),
  winner: lowest-ranked-key-after-tie-break,
}

Frozen at both levels. Numeric values in [0.0, 1.0]. winner ∈ ModelId union.

2.6 Empty-candidate fallback

If candidates.length === 0 (no enabled rows in the DB and no injected snapshot), scoreIntent returns the Phase 0 frozen constant:

{ scores: { claude: 1.0 }, winner: 'claude' }

This preserves the forward-compat invariant that winner: ModelId is non-null and that the router-fallback test suite (which exercises routeRequest end-to-end with no injection) continues to assert result.model === 'claude'.

3. κ rule-pack contract

3.1 DEFAULT_SCORING_RULES grammar

rule WeightTaskDomainMatch { guards { true -> admit } effects { set(weights.task_domain_match, 2000) } }
rule WeightContextWindowFit { guards { true -> admit } effects { set(weights.context_window_fit, 1500) } }
rule WeightCostEfficiency  { guards { true -> admit } effects { set(weights.cost_efficiency, 1500) } }
rule WeightLatencyFit      { guards { true -> admit } effects { set(weights.latency_fit, 1500) } }
rule WeightReliability     { guards { true -> admit } effects { set(weights.reliability, 1500) } }
rule WeightSkillMatch      { guards { true -> admit } effects { set(weights.skill_match, 1500) } }
rule WeightOperatorPref    { guards { true -> admit } effects { set(weights.operator_preference, 500) } }

Sum invariant: 2000 + 1500 + 1500 + 1500 + 1500 + 1500 + 500 = 10000.

3.2 Required mutations (after `executeRuleset`)

The RuleRegistry.loadRuleset(DEFAULT_SCORING_RULES) → executeRuleset(...) → all_mutations stream MUST contain exactly 7 mutations, each:

kind === 'set'
target === 'weights'
field ∈ the 7 dimension names
new_value is a bigint in [0n, 10000n]

The shim filters mutations by these predicates, projects to ScoringWeightSet, validates sum-to-10000, freezes, returns.

3.3 Failure modes

Cause	Thrown error
Parse error in source	`KappaRulesUnavailableError` wrapping `RulesetParseError`
Validation error	`KappaRulesUnavailableError` wrapping `RulesetValidationError`
Ambiguous ruleset	`KappaRulesUnavailableError` wrapping `AmbiguousRulesetError`
Missing dimension after execute	`KappaRulesUnavailableError`
Sum ≠ 10000n	`KappaRulesUnavailableError`

The class extends Error, exposes a code: 'KAPPA_RULES_UNAVAILABLE' constant + a cause field pointing to the underlying κ error.

3.4 Determinism

loadScoringWeights(DEFAULT_SCORING_RULES) MUST return structurally equal output across N invocations (caching is permitted but not required). The underlying κ engine is deterministic by construction (P1.3.1) so the shim inherits this.

4. Determinism contract (κ-grade)

The scoreIntent function MUST be deterministic across:

100 sequential invocations with the same (prompt, context) → same winner + same scores values
Two arbiters with identical (prompt, context, rule_version_hash, candidatesSnapshot) → bit-identical scores (load-bearing for θ consensus per concept doc §”Decision-trail recording”)

Forbidden inside scoreIntent:

Math.random, Date.now, crypto.*, setTimeout, setInterval
await / async (sync function only)
fetch, XMLHttpRequest, node:fs
Floating-point arithmetic for accumulation (single final divide to float is fine)

Permitted:

Synchronous bigint arithmetic via safe_mul / bps_mul
Number() cast for final float projection
Object.freeze
Iteration over readonly arrays / objects

5. Acceptance criteria (mapped to verification tests)

AC	Statement	Verification test
AC1	`scoreIntent(p, c)` returns frozen `IntentScore` for any well-formed input	`test('output frozen at both levels')`
AC2	`ModelId` widened to 9 members; `'claude'` retained	type-level check in `scoring.test.ts`
AC3	Golden-path vector: code-review task ⇒ Sonnet wins at ≈ 0.87	`test('golden-path: code-review 12k tokens 5s')`
AC4	GPT-4o ≈ 0.79 on golden-path	same test
AC5	Haiku ≈ 0.58 on golden-path	same test
AC6	Tie-break: equal scores ⇒ higher reliability wins	`test('tie-break: reliability beats cost')`
AC7	Tie-break: equal reliability ⇒ lower cost wins	`test('tie-break: cost when reliability tied')`
AC8	Tie-break: equal cost ⇒ alphabetical model_id wins	`test('tie-break: alphabetical fallback')`
AC9	Determinism: same input × 100 ⇒ same winner	`test('determinism: 100 invocations')`
AC10	Determinism: scores bit-identical across invocations	same test
AC11	Weights sourced from κ (not constants in scoring.ts)	`test('weights come from κ rule pack')`
AC12	`loadScoringWeights()` returns 7 dimensions with bps values summing to 10000	`test('default weights sum to 10000')`
AC13	Forward-compat: `scoreIntent('text')` works (no context, no candidates)	`test('forward-compat: no-context call')`
AC14	Forward-compat: signature stable	type-level check (compiles vs. P0.5.2 fallback)
AC15	Empty candidate snapshot ⇒ Phase 0 frozen constant	`test('empty snapshot returns Phase 0 default')`
AC16	Invalid κ rule pack ⇒ `KappaRulesUnavailableError`	`test('invalid κ source throws KappaRulesUnavailableError')`
AC17	`IntentScore.scores` is frozen against mutation	`test('scores frozen against mutation')`
AC18	`IntentScore.winner` is read-only	`test('winner read-only')`

6. Forbidden in P1.5.1

Editing main checkout
Pushing to main
Force-push / --no-verify / --amend
Touching files outside src/domains/router/scoring.ts, src/domains/router/scoring-weights.ts, tests, + the 5 chain docs
Hardcoding weights as object literals in scoring.ts
Hardcoding the cohort (must read from mcp_model_candidates OR accept injected snapshot)
Removing 'claude' from ModelId
Implementing any adapter (P1.5.2/3/4 scope)
Registering MCP tools (P1.5.7 scope)
Wall-clock reads, randomness, file I/O beyond κ + candidate table
New env vars (no COLIBRI_* reads in P1.5.1)

7. Non-goals

λ reputation history wired into reliability_bps (Phase 2+ integration)
Per-call cost telemetry / metric emission (P1.5.6+)
Operator-supplied κ rule packs loaded from disk (Phase 1.5+)
Ensemble / pipeline routing modes (Phase 1.5+ adapter slices)
ζ Decision Trail records of routing decisions (P1.5.5+)

8. Verification (test commands)

cd .worktrees/claude/p1-5-1-scoring
npm run build && npm run lint && npm test

Expected: build green, lint green, test count ≥ 3117 with the golden-path vector + tie-break + determinism + κ-load tests passing. Zero regressions against 330f38cd.

Step 2 of 5 — contract. Packet follows at docs/packets/p1-5-1-scoring-packet.md.