P1.5.1 — Real 7-dim Intent Scoring — Behavioral Contract (Step 2)

This contract defines the behavioral surface that src/domains/router/scoring.ts and src/domains/router/scoring-weights.ts MUST satisfy after P1.5.1 graduation. Each clause maps to a verification test in Step 5.

1. Public surface (locked)

1.1 Module: src/domains/router/scoring.ts

export type ModelId =
  | 'claude'                  // legacy abstract alias (Phase 0 / fallback test compat)
  | 'claude-sonnet-3-5'
  | 'claude-haiku-3-5'
  | 'gpt-4o'
  | 'gpt-4o-mini'
  | 'gemini-1-5-pro'
  | 'llama-3-3-70b'
  | 'mixtral-8x22b'
  | 'kimi-k2';

export interface ScoreContext {
  // Phase 0 (kept for compat)
  readonly toolCount?: number;
  readonly complexity?: string;

  // Phase 1.5 (new well-known keys)
  readonly task?: TaskShape;
  readonly operatorPreference?: Readonly<Partial<Record<ModelId, number>>>;
  readonly candidatesSnapshot?: ReadonlyArray<ModelCandidate>;
  readonly weightsSnapshot?: ScoringWeightSet;

  // Open extension
  readonly [key: string]: unknown;
}

export interface TaskShape {
  readonly domain?: string;             // e.g. 'code_review'
  readonly tokens?: number;             // estimated input size
  readonly deadline_ms?: number;        // request budget
  readonly skill?: readonly string[];   // required skill tags
}

export interface ModelCandidate {
  readonly model_id: ModelId;
  readonly provider: string;
  readonly context_window_tokens: number;
  readonly latency_tier: 'fast' | 'balanced' | 'slow';
  readonly cost_bps_per_kilotoken: number;
  readonly domain_fit_profile: number;   // 8-bit bitmask
  readonly enabled: boolean;
  readonly reliability_bps?: bigint;     // λ pull-through; default 0n
  readonly strengths?: readonly string[];// skill-set declaration; default []
  readonly task_domains?: readonly string[]; // task-domain whitelist; default []
}

export interface IntentScore {
  readonly scores: Readonly<Record<ModelId, number>>;
  readonly winner: ModelId;
}

export function scoreIntent(
  prompt: string,
  context?: ScoreContext,
): IntentScore;

Signature stability: scoreIntent(prompt: string, context?: ScoreContext) => IntentScore is byte-identical to Phase 0 modulo widened ModelId. The 'claude' literal stays in the union so existing callers (the P0.5.2 fallback module + its 50+ tests) keep compiling.

1.2 Module: src/domains/router/scoring-weights.ts (NEW)

export interface ScoringWeightSet {
  readonly task_domain_match: bigint;   // bps
  readonly context_window_fit: bigint;
  readonly cost_efficiency: bigint;
  readonly latency_fit: bigint;
  readonly reliability: bigint;
  readonly skill_match: bigint;
  readonly operator_preference: bigint;
}

export const SCORING_DIMENSIONS: readonly (keyof ScoringWeightSet)[];
export const DEFAULT_SCORING_RULES: string;
export const DEFAULT_RULE_VERSION_HASH: string;

export class KappaRulesUnavailableError extends Error {
  readonly code: 'KAPPA_RULES_UNAVAILABLE';
}

export function loadScoringWeights(
  source?: string,
): ScoringWeightSet;

Pure module: no DB, no env, no I/O. Source defaults to DEFAULT_SCORING_RULES (a module-level κ rule pack source string).

2. Algorithm (locked)

2.1 Inputs

Input Source Type
prompt argument string
task context.task TaskShape | undefined
candidates context.candidatesSnapshot ?? DB query readonly ModelCandidate[]
weights context.weightsSnapshot ?? loadScoringWeights() ScoringWeightSet
operatorPreference context.operatorPreference ?? {} Record<ModelId, number>

2.2 Per-dimension input formulae (per concept doc)

All seven dimensions normalize to a bps value in [0, BPS_100_PERCENT] (i.e. [0n, 10_000n]).

Dimension Input formula (bps, bigint)
task_domain_match BPS_100_PERCENT iff task.domain ∈ candidate.task_domains; else 0n
context_window_fit min(BPS_100_PERCENT, (candidate.context_window_tokens / max(task.tokens, 1)) * BPS_100_PERCENT)
cost_efficiency BPS_100_PERCENT - (candidate.cost_bps_per_kilotoken * BPS_100_PERCENT) / max(MAX_COST_BPS, 1), clamped [0, BPS_100_PERCENT]
latency_fit BPS_100_PERCENT - (latencyMs(candidate.tier) * BPS_100_PERCENT) / max(task.deadline_ms, 1), clamped [0, BPS_100_PERCENT]
reliability candidate.reliability_bps ?? 0n, clamped [0, BPS_100_PERCENT]
skill_match (intersect(candidate.strengths, task.skill) * BPS_100_PERCENT) / max(task.skill.length, 1)
operator_preference bps(operatorPreference[candidate.model_id] ?? 0.5), where input float ∈ [0, 1] → bps ∈ [0, 10_000n]

Constants:

  • MAX_COST_BPS = 1000n (10% per 1k tokens; per concept doc’s “indicative cost” range)
  • latencyMs(tier): fast → 1000, balanced → 4000, slow → 9000 (per concept doc’s latency tiers)

Token-count default: when task.tokens is undefined, fall back to max(1, prompt.length / 4) (4 chars/token approximation) so the dimension remains computable for raw-prompt callers.

2.3 Score accumulation (integer math)

for each candidate m sorted by m.model_id ASCII asc:
  score_bps_int64 = 0n
  for each dim ∈ SCORING_DIMENSIONS (declared order):
    weight = weights[dim]              // bigint, [0n, 10000n]
    input  = inputs_bps[m][dim]        // bigint, [0n, 10000n]
    score_bps_int64 += safe_mul(weight, input)   // int64-guarded; product ≤ 10^8
  score_per_10000[m] = score_bps_int64 / BPS_100_PERCENT
  // score_per_10000 lies in [0n, 10000n] when Σ weights ≤ 10000

scores_float[m] = Number(score_per_10000[m]) / 10000  // single divide to [0,1]

Invariant: Σ weights = 10000n (BPS_100_PERCENT) by construction. loadScoringWeights() validates this and throws KappaRulesUnavailableError when the sum drifts. Therefore score_per_10000[m] ∈ [0n, 10000n] and scores_float[m] ∈ [0.0, 1.0].

2.4 Tie-break

When two or more candidates produce identical score_per_10000 values, ranking is broken by, in order:

  1. Higher reliability_bps (descending)
  2. Lower cost_bps_per_kilotoken (ascending)
  3. Alphabetical model_id (ASCII / UTF-16 code unit, ascending)

If all three are equal, the candidates are indistinguishable — the lowest ASCII model_id wins (a function of clause 3).

2.5 Output

{
  scores: Object.freeze({ [model_id]: scores_float[model_id], ... }),
  winner: lowest-ranked-key-after-tie-break,
}

Frozen at both levels. Numeric values in [0.0, 1.0]. winnerModelId union.

2.6 Empty-candidate fallback

If candidates.length === 0 (no enabled rows in the DB and no injected snapshot), scoreIntent returns the Phase 0 frozen constant:

{ scores: { claude: 1.0 }, winner: 'claude' }

This preserves the forward-compat invariant that winner: ModelId is non-null and that the router-fallback test suite (which exercises routeRequest end-to-end with no injection) continues to assert result.model === 'claude'.

3. κ rule-pack contract

3.1 DEFAULT_SCORING_RULES grammar

rule WeightTaskDomainMatch { guards { true -> admit } effects { set(weights.task_domain_match, 2000) } }
rule WeightContextWindowFit { guards { true -> admit } effects { set(weights.context_window_fit, 1500) } }
rule WeightCostEfficiency  { guards { true -> admit } effects { set(weights.cost_efficiency, 1500) } }
rule WeightLatencyFit      { guards { true -> admit } effects { set(weights.latency_fit, 1500) } }
rule WeightReliability     { guards { true -> admit } effects { set(weights.reliability, 1500) } }
rule WeightSkillMatch      { guards { true -> admit } effects { set(weights.skill_match, 1500) } }
rule WeightOperatorPref    { guards { true -> admit } effects { set(weights.operator_preference, 500) } }

Sum invariant: 2000 + 1500 + 1500 + 1500 + 1500 + 1500 + 500 = 10000.

3.2 Required mutations (after executeRuleset)

The RuleRegistry.loadRuleset(DEFAULT_SCORING_RULES) → executeRuleset(...) → all_mutations stream MUST contain exactly 7 mutations, each:

  • kind === 'set'
  • target === 'weights'
  • field ∈ the 7 dimension names
  • new_value is a bigint in [0n, 10000n]

The shim filters mutations by these predicates, projects to ScoringWeightSet, validates sum-to-10000, freezes, returns.

3.3 Failure modes

Cause Thrown error
Parse error in source KappaRulesUnavailableError wrapping RulesetParseError
Validation error KappaRulesUnavailableError wrapping RulesetValidationError
Ambiguous ruleset KappaRulesUnavailableError wrapping AmbiguousRulesetError
Missing dimension after execute KappaRulesUnavailableError
Sum ≠ 10000n KappaRulesUnavailableError

The class extends Error, exposes a code: 'KAPPA_RULES_UNAVAILABLE' constant + a cause field pointing to the underlying κ error.

3.4 Determinism

loadScoringWeights(DEFAULT_SCORING_RULES) MUST return structurally equal output across N invocations (caching is permitted but not required). The underlying κ engine is deterministic by construction (P1.3.1) so the shim inherits this.

4. Determinism contract (κ-grade)

The scoreIntent function MUST be deterministic across:

  • 100 sequential invocations with the same (prompt, context) → same winner + same scores values
  • Two arbiters with identical (prompt, context, rule_version_hash, candidatesSnapshot) → bit-identical scores (load-bearing for θ consensus per concept doc §”Decision-trail recording”)

Forbidden inside scoreIntent:

  • Math.random, Date.now, crypto.*, setTimeout, setInterval
  • await / async (sync function only)
  • fetch, XMLHttpRequest, node:fs
  • Floating-point arithmetic for accumulation (single final divide to float is fine)

Permitted:

  • Synchronous bigint arithmetic via safe_mul / bps_mul
  • Number() cast for final float projection
  • Object.freeze
  • Iteration over readonly arrays / objects

5. Acceptance criteria (mapped to verification tests)

AC Statement Verification test
AC1 scoreIntent(p, c) returns frozen IntentScore for any well-formed input test('output frozen at both levels')
AC2 ModelId widened to 9 members; 'claude' retained type-level check in scoring.test.ts
AC3 Golden-path vector: code-review task ⇒ Sonnet wins at ≈ 0.87 test('golden-path: code-review 12k tokens 5s')
AC4 GPT-4o ≈ 0.79 on golden-path same test
AC5 Haiku ≈ 0.58 on golden-path same test
AC6 Tie-break: equal scores ⇒ higher reliability wins test('tie-break: reliability beats cost')
AC7 Tie-break: equal reliability ⇒ lower cost wins test('tie-break: cost when reliability tied')
AC8 Tie-break: equal cost ⇒ alphabetical model_id wins test('tie-break: alphabetical fallback')
AC9 Determinism: same input × 100 ⇒ same winner test('determinism: 100 invocations')
AC10 Determinism: scores bit-identical across invocations same test
AC11 Weights sourced from κ (not constants in scoring.ts) test('weights come from κ rule pack')
AC12 loadScoringWeights() returns 7 dimensions with bps values summing to 10000 test('default weights sum to 10000')
AC13 Forward-compat: scoreIntent('text') works (no context, no candidates) test('forward-compat: no-context call')
AC14 Forward-compat: signature stable type-level check (compiles vs. P0.5.2 fallback)
AC15 Empty candidate snapshot ⇒ Phase 0 frozen constant test('empty snapshot returns Phase 0 default')
AC16 Invalid κ rule pack ⇒ KappaRulesUnavailableError test('invalid κ source throws KappaRulesUnavailableError')
AC17 IntentScore.scores is frozen against mutation test('scores frozen against mutation')
AC18 IntentScore.winner is read-only test('winner read-only')

6. Forbidden in P1.5.1

  • Editing main checkout
  • Pushing to main
  • Force-push / --no-verify / --amend
  • Touching files outside src/domains/router/scoring.ts, src/domains/router/scoring-weights.ts, tests, + the 5 chain docs
  • Hardcoding weights as object literals in scoring.ts
  • Hardcoding the cohort (must read from mcp_model_candidates OR accept injected snapshot)
  • Removing 'claude' from ModelId
  • Implementing any adapter (P1.5.2/3/4 scope)
  • Registering MCP tools (P1.5.7 scope)
  • Wall-clock reads, randomness, file I/O beyond κ + candidate table
  • New env vars (no COLIBRI_* reads in P1.5.1)

7. Non-goals

  • λ reputation history wired into reliability_bps (Phase 2+ integration)
  • Per-call cost telemetry / metric emission (P1.5.6+)
  • Operator-supplied κ rule packs loaded from disk (Phase 1.5+)
  • Ensemble / pipeline routing modes (Phase 1.5+ adapter slices)
  • ζ Decision Trail records of routing decisions (P1.5.5+)

8. Verification (test commands)

cd .worktrees/claude/p1-5-1-scoring
npm run build && npm run lint && npm test

Expected: build green, lint green, test count ≥ 3117 with the golden-path vector + tie-break + determinism + κ-load tests passing. Zero regressions against 330f38cd.


Step 2 of 5 — contract. Packet follows at docs/packets/p1-5-1-scoring-packet.md.


Back to top

Colibri — documentation-first MCP runtime. Apache 2.0 + Commons Clause.

This site uses Just the Docs, a documentation theme for Jekyll.