P1.5.7 — router_* MCP Tools Execution Packet

1. File plan

1.1. Create src/domains/router/tools.ts

Structure (mirrors src/domains/consensus/tools.ts):

§1. JSDoc header
§2. Imports
§3. Constants (ModelId enum tuple)
§4. Zod input schemas (4)
§5. Zod output schemas (3 — router_call omits output schema per contract §3.2)
§6. Public types (ts inferences)
§7. Handlers (4 — pure functions of input)
§8. registerRouterTools(ctx)

Estimated size: ~350 LOC including JSDoc.

1.2. Create src/__tests__/domains/router/tools.test.ts

Structure (mirrors src/__tests__/domains/consensus/tools.test.ts):

§1. Imports + fixtures
§2. router_score schema + handler tests
§3. router_call schema rejection tests + direct handler tests with mocked options
§4. router_fallback schema + handler tests
§5. router_stats schema + handler tests
§6. registerRouterTools (count + duplicate guard)

Estimated size: ~400 LOC (28-30 tests).

1.3. Modify src/server.ts

Two edits:

Edit 1 — line 50 area (import block):

+ import { registerRouterTools } from './domains/router/tools.js';

Edit 2 — line 593 area (after registerConsensusTools(ctx);):

+    // P1.5.7: register δ Router MCP tools — router_score, router_call,
+    // router_fallback, router_stats. Phase 1.5 W6 graduation: first δ
+    // MCP surface (Phase 0 P0.5.1/P0.5.2 shipped library-only stubs per
+    // ADR-005). Closes ADR-004 R75 Wave H tool-surface amendment for δ.
+    // Tool count moves from 23 → 27 (Phase 0: 14, λ R89A: 4, θ R89B: 5,
+    // δ Phase 1.5: 4).
+    registerRouterTools(ctx);

1.4. Modify src/domains/router/index.ts

One edit — append export * from './tools.js'; to the barrel.

2. Implementation walk

2.1. router_score handler

export function routerScore(input: RouterScoreInput): RouterScoreOutput {
  const scoringResult = scoreIntent(input.prompt, input.context ?? {});
  const ruleVersionHash = computeScoringRuleVersionHash();
  return Object.freeze({
    scores: scoringResult.scores,
    winner: scoringResult.winner,
    rule_version_hash: ruleVersionHash,
  });
}

Notes:

  • scoreIntent’s output is already frozen; we wrap with Object.freeze for the additional rule_version_hash field.
  • Passes through to the Phase 0 fallback constant when candidatesSnapshot is absent.

2.2. router_call handler

export async function routerCall(
  input: RouterCallInput,
): Promise<RouteResult> {
  // Project the validated MCP input to RouteOptions. The validated shape
  // is already a structural subset; this projection just renames keys
  // and drops `undefined` fields.
  const options: RouteOptions = {};
  if (input.options) {
    if (input.options.maxTokens !== undefined) options.maxTokens = input.options.maxTokens;
    if (input.options.systemPrompt !== undefined) options.systemPrompt = input.options.systemPrompt;
    if (input.options.model !== undefined) options.model = input.options.model;
    if (input.options.task !== undefined) options.task = input.options.task;
    if (input.options.operatorPreference !== undefined) options.operatorPreference = input.options.operatorPreference;
  }
  return await routeRequest(input.prompt, options);
}

Notes:

  • Does NOT pass through apiKey, completionFn, etc. — those keys are not in the Zod-validated input.options shape.
  • routeRequest returns a frozen RouteResult; we pass through unchanged.

2.3. router_fallback handler

export function routerFallback(input: RouterFallbackInput): RouterFallbackOutput {
  if (input.reset === true) {
    if (input.model_id !== undefined) {
      resetCircuitBreaker(input.model_id);
    } else {
      resetCircuitBreaker();
    }
  }
  const stateMap = getCircuitBreakerState();
  const circuitState: Record<string, CircuitState> = {};
  for (const [modelId, state] of stateMap) {
    circuitState[modelId] = state;
  }
  return Object.freeze({ circuitState: Object.freeze(circuitState) });
}

Notes:

  • The reset happens BEFORE the snapshot read — caller sees post-reset state.
  • Object.fromEntries would also work; explicit loop matches the codebase style and is type-narrower (avoid implicit any).

2.4. router_stats handler

export function routerStats(_input: RouterStatsInput): RouterStatsOutput {
  return getRouterStats();
}

Notes:

  • getRouterStats() already returns the exact wire shape; pass through.
  • Input ignored (empty object).

3. Zod schemas — full text

3.1. ModelId enum schema

const MODEL_IDS = [
  'claude', 'claude-sonnet-3-5', 'claude-haiku-3-5',
  'gpt-4o', 'gpt-4o-mini', 'gemini-1-5-pro',
  'llama-3-3-70b', 'mixtral-8x22b', 'kimi-k2',
] as const;

const ModelIdSchema = z.enum(MODEL_IDS);

3.2. TaskShape schema

const TaskShapeSchema = z.object({
  domain: z.string().optional(),
  tokens: z.number().int().nonnegative().optional(),
  deadline_ms: z.number().int().nonnegative().optional(),
  skill: z.array(z.string()).optional(),
}).strict();

3.3. router_score schemas

export const RouterScoreInputSchema = z.object({
  prompt: z.string().min(1),
  context: z.object({
    task: TaskShapeSchema.optional(),
    operatorPreference: z.record(z.string(), z.number().min(0).max(1)).optional(),
  }).strict().optional(),
}).strict();

export const RouterScoreOutputSchema = z.object({
  scores: z.record(z.string(), z.number()),
  winner: ModelIdSchema,
  rule_version_hash: z.string().regex(/^[0-9a-f]{64}$/),
}).strict();

3.4. router_call schemas

export const RouterCallInputSchema = z.object({
  prompt: z.string().min(1),
  options: z.object({
    maxTokens: z.number().int().positive().optional(),
    systemPrompt: z.string().optional(),
    model: z.string().optional(),
    task: TaskShapeSchema.optional(),
    operatorPreference: z.record(z.string(), z.number().min(0).max(1)).optional(),
  }).strict().optional(),
}).strict();

// Output schema OMITTED per contract §3.2 — RouteResult.content shape is variable.

3.5. router_fallback schemas

const CircuitStateSchema = z.object({
  failures: z.number().int().nonnegative(),
  openedAt: z.number().nullable(),
}).strict();

export const RouterFallbackInputSchema = z.object({
  model_id: ModelIdSchema.optional(),
  reset: z.boolean().optional(),
}).strict();

export const RouterFallbackOutputSchema = z.object({
  circuitState: z.record(z.string(), CircuitStateSchema),
}).strict();

3.6. router_stats schemas

export const RouterStatsInputSchema = z.object({}).strict();

const RouterStatsRowSchema = z.object({
  calls_total: z.number().int().nonnegative(),
  successes: z.number().int().nonnegative(),
  failures: z.number().int().nonnegative(),
  avg_cost_usd: z.number().nonnegative(),
  p50_latency_ms: z.number().nonnegative(),
  success_rate: z.number().min(0).max(1),
}).strict();

export const RouterStatsOutputSchema = z.object({
  models: z.record(z.string(), RouterStatsRowSchema),
}).strict();

4. registerRouterTools function

export function registerRouterTools(ctx: ColibriServerContext): void {
  registerColibriTool(
    ctx,
    'router_score',
    {
      title: 'router_score',
      description:
        'Score a prompt across the 9-member δ candidate cohort and return the winning ModelId. Returns the full {scores, winner, rule_version_hash} triple. The rule_version_hash anchors the κ rule pack the scoring formula consulted, so callers can prove which version of policy produced this routing decision.',
      inputSchema: RouterScoreInputSchema,
      outputSchema: RouterScoreOutputSchema,
    },
    (input): RouterScoreOutput => routerScore(input),
  );

  registerColibriTool(
    ctx,
    'router_call',
    {
      title: 'router_call',
      description:
        'Route a prompt through the δ N-member fallback chain (P1.5.5) and return the upstream completion. The chain walks scoreIntent-ranked candidates, skips models whose circuit breakers are open, and races each attempt against COLIBRI_MODEL_TIMEOUT. Throws via MCP HANDLER_ERROR on FallbackChainExhaustedError. apiKey + injection seams are rejected at the strict-input boundary.',
      inputSchema: RouterCallInputSchema,
      // outputSchema omitted: RouteResult.content shape varies by upstream model.
    },
    async (input): Promise<RouteResult> => routerCall(input),
  );

  registerColibriTool(
    ctx,
    'router_fallback',
    {
      title: 'router_fallback',
      description:
        'Inspect or reset the δ circuit-breaker state. With {reset:true, model_id:X}, clears one model. With {reset:true} alone, clears every model. With {reset:false} or {} alone, reads the current snapshot. Returns {circuitState: Record<ModelId, {failures, openedAt}>}.',
      inputSchema: RouterFallbackInputSchema,
      outputSchema: RouterFallbackOutputSchema,
    },
    (input): RouterFallbackOutput => routerFallback(input),
  );

  registerColibriTool(
    ctx,
    'router_stats',
    {
      title: 'router_stats',
      description:
        'Snapshot per-ModelId router aggregates: calls_total, successes, failures, avg_cost_usd, p50_latency_ms, success_rate. Models never touched are absent from the response. Resets are out of scope for this tool; use the test-only resetRouterStats() in cost.ts directly.',
      inputSchema: RouterStatsInputSchema,
      outputSchema: RouterStatsOutputSchema,
    },
    (_input): RouterStatsOutput => routerStats(_input),
  );
}

5. Test plan — 28 tests

5.1. router_score (7 tests)

  1. Empty-candidate fallback returns {claude: 1.0, winner: 'claude'} shape.
  2. Output includes rule_version_hash as 64 lowercase hex chars.
  3. Same input → same output (determinism).
  4. Schema rejects empty prompt ("").
  5. Schema rejects extra key in input (strict mode).
  6. Schema rejects negative task.tokens.
  7. Output passes the output schema.

5.2. router_call (5 tests — schema-only direct-handler)

  1. Schema accepts {prompt: "hi"} (no options).
  2. Schema accepts valid options (subset of RouteOptions).
  3. Schema REJECTS {prompt: "hi", options: {apiKey: "x"}} (forbidden key).
  4. Schema REJECTS {prompt: "hi", options: {completionFn: () => null}} (forbidden key).
  5. Schema REJECTS {prompt: "", ...} (min(1) constraint).

5.3. router_fallback (6 tests)

  1. Empty initial state → {circuitState: {}}.
  2. After 3 failures on gpt-4o (via recordFailure test-API) → circuitState['gpt-4o'].openedAt !== null.
  3. {reset: true, model_id: 'gpt-4o'} clears that model’s state.
  4. {reset: true} (no model_id) clears all models.
  5. Read-only call (no reset) does NOT mutate state.
  6. Schema rejects unknown model_id (e.g. 'made-up').

5.4. router_stats (5 tests)

  1. Empty initial state → {models: {}}.
  2. After one success record → models['claude'].calls_total === 1.
  3. After one success + one failure → success_rate === 0.5 for that model.
  4. Schema rejects extra keys (strict mode).
  5. Output passes the output schema.

5.5. registerRouterTools (2 tests)

  1. After call, ctx._registeredToolNames contains all 4 names; size is exactly 4 (on a fresh ctx).
  2. Second call on same ctx throws.

5.6. Cross-cutting (3 tests)

  1. All 4 handlers return frozen-at-top-level outputs.
  2. Source file tools.ts does NOT contain apiKey, completionFn, completionFnRegistry, scoringFn, fetchFn, delayFn, nowFn in any input schema (compile-time grep test).
  3. MODEL_IDS tuple has exactly 9 entries (matches ModelId union).

6. Build / lint / test gate

npm run build && npm run lint && npm test

Expected delta:

  • TypeScript build: clean (new module, no breaking edits).
  • Lint: clean (matches existing patterns).
  • Tests: +28 → 3325 + 28 = ~3353 (baseline 3325 on cf6221c9).
  • Server test (if it asserts global count) will need a bump 23 → 27 if such an assertion exists.

7. Pre-existing flakes (carry through, do not fix in this slice)

  • consensus/parity-harness G7.1 perf budget — CI-load sensitive, retry-clean
  • reputation/tools.test.ts parallel-migration prefix race — retry-clean
  • kimi.test.ts ● injection seams › 7. latency measurement — timer-imprecision under CI load (introduced by P1.5.2 W3)

Document these in the verification doc as known follow-ups.

8. Forbiddens checklist

  • No edits to scoring.ts, fallback.ts, circuit.ts, cost.ts, adapters/*.ts, scoring-weights.ts.
  • No κ rule edits.
  • No AMS_* env vars (only COLIBRI_*).
  • No apiKey or injection seams in MCP input schemas.
  • No internal thought_record emission (P1.5.10 scope).
  • No δ frontmatter graduation (stays partial).
  • No new tools beyond the 4 named.
  • No agent_spawn or any other deferred Phase 1.5+ tool.
  • No interactive cmds (git rebase -i, git add -i).
  • No edits to main checkout.

9. Implementation order

  1. Write src/domains/router/tools.ts (handlers + schemas + register).
  2. Edit src/domains/router/index.ts — append re-export.
  3. Edit src/server.ts — add import + add registerRouterTools(ctx); call in bootstrap.
  4. Write src/__tests__/domains/router/tools.test.ts.
  5. Run npm run build && npm run lint && npm test.
  6. Fix any issues (especially if server.test.ts asserts a fixed tool count).
  7. Step 5 verify doc.

10. Rollback plan

If npm run build fails: the new file is self-contained; remove the registerRouterTools import + call from src/server.ts and the build is green. The new file can stay (it’s unused without the call site) until the bug is fixed.

If npm test fails on a flaky existing test: retry once; if flake confirmed (consensus parity, reputation prefix, kimi latency timer), document as known and proceed.

If npm test fails on a new test: fix the handler / schema, do NOT skip.


Back to top

Colibri — documentation-first MCP runtime. Apache 2.0 + Commons Clause.

This site uses Just the Docs, a documentation theme for Jekyll.