P1.5.7 — router_* MCP Tools Verification

1. Gates

| Gate | Command | Result |
| --- | --- | --- |
| Build | npm run build | PASS: no TypeScript errors |
| Lint | npm run lint | PASS: no ESLint errors |
| Test | npm test | PASS: 3353/3353 tests across 77 suites |

Baseline on cf6221c9 (post-P1.5.6): 3325 tests across 76 suites. Delta after P1.5.7: +28 tests and +1 suite, both from the new src/__tests__/domains/router/tools.test.ts suite (3325 + 28 = 3353 tests, 76 + 1 = 77 suites).

2. Files changed

docs/audits/p1-5-7-mcp-tools-audit.md             (new, 241 lines)
docs/contracts/p1-5-7-mcp-tools-contract.md       (new, 315 lines)
docs/packets/p1-5-7-mcp-tools-packet.md           (new, 403 lines)
docs/verification/p1-5-7-mcp-tools-verification.md (new, this file)
src/domains/router/tools.ts                        (new, 502 lines)
src/__tests__/domains/router/tools.test.ts        (new, 339 lines)
src/domains/router/index.ts                        (+1 line — barrel re-export)
src/server.ts                                      (+2 lines import, +14 lines register call)

3. MCP tool surface count proof

3.1. Pre-P1.5.7 surface (23 tools on main cf6221c9)

| Wave | Domain | Tools |
| --- | --- | --- |
| Phase 0 P0.2.1 / P0.7 / P0.6 / P0.3 / P0.8 | α / ε / ζ / β / η | server_ping, server_health, thought_record, thought_record_list, audit_verify_chain, skill_list, task_create, task_get, task_update, task_list, task_next_actions, audit_session_start, merkle_finalize, merkle_root |
| Total (Phase 0) | | 14 |
| R89 Phase A (P2.5.1) | λ | reputation_get, reputation_history, reputation_leaderboard, reputation_check_gates |
| Total (R89A) | | +4 = 18 |
| R89 Phase B (P3.7.1) | θ | consensus_propose, consensus_vote, consensus_finality, consensus_gossip, vrf_eval |
| Total (R89B) | | +5 = 23 |

3.2. Post-P1.5.7 surface (27 tools)

| Wave | Domain | Tools |
| --- | --- | --- |
| P1.5.7 (this slice) | δ | router_score, router_call, router_fallback, router_stats |
| Total (Phase 1.5 W6) | | +4 = 27 |
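The totals in 3.1 and 3.2 can be re-derived mechanically. The sketch below tallies the tool names listed above per wave. The surface object, its keys, and countTools are hypothetical names used only for illustration; they are not taken from src/server.ts.

```typescript
// Tool names copied from the tables in 3.1 and 3.2. The grouping object
// itself is a hypothetical illustration, not code from the repository.
const surface: Record<string, string[]> = {
  phase0: [
    "server_ping", "server_health", "thought_record", "thought_record_list",
    "audit_verify_chain", "skill_list", "task_create", "task_get",
    "task_update", "task_list", "task_next_actions", "audit_session_start",
    "merkle_finalize", "merkle_root",
  ],
  r89a: ["reputation_get", "reputation_history", "reputation_leaderboard", "reputation_check_gates"],
  r89b: ["consensus_propose", "consensus_vote", "consensus_finality", "consensus_gossip", "vrf_eval"],
  p157: ["router_score", "router_call", "router_fallback", "router_stats"],
};

// Sum the number of tools registered by the named waves.
function countTools(waves: string[]): number {
  let total = 0;
  for (let i = 0; i < waves.length; i++) {
    total += surface[waves[i]].length;
  }
  return total;
}

const preP157 = countTools(["phase0", "r89a", "r89b"]);
const postP157 = countTools(["phase0", "r89a", "r89b", "p157"]);
console.log(preP157, postP157);
```

The two values correspond to the pre-slice surface (14 + 4 + 5) and the post-slice surface (14 + 4 + 5 + 4).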

3.3. Registration call site

src/server.ts:594 (after registerConsensusTools(ctx);):

// P1.5.7: register δ Router MCP tools — router_score, router_call,
// router_fallback, router_stats. Phase 1.5 W6 graduation: first δ
// MCP surface (Phase 0 P0.5.1/P0.5.2 shipped library-only stubs per
// ADR-005). Closes ADR-004 R75 Wave H tool-surface amendment for δ.
// Tool count moves from 23 → 27 (Phase 0: 14, λ R89A: 4, θ R89B: 5,
// δ Phase 1.5: 4). Underlying handlers wrap REAL functions
// (scoreIntent / routeRequest / getCircuitBreakerState /
// resetCircuitBreaker / getRouterStats); no stubs. apiKey +
// injection seams (completionFn, fetchFn, scoringFn, delayFn,
// logger, nowFn) are rejected at the strict-Zod input boundary;
// secrets come from COLIBRI_*_API_KEY env vars only.
registerRouterTools(ctx);

4. Tool-by-tool acceptance verification

4.1. router_score

| AC | Status | Evidence |
| --- | --- | --- |
| Wraps scoreIntent(prompt, context ?? {}) | PASS | tools.ts:300-321 |
| Returns {scores, winner, rule_version_hash} | PASS | Output type at tools.ts:266, schema at tools.ts:209-216 |
| Zod input strict + min(1) on prompt | PASS | tools.ts:138-150 |
| Output schema rule_version_hash validates ^sha256:[0-9a-f]{64}$ | PASS | tools.ts:213 |
| Test 1: empty-candidate path returns winner: 'claude' | PASS | Test √ 1 |
| Test 2: rule_version_hash matches sha256 pattern | PASS | Test √ 2 |
| Test 3: determinism | PASS | Test √ 3 |
| Test 4: empty prompt rejected | PASS | Test √ 4 |
| Test 5: extra key rejected (strict) | PASS | Test √ 5 |
| Test 6: negative tokens rejected | PASS | Test √ 6 |
| Test 7: output passes output schema | PASS | Test √ 7 |
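To make the input and output checks concrete, here is a minimal, dependency-free sketch of the behaviour the table describes: unknown keys rejected, empty prompt rejected, and the rule_version_hash pattern enforced. The real module uses Zod schemas; validateScoreInput, ValidationResult, and sampleHex are hypothetical names, and this hand-rolled validator only mirrors the described behaviour rather than reproducing tools.ts.

```typescript
// Pattern stated in the acceptance table for rule_version_hash.
const RULE_VERSION_HASH = /^sha256:[0-9a-f]{64}$/;

type ValidationResult = { ok: true } | { ok: false; reason: string };

// Hypothetical hand-rolled equivalent of a strict object schema with a
// non-empty `prompt` and an optional `context` object.
function validateScoreInput(input: Record<string, unknown>): ValidationResult {
  const allowed = ["prompt", "context"];
  const keys = Object.keys(input);
  for (let i = 0; i < keys.length; i++) {
    // Strict mode: any key outside the allow-list fails validation.
    if (allowed.indexOf(keys[i]) === -1) {
      return { ok: false, reason: "unrecognized key: " + keys[i] };
    }
  }
  const prompt = input["prompt"];
  if (typeof prompt !== "string" || prompt.length < 1) {
    return { ok: false, reason: "prompt must be a non-empty string" };
  }
  const context = input["context"];
  if (context !== undefined && (typeof context !== "object" || context === null)) {
    return { ok: false, reason: "context must be an object when present" };
  }
  return { ok: true };
}

// Build a syntactically valid digest body: 64 lowercase hex characters.
let sampleHex = "";
for (let i = 0; i < 64; i++) sampleHex += "a";

console.log(validateScoreInput({ prompt: "summarize this" }).ok); // accepted
console.log(validateScoreInput({ prompt: "" }).ok);               // rejected: empty prompt
console.log(validateScoreInput({ prompt: "x", apiKey: "k" }).ok); // rejected: extra key
console.log(RULE_VERSION_HASH.test("sha256:" + sampleHex));       // matches
console.log(RULE_VERSION_HASH.test(sampleHex));                   // no prefix, no match
```

The same allow-list idea is what keeps apiKey and the injection seams out of router_call: a strict schema fails on any key it does not declare.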

4.2. router_call

| AC | Status | Evidence |
| --- | --- | --- |
| Wraps routeRequest(prompt, options) | PASS | tools.ts:340-371 |
| Returns full RouteResult (including costUsd, modelsAttempted) | PASS | Return type RouterCallOutput = RouteResult at tools.ts:269 |
| Zod schema rejects apiKey | PASS | Test √ 10 |
| Zod schema rejects completionFn | PASS | Test √ 11 |
| Zod schema rejects empty prompt | PASS | Test √ 12 |
| Zod schema accepts valid options subset | PASS | Tests √ 8, √ 9 |
| Output schema OMITTED (variable shape) | NOTE | Documented in contract §3.2 and source §5 |
| FallbackChainExhaustedError propagates to MCP HANDLER_ERROR | PASS by inheritance | Middleware Stage-4 catch wraps any throw (src/server.ts:361-374); error.message preserves attempted-model list (fallback.ts:312-323) |
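The "PASS by inheritance" row relies on a generic catch stage rather than router-specific code. The following is a simplified, synchronous sketch of that idea, assuming the middleware converts any thrown error into a HANDLER_ERROR response and keeps its message. The ToolResponse shape, runHandler, and the error class body are hypothetical stand-ins; the actual middleware in src/server.ts and the class in fallback.ts may differ.

```typescript
// Hypothetical stand-in for the error class named in the table.
class FallbackChainExhaustedError extends Error {
  constructor(attempted: string[]) {
    super("fallback chain exhausted; attempted: " + attempted.join(", "));
    this.name = "FallbackChainExhaustedError";
  }
}

type ToolResponse =
  | { ok: true; data: unknown }
  | { ok: false; code: "HANDLER_ERROR"; message: string };

// Hypothetical catch stage: any throw from the handler becomes a
// HANDLER_ERROR response that preserves the original message.
function runHandler(handler: () => unknown): ToolResponse {
  try {
    return { ok: true, data: handler() };
  } catch (err) {
    const message = err instanceof Error ? err.message : String(err);
    return { ok: false, code: "HANDLER_ERROR", message: message };
  }
}

// Model ids here are illustrative only.
const failed = runHandler(function () {
  throw new FallbackChainExhaustedError(["claude", "kimi"]);
});
console.log(failed);
```

Because the catch is generic, the router tools need no error-mapping code of their own; the attempted-model list reaches the caller through the message.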

4.3. router_fallback

| AC | Status | Evidence |
| --- | --- | --- |
| Wraps getCircuitBreakerState() | PASS | tools.ts:376-393 |
| Wraps resetCircuitBreaker(modelId?) | PASS | tools.ts:378-384 |
| {reset: true, model_id: X} clears one model | PASS | Test √ 15 |
| {reset: true} clears all models | PASS | Test √ 16 |
| Read-only call does NOT mutate state | PASS | Test √ 17 |
| Zod schema rejects unknown model_id | PASS | Test √ 18 |
| Output is plain object (not Map) | PASS | Test √ 13, √ 14 (uses toEqual({...})) |
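The read, reset-one, and reset-all semantics above can be sketched as follows. This assumes an in-memory Map keyed by model id and assumes a reset removes the entry; whether the real circuit.ts deletes or zeroes entries is not stated in this report. All names (breakers, snapshotBreakers, resetBreaker, routerFallbackSketch) are hypothetical.

```typescript
type BreakerEntry = { failures: number; openedAt: number | null };

// Hypothetical in-memory breaker store keyed by model id.
const breakers = new Map<string, BreakerEntry>();

// Copy the Map into a plain object so the result serializes as JSON.
function snapshotBreakers(): Record<string, BreakerEntry> {
  const out: Record<string, BreakerEntry> = {};
  breakers.forEach(function (entry, modelId) {
    out[modelId] = { failures: entry.failures, openedAt: entry.openedAt };
  });
  return out;
}

// Clear one model when modelId is given, otherwise clear every model.
function resetBreaker(modelId?: string): void {
  if (modelId === undefined) {
    breakers.clear();
  } else {
    breakers.delete(modelId);
  }
}

// Mirrors the {model_id?, reset?} input: mutate only when reset is true,
// then always return the current state.
function routerFallbackSketch(input: { model_id?: string; reset?: boolean }) {
  if (input.reset === true) resetBreaker(input.model_id);
  return { circuitState: snapshotBreakers() };
}

// Model ids and values are illustrative only.
breakers.set("claude", { failures: 3, openedAt: 1000 });
breakers.set("kimi", { failures: 1, openedAt: null });

const readOnly = routerFallbackSketch({});
const afterOne = routerFallbackSketch({ reset: true, model_id: "claude" });
const afterAll = routerFallbackSketch({ reset: true });
console.log(readOnly, afterOne, afterAll);
```

Returning a fresh plain object on every call also means a caller cannot mutate the breaker store through the response.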

4.4. router_stats

| AC | Status | Evidence |
| --- | --- | --- |
| Wraps getRouterStats() | PASS | tools.ts:401-407 |
| Returns {models: Record<ModelId, RouterStats>} | PASS | Test √ 23 |
| Empty state → {models: {}} (sparse) | PASS | Test √ 19 |
| After success record → calls_total = 1 | PASS | Test √ 20 |
| After success+failure → success_rate = 0.5 | PASS | Test √ 21 |
| Zod input strict (rejects extras) | PASS | Test √ 22 |
| Output passes output schema | PASS | Test √ 23 |
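The sparse-state and success_rate rows follow from a simple per-model aggregation. Below is a minimal sketch, assuming calls are logged per model and aggregated on read; it omits p50_latency_ms for brevity. The names callLog, recordCall, and getStatsSketch are hypothetical and do not describe how getRouterStats is actually implemented.

```typescript
type CallRecord = { ok: boolean; costUsd: number };
type ModelStats = {
  calls_total: number;
  successes: number;
  failures: number;
  success_rate: number;
  avg_cost_usd: number;
};

// Hypothetical in-memory call log keyed by model id.
const callLog: Record<string, CallRecord[]> = {};

function recordCall(modelId: string, rec: CallRecord): void {
  if (callLog[modelId] === undefined) callLog[modelId] = [];
  callLog[modelId].push(rec);
}

// Sparse aggregation: a model appears only once it has at least one record,
// so an empty log yields {models: {}}.
function getStatsSketch(): { models: Record<string, ModelStats> } {
  const models: Record<string, ModelStats> = {};
  const ids = Object.keys(callLog);
  for (let i = 0; i < ids.length; i++) {
    const recs = callLog[ids[i]];
    let successes = 0;
    let cost = 0;
    for (let j = 0; j < recs.length; j++) {
      if (recs[j].ok) successes++;
      cost += recs[j].costUsd;
    }
    models[ids[i]] = {
      calls_total: recs.length,
      successes: successes,
      failures: recs.length - successes,
      success_rate: successes / recs.length,
      avg_cost_usd: cost / recs.length,
    };
  }
  return { models: models };
}

const emptyStats = getStatsSketch();
recordCall("claude", { ok: true, costUsd: 0.25 });
const afterSuccess = getStatsSketch();
recordCall("claude", { ok: false, costUsd: 0.25 });
const afterFailure = getStatsSketch();
console.log(emptyStats, afterSuccess, afterFailure);
```

Keeping the result sparse avoids a division by zero for models that have never been called.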

4.5. registerRouterTools

| AC | Status | Evidence |
| --- | --- | --- |
| Adds exactly 4 names | PASS | Test √ 24 |
| Idempotence guard throws on second call | PASS | Test √ 25 |
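An idempotence guard of this kind can be as small as a flag checked before registration. The sketch below assumes the guard is tracked on the context object; the ServerCtx shape, the flag name, and the error message are hypothetical, and the real registerRouterTools may track registration differently.

```typescript
// Hypothetical context shape; the real ctx type is not shown in this report.
type ServerCtx = { toolNames: string[]; routerToolsRegistered: boolean };

const ROUTER_TOOL_NAMES = ["router_score", "router_call", "router_fallback", "router_stats"];

// Register the four router tools once; a second call on the same context throws.
function registerRouterToolsSketch(ctx: ServerCtx): void {
  if (ctx.routerToolsRegistered) {
    throw new Error("router tools already registered");
  }
  for (let i = 0; i < ROUTER_TOOL_NAMES.length; i++) {
    ctx.toolNames.push(ROUTER_TOOL_NAMES[i]);
  }
  ctx.routerToolsRegistered = true;
}

const sketchCtx: ServerCtx = { toolNames: [], routerToolsRegistered: false };
registerRouterToolsSketch(sketchCtx);

// The second call must fail without adding duplicate names.
let secondCallThrew = false;
try {
  registerRouterToolsSketch(sketchCtx);
} catch (err) {
  secondCallThrew = true;
}
console.log(sketchCtx.toolNames.length, secondCallThrew);
```

Throwing instead of silently returning surfaces a double-registration bug at startup rather than leaving it hidden.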

4.6. Cross-cutting

| AC | Status | Evidence |
| --- | --- | --- |
| Outputs frozen | PASS | Test √ 26 |
| No forbidden keys in input schemas (source grep) | PASS | Test √ 27 |
| MODEL_IDS tuple length 9 matches ModelId union | PASS | Test √ 28 |
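For the "Outputs frozen" row, note that Object.freeze is shallow, so nested objects such as a scores map stay mutable unless they are frozen as well. The helper below is a generic recursive freeze, offered as a sketch; this report does not say whether tools.ts freezes outputs deeply or only at the top level, and deepFreeze is a hypothetical name.

```typescript
// Recursively freeze an object graph. Already-frozen objects are skipped,
// which also stops the recursion on cyclic references.
function deepFreeze<T>(value: T): T {
  if (typeof value === "object" && value !== null && !Object.isFrozen(value)) {
    Object.freeze(value);
    const record = value as unknown as Record<string, unknown>;
    const keys = Object.keys(record);
    for (let i = 0; i < keys.length; i++) {
      deepFreeze(record[keys[i]]);
    }
  }
  return value;
}

// Illustrative output shape with one nested object.
const frozenOutput = deepFreeze({ scores: { claude: 0.9 }, winner: "claude" });
console.log(Object.isFrozen(frozenOutput), Object.isFrozen(frozenOutput.scores));
```

A test can then assert Object.isFrozen on the result and on each nested object, which does not depend on whether the runtime is in strict mode.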

5. Forbiddens check

All forbiddens listed in the round prompt and in the slice file’s FORBIDDENS section were checked; none was violated:

| Forbidden | Status |
| --- | --- |
| Edit main checkout (E:\AMS) | NOT VIOLATED: all edits in .worktrees/claude/p1-5-7-mcp-tools |
| Push to main | NOT VIOLATED: push step uses feature/p1-5-7-mcp-tools |
| Force push | NOT VIOLATED: no force push |
| --no-verify or --amend | NOT VIOLATED: 5 sequential commits |
| Edit underlying router function | NOT VIOLATED: scoring.ts, fallback.ts, circuit.ts, cost.ts, adapters/* untouched |
| Edit any adapter (W3 territory) | NOT VIOLATED |
| Use Zod v4 | NOT VIOLATED: uses Zod v3.23 surface (.strict(), .safeParse()) |
| AMS_* env vars | NOT VIOLATED: only COLIBRI_* referenced |
| Implement parity tests (P1.5.8) | NOT VIOLATED: scope limited to schema + handler tests |
| ζ integration (P1.5.10) | NOT VIOLATED: no thought_record call in this module |
| Promote δ frontmatter partial → complete | NOT VIOLATED: no frontmatter edits |
| Invent tools beyond 4 | NOT VIOLATED: exactly 4 tools registered |
| Accept apiKey in MCP input | NOT VIOLATED: strict schemas reject it (Test √ 10) |
| Accept injection seams (completionFn / scoringFn / fetchFn / logger / delayFn / nowFn) | NOT VIOLATED: strict schemas reject them (Test √ 11, √ 27 static grep) |

6. Known pre-existing flakes (out of scope)

These flakes were documented in the round prompt. Only the first was observed during this round’s test runs; the other two are listed for completeness:

  1. consensus/parity-harness G7.1 perf budget — CI-load sensitive, retry-clean.
    • Hit once during the initial full-suite run (6363 ms > 5000 ms budget).
    • Retry was clean (1637 ms).
    • Out of P1.5.7 scope.
  2. reputation/tools.test.ts parallel-migration prefix race — retry-clean.
    • Not observed in this round’s runs.
    • Out of P1.5.7 scope.
  3. kimi.test.ts ● injection seams › 7. latency measurement: 50ms delay → latencyMs >= 50 — timer-imprecision under CI load (introduced by P1.5.2 W3).
    • Not observed in this round’s runs.
    • Out of P1.5.7 scope; documented as candidate for P1.5.8 parity suite to address.

7. Test tail

Final clean full-suite run:

Test Suites: 77 passed, 77 total
Tests:       3353 passed, 3353 total
Snapshots:   0 total
Time:        38.999 s, estimated 53 s
Ran all test suites.

8. Tool summary table

| Tool | Input | Output | Wraps |
| --- | --- | --- | --- |
| router_score | {prompt: string≥1, context?: {task?, operatorPreference?}} | {scores: Record<string,number>, winner: ModelId, rule_version_hash: sha256:hex64} | scoreIntent + computeScoringRuleVersionHash |
| router_call | {prompt: string≥1, options?: {maxTokens?, systemPrompt?, model?, task?, operatorPreference?}} | RouteResult (full shape from fallback.ts:232-242, includes costUsd, modelsAttempted) | routeRequest |
| router_fallback | {model_id?: ModelId, reset?: boolean} | {circuitState: Record<ModelId, {failures: number, openedAt: number\|null}>} | getCircuitBreakerState + optional resetCircuitBreaker |
| router_stats | {} | {models: Record<ModelId, {calls_total, successes, failures, avg_cost_usd, p50_latency_ms, success_rate}>} | getRouterStats |

9. Compliance with slice file (P1.5.7 § “Ready-to-paste agent prompt”)

| Slice requirement | Status |
| --- | --- |
| Export 4 MCP tool factories | PASS |
| router_score Zod input {prompt, context?} | PASS |
| router_call Zod input mirrors RouteOptions (apiKey NOT accepted) | PASS: apiKey + injection seams rejected |
| router_fallback Zod input {model_id?, reset?} | PASS |
| router_stats Zod input {} | PASS |
| Tool count 14 → 18 (slice file wording, predated R89 A+B) | EFFECTIVELY: 23 → 27. The slice was written before R89 closed and is stale. Net effect on δ axis is +4 tools, matching slice intent. |
| Tests: Zod validation + bad-input rejection + ζ emission | PASS for Zod + bad-input. ζ emission deferred to P1.5.10 per round prompt’s explicit forbidden. |
| Single global completion callable rejection from MCP | PASS (Test √ 11) |
| npm run build && npm run lint && npm test green | PASS |

10. Deviations from slice / contract

  1. router_call output schema OMITTED: contract §3.2 documents this — RouteResult.content is variable across upstream adapters. The handler stays type-safe at the TS level. This deviates from the consensus exemplar (which includes output schemas on all 5 tools).

  2. rule_version_hash format: contract §2.2 initially specified ^[0-9a-f]{64}$. The live computeVersionHash implementation prefixes with sha256:, so the schema (and Test 2) was adjusted to ^sha256:[0-9a-f]{64}$. Documented in §4.1 above.

  3. ζ emission deferred: The slice file’s acceptance criterion “All 4 tools emit a thought_record (type ‘decision’)” is explicitly deferred to P1.5.10 per the round prompt’s “ζ integration (P1.5.10 scope)” forbidden. The middleware’s audit-enter/audit-exit stages already wrap every tool call; P1.5.10’s SQLite-backed audit sink will produce the thought_record entries without further router-tools edits.

  4. tools param omitted from router_call: Slice file mentions tools in router_call’s RouteOptions mirror. The Anthropic tool-shape pass-through is too complex to validate at the MCP boundary without re-vendoring the schema; deferred to a future round (forward-compatible additive change).

  5. Slice’s “14 → 18” tool-count wording is stale: The slice file was authored before R89 Phase A+B added 9 tools (4 λ + 5 θ). Pre-P1.5.7 surface is actually 23 (14 + 4 λ + 5 θ); post-P1.5.7 is 27. The net δ contribution (+4 tools) matches slice intent.



Colibri — documentation-first MCP runtime. Apache 2.0 + Commons Clause.
