P2.5.1 — Reputation Query MCP Tools — Execution Packet

Slice: p2-5-1-tools — closes λ Phase 2 at 7/7
Audit: docs/audits/p2-5-1-tools-audit.md @ f8dc2f62
Contract: docs/contracts/p2-5-1-tools-contract.md @ 2bd55de0
Base: origin/main @ 618b1a13

1. File plan

1.1 NEW: `src/domains/reputation/tools.ts` (~280 lines)

Top-of-file docblock cites: audit / contract / packet, source prompt §P2.5.1, selectReputation (P2.1.1), apply_decay/apply_decay_batch (P2.2.1), can_arbitrate/can_govern/max_parallel_tasks/rate_limit_bonus/stake_discount (P2.4.1), BPS_100_PERCENT (κ P1.1.3).

Imports (NodeNext .js suffix throughout):

import type Database from 'better-sqlite3';
import { z } from 'zod';
import { getDb } from '../../db/index.js';
import { registerColibriTool, type ColibriServerContext } from '../../server.js';
import {
  DOMAINS, DomainSchema,
  selectHistory, selectReputation,
  type Domain, type ReputationHistoryRow, type ReputationRow,
} from './schema.js';
import { apply_decay, apply_decay_batch } from './decay.js';
import {
  can_arbitrate, can_govern, max_parallel_tasks,
  rate_limit_bonus, stake_discount,
} from './limits.js';
import { BPS_100_PERCENT } from '../rules/bps-constants.js';

Sections:

§A Constants — DEFAULT_HISTORY_LIMIT = 50, MAX_HISTORY_LIMIT = 500, DEFAULT_LEADERBOARD_LIMIT = 100, MAX_LEADERBOARD_LIMIT = 1000, LEADERBOARD_OVERSHOOT_CAP = 200.
§B Zod input schemas — 4 .strict() objects matching contract §2.1, §3.1, §4.1, §5.1.
§C Public input/output type exports — matching contract §9.
§D Handler reputationGet(db, input) — synchronous; single-domain vs all-domain branch; decay applied.
§E Handler reputationHistory(db, input) — calls selectHistory(db, node_id, domain, { limit, offset }).
§F Handler reputationLeaderboard(db, input) — overshoot SELECT, batch decay, re-sort, slice.
§G Handler reputationCheckGates(db, input) — composes P2.4.1 derivations with selectReputation(db, node_id) and missing-domain fallback rows.
§H registerReputationTools(ctx) — 4 registerColibriTool calls.
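
The decay-on-read behaviour of §D can be sketched without touching the database. This is a minimal illustration, not the handler itself: `RepRow`, `readWithDecay`, and `toyDecay` are hypothetical names, the `'execution'` domain label is an assumption, and the decay function is injected so the sketch does not guess at the formula inside `apply_decay`.

```typescript
// Hypothetical row shape for illustration; the real ReputationRow is defined in schema.ts.
interface RepRow {
  node_id: string;
  domain: string;
  score: number; // basis points, 0..10000
  last_activity_epoch: number;
}

// Decay is passed in so the sketch does not guess at apply_decay's formula.
type DecayFn = (score: number, elapsedEpochs: number) => number;

function readWithDecay(row: RepRow | null, currentEpoch: number, decay: DecayFn): RepRow | null {
  if (row === null) return null; // absent row -> null, as in the single-domain test
  const elapsed = currentEpoch - row.last_activity_epoch;
  // Short-circuit on clock skew or same-epoch reads: the stored score is returned as-is.
  const score = elapsed <= 0 ? row.score : decay(row.score, elapsed);
  // Build a new object so the stored row, including last_activity_epoch, is never mutated.
  return {
    node_id: row.node_id,
    domain: row.domain,
    score: score,
    last_activity_epoch: row.last_activity_epoch,
  };
}

// Toy decay for demonstration only: lose 1 bps per elapsed epoch, floored at 0.
const toyDecay: DecayFn = (score, elapsed) => Math.max(0, score - elapsed);

const stored: RepRow = { node_id: 'n1', domain: 'execution', score: 8000, last_activity_epoch: 100 };
const sameEpoch = readWithDecay(stored, 100, toyDecay); // same epoch: score stays 8000
const later = readWithDecay(stored, 196, toyDecay); // toy decay over 96 epochs
const skewed = readWithDecay(stored, 90, toyDecay); // current_epoch behind the row
```

Returning a fresh object on every path is what lets the "does NOT mutate last_activity_epoch" test assert against the stored row directly.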

1.2 EDIT: `src/server.ts` (1 import + 1 call ≈ +2 lines)

// After existing P0.8.3 merkle import block
import { registerReputationTools } from './domains/reputation/tools.js';

Then, inside bootstrap(), directly after the registerMerkleTools(ctx) call:

// P2.5.1: register λ Reputation read-only query tools (reputation_get,
// reputation_history, reputation_leaderboard, reputation_check_gates).
// Closes λ Phase 2 at 7/7 — first λ MCP surface. Handlers lazy-resolve
// getDb() at call-time; DB opened in Phase 2 before any call arrives.
registerReputationTools(ctx);

1.3 NEW: `src/__tests__/domains/reputation/tools.test.ts` (~450 lines)

Test posture mirrors src/__tests__/domains/reputation/schema.test.ts:

Per-test temp os.tmpdir() paths via randomUUID().
afterEach calls closeDb() and recursively removes the temp dirs, catching removal errors caused by Windows WAL file locks.
Real SQLite via initDb(dbPath) (applies migration 007 + everything earlier).
Direct insertion via SQL for setup (mimics schema.test.ts §4 setup).

Suite breakdown (matches contract §10 AC-1 to AC-9, plus type-matrix coverage):

describe('reputation_get') — 6 tests
- single-domain decay (insert epoch=100 score=8000 → read epoch=100 → 8000; read epoch=196 → matches decay(8000n, 500n, 96n))
- single-domain, row absent → null
- all-domains, 5 rows present → length 5 ordered by domain
- all-domains, 0 rows → empty array
- decay short-circuit when current_epoch ≤ last_activity_epoch (clock skew)
- does NOT mutate last_activity_epoch — assert original value preserved
describe('reputation_history') — 5 tests
- pagination: 100 events → page 1 = 50 ordered DESC, page 2 next 50
- default limit (50)
- empty history → []
- max-cap respected (limit=500 accepted)
- oversized limit (501) rejected by Zod
describe('reputation_leaderboard') — 4 tests
- 10 nodes, decay disabled → top 3 by score DESC
- decay flips ordering (older last_activity_epoch decays more)
- tie-break by node_id ASC
- default limit (100) returns all 10
describe('reputation_check_gates') — 4 tests
- happy path — rep_arb=5000, rep_exec=3000, rep_gov=4000 → all gates true
- banned arbitration → can_arbitrate=false even at threshold score
- missing node — all gates at zero-rep defaults
- cross-domain — rep_arb=5000 but rep_exec=2999 → can_arbitrate=false
describe('Zod rejections') — 6 tests
- bad domain string in each tool
- negative offset
- limit < 1
- missing node_id
- negative current_epoch
- extra unknown keys via .strict()
describe('read-only invariant') — 2 tests
- row count + history count unchanged after every tool invocation
- source file grep: no INSERT/UPDATE/DELETE SQL in tools.ts
describe('registerReputationTools') — 1 test
- registers 4 names, all present in ctx._registeredToolNames

Total target: 28 tests.
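
The pagination cases in the `reputation_history` suite can be hand-checked with a dependency-free sketch. It assumes the §A constants, a default offset of 0, and a hand-rolled `resolvePage` standing in for the Zod bounds; `resolvePage` and `pageHistory` are hypothetical names, and the real handler delegates to `selectHistory`.

```typescript
const DEFAULT_HISTORY_LIMIT = 50;
const MAX_HISTORY_LIMIT = 500;

interface PageInput {
  limit?: number;
  offset?: number;
}

// Hand-rolled stand-in for the Zod bounds: integer limit in [1, 500], integer offset >= 0.
function resolvePage(input: PageInput): { limit: number; offset: number } {
  const limit = input.limit === undefined ? DEFAULT_HISTORY_LIMIT : input.limit;
  const offset = input.offset === undefined ? 0 : input.offset; // default offset is an assumption
  if (Math.floor(limit) !== limit || limit < 1 || limit > MAX_HISTORY_LIMIT) {
    throw new RangeError('limit must be an integer in [1, ' + MAX_HISTORY_LIMIT + ']');
  }
  if (Math.floor(offset) !== offset || offset < 0) {
    throw new RangeError('offset must be a non-negative integer');
  }
  return { limit: limit, offset: offset };
}

// Takes events already ordered newest-first and returns one page of them.
function pageHistory(eventsDesc: number[], input: PageInput): number[] {
  const page = resolvePage(input);
  return eventsDesc.slice(page.offset, page.offset + page.limit);
}

// 100 events identified by epoch, newest first (epoch 100 down to epoch 1).
const events: number[] = [];
for (let epoch = 100; epoch >= 1; epoch--) events.push(epoch);

const firstPage = pageHistory(events, {}); // default limit, offset 0
const secondPage = pageHistory(events, { offset: 50 }); // the next 50
```

With 100 events, the first page covers epochs 100 down to 51 and the second covers 50 down to 1, which is the split the pagination test asserts.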

1.4 Files NOT changed

src/domains/reputation/schema.ts (P2.1.1) — no change; readers already return the right shapes
src/domains/reputation/compute.ts (P2.1.2) — no change; not used by P2.5.1
src/domains/reputation/decay.ts (P2.2.1) — no change; apply_decay already pure
src/domains/reputation/penalties.ts (P2.2.2) — no change
src/domains/reputation/tokens.ts (P2.3.1) — no change
src/domains/reputation/limits.ts (P2.4.1) — no change
src/domains/reputation/witnesses.ts — no change
src/db/migrations/*.sql — no change; existing schema already supports leaderboard reads

2. Implementation order

Write tools.ts skeleton with imports + Zod schemas + types.
Implement reputationGet — simplest of the four.
Implement reputationHistory — thin pass-through to selectHistory.
Implement reputationLeaderboard — overshoot + sort + slice.
Implement reputationCheckGates — domain map + fallback + P2.4.1 composition.
Write registerReputationTools.
Wire into src/server.ts.
Run npx tsc --noEmit to validate types before touching tests.
Write tools.test.ts in sections matching the suite plan above.
Run npm test -- tools.test.ts iteratively until all pass.
Full npm run build && npm run lint && npm test gate.
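
The "domain map + fallback" step for reputationCheckGates can be sketched as follows. Everything here is an assumption made for illustration: the three domain names, the `banned` flag, and the thresholds (5000 / 3000 / 4000) are read off the test plan in §1.3, not taken from `limits.ts`, and the gate rules are simplified stand-ins for `can_arbitrate` and `can_govern`.

```typescript
// Hypothetical domain names; the real list is DOMAINS in schema.ts.
type GateDomain = 'arbitration' | 'execution' | 'governance';

interface GateRow {
  domain: GateDomain;
  score: number; // basis points
  banned: boolean; // hypothetical flag
}

// Illustrative thresholds inferred from the test plan, not from limits.ts.
const ARBITRATION_MIN = 5000;
const EXECUTION_MIN = 3000;
const GOVERNANCE_MIN = 4000;

// Returns the stored row for a domain, or a zero-reputation fallback when it is missing.
function rowFor(rows: GateRow[], domain: GateDomain): GateRow {
  for (let i = 0; i < rows.length; i++) {
    if (rows[i].domain === domain) return rows[i];
  }
  return { domain: domain, score: 0, banned: false };
}

function checkGates(rows: GateRow[]): { can_arbitrate: boolean; can_govern: boolean } {
  const arb = rowFor(rows, 'arbitration');
  const exec = rowFor(rows, 'execution');
  const gov = rowFor(rows, 'governance');
  return {
    // Cross-domain rule: arbitration also needs enough execution reputation.
    can_arbitrate: !arb.banned && arb.score >= ARBITRATION_MIN && exec.score >= EXECUTION_MIN,
    can_govern: !gov.banned && gov.score >= GOVERNANCE_MIN,
  };
}

const happy = checkGates([
  { domain: 'arbitration', score: 5000, banned: false },
  { domain: 'execution', score: 3000, banned: false },
  { domain: 'governance', score: 4000, banned: false },
]);
const weakExec = checkGates([
  { domain: 'arbitration', score: 5000, banned: false },
  { domain: 'execution', score: 2999, banned: false },
]);
const unknownNode = checkGates([]); // no rows at all: every gate sees zero reputation
```

Because a missing domain falls back to a zero-reputation row, a node with no rows gets every gate evaluated at the zero-rep defaults instead of throwing.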

3. Risk register

Risk	Mitigation
R1 — `score` casts from `bigint` to `number` lose precision	Asserted safe: the bps range [0, 10000] is below 2^14, far under Number.MAX_SAFE_INTEGER. Test the boundary at score=10000.
R2 — Leaderboard decay-on-read changes ordering past the SELECT cutoff	Mitigated by overshooting the SELECT (`min(2 × limit, 200)` rows) before decay and re-sort; a test covers the ordering flip. The residual caveat is documented in audit §7 and the PR body.
R3 — `Date.now()` slips into the handler	Strict review pass plus a grep in the verification step. The determinism scanner is scoped to the reputation domain, so it also covers `tools.ts`.
R4 — A tool accidentally mutates rows	(a) all helpers are pure libraries (`apply_decay` returns new objects, `selectReputation` reads only). (b) grep-test rejects any INSERT/UPDATE/DELETE in `tools.ts`.
R5 — Tool name collision (already-registered name)	`registerColibriTool` throws on duplicate (server.ts:293); we register only fresh `reputation_*` names.
R6 — Zod `.strict()` rejects an otherwise valid call that carries extra keys (forward-compat hazard)	Matches `thought_record`’s `.strict()` convention; future schema evolution adds a new tool, not new keys.
R7 — Test flake from server-startup-smoke (pre-existing)	Out of scope; documented as carry-over in MEMORY.md. Retry once on rare flake.
R8 — Leaderboard with 0 nodes	Test: empty `reputations` → empty array, no throw.
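
The R2 mitigation (overshoot, decay, re-sort, slice) can be sketched in memory. This is a rough illustration under stated assumptions: `BoardRow`, `leaderboard`, and `toyDecay` are hypothetical, the array sort stands in for the SQL `ORDER BY`, and the decay function is a toy instead of `apply_decay_batch`. The overshoot formula is taken verbatim from the R2 row.

```typescript
interface BoardRow {
  node_id: string;
  score: number; // basis points
  last_activity_epoch: number;
}

type DecayFn = (score: number, elapsedEpochs: number) => number;

const LEADERBOARD_OVERSHOOT_CAP = 200;

// Overshoot window from the risk register: min(2 × limit, 200).
function overshootCount(limit: number): number {
  return Math.min(2 * limit, LEADERBOARD_OVERSHOOT_CAP);
}

// Score DESC, then node_id ASC as the tie-break.
function byScoreThenNode(a: BoardRow, b: BoardRow): number {
  if (a.score !== b.score) return b.score - a.score;
  if (a.node_id === b.node_id) return 0;
  return a.node_id < b.node_id ? -1 : 1;
}

function leaderboard(rows: BoardRow[], currentEpoch: number, limit: number, decay: DecayFn): BoardRow[] {
  // Stand-in for the overshoot SELECT: rank by stored score and keep the window.
  const candidates = rows.slice().sort(byScoreThenNode).slice(0, overshootCount(limit));
  // Decay each candidate into a new object; stored rows are left untouched.
  const decayed = candidates.map((r) => ({
    node_id: r.node_id,
    score: decay(r.score, Math.max(0, currentEpoch - r.last_activity_epoch)),
    last_activity_epoch: r.last_activity_epoch,
  }));
  // Re-sort on decayed scores, then cut to the requested limit.
  return decayed.sort(byScoreThenNode).slice(0, limit);
}

// Toy decay for demonstration only: lose 10 bps per elapsed epoch, floored at 0.
const toyDecay: DecayFn = (score, elapsed) => Math.max(0, score - 10 * elapsed);

const board: BoardRow[] = [
  { node_id: 'node-a', score: 9000, last_activity_epoch: 0 }, // highest stored score, but stale
  { node_id: 'node-c', score: 8900, last_activity_epoch: 100 },
  { node_id: 'node-b', score: 8900, last_activity_epoch: 100 },
];
const ranked = leaderboard(board, 100, 2, toyDecay).map((r) => r.node_id);
```

Here `node-a` leads on stored score but drops out of the top 2 once decay is applied, and the tie between `node-b` and `node-c` resolves by `node_id`. Note that with this formula the cap binds for any limit above 100; how the handler treats limits above the cap is left to the contract.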

4. Lint / type checks

Pass `npm run lint` (eslint + typescript-eslint with strict-boolean-expressions, no-unused-vars, etc.). Take particular care to avoid `as any`; prefer explicit type guards at the boundary between Zod-parsed input and the handler.
No `// eslint-disable` directives are added in this slice unless one is needed to mirror the known cast at src/server.ts:408. Even then, confine it to one line and cite the rationale in a comment.
npx tsc --noEmit clean against tsconfig.json.

5. Tool registration order in `server.ts`

registerColibriTool(ctx, 'server_ping', ...);   // existing
registerHealthTool(ctx);                         // existing
registerThoughtTools(ctx);                       // existing — 2 tools
registerVerifyChainTool(ctx);                    // existing
registerSkillTools(ctx);                         // existing
registerTaskTools(ctx);                          // existing — 5 tools
registerMerkleTools(ctx);                        // existing — 3 tools
registerReputationTools(ctx);                    // NEW — 4 tools (P2.5.1)

Final 18-tool surface (alpha-sorted):

audit_session_start (η)
audit_verify_chain (ζ)
merkle_finalize (η)
merkle_root (η)
reputation_check_gates (λ — NEW)
reputation_get (λ — NEW)
reputation_history (λ — NEW)
reputation_leaderboard (λ — NEW)
server_health (α)
server_ping (α)
skill_list (ε)
task_create (β)
task_get (β)
task_list (β)
task_next_actions (β)
task_update (β)
thought_record (ζ)
thought_record_list (ζ)
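
A toy registry shows how the 14 existing names plus the 4 new ones give the alpha-sorted surface above, and why a duplicate name fails fast (risk R5). `createRegistry` is a hypothetical helper that only tracks names; the real `registerColibriTool` takes `ctx` and a handler, and its error message may differ.

```typescript
// Hypothetical miniature registry: tracks names only and throws on duplicates.
function createRegistry() {
  const names: string[] = [];
  return {
    register(name: string): void {
      if (names.indexOf(name) !== -1) {
        throw new Error('tool already registered: ' + name);
      }
      names.push(name);
    },
    sorted(): string[] {
      return names.slice().sort(); // alpha-sorted copy of the surface
    },
  };
}

// The 14 tool names present before this slice, taken from the list above.
const existingTools = [
  'server_ping', 'server_health',
  'thought_record', 'thought_record_list',
  'audit_verify_chain', 'skill_list',
  'task_create', 'task_get', 'task_list', 'task_next_actions', 'task_update',
  'audit_session_start', 'merkle_finalize', 'merkle_root',
];

// The 4 names added by registerReputationTools.
const newTools = [
  'reputation_get', 'reputation_history',
  'reputation_leaderboard', 'reputation_check_gates',
];

const registry = createRegistry();
existingTools.forEach((n) => registry.register(n));
newTools.forEach((n) => registry.register(n));

const surface = registry.sorted(); // 18 names in alphabetical order
```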

6. Commit plan

#	Commit subject	Files
1	`audit(p2-5-1-tools): inventory surface`	`docs/audits/p2-5-1-tools-audit.md`
2	`contract(p2-5-1-tools): behavioral contract`	`docs/contracts/p2-5-1-tools-contract.md`
3	`packet(p2-5-1-tools): execution plan`	`docs/packets/p2-5-1-tools-packet.md`
4	`feat(p2-5-1-tools): 4 read-only MCP tools wiring λ surface (closes Phase 2 at 7/7)`	`src/domains/reputation/tools.ts` + `src/server.ts` + `src/__tests__/domains/reputation/tools.test.ts`
5	`verify(p2-5-1-tools): test evidence`	`docs/verification/p2-5-1-tools-verification.md`

7. PR plan

Title: feat(p2-5-1-tools): 4 read-only MCP tools wiring λ surface — closes Phase 2 at 7/7 (R89 Wave 4)

Body sections:

Summary — 4 tools, names, what each composes; lists the 14 → 18 surface delta
λ Phase 2 status — closes 7/7; lists prior 6 PRs by number
Decay-on-read cost — leaderboard O(N·log N) sort; materialisation deferred to Phase 6+
Integration test coverage — write-events → read-score hand-calc verification; 28 tests
Read-only invariant — grep-asserted no INSERT/UPDATE/DELETE
Writeback block — task_id, branch, worktree, commits, tests, summary
No proof-grade Merkle anchor — R89 chain reset; falls under R89.A documented failure mode (see #222)
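
The grep assertion behind the read-only invariant can be sketched as a small source scan. This is one possible shape, not the test as written: `findMutatingSql` is a hypothetical helper, and matching statement-shaped patterns (instead of bare keywords) is a design choice made here so that identifiers such as `task_update` or prose in comments do not trip the check.

```typescript
// Statement-shaped patterns for mutating SQL; case-insensitive, no global flag
// so RegExp.test carries no state between calls.
const MUTATING_SQL: { keyword: string; pattern: RegExp }[] = [
  { keyword: 'INSERT', pattern: /\bINSERT\s+(OR\s+\w+\s+)?INTO\b/i },
  { keyword: 'UPDATE', pattern: /\bUPDATE\s+\w+\s+SET\b/i },
  { keyword: 'DELETE', pattern: /\bDELETE\s+FROM\b/i },
];

// Returns the mutating keywords whose statement shape appears in the source text.
function findMutatingSql(source: string): string[] {
  return MUTATING_SQL
    .filter((entry) => entry.pattern.test(source))
    .map((entry) => entry.keyword);
}

// Illustrative inputs; a real test would read the text of tools.ts from disk.
const readOnlySource =
  "db.prepare('SELECT node_id, score FROM reputations ORDER BY score DESC LIMIT ?');\n" +
  '// writeback happens elsewhere via task_update';
const mutatingSource =
  "db.prepare('UPDATE reputations SET score = ? WHERE node_id = ?');";

const cleanHits = findMutatingSql(readOnlySource); // expected to be empty
const dirtyHits = findMutatingSql(mutatingSource); // expected to flag UPDATE
```

In the suite itself, the scan would run over the contents of `src/domains/reputation/tools.ts` and the test would assert that the returned list is empty.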

8. Cleanup

No vault sync in this slice.
No frontmatter graduation in this slice (λ → partial is a separate hygiene PR per audit §11).
No .claude/skills/ mirror updates (no canon skill changes).
No DB migration changes (007 already supports everything).

9. Done definition

All 5 commits land on feature/p2-5-1-tools
npm run build && npm run lint && npm test green
All 28 tests in tools.test.ts pass
No regression in existing test count (pre-existing ~2406 + new tools tests)
PR created via gh pr create
Writeback packet drafted (task_update + thought_record skeleton in PR body)