P2.5.1 — Reputation Query MCP Tools — Execution Packet
Slice: p2-5-1-tools — closes λ Phase 2 at 7/7
Audit: docs/audits/p2-5-1-tools-audit.md @ f8dc2f62
Contract: docs/contracts/p2-5-1-tools-contract.md @ 2bd55de0
Base: origin/main @ 618b1a13
1. File plan
1.1 NEW: src/domains/reputation/tools.ts (~280 lines)
Top-of-file docblock cites: audit / contract / packet, source prompt §P2.5.1, selectReputation (P2.1.1), apply_decay/apply_decay_batch (P2.2.1), can_arbitrate/can_govern/max_parallel_tasks/rate_limit_bonus/stake_discount (P2.4.1), BPS_100_PERCENT (κ P1.1.3).
Imports (NodeNext .js suffix throughout):
import type Database from 'better-sqlite3';
import { z } from 'zod';
import { getDb } from '../../db/index.js';
import { registerColibriTool, type ColibriServerContext } from '../../server.js';
import {
  DOMAINS, DomainSchema,
  selectHistory, selectReputation,
  type Domain, type ReputationHistoryRow, type ReputationRow,
} from './schema.js';
import { apply_decay, apply_decay_batch } from './decay.js';
import {
  can_arbitrate, can_govern, max_parallel_tasks,
  rate_limit_bonus, stake_discount,
} from './limits.js';
import { BPS_100_PERCENT } from '../rules/bps-constants.js';
Sections:
- §A Constants: DEFAULT_HISTORY_LIMIT = 50, MAX_HISTORY_LIMIT = 500, DEFAULT_LEADERBOARD_LIMIT = 100, MAX_LEADERBOARD_LIMIT = 1000, LEADERBOARD_OVERSHOOT_CAP = 200.
- §B Zod input schemas: 4 .strict() objects matching contract §2.1, §3.1, §4.1, §5.1.
- §C Public input/output type exports, matching contract §9.
- §D Handler reputationGet(db, input): synchronous; single-domain vs all-domain branch; decay applied.
- §E Handler reputationHistory(db, input): calls selectHistory(db, node_id, domain, { limit, offset }).
- §F Handler reputationLeaderboard(db, input): overshoot SELECT, batch decay, re-sort, slice.
- §G Handler reputationCheckGates(db, input): composes P2.4.1 derivations with selectReputation(db, node_id) and missing-domain fallback rows.
- §H registerReputationTools(ctx): 4 registerColibriTool calls.
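To illustrate the §F pipeline, the following dependency-free sketch over-fetches, decays, re-sorts, and slices an in-memory array. Only the min(2 × limit, 200) overshoot and the node_id ASC tie-break come from this packet. The Row shape, the toyDecay stand-in (which is not the P2.2.1 apply_decay formula), and the Math.max floor on the fetch size are assumptions made for the sketch.

```typescript
// Hypothetical row shape; the real ReputationRow is defined in schema.ts.
interface Row {
  node_id: string;
  score: number; // basis points, 0..10000
  last_activity_epoch: number;
}

const LEADERBOARD_OVERSHOOT_CAP = 200;

// Stand-in decay, NOT the P2.2.1 formula: lose 10 bps per idle epoch.
function toyDecay(row: Row, currentEpoch: number): Row {
  const idle = Math.max(0, currentEpoch - row.last_activity_epoch);
  return { ...row, score: Math.max(0, row.score - 10 * idle) };
}

function leaderboard(stored: readonly Row[], limit: number, currentEpoch: number): Row[] {
  // Overshoot: read more rows than requested so decay can reorder near the cutoff.
  // The Math.max floor is an assumption so the fetch never drops below `limit`.
  const fetchCount = Math.max(limit, Math.min(2 * limit, LEADERBOARD_OVERSHOOT_CAP));

  // Emulates `ORDER BY score DESC LIMIT fetchCount` over the stored scores.
  const fetched = [...stored].sort((a, b) => b.score - a.score).slice(0, fetchCount);

  // Decay returns new objects, so the input rows are never mutated.
  const decayed = fetched.map((row) => toyDecay(row, currentEpoch));

  // Re-sort by decayed score DESC, tie-break by node_id ASC, then slice.
  decayed.sort((a, b) => {
    if (b.score !== a.score) return b.score - a.score;
    if (a.node_id < b.node_id) return -1;
    return a.node_id > b.node_id ? 1 : 0;
  });
  return decayed.slice(0, limit);
}
```

With this shape, a node that leads on stored score but has been idle longest can drop below fresher nodes after decay, which is the ordering flip the leaderboard tests exercise.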
1.2 EDIT: src/server.ts (1 import + 1 call ≈ +2 lines)
// After existing P0.8.3 merkle import block
import { registerReputationTools } from './domains/reputation/tools.js';
And inside bootstrap() after registerMerkleTools(ctx);:
// P2.5.1: register λ Reputation read-only query tools (reputation_get,
// reputation_history, reputation_leaderboard, reputation_check_gates).
// Closes λ Phase 2 at 7/7 — first λ MCP surface. Handlers lazy-resolve
// getDb() at call-time; DB opened in Phase 2 before any call arrives.
registerReputationTools(ctx);
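The wiring comment relies on handlers resolving the database only when they are called, and R5 in the risk register relies on duplicate names being rejected. A minimal stand-alone sketch of both behaviours follows. ToyContext, registerTool, getDbSketch, and registerReputationToolsSketch are hypothetical stand-ins, not the signatures in src/server.ts or src/db/index.js.

```typescript
type Handler = (input: unknown) => unknown;

// Hypothetical context; the real ColibriServerContext is defined in server.ts.
interface ToyContext {
  tools: Map<string, Handler>;
}

// Hypothetical stand-in for the DB accessor: throws until the DB is opened.
let openedDb: { name: string } | null = null;
function getDbSketch(): { name: string } {
  if (openedDb === null) throw new Error('DB not initialised');
  return openedDb;
}

// Rejects a name that is already present, mirroring the duplicate check cited in R5.
function registerTool(ctx: ToyContext, name: string, handler: Handler): void {
  if (ctx.tools.has(name)) throw new Error('tool already registered: ' + name);
  ctx.tools.set(name, handler);
}

function registerReputationToolsSketch(ctx: ToyContext): void {
  const names = [
    'reputation_get',
    'reputation_history',
    'reputation_leaderboard',
    'reputation_check_gates',
  ];
  for (const name of names) {
    // getDbSketch() runs inside the handler, so registering before the DB opens is safe.
    registerTool(ctx, name, () => ({ tool: name, db: getDbSketch().name }));
  }
}
```

Because the accessor is only touched inside each handler, registration order relative to DB start-up does not matter as long as the DB is open before the first call arrives.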
1.3 NEW: src/__tests__/domains/reputation/tools.test.ts (~450 lines)
Test posture mirrors src/__tests__/domains/reputation/schema.test.ts:
- Per-test temp os.tmpdir() paths via randomUUID().
- afterEach calls closeDb() and recursively removes temp dirs (Windows WAL lock catch).
- Real SQLite via initDb(dbPath) (applies migration 007 + everything earlier).
- Direct insertion via SQL for setup (mimics schema.test.ts §4 setup).
Suite breakdown (matches contract §10 AC-1 to AC-9, plus type-matrix coverage):
- describe('reputation_get'): 6 tests
  - single-domain decay (insert epoch=100 score=8000 → read epoch=100 → 8000; read epoch=196 → matches decay(8000n, 500n, 96n))
  - single-domain, row absent → null
  - all-domains, 5 rows present → length 5 ordered by domain
  - all-domains, 0 rows → empty array
  - decay short-circuit when current_epoch ≤ last_activity_epoch (clock skew)
  - does NOT mutate last_activity_epoch: assert original value preserved
- describe('reputation_history'): 5 tests
  - pagination: 100 events → page 1 = 50 ordered DESC, page 2 next 50
  - default limit (50)
  - empty history → []
  - max-cap respected (limit=500 accepted)
  - oversized limit (501) rejected by Zod
- describe('reputation_leaderboard'): 4 tests
  - 10 nodes, decay disabled → top 3 by score DESC
  - decay flips ordering (older last_activity_epoch decays more)
  - tie-break by node_id ASC
  - default limit (100) returns all 10
- describe('reputation_check_gates'): 4 tests
  - happy path: rep_arb=5000, rep_exec=3000, rep_gov=4000 → all gates true
  - banned arbitration → can_arbitrate=false even at threshold score
  - missing node: all gates at zero-rep defaults
  - cross-domain: rep_arb=5000 but rep_exec=2999 → can_arbitrate=false
- describe('Zod rejections'): 6 tests
  - bad domain string in each tool
  - negative offset
  - limit < 1
  - missing node_id
  - negative current_epoch
  - extra unknown keys via .strict()
- describe('read-only invariant'): 2 tests
  - row count + history count unchanged after every tool invocation
  - source file grep: no INSERT/UPDATE/DELETE SQL in tools.ts
- describe('registerReputationTools'): 1 test
  - registers 4 names, all present in ctx._registeredToolNames
Total target: 28 tests.
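The source-grep half of the read-only invariant can be sketched as a pure function over the file contents. This is a hedged illustration only: findMutatingSql is a hypothetical helper name, and the real test would presumably read tools.ts from disk before calling something like it. The match is case-sensitive and word-bounded, so identifiers such as task_update do not trip it, but an uppercase keyword inside a comment would.

```typescript
// Returns every mutating SQL keyword found in `source`, in order of appearance.
// Hypothetical helper; the packet only states that the test greps for these keywords.
function findMutatingSql(source: string): string[] {
  const pattern = /\b(INSERT|UPDATE|DELETE)\b/g;
  return source.match(pattern) ?? [];
}

// A read-only module should yield an empty list.
function isReadOnlySource(source: string): boolean {
  return findMutatingSql(source).length === 0;
}
```

A test could then assert isReadOnlySource(fileText) and print findMutatingSql(fileText) on failure to show which keyword slipped in.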
1.4 Files NOT changed
- src/domains/reputation/schema.ts (P2.1.1): no change; readers already return the right shapes
- src/domains/reputation/compute.ts (P2.1.2): no change; not used by P2.5.1
- src/domains/reputation/decay.ts (P2.2.1): no change; apply_decay already pure
- src/domains/reputation/penalties.ts (P2.2.2): no change
- src/domains/reputation/tokens.ts (P2.3.1): no change
- src/domains/reputation/limits.ts (P2.4.1): no change
- src/domains/reputation/witnesses.ts: no change
- src/db/migrations/*.sql: no change; existing schema already supports leaderboard reads
2. Implementation order
- Write tools.ts skeleton with imports + Zod schemas + types.
- Implement reputationGet, the simplest of the four.
- Implement reputationHistory: thin pass-through to selectHistory.
- Implement reputationLeaderboard: overshoot + sort + slice.
- Implement reputationCheckGates: domain map + fallback + P2.4.1 composition.
- Write registerReputationTools.
- Wire into src/server.ts.
- Run npx tsc --noEmit to validate types before touching tests.
- Write tools.test.ts in sections matching the suite plan above.
- Run npm test -- tools.test.ts iteratively until all pass.
- Full npm run build && npm run lint && npm test gate.
3. Risk register
| Risk | Mitigation |
|---|---|
| R1: score casts bigint → number lose precision | Asserted-safe: bps range [0, 10000] < 2^14 ≪ Number.MAX_SAFE_INTEGER. Test boundary at score=10000. |
| R2: Leaderboard decay-on-read changes ordering past the SELECT cutoff | Document via overshoot (min(2 × limit, 200)); test covers ordering flip. Caveat documented in audit §7 + PR body. |
| R3: Date.now() slips into the handler | Strict review pass + grep in the verification step. Determinism scanner is reputation-domain-scoped but applies. |
| R4: A tool accidentally mutates rows | (a) all helpers are pure libraries (apply_decay returns new objects, selectReputation reads only). (b) grep-test rejects any INSERT/UPDATE/DELETE in tools.ts. |
| R5: Tool name collision (already-registered name) | registerColibriTool throws on duplicate (server.ts:293); we register only fresh reputation_* names. |
| R6: .strict() Zod rejects valid extra-key call (forward-compat hazard) | Matches thought_record’s .strict() convention; future schema evolution adds a new tool, not new keys. |
| R7: Test flake from server-startup-smoke (pre-existing) | Out of scope; documented as carry-over in MEMORY.md. Retry once on rare flake. |
| R8: Leaderboard with 0 nodes | Test: empty reputations → empty array, no throw. |
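The R1 argument can be made explicit with a guarded cast. This is a sketch under the assumption that scores are bigint basis points. BPS_MAX stands in for BPS_100_PERCENT, whose value is assumed to be 10000n from the [0, 10000] range stated in R1, and bpsToNumber is a hypothetical helper name.

```typescript
// Assumed value; stands in for BPS_100_PERCENT from bps-constants.ts.
const BPS_MAX = 10000n;

// Converts a bps score to number, refusing anything outside [0, BPS_MAX].
// Inside that range the value is far below Number.MAX_SAFE_INTEGER, so the cast is exact.
function bpsToNumber(score: bigint): number {
  if (score < 0n || score > BPS_MAX) {
    throw new RangeError('score out of bps range: ' + score.toString());
  }
  return Number(score);
}
```

Guarding at the cast site keeps the precision claim local: any value that could lose precision is rejected before Number() is applied.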
4. Lint / type checks
- Pass npm run lint (eslint + typescript-eslint with strict-boolean-expressions, no-unused-vars, etc.). Particular care: avoid as any, prefer explicit type-guards on Zod-parsed input → handler boundary.
- No // eslint-disable directives added in this slice unless required by a known mirror in src/server.ts:408-style cast (and even then, isolate to one line + cite the rationale comment).
- npx tsc --noEmit clean against tsconfig.json.
5. Tool registration order in server.ts
registerColibriTool(ctx, 'server_ping', ...); // existing
registerHealthTool(ctx); // existing
registerThoughtTools(ctx); // existing — 2 tools
registerVerifyChainTool(ctx); // existing
registerSkillTools(ctx); // existing
registerTaskTools(ctx); // existing — 5 tools
registerMerkleTools(ctx); // existing — 3 tools
registerReputationTools(ctx); // NEW — 4 tools (P2.5.1)
Final 18-tool surface (alpha-sorted):
- audit_session_start (η)
- audit_verify_chain (ζ)
- merkle_finalize (η)
- merkle_root (η)
- reputation_check_gates (λ, NEW)
- reputation_get (λ, NEW)
- reputation_history (λ, NEW)
- reputation_leaderboard (λ, NEW)
- server_health (α)
- server_ping (α)
- skill_list (ε)
- task_create (β)
- task_get (β)
- task_list (β)
- task_next_actions (β)
- task_update (β)
- thought_record (ζ)
- thought_record_list (ζ)
6. Commit plan
| # | Commit subject | Files |
|---|---|---|
| 1 | audit(p2-5-1-tools): inventory surface | docs/audits/p2-5-1-tools-audit.md |
| 2 | contract(p2-5-1-tools): behavioral contract | docs/contracts/p2-5-1-tools-contract.md |
| 3 | packet(p2-5-1-tools): execution plan | docs/packets/p2-5-1-tools-packet.md |
| 4 | feat(p2-5-1-tools): 4 read-only MCP tools wiring λ surface (closes Phase 2 at 7/7) | src/domains/reputation/tools.ts + src/server.ts + src/__tests__/domains/reputation/tools.test.ts |
| 5 | verify(p2-5-1-tools): test evidence | docs/verification/p2-5-1-tools-verification.md |
7. PR plan
Title: feat(p2-5-1-tools): 4 read-only MCP tools wiring λ surface — closes Phase 2 at 7/7 (R89 Wave 4)
Body sections:
- Summary — 4 tools, names, what each composes; lists the 14 → 18 surface delta
- λ Phase 2 status — closes 7/7; lists prior 6 PRs by number
- Decay-on-read cost — leaderboard O(N·log N) sort; materialisation deferred to Phase 6+
- Integration test coverage — write-events → read-score hand-calc verification; 28 tests
- Read-only invariant — grep-asserted no INSERT/UPDATE/DELETE
- Writeback block — task_id, branch, worktree, commits, tests, summary
- No proof-grade Merkle anchor — R89 chain reset; falls under R89.A documented failure mode (see #222)
8. Cleanup
- No vault sync in this slice.
- No frontmatter graduation in this slice (λ → partial is a separate hygiene PR per audit §11).
- No .claude/skills/ mirror updates (no canon skill changes).
- No DB migration changes (007 already supports everything).
9. Done definition
- All 5 commits land on feature/p2-5-1-tools
- npm run build && npm run lint && npm test green
- All 28 tests in tools.test.ts pass
- No regression in existing test count (pre-existing ~2406 + new tools tests)
- PR created via gh pr create
- Writeback packet drafted (task_update + thought_record skeleton in PR body)