P2.5.1 — Reputation Query MCP Tools — Execution Packet

Slice: p2-5-1-tools (closes λ Phase 2 at 7/7)
Audit: docs/audits/p2-5-1-tools-audit.md @ f8dc2f62
Contract: docs/contracts/p2-5-1-tools-contract.md @ 2bd55de0
Base: origin/main @ 618b1a13

1. File plan

1.1 NEW: src/domains/reputation/tools.ts (~280 lines)

Top-of-file docblock cites: audit / contract / packet, source prompt §P2.5.1, selectReputation (P2.1.1), apply_decay/apply_decay_batch (P2.2.1), can_arbitrate/can_govern/max_parallel_tasks/rate_limit_bonus/stake_discount (P2.4.1), BPS_100_PERCENT (κ P1.1.3).

Imports (NodeNext .js suffix throughout):

import type Database from 'better-sqlite3';
import { z } from 'zod';
import { getDb } from '../../db/index.js';
import { registerColibriTool, type ColibriServerContext } from '../../server.js';
import {
  DOMAINS, DomainSchema,
  selectHistory, selectReputation,
  type Domain, type ReputationHistoryRow, type ReputationRow,
} from './schema.js';
import { apply_decay, apply_decay_batch } from './decay.js';
import {
  can_arbitrate, can_govern, max_parallel_tasks,
  rate_limit_bonus, stake_discount,
} from './limits.js';
import { BPS_100_PERCENT } from '../rules/bps-constants.js';

Sections:

  • §A Constants — DEFAULT_HISTORY_LIMIT = 50, MAX_HISTORY_LIMIT = 500, DEFAULT_LEADERBOARD_LIMIT = 100, MAX_LEADERBOARD_LIMIT = 1000, LEADERBOARD_OVERSHOOT_CAP = 200.
  • §B Zod input schemas — 4 .strict() objects matching contract §2.1, §3.1, §4.1, §5.1.
  • §C Public input/output type exports — matching contract §9.
  • §D Handler reputationGet(db, input) — synchronous; single-domain vs all-domain branch; decay applied.
  • §E Handler reputationHistory(db, input) — calls selectHistory(db, node_id, domain, { limit, offset }).
  • §F Handler reputationLeaderboard(db, input) — overshoot SELECT, batch decay, re-sort, slice.
  • §G Handler reputationCheckGates(db, input) — composes P2.4.1 derivations with selectReputation(db, node_id) and missing-domain fallback rows.
  • §H registerReputationTools(ctx) — 4 registerColibriTool calls.
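
The §F pipeline (overshoot SELECT, batch decay, re-sort, slice) can be sketched over in-memory rows. This is a minimal illustration under stated assumptions, not the planned handler: `LeaderboardRow`, `DecayFn`, `overshootCount`, and `rankLeaderboard` are hypothetical names, the decay curve is injected rather than taken from decay.ts, scores are plain numbers instead of bigint, and the overshoot is read as "rows fetched", floored at `limit`.

```typescript
// Hypothetical row shape; the real ReputationRow is defined in schema.ts.
interface LeaderboardRow {
  node_id: string;
  score: number; // basis points, 0..10000
  last_activity_epoch: number;
}

// Injected stand-in for apply_decay_batch; the real curve lives in decay.ts.
type DecayFn = (row: LeaderboardRow, currentEpoch: number) => number;

const LEADERBOARD_OVERSHOOT_CAP = 200;

// Assumption: fetch min(2 * limit, cap) rows, but never fewer than `limit`.
function overshootCount(limit: number): number {
  return Math.max(limit, Math.min(2 * limit, LEADERBOARD_OVERSHOOT_CAP));
}

// Score DESC, then node_id ASC as the tie-break.
function byScoreThenNode(a: LeaderboardRow, b: LeaderboardRow): number {
  if (b.score !== a.score) return b.score - a.score;
  if (a.node_id < b.node_id) return -1;
  return a.node_id > b.node_id ? 1 : 0;
}

function rankLeaderboard(
  storedRows: readonly LeaderboardRow[],
  limit: number,
  currentEpoch: number,
  decay: DecayFn,
): LeaderboardRow[] {
  // 1. Emulate the overshoot SELECT ... ORDER BY score DESC LIMIT n.
  const candidates = storedRows.slice().sort(byScoreThenNode).slice(0, overshootCount(limit));
  // 2. Decay into new objects so the stored rows are never mutated.
  const decayed = candidates.map((row) => ({ ...row, score: decay(row, currentEpoch) }));
  // 3. Re-sort on decayed scores, then 4. slice to the requested limit.
  return decayed.sort(byScoreThenNode).slice(0, limit);
}
```

Because candidates are chosen on stored scores, a node just past the overshoot cutoff can still be missed even if it would rank higher after decay; the overshoot only narrows that window.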

1.2 EDIT: src/server.ts (1 import + 1 call ≈ +2 lines)

// After existing P0.8.3 merkle import block
import { registerReputationTools } from './domains/reputation/tools.js';

Then, inside bootstrap(), directly after the registerMerkleTools(ctx); call:

// P2.5.1: register λ Reputation read-only query tools (reputation_get,
// reputation_history, reputation_leaderboard, reputation_check_gates).
// Closes λ Phase 2 at 7/7 — first λ MCP surface. Handlers lazy-resolve
// getDb() at call-time; DB opened in Phase 2 before any call arrives.
registerReputationTools(ctx);

1.3 NEW: src/__tests__/domains/reputation/tools.test.ts (~450 lines)

Test posture mirrors src/__tests__/domains/reputation/schema.test.ts:

  • Per-test temp os.tmpdir() paths via randomUUID().
  • afterEach calls closeDb() and recursively removes temp dirs, catching Windows WAL-lock errors.
  • Real SQLite via initDb(dbPath) (applies migration 007 + everything earlier).
  • Direct insertion via SQL for setup (mimics schema.test.ts §4 setup).
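
The temp-path posture above can be sketched with Node's standard library alone. The helper names `makeTempDbPath` and `cleanupTempDirs`, the directory prefix, and the `test.db` file name are illustrative assumptions; the real suite would pass the returned path to initDb and call closeDb() before cleaning up.

```typescript
import { randomUUID } from 'node:crypto';
import { existsSync, mkdirSync, rmSync } from 'node:fs';
import { tmpdir } from 'node:os';
import { join } from 'node:path';

// Directories created during the current test, removed again in afterEach.
const tempDirs: string[] = [];

// Hypothetical helper: a fresh, unique directory per test under os.tmpdir().
function makeTempDbPath(): string {
  const dir = join(tmpdir(), `colibri-reputation-${randomUUID()}`);
  mkdirSync(dir, { recursive: true });
  tempDirs.push(dir);
  return join(dir, 'test.db');
}

// Hypothetical helper: best-effort recursive removal of every tracked directory.
function cleanupTempDirs(): void {
  for (;;) {
    const dir = tempDirs.pop();
    if (dir === undefined) break;
    if (!existsSync(dir)) continue;
    try {
      rmSync(dir, { recursive: true, force: true, maxRetries: 3, retryDelay: 50 });
    } catch {
      // On Windows the WAL file can stay locked briefly after close;
      // a leftover temp dir is harmless, so the error is swallowed.
    }
  }
}
```

In a test file this would typically be wired as afterEach(() => { closeDb(); cleanupTempDirs(); }), with each test calling makeTempDbPath() before opening the database.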

Suite breakdown (matches contract §10 AC-1 to AC-9, plus type-matrix coverage):

  1. describe('reputation_get') — 6 tests
    • single-domain decay (insert epoch=100 score=8000 → read epoch=100 → 8000; read epoch=196 → matches decay(8000n, 500n, 96n))
    • single-domain, row absent → null
    • all-domains, 5 rows present → length 5 ordered by domain
    • all-domains, 0 rows → empty array
    • decay short-circuit when current_epoch ≤ last_activity_epoch (clock skew)
    • does NOT mutate last_activity_epoch — assert original value preserved
  2. describe('reputation_history') — 5 tests
    • pagination: 100 events → page 1 = 50 ordered DESC, page 2 next 50
    • default limit (50)
    • empty history → []
    • max-cap respected (limit=500 accepted)
    • oversized limit (501) rejected by Zod
  3. describe('reputation_leaderboard') — 4 tests
    • 10 nodes, decay disabled → top 3 by score DESC
    • decay flips ordering (older last_activity_epoch decays more)
    • tie-break by node_id ASC
    • default limit (100) returns all 10
  4. describe('reputation_check_gates') — 4 tests
    • happy path — rep_arb=5000, rep_exec=3000, rep_gov=4000 → all gates true
    • banned arbitration → can_arbitrate=false even at threshold score
    • missing node — all gates at zero-rep defaults
    • cross-domain — rep_arb=5000 but rep_exec=2999 → can_arbitrate=false
  5. describe('Zod rejections') — 6 tests
    • bad domain string in each tool
    • negative offset
    • limit < 1
    • missing node_id
    • negative current_epoch
    • extra unknown keys via .strict()
  6. describe('read-only invariant') — 2 tests
    • row count + history count unchanged after every tool invocation
    • source file grep: no INSERT/UPDATE/DELETE SQL in tools.ts
  7. describe('registerReputationTools') — 1 test
    • registers 4 names, all present in ctx._registeredToolNames

Total target: 28 tests.
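
The source-grep half of the read-only invariant (suite 6) could look roughly like the following. The pattern and the `findMutatingSql` name are assumptions made for illustration; the pattern matches statement shapes rather than bare keywords so that identifiers such as task_update or the English word "update" in a comment do not trip it.

```typescript
// Assumed pattern: statement shapes only, case-insensitive.
const MUTATING_SQL = /\b(?:INSERT\s+INTO|UPDATE\s+\S+\s+SET|DELETE\s+FROM)\b/i;

// Hypothetical helper: returns every source line that contains mutating SQL.
function findMutatingSql(source: string): string[] {
  return source
    .split(/\r?\n/)
    .map((line) => line.trim())
    .filter((line) => MUTATING_SQL.test(line));
}
```

A test would read tools.ts from disk (for example with readFileSync) and assert that findMutatingSql returns an empty array.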

1.4 Files NOT changed

  • src/domains/reputation/schema.ts (P2.1.1) — no change; readers already return the right shapes
  • src/domains/reputation/compute.ts (P2.1.2) — no change; not used by P2.5.1
  • src/domains/reputation/decay.ts (P2.2.1) — no change; apply_decay already pure
  • src/domains/reputation/penalties.ts (P2.2.2) — no change
  • src/domains/reputation/tokens.ts (P2.3.1) — no change
  • src/domains/reputation/limits.ts (P2.4.1) — no change
  • src/domains/reputation/witnesses.ts — no change
  • src/db/migrations/*.sql — no change; existing schema already supports leaderboard reads

2. Implementation order

  1. Write tools.ts skeleton with imports + Zod schemas + types.
  2. Implement reputationGet — simplest of the four.
  3. Implement reputationHistory — thin pass-through to selectHistory.
  4. Implement reputationLeaderboard — overshoot + sort + slice.
  5. Implement reputationCheckGates — domain map + fallback + P2.4.1 composition.
  6. Write registerReputationTools.
  7. Wire into src/server.ts.
  8. Run npx tsc --noEmit to validate types before touching tests.
  9. Write tools.test.ts in sections matching the suite plan above.
  10. Run npm test -- tools.test.ts iteratively until all pass.
  11. Full npm run build && npm run lint && npm test gate.
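
Step 5 (domain map, fallback rows, gate composition) is the least mechanical of the handlers, so a rough sketch may help. Everything here is hypothetical: the three domain names stand in for the five entries of DOMAINS, the `banned` field and `GateRow` shape are invented for the example, and the 5000/3000 thresholds are inferred only from the test values listed in the suite plan; the real logic belongs to can_arbitrate in limits.ts.

```typescript
// Hypothetical domain names; the real list is DOMAINS in schema.ts.
const SKETCH_DOMAINS = ['execution', 'arbitration', 'governance'] as const;
type SketchDomain = (typeof SKETCH_DOMAINS)[number];

// Hypothetical row shape for this sketch only.
interface GateRow {
  node_id: string;
  domain: SketchDomain;
  score: number; // basis points
  banned: boolean;
}

// Build a complete domain map: rows that exist win, missing domains get zero-rep rows.
function withFallbackRows(
  nodeId: string,
  rows: readonly GateRow[],
): Record<SketchDomain, GateRow> {
  const zeroRow = (domain: SketchDomain): GateRow => ({
    node_id: nodeId,
    domain,
    score: 0,
    banned: false,
  });
  const byDomain: Record<SketchDomain, GateRow> = {
    execution: zeroRow('execution'),
    arbitration: zeroRow('arbitration'),
    governance: zeroRow('governance'),
  };
  for (const row of rows) {
    if (row.node_id === nodeId) byDomain[row.domain] = row;
  }
  return byDomain;
}

// Stand-in for can_arbitrate; thresholds are assumptions inferred from the test plan.
function canArbitrateSketch(byDomain: Record<SketchDomain, GateRow>): boolean {
  const arb = byDomain.arbitration;
  return !arb.banned && arb.score >= 5000 && byDomain.execution.score >= 3000;
}
```

Filling the map up front means every gate derivation sees a row for every domain, so an unknown node falls through to zero-rep defaults instead of needing a separate null branch.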

3. Risk register

| ID | Risk | Mitigation |
|----|------|------------|
| R1 | Casting score from bigint to number loses precision | Asserted safe: the bps range [0, 10000] is below 2^14, which is ≪ Number.MAX_SAFE_INTEGER. Test the boundary at score=10000. |
| R2 | Leaderboard decay-on-read changes ordering past the SELECT cutoff | Mitigate via overshoot (min(2 × limit, 200)); a test covers the ordering flip. Caveat documented in audit §7 and the PR body. |
| R3 | Date.now() slips into the handler | Strict review pass plus a grep in the verification step. The determinism scanner is scoped to the reputation domain, so it covers tools.ts. |
| R4 | A tool accidentally mutates rows | (a) All helpers are pure libraries (apply_decay returns new objects, selectReputation only reads). (b) A grep test rejects any INSERT/UPDATE/DELETE in tools.ts. |
| R5 | Tool name collision (already-registered name) | registerColibriTool throws on duplicate (server.ts:293); we register only fresh reputation_* names. |
| R6 | .strict() Zod rejects an otherwise valid call that carries extra keys (forward-compat hazard) | Matches thought_record's .strict() convention; future schema evolution adds a new tool, not new keys. |
| R7 | Test flake from server-startup-smoke (pre-existing) | Out of scope; documented as carry-over in MEMORY.md. Retry once on a rare flake. |
| R8 | Leaderboard with 0 nodes | Test: empty reputations → empty array, no throw. |
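
The R1 argument can be made explicit at the cast site rather than left as a comment. The sketch below assumes scores arrive as bigint basis points and that the upper bound equals 10000; `BPS_MAX` is a local stand-in for BPS_100_PERCENT and `bpsToNumber` is a hypothetical helper name.

```typescript
// Local stand-in for BPS_100_PERCENT (assumed to be 10000 basis points).
const BPS_MAX = BigInt(10000);

// Convert a bps score to number, refusing anything outside [0, 10000].
// Inside that range the value is far below Number.MAX_SAFE_INTEGER, so the cast is exact.
function bpsToNumber(score: bigint): number {
  if (score < BigInt(0) || score > BPS_MAX) {
    throw new RangeError(`score ${score} is outside the bps range [0, 10000]`);
  }
  return Number(score);
}
```

A guard like this turns a silent precision loss into a loud failure if an out-of-range value ever reaches the handler boundary.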

4. Lint / type checks

  • Pass npm run lint (eslint + typescript-eslint with strict-boolean-expressions, no-unused-vars, etc.). Take particular care to avoid as any; prefer explicit type guards at the boundary between Zod-parsed input and the handlers.
  • Add no // eslint-disable directives in this slice unless a cast mirroring the one at src/server.ts:408 requires it; even then, confine the directive to a single line and cite the rationale in a comment.
  • npx tsc --noEmit clean against tsconfig.json.

5. Tool registration order in server.ts

registerColibriTool(ctx, 'server_ping', ...);   // existing
registerHealthTool(ctx);                         // existing
registerThoughtTools(ctx);                       // existing — 2 tools
registerVerifyChainTool(ctx);                    // existing
registerSkillTools(ctx);                         // existing
registerTaskTools(ctx);                          // existing — 5 tools
registerMerkleTools(ctx);                        // existing — 3 tools
registerReputationTools(ctx);                    // NEW — 4 tools (P2.5.1)

Final 18-tool surface (alpha-sorted):

  1. audit_session_start (η)
  2. audit_verify_chain (ζ)
  3. merkle_finalize (η)
  4. merkle_root (η)
  5. reputation_check_gates (λ — NEW)
  6. reputation_get (λ — NEW)
  7. reputation_history (λ — NEW)
  8. reputation_leaderboard (λ — NEW)
  9. server_health (α)
  10. server_ping (α)
  11. skill_list (ε)
  12. task_create (β)
  13. task_get (β)
  14. task_list (β)
  15. task_next_actions (β)
  16. task_update (β)
  17. thought_record (ζ)
  18. thought_record_list (ζ)
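
The 14 → 18 surface delta can be checked mechanically. The sketch below works on plain name lists taken from the table above and only mimics the duplicate check attributed to registerColibriTool; `buildSurface` is a hypothetical helper, not part of server.ts.

```typescript
// The 14 names registered before this slice, as listed above.
const EXISTING_TOOLS: readonly string[] = [
  'server_ping', 'server_health', 'thought_record', 'thought_record_list',
  'audit_verify_chain', 'skill_list', 'task_create', 'task_get', 'task_list',
  'task_next_actions', 'task_update', 'audit_session_start', 'merkle_finalize',
  'merkle_root',
];

// The 4 names added by P2.5.1.
const NEW_REPUTATION_TOOLS: readonly string[] = [
  'reputation_get', 'reputation_history', 'reputation_leaderboard',
  'reputation_check_gates',
];

// Hypothetical helper: merge both lists, reject duplicates, return alpha-sorted names.
function buildSurface(existing: readonly string[], added: readonly string[]): string[] {
  const seen = new Set<string>();
  for (const tool of existing.concat(added)) {
    if (seen.has(tool)) throw new Error(`duplicate tool name: ${tool}`);
    seen.add(tool);
  }
  return Array.from(seen).sort();
}
```

Calling buildSurface(EXISTING_TOOLS, NEW_REPUTATION_TOOLS) and comparing the result with the numbered list is one way for the registration test to pin the full surface, not just the four new names.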

6. Commit plan

| # | Commit subject | Files |
|---|----------------|-------|
| 1 | audit(p2-5-1-tools): inventory surface | docs/audits/p2-5-1-tools-audit.md |
| 2 | contract(p2-5-1-tools): behavioral contract | docs/contracts/p2-5-1-tools-contract.md |
| 3 | packet(p2-5-1-tools): execution plan | docs/packets/p2-5-1-tools-packet.md |
| 4 | feat(p2-5-1-tools): 4 read-only MCP tools wiring λ surface (closes Phase 2 at 7/7) | src/domains/reputation/tools.ts + src/server.ts + src/__tests__/domains/reputation/tools.test.ts |
| 5 | verify(p2-5-1-tools): test evidence | docs/verification/p2-5-1-tools-verification.md |

7. PR plan

Title: feat(p2-5-1-tools): 4 read-only MCP tools wiring λ surface — closes Phase 2 at 7/7 (R89 Wave 4)

Body sections:

  • Summary — 4 tools, names, what each composes; lists the 14 → 18 surface delta
  • λ Phase 2 status — closes 7/7; lists prior 6 PRs by number
  • Decay-on-read cost — leaderboard O(N·log N) sort; materialisation deferred to Phase 6+
  • Integration test coverage — write-events → read-score hand-calc verification; 28 tests
  • Read-only invariant — grep-asserted no INSERT/UPDATE/DELETE
  • Writeback block — task_id, branch, worktree, commits, tests, summary
  • No proof-grade Merkle anchor — R89 chain reset; falls under R89.A documented failure mode (see #222)

8. Cleanup

  • No vault sync in this slice.
  • No frontmatter graduation in this slice (λ → partial is a separate hygiene PR per audit §11).
  • No .claude/skills/ mirror updates (no canon skill changes).
  • No DB migration changes (007 already supports everything).

9. Done definition

  • All 5 commits land on feature/p2-5-1-tools
  • npm run build && npm run lint && npm test green
  • All 28 tests in tools.test.ts pass
  • No regression in existing test count (pre-existing ~2406 + new tools tests)
  • PR created via gh pr create
  • Writeback packet drafted (task_update + thought_record skeleton in PR body)


Colibri — documentation-first MCP runtime. Apache 2.0 + Commons Clause.
