Contract: P0.8.3 η Merkle Root Finalization Tools

Task: P0.8.3 (three new MCP tools: audit_session_start, merkle_finalize, merkle_root)
Depends on: P0.8.1 (primitives) + migration 006_eta.sql (earned here)
Consumed by: writeback protocol (CLAUDE.md §7); future P0.7.3 audit_verify_chain.

1. Scope

A thin SQLite-backed service layer that allows agents to:

Start an audit session and obtain a stable session_id.
Record thought records against that session (reusing thought_record with an added session_id argument — see §4.2).
Finalize the session by computing a Merkle root over every record’s hash field (ordered by rowid ASC) and persisting that root.
Retrieve the finalized root + record count + timestamp.

The Merkle primitives (buildMerkleTree, …) are consumed from src/domains/proof/merkle.ts unchanged. This task is pure glue: SQLite storage, input validation, and MCP registration.

2. Tool surface

2.1 `audit_session_start`

Input schema (Zod)

z.object({
  intent: z.string().min(1, 'intent must not be empty'),
  task_id: z.string().min(1).optional(),
  session_id: z.string().min(1).optional(), // escape hatch for reproducible tests
})

Output (data payload — the α middleware wraps in `{ok, data, ...}`)

{
  session_id: string;  // UUID v4 by default (or the injected one)
  intent: string;
  task_id: string | null;
  started_at: string;  // ISO-8601 UTC
}

Semantics

Generates a fresh UUID v4 via randomUUID() unless session_id is explicitly provided (tests lean on this).
Inserts exactly one row into audit_sessions.
Duplicate handling: the call is not idempotent. A second call with the same provided session_id triggers a PK UNIQUE violation, which is mapped to ERR_SESSION_EXISTS.
task_id is opaque — no FK, no validation against tasks table (soft-delete semantics there make FK unreliable). Policy decision: caller owns referential integrity.
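
The semantics above can be sketched as follows. This is a minimal illustration, not the shipped handler: the `Map` is a hypothetical in-memory stand-in for the audit_sessions table, `startAuditSessionSketch` is an illustrative name, and the `{ok, ...}` envelope is folded into the function for brevity even though the α middleware normally adds it.

```typescript
import { randomUUID } from "node:crypto";

type Result<T> =
  | { ok: true; data: T }
  | { ok: false; error: { code: string; message: string } };

interface AuditSession {
  session_id: string;
  intent: string;
  task_id: string | null;
  started_at: string; // ISO-8601 UTC
}

// Hypothetical stand-in for the audit_sessions table, keyed by its primary key.
type SessionTable = Map<string, AuditSession>;

function startAuditSessionSketch(
  table: SessionTable,
  input: { intent: string; task_id?: string; session_id?: string },
): Result<AuditSession> {
  // Fresh UUID v4 unless the caller injects an id (the test escape hatch).
  const session_id = input.session_id ?? randomUUID();
  if (table.has(session_id)) {
    // Plays the role of the PK UNIQUE violation on audit_sessions.session_id.
    return {
      ok: false,
      error: { code: "ERR_SESSION_EXISTS", message: `session ${session_id} already exists` },
    };
  }
  const row: AuditSession = {
    session_id,
    intent: input.intent,
    task_id: input.task_id ?? null, // opaque: no lookup against any tasks table
    started_at: new Date().toISOString(),
  };
  table.set(session_id, row);
  return { ok: true, data: row };
}
```

Passing the table as the first argument mirrors the handle-first style described in §4.6, which keeps the function testable without a singleton.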

2.2 `merkle_finalize`

Input schema (Zod)

z.object({
  session_id: z.string().min(1),
})

Output (success)

{
  session_id: string;
  root: string;         // 64-char lowercase hex
  record_count: number; // non-negative
  finalized_at: string; // ISO-8601 UTC
}

Semantics

Verifies the session exists (reject with ERR_SESSION_NOT_FOUND otherwise).
Verifies the session has NOT already been finalized (reject with ERR_ALREADY_FINALIZED otherwise — PK UNIQUE on merkle_roots.session_id).
Reads all thought records WHERE session_id = ? ORDER BY rowid ASC.
Rejects (AC4) with ERR_NO_RECORDS if count == 0. The CLAUDE.md §7 ordering rule “thought_record MUST precede merkle_finalize” is enforced here.
Builds a Merkle tree via buildMerkleTree(hashes) from P0.8.1.
Inserts a row into merkle_roots (session_id, root, record_count, finalized_at).
Returns the row.

All of the steps above run inside a single db.transaction, so concurrent finalize races cannot double-commit.
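
A minimal sketch of the finalize sequence, under stated assumptions: `StoreSketch` is a hypothetical in-memory stand-in for the three tables, and `computeRoot` is a placeholder for building the tree with buildMerkleTree and reading its root (the real primitive's return shape is not shown in this contract). Because the function is synchronous, it runs atomically in this sketch; the real handler relies on db.transaction for that.

```typescript
import { createHash } from "node:crypto";

type Result<T> =
  | { ok: true; data: T }
  | { ok: false; error: { code: string; message: string } };

interface MerkleRootRow {
  session_id: string;
  root: string;
  record_count: number;
  finalized_at: string;
}

// Hypothetical stand-in for audit_sessions, thought_records and merkle_roots.
interface StoreSketch {
  sessions: Set<string>;
  records: { rowid: number; session_id: string | null; hash: string }[];
  roots: Map<string, MerkleRootRow>;
}

function fail(code: string, message: string): Result<never> {
  return { ok: false, error: { code, message } };
}

function finalizeSketch(
  store: StoreSketch,
  session_id: string,
  computeRoot: (hashes: string[]) => string,
): Result<MerkleRootRow> {
  if (!store.sessions.has(session_id)) return fail("ERR_SESSION_NOT_FOUND", "session never started");
  if (store.roots.has(session_id)) return fail("ERR_ALREADY_FINALIZED", "session already finalized");
  // WHERE session_id = ? ORDER BY rowid ASC
  const hashes = store.records
    .filter((r) => r.session_id === session_id)
    .sort((a, b) => a.rowid - b.rowid)
    .map((r) => r.hash);
  if (hashes.length === 0) return fail("ERR_NO_RECORDS", "no thought records in session");
  const row: MerkleRootRow = {
    session_id,
    root: computeRoot(hashes),
    record_count: hashes.length,
    finalized_at: new Date().toISOString(),
  };
  store.roots.set(session_id, row);
  return { ok: true, data: row };
}

// Placeholder root function for the demo only. It is NOT a Merkle tree.
const toyRoot = (hashes: string[]): string =>
  createHash("sha256").update([...hashes].sort().join("")).digest("hex");
```

The guard order follows the list above: existence, then already-finalized, then the empty-session check, so each error code has exactly one trigger.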

2.3 `merkle_root`

Input schema (Zod)

z.object({
  session_id: z.string().min(1),
})

Output (success)

{
  session_id: string;
  root: string;
  record_count: number;
  finalized_at: string;
}

Semantics

SELECT * FROM merkle_roots WHERE session_id = ?.
If no row: return {ok:false, error:{code: 'ERR_NOT_FINALIZED', ...}}.
Otherwise return the row as-is.

3. Error codes

| Code | Source | Meaning |
| --- | --- | --- |
| `INVALID_PARAMS` | α middleware stage 2 (Zod) | Malformed input (empty string, wrong type) |
| `ERR_SESSION_EXISTS` | `audit_session_start` | Provided `session_id` already exists |
| `ERR_SESSION_NOT_FOUND` | `merkle_finalize` | Session never started (no `audit_sessions` row) |
| `ERR_ALREADY_FINALIZED` | `merkle_finalize` | Session already has a `merkle_roots` row |
| `ERR_NO_RECORDS` | `merkle_finalize` | Session has zero thought records (AC4) |
| `ERR_NOT_FINALIZED` | `merkle_root` | No `merkle_roots` row for the session (not yet finalized, or never started; §2.3 only queries `merkle_roots`) |

Tasks and the writeback layer already use the {ok:false, error:{code,message,...}} envelope pattern (src/domains/tasks/repository.ts:700-710). We follow the same.
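
One way to keep the codes above honest at compile time is a string-literal union plus two small constructors. The sketch below is illustrative only: `ErrorCode`, `Envelope`, `ok`, `err` and `getMerkleRootSketch` are hypothetical names, the envelope carries just `code` and `message` (the real one may hold more fields), and a `Map` stands in for the merkle_roots table.

```typescript
type ErrorCode =
  | "INVALID_PARAMS"
  | "ERR_SESSION_EXISTS"
  | "ERR_SESSION_NOT_FOUND"
  | "ERR_ALREADY_FINALIZED"
  | "ERR_NO_RECORDS"
  | "ERR_NOT_FINALIZED";

type Envelope<T> =
  | { ok: true; data: T }
  | { ok: false; error: { code: ErrorCode; message: string } };

function ok<T>(data: T): Envelope<T> {
  return { ok: true, data };
}

function err(code: ErrorCode, message: string): Envelope<never> {
  return { ok: false, error: { code, message } };
}

interface MerkleRootRow {
  session_id: string;
  root: string;
  record_count: number;
  finalized_at: string;
}

// merkle_root lookup (§2.3): return the row as-is, or ERR_NOT_FINALIZED.
function getMerkleRootSketch(
  roots: Map<string, MerkleRootRow>, // hypothetical stand-in for merkle_roots
  session_id: string,
): Envelope<MerkleRootRow> {
  const row = roots.get(session_id);
  return row ? ok(row) : err("ERR_NOT_FINALIZED", `no merkle_roots row for session ${session_id}`);
}
```

With the union in place, a misspelled code such as `"ERR_NOT_FINALISED"` fails type-checking instead of surfacing at runtime.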

4. Invariants

4.1 Deterministic root

For the same set of thought-record hashes (any insertion order), merkle_finalize MUST always produce the same root. This is guaranteed by the sortLeaves: true + sortPairs: true options already baked into buildMerkleTree. We add one test that re-inserts the same 5 hashes into a different session in a different order and asserts equal roots.
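
To see why sorting leaves and pairs removes any dependence on insertion order, consider the stand-in below. It is an assumption-laden sketch, not the P0.8.1 implementation: `merkleRootSketch` hashes concatenated hex strings with SHA-256 and promotes an unpaired node unchanged, and the real buildMerkleTree may differ on both points.

```typescript
import { createHash } from "node:crypto";

const sha256Hex = (s: string): string => createHash("sha256").update(s).digest("hex");

// Stand-in root computation with sorted leaves and sorted pairs.
function merkleRootSketch(leafHashes: string[]): string {
  if (leafHashes.length === 0) throw new Error("at least one leaf is required");
  let level = [...leafHashes].sort(); // sortLeaves: insertion order is discarded here
  while (level.length > 1) {
    const next: string[] = [];
    for (let i = 0; i < level.length; i += 2) {
      if (i + 1 === level.length) {
        next.push(level[i]); // odd node out: promoted as-is (an assumption)
        continue;
      }
      const [a, b] = [level[i], level[i + 1]].sort(); // sortPairs
      next.push(sha256Hex(a + b));
    }
    level = next;
  }
  return level[0];
}

// Five made-up record hashes, inserted in two different orders.
const leaves = ["r1", "r2", "r3", "r4", "r5"].map(sha256Hex);
const rootA = merkleRootSketch(leaves);
const rootB = merkleRootSketch([...leaves].reverse());
console.log(rootA === rootB); // true
```

Because the leaves are sorted before the first level is built, every later step sees identical input regardless of how the records were inserted.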

4.2 `session_id` column on `thought_records`

Migration 006 adds a nullable session_id TEXT column. Existing rows receive NULL. The P0.7.2 API is unchanged: it neither requires session_id nor reads it. New consumers (including tests within this task) may insert thought records with a session_id by using raw SQL, because createThoughtRecord does not expose the field. This task does NOT change createThoughtRecord; extending ζ is out of scope.

4.3 `session_id` is NOT part of the hash

Per P0.7.1 contract §3 and the hash-subset rule at src/domains/trail/schema.ts:170-188, only {id, type, task_id, content, timestamp, prev_hash} participate in the hash. Adding session_id to the record does NOT modify the hash. The Merkle tree built by merkle_finalize uses each record’s existing hash value, not a re-computation that includes session_id.
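
The hash-subset rule can be illustrated with a small sketch. The serialization here (a JSON array of the six fields in a fixed order, hashed with SHA-256) is purely an assumption for demonstration; the actual encoding lives in src/domains/trail/schema.ts and is not reproduced in this contract. `ThoughtRecordSketch` and `hashSubsetSketch` are hypothetical names, and the record values are made up.

```typescript
import { createHash } from "node:crypto";

interface ThoughtRecordSketch {
  id: string;
  type: string;
  task_id: string | null;
  content: string;
  timestamp: string;
  prev_hash: string | null;
  session_id?: string | null; // present on the row, absent from the hash input
}

function hashSubsetSketch(r: ThoughtRecordSketch): string {
  // Only the six hashed fields are serialized; session_id is never read.
  const subset = [r.id, r.type, r.task_id, r.content, r.timestamp, r.prev_hash];
  return createHash("sha256").update(JSON.stringify(subset)).digest("hex");
}

const base: ThoughtRecordSketch = {
  id: "rec-1",
  type: "summary",
  task_id: null,
  content: "example content",
  timestamp: "2024-01-01T00:00:00.000Z",
  prev_hash: null,
};
const withSession: ThoughtRecordSketch = { ...base, session_id: "s1" };
console.log(hashSubsetSketch(base) === hashSubsetSketch(withSession)); // true
```

Under any serialization that reads only the six listed fields, populating session_id after the fact leaves every stored hash, and therefore the ζ chain, untouched.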

4.4 Transport-first / Phase 2 DB open

All three handlers lazy-resolve getDb() at call time. This matches the pattern established in src/domains/trail/repository.ts:377. It is safe because MCP tool calls cannot arrive until after server.connect(transport) completes, which precedes initDb() in the Phase 0 startup contract (src/startup.ts).

4.5 `ORDER BY rowid ASC`, not `created_at`

All queries that iterate records within a session use ORDER BY rowid ASC. This is the P0.7.2 lesson memorialized in src/domains/trail/repository.ts:220 — SQLite’s rowid is the only ordering primitive immune to ISO-8601 millisecond ties on fast CI.
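
The tie problem is easy to reproduce in miniature. In the sketch below (made-up rows, hypothetical names), three records share one millisecond timestamp: a timestamp comparator returns 0 for every pair, so the result simply echoes whatever order the rows arrived in, while a rowid comparator gives a total order.

```typescript
interface Row {
  rowid: number;
  created_at: string;
  hash: string;
}

// Three records written within the same millisecond.
const rows: Row[] = [
  { rowid: 1, created_at: "2024-01-01T00:00:00.000Z", hash: "a" },
  { rowid: 2, created_at: "2024-01-01T00:00:00.000Z", hash: "b" },
  { rowid: 3, created_at: "2024-01-01T00:00:00.000Z", hash: "c" },
];

const byCreatedAt = (x: Row, y: Row): number =>
  x.created_at < y.created_at ? -1 : x.created_at > y.created_at ? 1 : 0;
const byRowid = (x: Row, y: Row): number => x.rowid - y.rowid;

// Simulate a storage layer handing rows back in an arbitrary order.
const scrambled = [rows[2], rows[0], rows[1]];

const viaTimestamp = [...scrambled].sort(byCreatedAt).map((r) => r.hash);
const viaRowid = [...scrambled].sort(byRowid).map((r) => r.hash);

console.log(viaTimestamp.join("")); // "cab": ties leave the arrival order in place
console.log(viaRowid.join(""));     // "abc": insertion order recovered
```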

4.6 Purity

The repository-style functions (startAuditSession, finalizeMerkleRoot, getMerkleRoot) take a Database handle as first argument. No module-level DB state. No env reads at import time. Pure enough to test against :memory: with no singleton setup.

4.7 No side-effects at import time

Importing src/tools/merkle.ts MUST NOT open a DB, read a file, or mutate process.*. This matches every other tool module in the repo.

5. Acceptance criteria → tests mapping

| AC | Test (in `src/__tests__/tools/merkle.test.ts`) |
| --- | --- |
| AC1: `merkle_finalize` builds Merkle tree of last N unfinalized records, stores root | `describe('merkle_finalize — happy path')` + `describe('merkle_finalize — persistence')` |
| AC2: `merkle_root` returns current root hash + record count + timestamp | `describe('merkle_root — happy path')`, asserting all three fields |
| AC3: `audit_session_start` creates audit session record, returns session_id | `describe('audit_session_start')`, asserting UUID v4 shape + row insertion |
| AC4: Finalization must happen AFTER final thought record (errors if no thought_record) | `it('rejects session with zero records')` in `describe('merkle_finalize — AC4')` |
| AC5: Test: finalize 5-record session → root matches manual computation | `describe('merkle_finalize — 5-record manual computation')`, the acceptance-headline test |

Additional invariants tested (beyond AC):

I1 (§4.1): Deterministic root under shuffled-insert order.
I2 (§4.2): thought_records.session_id column is added and indexed after migration 006 runs.
I3 (§4.3): ζ hash chain remains valid after session_id is populated.
I4: Already-finalized session rejects second finalize.
I5: merkle_root on unfinalized session returns ERR_NOT_FINALIZED.
I6: Tool-registration duplicate guard — second registerMerkleTools(ctx) throws.

6. Migration 006 (exact body)

-- 006_eta — η Proof Store tables + ζ session linkage (P0.8.3).
--
-- Introduces:
--   audit_sessions  — intent/task_id/started_at, one row per audit_session_start call
--   merkle_roots    — one row per finalized session; PK is session_id
--
-- Also adds:
--   thought_records.session_id (nullable TEXT) — grouping key for finalize
--   idx_trail_session on thought_records(session_id) — finalize lookup path
--
-- Does NOT foreign-key between tables. audit_sessions.task_id is opaque (β soft-delete
-- makes FK unreliable). merkle_roots.session_id is not FK'd to audit_sessions because
-- we enforce referential integrity in handler logic inside a single db.transaction.

CREATE TABLE audit_sessions (
  session_id TEXT PRIMARY KEY,
  intent     TEXT NOT NULL,
  task_id    TEXT,
  started_at TEXT NOT NULL
);

CREATE INDEX idx_audit_sessions_task ON audit_sessions(task_id);

CREATE TABLE merkle_roots (
  session_id    TEXT PRIMARY KEY,
  root          TEXT NOT NULL,
  record_count  INTEGER NOT NULL,
  finalized_at  TEXT NOT NULL
);

ALTER TABLE thought_records ADD COLUMN session_id TEXT;
CREATE INDEX idx_trail_session ON thought_records(session_id);

(rowid does not need to appear in the CREATE INDEX column list: on an ordinary rowid table, every SQLite index entry already carries the row's rowid as its trailing key. A session-scoped ORDER BY rowid ASC query therefore uses this index for the WHERE session_id = ? filter and can read the matching rows in rowid order, which is more than adequate for our per-session row counts.)

Migration is idempotent per initDb()’s user_version gating. Running it twice is a no-op — the runner checks PRAGMA user_version and skips already-applied migrations.
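
The gating behaviour can be sketched as follows. This is a simplified model, not the real initDb(): `DbSketch` replaces the database with a plain object whose `userVersion` field plays the role of PRAGMA user_version, and `runMigrationsSketch` is a hypothetical name.

```typescript
// Hypothetical in-memory model of a database with a user_version counter.
interface DbSketch {
  userVersion: number;
  applied: string[]; // log of migrations that actually ran
}

interface MigrationSketch {
  version: number;
  name: string;
  up: (db: DbSketch) => void;
}

function runMigrationsSketch(db: DbSketch, migrations: MigrationSketch[]): void {
  const ordered = [...migrations].sort((a, b) => a.version - b.version);
  for (const m of ordered) {
    if (m.version <= db.userVersion) continue; // already applied: skip
    m.up(db);
    db.userVersion = m.version; // record progress only after the body ran
  }
}

const db: DbSketch = { userVersion: 5, applied: [] };
const migrations: MigrationSketch[] = [
  { version: 6, name: "006_eta", up: (d) => { d.applied.push("006_eta"); } },
];

runMigrationsSketch(db, migrations);
runMigrationsSketch(db, migrations); // second run is a no-op
console.log(db.applied.length, db.userVersion); // 1 6
```

Note that the idempotence comes from the runner, not from the SQL: the migration body uses plain CREATE TABLE and ALTER TABLE statements and would fail if executed twice directly.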

7. Non-goals

No audit session end tool in P0.8.3. The task-breakdown demands only audit_session_start + merkle_finalize + merkle_root. Ending a session is implicit: finalize seals it. A dedicated audit_session_end tool belongs to a later task.
No session metadata mutation. Once started, audit_sessions.intent and started_at are read-only at this layer.
No proof retrieval at the MCP layer yet — generateProof / verifyProof are called only inside the test suite. A future merkle_proof tool will wire them up (not in scope here).
No multi-session finalize, no “last N unfinalized records” cross-session roll-up. One session → one root.
No HTTP / REST surface — this is an MCP tool registration only.
No modifications to P0.8.1 (src/domains/proof/merkle.ts) or P0.8.2 (src/domains/proof/retention.ts — not shipped, but claimed by the other agent).

8. Writeback note (CLAUDE.md §7 compliance)

The writeback protocol requires a final thought_record BEFORE merkle_finalize for proof-grade tasks. The three tools shipped here are exactly the primitives that make that protocol executable. The tools themselves are not proof-grade: they are infrastructure. However, this task’s own writeback (PR body + final message) follows the ordering rule: the summary is recorded BEFORE any merkle_finalize call on this session.

Since no live MCP client is running during executor work, the thought_record actually materializes as the PR body + the summary in the final report — same as R75 Wave E tasks.
