Audit: P0.8.2 η Three-Zone Retention

Task: P0.8.2, the second η (Proof Store) surface in Colibri (retention / archival policy)
Branch: feature/p0-8-2-retention
Worktree: .worktrees/claude/p0-8-2-retention
Date: 2026-04-17
Auditor: T3 Executor (Claude Opus 4.7)
Base commit: dc660381 (origin/main)


1. Surface inventory

1.1 η surface state (before this task)

Exactly one source file in src/domains/proof/:

| File | Task | Exports |
| --- | --- | --- |
| src/domains/proof/merkle.ts | P0.8.1 (#136) | EMPTY_TREE_ROOT, buildMerkleTree, generateProof, verifyProof; types MerkleProof, MerkleProofNode, MerkleTreeResult |

No retention code. No retention.ts. No src/__tests__/domains/proof/retention.test.ts. This task is a greenfield sibling to merkle.ts inside the same domain.

The records this task manages belong to the ζ Decision Trail surface, shipped by P0.7.2:

| File | Relevant surface |
| --- | --- |
| src/domains/trail/schema.ts | ThoughtRecordSchema, ThoughtRecord, computeHash, canonicalize, ZERO_HASH, THOUGHT_TYPES |
| src/domains/trail/repository.ts | createThoughtRecord, getThoughtRecord, listThoughtRecords (CRUD used to populate thought_records) |
| src/db/migrations/003_thought_records.sql | Defines the thought_records table: 9 columns, UNIQUE(hash), indexes on (task_id, created_at) and (prev_hash) |

The retention module MUST preserve chain integrity — in particular hash, prev_hash, id, type, task_id, timestamp, agent_id, created_at must never be destroyed, because P0.7.3 (audit_verify_chain, not yet shipped) will need to re-hash the 6-field subset to validate the chain. Only content + derived content_compressed are mutable.

1.3 Position-vs-age interpretation

Task spec (task-breakdown.md §P0.8.2):

  • Hot: last 100 records — full content in DB
  • Warm: records 101–1000 — content compressed (JSON → gzip → base64)
  • Cold: records 1001+ — content hash only (full content deleted)

Donor extraction (docs/reference/extractions/eta-proof-store-extraction.md §“Retention Zones”):

  • Hot: 7 days, Warm: 30 days, Cold: 365 days — TTL-based (time-based zones).

Mismatch resolved in favour of the task spec. The task-breakdown is authoritative; the donor extraction is HERITAGE (quarantine-tagged). Colibri P0.8.2 uses position-based zones, ranked per task_id with the newest record (highest rowid) at position 1:

  • Position 1..100 (newest first by rowid DESC) → Hot
  • Position 101..1000 → Warm
  • Position 1001+ → Cold

Ranking uses rowid, not created_at, per the P0.7.2 lesson that millisecond-precision timestamps collide on fast CI. Within one task_id, ORDER BY rowid DESC LIMIT 100 OFFSET 0 selects Hot, LIMIT 900 OFFSET 100 selects Warm, and OFFSET 1000 with no upper bound (LIMIT -1 in SQLite) selects Cold. A record’s zone is computed from its rowid rank within the same task_id chain.

This interpretation is documented in the contract for future clarity.
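The position rule above can be sketched in memory as follows. This is a minimal illustration under stated assumptions, not the shipped implementation: the `Row` shape and the `positionOf` helper are hypothetical, the body of `computeZone` is inferred from the zone boundaries in the task spec, and a real implementation would presumably compute the rank with a SQL query against thought_records rather than over an array.

```typescript
type Zone = 'hot' | 'warm' | 'cold';

// Hypothetical minimal row shape: only the fields needed for ranking.
interface Row {
  id: string;
  task_id: string;
  rowid: number;
}

// Map a 1-indexed position (1 = newest record in the task chain) to its zone.
function computeZone(position: number): Zone {
  if (!Number.isInteger(position) || position < 1) {
    throw new RangeError(`position must be a positive integer, got ${position}`);
  }
  if (position <= 100) return 'hot';
  if (position <= 1000) return 'warm';
  return 'cold';
}

// Position = number of rows in the same task chain whose rowid is >= this row's rowid.
// Rows belonging to other tasks never affect the rank. Returns null for an unknown id.
function positionOf(rows: Row[], id: string): number | null {
  const target = rows.find((r) => r.id === id);
  if (target === undefined) return null;
  return rows.filter(
    (r) => r.task_id === target.task_id && r.rowid >= target.rowid,
  ).length;
}

// Usage: two interleaved task chains sharing one rowid sequence.
const sample: Row[] = [
  { id: 'a1', task_id: 'A', rowid: 1 },
  { id: 'b1', task_id: 'B', rowid: 2 },
  { id: 'a2', task_id: 'A', rowid: 3 },
];
const newestInA = positionOf(sample, 'a2'); // rank within task A only
```

Ranking by count rather than by raw rowid difference matters because rowid is shared across the whole table, so gaps left by other tasks' records must not push a record into an older zone.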

1.4 Test infrastructure state

src/__tests__/domains/proof/merkle.test.ts exists (P0.8.1). Pure unit test — no DB, no MCP. This task’s tests DO need a DB (we archive SQLite rows), so the template follows src/__tests__/domains/trail/repository.test.ts:

  • import Database from 'better-sqlite3' — in-memory
  • Migration SQL loaded once at module scope, exec’d into fresh DB per test via beforeEach
  • afterEach closes handle

No test-path correction needed beyond the one in the task prompt (src/__tests__/domains/proof/retention.test.ts, NOT tests/domains/proof/...).

1.5 Existing audit/contract/packet docs for P0.8.2

None — confirmed via ls docs/audits/. Only P0.8.1 artifacts exist:

p0-8-1-merkle-tree-audit.md
p0-8-1-merkle-tree-contract.md
p0-8-1-merkle-tree-packet.md

This document is the first P0.8.2 artefact.

1.6 Migration number

Next available migration number: 005. Confirmed via ls src/db/migrations/:

001_init.sql
002_tasks.sql
003_thought_records.sql
004_skills.sql

This task ships 005_retention.sql.

1.7 No collision with parallel tasks

  • P0.8.3 (src/tools/merkle.ts) — different file, disjoint concern.
  • P0.9.x (src/domains/integrations/) — different directory.
  • P0.8.1 (src/domains/proof/merkle.ts) — CONSUMED, not modified. No edit to merkle.ts.

2. Files to create

| Path | Purpose |
| --- | --- |
| docs/audits/p0-8-2-retention-audit.md | THIS file |
| docs/contracts/p0-8-2-retention-contract.md | Behavioral contract (Zod schemas, invariants, acceptance map) |
| docs/packets/p0-8-2-retention-packet.md | Implementation plan |
| docs/verification/p0-8-2-retention-verification.md | Test + lint evidence |
| src/db/migrations/005_retention.sql | New columns on thought_records: zone TEXT, content_compressed TEXT, content_hash TEXT |
| src/domains/proof/retention.ts | archiveRecord, retrieveRecord, computeZone, types, zod schemas |
| src/__tests__/domains/proof/retention.test.ts | Acceptance-criteria-aligned tests |

3. Files to modify

None. No edits to existing source files. The migration adds new columns but does not rewrite 003_thought_records.sql or any existing .ts file.


4. Schema changes

Add three nullable columns to thought_records via 005_retention.sql:

| Column | Type | Default | Meaning |
| --- | --- | --- | --- |
| zone | TEXT | 'hot' | 'hot' \| 'warm' \| 'cold'. NULL is treated as 'hot' in reads (grace period for legacy rows pre-migration). Fresh writes default to 'hot'. |
| content_compressed | TEXT | NULL | Base64-encoded gzip of JSON(record). Populated only in Warm. NULL otherwise. |
| content_hash | TEXT | NULL | SHA-256 (lowercase hex, 64 chars) of the original content string. Populated when transitioning to Warm or Cold; preserved in Cold even after content + content_compressed become NULL. |

Rationale for three columns:

  • zone — primary identifier of retention state, indexed for any future “list by zone” operation.
  • content_compressed — compressed payload only used in Warm; nullable because Hot + Cold don’t use it.
  • content_hash — sha256 of the original content, preserves content-level provability after the row’s content is deleted. (Note: distinct from the record’s hash column, which hashes the 6-field subset including content — so the chain hash already commits to content. content_hash is a convenience for cold-zone consumers and matches the task spec “content hash only”.)

No index is added on zone in this migration — archival operations key by id or by task_id, both already indexed. A future task may add idx_trail_zone if retention-pass queries warrant it.

The existing CHECK-less type column pattern is followed here: zone is validated at the Zod/repository layer, not at the DB.
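A minimal sketch of what the migration and the read-side NULL handling could look like. The SQL text is an assumption, not the shipped 005_retention.sql: it omits a DDL DEFAULT clause so that pre-existing rows read back NULL, which leaves the 'hot' default to the repository layer. The name `effectiveZone` is hypothetical, and a plain type guard stands in here for the Zod validation mentioned above.

```typescript
// Hypothetical contents of 005_retention.sql, held in a string for illustration.
// Assumption: no DEFAULT clause, so rows that existed before the migration
// read back NULL for all three columns.
const RETENTION_MIGRATION_SQL = `
ALTER TABLE thought_records ADD COLUMN zone TEXT;
ALTER TABLE thought_records ADD COLUMN content_compressed TEXT;
ALTER TABLE thought_records ADD COLUMN content_hash TEXT;
`;

const ZONES = ['hot', 'warm', 'cold'] as const;
type Zone = (typeof ZONES)[number];

// Read-side normalization: a NULL zone (legacy row) counts as 'hot';
// any other value must be one of the three known zones.
function effectiveZone(raw: string | null): Zone {
  if (raw === null) return 'hot';
  if ((ZONES as readonly string[]).includes(raw)) return raw as Zone;
  throw new Error(`unknown zone value: ${raw}`);
}

// Usage: normalize the zone column of a row fetched from thought_records.
const legacyRowZone = effectiveZone(null);   // legacy row, never archived
const archivedRowZone = effectiveZone('cold');
```

Because the zone column carries no CHECK constraint, rejecting unknown values at read time is the only place a stray value would be caught.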


5. Known hazards (tracked for packet / verify)

  1. Cross-worktree leak: git status at start returned a clean tree. Re-verify before each commit.
  2. SQLite rowid ordering — use ORDER BY rowid ASC / DESC; NEVER created_at. Position is 1-indexed among records of the same task_id, newest = position 1 (so Hot = positions 1..100).
  3. Jest + zod — do NOT use jest.isolateModulesAsync. Tests use in-memory DB with real migration SQL (the P0.7.2 pattern).
  4. Idempotency — archiveRecord called twice on a record already in the target zone must no-op.
  5. Hash preservation — never NULL out hash, prev_hash, or the 6 subset-hash fields. Only content + content_compressed are touched.
  6. Migration — adding nullable columns via ALTER TABLE is safe; no data rewrite needed. Existing rows get zone=NULL, content_compressed=NULL, content_hash=NULL, which the repository treats as “hot / not yet archived”.
  7. better-sqlite3.exec('') throws on empty SQL. The 005 migration has real SQL (the 3 ALTERs), so the stripSqlComments empty-path in src/db/index.ts doesn’t apply.
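Hazards 4 and 5 can be illustrated with a forward-only transition over an in-memory record. This is a sketch, not the contract: the `RetentionFields` shape and the `archiveTo` name are hypothetical, and it assumes that a move to Warm clears content once content_compressed has been written, which this audit implies but does not state outright.

```typescript
import { createHash } from 'node:crypto';
import { gzipSync } from 'node:zlib';

type Zone = 'hot' | 'warm' | 'cold';

// Hypothetical view of one record: two chain fields (read-only here)
// plus the fields the retention module is allowed to change.
interface RetentionFields {
  readonly hash: string;
  readonly prev_hash: string;
  zone: Zone;
  content: string | null;
  content_compressed: string | null;
  content_hash: string | null;
}

const RANK: Record<Zone, number> = { hot: 0, warm: 1, cold: 2 };

function archiveTo(record: RetentionFields, target: Zone): RetentionFields {
  // Idempotent and forward-only: the same or an earlier zone is a no-op.
  if (RANK[target] <= RANK[record.zone]) return record;

  const { content } = record;

  // content_hash is computed once, on the first move out of Hot, then carried forward.
  let contentHash = record.content_hash;
  if (contentHash === null) {
    if (content === null) throw new Error('cannot archive: no content and no content_hash');
    contentHash = createHash('sha256').update(content, 'utf8').digest('hex');
  }

  if (target === 'warm') {
    // Warm is only reachable from Hot, where content must still be present.
    if (content === null) throw new Error('hot record has no content to compress');
    const compressed = gzipSync(content).toString('base64');
    return {
      ...record,
      zone: 'warm',
      content: null, // assumption: Warm keeps only the compressed form
      content_compressed: compressed,
      content_hash: contentHash,
    };
  }

  // Cold: drop both payload columns, keep only the content hash.
  return {
    ...record,
    zone: 'cold',
    content: null,
    content_compressed: null,
    content_hash: contentHash,
  };
}
```

Returning the input object unchanged on a no-op lets a caller detect "nothing to do" by identity and skip the database write entirely.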

6. Consumption plan

archiveRecord(db, id) and retrieveRecord(db, id) are exported from src/domains/proof/retention.ts. No MCP tool is registered by this task — it’s a library surface. A future P0.8.x task or the writeback machinery may wrap these into MCP tools, but that is out-of-scope here (task spec registers no tool).

Helper functions exported alongside the two primitives:

  • computeZone(position: number): 'hot' | 'warm' | 'cold' — decides the target zone given a 1-indexed position in the per-task chain.
  • getRecordPosition(db, id): number | null — returns the record’s position within its task chain, or null if the id is unknown.
  • gzipContent(content: string): string / gunzipContent(compressed: string): string — pure compression helpers.
  • hashContent(content: string): string — SHA-256 hex of the content string (distinct from the chain hash).

These helpers are exported so tests can exercise the pieces independently without needing 1000+ record fixtures.
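The three content helpers could be as small as the sketch below. The function names and signatures come from the list above; the bodies are assumptions built on Node's built-in zlib and crypto modules, not the shipped code.

```typescript
import { createHash } from 'node:crypto';
import { gunzipSync, gzipSync } from 'node:zlib';

// content string -> gzip -> base64 text suitable for a TEXT column.
function gzipContent(content: string): string {
  return gzipSync(Buffer.from(content, 'utf8')).toString('base64');
}

// Inverse of gzipContent: base64 text -> gunzip -> original string.
function gunzipContent(compressed: string): string {
  return gunzipSync(Buffer.from(compressed, 'base64')).toString('utf8');
}

// SHA-256 of the content string as 64 lowercase hex characters.
// This is the content-level hash, distinct from the record's chain hash.
function hashContent(content: string): string {
  return createHash('sha256').update(content, 'utf8').digest('hex');
}

// Usage: round-trip a JSON payload and confirm it against its content hash.
const original = JSON.stringify({ note: 'retention example', n: 42 });
const packed = gzipContent(original);
const restored = gunzipContent(packed);
const intact = hashContent(restored) === hashContent(original);
```

Comparing hashes after decompression is the same check a Warm-zone read could use to confirm that content_compressed still matches content_hash.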


7. Out of scope

  • No MCP tool registration (that’s a later P0.8.x task, not in task-breakdown for P0.8.2).
  • No scheduled retention pass / cron — archiveRecord is called explicitly by a caller that iterates records.
  • No index on zone column (defer to a later task if needed).
  • No audit_verify_chain modifications to handle Cold records (P0.7.3 will handle the missing-content case when it lands).
  • No unarchive / unzone operation. Forward-only state machine per contract §5.

8. Final checklist (audit gate → proceed to contract)

  • Surface inventory complete
  • Position-vs-age interpretation documented + chosen
  • Files to create + modify listed
  • Migration number confirmed (005)
  • Schema changes enumerated (3 new columns on thought_records)
  • Hazards tracked
  • Out-of-scope fenced


Colibri — documentation-first MCP runtime. Apache 2.0 + Commons Clause.
