P0.7.1 — Step 1 Audit

Inventory of the worktree against the task spec for P0.7.1 ζ Hash-Chained Record Schema (ζ Decision Trail task group, first ζ task, Wave C parallel dispatch). Scope: what already exists that the new schema + hasher must integrate with, what the donor extraction prescribes, and what is absent.

Baseline: worktree E:/AMS/.worktrees/claude/p0-7-1-trail-schema/ at commit 3ebbe419 (P0.2.2 SQLite init merged as PR #122, on top of P0.2.1 MCP server + P0.4.1 modes).


§1. Surface being added

Targets this task creates:

  • src/domains/trail/schema.ts — new module. Holds the THOUGHT_TYPES tuple, ThoughtType union, ThoughtRecordSchema (Zod), ThoughtRecord type, ZERO_HASH constant (64-zero hex), canonicalize(value) function, and computeHash(record) function. Does not yet exist.
  • src/__tests__/trail-schema.test.ts — new test file, co-located with other Phase-0 tests (deviation from spec’s tests/domains/trail/schema.test.ts — see §3).
  • src/domains/trail/ — new directory (no siblings yet under src/domains/).

A worktree scan confirms absence of the authoring targets:

  • ls src/domains/ → “No such file or directory”
  • grep -rn "trail\|thought\|chain_hash\|prev_hash\|ZERO_HASH" src/ → zero matches
  • grep -rn "trail-schema\|trail\.test" src/__tests__/ → zero matches

This is a greenfield module — P0.7.1 authors both source and test files in a single PR. No DB table is created by this task (P0.7.2 owns 004_zeta.sql), no MCP tool is registered (P0.7.2 owns thought_record, P0.7.3 owns audit_verify_chain). P0.7.1 ships only the pure schema + hashing primitives.


§2. Adjacent code that the new module must integrate with

2a. src/config.ts (95 lines — P0.1.4, commit 3bd154a7)

Phase-0 environment wrapper. The trail schema does not consume config — it is a pure, synchronous, stateless primitive. The schema module does not read process.env, has no AMS_* guard of its own (transitively enforced by whoever imports it alongside config), and has no eager side-effects.

The loadConfig(env) pure-factory pattern is nevertheless the load-bearing style to mirror: computeHash(record) is analogously pure — takes a record object, returns a hex string, never touches globals.

2b. src/modes.ts (186 lines — P0.4.1, commit a64d7349)

Runtime mode enum. Not consumed by the trail schema. The style is load-bearing: the RUNTIME_MODES tuple as const + derived RuntimeMode union is the pattern the THOUGHT_TYPES tuple + ThoughtType union must mirror exactly. Frozen singletons, pure factories, no eager work — all three properties carry over.
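The tuple-plus-derived-union pattern named above can be sketched as follows. This is a minimal illustration, not the shipped module: the four literals come from the task spec, while the isThoughtType helper is a hypothetical addition used only to show the union narrowing at runtime.

```typescript
// Sketch of the "tuple as const + derived union" pattern.
// The four literals are the spec's valid thought types.
const THOUGHT_TYPES = ["plan", "analysis", "decision", "reflection"] as const;

// Union derived from the tuple: "plan" | "analysis" | "decision" | "reflection".
type ThoughtType = (typeof THOUGHT_TYPES)[number];

// Hypothetical helper (not named in the spec): a pure runtime type guard.
function isThoughtType(value: unknown): value is ThoughtType {
  return typeof value === "string" && THOUGHT_TYPES.some((t) => t === value);
}

// Compile-time check: only members of the tuple are assignable.
const example: ThoughtType = "decision";
```

Deriving the union from the tuple keeps the runtime list and the compile-time type in one place, so adding a type later means editing a single literal array.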

2c. src/server.ts (559 LOC — P0.2.1, commit 40cd679d)

MCP server bootstrap. The server does not consume the trail schema in this task (P0.7.2 will register the thought_record tool against the server). However, the server’s AuditSink seam (around registerAuditSink, per the P0.2.1 contract) is where ζ will plug in at P0.7.2. The schema exported here MUST be compatible with both JSON-RPC wire transport (Zod-validatable, no functions, no Date objects in the validated surface) and with the AuditSink interface (every record field a primitive that survives JSON.stringify unchanged).

  • timestamp is string (ISO 8601), NOT a Date object — JSON wire compatibility.
  • prev_hash and hash are string (hex), NOT Buffer — same reason.
  • content is string for Phase 0 — the canonicalizer, however, handles arbitrary JSON-serialisable shapes so the surface can later accept structured content without schema churn (see §4).
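The wire-compatibility constraints above can be illustrated with a JSON round trip. The interface name ThoughtRecordWire and all sample values are placeholders chosen for this sketch; only the eight field names and their string typing come from the audit.

```typescript
// Illustrative shape: every field is a plain string, so the record is a
// JSON primitive bag. The interface name is an assumption for this sketch.
interface ThoughtRecordWire {
  id: string;
  type: string;
  task_id: string;
  agent_id: string;
  content: string;
  timestamp: string; // ISO 8601 string, not a Date
  prev_hash: string; // hex string, not a Buffer
  hash: string;      // hex string, not a Buffer
}

// A Date does not survive a JSON round trip as a Date: it comes back as a string.
const roundTripped = JSON.parse(JSON.stringify({ timestamp: new Date(0) }));
const dateBecameString = typeof roundTripped.timestamp === "string";

// An all-string record round-trips without any change.
const sample: ThoughtRecordWire = {
  id: "t-1",
  type: "plan",
  task_id: "P0.7.1",
  agent_id: "agent-a",
  content: "placeholder content",
  timestamp: "1970-01-01T00:00:00.000Z",
  prev_hash: "0".repeat(64),
  hash: "f".repeat(64), // placeholder value, not a real digest
};
const survivesRoundTrip =
  JSON.stringify(JSON.parse(JSON.stringify(sample))) === JSON.stringify(sample);
```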

2d. src/db/index.ts (P0.2.2, commit 3ebbe419)

SQLite wrapper. Not consumed by P0.7.1. P0.7.2 will author 004_zeta.sql and build a repository against this module. P0.7.1 is pure — no DB handle, no persistence, no side effects.

2e. src/__tests__/config.test.ts + modes.test.ts

Test-style precedent. Relevant patterns:

  • Pure-factory test style, no process.env mutation for this module (no env consumed).
  • Tests live in src/__tests__/ per jest.config.ts line 15 roots: ['<rootDir>/src'].
  • No jest.isolateModulesAsync (P0.1.4 documents the zod-v3 locale-cache bug under ts-jest ESM). The trail schema uses zod but has no eager validation — no module-load isolation required.
  • Coverage floor: Wave A convention is 100% stmt/func/line and ≥90% branch per jest.config.ts lines 41-46 collecting from src/**/*.ts. The packet targets 100% across the board for this tiny surface.

2f. package.json

Declares zod@^3.23.8 (line 30) as a runtime dependency. The schema file will import { z } from 'zod' — no new dependencies. Node’s built-in crypto module (node:crypto) provides SHA-256 — no third-party hasher.

No additions to package.json#files (that array lists shipped assets; src/domains/trail/schema.ts is a src/ file bundled by tsc into dist/).


§3. Spec reconciliation — load-bearing deviations from the task-spec

The task-breakdown file docs/guides/implementation/task-breakdown.md §P0.7.1 has one deviation from Wave A’s conventions; the dispatch prompt confirms the override.

Deviation 1 — test file location

  • Spec (line 323): tests/domains/trail/schema.test.ts.
  • Wave A convention: src/__tests__/<name>.test.ts per jest.config.ts line 15 roots: ['<rootDir>/src']. Jest will not discover files outside src/.
  • Decision: src/__tests__/trail-schema.test.ts. Confirmed by observing that all existing tests (config.test.ts, modes.test.ts, server.test.ts, smoke.test.ts, db-init.test.ts) live in src/__tests__/. Matches the dispatch prompt’s override.

§4. Hash-input subset — literal spec reading

The spec line is load-bearing and nuanced:

hash = SHA-256(canonical_JSON({id, type, task_id, content, timestamp, prev_hash}))

The hash input is a subset of the record fields, explicitly:

  • Included: id, type, task_id, content, timestamp, prev_hash (6 fields)
  • Excluded: agent_id (metadata), hash (the output itself — would be circular)

This diverges from the donor AMS zeta-decision-trail-extraction.md algorithm (lines 22-34), which computes content_hash = SHA256(content) then chain_hash = SHA256(content_hash + parent_chain_hash). The Colibri Phase-0 design hashes a canonical-JSON projection of six fields in one step — a simpler, flatter algorithm suited to a single hash column. The donor algorithm used two hash columns (content_hash, chain_hash); Colibri uses one. The spec’s intent is integrity of the chain (id + prev_hash + payload), not independent content-hashing.

Rationale for excluding agent_id: agent_id is operational metadata (who authored this thought) rather than a chain-integrity input. Excluding it keeps the hash stable if a record is later rewritten with a corrected agent_id. Such a provenance correction is desirable, but if agent_id were part of the hash input it would cascade a chain break through every subsequent record. The chain still proves “this id + this content at this timestamp follows from this prev_hash”; agent_id is recorded alongside but not anchored.

Rationale for excluding hash: self-reference is impossible. This is mechanical, not design.

The contract must document this subset decision explicitly; the test matrix must include a case where two records differing only in agent_id produce identical hashes (positive assertion of the exclusion).
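The six-field subset and the agent_id exclusion can be sketched as below. This is an assumption-laden illustration rather than the contract: computeHashSketch is a hypothetical name, and instead of a general canonicalizer it relies on writing the six keys in alphabetical order, which is sufficient only because the projection is flat and string-valued.

```typescript
import { createHash } from "node:crypto";

interface ThoughtRecordInput {
  id: string;
  type: "plan" | "analysis" | "decision" | "reflection";
  task_id: string;
  agent_id: string;
  content: string;
  timestamp: string;
  prev_hash: string;
}

// Hypothetical sketch: hash only the six spec fields. The keys are listed in
// alphabetical order, so plain JSON.stringify already emits sorted-key,
// whitespace-free output for this flat projection.
function computeHashSketch(r: ThoughtRecordInput): string {
  const projection = {
    content: r.content,
    id: r.id,
    prev_hash: r.prev_hash,
    task_id: r.task_id,
    timestamp: r.timestamp,
    type: r.type,
  };
  return createHash("sha256").update(JSON.stringify(projection), "utf8").digest("hex");
}

// Two records that differ only in agent_id (sample values are placeholders).
const base: ThoughtRecordInput = {
  id: "t-1",
  type: "decision",
  task_id: "P0.7.1",
  agent_id: "agent-a",
  content: "use a single hash column",
  timestamp: "2024-01-01T00:00:00.000Z",
  prev_hash: "0".repeat(64),
};
const reattributed: ThoughtRecordInput = { ...base, agent_id: "agent-b" };
const sameHash = computeHashSketch(base) === computeHashSketch(reattributed);
```

Under these assumptions sameHash is true, which is the positive assertion of the exclusion that the test matrix calls for; changing any of the six projected fields yields a different digest.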


§5. Canonical-JSON algorithm — load-bearing correctness property

The spec requires:

  • Sorted keys at every nesting depth
  • No whitespace
  • Deterministic across platforms/runs

Algorithm (from dispatch prompt §“Hashing detail”):

  1. Input: a JSON-serialisable value.
  2. If the value is an object: recursively sort keys, then rebuild as a new object in sorted order. Recurse into each value.
  3. If the value is an array: preserve insertion order, recurse into each element.
  4. If the value is a primitive (string, number, boolean, null): return as-is.
  5. Serialise via JSON.stringify(sortedValue) — no second argument, no indent argument. Default output has no whitespace.

Edge cases the test matrix must cover:

  • Nested object: { b: { d: 1, c: 2 }, a: 3 } → must sort both outer (a, b) and inner (c, d). Even though Phase-0 content is string, future-proofing the canonicalizer against nested shapes is load-bearing.
  • Two records authored with keys in different insertion orders (e.g. { id, type, ... } vs { type, id, ... }) must hash identically.
  • Array order preservation: [3, 1, 2] must canonicalize to [3,1,2], not sorted.
  • null values: preserved. The canonicalizer does NOT strip null.
  • undefined: not a valid JSON value. Behavior is documented as matching native JSON.stringify: an object property whose value is undefined is omitted, and an undefined array element is serialised as null. Tests pin this behavior.

Determinism test: call canonicalize twice on the same input (with a different object-literal author order) and assert identical output strings. Call computeHash twice on the same record and assert identical hex strings.
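The five algorithm steps and the edge cases above can be sketched as follows. This is a minimal reading of the described algorithm, not the final module: the Json type alias and the sortKeys helper are names introduced for this example.

```typescript
// Json and sortKeys are illustrative names, not part of the spec.
type Json = string | number | boolean | null | Json[] | { [key: string]: Json };

// Steps 2-4: rebuild objects with sorted keys, keep array order, pass primitives through.
function sortKeys(value: Json): Json {
  if (Array.isArray(value)) {
    return value.map((element) => sortKeys(element)); // insertion order preserved
  }
  if (value !== null && typeof value === "object") {
    const sorted: { [key: string]: Json } = {};
    for (const key of Object.keys(value).sort()) {
      sorted[key] = sortKeys(value[key]);
    }
    return sorted;
  }
  return value; // string, number, boolean, or null
}

// Step 5: JSON.stringify with no second or third argument emits no whitespace.
function canonicalize(value: Json): string {
  return JSON.stringify(sortKeys(value));
}

const nested = canonicalize({ b: { d: 1, c: 2 }, a: 3 }); // '{"a":3,"b":{"c":2,"d":1}}'
const array = canonicalize([3, 1, 2]);                    // '[3,1,2]'
const withNull = canonicalize({ a: null });               // '{"a":null}'
const orderA = canonicalize({ id: "t1", type: "plan" });
const orderB = canonicalize({ type: "plan", id: "t1" });  // same string as orderA
```

Object.keys(...).sort() uses the default code-unit comparison, which does not depend on locale, so the output should be stable across platforms and runs.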


§6. Acceptance criteria mapping

Spec acceptance criteria (from docs/guides/implementation/task-breakdown.md §P0.7.1 and the dispatch prompt) map to audit facts as follows:

| Criterion | Source | Audit observation |
|---|---|---|
| Record schema: { id, type, task_id, agent_id, content, timestamp, prev_hash, hash } (8 fields) | spec line 325 | ThoughtRecordSchema Zod object with all 8 fields. id/task_id/agent_id/content/timestamp = z.string().min(1), prev_hash/hash = z.string().length(64), type = z.enum(THOUGHT_TYPES). |
| 4 valid types: plan, analysis, decision, reflection | spec line 326 | THOUGHT_TYPES = ['plan', 'analysis', 'decision', 'reflection'] as const. Matches donor extraction lines 42-47. |
| hash = SHA-256(canonical_JSON({id, type, task_id, content, timestamp, prev_hash})) | spec line 327 | computeHash receives the 6-field subset (not agent_id, not hash). Dispatch prompt §“Hashing detail” confirms literal reading. |
| Canonical JSON: sorted keys, no whitespace (deterministic) | spec line 328 | Recursive key-sort, JSON.stringify(sorted) with no second arg. Tested cross-insertion-order and cross-nested-object. |
| First record: prev_hash = "0000...0000" (64 zeros) | spec line 329 | export const ZERO_HASH = '0'.repeat(64); tests assert ZERO_HASH.length === 64 and a match against /^0{64}$/. |
| Two records with identical inputs produce identical hashes | spec line 330 | Determinism test: computeHash(input) === computeHash(input); also cross-insertion-order test for canonicalizer. |

All 6 acceptance criteria have direct test paths. No spec-level ambiguity remains.
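As a small illustration of the genesis-hash criterion, the constant and its two checks can be written as below. The isGenesis helper is hypothetical and appears here only to show how a first record would be recognised.

```typescript
// Genesis sentinel from the spec: 64 zero characters.
const ZERO_HASH = "0".repeat(64);

// The two properties the audit says the tests pin.
const hasExpectedLength = ZERO_HASH.length === 64;
const isAllZeros = /^0{64}$/.test(ZERO_HASH);

// Hypothetical helper: a record is the first in the chain when its
// prev_hash equals the sentinel.
function isGenesis(record: { prev_hash: string }): boolean {
  return record.prev_hash === ZERO_HASH;
}
```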


§7. Donor genealogy (reference only)

docs/reference/extractions/zeta-decision-trail-extraction.md (extracted R45 from AMS src/controllers/thought.js) documents:

  • 4 record types (lines 42-47): plan | analysis | decision | reflection. Colibri reproduces exactly. observation and hypothesis from earlier AMS drafts are NOT valid.
  • Two-hash algorithm (lines 22-34): content_hash = SHA256(content), chain_hash = SHA256(content_hash + parent.chain_hash). Colibri simplifies to a single hash over a canonical-JSON projection — see §4.
  • 64-zero genesis hash (line 29): preserved as ZERO_HASH.
  • Record schema (lines 51-67): donor has session_id, parent_id, content, content_hash, chain_hash, metadata{task_id, agent_id, ...}, created_at. Colibri flattens task_id and agent_id to top-level fields and renames parent.chain_hash → prev_hash, content_hash + chain_hash → hash. No session_id (P0.7.2 may re-introduce), no metadata subobject, no parent_id (linearised via prev_hash), no created_at (renamed timestamp).
  • Tool surface (lines 158-165): 6 donor tools (thought_record, thought_get, thought_tree, thought_trail, thought_verify, thought_search). Phase-0 ζ ships only thought_record + audit_verify_chain per ADR-004 R74.5. Neither lands in P0.7.1 (P0.7.2 and P0.7.3 respectively).
  • FTS5 search (lines 117-135): donor feature. Not in Phase 0.
  • Thought trees / branching (lines 140-154): donor feature. Phase-0 uses a strict linear chain via prev_hash.

None of the donor source lands in P0.7.1. The algorithm is a full rewrite; only the type enum and the ZERO_HASH genesis constant carry over verbatim.


§8. Parallel-wave collision check

Wave C dispatches four tasks in parallel:

| Task | Owner files | Collision with P0.7.1? |
|---|---|---|
| P0.2.3 two-phase startup | src/server.ts (edits) | No: P0.7.1 does not touch server.ts. |
| P0.3.1 β state machine | src/domains/tasks/* | No: sibling under src/domains/, disjoint directory. |
| P0.6.1 ε skill schema | src/domains/skills/* | No: sibling under src/domains/, disjoint directory. |
| P0.7.1 (this task) | src/domains/trail/* + src/__tests__/trail-schema.test.ts | n/a |

No shared edits. The src/domains/ parent directory must exist (any of P0.3.1/P0.6.1/P0.7.1 may create it — mkdir -p is idempotent; no merge conflict risk given git tracks files not directories).

Shared infra files the dispatch prompt explicitly forbids this task from touching:

  • src/server.ts, src/db/*, src/config.ts, src/modes.ts — confirmed not modified.
  • package.json, jest.config.ts, tsconfig.json — no new deps, no config changes; confirmed not modified.
  • src/domains/tasks/*, src/domains/skills/* — confirmed not touched.

§9. Baseline verification

$ git rev-parse HEAD
3ebbe419  (P0.2.2 SQLite init merged as PR #122)

$ git log -1 --format=%s
feat(p0-2-2): SQLite init — better-sqlite3 WAL+FK + PRAGMA user_version migrations (#122)

$ node --version
v20.x (from CI matrix — P0.1.3 established Node 20 floor)

$ cat package.json | grep -E "zod|better-sqlite3"
    "better-sqlite3": "^11.5.0",
    "zod": "^3.23.8",

All prerequisites from the dependency chain (P0.1.1 scaffolding, P0.1.2 Jest ESM, P0.1.4 config, P0.2.2 SQLite init) are present at HEAD. zod@^3.23.8 is available; node:crypto is a Node built-in (no package dependency).


§10. Exit criteria for Step 1

  • All target files catalogued as absent in §1.
  • Adjacent integration surfaces documented in §2.
  • Spec vs packet deviation resolved in §3 (one: test file location).
  • Hash-input subset rationale captured in §4.
  • Canonical-JSON algorithm + test edge cases captured in §5.
  • Acceptance criteria mapped to test paths in §6.
  • Donor genealogy acknowledged as reference-only in §7.
  • Parallel-wave collision check completed in §8.
  • Baseline commit + toolchain verified in §9.

Ready to proceed to Step 2 (contract).



Colibri — documentation-first MCP runtime. Apache 2.0 + Commons Clause.
