R88.B — colibri-verification SKILL.md merkle_finalize failure mode (Contract)

1. PM authorization (combined scope)

On 2026-05-07 the PM (T2) explicitly authorized merging the original R88.B feat-only slice with a chore-grade mirror resync. Step 1 had surfaced a pre-existing canonical/mirror drift, left as a deferred resync task at the close of post-R83 hygiene (2026-05-05). The PM authorization message reads:

“PM authorizes Option B: combined-scope re-execution. Continue from your current worktree state — do NOT start fresh. … You will now ship mirror resync + 2 surgical edits in a single PR.”

The audit doc (docs/audits/r88-b-verification-skill-merkle-failure-mode-audit.md, committed at 9437d85a) §7 enumerated three options (A — sequential split, B — combined-scope rewrite, C — defer); PM chose Option B with explicit guidance on order of operations. This contract documents the combined scope.

2. Behavioral contract

2.1 Pre-state (at base 2506bb44)

| File | Lines | Status |
|---|---|---|
| .agents/skills/colibri-verification/SKILL.md | 333 | post-R83 hygiene canonical (drift source) |
| .claude/skills/colibri-verification/SKILL.md | 296 | pre-R83 mirror (drift target) |
| diff -q between the two | files differ | (37-line gap; 9-category divergence per audit §4) |

2.2 Post-state (acceptance invariant)

| Invariant | Verification |
|---|---|
| Both files byte-identical | diff -q .agents/skills/colibri-verification/SKILL.md .claude/skills/colibri-verification/SKILL.md returns empty |
| Both files contain a new “Common Verification Failures” row covering merkle_finalize returning ERR_NO_RECORDS despite reflection being recorded | grep -c 'ERR_NO_RECORDS' <both files> ≥ 1 each |
| Both files contain a new caveat paragraph above the “Verification Tools Quick Reference” sequence numbered list, clarifying that merkle_finalize / merkle_root may be currently non-functional and that “Symbolic Merkle” is the documented decorative pattern | grep -c 'symbolic\|Symbolic Merkle' <both files> ≥ 1 each |
| Both files reference investigation task 6f309f3a-7d22-4e2c-a02d-3a62fc46c834 | grep -c '6f309f3a-7d22-4e2c-a02d-3a62fc46c834' <both files> ≥ 1 each (grep -c counts matching lines; §5 cites the task in both the failures row and the caveat paragraph) |
| Both files reference feedback_audit_session_task_binding.md | grep -c 'feedback_audit_session_task_binding' <both files> ≥ 1 each |
| Final canonical line count | 333 (pre) + ~10–15 (new content) ≈ 343–348 |
| Final mirror line count | == final canonical |
| No regressions in build / lint / tests | npm run build && npm run lint && npm test green |
| Changelog updated | New post-R88.B paragraph at end of canonical file documenting both the resync and the new content; mirror gets the same line by virtue of byte-copy |

2.3 Out-of-scope

  • No edits to any other skill file (.agents/skills/colibri-* or .claude/skills/colibri-*)
  • No edits to src/, tests/, or any other code
  • No edits to CLAUDE.md or root files
  • No edits to docs/ other than the four chain artefacts (audit, contract, packet, verification)
  • No new tests (this slice is doc-only)
  • No task_update to any task other than 9a104b4b-ee9e-434c-954c-801afdd91068
  • No new investigation work for 6f309f3a-7d22-4e2c-a02d-3a62fc46c834 — that task remains parked; this slice only references it

3. Precedent reference (R77.C)

The mirror resync follows the precedent established in R77.C (commit 6a67be69, “R77.C: resync 3 drifted .claude/skills/ mirrors from .agents/ canon (#167)”). R77.C was a chore-scope resync round that brought three drifted mirrors (colibri-mcp-server, colibri-tier1-chains, colibri-growth-strategy) byte-clean against canon. R88.B’s mirror resync replicates that pattern for colibri-verification, with two differences:

  1. R77.C was chore-only; R88.B combines the resync with a feat edit per PM authorization.
  2. R77.C did three skills in one slice; R88.B does one.

The post-state acceptance invariant (diff -q returns empty) is the same as R77.C’s, and the verification step uses the same diff command.

4. Order of operations (post-state-equivalent)

The PM-authorized order is:

  1. Mirror resync first: byte-copy canonical → mirror. After this step, diff -q returns empty (intermediate post-state).
  2. Apply edits to canonical: insert the failures-table row and the caveat paragraph at the insertion points given in §5. After this step, diff -q reports a difference again (canonical has the two additions, mirror does not).
  3. Re-byte-copy canonical → mirror: refresh the mirror to incorporate the edits. After this step, diff -q returns empty (final post-state).

PM authorization explicitly allows the alternative of “applying identical edits to both directly”, provided the post-state is identical. This contract elects the resync-then-edit-then-resync sequence because a byte-copy (cp -p) leaves no room for divergent hand edits, and the closing diff -q confirms that the two files are byte-identical.
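
The three-step sequence above can be sketched in shell. This is a minimal illustration on throwaway files in a temporary directory, not the real skill paths; the file contents and the appended line are placeholders, and cmp -s stands in for diff -q as the byte-equality check.

```shell
#!/bin/sh
# Sketch of resync -> edit -> resync on throwaway files (placeholder content).
set -eu

work=$(mktemp -d)
canon="$work/canonical.md"   # stand-in for the canonical SKILL.md
mirror="$work/mirror.md"     # stand-in for the drifted mirror

printf 'line one\nline two\n' > "$canon"
printf 'line one\n' > "$mirror"

# Step 1: byte-copy canonical -> mirror, then check the intermediate post-state.
cp -p "$canon" "$mirror"
cmp -s "$canon" "$mirror" && step1=identical || step1=differ

# Step 2: edit canonical only; the two files differ again.
printf 'new failure-mode row\n' >> "$canon"
cmp -s "$canon" "$mirror" && step2=identical || step2=differ

# Step 3: re-copy, then check the final post-state.
cp -p "$canon" "$mirror"
cmp -s "$canon" "$mirror" && step3=identical || step3=differ

echo "$step1 $step2 $step3"   # prints: identical differ identical
rm -rf "$work"
```

The equality check after step 3 is what establishes the final post-state; the copy alone is not treated as proof.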

5. Content specification — the two new sections

5.1 Failures-table row (canonical insertion point: between line 301 NoThoughtRecordsError row and line 302 Merkle root missing row)

Failure column: merkle_finalize returns ERR_NO_RECORDS despite reflection being recorded and tools being called.

Cause column: even with audit_session_start { task_id } + thought_record { task_id } ordering correct, the current Phase 0 server does not appear to bind individual thought_records to an audit session_id in a way merkle_finalize can resolve — the structural cause is opaque session/record binding (see investigation task 6f309f3a-7d22-4e2c-a02d-3a62fc46c834 and memory file feedback_audit_session_task_binding.md). Earlier rounds (R83/R86) sidestepped this by writing decorative “Symbolic Merkle” strings in seal manifests rather than calling merkle_root.

Resolution column: use audit_verify_chain { task_id } as the actual Phase 0 proof grade — per-task chains validate cleanly via prev_hash linkage. Treat merkle_finalize / merkle_root as best-effort decorations; do not block session close on getting a real Merkle root. Track via investigation task 6f309f3a-7d22-4e2c-a02d-3a62fc46c834.

5.2 Quick Reference caveat paragraph (canonical insertion point: above the numbered list at line 235 1. audit_session_start)

The caveat names the symptom (R87 failure mode), names the workaround attempted in R87 (also failed), names the actual proof grade (audit_verify_chain { task_id }), names the documented decorative pattern (“Symbolic Merkle” naming), cites investigation task 6f309f3a-7d22-4e2c-a02d-3a62fc46c834 and memory file feedback_audit_session_task_binding.md, and explicitly distinguishes the new failure mode from the existing session_id gap paragraph (which is about the input-schema gap, not the runtime opaque binding gap).

6. Test plan

Build / lint / test gate per CLAUDE.md §5:

cd .worktrees/claude/r88-b-verification-skill-merkle-failure-mode
npm run build
npm run lint
npm test

All three MUST be green. This is a doc-only slice so no new tests are added; the existing 1972 tests must continue to pass.

Doc-state gate:

diff -q .agents/skills/colibri-verification/SKILL.md \
        .claude/skills/colibri-verification/SKILL.md
# expect: empty output

grep -c 'ERR_NO_RECORDS' .agents/skills/colibri-verification/SKILL.md \
                         .claude/skills/colibri-verification/SKILL.md
# expect: ≥ 1 each

grep -c '6f309f3a-7d22-4e2c-a02d-3a62fc46c834' \
        .agents/skills/colibri-verification/SKILL.md \
        .claude/skills/colibri-verification/SKILL.md
# expect: ≥ 1 each (grep -c counts matching lines; §5 cites the task in both the failures row and the caveat paragraph)

grep -c 'feedback_audit_session_task_binding' \
        .agents/skills/colibri-verification/SKILL.md \
        .claude/skills/colibri-verification/SKILL.md
# expect: ≥ 1 each
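
The grep checks above could be wrapped in a small pass/fail helper so the gate exits non-zero on a miss. The sketch below is one possible shape, not part of the contract: check_min_lines is a hypothetical name, and the demo runs on a throwaway file rather than the two SKILL.md paths. It also shows that grep -c counts matching lines, so two occurrences on one line count once.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if FILE has at least MIN lines matching PATTERN.
check_min_lines() {
  pattern=$1; min=$2; file=$3
  # grep -c exits 1 when nothing matches, so tolerate that and read the count.
  count=$(grep -c -- "$pattern" "$file" || true)
  count=${count:-0}
  if [ "$count" -ge "$min" ]; then
    echo "ok   $file: $count line(s) match '$pattern'"
  else
    echo "FAIL $file: $count line(s) match '$pattern', need >= $min"
    return 1
  fi
}

# Demo on a throwaway file; real use would pass each SKILL.md path in turn.
demo=$(mktemp)
printf 'row: ERR_NO_RECORDS ... ERR_NO_RECORDS\nother line\n' > "$demo"
check_min_lines 'ERR_NO_RECORDS' 1 "$demo"
demo_count=$(grep -c 'ERR_NO_RECORDS' "$demo")   # 1: two hits on one line count once
rm -f "$demo"
```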

7. Risk register

| Risk | Likelihood | Impact | Mitigation |
|---|---|---|---|
| Mirror resync introduces unintended byte-level drift (BOM, line-endings) | Low | Medium | Use cp -p (Bash); verify with diff -q; do not edit mirror by hand |
| Edits invalidate the changelog at canonical line 333 | High | Low | Update the line-333 paragraph and add a new R88.B-stamp paragraph; mirror gets both via byte-copy |
| Build / lint / test gate red | Low | High | This slice touches zero code; the only theoretical regression path is jest collecting .md files (it does not) |
| Worktree-pin quirk on PR merge (R76 lesson) | Medium | Low | Remove worktree before gh pr merge --squash --delete-branch per memory feedback_gh_pr_merge_delete_branch_quirk.md |
| Investigation task 6f309f3a-… not found in the live system (we are not attached to MCP) | High | Low | Mention it as a “parked investigation task” reference; the writeback in this PR’s body carries the link forward in memory and PR archaeology |
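
For the first risk, a quick byte-level check can complement diff -q by showing which kind of drift is present. The sketch below is an assumption-laden illustration: has_bom and has_cr are hypothetical helper names, the demo files are throwaway, and head -c is assumed to be available (it is on GNU and BSD systems).

```shell
#!/bin/sh
# Hypothetical checks for a UTF-8 BOM or carriage-return bytes in a file.
has_bom() {
  # First three bytes as hex; a UTF-8 BOM is ef bb bf.
  [ "$(head -c 3 "$1" | od -An -tx1 | tr -d ' \n')" = "efbbbf" ]
}

has_cr() {
  # Any CR byte means CRLF (or bare CR) line endings are present.
  [ "$(tr -cd '\r' < "$1" | wc -c | tr -d ' ')" -gt 0 ]
}

# Demo on throwaway files: one clean, one with a BOM and CRLF endings.
clean=$(mktemp); dirty=$(mktemp)
printf 'a\nb\n' > "$clean"
printf '\357\273\277a\r\nb\r\n' > "$dirty"
for f in "$clean" "$dirty"; do
  has_bom "$f" && bom=yes || bom=no
  has_cr "$f" && cr=yes || cr=no
  echo "bom=$bom cr=$cr"
done
rm -f "$clean" "$dirty"
```

Run against both SKILL.md paths before the copy, a check like this would show whether either file already carries a BOM or CR bytes that cp -p would faithfully propagate.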

8. Acceptance gate (Step 4 may proceed)

This contract is approved for implementation. PM has explicitly authorized the combined scope. Step 3 (Packet) follows immediately to lay out the exact diff sequence; Step 4 (Implement) executes it.



Colibri — documentation-first MCP runtime. Apache 2.0 + Commons Clause.
