R88.B — colibri-verification SKILL.md merkle_finalize failure mode (Contract)
1. PM authorization (combined scope)
PM (T2) issued an explicit authorization on 2026-05-07 to merge what was originally an R88.B feat-only slice with a chore-grade mirror resync, after Step 1 surfaced a pre-existing canonical/mirror drift left as a deferred resync task at the close of post-R83 hygiene (2026-05-05). The PM authorization message reads:
“PM authorizes Option B: combined-scope re-execution. Continue from your current worktree state — do NOT start fresh. … You will now ship mirror resync + 2 surgical edits in a single PR.”
The audit doc (docs/audits/r88-b-verification-skill-merkle-failure-mode-audit.md, committed at 9437d85a) §7 enumerated three options (A — sequential split, B — combined-scope rewrite, C — defer); PM chose Option B with explicit guidance on order of operations. This contract documents the combined scope.
2. Behavioral contract
2.1 Pre-state (at base 2506bb44)
| File | Lines | Status |
|---|---|---|
| .agents/skills/colibri-verification/SKILL.md | 333 | post-R83 hygiene canonical (drift source) |
| .claude/skills/colibri-verification/SKILL.md | 296 | pre-R83 mirror (drift target) |
| diff -q between the two | n/a | files differ (37-line gap; 9-category divergence per audit §4) |
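The 37-line gap in the table is simply the difference between the two line counts (333 − 296). A minimal sketch of how it could be re-derived, assuming the base commit is checked out and that the table's counts are newline-based; the helper name `line_gap` is hypothetical and not part of the repo:

```shell
# Hypothetical helper (not part of the repo): print the line-count gap
# between a canonical file and its mirror, using newline counts from wc -l.
line_gap() {
  local canon="$1"
  local mirror="$2"
  local canon_lines mirror_lines
  canon_lines=$(wc -l < "$canon")
  mirror_lines=$(wc -l < "$mirror")
  echo $(( canon_lines - mirror_lines ))
}
```

Called as `line_gap .agents/skills/colibri-verification/SKILL.md .claude/skills/colibri-verification/SKILL.md`, this would be expected to print the gap recorded above, provided the table's counts were taken the same way.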
2.2 Post-state (acceptance invariant)
| Invariant | Verification |
|---|---|
| Both files byte-identical | diff -q .agents/skills/colibri-verification/SKILL.md .claude/skills/colibri-verification/SKILL.md returns empty |
| Both files contain a new “Common Verification Failures” row covering merkle_finalize returning ERR_NO_RECORDS despite reflection being recorded | grep -c 'ERR_NO_RECORDS' <both files> ≥ 1 each |
| Both files contain a new caveat paragraph above the “Verification Tools Quick Reference” sequence numbered list, clarifying that merkle_finalize / merkle_root may be currently non-functional and that “Symbolic Merkle” is the documented decorative pattern | grep -c 'symbolic\|Symbolic Merkle' <both files> ≥ 1 each |
| Both files reference investigation task 6f309f3a-7d22-4e2c-a02d-3a62fc46c834 | grep -c '6f309f3a-7d22-4e2c-a02d-3a62fc46c834' <both files> = 1 each |
| Both files reference feedback_audit_session_task_binding.md | grep -c 'feedback_audit_session_task_binding' <both files> ≥ 1 each |
| Final canonical line count | 333 (pre) + ~10–15 (new content) ≈ 343–348 |
| Final mirror line count | == final canonical |
| No regressions in build / lint / tests | npm run build && npm run lint && npm test green |
| Changelog updated | New post-R88.B paragraph at end of canonical file documenting both the resync and the new content; mirror gets the same line by virtue of byte-copy |
2.3 Out-of-scope
- No edits to any other skill file (.agents/skills/colibri-* or .claude/skills/colibri-*)
- No edits to src/, tests/, or any other code
- No edits to CLAUDE.md or root files
- No edits to docs/ other than the four chain artefacts (audit, contract, packet, verification)
- No new tests (this slice is doc-only)
- No task_update to any task other than 9a104b4b-ee9e-434c-954c-801afdd91068
- No new investigation work for 6f309f3a-7d22-4e2c-a02d-3a62fc46c834; that task remains parked, and this slice only references it
3. Precedent reference (R77.C)
The mirror resync follows the precedent established in R77.C (commit 6a67be69, “R77.C: resync 3 drifted .claude/skills/ mirrors from .agents/ canon (#167)”). R77.C was a chore-scope resync round that brought three drifted mirrors (colibri-mcp-server, colibri-tier1-chains, colibri-growth-strategy) byte-clean against canon. R88.B’s mirror resync replicates that pattern for colibri-verification, with two differences:
- R77.C was chore-only; R88.B combines the resync with a feat edit per PM authorization.
- R77.C did three skills in one slice; R88.B does one.
The post-state acceptance invariant (diff -q returns empty) is the same as R77.C’s, and the verification step uses the same diff command.
4. Order of operations (post-state-equivalent)
The PM-authorized order is:
- Mirror resync first: byte-copy canonical → mirror. After this step, diff -q returns empty (intermediate post-state).
- Apply edits to canonical: add the failures-table row and the caveat paragraph. After this step, diff -q reports a difference again (canonical has 2 new sections, mirror does not).
- Re-byte-copy canonical → mirror: refresh the mirror to incorporate the edits. After this step, diff -q returns empty (final post-state).
PM authorization explicitly allows the alternative of “applying identical edits to both directly”, provided the post-state is identical. This contract elects the resync-then-edit-then-resync sequence because a byte-copy makes the mirror identical to the canonical by construction: cp -p copies the file contents verbatim (and preserves mode and timestamps), and diff -q then confirms the equality.
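The resync-then-edit-then-resync sequence can be sketched as a small helper that is called once before and once after the canonical edits. This is an illustrative sketch only; the function name `resync_mirror` is hypothetical, and the slice itself may simply run the two commands by hand:

```shell
# Hypothetical helper (not part of the repo): byte-copy the canonical file
# over the mirror, then confirm that the two are identical.
resync_mirror() {
  local canon="$1"
  local mirror="$2"
  cp -p "$canon" "$mirror" || return 1
  # diff -q prints nothing and exits 0 when the files are identical.
  diff -q "$canon" "$mirror"
}

# Intended sequence, shown as comments so that sourcing this file has no
# side effects (CANON and MIRROR stand for the two paths in section 2.1):
#   resync_mirror "$CANON" "$MIRROR"   # step 1: intermediate post-state
#   ...edit "$CANON" by hand...        # step 2: add the row and the caveat
#   resync_mirror "$CANON" "$MIRROR"   # step 3: final post-state
```

Because the mirror is only ever written by the copy, any hand edit made to it between steps would be overwritten rather than merged, which matches the "do not edit mirror by hand" mitigation in the risk register.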
5. Content specification — the two new sections
5.1 Failures-table row (canonical insertion point: between line 301 NoThoughtRecordsError row and line 302 Merkle root missing row)
Failure column: merkle_finalize returns ERR_NO_RECORDS despite reflection being recorded and tools being called.
Cause column: even with audit_session_start { task_id } + thought_record { task_id } ordering correct, the current Phase 0 server does not appear to bind individual thought_records to an audit session_id in a way merkle_finalize can resolve — the structural cause is opaque session/record binding (see investigation task 6f309f3a-7d22-4e2c-a02d-3a62fc46c834 and memory file feedback_audit_session_task_binding.md). Earlier rounds (R83/R86) sidestepped this by writing decorative “Symbolic Merkle” strings in seal manifests rather than calling merkle_root.
Resolution column: use audit_verify_chain { task_id } as the actual Phase 0 proof grade — per-task chains validate cleanly via prev_hash linkage. Treat merkle_finalize / merkle_root as best-effort decorations; do not block session close on getting a real Merkle root. Track via investigation task 6f309f3a-7d22-4e2c-a02d-3a62fc46c834.
5.2 Quick Reference caveat paragraph (canonical insertion point: above the numbered list at line 235 1. audit_session_start)
The caveat names the symptom (R87 failure mode), names the workaround attempted in R87 (also failed), names the actual proof grade (audit_verify_chain { task_id }), names the documented decorative pattern (“Symbolic Merkle” naming), cites investigation task 6f309f3a-7d22-4e2c-a02d-3a62fc46c834 and memory file feedback_audit_session_task_binding.md, and explicitly distinguishes the new failure mode from the existing session_id gap paragraph (which is about the input-schema gap, not the runtime opaque binding gap).
6. Test plan
Build / lint / test gate per CLAUDE.md §5:

    cd .worktrees/claude/r88-b-verification-skill-merkle-failure-mode
    npm run build
    npm run lint
    npm test
All three MUST be green. This is a doc-only slice so no new tests are added; the existing 1972 tests must continue to pass.
Doc-state gate:

    diff -q .agents/skills/colibri-verification/SKILL.md \
      .claude/skills/colibri-verification/SKILL.md
    # expect: empty output
    grep -c 'ERR_NO_RECORDS' .agents/skills/colibri-verification/SKILL.md \
      .claude/skills/colibri-verification/SKILL.md
    # expect: ≥ 1 each
    grep -c '6f309f3a-7d22-4e2c-a02d-3a62fc46c834' \
      .agents/skills/colibri-verification/SKILL.md \
      .claude/skills/colibri-verification/SKILL.md
    # expect: 1 each (referenced once in failures row)
    grep -c 'feedback_audit_session_task_binding' \
      .agents/skills/colibri-verification/SKILL.md \
      .claude/skills/colibri-verification/SKILL.md
    # expect: ≥ 1 each
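The expectations in the comments above are checked by eye. As a sketch of how the same gate could fail loudly instead, the helpers below turn each expectation into an exit status. Both function names are hypothetical, and the thresholds simply mirror the comments in this section; note that grep -c counts matching lines, not individual occurrences.

```shell
# Hypothetical helper (not part of the repo): print how many lines of FILE
# match PATTERN. grep -c exits 1 on zero matches, so its status is ignored.
match_count() {
  local count
  count=$(grep -c -- "$1" "$2" 2>/dev/null || true)
  echo "${count:-0}"
}

# Hypothetical gate: succeed only if the two files are identical and each
# one satisfies the marker counts listed in this section.
doc_state_gate() {
  local canon="$1"
  local mirror="$2"
  local f
  diff -q "$canon" "$mirror" >/dev/null || return 1
  for f in "$canon" "$mirror"; do
    [ "$(match_count 'ERR_NO_RECORDS' "$f")" -ge 1 ] || return 1
    [ "$(match_count 'feedback_audit_session_task_binding' "$f")" -ge 1 ] || return 1
    [ "$(match_count '6f309f3a-7d22-4e2c-a02d-3a62fc46c834' "$f")" -eq 1 ] || return 1
  done
}
```

A call such as `doc_state_gate .agents/skills/colibri-verification/SKILL.md .claude/skills/colibri-verification/SKILL.md && echo gate-ok` would then print `gate-ok` only when every check holds.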
7. Risk register
| Risk | Likelihood | Impact | Mitigation |
|---|---|---|---|
| Mirror resync introduces unintended byte-level drift (BOM, line-endings) | Low | Medium | Use cp -p (Bash); verify with diff -q; do not edit mirror by hand |
| Edits invalidate the changelog at canonical line 333 | High | Low | Update the line-333 paragraph and add a new R88.B-stamp paragraph; mirror gets both via byte-copy |
| Build / lint / test gate red | Low | High | This slice touches zero code; the only theoretical regression path is jest collecting .md files (it does not) |
| Worktree-pin quirk on PR merge (R76 lesson) | Medium | Low | Remove worktree before gh pr merge --squash --delete-branch per memory feedback_gh_pr_merge_delete_branch_quirk.md |
| Investigation task 6f309f3a-… not found in the live system (we are not attached to MCP) | High | Low | Mention it as a “parked investigation task” reference; the writeback in this PR’s body carries the link forward in memory and PR archaeology |
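For the first risk in the register (BOM or line-ending drift), diff -q already detects any byte difference between the two files, but it does not say whether both files share the same unwanted bytes. A minimal sketch of two extra checks, using hypothetical helper names that are not part of the repo:

```shell
# Hypothetical helper: succeed if FILE starts with a UTF-8 byte-order mark
# (bytes EF BB BF).
has_bom() {
  [ "$(head -c 3 "$1" | od -An -tx1 | tr -d ' \n')" = "efbbbf" ]
}

# Hypothetical helper: succeed if FILE contains at least one carriage
# return, which would indicate CRLF line endings.
has_crlf() {
  grep -q "$(printf '\r')" "$1"
}
```

Running both helpers against the canonical file before the first copy would show whether such bytes already exist there, since a byte-copy would carry them into the mirror unchanged.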
8. Acceptance gate (Step 4 may proceed)
This contract is approved for implementation. PM has explicitly authorized the combined scope. Step 3 (Packet) follows immediately to lay out the exact diff sequence; Step 4 (Implement) executes it.