Backup & Restore
Phase 0 Status: This document describes target behavior. No Colibri TypeScript code exists yet. Implementation begins at Phase 0. The commands below assume the Phase 0 server entry point (`src/server.ts` → `dist/server.js`) and primary database (`data/colibri.db`) have been created by P0.2.1 and P0.2.2.
What the backup covers
Phase 0 Colibri is a single-writer SQLite system. All durable state lives in data/colibri.db — tasks, skills, thought records, Merkle tree, audit chain. A backup of that one file is a full backup of the system.
Three tables are load-bearing and must round-trip without loss:
| Table | Role |
|---|---|
| `tasks` | β pipeline state (status, progress, deps) |
| `thought_records` | ζ decision trail (audit chain, HMAC-linked) |
| `merkle_nodes` | η proof tree (canonical hash roots) |
If any of these three is missing or corrupt after restore, the backup failed.
SQLite WAL mode note
Phase 0 runs SQLite in WAL mode (set at boot by `src/db/index.ts`). A live DB has three files on disk:
- `data/colibri.db` (main)
- `data/colibri.db-wal` (write-ahead log)
- `data/colibri.db-shm` (shared memory index)
A naive `cp data/colibri.db …` while the server is running captures the main file without the WAL, losing recent writes. Use the SQLite `.backup` command instead; it is WAL-aware and produces a consistent snapshot without stopping the server.
Backup command (canonical)
sqlite3 data/colibri.db ".backup data/backups/colibri-<round>-<date>.db"
Example:
```bash
mkdir -p data/backups
sqlite3 data/colibri.db ".backup data/backups/colibri-r75-20260416.db"
```
The `<round>` slug (e.g. `r75`) lets you tie a snapshot to a sealed round; `<date>` is `YYYYMMDD`.
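The two steps above can be wrapped in a small helper so snapshots always land in the same place with the same naming. A minimal sketch, assuming it is run from the repository root and the round slug is passed as the first argument (the script name and argument convention are illustrative, not a Phase 0 deliverable):

```bash
#!/usr/bin/env bash
# backup.sh <round> — snapshot data/colibri.db into data/backups/ (illustrative helper).
set -euo pipefail

ROUND="${1:?usage: backup.sh <round>, e.g. backup.sh r75}"
DATE="$(date +%Y%m%d)"
DEST="data/backups/colibri-${ROUND}-${DATE}.db"

mkdir -p data/backups

# WAL-aware online snapshot; safe while the server is running.
sqlite3 data/colibri.db ".backup ${DEST}"

# Refuse to keep a snapshot that does not pass the integrity check.
if [ "$(sqlite3 "${DEST}" 'PRAGMA integrity_check;')" != "ok" ]; then
  echo "integrity_check failed for ${DEST}" >&2
  exit 1
fi

echo "backup written: ${DEST}"
```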
Cadence
| Trigger | Retention tier | Action |
|---|---|---|
| End of every round (Sigma seal) | hot | Fresh .backup into data/backups/ |
| End of every session | hot | Fresh .backup into data/backups/ |
| Phase seal (Phase 0 → Phase 1, etc.) | cold | Frozen snapshot, copied off-host |
| Ad-hoc before destructive migration | hot | Extra .backup named colibri-premigrate-… |
Retention tiers
| Tier | Location | Age | Policy |
|---|---|---|---|
| hot | `data/backups/` | 0–7 days | Keep every round + session snapshot |
| warm | `data/backups/` | 7–30 days | Keep one per round only |
| cold | Off-host (external drive, cloud) | ≥ phase seal | Keep forever; one per phase seal |
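The hot-to-warm transition can be enforced with a periodic pruning pass. A minimal sketch, assuming GNU find, bash 4+, the `colibri-<round>-<YYYYMMDD>.db` filename convention above, and that "one per round" means keeping the newest snapshot for each round slug (that selection rule is an assumption, not a stated policy detail):

```bash
#!/usr/bin/env bash
# prune-backups.sh — illustrative sketch of the hot/warm retention tiers.
# The cold tier (off-host copies at phase seals) is handled separately and never touched here.
set -euo pipefail
cd data/backups

# Snapshots that have aged out of the hot tier (older than 7 days), sorted so that
# within each round the YYYYMMDD date puts the newest snapshot last.
mapfile -t aged < <(find . -maxdepth 1 -name 'colibri-*.db' -mtime +7 -printf '%f\n' | sort)
[ "${#aged[@]}" -eq 0 ] && exit 0

declare -A newest   # round slug -> newest aged snapshot for that round
for f in "${aged[@]}"; do
  round="$(sed -E 's/^colibri-([^-]+)-[0-9]{8}\.db$/\1/' <<<"$f")"
  newest["$round"]="$f"
done

# Warm tier: keep one snapshot per round, delete the rest.
for f in "${aged[@]}"; do
  round="$(sed -E 's/^colibri-([^-]+)-[0-9]{8}\.db$/\1/' <<<"$f")"
  [ "$f" = "${newest[$round]}" ] || rm -- "$f"
done

# Past the warm window (older than 30 days): list for manual removal once the
# corresponding cold-tier copy is confirmed off-host.
find . -maxdepth 1 -name 'colibri-*.db' -mtime +30 -print
```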
Integrity check
After every backup, and before every restore, run:
```bash
sqlite3 data/backups/colibri-r75-20260416.db "PRAGMA integrity_check;"
```
Expected output: exactly `ok`.
Any other result (missing pages, malformed index, row/page checksum mismatch) means the snapshot is not safe to restore from. Discard it and fall back to the previous hot-tier snapshot.
For a deeper check that also validates foreign keys:
```bash
sqlite3 data/backups/colibri-r75-20260416.db "PRAGMA foreign_key_check;"
```
Expected output: empty (no violations).
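Before picking a restore candidate it can help to sweep every snapshot at once. A minimal sketch over the `data/backups/` layout above (the sweep is an operational convenience, not a Phase 0 requirement):

```bash
# Run both checks across every snapshot, newest first, and flag the clean ones.
for f in $(ls -1t data/backups/colibri-*.db); do
  ic="$(sqlite3 "$f" 'PRAGMA integrity_check;')"
  fk="$(sqlite3 "$f" 'PRAGMA foreign_key_check;')"
  if [ "$ic" = "ok" ] && [ -z "$fk" ]; then
    echo "clean: $f"
  else
    echo "BAD:   $f" >&2
  fi
done
```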
Restore runbook
Phase 0 has a single MCP stdio server with no hot-standby. Restore is a stop-the-world operation.
1. Stop the Colibri server. Send SIGINT to the process; the signal handler runs writeback, seals the active Merkle tree, closes the DB, and exits. Wait for the process to fully terminate.
2. Verify the backup you intend to restore. Run `PRAGMA integrity_check;` on the candidate file (see above). If the result is not `ok`, pick a different snapshot.
3. Move the current DB aside (do not delete; it may be needed for forensic diff):

   ```bash
   mv data/colibri.db data/colibri.db.broken-$(date +%Y%m%d-%H%M%S)
   rm -f data/colibri.db-wal data/colibri.db-shm
   ```

4. Copy the backup into place:

   ```bash
   cp data/backups/colibri-r75-20260416.db data/colibri.db
   ```

5. Re-check integrity on the restored file:

   ```bash
   sqlite3 data/colibri.db "PRAGMA integrity_check;"
   ```

6. Restart the server (`node dist/server.js`, or the `.vscode/mcp-settings.example.json` launcher). First boot after restore re-opens the DB in WAL mode and rebuilds `colibri.db-wal` / `colibri.db-shm` from scratch.
7. Verify the audit chain by calling `audit_verify_chain` via the MCP client. A clean restore returns `ok`. A `break_at` index means the chosen backup pre-dates a chain extension and cannot be trusted; try a newer snapshot.
8. Verify the Merkle tree by calling `merkle_root` and comparing against the externally-stored root for that snapshot (see "External root anchoring" below).
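Steps 2 through 5 can be collapsed into a single command for operators who prefer a script. A minimal sketch, assuming the server has already been stopped by hand and the snapshot path is passed as the first argument (the script name is illustrative; the post-restore verification in steps 7–8 still has to go through the MCP client):

```bash
#!/usr/bin/env bash
# restore.sh <snapshot> — illustrative restore helper; run only after the server has exited.
set -euo pipefail

SNAPSHOT="${1:?usage: restore.sh data/backups/colibri-<round>-<date>.db}"

# Step 2: refuse a snapshot that fails the integrity check.
if [ "$(sqlite3 "${SNAPSHOT}" 'PRAGMA integrity_check;')" != "ok" ]; then
  echo "snapshot failed integrity_check: ${SNAPSHOT}" >&2
  exit 1
fi

# Step 3: move the current DB aside and drop the stale WAL/SHM files.
STAMP="$(date +%Y%m%d-%H%M%S)"
[ -f data/colibri.db ] && mv data/colibri.db "data/colibri.db.broken-${STAMP}"
rm -f data/colibri.db-wal data/colibri.db-shm

# Steps 4–5: copy the snapshot into place and re-check it.
cp "${SNAPSHOT}" data/colibri.db
sqlite3 data/colibri.db "PRAGMA integrity_check;"

echo "restored ${SNAPSHOT}; restart the server, then run audit_verify_chain and merkle_root via the MCP client."
```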
Corruption detection
Signs the live DB is corrupt:
- Startup logs show `SQLITE_CORRUPT` or `SQLITE_NOTADB`.
- `PRAGMA integrity_check;` returns anything other than `ok`.
- Tool calls return `SQLITE_ERROR: database disk image is malformed`.
- `audit_verify_chain` returns a `break_at` index from a tool call that should have extended the chain cleanly.
If any of these fire, stop writing immediately — SQLite does not self-heal a corrupt page — and go to the restore runbook.
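The second symptom can be checked without stopping the server: the pragma only reads, so in WAL mode it does not block the single writer. A minimal sketch (exiting non-zero on failure is an assumed convention, not a Phase 0 rule):

```bash
# Read-only spot check against the live DB; safe to run while the server is up.
if [ "$(sqlite3 data/colibri.db 'PRAGMA integrity_check;')" != "ok" ]; then
  echo "live DB failed integrity_check: stop writing and follow the restore runbook" >&2
  exit 1
fi
```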
Recovery from corruption
- Do not attempt repair-in-place. Phase 0 has no DB-repair tool (the donor-era `npm run db:repair` is not a Phase 0 feature).
- Follow the restore runbook against the most recent hot-tier snapshot that passes `PRAGMA integrity_check`.
- Keep the corrupt file (`data/colibri.db.broken-…`) for root-cause analysis. Do not overwrite it.
- If no hot-tier snapshot is clean, walk back through warm, then cold. Each generation you walk back loses work that occurred after that snapshot; log the loss in a `thought_record` at next boot so the gap is visible in the audit chain.
External root anchoring
Merkle roots finalized by `merkle_finalize` are the proof anchors for what the state was at a given moment. They are small (a single hash) and cheap to store off-host.
At every round seal, record the finalized root somewhere outside `data/colibri.db`:
- A line in the round’s seal document under `docs/`.
- A line in the session seal.
- Optionally an external append-only log (file, signed commit).
If the DB is lost entirely and no backup restores cleanly, the external root still proves what the canonical state was at the anchor point, even if the corresponding records cannot be recovered.
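For the third option, the mechanics can be as small as appending one line per round seal. A minimal sketch, assuming the finalized root has already been read back through the MCP client and that `~/colibri-roots.log` lives outside the repository (both the log path and the line format are illustrative):

```bash
# Append one anchor line per round seal: <round> <date> <finalized Merkle root>.
ROOT="<root returned by merkle_finalize>"   # paste from the MCP client
echo "r75 $(date +%Y%m%d) ${ROOT}" >> ~/colibri-roots.log

# After a restore, compare the anchored root for that round with merkle_root.
grep '^r75 ' ~/colibri-roots.log
```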
What NOT to back up
| Path | Why skip |
|---|---|
| `.worktrees/` | Ephemeral per-task feature branches (SCRATCH zone) |
| `temp/` | Round staging + vault staging (SCRATCH zone; gitignored) |
| `node_modules/` | Re-installable from `package-lock.json` |
| `data/backups/` | Don’t back up backups into themselves; use off-host for cold tier |
| `data/colibri.db-wal`, `data/colibri.db-shm` | Transient; recreated at next DB open |
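The cold-tier action from the cadence table is simply shipping a verified snapshot off-host; the table above lists what to leave behind if anything beyond that single file is mirrored. A minimal sketch, assuming an external mount at `/mnt/colibri-cold` (the destination path is illustrative):

```bash
# Cold tier at a phase seal: verify a snapshot, then copy that one file off-host.
SNAPSHOT=data/backups/colibri-r75-20260416.db
[ "$(sqlite3 "${SNAPSHOT}" 'PRAGMA integrity_check;')" = "ok" ] || exit 1
rsync -a "${SNAPSHOT}" /mnt/colibri-cold/

# If a wider mirror of the repo is ever wanted, exclude the table's paths, e.g.:
# rsync -a --exclude='.worktrees/' --exclude='temp/' --exclude='node_modules/' \
#          --exclude='data/backups/' --exclude='data/colibri.db-*' ./ /mnt/colibri-cold/repo/
```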
Heritage note
data/ams.db (72 MB) is donor runtime state from pre-R53 AMS. It is kept read-only during Phase 0 bootstrap as a task-store + writeback target until R78, after which it is frozen. It is not a backup target for Phase 0 and is not the primary DB. Do not restore into it and do not point Phase 0 code at it. Its presence in the tree is HERITAGE zone genealogy, not a live fallback. See data/README.md for the zone rules.
Cross-links
- `docs/2-plugin/database.md` — SQLite schema, WAL mode, single-writer invariant.
- `docs/3-world/execution/scale.md` — why Phase 0 is single-node single-writer, and what changes at scale.
- `docs/guides/troubleshoot.md` — symptom table for boot, runtime, and DB issues (DB corruption procedures cross-link back to this file).