Database: data/colibri.db

All state in Colibri lives in a single file: data/colibri.db. This is a SQLite 3 database opened via better-sqlite3 in WAL (Write-Ahead Log) mode. No state exists outside this file. If it is not in the database, it did not happen.

Status: The database file does not yet exist. It is a Phase 0 target created by P0.2.2. When the server boots (Step 5 of Boot Sequence), it creates the file and runs all migrations.

Single Writer, Multiple Readers

Colibri uses a single-writer architecture:

  • One process writer — the Colibri server is the only process that opens the database file for writing.
  • Multiple readers — tools and scripts outside the server can open the database in read-only mode to inspect state.
  • WAL mode — Write-Ahead Log mode allows concurrent reads while the server is writing. Readers do not block writers; writers do not block readers.

There is no horizontal scaling in Phase 0. A second Colibri instance cannot be started on the same database file — the file lock ensures only one writer exists. Scaling to multiple writers (either processes or nodes) requires a different architecture (e.g., distributed consensus) and is deferred to Phase 1+.

WAL Mode

The database opens with WAL mode enabled:

db.pragma('journal_mode = WAL');

How WAL works:

  1. When the server commits a write, the change is appended to a companion .wal file (the write-ahead log) rather than written into the main database file.
  2. Readers see all committed data, consulting both the main database file and the .wal file as needed; uncommitted changes from an in-flight transaction are never visible to other connections.
  3. Periodically (or when the .wal file grows large), SQLite runs a “checkpoint” — the changes in the .wal file are merged into the main database file and the .wal file is reset for reuse.

Why WAL instead of the default journal mode?

  • Durability — committed writes survive crashes. With PRAGMA synchronous = FULL, each commit is flushed to disk before the tool call returns; the common WAL setting, NORMAL, defers some flushes to checkpoint time.
  • Concurrent reads — readers don’t wait for writers; they read from the main file or the .wal file as needed.
  • Performance — WAL appends sequentially to a single log file, reducing disk seeks and fsync calls compared to the default rollback journal.

Load-Bearing Tables

Phase 0 creates tables as concepts earn them. Three tables are load-bearing:

1. thought_records (ζ Decision Trail)

Stores every decision ever made by the system in a hash-chained sequence:

CREATE TABLE thought_records (
  id TEXT PRIMARY KEY,
  session_id TEXT NOT NULL,
  sequence INTEGER NOT NULL,
  thought_type TEXT NOT NULL,  -- 'action', 'reflection', 'verification'
  content TEXT NOT NULL,       -- markdown or JSON body
  parent_hash TEXT,            -- SHA-256 of previous record
  record_hash TEXT NOT NULL,   -- SHA-256(id || content)
  created_at TEXT NOT NULL,
  created_by TEXT NOT NULL,
  UNIQUE(session_id, sequence)
);

This table is append-only. Records are never updated or deleted — only added. The parent_hash field chains records together so the entire sequence is verifiable with audit_verify_chain.

2. merkle_nodes (η Proof Store)

Stores the Merkle tree built from the thought records:

CREATE TABLE merkle_nodes (
  id TEXT PRIMARY KEY,
  session_id TEXT NOT NULL,
  level INTEGER NOT NULL,      -- 0 (leaves) to n (root)
  position INTEGER NOT NULL,   -- position at this level
  value TEXT NOT NULL,         -- SHA-256 hash
  left_child TEXT,             -- left child node id
  right_child TEXT,            -- right child node id
  created_at TEXT NOT NULL,
  UNIQUE(session_id, level, position)
);

After all thought records are added to a session, merkle_finalize builds this tree. The root node’s value is returned by merkle_root as the final proof hash. Every record is cryptographically bound to this root.

3. audit_events (α Envelope Log)

Stores every tool call, before and after it executes:

CREATE TABLE audit_events (
  id TEXT PRIMARY KEY,
  call_id TEXT NOT NULL,
  session_id TEXT NOT NULL,
  tool_name TEXT NOT NULL,
  stage TEXT NOT NULL,        -- 'enter' or 'exit'
  params_hash TEXT,           -- SHA-256 hash of input
  result_hash TEXT,           -- SHA-256 hash of output (null until exit)
  duration_ms INTEGER,        -- null until exit
  error_code TEXT,            -- if failed
  sequence INTEGER NOT NULL,  -- monotonic per tool
  created_at TEXT NOT NULL,
  UNIQUE(tool_name, sequence)
);

The middleware chain writes an ‘enter’ row at the start of the tool call (Step 3) and an ‘exit’ row at the end (Step 5). This table is the source of truth for what was called, when, and what happened.

Other Phase 0 Tables

Alongside the three load-bearing tables, other concepts introduce tables:

β Task Pipeline

CREATE TABLE tasks (
  id TEXT PRIMARY KEY,
  project_id TEXT NOT NULL,
  title TEXT NOT NULL,
  description TEXT,
  status TEXT NOT NULL,       -- INIT, GATHER, ANALYZE, PLAN, APPLY, VERIFY, DONE, CANCELLED
  created_at TEXT NOT NULL,
  created_by TEXT NOT NULL,
  updated_at TEXT NOT NULL,
  updated_by TEXT NOT NULL,
  estimate_hours REAL,
  priority TEXT,              -- low, normal, high, critical
  assignee TEXT,
  parent_id TEXT,             -- for sub-tasks
  FOREIGN KEY(parent_id) REFERENCES tasks(id)
);

CREATE TABLE task_dependencies (
  task_id TEXT NOT NULL,
  depends_on TEXT NOT NULL,
  created_at TEXT NOT NULL,
  PRIMARY KEY (task_id, depends_on),
  FOREIGN KEY(task_id) REFERENCES tasks(id),
  FOREIGN KEY(depends_on) REFERENCES tasks(id)
);

ε Skill Registry

A minimal skill registry (not a full skill runtime — skills are just discoverable prose playbooks):

CREATE TABLE skills (
  id TEXT PRIMARY KEY,
  name TEXT NOT NULL UNIQUE,  -- e.g., 'colibri-task-executor'
  description TEXT,
  path TEXT NOT NULL,         -- path to SKILL.md in .agents/skills/
  loaded_at TEXT NOT NULL,
  content_hash TEXT           -- SHA-256 of SKILL.md for change detection
);

Schema Initialization and Migrations

When the server boots (Step 5), it:

  1. Creates the database file if it doesn’t exist.
  2. Runs all pending migrations from src/db/schema.sql and any incremental .sql files in src/db/migrations/.
  3. Verifies integrity with PRAGMA integrity_check.

Migrations are idempotent — running the same migration twice has no effect. Each migration guards itself (e.g., CREATE TABLE IF NOT EXISTS), so an already-applied migration is a no-op.

Example migration (pseudocode):

-- 001-initial-schema.sql
CREATE TABLE IF NOT EXISTS thought_records (
  id TEXT PRIMARY KEY,
  ...
);

CREATE TABLE IF NOT EXISTS merkle_nodes (
  id TEXT PRIMARY KEY,
  ...
);

CREATE TABLE IF NOT EXISTS audit_events (
  id TEXT PRIMARY KEY,
  ...
);

If the database already has these tables, the migrations skip them. This allows the server to be restarted (or a new instance started on an existing database file) without re-running schema initialization.
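Idempotency can also be enforced at the runner level by recording which migration files have already been applied. A sketch with an in-memory applied set — a real runner would persist that set in the database; the names and the apply callback here are illustrative:

```javascript
// Each migration is a named SQL string; apply() is whatever executes SQL.
function runMigrations(migrations, applied, apply) {
  for (const { name, sql } of migrations) {
    if (applied.has(name)) continue; // already ran — skip, never re-run
    apply(sql);
    applied.add(name);
  }
}

const log = [];
const applied = new Set();
const migrations = [
  { name: '001-initial-schema.sql', sql: 'CREATE TABLE IF NOT EXISTS thought_records (...);' },
  { name: '002-audit-events.sql', sql: 'CREATE TABLE IF NOT EXISTS audit_events (...);' },
];

runMigrations(migrations, applied, (sql) => log.push(sql));
runMigrations(migrations, applied, (sql) => log.push(sql)); // second run is a no-op
console.log(log.length); // 2 — each migration executed exactly once
```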

“Earned” Tables

A table is “earned” if a Phase 0 concept has a sub-task acceptance criterion that requires it. The three load-bearing tables are earned by their concepts:

  • thought_records — earned by P0.7 (ζ Decision Trail implementation).
  • merkle_nodes — earned by P0.8 (η Proof Store implementation).
  • audit_events — earned by α (System Core) and γ (Server Lifecycle) middleware.

Task, dependency, and skill tables are earned by their own acceptance criteria (P0.3, P0.6).

Tables not in Phase 0: The donor AMS runtime had 78 tables across RAG, memory tiers, workflows, conversation threads, alerts, and more. None of those are required for Phase 0. If a concept does not have a Phase 0 task with an acceptance criterion, its tables don’t exist. This keeps the schema small and understandable.

Transactions and Isolation

Every tool call runs inside a SQLite transaction:

const result = db.transaction(() => {
  // Tool handler code runs here.
  // All writes inside this function are atomic: if it throws,
  // better-sqlite3 rolls the transaction back automatically.
})();

If the tool call fails partway through, the transaction is rolled back and the database state is unchanged. This guarantees atomicity: either the entire tool call succeeds and is written, or nothing is written.

SQLite’s default isolation level is SERIALIZABLE (the strongest) within a single-writer process. This means:

  • No dirty reads (reading uncommitted data).
  • No non-repeatable reads (a record you read changes before you finish).
  • No phantom reads (new records appear mid-transaction).

Concurrency is handled at the process level via the tool-lock middleware (Step 1 of the 5-stage chain), not inside the database. Only one tool call executes at a time, so isolation is not an issue in Phase 0.
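The tool lock can be sketched as a single-flight guard: a second call arriving while one is executing is refused rather than interleaved. This is an illustrative sketch, not the server’s middleware (which may queue calls instead of rejecting them):

```javascript
// A single-flight guard shared by all wrapped tools.
let busy = false;

function withToolLock(handler) {
  return (...args) => {
    if (busy) throw new Error('E_TOOL_LOCKED: another tool call is in flight');
    busy = true;
    try {
      return handler(...args);
    } finally {
      busy = false; // always released, even if the handler throws
    }
  };
}

const tool = withToolLock((x) => x * 2);
console.log(tool(21)); // 42
```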

Health and Integrity

Every 30 seconds, the server runs:

const rows = db.pragma('integrity_check');
// better-sqlite3 returns the pragma's result rows; anything other
// than a single 'ok' row indicates corruption.
const healthy = rows.length === 1 && rows[0].integrity_check === 'ok';

If the integrity check fails, the server transitions to SAFE_MODE and logs an alert. This detects corruption (disk errors, unexpected crashes, etc.) before it spreads to new writes.

Also, on startup:

db.pragma('quick_check');      // Faster pass; skips verifying index contents
db.pragma('integrity_check');  // Full verification, including indexes

If either check fails, the server exits with code 75 (resource error) and does not proceed.

Backups and Recovery

Phase 0 does not prescribe a backup or recovery mechanism. Backup strategy is outside the scope of the server itself. In practice:

  • Users should back up data/colibri.db and its companion .wal file together — they are a pair. (The -shm file is transient shared memory and is recreated on open.)
  • Partial backups — e.g., the main database file without the .wal file — may miss recent, un-checkpointed writes.
  • Point-in-time recovery requires operational tooling (e.g., periodic snapshots or a write-ahead log replicated to another location).

These are operational concerns that arrive with Phase 1+ infrastructure requirements.


Colibri — documentation-first MCP runtime. Apache 2.0 + Commons Clause.
