Quick Start — Try Colibri

Status: Phase 0 at 100% on non-deferred tasks (28/28 shipped as of R75 Wave I — 2026-04-18). P0.5.1/P0.5.2 shipped as δ library-only stubs per ADR-005 §Decision (PR #149 scoring, PR #150 fallback); full multi-model routing lands Phase 1.5. The MCP server boots over stdio and registers 14 tools at a22dd23e. This page describes the live Phase 0 surface.

Five-minute orientation

Colibri = a TypeScript MCP server (stdio) that:

Executes tasks through the formal 8-state β FSM: INIT → GATHER → ANALYZE → PLAN → APPLY → VERIFY → DONE, plus CANCELLED from any state. Enforced in the β state machine (src/domains/tasks/state-machine.ts) — task_update accepts a status field and routes it through the FSM; illegal transitions are rejected.
Runs on Claude only in Phase 0. The δ model-router ships as library-only stubs in Phase 0 per ADR-005 §Decision — constant scoring (always Claude) + single-member fallback chain (Claude). Full multi-model scoring, N-member fallback, and circuit breaker land in Phase 1.5.
Records every decision as a hash-chained ζ audit trail — SHA-256 linked thought_records verified by audit_verify_chain.
Seals work into a Merkle η proof store — merkle_finalize builds the tree, merkle_root returns the root hash.
Does not spawn sub-agents via MCP. Sub-agents in Phase 0 are spawned with the Task tool (Claude’s built-in sub-agent dispatch) into .worktrees/claude/<task-slug> feature worktrees. The donor agent_spawn / agent_status / agent_list tools and the entire src/domains/agents/ target are deferred to Phase 1.5 per ADR-005.

For whom? Teams running multi-step agentic workflows who need accountability and memory. Colibri is not a chatbot; it is a highly opinionated orchestration runtime.

What ships in Phase 0? 14 MCP tools over stdio (R74.5 planned 19; 5 were closed/struck/deferred during implementation and 1 was added). See the next section.

What Phase 0 delivers — the 14-tool shipped surface

After Phase 0 Waves A–I (28/28 tasks shipped; 100% on non-deferred work), the server exposes these 14 tools across five concept letters. See ADR-004: R74.5 originally planned 19 tools, the R75 Wave H amendment reconciled the count to what actually shipped, and Wave I added no tools because the δ stubs are library-only.

β Task Pipeline — 5 tools

task_create — Create a task (returns task_id, initial state INIT)
task_list — List tasks with filters (status, priority, owner, tag) and pagination
task_get — Get a single task by id with full fields
task_update — Partial update of mutable fields (description, priority, progress, owner, tags). Accepts status and routes transitions through the β state machine (src/domains/tasks/state-machine.ts) — illegal transitions rejected with ERR_INVALID_TRANSITION. No separate task_transition tool exists in Phase 0 (merged during P0.3.4).
task_next_actions — Return unblocked tasks in priority order

ζ Decision Trail — 4 tools (axis closed in Wave G)

audit_session_start — Open a proof-grade session (returns session_id)
thought_record — Append a hash-chained decision row (thought_type: plan | decision | analysis | reflection | …)
thought_record_list — Read the thought chain for a session
audit_verify_chain — Verify the SHA-256 chain from session start to tip (shipped Wave G, P0.7.3)
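
The hash-chaining idea behind thought_record and audit_verify_chain can be sketched as follows. This is a minimal illustration, not the shipped implementation: the row shape, the all-zero genesis value, the JSON field encoding, and the function names appendThought and verifyChain are all assumptions made for the example.

```typescript
import { createHash } from "node:crypto";

// Hypothetical row shape; the real thought_records schema is not shown on this page.
interface ThoughtRow {
  thought_type: string;
  content: string;
  prev_hash: string; // hash of the previous row, or GENESIS for the first row
  hash: string; // SHA-256 over (prev_hash, thought_type, content)
}

// Assumption: the first row links to an all-zero sentinel value.
const GENESIS = "0".repeat(64);

function rowHash(prevHash: string, thoughtType: string, content: string): string {
  // Serializing the fields as a JSON array keeps the encoding unambiguous.
  return createHash("sha256")
    .update(JSON.stringify([prevHash, thoughtType, content]))
    .digest("hex");
}

// Returns a new chain with one more row linked to the current tip.
function appendThought(chain: ThoughtRow[], thoughtType: string, content: string): ThoughtRow[] {
  const last = chain[chain.length - 1];
  const prevHash = last ? last.hash : GENESIS;
  const row: ThoughtRow = {
    thought_type: thoughtType,
    content,
    prev_hash: prevHash,
    hash: rowHash(prevHash, thoughtType, content),
  };
  return [...chain, row];
}

// Walks from the first row to the tip; any edited or reordered row breaks a link.
function verifyChain(chain: ThoughtRow[]): boolean {
  let expectedPrev = GENESIS;
  for (const row of chain) {
    if (row.prev_hash !== expectedPrev) return false;
    if (row.hash !== rowHash(row.prev_hash, row.thought_type, row.content)) return false;
    expectedPrev = row.hash;
  }
  return true;
}
```

Because each hash covers the previous hash, changing any earlier row invalidates every row after it, which is what lets a single walk from session start to tip detect tampering.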

η Proof Store — 2 tools (axis complete in Wave F)

merkle_finalize — Build the Merkle tree over the session’s thought records (also serves as the session-close signal — no separate audit_session_end tool in Phase 0)
merkle_root — Return the finalized root hash + metadata
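
To make the role of merkle_finalize concrete, here is a hand-rolled Merkle root over a list of record strings. It is a sketch under stated assumptions: the stack lists merkletreejs, but this example uses only node:crypto so that it stays self-contained, it hashes hex strings rather than raw bytes, and it duplicates the last node on odd levels. Its output should therefore not be expected to match the roots the server produces.

```typescript
import { createHash } from "node:crypto";

const sha256Hex = (data: string): string =>
  createHash("sha256").update(data).digest("hex");

// Illustrative only: leaf encoding and odd-node handling are assumptions.
function merkleRoot(leaves: string[]): string {
  if (leaves.length === 0) {
    throw new Error("cannot build a Merkle tree over zero leaves");
  }
  // Level 0: one hash per record.
  let level: string[] = leaves.map((leaf) => sha256Hex(leaf));
  // Pair up neighbours until a single hash remains.
  while (level.length > 1) {
    const next: string[] = [];
    for (let i = 0; i < level.length; i += 2) {
      const left = level[i] as string;
      const right = level[i + 1] ?? left; // odd count: pair the last node with itself
      next.push(sha256Hex(left + right));
    }
    level = next;
  }
  return level[0] as string;
}
```

The root commits to every leaf that was present when the tree was built, so a record appended afterwards changes the root and is not covered by the earlier one.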

ε Skill Registry — 1 tool in Phase 0

skill_list — List the 23 canonical colibri-* skills discovered on disk

skill_get, skill_reload, and the rest of the ε hot-reload surface are deferred to Phase 1. Phase 0 ships a read-only discovery path plus an in-memory capability index (P0.6.3, Wave H — closes the ε axis).

System Health — 2 tools

server_ping — Minimal <100 ms stdio round-trip
server_health — returns a 6-field payload (status, version, uptime_ms, db_tables, phase, mode) covering liveness + runtime mode + DB schema coverage. Authoritative description in docs/2-plugin/health.md. Absorbs what the R74.5 plan called server_info.
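
A client that wants to sanity-check a server_health response could start from a guard like the one below. Only the six field names are taken from this page; their value types are not specified here (docs/2-plugin/health.md is the authoritative description), so the sketch deliberately leaves them as unknown. The names HEALTH_FIELDS and isHealthPayload are hypothetical.

```typescript
// Field names from this page; value types intentionally left open.
const HEALTH_FIELDS = [
  "status",
  "version",
  "uptime_ms",
  "db_tables",
  "phase",
  "mode",
] as const;

type HealthField = (typeof HEALTH_FIELDS)[number];
type HealthPayload = Record<HealthField, unknown>;

// True when the value is an object carrying all six documented fields.
function isHealthPayload(value: unknown): value is HealthPayload {
  if (typeof value !== "object" || value === null) return false;
  return HEALTH_FIELDS.every((field) => field in value);
}
```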

Not in Phase 0 (donor-era, listed so you don’t look for them): task_transition (merged into task_update), task_delete, task_depends_on (deferred), audit_session_end (merged into merkle_finalize), server_info / server_shutdown (phantom tools in the R74.5 plan, never implemented, being struck from docs), agent_spawn, agent_status, agent_list, skill_get, skill_reload, task_create_batch, task_deps, task_eisenhower, task_report, task_critical_path, roadmap_* (12 variants), memory_* (12 variants), context_* (7 variants), analysis_rag_*, thought_plan, thought_decide, merkle_proof, merkle_verify. All deferred; none are registered in Phase 0.

A typical session (what you will do)

Here is the flow a Claude session follows against the live Phase 0 surface (14 tools):

server_ping                                         # stdio is live
server_health                                       # DB open, middleware registered, tools registered
task_next_actions { limit: 5 }                      # find the next unblocked task
audit_session_start { intent: "..." }               # open a proof-grade session
task_update { task_id, status: "GATHER" }           # move the task forward (FSM-enforced)
[executor does audit → contract → packet → implement → verify]
thought_record { thought_type: "decision", content: "..." }
thought_record { thought_type: "analysis", content: "..." }
task_update { task_id, progress: 100, status: "DONE" }
thought_record { thought_type: "reflection", content: "task_id / branch / commit / tests / summary / blockers" }
audit_verify_chain { session_id }
merkle_finalize { session_id }                     # MUST come after the final reflection; also closes the session
merkle_root { session_id }                         # proof of work

Load-bearing ordering rule: the final thought_record { reflection } MUST precede merkle_finalize. Otherwise the reflection is not anchored in the Merkle root. See CLAUDE.md §7 and writeback-protocol.md.
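
The ordering rule can be expressed as a small check over a log of tool calls. This is an illustrative sketch, not part of the server: the ToolCall shape and the checkFinalizeOrdering helper are hypothetical.

```typescript
// Hypothetical call-log entry used only for this sketch.
interface ToolCall {
  tool: string;
  thought_type?: string;
}

// Returns null when the log respects the ordering rule, otherwise a reason.
function checkFinalizeOrdering(calls: ToolCall[]): string | null {
  const finalizeAt = calls.findIndex((c) => c.tool === "merkle_finalize");
  if (finalizeAt === -1) return "merkle_finalize was never called";

  const before = calls.slice(0, finalizeAt);
  const after = calls.slice(finalizeAt + 1);

  const hasReflection = before.some(
    (c) => c.tool === "thought_record" && c.thought_type === "reflection",
  );
  if (!hasReflection) return "no reflection recorded before merkle_finalize";

  // Anything recorded after finalize cannot be part of the sealed tree.
  if (after.some((c) => c.tool === "thought_record")) {
    return "thought_record after merkle_finalize is not anchored in the root";
  }
  return null;
}
```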

State transition rule: you move tasks via task_update, passing a status field. The state machine (src/domains/tasks/state-machine.ts) enforces legal transitions — illegal jumps (e.g. INIT → DONE) return ERR_INVALID_TRANSITION. The R74.5 plan had a separate task_transition tool; during P0.3.4 implementation the two were merged. The 5-step executor chain (audit → contract → packet → implement → verify, CLAUDE.md §6) maps 1:1 onto the β FSM states GATHER → ANALYZE → PLAN → APPLY → VERIFY.
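
A transition table of the kind described above might look like the sketch below. The contents of src/domains/tasks/state-machine.ts are not reproduced on this page, so two things here are assumptions: that the happy path is strictly linear (each state advances only to the next one), and that the terminal states DONE and CANCELLED cannot themselves be cancelled. The names FORWARD, canTransition, and applyTransition are hypothetical.

```typescript
type TaskState =
  | "INIT" | "GATHER" | "ANALYZE" | "PLAN"
  | "APPLY" | "VERIFY" | "DONE" | "CANCELLED";

// Assumption: each non-terminal state has exactly one forward successor.
const FORWARD: Record<TaskState, TaskState | null> = {
  INIT: "GATHER",
  GATHER: "ANALYZE",
  ANALYZE: "PLAN",
  PLAN: "APPLY",
  APPLY: "VERIFY",
  VERIFY: "DONE",
  DONE: null,
  CANCELLED: null,
};

function canTransition(from: TaskState, to: TaskState): boolean {
  if (to === "CANCELLED") {
    // Assumption: only non-terminal states may be cancelled.
    return from !== "DONE" && from !== "CANCELLED";
  }
  return FORWARD[from] === to;
}

// Returns the new state, or throws with the error code named on this page.
function applyTransition(from: TaskState, to: TaskState): TaskState {
  if (!canTransition(from, to)) {
    throw new Error(`ERR_INVALID_TRANSITION: ${from} -> ${to}`);
  }
  return to;
}
```

Keeping the table as data rather than scattered conditionals makes the legal moves easy to audit: an illegal jump such as INIT to DONE fails because no table entry permits it.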

What makes Colibri different

Execution is formal. Tasks move through an 8-state FSM enforced by the β state machine, not free-form strings on a to-do list. Illegal jumps (e.g. INIT → DONE) are rejected with ERR_INVALID_TRANSITION.
Decisions are cryptographic. Every thought_record is SHA-256 chained to the previous row in the session; audit_verify_chain walks the chain and fails on any tampering. The final merkle_root is the commitment.
Sub-agents are contract-bound. Phase 0 dispatches sub-agents via the host Task tool (Claude Code’s built-in Agent/Task dispatch) into isolated .worktrees/claude/<task-slug> worktrees; the MCP agent_spawn family is deferred to Phase 1.5 per ADR-005 §Decision. Writeback ownership depends on the dispatch case. A T3 executor dispatched by PM owns its own writeback: task_update { status: "DONE" }, which routes through the β state machine at src/domains/tasks/state-machine.ts, plus thought_record { reflection }. A leaf helper that an executor spawns for bounded research or search does NOT call writeback; its parent writes back on its behalf per writeback-protocol.md line 16. DONE is also not a convention: the β pipeline hard-blocks the transition at src/domains/tasks/writeback.ts:97 with ERR_WRITEBACK_REQUIRED when no thought_record exists for the task. No silent failures, no ghost work.
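
The writeback hard-block can be pictured as a guard that runs before a task is allowed to reach DONE. The sketch below is an assumption-laden illustration, not the code at src/domains/tasks/writeback.ts: it assumes thought records carry a task_id, uses an in-memory array in place of the database, and the names ThoughtRef and assertWritebackPresent are hypothetical.

```typescript
// Hypothetical in-memory stand-in for stored thought records.
interface ThoughtRef {
  task_id: string; // assumption: records can be matched to a task by id
  thought_type: string;
}

// Builds an error that carries the code named on this page.
function writebackRequired(taskId: string): Error & { code: string } {
  return Object.assign(
    new Error(`ERR_WRITEBACK_REQUIRED: no thought_record exists for task ${taskId}`),
    { code: "ERR_WRITEBACK_REQUIRED" },
  );
}

// Throws unless at least one thought record exists for the task.
function assertWritebackPresent(taskId: string, thoughts: ThoughtRef[]): void {
  const found = thoughts.some((t) => t.task_id === taskId);
  if (!found) throw writebackRequired(taskId);
}
```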

Bottom line: agentic work gets memory, proof, and accountability — not just results.

Three paths into the docs

1. “I want to run the server” (engineers)

Read in order:

Task Breakdown (Phase 0) — start with P0.1 setup
Task Prompts — per-task copy-paste prompts
Extractions Index — algorithm reference (pseudocode)

2. “I want to understand the architecture” (architects)

Read in order:

World Schema — the organizational spine across all 15 Greek concepts
α System Core (boot) — entry point (execution axis)
2 — Plugin index — how pieces fit together
5 — Time: round — round → wave → task orchestration
ADR-004 Tool Surface — why 14 tools (R74.5 originally planned 19; Wave H amendment reconciled the count to shipped reality)
ADR-005 Multi-Model Router Phase — why δ Phase 0 ships as library stubs and full routing lands Phase 1.5

3. “I want to see examples” (operators/users)

Read:

Agent Bootstrap — master bootstrap prompt for cold Claude sessions
Writeback Protocol — the ordering rule, worked examples
Glossary — every term explained

Next steps

Read colibri-system.md — the canonical vision (single source of truth).
Read CLAUDE.md — the four-tier agent hierarchy, the worktree rule, the writeback protocol.
Pick one of the three paths above and dig in.
Before changing code, open implementation/task-breakdown.md and pick a task. Create a feature worktree (git worktree add .worktrees/claude/<task-slug> -b feature/<task-slug> origin/main) and never edit the main checkout.

FAQs

Q: Can I run this today? A: Yes — Phase 0 is 100% on non-deferred tasks (28/28). The MCP server boots over stdio at a22dd23e and registers 14 tools. Configure a stdio client (e.g. Claude Desktop, or the .vscode/mcp-settings.example.json) to launch node dist/server.js.

Q: When will Phase 0 be done? A: It is done. As of R75 Wave I, Phase 0 is 28/28 — P0.6.3 (ε capability index) closed the ε axis in Wave H, and P0.5.1 + P0.5.2 shipped as δ library-only stubs in Wave I per ADR-005 §Decision. Next round opens the Phase 0 seal + Phase 1 planning scope. See implementation/task-breakdown.md.

Q: Do I need TypeScript? A: Yes. Stack is TypeScript 5.3+ (ESM, NodeNext), @modelcontextprotocol/sdk, better-sqlite3, Zod v3.23, merkletreejs, gray-matter, Jest (ESM). Chevrotain is spec-only for κ (Phase 1+).

Q: What’s the database? A: SQLite via better-sqlite3. Path: data/colibri.db — created at runtime in WAL mode, single-writer (P0.2.2 shipped). Schema is declared in migrations under src/db/migrations/001_init.sql through 006_eta.sql; src/db/schema.sql is a reference asset (not executed). The legacy data/ams.db is heritage — kept only as the donor task store and writeback target through the Phase 0 bootstrap.

Q: Environment variables? A: Only the COLIBRI_* namespace is read by Phase 0 code (the AMS_* prefix is rejected). COLIBRI_MODE selects one of the four runtime modes (FULL, READONLY, TEST, MINIMAL). ANTHROPIC_API_KEY (vendor-canonical name) is optional and validated at call-time by the ν Claude API wrappers, not at startup — the server boots cleanly when the key is unset.
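
A minimal sketch of how the mode selection described above could be parsed follows. Several details are assumptions because this page does not state them: the default of FULL when COLIBRI_MODE is unset, exact-case matching of the four values, and the reading of "rejected" as "reported and never read". The helper names readMode and legacyKeys are hypothetical.

```typescript
const MODES = ["FULL", "READONLY", "TEST", "MINIMAL"] as const;
type ColibriMode = (typeof MODES)[number];

type Env = Record<string, string | undefined>;

// Assumption: FULL is used when COLIBRI_MODE is unset or empty.
function readMode(env: Env, fallback: ColibriMode = "FULL"): ColibriMode {
  const raw = env["COLIBRI_MODE"];
  if (raw === undefined || raw === "") return fallback;
  const match = MODES.find((mode) => mode === raw);
  if (match === undefined) {
    throw new Error(`COLIBRI_MODE must be one of ${MODES.join(", ")}; got "${raw}"`);
  }
  return match;
}

// Lists legacy AMS_* keys so a caller can refuse to start or warn about them.
function legacyKeys(env: Env): string[] {
  return Object.keys(env)
    .filter((key) => key.startsWith("AMS_"))
    .sort();
}
```

In a Node process the functions can be called with process.env directly, for example readMode(process.env).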

Q: Can I extend it? A: Yes, with caveats. Custom skills are prose playbooks you drop into .agents/skills/ — see ε Skill Registry. Custom MCP tools and domain extensions must wait for Phase 1 — Phase 0 locks the shipped 14-tool surface.

For a deeper tour, start with colibri-system.md, then walk the Greek concepts via world-schema.md.