Quick Start — Try Colibri

Status: Phase 0 at 100% on non-deferred tasks (28/28 shipped as of R75 Wave I — 2026-04-18). P0.5.1/P0.5.2 shipped as δ library-only stubs per ADR-005 §Decision (PR #149 scoring, PR #150 fallback); full multi-model routing lands Phase 1.5. The MCP server boots over stdio and registers 14 tools at a22dd23e. This page describes the live Phase 0 surface.


Five-minute orientation

Colibri = a TypeScript MCP server (stdio) that:

  • Executes tasks through the formal 8-state β FSM: INIT → GATHER → ANALYZE → PLAN → APPLY → VERIFY → DONE, plus CANCELLED from any state. Enforced in the β state machine (src/domains/tasks/state-machine.ts) — task_update accepts a status field and routes it through the FSM; illegal transitions are rejected.
  • Runs on Claude only in Phase 0. The δ model-router ships as library-only stubs in Phase 0 per ADR-005 §Decision — constant scoring (always Claude) + single-member fallback chain (Claude). Full multi-model scoring, N-member fallback, and circuit breaker land in Phase 1.5.
  • Records every decision as a hash-chained ζ audit trail — SHA-256 linked thought_records verified by audit_verify_chain.
  • Seals work into a Merkle η proof store: merkle_finalize builds the tree, merkle_root returns the root hash.
  • Does not spawn sub-agents via MCP. Sub-agents in Phase 0 are spawned with the Task tool (Claude’s built-in sub-agent dispatch) into .worktrees/claude/<task-slug> feature worktrees. The donor agent_spawn / agent_status / agent_list tools and the entire src/domains/agents/ target are deferred to Phase 1.5 per ADR-005.

Who is it for? Teams running multi-step agentic workflows who need accountability and memory. Colibri is not a chatbot; it is a highly opinionated orchestration runtime.

What ships in Phase 0? 14 MCP tools over stdio (R74.5 planned 19; 5 were closed/struck/deferred during implementation and 1 was added). See the next section.


What Phase 0 delivers — the 14-tool shipped surface

After Phase 0 Waves A–I (28/28 tasks shipped; 100% on non-deferred work), the server exposes these 14 tools across five groups: four concept letters (β, ζ, η, ε) plus System Health. See ADR-004: R74.5 originally planned 19 tools; the R75 Wave H amendment reconciles the count to what actually shipped, and Wave I added no tools because the δ stubs are library-only.

β Task Pipeline — 5 tools

  • task_create — Create a task (returns task_id, initial state INIT)
  • task_list — List tasks with filters (status, priority, owner, tag) and pagination
  • task_get — Get a single task by id with full fields
  • task_update — Partial update of mutable fields (description, priority, progress, owner, tags). Accepts status and routes transitions through the β state machine (src/domains/tasks/state-machine.ts) — illegal transitions rejected with ERR_INVALID_TRANSITION. No separate task_transition tool exists in Phase 0 (merged during P0.3.4).
  • task_next_actions — Return unblocked tasks in priority order

ζ Decision Trail — 4 tools (axis closed in Wave G)

  • audit_session_start — Open a proof-grade session (returns session_id)
  • thought_record — Append a hash-chained decision row (thought_type: plan | decision | analysis | reflection | …)
  • thought_record_list — Read the thought chain for a session
  • audit_verify_chain — Verify the SHA-256 chain from session start to tip (shipped Wave G, P0.7.3)
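The chaining idea behind thought_record and audit_verify_chain can be sketched in a few lines. This is a minimal illustration, not Colibri's implementation: the row shape (ThoughtRow), the GENESIS value, and the exact bytes fed to SHA-256 are all assumptions made for the example.

```typescript
import { createHash } from "node:crypto";

// Hypothetical row shape; the real thought_records schema may differ.
interface ThoughtRow {
  thoughtType: string;
  content: string;
  prevHash: string; // hash of the previous row, or GENESIS for the first row
  hash: string;     // SHA-256 over (prevHash, thoughtType, content)
}

// Assumed placeholder meaning "no previous row".
const GENESIS = "0".repeat(64);

function hashRow(prevHash: string, thoughtType: string, content: string): string {
  return createHash("sha256")
    .update(JSON.stringify([prevHash, thoughtType, content]))
    .digest("hex");
}

// Append a row whose hash commits to the previous row's hash.
function appendThought(chain: ThoughtRow[], thoughtType: string, content: string): ThoughtRow[] {
  const last = chain[chain.length - 1];
  const prevHash = last ? last.hash : GENESIS;
  const row: ThoughtRow = {
    thoughtType,
    content,
    prevHash,
    hash: hashRow(prevHash, thoughtType, content),
  };
  return [...chain, row];
}

// Walk the chain from the first row to the tip; an edited row breaks a link.
function verifyChain(chain: ThoughtRow[]): boolean {
  let expectedPrev = GENESIS;
  for (const row of chain) {
    if (row.prevHash !== expectedPrev) return false;
    if (row.hash !== hashRow(row.prevHash, row.thoughtType, row.content)) return false;
    expectedPrev = row.hash;
  }
  return true;
}
```

Because each hash includes the previous one, changing any earlier row invalidates every row after it, which is what lets a single walk from session start to tip detect tampering.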

η Proof Store — 2 tools (axis complete in Wave F)

  • merkle_finalize — Build the Merkle tree over the session’s thought records (also serves as the session-close signal — no separate audit_session_end tool in Phase 0)
  • merkle_root — Return the finalized root hash + metadata
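To make "build the tree, return the root" concrete, here is a small sketch of a Merkle root computation. It is not the η implementation (the stack lists merkletreejs; this sketch uses only node:crypto), and the pairing rule for odd levels and the empty-input behaviour are assumptions chosen for illustration.

```typescript
import { createHash } from "node:crypto";

const sha256 = (data: Buffer): Buffer => createHash("sha256").update(data).digest();

// Compute a Merkle root over leaf payloads (for example, one per thought record).
// Assumption: an unpaired node at the end of a level is paired with itself.
function merkleRoot(leaves: string[]): string {
  if (leaves.length === 0) {
    throw new Error("cannot build a tree over zero leaves");
  }
  // Hash every leaf, then repeatedly hash adjacent pairs until one node remains.
  let level: Buffer[] = leaves.map((leaf) => sha256(Buffer.from(leaf, "utf8")));
  while (level.length > 1) {
    const next: Buffer[] = [];
    for (let i = 0; i < level.length; i += 2) {
      const left = level[i] as Buffer;
      const right = level[i + 1] ?? left;
      next.push(sha256(Buffer.concat([left, right])));
    }
    level = next;
  }
  return (level[0] as Buffer).toString("hex");
}
```

The root commits only to the leaves that exist when the tree is built, so a record appended afterwards is not covered by that root.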

ε Skill Registry — 1 tool in Phase 0

  • skill_list — List the 23 canonical colibri-* skills discovered on disk

skill_get, skill_reload, and the rest of the ε hot-reload surface are deferred to Phase 1. Phase 0 ships a read-only discovery path plus an in-memory capability index (P0.6.3, Wave H — closes the ε axis).

System Health — 2 tools

  • server_ping — Minimal <100 ms stdio round-trip
  • server_health — returns a 6-field payload (status, version, uptime_ms, db_tables, phase, mode) covering liveness + runtime mode + DB schema coverage. Authoritative description in docs/2-plugin/health.md. Absorbs what the R74.5 plan called server_info.
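A client that consumes server_health may want to check the payload shape before trusting it. The sketch below uses the six field names listed above and the four runtime modes named in the FAQ; the field types are assumptions (db_tables and phase are left untyped because their shape is not stated here), and docs/2-plugin/health.md remains the authoritative description.

```typescript
// The four runtime modes named in the FAQ.
type ColibriMode = "FULL" | "READONLY" | "TEST" | "MINIMAL";
const MODES: readonly ColibriMode[] = ["FULL", "READONLY", "TEST", "MINIMAL"];

// Field names come from the tool description; the types are assumptions.
interface ServerHealth {
  status: string;
  version: string;
  uptime_ms: number;
  db_tables: unknown; // shape not specified on this page
  phase: unknown;     // shape not specified on this page
  mode: ColibriMode;
}

// Type guard: true only when all six fields are present with the assumed types.
function isServerHealth(value: unknown): value is ServerHealth {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  return (
    typeof v["status"] === "string" &&
    typeof v["version"] === "string" &&
    typeof v["uptime_ms"] === "number" &&
    "db_tables" in v &&
    "phase" in v &&
    MODES.includes(v["mode"] as ColibriMode)
  );
}
```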

Not in Phase 0 (donor-era, listed so you don’t look for them): task_transition (merged into task_update), task_delete, task_depends_on (deferred), audit_session_end (merged into merkle_finalize), server_info / server_shutdown (phantom tools in the R74.5 plan, never implemented, being struck from docs), agent_spawn, agent_status, agent_list, skill_get, skill_reload, task_create_batch, task_deps, task_eisenhower, task_report, task_critical_path, roadmap_* (12 variants), memory_* (12 variants), context_* (7 variants), analysis_rag_*, thought_plan, thought_decide, merkle_proof, merkle_verify. Whether merged, struck, or deferred, none of these is registered in Phase 0.


A typical session (what you will do)

Here is the flow a Claude session follows against the live Phase 0 surface (14 tools):

1. server_ping                                         # stdio is live
2. server_health                                       # DB open, middleware registered, tools registered
3. task_next_actions { limit: 5 }                      # find the next unblocked task
4. audit_session_start { intent: "..." }               # open a proof-grade session
5. task_update { task_id, status: "GATHER" }           # move the task forward (FSM-enforced)
6. [executor does audit → contract → packet → implement → verify]
7. thought_record { thought_type: "decision", content: "..." }
8. thought_record { thought_type: "analysis", content: "..." }
9. task_update { task_id, progress: 100, status: "DONE" }
10. thought_record { thought_type: "reflection", content: "task_id / branch / commit / tests / summary / blockers" }
11. audit_verify_chain { session_id }
12. merkle_finalize { session_id }                     # MUST come after the final reflection; also closes the session
13. merkle_root { session_id }                         # proof of work
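The same sequence can be written as a driver function. This is a sketch under stated assumptions: CallTool is a hypothetical stand-in for whatever MCP client you use, the argument shapes are copied from the listing above (the real tools may require more fields), the task id is passed in rather than fetched with task_next_actions, and the loop over the five working states is one reading of step 6.

```typescript
// Hypothetical transport: sends one tool call and resolves with its result.
type ToolResult = Record<string, unknown>;
type CallTool = (name: string, args?: Record<string, unknown>) => Promise<ToolResult>;

const WORK_STATES = ["GATHER", "ANALYZE", "PLAN", "APPLY", "VERIFY"] as const;

async function runSession(call: CallTool, taskId: string): Promise<ToolResult> {
  await call("server_ping");
  await call("server_health");

  const session = await call("audit_session_start", { intent: "example session" });
  const session_id = session["session_id"];

  for (const status of WORK_STATES) {
    await call("task_update", { task_id: taskId, status });
    // ... the executor step for this state runs here ...
  }

  await call("thought_record", { thought_type: "decision", content: "..." });
  await call("task_update", { task_id: taskId, progress: 100, status: "DONE" });

  // The final reflection is recorded before the tree is finalized.
  await call("thought_record", { thought_type: "reflection", content: "..." });
  await call("audit_verify_chain", { session_id });
  await call("merkle_finalize", { session_id });
  return call("merkle_root", { session_id });
}
```

Injecting the transport keeps the ordering logic testable with a fake that simply records the tool names it receives.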

Load-bearing ordering rule: the final thought_record { reflection } MUST precede merkle_finalize. Otherwise the reflection is not anchored in the Merkle root. See CLAUDE.md §7 and writeback-protocol.md.

State transition rule: you move tasks via task_update, passing a status field. The state machine (src/domains/tasks/state-machine.ts) enforces legal transitions — illegal jumps (e.g. INIT → DONE) return ERR_INVALID_TRANSITION. The R74.5 plan had a separate task_transition tool; during P0.3.4 implementation the two were merged. The 5-step executor chain (audit → contract → packet → implement → verify, CLAUDE.md §6) maps 1:1 onto the β FSM states GATHER → ANALYZE → PLAN → APPLY → VERIFY.
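A transition check of this kind can be sketched as a lookup table. The table below is an assumption, not a copy of state-machine.ts: it allows only forward steps along the main path and treats CANCELLED as reachable from every non-terminal state; the real rules (for example, whether backward moves exist) may differ.

```typescript
type TaskState =
  | "INIT" | "GATHER" | "ANALYZE" | "PLAN"
  | "APPLY" | "VERIFY" | "DONE" | "CANCELLED";

// Assumed transition table: forward-only, plus CANCELLED from non-terminal states.
const NEXT: Record<TaskState, readonly TaskState[]> = {
  INIT: ["GATHER", "CANCELLED"],
  GATHER: ["ANALYZE", "CANCELLED"],
  ANALYZE: ["PLAN", "CANCELLED"],
  PLAN: ["APPLY", "CANCELLED"],
  APPLY: ["VERIFY", "CANCELLED"],
  VERIFY: ["DONE", "CANCELLED"],
  DONE: [],
  CANCELLED: [],
};

class InvalidTransitionError extends Error {
  readonly code = "ERR_INVALID_TRANSITION";
  constructor(from: TaskState, to: TaskState) {
    super(`illegal transition ${from} -> ${to}`);
  }
}

// Return the new state, or throw when the move is not in the table.
function transition(from: TaskState, to: TaskState): TaskState {
  if (!NEXT[from].includes(to)) {
    throw new InvalidTransitionError(from, to);
  }
  return to;
}
```

With this table, transition("INIT", "GATHER") succeeds while transition("INIT", "DONE") throws an error carrying the ERR_INVALID_TRANSITION code.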


What makes Colibri different

  1. Execution is formal. Tasks move through an 8-state FSM enforced by the β state machine (src/domains/tasks/state-machine.ts), not free-form strings on a to-do list. Illegal jumps (e.g. INIT → DONE) are rejected with ERR_INVALID_TRANSITION.
  2. Decisions are cryptographic. Every thought_record is SHA-256 chained to the previous row in the session; audit_verify_chain walks the chain and fails on any tampering. The final merkle_root is the commitment.
  3. Sub-agents are contract-bound. Phase 0 dispatches sub-agents via the host Task tool (Claude Code’s built-in Agent/Task dispatch) into isolated .worktrees/claude/<task-slug> worktrees — the MCP agent_spawn family is deferred to Phase 1.5 per ADR-005 §Decision. Writeback ownership depends on the dispatch case: a T3 executor dispatched by PM owns its own writeback (task_update { status: "DONE" } — which routes through the β state-machine at src/domains/tasks/state-machine.ts — plus thought_record { reflection }), while a leaf helper an executor spawns for bounded research/search does NOT call writeback — its parent writes back on its behalf per writeback-protocol.md line 16. And DONE is not convention: the β pipeline hard-blocks the transition at src/domains/tasks/writeback.ts:97 with ERR_WRITEBACK_REQUIRED when no thought_record exists for the task. No silent failures, no ghost work.
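The DONE gate described in point 3 amounts to a precondition on the transition. A minimal sketch, assuming a hypothetical countThoughts lookup in place of the real query in writeback.ts:

```typescript
class WritebackRequiredError extends Error {
  readonly code = "ERR_WRITEBACK_REQUIRED";
  constructor(taskId: string) {
    super(`task ${taskId} has no thought_record; writeback is required before DONE`);
  }
}

// Hypothetical lookup: how many thought_records exist for this task.
type CountThoughts = (taskId: string) => number;

// Block the move to DONE when the task has no recorded thought.
function assertWritebackBeforeDone(
  taskId: string,
  nextStatus: string,
  countThoughts: CountThoughts,
): void {
  if (nextStatus === "DONE" && countThoughts(taskId) === 0) {
    throw new WritebackRequiredError(taskId);
  }
}
```

Other status changes pass through this check untouched; only DONE requires evidence that the work was written back.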

Bottom line: agentic work gets memory, proof, and accountability — not just results.


Three paths into the docs

1. “I want to run the server” (engineers)

Read in order:

2. “I want to understand the architecture” (architects)

Read in order:

3. “I want to see examples” (operators/users)

Read:


Next steps

  1. Read colibri-system.md — the canonical vision (single source of truth).
  2. Read CLAUDE.md — the four-tier agent hierarchy, the worktree rule, the writeback protocol.
  3. Pick one of the three paths above and dig into how.
  4. To pick up work, open implementation/task-breakdown.md and choose a task from the current scope (Phase 0 itself is complete at 28/28). Create a feature worktree (git worktree add .worktrees/claude/<task-slug> -b feature/<task-slug> origin/main) and never edit the main checkout.

FAQs

Q: Can I run this today? A: Yes — Phase 0 is 100% on non-deferred tasks (28/28). The MCP server boots over stdio at a22dd23e and registers 14 tools. Configure a stdio client (e.g. Claude Desktop, or the .vscode/mcp-settings.example.json) to launch node dist/server.js.

Q: When will Phase 0 be done? A: It is done. As of R75 Wave I, Phase 0 is 28/28 — P0.6.3 (ε capability index) closed the ε axis in Wave H, and P0.5.1 + P0.5.2 shipped as δ library-only stubs in Wave I per ADR-005 §Decision. Next round opens the Phase 0 seal + Phase 1 planning scope. See implementation/task-breakdown.md.

Q: Do I need TypeScript? A: Yes. Stack is TypeScript 5.3+ (ESM, NodeNext), @modelcontextprotocol/sdk, better-sqlite3, Zod v3.23, merkletreejs, gray-matter, Jest (ESM). Chevrotain is spec-only for κ (Phase 1+).

Q: What’s the database? A: SQLite via better-sqlite3. Path: data/colibri.db — created at runtime in WAL mode, single-writer (P0.2.2 shipped). Schema is declared in migrations under src/db/migrations/001_init.sql through 006_eta.sql; src/db/schema.sql is a reference asset (not executed). The legacy data/ams.db is heritage — kept only as the donor task store and writeback target through the Phase 0 bootstrap.

Q: Environment variables? A: Only the COLIBRI_* namespace is read by Phase 0 code (the AMS_* prefix is rejected). COLIBRI_MODE selects one of the four runtime modes (FULL, READONLY, TEST, MINIMAL). ANTHROPIC_API_KEY (vendor-canonical name) is optional and validated at call-time by the ν Claude API wrappers, not at startup — the server boots cleanly when the key is unset.
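As a rough illustration of these rules, the sketch below parses an environment object. Several details are assumptions rather than documented behaviour: that "rejected" means startup fails when an AMS_* variable is present, that FULL is the default mode, and the loadConfig and RuntimeConfig names themselves.

```typescript
type ColibriMode = "FULL" | "READONLY" | "TEST" | "MINIMAL";
const MODES: readonly ColibriMode[] = ["FULL", "READONLY", "TEST", "MINIMAL"];

interface RuntimeConfig {
  mode: ColibriMode;
  hasAnthropicKey: boolean; // recorded only; the key is not validated at startup
}

function loadConfig(env: Record<string, string | undefined>): RuntimeConfig {
  // Assumption: any legacy AMS_* variable is treated as a hard error.
  const legacy = Object.keys(env).filter((key) => key.startsWith("AMS_"));
  if (legacy.length > 0) {
    throw new Error(`unsupported AMS_* variables: ${legacy.join(", ")}`);
  }

  // Assumption: FULL is used when COLIBRI_MODE is unset.
  const raw = env["COLIBRI_MODE"] ?? "FULL";
  const mode = MODES.find((m) => m === raw);
  if (mode === undefined) {
    throw new Error(`COLIBRI_MODE must be one of ${MODES.join(", ")}; got "${raw}"`);
  }

  return { mode, hasAnthropicKey: Boolean(env["ANTHROPIC_API_KEY"]) };
}
```

In a Node process this would be called as loadConfig(process.env); a missing ANTHROPIC_API_KEY yields hasAnthropicKey: false rather than an error, mirroring the boot-without-key behaviour described above.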

Q: Can I extend it? A: Yes, with caveats. Custom skills are prose playbooks you drop into .agents/skills/ — see ε Skill Registry. Custom MCP tools and domain extensions must wait for Phase 1 — Phase 0 locks the shipped 14-tool surface.


For a deeper tour, start with colibri-system.md, then walk the Greek concepts via world-schema.md.



Colibri — documentation-first MCP runtime. Apache 2.0 + Commons Clause.
