Self-Build Loop: How Colibri Builds Colibri
⚠ HORIZON DOCUMENT: the agent_spawn/skill_get loop is NOT in Phase 0
This document describes a Phase 1.5+ horizon where Colibri agents self-build via a persistent agent runtime, agent_spawn/agent_status/agent_list MCP tools, and a skill_get/skill_invoke hot-reload surface. Phase 0 does not ship any of those tools. The Phase 0 self-build loop is simpler and human-supervised:
- Sub-agent dispatch: PM (T2) spawns executors (T3) via the Task tool (Claude’s built-in sub-agent dispatch), not via an agent_spawn MCP call. Sub-agents run the 5-step chain (audit → contract → packet → implement → verify) inside a feature worktree and exit after writeback.
- Skill discovery: PM reads skills from the file system (.agents/skills/*/SKILL.md) and passes the skill name/contents into the Task-tool prompt. The only Phase 0 ε tool is skill_list (read-only). There is no skill_get, no skill_invoke, no hot-reload.
- No persistent agent pool, no agent_* tools. Every reference below to agent_spawn, agent_status, agent_list, or src/domains/agents/ describes a Phase 1.5 target that requires the δ Model Router. δ is deferred per ADR-005.
- Phase 0 self-build is manual-supervised: human runs the colibri-pm skill → PM reads task-breakdown.md → dispatches Task-tool sub-agents → sub-agents writeback (task_update + thought_record + audit_verify_chain → thought_record → merkle_finalize → merkle_root → audit_session_end) → human reviews PR via GitHub Desktop (because the git CLI is broken on E:\).
Read this file as the Phase 1.5+ dogfooding target, not Phase 0. Phase 0 dogfoods via Task-tool sub-agents supervised by humans.
What This Document Is
This specification describes dogfooding: how Colibri agents will build Colibri itself using Colibri’s own tools. It’s a three-phase evolution from manual bootstrap to full autonomous self-build.
Three Phases of Self-Build
Phase A: Manual Bootstrap (Days 1–7, PR-1 through PR-7)
Status: Human or external agent manually ships PR-1 through PR-7.
- No Colibri runtime yet. No MCP tools.
- Agents read docs/guides/implementation/pre-mcp-bootstrap-sequence.md instead of standard startup.
- Writeback is manual: agent creates docs/packets/{task-id}-packet.md with completion record.
- Task state is tracked in a human-editable docs/guides/implementation/PHASE-0-PROGRESS.md file.
- Human reviews and merges each PR.
- Cycle time: ~1–2 days per PR.
Agents used:
- colibri-bootstrap-agent (one agent per PR, manually spawned)
- colibri-reviewer-agent (PR review, manually spawned)
Loop: Manual → PR → Review → Merge → Next Task Manual Assignment
Phase B: Hybrid (Days 8–21, P0.3.2 through P0.9)
Status: Colibri MCP server is up. task_create, task_update, thought_record tools work. Agents can self-discover unblocked tasks. But humans still spawn agents manually.
- MCP server is running (P0.2.1–P0.2.3 complete).
- Agents use the task_next_actions tool to find unblocked tasks (no more manual assignment).
- Agents use the task_update and thought_record tools for writeback (no more manual packet files).
- Merkle proofs and audit trail work (P0.7, P0.8 tasks).
- Multiple agents run in parallel across different tasks.
- Humans still manually spawn agents via the agent_spawn tool or direct CLI.
- Humans review and merge PRs.
- Cycle time: ~4–8h per task (tasks run in parallel).
Agents used:
- colibri-executor-* (N parallel executors, each picks a task)
- colibri-reviewer (PR review, automatically spawned after PR is created)
- doc-loop-agent (updates PHASE-0-PROGRESS.md and identifies next tasks)
Loop: Pick (task_next_actions) → Read (task spec) → Implement → Verify → task_update + thought_record → PR → Auto-Review → Merge → Identify Next Unblocked Tasks
Phase C: Full Dogfooding (Day 22+, after PR-7)
Status: Colibri runtime fully operational. colibri-pm orchestrator agent reads task-breakdown.md, picks unblocked tasks, spawns sub-agents using agent_spawn, enforces writeback, finalizes Merkle proofs. Humans only review PRs.
- colibri-pm agent is the coordinator.
- Every 2 hours (configurable), colibri-pm:
  - Reads task-breakdown.md and task state from database
  - Identifies all unblocked todo tasks
  - Spawns sub-agents for high-priority unblocked tasks using agent_spawn tool
  - Sub-agents execute tasks, produce writeback (task_update + thought_record)
  - colibri-pm collects PRs and enforces writeback contract
  - Blocked PRs (missing writeback) are held in review; human or writeback-enforcement-agent flags the issue
  - Once writeback is complete, colibri-pm approves the PR
  - After merge, doc-loop-agent updates task state and identifies new unblocked tasks
  - Next cycle begins
- Humans are out of the execution loop. They only approve/veto completed work in PR review.
- Cycle time: ~2 hours to identify + spawn next batch of tasks.
Agents used:
- colibri-pm (orchestrator, runs on timer)
- colibri-executor-* (N sub-agents, spawned by PM)
- colibri-reviewer (PR review, autonomous)
- writeback-enforcement-agent (flags PRs missing writeback)
- doc-loop-agent (task state update, identifies next tasks)
Loop: (Automated) Pick (task_next_actions + spawn) → Execute → Writeback → PR → Auto-Review → Enforce Writeback → Merge → Identify Next → Spawn Next
The Self-Build Loop: Step-by-Step
This cycle repeats every ~2–4 hours during Phase C:
Step 1: Pick (task_next_actions)
Agent: colibri-pm
Tools: task_next_actions, task_list
colibri-pm calls task_next_actions()
→ Returns unblocked tasks in priority order
Example: [P0.3.2 (Task CRUD), P0.5.1 (Intent Scoring), P0.7.1 (Hash Chain)]
colibri-pm picks top N tasks (N = parallelism level, typically 2–4)
Docs referenced:
- docs/guides/implementation/task-dependency-graph.md: critical path decisions
- docs/guides/implementation/task-breakdown.md §Dependencies
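The pick step can be sketched roughly as follows. This is an illustration only: the Task dataclass, the convention that a lower priority number means more urgent, and the pick_batch helper are assumptions made for the example, not the actual return shape of task_next_actions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    task_id: str
    priority: int           # assumption: lower number = higher priority
    status: str             # "todo", "in_progress", or "done"
    depends_on: tuple = ()  # ids of tasks that must be done first


def unblocked(tasks):
    """Return todo tasks whose dependencies are all done, most urgent first."""
    done = {t.task_id for t in tasks if t.status == "done"}
    ready = [
        t for t in tasks
        if t.status == "todo" and all(dep in done for dep in t.depends_on)
    ]
    return sorted(ready, key=lambda t: (t.priority, t.task_id))


def pick_batch(tasks, parallelism=3):
    """Pick the top-N unblocked task ids, where N is the parallelism level."""
    return [t.task_id for t in unblocked(tasks)][:parallelism]


# Hypothetical task state, for illustration only.
tasks = [
    Task("P0.3.1", 1, "done"),
    Task("P0.3.2", 1, "todo", ("P0.3.1",)),
    Task("P0.3.3", 2, "todo", ("P0.3.2",)),
    Task("P0.5.1", 2, "todo"),
    Task("P0.7.1", 3, "todo"),
]
print(pick_batch(tasks, parallelism=2))
```

Sorting on (priority, task_id) keeps the selection deterministic when two tasks share a priority, which matters if more than one coordinator ever computes the same batch.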
Step 2: Read (task spec + extraction file)
Agent: Sub-agent (spawned by PM)
Tools: (file system / skill_get)
Sub-agent reads:
1. docs/guides/implementation/task-breakdown.md §{task-id}
2. docs/reference/extractions/{greek-letter}-*.md (algorithm pseudocode)
3. docs/guides/implementation/task-prompts/{task-id}-prompt.md (if exists)
Example for P0.3.2:
1. task-breakdown.md §P0.3.2
2. beta-task-pipeline-extraction.md (CRUD section)
3. task-prompts/P0.3.2-task-crud-prompt.md
Docs referenced:
- docs/guides/implementation/PHASE-0-EXECUTION-GUIDE.md §Task Execution Protocol (5-step chain)
- docs/reference/extractions/ (all 49 extraction files)
Step 3: Implement (execute 5-step chain)
Agent: Sub-agent
Steps: Audit → Contract → Packet → Implementation → Verification
Step 3a: Audit
- Read input files
- Create docs/audits/{task-id}-audit.md
- Commit: audit(P0.x.x): inventory baseline
Step 3b: Contract
- Define public API, invariants, error contracts
- Create docs/contracts/{task-id}-contract.md
- Commit: contract(P0.x.x): define behavioral contract
Step 3c: Packet
- List files, functions, test cases
- Create docs/packets/{task-id}-packet.md
- Commit: packet(P0.x.x): execution plan
Step 3d: Implementation
- Follow the packet exactly
- Create src/* files in worktree
- Commit: feat(P0.x.x): {description}
Step 3e: Verification
- npm test && npm run lint
- Check acceptance criteria
- Commit: verify(P0.x.x): all acceptance criteria pass
Worktree created by agent:
git fetch origin
git worktree add .worktrees/claude/{task-slug} -b feature/{task-slug} origin/main
cd .worktrees/claude/{task-slug}
# (execute 5 steps)
Docs referenced:
- docs/guides/implementation/PHASE-0-EXECUTION-GUIDE.md §Task Execution Protocol
- Task-specific extraction files
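The commit convention used by the 5-step chain lends itself to a small formatter and parser. The sketch below assumes task ids look like P0.3.2 and that the only prefixes are the ones shown in this document (audit, contract, packet, feat, verify, plus fix from the test gate); commit_message and parse_commit are hypothetical helper names.

```python
import re

# Commit prefix for each step of the 5-step chain, as listed in Step 3.
STEP_PREFIX = {
    "audit": "audit",
    "contract": "contract",
    "packet": "packet",
    "implementation": "feat",
    "verification": "verify",
}

# Assumption: task ids have the shape P<digits>(.<digits>)+, e.g. P0.3.2.
TASK_ID = re.compile(r"^P\d+(?:\.\d+)+$")
COMMIT = re.compile(
    r"^(audit|contract|packet|feat|fix|verify)\((P\d+(?:\.\d+)+)\): (.+)$"
)


def commit_message(step, task_id, description):
    """Format a commit subject such as 'audit(P0.3.2): inventory baseline'."""
    if step not in STEP_PREFIX:
        raise ValueError(f"unknown step: {step}")
    if not TASK_ID.match(task_id):
        raise ValueError(f"malformed task id: {task_id}")
    return f"{STEP_PREFIX[step]}({task_id}): {description}"


def parse_commit(subject):
    """Return (prefix, task_id, description), or None if the subject does not match."""
    match = COMMIT.match(subject)
    return match.groups() if match else None
```

A parser like this would let a reviewer or enforcement agent map commits in a PR back to chain steps without reading the diff.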
Step 4: Verify (test gate)
Agent: Sub-agent (or autonomous verifier)
Tools: (npm test, npm run lint, npm run build)
Before claiming completion:
npm test → all tests pass
npm run lint → zero errors
npm run build → compiles successfully
If any step fails:
- Fix the failure (don't skip)
- Re-run tests
- Commit fix: fix(P0.x.x): {description of fix}
- Return to Step 4 (test gate again)
If all pass:
- Continue to Step 5 (Writeback)
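A minimal sketch of the test gate, assuming the three npm scripts named above exist in package.json. The run_gate and run_command names are hypothetical; the runner is injectable so the gate logic can be exercised without invoking npm.

```python
import subprocess

# The three gate commands from Step 4, run in this order.
GATE_COMMANDS = [
    ["npm", "test"],
    ["npm", "run", "lint"],
    ["npm", "run", "build"],
]


def run_command(cmd):
    """Run one command and report whether it exited with status 0.

    Note: on Windows, npm is usually a .cmd shim, so the executable
    may need to be spelled "npm.cmd" here.
    """
    return subprocess.run(cmd).returncode == 0


def run_gate(runner=run_command):
    """Run the gate in order, stopping at the first failure.

    Returns (passed, failed_command); failed_command is None on success.
    """
    for cmd in GATE_COMMANDS:
        if not runner(cmd):
            return False, " ".join(cmd)
    return True, None
```

Stopping at the first failure mirrors the rule above: the agent fixes that failure, commits the fix, and re-enters the gate from the top rather than skipping ahead.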
Step 5: Writeback (task_update + thought_record)
Agent: Sub-agent
Tools: task_update, thought_record; for proof-grade work also audit_session_start, audit_verify_chain, merkle_finalize
Call 1: task_update
task_update(
task_id="P0.3.2",
status="done",
progress=100
)
Call 2: thought_record
thought_record(
task_id="P0.3.2",
branch="feature/P0.3.2-task-crud",
commit_sha="abc1234def5678...",
tests_run="12 passed, 0 failed",
summary="Task CRUD implementation complete. createTask, getTask, updateTask, deleteTask, listTasks all implemented with prepared statements. No string interpolation. Full roundtrip test passes.",
blockers="none"
)
If proof-grade (critical tasks like writeback enforcement):
Call 3: audit_session_start
audit_session_start(task_id="P0.3.2")
→ returns session_id
Call 4: audit_verify_chain
audit_verify_chain(session_id)
→ {valid: true, first_broken_at: null, broken_count: 0}
Call 5: merkle_finalize
merkle_finalize(session_id)
→ {root: "sha256:...", record_count: 1, timestamp}
Writeback contract: Every task MUST produce task_update + thought_record. Proof-grade work MUST also run audit_session_start → audit_verify_chain → merkle_finalize.
Docs referenced:
- docs/guides/implementation/PHASE-0-EXECUTION-GUIDE.md §Writeback
- CLAUDE.md §Writeback
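One way to check the writeback contract is to inspect the ordered list of tool calls a sub-agent made. The sketch below assumes such a call log is available as a plain list of tool names, which this document does not specify; writeback_complete is a hypothetical helper.

```python
BASE_CALLS = ("task_update", "thought_record")
PROOF_CALLS = ("audit_session_start", "audit_verify_chain", "merkle_finalize")


def is_subsequence(needed, calls):
    """True if every name in needed appears in calls, in the same relative order."""
    remaining = iter(calls)
    # "name in remaining" consumes the iterator up to the first match,
    # so later names must be found after earlier ones.
    return all(name in remaining for name in needed)


def writeback_complete(calls, proof_grade=False):
    """calls: tool names in the order the sub-agent invoked them (assumed log)."""
    if not all(name in calls for name in BASE_CALLS):
        return False
    if proof_grade:
        return is_subsequence(PROOF_CALLS, calls)
    return True
```

The base calls are checked for presence only, while the proof-grade calls are checked for order, since the contract states them as a sequence.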
Step 6: Create PR (push branch, create PR)
Agent: Sub-agent
Tools: (git push, GitHub CLI, or MCP GitHub integration)
git push -u origin feature/P0.3.2-task-crud
# Create PR via gh CLI or MCP GitHub tool
gh pr create --title "PR-N: P0.3.2 Task CRUD" --body "..."
→ PR #42 created
Step 7: Auto-Review (PR review)
Agent: colibri-reviewer (autonomous)
Tools: PR review (approval, request changes, or block)
colibri-reviewer:
1. Reads the PR diff
2. Reads acceptance criteria from task-breakdown.md
3. Checks:
- Did tests pass? (CI artifact)
- Did lint pass? (CI artifact)
- Did build pass? (CI artifact)
- Is writeback present? (thought_record in git log or task DB)
- Are acceptance criteria met? (manual check or diff scan)
4. If all pass: Approve PR
If any fail: Request changes (with specific failure details)
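The reviewer's decision can be sketched as a pure function over the five checks listed above. How each check result is obtained (CI artifacts, task DB lookups) is out of scope here; ReviewInput and review are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class ReviewInput:
    tests_passed: bool
    lint_passed: bool
    build_passed: bool
    writeback_present: bool
    criteria_met: bool


def review(pr):
    """Return ("approve", []) or ("request_changes", [names of failed checks])."""
    checks = {
        "tests": pr.tests_passed,
        "lint": pr.lint_passed,
        "build": pr.build_passed,
        "writeback": pr.writeback_present,
        "acceptance_criteria": pr.criteria_met,
    }
    failures = [name for name, ok in checks.items() if not ok]
    if failures:
        return "request_changes", failures
    return "approve", []
```

Returning the list of failed checks gives the reviewer the specific failure details it is expected to include when requesting changes.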
Step 8: Enforce Writeback
Agent: writeback-enforcement-agent (autonomous)
Tools: PR check, task status query
writeback-enforcement-agent:
1. Monitors PRs waiting for merge
2. For each PR, checks: is there a thought_record?
3. If NO: Block PR merge, add comment:
"WRITEBACK REQUIRED: task_update(status=done) + thought_record(...) not found.
PR cannot merge until writeback is recorded."
4. If YES: Remove block, clear comment
5. PM can now merge
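The enforcement pass can be sketched as a function from open PRs to block or unblock decisions. The input shapes are assumptions for illustration: a dict of PR number to task id, and a set of task ids that already have a thought_record.

```python
BLOCK_COMMENT = (
    "WRITEBACK REQUIRED: task_update(status=done) + thought_record(...) not found. "
    "PR cannot merge until writeback is recorded."
)


def enforce(open_prs, recorded_task_ids):
    """Decide, per PR, whether to block or unblock the merge.

    open_prs: dict mapping PR number -> task id (assumed shape).
    recorded_task_ids: set of task ids that have a thought_record.
    Returns a dict mapping PR number -> (action, comment).
    """
    decisions = {}
    for pr_number, task_id in open_prs.items():
        if task_id in recorded_task_ids:
            decisions[pr_number] = ("unblock", None)
        else:
            decisions[pr_number] = ("block", BLOCK_COMMENT)
    return decisions
```

Because the function is stateless, re-running it after a late thought_record arrives naturally flips the PR from blocked to unblocked.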
Step 9: Merge
Agent: colibri-pm (or human reviewer)
Tools: PR merge
colibri-pm (or human):
1. Checks: approved by colibri-reviewer
2. Checks: writeback is present
3. Merges PR to main
4. Watches merge completion
Step 10: Identify Next Unblocked Tasks
Agent: doc-loop-agent
Tools: task_next_actions, task dependency graph
doc-loop-agent (after merge):
1. Reads task-breakdown.md §Dependencies
2. Checks which tasks are now unblocked by the merged task
Example: P0.3.2 merged → P0.3.3 is now unblocked
3. Updates PHASE-0-PROGRESS.md or task DB
4. Logs: "P0.3.2 ✓ complete. P0.3.3, P0.3.4 now unblocked."
5. Next cycle: colibri-pm picks P0.3.3 and/or P0.3.4
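The "what did this merge unblock" computation can be sketched with set operations over a dependency map. The edges below are hypothetical and chosen only to reproduce the example above; the real graph lives in task-breakdown.md.

```python
# Hypothetical dependency edges: task id -> set of task ids it depends on.
DEPENDS_ON = {
    "P0.3.2": {"P0.3.1"},
    "P0.3.3": {"P0.3.2"},
    "P0.3.4": {"P0.3.2"},
    "P0.4.1": {"P0.3.3", "P0.3.4"},
}


def newly_unblocked(depends_on, done_before, merged):
    """Return tasks that became unblocked specifically because `merged` completed."""
    done_after = done_before | {merged}
    return sorted(
        task
        for task, deps in depends_on.items()
        if task not in done_after
        and deps <= done_after        # every dependency is done now
        and not deps <= done_before   # but at least one was missing before
    )


print(newly_unblocked(DEPENDS_ON, {"P0.3.1"}, "P0.3.2"))
```

Comparing the state before and after the merge reports only the delta, which is what the log line in step 4 needs.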
Step 11: Spawn Next Batch
Agent: colibri-pm
Tools: agent_spawn, task_create
colibri-pm (beginning of next cycle):
1. Calls task_next_actions() → [P0.3.3, P0.3.4, P0.4.1, ...]
2. Picks top N unblocked tasks
3. For each task, calls agent_spawn:
agent_spawn(
skill_name="colibri-executor",
prompt="{bootstrap prompt for task_id}"
)
4. Returns: agent_id, worktree_path, pid
5. Logs: "Spawned 4 executors for P0.3.3, P0.3.4, P0.4.1, P0.4.2"
6. Loop repeats every 2–4 hours
Skill Catalog: Skills Used in Self-Build
The 22 .agents/skills/colibri-* skills map to the loop steps:
| Skill | Purpose | Loop Step(s) | Phase |
|---|---|---|---|
| colibri-bootstrap-agent | P0.1.1–P0.1.2 bootstrap | 2–5 | A |
| colibri-executor | General task execution | 2–5 | B, C |
| colibri-reviewer | PR review + acceptance criteria check | 7 | B, C |
| colibri-pm | Orchestration, pick tasks, spawn agents | 1, 11 | B, C |
| doc-loop-agent | Task state update, identify next tasks | 10 | B, C |
| writeback-enforcement-agent | Block PRs missing writeback | 8 | B, C |
| colibri-auditor | Proof-grade work audit | 3e, 5 (proof) | C |
| alpha-executor | α (System Core) specialist | 2–5 | B, C |
| beta-executor | β (Task Pipeline) specialist | 2–5 | B, C |
| gamma-executor | γ (Server Lifecycle) specialist | 2–5 | B, C |
| delta-executor | δ (Model Router) specialist | 2–5 | B, C |
| epsilon-executor | ε (Skill Registry) specialist | 2–5 | B, C |
| zeta-executor | ζ (Decision Trail) specialist | 2–5 | B, C |
| eta-executor | η (Proof Store) specialist | 2–5 | B, C |
| nu-executor | ν (Integrations) specialist | 2–5 | B, C |
| (6 more specialists) | Phase 1–2 task execution | 2–5 | C |
Writeback Contract (Non-Negotiable)
Required for All Tasks
Every completed task MUST produce:
task_update(
task_id="P0.x.x",
status="done",
progress=100
)
thought_record(
task_id="P0.x.x",
branch="feature/P0.x.x-{slug}",
commit_sha="{full 40-char SHA}",
tests_run="{X passed, 0 failed}",
summary="{1–2 sentence summary of what was built}",
blockers="none" or "{list of blockers}"
)
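A thought_record payload can be sanity-checked before it is submitted. The sketch below assumes the six fields shown above are all required and that branches always start with feature/; validate_thought_record is a hypothetical helper, not part of the tool surface.

```python
import re

REQUIRED_FIELDS = ("task_id", "branch", "commit_sha", "tests_run", "summary", "blockers")
FULL_SHA = re.compile(r"^[0-9a-f]{40}$")  # full 40-character hex SHA


def validate_thought_record(record):
    """Return a list of problems; an empty list means the payload looks complete."""
    problems = [
        f"missing field: {field}"
        for field in REQUIRED_FIELDS
        if not record.get(field)
    ]
    sha = record.get("commit_sha", "")
    if sha and not FULL_SHA.match(sha):
        problems.append("commit_sha is not a full 40-character hex SHA")
    branch = record.get("branch", "")
    if branch and not branch.startswith("feature/"):  # assumed naming rule
        problems.append("branch does not start with feature/")
    return problems
```

Returning a list rather than raising on the first problem lets the agent fix every issue in one pass before retrying the writeback.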
Required for Proof-Grade Work (Critical Tasks)
For high-stakes tasks (writeback enforcement, Merkle finalization, consensus proofs):
audit_session_start(task_id="P0.x.x") → session_id
# (work happens)
audit_verify_chain(session_id) → {valid: true, ...}
thought_record(...) # as above
merkle_finalize(session_id) → {root: "sha256:...", ...}
merkle_root(session_id) → {root: "...", record_count: N, timestamp: T}
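For intuition about what finalization produces, here is a minimal Merkle root computation over a session's records. It is a sketch under stated assumptions: records are raw bytes, leaves are SHA-256 hashes, and an odd node is paired with a copy of itself. The actual tree construction used by merkle_finalize is not described in this document.

```python
import hashlib


def merkle_root(records):
    """Compute a Merkle root over a list of bytes records.

    Returns a dict loosely shaped like the merkle_finalize result above.
    """
    if not records:
        raise ValueError("cannot finalize an empty session")
    level = [hashlib.sha256(record).digest() for record in records]
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])  # assumption: duplicate the last node on odd levels
        level = [
            hashlib.sha256(level[i] + level[i + 1]).digest()
            for i in range(0, len(level), 2)
        ]
    return {"root": "sha256:" + level[0].hex(), "record_count": len(records)}
```

Changing any single record changes its leaf hash and therefore the root, which is what makes the root usable as a compact commitment to the whole session.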
Enforcement
- No task update without thought record: PR cannot merge without writeback
- No off-chain work: All work must be recorded in git commits + thought records
- No skipping tests: npm test && npm run lint must pass before writeback
Failure Modes and Recovery
Failure Mode 1: Agent Ships Task Without Writeback
Problem: Agent created PR but didn’t call task_update + thought_record.
Detection: writeback-enforcement-agent blocks the PR.
Recovery:
PR is blocked with message: "WRITEBACK REQUIRED"
Options:
A. Original agent calls thought_record retroactively
B. Human manually pushes a commit with thought_record call
C. New agent is spawned to add writeback
D. PR is closed and reverted (if critical failure)
Failure Mode 2: Agent Picks Blocked Task
Problem: Agent calls task_next_actions, but somehow picks a task that isn’t actually unblocked (e.g., a task blocked by another task).
Detection: Task creation fails: task_create or task_update returns { error: "task_blocked_by_P0.x.x" }.
Recovery:
Agent encounters error immediately.
Agent reads the error, identifies the blocking task.
Agent reports: "I picked P0.3.3 but it's blocked by P0.3.2. Please complete P0.3.2 first."
colibri-pm or human confirms: P0.3.2 is still in_progress.
Agent is reassigned a different unblocked task.
This should be rare because task_next_actions filters out blocked tasks.
Failure Mode 3: Two Agents Pick Same Task
Problem: Two agents simultaneously call task_next_actions, both get the same task, both create worktrees with the same branch name.
Detection: The first agent pushes its branch successfully. The second agent’s git push is rejected because the remote branch already exists with commits it does not have, e.g. ! [rejected] feature/P0.3.2-... (fetch first).
Recovery:
Second agent:
1. Pulls latest: git fetch origin
2. Reads the remote branch: the first agent already owns it
3. Agent self-reports: "Task P0.3.2 is being worked on by Agent A (PID 12345). Releasing this task."
4. Agent calls task_next_actions again → gets a different task
5. Agent proceeds with new task
Failure Mode 4: Tests Fail Mid-Task
Problem: Agent runs npm test && npm run lint and tests fail. Agent is stuck in Step 4 (Verify).
Detection: Test output shows failures.
Recovery:
Agent:
1. Reads test failure output
2. Identifies which test(s) failed
3. Fixes the implementation
4. Commits fix: fix(P0.x.x): {description of fix}
5. Runs npm test again
6. If pass: continue to Step 5 (Writeback)
7. If fail: repeat until pass (agent has time budget; if exceeded, mark task blocked)
Escalation: If the agent’s fixes fail three times in a row, the task is marked blocked and the agent reports to colibri-pm with detailed blockers.
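The fix-and-retry loop with escalation can be sketched as below. The run_tests and apply_fix callables are hypothetical hooks standing in for the agent's real test run and fix commit.

```python
MAX_FIX_ATTEMPTS = 3


def verify_with_retries(run_tests, apply_fix, max_attempts=MAX_FIX_ATTEMPTS):
    """Run tests, applying fixes between runs, and escalate after repeated failure.

    run_tests() -> (ok, output)
    apply_fix(output) attempts a fix (and would commit it as fix(P0.x.x): ...).
    Returns ("writeback", attempts) on success or ("blocked", attempts)
    once max_attempts fixes have all failed.
    """
    ok, output = run_tests()
    attempts = 0
    while not ok:
        if attempts == max_attempts:
            return "blocked", attempts
        apply_fix(output)
        attempts += 1
        ok, output = run_tests()
    return "writeback", attempts
```

The attempt counter only increases when a fix is actually applied, so a task that passes on the first run reports zero attempts.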
Why This Is Real, Not Vaporware
- MCP server exists after PR-7. The src/server.ts from P0.2.1 is a real, running process. Agents can call task_next_actions and other tools; they don’t need to read from git branches anymore.
- Writeback contract is enforced by code. In P0.3.3 (Writeback Contract Enforcement), a WritebackRequiredError is thrown at runtime if a task moves to done without a thought_record. This is not a soft recommendation; it’s a hard runtime check.
- All 22 skills exist in .agents/skills/. These are real, version-controlled skill files with SKILL.md frontmatter. The PM agent can call skill_get and skill_list to discover and spawn them.
- All 28 Phase 0 tasks are fully specified. Every task has acceptance criteria, input/output files, and effort estimates. There’s no guesswork: agents execute a machine-readable spec.
- Task dependency graph is explicit. Every task’s dependencies are documented in task-breakdown.md. task_next_actions can compute unblocked tasks deterministically from this graph.
- Audit trail is cryptographic. Each thought_record is SHA-256 hashed and chained (P0.7.1–P0.7.3). Merkle proofs can verify task lineage (P0.8.1–P0.8.3). This is a verifiable record of the work done, not a simulation.
- Colibri is bootstrapped after PR-7. By definition, once the state machine (P0.3.1) works, the task pipeline (P0.3.2–P0.3.4) works, and the MCP server (P0.2.1–P0.2.3) is live, the system can execute its own tasks. It’s self-referential but not circular: it’s bootstrapped.
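To make the hash-chain claim concrete, here is a minimal sketch of a SHA-256 chained record log with a verifier that returns the same result shape shown for audit_verify_chain. The record layout, the all-zero genesis hash, and the canonical JSON encoding are assumptions for illustration, not the P0.7 design.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumption: the first record chains from an all-zero hash


def record_hash(prev_hash, payload):
    """Hash the previous hash together with a canonical JSON encoding of the payload."""
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def append_record(chain, payload):
    """Append a payload to the chain, linking it to the previous record's hash."""
    prev = chain[-1]["hash"] if chain else GENESIS
    chain.append({"payload": payload, "prev_hash": prev, "hash": record_hash(prev, payload)})


def verify_chain(chain):
    """Recompute every link and report which records, if any, are broken."""
    prev = GENESIS
    broken = []
    for index, rec in enumerate(chain):
        linked = rec["prev_hash"] == prev
        intact = rec["hash"] == record_hash(rec["prev_hash"], rec["payload"])
        if not (linked and intact):
            broken.append(index)
        prev = rec["hash"]
    return {
        "valid": not broken,
        "first_broken_at": broken[0] if broken else None,
        "broken_count": len(broken),
    }
```

Editing a stored payload after the fact makes its recomputed hash disagree with the stored one, so tampering is detected at the exact record where it occurred.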
The Virtuous Cycle
Once Phase C begins:
- Colibri builds more of itself. Each completed task unblocks 1–3 downstream tasks.
- Coverage grows. Complexity grows. Agents learn. As more tasks ship, more capabilities (router, skills, proofs) become available. Agents get better at understanding the codebase.
- Human oversight remains. Humans approve/veto PRs. They can halt any task by refusing to merge.
- Self-build accelerates. By day 30, if the loop is working, 4–6 agents are running in parallel. Cycle time goes from 2–4 hours per task to 30–60 minutes per task.
- Phase 1 ships. Once Phase 0 is complete (~day 35), Phase 1 (κ Rule Engine) begins. Colibri is now a rule engine + orchestrator. Agents build κ using the same loop.
See Also
- [[./first-7-prs.md]] — First 7 PRs that bootstrap the loop
- [[./PHASE-0-EXECUTION-GUIDE.md]] — Phase 0 full roadmap
- [[./task-breakdown.md]] — 28 tasks + dependency graph
- [[../agent-bootstrap.md]] — Agent bootstrap prompt
- [[../../CLAUDE.md]] — Worktree rules, writeback protocol
- [[../../colibri-master-context.md]] — Full system context