Self-Build Loop: How Colibri Builds Colibri

⚠ HORIZON DOCUMENT — the agent_spawn / skill_get loop is NOT in Phase 0

This document describes a Phase 1.5+ horizon where Colibri agents self-build via a persistent agent runtime, agent_spawn / agent_status / agent_list MCP tools, and a skill_get / skill_invoke hot-reload surface. Phase 0 does not ship any of those tools. The Phase 0 self-build loop is simpler and human-supervised:

  • Sub-agent dispatch: PM (T2) spawns executors (T3) via the Task tool (Claude’s built-in sub-agent dispatch), not via an agent_spawn MCP call. Sub-agents run the 5-step chain (audit → contract → packet → implement → verify) inside a feature worktree and exit after writeback.
  • Skill discovery: PM reads skills from the file system (.agents/skills/*/SKILL.md) and passes the skill name/contents into the Task-tool prompt. The only Phase 0 ε tool is skill_list (read-only). There is no skill_get, no skill_invoke, no hot-reload.
  • No persistent agent pool, no agent_* tools. Every reference below to agent_spawn, agent_status, agent_list, or src/domains/agents/ describes a Phase 1.5 target that requires the δ Model Router. δ is deferred per ADR-005.
  • Phase 0 self-build is manually supervised: human runs the colibri-pm skill → PM reads task-breakdown.md → dispatches Task-tool sub-agents → sub-agents write back (task_update + thought_record + audit_verify_chain, thought_record, merkle_finalize, merkle_root, audit_session_end) → human reviews PR via GitHub Desktop (because the git CLI is broken on E:\).

Read this file as the Phase 1.5+ dogfooding target, not Phase 0. Phase 0 dogfoods via Task-tool sub-agents supervised by humans.

What This Document Is

This specification describes dogfooding: how Colibri agents will build Colibri itself using Colibri’s own tools. It’s a three-phase evolution from manual bootstrap to fully autonomous self-build.


Three Phases of Self-Build

Phase A: Manual Bootstrap (Days 1–7, PR-1 through PR-7)

Status: Human or external agent manually ships PR-1 through PR-7.

  • No Colibri runtime yet. No MCP tools.
  • Agents read docs/guides/implementation/pre-mcp-bootstrap-sequence.md instead of standard startup.
  • Writeback is manual: agent creates docs/packets/{task-id}-packet.md with completion record.
  • Task state is tracked in a human-editable docs/guides/implementation/PHASE-0-PROGRESS.md file.
  • Human reviews and merges each PR.
  • Cycle time: ~1–2 days per PR.

Agents used:

  • colibri-bootstrap-agent (one agent per PR, manually spawned)
  • colibri-reviewer-agent (PR review, manually spawned)

Loop: Manual → PR → Review → Merge → Next Task Manual Assignment


Phase B: Hybrid (Days 8–21, P0.3.2 through P0.9)

Status: Colibri MCP server is up. task_create, task_update, thought_record tools work. Agents can self-discover unblocked tasks. But humans still spawn agents manually.

  • MCP server is running (P0.2.1–P0.2.3 complete).
  • Agents use task_next_actions tool to find unblocked tasks (no more manual assignment).
  • Agents use task_update and thought_record tools for writeback (no more manual packet files).
  • Merkle proofs and audit trail work (P0.7, P0.8 tasks).
  • Multiple agents run in parallel across different tasks.
  • Humans still manually spawn agents via the agent_spawn tool or direct CLI.
  • Humans review and merge PRs.
  • Cycle time: ~4–8h per task (tasks run in parallel).

Agents used:

  • colibri-executor-* (N parallel executors, each picks a task)
  • colibri-reviewer (PR review, automatically spawned after PR is created)
  • doc-loop-agent (updates PHASE-0-PROGRESS.md and identifies next tasks)

Loop: Pick (task_next_actions) → Read (task spec) → Implement → Verify → task_update + thought_record → PR → Auto-Review → Merge → Identify Next Unblocked Tasks


Phase C: Full Dogfooding (Day 22+, after PR-7)

Status: Colibri runtime fully operational. colibri-pm orchestrator agent reads task-breakdown.md, picks unblocked tasks, spawns sub-agents using agent_spawn, enforces writeback, finalizes Merkle proofs. Humans only review PRs.

  • colibri-pm agent is the coordinator.
  • Every 2 hours (configurable), colibri-pm:
    1. Reads task-breakdown.md and task state from database
    2. Identifies all unblocked todo tasks
    3. Spawns sub-agents for high-priority unblocked tasks using agent_spawn tool
    4. Sub-agents execute tasks, produce writeback (task_update + thought_record)
    5. colibri-pm collects PRs and enforces writeback contract
    6. Blocked PRs (missing writeback) are held in review; human or writeback-enforcement-agent flags the issue
    7. Once writeback is complete, colibri-pm approves the PR
    8. After merge, doc-loop-agent updates task state and identifies new unblocked tasks
    9. Next cycle begins
  • Humans are out of the execution loop. They only approve/veto completed work in PR review.
  • Cycle time: ~2 hours to identify + spawn next batch of tasks.

Agents used:

  • colibri-pm (orchestrator, runs on timer)
  • colibri-executor-* (N sub-agents, spawned by PM)
  • colibri-reviewer (PR review, autonomous)
  • writeback-enforcement-agent (flags PRs missing writeback)
  • doc-loop-agent (task state update, identifies next tasks)

Loop: (Automated) Pick (task_next_actions + spawn) → Execute → Writeback → PR → Auto-Review → Enforce Writeback → Merge → Identify Next → Spawn Next


The Self-Build Loop: Step-by-Step

This cycle repeats every ~2–4 hours during Phase C:

Step 1: Pick (task_next_actions)

Agent: colibri-pm
Tools: task_next_actions, task_list

colibri-pm calls task_next_actions()
→ Returns unblocked tasks in priority order
Example: [P0.3.2 (Task CRUD), P0.5.1 (Intent Scoring), P0.7.1 (Hash Chain)]
colibri-pm picks top N tasks (N = parallelism level, typically 2–4)
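
The pick step can be sketched as a prefix slice over the list that task_next_actions returns. This is a minimal sketch, not the Colibri implementation: the UnblockedTask shape and the pickBatch helper are hypothetical names, and it assumes the tool result is already sorted by priority, as the text above states.

```typescript
// Hypothetical shape: the source only shows task IDs with a short label.
interface UnblockedTask {
  id: string;    // e.g. "P0.3.2"
  title: string; // e.g. "Task CRUD"
}

// task_next_actions is described as returning tasks in priority order,
// so choosing the batch is a prefix slice of length N (the parallelism level).
function pickBatch(ordered: readonly UnblockedTask[], parallelism: number): UnblockedTask[] {
  if (!Number.isInteger(parallelism) || parallelism < 1) {
    throw new RangeError("parallelism must be a positive integer");
  }
  return ordered.slice(0, parallelism);
}

// The example list from the text above.
const nextActions: UnblockedTask[] = [
  { id: "P0.3.2", title: "Task CRUD" },
  { id: "P0.5.1", title: "Intent Scoring" },
  { id: "P0.7.1", title: "Hash Chain" },
];

// With N = 2 the batch holds P0.3.2 and P0.5.1, in that order.
const batch = pickBatch(nextActions, 2);
```

If fewer than N tasks are unblocked, the slice simply returns all of them, so the PM never waits for a full batch.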

Docs referenced:

  • docs/guides/implementation/task-dependency-graph.md — critical path decisions
  • docs/guides/implementation/task-breakdown.md §Dependencies

Step 2: Read (task spec + extraction file)

Agent: Sub-agent (spawned by PM)
Tools: (file system / skill_get)

Sub-agent reads:
  1. docs/guides/implementation/task-breakdown.md §{task-id}
  2. docs/reference/extractions/{greek-letter}-*.md (algorithm pseudocode)
  3. docs/guides/implementation/task-prompts/{task-id}-prompt.md (if exists)

Example for P0.3.2:
  1. task-breakdown.md §P0.3.2
  2. beta-task-pipeline-extraction.md (CRUD section)
  3. task-prompts/P0.3.2-task-crud-prompt.md

Docs referenced:

  • docs/guides/implementation/PHASE-0-EXECUTION-GUIDE.md §Task Execution Protocol (5-step chain)
  • docs/reference/extractions/ (all 49 extraction files)

Step 3: Implement (execute 5-step chain)

Agent: Sub-agent
Steps: Audit → Contract → Packet → Implementation → Verification

Step 3a: Audit
  - Read input files
  - Create docs/audits/{task-id}-audit.md
  - Commit: audit(P0.x.x): inventory baseline

Step 3b: Contract
  - Define public API, invariants, error contracts
  - Create docs/contracts/{task-id}-contract.md
  - Commit: contract(P0.x.x): define behavioral contract

Step 3c: Packet
  - List files, functions, test cases
  - Create docs/packets/{task-id}-packet.md
  - Commit: packet(P0.x.x): execution plan

Step 3d: Implementation
  - Follow the packet exactly
  - Create src/* files in worktree
  - Commit: feat(P0.x.x): {description}

Step 3e: Verification
  - npm test && npm run lint
  - Check acceptance criteria
  - Commit: verify(P0.x.x): all acceptance criteria pass
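
The commit convention in Steps 3a to 3e is regular enough to check mechanically. The sketch below is illustrative only: commitSubject and followsChain are hypothetical helpers, and the rule that unrelated commits such as fix(...) may sit between the step commits is an assumption, not something the source specifies.

```typescript
// The five steps of the chain, in the order listed above.
type ChainStep = "audit" | "contract" | "packet" | "implementation" | "verification";

const CHAIN_ORDER: readonly ChainStep[] = [
  "audit", "contract", "packet", "implementation", "verification",
];

// Commit-type prefix for each step, copied from Steps 3a to 3e.
const COMMIT_PREFIX: Record<ChainStep, string> = {
  audit: "audit",
  contract: "contract",
  packet: "packet",
  implementation: "feat",
  verification: "verify",
};

// Hypothetical helper: formats one commit subject line.
function commitSubject(step: ChainStep, taskId: string, description: string): string {
  return `${COMMIT_PREFIX[step]}(${taskId}): ${description}`;
}

// Hypothetical helper: true when the subjects contain all five step commits
// in chain order. Other commits may appear in between (an assumption).
function followsChain(subjects: readonly string[], taskId: string): boolean {
  const expected = CHAIN_ORDER.map((step) => `${COMMIT_PREFIX[step]}(${taskId}):`);
  let next = 0;
  for (const subject of subjects) {
    const want = expected[next];
    if (want !== undefined && subject.startsWith(want)) next++;
  }
  return next === expected.length;
}
```

A reviewer could run followsChain over the subject lines of a feature branch to confirm that no step of the chain was skipped.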

Worktree created by agent:

git fetch origin
git worktree add .worktrees/claude/{task-slug} -b feature/{task-slug} origin/main
cd .worktrees/claude/{task-slug}
# (execute 5 steps)

Docs referenced:

  • docs/guides/implementation/PHASE-0-EXECUTION-GUIDE.md §Task Execution Protocol
  • Task-specific extraction files

Step 4: Verify (test gate)

Agent: Sub-agent (or autonomous verifier)
Tools: (npm test, npm run lint, npm run build)

Before claiming completion:
  npm test          → all tests pass
  npm run lint      → zero errors
  npm run build     → compiles successfully

If any step fails:
  - Fix the failure (don't skip)
  - Re-run tests
  - Commit fix: fix(P0.x.x): {description of fix}
  - Return to Step 4 (test gate again)

If all pass:
  - Continue to Step 5 (Writeback)

Step 5: Writeback (task_update + thought_record)

Agent: Sub-agent
Tools: task_update, thought_record, audit_session_start (if proof-grade), merkle_finalize (if proof-grade)

Call 1: task_update
  task_update(
    task_id="P0.3.2",
    status="done",
    progress=100
  )

Call 2: thought_record
  thought_record(
    task_id="P0.3.2",
    branch="feature/P0.3.2-task-crud",
    commit_sha="abc1234def5678...",
    tests_run="12 passed, 0 failed",
    summary="Task CRUD implementation complete. createTask, getTask, updateTask, deleteTask, listTasks all implemented with prepared statements. No string interpolation. Full roundtrip test passes.",
    blockers="none"
  )

If proof-grade (critical tasks like writeback enforcement):
  Call 3: audit_session_start
    audit_session_start(task_id="P0.3.2")
    → returns session_id

  Call 4: audit_verify_chain
    audit_verify_chain(session_id)
    → {valid: true, first_broken_at: null, broken_count: 0}

  Call 5: merkle_finalize
    merkle_finalize(session_id)
    → {root: "sha256:...", record_count: 1, timestamp}

Writeback contract: Every task MUST produce task_update + thought_record. Proof-grade work MUST also run audit_session_start → audit_verify_chain → merkle_finalize.
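
One way to read the contract above is as a check over the list of tool calls a sub-agent made. The following is a sketch under stated assumptions: writebackViolations and inOrder are hypothetical helpers, the call log is modeled as a plain array of tool names, and only the proof-grade calls are treated as order-sensitive.

```typescript
type ToolCall =
  | "task_update"
  | "thought_record"
  | "audit_session_start"
  | "audit_verify_chain"
  | "merkle_finalize";

// Required for every task.
const REQUIRED_ALWAYS: readonly ToolCall[] = ["task_update", "thought_record"];
// Required, in this order, for proof-grade work.
const PROOF_SEQUENCE: readonly ToolCall[] = [
  "audit_session_start", "audit_verify_chain", "merkle_finalize",
];

// True when `required` appears within `calls` as an ordered subsequence.
function inOrder(required: readonly ToolCall[], calls: readonly ToolCall[]): boolean {
  let matched = 0;
  for (const call of calls) {
    if (matched < required.length && call === required[matched]) matched++;
  }
  return matched === required.length;
}

// Returns a list of contract violations; an empty list means the contract holds.
function writebackViolations(calls: readonly ToolCall[], proofGrade: boolean): string[] {
  const violations = REQUIRED_ALWAYS
    .filter((tool) => !calls.includes(tool))
    .map((tool) => `missing ${tool}`);
  if (proofGrade && !inOrder(PROOF_SEQUENCE, calls)) {
    violations.push("proof-grade sequence missing or out of order");
  }
  return violations;
}
```

Returning a list of violations rather than a boolean lets an enforcement agent quote the exact gap in its PR comment.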

Docs referenced:

  • docs/guides/implementation/PHASE-0-EXECUTION-GUIDE.md §Writeback
  • CLAUDE.md §Writeback

Step 6: Create PR (push branch, create PR)

Agent: Sub-agent
Tools: (git push, GitHub CLI, or MCP GitHub integration)

git push -u origin feature/P0.3.2-task-crud

# Create PR via gh CLI or MCP GitHub tool
gh pr create --title "PR-N: P0.3.2 Task CRUD" --body "..."
→ PR #42 created

Step 7: Auto-Review (PR review)

Agent: colibri-reviewer (autonomous)
Tools: PR review (approval, request changes, or block)

colibri-reviewer:
  1. Reads the PR diff
  2. Reads acceptance criteria from task-breakdown.md
  3. Checks:
     - Did tests pass? (CI artifact)
     - Did lint pass? (CI artifact)
     - Did build pass? (CI artifact)
     - Is writeback present? (thought_record in git log or task DB)
     - Are acceptance criteria met? (manual check or diff scan)
  4. If all pass: Approve PR
     If any fail: Request changes (with specific failure details)
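
The reviewer's decision rule reduces to "approve only if every check passes, otherwise list what failed". A minimal sketch, assuming the five checks arrive as booleans; the ReviewChecks and ReviewDecision types and the decide function are hypothetical names, not part of any Colibri API.

```typescript
// One boolean per check listed in item 3 above.
interface ReviewChecks {
  testsPassed: boolean;
  lintPassed: boolean;
  buildPassed: boolean;
  writebackPresent: boolean;
  acceptanceMet: boolean;
}

type ReviewDecision =
  | { action: "approve" }
  | { action: "request_changes"; failures: string[] };

// Label used in the "request changes" message for each failed check.
const CHECK_LABELS: ReadonlyArray<[keyof ReviewChecks, string]> = [
  ["testsPassed", "tests"],
  ["lintPassed", "lint"],
  ["buildPassed", "build"],
  ["writebackPresent", "writeback"],
  ["acceptanceMet", "acceptance criteria"],
];

function decide(checks: ReviewChecks): ReviewDecision {
  const failures = CHECK_LABELS
    .filter(([key]) => !checks[key])
    .map(([, label]) => label);
  return failures.length === 0
    ? { action: "approve" }
    : { action: "request_changes", failures };
}
```

Carrying the failure labels in the decision is what allows the "specific failure details" mentioned above to be posted on the PR.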

Step 8: Enforce Writeback

Agent: writeback-enforcement-agent (autonomous)
Tools: PR check, task status query

writeback-enforcement-agent:
  1. Monitors PRs waiting for merge
  2. For each PR, checks: is there a thought_record?
  3. If NO: Block PR merge, add comment:
     "WRITEBACK REQUIRED: task_update(status=done) + thought_record(...) not found. 
      PR cannot merge until writeback is recorded."
  4. If YES: Remove block, clear comment
  5. PM can now merge

Step 9: Merge

Agent: colibri-pm (or human reviewer)
Tools: PR merge

colibri-pm (or human):
  1. Checks: approved by colibri-reviewer
  2. Checks: writeback is present
  3. Merges PR to main
  4. Watches merge completion

Step 10: Identify Next Unblocked Tasks

Agent: doc-loop-agent
Tools: task_next_actions, task dependency graph

doc-loop-agent (after merge):
  1. Reads task-breakdown.md §Dependencies
  2. Checks which tasks are now unblocked by the merged task
  Example: P0.3.2 merged → P0.3.3 is now unblocked
  3. Updates PHASE-0-PROGRESS.md or task DB
  4. Logs: "P0.3.2 ✓ complete. P0.3.3, P0.3.4 now unblocked."
  5. Next cycle: colibri-pm picks P0.3.3 and/or P0.3.4
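
Finding the tasks a merge has unblocked is a set difference over the dependency graph. The sketch below assumes the graph from task-breakdown.md has been loaded into a map from task ID to the IDs it depends on; that in-memory shape, the function names, and the edges in the usage note are illustrative assumptions.

```typescript
// Hypothetical in-memory form of the dependency graph:
// task ID -> IDs of the tasks it depends on.
type DependencyGraph = Record<string, readonly string[]>;

// A task is unblocked when it is not done and every dependency is done.
function unblockedTasks(graph: DependencyGraph, done: ReadonlySet<string>): string[] {
  return Object.keys(graph)
    .filter((id) => !done.has(id))
    .filter((id) => (graph[id] ?? []).every((dep) => done.has(dep)))
    .sort();
}

// Tasks that became unblocked specifically because `merged` completed.
function newlyUnblocked(
  graph: DependencyGraph,
  doneBefore: ReadonlySet<string>,
  merged: string,
): string[] {
  const before = new Set(unblockedTasks(graph, doneBefore));
  const doneAfter = new Set(doneBefore);
  doneAfter.add(merged);
  return unblockedTasks(graph, doneAfter).filter((id) => !before.has(id));
}
```

With an illustrative graph in which P0.3.3 and P0.3.4 both depend on P0.3.2, calling newlyUnblocked after P0.3.2 merges yields those two IDs, matching the log line in the example above.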

Step 11: Spawn Next Batch

Agent: colibri-pm
Tools: agent_spawn, task_create

colibri-pm (beginning of next cycle):
  1. Calls task_next_actions() → [P0.3.3, P0.3.4, P0.4.1, ...]
  2. Picks top N unblocked tasks
  3. For each task, calls agent_spawn:
     agent_spawn(
       skill_name="colibri-executor",
       prompt="{bootstrap prompt for task_id}"
     )
  4. Returns: agent_id, worktree_path, pid
  5. Logs: "Spawned 4 executors for P0.3.3, P0.3.4, P0.4.1, P0.4.2"
  6. Loop repeats every 2–4 hours
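
The spawn step can be sketched with the agent_spawn call injected as a function, which keeps the example runnable without an MCP server. Everything here is an assumption layered on the text above: the request and result field names are taken from the example, the call is modeled as synchronous for brevity, and spawnBatch and promptFor are hypothetical names.

```typescript
// Field names follow the agent_spawn example above.
interface SpawnRequest {
  skill_name: string;
  prompt: string;
}

interface SpawnResult {
  agent_id: string;
  worktree_path: string;
  pid: number;
}

// Spawns one executor per picked task and builds the log line.
// `agentSpawn` stands in for the real tool call; `promptFor` builds the
// bootstrap prompt for a task ID.
function spawnBatch(
  taskIds: readonly string[],
  promptFor: (taskId: string) => string,
  agentSpawn: (request: SpawnRequest) => SpawnResult,
): { spawned: SpawnResult[]; log: string } {
  const spawned = taskIds.map((taskId) =>
    agentSpawn({ skill_name: "colibri-executor", prompt: promptFor(taskId) }),
  );
  const log = `Spawned ${spawned.length} executors for ${taskIds.join(", ")}`;
  return { spawned, log };
}
```

Injecting the spawn function also makes the Phase 0 substitution explicit: the same loop could dispatch through the Task tool instead of agent_spawn.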

Skill Catalog: Skills Used in Self-Build

The 22 .agents/skills/colibri-* skills map to the loop steps:

| Skill | Purpose | Loop Step(s) | Phase |
|---|---|---|---|
| colibri-bootstrap-agent | P0.1.1–P0.1.2 bootstrap | 2–5 | A |
| colibri-executor | General task execution | 2–5 | B, C |
| colibri-reviewer | PR review + acceptance criteria check | 7 | B, C |
| colibri-pm | Orchestration, pick tasks, spawn agents | 1, 11 | B, C |
| doc-loop-agent | Task state update, identify next tasks | 10 | B, C |
| writeback-enforcement-agent | Block PRs missing writeback | 8 | B, C |
| colibri-auditor | Proof-grade work audit | 3e, 5 (proof) | C |
| alpha-executor | α (System Core) specialist | 2–5 | B, C |
| beta-executor | β (Task Pipeline) specialist | 2–5 | B, C |
| gamma-executor | γ (Server Lifecycle) specialist | 2–5 | B, C |
| delta-executor | δ (Model Router) specialist | 2–5 | B, C |
| epsilon-executor | ε (Skill Registry) specialist | 2–5 | B, C |
| zeta-executor | ζ (Decision Trail) specialist | 2–5 | B, C |
| eta-executor | η (Proof Store) specialist | 2–5 | B, C |
| nu-executor | ν (Integrations) specialist | 2–5 | B, C |
| (6 more specialists) | Phase 1–2 task execution | 2–5 | C |

Writeback Contract (Non-Negotiable)

Required for All Tasks

Every completed task MUST produce:

task_update(
  task_id="P0.x.x",
  status="done",
  progress=100
)

thought_record(
  task_id="P0.x.x",
  branch="feature/P0.x.x-{slug}",
  commit_sha="{full 40-char SHA}",
  tests_run="{X passed, 0 failed}",
  summary="{1–2 sentence summary of what was built}",
  blockers="none" or "{list of blockers}"
)
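
The field constraints in the thought_record template can be checked before the call is made. This is a sketch, not the enforced schema: the exact tests_run pattern, the lowercase-hex rule for commit_sha, and the validateThoughtRecord name are assumptions read off the placeholders above.

```typescript
// Fields as they appear in the thought_record template above.
interface ThoughtRecord {
  task_id: string;
  branch: string;
  commit_sha: string;
  tests_run: string;
  summary: string;
  blockers: string;
}

// Returns a list of problems; an empty list means the record looks complete.
function validateThoughtRecord(record: ThoughtRecord): string[] {
  const errors: string[] = [];
  // "{full 40-char SHA}": assumed here to be 40 lowercase hex characters.
  if (!/^[0-9a-f]{40}$/.test(record.commit_sha)) {
    errors.push("commit_sha must be a full 40-character hex SHA");
  }
  // "feature/P0.x.x-{slug}": the branch must embed the task ID.
  if (!record.branch.startsWith(`feature/${record.task_id}-`)) {
    errors.push("branch must look like feature/{task_id}-{slug}");
  }
  // "{X passed, 0 failed}": assumed exact wording, with zero failures.
  const tests = /^(\d+) passed, (\d+) failed$/.exec(record.tests_run);
  if (tests === null) {
    errors.push("tests_run must look like 'X passed, 0 failed'");
  } else if (tests[2] !== "0") {
    errors.push("tests_run reports failures");
  }
  if (record.summary.trim() === "") errors.push("summary is empty");
  if (record.blockers.trim() === "") errors.push("blockers must be 'none' or a list");
  return errors;
}
```

Running this check locally catches an abbreviated SHA or a failing test count before the record reaches the enforcement step.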

Required for Proof-Grade Work (Critical Tasks)

For high-stakes tasks (writeback enforcement, Merkle finalization, consensus proofs):

audit_session_start(task_id="P0.x.x") → session_id

# (work happens)

audit_verify_chain(session_id) → {valid: true, ...}

thought_record(...) # as above

merkle_finalize(session_id) → {root: "sha256:...", ...}

merkle_root(session_id) → {root: "...", record_count: N, timestamp: T}

Enforcement

  • No task update without thought record: PR cannot merge without writeback
  • No off-chain work: All work must be recorded in git commits + thought records
  • No skipping tests: npm test && npm run lint must pass before writeback

Failure Modes and Recovery

Failure Mode 1: Agent Ships Task Without Writeback

Problem: Agent created PR but didn’t call task_update + thought_record.

Detection: writeback-enforcement-agent blocks the PR.

Recovery:

PR is blocked with message: "WRITEBACK REQUIRED"
Options:
  A. Original agent calls thought_record retroactively
  B. Human manually pushes a commit with thought_record call
  C. New agent is spawned to add writeback
  D. PR is closed and reverted (if critical failure)

Failure Mode 2: Agent Picks Blocked Task

Problem: Agent calls task_next_actions, but somehow picks a task that isn’t actually unblocked (e.g., one of its dependencies is still in_progress).

Detection: The tool call fails: task_create or task_update returns { error: "task_blocked_by_P0.x.x" }.

Recovery:

Agent encounters error immediately.
Agent reads the error, identifies the blocking task.
Agent reports: "I picked P0.3.3 but it's blocked by P0.3.2. Please complete P0.3.2 first."
colibri-pm or human confirms: P0.3.2 is still in_progress.
Agent is reassigned a different unblocked task.

This should be rare because task_next_actions filters out blocked tasks.
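
The recovery path depends on the agent extracting the blocking task from the error. A minimal sketch, assuming the error string always has the task_blocked_by_ prefix followed by the task ID as in the example above; blockingTaskId and reportIfBlocked are hypothetical helpers.

```typescript
// Prefix taken from the example error above.
const BLOCKED_PREFIX = "task_blocked_by_";

// Extracts the blocking task ID, or null if the error has another form.
function blockingTaskId(error: string): string | null {
  if (!error.startsWith(BLOCKED_PREFIX)) return null;
  const id = error.slice(BLOCKED_PREFIX.length);
  return id.length > 0 ? id : null;
}

// Shape of a tool result as shown above: an optional error string.
interface ToolResult {
  error?: string;
}

// Builds the report the agent sends to colibri-pm, or null when the
// result is not a blocked-task error (other errors are out of scope here).
function reportIfBlocked(pickedTaskId: string, result: ToolResult): string | null {
  if (result.error === undefined) return null;
  const blocker = blockingTaskId(result.error);
  if (blocker === null) return null;
  return `I picked ${pickedTaskId} but it's blocked by ${blocker}. Please complete ${blocker} first.`;
}
```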


Failure Mode 3: Two Agents Pick Same Task

Problem: Two agents simultaneously call task_next_actions, both get the same task, both create worktrees with the same branch name.

Detection: First agent pushes branch successfully. Second agent’s git push is rejected because the remote branch already holds commits the second agent does not have, for example: ! [rejected] feature/P0.3.2-... (fetch first).

Recovery:

Second agent:
  1. Pulls latest: git fetch origin
  2. Reads the remote branch: the first agent already owns it
  3. Agent self-reports: "Task P0.3.2 is being worked on by Agent A (PID 12345). Releasing this task."
  4. Agent calls task_next_actions again → gets a different task
  5. Agent proceeds with new task

Failure Mode 4: Tests Fail Mid-Task

Problem: Agent runs npm test && npm run lint and tests fail. Agent is stuck in Step 4 (Verify).

Detection: Test output shows failures.

Recovery:

Agent:
  1. Reads test failure output
  2. Identifies which test(s) failed
  3. Fixes the implementation
  4. Commits fix: fix(P0.x.x): {description of fix}
  5. Runs npm test again
  6. If pass: continue to Step 5 (Writeback)
  7. If fail: repeat until pass (agent has time budget; if exceeded, mark task blocked)

Escalation: If the agent’s fixes fail three times, the task is marked blocked and the agent reports to colibri-pm with detailed blockers.
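
The fix-and-retry loop with its three-strike escalation can be sketched as follows. The gate and the fix are injected as functions so the example runs without npm; verifyWithRetries, runGate, applyFix, and the GateOutcome shape are hypothetical names, and the time budget mentioned above is left out of this sketch.

```typescript
type GateOutcome =
  | { status: "passed"; gateRuns: number }
  | { status: "blocked"; gateRuns: number; lastFailure: string };

// runGate returns null when tests, lint, and build all pass, otherwise a
// failure description. applyFix is where the agent would edit the code and
// commit fix(P0.x.x): {description of fix}.
function verifyWithRetries(
  runGate: () => string | null,
  applyFix: (failure: string) => void,
  maxFixes = 3,
): GateOutcome {
  let failure = runGate();
  let fixes = 0;
  let gateRuns = 1;
  while (failure !== null && fixes < maxFixes) {
    applyFix(failure);
    fixes++;
    failure = runGate();
    gateRuns++;
  }
  return failure === null
    ? { status: "passed", gateRuns }
    : { status: "blocked", gateRuns, lastFailure: failure };
}
```

A blocked outcome carries the last failure text, which is what the agent would forward to colibri-pm as the detailed blocker.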


Why This Is Real, Not Vaporware

  1. MCP server exists after PR-7. The src/server.ts from P0.2.1 is a real, running process. Agents can call task_next_actions and other tools; they don’t need to read from git branches anymore.

  2. Writeback contract is enforced by code. In P0.3.3 (Writeback Contract Enforcement), a WritebackRequiredError is thrown at runtime if a task moves to done without a thought_record. This is not a soft recommendation; it’s a guarantee enforced at runtime.

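  3. All 22 skills exist in .agents/skills/. These are real, version-controlled skill files with SKILL.md frontmatter. In Phase 0 the PM discovers them with skill_list (read-only) and reads SKILL.md from the file system; skill_get belongs to the Phase 1.5+ surface described at the top of this document.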

  4. All 28 Phase 0 tasks are fully specified. Every task has acceptance criteria, input/output files, and effort estimates. There’s no guesswork—agents execute a machine-readable spec.

  5. Task dependency graph is explicit. Every task’s dependencies are documented in task-breakdown.md. task_next_actions can compute unblocked tasks deterministically from this graph.

  6. Audit trail is cryptographic. Each thought_record is SHA-256 hashed and chained (P0.7.1–P0.7.3). Merkle proofs can verify task lineage (P0.8.1–P0.8.3). This is a real, verifiable audit record, not a simulation.

  7. Colibri is bootstrapped after PR-7. By definition, once the state machine (P0.3.1) works, the task pipeline (P0.3.2–P0.3.4) works, and the MCP server (P0.2.1–P0.2.3) is live, the system can execute its own tasks. It’s self-referential but not circular—it’s bootstrapped.
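
The hash-chain claim in item 6 can be illustrated with a small verifier that reports the same fields as the audit_verify_chain example (valid, first_broken_at, broken_count). This is a sketch under loud assumptions: the record layout, appendRecord, and verifyChain are hypothetical, and the hash function is injected. The toyHash below is a non-cryptographic FNV-1a-style stand-in used only to keep the example dependency-free; the document specifies SHA-256 for the real chain.

```typescript
// Hypothetical record layout: each record stores the previous record's hash.
interface ChainedRecord {
  payload: string;
  prev_hash: string;
  hash: string;
}

type HashFn = (input: string) => string;

interface ChainReport {
  valid: boolean;
  first_broken_at: number | null;
  broken_count: number;
}

const GENESIS = ""; // prev_hash of the first record (an assumption)

// NOT cryptographic: FNV-1a-style stand-in for SHA-256, demo use only.
function toyHash(input: string): string {
  let h = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    h ^= input.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h.toString(16).padStart(8, "0");
}

function appendRecord(chain: readonly ChainedRecord[], payload: string, hash: HashFn): ChainedRecord[] {
  const last = chain[chain.length - 1];
  const prev = last === undefined ? GENESIS : last.hash;
  return [...chain, { payload, prev_hash: prev, hash: hash(`${prev}\n${payload}`) }];
}

// A record is broken if its link to the previous record or its own hash
// does not match what is recomputed from the stored fields.
function verifyChain(chain: readonly ChainedRecord[], hash: HashFn): ChainReport {
  const brokenAt: number[] = [];
  let expectedPrev = GENESIS;
  chain.forEach((record, index) => {
    const linkOk = record.prev_hash === expectedPrev;
    const hashOk = record.hash === hash(`${record.prev_hash}\n${record.payload}`);
    if (!linkOk || !hashOk) brokenAt.push(index);
    expectedPrev = record.hash;
  });
  return {
    valid: brokenAt.length === 0,
    first_broken_at: brokenAt[0] ?? null,
    broken_count: brokenAt.length,
  };
}
```

Editing the payload of any stored record makes its recomputed hash differ from the stored one, so the verifier flags that index; swapping in a SHA-256 function changes only the HashFn argument.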


The Virtuous Cycle

Once Phase C begins:

  • Colibri builds more of itself. Each completed task unblocks 1–3 downstream tasks.
  • Coverage grows. Complexity grows. Agents learn. As more tasks ship, more capabilities (router, skills, proofs) become available. Agents get better at understanding the codebase.
  • Human oversight remains. Humans approve/veto PRs. They can halt any task by refusing to merge.
  • Self-build accelerates. By day 30, if the loop is working, 4–6 agents are running in parallel. Cycle time goes from 2–4 hours per task to 30–60 minutes per task.
  • Phase 1 ships. Once Phase 0 is complete (~day 35), Phase 1 (κ Rule Engine) begins. Colibri is now a rule engine + orchestrator. Agents build κ using the same loop.

See Also

  • [[./first-7-prs.md]] — First 7 PRs that bootstrap the loop
  • [[./PHASE-0-EXECUTION-GUIDE.md]] — Phase 0 full roadmap
  • [[./task-breakdown.md]] — 28 tasks + dependency graph
  • [[../agent-bootstrap.md]] — Agent bootstrap prompt
  • [[../../CLAUDE.md]] — Worktree rules, writeback protocol
  • [[../../colibri-master-context.md]] — Full system context


Colibri — documentation-first MCP runtime. Apache 2.0 + Commons Clause.
