Self-Build Loop: How Colibri Builds Colibri

⚠ HORIZON DOCUMENT — the agent_spawn / skill_get loop is NOT in Phase 0

This document describes a Phase 1.5+ horizon where Colibri agents self-build via a persistent agent runtime, agent_spawn / agent_status / agent_list MCP tools, and a skill_get / skill_invoke hot-reload surface. Phase 0 does not ship any of those tools. The Phase 0 self-build loop is simpler and human-supervised:

Sub-agent dispatch: PM (T2) spawns executors (T3) via the Task tool (Claude’s built-in sub-agent dispatch), not via an agent_spawn MCP call. Sub-agents run the 5-step chain (audit → contract → packet → implement → verify) inside a feature worktree and exit after writeback.

Skill discovery: PM reads skills from the file system (.agents/skills/*/SKILL.md) and passes the skill name/contents into the Task-tool prompt. The only Phase 0 ε tool is skill_list (read-only). There is no skill_get, no skill_invoke, no hot-reload.

No persistent agent pool, no agent_* tools. Every reference below to agent_spawn, agent_status, agent_list, or src/domains/agents/ describes a Phase 1.5 target that requires the δ Model Router. δ is deferred per ADR-005.

Phase 0 self-build is human-supervised: a human runs the colibri-pm skill → PM reads task-breakdown.md → dispatches Task-tool sub-agents → sub-agents write back (task_update + thought_record + audit_verify_chain → thought_record → merkle_finalize → merkle_root → audit_session_end) → a human reviews the PR via GitHub Desktop (because the git CLI is broken on E:\).

Read this file as the Phase 1.5+ dogfooding target, not Phase 0. Phase 0 dogfoods via Task-tool sub-agents supervised by humans.

What This Document Is

This specification describes dogfooding: how Colibri agents will build Colibri itself using Colibri’s own tools. It is a three-phase evolution from manual bootstrap to fully autonomous self-build.

Three Phases of Self-Build

Phase A: Manual Bootstrap (Days 1–7, PR-1 through PR-7)

Status: A human or an external agent manually ships PR-1 through PR-7.

No Colibri runtime yet. No MCP tools.
Agents read docs/guides/implementation/pre-mcp-bootstrap-sequence.md instead of standard startup.
Writeback is manual: agent creates docs/packets/{task-id}-packet.md with completion record.
Task state is tracked in a human-editable docs/guides/implementation/PHASE-0-PROGRESS.md file.
Human reviews and merges each PR.
Cycle time: ~1–2 days per PR.

Agents used:

colibri-bootstrap-agent (one agent per PR, manually spawned)
colibri-reviewer-agent (PR review, manually spawned)

Loop: Manual → PR → Review → Merge → Next Task Manual Assignment

Phase B: Hybrid (Days 8–21, P0.3.2 through P0.9)

Status: Colibri MCP server is up. task_create, task_update, thought_record tools work. Agents can self-discover unblocked tasks. But humans still spawn agents manually.

MCP server is running (P0.2.1–P0.2.3 complete).
Agents use task_next_actions tool to find unblocked tasks (no more manual assignment).
Agents use task_update and thought_record tools for writeback (no more manual packet files).
Merkle proofs and audit trail work (P0.7, P0.8 tasks).
Multiple agents run in parallel across different tasks.
Humans still spawn agents manually, via the agent_spawn tool or directly from the CLI.
Humans review and merge PRs.
Cycle time: ~4–8h per task (tasks run in parallel).

Agents used:

colibri-executor-* (N parallel executors, each picks a task)
colibri-reviewer (PR review, automatically spawned after PR is created)
doc-loop-agent (updates PHASE-0-PROGRESS.md and identifies next tasks)

Loop: Pick (task_next_actions) → Read (task spec) → Implement → Verify → task_update + thought_record → PR → Auto-Review → Merge → Identify Next Unblocked Tasks

Phase C: Full Dogfooding (Day 22+, after PR-7)

Status: Colibri runtime fully operational. colibri-pm orchestrator agent reads task-breakdown.md, picks unblocked tasks, spawns sub-agents using agent_spawn, enforces writeback, finalizes Merkle proofs. Humans only review PRs.

colibri-pm agent is the coordinator.
Every 2 hours (configurable), colibri-pm:
1. Reads task-breakdown.md and task state from database
2. Identifies all unblocked todo tasks
3. Spawns sub-agents for high-priority unblocked tasks using agent_spawn tool
4. Sub-agents execute tasks, produce writeback (task_update + thought_record)
5. colibri-pm collects PRs and enforces writeback contract
6. Blocked PRs (missing writeback) are held in review; a human or the writeback-enforcement-agent flags the issue
7. Once writeback is complete, colibri-pm approves the PR
8. After merge, doc-loop-agent updates task state and identifies new unblocked tasks
9. Next cycle begins
Humans are out of the execution loop. They only approve/veto completed work in PR review.
Cycle time: ~2 hours to identify + spawn next batch of tasks.

Agents used:

colibri-pm (orchestrator, runs on timer)
colibri-executor-* (N sub-agents, spawned by PM)
colibri-reviewer (PR review, autonomous)
writeback-enforcement-agent (flags PRs missing writeback)
doc-loop-agent (task state update, identifies next tasks)

Loop: (Automated) Pick (task_next_actions + spawn) → Execute → Writeback → PR → Auto-Review → Enforce Writeback → Merge → Identify Next → Spawn Next

The Self-Build Loop: Step-by-Step

This cycle repeats every ~2–4 hours during Phase C:

Step 1: Pick (task_next_actions)

Agent: colibri-pm
Tools: task_next_actions, task_list

colibri-pm calls task_next_actions()
→ Returns unblocked tasks in priority order
Example: [P0.3.2 (Task CRUD), P0.5.1 (Intent Scoring), P0.7.1 (Hash Chain)]
colibri-pm picks top N tasks (N = parallelism level, typically 2–4)
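A minimal sketch of the pick step, assuming task_next_actions returns entries carrying a task ID, a numeric priority (lower means more urgent), and a status. The `NextAction` shape and the `pickTopN` helper are hypothetical; the real tool's response format is not specified in this document.

```typescript
// Hypothetical shape of one entry returned by task_next_actions.
interface NextAction {
  taskId: string; // e.g. "P0.3.2"
  priority: number; // assumption: lower number = higher priority
  status: "todo" | "in_progress" | "blocked" | "done";
}

// Pick the top N todo tasks, where N is the parallelism level (typically 2-4).
// Ties on priority are broken by task ID so repeated cycles choose deterministically.
function pickTopN(actions: NextAction[], n: number): string[] {
  return actions
    .filter((a) => a.status === "todo")
    .sort((a, b) => a.priority - b.priority || a.taskId.localeCompare(b.taskId))
    .slice(0, n)
    .map((a) => a.taskId);
}

const candidates: NextAction[] = [
  { taskId: "P0.7.1", priority: 2, status: "todo" },
  { taskId: "P0.3.2", priority: 1, status: "todo" },
  { taskId: "P0.5.1", priority: 2, status: "todo" },
  { taskId: "P0.3.3", priority: 1, status: "blocked" },
];
console.log(pickTopN(candidates, 2)); // → ["P0.3.2", "P0.5.1"]
```

Filtering on status again on the client side is redundant if the server already excludes blocked tasks, but it is cheap and guards against stale responses.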

Docs referenced:

docs/guides/implementation/task-dependency-graph.md — critical path decisions
docs/guides/implementation/task-breakdown.md §Dependencies

Step 2: Read (task spec + extraction file)

Agent: Sub-agent (spawned by PM)
Tools: (file system / skill_get)

Sub-agent reads:
docs/guides/implementation/task-breakdown.md §{task-id}
docs/reference/extractions/{greek-letter}-*.md (algorithm pseudocode)
docs/guides/implementation/task-prompts/{task-id}-prompt.md (if exists)

Example for P0.3.2:
task-breakdown.md §P0.3.2
beta-task-pipeline-extraction.md (CRUD section)
task-prompts/P0.3.2-task-crud-prompt.md

Docs referenced:

docs/guides/implementation/PHASE-0-EXECUTION-GUIDE.md §Task Execution Protocol (5-step chain)
docs/reference/extractions/ (all 49 extraction files)

Step 3: Implement (execute 5-step chain)

Agent: Sub-agent
Steps: Audit → Contract → Packet → Implementation → Verification

Step 3a: Audit
  - Read input files
  - Create docs/audits/{task-id}-audit.md
  - Commit: audit(P0.x.x): inventory baseline

Step 3b: Contract
  - Define public API, invariants, error contracts
  - Create docs/contracts/{task-id}-contract.md
  - Commit: contract(P0.x.x): define behavioral contract

Step 3c: Packet
  - List files, functions, test cases
  - Create docs/packets/{task-id}-packet.md
  - Commit: packet(P0.x.x): execution plan

Step 3d: Implementation
  - Follow the packet exactly
  - Create src/* files in worktree
  - Commit: feat(P0.x.x): {description}

Step 3e: Verification
  - npm test && npm run lint
  - Check acceptance criteria
  - Commit: verify(P0.x.x): all acceptance criteria pass
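The five sub-steps follow a fixed artifact-and-commit convention. One way to encode it as data, so an executor can print its plan or a reviewer can check commit order, is sketched below. The `ChainStep` type, `chainPlan`, and `followsChain` are hypothetical; the paths and commit prefixes are taken from Steps 3a–3e. Steps 3d and 3e have no fixed artifact path, so they use `artifact: null`.

```typescript
type StepName = "audit" | "contract" | "packet" | "implementation" | "verification";

interface ChainStep {
  step: StepName;
  artifact: string | null; // file the step must create, if it has a fixed path
  commit: string; // commit subject for the step
}

// Ordered plan for one task, mirroring Steps 3a-3e.
function chainPlan(taskId: string, description: string): ChainStep[] {
  return [
    { step: "audit", artifact: `docs/audits/${taskId}-audit.md`, commit: `audit(${taskId}): inventory baseline` },
    { step: "contract", artifact: `docs/contracts/${taskId}-contract.md`, commit: `contract(${taskId}): define behavioral contract` },
    { step: "packet", artifact: `docs/packets/${taskId}-packet.md`, commit: `packet(${taskId}): execution plan` },
    { step: "implementation", artifact: null, commit: `feat(${taskId}): ${description}` }, // src/* files vary per task
    { step: "verification", artifact: null, commit: `verify(${taskId}): all acceptance criteria pass` },
  ];
}

// True if the five chain commits appear in order; extra commits such as fix(...) are allowed in between.
function followsChain(taskId: string, subjects: string[]): boolean {
  const prefixes = ["audit", "contract", "packet", "feat", "verify"].map((p) => `${p}(${taskId}):`);
  let next = 0;
  for (const s of subjects) {
    if (next < prefixes.length && s.startsWith(prefixes[next])) next++;
  }
  return next === prefixes.length;
}
```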

Worktree created by agent:

git fetch origin
git worktree add .worktrees/claude/{task-slug} -b feature/{task-slug} origin/main
cd .worktrees/claude/{task-slug}
# (execute 5 steps)
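Both the branch name and the worktree path derive from one task slug, which is why two agents that pick the same task collide on the same branch (Failure Mode 3 below). A sketch of deriving the slug and the git argument lists, assuming the slug is the task ID plus a lowercased, hyphenated title (inferred from the `feature/P0.3.2-task-crud` example in Step 5, not a documented rule); `taskSlug` and `worktreeCommands` are hypothetical.

```typescript
// Hypothetical slug rule: task ID + title lowercased, non-alphanumerics collapsed to "-".
function taskSlug(taskId: string, title: string): string {
  const words = title
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-")
    .replace(/^-+|-+$/g, "");
  return `${taskId}-${words}`;
}

// Argument lists for the steps above. `cd` is a shell builtin, so only the target directory is returned.
function worktreeCommands(slug: string): { fetch: string[]; add: string[]; cwd: string } {
  const path = `.worktrees/claude/${slug}`;
  return {
    fetch: ["git", "fetch", "origin"],
    add: ["git", "worktree", "add", path, "-b", `feature/${slug}`, "origin/main"],
    cwd: path,
  };
}
```

Each argument array could be handed to a process runner such as Node's `child_process.spawnSync(cmd[0], cmd.slice(1))`.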

Docs referenced:

docs/guides/implementation/PHASE-0-EXECUTION-GUIDE.md §Task Execution Protocol
Task-specific extraction files

Step 4: Verify (test gate)

Agent: Sub-agent (or autonomous verifier)
Tools: (npm test, npm run lint, npm run build)

Before claiming completion:
  npm test          → all tests pass
  npm run lint      → zero errors
  npm run build     → compiles successfully

If any step fails:
  - Fix the failure (don't skip)
  - Re-run tests
  - Commit fix: fix(P0.x.x): {description of fix}
  - Return to Step 4 (test gate again)

If all pass:
  - Continue to Step 5 (Writeback)
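The gate is a fixed command sequence wrapped in a fix-and-retry loop. A minimal sketch, assuming the caller injects a `run` function that executes one command and reports whether it exited 0, plus a `fix` callback standing in for the agent's repair-and-commit work; `verifyGate`, `Runner`, and `GateResult` are hypothetical. The limit of three fix attempts comes from the escalation rule under Failure Mode 4.

```typescript
const GATE: string[][] = [
  ["npm", "test"],
  ["npm", "run", "lint"],
  ["npm", "run", "build"],
];

type Runner = (cmd: string[]) => boolean; // true if the command exited 0

type GateResult =
  | { outcome: "pass"; attempts: number }
  | { outcome: "blocked"; attempts: number; failing: string };

// Run every gate command in order. On the first failure, call fix() and rerun the
// whole gate from the top. After maxFixes failed fixes, report the task as blocked.
function verifyGate(run: Runner, fix: (failing: string) => void, maxFixes = 3): GateResult {
  let fixes = 0;
  for (;;) {
    const failed = GATE.find((cmd) => !run(cmd));
    if (!failed) return { outcome: "pass", attempts: fixes + 1 };
    const failing = failed.join(" ");
    if (fixes === maxFixes) return { outcome: "blocked", attempts: fixes + 1, failing };
    fix(failing); // e.g. edit code, then commit "fix(P0.x.x): ..."
    fixes++;
  }
}
```

Rerunning the full gate after every fix, rather than only the failed command, matches the "return to Step 4" instruction: a lint fix can break a test.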

Step 5: Writeback (task_update + thought_record)

Agent: Sub-agent
Tools: task_update, thought_record, and, if proof-grade, audit_session_start, audit_verify_chain, and merkle_finalize

Call 1: task_update
  task_update(
    task_id="P0.3.2",
    status="done",
    progress=100
  )

Call 2: thought_record
  thought_record(
    task_id="P0.3.2",
    branch="feature/P0.3.2-task-crud",
    commit_sha="abc1234def5678...",
    tests_run="12 passed, 0 failed",
    summary="Task CRUD implementation complete. createTask, getTask, updateTask, deleteTask, listTasks all implemented with prepared statements. No string interpolation. Full roundtrip test passes.",
    blockers="none"
  )

If proof-grade (critical tasks like writeback enforcement):
  Call 3: audit_session_start
    audit_session_start(task_id="P0.3.2")
    → returns session_id

  Call 4: audit_verify_chain
    audit_verify_chain(session_id)
    → {valid: true, first_broken_at: null, broken_count: 0}

  Call 5: merkle_finalize
    merkle_finalize(session_id)
    → {root: "sha256:...", record_count: 1, timestamp}

Writeback contract: Every task MUST produce task_update + thought_record. Proof-grade work MUST also run audit_session_start → audit_verify_chain → merkle_finalize.
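audit_verify_chain is shown returning `{valid, first_broken_at, broken_count}`. A minimal sketch of how such a check could work over a SHA-256 hash chain, where each record stores the hash of its predecessor (in the spirit of P0.7.1–P0.7.3). The `ChainedRecord` layout, the `"genesis"` sentinel, and hashing `prev + payload` are assumptions, not the documented record format.

```typescript
import { createHash } from "node:crypto";

// Hypothetical record layout: payload plus the hash linking it to its predecessor.
interface ChainedRecord {
  payload: string; // e.g. a serialized thought_record
  prev: string; // hash of the previous record, or GENESIS for the first
  hash: string; // sha256(prev + payload)
}

const GENESIS = "genesis";
const sha256 = (s: string): string => createHash("sha256").update(s).digest("hex");

function append(chain: ChainedRecord[], payload: string): ChainedRecord[] {
  const prev = chain.length > 0 ? chain[chain.length - 1].hash : GENESIS;
  return [...chain, { payload, prev, hash: sha256(prev + payload) }];
}

// Recompute every link; report the first broken index and how many records fail.
function verifyChain(chain: ChainedRecord[]): { valid: boolean; first_broken_at: number | null; broken_count: number } {
  let firstBrokenAt: number | null = null;
  let brokenCount = 0;
  for (let i = 0; i < chain.length; i++) {
    const rec = chain[i];
    const expectedPrev = i === 0 ? GENESIS : chain[i - 1].hash;
    const intact = rec.prev === expectedPrev && rec.hash === sha256(rec.prev + rec.payload);
    if (!intact) {
      brokenCount++;
      if (firstBrokenAt === null) firstBrokenAt = i;
    }
  }
  return { valid: brokenCount === 0, first_broken_at: firstBrokenAt, broken_count: brokenCount };
}
```

Because each hash covers the previous hash, editing a record's payload breaks that record, and recomputing its hash to hide the edit breaks the link from its successor instead.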

Docs referenced:

docs/guides/implementation/PHASE-0-EXECUTION-GUIDE.md §Writeback
CLAUDE.md §Writeback

Step 6: Create PR (push branch, create PR)

Agent: Sub-agent
Tools: (git push, GitHub CLI, or MCP GitHub integration)

git push -u origin feature/P0.3.2-task-crud

# Create PR via gh CLI or MCP GitHub tool
gh pr create --title "PR-N: P0.3.2 Task CRUD" --body "..."
→ PR #42 created

Step 7: Auto-Review (PR review)

Agent: colibri-reviewer (autonomous)
Tools: PR review (approval, request changes, or block)

colibri-reviewer:
  1. Reads the PR diff
  2. Reads acceptance criteria from task-breakdown.md
  3. Checks:
     - Did tests pass? (CI artifact)
     - Did lint pass? (CI artifact)
     - Did build pass? (CI artifact)
     - Is writeback present? (thought_record in git log or task DB)
     - Are acceptance criteria met? (manual check or diff scan)
  4. If all pass: Approve PR
     If any fail: Request changes (with specific failure details)
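The reviewer's decision reduces to an all-checks-must-pass rule with a reason for each failure. A sketch, assuming the five checks have already been evaluated into booleans from CI artifacts, the task DB, and the diff; `ReviewChecks` and `reviewDecision` are hypothetical names.

```typescript
interface ReviewChecks {
  testsPassed: boolean;
  lintPassed: boolean;
  buildPassed: boolean;
  writebackPresent: boolean;
  acceptanceMet: boolean;
}

type Decision = { action: "approve" } | { action: "request_changes"; failures: string[] };

// One failure message per check, used in the "request changes" comment.
const FAILURE_TEXT: Record<keyof ReviewChecks, string> = {
  testsPassed: "tests failed (CI artifact)",
  lintPassed: "lint failed (CI artifact)",
  buildPassed: "build failed (CI artifact)",
  writebackPresent: "no thought_record found in git log or task DB",
  acceptanceMet: "acceptance criteria not met",
};

function reviewDecision(checks: ReviewChecks): Decision {
  const failures = (Object.keys(FAILURE_TEXT) as (keyof ReviewChecks)[])
    .filter((k) => !checks[k])
    .map((k) => FAILURE_TEXT[k]);
  return failures.length === 0 ? { action: "approve" } : { action: "request_changes", failures };
}
```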

Step 8: Enforce Writeback

Agent: writeback-enforcement-agent (autonomous)
Tools: PR check, task status query

writeback-enforcement-agent:
  1. Monitors PRs waiting for merge
  2. For each PR, checks: is there a thought_record?
  3. If NO: Block PR merge, add comment:
     "WRITEBACK REQUIRED: task_update(status=done) + thought_record(...) not found. 
      PR cannot merge until writeback is recorded."
  4. If YES: Remove block, clear comment
  5. PM can now merge
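The enforcement pass is a per-PR membership test against recorded writebacks. A sketch, assuming the agent can list open PRs with their task IDs and can query which task IDs already have a thought_record; `OpenPr`, `enforceWriteback`, and the action objects are hypothetical. The comment text is the message quoted in item 3.

```typescript
interface OpenPr {
  number: number;
  taskId: string;
  blocked: boolean; // whether the WRITEBACK REQUIRED block is currently applied
}

type EnforcementAction = { pr: number; kind: "block"; comment: string } | { pr: number; kind: "unblock" };

const WRITEBACK_COMMENT =
  "WRITEBACK REQUIRED: task_update(status=done) + thought_record(...) not found. " +
  "PR cannot merge until writeback is recorded.";

// Emit only state changes: block PRs lacking writeback, unblock PRs whose writeback has landed.
function enforceWriteback(prs: OpenPr[], recorded: Set<string>): EnforcementAction[] {
  const actions: EnforcementAction[] = [];
  for (const pr of prs) {
    const hasWriteback = recorded.has(pr.taskId);
    if (!hasWriteback && !pr.blocked) actions.push({ pr: pr.number, kind: "block", comment: WRITEBACK_COMMENT });
    if (hasWriteback && pr.blocked) actions.push({ pr: pr.number, kind: "unblock" });
  }
  return actions;
}
```

Emitting only transitions keeps the pass idempotent, so it can run on every poll without re-commenting on PRs that are already blocked.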

Step 9: Merge

Agent: colibri-pm (or human reviewer)
Tools: PR merge

colibri-pm (or human):
Checks: approved by colibri-reviewer
Checks: writeback is present
Merges PR to main
Watches merge completion

Step 10: Identify Next Unblocked Tasks

Agent: doc-loop-agent
Tools: task_next_actions, task dependency graph

doc-loop-agent (after merge):
Reads task-breakdown.md §Dependencies
Checks which tasks are now unblocked by the merged task
  Example: P0.3.2 merged → P0.3.3 is now unblocked
Updates PHASE-0-PROGRESS.md or task DB
Logs: "P0.3.2 ✓ complete. P0.3.3, P0.3.4 now unblocked."
Next cycle: colibri-pm picks P0.3.3 and/or P0.3.4
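Because dependencies are explicit, "which tasks did this merge unblock?" is a deterministic graph query. A sketch, assuming the §Dependencies list has already been parsed into a map from task ID to prerequisite IDs (parsing not shown); the example edges are illustrative, not copied from the real graph, and `unblocked` / `newlyUnblocked` are hypothetical helpers.

```typescript
type DepGraph = Map<string, string[]>; // task -> prerequisite tasks

// A task is unblocked when it is not done and every prerequisite is done.
function unblocked(graph: DepGraph, done: Set<string>): string[] {
  return [...graph.keys()].filter(
    (t) => !done.has(t) && (graph.get(t) ?? []).every((d) => done.has(d)),
  );
}

// Tasks that became unblocked specifically because `merged` moved to done.
function newlyUnblocked(graph: DepGraph, doneBefore: Set<string>, merged: string): string[] {
  const before = new Set(unblocked(graph, doneBefore));
  const after = unblocked(graph, new Set([...doneBefore, merged]));
  return after.filter((t) => !before.has(t));
}

const graph: DepGraph = new Map([
  ["P0.3.1", []],
  ["P0.3.2", ["P0.3.1"]],
  ["P0.3.3", ["P0.3.2"]],
  ["P0.3.4", ["P0.3.2"]],
]);
console.log(newlyUnblocked(graph, new Set(["P0.3.1"]), "P0.3.2")); // → ["P0.3.3", "P0.3.4"]
```

This sketch treats "not done" as eligible; a real query would also exclude tasks already in progress.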

Step 11: Spawn Next Batch

Agent: colibri-pm
Tools: agent_spawn, task_create

colibri-pm (beginning of next cycle):
  1. Calls task_next_actions() → [P0.3.3, P0.3.4, P0.4.1, ...]
  2. Picks top N unblocked tasks
  3. For each task, calls agent_spawn:
     agent_spawn(
       skill_name="colibri-executor",
       prompt="{bootstrap prompt for task_id}"
     )
  4. Returns: agent_id, worktree_path, pid
  5. Logs: "Spawned 4 executors for P0.3.3, P0.3.4, P0.4.1, P0.4.2"
  6. Loop repeats every 2–4 hours
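A single PM cycle is query, pick, spawn, log. It is sketched here with the MCP calls injected as functions, since agent_spawn is a Phase 1.5 target whose real signature may differ. `SpawnResult` mirrors the fields listed in item 4; `PmTools`, `pmCycle`, and the prompt text are hypothetical.

```typescript
interface SpawnResult {
  agent_id: string;
  worktree_path: string;
  pid: number;
}

// Injected stand-ins for the MCP tools (hypothetical signatures).
interface PmTools {
  taskNextActions(): string[]; // unblocked task IDs, highest priority first
  agentSpawn(args: { skill_name: string; prompt: string }): SpawnResult;
}

function pmCycle(tools: PmTools, parallelism: number): SpawnResult[] {
  const batch = tools.taskNextActions().slice(0, parallelism);
  const spawned = batch.map((taskId) =>
    tools.agentSpawn({ skill_name: "colibri-executor", prompt: `Bootstrap prompt for ${taskId}` }),
  );
  console.log(`Spawned ${spawned.length} executors for ${batch.join(", ")}`);
  return spawned;
}

// Scheduling is left to the host, e.g. setInterval(() => pmCycle(tools, 4), 2 * 60 * 60 * 1000).
```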

Skill Catalog: Skills Used in Self-Build

The 22 skills in .agents/skills/ map to the loop steps:

Skill	Purpose	Loop Step(s)	Phase
colibri-bootstrap-agent	P0.1.1–P0.1.2 bootstrap	2–5	A
colibri-executor	General task execution	2–5	B, C
colibri-reviewer	PR review + acceptance criteria check	7	B, C
colibri-pm	Orchestration, pick tasks, spawn agents	1, 11	B, C
doc-loop-agent	Task state update, identify next tasks	10	B, C
writeback-enforcement-agent	Block PRs missing writeback	8	B, C
colibri-auditor	Proof-grade work audit	3e, 5 (proof)	C
alpha-executor	α (System Core) specialist	2–5	B, C
beta-executor	β (Task Pipeline) specialist	2–5	B, C
gamma-executor	γ (Server Lifecycle) specialist	2–5	B, C
delta-executor	δ (Model Router) specialist	2–5	B, C
epsilon-executor	ε (Skill Registry) specialist	2–5	B, C
zeta-executor	ζ (Decision Trail) specialist	2–5	B, C
eta-executor	η (Proof Store) specialist	2–5	B, C
nu-executor	ν (Integrations) specialist	2–5	B, C
(6 more specialists)	Phase 1–2 task execution	2–5	C

Writeback Contract (Non-Negotiable)

Required for All Tasks

Every completed task MUST produce:

task_update(
  task_id="P0.x.x",
  status="done",
  progress=100
)

thought_record(
  task_id="P0.x.x",
  branch="feature/P0.x.x-{slug}",
  commit_sha="{full 40-char SHA}",
  tests_run="{X passed, 0 failed}",
  summary="{1–2 sentence summary of what was built}",
  blockers="none" or "{list of blockers}"
)
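The field rules above are mechanical enough to check before calling the tool. A sketch of a client-side validator, assuming the `{X passed, 0 failed}` wording is literal and that blockers is free text; the `ThoughtRecord` interface and `validateThoughtRecord` are hypothetical, not the tool's actual schema.

```typescript
interface ThoughtRecord {
  task_id: string;
  branch: string;
  commit_sha: string;
  tests_run: string;
  summary: string;
  blockers: string;
}

// Returns a list of problems; an empty list means the record meets the contract as written above.
function validateThoughtRecord(r: ThoughtRecord): string[] {
  const problems: string[] = [];
  if (!/^P\d+(\.\d+)+$/.test(r.task_id)) problems.push("task_id should look like P0.x.x");
  if (!r.branch.startsWith(`feature/${r.task_id}-`)) problems.push("branch should be feature/{task_id}-{slug}");
  if (!/^[0-9a-f]{40}$/.test(r.commit_sha)) problems.push("commit_sha must be the full 40-char SHA");
  const m = /^(\d+) passed, (\d+) failed$/.exec(r.tests_run);
  if (!m || Number(m[2]) !== 0) problems.push("tests_run must report 0 failed");
  if (r.summary.trim().length === 0) problems.push("summary is empty");
  if (r.blockers.trim().length === 0) problems.push('blockers must be "none" or a list');
  return problems;
}
```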

Required for Proof-Grade Work (Critical Tasks)

For high-stakes tasks (writeback enforcement, Merkle finalization, consensus proofs):

audit_session_start(task_id="P0.x.x") → session_id

# (work happens)

audit_verify_chain(session_id) → {valid: true, ...}

thought_record(...) # as above

merkle_finalize(session_id) → {root: "sha256:...", ...}

merkle_root(session_id) → {root: "...", record_count: N, timestamp: T}
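merkle_finalize and merkle_root report a root over the session's records. This document does not specify the tree construction, so the sketch below uses a generic binary Merkle tree over SHA-256, duplicating the last node on odd-sized levels (one common convention among several). Treat it as an illustration of what `root` and `record_count` mean, not as Colibri's algorithm; `merkleRoot` is hypothetical.

```typescript
import { createHash } from "node:crypto";

const h = (data: string): string => createHash("sha256").update(data).digest("hex");

// Generic Merkle root: hash each record, then hash pairs level by level until one node remains.
function merkleRoot(records: string[]): { root: string; record_count: number } {
  if (records.length === 0) throw new Error("cannot finalize an empty session");
  let level = records.map((r) => h("leaf:" + r)); // prefixes separate leaf hashes from inner-node hashes
  while (level.length > 1) {
    const next: string[] = [];
    for (let i = 0; i < level.length; i += 2) {
      const right = i + 1 < level.length ? level[i + 1] : level[i]; // duplicate the odd last node
      next.push(h("node:" + level[i] + right));
    }
    level = next;
  }
  return { root: "sha256:" + level[0], record_count: records.length };
}
```

Changing any single record changes its leaf hash and therefore every hash on its path to the root, which is what lets a stored root vouch for the whole session.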

Enforcement

No task update without thought record: PR cannot merge without writeback
No off-chain work: All work must be recorded in git commits + thought records
No skipping tests: npm test && npm run lint must pass before writeback

Failure Modes and Recovery

Failure Mode 1: Agent Ships Task Without Writeback

Problem: Agent created PR but didn’t call task_update + thought_record.

Detection: writeback-enforcement-agent blocks the PR.

Recovery:

PR is blocked with message: "WRITEBACK REQUIRED"
Options:
  A. Original agent calls thought_record retroactively
  B. Human manually pushes a commit with thought_record call
  C. New agent is spawned to add writeback
  D. PR is closed and reverted (if critical failure)

Failure Mode 2: Agent Picks Blocked Task

Problem: Agent calls task_next_actions but somehow picks a task that isn’t actually unblocked (e.g., one whose prerequisite is still in progress).

Detection: The server rejects the call: task_create or task_update returns { error: "task_blocked_by_P0.x.x" }.

Recovery:

Agent encounters error immediately.
Agent reads the error, identifies the blocking task.
Agent reports: "I picked P0.3.3 but it's blocked by P0.3.2. Please complete P0.3.2 first."
colibri-pm or human confirms: P0.3.2 is still in_progress.
Agent is reassigned a different unblocked task.

This should be rare because task_next_actions filters out blocked tasks.
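Detection works because the server refuses the transition rather than trusting the caller. A sketch of that guard, assuming the server knows each task's prerequisites and statuses; the error string follows the example above, while `startTask`, the `Task` shape, and the `unknown_task_` error are hypothetical.

```typescript
type TaskStatus = "todo" | "in_progress" | "done" | "blocked";

interface Task {
  id: string;
  status: TaskStatus;
  deps: string[];
}

type StartResult = { ok: true } | { error: string };

// Refuse to start a task while any prerequisite is not done; name the first blocker.
// An unknown prerequisite also counts as blocking.
function startTask(tasks: Map<string, Task>, id: string): StartResult {
  const task = tasks.get(id);
  if (!task) return { error: `unknown_task_${id}` }; // hypothetical error code
  const blocker = task.deps.find((d) => tasks.get(d)?.status !== "done");
  if (blocker !== undefined) return { error: `task_blocked_by_${blocker}` };
  task.status = "in_progress";
  return { ok: true };
}
```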

Failure Mode 3: Two Agents Pick Same Task

Problem: Two agents simultaneously call task_next_actions, both get the same task, both create worktrees with the same branch name.

Detection: First agent pushes the branch successfully. Second agent’s git push for feature/P0.3.2-... is rejected because the remote branch already exists with commits the second agent does not have.

Recovery:

Second agent:
Fetches latest: git fetch origin
Reads the remote branch: the first agent already owns it
Agent self-reports: "Task P0.3.2 is being worked on by Agent A (PID 12345). Releasing this task."
Agent calls task_next_actions again → gets a different task
Agent proceeds with new task

Failure Mode 4: Tests Fail Mid-Task

Problem: Agent runs npm test && npm run lint and tests fail. Agent is stuck in Step 4 (Verify).

Detection: Test output shows failures.

Recovery:

Agent:
Reads test failure output
Identifies which test(s) failed
Fixes the implementation
Commits fix: fix(P0.x.x): {description of fix}
Runs npm test again
If pass: continue to Step 5 (Writeback)
If fail: repeat until pass (agent has time budget; if exceeded, mark task blocked)

Escalation: If three fix attempts fail, the task is marked blocked and the agent reports to colibri-pm with detailed blockers.

Why This Is Real, Not Vaporware

MCP server exists after PR-7. The src/server.ts from P0.2.1 is a real, running process. Agents can call task_next_actions and other tools; they don’t need to read from git branches anymore.
Writeback contract is enforced by code. In P0.3.3 (Writeback Contract Enforcement), a WritebackRequiredError is thrown at runtime if a task moves to done without a thought_record. This is not a soft recommendation; it’s a hard runtime guarantee.
All 22 skills exist in .agents/skills/. These are real, version-controlled skill files with SKILL.md frontmatter. The PM agent can call skill_get and skill_list to discover and spawn them.
All 28 Phase 0 tasks are fully specified. Every task has acceptance criteria, input/output files, and effort estimates. There’s no guesswork—agents execute a machine-readable spec.
Task dependency graph is explicit. Every task’s dependencies are documented in task-breakdown.md. task_next_actions can compute unblocked tasks deterministically from this graph.
Audit trail is cryptographic. Each thought_record is SHA-256 hashed and chained (P0.7.1–P0.7.3). Merkle proofs can verify task lineage (P0.8.1–P0.8.3). This is real cryptographic evidence of the work, not a simulation.
Colibri is bootstrapped after PR-7. By definition, once the state machine (P0.3.1) works, the task pipeline (P0.3.2–P0.3.4) works, and the MCP server (P0.2.1–P0.2.3) is live, the system can execute its own tasks. It’s self-referential but not circular—it’s bootstrapped.
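To make the P0.3.3 guarantee concrete: the check sits on the status transition itself, so no caller can reach done without a thought_record. A minimal sketch, assuming an in-memory store; only the `WritebackRequiredError` name comes from the text above, while `TaskStore` and its methods are hypothetical.

```typescript
class WritebackRequiredError extends Error {
  constructor(taskId: string) {
    super(`Task ${taskId} cannot move to done without a thought_record`);
    this.name = "WritebackRequiredError";
  }
}

class TaskStore {
  private status = new Map<string, string>();
  private thoughts = new Map<string, string[]>();

  thoughtRecord(taskId: string, summary: string): void {
    const list = this.thoughts.get(taskId) ?? [];
    list.push(summary);
    this.thoughts.set(taskId, list);
  }

  // Every status change goes through this method, so callers cannot bypass the guard.
  taskUpdate(taskId: string, status: string): void {
    if (status === "done" && (this.thoughts.get(taskId) ?? []).length === 0) {
      throw new WritebackRequiredError(taskId);
    }
    this.status.set(taskId, status);
  }

  getStatus(taskId: string): string | undefined {
    return this.status.get(taskId);
  }
}
```

One consequence of guarding the transition in this sketch is ordering: thought_record must be recorded before task_update(status="done"), or the update is rejected.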

The Virtuous Cycle

Once Phase C begins:

Colibri builds more of itself. Each completed task unblocks 1–3 downstream tasks.
Coverage and complexity grow. As more tasks ship, more capabilities (router, skills, proofs) become available, and agents get better at understanding the codebase.
Human oversight remains. Humans approve/veto PRs. They can halt any task by refusing to merge.
Self-build accelerates. By day 30, if the loop is working, 4–6 agents are running in parallel. Cycle time goes from 2–4 hours per task to 30–60 minutes per task.
Phase 1 ships. Once Phase 0 is complete (~day 35), Phase 1 (κ Rule Engine) begins. Colibri is now a rule engine + orchestrator. Agents build κ using the same loop.
