ADR-003 — BFT Consensus: Build from scratch vs libp2p
Status: Accepted
Date: 2026-04-07
Accepted: 2026-05-13 (R93 post-R89B θ Phase 3)
Domain: Legitimacy (θ Consensus)
Context
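Colibri’s θ Consensus layer requires Byzantine Fault Tolerant (BFT) consensus: nodes must agree on which events are valid even when some nodes are malicious or offline. As in PBFT, this holds only while fewer than one third of nodes are faulty (n ≥ 3f + 1 nodes to tolerate f faulty ones).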
Reference algorithm is a PBFT-inspired implementation documented in docs/reference/extractions/:
- ~450 lines of domain logic
- Includes: quorum calculation, equivocation detection, view change protocol, Byzantine leader rotation, slashing conditions
- No network transport layer — pure consensus state machine
- Three quorum types: simple majority (>50%), super majority (>66%), unanimous
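The quorum rules listed above can be sketched as a small threshold calculation. This is an illustrative sketch, not the code in src/domains/consensus/quorum.ts: the names `QuorumType`, `maxFaulty`, `quorumThreshold`, and `hasQuorum` are hypothetical, and it assumes ">66%" means "strictly more than two thirds".

```typescript
// Hypothetical names for illustration; not taken from the Colibri source.
type QuorumType = "simple" | "super" | "unanimous";

/** Largest number of faulty nodes f that n nodes can tolerate (n >= 3f + 1). */
function maxFaulty(n: number): number {
  return Math.floor((n - 1) / 3);
}

/** Minimum number of matching votes required out of n nodes. */
function quorumThreshold(n: number, type: QuorumType): number {
  if (!Number.isInteger(n) || n < 1) {
    throw new RangeError("n must be a positive integer");
  }
  switch (type) {
    case "simple":
      return Math.floor(n / 2) + 1; // strictly more than 1/2
    case "super":
      return Math.floor((2 * n) / 3) + 1; // assumption: strictly more than 2/3
    case "unanimous":
      return n;
  }
}

/** True when the vote count meets the threshold for the given quorum type. */
function hasQuorum(votes: number, n: number, type: QuorumType): boolean {
  return votes >= quorumThreshold(n, type);
}
```

With four nodes, for example, one faulty node is tolerated and a super majority needs three matching votes, so the three honest nodes can still reach quorum on their own.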
The decision is specifically about the Node.js implementation. The question is whether to build from scratch following the reference design, or integrate with libp2p’s consensus primitives.
Decision
Option C — minimal in-process spike (no libp2p, no external transport).
Colibri’s θ Consensus layer ships as a single-process BFT state machine: messages, quorum, equivocation, view-change/round-state, finality, gossip-wire, bloom-dedup, adaptive-fanout, time-anchors, parity-harness, and the fork-hook handoff. Direct function calls stand in for network transport; multi-node P2P (via libp2p or any other framework) is not part of Phase 0, Phase 1, Phase 1.5, Phase 2, or Phase 3 — it remains a later-phase activation. This decision was operationalized when R89 Phase B shipped P3.1.x / P3.2.x / P3.3.x / P3.4.1 / P3.5.1 / P3.6.1 / P3.7.1 / P3.8.1 / P3.9.1 across PRs #234–#246 — see src/domains/consensus/{messages,gossip-wire,bloom-dedup,adaptive-fanout,equivocation,fork-hook,finality,round-state,parity-harness,quorum,time-anchors}.ts and the 12 sibling test files under src/__tests__/domains/consensus/. The MCP surface exposes consensus via the five θ tools added in R89 Phase B P3.7.1 (PR #244): consensus_propose, consensus_vote, consensus_finality, consensus_gossip, and vrf_eval.
The acceptance was formalized on 2026-05-13 in the R93 Phase 1 tech-debt sweep; until that round this ADR carried Status: PROPOSED even though the shipped implementation was already aligned with Option C (“minimal in-process spike” per §Two-phase approach below). The rationale below (Options A/B/C, Consequences, Alternatives) is preserved as the original deliberation record.
Option C’s stated “Phase 3b decision point” — pick between Option A (full scratch port with network layer) or Option B (libp2p for transport) once the consensus logic is validated — is deferred to a later phase. The Phase 3 spike has shipped and is correct under direct-call integration tests; the network-transport decision will be re-litigated in a superseding ADR when multi-node P2P becomes a Colibri requirement.
Options
Option A: Build BFT from scratch
Implement BFT consensus in Node.js following the reference algorithm documentation, building:
- Core BFT state machine (~450 lines equivalent)
- Quorum calculation and vote collection
- Equivocation detection and slashing
- View change protocol for Byzantine leader rotation
- Peer discovery and connection management
- IHAVE/IWANT gossip protocol
Pros:
- Full control over every part of the consensus logic
- The reference design is already aligned with Colibri’s specific constraints (subjective finality, experience-token slashing, fork integration)
- No framework dependency; easier to audit
- Consensus logic is already designed to integrate with governance (voting mode selection) and fork (fork trigger on quorum failure)
Cons:
- High implementation risk — BFT protocols are notoriously easy to get subtly wrong
- Network layer (gossip, peer discovery) must also be built
- Testing distributed consensus correctly requires multi-node integration tests
- Estimated 5-6 weeks (per implementation phase estimates) with high variance
Target files (Phase 3 implementation tasks):
- src/consensus/voting.js: core BFT voting
- src/consensus/equivocation.js: double-vote detection
- src/consensus/view-change.js: PBFT view change
- src/consensus/finality.js: 5-level finality tracking
- src/gossip/: IHAVE/IWANT gossip protocol
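To make the double-vote detection item concrete, here is a minimal sketch of an equivocation check. The `Vote` shape, the `EquivocationProof` type, and the `EquivocationDetector` class are assumptions made for illustration; the reference design may key votes differently, and signature verification and slashing are omitted.

```typescript
// Hypothetical types; the real message format is not specified in this ADR.
interface Vote {
  voter: string;
  view: number;
  sequence: number;
  digest: string; // hash of the event being voted on
}

interface EquivocationProof {
  first: Vote;
  second: Vote;
}

class EquivocationDetector {
  // One remembered vote per (voter, view, sequence) slot.
  // Assumes voter ids do not contain the "|" separator.
  private readonly seen = new Map<string, Vote>();

  /**
   * Records a vote. Returns a proof when the same voter has already voted
   * for a different digest in the same slot, otherwise null.
   */
  record(vote: Vote): EquivocationProof | null {
    const key = `${vote.voter}|${vote.view}|${vote.sequence}`;
    const prior = this.seen.get(key);
    if (prior === undefined) {
      this.seen.set(key, vote);
      return null;
    }
    if (prior.digest === vote.digest) {
      return null; // harmless duplicate of the same vote
    }
    return { first: prior, second: vote };
  }
}
```

Returning both conflicting votes, rather than a boolean, keeps the evidence a slashing step would need.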
Option B: Build on libp2p
Use @libp2p/libp2p as the networking layer and a libp2p-compatible consensus module (e.g., @chainsafe/libp2p-gossipsub for gossip, custom BFT layer on top).
Pros:
- Battle-tested P2P networking: peer discovery, NAT traversal, multiplexed streams
- GossipSub (libp2p’s pubsub) is production-grade and replaces the custom IHAVE/IWANT implementation
- Reduces implementation scope by ~40% (network layer is provided)
- Strong TypeScript types and active maintenance
Cons:
- libp2p is a complex framework with its own abstractions (PeerId, Multiaddr, Dialer)
- Adding libp2p adds ~15-20 transitive npm dependencies
- libp2p does not ship a BFT consensus module of its own; the consensus stacks built on top of it are Ethereum-oriented (CL clients), so adapting one to Colibri’s subjective-finality model requires significant customization
- The gossip semantics differ: libp2p GossipSub uses message IDs and mesh topology, not the simple IHAVE/IWANT of gossip.py
- Integration with Colibri’s fork-scoped event logs (ι State Fork) is non-trivial
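For readers unfamiliar with the simple IHAVE/IWANT exchange referred to above, the following sketch shows one round between two in-process peers. It is an assumption-laden illustration: `GossipPeer`, `syncOnce`, and the message shapes are hypothetical and are not the gossip.py or gossip-wire.ts formats.

```typescript
// Hypothetical message shapes for illustration only.
type MessageId = string;

interface IHave {
  kind: "IHAVE";
  ids: MessageId[];
}

interface IWant {
  kind: "IWANT";
  ids: MessageId[];
}

class GossipPeer {
  private readonly store = new Map<MessageId, string>();

  publish(id: MessageId, payload: string): void {
    this.store.set(id, payload);
  }

  has(id: MessageId): boolean {
    return this.store.has(id);
  }

  /** Advertise every message id this peer holds. */
  advertise(): IHave {
    return { kind: "IHAVE", ids: Array.from(this.store.keys()) };
  }

  /** Answer an advertisement by requesting only the ids not yet held. */
  onIHave(msg: IHave): IWant {
    return { kind: "IWANT", ids: msg.ids.filter((id) => !this.store.has(id)) };
  }

  /** Serve the requested payloads that this peer actually holds. */
  onIWant(msg: IWant): Array<[MessageId, string]> {
    const out: Array<[MessageId, string]> = [];
    for (const id of msg.ids) {
      const payload = this.store.get(id);
      if (payload !== undefined) out.push([id, payload]);
    }
    return out;
  }

  receive(entries: Array<[MessageId, string]>): void {
    for (const [id, payload] of entries) this.store.set(id, payload);
  }
}

/** One IHAVE -> IWANT -> delivery round; returns how many messages moved. */
function syncOnce(from: GossipPeer, to: GossipPeer): number {
  const want = to.onIHave(from.advertise());
  const delivered = from.onIWant(want);
  to.receive(delivered);
  return delivered.length;
}
```

Because the receiver asks only for ids it lacks, a second round between the same two peers transfers nothing.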
Option C: Two-phase approach
Phase 3a (spike, 2 weeks): build a minimal BFT state machine (src/consensus/voting.js) without network transport. Use direct function calls between nodes in integration tests. Validate that the consensus logic is correct.
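The direct-function-call idea can be illustrated with a toy harness in which every node holds references to all nodes and "sends" a message by calling a method on each one. This is a sketch under stated assumptions, not the shipped spike: `SpikeNode` and `makeCluster` are hypothetical names, there is a single voting round with no view change or signatures, faulty nodes are modeled only as silent, and the super-majority threshold is assumed to be strictly more than two thirds.

```typescript
// Hypothetical toy harness; a single voting round with no view change.
type Digest = string;

class SpikeNode {
  readonly id: string;
  finalized: Digest | null = null;
  private readonly n: number;
  private readonly silent: boolean;
  private peers: SpikeNode[] = [];
  private readonly votes = new Map<Digest, Set<string>>();

  constructor(id: string, n: number, silent: boolean) {
    this.id = id;
    this.n = n;
    this.silent = silent;
  }

  /** Wire this node to the full cluster, itself included. */
  connect(all: SpikeNode[]): void {
    this.peers = all;
  }

  /** "Broadcast" a proposal by calling every node directly. */
  propose(digest: Digest): void {
    for (const peer of this.peers) peer.onProposal(digest);
  }

  onProposal(digest: Digest): void {
    if (this.silent) return; // a faulty node that withholds its vote
    for (const peer of this.peers) peer.onVote(this.id, digest);
  }

  onVote(voter: string, digest: Digest): void {
    let voters = this.votes.get(digest);
    if (voters === undefined) {
      voters = new Set<string>();
      this.votes.set(digest, voters);
    }
    voters.add(voter); // a Set ignores repeated votes from the same voter
    const threshold = Math.floor((2 * this.n) / 3) + 1; // assumed super majority
    if (voters.size >= threshold) this.finalized = digest;
  }
}

/** Build n fully connected nodes; the last `silentCount` of them never vote. */
function makeCluster(n: number, silentCount: number): SpikeNode[] {
  const nodes = Array.from(
    { length: n },
    (_, i) => new SpikeNode(`node-${i}`, n, i >= n - silentCount),
  );
  for (const node of nodes) node.connect(nodes);
  return nodes;
}
```

In a four-node cluster with one silent node the three remaining votes meet the threshold and every node finalizes the proposal; with two silent nodes no node does. An integration test can assert both outcomes without any network code.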
Phase 3b (decision point): after the spike, choose between Option A full port (network layer included) or Option B libp2p for transport only, keeping the custom BFT logic from Phase 3a.
This is the recommended approach per MASTER-TASKS.md.
Consequences
If Option A (full scratch port):
- 5-6 weeks total; highest variance
- Complete control; fully auditable
- Risk: subtle BFT bugs discovered late
If Option B (libp2p):
- 3-4 weeks for network layer; 2-3 weeks for BFT layer on top; ~5-7 weeks total
- Higher dependency footprint
- Risk: libp2p abstraction friction with Colibri’s subjective-finality model
If Option C (two-phase):
- 2-week spike produces validated BFT logic
- Network transport decision deferred until consensus logic is confirmed correct
- Allows an informed Option A vs B decision with real code evidence
Alternatives Considered
- Tendermint / CometBFT: too heavy (Go binary); not compatible with Node.js MCP server process model
- Hyperledger Fabric ordering service: enterprise-oriented, overly complex for Colibri’s scale
- Raft consensus (not BFT): Raft tolerates crash failures, not Byzantine failures; does not satisfy AX-03 (no absolute authority) because a Raft leader can act arbitrarily
References
- Reference algorithm in docs/reference/extractions/theta-consensus-extraction.md (~450 lines)
- θ — Consensus concept — reader-friendly introduction
- S06 — Consensus — full BFT specification
- ADR-002 — VRF library (used for leader election inside BFT)
- Phase 3 implementation tasks in task breakdown
Change log
- 2026-04-07: Drafted with Status: PROPOSED. Three options documented (Option A scratch port; Option B libp2p; Option C two-phase spike). Decision deferred pending PM-commissioned spike.
- 2026-05-13: Accepted (R93 Phase 1 tech-debt sweep, formalizing the Option C decision shipped in R89 Phase B PRs #234–#246). Live code: src/domains/consensus/{messages,gossip-wire,bloom-dedup,adaptive-fanout}.ts and siblings; minimal in-process spike per Option C; NO libp2p, NO external transport. Tests: src/__tests__/domains/consensus/*.test.ts (parity-harness, finality, quorum, etc.). MCP surface: five θ tools added in R89 Phase B P3.7.1 PR #244 (consensus_propose, consensus_vote, consensus_finality, consensus_gossip, vrf_eval; surface 18 → 23). No source or test changes in this ADR-acceptance commit; the implementation already landed in R89 Phase B. The Option A vs Option B network-transport decision is deferred to a future ADR.
R93 Phase 1 tech-debt sweep. ADR-003 transitions PROPOSED → ACCEPTED to match the operative BFT implementation on main since R89 Phase B (2026-05-13).