ADR-003 — BFT Consensus: Build from scratch vs libp2p

Status: Accepted
Date: 2026-04-07
Accepted: 2026-05-13 (R93 post-R89B θ Phase 3)
Domain: Legitimacy (θ Consensus)

Context

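Colibri’s θ Consensus layer requires Byzantine Fault Tolerant (BFT) consensus: nodes must agree on which events are valid even when some nodes are malicious or offline. A PBFT-style protocol tolerates f such nodes only if the cluster has n ≥ 3f + 1 nodes, i.e. strictly fewer than one third of them may be faulty.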
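The one-third bound corresponds to the classic PBFT sizing rule n ≥ 3f + 1. The following minimal sketch assumes that rule, with a quorum of n − f votes (which equals 2f + 1 when n = 3f + 1). The helper names maxFaulty, minNodesFor, and bftQuorum are hypothetical and are not taken from src/domains/consensus/quorum.ts.

```typescript
// Hypothetical helpers illustrating the classic BFT sizing rule n >= 3f + 1.
// Not taken from src/domains/consensus/quorum.ts.

/** Largest number of Byzantine nodes f that a cluster of n nodes can tolerate. */
function maxFaulty(n: number): number {
  if (!Number.isInteger(n) || n < 1) {
    throw new RangeError(`node count must be a positive integer, got ${n}`);
  }
  return Math.floor((n - 1) / 3);
}

/** Smallest cluster size that tolerates f Byzantine nodes. */
function minNodesFor(f: number): number {
  if (!Number.isInteger(f) || f < 0) {
    throw new RangeError(`fault count must be a non-negative integer, got ${f}`);
  }
  return 3 * f + 1;
}

/**
 * Quorum of n - f votes: any two such quorums overlap in at least
 * n - 2f >= f + 1 nodes, so the overlap always contains an honest node.
 * When n = 3f + 1 this is the familiar 2f + 1.
 */
function bftQuorum(n: number): number {
  return n - maxFaulty(n);
}
```

For example, a four-node cluster tolerates one Byzantine node and needs three matching votes; seven nodes tolerate two and need five.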

The reference algorithm is a PBFT-inspired implementation documented in docs/reference/extractions/:

  • ~450 lines of domain logic
  • Includes: quorum calculation, equivocation detection, view change protocol, Byzantine leader rotation, slashing conditions
  • No network transport layer — pure consensus state machine
  • Three quorum types: simple majority (>50%), supermajority (>2/3, often written as >66%), unanimous
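The three quorum types can be expressed as integer vote thresholds. This sketch assumes the supermajority rule means strictly more than two thirds of eligible voters (the BFT bound) rather than a literal 66% cut-off; QuorumType, requiredVotes, and quorumReached are hypothetical names, not the reference design's API.

```typescript
// Hypothetical sketch of the three quorum types. Thresholds are computed with
// integer arithmetic so no floating-point rounding can move a boundary.

type QuorumType = "simple" | "super" | "unanimous";

/** Minimum number of yes-votes out of `total` eligible voters for each quorum type. */
function requiredVotes(type: QuorumType, total: number): number {
  if (!Number.isInteger(total) || total < 1) {
    throw new RangeError(`total must be a positive integer, got ${total}`);
  }
  switch (type) {
    case "simple":
      // Smallest v with v > total / 2.
      return Math.floor(total / 2) + 1;
    case "super":
      // Smallest v with v > 2 * total / 3 (assumed strict two-thirds rule).
      return Math.floor((2 * total) / 3) + 1;
    case "unanimous":
      // Every eligible voter must agree.
      return total;
  }
}

/** True when `yes` votes out of `total` satisfy the given quorum type. */
function quorumReached(type: QuorumType, yes: number, total: number): boolean {
  return yes >= requiredVotes(type, total);
}
```

The distinction matters at small sizes: with six voters, four yes-votes are about 66.7% of the electorate but not more than two thirds, so under this assumption the supermajority requires five.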

The decision concerns the Node.js implementation specifically. The question is whether to build from scratch following the reference design, or to build on libp2p’s networking primitives with custom consensus logic on top.

Decision

Option C — minimal in-process spike (no libp2p, no external transport).

Colibri’s θ Consensus layer ships as a single-process BFT state machine covering messages, quorum, equivocation, view-change/round-state, finality, gossip-wire, bloom-dedup, adaptive-fanout, time-anchors, parity-harness, and the fork-hook handoff. Direct function calls stand in for network transport. Multi-node P2P (via libp2p or any other framework) is not part of Phase 0, Phase 1, Phase 1.5, Phase 2, or Phase 3; it remains a later-phase activation.

This decision was operationalized when R89 Phase B shipped P3.1.x / P3.2.x / P3.3.x / P3.4.1 / P3.5.1 / P3.6.1 / P3.7.1 / P3.8.1 / P3.9.1 across PRs #234–#246. See src/domains/consensus/{messages,gossip-wire,bloom-dedup,adaptive-fanout,equivocation,fork-hook,finality,round-state,parity-harness,quorum,time-anchors}.ts and the 12 sibling test files under src/__tests__/domains/consensus/.

The MCP surface exposes consensus via the five θ tools added in R89 Phase B P3.7.1 (PR #244): consensus_propose, consensus_vote, consensus_finality, consensus_gossip, and vrf_eval.

The acceptance was formalized on 2026-05-13 in the R93 Phase 1 tech-debt sweep; until that round this ADR carried Status: PROPOSED even though the shipped implementation was already aligned with Option C (“minimal in-process spike” per §Two-phase approach below). The rationale below (Options A/B/C, Consequences, Alternatives) is preserved as the original deliberation record.

Option C’s stated “Phase 3b decision point” (choosing between Option A, a full scratch port with network layer, and Option B, libp2p for transport, once the consensus logic is validated) is deferred to a later phase. The Phase 3 spike has shipped and passes its direct-call integration tests; the network-transport decision will be revisited in a superseding ADR when multi-node P2P becomes a Colibri requirement.

Options

Option A: Build BFT from scratch

Implement BFT consensus in Node.js following the reference algorithm documentation, building:

  • Core BFT state machine (~450 lines equivalent)
  • Quorum calculation and vote collection
  • Equivocation detection and slashing
  • View change protocol for Byzantine leader rotation
  • Peer discovery and connection management
  • IHAVE/IWANT gossip protocol
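Equivocation detection, one of the responsibilities listed above, reduces to remembering the first vote each node casts per slot and flagging a conflicting second vote. This is a minimal sketch assuming votes are keyed by (voter, view, height) and compared by proposal digest; the Vote shape and the EquivocationDetector class are hypothetical, signatures are omitted, and it does not describe the shipped equivocation.ts.

```typescript
// Hypothetical sketch of double-vote (equivocation) detection. The Vote shape is an
// assumption and signatures are omitted. Not the shipped equivocation.ts API.

interface Vote {
  voter: string;  // voting node's identity
  view: number;   // view (round) number
  height: number; // sequence number of the slot being decided
  digest: string; // hash of the proposal voted for
}

interface EquivocationEvidence {
  first: Vote;
  second: Vote;
}

class EquivocationDetector {
  // First vote seen for each (voter, view, height) slot.
  private readonly seen = new Map<string, Vote>();

  /** Records a vote; returns evidence if the voter already voted differently in this slot. */
  record(vote: Vote): EquivocationEvidence | null {
    // JSON.stringify of a tuple gives an unambiguous composite key.
    const key = JSON.stringify([vote.voter, vote.view, vote.height]);
    const prior = this.seen.get(key);
    if (prior === undefined) {
      this.seen.set(key, vote);
      return null;
    }
    if (prior.digest === vote.digest) {
      return null; // identical re-delivery, not a fault
    }
    return { first: prior, second: vote }; // conflicting votes: slashing evidence
  }
}
```

Returning both conflicting votes keeps the evidence a slashing rule would need, while an identical re-delivered vote is treated as a duplicate rather than a fault.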

Pros:

  • Full control over every part of the consensus logic
  • The reference design is already aligned with Colibri’s specific constraints (subjective finality, experience-token slashing, fork integration)
  • No framework dependency; easier to audit
  • Consensus logic is already designed to integrate with governance (voting mode selection) and fork (fork trigger on quorum failure)

Cons:

  • High implementation risk — BFT protocols are notoriously easy to get subtly wrong
  • Network layer (gossip, peer discovery) must also be built
  • Testing distributed consensus correctly requires multi-node integration tests
  • Estimated 5-6 weeks (per implementation phase estimates) with high variance

Target files (Phase 3 implementation tasks):

  • src/consensus/voting.js — core BFT voting
  • src/consensus/equivocation.js — double-vote detection
  • src/consensus/view-change.js — PBFT view change
  • src/consensus/finality.js — 5-level finality tracking
  • src/gossip/ — IHAVE/IWANT gossip protocol

Option B: Build on libp2p

Use js-libp2p (the libp2p npm package) as the networking layer, with libp2p-compatible modules for gossip (e.g., @chainsafe/libp2p-gossipsub) and a custom BFT layer on top.

Pros:

  • Battle-tested P2P networking: peer discovery, NAT traversal, multiplexed streams
  • GossipSub (libp2p’s pubsub) is production-grade and replaces the custom IHAVE/IWANT implementation
  • Reduces implementation scope by ~40% (network layer is provided)
  • Strong TypeScript types and active maintenance

Cons:

  • libp2p is a complex framework with its own abstractions (PeerId, Multiaddr, Dialer)
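  • libp2p brings in ~15-20 transitive npm dependencies
  • libp2p ships no consensus of its own, and prominent consensus systems built on it are Ethereum consensus-layer (CL) clients; adapting those patterns to Colibri’s subjective-finality model requires significant customization
  • The gossip semantics differ: libp2p GossipSub pushes full messages eagerly over a mesh topology (maintained with GRAFT/PRUNE) and uses IHAVE/IWANT only as lazy gossip keyed by message IDs, whereas the reference gossip.py uses a simple IHAVE/IWANT exchange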
  • Integration with Colibri’s fork-scoped event logs (ι State Fork) is non-trivial
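For contrast with GossipSub’s mesh, a plain IHAVE/IWANT exchange is pull-based: a peer advertises the IDs it holds, and the receiver requests only what it lacks. The following in-memory sketch assumes messages are keyed by an opaque ID string; GossipPeer and gossipRound are hypothetical names and do not mirror gossip.py or gossip-wire.ts.

```typescript
// Hypothetical sketch of plain, pull-based IHAVE/IWANT gossip between in-process
// peers. Not the reference gossip.py or the shipped gossip-wire.ts.

type Message = [string, string]; // [message ID, payload]

class GossipPeer {
  private readonly store = new Map<string, string>(); // message ID -> payload

  publish(id: string, payload: string): void {
    this.store.set(id, payload);
  }

  has(id: string): boolean {
    return this.store.has(id);
  }

  /** IHAVE: advertise the IDs of every message this peer holds. */
  ihave(): string[] {
    return Array.from(this.store.keys());
  }

  /** IWANT: from an advertisement, request only the IDs this peer lacks. */
  iwant(advertised: string[]): string[] {
    return advertised.filter((id) => !this.store.has(id));
  }

  /** Answer an IWANT with the payloads still held. */
  serve(wanted: string[]): Message[] {
    const out: Message[] = [];
    for (const id of wanted) {
      const payload = this.store.get(id);
      if (payload !== undefined) out.push([id, payload]);
    }
    return out;
  }

  /** Store delivered payloads. */
  receive(messages: Message[]): void {
    for (const [id, payload] of messages) this.store.set(id, payload);
  }
}

/** One IHAVE -> IWANT -> delivery round from `from` to `to`; returns the IDs transferred. */
function gossipRound(from: GossipPeer, to: GossipPeer): string[] {
  const wanted = to.iwant(from.ihave());
  const delivered = from.serve(wanted);
  to.receive(delivered);
  return delivered.map(([id]) => id);
}
```

Only missing payloads travel, so a second round between the same two peers transfers nothing.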

Option C: Two-phase approach

Phase 3a (spike, 2 weeks): build a minimal BFT state machine (src/consensus/voting.js) without network transport. Use direct function calls between nodes in integration tests. Validate that the consensus logic is correct.
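The Phase 3a setup can be illustrated with a tiny harness in which “broadcast” is a loop of direct method calls. This is a sketch under stated assumptions: a fixed four-node cluster, a single decision slot, a quorum of n − f matching votes, and no signatures, view change, or equivocation flagging (only each voter’s first vote counts). ConsensusNode and broadcast are hypothetical names, not the shipped parity-harness.ts.

```typescript
// Hypothetical Phase 3a-style harness: nodes exchange votes through direct
// function calls instead of a network. Not the shipped parity-harness.ts.

interface Vote {
  voter: string;  // ID of the voting node
  digest: string; // hash of the proposal being voted for
}

class ConsensusNode {
  readonly id: string;
  committed: string | null = null;
  private readonly quorum: number;
  // Only the first vote per voter counts; later conflicting votes are ignored
  // (a real implementation would also flag them as equivocation).
  private readonly firstVote = new Map<string, string>();

  constructor(id: string, clusterSize: number) {
    this.id = id;
    const f = Math.floor((clusterSize - 1) / 3); // tolerated Byzantine nodes
    this.quorum = clusterSize - f; // n - f, i.e. 2f + 1 when n = 3f + 1
  }

  /** Called directly by the harness in place of a network receive. */
  onVote(vote: Vote): void {
    if (!this.firstVote.has(vote.voter)) {
      this.firstVote.set(vote.voter, vote.digest);
    }
    this.tryCommit();
  }

  /** Commits the first digest whose first-vote count reaches the quorum. */
  private tryCommit(): void {
    if (this.committed !== null) return;
    const tally = new Map<string, number>();
    this.firstVote.forEach((digest) => {
      tally.set(digest, (tally.get(digest) ?? 0) + 1);
    });
    tally.forEach((count, digest) => {
      if (this.committed === null && count >= this.quorum) {
        this.committed = digest;
      }
    });
  }
}

/** In-process "transport": broadcasting is a loop of direct calls. */
function broadcast(nodes: ConsensusNode[], vote: Vote): void {
  for (const node of nodes) {
    node.onVote(vote);
  }
}
```

Because delivery order is simply call order, a test can script adversarial interleavings deterministically, for example a Byzantine node voting for two different proposals before the honest votes arrive.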

Phase 3b (decision point): after the spike, choose between Option A full port (network layer included) or Option B libp2p for transport only, keeping the custom BFT logic from Phase 3a.

This is the recommended approach per MASTER-TASKS.md.

Consequences

If Option A (full scratch port):

  • 5-6 weeks total; highest variance
  • Complete control; fully auditable
  • Risk: subtle BFT bugs discovered late

If Option B (libp2p):

  • 3-4 weeks for network layer; 2-3 weeks for BFT layer on top; ~5-7 weeks total
  • Higher dependency footprint
  • Risk: libp2p abstraction friction with Colibri’s subjective-finality model

If Option C (two-phase):

  • 2-week spike produces validated BFT logic
  • Network transport decision deferred until consensus logic is confirmed correct
  • Allows an informed Option A vs B decision with real code evidence

Alternatives Considered

  • Tendermint / CometBFT: too heavy (Go binary); not compatible with Node.js MCP server process model
  • Hyperledger Fabric ordering service: enterprise-oriented, overly complex for Colibri’s scale
  • Raft consensus (not BFT): Raft tolerates crash failures, not Byzantine failures; does not satisfy AX-03 (no absolute authority) because a Raft leader can act arbitrarily

References

  • Reference algorithm in docs/reference/extractions/theta-consensus-extraction.md (~450 lines)
  • θ — Consensus concept — reader-friendly introduction
  • S06 — Consensus — full BFT specification
  • ADR-002 — VRF library (used for leader election inside BFT)
  • Phase 3 implementation tasks in task breakdown

Change log

  • 2026-04-07 — Drafted with Status: PROPOSED. Three options documented (Option A scratch port; Option B libp2p; Option C two-phase spike). Decision deferred pending PM-commissioned spike.
  • 2026-05-13 — Accepted (R93 Phase 1 tech-debt sweep, formalizing the Option C decision shipped in R89 Phase B PRs #234–#246). Live code: src/domains/consensus/{messages,gossip-wire,bloom-dedup,adaptive-fanout}.ts and siblings, a minimal in-process spike per Option C with NO libp2p and NO external transport. Tests: src/__tests__/domains/consensus/*.test.ts (parity-harness, finality, quorum, etc.). MCP surface: five θ tools added in R89 Phase B P3.7.1, PR #244 (consensus_propose, consensus_vote, consensus_finality, consensus_gossip, vrf_eval; surface 18 → 23). No source or test changes in this ADR-acceptance commit; the implementation already landed in R89 Phase B. The Option A vs Option B network-transport decision is deferred to a future ADR.

R93 Phase 1 tech-debt sweep: on 2026-05-13, ADR-003 transitions from PROPOSED to ACCEPTED to match the BFT implementation operative on main since R89 Phase B.



Colibri — documentation-first MCP runtime. Apache 2.0 + Commons Clause.
