P3.3.2 — Gossip Bloom Filter Dedup — Audit

Step 1 of the 5-step chain (CLAUDE.md §6). Phase 3 θ Wave 4 — per-round Bloom filter that wraps P3.3.1’s buildIWANT.options.seen predicate seam to prevent re-requesting events the receiver has already seen in this gossip round.

§1. Surface inventory at base SHA 9c7165d5

Path Exists? Role
src/domains/consensus/ Yes — Phase 3 θ domain root Wave 1–3 shipped (messages.ts, gossip-wire.ts, round-state.ts, quorum.ts, finality.ts, equivocation.ts, time-anchors.ts, vrf-stub.ts)
src/domains/consensus/gossip-wire.ts Yes — P3.3.1 (e63a8bcf) Direct integration target. BuildIWANTOptions.seen?: (event_id: Buffer) => boolean is the dedup seam this slice fills. The slice MUST NOT modify gossip-wire.ts.
src/domains/consensus/messages.ts Yes — P3.1.1 (e63a8bcf) Indirect dependency only — bloom-dedup does not reuse canonicalSerialize/signMessage. SHA-256 hashing comes from node:crypto (NAMED imports).
src/domains/consensus/bloom-dedup.ts No — to create sizeFilter() + BloomFilter class — sized per s08 formula, per-round in-memory only.
src/__tests__/domains/consensus/bloom-dedup.test.ts No — to create Sizing math, FPR empirical, no false negatives, reset, hash determinism.
node:crypto (createHash) Node ≥ 20 builtin NAMED import for SHA-256. Pattern matches messages.ts §2 imports.

§2. Spec inventory

2.1 s08 — Bloom filter algorithm

docs/spec/s08-gossip.md §Bloom filter algorithm (lines 36–67) defines the sizing formula and the per-round lifecycle. The relevant equations:

m = -n * ln(p) / (ln(2))^2     (bit count)
k = (m / n) * ln(2)            (hash count)

For the default config n = 1000, p = 0.01:

m = -1000 * ln(0.01) / (ln(2))^2  ≈  9585 bits  (~1.2 KB)
k = (9585 / 1000) * ln(2)         ≈  6.643 → round to 7

Spec line 67 (“Each gossip round allocates a fresh filter. Filters are not persisted”) is the lifecycle constraint — the filter is in-memory, exists only for one IHAVE exchange, and is discarded at end-of-round.

Spec line 65 (“False positives: All k bits set by coincidence (different event IDs); rate ≤ p when filter is not over-full”) is the FPR contract — the test must demonstrate observed FPR < 1.5% (margin allowed over the 1% theoretical bound).

2.2 §P3.3.2 task prompt

Source prompt at docs/guides/implementation/task-prompts/p3.1-theta-consensus.md §P3.3.2 (lines 1128–1295). Key acceptance criteria:

Criterion Source line
sizeFilter formulas (m, k) 1147–1148
Default config m≈9585 bits, k=7 1149
insert(event_id: Buffer): void 1150
mightContain(event_id: Buffer): boolean 1151
reset() zeroes the bit array 1152
In-memory only, no persistence 1153
Empirical FPR < 1.5% at n=1000, q=10000 1154
Number-vs-bigint exception documented + isolated to sizeFilter 1155
K independent SHA-256 truncations seeded by i (i ∈ [0, k)) 1156

2.3 P3.3.1 integration seam

src/domains/consensus/gossip-wire.ts line 191–195 defines BuildIWANTOptions.seen?: (event_id: Buffer) => boolean. Line 401–402 documents the seam:

// Default seen predicate is "no dedup" — let everything through to the
// locally_have filter. P3.3.2's Bloom impl swaps a real predicate here.

P3.3.2 does not modify gossip-wire.ts. The bloom filter is consumed by passing filter.mightContain.bind(filter) (or an equivalent closure) as the seen option to buildIWANT. That call-site wiring is the consumer’s concern; this slice ships only the filter primitive plus tests.

§3. Big-int / float boundary

P3.1.1 messages.ts and P3.1.3 round-state.ts enforce a no Math.*, no float, no Date.* discipline. The static scanner in messages.test.ts Group 9 walks the source body and rejects float/Math literals.

Bloom-dedup deliberately breaks one rule:

  • Math.log and Math.ceil and Math.round are used inside sizeFilter(n: number, p: number). There is no bigint stdlib equivalent for ln(x). Adding a polyfill for one helper is worse than documenting a carve-out.

The carve-out is isolated to the sizeFilter helper. The BloomFilter class body uses only:

  • Buffer operations
  • Uint8Array operations
  • number integer arithmetic for bit indexing (byte offset + bit mask)
  • node:crypto.createHash('sha256') per hash function

Math.floor is also unavoidable for the Math.floor(bit / 8) byte-offset arithmetic inside the bit-set loop. Two options:

  1. Use Math.floor(bit / 8) directly inside the class body (same as the task gotcha block line 1286 — “use byte = Math.floor(bit/8)”).
  2. Replace with >>> 3 bitwise shift (integer-only; no Math).

Option 2 is cleaner — >>> 3 is unsigned-right-shift-by-3, which is exactly integer division by 8. It avoids the Math.* token entirely inside the hot path and keeps the static scanner narrative clean: Math.* is confined to sizeFilter. The Uint8Array index is a 32-bit unsigned int, so >>> 3 is safe for m up to ~34B bits — well beyond practical Bloom sizes.

For the bit mask, 1 << (bit & 7) is integer-only too. Decision: the implementation uses bit-shift forms in the class body; Math.* appears only in sizeFilter.

§4. Hash construction

§P3.3.2 line 1156 specifies “K independent SHA-256 truncations seeded by i (i ∈ [0, k))”. The §P3.3.2 prompt block at line 1207 makes this concrete:

indexFor(i, event_id) = (sha256(i.toString() || event_id) read as uint32) mod m

Reading: hash the concatenation i_as_string_bytes || event_id_bytes with SHA-256, then take the first 4 bytes of the digest as a big-endian uint32, and reduce modulo m.

Implementation note: “i.toString()   event_id” in the prompt is loose pseudocode. Two interpretations:

(a) Buffer.from(String(i)) || event_id_bytes — concatenate decimal-string bytes of i, then event_id bytes. (b) Buffer.alloc(4); buf.writeUInt32BE(i); || event_id_bytes — 4-byte BE-encoded i, then event_id bytes.

Both are valid SHA-256 derivations. (b) is more compact, has stable byte length, and is the convention in the κ canonical-encoder family. Decision: use (b) — 4-byte big-endian i prefix, then event_id bytes, hash with SHA-256. The first 4 bytes of the 32-byte digest are read as big-endian uint32 and reduced mod m.

The mod m step uses % m on number. For m ≤ 2^32 (always true at practical Bloom sizes), this is exact in 32-bit unsigned arithmetic. readUInt32BE returns a number in [0, 2^32 - 1]. m is always positive number, so % m is well-defined.

§5. Empirical FPR test design

The acceptance criterion (line 1154) is “observed FPR < 1.5%” over 10000 disjoint queries with 1000 inserts. The empirical FPR is stochastic; the test must:

  1. Use a deterministic seed for the PRNG so the test is reproducible. Pattern from κ + messages tests: a counter-based seed feeding createHash('sha256').update(Buffer.from(String(counter))) to generate distinct 32-byte event_id Buffers.
  2. Insert 1000 distinct event_ids (counters 0–999).
  3. Query 10000 distinct event_ids that were not inserted (counters 1000–10999).
  4. Count the number of mightContain returns of true — that’s the false-positive count.
  5. Assert falsePositives / 10000 < 0.015.

§P3.3.2 gotcha line 1288–1290 also recommends “Run the FPR check 5 times on different seeds to confirm the < 1.5% margin holds (not just one lucky run)”. The test will use a single-seed deterministic dataset for the primary assertion plus a second pass with a different seed offset (counters 10000–19999 inserts, 20000–29999 queries) to confirm the margin is robust.

§6. Static scanner / lint hygiene

The κ-style static scanner pattern (referenced in messages.test.ts Group 9) walks the file body and flags banned tokens. Bloom-dedup’s banned-token policy:

Token Allowed in bloom-dedup.ts? Notes
Math.* Only inside sizeFilter body Carve-out documented in §3 above
Date.* No No timestamps
new Date No
setTimeout / setInterval No
fetch / XMLHttpRequest No
process.hrtime No
await / async function No Pure sync module
float literals (e.g. 0.5) Only the p parameter default is a float — passed in by caller, not literal in module body The Bloom sizing math itself produces float intermediates inside sizeFilter, but Number.isInteger holds at the boundary (Math.ceil/round)
Math.random No Random comes from caller-supplied event_id Buffers or test PRNG
[native code] No

The empirical FPR test in bloom-dedup.test.ts uses test-PRNG via node:crypto.createHash('sha256') keyed by an integer counter — no Math.random, no Date.now() seeding. Deterministic across runs.

§7. Risk inventory

Risk Likelihood Mitigation
FPR test flakes on stochastic margin Low — 1.5% margin over 1.0% theoretical leaves ~50% headroom; 10000 queries gives σ ≈ √(10000 × 0.01 × 0.99) ≈ 9.95 false positives, observed FPR std ≈ 0.001 (0.1%) Use deterministic seed; second-seed cross-check; assert < 0.015 not < 0.01
Math.* token policy bleed Medium — file contains Math.log/ceil/round inside sizeFilter Lint comment in sizeFilter explicitly flags the carve-out; class body uses bit-shifts only
<< overflow for k=7 — 1 << 7 is fine (= 128) Negligible — bit mask is 1 << (bit & 7), range [0,7] Mask & 7 enforces 0..7 input to shift
readUInt32BE modular bias when m is not a power-of-two Theoretical concern — m=9585 is not a power-of-two so reduction digest % m has a slight modular bias Bias is bounded by (2^32 - 1) % 9585 ≈ 4500 / 2^32 ≈ 10^-6 relative — far below the 1% FPR target
Uint8Array size limit Negligible — m=9585 → 1199 bytes, well under V8 array limits
Hash determinism for same input Low — SHA-256 deterministic by spec Test asserts identical bit patterns post-insert from same input set

§8. Files changed

File Change Lines Notes
src/domains/consensus/bloom-dedup.ts new ~200 sizeFilter helper + BloomFilter class
src/__tests__/domains/consensus/bloom-dedup.test.ts new ~250 sizing, FPR, reset, no-false-negatives, determinism
docs/audits/p3-3-2-bloom-dedup-audit.md new (this) this Step 1
docs/contracts/p3-3-2-bloom-dedup-contract.md new step 2 Step 2
docs/packets/p3-3-2-bloom-dedup-packet.md new step 3 Step 3
docs/verification/p3-3-2-bloom-dedup-verification.md new step 5 Step 5

No existing source touched. gossip-wire.ts is the integration consumer but does not need a code change — its BuildIWANTOptions.seen seam was already cut for this purpose at P3.3.1.

§9. Acceptance gate

  • Surface inventoried
  • Spec citations recorded
  • Math.* carve-out scoped and documented
  • Hash construction frozen (4-byte BE seed prefix + event_id, SHA-256, first 4 bytes BE → uint32, mod m)
  • FPR test design specified
  • Risk inventory complete
  • File list complete
  • Step 2 contract describes the BloomFilter class surface byte-for-byte
  • Step 3 packet schedules implementation
  • Step 4 implements
  • Step 5 verifies

Step 1 complete. Proceeding to Step 2 (contract).


Back to top

Colibri — documentation-first MCP runtime. Apache 2.0 + Commons Clause.

This site uses Just the Docs, a documentation theme for Jekyll.