P3.3.2 — Gossip Bloom Filter Dedup — Audit
Step 1 of the 5-step chain (CLAUDE.md §6). Phase 3 θ Wave 4 — per-round Bloom filter that wraps P3.3.1’s buildIWANT.options.seen predicate seam to prevent re-requesting events the receiver has already seen in this gossip round.
§1. Surface inventory at base SHA 9c7165d5
| Path | Exists? | Role |
|---|---|---|
src/domains/consensus/ |
Yes — Phase 3 θ domain root | Wave 1–3 shipped (messages.ts, gossip-wire.ts, round-state.ts, quorum.ts, finality.ts, equivocation.ts, time-anchors.ts, vrf-stub.ts) |
src/domains/consensus/gossip-wire.ts |
Yes — P3.3.1 (e63a8bcf) |
Direct integration target. BuildIWANTOptions.seen?: (event_id: Buffer) => boolean is the dedup seam this slice fills. The slice MUST NOT modify gossip-wire.ts. |
src/domains/consensus/messages.ts |
Yes — P3.1.1 (e63a8bcf) |
Indirect dependency only — bloom-dedup does not reuse canonicalSerialize/signMessage. SHA-256 hashing comes from node:crypto (NAMED imports). |
src/domains/consensus/bloom-dedup.ts |
No — to create | sizeFilter() + BloomFilter class — sized per s08 formula, per-round in-memory only. |
src/__tests__/domains/consensus/bloom-dedup.test.ts |
No — to create | Sizing math, FPR empirical, no false negatives, reset, hash determinism. |
node:crypto (createHash) |
Node ≥ 20 builtin | NAMED import for SHA-256. Pattern matches messages.ts §2 imports. |
§2. Spec inventory
2.1 s08 — Bloom filter algorithm
docs/spec/s08-gossip.md §Bloom filter algorithm (lines 36–67) defines the sizing formula and the per-round lifecycle. The relevant equations:
m = -n * ln(p) / (ln(2))^2 (bit count)
k = (m / n) * ln(2) (hash count)
For the default config n = 1000, p = 0.01:
m = -1000 * ln(0.01) / (ln(2))^2 ≈ 9585 bits (~1.2 KB)
k = (9585 / 1000) * ln(2) ≈ 6.643 → round to 7
Spec line 67 (“Each gossip round allocates a fresh filter. Filters are not persisted”) is the lifecycle constraint — the filter is in-memory, exists only for one IHAVE exchange, and is discarded at end-of-round.
Spec line 65 (“False positives: All k bits set by coincidence (different event IDs); rate ≤ p when filter is not over-full”) is the FPR contract — the test must demonstrate observed FPR < 1.5% (margin allowed over the 1% theoretical bound).
2.2 §P3.3.2 task prompt
Source prompt at docs/guides/implementation/task-prompts/p3.1-theta-consensus.md §P3.3.2 (lines 1128–1295). Key acceptance criteria:
| Criterion | Source line |
|---|---|
sizeFilter formulas (m, k) |
1147–1148 |
| Default config m≈9585 bits, k=7 | 1149 |
insert(event_id: Buffer): void |
1150 |
mightContain(event_id: Buffer): boolean |
1151 |
reset() zeroes the bit array |
1152 |
| In-memory only, no persistence | 1153 |
| Empirical FPR < 1.5% at n=1000, q=10000 | 1154 |
Number-vs-bigint exception documented + isolated to sizeFilter |
1155 |
| K independent SHA-256 truncations seeded by i (i ∈ [0, k)) | 1156 |
2.3 P3.3.1 integration seam
src/domains/consensus/gossip-wire.ts line 191–195 defines BuildIWANTOptions.seen?: (event_id: Buffer) => boolean. Line 401–402 documents the seam:
// Default seen predicate is "no dedup" — let everything through to the
// locally_have filter. P3.3.2's Bloom impl swaps a real predicate here.
P3.3.2 does not modify gossip-wire.ts. The bloom filter is consumed by passing filter.mightContain.bind(filter) (or an equivalent closure) as the seen option to buildIWANT. That call-site wiring is the consumer’s concern; this slice ships only the filter primitive plus tests.
§3. Big-int / float boundary
P3.1.1 messages.ts and P3.1.3 round-state.ts enforce a no Math.*, no float, no Date.* discipline. The static scanner in messages.test.ts Group 9 walks the source body and rejects float/Math literals.
Bloom-dedup deliberately breaks one rule:
Math.logandMath.ceilandMath.roundare used insidesizeFilter(n: number, p: number). There is no bigint stdlib equivalent forln(x). Adding a polyfill for one helper is worse than documenting a carve-out.
The carve-out is isolated to the sizeFilter helper. The BloomFilter class body uses only:
BufferoperationsUint8Arrayoperationsnumberinteger arithmetic for bit indexing (byte offset + bit mask)node:crypto.createHash('sha256')per hash function
Math.floor is also unavoidable for the Math.floor(bit / 8) byte-offset arithmetic inside the bit-set loop. Two options:
- Use
Math.floor(bit / 8)directly inside the class body (same as the task gotcha block line 1286 — “usebyte = Math.floor(bit/8)”). - Replace with
>>> 3bitwise shift (integer-only; no Math).
Option 2 is cleaner — >>> 3 is unsigned-right-shift-by-3, which is exactly integer division by 8. It avoids the Math.* token entirely inside the hot path and keeps the static scanner narrative clean: Math.* is confined to sizeFilter. The Uint8Array index is a 32-bit unsigned int, so >>> 3 is safe for m up to ~34B bits — well beyond practical Bloom sizes.
For the bit mask, 1 << (bit & 7) is integer-only too. Decision: the implementation uses bit-shift forms in the class body; Math.* appears only in sizeFilter.
§4. Hash construction
§P3.3.2 line 1156 specifies “K independent SHA-256 truncations seeded by i (i ∈ [0, k))”. The §P3.3.2 prompt block at line 1207 makes this concrete:
indexFor(i, event_id) = (sha256(i.toString() || event_id) read as uint32) mod m
Reading: hash the concatenation i_as_string_bytes || event_id_bytes with SHA-256, then take the first 4 bytes of the digest as a big-endian uint32, and reduce modulo m.
| Implementation note: “i.toString() | event_id” in the prompt is loose pseudocode. Two interpretations: |
(a) Buffer.from(String(i)) || event_id_bytes — concatenate decimal-string bytes of i, then event_id bytes.
(b) Buffer.alloc(4); buf.writeUInt32BE(i); || event_id_bytes — 4-byte BE-encoded i, then event_id bytes.
Both are valid SHA-256 derivations. (b) is more compact, has stable byte length, and is the convention in the κ canonical-encoder family. Decision: use (b) — 4-byte big-endian i prefix, then event_id bytes, hash with SHA-256. The first 4 bytes of the 32-byte digest are read as big-endian uint32 and reduced mod m.
The mod m step uses % m on number. For m ≤ 2^32 (always true at practical Bloom sizes), this is exact in 32-bit unsigned arithmetic. readUInt32BE returns a number in [0, 2^32 - 1]. m is always positive number, so % m is well-defined.
§5. Empirical FPR test design
The acceptance criterion (line 1154) is “observed FPR < 1.5%” over 10000 disjoint queries with 1000 inserts. The empirical FPR is stochastic; the test must:
- Use a deterministic seed for the PRNG so the test is reproducible. Pattern from κ + messages tests: a counter-based seed feeding
createHash('sha256').update(Buffer.from(String(counter)))to generate distinct 32-byte event_id Buffers. - Insert 1000 distinct event_ids (counters 0–999).
- Query 10000 distinct event_ids that were not inserted (counters 1000–10999).
- Count the number of
mightContainreturns oftrue— that’s the false-positive count. - Assert
falsePositives / 10000 < 0.015.
§P3.3.2 gotcha line 1288–1290 also recommends “Run the FPR check 5 times on different seeds to confirm the < 1.5% margin holds (not just one lucky run)”. The test will use a single-seed deterministic dataset for the primary assertion plus a second pass with a different seed offset (counters 10000–19999 inserts, 20000–29999 queries) to confirm the margin is robust.
§6. Static scanner / lint hygiene
The κ-style static scanner pattern (referenced in messages.test.ts Group 9) walks the file body and flags banned tokens. Bloom-dedup’s banned-token policy:
| Token | Allowed in bloom-dedup.ts? |
Notes |
|---|---|---|
Math.* |
Only inside sizeFilter body |
Carve-out documented in §3 above |
Date.* |
No | No timestamps |
new Date |
No | — |
setTimeout / setInterval |
No | — |
fetch / XMLHttpRequest |
No | — |
process.hrtime |
No | — |
await / async function |
No | Pure sync module |
float literals (e.g. 0.5) |
Only the p parameter default is a float — passed in by caller, not literal in module body |
The Bloom sizing math itself produces float intermediates inside sizeFilter, but Number.isInteger holds at the boundary (Math.ceil/round) |
Math.random |
No | Random comes from caller-supplied event_id Buffers or test PRNG |
[native code] |
No | — |
The empirical FPR test in bloom-dedup.test.ts uses test-PRNG via node:crypto.createHash('sha256') keyed by an integer counter — no Math.random, no Date.now() seeding. Deterministic across runs.
§7. Risk inventory
| Risk | Likelihood | Mitigation |
|---|---|---|
| FPR test flakes on stochastic margin | Low — 1.5% margin over 1.0% theoretical leaves ~50% headroom; 10000 queries gives σ ≈ √(10000 × 0.01 × 0.99) ≈ 9.95 false positives, observed FPR std ≈ 0.001 (0.1%) | Use deterministic seed; second-seed cross-check; assert < 0.015 not < 0.01 |
| Math.* token policy bleed | Medium — file contains Math.log/ceil/round inside sizeFilter | Lint comment in sizeFilter explicitly flags the carve-out; class body uses bit-shifts only |
<< overflow for k=7 — 1 << 7 is fine (= 128) |
Negligible — bit mask is 1 << (bit & 7), range [0,7] |
Mask & 7 enforces 0..7 input to shift |
readUInt32BE modular bias when m is not a power-of-two |
Theoretical concern — m=9585 is not a power-of-two so reduction digest % m has a slight modular bias |
Bias is bounded by (2^32 - 1) % 9585 ≈ 4500 / 2^32 ≈ 10^-6 relative — far below the 1% FPR target |
Uint8Array size limit |
Negligible — m=9585 → 1199 bytes, well under V8 array limits | — |
| Hash determinism for same input | Low — SHA-256 deterministic by spec | Test asserts identical bit patterns post-insert from same input set |
§8. Files changed
| File | Change | Lines | Notes |
|---|---|---|---|
src/domains/consensus/bloom-dedup.ts |
new | ~200 | sizeFilter helper + BloomFilter class |
src/__tests__/domains/consensus/bloom-dedup.test.ts |
new | ~250 | sizing, FPR, reset, no-false-negatives, determinism |
docs/audits/p3-3-2-bloom-dedup-audit.md |
new (this) | this | Step 1 |
docs/contracts/p3-3-2-bloom-dedup-contract.md |
new | step 2 | Step 2 |
docs/packets/p3-3-2-bloom-dedup-packet.md |
new | step 3 | Step 3 |
docs/verification/p3-3-2-bloom-dedup-verification.md |
new | step 5 | Step 5 |
No existing source touched. gossip-wire.ts is the integration consumer but does not need a code change — its BuildIWANTOptions.seen seam was already cut for this purpose at P3.3.1.
§9. Acceptance gate
- Surface inventoried
- Spec citations recorded
- Math.* carve-out scoped and documented
- Hash construction frozen (4-byte BE seed prefix + event_id, SHA-256, first 4 bytes BE → uint32, mod m)
- FPR test design specified
- Risk inventory complete
- File list complete
- Step 2 contract describes the BloomFilter class surface byte-for-byte
- Step 3 packet schedules implementation
- Step 4 implements
- Step 5 verifies
Step 1 complete. Proceeding to Step 2 (contract).