P3.3.2 — Gossip Bloom Filter Dedup — Contract

Step 2 of the 5-step chain. Behavioral contract for src/domains/consensus/bloom-dedup.ts.

§1. Module identity

  • Path: src/domains/consensus/bloom-dedup.ts
  • Domain: θ Consensus (Phase 3, R89 Wave 4)
  • Depends on: node:crypto (createHash) — NAMED import. No other module imports.
  • Pure module: no I/O, no DB, no network, no async, no wall-clock. No randomness from the module body — caller supplies event_id Buffers and (in tests) the seed for synthetic event_ids.
  • State: the BloomFilter class holds mutable per-instance state (bits: Uint8Array). Each instance is per-round and discarded at end-of-round. No module-level mutable state.

§2. Function exports

2.1 sizeFilter(n: number, p: number): { m: number, k: number }

Returns the integer bit count m and integer hash count k for a Bloom filter sized to hold n expected items at false-positive rate p.

Math:

m = Math.ceil(-n * Math.log(p) / (Math.log(2) ** 2))
k = Math.round((m / n) * Math.log(2))

Contract:

  • n MUST be a positive integer ≥ 1. Behavior for n ≤ 0 or non-integer n is unspecified (no validation throw — caller’s contract).
  • p MUST be in (0, 1). p ≥ 1 produces m ≤ 0 which then clamps to Math.ceil of a non-positive value — caller’s contract.
  • The return values are always integers (Math.ceil / Math.round enforce this).

Number-vs-bigint exception: the entire θ phase enforces bigint-only arithmetic outside of explicit carve-outs. sizeFilter is the ONE documented carve-out in this module because Math.log has no bigint stdlib equivalent and adding a polyfill for one helper is worse than documenting the carve-out. The returned m and k are integers and flow into the BloomFilter constructor as integer numbers.

Pure function — same inputs always produce the same output. No state.

Test:

  • sizeFilter(1000, 0.01) returns { m: 9586, k: 7 } (Math.ceil(9585.058…) = 9586). The audit cited “m ≈ 9585” from the spec; the actual ceiled value rounds up to 9586. The test asserts the precise computed values, not the spec’s approximation.

2.2 class BloomFilter

Constructor

new BloomFilter({ n: number, p: number })

Builds a fresh filter sized via sizeFilter(n, p). Allocates a zero-initialized Uint8Array of length Math.ceil(m / 8) bytes.

Defaults: the constructor does NOT supply defaults. The caller passes n and p explicitly. The default of n=1000, p=0.01 is a doc-level convention, not an in-code default.

Properties (private, no public getters except via stats()):

  • private readonly m: number — bit count (from sizeFilter)
  • private readonly k: number — hash count (from sizeFilter)
  • private readonly bits: Uint8Arrayceil(m/8) bytes

insert(event_id: Buffer): void

For each i in [0, k), compute bit_index = indexFor(i, event_id) and set bit bit_index in bits. Idempotent: inserting the same event_id twice has no observable effect beyond the first insert.

Pure side-effect: mutates this.bits only. No return value. No exceptions for valid Buffer input.

mightContain(event_id: Buffer): boolean

For each i in [0, k), compute bit_index = indexFor(i, event_id) and check bit bit_index in bits. Returns true iff all k bits are set.

No false negatives: if insert(x) was called, mightContain(x) returns true. (This is Bloom’s correctness guarantee.) False positives bounded by p: when fewer than n items have been inserted, mightContain(y) for y ≠ any inserted x returns true with probability ≤ p. Empirically verified by test §5.4.

Pure function w.r.t. external state — reads this.bits only.

reset(): void

Zeroes the bit array. After reset(), mightContain(x) returns false for all x (until subsequent inserts).

The class’s m and k are unchanged — the filter remains sized for the same (n, p) config. Use reset() between gossip rounds rather than constructing a new filter; same allocation reused.

Side-effect: mutates this.bits (zeros all bytes).

stats(): { m: number, k: number, set_bits: number, occupancy: number }

Returns introspection data:

  • m — bit count (same as constructor)
  • k — hash count
  • set_bits — number of bits currently set in bits
  • occupancyset_bits / m, in [0, 1]

Used by tests and by future ξ observability surfaces. occupancy is a float (Number) — this is acceptable in a return value for diagnostic purposes; it does not feed back into the filter’s main path.

Pure function w.r.t. external state.

§3. Internal helper

3.1 indexFor(i: number, event_id: Buffer): number

Returns a bit index in [0, m) for hash function i and item event_id.

Algorithm:

const seed = Buffer.alloc(4);
seed.writeUInt32BE(i, 0);
const h = createHash('sha256').update(seed).update(event_id).digest();
const u32 = h.readUInt32BE(0);
return u32 % m;
  • 4-byte big-endian encoding of i (range [0, k) where k ≤ 255 in practice; 4 bytes leaves headroom).
  • SHA-256 over seed || event_id — deterministic, no randomness.
  • First 4 bytes of the 32-byte digest read as big-endian uint32.
  • Reduction mod m — modular bias is bounded by (2^32 mod m) / 2^3210^-6 for m = 9586, far below the FPR target.

indexFor is a private method on BloomFilter. It reads this.m and is NOT exported.

§4. Type exports

The module exports:

  • sizeFilter — named function export
  • BloomFilter — named class export

No other types or values are exported. The constructor parameter type { n: number, p: number } is inline (no named interface).

§5. Test contract

The test file src/__tests__/domains/consensus/bloom-dedup.test.ts MUST cover:

5.1 sizeFilter math

  • sizeFilter(1000, 0.01) returns m = 9586 and k = 7
  • sizeFilter(100, 0.01) returns m = 959 and k = 7
  • sizeFilter(10000, 0.001) returns positive integer m and k (sanity)

5.2 Constructor allocates correct size

  • new BloomFilter({n: 1000, p: 0.01}).stats().m === 9586
  • new BloomFilter({n: 1000, p: 0.01}).stats().k === 7

5.3 No false negatives

  • Insert 1000 distinct Buffers
  • For each, assert mightContain(x) === true

5.4 Empirical FPR < 1.5%

  • Insert 1000 distinct Buffers (deterministic seed: counters 0–999)
  • Query 10000 distinct Buffers NOT in the insert set (counters 1000–10999)
  • Count mightContain true-returns
  • Assert observed FPR < 0.015

5.5 FPR robustness with second seed

  • Repeat §5.4 with a different seed offset (counters 100000–100999 inserts, 101000–110999 queries)
  • Assert observed FPR < 0.015

5.6 reset() clears all bits

  • Insert 1000 Buffers
  • Assert stats().set_bits > 0
  • Call reset()
  • Assert stats().set_bits === 0
  • For all 1000 originals, assert mightContain(x) === false

5.7 Hash determinism

  • Two BloomFilter instances with same (n, p) and same insert sequence MUST produce identical bits arrays.
  • Insertion order MUST NOT matter — inserting [a, b, c] and [c, a, b] produces identical bits.

5.8 Static scanner — Math.* confined to sizeFilter

  • Read bloom-dedup.ts source
  • Locate the sizeFilter function boundary
  • Assert all Math.* occurrences are inside the sizeFilter body
  • Assert no Math.random, Math.floor, Math.abs, etc. outside sizeFilter

5.9 Banned token scan

Mirrors messages.test.ts Group 9. Scans the source body (with JSDoc stripped) for:

  • Date. / new Date
  • setTimeout / setInterval / setImmediate
  • fetch / XMLHttpRequest
  • process.hrtime / process.nextTick
  • await / async function
  • Math.random

All MUST be absent.

5.10 stats().occupancy correctness

  • Fresh filter: occupancy === 0
  • After 100 inserts at n=1000: occupancy roughly k * 100 / m ≈ 7 * 100 / 9586 ≈ 0.073 (test loose bound 0.05 < occupancy < 0.1 to allow for hash collisions)
  • After 1000 inserts: occupancy < 1.0 (always — Bloom doesn’t fill completely)

§6. Reuse + integration

  • Reuse: node:crypto.createHash('sha256') — same import style as messages.ts.
  • Integration: gossip-wire.ts’s BuildIWANTOptions.seen? is the consumer seam. No modification to gossip-wire.ts required in this slice.

Example call site (NOT in this slice — illustrative for the reviewer):

const filter = new BloomFilter({ n: 1000, p: 0.01 });
// at start of each gossip round; receiver inserts event_ids it has already seen
for (const id of recentlySeen) filter.insert(id);
// when receiving an IHAVE:
const iwant = buildIWANT(ihave, localStorage.has, {
  seen: (id) => filter.mightContain(id),
  sender_id: 'arbiter-N',
  timestamp_logical: nextLogical(lamport),
});
// at end of round:
filter.reset(); // or discard the filter

§7. Non-goals

This slice is deliberately scoped narrow:

  • ❌ No persistence — filters are in-memory only
  • ❌ No serialization wire format — filters are local to one arbiter’s round state
  • ❌ No multi-round filter sharing — fresh filter per round per s08 §67
  • ❌ No counting Bloom (delete support) — standard Bloom has no remove; reset() is the only clear
  • ❌ No size-doubling on overflow — filter is fixed at construction
  • ❌ No gossip-wire.ts modification — the consumer wiring is downstream

§8. Acceptance gate

  • Module identity declared (path, deps, purity)
  • sizeFilter signature + math + Number-vs-bigint carve-out
  • BloomFilter class surface (ctor, insert, mightContain, reset, stats)
  • indexFor private helper algorithm
  • Test contract enumerates 10 test groups
  • Integration seam to gossip-wire.ts documented
  • Non-goals listed
  • Step 3 packet sequences the implementation
  • Step 4 implements both files
  • Step 5 verifies all 10 test groups pass + build + lint

Step 2 complete. Proceeding to Step 3 (packet).


Back to top

Colibri — documentation-first MCP runtime. Apache 2.0 + Commons Clause.

This site uses Just the Docs, a documentation theme for Jekyll.