Commit 5f113b1

docs: explain why this fork exists and how IPFS uses it
1 parent a7b172b commit 5f113b1

1 file changed: +17 -7 lines changed

README.md

Lines changed: 17 additions & 7 deletions
@@ -5,19 +5,22 @@

 A fast bloom filter with a real bitset, JSON serialization, and thread-safe variants.

-Forked from [`AndreasBriese/bbloom`](https://github.com/AndreasBriese/bbloom). Uses an inlined SipHash-2-4 for hashing.
+## Why this fork

-## Install
+Forked from [`AndreasBriese/bbloom`](https://github.com/AndreasBriese/bbloom) in 2019 after the upstream became unmaintained. The fork fixes safety and correctness issues, and adds features needed for production use:

-```sh
-go get github.com/ipfs/bbloom
-```
+- Caller-provided SipHash keys (`NewWithKeys`) to prevent hash-flooding with untrusted input
+- Fixed double-hash step to always be odd, avoiding degenerate probe sequences
+- SipHash keys preserved across JSON serialization round-trips
+- Proper error handling in deserialization
+
+The library may contain IPFS-specific optimizations but works as a general-purpose bloom filter.
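
Note on the "double-hash step to always be odd" item: the sketch below illustrates why an even step can produce a degenerate probe sequence. It assumes the bitset size is a power of two and that positions are derived as `(h1 + i*step) mod m`; both are assumptions for illustration, and `probePositions`, `forceOdd`, and `distinct` are hypothetical helpers, not functions from this library. How the fork actually forces the step to be odd is not shown in this diff.

```go
package main

import "fmt"

// probePositions returns the k bit positions visited by double hashing,
// pos_i = (h1 + i*step) mod m, for a table whose size m is a power of two.
func probePositions(h1, step, m uint64, k int) []uint64 {
	mask := m - 1 // valid only because m is assumed to be a power of two
	out := make([]uint64, k)
	for i := 0; i < k; i++ {
		out[i] = (h1 + uint64(i)*step) & mask
	}
	return out
}

// forceOdd sets the lowest bit, which makes the step coprime with any
// power-of-two m. This is one assumed way to guarantee an odd step.
func forceOdd(step uint64) uint64 { return step | 1 }

// distinct counts how many different positions appear in ps.
func distinct(ps []uint64) int {
	seen := make(map[uint64]struct{}, len(ps))
	for _, p := range ps {
		seen[p] = struct{}{}
	}
	return len(seen)
}

func main() {
	const m, k = 16, 8
	even := uint64(8) // half of m: the probe sequence alternates between two slots

	fmt.Println("even step:", distinct(probePositions(3, even, m, k)), "distinct positions")
	fmt.Println("odd step: ", distinct(probePositions(3, forceOdd(even), m, k)), "distinct positions")
}
```

With an odd step and a power-of-two size, the first `m` probes are all distinct, so each of the `k` hash locations contributes a separate bit. An even step can revisit the same few bits, which raises the effective false-positive rate.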

 ## Usage

 ```go
-// create a bloom filter for 65536 items and 1% false-positive rate
-bf, _ := bbloom.New(float64(1<<16), float64(0.01))
+// create a bloom filter for 65536 items and 0.1% false-positive rate
+bf, _ := bbloom.New(float64(1<<16), float64(0.001))

 // or specify size and hash locations explicitly
 // bf, _ = bbloom.New(650000.0, 7.0)
@@ -43,6 +46,13 @@ restored, _ := bbloom.JSONUnmarshal(data)
 restored.Has([]byte("butter")) // true
 ```

+## Used in IPFS
+
+[Kubo](https://github.com/ipfs/kubo) and [Boxo](https://github.com/ipfs/boxo) use this library where CID deduplication or tracking is needed but the number of CIDs is too large to keep in memory as a map. Two main use cases:
+
+- **Blockstore bloom cache**: answers `Has()` checks without hitting the datastore, filtering out the majority of negative lookups.
+- **DAG walker dedup**: tracks visited CIDs during DAG traversal in the provider/reprovide system, keeping memory usage proportional to the bloom filter size rather than the number of blocks walked.
+
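
Note on the "Blockstore bloom cache" item: the pattern described is a filter placed in front of a slower exact store, so that a negative answer from the filter skips the store entirely. The sketch below shows that pattern in isolation. Everything in it is hypothetical: `toyBloom` is a small FNV-based stand-in rather than this library, and `cachedStore` is not how Kubo or Boxo implement their cache.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// toyBloom is a deliberately small stand-in for a real bloom filter.
type toyBloom struct {
	bits []uint64
	mask uint64 // size-1, where size is a power of two
	k    int    // number of hash locations per key
}

func newToyBloom(log2Size uint, k int) *toyBloom {
	size := uint64(1) << log2Size
	return &toyBloom{bits: make([]uint64, (size+63)/64), mask: size - 1, k: k}
}

// hashes derives a start position and an odd step from one FNV-1a sum.
func (b *toyBloom) hashes(key []byte) (h1, step uint64) {
	h := fnv.New64a()
	h.Write(key)
	sum := h.Sum64()
	return sum, (sum>>32 | sum<<32) | 1
}

func (b *toyBloom) Add(key []byte) {
	h1, step := b.hashes(key)
	for i := 0; i < b.k; i++ {
		p := (h1 + uint64(i)*step) & b.mask
		b.bits[p>>6] |= uint64(1) << (p & 63)
	}
}

func (b *toyBloom) Has(key []byte) bool {
	h1, step := b.hashes(key)
	for i := 0; i < b.k; i++ {
		p := (h1 + uint64(i)*step) & b.mask
		if b.bits[p>>6]&(uint64(1)<<(p&63)) == 0 {
			return false
		}
	}
	return true
}

// cachedStore puts the filter in front of a slower exact store.
type cachedStore struct {
	filter  *toyBloom
	backing map[string]struct{} // stands in for an on-disk datastore
	lookups int                 // how often the backing store was consulted
}

func (s *cachedStore) Put(key []byte) {
	s.filter.Add(key)
	s.backing[string(key)] = struct{}{}
}

func (s *cachedStore) Has(key []byte) bool {
	if !s.filter.Has(key) {
		return false // definite miss: a bloom filter has no false negatives
	}
	s.lookups++ // possible hit or false positive: confirm with the store
	_, ok := s.backing[string(key)]
	return ok
}

func main() {
	s := &cachedStore{filter: newToyBloom(16, 4), backing: map[string]struct{}{}}
	s.Put([]byte("butter"))
	fmt.Println(s.Has([]byte("butter")), s.Has([]byte("margarine")), s.lookups)
}
```

The filter can only say "definitely absent" or "possibly present", so a positive answer is still confirmed against the store. The saving comes from the negative lookups that never reach it.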
 ## Benchmarks

 See [BENCHMARKS.md](BENCHMARKS.md) for comparison against other bloom filter libraries.
