Bloom filter is a probabilistic data structure that's used to determine if an element is in a set.
Implement Bloom Filters and measure
- False Positive Rate vs Size of the filter
- False Positive Rate vs Number of Hash Function
Here's the data formatted as a table:
| False Positive Rate | Bloom Filter Size (Single Hash) |
|---|---|
| 1.000000 | 16 |
| 1.000000 | 32 |
| 1.000000 | 64 |
| 1.000000 | 128 |
| 1.000000 | 256 |
| 0.983000 | 512 |
| 0.857000 | 1024 |
| 0.629000 | 2048 |
| 0.403000 | 4096 |
| 0.234000 | 8192 |
| 0.131000 | 16384 |
| 0.071000 | 32768 |
| 0.032000 | 65536 |
| 0.015000 | 131072 |
| 0.006000 | 262144 |
| False Positive Rate | Bloom Filter Size (10 hash) |
|---|---|
| 1 | 16 |
| 1 | 32 |
| 1 | 64 |
| 1 | 128 |
| 1 | 256 |
| 1 | 512 |
| 1 | 1024 |
| 1 | 2048 |
| 0.929 | 4096 |
| 0.418 | 8192 |
| 0.033 | 16384 |
| 0 | 32768 |
| 0 | 65536 |
| 0 | 131072 |
| 0 | 262144 |
The sudden increase in the false positive rate after adding more hash functions is due to overhashing and the saturation of bits in the Bloom filter.
-
With 10 hash functions the number of FPR reaches a low value much quick.
-
But With too many hash functions FPR increases again 'cause oversaturation of the Bloom filter

