Commit 8ac7391

new bloom filter post
1 parent 896830a commit 8ac7391

1 file changed: +35, -0 lines

_posts/2025-11-29-bloom-filter.md

@@ -0,0 +1,35 @@
---
layout: post
title: "bloom filter"
date: 2025-11-29 14:43:00 +0200
categories: tech
---

I had a lot of topics in mind for this first tech post, each one more complex (and fascinating) than the last, but in the end I chose the simplest one: Bloom filters.

Start simple and small, then build up to something more intricate. When in doubt, that's always a good way to go (in my experience, at least).

Anyway, Bloom filters: what about them?

I didn't know they existed until quite recently, and I was surprised that none of my teachers ever mentioned them while I was studying computer science.

You can see them as an optimization structure that speeds up membership queries on a *set*.
In particular, if you have a set of objects and you want to know whether some other object belongs to that set, a Bloom filter can give you one of two answers:

- no, the object is definitely not in the set;
- maybe, and you'll have to ask the actual set interface for a precise answer.

And the whole point is that it gives you this answer *very quickly*, while the filter's memory footprint stays super light.

How does it work?

The filter is composed of B bits and H (independent) hash functions, each of which takes an object as input and returns an integer between 0 and B-1.
Initially all B bits are set to 0.
When an object is added to the set, we compute its H hashes, each of which gives a bit position that gets set to 1.

Now when you want to know whether some object belongs to the set, compute its H hashes and check whether all the corresponding bits are 1.
If not, then this object is definitely not in the set. Otherwise, well... we don't know, so we pass the query on to the actual set interface!

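To make this concrete, here's a minimal Python sketch of the mechanism above. It's just my illustration, not a reference implementation: the names `BloomFilter`, `might_contain` and `contains` are made up, and simulating the H independent hash functions by salting SHA-256 with the hash index is an assumption that is good enough for a demo. A plain Python `set` stands in for the "actual set interface", which in real life would be the slow thing you're trying not to hit (a disk, a database, a network call...).

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter with B bits and H hash functions (names from the post)."""

    def __init__(self, num_bits: int, num_hashes: int) -> None:
        self.B = num_bits
        self.H = num_hashes
        # B bits packed 8 per byte; a fresh bytearray is zeroed, so all bits start at 0.
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item: str):
        # Assumption: the H "independent" hash functions are simulated by salting
        # SHA-256 with the hash index i. Each one yields a position in [0, B-1].
        for i in range(self.H):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.B

    def add(self, item: str) -> None:
        # Set the bit at each of the item's H positions to 1.
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item: str) -> bool:
        # False: definitely not in the set. True: maybe (could be a false positive).
        return all(
            self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item)
        )


def contains(item: str, bloom: BloomFilter, real_set: set) -> bool:
    """Ask the filter first, and only fall back to the real set on a 'maybe'."""
    if not bloom.might_contain(item):
        return False  # fast path: the filter says "definitely not"
    return item in real_set  # "maybe": the real set gives the precise answer


if __name__ == "__main__":
    fruits = {"apple", "banana", "cherry"}
    bloom = BloomFilter(num_bits=1024, num_hashes=3)
    for fruit in fruits:
        bloom.add(fruit)

    print(contains("banana", bloom, fruits))  # True (a Bloom filter has no false negatives)
    print(contains("durian", bloom, fruits))  # False, whatever the filter says
```

Note that `contains` can never give a wrong answer: the filter only ever saves a trip to the real set, it never changes the result. With 1024 bits and only 3 items, the "durian" query will almost certainly stop at the filter, and even a false positive there would still be caught by the set.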
Since the hash functions are independent, any object gets mapped to H "random" bit positions, so as long as the filter isn't too full, it's very unlikely that an object that is **not** in the set actually passes the filter test.
And that's where the true beauty of Bloom filters lies: since the model is so simple, we can actually quantify this probability (with actual math and calculus, yes!) as a function of B, H, and the number of items in the set.

I'm not gonna do the maths here, mainly because I haven't figured out yet how to make math expressions work with Jekyll, but they're easy to find online.

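For the curious, the usual approximation fits in a few lines of Python, no LaTeX required. This is the standard textbook estimate rather than something derived here: it assumes the H hash functions behave like independent, uniformly random functions over the B bits, and `n` (the number of items inserted) is a name I'm introducing for this sketch. The helper `best_num_hashes` applies the common rule of thumb H ≈ (B/n) ln 2 for the number of hash functions that minimizes the false-positive rate.

```python
import math


def false_positive_rate(B: int, H: int, n: int) -> float:
    """Approximate probability that an item NOT in the set passes the filter.

    Assumes the H hash functions are independent and uniform over B bits,
    and that n items have been inserted.
    """
    # After n insertions (n * H bit settings), a given bit is still 0 with
    # probability (1 - 1/B) ** (n * H), which is approximately exp(-H * n / B).
    prob_bit_still_zero = math.exp(-H * n / B)
    # A non-member passes only if all H of its bits happen to be 1.
    return (1.0 - prob_bit_still_zero) ** H


def best_num_hashes(B: int, n: int) -> int:
    """Number of hash functions that roughly minimizes the rate: (B / n) * ln 2."""
    return max(1, round(B / n * math.log(2)))


if __name__ == "__main__":
    B, n = 10_000, 1_000  # 10 bits per stored item
    H = best_num_hashes(B, n)
    print(H)  # 7
    print(f"{false_positive_rate(B, H, n):.2%}")  # 0.82%
```

So with 10 bits per item and 7 hash functions, roughly 1 query in 120 for an absent item would still fall through to the actual set.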
