Bloom Filters

Introduction

A Bloom Filter is a space-efficient probabilistic data structure that tests whether an element is a member of a set. Bloom filters are designed to be compact and fast, with a trade-off of allowing some false positives but never false negatives. This implementation extends the basic concept with a Cascade Filter system that maintains multiple Bloom filters in sequence to handle large datasets efficiently.

Properties and Applications

Space Efficiency: Uses minimal memory compared to traditional data structures
Fast Lookups: O(k) lookup time, where k is the number of hash functions
No False Negatives: If a Bloom filter indicates an element is not in the set, it definitely isn't
Probabilistic: May return false positives (indicating an element is in the set when it isn't)
Cascading Expansion: Automatically creates new filters when current ones become saturated

Basic Structure and Formation

In a Bloom filter, when an element is added to the set, it is hashed by multiple hash functions. The resulting hash values determine which bits in a bit array are set to 1. To test if an element is in the set, it is hashed by the same functions, and if all corresponding bits are 1, the element is considered to be in the set (with some probability of a false positive).

Visualizing a Bloom Filter

Let's visualize a simple Bloom filter with 8 bits and 2 hash functions:

Initial state: [0, 0, 0, 0, 0, 0, 0, 0]

Adding element "A":
- Hash1("A") = 1, Hash2("A") = 6
- Set bits 1 and 6 to 1
- State: [0, 1, 0, 0, 0, 0, 1, 0]

Adding element "B":
- Hash1("B") = 3, Hash2("B") = 7
- Set bits 3 and 7 to 1
- State: [0, 1, 0, 1, 0, 0, 1, 1]

Testing element "C":
- Hash1("C") = 1, Hash2("C") = 7
- Check bits 1 and 7 (both are 1)
- Result: "C" may be in the set (false positive!)

Testing element "D":
- Hash1("D") = 2, Hash2("D") = 5
- Check bits 2 and 5 (both are 0)
- Result: "D" is definitely not in the set

Implementation key concepts

Bit Array

The foundation of a Bloom filter is its bit array, which in this implementation is a 256-bit array represented as a u256 value. Each bit position corresponds to a possible hash output.

Hash Functions

This implementation uses two cryptographic hash functions:

SHA3-256: A secure hash algorithm that produces a 256-bit hash value
BLAKE2b-256: Another cryptographic hash function that produces a 256-bit output

Using multiple hash functions reduces the probability of false positives.

Filter Fullness

A Bloom filter becomes less efficient as more elements are added (increasing the false positive rate). This implementation tracks filter fullness and considers a filter "full" when half of its bits (128 out of 256) are set to 1.

Cascade Filter

The Cascade Filter structure extends basic Bloom filters by maintaining a vector of filters. When the current active filter becomes full, a new empty filter is created and becomes the active one.

Implementation Details

Structures

BloomFilter

public struct BloomFilter has store {
    bits: u256,
    is_full: bool
}

The BloomFilter structure maintains:

A 256-bit array (bits) to store the set membership information
A boolean flag indicating whether the filter is considered full

CascadeFilter

public struct CascadeFilter has key, store {
    id: UID,
    filters: vector<BloomFilter>,
    writing_filter: u64
}

The CascadeFilter structure maintains:

A unique identifier
A vector of Bloom filters
The index of the current active filter for writing

Operations

Adding Data

When adding data to the Cascade Filter:

The data is hashed using both SHA3-256 and BLAKE2b-256
The hash values are modulo mapped to positions in the bit array
The corresponding bits are set to 1
If the current filter becomes full, a new filter is created and added to the cascade

Verifying Data Exclusion

To verify if data is not in the Cascade Filter:

The data is hashed using both SHA3-256 and BLAKE2b-256, just as during insertion
For each Bloom filter in the cascade:

The hash values are checked against the corresponding bit positions
If any bit position is 0 for either hash function, the data is definitely not in that filter
If both bit positions are 1, the data may be in that filter (could be a false positive)

Only if all filters in the cascade indicate possible presence (all relevant bits are 1) will the verification return false
The verification returns true only when it can guarantee the data is not present in any filter

This verification process leverages the fundamental property of Bloom filters: they can have false positives but never false negatives. By examining each filter in the cascade sequentially, the system can definitively determine if an element is absent from the entire collection. The implementation checks filters in order, stopping as soon as any filter indicates possible presence. This optimization improves performance for elements that may be present in the set, especially in cascade systems with many filters.

Advantages of Cascade Filters

Scalability: As data grows, new filters are added automatically
Controlled False Positive Rate: By limiting each filter's fullness, the false positive rate remains bounded
Efficient Updates: Adding new data is fast (O(1) operation)
Memory Efficiency: Minimal memory footprint even for large datasets

Name		Name	Last commit message	Last commit date
Latest commit History 10 Commits
sources		sources
tests		tests
.gitignore		.gitignore
Move.lock		Move.lock
Move.toml		Move.toml
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

Bloom Filters

Introduction

Properties and Applications

Basic Structure and Formation

Visualizing a Bloom Filter

Implementation key concepts

Bit Array

Hash Functions

Filter Fullness

Cascade Filter

Implementation Details

Structures

BloomFilter

CascadeFilter

Operations

Adding Data

Verifying Data Exclusion

Advantages of Cascade Filters

About

Uh oh!

Releases

Packages

Languages

alilloig/bloom-filters-contracts

Folders and files

Latest commit

History

Repository files navigation

Bloom Filters

Introduction

Properties and Applications

Basic Structure and Formation

Visualizing a Bloom Filter

Implementation key concepts

Bit Array

Hash Functions

Filter Fullness

Cascade Filter

Implementation Details

Structures

BloomFilter

CascadeFilter

Operations

Adding Data

Verifying Data Exclusion

Advantages of Cascade Filters

About

Topics

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages