perf(l1): skip non-matching blocks in eth_getLogs via header bloom#6813
Conversation
Lines of code reportTotal lines added: Detailed view |
Benchmark: measured on mainnet-10 (warm, synced)Ran this branch on a synced mainnet node ( Single-call latency (warm)
Under concurrent load — 100-block range (the bigger win)
The header-bloom skip roughly doubles single-call latency and ~20×'s sustained throughput under load for small ranges — it no longer collapses at 100 rps. Where it's bounded
No regressionsOther endpoints unchanged (this PR only touches Caveats
Verdict: solid, low-risk win for small/sparse |
🤖 Kimi Code ReviewThis PR introduces a bloom filter optimization for RPC log filtering that correctly skips blocks provably lacking matching logs. The implementation is sound and well-tested. Code Quality & Correctness The bloom filter check logic is correct:
Minor Suggestions
topics.iter().all(|topic_filter| {
match topic_filter {
TopicFilter::Topic(None) | TopicFilter::Topics(sub_topics) if sub_topics.is_empty() => true,
TopicFilter::Topic(Some(topic)) => {
bloom.contains_input(BloomInput::Raw(topic.as_bytes()))
}
TopicFilter::Topics(sub_topics) => sub_topics.iter().flatten().any(|topic| {
bloom.contains_input(BloomInput::Raw(topic.as_bytes()))
}),
}
})
Testing The test coverage is comprehensive:
Security Assessment No vulnerabilities introduced. This is a read-only optimization that cannot affect consensus:
Performance The optimization is effective: for queries with specific address/topic constraints on sparse event signatures, this avoids expensive disk I/O for block bodies and receipts in blocks that definitely don't contain relevant logs. Verdict: LGTM. The bloom filter logic correctly implements the necessary-condition check as described in the comments. Automated review by Kimi (Moonshot AI) · kimi-k2.5 · custom prompt |
🤖 Codex Code Review
Aside from that, the optimization direction looks good: using the header bloom to skip blocks before loading bodies/receipts is the right place to reduce I couldn’t run the Rust tests in this environment because Automated review by OpenAI Codex · gpt-5.4 · custom prompt |
🤖 Claude Code ReviewHere is my review of PR #6813: PR #6813 —
|
Greptile SummaryThis PR adds a bloom-filter precheck to
Confidence Score: 3/5Not safe to merge as-is: the bloom prefilter can silently drop logs for clients that pass a topics array containing both concrete hashes and null wildcards in the same position. The crates/networking/rpc/eth/logs.rs — specifically the
|
| Filename | Overview |
|---|---|
| crates/networking/rpc/eth/logs.rs | Adds bloom prefilter for eth_getLogs; contains a correctness bug where Topics entries with mixed Some/None values lose the None wildcard semantics after flatten(), causing valid logs to be silently dropped. |
Flowchart
%%{init: {'theme': 'neutral'}}%%
flowchart TD
A[for block_num in from..=to] --> B[fetch block_header]
B --> C{block_bloom_matches?}
C -- false / provably no match --> D[continue / skip block]
C -- true / may match --> E[fetch block_body]
E --> F[for each tx: fetch receipt]
F --> G{receipt.succeeded?}
G -- no --> H[next tx]
G -- yes --> I[for each log in receipt]
I --> J{address_filter matches?}
J -- no --> K[skip log]
J -- yes --> L[accumulate into logs]
L --> M[end of block loop]
M --> N{topics filter set?}
N -- no --> O[return logs]
N -- yes --> P[filter logs by topic positions]
P --> O
Prompt To Fix All With AI
Fix the following 1 code review issue. Work through them one at a time, proposing concise fixes.
---
### Issue 1 of 1
crates/networking/rpc/eth/logs.rs:265-275
**`Topics` wildcard `None` entries incorrectly dropped by `flatten()`**
`sub_topics.iter().flatten().collect()` silently discards every `None` entry in a `Topics` list. Per the Ethereum JSON-RPC spec, `null` inside a topics array is a position-level wildcard: `[topic_1, null]` means "topic is `topic_1` OR anything", which resolves to a full wildcard for that position. The exact filter (lines 222–228) handles this correctly via `is_none_or` — if any element in `sub_topics` is `None`, the position passes unconditionally.
But the bloom check, after `flatten()`, sees `[topic_1]` instead of a wildcard and requires `topic_1` to be present in the bloom. A block whose logs only emit `topic_2` will be skipped even though the exact filter would have included them. This makes the prefilter a **false-negative filter** (incorrect results), not just a false-positive filter (harmless extra work). Any caller sending `topics: [[topic_1, null]]` will silently lose matching logs.
```suggestion
let allowed: Vec<&H256> = match topic_filter {
TopicFilter::Topic(topic) => topic.iter().collect(),
// If any element is None it's a wildcard for this position; skip the bloom check.
TopicFilter::Topics(sub_topics) => {
if sub_topics.iter().any(|t| t.is_none()) {
return true;
}
sub_topics.iter().flatten().collect()
}
};
// An empty set of allowed topics is a wildcard for this position.
allowed.is_empty()
|| allowed
.iter()
.any(|topic| bloom.contains_input(BloomInput::Raw(topic.as_bytes())))
```
Reviews (1): Last reviewed commit: "perf(l1,rpc): skip non-matching blocks i..." | Re-trigger Greptile
| topics.iter().all(|topic_filter| { | ||
| let allowed: Vec<&H256> = match topic_filter { | ||
| TopicFilter::Topic(topic) => topic.iter().collect(), | ||
| TopicFilter::Topics(sub_topics) => sub_topics.iter().flatten().collect(), | ||
| }; | ||
| // An empty set of allowed topics is a wildcard for this position. | ||
| allowed.is_empty() | ||
| || allowed | ||
| .iter() | ||
| .any(|topic| bloom.contains_input(BloomInput::Raw(topic.as_bytes()))) | ||
| }) |
There was a problem hiding this comment.
Topics wildcard None entries incorrectly dropped by flatten()
sub_topics.iter().flatten().collect() silently discards every None entry in a Topics list. Per the Ethereum JSON-RPC spec, null inside a topics array is a position-level wildcard: [topic_1, null] means "topic is topic_1 OR anything", which resolves to a full wildcard for that position. The exact filter (lines 222–228) handles this correctly via is_none_or — if any element in sub_topics is None, the position passes unconditionally.
But the bloom check, after flatten(), sees [topic_1] instead of a wildcard and requires topic_1 to be present in the bloom. A block whose logs only emit topic_2 will be skipped even though the exact filter would have included them. This makes the prefilter a false-negative filter (incorrect results), not just a false-positive filter (harmless extra work). Any caller sending topics: [[topic_1, null]] will silently lose matching logs.
| topics.iter().all(|topic_filter| { | |
| let allowed: Vec<&H256> = match topic_filter { | |
| TopicFilter::Topic(topic) => topic.iter().collect(), | |
| TopicFilter::Topics(sub_topics) => sub_topics.iter().flatten().collect(), | |
| }; | |
| // An empty set of allowed topics is a wildcard for this position. | |
| allowed.is_empty() | |
| || allowed | |
| .iter() | |
| .any(|topic| bloom.contains_input(BloomInput::Raw(topic.as_bytes()))) | |
| }) | |
| let allowed: Vec<&H256> = match topic_filter { | |
| TopicFilter::Topic(topic) => topic.iter().collect(), | |
| // If any element is None it's a wildcard for this position; skip the bloom check. | |
| TopicFilter::Topics(sub_topics) => { | |
| if sub_topics.iter().any(|t| t.is_none()) { | |
| return true; | |
| } | |
| sub_topics.iter().flatten().collect() | |
| } | |
| }; | |
| // An empty set of allowed topics is a wildcard for this position. | |
| allowed.is_empty() | |
| || allowed | |
| .iter() | |
| .any(|topic| bloom.contains_input(BloomInput::Raw(topic.as_bytes()))) |
Prompt To Fix With AI
This is a comment left during a code review.
Path: crates/networking/rpc/eth/logs.rs
Line: 265-275
Comment:
**`Topics` wildcard `None` entries incorrectly dropped by `flatten()`**
`sub_topics.iter().flatten().collect()` silently discards every `None` entry in a `Topics` list. Per the Ethereum JSON-RPC spec, `null` inside a topics array is a position-level wildcard: `[topic_1, null]` means "topic is `topic_1` OR anything", which resolves to a full wildcard for that position. The exact filter (lines 222–228) handles this correctly via `is_none_or` — if any element in `sub_topics` is `None`, the position passes unconditionally.
But the bloom check, after `flatten()`, sees `[topic_1]` instead of a wildcard and requires `topic_1` to be present in the bloom. A block whose logs only emit `topic_2` will be skipped even though the exact filter would have included them. This makes the prefilter a **false-negative filter** (incorrect results), not just a false-positive filter (harmless extra work). Any caller sending `topics: [[topic_1, null]]` will silently lose matching logs.
```suggestion
let allowed: Vec<&H256> = match topic_filter {
TopicFilter::Topic(topic) => topic.iter().collect(),
// If any element is None it's a wildcard for this position; skip the bloom check.
TopicFilter::Topics(sub_topics) => {
if sub_topics.iter().any(|t| t.is_none()) {
return true;
}
sub_topics.iter().flatten().collect()
}
};
// An empty set of allowed topics is a wildcard for this position.
allowed.is_empty()
|| allowed
.iter()
.any(|topic| bloom.contains_input(BloomInput::Raw(topic.as_bytes())))
```
How can I resolve this? If you propose a fix, please make it concise.|
Pushed
The |
|
Note: the failing Run benchmark against base branch check is not caused by this PR — it's a pre-existing |
Benchmark Block Execution Results Comparison Against Main
|
Update: 10,000-block range now measuredThe earlier results left the 10k-block range pending (the freshly-synced node hadn't accumulated enough history). It has now, so here's the complete single-call
The header-bloom skip is a consistent ~1.5–2.2× win that grows with range size. But because it stays O(blocks-in-range) on header reads, the 10k range is still ~32 s — ~30–150× slower than the indexed clients. So this PR is a solid constant-factor improvement (and removes the under-load collapse for small ranges), while large-range |
edg-l
left a comment
There was a problem hiding this comment.
maybe tests should go to the test/ dir
| } | ||
|
|
||
| topics.iter().all(|topic_filter| { | ||
| let allowed: Vec<&H256> = match topic_filter { |
There was a problem hiding this comment.
Can't we avoid the collects here and perform the checks against an iterator?
There was a problem hiding this comment.
addressed in ea0b2b089: the topic check no longer collects into a Vec<&H256>; it matches each TopicFilter and short-circuits directly over iterators (any(...)), preserving the None-wildcard semantics and the empty-list wildcard. Behavior-identical — existing bloom_match_* tests still pass.
We had an intention to move tests to their own folder, but it seems we ended up keeping unit tests inline. |
iovoid
left a comment
There was a problem hiding this comment.
This could be made more efficient by constructing the filter once and then calling Bloom::contains_bloom
This is what Bloom::contains_input does internally.
Good catch, agreed. Since this PR is already approved and is the base of the getLogs stack (#6852 and the inverted-address-index work build on top of it), I've split it into a fast-follow — #6893 — to keep the approvals and avoid cascading a rebase through the stack. Will pick it up once the stack lands. |
…ookups (lambdaclass#6852) ## Motivation `eth_getLogs` is far slower in ethrex than in geth/reth/nethermind. Profiling-by-measurement on a synced mainnet node pinned the dominant cost to **per-transaction receipt access**: for each candidate block the handler fetched receipts with `get_receipt(block, tx_index)` once per transaction — and each of those also re-resolved the canonical block hash — i.e. ~2N reads for an N-transaction block (hundreds on mainnet). None of the reference clients do this; they read a block's receipts in bulk. ## Description In `eth_getLogs`'s per-block loop, replace the per-transaction `get_receipt` calls with a single `get_receipts_for_block(block_hash)` bulk read (one prefix scan over the block's receipts), then pair receipts with the block's transactions by index. Behaviour is unchanged — the existing `get_logs` tests pass — it's purely a read-pattern fix. ### Measured impact (mainnet-10, same node/datadir, `eth_getLogs` single-call, LUSD `Transfer`) | range | before (this base) | this PR | speedup | |---|---|---|---| | 100-block | 415 ms | **80 ms** | ~5.2× | | 1,000-block | 3,529 ms | **613 ms** | ~5.8× | | 10,000-block | 31,857 ms | **4,272 ms** | ~7.5× | (~16× vs the pre-bloom-prefilter baseline.) Per-block cost dropped from ~6 ms to ~0.8 ms. The remaining gap to geth (~20–30×) is **candidate-block narrowing** (the 2048-bit header bloom is saturated for ubiquitous signatures, so ~every block is a candidate) — addressed separately by a real inverted log index (lambdaclass#6785); this PR makes that stage-2 cheap so an index can sit on top. ## Notes Stacked on `perf/getlogs-index` (lambdaclass#6813). RPC-only change; uses the existing `get_receipts_for_block` store method, no storage/schema change. ## Checklist - [x] RPC-only; no `Store` schema change. --------- Co-authored-by: Ivan Litteri <67517699+ilitteri@users.noreply.github.com>
Motivation
eth_getLogsiterates every block in[fromBlock, toBlock], loads each block's body and every receipt, and filters logs in memory — an O(blocks × txs) full receipt scan whose cost is independent of how many logs actually match (~6 ms/block). The cross-client benchmark measured ethrex at 175–340× slower than geth on this top-10 traffic method.Every block header already carries a
logs_bloomover exactly the(address, topic)pairs we filter on, but the endpoint never consulted it.Description
First, lowest-cost step toward the log index in #6785: use the header bloom as a prefilter so blocks that provably can't contain a matching log are skipped without loading their body or receipts.
fetch_logs_with_filter, fetch the header first and runblock_bloom_matchesagainstheader.logs_bloombefore touching the body/receipts; skip the block on a miss.block_bloom_matchesis a necessary-condition check viaBloom::contains_input(which keccak-hashes its input, mirroring howbloom_from_logsbuilds the header bloom): requires at least one requested address present (if any) AND, for each constrained topic position, at least one allowed topic present. Wildcards impose no constraint. Bloom false positives are harmless — exact filtering still runs on non-skipped blocks, so results are unchanged.Zero storage cost, zero write-path cost, no migration — the bloom is already in the header. For sparse queries over large ranges (the benchmark's worst case) this skips nearly every block. A transposed/sectioned bloom index (geth-style bloombits) for dense or very large ranges is tracked as a follow-up in #6785 pending re-benchmark.
Part of #6785.
Checklist
Store;STORE_SCHEMA_VERSIONunaffected.