
perf(l1): skip non-matching blocks in eth_getLogs via header bloom #6813

Merged
ElFantasma merged 15 commits into main from perf/getlogs-index on Jun 18, 2026

@ElFantasma

Contributor

Motivation

eth_getLogs iterates every block in [fromBlock, toBlock], loads each block's body and every receipt, and filters logs in memory — an O(blocks × txs) full receipt scan whose cost is independent of how many logs actually match (~6 ms/block). The cross-client benchmark measured ethrex at 175–340× slower than geth on this top-10 traffic method.

Every block header already carries a logs_bloom built from exactly the fields we filter on (each log's emitting address and each of its topics, inserted individually rather than as pairs), but the endpoint never consulted it.
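To show what "over exactly the fields we filter on" means, here is a minimal sketch of how a header bloom is accrued from a block's logs. It is an illustration under stated assumptions, not ethrex's `bloom_from_logs`: the toy `Bloom` hashes with std's `DefaultHasher` instead of keccak256 (so its bit positions do not match real headers), and `Log` is a simplified hypothetical type. It keeps Ethereum's shape of 2048 bits with 3 bits set per input.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Toy 2048-bit bloom standing in for the header `logs_bloom`.
/// ASSUMPTION: DefaultHasher replaces keccak256; only the structure is shown.
#[derive(Default, PartialEq, Debug)]
struct Bloom([u64; 32]);

impl Bloom {
    /// Three 11-bit indices per input (k = 3, m = 2048).
    fn bits(input: &[u8]) -> [usize; 3] {
        let mut hasher = DefaultHasher::new();
        input.hash(&mut hasher);
        let h = hasher.finish();
        [0u32, 16, 32].map(|shift| ((h >> shift) & 0x7ff) as usize)
    }
    fn accrue(&mut self, input: &[u8]) {
        for bit in Self::bits(input) {
            self.0[bit / 64] |= 1u64 << (bit % 64);
        }
    }
    fn contains_input(&self, input: &[u8]) -> bool {
        Self::bits(input)
            .iter()
            .all(|&bit| (self.0[bit / 64] & (1u64 << (bit % 64))) != 0)
    }
}

/// Hypothetical simplified log.
struct Log {
    address: [u8; 20],
    topics: Vec<[u8; 32]>,
}

/// Every address and every topic is inserted on its own, so the bloom records
/// neither which address emitted which topic nor at which position.
fn bloom_from_logs(logs: &[Log]) -> Bloom {
    let mut bloom = Bloom::default();
    for log in logs {
        bloom.accrue(&log.address);
        for topic in &log.topics {
            bloom.accrue(topic);
        }
    }
    bloom
}
```

Because inputs are inserted independently, a bloom check can only establish that each requested item is possibly present somewhere in the block, which is why it serves as a prefilter rather than a matcher.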

Description

First, lowest-cost step toward the log index in #6785: use the header bloom as a prefilter so blocks that provably can't contain a matching log are skipped without loading their body or receipts.

  • In fetch_logs_with_filter, fetch the header first and run block_bloom_matches against header.logs_bloom before touching the body/receipts; skip the block on a miss.
  • block_bloom_matches is a necessary-condition check via Bloom::contains_input (which keccak-hashes its input, mirroring how bloom_from_logs builds the header bloom): requires at least one requested address present (if any) AND, for each constrained topic position, at least one allowed topic present. Wildcards impose no constraint. Bloom false positives are harmless — exact filtering still runs on non-skipped blocks, so results are unchanged.
  • Added unit tests covering the matching semantics (address OR, topic OR-within / AND-across positions, wildcards, combined address+topic).
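The necessary-condition check described above can be sketched as follows. This is a self-contained illustration, not the PR's code: `Bloom` is a toy that hashes with std's `DefaultHasher` instead of keccak256, and `TopicFilter` is a simplified, assumed shape of the RPC type (the real code uses `Bloom::contains_input` with `BloomInput::Raw` over `H160`/`H256`). It encodes address OR, OR within a topic position, AND across positions, and treats `Topic(None)`, an empty `Topics` list, or a `Topics` list containing `None` as wildcards.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashSet;
use std::hash::{Hash, Hasher};

/// Toy 2048-bit bloom. ASSUMPTION: DefaultHasher stands in for keccak256.
#[derive(Default)]
struct Bloom([u64; 32]);

impl Bloom {
    fn bits(input: &[u8]) -> [usize; 3] {
        let mut hasher = DefaultHasher::new();
        input.hash(&mut hasher);
        let h = hasher.finish();
        [0u32, 16, 32].map(|shift| ((h >> shift) & 0x7ff) as usize)
    }
    fn accrue(&mut self, input: &[u8]) {
        for bit in Self::bits(input) {
            self.0[bit / 64] |= 1u64 << (bit % 64);
        }
    }
    fn contains_input(&self, input: &[u8]) -> bool {
        Self::bits(input)
            .iter()
            .all(|&bit| (self.0[bit / 64] & (1u64 << (bit % 64))) != 0)
    }
}

/// Hypothetical simplified topic filter for one position.
enum TopicFilter {
    Topic(Option<[u8; 32]>),
    Topics(Vec<Option<[u8; 32]>>),
}

/// Returns false only when the block provably has no matching log.
fn block_bloom_matches(bloom: &Bloom, addresses: &HashSet<[u8; 20]>, topics: &[TopicFilter]) -> bool {
    // Address OR: at least one requested address must be possibly present.
    let address_ok = addresses.is_empty() || addresses.iter().any(|a| bloom.contains_input(a));
    // Topics: AND across positions, OR within a position.
    address_ok
        && topics.iter().all(|position| match position {
            TopicFilter::Topic(None) => true,
            TopicFilter::Topic(Some(t)) => bloom.contains_input(t),
            // Empty list or any None alternative: the position is unconstrained.
            TopicFilter::Topics(alts) if alts.is_empty() || alts.iter().any(Option::is_none) => true,
            TopicFilter::Topics(alts) => alts.iter().flatten().any(|t| bloom.contains_input(t)),
        })
}
```

Negative results are only trustworthy against items that were never accrued; a positive result can be a false positive, which is harmless because the exact filter still runs on every non-skipped block.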

Zero storage cost, zero write-path cost, no migration — the bloom is already in the header. For sparse queries over large ranges (the benchmark's worst case) this skips nearly every block. A transposed/sectioned bloom index (geth-style bloombits) for dense or very large ranges is tracked as a follow-up in #6785 pending re-benchmark.

Part of #6785.

Checklist

  • No changes to the Store; STORE_SCHEMA_VERSION unaffected.

@github-actions github-actions Bot added the L1 Ethereum client label Jun 8, 2026
@github-actions github-actions Bot added the performance Block execution throughput and performance in general label Jun 8, 2026
@github-actions

github-actions Bot commented Jun 8, 2026


Lines of code report

Total lines added: 145
Total lines removed: 0
Total lines changed: 145

Detailed view
+------------------------------------------+-------+------+
| File                                     | Lines | Diff |
+------------------------------------------+-------+------+
| ethrex/crates/networking/rpc/eth/logs.rs | 380   | +145 |
+------------------------------------------+-------+------+

@ElFantasma ElFantasma changed the title perf(l1,rpc): skip non-matching blocks in eth_getLogs via header bloom perf(l1): skip non-matching blocks in eth_getLogs via header bloom Jun 8, 2026
@ElFantasma

Contributor Author

Benchmark: measured on mainnet-10 (warm, synced)

Ran this branch on a synced mainnet node (vegeta, recent-data workload, every call pre-validated to return a real result). Comparing eth_getLogs on LUSD Transfer ranges vs the pre-fix baseline.

Single-call latency (warm)

| range | pre-fix | this PR | speedup |
|---|---|---|---|
| 100-block (median) | ~623 ms | ~306 ms | ~2× |
| 100-block (no matches) | ~618 ms | ~120 ms | ~5× |
| 1,000-block (median) | ~6,131 ms | ~3,273 ms | ~1.9× |

Under concurrent load — 100-block range (the bigger win)

| rate | pre-fix | this PR |
|---|---|---|
| 10 rps | 623 ms, 100% success | 306 ms, 100% success |
| 100 rps | collapses: 10% success, ~1 rps | 95% success, ~22 rps |
| 1000 rps | 1% success, ~0 rps | 92% success, ~21 rps |

The header-bloom skip roughly halves single-call latency for small ranges and raises sustained throughput under load about 20× (from ~1 rps to ~22 rps at a 100 rps offered load), so it no longer collapses at 100 rps.

Where it's bounded

  • Still O(blocks-in-range): it reads every block's header for the bloom (~1.2 ms/block warm) and still loads receipts for matching blocks. So large ranges still saturate: the 1,000-block range times out under any concurrency (5% success at 10 rps, 0% above; single-call ~3.3 s).
  • This doesn't reach indexed clients (geth ~2.5/35 ms, reth ~4.8/49 ms, nethermind ~15/97 ms for 100/1000-block). It's a strong complement to a real log index (perf(l1,rpc): add a background-built inverted log index (address/topic → block) for eth_getLogs #6785), not a replacement.

No regressions

Other endpoints unchanged (this PR only touches logs.rs): eth_call 0.09 ms, eth_getBalance 0.06 ms, eth_getBlockByNumber 0.92 ms, eth_getTransactionReceipt 1.31 ms (p50 @1000 rps), all within noise of the baseline.

Caveats

  • Measured on a 5k-block recent window (freshly-synced node hadn't accumulated more); the pre-fix side full-scans regardless of window, so the comparison holds for the ranges tested.
  • The 10,000-block range wasn't measurable yet (needs more accumulated history) — that's where the per-block-header-read ceiling would show most.

Verdict: solid, low-risk win for small/sparse eth_getLogs (and removes the under-load collapse). Large ranges still need the index work in #6785.

@ElFantasma ElFantasma marked this pull request as ready for review June 9, 2026 15:50
@ElFantasma ElFantasma requested a review from a team as a code owner June 9, 2026 15:50
@ethrex-project-sync ethrex-project-sync Bot moved this to In Review in ethrex_l1 Jun 9, 2026
@github-actions

github-actions Bot commented Jun 9, 2026


🤖 Kimi Code Review

This PR introduces a bloom filter optimization for RPC log filtering that correctly skips blocks provably lacking matching logs. The implementation is sound and well-tested.

Code Quality & Correctness

The bloom filter check logic is correct:

  • Uses the block header's logs_bloom to test for necessary conditions (address presence and topic presence)
  • Correctly handles the semantics: OR within topic positions, AND across positions
  • No false negatives: if an address/topic was accrued into the bloom, contains_input is guaranteed to return true, so a false result proves absence (a true result may still be a false positive)

Minor Suggestions

  1. Avoid unnecessary Vec allocation (line ~264-272): Instead of collecting into a Vec<&H256>, you could use iterator chains directly to avoid the heap allocation:
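topics.iter().all(|topic_filter| {
    match topic_filter {
        // Unconstrained positions impose no bloom requirement.
        TopicFilter::Topic(None) => true,
        TopicFilter::Topic(Some(topic)) => {
            bloom.contains_input(BloomInput::Raw(topic.as_bytes()))
        }
        // An empty list, or one containing a None wildcard, matches anything.
        // (An or-pattern cannot mix Topic(None) with a binding of sub_topics.)
        TopicFilter::Topics(sub_topics)
            if sub_topics.is_empty() || sub_topics.iter().any(Option::is_none) =>
        {
            true
        }
        TopicFilter::Topics(sub_topics) => sub_topics.iter().flatten().any(|topic| {
            bloom.contains_input(BloomInput::Raw(topic.as_bytes()))
        }),
    }
})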
  2. Import consistency (line 13): Verify that ethereum_types is the intended source for Bloom rather than ethrex_common (which already provides H160/H256). If ethrex_common re-exports Bloom, prefer that to minimize external crate dependencies in this module.

Testing

The test coverage is comprehensive:

  • Empty filters (wildcards)
  • Single and multiple addresses (OR semantics)
  • Topic positions with OR and AND semantics
  • Combined address + topic requirements

Security Assessment

No vulnerabilities introduced. This is a read-only optimization that cannot affect consensus:

  • False positives in bloom filters only lead to processing blocks unnecessarily, never to incorrect results
  • The actual log filtering still occurs downstream with exact matching

Performance

The optimization is effective: for queries with specific address/topic constraints on sparse event signatures, this avoids expensive disk I/O for block bodies and receipts in blocks that definitely don't contain relevant logs.

Verdict: LGTM. The bloom filter logic correctly implements the necessary-condition check as described in the comments.


Automated review by Kimi (Moonshot AI) · kimi-k2.5 · custom prompt

@github-actions

github-actions Bot commented Jun 9, 2026


🤖 Codex Code Review

  1. crates/networking/rpc/eth/logs.rs:266-274 introduces a false-negative in eth_getLogs for topic arrays that contain null. The exact matcher below (:222-227) treats TopicFilter::Topics(vec![None, Some(t)]) as a wildcard for that position, because any(|st| st.is_none_or(...)) succeeds on the None. The new bloom prefilter flattens away None, then requires t to be present in the block bloom. That means a request like topics: [[null, T]] can now skip blocks that should match, returning incomplete results. The bloom shortcut needs to preserve the wildcard semantics: if a Topics(...) entry contains any None, that position must not constrain the bloom at all.

  2. crates/networking/rpc/eth/logs.rs:397-453 is missing a regression test for the case above. The added tests cover Topic(None) and Topics(vec![]) as wildcards, but not Topics(vec![None, Some(topic)]), which is the case that currently breaks. A focused unit test here would catch the false-negative immediately.
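The exact matcher referenced above (`:222-227`) is not reproduced on this page, so the following is a hypothetical reconstruction for illustration only, with simplified types. It shows why `topics: [[null, T]]` must accept a log whose topic at that position is anything: `map_or(true, ..)` (the `is_none_or` idiom) lets a `None` alternative pass. The rule that a filter with more positions than the log has topics never matches is an assumption of this sketch.

```rust
/// Hypothetical simplified topic filter for one position.
enum TopicFilter {
    Topic(Option<[u8; 32]>),
    Topics(Vec<Option<[u8; 32]>>),
}

/// Exact per-log check: AND across positions, OR within a position.
fn log_matches_topics(log_topics: &[[u8; 32]], filter: &[TopicFilter]) -> bool {
    // ASSUMPTION: more filter positions than log topics never matches.
    if filter.len() > log_topics.len() {
        return false;
    }
    filter.iter().zip(log_topics).all(|(position, actual)| match position {
        TopicFilter::Topic(None) => true,
        TopicFilter::Topic(Some(t)) => t == actual,
        TopicFilter::Topics(alts) if alts.is_empty() => true,
        // A None alternative accepts any topic, like `is_none_or`.
        TopicFilter::Topics(alts) => alts
            .iter()
            .any(|alt| alt.as_ref().map_or(true, |t| t == actual)),
    })
}
```

Any bloom prefilter must be implied by this check: whenever it returns true for some log in a block, the prefilter must also return true for that block's bloom.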

Aside from that, the optimization direction looks good: using the header bloom to skip blocks before loading bodies/receipts is the right place to reduce eth_getLogs cost.

I couldn’t run the Rust tests in this environment because cargo/rustup tries to write under a read-only home directory.


Automated review by OpenAI Codex · gpt-5.4 · custom prompt

@github-actions

github-actions Bot commented Jun 9, 2026


🤖 Claude Code Review

Here is my review of PR #6813:


PR #6813: perf(l1): skip non-matching blocks in eth_getLogs via header bloom

Overview

This PR adds a bloom filter precheck to eth_getLogs that uses the per-block logs_bloom header field to skip blocks that provably cannot contain matching logs, avoiding the cost of loading the block body and all receipts. The optimization is zero-cost on the write path (bloom is already in the header) and is valid because false positives in the bloom only cause unnecessary work, never missed results — as long as the prefilter is itself a true necessary condition.

The PR description is clear and accurate. The approach is sound.


Potential Correctness Bug — TopicFilter::Topics with a None element

logs.rs:257–259

TopicFilter::Topics(sub_topics) => sub_topics.iter().flatten().collect(),

flatten() on Vec<Option<H256>> silently drops None values. If sub_topics is [Some(h1), None], the allowed set becomes [h1], and the code then requires h1 to appear in the bloom. But a None inside a Topics list means "any topic value matches at this position," making the entire position a wildcard — the block must not be skipped regardless of what's in the bloom.

This is a false-negative: a block containing a log that matches via the None wildcard (but whose non-None sibling h1 is absent from the bloom) will be incorrectly skipped, causing eth_getLogs to silently drop valid log entries.
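The loss of the wildcard is easy to reproduce in isolation. In this small sketch, `u8` stands in for `H256` (an assumption for brevity): `flatten()` turns `[Some(1), None]` into `[1]`, whereas the intended reading treats any `None` (or an empty list) as an unconstrained position.

```rust
/// What the original prefilter computed: None alternatives silently vanish.
fn flattened(alternatives: &[Option<u8>]) -> Vec<u8> {
    alternatives.iter().flatten().copied().collect()
}

/// Intended semantics: an empty list or any None makes the position a wildcard.
fn is_wildcard(alternatives: &[Option<u8>]) -> bool {
    alternatives.is_empty() || alternatives.iter().any(Option::is_none)
}
```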

The fix:

TopicFilter::Topics(sub_topics) => {
    // A None in the alternatives means "any topic" — whole position is a wildcard.
    if sub_topics.iter().any(|t| t.is_none()) {
        return true;
    }
    sub_topics.iter().flatten().collect()
}

The existing test bloom_match_wildcard_topic_ignored covers Topics(vec![]) (empty → wildcard) and Topic(None), but does not cover Topics(vec![Some(h), None]). A test like this should be added:

#[test]
fn bloom_match_topics_with_none_element_is_wildcard() {
    // None anywhere inside Topics means "any topic" for this position.
    let bloom = bloom_with(&[], &[]);  // bloom doesn't contain topic(1)
    assert!(block_bloom_matches(
        &bloom,
        &HashSet::new(),
        &[TopicFilter::Topics(vec![Some(topic(1)), None])],
    ));
}

Note: It's worth checking the actual deserializer for TopicFilter to confirm whether Topics(vec![Some(h), None]) is reachable in practice. If JSON ["0xabc...", null] cannot appear as a topic position filter from any real caller, the bug is dormant — but the fix is still cheap and defensive.


Minor Points

Reordering of header vs. body fetch

The old code fetched body first, then header. The new code fetches header first (for the bloom check), then body. For non-filtered blocks the total number of storage reads is unchanged; for filtered-out blocks the body fetch is saved. This is correct and is the intended optimization.

Error on missing header vs. missing body

Previously an error would be raised when the body was missing; now it is raised when the header is missing (failing slightly earlier). Both are RpcErr::Internal, which is appropriate for an unexpected gap in stored state.

block_header.hash() after refactor

block_header is still in scope when block_header.hash() is called later in the loop body. No issue.

Comment update (// Take the body of the block)

Correct — previously said "header" because the header fetch followed immediately; now the header is fetched earlier, and this comment correctly describes the body fetch.


Test Coverage

Tests are thorough for the common semantics:

  • Empty filter always matches
  • Address OR (any of the requested addresses present → match)
  • Topic OR-within-position, AND-across-positions
  • Topic(None) and Topics(vec![]) wildcards
  • Combined address + topic

The only missing case is Topics(vec![Some(h), None]) as noted above.


Summary

The optimization is well-designed and correctly implements the bloom prefilter as a necessary-condition check for the typical cases. The single correctness concern is the handling of None inside a TopicFilter::Topics list, which could cause silent false-negatives. Fix the flatten() branch to short-circuit on any None element, and add a corresponding test. Everything else looks good.


Automated review by Claude (Anthropic) · sonnet · custom prompt

@greptile-apps

greptile-apps Bot commented Jun 9, 2026


Greptile Summary

This PR adds a bloom-filter precheck to eth_getLogs so blocks that provably cannot contain matching logs are skipped before loading their body and receipts, addressing a significant performance gap vs. geth for sparse log queries.

  • The block_bloom_matches prefilter correctly short-circuits address checks and single-topic checks, but Topics(vec![Some(x), None]) loses its None wildcard because flatten() silently drops None entries — the exact filter treats any None in the list as a full wildcard for that position, so the two paths are inconsistent and can produce different result sets.
  • Header fetching is moved earlier in the loop (before body/receipts), which is sound since the header was already fetched for every block; the new placement enables the early-exit path.
  • Unit tests cover the main matching semantics but miss the mixed Some/None case in a Topics list.

Confidence Score: 3/5

Not safe to merge as-is: the bloom prefilter can silently drop logs for clients that pass a topics array containing both concrete hashes and null wildcards in the same position.

The Topics path in block_bloom_matches calls flatten() on a Vec<Option<H256>>, which discards None entries that represent full wildcards. The exact filter treats any None in a sub_topics list as pass-all via is_none_or, so the two code paths disagree on [Some(x), None]. Blocks whose logs match only through the wildcard arm will be skipped and their logs absent from the response — a silent data-loss regression.

crates/networking/rpc/eth/logs.rs — specifically the block_bloom_matches function and the missing test case for mixed Some/None topics lists.

Important Files Changed

| Filename | Overview |
|---|---|
| crates/networking/rpc/eth/logs.rs | Adds bloom prefilter for eth_getLogs; contains a correctness bug where Topics entries with mixed Some/None values lose the None wildcard semantics after flatten(), causing valid logs to be silently dropped. |

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[for block_num in from..=to] --> B[fetch block_header]
    B --> C{block_bloom_matches?}
    C -- false / provably no match --> D[continue / skip block]
    C -- true / may match --> E[fetch block_body]
    E --> F[for each tx: fetch receipt]
    F --> G{receipt.succeeded?}
    G -- no --> H[next tx]
    G -- yes --> I[for each log in receipt]
    I --> J{address_filter matches?}
    J -- no --> K[skip log]
    J -- yes --> L[accumulate into logs]
    L --> M[end of block loop]
    M --> N{topics filter set?}
    N -- no --> O[return logs]
    N -- yes --> P[filter logs by topic positions]
    P --> O
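The control flow in the flowchart (header first, bloom check, then body and receipts only for candidates) can be sketched with an in-memory store. Everything here is hypothetical: `MockStore`, `Header::may_match` (standing in for the result of `block_bloom_matches`), and `fetch_logs` are illustrative names, and receipts are collapsed into plain log ids.

```rust
use std::cell::Cell;

/// Hypothetical header: `may_match` stands in for the bloom check result.
struct Header {
    may_match: bool,
}

/// Hypothetical body: receipts and their logs collapsed into log ids.
struct Body {
    logs: Vec<u32>,
}

/// In-memory stand-in for the Store that counts body loads.
struct MockStore {
    headers: Vec<Header>,
    bodies: Vec<Body>,
    body_loads: Cell<usize>,
}

impl MockStore {
    fn header(&self, n: usize) -> &Header {
        &self.headers[n]
    }
    fn body(&self, n: usize) -> &Body {
        self.body_loads.set(self.body_loads.get() + 1);
        &self.bodies[n]
    }
}

/// Only candidate blocks pay for the body and the exact per-log filter.
fn fetch_logs(store: &MockStore, from: usize, to: usize, exact: impl Fn(u32) -> bool) -> Vec<u32> {
    let mut out = Vec::new();
    for n in from..=to {
        if !store.header(n).may_match {
            continue; // provably no match: body and receipts are never read
        }
        // Bloom hits can be false positives, so the exact filter still runs.
        out.extend(store.body(n).logs.iter().copied().filter(|&log| exact(log)));
    }
    out
}
```

With four blocks of which two pass the bloom (one of them a false positive), only two bodies are loaded and the result is still exact.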

Reviews (1): Last reviewed commit: "perf(l1,rpc): skip non-matching blocks i..."

Comment thread crates/networking/rpc/eth/logs.rs Outdated
Comment on lines +265 to +275
topics.iter().all(|topic_filter| {
    let allowed: Vec<&H256> = match topic_filter {
        TopicFilter::Topic(topic) => topic.iter().collect(),
        TopicFilter::Topics(sub_topics) => sub_topics.iter().flatten().collect(),
    };
    // An empty set of allowed topics is a wildcard for this position.
    allowed.is_empty()
        || allowed
            .iter()
            .any(|topic| bloom.contains_input(BloomInput::Raw(topic.as_bytes())))
})


P1 Topics wildcard None entries incorrectly dropped by flatten()

sub_topics.iter().flatten().collect() silently discards every None entry in a Topics list. Per the Ethereum JSON-RPC spec, null inside a topics array is a position-level wildcard: [topic_1, null] means "topic is topic_1 OR anything", which resolves to a full wildcard for that position. The exact filter (lines 222–228) handles this correctly via is_none_or — if any element in sub_topics is None, the position passes unconditionally.

But the bloom check, after flatten(), sees [topic_1] instead of a wildcard and requires topic_1 to be present in the bloom. A block whose logs only emit topic_2 will be skipped even though the exact filter would have included them. This makes the prefilter a false-negative filter (incorrect results), not just a false-positive filter (harmless extra work). Any caller sending topics: [[topic_1, null]] will silently lose matching logs.

Suggested change (replacing lines 265–275):
topics.iter().all(|topic_filter| {
    let allowed: Vec<&H256> = match topic_filter {
        TopicFilter::Topic(topic) => topic.iter().collect(),
        // If any element is None it's a wildcard for this position; skip the bloom check.
        TopicFilter::Topics(sub_topics) => {
            if sub_topics.iter().any(|t| t.is_none()) {
                return true;
            }
            sub_topics.iter().flatten().collect()
        }
    };
    // An empty set of allowed topics is a wildcard for this position.
    allowed.is_empty()
        || allowed
            .iter()
            .any(|topic| bloom.contains_input(BloomInput::Raw(topic.as_bytes())))
})

@ElFantasma

Contributor Author

Pushed 65bca2a0b addressing the review findings:

  • Correctness (Codex + Claude): the bloom prefilter now treats a Topics([...]) alternatives list containing any None as a wildcard for that position (returns early without constraining the bloom), instead of flatten()-ing the None away. This fixes the false-negative where topics: [[null, T]] could skip blocks that match via the wildcard and silently drop logs.
  • Regression test: added bloom_match_topics_with_none_element_is_wildcard covering Topics([Some(t), None]) against a bloom missing t (fails before the fix, passes after).
  • CHANGELOG: added a ## Perf entry (fixes the changelog check).

The Vec allocation can stay or be tidied into iterator chains per Kimi's nit — happy to fold that in if preferred. Perf is unchanged by this fix (it only adds a wildcard short-circuit; doesn't touch the constrained-topic path the benchmarks exercised).

@ElFantasma

Contributor Author

Note: the failing Run benchmark against base branch check is not caused by this PR — it's a pre-existing main breakage tracked in #6819 (the base/main binary fails to import the stale l2-1k-erc20.rlp fixture under #6766's new logs_bloom validation). This PR only touches the RPC eth_getLogs read path. Safe to disregard that check here.

@github-actions

github-actions Bot commented Jun 10, 2026


Benchmark Block Execution Results Comparison Against Main

| Command | Mean [s] | Min [s] | Max [s] | Relative |
|---|---|---|---|---|
| base | 59.481 ± 0.492 | 58.725 | 60.259 | 1.01 ± 0.01 |
| head | 58.852 ± 0.191 | 58.633 | 59.301 | 1.00 |

@ElFantasma

Contributor Author

Update: 10,000-block range now measured

The earlier results left the 10k-block range pending (the freshly-synced node hadn't accumulated enough history). It has now, so here's the complete single-call eth_getLogs picture (warm, LUSD Transfer):

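| range | pre-fix | this PR | speedup | geth | reth | nethermind |
|---|---|---|---|---|---|---|
| 100-block | 618 ms | 415 ms | ~1.5× | 2.5 ms | 4.8 ms | 15 ms |
| 1,000-block | 6,131 ms | 3,529 ms | ~1.7× | 35 ms | 49 ms | 97 ms |
| 10,000-block | 70,441 ms | 31,857 ms | ~2.2× | 204 ms | 507 ms | 1,074 ms |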

The header-bloom skip is a consistent ~1.5–2.2× win that grows with range size. But because it stays O(blocks-in-range) on header reads, the 10k range is still ~32 s — ~30–150× slower than the indexed clients. So this PR is a solid constant-factor improvement (and removes the under-load collapse for small ranges), while large-range eth_getLogs still needs a real log index (tracked in #6785).

@edg-l edg-l left a comment

Contributor


maybe tests should go to the test/ dir

Comment thread crates/networking/rpc/eth/logs.rs Outdated
}

topics.iter().all(|topic_filter| {
let allowed: Vec<&H256> = match topic_filter {

@ilitteri ilitteri Jun 16, 2026

Collaborator


Can't we avoid the collects here and perform the checks against an iterator?

Contributor Author


Addressed in ea0b2b089: the topic check no longer collects into a Vec<&H256>; it matches each TopicFilter and short-circuits directly over iterators (any(...)), preserving the None-wildcard semantics and the empty-list wildcard. Behavior is identical; the existing bloom_match_* tests still pass.

@ElFantasma ElFantasma requested a review from ilitteri June 16, 2026 20:40
@ElFantasma

Contributor Author

maybe tests should go to the test/ dir

We had intended to move tests to their own folder, but we ended up keeping unit tests inline. test/ contains mainly integration tests, whereas these are unit tests of the private block_bloom_matches helper; moving them out would mean making it (and its module path) pub solely for tests, which we'd rather not do. So the convention I'm following is: integration tests → test/, unit tests → inline next to the code.
We could move all tests to the test/ dir, but that would require a lot of refactoring across the codebase.

@iovoid iovoid left a comment

Contributor


This could be made more efficient by constructing the filter once and then calling Bloom::contains_bloom

This is what Bloom::contains_input does internally.

@ElFantasma

Contributor Author

This could be made more efficient by constructing the filter once and then calling Bloom::contains_bloom

This is what Bloom::contains_input does internally.

Good catch, agreed. contains_input re-derives each input's bloom via Bloom::from(..) (keccak + bit extraction) on every call, and since block_bloom_matches runs per block, we're recomputing the same address/topic blooms once per block. For wide-range queries that skip most blocks — the case this prefilter is meant to accelerate — that's the dominant per-block cost. Precomputing the blooms once before the loop and switching to contains_bloom removes the redundant hashing.
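The suggested restructuring can be sketched as follows: derive each query input's bloom once, then test every header with a cheap bitwise subset check. This mirrors the relationship stated above (`contains_input` builds the input's bloom and then calls `contains_bloom`), but it is a toy reimplementation, not `ethereum_types` code and not the follow-up's implementation. `DefaultHasher` stands in for keccak256, and `PreparedFilter` is a hypothetical name.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

#[derive(Clone, Copy, Default)]
struct Bloom([u64; 32]);

impl Bloom {
    /// The expensive step (hash + bit extraction), done once per query input.
    /// ASSUMPTION: DefaultHasher replaces keccak256.
    fn from_input(input: &[u8]) -> Bloom {
        let mut hasher = DefaultHasher::new();
        input.hash(&mut hasher);
        let h = hasher.finish();
        let mut b = Bloom::default();
        for shift in [0u32, 16, 32] {
            let bit = ((h >> shift) & 0x7ff) as usize;
            b.0[bit / 64] |= 1u64 << (bit % 64);
        }
        b
    }
    fn accrue(&mut self, other: &Bloom) {
        for (w, o) in self.0.iter_mut().zip(other.0) {
            *w |= o;
        }
    }
    /// Cheap per-block step: every bit set in `other` is also set in `self`.
    fn contains_bloom(&self, other: &Bloom) -> bool {
        self.0.iter().zip(other.0.iter()).all(|(s, o)| (s & o) == *o)
    }
}

/// Query blooms built once before the block loop (hypothetical type).
struct PreparedFilter {
    addresses: Vec<Bloom>,
    topics: Vec<Option<Vec<Bloom>>>, // None = wildcard position
}

impl PreparedFilter {
    fn matches(&self, header_bloom: &Bloom) -> bool {
        (self.addresses.is_empty() || self.addresses.iter().any(|a| header_bloom.contains_bloom(a)))
            && self.topics.iter().all(|pos| match pos {
                None => true,
                Some(alts) => alts.iter().any(|t| header_bloom.contains_bloom(t)),
            })
    }
}
```

In a block loop, the `PreparedFilter` would be built before iterating the range, so each header costs only 32 word-wise ANDs per query input instead of a fresh hash.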

Since this PR is already approved and is the base of the getLogs stack (#6852 and the inverted-address-index work build on top of it), I've split it into a fast-follow — #6893 — to keep the approvals and avoid cascading a rebase through the stack. Will pick it up once the stack lands.

@ElFantasma ElFantasma added this pull request to the merge queue Jun 18, 2026
Merged via the queue into main with commit c3d970c Jun 18, 2026
57 checks passed
@ElFantasma ElFantasma deleted the perf/getlogs-index branch June 18, 2026 19:02
@github-project-automation github-project-automation Bot moved this from Todo to Done in ethrex_performance Jun 18, 2026
@github-project-automation github-project-automation Bot moved this from In Review to Done in ethrex_l1 Jun 18, 2026
akshay-ap pushed a commit to akshay-ap/ethrex that referenced this pull request Jun 19, 2026
…ookups (lambdaclass#6852)

## Motivation

`eth_getLogs` is far slower in ethrex than in geth/reth/nethermind.
Profiling-by-measurement on a synced mainnet node pinned the dominant
cost to **per-transaction receipt access**: for each candidate block the
handler fetched receipts with `get_receipt(block, tx_index)` once per
transaction — and each of those also re-resolved the canonical block
hash — i.e. ~2N reads for an N-transaction block (hundreds on mainnet).
None of the reference clients do this; they read a block's receipts in
bulk.

## Description

In `eth_getLogs`'s per-block loop, replace the per-transaction
`get_receipt` calls with a single `get_receipts_for_block(block_hash)`
bulk read (one prefix scan over the block's receipts), then pair
receipts with the block's transactions by index. Behaviour is unchanged
— the existing `get_logs` tests pass — it's purely a read-pattern fix.
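The read-pattern change can be sketched with a store that counts reads. The method names echo `get_receipt` and `get_receipts_for_block` from the description, but the signatures, the `canonical_hash` helper, and the types are assumptions for illustration only. Per transaction, the old pattern pays a hash lookup plus a receipt read (2N reads); the bulk pattern pays two reads per block and pairs receipts with transactions by index.

```rust
use std::cell::Cell;

#[derive(Clone)]
struct Receipt {
    succeeded: bool,
    logs: Vec<u32>,
}

/// Hypothetical store; every method call counts as one read.
struct Store {
    receipts: Vec<Receipt>,
    reads: Cell<usize>,
}

impl Store {
    fn bump(&self) {
        self.reads.set(self.reads.get() + 1);
    }
    fn canonical_hash(&self, _block_number: u64) -> u64 {
        self.bump();
        0xb10c
    }
    fn get_receipt(&self, _block_hash: u64, tx_index: usize) -> Receipt {
        self.bump();
        self.receipts[tx_index].clone()
    }
    fn get_receipts_for_block(&self, _block_hash: u64) -> Vec<Receipt> {
        self.bump();
        self.receipts.clone()
    }
}

/// Old pattern: hash re-resolved and one receipt read per transaction.
fn logs_per_tx(store: &Store, block: u64, tx_hashes: &[u64]) -> Vec<(u64, u32)> {
    let mut out = Vec::new();
    for (i, &tx) in tx_hashes.iter().enumerate() {
        let block_hash = store.canonical_hash(block);
        let r = store.get_receipt(block_hash, i);
        if r.succeeded {
            out.extend(r.logs.into_iter().map(|l| (tx, l)));
        }
    }
    out
}

/// New pattern: one hash lookup, one bulk read, receipts zipped with txs by index.
fn logs_bulk(store: &Store, block: u64, tx_hashes: &[u64]) -> Vec<(u64, u32)> {
    let block_hash = store.canonical_hash(block);
    let receipts = store.get_receipts_for_block(block_hash);
    tx_hashes
        .iter()
        .zip(receipts)
        .filter(|(_, r)| r.succeeded)
        .flat_map(|(&tx, r)| r.logs.into_iter().map(move |l| (tx, l)))
        .collect()
}
```

Both functions return the same logs; only the number of store reads differs.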

### Measured impact (mainnet-10, same node/datadir, `eth_getLogs`
single-call, LUSD `Transfer`)

| range | before (this base) | this PR | speedup |
|---|---|---|---|
| 100-block | 415 ms | **80 ms** | ~5.2× |
| 1,000-block | 3,529 ms | **613 ms** | ~5.8× |
| 10,000-block | 31,857 ms | **4,272 ms** | ~7.5× |

(~16× vs the pre-bloom-prefilter baseline.) Per-block cost dropped from
~6 ms to ~0.8 ms.

The remaining gap to geth (~20–30×) is **candidate-block narrowing**
(the 2048-bit header bloom is saturated for ubiquitous signatures, so
~every block is a candidate) — addressed separately by a real inverted
log index (lambdaclass#6785); this PR makes that stage-2 cheap so an index can sit
on top.
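To make the saturation argument concrete, the standard Bloom-filter false-positive approximation can be applied to the header bloom (m = 2048 bits, k = 3 bits per input). It assumes the three bit positions are independent and uniform; n, the number of distinct addresses and topics accrued in a block, is an illustrative value here, not a measurement.

```math
p \approx \left(1 - \left(1 - \tfrac{1}{m}\right)^{kn}\right)^{k} \approx \left(1 - e^{-kn/m}\right)^{k}
```

For n = 300, kn/m ≈ 0.44 and p ≈ 0.045; for n = 1000, kn/m ≈ 1.46 and p ≈ 0.45. At the latter density, a query whose only selective term is one address would still treat nearly half of the blocks lacking that address as candidates.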

## Notes

Stacked on `perf/getlogs-index` (lambdaclass#6813). RPC-only change; uses the
existing `get_receipts_for_block` store method, no storage/schema
change.

## Checklist

- [x] RPC-only; no `Store` schema change.

---------

Co-authored-by: Ivan Litteri <67517699+ilitteri@users.noreply.github.com>

Labels

L1 Ethereum client performance Block execution throughput and performance in general

Projects

Status: Done
Status: Done


4 participants