
Extend timestamp slicing optimization to eth_getLogs range queries (fromBlock/toBlock) [Discussion Needed] #4830

@quiet-node

Description


Background

PR #4828 integrates a parallel timestamp slicing optimization for block retrieval endpoints, delivering a significant latency reduction for large blocks. The optimization is now active for:

  • eth_getBlockByHash / eth_getBlockByNumber
  • eth_getBlockReceipts
  • eth_getLogs with blockHash parameter

The algorithm dynamically calculates the slice count from the block's transaction count (block.count / MAX_LOGS_PER_SLICE), enabling effective parallelization when fetching logs and synthetic transactions from the Mirror Node.
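
For illustration, a minimal TypeScript sketch of the per-block calculation; MAX_LOGS_PER_SLICE, splitTimestampRange, and fetchLogsForSlice are assumed names, not the relay's actual identifiers:

const MAX_LOGS_PER_SLICE = 200; // assumed per-slice budget

// Derive how many parallel Mirror Node queries to issue for one block.
function calculateSliceCount(blockTxCount: number): number {
  return Math.max(1, Math.ceil(blockTxCount / MAX_LOGS_PER_SLICE));
}

// The block's timestamp range is then split into that many sub-ranges,
// each fetched concurrently, e.g.:
//   const slices = splitTimestampRange(block.timestamp, calculateSliceCount(block.count));
//   const logs = (await Promise.all(slices.map(fetchLogsForSlice))).flat();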

Problem

eth_getLogs with range parameters (fromBlock, toBlock) is not yet optimized.

When a range query spans multiple blocks (e.g., fromBlock=100, toBlock=150), the current implementation cannot effectively calculate the optimal slice count because:

  1. Multiple blocks involved: Range spans 50+ blocks, each with independent transaction counts
  2. Unknown total transaction volume: No easy way to determine total logs across the entire timestamp duration
  3. Performance gap: Without proper slicing, range queries still use sequential pagination, missing out on parallelization benefits

Example scenario:

eth_getLogs({
  fromBlock: "0x64",
  toBlock: "0x96",
  topics: [...]
})
// Spans 50 blocks over ~100 seconds
// Could have 0 to 500,000 transactions
// Currently processes sequentially - slow for large ranges

Potential Options (Discussion Needed)

Technical Context:
Hedera blocks are created at consistent ~2 second intervals with a theoretical throughput capacity of 10,000 TPS. However, actual transaction counts vary significantly per block, ranging from dozens in low-activity periods to thousands during high-activity periods. Any solution must balance these characteristics to provide effective slicing without excessive overhead.

Option 1: TPS-Based Estimation

  • Calculate observed TPS from endpoint blocks (fromBlock and toBlock)
    • blockTps = (block.count) / (block.timestamp.to - block.timestamp.from)
  • Use max observed TPS between the two blocks to estimate total transaction count across range
  • Formula: sliceCount = (maxTPS × duration) / maxLogsPerSlice

Pros: Adaptive to actual network activity, no additional API calls, uses real block data
Cons: Endpoint blocks might not represent middle of range, risk of misestimation for variable activity
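
A minimal sketch of this estimation, assuming Mirror Node block objects expose count and a timestamp { from, to } range as referenced in the bullets above; helper names and the per-slice budget are illustrative:

interface MirrorBlock {
  count: number;
  timestamp: { from: string; to: string }; // consensus timestamps in seconds.nanoseconds
}

const MAX_LOGS_PER_SLICE = 200; // assumed per-slice budget

// Observed TPS of a single block: transactions divided by the block's duration.
function observedTps(block: MirrorBlock): number {
  const duration = Number(block.timestamp.to) - Number(block.timestamp.from);
  return duration > 0 ? block.count / duration : block.count;
}

// Use the busier endpoint block as the TPS estimate for the whole range.
function estimateSliceCountByTps(fromBlock: MirrorBlock, toBlock: MirrorBlock): number {
  const maxTps = Math.max(observedTps(fromBlock), observedTps(toBlock));
  const rangeDuration = Number(toBlock.timestamp.to) - Number(fromBlock.timestamp.from);
  return Math.max(1, Math.ceil((maxTps * rangeDuration) / MAX_LOGS_PER_SLICE));
}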

Option 2: Theoretical Capacity-Based Estimation

  • Use Hedera's maximum theoretical capacity to calculate worst-case scenario
  • Each block runs for ~2 seconds at 10,000 TPS = up to 20,000 transactions per block theoretically
  • Calculate total duration from fromBlock.timestamp.from to toBlock.timestamp.to
  • Formula: estimatedMax = (duration × 10,000 TPS), then sliceCount = estimatedMax / maxLogsPerSlice

Note: This worst-case approach needs further investigation to find a more conservative slice-count multiplier. Using the full theoretical capacity (10,000 TPS) could create unnecessary stress on Mirror Node servers, especially since most blocks contain far fewer transactions (typically dozens to hundreds). A scaled-down factor (e.g., 25-50% of the theoretical max) might strike a better balance between performance gains and server load; a sketch using such a factor follows below.

Pros: Safe for all scenarios, predictable, uses well-known Hedera network constants
Cons: Over-slices for typical usage, may create excessive concurrent load on Mirror Node infrastructure
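
A minimal sketch of this estimation, including the scaled-down factor suggested in the note above; THEORETICAL_TPS comes from the text, while CAPACITY_FACTOR and MAX_LOGS_PER_SLICE are assumed tuning values:

const THEORETICAL_TPS = 10_000; // Hedera's theoretical capacity, per the context above
const CAPACITY_FACTOR = 0.25;   // assumed conservative multiplier (25% of theoretical max)
const MAX_LOGS_PER_SLICE = 200; // assumed per-slice budget

// Worst-case estimate scaled down to avoid over-slicing for typical traffic.
function estimateSliceCountByCapacity(rangeDurationSeconds: number): number {
  const estimatedTxCount = rangeDurationSeconds * THEORETICAL_TPS * CAPACITY_FACTOR;
  return Math.max(1, Math.ceil(estimatedTxCount / MAX_LOGS_PER_SLICE));
}

// Example: the 50-block (~100 s) range above yields
// ceil(100 * 10,000 * 0.25 / 200) = 1,250 slices, so a cap on concurrent
// Mirror Node requests would likely still be needed on top of this.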

Option 3: Full Block Enumeration (Most Accurate)

  • Fetch metadata for every block in the range from fromBlock to toBlock
  • Retrieve actual block.count for each block to get precise transaction counts
  • Sum all transaction counts across the range
  • Formula: totalTxCount = sum(block[i].count for i in range), then sliceCount = totalTxCount / maxLogsPerSlice

This approach provides perfect accuracy by using real data from every block in the range. For example, a range of 1000 blocks would require 1000 individual API requests to Mirror Node to gather all block metadata before calculating the optimal slice count.

Pros: Most accurate estimation, adapts precisely to actual transaction distribution, no guesswork
Cons: Significant overhead (1 API call per block in range), creates large load on Mirror Node, slow for wide ranges, may not justify the benefit
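
A minimal sketch of this enumeration, assuming the Mirror Node /api/v1/blocks/{number} endpoint and a generic mirrorNodeGet helper (an illustrative name); an actual implementation might batch block lookups via list queries rather than issuing one request per block:

const MAX_LOGS_PER_SLICE = 200; // assumed per-slice budget

async function sliceCountByEnumeration(
  fromBlockNumber: number,
  toBlockNumber: number,
  mirrorNodeGet: (path: string) => Promise<{ count: number }>, // illustrative fetch helper
): Promise<number> {
  let totalTxCount = 0;
  for (let n = fromBlockNumber; n <= toBlockNumber; n++) {
    // One metadata request per block in the range (the overhead called out above).
    const block = await mirrorNodeGet(`/api/v1/blocks/${n}`);
    totalTxCount += block.count;
  }
  return Math.max(1, Math.ceil(totalTxCount / MAX_LOGS_PER_SLICE));
}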

Option 4: Leverage Mirror Node Aggregation Capabilities

  • Explore whether Mirror Node could provide aggregated transaction counts for timestamp ranges
  • Consider reaching out to Mirror Node team to discuss potential feature enhancements
  • If such capability exists or becomes available, relay could query total logs/transactions for a given duration
  • This would enable accurate slice count calculation with minimal overhead

Pros: Efficient, accurate, single query operation
Cons: Depends on Mirror Node API capabilities, may require coordination and timeline
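
A purely hypothetical sketch of how the relay could consume such an aggregation if the Mirror Node team exposed one; the endpoint path and response shape below are invented for illustration and do not exist today:

const MAX_LOGS_PER_SLICE = 200; // assumed per-slice budget

// HYPOTHETICAL endpoint and response shape; no such Mirror Node API exists today.
async function sliceCountFromAggregation(
  fromTimestamp: string,
  toTimestamp: string,
  mirrorNodeGet: (path: string) => Promise<{ total_count: number }>, // illustrative helper
): Promise<number> {
  const aggregate = await mirrorNodeGet(
    `/api/v1/contracts/results/logs/count?timestamp=gte:${fromTimestamp}&timestamp=lte:${toTimestamp}`,
  );
  return Math.max(1, Math.ceil(aggregate.total_count / MAX_LOGS_PER_SLICE));
}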
