Description
Background
PR #4828 integrates a parallel timestamp-slicing optimization for block retrieval endpoints, delivering significant latency reductions for large blocks. The optimization is now active for:
- `eth_getBlockByHash` / `eth_getBlockByNumber`
- `eth_getBlockReceipts`
- `eth_getLogs` with the `blockHash` parameter
The algorithm dynamically calculates the slice count based on the block's transaction count (`block.count / MAX_LOGS_PER_SLICE`), enabling effective parallelization when fetching logs and synthetic transactions from the Mirror Node.
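For reference, a minimal sketch of the single-block calculation described above (the helper names and the `MAX_LOGS_PER_SLICE` value are illustrative assumptions, not the actual code from PR #4828):

```typescript
// Minimal sketch of the single-block slicing described above. Names and the
// MAX_LOGS_PER_SLICE value are illustrative, not the actual relay implementation.

const MAX_LOGS_PER_SLICE = 200; // assumed value for illustration only

interface BlockInfo {
  count: number;                           // transactions in the block
  timestamp: { from: string; to: string }; // Mirror Node consensus timestamps (seconds.nanos)
}

// block.count / MAX_LOGS_PER_SLICE, rounded up and never below 1
function calculateSliceCount(block: BlockInfo): number {
  return Math.max(1, Math.ceil(block.count / MAX_LOGS_PER_SLICE));
}

// Split the block's timestamp window into equal slices that can be fetched in parallel
function buildTimestampSlices(block: BlockInfo): Array<{ from: number; to: number }> {
  const from = Number(block.timestamp.from);
  const to = Number(block.timestamp.to);
  const slices = calculateSliceCount(block);
  const step = (to - from) / slices;
  return Array.from({ length: slices }, (_, i) => ({
    from: from + i * step,
    to: i === slices - 1 ? to : from + (i + 1) * step,
  }));
}
```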
Problem
`eth_getLogs` with range parameters (`fromBlock`, `toBlock`) is not yet optimized.
When a range query spans multiple blocks (e.g., `fromBlock=100, toBlock=150`), the current implementation cannot effectively calculate the optimal slice count because:
- Multiple blocks involved: Range spans 50+ blocks, each with independent transaction counts
- Unknown total transaction volume: No easy way to determine total logs across the entire timestamp duration
- Performance gap: Without proper slicing, range queries still use sequential pagination, missing out on parallelization benefits
Example scenario:
eth_getLogs({
fromBlock: "0x64",
toBlock: "0x96",
topics: [...]
})
// Spans 50 blocks over ~100 seconds
// Could have 0 to 500,000 transactions
// Currently processes sequentially - slow for large ranges
Potential Options (Discussion Needed)
Technical Context:
Hedera blocks are created at consistent ~2 second intervals with a theoretical throughput capacity of 10,000 TPS. However, actual transaction counts vary significantly per block - ranging from dozens in low-activity periods to thousands during high-activity periods. Any solution must balance these characteristics to provide effective slicing without excessive overhead.
Option 1: TPS-Based Estimation
- Calculate observed TPS from the endpoint blocks (`fromBlock` and `toBlock`): `blockTps = block.count / (block.timestamp.to - block.timestamp.from)`
- Use max observed TPS between the two blocks to estimate total transaction count across range
- Formula: `sliceCount = (maxTPS × duration) / maxLogsPerSlice`
Pros: Adaptive to actual network activity, no additional API calls, uses real block data
Cons: Endpoint blocks might not represent the middle of the range, so variable activity within it risks misestimation
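A rough sketch of this option, assuming the Mirror Node block shape described above and hypothetical helper names (`blockTps`, `estimateSliceCountFromTps`):

```typescript
// Hypothetical sketch of Option 1: derive the range-wide slice count from the
// observed TPS of the two endpoint blocks. Names and constants are illustrative only.

const MAX_LOGS_PER_SLICE = 200; // assumed value for illustration only

interface BlockInfo {
  count: number;
  timestamp: { from: string; to: string }; // seconds.nanos strings
}

// blockTps = block.count / (block.timestamp.to - block.timestamp.from)
function blockTps(block: BlockInfo): number {
  const duration = Number(block.timestamp.to) - Number(block.timestamp.from);
  return duration > 0 ? block.count / duration : block.count;
}

// sliceCount = (maxTPS × duration) / maxLogsPerSlice
function estimateSliceCountFromTps(fromBlock: BlockInfo, toBlock: BlockInfo): number {
  const maxTps = Math.max(blockTps(fromBlock), blockTps(toBlock));
  const rangeDuration = Number(toBlock.timestamp.to) - Number(fromBlock.timestamp.from);
  return Math.max(1, Math.ceil((maxTps * rangeDuration) / MAX_LOGS_PER_SLICE));
}
```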
Option 2: Theoretical Capacity-Based Estimation
- Use Hedera's maximum theoretical capacity to calculate worst-case scenario
- Each block runs for ~2 seconds at 10,000 TPS = up to 20,000 transactions per block theoretically
- Calculate the total duration from `fromBlock.timestamp.from` to `toBlock.timestamp.to`
- Formula: `estimatedMax = duration × 10,000 TPS`, then `sliceCount = estimatedMax / maxLogsPerSlice`
Note: This worst-case approach needs further investigation to find a more conservative slice count multiplier. Using full theoretical capacity (10,000 TPS) could create unnecessary stress on Mirror Node servers, especially since most blocks contain far fewer transactions (typically dozens to hundreds). A scaled-down factor (e.g., 25-50% of the theoretical max) might provide a better balance between performance gains and server load.
Pros: Safe for all scenarios, predictable, uses well-known Hedera network constants.
Cons: Over-slices for typical usage, may create excessive concurrent load on Mirror Node infrastructure
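A sketch of this option with a conservative scaling factor applied, as suggested in the note above (the 25% factor and constant names are assumptions for illustration):

```typescript
// Hypothetical sketch of Option 2: size the slice count from Hedera's
// theoretical capacity, scaled down to avoid over-slicing. Illustrative only.

const MAX_LOGS_PER_SLICE = 200;  // assumed value for illustration only
const THEORETICAL_TPS = 10_000;  // theoretical network capacity noted above
const CAPACITY_SCALE = 0.25;     // example scale-down factor from the note above

// estimatedMax = duration × TPS × scale, then sliceCount = estimatedMax / maxLogsPerSlice
function estimateSliceCountFromCapacity(rangeStartSec: number, rangeEndSec: number): number {
  const duration = rangeEndSec - rangeStartSec; // fromBlock.timestamp.from -> toBlock.timestamp.to
  const estimatedMax = duration * THEORETICAL_TPS * CAPACITY_SCALE;
  return Math.max(1, Math.ceil(estimatedMax / MAX_LOGS_PER_SLICE));
}

// A ~100 second range (the 50-block example above) still yields 1,250 slices
// at 25% of capacity, which illustrates the over-slicing concern in the Cons.
console.log(estimateSliceCountFromCapacity(0, 100)); // 1250
```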
Option 3: Full Block Enumeration (Most Accurate)
- Fetch metadata for every block in the range from `fromBlock` to `toBlock`
- Retrieve the actual `block.count` for each block to get precise transaction counts
- Sum all transaction counts across the range
- Formula: `totalTxCount = sum(block[i].count for i in range)`, then `sliceCount = totalTxCount / maxLogsPerSlice`
This approach provides perfect accuracy by using real data from every block in the range. For example, a range of 1000 blocks would require 1000 individual API requests to Mirror Node to gather all block metadata before calculating the optimal slice count.
Pros: Most accurate estimation, adapts precisely to actual transaction distribution, no guesswork
Cons: Significant overhead (1 API call per block in range), creates large load on Mirror Node, slow for wide ranges, may not justify the benefit
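A sketch of this enumeration, assuming the Mirror Node `GET /api/v1/blocks/{number}` response exposes a `count` field (the exact path and response shape should be treated as an assumption to verify):

```typescript
// Hypothetical sketch of Option 3: fetch every block in the range, sum the
// exact transaction counts, then derive the slice count. Illustrative only;
// the endpoint path and response shape are simplified assumptions.

const MAX_LOGS_PER_SLICE = 200; // assumed value for illustration only

async function sliceCountFromBlockEnumeration(
  mirrorNodeBaseUrl: string,
  fromBlockNumber: number,
  toBlockNumber: number,
): Promise<number> {
  let totalTxCount = 0;
  // One request per block in the range: this is the overhead called out in the Cons.
  for (let blockNumber = fromBlockNumber; blockNumber <= toBlockNumber; blockNumber++) {
    const res = await fetch(`${mirrorNodeBaseUrl}/api/v1/blocks/${blockNumber}`);
    const block = (await res.json()) as { count: number };
    totalTxCount += block.count;
  }
  // sliceCount = totalTxCount / maxLogsPerSlice
  return Math.max(1, Math.ceil(totalTxCount / MAX_LOGS_PER_SLICE));
}
```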
Option 4: Leverage Mirror Node Aggregation Capabilities
- Explore whether Mirror Node could provide aggregated transaction counts for timestamp ranges
- Consider reaching out to Mirror Node team to discuss potential feature enhancements
- If such a capability exists or becomes available, the relay could query the total logs/transactions for a given duration
- This would enable accurate slice count calculation with minimal overhead
Pros: Efficient, accurate, single query operation
Cons: Depends on Mirror Node API capabilities, may require coordination with the Mirror Node team and a longer timeline
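For discussion purposes only, a sketch of how the relay might consume such an aggregation capability if it existed; the endpoint below does not exist today and is invented purely to illustrate the single-query shape:

```typescript
// Purely hypothetical sketch of Option 4. The aggregation endpoint below is
// NOT a real Mirror Node API; it is invented to show the single-query flow.

const MAX_LOGS_PER_SLICE = 200; // assumed value for illustration only

async function sliceCountFromAggregatedCount(
  mirrorNodeBaseUrl: string,
  fromTimestamp: string,
  toTimestamp: string,
): Promise<number> {
  // Hypothetical endpoint returning the total transaction/log count for a timestamp range.
  const res = await fetch(
    `${mirrorNodeBaseUrl}/api/v1/transactions/count?timestamp=gte:${fromTimestamp}&timestamp=lte:${toTimestamp}`,
  );
  const { total } = (await res.json()) as { total: number };
  return Math.max(1, Math.ceil(total / MAX_LOGS_PER_SLICE));
}
```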