Add a metric to count the EVM block processing time during event ingestion #944
base: main
Conversation
Walkthrough
Adds a Prometheus histogram that measures EVM block processing time during event ingestion, wired into the metrics collector and observed from the ingestion engine.
Sequence Diagram(s)
```mermaid
sequenceDiagram
    autonumber
    participant Engine as Ingestion Engine
    participant Collector as Metrics Collector
    participant Prom as Prometheus
    Engine->>Engine: start := time.Now()\nprocessEvents(batch)
    alt processing succeeds
        Engine->>Collector: BlockProcessTime(start)
        Collector->>Prom: Observe(elapsedSeconds) on histogram
    else processing fails
        Engine-->>Collector: (no call)
    end
```
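The pattern in the diagram boils down to timing the batch and reporting the elapsed duration only on success. A minimal sketch of that flow (the names here are illustrative stand-ins, not the PR's actual identifiers):

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// Collector plays the role of the metrics collector in the diagram above.
type Collector interface {
	BlockProcessTime(start time.Time)
}

// printCollector stands in for the Prometheus-backed collector; a real
// implementation would call histogram.Observe(time.Since(start).Seconds()).
type printCollector struct{}

func (printCollector) BlockProcessTime(start time.Time) {
	fmt.Printf("block processed in %.3fs\n", time.Since(start).Seconds())
}

// processBlock times one batch and reports the duration only when processing
// succeeds, matching the "processing succeeds" branch of the diagram.
func processBlock(c Collector, process func() error) error {
	start := time.Now()
	if err := process(); err != nil {
		return err // failed processing is not measured
	}
	c.BlockProcessTime(start)
	return nil
}

func main() {
	c := printCollector{}
	_ = processBlock(c, func() error { return nil })                         // observed
	_ = processBlock(c, func() error { return errors.New("replay failed") }) // not observed
}
```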
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~10 minutes
Actionable comments posted: 0
🧹 Nitpick comments (3)
metrics/collector.go (2)
68-73: Improved metric documentation and configuration.
The updated help text and custom buckets make this metric more useful for monitoring ingestion latency. The buckets align well with the block production rate target.
Consider slightly refining the help text to emphasize this measures latency (wall-clock time from block proposal to indexing completion) rather than processing duration:
🔎 Optional help text refinement
```diff
- Help: "Time taken to fully ingest an EVM block in the local state index since block proposal",
+ Help: "Latency from EVM block proposal time to indexing completion (wall-clock duration)",
```
75-81: Refine histogram buckets for better granularity around the 0.8s target.
Per the PR objectives, this metric aims to evaluate whether processing keeps up with the 0.8s block production rate. The current buckets jump from 0.5s to 1.0s, lacking granularity around the critical 0.8s threshold needed for proper p50/p95/p99 analysis.
🔎 Recommended bucket configuration
```diff
 var blockProcessTime = prometheus.NewHistogram(prometheus.HistogramOpts{
 	Name: prefixedName("block_process_time_seconds"),
 	Help: "Processing time to fully index an EVM block in the local state index",
-	Buckets: []float64{.5, 1, 2.5, 5, 10, 15, 20, 30, 45},
+	Buckets: []float64{0.1, 0.2, 0.4, 0.6, 0.8, 1, 2, 5, 10, 20},
 })
```
This provides:
- Finer granularity around 0.8s (0.6, 0.8, 1.0) to detect when processing approaches or exceeds the block rate
- Lower buckets (0.1, 0.2, 0.4) to measure fast-path performance
- Reasonable upper bounds for detecting slowdowns
services/ingestion/engine.go (1)
190-190: LGTM! Timing instrumentation correctly captures block processing duration.
The timing captures the complete block processing workflow including transaction replay, state validation, storage writes, and batch commit. The placement after successful batch commit ensures only completed processing is measured, which aligns with the PR objective to evaluate throughput.
Consider whether failed processing attempts should also be measured to gain insight into performance during errors. This could help identify if errors are caused by slow processing:
🔎 Optional: Measure all processing attempts
```diff
 start := time.Now()
 err := e.withBatch(
 	func(batch *pebbleDB.Batch) error {
 		return e.indexEvents(events, batch)
 	},
 )
+e.collector.BlockProcessTime(start)
+
 if err != nil {
 	return fmt.Errorf("failed to index events for cadence block %d: %w", events.CadenceHeight(), err)
 }
-e.collector.BlockProcessTime(start)
```
This would record timing for both successful and failed attempts. However, the current approach (measuring only successful processing) is also reasonable and aligns with the issue description.
Also applies to: 199-199
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
- metrics/collector.go
- metrics/nop.go
- services/ingestion/engine.go
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Test
🔇 Additional comments (3)
metrics/collector.go (2)
111-111: LGTM! Proper metric registration and interface integration.
The metric is correctly added to the registration slice, the interface method is properly declared, the field is added to the struct, and initialization follows the established pattern.
Also applies to: 129-129, 151-151, 176-176
246-248: LGTM! Implementation follows established patterns.
The method correctly calculates and observes the elapsed processing time, consistent with other duration metrics in the collector.
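For reference, a minimal sketch of what such an elapsed-time observation typically looks like; the package, type, and field names below are assumptions for illustration, not the PR's actual code:

```go
package metrics

import (
	"time"

	"github.com/prometheus/client_golang/prometheus"
)

// DefaultCollector is an assumed type name standing in for the PR's collector.
type DefaultCollector struct {
	blockProcessTime prometheus.Histogram
}

// BlockProcessTime observes the wall-clock seconds elapsed since start on the
// block-processing histogram, mirroring the pattern described above.
func (c *DefaultCollector) BlockProcessTime(start time.Time) {
	c.blockProcessTime.Observe(time.Since(start).Seconds())
}
```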
metrics/nop.go (1)
24-24: LGTM! No-op implementation correctly satisfies the interface.
The no-op method properly implements the Collector interface without behavioral changes, consistent with the other methods in this collector.
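A no-op counterpart would simply accept the start time and discard it; a hedged sketch, with the type name assumed:

```go
package metrics

import "time"

// nopCollector is an assumed type name; the method satisfies the Collector
// interface without recording anything.
type nopCollector struct{}

func (nopCollector) BlockProcessTime(_ time.Time) {}
```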
Force-pushed the branch from 19f7d5e to 90070a0
```go
// EVM block processing time during event ingestion, including transaction replay
// and state validation
var blockProcessTime = prometheus.NewHistogram(prometheus.HistogramOpts{
```
perhaps check if a summary is more appropriate than a histogram: https://prometheus.io/docs/practices/histograms/
Two rules of thumb:
1. If you need to aggregate, choose histograms.
2. Otherwise, choose a histogram if you have an idea of the range and distribution of values that will be observed. Choose a summary if you need an accurate quantile, no matter what the range and distribution of the values is.
In the past I have found that if the metric goes out of the range of the buckets you lose information (which is usually right when you need the most information).
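To make the trade-off concrete, here is an illustrative comparison of the two metric types (not code from this PR; names, buckets, and quantile objectives are placeholders). The histogram can be aggregated across instances and queried with histogram_quantile, but loses precision once values fall outside its bucket range; the summary reports its configured quantiles accurately regardless of range, at the cost of aggregability:

```go
package main

import "github.com/prometheus/client_golang/prometheus"

var (
	// Histogram: bucketed counts, aggregatable across instances, but quantile
	// estimates degrade once observations fall outside the chosen buckets.
	blockProcessHistogram = prometheus.NewHistogram(prometheus.HistogramOpts{
		Name:    "block_process_time_seconds",
		Help:    "Processing time to fully index an EVM block",
		Buckets: []float64{0.1, 0.2, 0.4, 0.6, 0.8, 1, 2, 5, 10, 20},
	})

	// Summary: client-side quantiles that stay accurate whatever the range of
	// observed values, but they cannot be aggregated across instances.
	blockProcessSummary = prometheus.NewSummary(prometheus.SummaryOpts{
		Name:       "block_process_time_summary_seconds",
		Help:       "Processing time to fully index an EVM block",
		Objectives: map[float64]float64{0.5: 0.05, 0.95: 0.01, 0.99: 0.001},
	})
)

func main() {
	prometheus.MustRegister(blockProcessHistogram, blockProcessSummary)

	// A single 0.73s observation lands in the 0.8 bucket of the histogram
	// and feeds the summary's quantile streams.
	blockProcessHistogram.Observe(0.73)
	blockProcessSummary.Observe(0.73)
}
```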
Closes: #938
Description
This will give us some insight into how fast the block processing logic is, and whether it is able to keep up with the 0.8s block production rate.
Summary by CodeRabbit
New Features
Chores