
Conversation

@m-Peter (Collaborator) commented Dec 22, 2025

Closes: #938

Description

This will give us insight into how fast the block processing logic is, and whether it is able to keep up with the 0.8s block production rate.


For contributor use:

  • Targeted PR against master branch
  • Linked to GitHub issue with discussion and accepted design OR link to spec that describes this work.
  • Code follows the standards mentioned here.
  • Updated relevant documentation
  • Re-reviewed Files changed in the GitHub PR explorer
  • Added appropriate labels

Summary by CodeRabbit

  • New Features

    • Adds a new block processing time metric to report how long block handling takes.
  • Chores

    • Integrated timing into event processing so block processing durations are recorded for improved monitoring and performance visibility.


@coderabbitai bot (Contributor) commented Dec 22, 2025

Walkthrough

Adds a Prometheus histogram, evm_gateway_block_process_time_seconds, exposes it via a new BlockProcessTime(start time.Time) method on the Collector interface, implements that method in the default and no-op collectors, and records the metric in the ingestion engine around block processing.
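In code terms, the walkthrough boils down to roughly the following. This is a minimal sketch, not the repository's exact code: prefixedName is the gateway's existing metric-name helper (seen in the review diffs below), the bucket values are the ones discussed in the review comments, and the other Collector methods are elided.

// Sketch of the new histogram and the interface extension described above.
// prefixedName is assumed to prepend the evm_gateway_ namespace.
var blockProcessTime = prometheus.NewHistogram(prometheus.HistogramOpts{
	Name:    prefixedName("block_process_time_seconds"),
	Help:    "Processing time to fully index an EVM block in the local state index",
	Buckets: []float64{0.1, 0.2, 0.4, 0.6, 0.8, 1, 2, 5, 10, 20},
})

// The Collector interface gains one more duration method alongside the
// existing ones (e.g. BlockIngestionTime, MeasureRequestDuration).
type Collector interface {
	// ... existing methods elided
	BlockProcessTime(start time.Time)
}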

Changes

  • Metrics core (metrics/collector.go): Added the blockProcessTime histogram (evm_gateway_block_process_time_seconds); extended the Collector interface with BlockProcessTime(start time.Time); added the blockProcessTime field and its implementation on DefaultCollector; initialized and registered the metric in NewCollector; included it in the public metrics slice.
  • No-op collector (metrics/nop.go): Added a no-op BlockProcessTime(start time.Time) method to nopCollector to satisfy the updated Collector interface.
  • Ingestion instrumentation (services/ingestion/engine.go): Captures start := time.Now() at the start of processEvents and calls e.collector.BlockProcessTime(start) after successful batch indexing to observe elapsed processing time (sketched just below).
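For the services/ingestion/engine.go row, the instrumentation looks roughly like the sketch below. The exact diff appears in the review comments further down; the receiver and parameter types here are assumptions for illustration.

// Sketch of the timing placement in processEvents: the elapsed time is
// observed only after the events have been indexed and the batch committed.
func (e *Engine) processEvents(events *models.CadenceEvents) error {
	start := time.Now()

	err := e.withBatch(
		func(batch *pebbleDB.Batch) error {
			return e.indexEvents(events, batch)
		},
	)
	if err != nil {
		return fmt.Errorf("failed to index events for cadence block %d: %w", events.CadenceHeight(), err)
	}

	e.collector.BlockProcessTime(start)
	return nil
}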

Sequence Diagram(s)

sequenceDiagram
  autonumber
  participant Engine as Ingestion Engine
  participant Collector as Metrics Collector
  participant Prom as Prometheus

  Engine->>Engine: start := time.Now()\nprocessEvents(batch)
  alt processing succeeds
    Engine->>Collector: BlockProcessTime(start)
    Collector->>Prom: Observe(elapsedSeconds) on histogram
  else processing fails
    Engine-->>Collector: (no call)
  end

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

  • Focus review: metric name, help text, and bucket choices in metrics/collector.go
  • Verify noop implementation and that ingestion timing is recorded only on success in services/ingestion/engine.go


Suggested reviewers

  • zhangchiqing
  • janezpodhostnik
  • peterargue

Poem

🐰
I hopped to the code with a stopwatch bright,
Counting each block from morning to night,
Buckets and histograms snug in a row,
Now times are recorded — watch metrics grow!

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
  • Docstring Coverage: ⚠️ Warning. Docstring coverage is 0.00%, which is below the required 80.00% threshold. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (4 passed)
  • Description Check: ✅ Passed. Check skipped because CodeRabbit’s high-level summary is enabled.
  • Title Check: ✅ Passed. The title accurately and clearly describes the main change: adding a metric to measure EVM block processing time during event ingestion.
  • Linked Issues Check: ✅ Passed. The PR implements the requirement from issue #938 to add a metric tracking EVM block indexing/replaying duration for throughput insights.
  • Out of Scope Changes Check: ✅ Passed. All changes are directly related to the linked issue objective of adding a block processing time metric; no out-of-scope modifications were detected.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch mpeter/add-block-processing-time-metric

📜 Recent review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 19f7d5e and 90070a0.

📒 Files selected for processing (3)
  • metrics/collector.go
  • metrics/nop.go
  • services/ingestion/engine.go
🚧 Files skipped from review as they are similar to previous changes (1)
  • services/ingestion/engine.go
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Test
🔇 Additional comments (4)
metrics/nop.go (1)

24-24: LGTM!

The no-op implementation correctly satisfies the updated Collector interface, consistent with the existing pattern for other metric methods.

metrics/collector.go (3)

75-81: Well-designed bucket configuration for the target use case.

The bucket boundaries provide good granularity around the 0.8s block production target (0.6, 0.8, 1.0) while also capturing both fast processing and degraded performance scenarios. This will enable precise alerting when processing lags behind block production.


246-248: LGTM!

The implementation correctly observes the elapsed time using time.Since(start).Seconds(), consistent with the existing MeasureRequestDuration and BlockIngestionTime methods.
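Concretely, the method under review follows the same shape as the other duration helpers; a sketch only, with the field name taken from the walkthrough and the receiver name illustrative:

// BlockProcessTime observes the elapsed block processing time on the histogram.
func (c *DefaultCollector) BlockProcessTime(start time.Time) {
	c.blockProcessTime.Observe(time.Since(start).Seconds())
}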


68-73: Good documentation clarification.

The updated help text clearly distinguishes blockIngestionTime (wall-clock latency from block proposal to indexing) from the new blockProcessTime (actual processing duration), preventing confusion between the two related metrics.



@coderabbitai bot (Contributor) left a comment

Actionable comments posted: 0

🧹 Nitpick comments (3)
metrics/collector.go (2)

68-73: Improved metric documentation and configuration.

The updated help text and custom buckets make this metric more useful for monitoring ingestion latency. The buckets align well with the block production rate target.

Consider slightly refining the help text to emphasize this measures latency (wall-clock time from block proposal to indexing completion) rather than processing duration:

🔎 Optional help text refinement
-	Help:    "Time taken to fully ingest an EVM block in the local state index since block proposal",
+	Help:    "Latency from EVM block proposal time to indexing completion (wall-clock duration)",

75-81: Refine histogram buckets for better granularity around the 0.8s target.

Per the PR objectives, this metric aims to evaluate whether processing keeps up with the 0.8s block production rate. The current buckets jump from 0.5s to 1.0s, lacking granularity around the critical 0.8s threshold needed for proper p50/p95/p99 analysis.

🔎 Recommended bucket configuration
 var blockProcessTime = prometheus.NewHistogram(prometheus.HistogramOpts{
 	Name:    prefixedName("block_process_time_seconds"),
 	Help:    "Processing time to fully index an EVM block in the local state index",
-	Buckets: []float64{.5, 1, 2.5, 5, 10, 15, 20, 30, 45},
+	Buckets: []float64{0.1, 0.2, 0.4, 0.6, 0.8, 1, 2, 5, 10, 20},
 })

This provides:

  • Finer granularity around 0.8s (0.6, 0.8, 1.0) to detect when processing approaches or exceeds the block rate
  • Lower buckets (0.1, 0.2, 0.4) to measure fast-path performance
  • Reasonable upper bounds for detecting slowdowns
services/ingestion/engine.go (1)

190-190: LGTM! Timing instrumentation correctly captures block processing duration.

The timing captures the complete block processing workflow including transaction replay, state validation, storage writes, and batch commit. The placement after successful batch commit ensures only completed processing is measured, which aligns with the PR objective to evaluate throughput.

Consider whether failed processing attempts should also be measured to gain insight into performance during errors. This could help identify if errors are caused by slow processing:

🔎 Optional: Measure all processing attempts
 	start := time.Now()
 	err := e.withBatch(
 		func(batch *pebbleDB.Batch) error {
 			return e.indexEvents(events, batch)
 		},
 	)
+	e.collector.BlockProcessTime(start)
+
 	if err != nil {
 		return fmt.Errorf("failed to index events for cadence block %d: %w", events.CadenceHeight(), err)
 	}
-	e.collector.BlockProcessTime(start)

This would record timing for both successful and failed attempts. However, the current approach (measuring only successful processing) is also reasonable and aligns with the issue description.

Also applies to: 199-199

📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4e9c15c and 19f7d5e.

📒 Files selected for processing (3)
  • metrics/collector.go
  • metrics/nop.go
  • services/ingestion/engine.go
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Test
🔇 Additional comments (3)
metrics/collector.go (2)

111-111: LGTM! Proper metric registration and interface integration.

The metric is correctly added to the registration slice, the interface method is properly declared, the field is added to the struct, and initialization follows the established pattern.

Also applies to: 129-129, 151-151, 176-176
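As a rough sketch of the wiring being approved here, assuming NewCollector registers its metrics against a prometheus.Registerer (newCollectorSketch and its shape are illustrative, not the repository's exact code):

// Hypothetical, condensed version of the construction/registration flow.
func newCollectorSketch(registerer prometheus.Registerer) (*DefaultCollector, error) {
	metrics := []prometheus.Collector{
		// ... existing gateway metrics would be listed here
		blockProcessTime,
	}
	for _, m := range metrics {
		if err := registerer.Register(m); err != nil {
			return nil, fmt.Errorf("failed to register metric: %w", err)
		}
	}
	return &DefaultCollector{blockProcessTime: blockProcessTime}, nil
}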


246-248: LGTM! Implementation follows established patterns.

The method correctly calculates and observes the elapsed processing time, consistent with other duration metrics in the collector.

metrics/nop.go (1)

24-24: LGTM! No-op implementation correctly satisfies the interface.

The no-op method properly implements the Collector interface without behavioral changes, consistent with the other methods in this collector.
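For completeness, the no-op variant amounts to a one-line stub along these lines (receiver form assumed):

// BlockProcessTime satisfies the Collector interface without recording anything.
func (c nopCollector) BlockProcessTime(_ time.Time) {}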

@m-Peter force-pushed the mpeter/add-block-processing-time-metric branch from 19f7d5e to 90070a0 on December 22, 2025 at 15:54

// EVM block processing time during event ingestion, including transaction replay
// and state validation
var blockProcessTime = prometheus.NewHistogram(prometheus.HistogramOpts{
A contributor commented on this definition:

Perhaps check whether a summary is more appropriate than a histogram: https://prometheus.io/docs/practices/histograms/

Two rules of thumb:
1. If you need to aggregate, choose histograms.
2. Otherwise, choose a histogram if you have an idea of the range and distribution of values that will be observed. Choose a summary if you need an accurate quantile, no matter what the range and distribution of the values is.

In the past I have found that if the metric goes out of the range of the buckets, you lose information (which is usually right when you need the information most).
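If a summary were chosen instead, the client_golang definition would look roughly like the sketch below; the objectives and MaxAge are illustrative only, and the trade-off is that summary quantiles cannot be aggregated across gateway instances.

// Hypothetical alternative: a Summary computes exact client-side quantiles,
// so observations outside any pre-chosen bucket range are never lost, but the
// resulting quantiles cannot be meaningfully aggregated across instances.
var blockProcessTimeSummary = prometheus.NewSummary(prometheus.SummaryOpts{
	Name:       prefixedName("block_process_time_seconds"),
	Help:       "Processing time to fully index an EVM block in the local state index",
	Objectives: map[float64]float64{0.5: 0.05, 0.95: 0.01, 0.99: 0.001},
	MaxAge:     10 * time.Minute,
})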


Development

Successfully merging this pull request may close these issues:

  • Add metric to track the duration of indexing/replaying EVM blocks

3 participants