Add a metric to count the EVM block processing time during event ingestion #944
base: main
Conversation
Walkthrough
Adds a Prometheus histogram that measures EVM block processing time during event ingestion, wired into the metrics collector and observed from the ingestion engine.
Sequence Diagram(s)
```mermaid
sequenceDiagram
    autonumber
    participant Engine as Ingestion Engine
    participant Collector as Metrics Collector
    participant Prom as Prometheus
    Engine->>Engine: start := time.Now()\nprocessEvents(batch)
    alt processing succeeds
        Engine->>Collector: BlockProcessTime(start)
        Collector->>Prom: Observe(elapsedSeconds) on histogram
    else processing fails
        Engine-->>Collector: (no call)
    end
```
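The pattern in the diagram boils down to timing the batch and reporting the elapsed duration only on success. A minimal sketch of that flow (the names here are illustrative stand-ins, not the PR's actual identifiers):

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// Collector plays the role of the metrics collector in the diagram above.
type Collector interface {
	BlockProcessTime(start time.Time)
}

// printCollector stands in for the Prometheus-backed collector; a real
// implementation would call histogram.Observe(time.Since(start).Seconds()).
type printCollector struct{}

func (printCollector) BlockProcessTime(start time.Time) {
	fmt.Printf("block processed in %.3fs\n", time.Since(start).Seconds())
}

// processBlock times one batch and reports the duration only when processing
// succeeds, matching the "processing succeeds" branch of the diagram.
func processBlock(c Collector, process func() error) error {
	start := time.Now()
	if err := process(); err != nil {
		return err // failed processing is not measured
	}
	c.BlockProcessTime(start)
	return nil
}

func main() {
	c := printCollector{}
	_ = processBlock(c, func() error { return nil })                         // observed
	_ = processBlock(c, func() error { return errors.New("replay failed") }) // not observed
}
```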
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~10 minutes
Actionable comments posted: 0
🧹 Nitpick comments (3)
metrics/collector.go (2)
68-73: Improved metric documentation and configuration.
The updated help text and custom buckets make this metric more useful for monitoring ingestion latency. The buckets align well with the block production rate target.
Consider slightly refining the help text to emphasize this measures latency (wall-clock time from block proposal to indexing completion) rather than processing duration:
🔎 Optional help text refinement
```diff
- Help: "Time taken to fully ingest an EVM block in the local state index since block proposal",
+ Help: "Latency from EVM block proposal time to indexing completion (wall-clock duration)",
```
75-81: Refine histogram buckets for better granularity around the 0.8s target.
Per the PR objectives, this metric aims to evaluate whether processing keeps up with the 0.8s block production rate. The current buckets jump from 0.5s to 1.0s, lacking granularity around the critical 0.8s threshold needed for proper p50/p95/p99 analysis.
🔎 Recommended bucket configuration
```diff
 var blockProcessTime = prometheus.NewHistogram(prometheus.HistogramOpts{
 	Name: prefixedName("block_process_time_seconds"),
 	Help: "Processing time to fully index an EVM block in the local state index",
-	Buckets: []float64{.5, 1, 2.5, 5, 10, 15, 20, 30, 45},
+	Buckets: []float64{0.1, 0.2, 0.4, 0.6, 0.8, 1, 2, 5, 10, 20},
 })
```
This provides:
- Finer granularity around 0.8s (0.6, 0.8, 1.0) to detect when processing approaches or exceeds the block rate
- Lower buckets (0.1, 0.2, 0.4) to measure fast-path performance
- Reasonable upper bounds for detecting slowdowns
services/ingestion/engine.go (1)
190-190: LGTM! Timing instrumentation correctly captures block processing duration.
The timing captures the complete block processing workflow including transaction replay, state validation, storage writes, and batch commit. The placement after successful batch commit ensures only completed processing is measured, which aligns with the PR objective to evaluate throughput.
Consider whether failed processing attempts should also be measured to gain insight into performance during errors. This could help identify if errors are caused by slow processing:
🔎 Optional: Measure all processing attempts
```diff
 start := time.Now()
 err := e.withBatch(
 	func(batch *pebbleDB.Batch) error {
 		return e.indexEvents(events, batch)
 	},
 )
+e.collector.BlockProcessTime(start)
+
 if err != nil {
 	return fmt.Errorf("failed to index events for cadence block %d: %w", events.CadenceHeight(), err)
 }
-e.collector.BlockProcessTime(start)
```
This would record timing for both successful and failed attempts. However, the current approach (measuring only successful processing) is also reasonable and aligns with the issue description.
Also applies to: 199-199
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
- metrics/collector.go
- metrics/nop.go
- services/ingestion/engine.go
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Test
🔇 Additional comments (3)
metrics/collector.go (2)
111-111: LGTM! Proper metric registration and interface integration.
The metric is correctly added to the registration slice, the interface method is properly declared, the field is added to the struct, and initialization follows the established pattern.
Also applies to: 129-129, 151-151, 176-176
246-248: LGTM! Implementation follows established patterns.
The method correctly calculates and observes the elapsed processing time, consistent with other duration metrics in the collector.
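For reference, a minimal sketch of what such an elapsed-time observation typically looks like; the package, type, and field names below are assumptions for illustration, not the PR's actual code:

```go
package metrics

import (
	"time"

	"github.com/prometheus/client_golang/prometheus"
)

// DefaultCollector is an assumed type name standing in for the PR's collector.
type DefaultCollector struct {
	blockProcessTime prometheus.Histogram
}

// BlockProcessTime observes the wall-clock seconds elapsed since start on the
// block-processing histogram, mirroring the pattern described above.
func (c *DefaultCollector) BlockProcessTime(start time.Time) {
	c.blockProcessTime.Observe(time.Since(start).Seconds())
}
```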
metrics/nop.go (1)
24-24: LGTM! No-op implementation correctly satisfies the interface.
The no-op method properly implements the Collector interface without behavioral changes, consistent with the other methods in this collector.
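A no-op counterpart would simply accept the start time and discard it; a hedged sketch, with the type name assumed:

```go
package metrics

import "time"

// nopCollector is an assumed type name; the method satisfies the Collector
// interface without recording anything.
type nopCollector struct{}

func (nopCollector) BlockProcessTime(_ time.Time) {}
```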
Force-pushed the branch from 19f7d5e to 90070a0
```go
// EVM block processing time during event ingestion, including transaction replay
// and state validation
var blockProcessTime = prometheus.NewHistogram(prometheus.HistogramOpts{
```
perhaps check if a summary is more appropriate than a histogram: https://prometheus.io/docs/practices/histograms/
Two rules of thumb:
1. If you need to aggregate, choose histograms.
2. Otherwise, choose a histogram if you have an idea of the range and distribution of values that will be observed. Choose a summary if you need an accurate quantile, no matter what the range and distribution of the values is.
In the past I have found that if the metric goes out of the range of the buckets you lose information (which is usually right when you need the most information).
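To make the trade-off concrete, here is an illustrative comparison of the two metric types (not code from this PR; names, buckets, and quantile objectives are placeholders). The histogram can be aggregated across instances and queried with histogram_quantile, but loses precision once values fall outside its bucket range; the summary reports its configured quantiles accurately regardless of range, at the cost of aggregability:

```go
package main

import "github.com/prometheus/client_golang/prometheus"

var (
	// Histogram: bucketed counts, aggregatable across instances, but quantile
	// estimates degrade once observations fall outside the chosen buckets.
	blockProcessHistogram = prometheus.NewHistogram(prometheus.HistogramOpts{
		Name:    "block_process_time_seconds",
		Help:    "Processing time to fully index an EVM block",
		Buckets: []float64{0.1, 0.2, 0.4, 0.6, 0.8, 1, 2, 5, 10, 20},
	})

	// Summary: client-side quantiles that stay accurate whatever the range of
	// observed values, but they cannot be aggregated across instances.
	blockProcessSummary = prometheus.NewSummary(prometheus.SummaryOpts{
		Name:       "block_process_time_summary_seconds",
		Help:       "Processing time to fully index an EVM block",
		Objectives: map[float64]float64{0.5: 0.05, 0.95: 0.01, 0.99: 0.001},
	})
)

func main() {
	prometheus.MustRegister(blockProcessHistogram, blockProcessSummary)

	// A single 0.73s observation lands in the 0.8 bucket of the histogram
	// and feeds the summary's quantile streams.
	blockProcessHistogram.Observe(0.73)
	blockProcessSummary.Observe(0.73)
}
```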
Closes: #938
Description
This will give us some insight into how fast the block processing logic is, and whether it is able to keep up with the 0.8s block production rate.
Summary by CodeRabbit
New Features
Chores