Skip to content

perf(metrics): prevent multi-second engine stalls from scrape hooks#189

Merged
will-2012 merged 2 commits into
bnb-chain:developfrom
zhk101:fix/metrics-perf
May 18, 2026
Merged

perf(metrics): prevent multi-second engine stalls from scrape hooks#189
will-2012 merged 2 commits into
bnb-chain:developfrom
zhk101:fix/metrics-perf

Conversation

@zhk101
Copy link
Copy Markdown

@zhk101 zhk101 commented May 15, 2026

Addresses bnb-chain/reth-bsc#353 (the engine-tree handler is slow stalls observed on BSC full nodes have their root cause here in the metrics scrape path).

Description

The Prometheus metrics endpoint and push-gateway loop currently run the full hook chain + exporter render() synchronously on a tokio worker. On large databases the two heavy hooks (Database::report_metrics and StaticFileProvider::report_metrics) walk MDBX page metadata, the freelist DB, and every static-file jar header on every scrape, which can take seconds and stall the runtime. In practice this surfaces as multi-second engine-tree handler is slow events on a cadence matching the hook throttle, observed on a BSC full node (bnb-chain/reth-bsc#353).

This PR makes two independent changes:

  1. Adds a RETH_DISABLE_HEAVY_METRICS env-var escape hatch so operators can drop those two hooks without rebuilding.
  2. Offloads the synchronous scrape work (render() + hooks) to spawn_blocking for both the HTTP endpoint and the push-gateway loop, so even when the hooks are enabled they no longer block the tokio worker that accepted the request.

Rationale

The heavy hooks above already produce real multi-second engine stalls today; the env-var escape hatch makes those stop without losing the rest of the metrics. Beyond that, even after gating the two heavy hooks out, render() itself is O(total time-series) and grows over the lifetime of the node. A multi-millisecond synchronous block on a tokio worker is enough to add tail latency to anything else scheduled on that worker (engine API, RPC, networking). Moving the rendering to the blocking pool makes the metrics path a non-actor for runtime jitter regardless of how the registry grows.

Example

Opt out of heavy DB / static-file hooks at startup:

RETH_DISABLE_HEAVY_METRICS=1 reth node ...

The metrics endpoint still serves the rest of the registry (process, jemalloc, io, chain spec, version). Default behavior (no env var set) is unchanged.

Changes

Notable changes:

  • crates/node/builder/src/launch/common.rs: metrics_hooks skips registering the DB and static-file report_metrics hooks when RETH_DISABLE_HEAVY_METRICS is set. Doc comment on the function documents the escape hatch.
  • crates/node/metrics/src/server.rs: HTTP request handler offloads handle_request (hooks + render or pprof dump) onto spawn_blocking; returns 500 on join error.
  • crates/node/metrics/src/server.rs: Push-gateway loop offloads the per-tick hooks + render onto spawn_blocking; on join error logs a warning and skips the tick rather than killing the loop. HTTP put remains async.

Potential Impacts

  • Scrape latency: a single extra spawn_blocking hop per request. Negligible compared to the synchronous render() cost it removes.
  • Hook ordering / thread-locals: hooks now execute on a blocking-pool thread instead of a tokio worker. Existing hooks use atomics / cheap procfs reads and don't depend on thread-locals, so this is safe today, but worth keeping in mind for future hook additions.
  • RETH_DISABLE_HEAVY_METRICS: new env var. Operators who haven't set it see identical behavior.

Related: bnb-chain/reth-bsc#353

zhk101 added 2 commits May 15, 2026 10:39
The two custom hooks registered by `metrics_hooks` walk MDBX page
metadata + the freelist DB and iterate every static-file jar header on
every Prometheus scrape. On large databases this is expensive enough to
stall the metrics endpoint and starve the runtime.

Skip registering both hooks when `RETH_DISABLE_HEAVY_METRICS` is set in
the environment; the rest of the registry (process, jemalloc, io, chain
spec, version) is unaffected and the endpoint still responds normally.
The env var is documented on the function so the escape hatch is
discoverable without grepping the source.
The Prometheus metrics handler is fundamentally synchronous: it
invokes every registered hook and then runs the prometheus exporter's
`render()`, all on the tokio worker that accepted the HTTP request
(or on the runtime worker driving the push-gateway loop).

The default hooks are cheap (procfs, jemalloc atomic reads), but the
two `report_metrics` hooks (DB stat walk, static-file jar enumeration)
can take seconds on large archives. Even with those gated out (see
preceding patch), `render()` itself is O(total time-series) and will
grow over time. A multi-millisecond synchronous block on a runtime
worker is not ideal and can become a real engine latency source if
hook cost ever regresses.

Move the synchronous work off the runtime worker:

- Endpoint handler now offloads `handle_request` (which calls the hook
  + render or the pprof dump) to `spawn_blocking`. On join error,
  return a 500 instead of letting the connection task panic.
- Push-gateway loop offloads the hook + render to `spawn_blocking`;
  on join error, log a warning and skip this tick rather than killing
  the loop. The HTTP put itself was already async so it stays inline.

No behavioral change to what the endpoint or push-gateway returns;
only the thread on which the rendering happens.
@zhk101 zhk101 requested a review from joey0612 as a code owner May 15, 2026 08:55
@hashdit-bot
Copy link
Copy Markdown

hashdit-bot Bot commented May 15, 2026

Pull Request Review

This PR optimizes the metrics pipeline by preventing expensive metric hooks from stalling Tokio runtime workers. It introduces an environment-variable gate (RETH_DISABLE_HEAVY_METRICS) to skip DB/static-file heavy hooks and moves synchronous metrics collection/rendering into spawn_blocking for both the HTTP scrape endpoint and push-gateway loop. It also adds error handling around blocking-task join failures (500 for endpoint requests, warning-and-skip for push ticks).

Sensitive Content

No sensitive content detected.

Security Issues

No serious security issues detected.


Generated by Hashdit Bot. This tool can absolutely NOT replace manual audits.

@zhk101 zhk101 changed the title metrics: reduce scrape overhead on runtime workers metrics: prevent multi-second engine stalls from DB/static-file scrape hooks May 15, 2026
@zhk101 zhk101 changed the title metrics: prevent multi-second engine stalls from DB/static-file scrape hooks perf: metrics - prevent multi-second engine stalls from DB/static-file scrape hooks May 16, 2026
@zhk101 zhk101 changed the title perf: metrics - prevent multi-second engine stalls from DB/static-file scrape hooks perf(metrics): prevent multi-second engine stalls from scrape hooks May 16, 2026
@will-2012 will-2012 merged commit ae53c08 into bnb-chain:develop May 18, 2026
39 of 46 checks passed
chee-chyuan pushed a commit that referenced this pull request May 21, 2026
…189)

* metrics: gate heavy scrape hooks behind RETH_DISABLE_HEAVY_METRICS

The two custom hooks registered by `metrics_hooks` walk MDBX page
metadata + the freelist DB and iterate every static-file jar header on
every Prometheus scrape. On large databases this is expensive enough to
stall the metrics endpoint and starve the runtime.

Skip registering both hooks when `RETH_DISABLE_HEAVY_METRICS` is set in
the environment; the rest of the registry (process, jemalloc, io, chain
spec, version) is unaffected and the endpoint still responds normally.
The env var is documented on the function so the escape hatch is
discoverable without grepping the source.

* metrics: run scrape handler and push-gateway render on spawn_blocking

The Prometheus metrics handler is fundamentally synchronous: it
invokes every registered hook and then runs the prometheus exporter's
`render()`, all on the tokio worker that accepted the HTTP request
(or on the runtime worker driving the push-gateway loop).

The default hooks are cheap (procfs, jemalloc atomic reads), but the
two `report_metrics` hooks (DB stat walk, static-file jar enumeration)
can take seconds on large archives. Even with those gated out (see
preceding patch), `render()` itself is O(total time-series) and will
grow over time. A multi-millisecond synchronous block on a runtime
worker is not ideal and can become a real engine latency source if
hook cost ever regresses.

Move the synchronous work off the runtime worker:

- Endpoint handler now offloads `handle_request` (which calls the hook
  + render or the pprof dump) to `spawn_blocking`. On join error,
  return a 500 instead of letting the connection task panic.
- Push-gateway loop offloads the hook + render to `spawn_blocking`;
  on join error, log a warning and skip this tick rather than killing
  the loop. The HTTP put itself was already async so it stays inline.

No behavioral change to what the endpoint or push-gateway returns;
only the thread on which the rendering happens.
chee-chyuan pushed a commit that referenced this pull request May 26, 2026
…189)

* metrics: gate heavy scrape hooks behind RETH_DISABLE_HEAVY_METRICS

The two custom hooks registered by `metrics_hooks` walk MDBX page
metadata + the freelist DB and iterate every static-file jar header on
every Prometheus scrape. On large databases this is expensive enough to
stall the metrics endpoint and starve the runtime.

Skip registering both hooks when `RETH_DISABLE_HEAVY_METRICS` is set in
the environment; the rest of the registry (process, jemalloc, io, chain
spec, version) is unaffected and the endpoint still responds normally.
The env var is documented on the function so the escape hatch is
discoverable without grepping the source.

* metrics: run scrape handler and push-gateway render on spawn_blocking

The Prometheus metrics handler is fundamentally synchronous: it
invokes every registered hook and then runs the prometheus exporter's
`render()`, all on the tokio worker that accepted the HTTP request
(or on the runtime worker driving the push-gateway loop).

The default hooks are cheap (procfs, jemalloc atomic reads), but the
two `report_metrics` hooks (DB stat walk, static-file jar enumeration)
can take seconds on large archives. Even with those gated out (see
preceding patch), `render()` itself is O(total time-series) and will
grow over time. A multi-millisecond synchronous block on a runtime
worker is not ideal and can become a real engine latency source if
hook cost ever regresses.

Move the synchronous work off the runtime worker:

- Endpoint handler now offloads `handle_request` (which calls the hook
  + render or the pprof dump) to `spawn_blocking`. On join error,
  return a 500 instead of letting the connection task panic.
- Push-gateway loop offloads the hook + render to `spawn_blocking`;
  on join error, log a warning and skip this tick rather than killing
  the loop. The HTTP put itself was already async so it stays inline.

No behavioral change to what the endpoint or push-gateway returns;
only the thread on which the rendering happens.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants