apollo_l1_events: bound catch-up commit-block backlog with a cap and metric
The catchupper's `commit_block_backlog` was an unbounded `Vec` populated by every
commit-block arriving above the provider's height during startup catch-up, and
drained only once L2 sync reached the target. A persistently slow or stalled sync
while the batcher keeps committing could grow it without limit (security finding
L-16).
Add a configurable `max_commit_block_backlog_len` (default 1,000,000) and a
`l1_message_provider_commit_block_backlog_len` gauge. On overflow,
`add_commit_block_to_backlog` returns a new `CatchUpBacklogOverflow` error rather
than dropping entries: the backlog must stay a gapless, strictly-sequential run,
so drop-oldest would corrupt the drain-time invariant and silently skip an
L1-handler commit. The error surfaces to the batcher (which already handles
commit_block errors) and to logs/alerts. The gauge is updated on each push and
reset to 0 when the backlog drains at catch-up completion.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
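The fail-on-overflow behavior described above can be sketched roughly as follows. This is a minimal illustration, not the code from this commit: the `Catchupper` and `CommitBlock` shapes, the error's fields, the `drain_backlog` helper, and the `BACKLOG_LEN_GAUGE` static (a stand-in for the `l1_message_provider_commit_block_backlog_len` gauge) are all assumptions made for the example.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// Hypothetical stand-in for the `l1_message_provider_commit_block_backlog_len` gauge.
static BACKLOG_LEN_GAUGE: AtomicUsize = AtomicUsize::new(0);

#[derive(Debug, PartialEq)]
enum BacklogError {
    // Plays the role of `CatchUpBacklogOverflow`; the fields are assumptions.
    CatchUpBacklogOverflow { height: u64, max_len: usize },
}

// Hypothetical, simplified commit-block entry: only the height is modeled.
#[derive(Debug, Clone, PartialEq)]
struct CommitBlock {
    height: u64,
}

struct Catchupper {
    commit_block_backlog: Vec<CommitBlock>,
    max_commit_block_backlog_len: usize,
}

impl Catchupper {
    fn new(max_commit_block_backlog_len: usize) -> Self {
        Self { commit_block_backlog: Vec::new(), max_commit_block_backlog_len }
    }

    // Rejects the new entry when the cap is reached instead of evicting an
    // older one, so the buffered run of heights never develops a gap.
    fn add_commit_block_to_backlog(&mut self, block: CommitBlock) -> Result<(), BacklogError> {
        if self.commit_block_backlog.len() >= self.max_commit_block_backlog_len {
            return Err(BacklogError::CatchUpBacklogOverflow {
                height: block.height,
                max_len: self.max_commit_block_backlog_len,
            });
        }
        self.commit_block_backlog.push(block);
        // The gauge is updated on each successful push.
        BACKLOG_LEN_GAUGE.store(self.commit_block_backlog.len(), Ordering::Relaxed);
        Ok(())
    }

    // Hypothetical drain step: hands back the whole backlog at catch-up
    // completion and resets the gauge to 0.
    fn drain_backlog(&mut self) -> Vec<CommitBlock> {
        let drained = std::mem::take(&mut self.commit_block_backlog);
        BACKLOG_LEN_GAUGE.store(0, Ordering::Relaxed);
        drained
    }
}
```

With a cap of 2 in this sketch, a third push returns the overflow error and leaves the two buffered entries untouched, which is the property that keeps the drain-time sequence gapless.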
crates/apollo_l1_events/src/metrics.rs: 2 additions & 0 deletions
@@ -19,6 +19,7 @@ define_metrics!(
 MetricCounter{L1_MESSAGE_SCRAPER_REORG_DETECTED,"l1_message_scraper_reorg_detected","Number of times the L1 message scraper detected a reorganization in the base layer", init=0},
 MetricGauge{L1_MESSAGE_SCRAPER_LAST_SUCCESS_TIMESTAMP_SECONDS,"l1_message_scraper_last_success_timestamp_seconds","Unix timestamp (seconds) of the last successful L1 message scrape"},
 MetricGauge{L1_MESSAGE_PROVIDER_NUM_PENDING_TXS,"l1_message_provider_num_pending_txs","The number of pending L1 handler transactions in the transaction manager"},
+MetricGauge{L1_MESSAGE_PROVIDER_COMMIT_BLOCK_BACKLOG_LEN,"l1_message_provider_commit_block_backlog_len","The number of commit-blocks buffered in the catch-up backlog while the provider syncs to the target height; abnormal sustained growth indicates a stalled or lagging L2 sync"},
"description": "Maximum number of commit-blocks buffered in the catch-up backlog during startup sync before commit_block fails; guards against unbounded memory growth on a stalled or lagging L2 sync.",