We hit the same failure mode twice on Berachain mainnet archive nodes, and it looks execution-side rather than BeaconKit-only.
Reth nodes look healthy (eth_syncing=false, peers connected) but stall, with BeaconKit engine timeouts and eth_getLogs degradation
Summary
On two separate occasions, bera-reth stopped making forward progress while still appearing superficially healthy:
- peers still connected
- eth_syncing returned false
- basic RPCs like eth_blockNumber still responded
- but BeaconKit stopped receiving successful EL responses and eventually flatlined
- on at least one node, local eth_getLogs became hung/broken until reth was restarted
Restarting reth restored progress. Restarting BeaconKit alone did not reliably recover the node.
Incident windows
We have seen this occasionally since around 2026-03-22, including two specific incidents at:
- 2026-03-22 06:56 UTC
- 2026-03-22 23:39 UTC
We saw the same or very similar pattern on two different nodes.
Main symptoms
Execution-layer side:
- reth stopped advancing the canonical chain height
- eth_syncing still returned false
- peers remained connected
- basic RPCs still answered
- local eth_getLogs either failed heavily or stopped responding
- in one case, reth warned that the beacon client was online but no consensus updates were being received
Consensus-layer side:
- BeaconKit started timing out on local Engine API calls to reth
- BeaconKit replay / deposit-reading path got stuck
- after reth restart, BeaconKit sometimes needed one or more restarts to fully recover

Example BeaconKit-side errors we saw:
- engine API call timed out
- deposit-reading / filter-related errors during recovery
- later forkchoice / replay stalls
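Since the node's own health signals stay green, this state has to be detected from outside. A minimal sketch of that check, assuming a hypothetical external poller that records eth_blockNumber, eth_syncing and net_peerCount every few seconds. The Sample type, the is_silent_stall name and the 60 s threshold are our own illustrative choices, not part of reth or BeaconKit:

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Sample:
    """One observation of the EL, as recorded by a hypothetical external poller."""
    t: float                      # unix time of the sample, in seconds
    block_number: Optional[int]   # eth_blockNumber result; None if the call failed
    syncing: bool                 # True unless eth_syncing returned false
    peers: int                    # net_peerCount result


def is_silent_stall(samples: Sequence[Sample], max_stall_s: float = 60.0) -> bool:
    """True if the node looks healthy for the whole window but its head is stuck.

    "Looks healthy" means every sample has a block number, eth_syncing=false
    and at least one peer. The 60 s default is an illustrative threshold.
    """
    if len(samples) < 2:
        return False
    window = sorted(samples, key=lambda s: s.t)
    if any(s.block_number is None or s.syncing or s.peers == 0 for s in window):
        return False  # visibly unhealthy or syncing: not the silent case
    latest = window[-1].block_number
    stuck_since = window[-1].t
    # Walk backwards to find when the head first reached its current value.
    for s in reversed(window):
        if s.block_number != latest:
            break
        stuck_since = s.t
    return window[-1].t - stuck_since >= max_stall_s
```

The poller itself needs client-side timeouts, since the same wedged RPC server could otherwise block it too.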
Recovery
What consistently helped was:
- restart reth
- if BeaconKit does not resume by itself, restart BeaconKit
- in some cases BeaconKit needed a second restart after reth was healthy again
Restarting BeaconKit without restarting reth was not sufficient on the affected node where eth_getLogs was wedged.
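The recovery order above can be written down as a small decision function, roughly what a watchdog would follow. This is a sketch only: the Action names, the inputs and the limit of two BeaconKit restarts are our own assumptions based on what worked in these incidents, and the caller is assumed to wait a grace period between steps so BeaconKit gets a chance to resume by itself.

```python
from enum import Enum


class Action(Enum):
    NONE = "none"
    RESTART_RETH = "restart reth"
    RESTART_BEACONKIT = "restart BeaconKit"
    ESCALATE = "escalate to an operator"


def next_recovery_action(
    head_advancing: bool,
    get_logs_ok: bool,
    reth_restarted: bool,
    beaconkit_restarts: int,
    max_beaconkit_restarts: int = 2,
) -> Action:
    """Pick the next step, mirroring the manual recovery order we used.

    The caller is expected to wait a grace period after each action before
    calling this again, so BeaconKit can resume by itself.
    """
    if head_advancing and get_logs_ok:
        return Action.NONE
    if not reth_restarted:
        # Restarting BeaconKit alone did not help while eth_getLogs was wedged,
        # so reth always goes first.
        return Action.RESTART_RETH
    if not get_logs_ok:
        # reth was already restarted and eth_getLogs is still broken:
        # a BeaconKit restart is not expected to fix that.
        return Action.ESCALATE
    if beaconkit_restarts < max_beaconkit_restarts:
        # reth is healthy again but the head is not moving: BeaconKit
        # sometimes needed one or two restarts at this point.
        return Action.RESTART_BEACONKIT
    return Action.ESCALATE
```

Escalating instead of looping forever keeps a human in the loop if the condition turns out to be something other than what we saw here.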
Why this looks like a reth issue
The strongest signal was:
- reth remained up and still answered simple RPC calls
- but archive/read-heavy RPC behavior degraded sharply before the stall
- eth_getLogs was a recurring failure signal
- BeaconKit then lost effective EL/CL coordination
- restarting reth cleared the condition
This makes it look like reth entered a degraded internal state rather than crashing outright.
Metrics / patterns seen before the stall
Looking more closely at metrics before at least one of the stall windows, we saw:
- sharp latency increase across multiple RPC methods, not just one:
  - eth_getLogs
  - eth_call
  - eth_getBlockReceipts
  - debug_traceBlockByNumber
- even eth_getBlockByNumber got slower
- repeated eth_getLogs failures before the node flatlined
- archive-style read traffic present before both incidents
- no clear evidence of classic host resource exhaustion:
  - no obvious peer collapse
  - no obvious bandwidth saturation
  - no confirmed RAM exhaustion
  - no confirmed FD exhaustion
So to us it looks like:
- some internal reth degradation is triggered or exercised by archive-heavy traffic, especially around log/filter-style paths
- that degraded state eventually breaks CL/EL coordination with BeaconKit
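To get per-method latency like the above without relying on server-side metrics, a client-side probe can time a cheap call next to heavier ones. A minimal sketch using only the Python standard library; the probe set, the 5 s timeout and the function names are our own hypothetical choices. eth_call and debug_traceBlockByNumber are left out because one needs a call target and the other is expensive on an archive node. The transport is injectable so the logic can be exercised without a live node.

```python
import json
import time
import urllib.request
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical probe set: one cheap call next to heavier read paths.
PROBES: Dict[str, list] = {
    "eth_blockNumber": [],
    "eth_getBlockByNumber": ["latest", False],
    "eth_getBlockReceipts": ["latest"],
    "eth_getLogs": [{"fromBlock": "latest", "toBlock": "latest"}],
}

Transport = Callable[[str, bytes, float], dict]


def build_request(method: str, params: list, req_id: int) -> bytes:
    """Encode a JSON-RPC 2.0 request body."""
    return json.dumps(
        {"jsonrpc": "2.0", "id": req_id, "method": method, "params": params}
    ).encode()


def http_transport(url: str, body: bytes, timeout_s: float) -> dict:
    """POST the body and decode the JSON reply.

    timeout_s is a socket-level timeout, so a hung eth_getLogs surfaces as an
    exception instead of blocking the probe forever.
    """
    req = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req, timeout=timeout_s) as resp:
        return json.loads(resp.read())


def probe(
    url: str, transport: Transport = http_transport, timeout_s: float = 5.0
) -> List[Tuple[str, float, Optional[str]]]:
    """Return (method, elapsed_seconds, error_or_None) for each probe."""
    results = []
    for req_id, (method, params) in enumerate(PROBES.items(), start=1):
        start = time.monotonic()
        error: Optional[str] = None
        try:
            reply = transport(url, build_request(method, params, req_id), timeout_s)
            if "error" in reply:
                error = str(reply["error"])
        except Exception as exc:  # timeouts, connection resets, bad JSON
            error = type(exc).__name__
        results.append((method, time.monotonic() - start, error))
    return results
```

Run every few seconds against the local RPC endpoint (for example http://127.0.0.1:8545, assuming the default HTTP port) and log the tuples. eth_getLogs errors or timeouts while eth_blockNumber still succeeds quickly would match the pattern described here.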
Questions
- Is this a known failure mode in bera-reth or upstream reth?
- Does this line up with any known issues around:
  - eth_getLogs
  - filter/log handling
  - long-lived read transactions
  - Engine API responsiveness while the node still reports eth_syncing=false
- Are there metrics or logs we should capture next time to narrow this further?
- Is there a recommended mitigation for archive nodes besides process restart?
Additional note
We can provide more exact logs / timings if useful, but I wanted to first report the pattern because it repeated across nodes and required a reth restart to recover.