Skip to content

fix(engine): scope engine-API retry budget to consensus phase#3112

Draft
fridrik01 wants to merge 2 commits into
mainfrom
engine-retry-phase-budgets
Draft

fix(engine): scope engine-API retry budget to consensus phase#3112
fridrik01 wants to merge 2 commits into
mainfrom
engine-retry-phase-budgets

Conversation

@fridrik01
Copy link
Copy Markdown
Contributor

@fridrik01 fridrik01 commented May 21, 2026

This PR scopes the engine-API retry budget to the consensus phase that issued the call, so a stuck or hostile EL can no longer trap consensus by inducing an infinite retry inside an ABCI handler.

Previously, every engine call from beacon-kit went through backoff.WithMaxTries(0) + backoff.WithMaxElapsedTime(0), which means an infinite retry on anything not explicitly wrapped in Permanent. PR #3109 fixed the specific HTTP 413 vector by classifying 4xx as fatal, but the architectural pattern remained. Any error the classifier missed (5xx, EOF, syncing status, novel transport shape) inherited infinite retry. This PR replaces the one-size-fits-all policy with a per-phase budget.

Changes

  • New EnginePhase enum (PhaseBuild / PhaseValidate / PhaseFinalize / PhaseStartup) is now a required constructor argument to transition.NewTransitionCtx. A future change that flips VerifyPayload(true) on the proposer path can no longer silently inherit Validate semantics, it fails at compile time.
  • NotifyForkchoiceUpdate / NotifyNewPayload derive their backoff MaxElapsedTime from the phase:
    • PhaseBuild and PhaseValidate are bounded at 75% of CometBFT's TimeoutPropose / TimeoutPrevote. A stuck or hostile EL causes the proposer to skip (or the validator to reject), and CometBFT advances to the next round.
    • PhaseFinalize and PhaseStartup are unbounded for transient signals only. IsNonFatalError (transport errors, 5xx, syncing) retries forever, so a brief EL outage (bera-reth restart, reverse-proxy hiccup) doesn't drop a consensus-finalized block.
  • Fatal errors (HTTP 4xx, pre-defined JSON-RPC errors) are now backoff.Permanent in every phase, including Finalize. A misconfigured JWT or wrong chain ID surfaces immediately instead of hot-looping silently.
  • HTTP 408 / 425 / 429 are carved out as retryable per RFC, so a reverse proxy, WAF, or rate limiter in front of the EL can't turn brief backpressure into missed slots.

What does not change

  • A bera-reth restart still does not crash beacon-kit. ECONNREFUSED / ECONNRESET / 5xx / EOF continue to retry forever in PhaseFinalize via IsNonFatalError.
  • PrepareProposal failure semantics are unchanged from the operator's perspective (no proposal, CometBFT picks another proposer).

@fridrik01 fridrik01 force-pushed the engine-retry-phase-budgets branch from 185bd6e to c5bee9a Compare May 21, 2026 18:08
@codecov
Copy link
Copy Markdown

codecov Bot commented May 21, 2026

Codecov Report

❌ Patch coverage is 61.36364% with 34 lines in your changes missing coverage. Please review.
✅ Project coverage is 61.38%. Comparing base (fd10658) to head (c5bee9a).

Files with missing lines Patch % Lines
execution/engine/engine.go 48.07% 26 Missing and 1 partial ⚠️
beacon/blockchain/payload.go 0.00% 3 Missing and 1 partial ⚠️
engine-primitives/engine-primitives/phase.go 83.33% 2 Missing ⚠️
beacon/blockchain/execution_engine.go 0.00% 0 Missing and 1 partial ⚠️
Additional details and impacted files

Impacted file tree graph

@@                      Coverage Diff                      @@
##           fix-engine-4xx-retry-loop    #3112      +/-   ##
=============================================================
+ Coverage                      61.34%   61.38%   +0.04%     
=============================================================
  Files                            370      371       +1     
  Lines                          18936    18948      +12     
=============================================================
+ Hits                           11617    11632      +15     
+ Misses                          6363     6360       -3     
  Partials                         956      956              
Files with missing lines Coverage Δ
beacon/blockchain/finalize_block.go 64.53% <100.00%> (+0.25%) ⬆️
beacon/blockchain/process_proposal.go 67.54% <100.00%> (+0.12%) ⬆️
beacon/validator/block_builder.go 63.56% <100.00%> (+0.13%) ⬆️
execution/client/errors.go 54.65% <100.00%> (+1.63%) ⬆️
node-core/components/engine.go 100.00% <100.00%> (ø)
payload/builder/payload.go 69.87% <100.00%> (ø)
primitives/transition/context.go 100.00% <100.00%> (ø)
state-transition/core/state_processor_payload.go 64.16% <100.00%> (+0.30%) ⬆️
beacon/blockchain/execution_engine.go 68.96% <0.00%> (ø)
engine-primitives/engine-primitives/phase.go 83.33% <83.33%> (ø)
... and 2 more

... and 2 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Base automatically changed from fix-engine-4xx-retry-loop to main May 21, 2026 22:46
@fridrik01 fridrik01 self-assigned this May 22, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant