Cleaner node shutdown logic #584

eugene-uniswap · 2025-04-16T01:54:02Z

Implement functionality to

fail /readyz health check when node shutdown is initiated.
allow 7 second for existing requests to drain before shutting down RPC servers and before database is closed.

Tests

Testing was done on a local WS by launching op-geth, hitting /readyz on RPC port with fortio, interrupting op-geth with Ctrl-C and watching for readyz probe to fail while op-geth is waiting for 7 seconds before shutting down the server.

fortio load -c 1 -qps 1 -allow-initial-errors -n 10000 http://127.0.0.1:8551/readyz

17:50:26.384 r1 [WRN] http_client.go:1217> Content-length missing, header_len=64, thread=0, run=0
17:50:27.384 r1 [WRN] http_client.go:1217> Content-length missing, header_len=64, thread=0, run=0
17:50:28.384 r1 [WRN] http_client.go:1151> Non ok http code, code=503, status="HTTP/1.1 503", thread=0, run=0
17:50:29.384 r1 [WRN] http_client.go:1151> Non ok http code, code=503, status="HTTP/1.1 503", thread=0, run=0
17:50:30.384 r1 [WRN] http_client.go:1151> Non ok http code, code=503, status="HTTP/1.1 503", thread=0, run=0
17:50:31.384 r1 [WRN] http_client.go:1151> Non ok http code, code=503, status="HTTP/1.1 503", thread=0, run=0
17:50:32.384 r1 [WRN] http_client.go:1151> Non ok http code, code=503, status="HTTP/1.1 503", thread=0, run=0
17:50:33.384 r1 [WRN] http_client.go:1151> Non ok http code, code=503, status="HTTP/1.1 503", thread=0, run=0
17:50:34.384 r1 [WRN] http_client.go:1151> Non ok http code, code=503, status="HTTP/1.1 503", thread=0, run=0
17:50:35.383 r1 [ERR] http_client.go:954> Unable to connect, dest={"IP":"127.0.0.1","Port":8551,"Zone":""}, err="dial tcp 127.0.0.1:8551: connect: connection refused", numfd=6, thread=0, run=0
17:50:36.385 r1 [ERR] http_client.go:954> Unable to connect, dest={"IP":"127.0.0.1","Port":8551,"Zone":""}, err="dial tcp 127.0.0.1:8551: connect: connection refused", numfd=6, thread=0, run=0

op-geth output:

^CINFO [04-22|17:50:28.036] Got interrupt, shutting down...
INFO [04-22|17:50:35.037] HTTP server stopped                      endpoint=127.0.0.1:8551
INFO [04-22|17:50:35.038] IPC endpoint closed                      url=/Users/eugene.aleynikov/Library/Ethereum/geth.ipc
INFO [04-22|17:50:35.155] Ethereum protocol stopped
INFO [04-22|17:50:35.155] Transaction pool stopped
INFO [04-22|17:50:35.204] Persisting dirty state to disk           root=d7f897..0f0544 layers=0
INFO [04-22|17:50:35.205] Persisted dirty state to disk            size=74.00B elapsed="264.834µs"
INFO [04-22|17:50:35.236] Blockchain stopped

Additional context

This PR is needed for the work to add backend healthcheck into proxyd.
When launching op-geth via Docker Compose, the default stop_grace_period is 10 seconds. This should be increased to at least 15 seconds—or ideally 30 seconds—to align with the default values used by Kubernetes and ECS. Otherwise, the container risks being forcefully terminated with a SIGKILL before the drain logic has a chance to complete, which will cause unclean shutdown and possible on-disk data corruption.

Metadata

paranoiacblack

Overall looks good but I think we should avoid adding channels if we don't have to, it adds an extra layer of management to a codebase littered with many haphazard channels. And, as you know, poorly managed channels lead to deadlocks.

node/rpcstack.go

paranoiacblack

Looks fine, overall, you should go ahead and move it to review by the optimism team to see what they think makes sense.

node/rpcstack.go

sebastianst

This seems like a general improvement of go-ethereum and not related to the OP Stack diff. Would be great if you could open this PR upstream and it will then be available in op-geth soon after. If there's resistance from upstream maintainers, we can take a closer look. Feel also free to refer back to this PR.

eugene-uniswap · 2025-05-21T00:24:50Z

This seems like a general improvement of go-ethereum and not related to the OP Stack diff. Would be great if you could open this PR upstream and it will then be available in op-geth soon after. If there's resistance from upstream maintainers, we can take a closer look. Feel also free to refer back to this PR.

ethereum/go-ethereum#31866

Better shutdown logic

100cd63

eugene-uniswap requested a review from a team as a code owner April 16, 2025 01:54

eugene-uniswap requested a review from protolambda April 16, 2025 01:54

eugene-uniswap marked this pull request as draft April 16, 2025 01:54

eugene-uniswap added 2 commits April 16, 2025 11:12

Fix unit tests

2b883bf

Increase timeout from 6 to 7 seconds per design document

31cdd0d

paranoiacblack suggested changes Apr 21, 2025

View reviewed changes

Address review comments

be87548

paranoiacblack approved these changes Apr 22, 2025

View reviewed changes

node/rpcstack.go Show resolved Hide resolved

eugene-uniswap marked this pull request as ready for review April 22, 2025 03:23

sebastianst self-requested a review May 6, 2025 18:30

sebastianst self-assigned this May 6, 2025

sebastianst added this to Protocol Team May 6, 2025

sebastianst moved this to In Review in Protocol Team May 6, 2025

sebastianst reviewed May 16, 2025

View reviewed changes

Merge branch 'optimism' into cleaner-shutdown

fc95daa

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Cleaner node shutdown logic #584

Cleaner node shutdown logic #584

eugene-uniswap commented Apr 16, 2025 •

edited

Loading

Uh oh!

paranoiacblack left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

paranoiacblack left a comment

Uh oh!

Uh oh!

sebastianst left a comment

Uh oh!

eugene-uniswap commented May 21, 2025

Uh oh!

Uh oh!

Cleaner node shutdown logic #584

Are you sure you want to change the base?

Cleaner node shutdown logic #584

Conversation

eugene-uniswap commented Apr 16, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

paranoiacblack left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

paranoiacblack left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

sebastianst left a comment

Choose a reason for hiding this comment

Uh oh!

eugene-uniswap commented May 21, 2025

Uh oh!

Uh oh!

eugene-uniswap commented Apr 16, 2025 •

edited

Loading