Restarting Lighthouse sometimes stalls due to in-use sockets #2254

Open
@michaelsproul

Description

Some users have reported that Lighthouse cannot be restarted quickly because its TCP ports are not freed immediately after the process exits. After a bit of research, it seems that this is a consequence of TCP's design: most operating systems keep a recently closed socket in the TIME_WAIT state for 30-120 seconds, so that delayed packets from the old connection are not delivered to a new listener on the same port. This thread has a good summary: https://stackoverflow.com/questions/3229860/what-is-the-meaning-of-so-reuseaddr-setsockopt-option-linux/3233022#3233022

If we establish that Lighthouse's networking stack is robust against delayed packets, we could opt into receiving them by setting the SO_REUSEADDR socket option before binding TCP sockets. Actually doing this could be a bit tricky, because we might have to punch through Tokio's and libp2p's abstractions, but perhaps they already provide configuration options.
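For illustration only, here is roughly what opting in looks like at the plain socket level. This is a minimal Python sketch, not Lighthouse code: the helper name `make_listener` is hypothetical, and the real change would have to go through whatever Tokio and libp2p expose. The one detail that carries over is that the option must be set before `bind()`.

```python
import socket


def make_listener(host: str, port: int, reuse_addr: bool = True) -> socket.socket:
    """Return a listening TCP socket, optionally with SO_REUSEADDR set.

    Hypothetical helper for illustration; not part of Lighthouse.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        if reuse_addr:
            # The option only affects bind(), so it has to be set first.
            sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        sock.bind((host, port))
        sock.listen()
    except OSError:
        # Do not leak the file descriptor if bind() or listen() fails.
        sock.close()
        raise
    return sock


if __name__ == "__main__":
    # Port 0 asks the OS for any free port; a real node would use its configured port.
    listener = make_listener("127.0.0.1", 0)
    print("listening on", listener.getsockname())
    listener.close()
```

On Linux and other Unix-like systems this lets a new process bind a port that only has TIME_WAIT connections left on it. It does not allow two live listeners on the same address and port.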

Until then, anyone who experiences issues rebinding sockets can wait out the TIME_WAIT period. You can see sockets in this state using a command like:

ss --numeric -o state time-wait
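If you would rather script the wait than watch `ss`, a small polling loop can probe the port until a plain bind succeeds. The following is a sketch under assumptions: the function names are hypothetical, the 120 second default timeout is simply the upper end of the range mentioned above, and the probe deliberately does not set SO_REUSEADDR so that it behaves like an ordinary bind.

```python
import socket
import time


def port_is_free(port: int, host: str = "0.0.0.0") -> bool:
    """Return True if a plain TCP bind to (host, port) succeeds right now."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        try:
            probe.bind((host, port))
        except OSError:
            # Typically EADDRINUSE: a listener or lingering socket still holds the port.
            return False
        return True


def wait_until_bindable(port: int, host: str = "0.0.0.0",
                        timeout: float = 120.0, interval: float = 1.0) -> bool:
    """Poll until the port can be bound, or give up after `timeout` seconds."""
    deadline = time.monotonic() + timeout
    while True:
        if port_is_free(port, host):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

A wrapper script could call `wait_until_bindable(port)` with the node's configured TCP port before starting Lighthouse again, and report an error if it returns `False`.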

Version

v1.1.3, likely v1.2.0 as well
