[lightning-net-tokio] Fix race-y unwrap fetching peer socket address #1449
Conversation
I recently saw the following panic on one of my test nodes:

```
thread 'tokio-runtime-worker' panicked at 'called `Result::unwrap()` on an `Err` value: Os { code: 107, kind: NotConnected, message: "Transport endpoint is not connected" }', rust-lightning/lightning-net-tokio/src/lib.rs:250:38
```

Presumably what happened is that the connection was somehow closed between us accepting it and us starting to process it. While this is a somewhat surprising race, it's clearly reachable. The fix proposed here is quite trivial: simply don't `unwrap` when fetching our peer's socket address; instead, treat the peer address as `None` and discover the disconnection later when we go to read.
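As a minimal sketch of the approach described above (the helper name is hypothetical and this is not the actual lightning-net-tokio code), the idea is to fall back to `None` instead of panicking when `peer_addr()` fails:

```rust
use std::net::SocketAddr;
use tokio::net::TcpStream;

// Hypothetical helper illustrating the fix: if the peer already disconnected
// between accept() and setup, peer_addr() returns an Err. Rather than calling
// unwrap() (and panicking), map the error to None and let the later read
// discover the disconnection.
fn peer_addr_or_none(stream: &TcpStream) -> Option<SocketAddr> {
    stream.peer_addr().ok()
}
```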
Codecov Report
```
@@            Coverage Diff             @@
##             main    #1449      +/-   ##
==========================================
+ Coverage   90.88%   91.63%    +0.74%
==========================================
  Files          75       75
  Lines       41474    46120     +4646
  Branches    41474    46120     +4646
==========================================
+ Hits        37695    42260     +4565
- Misses       3779     3860       +81
```
Continue to review full report at Codecov.
Yeah, good refactoring! `unwrap()` can be messy when it comes to catching crashes.
LGTM
I don't imagine there's any reasonable way to add a unit test simulating this exact scenario, considering you'd need an actual TCP connection?
9490393
Hmmm, yea, I tried and wrote a test, but it didn't reproduce the issue. I still committed it because maybe there's a world where it does manage to reproduce the issue on some hosts, dunno.
Two minor comments.
464175e
ACK 464175e
Sadly this does not reproduce the issue fixed in the previous commit.
373dfcc to 3f22d81

Squashed fixup commits.
```rust
// This attempts to find other similar races by opening connections and shutting them down
// while connecting. Sadly in testing this did *not* reproduce the previous issue.
```
Where does the "and shutting them down while connecting" happen?
We drop the other `TcpStream`, which should close it.
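For illustration, here is a minimal sketch (not the test added in this PR) of how dropping one side of a freshly accepted connection can make `peer_addr()` fail on the other side, which is the situation the fix handles without panicking:

```rust
// Hypothetical test sketch, assuming tokio with the "net" and "macros" features.
#[tokio::test]
async fn peer_addr_after_client_drop_does_not_panic() {
    let listener = tokio::net::TcpListener::bind("127.0.0.1:0").await.unwrap();
    let addr = listener.local_addr().unwrap();

    let client = tokio::net::TcpStream::connect(addr).await.unwrap();
    let (accepted, _) = listener.accept().await.unwrap();

    // Dropping the client TcpStream closes it; this is the "shutting them
    // down while connecting" part discussed above.
    drop(client);

    // With the fix, a failing peer_addr() is mapped to None rather than
    // being unwrap()ed, so nothing here can panic.
    let _maybe_addr: Option<std::net::SocketAddr> = accepted.peer_addr().ok();
}
```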