-
Notifications
You must be signed in to change notification settings - Fork 405
Give peers which are sending us messages/receiving messages from us longer to respond to ping #1137
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Give peers which are sending us messages/receiving messages from us longer to respond to ping #1137
Conversation
Codecov Report
@@ Coverage Diff @@
## main #1137 +/- ##
==========================================
+ Coverage 90.57% 91.67% +1.10%
==========================================
Files 67 69 +2
Lines 34567 43809 +9242
==========================================
+ Hits 31309 40163 +8854
- Misses 3258 3646 +388
Continue to review full report at Codecov.
|
ef39f70
to
007cce0
Compare
lightning/src/ln/peer_handler.rs
Outdated
@@ -1148,14 +1182,15 @@ impl<Descriptor: SocketDescriptor, CM: Deref, RM: Deref, L: Deref, CMH: Deref> P | |||
!peer.should_forward_channel_announcement(msg.contents.short_channel_id) { | |||
continue | |||
} | |||
if peer.pending_outbound_buffer.len() > OUTBOUND_BUFFER_LIMIT_DROP_GOSSIP { | |||
if peer.pending_outbound_buffer.len() > OUTBOUND_BUFFER_LIMIT_DROP_GOSSIP || peer.msgs_sent_since_pong > BUFFER_DRAIN_MSGS_PER_TICK * 2 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this check seems to be done a lot. Would it be possible to create a method such as can_queue_outbound_message(&peer) and skip if it returns false?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, its just done in this specific method - we dont limit non-gossip messages in the buffer (as we can't just drop HTLC messages or whatever).
lightning/src/ln/peer_handler.rs
Outdated
if !peer.channel_encryptor.is_ready_for_encryption() || peer.their_node_id.is_none() { | ||
// The peer needs to complete its handshake before we can exchange messages. We | ||
// give peers one timer tick to complet handshake. | ||
if peer.awaiting_pong_tick_intervals != 0 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how can a peer that is not ready for encryption be already be owing us pongs?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Its more a deadline for when the peer completes the initial handshake - we reset awaiting_pong_tick_intervals
to 0 after the handshake completes, I'll document better.
peers[1].read_event(&mut fd_b, &fd_a.outbound_data.lock().unwrap().split_off(0)).unwrap(); | ||
peers[1].process_events(); | ||
peers[0].read_event(&mut fd_a, &fd_b.outbound_data.lock().unwrap().split_off(0)).unwrap(); | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
shouldn't we add tests that check precisely the limits of the buffer constants and verify there are no off-by-one errors?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can't check the message count exactly cause we just have bytes, but we can count how many iterations it took and check that.
82ab64d
to
fa04659
Compare
The `cmp::min` appeared to confused `end` for a count.
698ef00
to
8616cf1
Compare
In the coming commits simply calling `timer_tick_occurred` will no longer disconnect all peers, so its helpful to have a utility method.
cff18d1
to
f714dca
Compare
Just successfully opened a channel to Matt with this PR! |
Added one more commit that is incredibly useful when grep'ing log entries - ensure we print who tan error is from. |
lightning/src/ln/peer_handler.rs
Outdated
@@ -285,14 +285,18 @@ enum InitSyncTracker{ | |||
NodesSyncing(PublicKey), | |||
} | |||
|
|||
/// The ratio between buffer sizes at which we stop sending initial sync messages vs when we stop | |||
/// forwarding messages to peers altogether. | |||
const FORWARD_INIT_SYNC_BUFFER_SIZE_LIMIT: usize = 2; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The naming here is a bit strange. Is it really a limit? Also, why say "INIT_SYNC" here but "GOSSIP" below?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmmmm, I think the confusion comes from OUTBOUND_BUFFER_LIMIT_READ_PAUSE
being used for two things - when the buffer is "full" implying we stop reading new messages and when the buffer is "full" implying we stop putting more initial-gossip-sync messages in the buffer. The DROP_GOSSIP
limit is for post-initial-sync gossip with a ratio of init-to-normal-forwards. Do you have a concrete suggestion for changing these names?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure. Was thinking something with "factor" in the name given it is multiplied. I think my main complaint is that SIZE_LIMIT
seems to imply units of "number of messages" but this constant is really unit-less since it is a ratio of two size limits. So an expression like BUFFER_DRAIN_MSGS_PER_TICK * FACTOR
should have units "number of messages per tick" not "number of messages squared per tick" as the SIZE_LIMIT
naming would imply.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah! Ok, I'd somewhat misunderstood your complaint here, yea, definitely this needs RATIO or FACTOR in it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How do the new fixup names sit for you?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
struct OptionalFromDebugger<'a>(&'a Option<PublicKey>); | ||
impl core::fmt::Display for OptionalFromDebugger<'_> { | ||
fn fmt(&self, f: &mut core::fmt::Formatter<'_>) -> Result<(), core::fmt::Error> { | ||
if let Some(node_id) = self.0 { write!(f, " from {}", log_pubkey!(node_id)) } else { Ok(()) } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps keep the "from" in each message and use "peer" in the else case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That feels wrong? from peer
isn't exactly a useful log string. Note that the need to introduce this wrapper is because of the optional public key itself, not the "from".
@@ -979,7 +986,7 @@ impl<Descriptor: SocketDescriptor, CM: Deref, RM: Deref, L: Deref, CMH: Deref> P | |||
return Err(PeerHandleError{ no_connection_possible: false }.into()); | |||
} | |||
|
|||
log_info!(self.logger, "Received peer Init message: {}", msg.features); | |||
log_info!(self.logger, "Received peer Init message from {}: {}", log_pubkey!(peer.their_node_id.unwrap()), msg.features); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use the wrapper here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I mean the wrapper only exists to solve a very specific lifetime issue thanks for format!()
being kinda braindead (creating references to every argument as-is instead of accepting references as-is).
@@ -493,6 +493,13 @@ impl<Descriptor: SocketDescriptor, RM: Deref, L: Deref> PeerManager<Descriptor, | |||
} | |||
} | |||
|
|||
struct OptionalFromDebugger<'a>(&'a Option<PublicKey>); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe name this something like MessageSender
wrapping a Peer
reference?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added some docs explaining why this was added at all. Given its not really particularly specific to Peer
I dunno if its worth renaming.
7189cc3
to
b4dcb56
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
b4dcb56
to
28b7430
Compare
Successfully opened a channel with Matt & Val as a result of this PR. Exciting! tACK |
This marginally simplifies coming commits.
See comment for rationale.
This ensures we don't let a hung connection stick around forever if the peer never completes the initial handshake. This also resolves a race where, on receiving a second connection from a peer, we may reset their_node_id to None to prevent sending messages even though the `channel_encryptor` `is_ready_for_encryption()`. Sending pings only checks the `channel_encryptor` status, not `their_node_id` resulting in an `unwrap` on `None` in `enqueue_message`.
28b7430
to
0caa8bb
Compare
Squashed without changes, only diff since Arik's ACK is to fix compilation. Will land after CI:
|
This should largely resolve disconnection issues during initial sync, without the larger changes in #1023. It also resolves one
None
unwrap (which may or may not even be present on latest git).