notification: Replace async block with poll_recv #510
Conversation
Signed-off-by: Alexandru Vasile <[email protected]>
(diff excerpt)

```rust
match future.poll_unpin(cx) {
Poll::Pending => None,
None => match this.async_rx.poll_recv(cx) {
```
`tokio::select!` chooses randomly which future gets polled first; now `async_rx` is always polled first, so `sync_rx` might potentially never be polled. Is fairness necessary here, or is this intentional?
Yep, that's a fair point :D In substrate, we are exclusively using `sync_rx`. IIRC, there's no usage of `async_rx` atm. Will change the order, thanks
Thinking out loud: if the mostly loaded one is `sync_rx`, we should poll `async_rx` first, so that we fall through to polling `sync_rx` as well. In the opposite case we might end up polling only `sync_rx`, which is always loaded, and starve `async_rx`.
Maybe we can keep `tokio::select!`, which does the polling randomization internally?
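For reference, a standalone sketch of the difference being discussed: by default `tokio::select!` polls its branches in random order, while the `biased;` mode polls them top to bottom with a fixed priority. Channel kinds, capacities, and payload types below are placeholders, not the litep2p ones.

```rust
use tokio::sync::mpsc;

#[tokio::main]
async fn main() {
    let (sync_tx, mut sync_rx) = mpsc::channel::<u64>(16);
    let (async_tx, mut async_rx) = mpsc::channel::<u64>(16);
    sync_tx.send(1).await.unwrap();
    async_tx.send(2).await.unwrap();

    // Default behaviour: branches are polled in a random order on every
    // evaluation of the macro, so a permanently loaded channel cannot
    // starve the other one.
    let fair = tokio::select! {
        notification = sync_rx.recv() => notification,
        notification = async_rx.recv() => notification,
    };

    // `biased;` disables the randomization and polls the branches top to
    // bottom, i.e. the fixed-priority behaviour discussed in this thread.
    let prioritized = tokio::select! {
        biased;
        notification = sync_rx.recv() => notification,
        notification = async_rx.recv() => notification,
    };

    println!("fair: {fair:?}, prioritized: {prioritized:?}");
}
```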
That makes sense, we can close this for now, no longer needed if we keep the select! 🙏
Signed-off-by: Alexandru Vasile <[email protected]>
…nto lexnv/poll-wakers-connections
(diff excerpt)

```rust
None => {
    let future = async {
        tokio::select! {
            notification = this.async_rx.recv() => notification,
```
https://docs.rs/tokio/1.49.0/tokio/sync/mpsc/struct.Receiver.html#cancel-safety
But this is cancel safe, so I don't get your argument?
The wake-up call should still be registered and the entire future polled again when there is an event?
Yep, the receivers should be cancel safe. The issue here is with the `let future = async { }`. The context waker is registered by the inner `recv` calls inside the temporary future. The future would later be dropped if `future.poll_unpin` returns `Poll::Pending`. Then, when `sync_rx` got a new notification, it would wake the waker corresponding to the dropped future, causing `poll_next` to stall.
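Roughly, the pattern in question looks like the sketch below; the struct, field names, and payload type are placeholders rather than the actual litep2p definitions.

```rust
use std::{
    pin::Pin,
    task::{Context, Poll},
};

use futures::{FutureExt, Stream};
use tokio::sync::mpsc::Receiver;

/// Simplified stand-in for the connection type; field names and the
/// notification payload type are placeholders, not the litep2p definitions.
struct Connection {
    sync_rx: Receiver<u64>,
    async_rx: Receiver<u64>,
}

impl Stream for Connection {
    type Item = u64;

    fn poll_next(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Option<Self::Item>> {
        let this = self.get_mut();

        // A fresh future is built on every call to `poll_next` ...
        let mut future = Box::pin(async {
            tokio::select! {
                notification = this.async_rx.recv() => notification,
                notification = this.sync_rx.recv() => notification,
            }
        });

        // ... polled once (the inner `recv` calls see `cx`'s waker) ...
        match future.poll_unpin(cx) {
            Poll::Ready(notification) => Poll::Ready(notification),
            // ... and dropped right here when it is not ready. The concern in
            // this thread was whether dropping it also drops the registered waker.
            Poll::Pending => Poll::Pending,
        }
    }
}
```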
> Then, when the sync_rx got a new notification, it would wake the waker corresponding to the dropped future, causing the poll_next to stall.
But the waker is for the entire task and not just the future. So, the waker just wakes up the entire task and not some particular future.
I've dug a bit into tokio to figure this out; indeed I was mistaken about the "stalled connection" because I assumed recv worked similarly to reserve (the initial issue we noticed in webrtc):

- Tokio's bounded Receiver uses a waiting list of "context wakers" via a wrapper over the Semaphore implementation
- I assumed that `sync_rx.recv()` would call into the semaphore `acquire` or similar to place the context waker into the linked list (obtaining an `Acquire`)
- Because the `sync_rx.recv()` future would get dropped immediately, the waker would be removed from the linked list on `Drop`
- When the notification is received, there would be no registered waker in the list
However, the semaphore is only used for capacity. When we call into recv, the Receiver stores the waker into a separate variable:
```rust
/// Receiver waker. Notified when a value is pushed into the channel.
rx_waker: CachePadded<AtomicWaker>,
```

and in `fn recv`:

```rust
self.inner.rx_waker.register_by_ref(cx.waker());
```
So regardless of whether the temporary future gets dropped, we'll still wake the proper waker under the hood. This PR just turns into a tiny optimization that avoids creating and dropping a dedicated async block :D
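A small standalone sketch (not part of this PR) illustrating that conclusion: the waker survives the drop of the temporary `recv()` future because it is stored in the channel's `rx_waker` slot, so a later send still wakes the registered task.

```rust
use std::{
    future::Future,
    sync::{
        atomic::{AtomicBool, Ordering},
        Arc,
    },
    task::{Context, Wake, Waker},
};

use tokio::sync::mpsc;

/// Test waker that only records that it was woken.
struct FlagWaker(AtomicBool);

impl Wake for FlagWaker {
    fn wake(self: Arc<Self>) {
        self.0.store(true, Ordering::SeqCst);
    }
}

fn main() {
    let (tx, mut rx) = mpsc::channel::<u64>(8);

    let flag = Arc::new(FlagWaker(AtomicBool::new(false)));
    let waker = Waker::from(flag.clone());
    let mut cx = Context::from_waker(&waker);

    // Build a temporary `recv()` future, poll it once (it is pending and
    // registers the waker in the channel), then drop it, mimicking the async
    // block that `poll_next` created and discarded on `Poll::Pending`.
    {
        let mut recv_future = Box::pin(rx.recv());
        assert!(recv_future.as_mut().poll(&mut cx).is_pending());
        // `recv_future` is dropped at the end of this scope.
    }

    // The waker lives in the channel, not in the dropped future, so pushing
    // a value still wakes the registered task.
    tx.try_send(1).unwrap();
    assert!(flag.0.load(Ordering::SeqCst));
    println!("waker fired even though the temporary recv future was dropped");
}
```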
The `Connection::poll_next` implementation needlessly created an `async` block just to drop it when returning `Poll::Pending`. Instead, this PR polls the `async_rx` and `sync_rx` receivers manually. Considering that all substrate implementations use `sync_rx`, this takes priority (instead of the previous `tokio::select!`, which polled fairly).

Discovered during investigation of:
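For illustration, a minimal sketch of the manual polling described above; the struct, field names, and payload type are placeholders rather than the actual litep2p code.

```rust
use std::{
    pin::Pin,
    task::{Context, Poll},
};

use futures::{Stream, StreamExt};
use tokio::sync::mpsc::{self, Receiver};

/// Simplified stand-in for the connection type; field names and the
/// notification payload type are placeholders, not the litep2p definitions.
struct Connection {
    sync_rx: Receiver<u64>,
    async_rx: Receiver<u64>,
}

impl Stream for Connection {
    type Item = u64;

    fn poll_next(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Option<Self::Item>> {
        let this = self.get_mut();

        // Poll `sync_rx` first, since it carries substrate's traffic. `poll_recv`
        // registers `cx`'s waker with the channel itself, so no temporary future
        // has to be created and dropped anymore.
        if let Poll::Ready(Some(notification)) = this.sync_rx.poll_recv(cx) {
            return Poll::Ready(Some(notification));
        }

        // Fall through to `async_rx`; closed-channel handling is simplified here.
        this.async_rx.poll_recv(cx)
    }
}

#[tokio::main]
async fn main() {
    let (sync_tx, sync_rx) = mpsc::channel(16);
    let (_async_tx, async_rx) = mpsc::channel(16);
    let mut connection = Connection { sync_rx, async_rx };

    sync_tx.send(42).await.unwrap();
    // Resolves from `sync_rx` because it is polled first.
    assert_eq!(connection.next().await, Some(42));
}
```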
cc @paritytech/networking