Skip to content

Conversation

@lexnv
Copy link
Collaborator

@lexnv lexnv commented Jan 19, 2026

The PR fixes a AsyncWrite::poll_write implementation of the crypto/noise sockets that causes panics in rust-yamux and leads to unnecessary connection closure and instability:

  • T0: poll_write is called with buffer of len 512 bytes
    • the implementation encrypt data and buffers it
    • because the io socket is not ready, poll pending is returned.
  • T1: tokio_tungstenite (or other component) decides that a new payload must be sent (usually a PONG frame) of len 12 bytes
    • because the inner buffers contain the previous message, upon flushing the size of the older message is returned (ie 512)
    • rust-yamux uses the 512 bytes to index a buffer of 12 bytes (as expected by the second poll_write with buffer 12)
    • This caused frequent panics on the websocket implementation, which is currently addressed as abrupt connection termination

This PR fixes the AsyncWrite contract violation by effectively decoupling the encryption from the writing steps.

  • the inner buffers are drained as much as possible (until the socket returns poll pending)
  • The provided message is encrypted into the inner buffer if it has capacity and the number of bytes is returned immediately
  • a subsequent poll_write or poll_flush or poll_close will propagate the encrypted buffered data to the underlying socket

Previous fixes:

The fixes are still needed since the tokio-tungstenite (websocket crate) was not properly scoped and may exhibit similar behavior.

cc @paritytech/networking

@lexnv lexnv self-assigned this Jan 19, 2026
@lexnv lexnv added the bug Something isn't working label Jan 19, 2026
Copy link
Collaborator

@dmitry-markin dmitry-markin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool finding and good job here!

// This can happen once we have pending data and no space to buffer new data.
// Therefore, a call to poll_write returned Pending in the first step. This
// just forwards the Pending to the caller until we have space again.
return Poll::Pending;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like we haven't registered the waker to wake up the task once we have space in the buffer (and poll_write above might not wake as as well if it returned Ready). May be we don't need this if block at all to not return Poll::Pending?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, if the poll_write returns Poll::Ready, then we must rely on the fact that:

  • one chunk+overhead has sufficient space to fill into the encrypt_buffer

@lexnv lexnv merged commit 891ca6d into master Jan 20, 2026
8 checks passed
@lexnv lexnv deleted the lexnv/fix-crypto-noise branch January 20, 2026 15:25
dmitry-markin added a commit that referenced this pull request Jan 21, 2026
## [0.13.0] - 2026-01-21

This release brings multiple fixes to both the transport and
application-level protocols.

Specifically, it enhances WebSocket stability by resolving AsyncWrite
errors and ensuring that partial writes during the negotiation phase no
longer trigger connection failures.

At the same time, Bitswap client functionality is introduced, which
makes this release semver breaking.

### Added

- Add Bitswap client
([#501](#501))

### Fixed

- notif/fix: Avoid CPU busy loops on litep2p full shutdown
([#521](#521))
- protocol: Ensure transport manager knows about closed connections
([#515](#515))
- substream: Decrement the bytes counter to avoid excessive flushing
([#511](#511))
- crypto/noise: Improve stability of websockets by fixing AsyncWrite
implementation ([#518](#518))
- bitswap: Split block responses into batches under 2 MiB
([#516](#516))
- crypto/noise: Fix connection negotiation logic on partial writes
([#519](#519))
- substream/fix: Fix partial reads for ProtocolCodec::Identity
([#512](#512))
- webrtc: Avoid panics returning error instead
([#509](#509))
- bitswap: e2e test & max payload fix
([#508](#508))
- tcp: Exit connections when events fail to propagate to protocols
([#506](#506))
- webrtc: Avoid future being dropped when channel is full
([#483](#483))

---------

Co-authored-by: Alexandru Vasile <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants