Skip to content

fix(unix): split oversized GSO sends into smaller chunks#2653

Draft
kixelated wants to merge 1 commit into
quinn-rs:mainfrom
kixelated:udp-retry-gso-eio
Draft

fix(unix): split oversized GSO sends into smaller chunks#2653
kixelated wants to merge 1 commit into
quinn-rs:mainfrom
kixelated:udp-retry-gso-eio

Conversation

@kixelated

Copy link
Copy Markdown

(untested, and mostly AI generated)

When a Transmit exceeds what one sendmsg syscall can accept — either because it has more segments than max_gso_segments allows, or because its total size exceeds the kernel's per-call limit (u16::MAX bytes) — split it upfront into smaller batches dispatched via separate sendmsg calls. This catches three cases that today silently drop datagrams or rely on the QUIC layer's loss timers to recover:

  • Stale in-flight GSO transmits left over from a GSO halt (udp: retry failed GSO sendmsg on EIO without GSO #2399). After EIO halts GSO via max_gso_segments.store(1), any transmits queued while GSO was still enabled now violate the new limit and need to be re-batched as individual sends rather than dropped.

  • The first GSO send that triggers the halt itself (udp: retry failed GSO sendmsg on EIO without GSO #2399). The EIO arm now re-dispatches the failing transmit through send after the halt, where the upfront chunking check splits it into individual datagrams without waiting for a PTO.

  • Callers that pass oversized Transmit::contents (partial Should quinn-udp automatically chunk Transmit::contents? #2201) — e.g. >65535 bytes total, which the kernel rejects with EMSGSIZE. These are now broken into kernel-acceptable batches, and each chunk still uses GSO when its segment count is > 1.

The public contract of max_gso_segments is unchanged — callers should still query it and size their Transmits accordingly. This path is a defensive net so that callers which don't (or which raced a GSO halt) don't silently drop datagrams.

Cellular Android, where the kernel reports GSO support but the radio driver returns EIO on the first real send, is the original motivation: recovery now completes inside the sendmsg call instead of waiting for a ~1s handshake PTO. Firefox is hot-fixing the same case in D265812 and Firezone hit it independently (firezone/firezone#10279).

Fixes #2399.
Refs #2201.

@thomaseizinger

Copy link
Copy Markdown
Collaborator

(untested, and mostly AI generated)

FTR: I don't mind this in principle and I see you've marked it as draft for now. Let us know when you've reviewed it yourself and done some testing on it.

I am glad this is being worked on. We do this on top of quinn-udp right now. I suggest you have a look at https://github.com/firezone/firezone/blob/445f46da6273dcce917a197f0747f91af782624b/rust/libs/connlib/socket-factory/src/lib.rs#L412 too.

Comment thread quinn-udp/src/unix.rs Outdated
state: &UdpSocketState,
io: SockRef<'_>,
transmit: &Transmit<'_>,
max_per_chunk: usize,

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is there any practical value to making this dynamic rather than simply hardcoding this to 1?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah it's hard to follow, working on renaming stuff and avoiding some recursion now.

max_per_chunk should be called gso_segments or something. It's the maximum number of segments that can be contained in a single packet, initially starting at >1 (depending on the OS) and shrinking down to 1 on an error.

If it shrinks down to 1, we call send again to repeat this logic, but gso_segments will be 1 now.

Comment thread quinn-udp/src/unix.rs Outdated
) -> io::Result<()> {
let segment_size = transmit
.effective_segment_size()
.expect("caller verified segment_size is set");

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If there isn't, can this not safely create a single chunk and send it instead of having to panic? The invariants being held so far from each other often ends up in tears sometimes. I feel like we need to try harder to avoid these things.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be passed in as an argument instead of recomputing it anyway.

@kixelated kixelated force-pushed the udp-retry-gso-eio branch 2 times, most recently from e7cc1fb to ae31e3a Compare June 3, 2026 20:13
@kixelated

Copy link
Copy Markdown
Author

kk I simplified the logic and removed the recursion. It's still untested (on my Mac and the customer has the failing Android anyway).

Some of the comments are still AI generated, but I've closely reviewed everything.

Comment thread quinn-udp/src/unix.rs
// `send_batched` halting GSO mid-syscall.
if e.raw_os_error() == Some(libc::EINVAL)
&& !state.sendmsg_einval()
&& transmit.effective_segment_size().is_none()

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This fixes an existing bug, where a GSO failure could also disable ECN. Can split into a separate PR if you'd prefer.

@thomaseizinger thomaseizinger left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left two initial comments.

I also think we should have integration-level test coverage for this.

Comment thread quinn-udp/src/unix.rs Outdated
Comment on lines +391 to +399
#[cfg(any(target_os = "linux", target_os = "android"))]
if let Some(segment_size) = transmit.effective_segment_size() {
// Get the (GSO) segment size, and compute the number of segments we could fit into a UDP payload.
let segment_count = max_segments_per_send(state.max_gso_segments(), segment_size);
return send_batched(state, io, transmit, segment_size, segment_count);
}

// Fall back to sending each packet individually.
send_packet(state, &io, transmit)

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't really understand this. There should not be a need for branching here because a single packet is a batch of size one with segment_size == packet_len, right?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's mostly a micro-optimization to avoid running the send_batched loop/logic if GSO isn't enabled/supported.

You're right that we could effective_segment_size instead of transmit.segment_size. It would end up being a batch of 1 packet.

Lemme see if I can combine the code paths to avoid the branching.

Comment thread quinn-udp/src/unix.rs Outdated
/// The kernel rejects any GSO send whose total `contents` length exceeds
/// `u16::MAX`, regardless of `max_gso_segments`.
#[cfg(any(target_os = "linux", target_os = "android", test))]
const MAX_SEND_BYTES: usize = u16::MAX as usize;

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I've already pointed out, it is not that easy. You need to take into account IP and UDP headers here. See the link I've left in the previous comment.

@kixelated kixelated force-pushed the udp-retry-gso-eio branch 2 times, most recently from 36af237 to 26ef777 Compare June 3, 2026 22:35
@kixelated

Copy link
Copy Markdown
Author

@thomaseizinger better?

When a `Transmit` exceeds what one `sendmsg` syscall can accept — either
because it has more segments than `max_gso_segments` allows, or because
its total size exceeds the kernel's per-call limit (`u16::MAX` bytes) —
split it upfront into smaller batches dispatched via separate `sendmsg`
calls. This catches three cases that today silently drop datagrams or
rely on the QUIC layer's loss timers to recover:

* Stale in-flight GSO transmits left over from a GSO halt (quinn-rs#2399).
  After `EIO` halts GSO via `max_gso_segments.store(1)`, any transmits
  queued while GSO was still enabled now violate the new limit and need
  to be re-batched as individual sends rather than dropped.

* The first GSO send that triggers the halt itself (quinn-rs#2399). The `EIO`
  arm now re-dispatches the failing transmit through `send` after the
  halt, where the upfront chunking check splits it into individual
  datagrams without waiting for a PTO.

* Callers that pass oversized `Transmit::contents` (partial quinn-rs#2201) —
  e.g. >65535 bytes total, which the kernel rejects with `EMSGSIZE`.
  These are now broken into kernel-acceptable batches, and each chunk
  still uses GSO when its segment count is > 1.

The public contract of `max_gso_segments` is unchanged — callers should
still query it and size their `Transmit`s accordingly. This path is a
defensive net so that callers which don't (or which raced a GSO halt)
don't silently drop datagrams.

Cellular Android, where the kernel reports GSO support but the radio
driver returns `EIO` on the first real send, is the original motivation:
recovery now completes inside the `sendmsg` call instead of waiting for
a ~1s handshake PTO. Firefox is hot-fixing the same case in D265812 and
Firezone hit it independently (firezone/firezone#10279).

Fixes quinn-rs#2399.
Refs quinn-rs#2201.
@kixelated kixelated force-pushed the udp-retry-gso-eio branch from 26ef777 to a24b09a Compare June 4, 2026 03:48

@thomaseizinger thomaseizinger left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall better yes but I am not sure we are going into the right direction in general. Some questions to think about:

  • Shouldn't this logic apply to all code-paths, i.e. Windows + Apple as well? The way I implemented this in Firezone, it is running above quinn-udp, regardless of whether GSO / apple-fast path is active.
  • What about integration-level tests? Whether or not we are doing the slicing correctly is only surfaced if we actually hit a kernel and pass it a giant Transmit, i.e. one with a content that is larger than u16::MAX.

If we don't apply this loop globally, then I think we should inline this into the current send function. We already have a loop there and we just need to keep looping and call prepare_msg, right?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

udp: retry failed GSO sendmsg on EIO without GSO

3 participants