fix(unix): split oversized GSO sends into smaller chunks by kixelated · Pull Request #2653 · quinn-rs/quinn

kixelated · 2026-05-24T23:06:27Z

(untested, and mostly AI generated)

When a Transmit exceeds what one sendmsg syscall can accept — either because it has more segments than max_gso_segments allows, or because its total size exceeds the kernel's per-call limit (u16::MAX bytes) — split it upfront into smaller batches dispatched via separate sendmsg calls. This catches three cases that today silently drop datagrams or rely on the QUIC layer's loss timers to recover:

Stale in-flight GSO transmits left over from a GSO halt (udp: retry failed GSO sendmsg on EIO without GSO #2399). After EIO halts GSO via max_gso_segments.store(1), any transmits queued while GSO was still enabled now violate the new limit and need to be re-batched as individual sends rather than dropped.
The first GSO send that triggers the halt itself (udp: retry failed GSO sendmsg on EIO without GSO #2399). The EIO arm now re-dispatches the failing transmit through send after the halt, where the upfront chunking check splits it into individual datagrams without waiting for a PTO.
Callers that pass oversized Transmit::contents (partial Should quinn-udp automatically chunk Transmit::contents? #2201) — e.g. >65535 bytes total, which the kernel rejects with EMSGSIZE. These are now broken into kernel-acceptable batches, and each chunk still uses GSO when its segment count is > 1.

The public contract of max_gso_segments is unchanged — callers should still query it and size their Transmits accordingly. This path is a defensive net so that callers which don't (or which raced a GSO halt) don't silently drop datagrams.

Cellular Android, where the kernel reports GSO support but the radio driver returns EIO on the first real send, is the original motivation: recovery now completes inside the sendmsg call instead of waiting for a ~1s handshake PTO. Firefox is hot-fixing the same case in D265812 and Firezone hit it independently (firezone/firezone#10279).

Fixes #2399.
Refs #2201.

thomaseizinger · 2026-05-25T05:06:14Z

(untested, and mostly AI generated)

FTR: I don't mind this in principle and I see you've marked it as draft for now. Let us know when you've reviewed it yourself and done some testing on it.

I am glad this is being worked on. We do this on top of quinn-udp right now. I suggest you have a look at https://github.com/firezone/firezone/blob/445f46da6273dcce917a197f0747f91af782624b/rust/libs/connlib/socket-factory/src/lib.rs#L412 too.

flub · 2026-06-03T07:30:30Z

+    state: &UdpSocketState,
+    io: SockRef<'_>,
+    transmit: &Transmit<'_>,
+    max_per_chunk: usize,


is there any practical value to making this dynamic rather than simply hardcoding this to 1?

Yeah it's hard to follow, working on renaming stuff and avoiding some recursion now.

max_per_chunk should be called gso_segments or something. It's the maximum number of segments that can be contained in a single packet, initially starting at >1 (depending on the OS) and shrinking down to 1 on an error.

If it shrinks down to 1, we call send again to repeat this logic, but gso_segments will be 1 now.

flub · 2026-06-03T07:31:47Z

+) -> io::Result<()> {
+    let segment_size = transmit
+        .effective_segment_size()
+        .expect("caller verified segment_size is set");


If there isn't, can this not safely create a single chunk and send it instead of having to panic? The invariants being held so far from each other often ends up in tears sometimes. I feel like we need to try harder to avoid these things.

This should be passed in as an argument instead of recomputing it anyway.

kixelated · 2026-06-03T20:14:04Z

kk I simplified the logic and removed the recursion. It's still untested (on my Mac and the customer has the failing Android anyway).

Some of the comments are still AI generated, but I've closely reviewed everything.

kixelated · 2026-06-03T20:15:42Z

+        // `send_batched` halting GSO mid-syscall.
+        if e.raw_os_error() == Some(libc::EINVAL)
+            && !state.sendmsg_einval()
+            && transmit.effective_segment_size().is_none()


This fixes an existing bug, where a GSO failure could also disable ECN. Can split into a separate PR if you'd prefer.

thomaseizinger

Left two initial comments.

I also think we should have integration-level test coverage for this.

thomaseizinger · 2026-06-03T20:17:53Z

+    #[cfg(any(target_os = "linux", target_os = "android"))]
+    if let Some(segment_size) = transmit.effective_segment_size() {
+        // Get the (GSO) segment size, and compute the number of segments we could fit into a UDP payload.
+        let segment_count = max_segments_per_send(state.max_gso_segments(), segment_size);
+        return send_batched(state, io, transmit, segment_size, segment_count);
+    }
+
+    // Fall back to sending each packet individually.
+    send_packet(state, &io, transmit)


I don't really understand this. There should not be a need for branching here because a single packet is a batch of size one with segment_size == packet_len, right?

It's mostly a micro-optimization to avoid running the send_batched loop/logic if GSO isn't enabled/supported.

You're right that we could effective_segment_size instead of transmit.segment_size. It would end up being a batch of 1 packet.

Lemme see if I can combine the code paths to avoid the branching.

thomaseizinger · 2026-06-03T20:19:41Z

+/// The kernel rejects any GSO send whose total `contents` length exceeds
+/// `u16::MAX`, regardless of `max_gso_segments`.
+#[cfg(any(target_os = "linux", target_os = "android", test))]
+const MAX_SEND_BYTES: usize = u16::MAX as usize;


As I've already pointed out, it is not that easy. You need to take into account IP and UDP headers here. See the link I've left in the previous comment.

kixelated · 2026-06-03T22:37:59Z

@thomaseizinger better?

When a `Transmit` exceeds what one `sendmsg` syscall can accept — either because it has more segments than `max_gso_segments` allows, or because its total size exceeds the kernel's per-call limit (`u16::MAX` bytes) — split it upfront into smaller batches dispatched via separate `sendmsg` calls. This catches three cases that today silently drop datagrams or rely on the QUIC layer's loss timers to recover: * Stale in-flight GSO transmits left over from a GSO halt (quinn-rs#2399). After `EIO` halts GSO via `max_gso_segments.store(1)`, any transmits queued while GSO was still enabled now violate the new limit and need to be re-batched as individual sends rather than dropped. * The first GSO send that triggers the halt itself (quinn-rs#2399). The `EIO` arm now re-dispatches the failing transmit through `send` after the halt, where the upfront chunking check splits it into individual datagrams without waiting for a PTO. * Callers that pass oversized `Transmit::contents` (partial quinn-rs#2201) — e.g. >65535 bytes total, which the kernel rejects with `EMSGSIZE`. These are now broken into kernel-acceptable batches, and each chunk still uses GSO when its segment count is > 1. The public contract of `max_gso_segments` is unchanged — callers should still query it and size their `Transmit`s accordingly. This path is a defensive net so that callers which don't (or which raced a GSO halt) don't silently drop datagrams. Cellular Android, where the kernel reports GSO support but the radio driver returns `EIO` on the first real send, is the original motivation: recovery now completes inside the `sendmsg` call instead of waiting for a ~1s handshake PTO. Firefox is hot-fixing the same case in D265812 and Firezone hit it independently (firezone/firezone#10279). Fixes quinn-rs#2399. Refs quinn-rs#2201.

thomaseizinger

Overall better yes but I am not sure we are going into the right direction in general. Some questions to think about:

Shouldn't this logic apply to all code-paths, i.e. Windows + Apple as well? The way I implemented this in Firezone, it is running above quinn-udp, regardless of whether GSO / apple-fast path is active.
What about integration-level tests? Whether or not we are doing the slicing correctly is only surfaced if we actually hit a kernel and pass it a giant Transmit, i.e. one with a content that is larger than u16::MAX.

If we don't apply this loop globally, then I think we should inline this into the current send function. We already have a loop there and we just need to keep looping and call prepare_msg, right?

flub reviewed Jun 3, 2026

View reviewed changes

kixelated force-pushed the udp-retry-gso-eio branch 2 times, most recently from e7cc1fb to ae31e3a Compare June 3, 2026 20:13

kixelated commented Jun 3, 2026

View reviewed changes

thomaseizinger requested changes Jun 3, 2026

View reviewed changes

kixelated force-pushed the udp-retry-gso-eio branch 2 times, most recently from 36af237 to 26ef777 Compare June 3, 2026 22:35

kixelated force-pushed the udp-retry-gso-eio branch from 26ef777 to a24b09a Compare June 4, 2026 03:48

thomaseizinger reviewed Jun 4, 2026

View reviewed changes

Uh oh!

Conversation

kixelated commented May 24, 2026

Uh oh!

thomaseizinger commented May 25, 2026

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

kixelated commented Jun 3, 2026

Uh oh!

Choose a reason for hiding this comment

Uh oh!

thomaseizinger left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

kixelated commented Jun 3, 2026

Uh oh!

thomaseizinger left a comment

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants