reactor: retry io_uring socket send on EAGAIN by sunyuechi · Pull Request #3435 · scylladb/seastar

sunyuechi · 2026-05-29T12:01:29Z

The io_uring backend submits socket writes as IORING_OP_SENDMSG/SEND. When the send buffer is full the kernel can complete the op with -EAGAIN, which io_completion::complete_with() turns into a thrown system_error handed to the caller. The aio/epoll backends instead go through do_sendmsg(), which waits for POLLOUT and resends, so they never leak EAGAIN.

On an EAGAIN completion, fall back to the poll-based do_sendmsg()/ do_send() like the other backends. recvmsg()/recv_some() share the same path but reads do not fill a buffer, so they are left unchanged.

Observed under load on riscv64, where io_uring is the default backend: unittest-seastar-socket's test_preemptive_down() aborts with EAGAIN.

tchaikov

lgtm. but i'd suggest improve the commit message to note that this is caused by the kernel design gap in iouring's socket i/o error handling.

the normal inline path handles socket EAGAIN correctly. but under memory pressure, an allocation for arming a poll fails, the request is pushed to io-wq. io-wq then calls sendmsg in blocking mode, but SOCK_NONBLOCK sockets return EAGAIN immediately. io-wq doesn't retry for these sockets, and leak EAGAIN to userspace.

disk i/o retries on REQ_F_REISSUE, see torvalds/linux@d803d12. but socket i/o does not share this error handling with disk i/o.

in interesting thing is, liburing considers it a contract that io_uring should handle EAGAIN internally if MSG_DONTWAIT is not set in the SQE. see https://github.com/axboe/liburing/blob/a33a770d8379978660036251e14710c6ee8a3d34/test/socket-nb.c#L91-L93.

seastar's iouring backend does not set MSG_DONTWAIT in its non-speculate paths, so, per the kernel's contract, EAGAIN should not reach userspace.

probably we should file a bug against liburing / linux kernel on the gap between them.

sunyuechi · 2026-05-30T15:02:35Z

Thanks for the clear analysis. Added a note about the kernel gap to the commit msg. You wanna file the bug or should i? @tchaikov

tchaikov · 2026-05-30T15:09:35Z

Thanks for the clear analysis. Added a note about the kernel gap to the commit msg. You wanna file the bug or should i? @tchaikov

i will try. still trying to get my head around the iouring kernel source.

BTW, my analysis was updated. as i realized that it's not about read/write versus send/recv, it's about disk i/o versus network i/o. as read/write ops on socket also suffer from this issue.

The io_uring backend submits socket writes as IORING_OP_SENDMSG/SEND. When the send buffer is full the kernel can complete the op with -EAGAIN, which io_completion::complete_with() turns into a thrown system_error handed to the caller. The aio/epoll backends instead go through do_sendmsg(), which waits for POLLOUT and resends, so they never leak EAGAIN. This is a gap in the kernel's io_uring error handling: disk I/O retries -EAGAIN internally via REQ_F_REISSUE (kernel commit d803d123948f), but socket I/O does not share that path, so -EAGAIN leaks to userspace. On an EAGAIN completion, fall back to the poll-based do_sendmsg()/ do_send() like the other backends. recvmsg()/recv_some() share the same path but reads do not fill a buffer, so they are left unchanged. Observed under load on riscv64, where io_uring is the default backend: unittest-seastar-socket's test_preemptive_down() aborts with EAGAIN. Signed-off-by: Sun Yuechi <sunyuechi@iscas.ac.cn>

avikivity · 2026-05-31T13:07:31Z

+                return _r.do_sendmsg(fd, iovs, len);
+            }
+            return make_exception_future<size_t>(e);
+        });


I don't like adding a continuation here. Although it should be rare that we mis-speculate.

We can inherit from kernel_io_completion instead of io_completion so we capture the -EAGAIN and re-issue the operation from there.

Please do file a bug and add a reference to it in the comment.

tchaikov reviewed May 30, 2026

View reviewed changes

sunyuechi force-pushed the wip-io-uring-retry-send-eagain branch from 3b9bb49 to 5ec3652 Compare May 30, 2026 15:01

sunyuechi force-pushed the wip-io-uring-retry-send-eagain branch from 5ec3652 to 1a4e991 Compare May 30, 2026 15:16

avikivity reviewed May 31, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

reactor: retry io_uring socket send on EAGAIN#3435

reactor: retry io_uring socket send on EAGAIN#3435
sunyuechi wants to merge 1 commit into
scylladb:masterfrom
sunyuechi:wip-io-uring-retry-send-eagain

sunyuechi commented May 29, 2026

Uh oh!

tchaikov left a comment •

edited

Loading

Uh oh!

sunyuechi commented May 30, 2026

Uh oh!

tchaikov commented May 30, 2026

Uh oh!

avikivity May 31, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

sunyuechi commented May 29, 2026

Uh oh!

tchaikov left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

sunyuechi commented May 30, 2026

Uh oh!

tchaikov commented May 30, 2026

Uh oh!

avikivity May 31, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

tchaikov left a comment •

edited

Loading