Skip to content

Conversation

@StephanDollberg
Copy link
Member

@StephanDollberg StephanDollberg commented Dec 1, 2025

Cherry pick #87

This commit was dropped in the 25.2 rebase unintentionally.

and scylladb#3129

bhalevy and others added 3 commits December 1, 2025 11:58
As seen in scylladb#1784,
having flush retry `io_submit` in a loop might be futile
if the kernel internal queue is full and we need to reap
completions to free up space for more iocb:s.

This change breaks out of the loop as soon as
`io_submit` fails and just moves the remaining entries
in the iocbs queue into the front so more entries can
be queued (otherwise, we'd need a circular queue that
will penalize the error-free path for the unlikely event of
an error)

Fixes scylladb#1784

Signed-off-by: Benny Halevy <[email protected]>
(cherry picked from commit 5138d5f)
EAGAIN is expected here when "Insufficient resources are available to queue any iocbs" (see io_submit(2)).
Abort on any other error, as those indicate an internal error on our side.

Signed-off-by: Benny Halevy <[email protected]>
(cherry picked from commit 9fff5f3)
The `aio_general_context` had the implicit assumption that in a single
tick we would never queue more than `--max-networking-io-control-blocks`
events/iocbs. This however ignores situations such as queuing multiple
iocbs per socket per tick, having left over iocbs in the queue from
previous ticks via the new EAGAIN handling or simply because a lot more
sockets are in use which isn't prevented anywhere else.

If this condition was hit (`last > end`) the reactor would just assert
out and crash.

To avoid this situation this patch introduces a backlog into which
elements are being enqueued when the original array is full and which
can grow unbounded. This mirrors how the `aio_storage_context` works
which uses the `io_sink` for the same purpose.

To avoid oversized allocations after startup the split into two separate
data structures is needed (instead of just regrowing the array). Further
the datastructure from which the iocbs are passed into `io_submit` needs to
be in contigiuous memory (and also provide an API to use it which most
containers don't).

`std::deque` is used in the backlog to avoid oversized allocations in
the backlog itself. The existing array solution for `iocbs` is kept to
fulfill the contigiuous memory requirement.

Further we slightly change how EAGAIN is handled. Instead of
backshifting the array we keep the array as is and just track the
`begin` of the array across `flush` calls. This is possible now as the
backlog handling is in place.

This introduces "batching" and prevents degenerate cases where only a
single element is being submitted which would then result in repeated
shifting of the whole array on each `flush` call. Given we use a chunked
data structure like `std::deque` erasing from the front of the backlog
is relatively cheap and does not require shifting all the elements in
the backlog. Hence, the per-iocb overhead is amortized constant.

Note that in general we try to submit as many iocbs per `io_submit`
call. Given the new behaviour of not backshifting the iocb array and
immediately backfilling from the backlog we might introduce `io_submit`
calls that don't try to submit the max amount of iocbs.

However we assume that if we ran into EAGAIN then either:

 - We are still behind the next time around: it's unlikely we would
   succeed in submitting all the iocbs anyway
 - We have now caugt up: we have introduced a single additional
   `io_submit` call which only submits `max_poll()/2` iocbs on average.
   The backlog will be drained at full `max_poll` per `io_submit`.

(cherry picked from commit d9175fc)
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR cherry-picks commit changes related to aio_general_context that were unintentionally dropped during the 25.2 rebase. The changes improve AIO (Asynchronous I/O) handling by introducing a backlog mechanism to prevent overload and removing time-based retry logic in favor of immediate early returns on resource exhaustion.

Key Changes:

  • Adds a backlog queue (iocbs_backlog) to handle overflow when the main iocbs array is full
  • Replaces time-based retry logic with immediate early return on EAGAIN, allowing completion reaping to free resources
  • Introduces a begin pointer to track submission progress separately from the array start

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

File Description
src/core/reactor_backend.hh Adds begin pointer and iocbs_backlog deque to aio_general_context struct
src/core/reactor_backend.cc Implements backlog queueing in queue() and refactors flush() to use early return on EAGAIN with backlog management

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

travisdowns
travisdowns previously approved these changes Dec 1, 2025
Allow setting TCP keepalive settings on the httpd connections.

TCP keepalives are useful to detect idle but otherwise dead connections
such as when connections get dropped by AWS (network) load balancers for
example.

httpd currently doesn't offer application level timeouts so this is a
cheap way to get some resilience.

Closes scylladb#3129

(cherry picked from commit 5dff05b)
@StephanDollberg
Copy link
Member Author

I have added scylladb#3129 to this PR

travisdowns
travisdowns previously approved these changes Dec 3, 2025
@StephanDollberg
Copy link
Member Author

The DNS tests seem broken on 25.2 already.

xemul added 2 commits December 3, 2025 12:52
Signed-off-by: Pavel Emelyanov <[email protected]>
(cherry picked from commit 8d8e790)
In case reverse resolution fails, try again with another host name.
At least one of them should succeed.

Signed-off-by: Pavel Emelyanov <[email protected]>
(cherry picked from commit 7e0f288)
@StephanDollberg
Copy link
Member Author

pulled in scylladb#3113 as well now to fix the tests.

Disable dpdk build as we don't use it an it is long and breaks now and
then: it still runs upstream

(cherry picked from commit ac9d8a3)
@StephanDollberg
Copy link
Member Author

StephanDollberg commented Dec 3, 2025

And #235 to didsable dpdk

@travisdowns travisdowns disabled auto-merge December 3, 2025 14:46
@StephanDollberg StephanDollberg merged commit 77ef0a4 into v25.2.x Dec 3, 2025
31 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants