-
Notifications
You must be signed in to change notification settings - Fork 21
[v25.2] Cherry pick aio_general_context changes #247
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[v25.2] Cherry pick aio_general_context changes #247
Conversation
As seen in scylladb#1784, having flush retry `io_submit` in a loop might be futile if the kernel internal queue is full and we need to reap completions to free up space for more iocb:s. This change breaks out of the loop as soon as `io_submit` fails and just moves the remaining entries in the iocbs queue into the front so more entries can be queued (otherwise, we'd need a circular queue that will penalize the error-free path for the unlikely event of an error) Fixes scylladb#1784 Signed-off-by: Benny Halevy <[email protected]> (cherry picked from commit 5138d5f)
EAGAIN is expected here when "Insufficient resources are available to queue any iocbs" (see io_submit(2)). Abort on any other error, as those indicate an internal error on our side. Signed-off-by: Benny Halevy <[email protected]> (cherry picked from commit 9fff5f3)
The `aio_general_context` had the implicit assumption that in a single tick we would never queue more than `--max-networking-io-control-blocks` events/iocbs. This however ignores situations such as queuing multiple iocbs per socket per tick, having left over iocbs in the queue from previous ticks via the new EAGAIN handling or simply because a lot more sockets are in use which isn't prevented anywhere else. If this condition was hit (`last > end`) the reactor would just assert out and crash. To avoid this situation this patch introduces a backlog into which elements are being enqueued when the original array is full and which can grow unbounded. This mirrors how the `aio_storage_context` works which uses the `io_sink` for the same purpose. To avoid oversized allocations after startup the split into two separate data structures is needed (instead of just regrowing the array). Further the datastructure from which the iocbs are passed into `io_submit` needs to be in contigiuous memory (and also provide an API to use it which most containers don't). `std::deque` is used in the backlog to avoid oversized allocations in the backlog itself. The existing array solution for `iocbs` is kept to fulfill the contigiuous memory requirement. Further we slightly change how EAGAIN is handled. Instead of backshifting the array we keep the array as is and just track the `begin` of the array across `flush` calls. This is possible now as the backlog handling is in place. This introduces "batching" and prevents degenerate cases where only a single element is being submitted which would then result in repeated shifting of the whole array on each `flush` call. Given we use a chunked data structure like `std::deque` erasing from the front of the backlog is relatively cheap and does not require shifting all the elements in the backlog. Hence, the per-iocb overhead is amortized constant. Note that in general we try to submit as many iocbs per `io_submit` call. Given the new behaviour of not backshifting the iocb array and immediately backfilling from the backlog we might introduce `io_submit` calls that don't try to submit the max amount of iocbs. However we assume that if we ran into EAGAIN then either: - We are still behind the next time around: it's unlikely we would succeed in submitting all the iocbs anyway - We have now caugt up: we have introduced a single additional `io_submit` call which only submits `max_poll()/2` iocbs on average. The backlog will be drained at full `max_poll` per `io_submit`. (cherry picked from commit d9175fc)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR cherry-picks commit changes related to aio_general_context that were unintentionally dropped during the 25.2 rebase. The changes improve AIO (Asynchronous I/O) handling by introducing a backlog mechanism to prevent overload and removing time-based retry logic in favor of immediate early returns on resource exhaustion.
Key Changes:
- Adds a backlog queue (
iocbs_backlog) to handle overflow when the mainiocbsarray is full - Replaces time-based retry logic with immediate early return on EAGAIN, allowing completion reaping to free resources
- Introduces a
beginpointer to track submission progress separately from the array start
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| src/core/reactor_backend.hh | Adds begin pointer and iocbs_backlog deque to aio_general_context struct |
| src/core/reactor_backend.cc | Implements backlog queueing in queue() and refactors flush() to use early return on EAGAIN with backlog management |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Allow setting TCP keepalive settings on the httpd connections. TCP keepalives are useful to detect idle but otherwise dead connections such as when connections get dropped by AWS (network) load balancers for example. httpd currently doesn't offer application level timeouts so this is a cheap way to get some resilience. Closes scylladb#3129 (cherry picked from commit 5dff05b)
|
I have added scylladb#3129 to this PR |
|
The DNS tests seem broken on 25.2 already. |
Signed-off-by: Pavel Emelyanov <[email protected]> (cherry picked from commit 8d8e790)
In case reverse resolution fails, try again with another host name. At least one of them should succeed. Signed-off-by: Pavel Emelyanov <[email protected]> (cherry picked from commit 7e0f288)
|
pulled in scylladb#3113 as well now to fix the tests. |
Disable dpdk build as we don't use it an it is long and breaks now and then: it still runs upstream (cherry picked from commit ac9d8a3)
|
And #235 to didsable dpdk |
Cherry pick #87
This commit was dropped in the 25.2 rebase unintentionally.
and scylladb#3129