Skip to content

rpc: replace _outstanding unordered_map with log_indexed_container#3450

Open
emdroid wants to merge 1 commit into
scylladb:masterfrom
emdroid:fix/SCYLLADB-1541-rpc-outstanding-deque
Open

rpc: replace _outstanding unordered_map with log_indexed_container#3450
emdroid wants to merge 1 commit into
scylladb:masterfrom
emdroid:fix/SCYLLADB-1541-rpc-outstanding-deque

Conversation

@emdroid

@emdroid emdroid commented Jun 5, 2026

Copy link
Copy Markdown

std::unordered_map<id_type, unique_ptr<reply_handler_base>> rehashes with a single large contiguous allocation when the number of in-flight requests crosses a bucket threshold. At high concurrency (e.g. 1000 concurrent ALL-consistency reads) this produces >128 KiB allocations, triggering Seastar's large-alloc warning.

Introduce a new generic utility, log_indexed_container<T, IndexType> (include/seastar/util/log_indexed_container.hh), which maps monotonically increasing integer indices to values without hashing or large contiguous allocations:

  • Backed by boost::container::deque, which releases front blocks on pop_front (unlike std::deque in libstdc++), keeping memory bounded.
  • Stores values in direct index-offset slots, so lookup is O(1).
  • Trims empty slots from the front after each removal, keeping the deque short relative to the concurrency window.
  • Supports any nullable type (raw pointer, unique_ptr) natively, and wraps non-nullable types in std::optional internally.
  • API: emplace(idx, value), find(idx), take(idx), clear_at(idx), clear(), size(), empty(), base_index().

Replace _outstanding with a log_indexed_container specialised on reply_handler_base unique_ptr. RPC message ids are already monotonically increasing per-connection, making them a natural key. Outstanding reply handlers are removed via take(). The timer and cancellation callbacks preserve the original invariant that the slot always exists when they fire: the receive path disarms both atomically (no yield point) before returning to the reactor, so a missing slot indicates a bug.

Also add tests/unit/log_indexed_container_test.cc with 13 unit tests covering emplace, find, take, clear_at, trim_front, base_index advancement, and a custom index type.

Fixes: SCYLLADB-1541

@emdroid emdroid requested a review from Copilot June 5, 2026 09:16
@emdroid emdroid self-assigned this Jun 5, 2026

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

This PR replaces the outstanding-reply handler storage from a hash map to a deque-backed, slot-indexed table to reduce large contiguous allocations during rehash and streamline handler lookup/removal.

Changes:

  • Introduces reply_table (deque-based) to store in-flight reply handlers by message id.
  • Updates client cancel/timeout/reply paths to use get()/insert()/remove() instead of unordered_map operations.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 6 comments.

File Description
src/rpc/rpc.cc Switches outstanding handler operations to the new reply_table API.
include/seastar/rpc/rpc.hh Implements reply_table using std::deque and replaces the _outstanding map.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread include/seastar/rpc/rpc.hh Outdated
Comment thread include/seastar/rpc/rpc.hh Outdated
Comment thread include/seastar/rpc/rpc.hh Outdated
Comment thread include/seastar/rpc/rpc.hh Outdated
Comment thread include/seastar/rpc/rpc.hh Outdated
Comment thread include/seastar/rpc/rpc.hh Outdated
@emdroid emdroid force-pushed the fix/SCYLLADB-1541-rpc-outstanding-deque branch from 372f9f8 to d600495 Compare June 5, 2026 09:55
Comment thread include/seastar/rpc/rpc.hh Outdated
// ever exceeds a page regardless of in-flight count.
// Note: no_wait verbs consume a message id without registering a slot;
// those ids produce an out-of-range lookup on reply, handled gracefully.
struct reply_table {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We used the same idea recently here scylladb/scylladb#29794. Maybe extract a template data structure for reuse? The log_indexed_container already has unit tests and also handles the case when insert index is smaller than base_index.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, that would make sense, only we'd need to move (copy first) the log_indexed_container to Seastar. Btw. there is also the idea of "chunked map" #2133 - which this might eventually evolve into.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this PR urgent? There are some comments in scylladb/scylladb#29794 which I need to address, and IIUC I must do it before we merge this PR. I'm not sure when I'll do it, though, as I have some higher-priority work (mostly reviews) and I'm going on vacation on Thursday.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's P3 and not marked as CI stability, so not really urgent.

Comment thread include/seastar/rpc/rpc.hh Outdated

private:
std::deque<std::unique_ptr<reply_handler_base>> _slots;
id_type _base_id = 1; // id corresponding to _slots[0]

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

id_type _base_id = 1; This seems strange, why not zero?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_message_id starts at 1 in client, so the first id inserted is 1, meaning slot 0 would always be empty if _base_id = 0. Starting at 1 avoids that wasted slot. Which is what Claude inferred when working at the fix - but starting at 0 doesn't make a big difference I guess, or at least comment could be added with the explanation. But especially if we'd create a shared implementation, starting at 0 would be better indeed

Comment thread include/seastar/rpc/rpc.hh Outdated
// ids are monotonically increasing and always >= _base_id;
// a negative index means the caller is violating this invariant.
SEASTAR_ASSERT(idx >= 0);
while (static_cast<id_type>(_slots.size()) <= idx) {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

batch-insert at the end is likely more efficient _slots.insert(_slots.end(), count, nullptr)

Comment thread include/seastar/rpc/rpc.hh Outdated
}

// Pop null entries from the front, advancing _base_id.
void trim() noexcept {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can be made private? There can't be null slots at the front after remove-s

Comment thread src/rpc/rpc.cc Outdated
cancel->cancel_wait = [this, id] {
_outstanding[id]->cancel();
_outstanding.erase(id);
if (auto* h = _outstanding.get(id)) {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why relax the assumptions? previously it was guaranteed that id exists

Comment thread src/rpc/rpc.cc Outdated
_stats.timeout++;
_outstanding[id]->timeout();
_outstanding.erase(id);
if (auto* h = _outstanding.get(id)) {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why relax the assumptions? previously it was guaranteed that id exists

@emdroid emdroid force-pushed the fix/SCYLLADB-1541-rpc-outstanding-deque branch from d600495 to 23ea09d Compare June 5, 2026 18:04
@emdroid emdroid changed the title rpc: replace _outstanding unordered_map with slot-indexed deque rpc: replace _outstanding unordered_map with log_indexed_container Jun 5, 2026
@emdroid

emdroid commented Jun 5, 2026

Copy link
Copy Markdown
Author

New version:

std::unordered_map<id_type, unique_ptr<reply_handler_base>> rehashes
with a single large contiguous allocation when the number of in-flight
requests crosses a bucket threshold. At high concurrency (e.g. 1000
concurrent ALL-consistency reads) this produces >128 KiB allocations,
triggering Seastar's large-alloc warning.

Introduce a new generic utility, log_indexed_container<T, IndexType>
(include/seastar/util/log_indexed_container.hh), which maps monotonically
increasing integer indices to values without hashing or large contiguous
allocations:

  - Backed by boost::container::deque, which releases front blocks on
    pop_front (unlike std::deque in libstdc++), keeping memory bounded.
  - Stores values in direct index-offset slots, so lookup is O(1).
  - Trims empty slots from the front after each removal, keeping the
    deque short relative to the concurrency window.
  - Supports any nullable type (raw pointer, unique_ptr) natively, and
    wraps non-nullable types in std::optional internally.
  - API: emplace(idx, value), find(idx), take(idx), clear_at(idx),
    clear(), size(), empty(), base_index().

Replace _outstanding with a log_indexed_container specialised on
reply_handler_base unique_ptr. RPC message ids are already monotonically
increasing per-connection, making them a natural key. Outstanding reply
handlers are removed via take(). The timer and cancellation callbacks
preserve the original invariant that the slot always exists when they
fire: the receive path disarms both atomically (no yield point) before
returning to the reactor, so a missing slot indicates a bug.

Also add tests/unit/log_indexed_container_test.cc with 13 unit tests
covering emplace, find, take, clear_at, trim_front, base_index
advancement, and a custom index type.

Fixes: SCYLLADB-1541
@emdroid emdroid force-pushed the fix/SCYLLADB-1541-rpc-outstanding-deque branch from 23ea09d to 0ffa7da Compare June 5, 2026 18:19
@emdroid emdroid requested a review from patjed41 June 8, 2026 07:49

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO the changes should be separated. 1st commit should introduce the new container, the second replace the unordered_map.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants