rt: improve spawn_blocking scalability with sharded queue #7757
base: master
Conversation
Force-pushed 9537dda to 016f6ca
(FreeBSD failures look unrelated.)
Force-pushed f4416fb to 21ff5ce
Please rebase to latest
Force-pushed 21ff5ce to 694fa6b
Force-pushed 694fa6b to edd5e10
Force-pushed edd5e10 to 126cb78
```rust
// Update max_shard_pushed BEFORE pushing the task.
self.max_shard_pushed.fetch_max(index, Release);

self.shards[index].push(task);
```
With the `Release` ordering, the compiler might reorder `self.shards[index].push(task)` and the `fetch_max`, which means that the `push(task)` might end up sequenced before the `fetch_max`.
I think you're right. (I hate atomic orderings :-/) AcqRel I think is what we want.
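As a concrete sketch of the fix being discussed (the surrounding types are hypothetical; only `max_shard_pushed`, `shards`, and the `fetch_max` call come from the PR):

```rust
use std::collections::VecDeque;
use std::sync::atomic::{AtomicUsize, Ordering::AcqRel};
use std::sync::Mutex;

struct Task;

struct Shards {
    max_shard_pushed: AtomicUsize,
    shards: Vec<Mutex<VecDeque<Task>>>,
}

impl Shards {
    fn push(&self, index: usize, task: Task) {
        // AcqRel: the Release half publishes the new watermark to any
        // reader that Acquire-loads it; the Acquire half prevents the
        // queue push below from being reordered before this RMW.
        self.max_shard_pushed.fetch_max(index, AcqRel);
        self.shards[index].lock().unwrap().push_back(task);
    }
}
```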
This is still wrong. Consider the following scenario:
```text
Thread A                  Thread B                  Thread C
                                                    preferred_shard = 0
                          preferred_shard = 1
max_shard_pushed = 0
shards[0].push(_)
condvar.notify_one()
                          wakes up ...
                          max_shard = 0
max_shard_pushed = 1
shards[1].push(_)
condvar.notify_one()
                                                    wakes up ...
                                                    max_shard = 1
                                                    shards[0].pop() = Some
                          shards[0].pop() = None
```
In this case, Thread B does not check shards[1] because it read max_shard with the value of zero. This means that two tasks were spawned, but only one gets picked up.
Well, I guess in principle Thread C will see the second task after it finishes executing the first one.
Hmm, I think it's always the case that if a push happens concurrently with a pop that the pop might miss it, and we'll have to "fall back" to catching it in the wait_for_task loop.
I think in principle we could address this one by reloading max_shard_pushed, but of course you can still have a race condition.
In the scenario you've got here, what would happen is that after thread B returns None from pop, it'll go wait_for_task and then the task will get picked up.
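The "reload" idea could look roughly like this (a hypothetical sketch, not the PR's code; as noted above, it narrows the window but a racing push can still be missed, so the `wait_for_task` fallback is still required):

```rust
use std::collections::VecDeque;
use std::sync::atomic::{AtomicUsize, Ordering::Acquire};
use std::sync::Mutex;

struct Task;

struct Shards {
    max_shard_pushed: AtomicUsize,
    shards: Vec<Mutex<VecDeque<Task>>>,
}

impl Shards {
    /// Pop, re-scanning if a producer raised `max_shard_pushed` mid-scan.
    fn pop(&self) -> Option<Task> {
        loop {
            let seen = self.max_shard_pushed.load(Acquire);
            for shard in &self.shards[..=seen.min(self.shards.len() - 1)] {
                if let Some(task) = shard.lock().unwrap().pop_front() {
                    return Some(task);
                }
            }
            // A push that raced with the scan may have landed behind us.
            // Re-scan only if the watermark moved; otherwise give up and
            // let the wait_for_task loop (plus notify_one) catch it.
            if self.max_shard_pushed.load(Acquire) == seen {
                return None;
            }
        }
    }
}
```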
Hmm, I think it's always the case that if a push happens concurrently with a pop that the pop might miss it, and we'll have to "fall back" to catching it in the wait_for_task loop.
Your notify_one() call ensures that for every push(), there will be a subsequent call to pop() that is not concurrent and hence guaranteed to see the pushed message. So there's at least one thread that's guaranteed to pick up each message.
If we imagine that the max_shard_pushed logic was removed, then Thread B would in fact be guaranteed to see the message in shards[1].
- `shards[1].push(_)` on A happens-before `shards[0].pop() = Some` on thread C, because thread C is the thread woken up by the second `notify_one()` call.
- `shards[0].pop() = Some` on thread C happens-before `shards[0].pop() = None` on thread B, since otherwise thread B would have gotten `Some` when calling `pop()`.
- After `shards[0].pop() = None`, thread B would attempt to call `shards[1].pop()`.
So by this logic, the shards[1].pop() call would in fact happen after shards[1].push(_), and is hence guaranteed to see the message that was pushed.
Force-pushed 855e269 to 046d5c6
```rust
// Acquire the condvar mutex before waiting
let guard = self.condvar_mutex.lock();

// Double-check shutdown and tasks after acquiring lock, as state may
// have changed while we were waiting for the lock
if self.is_shutdown() {
    return WaitResult::Shutdown;
}
if let Some(task) = self.pop(preferred_shard) {
    return WaitResult::Task(task);
}
```
Here, we attempt to acquire the shard lock while holding the condvar lock; this is the nested locking pattern. In general, we should avoid this pattern, as it is error-prone.
Hmm, is there a preferred pattern to avoid the nested locking?
In this case we want to ensure that when we wait for a notification, there wasn't already a task that's made pending concurrently.
Nested locking is fine as long as locks are always taken in the same order.
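To illustrate the rule, here is a minimal sketch in which `condvar_mutex` is always acquired before any shard lock, and no path acquires them in the reverse order while holding both (hypothetical types; not the PR's code):

```rust
use std::collections::VecDeque;
use std::sync::{Condvar, Mutex};

struct Task;

struct Pool {
    condvar: Condvar,
    condvar_mutex: Mutex<()>,
    shards: Vec<Mutex<VecDeque<Task>>>,
}

impl Pool {
    /// Lock order: condvar_mutex, then a shard lock. Deadlock-free as
    /// long as no path holds a shard lock while acquiring condvar_mutex.
    fn wait_for_task(&self) -> Task {
        let mut guard = self.condvar_mutex.lock().unwrap();
        loop {
            // Nested: shard locks taken while holding condvar_mutex.
            for shard in &self.shards {
                if let Some(task) = shard.lock().unwrap().pop_front() {
                    return task;
                }
            }
            guard = self.condvar.wait(guard).unwrap();
        }
    }

    fn spawn(&self, task: Task) {
        // The shard guard is released before condvar_mutex is taken,
        // so the two locks are never held simultaneously here.
        self.shards[0].lock().unwrap().push_back(task);
        let _g = self.condvar_mutex.lock().unwrap();
        self.condvar.notify_one();
    }
}
```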
(I don't think the netlify failures are related)
Force-pushed 81d7f25 to 54330c3
ADD-SP left a comment:
Sorry for the late review. I'd like to take some time to think about the nested locking issue. We may have a better choice, or not.
No problem -- the rebase was to pick up a fix for the netlify failures. Let me know if there are other experiments it'd be useful for me to try.
```rust
/// Calculate the effective number of shards to use based on thread count.
/// Uses fewer shards at low concurrency for better cache locality.
#[inline]
fn effective_shards(num_threads: usize) -> usize {
    match num_threads {
```
This logic seems error-prone and likely to lead to missed tasks. Does it actually matter for your benchmark?
It matters at N_THREADS=1 -- I don't personally care about that case at all. If we're ok with a small pessimization there (10% iirc?), I'd be delighted to delete this max shard logic and just always use a fixed number.
Does this sound ok to you? I'd love to delete it because it's responsible for a lot of the complexity.
I've gone ahead and dropped this behavior.
Force-pushed 08376f4 to 3fd9e25
The blocking pool's task queue was protected by a single mutex, causing severe contention when many threads spawn blocking tasks concurrently. This resulted in nearly linear degradation: 16 concurrent threads took ~18x longer than a single thread.

Replace the single-mutex queue with a sharded queue that distributes tasks across 16 lock-protected shards. The implementation adapts to concurrency levels by using fewer shards when thread count is low, maintaining cache locality while avoiding contention at scale.

Benchmark results (spawning 100 batches of 16 tasks per thread):

| Concurrency | Before  | After  | Improvement |
|-------------|---------|--------|-------------|
| 1 thread    | 13.3ms  | 17.8ms | +34%        |
| 2 threads   | 26.0ms  | 20.1ms | -23%        |
| 4 threads   | 45.4ms  | 27.5ms | -39%        |
| 8 threads   | 111.5ms | 20.3ms | -82%        |
| 16 threads  | 247.8ms | 22.4ms | -91%        |

The slight overhead at 1 thread is due to the sharded infrastructure, but this is acceptable given the dramatic improvement at higher concurrency where the original design suffered from lock contention.
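For readers skimming the description, the core structure can be sketched as follows (a simplified stand-in for the PR's implementation; the round-robin cursor and scan order here are illustrative, not the PR's exact policy):

```rust
use std::collections::VecDeque;
use std::sync::atomic::{AtomicUsize, Ordering::Relaxed};
use std::sync::Mutex;

const NUM_SHARDS: usize = 16;

struct Task;

struct ShardedQueue {
    next: AtomicUsize, // round-robin cursor for producers
    shards: [Mutex<VecDeque<Task>>; NUM_SHARDS],
}

impl ShardedQueue {
    fn new() -> Self {
        Self {
            next: AtomicUsize::new(0),
            shards: std::array::from_fn(|_| Mutex::new(VecDeque::new())),
        }
    }

    /// Producers spread pushes across shards so concurrent spawners
    /// rarely contend on the same mutex.
    fn push(&self, task: Task) {
        let i = self.next.fetch_add(1, Relaxed) % NUM_SHARDS;
        self.shards[i].lock().unwrap().push_back(task);
    }

    /// Consumers start at a per-thread preferred shard and scan the rest.
    fn pop(&self, preferred: usize) -> Option<Task> {
        for offset in 0..NUM_SHARDS {
            let i = (preferred + offset) % NUM_SHARDS;
            if let Some(task) = self.shards[i].lock().unwrap().pop_front() {
                return Some(task);
            }
        }
        None
    }
}
```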
🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Force-pushed 3fd9e25 to a7c341e
Use the same approach as sync::watch: prefer thread_rng_n() for shard selection to reduce contention on the atomic counter, falling back to round-robin when the RNG is not available (loom mode or missing features).

🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
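The selection policy this commit message describes can be sketched as below; the `rng` parameter stands in for tokio's internal `thread_rng_n` helper (not public API), which is absent under loom or with some feature sets:

```rust
use std::sync::atomic::{AtomicUsize, Ordering::Relaxed};

const NUM_SHARDS: usize = 16;

/// Pick a shard for a newly spawned task. `rng` is `Some(n)` when a
/// per-thread RNG produced `n`; `None` models the loom / missing-feature
/// fallback.
fn choose_shard(rng: Option<u32>, round_robin: &AtomicUsize) -> usize {
    match rng {
        // Random placement touches no shared state at all.
        Some(n) => n as usize % NUM_SHARDS,
        // Fallback: round-robin on a shared counter -- the contended
        // atomic that the random path is meant to avoid.
        None => round_robin.fetch_add(1, Relaxed) % NUM_SHARDS,
    }
}
```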
Force-pushed a7c341e to fa4e0a6
Force-pushed 25439f8 to 98e0071
this allows readers to proceed concurrently
…ains weren't tremendous
Force-pushed e0107cc to 502a6d8
(Notwithstanding that this shows as a commit from claude, every line is human reviewed. If there's a mistake, it's Alex's fault.)
Closes #2528.