
Conversation

@hodgesds hodgesds commented Dec 7, 2025

Implement a fast-path optimization for WAKE_SYNC wakeups that directly assigns wakees to the waker's CPU when the system has capacity. This provides zero-latency handoff for producer-consumer workloads while gracefully degrading at high utilization.

The optimization checks if:

  1. System is not saturated (!saturated && !overloaded)
  2. Waker CPU is in wakee's affinity mask
  3. Waker CPU has no queued work (both local and LLC DSQs empty)

When these conditions are met, the wakee inherits the waker's CPU immediately.
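
The three gating conditions above can be sketched as a single predicate. This is a simplified, plain-C model, not the actual BPF code: `handoff_ctx`, its field names, and `wake_sync_fast_path()` are illustrative assumptions (the real scheduler reads these from DSQ lengths and cpumask helpers).

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative inputs to the WAKE_SYNC fast-path gate; in the real
 * scheduler these come from saturation tracking, the wakee's affinity
 * mask, and per-CPU/LLC DSQ queue lengths. */
struct handoff_ctx {
	bool saturated;         /* system-wide saturation flag */
	bool overloaded;        /* system-wide overload flag */
	bool waker_cpu_allowed; /* waker CPU is in wakee's affinity mask */
	int  local_dsq_len;     /* tasks queued in waker CPU's local DSQ */
	int  llc_dsq_len;       /* tasks queued in the waker's LLC DSQ */
};

/* Return true when the wakee may inherit the waker's CPU directly. */
static bool wake_sync_fast_path(const struct handoff_ctx *c)
{
	if (c->saturated || c->overloaded)
		return false;   /* gracefully disable at saturation */
	if (!c->waker_cpu_allowed)
		return false;   /* respect the wakee's affinity mask */
	/* Waker must have no queued work; it consumes from both DSQs. */
	return c->local_dsq_len == 0 && c->llc_dsq_len == 0;
}
```

If any condition fails, selection falls through to the normal idle-CPU search.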

Performance impact (schbench benchmark on 176 CPU system):

  • 50-70% load: 47-55x wakeup latency improvement (995μs → 18-21μs)
  • 80% load: 41x improvement (995μs → 24μs)
  • 90% load: 18x improvement (995μs → 55μs)
  • 100% load: No change (gracefully disabled)

Pipe workloads (producer-consumer pairs) trigger the fast path at much higher rates, up to 174,000 handoffs/sec at 50% load, compared to ~1,000/sec for request-response patterns.

The optimization is placed early in pick_idle_cpu() so it takes priority over the prev_cpu sticky path, and it only activates when beneficial. At saturation it disables itself to avoid overhead, allowing normal pick-2 load balancing to take over.
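
The ordering described above can be sketched as follows. All predicates are passed in as plain flags so the logic stands alone; the function and parameter names are assumptions for illustration, not the scheduler's actual code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical skeleton of the CPU-selection ordering: the WAKE_SYNC
 * handoff is checked before the prev_cpu sticky path, and pick-2 load
 * balancing (stubbed here as a precomputed fallback CPU) runs last. */
static int pick_idle_cpu_sketch(int prev_cpu, int waker_cpu,
				bool is_wake_sync, bool handoff_ok,
				bool prev_cpu_idle, int fallback_cpu)
{
	/* 1. WAKE_SYNC fast path runs first so it beats the sticky path. */
	if (is_wake_sync && handoff_ok)
		return waker_cpu;
	/* 2. Otherwise prefer the cache-warm previous CPU if it's idle. */
	if (prev_cpu_idle)
		return prev_cpu;
	/* 3. Fall back to normal pick-2 load balancing (stubbed). */
	return fallback_cpu;
}
```

Placing the handoff first is what lets a synchronously woken wakee land on the waker's CPU even when prev_cpu is also idle.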

Changes:

  • Add P2DQ_STAT_WAKE_SYNC_WAKER counter to track handoffs
  • Check both local DSQ and LLC DSQ before handoff (waker consumes from both)
  • Gate optimization with saturation check
  • Expose counter in userspace stats
  • Remove unused idle_smtmask
  • Fix a bug in can_migrate() when checking LLC min runs
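
The counter tracking in the changes above can be illustrated with a per-CPU stats sketch; the enum slot mirrors P2DQ_STAT_WAKE_SYNC_WAKER, but the array layout, sizes, and function names here are assumptions, not the scheduler's actual stats code (which lives in BPF maps read by the Rust userspace side).

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative per-CPU stat counters: each CPU increments its own row
 * lock-free, and userspace sums across CPUs when reporting stats. */
enum p2dq_stat_sketch { STAT_WAKE_SYNC_WAKER, NR_STATS };

#define NR_CPUS_SKETCH 4
static uint64_t stats[NR_CPUS_SKETCH][NR_STATS];

static void stat_inc(int cpu, enum p2dq_stat_sketch s)
{
	stats[cpu][s]++;   /* only this CPU writes its row: no locking */
}

static uint64_t stat_sum(enum p2dq_stat_sketch s)
{
	uint64_t total = 0;
	for (int cpu = 0; cpu < NR_CPUS_SKETCH; cpu++)
		total += stats[cpu][s];
	return total;
}
```

A high STAT_WAKE_SYNC_WAKER sum relative to total wakeups indicates the fast path is firing often, as in the pipe workloads above.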

Tested with schbench and stress-ng across load levels 50-100%.

@hodgesds hodgesds force-pushed the p2dq-wakee-optimize branch 3 times, most recently from de81335 to e48d9bd Compare December 11, 2025 07:19
Signed-off-by: Daniel Hodges <hodgesd@meta.com>
@hodgesds hodgesds force-pushed the p2dq-wakee-optimize branch from e48d9bd to 3172c6d Compare December 20, 2025 11:06
@hodgesds
Contributor Author

I did some more testing, trying to get the pipe-based producer-consumer case (single producer/consumer) working with logic to distinguish it from multi-consumer setups, and it seems to be hardware/workload dependent.
