worker: prefer macOS P-cores via pthread QoS USER_INITIATED #2342
Merged
MarcusSorealheis merged 5 commits on May 17, 2026
Apple Silicon's XNU scheduler will park UTILITY/BACKGROUND threads on
efficiency cores. Single-thread-bursty workloads (swift-frontend,
clang) typical in iOS RBE builds can run 2x-3x slower on an E-core,
so tag the worker process with QOS_CLASS_USER_INITIATED to bias
scheduling toward P-cores.
The setter runs in three places:
- Main thread before tokio runtime creation so worker threads
inherit the class via pthread QoS inheritance.
- tokio Builder::on_thread_start hook as belt-and-suspenders for
any thread (e.g. blocking pool) that misses inheritance.
- Top of LocalWorker::run for the same reason.
Implementation uses libc's pthread_set_qos_class_self_np binding;
the new `nativelink_worker::qos` module is compile-gated so non-macOS
targets emit no call and pull in no symbol. A round-trip test on
macOS verifies the kernel accepted the class change.
Ported from upstream commit 0fce813 (TraceMachina/nativelink TraceMachina#2243).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The QoS scheme in PR TraceMachina#2243 hinges on tokio worker threads actually seeing QOS_CLASS_USER_INITIATED when tasks run. Without an end-to-end test, the on_thread_start hook could silently regress (e.g. if the hook ran on the wrong thread or the kernel rejected the class), and the worker would quietly fall back to E-core scheduling.
Adds a macOS-only test that builds a fresh multi-threaded tokio runtime with the same on_thread_start hook used in main, spawns a task to force execution on a worker thread, and reads back the class with pthread_get_qos_class_np. Also refactors the existing single-thread test to share a helper.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
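The read-back check described above could look roughly like the following sketch. `user_initiated_round_trip` is a hypothetical name, not the PR's shared test helper, and on macOS the code assumes the `libc` crate is available as a dependency. On other targets the macOS arm is compiled out and the function reports `None`.

```rust
/// Hypothetical round-trip check: set USER_INITIATED on the calling thread,
/// then ask the kernel which class the thread now has.
#[cfg(target_os = "macos")]
pub fn user_initiated_round_trip() -> Option<bool> {
    use libc::qos_class_t::{QOS_CLASS_UNSPECIFIED, QOS_CLASS_USER_INITIATED};

    let mut class = QOS_CLASS_UNSPECIFIED;
    let mut relative_priority: libc::c_int = 0;

    // SAFETY: both calls act on the current thread only, and the out-pointers
    // refer to live local variables for the duration of the call.
    let (set_rc, get_rc) = unsafe {
        let set_rc = libc::pthread_set_qos_class_self_np(QOS_CLASS_USER_INITIATED, 0);
        let get_rc = libc::pthread_get_qos_class_np(
            libc::pthread_self(),
            &mut class,
            &mut relative_priority,
        );
        (set_rc, get_rc)
    };

    // Compare as integers so the sketch does not rely on `PartialEq` for the enum.
    Some(set_rc == 0 && get_rc == 0 && class as u32 == QOS_CLASS_USER_INITIATED as u32)
}

/// Non-macOS targets have no QoS class to read back.
#[cfg(not(target_os = "macos"))]
pub const fn user_initiated_round_trip() -> Option<bool> {
    None
}
```

Running the same function inside a task spawned on a runtime worker thread is what would turn this into the end-to-end variant.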
…t test
The on_thread_start inheritance test must construct a custom-built runtime via `Builder::new_multi_thread()` and drive it with `block_on`, because that runtime is the unit under test. No `nativelink-util::task` wrapper exposes a custom-built runtime with a thread-start hook, so the disallowed_methods lint cannot be addressed at the root cause.
Use `#[expect(clippy::disallowed_methods, reason = ...)]` per the modern Rust 2024 idiom rather than a silent `#[allow]`: it carries a reviewer-visible justification and reports `unfulfilled_lint_expectations` if the lint stops firing. Mirrors the same justified escape already used in src/bin/nativelink.rs::main.
The previous single-definition `pub fn set_user_initiated() -> bool` had a `#[cfg(target_os = "macos")]` block that called libc and a `#[cfg(not(...))]` block that returned `true`. On Linux CI, clippy sees only the trivial `true` arm and fires `missing_const_for_fn`, failing ubuntu, asan, Bazel Dev/ubuntu, and every dependent rbe-* job. This did not reproduce on macOS because the macOS arm calls libc, which is not const-eligible, so clippy stays silent.
Split into two cfg-gated definitions: the macOS impl stays a regular `pub fn` because `libc::pthread_set_qos_class_self_np` is not const; the non-macOS impl becomes `pub const fn` returning `true`. Call sites are unchanged, both arms still return `bool`, and the existing `qos::macos_tests::*` continue to apply since they were already gated on `#[cfg(target_os = "macos")]`. Doc comments are now split per arm and specialised to each platform's actual behaviour.
Splitting (rather than `#[allow(missing_const_for_fn)]` on a single function) is the right fix because the lint is accurate for the non-macOS arm in isolation; suppressing it would hide a legitimate const-fn opportunity and mask future bugs on whichever platform clippy runs against.
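The split described above might look roughly like this. It is a sketch based only on the description in this commit message, not the merged code: the doc comments and the `rc == 0` success check are assumptions, and the macOS arm assumes the `libc` crate is a dependency on that target.

```rust
/// macOS: ask the kernel to tag the calling thread as USER_INITIATED.
/// Returns `true` when the call reports success (return code 0, assumed here).
#[cfg(target_os = "macos")]
pub fn set_user_initiated() -> bool {
    // SAFETY: the call affects only the calling thread and takes no pointers.
    let rc = unsafe {
        libc::pthread_set_qos_class_self_np(libc::qos_class_t::QOS_CLASS_USER_INITIATED, 0)
    };
    rc == 0
}

/// Other platforms: there is nothing to set, so the function is a `const fn`
/// no-op, which is what `missing_const_for_fn` asks for.
#[cfg(not(target_os = "macos"))]
pub const fn set_user_initiated() -> bool {
    true
}
```

Because exactly one definition survives cfg evaluation on any target, callers keep writing `set_user_initiated()` and get a `bool` either way.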
Summary
On macOS, call `pthread_set_qos_class_self_np(QOS_CLASS_USER_INITIATED, 0)` on the main thread, on the tokio runtime's worker threads (via `Builder::on_thread_start`), and at the top of `LocalWorker::run`. The QoS class is inherited across `pthread_create`, so tokio worker and blocking-pool threads pick it up; the explicit hook in `on_thread_start` is belt-and-suspenders for late-attached blocking threads.
No behavior change on Linux/Windows: `qos::set_user_initiated()` is a `const fn` no-op there, and the `libc` dep is `cfg(target_os = "macos")`-gated.
Why
On Apple Silicon (M-series), XNU routes work between performance (P) and efficiency (E) cores partly based on QoS. The default QoS for a build daemon is `UTILITY` or below, which the scheduler can park on E-cores. Single-thread-bursty workloads (SwiftCompile is the canonical example) can run 2–3× slower on an E-core than on a P-core.
`USER_INITIATED` is a scheduler hint, not a hard P-core pin (there is no public API for hard pinning, and `USER_INTERACTIVE` is reserved for UI work). Under saturation the scheduler may still spill to E-cores, but when capacity exists the bias clearly improves wall time for compile-heavy actions on macOS RBE workers.
Files touched
Test plan
Provenance
Equivalent to commit `0fce813bc` from #2243, ported atomically. The 156-commit parent PR bundles many unrelated changes; this isolates a single change that (a) is macOS-only and (b) attacks the dominant SwiftCompile wall-time bottleneck on Apple Silicon RBE fleets.
🤖 Generated with Claude Code