worker: prefer macOS P-cores via pthread QoS USER_INITIATED #2342

Merged
MarcusSorealheis merged 5 commits into TraceMachina:main from erneestoc:ec/pr2243-macos-qos-user-initiated
May 17, 2026
erneestoc commented May 17, 2026

Summary

On macOS, call `pthread_set_qos_class_self_np(QOS_CLASS_USER_INITIATED, 0)` on the main thread, on the tokio runtime's worker threads (via `Builder::on_thread_start`), and at the top of `LocalWorker::run`. Threads created with `pthread_create` inherit the creating thread's QoS class, so tokio worker and blocking-pool threads pick it up from the main thread; the explicit hook in `on_thread_start` is belt-and-suspenders for late-attached blocking threads.

No behavior change on Linux/Windows: `qos::set_user_initiated()` is a `const fn` no-op there, and the `libc` dependency is gated on `cfg(target_os = "macos")`.
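As a rough illustration of the shape described above, the setter can be written as two cfg-gated definitions. This is a minimal sketch, not the contents of `qos.rs`: the PR uses the `libc` crate's binding, whereas here the C function is declared by hand so the example needs no dependencies. The `sys` module name and the hand-written constant are assumptions made for the example.

```rust
// Sketch only: hand-declared binding instead of the `libc` crate used by the PR.
#[cfg(target_os = "macos")]
mod sys {
    use std::ffi::{c_int, c_uint};

    // Value of QOS_CLASS_USER_INITIATED in <sys/qos.h>.
    pub const QOS_CLASS_USER_INITIATED: c_uint = 0x19;

    unsafe extern "C" {
        pub fn pthread_set_qos_class_self_np(
            qos_class: c_uint,
            relative_priority: c_int,
        ) -> c_int;
    }
}

/// macOS: ask the kernel to tag the calling thread as USER_INITIATED.
/// Returns `true` when the call reports success (return code 0).
#[cfg(target_os = "macos")]
pub fn set_user_initiated() -> bool {
    // SAFETY: the call takes plain integers and only affects the calling thread.
    unsafe { sys::pthread_set_qos_class_self_np(sys::QOS_CLASS_USER_INITIATED, 0) == 0 }
}

/// Other platforms: there is nothing to do, so the function can be `const`.
#[cfg(not(target_os = "macos"))]
pub const fn set_user_initiated() -> bool {
    true
}
```

Call sites look the same on every platform, for example `let _ = set_user_initiated();` before the runtime is built. Because QoS is only a scheduling hint, ignoring a `false` return is a reasonable choice.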

Why

On Apple Silicon (M-series), XNU routes work between performance (P) and efficiency (E) cores partly based on QoS. The default QoS for a build daemon is UTILITY or below, which the scheduler can park on E-cores. Single-thread-bursty workloads — SwiftCompile is the canonical example — can run 2–3× slower on an E-core than a P-core.

USER_INITIATED is a scheduler hint, not a hard P-core pin (there is no public API for hard pinning, and USER_INTERACTIVE is reserved for UI work). Under saturation the scheduler may still spill to E-cores, but when capacity exists the bias clearly improves wall time for compile-heavy actions on macOS RBE workers.

Files touched

  • `nativelink-worker/src/qos.rs` (new, 152 lines) — macOS impl via `libc::pthread_set_qos_class_self_np`, non-macOS const-fn no-op, two macOS tests + one cross-platform compile-time test
  • `nativelink-worker/src/lib.rs` — `pub mod qos`
  • `nativelink-worker/src/local_worker.rs` — call at top of `LocalWorker::run`
  • `nativelink-worker/Cargo.toml` — `[target.'cfg(target_os = "macos")'.dependencies] libc = "0.2"`
  • `nativelink-worker/BUILD.bazel` — register `src/qos.rs` and macOS-only libc dep via `select()`
  • `src/bin/nativelink.rs` — main-thread call before `tokio::runtime::Builder::build`, plus `on_thread_start` hook on the runtime builder

Test plan

  • `cargo test -p nativelink-worker --lib` — 3 passed
    • `qos::macos_tests::sets_user_initiated_on_current_thread` — round-trips via `pthread_get_qos_class_np` to confirm the kernel accepted the class (not just that the syscall returned 0)
    • `qos::macos_tests::tokio_worker_threads_inherit_user_initiated_via_on_thread_start` — builds a fresh `new_multi_thread` runtime with the same hook used in `main`, forces scheduling onto a worker thread, reads back the class. This verifies the load-bearing claim that on_thread_start actually delivers USER_INITIATED to worker threads.
    • Cross-platform: `#[cfg(not(target_os = "macos"))]` no-op compile-time anchor
  • `cargo build -p nativelink-worker --bin nativelink` clean
  • `cargo clippy -p nativelink-worker --lib -- -D warnings` clean
  • `cargo fmt --check` clean
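The two macOS tests above follow a set-then-read-back pattern. The sketch below shows that pattern under stated assumptions: the bindings are declared by hand rather than taken from `libc`, `is_user_initiated` and `spawn_with_hook` are hypothetical helper names, and a plain `std::thread` stands in for a tokio worker thread with an `on_thread_start` hook. It is not the PR's test code.

```rust
use std::thread;

#[cfg(target_os = "macos")]
mod sys {
    use std::ffi::{c_int, c_uint};

    // Value of QOS_CLASS_USER_INITIATED in <sys/qos.h>.
    pub const QOS_CLASS_USER_INITIATED: c_uint = 0x19;

    unsafe extern "C" {
        // `pthread_t` is pointer-sized on macOS; `usize` stands in for it here.
        pub fn pthread_self() -> usize;
        pub fn pthread_set_qos_class_self_np(
            qos_class: c_uint,
            relative_priority: c_int,
        ) -> c_int;
        pub fn pthread_get_qos_class_np(
            thread: usize,
            qos_class: *mut c_uint,
            relative_priority: *mut c_int,
        ) -> c_int;
    }
}

#[cfg(target_os = "macos")]
pub fn set_user_initiated() -> bool {
    // SAFETY: plain integer arguments; only the calling thread is affected.
    unsafe { sys::pthread_set_qos_class_self_np(sys::QOS_CLASS_USER_INITIATED, 0) == 0 }
}

#[cfg(not(target_os = "macos"))]
pub const fn set_user_initiated() -> bool {
    true
}

/// `Some(true)` if the calling thread reports USER_INITIATED, `Some(false)`
/// if it reports another class, `None` if the class could not be read.
#[cfg(target_os = "macos")]
pub fn is_user_initiated() -> Option<bool> {
    let mut class: std::ffi::c_uint = 0;
    let mut relative_priority: std::ffi::c_int = 0;
    // SAFETY: both out-pointers reference live locals, and `pthread_self`
    // returns a valid handle for the calling thread.
    let rc = unsafe {
        sys::pthread_get_qos_class_np(sys::pthread_self(), &mut class, &mut relative_priority)
    };
    (rc == 0).then_some(class == sys::QOS_CLASS_USER_INITIATED)
}

/// Platforms without pthread QoS have nothing to read back.
#[cfg(not(target_os = "macos"))]
pub const fn is_user_initiated() -> Option<bool> {
    None
}

/// Hypothetical stand-in for a runtime's thread-start hook: apply the QoS
/// class first, then run the caller's closure on the new thread.
pub fn spawn_with_hook<T, F>(f: F) -> thread::JoinHandle<T>
where
    F: FnOnce() -> T + Send + 'static,
    T: Send + 'static,
{
    thread::spawn(move || {
        let _ = set_user_initiated();
        f()
    })
}
```

Reading the class back on the spawned thread, as in `spawn_with_hook(is_user_initiated).join()`, checks the value the thread actually observes rather than only the setter's return code.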

Provenance

Equivalent to commit `0fce813bc` from #2243, ported atomically. The 156-commit parent PR bundles many unrelated changes; this isolates a single change that (a) is macOS-only and (b) attacks the dominant SwiftCompile wall-time bottleneck on Apple Silicon RBE fleets.

🤖 Generated with Claude Code



erneestoc and others added 5 commits May 16, 2026 18:49
Apple Silicon's XNU scheduler will park UTILITY/BACKGROUND threads on
efficiency cores. Single-thread-bursty workloads (swift-frontend,
clang) typical in iOS RBE builds can run 2x-3x slower on an E-core,
so tag the worker process with QOS_CLASS_USER_INITIATED to bias
scheduling toward P-cores.

The setter runs in three places:
  - Main thread before tokio runtime creation so worker threads
    inherit the class via pthread QoS inheritance.
  - tokio Builder::on_thread_start hook as belt-and-suspenders for
    any thread (e.g. blocking pool) that misses inheritance.
  - Top of LocalWorker::run for the same reason.

Implementation uses libc's pthread_set_qos_class_self_np binding;
the new `nativelink_worker::qos` module is compile-gated so non-macOS
targets emit no call and pull in no symbol. A round-trip test on
macOS verifies the kernel accepted the class change.

Ported from upstream commit 0fce813 (TraceMachina/nativelink TraceMachina#2243).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

The QoS scheme in PR TraceMachina#2243 hinges on tokio worker threads actually
seeing QOS_CLASS_USER_INITIATED when tasks run; without an end-to-end
test the on_thread_start hook could silently regress (e.g. if the hook
ran on the wrong thread or the kernel rejected the class) and the
worker would quietly fall back to E-core scheduling.

Adds a macOS-only test that builds a fresh multi-threaded tokio runtime
with the same on_thread_start hook used in main, spawns a task to force
execution on a worker thread, and reads back the class with
pthread_get_qos_class_np. Also refactors the existing single-thread test
to share a helper.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…t test

The on_thread_start inheritance test must construct a custom-built
runtime via `Builder::new_multi_thread()` and drive it with `block_on`
— that is the unit under test. No `nativelink-util::task` wrapper
exposes a custom-built runtime with a thread-start hook, so the
disallowed_methods lint cannot be addressed at the root cause.

Use `#[expect(clippy::disallowed_methods, reason = ...)]` per the
modern Rust 2024 idiom (fails if the lint stops firing, with a
reviewer-visible justification) rather than a silent `#[allow]`.
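To illustrate the difference from `#[allow]`, here is a minimal sketch of `#[expect]` with a `reason`. The built-in `unused_variables` lint stands in for `clippy::disallowed_methods` so the example compiles without Clippy, and `answer` is a hypothetical function, not code from this PR.

```rust
// `#[expect]` suppresses the lint like `#[allow]` does, but the compiler
// reports `unfulfilled_lint_expectations` if the lint stops firing here,
// so a stale suppression does not linger unnoticed.
#[expect(
    unused_variables,
    reason = "illustration: the binding exists only to trigger the lint"
)]
pub fn answer() -> u32 {
    let scratch = 0_u32; // never read, so `unused_variables` fires
    42
}
```

Deleting the `scratch` binding would make the expectation unfulfilled and produce a warning. A silent `#[allow]` gives no such signal.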

Mirrors the same justified escape already used in
src/bin/nativelink.rs::main.

The previous single-definition `pub fn set_user_initiated() -> bool` had
a `#[cfg(target_os = "macos")]` block that called libc and a
`#[cfg(not(...))]` block that returned `true`. On Linux CI clippy sees
only the trivial `true` arm and fires `missing_const_for_fn`, failing
ubuntu, asan, Bazel Dev/ubuntu, and every dependent rbe-* job. This did
not reproduce on macOS because the macOS arm calls libc, which is not
const-eligible, so clippy stays silent.

Split into two cfg-gated definitions: the macOS impl stays a regular
`pub fn` because `libc::pthread_set_qos_class_self_np` is not const; the
non-macOS impl becomes `pub const fn` returning `true`. Call sites are
unchanged, both arms still return `bool`, and the existing
`qos::macos_tests::*` continue to apply since they were already gated on
`#[cfg(target_os = "macos")]`. Doc comments are now split per arm and
specialised to each platform's actual behaviour.

Splitting (rather than `#[allow(missing_const_for_fn)]` on a single
function) is the right fix because the lint is accurate for the
non-macOS arm in isolation; suppressing it would hide a legitimate
const-fn opportunity and mask future bugs on whichever platform clippy
runs against.

MarcusSorealheis left a comment

LGTM

MarcusSorealheis merged commit 2410a08 into TraceMachina:main on May 17, 2026
42 checks passed