Commit 2410a08

erneestoc and claude authored
worker: prefer macOS P-cores via pthread QoS USER_INITIATED (#2342)
* worker: set QoS USER_INITIATED on macOS for P-core preference

  Apple Silicon's XNU scheduler will park UTILITY/BACKGROUND threads on efficiency cores. Single-thread-bursty workloads (swift-frontend, clang) typical in iOS RBE builds can run 2x-3x slower on an E-core, so tag the worker process with QOS_CLASS_USER_INITIATED to bias scheduling toward P-cores.

  The setter runs in three places:
  - Main thread before tokio runtime creation so worker threads inherit the class via pthread QoS inheritance.
  - tokio Builder::on_thread_start hook as belt-and-suspenders for any thread (e.g. blocking pool) that misses inheritance.
  - Top of LocalWorker::run for the same reason.

  Implementation uses libc's pthread_set_qos_class_self_np binding; the new `nativelink_worker::qos` module is compile-gated so non-macOS targets emit no call and pull in no symbol. A round-trip test on macOS verifies the kernel accepted the class change.

  Ported from upstream commit 0fce813 (TraceMachina/nativelink #2243).

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* worker: test tokio worker thread inherits USER_INITIATED QoS

  The QoS scheme in PR #2243 hinges on tokio worker threads actually seeing QOS_CLASS_USER_INITIATED at task-runtime; without an end-to-end test the on_thread_start hook could silently regress (e.g. if the hook ran on the wrong thread or the kernel rejected the class) and the worker would quietly fall back to E-core scheduling.

  Adds a macOS-only test that builds a fresh multi-threaded tokio runtime with the same on_thread_start hook used in main, spawns a task to force execution on a worker thread, and reads back the class with pthread_get_qos_class_np. Also refactors the existing single-thread test to share a helper.

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* qos: justify disallowed_methods escape on tokio_worker_threads_inherit test

  The on_thread_start inheritance test must construct a custom-built runtime via `Builder::new_multi_thread()` and drive it with `block_on` — that is the unit under test. No `nativelink-util::task` wrapper exposes a custom-built runtime with a thread-start hook, so the disallowed_methods lint cannot be addressed at the root cause.

  Use `#[expect(clippy::disallowed_methods, reason = ...)]` per the modern Rust 2024 idiom (fails if the lint stops firing, with a reviewer-visible justification) rather than a silent `#[allow]`. Mirrors the same justified escape already used in src/bin/nativelink.rs::main.

* qos: split set_user_initiated into cfg-gated fn / const fn

  The previous single-definition `pub fn set_user_initiated() -> bool` had a `#[cfg(target_os = "macos")]` block that called libc and a `#[cfg(not(...))]` block that returned `true`. On Linux CI clippy sees only the trivial `true` arm and fires `missing_const_for_fn`, failing ubuntu, asan, Bazel Dev/ubuntu, and every dependent rbe-* job. This did not reproduce on macOS because the macOS arm calls libc, which is not const-eligible, so clippy stays silent.

  Split into two cfg-gated definitions: the macOS impl stays a regular `pub fn` because `libc::pthread_set_qos_class_self_np` is not const; the non-macOS impl becomes `pub const fn` returning `true`. Call sites are unchanged, both arms still return `bool`, and the existing `qos::macos_tests::*` continue to apply since they were already gated on `#[cfg(target_os = "macos")]`. Doc comments are now split per arm and specialised to each platform's actual behaviour.

  Splitting (rather than `#[allow(missing_const_for_fn)]` on a single function) is the right fix because the lint is accurate for the non-macOS arm in isolation; suppressing it would hide a legitimate const-fn opportunity and mask future bugs on whichever platform clippy runs against.

* ci: retrigger after GitHub 502 fetching rules_kotlin tarball

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
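
The fn / const fn split described in the last commit-message entry can be sketched as follows. This is a dependency-free approximation, not the code from the commit: the Darwin symbol is declared by hand instead of coming from the `libc` crate, `darwin_set_user_initiated` and `set_user_initiated_single` are hypothetical names, and the "before" shape is reconstructed from the prose description only.

```rust
// Sketch only. The real module uses the `libc` crate; the Darwin symbol is
// declared by hand here so the example builds with no dependencies.
// `unsafe extern` blocks need Rust 1.82 or newer.
#[cfg(target_os = "macos")]
unsafe extern "C" {
    // C prototype: int pthread_set_qos_class_self_np(qos_class_t, int);
    fn pthread_set_qos_class_self_np(qos_class: u32, relative_priority: i32) -> i32;
}

/// Hypothetical helper: tags the calling thread as USER_INITIATED (0x19 in
/// Darwin's <sys/qos.h>) and reports whether the call returned 0.
#[cfg(target_os = "macos")]
fn darwin_set_user_initiated() -> bool {
    // SAFETY: the call only changes the calling thread's own QoS class.
    unsafe { pthread_set_qos_class_self_np(0x19, 0) == 0 }
}

/// "Before" shape (assumed): one definition with cfg-gated statements.
/// Off macOS only the `true` arm survives, so the body is trivially
/// const-eligible, which is the situation `missing_const_for_fn` reports.
pub fn set_user_initiated_single() -> bool {
    #[cfg(target_os = "macos")]
    let ok = darwin_set_user_initiated();
    #[cfg(not(target_os = "macos"))]
    let ok = true;
    ok
}

/// "After" shape, macOS arm: performs a real call, so it stays a plain `fn`.
#[cfg(target_os = "macos")]
pub fn set_user_initiated() -> bool {
    darwin_set_user_initiated()
}

/// "After" shape, every other target: a `const fn` that always succeeds.
#[cfg(not(target_os = "macos"))]
pub const fn set_user_initiated() -> bool {
    true
}

// Off macOS, only the split form can be evaluated in a const context.
#[cfg(not(target_os = "macos"))]
const _: bool = set_user_initiated();
```

Call sites look the same for both shapes (`let _ = set_user_initiated();`), which is why the commit can change the definition without touching callers.
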
1 parent e7532cb commit 2410a08

6 files changed

Lines changed: 192 additions & 0 deletions

nativelink-worker/BUILD.bazel

Lines changed: 3 additions & 0 deletions
@@ -13,6 +13,7 @@ rust_library(
         "src/directory_cache.rs",
         "src/lib.rs",
         "src/local_worker.rs",
+        "src/qos.rs",
         "src/running_actions_manager.rs",
         "src/worker_api_client_wrapper.rs",
         "src/worker_utils.rs",
@@ -51,6 +52,7 @@ rust_library(
         "@crates//:uuid",
     ] + select({
         "@platforms//os:linux": ["@crates//:libc"],
+        "@platforms//os:macos": ["@crates//:libc"],
         "//conditions:default": [],
     }),
 )
@@ -98,6 +100,7 @@ rust_test_suite(
         "@crates//:which",
     ] + select({
         "@platforms//os:linux": ["@crates//:libc"],
+        "@platforms//os:macos": ["@crates//:libc"],
         "//conditions:default": [],
     }),
 )

nativelink-worker/Cargo.toml

Lines changed: 3 additions & 0 deletions
@@ -58,6 +58,9 @@ uuid = { version = "1.16.0", default-features = false, features = [
 [target.'cfg(target_os = "linux")'.dependencies]
 libc = { version = "0.2.183", default-features = false }
 
+[target.'cfg(target_os = "macos")'.dependencies]
+libc = { version = "0.2.183", default-features = false }
+
 [dev-dependencies]
 nativelink-macro = { path = "../nativelink-macro" }

nativelink-worker/src/lib.rs

Lines changed: 1 addition & 0 deletions
@@ -16,6 +16,7 @@ pub mod directory_cache;
 pub mod local_worker;
 #[cfg(target_os = "linux")]
 pub mod namespace_utils;
+pub mod qos;
 pub mod running_actions_manager;
 pub mod worker_api_client_wrapper;
 pub mod worker_utils;

nativelink-worker/src/local_worker.rs

Lines changed: 8 additions & 0 deletions
@@ -798,6 +798,14 @@ impl<T: WorkerApiClientTrait + 'static, U: RunningActionsManager> LocalWorker<T,
         mut self,
         mut shutdown_rx: broadcast::Receiver<ShutdownGuard>,
     ) -> Result<(), Error> {
+        // Belt-and-suspenders QoS bump: the main binary already calls
+        // this before runtime creation so the tokio worker threads
+        // inherit P-core preference via pthread QoS inheritance, but
+        // any thread that reaches this point should also be tagged in
+        // case it was spawned by a path that bypassed `on_thread_start`.
+        // No-op on non-macOS.
+        let _ = crate::qos::set_user_initiated();
+
         let sleep_fn = self
             .sleep_fn
             .take()

nativelink-worker/src/qos.rs

Lines changed: 166 additions & 0 deletions
@@ -0,0 +1,166 @@
+// Copyright 2024 The NativeLink Authors. All rights reserved.
+//
+// Licensed under the Functional Source License, Version 1.1, Apache 2.0 Future License (the "License");
+// you may not use this file except in compliance with the License.
+// You may obtain a copy of the License at
+//
+// See LICENSE file for details
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+//! Darwin `QoS` (Quality of Service) helpers for worker scheduling.
+//!
+//! Apple Silicon (M-series) CPUs have a heterogeneous topology with
+//! performance ("P") and efficiency ("E") cores. XNU's scheduler routes
+//! threads to P or E cores in part based on the thread's `QoS` class. The
+//! default class assigned to long-running background daemons is typically
+//! `UTILITY` or `BACKGROUND`, both of which the scheduler may park on
+//! E-cores.
+//!
+//! Single-thread-bursty workloads such as `swift-frontend` and `clang`
+//! invocations (typical in iOS RBE builds) can run 2x–3x slower when
+//! pinned to an E-core. Tagging the worker process with
+//! `QOS_CLASS_USER_INITIATED` tells the scheduler to treat its threads
+//! as foreground-equivalent and bias placement toward P-cores.
+//!
+//! On Linux and Windows these helpers compile away to nothing — they are
+//! intentionally not behind a runtime branch so non-macOS builds never
+//! emit a call.
+
+/// Sets the calling thread's `QoS` class to `USER_INITIATED` on macOS.
+///
+/// Returns `true` if the underlying `pthread_set_qos_class_self_np`
+/// call succeeded; returns `false` if it failed.
+///
+/// Safe to call from any thread, including tokio runtime worker threads
+/// via `Builder::on_thread_start`.
+#[cfg(target_os = "macos")]
+#[inline]
+pub fn set_user_initiated() -> bool {
+    // SAFETY: `pthread_set_qos_class_self_np` is a thread-local
+    // setter with no preconditions on the caller; passing a valid
+    // enum variant and relative priority 0 is always defined.
+    let ret = unsafe {
+        libc::pthread_set_qos_class_self_np(libc::qos_class_t::QOS_CLASS_USER_INITIATED, 0)
+    };
+    ret == 0
+}
+
+/// Compile-time no-op on non-macOS targets.
+///
+/// Always returns `true`. The call site expands to nothing after
+/// inlining / dead-code elimination, so non-macOS builds never emit
+/// a runtime branch or a libc call.
+#[cfg(not(target_os = "macos"))]
+#[inline]
+pub const fn set_user_initiated() -> bool {
+    true
+}
+
+#[cfg(all(test, target_os = "macos"))]
+mod macos_tests {
+    use super::set_user_initiated;
+
+    /// Reads the current thread's `QoS` class via `pthread_get_qos_class_np`.
+    /// Panics with a contextual message on failure (only called from tests).
+    fn current_qos_class() -> libc::qos_class_t {
+        let mut class: libc::qos_class_t = libc::qos_class_t::QOS_CLASS_UNSPECIFIED;
+        let mut rel_prio: libc::c_int = 0;
+        // SAFETY: out-pointers point to stack-allocated, properly sized
+        // and aligned storage owned by this thread.
+        let ret = unsafe {
+            libc::pthread_get_qos_class_np(
+                libc::pthread_self(),
+                core::ptr::from_mut(&mut class),
+                core::ptr::from_mut(&mut rel_prio),
+            )
+        };
+        assert_eq!(ret, 0, "pthread_get_qos_class_np failed: {ret}");
+        class
+    }
+
+    /// Proves the `QoS` call is wired up on macOS and the underlying
+    /// Darwin symbol resolves at link time. A failure here means the
+    /// worker would silently keep running on E-cores.
+    #[test]
+    fn sets_user_initiated_on_current_thread() {
+        assert!(
+            set_user_initiated(),
+            "pthread_set_qos_class_self_np(USER_INITIATED) returned non-zero",
+        );
+        // `qos_class_t` is a `#[repr(u32)]` C enum that does not derive
+        // `PartialEq` in libc, so compare the underlying discriminants.
+        assert_eq!(
+            current_qos_class() as u32,
+            libc::qos_class_t::QOS_CLASS_USER_INITIATED as u32,
+            "`QoS` class did not update; thread will be eligible for E-core scheduling",
+        );
+    }
+
+    /// Validates the load-bearing claim that tokio worker threads created
+    /// with a `Builder::on_thread_start` hook calling `set_user_initiated`
+    /// observe `QOS_CLASS_USER_INITIATED` from inside spawned tasks. This
+    /// mirrors the wiring in `src/bin/nativelink.rs::main`. Without this
+    /// test the entire `QoS` scheme is unverified at the integration level.
+    ///
+    /// This is the one place in the worker crate that must construct a
+    /// fresh `tokio::runtime::Builder::new_multi_thread()` and drive it
+    /// with `block_on` — the unit under test *is* the `on_thread_start`
+    /// hook on a custom-built runtime, which `nativelink-util::task` and
+    /// `#[nativelink_test]` do not expose. The `#[expect]` mirrors the
+    /// same justified escape used in `src/bin/nativelink.rs::main`.
+    #[test]
+    #[expect(
+        clippy::disallowed_methods,
+        reason = "test exercises `Builder::on_thread_start` + `block_on`; \
+                  no util wrapper exposes a custom-built runtime with a thread-start hook"
+    )]
+    fn tokio_worker_threads_inherit_user_initiated_via_on_thread_start() {
+        // Deliberately build a fresh runtime in-test (do not reuse a
+        // global one) so the hook is exercised on freshly-spawned
+        // worker threads with whatever class they were born with.
+        let rt = tokio::runtime::Builder::new_multi_thread()
+            .worker_threads(2)
+            .on_thread_start(|| {
+                assert!(set_user_initiated(), "hook failed in worker thread");
+            })
+            .enable_all()
+            .build()
+            .expect("build tokio runtime");
+
+        let observed: u32 = rt.block_on(async {
+            // Force execution on a worker thread (not the caller).
+            tokio::spawn(async { current_qos_class() as u32 })
+                .await
+                .expect("join spawned task")
+        });
+
+        assert_eq!(
+            observed,
+            libc::qos_class_t::QOS_CLASS_USER_INITIATED as u32,
+            "tokio worker thread did not inherit USER_INITIATED from on_thread_start",
+        );
+    }
+}
+
+#[cfg(all(test, not(target_os = "macos")))]
+mod non_macos_tests {
+    use super::set_user_initiated;
+
+    /// On Linux/Windows the function must be a true no-op that always
+    /// reports success — there is no runtime cost and no platform call.
+    #[test]
+    fn is_a_noop_on_non_macos() {
+        assert!(set_user_initiated());
+    }
+}
+
+/// Compile-time assertion: when `target_os` is not `macos`, this module
+/// must not reference any libc symbol. Reviewers can `grep "extern crate
+/// libc"` or inspect this constant to verify the no-op story.
+#[cfg(not(target_os = "macos"))]
+pub const NON_MACOS_IS_NOOP: () = ();
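
The macOS tests above compare `qos_class_t` values with `as u32` casts because, per the comment in the diff, the libc enum does not derive `PartialEq`. A minimal sketch of that comparison, using a hypothetical local `QosClass` enum as a stand-in for `libc::qos_class_t` (the numeric values are the ones in Darwin's `<sys/qos.h>`; the `Clone`/`Copy` derives and the `same_class` helper are conveniences added for this example):

```rust
/// Hypothetical stand-in for `libc::qos_class_t`. It deliberately does not
/// derive `PartialEq`, so `==` on the enum itself would not compile.
#[repr(u32)]
#[derive(Clone, Copy)]
pub enum QosClass {
    UserInteractive = 0x21,
    UserInitiated = 0x19,
    Default = 0x15,
    Utility = 0x11,
    Background = 0x09,
    Unspecified = 0x00,
}

/// Compares two classes by their underlying `u32` discriminants, the same
/// way the tests compare the observed class with the expected one.
pub fn same_class(a: QosClass, b: QosClass) -> bool {
    a as u32 == b as u32
}
```

Casting a fieldless `#[repr(u32)]` enum with `as u32` yields its declared discriminant, so the comparison needs no trait implementation on the enum.
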

src/bin/nativelink.rs

Lines changed: 11 additions & 0 deletions
@@ -718,8 +718,19 @@ fn get_config() -> Result<CasConfig, Error> {
 }
 
 fn main() -> Result<(), Box<dyn core::error::Error>> {
+    // Set QoS to USER_INITIATED on the main thread *before* the tokio
+    // runtime is built so the spawned worker threads inherit P-core
+    // scheduling preference via pthread QoS inheritance on Apple
+    // Silicon. `on_thread_start` below is a belt-and-suspenders hook
+    // for any thread that misses the inherited class (e.g. tokio
+    // blocking pool threads created lazily). No-op on non-macOS.
+    let _ = nativelink_worker::qos::set_user_initiated();
+
     #[expect(clippy::disallowed_methods, reason = "starting main runtime")]
     let runtime = tokio::runtime::Builder::new_multi_thread()
+        .on_thread_start(|| {
+            let _ = nativelink_worker::qos::set_user_initiated();
+        })
         .enable_all()
         .build()?;
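
The reason for a per-thread start hook can be illustrated without tokio. The sketch below is a std-only analog, not the wiring from this commit: a thread-local flag stands in for the thread's QoS class, and `tag_current_thread`, `current_thread_is_tagged`, and `spawn_with_start_hook` are hypothetical names. It models only the point that a setter acting on the calling thread has to run on each thread it should affect; it does not model pthread QoS inheritance.

```rust
use std::cell::Cell;
use std::thread;

thread_local! {
    // Hypothetical per-thread flag standing in for the thread's QoS class.
    static TAGGED: Cell<bool> = Cell::new(false);
}

/// Stand-in for `set_user_initiated`: affects the calling thread only.
pub fn tag_current_thread() -> bool {
    TAGGED.with(|t| t.set(true));
    true
}

/// Reports whether the calling thread has been tagged.
pub fn current_thread_is_tagged() -> bool {
    TAGGED.with(|t| t.get())
}

/// Spawns a thread that runs `hook` first and `body` second, roughly the
/// role `Builder::on_thread_start` plays for tokio worker threads.
pub fn spawn_with_start_hook<H, F, T>(hook: H, body: F) -> thread::JoinHandle<T>
where
    H: FnOnce() + Send + 'static,
    F: FnOnce() -> T + Send + 'static,
    T: Send + 'static,
{
    thread::spawn(move || {
        hook();
        body()
    })
}
```

A thread started through `spawn_with_start_hook` with `tag_current_thread` as the hook sees the flag set inside `body`, while a thread started with plain `std::thread::spawn` does not, even if the spawning thread tagged itself first.
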
