tokio: limit number of threads and set names #146

Merged: 1 commit merged into main on Mar 21, 2025


@d4l3k (Member) commented Mar 20, 2025

By default, Tokio's multi-threaded runtime starts one worker thread per CPU core. Beefy GPU boxes can have hundreds of cores, so each runtime ends up with hundreds of mostly idle threads, which makes issues harder to debug with lldb.

This PR also sets thread names, so it's easier to tell which pool each thread belongs to.
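The following is a `std`-only analogue, not tokio code. It shows the effect the two builder options aim for: a fixed number of threads that all carry one recognizable name. `spawn_named_workers` is a hypothetical helper. It spawns exactly `count` threads, has each thread report the name it sees, and returns those names. Tokio's `worker_threads` and `thread_name` apply the same idea to the threads the runtime spawns.

```rust
use std::sync::mpsc;
use std::thread;

/// Hypothetical helper (std-only, not tokio): spawns `count` threads that all
/// carry the name `name`. Each thread reports the name it sees via
/// `thread::current().name()`, and the collected names are returned.
fn spawn_named_workers(name: &str, count: usize) -> Vec<String> {
    let (tx, rx) = mpsc::channel::<String>();
    let handles: Vec<thread::JoinHandle<()>> = (0..count)
        .map(|_| {
            let tx = tx.clone();
            thread::Builder::new()
                // Same idea as tokio's `thread_name`: every pool thread gets this name.
                .name(name.to_string())
                .spawn(move || {
                    let current = thread::current();
                    let seen = current.name().unwrap_or("<unnamed>").to_string();
                    tx.send(seen).expect("receiver dropped");
                })
                .expect("failed to spawn worker thread")
        })
        .collect();
    // Drop the original sender so `rx.iter()` ends after the last worker exits.
    drop(tx);
    for handle in handles {
        handle.join().expect("worker thread panicked");
    }
    rx.iter().collect()
}
```

Calling `spawn_named_workers("torchft-lighths", 4)` yields four entries that all read `torchft-lighths`. This is the kind of grouping lldb's `thread backtrace all` makes visible.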

Test plan:

$ torchft_lighthouse
$ lldb -p 1234
> thread backtrace all
  thread #243, name = 'torchft-lighths', stop reason = signal SIGSTOP
    frame #0: 0x00007fd7fcd0792d libc.so.6`syscall + 29
    frame #1: 0x00007fd65a51e0a9 _torchft.cpython-312-x86_64-linux-gnu.so`parking_lot::condvar::Condvar::wait_until_internal::ha421dfa1f35f6d78 + 649
    frame #2: 0x00007fd65a50fdee _torchft.cpython-312-x86_64-linux-gnu.so`tokio::runtime::scheduler::multi_thread::park::Parker::park::h1064fe7e072bea0b + 222
    frame #3: 0x00007fd65a5166da _torchft.cpython-312-x86_64-linux-gnu.so`tokio::runtime::scheduler::multi_thread::worker::Context::park_timeout::hb6e439a80eb82fbc + 154
    frame #4: 0x00007fd65a515dbf _torchft.cpython-312-x86_64-linux-gnu.so`tokio::runtime::scheduler::multi_thread::worker::Context::run::h925c9d2cbee36e7e + 2879
    frame #5: 0x00007fd65a502fc4 _torchft.cpython-312-x86_64-linux-gnu.so`tokio::runtime::context::runtime::enter_runtime::h15ac70dde2453af5 + 692
    frame #6: 0x00007fd65a5151fa _torchft.cpython-312-x86_64-linux-gnu.so`tokio::runtime::scheduler::multi_thread::worker::run::h6967ec6caf4789c9 + 138
    frame #7: 0x00007fd65a4fa357 _torchft.cpython-312-x86_64-linux-gnu.so`_$LT$tokio..runtime..blocking..task..BlockingTask$LT$T$GT$$u20$as$u20$core..future..future..Future$GT$::poll::he2363864a5567ead + 135
    frame #8: 0x00007fd65a4fddd3 _torchft.cpython-312-x86_64-linux-gnu.so`tokio::runtime::task::core::Core$LT$T$C$S$GT$::poll::hdbdd87c38a1334ff + 147
    frame #9: 0x00007fd65a4f3164 _torchft.cpython-312-x86_64-linux-gnu.so`tokio::runtime::task::harness::Harness$LT$T$C$S$GT$::poll::h635b9aa31f062af8 + 180
    frame #10: 0x00007fd65a4f63ff _torchft.cpython-312-x86_64-linux-gnu.so`tokio::runtime::blocking::pool::Inner::run::h40998686924d7eab + 239
    frame #11: 0x00007fd65a4f844e _torchft.cpython-312-x86_64-linux-gnu.so`std::sys::backtrace::__rust_begin_short_backtrace::h71277f9d3c6edc88 + 206
    frame #12: 0x00007fd65a4f8bc2 _torchft.cpython-312-x86_64-linux-gnu.so`core::ops::function::FnOnce::call_once$u7b$$u7b$vtable.shim$u7d$$u7d$::h0987e2e9d3ea45b5 + 162
    frame #13: 0x00007fd65a54b52b _torchft.cpython-312-x86_64-linux-gnu.so`std::sys::pal::unix::thread::Thread::new::thread_start::hcdbd1049068002f4 + 43
    frame #14: 0x00007fd7fcc8a3b2 libc.so.6`start_thread + 722
    frame #15: 0x00007fd7fcd0f430 libc.so.6`__clone3 + 48
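The thread name in this backtrace, `torchft-lighths`, is exactly 15 bytes long. On Linux, the kernel stores a thread's name in a 16-byte buffer that includes the trailing NUL, so a name can hold at most 15 bytes. glibc's `pthread_setname_np` rejects longer names with `ERANGE`, and higher-level wrappers may truncate them instead. The sketch below uses a hypothetical helper, not part of torchft, to check in advance what a proposed `thread_name` value would look like after truncation.

```rust
/// Linux thread names are limited to 16 bytes including the NUL terminator
/// (TASK_COMM_LEN), leaving 15 usable bytes.
const LINUX_THREAD_NAME_MAX: usize = 15;

/// Hypothetical helper: returns the longest prefix of `name` that fits the
/// Linux limit, cutting only on a UTF-8 character boundary.
fn truncate_thread_name(name: &str) -> &str {
    if name.len() <= LINUX_THREAD_NAME_MAX {
        return name;
    }
    let mut end = LINUX_THREAD_NAME_MAX;
    // Step back until the cut point does not split a multi-byte character.
    while !name.is_char_boundary(end) {
        end -= 1;
    }
    &name[..end]
}
```

For example, `truncate_thread_name("torchft-lighths")` returns the name unchanged. A longer name such as `"torchft-lighthouse"` would come back as `"torchft-lightho"`.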

@d4l3k requested a review from H-Huang (March 20, 2025 23:41)
@facebook-github-bot added the CLA Signed label (Mar 20, 2025)
@d4l3k requested a review from fegin (March 20, 2025 23:41)
src/lib.rs (outdated)
@@ -71,7 +71,11 @@ impl ManagerServer {
 connect_timeout: Duration,
 ) -> PyResult<Self> {
 py.allow_threads(move || {
-let runtime = Runtime::new()?;
+let runtime = tokio::runtime::Builder::new_multi_thread()
+    .worker_threads(4)
Contributor:

LGTM, just a nit: do you want this to be configurable, or are we OK with 4 threads in all cases?
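Here is one way the thread count could be made configurable, as a minimal sketch. Tokio itself reads a `TOKIO_WORKER_THREADS` environment variable for its default worker count. An explicit `.worker_threads(4)` takes precedence over that variable, so this PR would need its own knob. The helper `resolve_worker_threads` and the idea of feeding it a torchft-specific flag or environment variable are hypothetical. The helper falls back to the default on missing or invalid input and never exceeds the available cores.

```rust
/// Hypothetical helper: decides how many worker threads a runtime gets.
/// `override_value` would come from a (hypothetical) CLI flag or env var.
/// Missing, unparsable, or zero values fall back to `default`, and the
/// result is capped at the number of available cores (at least 1).
fn resolve_worker_threads(override_value: Option<&str>, default: usize, cores: usize) -> usize {
    let requested = override_value
        .and_then(|v| v.trim().parse::<usize>().ok())
        .filter(|&n| n > 0)
        .unwrap_or(default);
    requested.min(cores.max(1))
}
```

In real use, `cores` could come from `std::thread::available_parallelism()`, which returns an `io::Result<NonZeroUsize>`. The returned count would then be passed to `.worker_threads(...)`.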

@@ -294,7 +302,10 @@ fn lighthouse_main(py: Python<'_>) -> PyResult<()> {
 let mut args = env::args();
 args.next(); // discard binary arg
 let opt = lighthouse::LighthouseOpt::from_iter(args);
-let rt = Runtime::new()?;
+let rt = tokio::runtime::Builder::new_multi_thread()
+    .thread_name("torchft-lighths")
Contributor:

Do you want to set the number of threads here as well?

@d4l3k force-pushed the d4l3k/tokio_threads branch from e602c05 to 8fd028c (March 21, 2025 17:17)
@d4l3k force-pushed the d4l3k/tokio_threads branch from 8fd028c to 4c662fe (March 21, 2025 17:20)
@d4l3k merged commit 3724f7c into main (Mar 21, 2025)
7 checks passed
@d4l3k deleted the d4l3k/tokio_threads branch (March 21, 2025 17:42)
Labels: CLA Signed
Participants: 3