Description
Is your feature request related to a problem? Please describe.
KernelThread re-implements a lot of standard functionality already present in plain old std::thread, while bifurcating the code for Windows and pthreads. This is potentially very bug-prone, and it does not play nicely with other low-level tools like ASAN.
Describe the solution you'd like
KernelThread should just be a thin wrapper over std::thread.
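To make the request concrete, here is a minimal sketch of what such a wrapper might look like. The class name ThinKernelThread and its members are hypothetical and are not Realm's actual KernelThread interface; the sketch also deliberately leaves out stack size and altstack configuration, which std::thread does not expose.

```cpp
#include <functional>
#include <thread>
#include <utility>

// Hypothetical sketch only: this is NOT Realm's real KernelThread API.
// It shows a wrapper that delegates thread creation and joining to
// std::thread instead of calling pthreads or the Windows API directly.
class ThinKernelThread {
public:
  // Starts running `entry` on a new OS thread immediately.
  explicit ThinKernelThread(std::function<void()> entry)
    : thread_(std::move(entry)) {}

  // Non-copyable, mirroring std::thread.
  ThinKernelThread(const ThinKernelThread&) = delete;
  ThinKernelThread& operator=(const ThinKernelThread&) = delete;

  // Join on destruction: destroying a still-joinable std::thread
  // would call std::terminate.
  ~ThinKernelThread() { join(); }

  // Blocks until the thread function returns; safe to call repeatedly.
  void join() {
    if (thread_.joinable()) {
      thread_.join();
    }
  }

  bool joinable() const noexcept { return thread_.joinable(); }

private:
  std::thread thread_;
};
```

With something like this, the platform split lives inside the standard library rather than in Realm, and tools such as ASAN see threads created through the usual runtime path.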
Additional context
As added motivation for this change: on macOS, for long-running programs, I often get
Process 97504 stopped
* thread #15, stop reason = EXC_BAD_ACCESS (code=1, address=0x11b238018)
frame #0: 0x000000011b2c179c libclang_rt.asan_osx_dynamic.dylib`__asan::GetCurrentThread() + 20
libclang_rt.asan_osx_dynamic.dylib`__asan::GetCurrentThread:
-> 0x11b2c179c <+20>: ldr x0, [x0, #0x98]
0x11b2c17a0 <+24>: ldp x29, x30, [sp], #0x10
0x11b2c17a4 <+28>: retab
libclang_rt.asan_osx_dynamic.dylib`__asan::AsanThread::ClearShadowForThreadStackAndTLS:
0x11b2c17a8 <+0>: pacibsp
thread #16, stop reason = EXC_BAD_ACCESS (code=1, address=0x11b238518)
frame #0: 0x000000011b2c179c libclang_rt.asan_osx_dynamic.dylib`__asan::GetCurrentThread() + 20
libclang_rt.asan_osx_dynamic.dylib`__asan::GetCurrentThread:
-> 0x11b2c179c <+20>: ldr x0, [x0, #0x98]
0x11b2c17a0 <+24>: ldp x29, x30, [sp], #0x10
0x11b2c17a4 <+28>: retab
libclang_rt.asan_osx_dynamic.dylib`__asan::AsanThread::ClearShadowForThreadStackAndTLS:
0x11b2c17a8 <+0>: pacibsp
(lldb) bt
* thread #15, stop reason = EXC_BAD_ACCESS (code=1, address=0x11b238018)
* frame #0: 0x000000011b2c179c libclang_rt.asan_osx_dynamic.dylib`__asan::GetCurrentThread() + 20
frame #1: 0x000000011b286238 libclang_rt.asan_osx_dynamic.dylib`__asan_stack_malloc_0 + 36
frame #2: 0x000000030332e790 librealm-legate.1.dylib`Realm::PriorityQueue<Realm::Thread*, Realm::DummyLock>::peek(this=0x0000617000001d60, item_priority=0x0000000175c210c0, higher_than=-2147483647) const at pri_queue.inl:147
frame #3: 0x000000030332a304 librealm-legate.1.dylib`Realm::ThreadedTaskScheduler::scheduler_loop(this=0x0000617000001c80) at tasks.cc:1122:20
frame #4: 0x000000030332f278 librealm-legate.1.dylib`Realm::ThreadedTaskScheduler::scheduler_loop_wlock(this=0x0000617000001c80) at tasks.cc:1275:5
frame #5: 0x0000000303396770 librealm-legate.1.dylib`void Realm::Thread::thread_entry_wrapper<Realm::ThreadedTaskScheduler, &Realm::ThreadedTaskScheduler::scheduler_loop_wlock()>(obj=0x0000617000001c80) at threads.inl:97:5
frame #6: 0x00000003033a97ec librealm-legate.1.dylib`Realm::KernelThread::pthread_entry(data=0x000060f00005eff0) at threads.cc:868:5
frame #8: 0x00000003033fbaa0 librealm-legate.1.dylib`void std::__1::__thread_execute[abi:ne190102]<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct>>, void* (*)(void*), Realm::KernelThread*, 2ul>(__t=size=3, (null)=__tuple_indices<2UL> @ 0x000000017572ae8f) at thread.h:198:3
frame #9: 0x00000003033f9770 librealm-legate.1.dylib`void* std::__1::__thread_proxy[abi:ne190102]<std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct>>, void* (*)(void*), Realm::KernelThread*>>(__vp=0x0000603000b1b560) at thread.h:207:3
frame #10: 0x000000011b2b24a8 libclang_rt.asan_osx_dynamic.dylib`asan_thread_start(void*) + 80
frame #11: 0x000000019db0bc0c libsystem_pthread.dylib`_pthread_start + 136
As best I can tell, this is a stack overflow. I am not sure exactly where the bug is or how it gets triggered, but if I remove all of the custom threading code, altstack handling, and stack size manipulation, and simply replace the main thread member with std::thread, the problem never appears.