[FEATURE] Replace KernelThread with std::thread #297

@Jacobfaib

Is your feature request related to a problem? Please describe.

KernelThread re-implements a lot of standard functionality already present in plain old std::thread, while bifurcating the code between Windows and pthreads.

This is potentially very bug-prone, and does not play nicely with other low-level tools like ASAN.

Describe the solution you'd like
KernelThread should just be a thin wrapper over std::thread.
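
To make the proposal concrete, here is a minimal sketch of what such a thin wrapper could look like. The class name `ThinKernelThread`, its `start`/`join` methods, and the helper `run_once_and_count` are hypothetical names chosen for illustration; they are not Realm's actual interface, and the real KernelThread has responsibilities (scheduler integration, signal handling) that this sketch leaves out.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <utility>

// Hypothetical stand-in for Realm's KernelThread; not the actual Realm API.
// All platform-specific work is delegated to std::thread.
class ThinKernelThread {
public:
  ThinKernelThread() = default;
  ThinKernelThread(const ThinKernelThread&) = delete;
  ThinKernelThread& operator=(const ThinKernelThread&) = delete;

  // Join on destruction so a still-joinable std::thread never reaches its
  // destructor (which would call std::terminate).
  ~ThinKernelThread() { join(); }

  // Launch the entry function on a new thread.
  template <typename Fn, typename... Args>
  void start(Fn&& fn, Args&&... args) {
    assert(!thread_.joinable() && "thread already started");
    thread_ = std::thread(std::forward<Fn>(fn), std::forward<Args>(args)...);
  }

  // Block until the thread finishes; safe to call more than once.
  void join() {
    if (thread_.joinable()) {
      thread_.join();
    }
  }

  bool joinable() const noexcept { return thread_.joinable(); }

private:
  std::thread thread_;
};

// Usage helper (hypothetical): run one increment on the wrapped thread
// and return the counter value observed after joining.
inline int run_once_and_count() {
  std::atomic<int> counter{0};
  ThinKernelThread t;
  t.start([&counter] { counter.fetch_add(1); });
  t.join();
  return counter.load();
}
```

One trade-off worth noting: std::thread offers no portable way to set the stack size, so any caller that relies on custom stack sizes would need a separate answer.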

Additional context
As added motivation for this change: on macOS, long-running programs often crash for me with

Process 97504 stopped
* thread #15, stop reason = EXC_BAD_ACCESS (code=1, address=0x11b238018)
    frame #0: 0x000000011b2c179c libclang_rt.asan_osx_dynamic.dylib`__asan::GetCurrentThread() + 20
libclang_rt.asan_osx_dynamic.dylib`__asan::GetCurrentThread:
->  0x11b2c179c <+20>: ldr    x0, [x0, #0x98]
    0x11b2c17a0 <+24>: ldp    x29, x30, [sp], #0x10
    0x11b2c17a4 <+28>: retab

libclang_rt.asan_osx_dynamic.dylib`__asan::AsanThread::ClearShadowForThreadStackAndTLS:
    0x11b2c17a8 <+0>:  pacibsp
  thread #16, stop reason = EXC_BAD_ACCESS (code=1, address=0x11b238518)
    frame #0: 0x000000011b2c179c libclang_rt.asan_osx_dynamic.dylib`__asan::GetCurrentThread() + 20
libclang_rt.asan_osx_dynamic.dylib`__asan::GetCurrentThread:
->  0x11b2c179c <+20>: ldr    x0, [x0, #0x98]
    0x11b2c17a0 <+24>: ldp    x29, x30, [sp], #0x10
    0x11b2c17a4 <+28>: retab

libclang_rt.asan_osx_dynamic.dylib`__asan::AsanThread::ClearShadowForThreadStackAndTLS:
    0x11b2c17a8 <+0>:  pacibsp
(lldb) bt
* thread #15, stop reason = EXC_BAD_ACCESS (code=1, address=0x11b238018)
  * frame #0: 0x000000011b2c179c libclang_rt.asan_osx_dynamic.dylib`__asan::GetCurrentThread() + 20
    frame #1: 0x000000011b286238 libclang_rt.asan_osx_dynamic.dylib`__asan_stack_malloc_0 + 36
    frame #2: 0x000000030332e790 librealm-legate.1.dylib`Realm::PriorityQueue<Realm::Thread*, Realm::DummyLock>::peek(this=0x0000617000001d60, item_priority=0x0000000175c210c0, higher_than=-2147483647) const at pri_queue.inl:147
    frame #3: 0x000000030332a304 librealm-legate.1.dylib`Realm::ThreadedTaskScheduler::scheduler_loop(this=0x0000617000001c80) at tasks.cc:1122:20
    frame #4: 0x000000030332f278 librealm-legate.1.dylib`Realm::ThreadedTaskScheduler::scheduler_loop_wlock(this=0x0000617000001c80) at tasks.cc:1275:5
    frame #5: 0x0000000303396770 librealm-legate.1.dylib`void Realm::Thread::thread_entry_wrapper<Realm::ThreadedTaskScheduler, &Realm::ThreadedTaskScheduler::scheduler_loop_wlock()>(obj=0x0000617000001c80) at threads.inl:97:5
    frame #6: 0x00000003033a97ec librealm-legate.1.dylib`Realm::KernelThread::pthread_entry(data=0x000060f00005eff0) at threads.cc:868:5
    frame #8: 0x00000003033fbaa0 librealm-legate.1.dylib`void std::__1::__thread_execute[abi:ne190102]<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct>>, void* (*)(void*), Realm::KernelThread*, 2ul>(__t=size=3, (null)=__tuple_indices<2UL> @ 0x000000017572ae8f) at thread.h:198:3
    frame #9: 0x00000003033f9770 librealm-legate.1.dylib`void* std::__1::__thread_proxy[abi:ne190102]<std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct>>, void* (*)(void*), Realm::KernelThread*>>(__vp=0x0000603000b1b560) at thread.h:207:3
    frame #10: 0x000000011b2b24a8 libclang_rt.asan_osx_dynamic.dylib`asan_thread_start(void*) + 80
    frame #11: 0x000000019db0bc0c libsystem_pthread.dylib`_pthread_start + 136

As best I can tell, this is a stack overflow. I am not sure exactly where the bug is or how it gets triggered, but if I remove all of the custom threading code, altstack handling, and stack size manipulation, and simply replace the main thread member with std::thread, the problem never appears.

Labels: enhancement