Description
The macOS libc implementation runs its own atfork() handlers, which can allocate memory. I discovered this when my macOS CI hung while running tests with -fsanitize=thread. I was able to ssh in and get a backtrace like this:
* thread #1, queue = 'com.apple.main-thread', stop reason = signal SIGSTOP
* frame #0: 0x00007ff8036a05f6 libsystem_kernel.dylib`swtch_pri + 10
frame #1: 0x00007ff8036dc8b6 libsystem_pthread.dylib`cthread_yield + 20
frame #2: 0x0000000106977a09 libclang_rt.tsan_osx_dynamic.dylib`__sanitizer::internal_sched_yield() + 9
frame #3: 0x000000010697a5d5 libclang_rt.tsan_osx_dynamic.dylib`__sanitizer::StaticSpinMutex::LockSlow() + 53
frame #4: 0x00000001069eaf06 libclang_rt.tsan_osx_dynamic.dylib`__tsan::DenseSlabAlloc<__tsan::MBlock, 262144ul, 4096ul, 3221225472ull>::Refill(__tsan::DenseSlabAllocCache*) + 470
frame #5: 0x00000001069e9a37 libclang_rt.tsan_osx_dynamic.dylib`__tsan::MetaMap::AllocBlock(__tsan::ThreadState*, unsigned long, unsigned long, unsigned long) + 71
frame #6: 0x00000001069c9f9c libclang_rt.tsan_osx_dynamic.dylib`__tsan::user_alloc_internal(__tsan::ThreadState*, unsigned long, unsigned long, unsigned long, bool) + 156
frame #7: 0x00000001069caa42 libclang_rt.tsan_osx_dynamic.dylib`__tsan::user_calloc(__tsan::ThreadState*, unsigned long, unsigned long, unsigned long) + 66
frame #8: 0x00000001069c7f63 libclang_rt.tsan_osx_dynamic.dylib`wrap_calloc + 115
frame #9: 0x00007ff80f195097 libsystem_coreservices.dylib`_dirhelper_init + 49
frame #10: 0x00007ff803709f86 libsystem_platform.dylib`_os_once_callout + 18
frame #11: 0x00007ff80f549c73 libSystem.B.dylib`libSystem_atfork_child + 48
frame #12: 0x00007ff8035ac598 libsystem_c.dylib`fork + 84
frame #13: 0x000000010699da26 libclang_rt.tsan_osx_dynamic.dylib`wrap_fork + 70
frame #14: 0x00000001064ff9f7 bfs`bfs_spawn(exe="/bin/echo", ctx=0x00007ff7b9ab96d8, argv=0x00007b0800001240, envp=0x00007ff7b9abad18) at xspawn.c:217:14
...
A multi-threaded process should only call async-signal-safe functions between fork() and exec(). But even when I follow this rule in my own code, hangs like the one above are still possible, because the platform itself breaks the rule: in the backtrace, libSystem_atfork_child reaches calloc() (via _dirhelper_init) in the child before fork() has even returned.
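For reference, here is a minimal sketch of what "following this rule" looks like on the caller's side. The helper name spawn_and_wait() is hypothetical and is not the bfs_spawn() from the backtrace; it only illustrates that the child touches nothing but execve() and _exit(), both of which POSIX lists as async-signal-safe.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper (not bfs_spawn()): run exe and return its exit status,
 * or -1 if fork()/waitpid() fails or the child is killed by a signal.
 */
int spawn_and_wait(const char *exe, char *const argv[], char *const envp[])
{
    pid_t pid = fork();
    if (pid < 0) {
        return -1;
    }

    if (pid == 0) {
        /* Child: only async-signal-safe calls from here on. */
        execve(exe, argv, envp);
        /* execve() returns only on failure. Use _exit(), not exit():
         * exit() runs atexit handlers and flushes stdio, neither of
         * which is async-signal-safe. */
        _exit(127);
    }

    /* Parent: wait for the child, retrying if interrupted by a signal. */
    int status;
    while (waitpid(pid, &status, 0) < 0) {
        if (errno != EINTR) {
            return -1;
        }
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Even code shaped like this can hang under TSan on macOS, because the allocation in the backtrace happens inside fork() itself, before the child branch runs.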
Why am I reporting this here instead of to Apple? Well, first of all, I'm not an Apple customer. But mainly, since this is all within the implementation of libc, it's possible that their own allocator implementation guarantees that these calls are safe. TSan's replacement implementation, though, can't handle them. Since there's nothing a user can really do about this race, it would be great if TSan could work around it, perhaps by installing its own pthread_atfork() handlers.
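To make the suggestion concrete, here is a rough sketch of the general pthread_atfork() pattern, not TSan's actual code. The lock alloc_lock is a hypothetical stand-in for an internal allocator spin lock, and all function names are made up for illustration. The prepare handler takes the lock in the forking thread, so no other thread can be holding it at the moment of fork(); the parent and child handlers then release it. Whether handlers registered this way would run before libSystem_atfork_child on macOS is an assumption I have not verified.

```c
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-in for an allocator-internal spin lock. */
static atomic_flag alloc_lock = ATOMIC_FLAG_INIT;

static void alloc_lock_acquire(void)
{
    while (atomic_flag_test_and_set_explicit(&alloc_lock, memory_order_acquire)) {
        sched_yield();
    }
}

static void alloc_lock_release(void)
{
    atomic_flag_clear_explicit(&alloc_lock, memory_order_release);
}

/* Before fork(): wait until no other thread holds the lock, then hold it. */
static void atfork_prepare(void) { alloc_lock_acquire(); }
/* After fork(), in the parent: release the lock taken in prepare. */
static void atfork_parent(void) { alloc_lock_release(); }
/* After fork(), in the child: the copied lock is held by us, so release it. */
static void atfork_child(void) { alloc_lock_release(); }

/* Register the handlers; returns 0 on success, like pthread_atfork(). */
int install_alloc_atfork_handlers(void)
{
    return pthread_atfork(atfork_prepare, atfork_parent, atfork_child);
}

/* Demo: fork, take and release the lock in the child, return its exit status. */
int fork_and_touch_lock(void)
{
    pid_t pid = fork();
    if (pid < 0) {
        return -1;
    }
    if (pid == 0) {
        alloc_lock_acquire(); /* would spin forever if the lock leaked across fork() */
        alloc_lock_release();
        _exit(0);
    }
    int status;
    if (waitpid(pid, &status, 0) < 0) {
        return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

The point of the sketch is only that the child starts with the lock in a known, unlocked state, so an allocation made during fork() in the child would not spin on a lock owned by a thread that no longer exists.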