TSAN TLS on macOS accesses uninitialized memory on GCD workqueue creation #1466

Open
@rschreyer

On macOS, I'm encountering an issue where at some point my application crashes under this backtrace, upon the creation of a GCD thread (via dispatch_apply()):

 * frame #0: 0x0000000100789810 libclang_rt.tsan_osx_dynamic.dylib`__tsan::ProcWire(proc=0x0000000104938000, thr=0x0020400000032274) at tsan_rtl_proc.cpp:47:3
    frame #1: 0x0000000100799228 libclang_rt.tsan_osx_dynamic.dylib`__tsan::my_pthread_introspection_hook(event=1, thread=0x000000038c173000, addr=0x000000038c173000, size=16384) at tsan_platform_mac.cpp:236:7
    frame #2: 0x0000000100231188 libBacktraceRecording.dylib`pthread_introspection_hook + 72
    frame #3: 0x00000001001e0a60 libsystem_pthread.dylib`_pthread_introspection_hook_callout_thread_create + 88
    frame #4: 0x00000001001e030c libsystem_pthread.dylib`_pthread_wqthread_setup + 368
    frame #5: 0x00000001001e0018 libsystem_pthread.dylib`_pthread_wqthread + 52

The crash occurs because the thr argument passed to ProcWire is bogus. That in turn stems from the hacky TLS implementation, in which the ThreadState* is stored in the shadow memory derived from the pthread_t.

Critically, that TLS field is initialized lazily (presumably for signal-safety reasons?), and a nullptr value in the shadow memory is the trigger for lazy initialization. However, the contents of the shadow memory are not reliably 0x0. When they aren't, whatever value is there gets treated as the ThreadState*, which then crashes on first dereference in ProcWire.
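To illustrate the failure mode described above, here is a minimal sketch of a null-keyed lazy slot. It is not the actual TSan code: FakeThreadState, LazyGet, and SlotYieldsFreshState are hypothetical names, and the slot is just a local integer standing in for the shadow word.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the per-thread state; not the real TSan type.
struct FakeThreadState {
  int tid;
};

// Simplified model of a lazily initialized slot where zero means "unset".
// `slot` stands in for the shadow word derived from the pthread_t.
inline FakeThreadState *LazyGet(std::uintptr_t *slot, FakeThreadState *fresh) {
  assert(slot != nullptr);
  if (*slot == 0) {
    // First use: publish the freshly created state.
    *slot = reinterpret_cast<std::uintptr_t>(fresh);
  }
  // Any nonzero value is trusted as a valid pointer, stale or not.
  return reinterpret_cast<FakeThreadState *>(*slot);
}

// Returns true when the slot ends up yielding the state we meant to install.
inline bool SlotYieldsFreshState(std::uintptr_t initial_slot_value) {
  FakeThreadState fresh{1};
  std::uintptr_t slot = initial_slot_value;
  return LazyGet(&slot, &fresh) == &fresh;
}
```

With a zeroed slot the fresh state is installed and returned. With a stale nonzero value, such as the thr value seen in the backtrace, the garbage is returned unchanged, and the caller would crash on its first dereference.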

The shadow memory will be non-zero when the kernel assigns the new pthread page to a VM range that was previously used for heap allocations, because earlier accesses to that memory have already populated the corresponding shadow region.
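The address-reuse scenario can be sketched with a toy model. This is an assumption-laden illustration, not TSan's real shadow mapping: ToyShadow and NewThreadSeesZeroSlot are hypothetical names, the shadow is modeled as a hash map rather than a fixed address translation, and the marker value written on access is arbitrary.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>

// Toy model only: all names here are hypothetical.
class ToyShadow {
 public:
  // Record that application memory at `addr` was accessed. This leaves a
  // nonzero word in the matching shadow cell, and nothing ever clears it.
  void OnHeapAccess(std::uintptr_t addr) {
    assert(addr != 0);
    cells_[addr] = 0xDEADBEEFu;  // arbitrary nonzero marker
  }

  // Read the shadow cell for `addr`; untouched cells read as zero.
  std::uintptr_t Load(std::uintptr_t addr) const {
    auto it = cells_.find(addr);
    return it == cells_.end() ? 0 : it->second;
  }

 private:
  std::unordered_map<std::uintptr_t, std::uintptr_t> cells_;
};

// True if a thread whose pthread structure lands at `thread_addr` would
// find a zeroed slot, so that null-keyed lazy initialization still works.
inline bool NewThreadSeesZeroSlot(const ToyShadow &shadow,
                                  std::uintptr_t thread_addr) {
  return shadow.Load(thread_addr) == 0;
}
```

In this model a thread placed at a never-used address sees a zero slot, while a thread placed at an address that earlier served a heap allocation sees the leftover marker.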

I've attached a reproducer app that allocates a large amount of memory to prime the heap (pushing new heap allocations up into the same range where the kernel wants to place pthreads), and then iteratively allocates a random amount of memory and creates/destroys GCD threads until a new GCD pthread lands in a previously used VM range.

tsan_debug.zip

I've also attached a small patch that works around the issue for me locally, but it is even uglier, and probably isn't suitable as a real fix.

tsan_patch.txt
