i#5383: macOS a64 client threads and private TLS #7300

Open: 17 commits to merge into master

Contributor

@ndrewh ndrewh commented Feb 24, 2025

Adds macOS ARM64 support for client threads and private TLS (under -private_loader -- though we don't implement a full private loader yet).

  • Splits macOS x86 and A64 private TLS into separate files. The new A64 private TLS uses TLS_TYPE_SLOT in a similar manner to the Linux RISC-V TLS implementation.
  • Adds an A64 handler for new_bsdthread_intercept
  • Fixes the wrong exit syscall in client_thread_run (it currently exits the entire process rather than just the thread)
  • dynamorio_mach_syscall and dynamorio_mach_dep_syscall now use the correct a64 calling convention

The end result is that client threads can start and terminate properly, and multithreaded applications can also terminate without crashing. I did not test attach/detach, but I'm guessing it's still broken (there is no injector implementation anyway iiuc)

@@ -1246,7 +1246,7 @@ signal_thread_inherit(dcontext_t *dcontext, void *clone_record)
      * FIXME: are current pending or blocked inherited?
      */
 #ifdef MACOS
-    if (record->app_thread_xsp != 0) {
+    if (record->app_thread_xsp == 0) {
Contributor Author

I don't really understand this code but you'll get heap oob/uaf asserts if this condition is !=.

Contributor

This is freeing the clone record.
See the comment in clone_record_t:

#ifdef MACOS
    /* XXX i#1403: once we have lower-level, earlier thread interception we can
     * likely switch to something closer to what we do on Linux.
     * This is used for bsdthread_create, where app_thread_xsp is NULL;
     * for vfork, app_thread_xsp is non-NULL and this is unused.
     */

This change from != to == looks suspect. Please double check against this pasted comment: if bsdthread_create ends up different on aarch64, please update that comment. Does it need separate handling for a64 vs. x86?

Contributor Author

@ndrewh ndrewh May 26, 2025

Okay so please correct me if I'm wrong (and it's been like three months since I looked at this code, so I'm probably wrong), but in the MACOS case create_clone_record sets app_thread_xsp to NULL iff the clone record is heap-allocated. So I think the only case in which we want to free it is if it's NULL.

if (app_thread_xsp == NULL) {
    record = HEAP_TYPE_ALLOC(GLOBAL_DCONTEXT, clone_record_t, ACCT_THREAD_MGT,
                             true /*prot*/);
    record->app_thread_xsp = 0;
    record->continuation_pc = thread_func;
    record->thread_arg = thread_arg;
    record->clone_flags = CLONE_THREAD | CLONE_VM | CLONE_SIGHAND | SIGCHLD;
} else {
Contributor Author

ndrewh commented Feb 25, 2025

Some messy stuff I'm not sure if there's a better way to do:

  • In dynamo_thread_init we allocate temporary TLS until os_tls_init, because otherwise we sometimes SEGV in mig_get_reply_port when acquiring locks if the thread register is NULL (this occurs in client threads, which do not inherit TLS).
  • In dynamo_thread_exit_common we cannot use app TLS for a similar reason: the app TLS is freed by pthread_terminate before we intercept the call to bsdthread_terminate.

You'll get a backtrace like this if we do not maintain a valid thread register during entry and exit.

* thread #12, stop reason = EXC_BAD_ACCESS (code=1, address=0x10)
  * frame #0: 0x00000001983c24e4 libsystem_kernel.dylib`mig_get_reply_port + 24
    frame #1: 0x00000001983c5144 libsystem_kernel.dylib`semaphore_create + 52
    frame #2: 0x000000010121b59c libdynamorio.dylib`mutex_get_contended_event [inlined] ksynch_init_var(synch=0x0000000300e73ad8) at ksynch_macos.c:87:9 [opt]
    frame #3: 0x000000010121b580 libdynamorio.dylib`mutex_get_contended_event(lock=0x00000001012481c0) at ksynch_macos.c:162:14 [opt]
    frame #4: 0x000000010120cf6c libdynamorio.dylib`mutex_wait_contended_lock(lock=0x00000001012481c0, mc=0x0000000000000000) at os.c:10477:30 [opt]
    frame #5: 0x000000010109ff14 libdynamorio.dylib`d_r_mutex_lock [inlined] d_r_mutex_lock_app(lock=<unavailable>, mc=0x0000000000000000) at utils.c:884:9 [opt]
    frame #6: 0x000000010109fe88 libdynamorio.dylib`d_r_mutex_lock(lock=<unavailable>) at utils.c:897:5 [opt]
    frame #7: 0x0000000101082530 libdynamorio.dylib`dynamo_thread_init(dstack_in="", mc=0x0000000000000000, os_data=0x0000000300e73c00, client_thread=true) at dynamo.c:2290:5 [opt]
    frame #8: 0x000000010120795c libdynamorio.dylib`client_thread_run at os.c:4110:5 [opt]

@ndrewh ndrewh marked this pull request as ready for review February 26, 2025 18:16
@derekbruening
Contributor

> under -private_loader -- though we don't implement a full private loader yet

Probably orthogonal to this PR, but do you think it is possible to load private copies of libraries on OSX? #1285 has some discussion about whether it will ever work well. Xref #7312, which has some discussion of alternatives.

Contributor

@derekbruening derekbruening left a comment

Thank you for contributing. I have mostly style comments but also some points need clarifying.

Contributor Author

ndrewh commented May 26, 2025

I think I addressed everything, except for the clone record free. I left a comment about that in the review and in the code, please see if it makes sense.

@ndrewh ndrewh requested a review from derekbruening May 26, 2025 16:48
Contributor Author

ndrewh commented May 27, 2025

Sorry, I was testing this some more and found out that the mach syscalls made via dynamorio_mach_syscall (i.e. for mach_thread_self, which we need here for TLS) are just completely broken on macOS a64. It doesn't seem like you can make the mach syscalls using the indirect syscall entrypoint (x16=0), which is what DynamoRIO does currently.

I've patched the drlibc asm to use the normal (non-indirect) convention (i.e. args in x0-x7, number in x16). Currently it seems like mach syscalls use negated syscall numbers, which works on my machine and matches the shipped asm on macOS 14.4.1:

(lldb) disass -s 0x180406108 -c 100
libsystem_kernel.dylib`thread_self_trap:
libsystem_kernel.dylib[0x180406108] <+0>: mov    x16, #-0x1b
libsystem_kernel.dylib[0x18040610c] <+4>: svc    #0x80
libsystem_kernel.dylib[0x180406110] <+8>: ret

This seems to differ from how it's done in drlibc_x86.asm (which ors SYSCALL_NUM_MARKER_MACH into the syscall number). I don't have a clue if this is a macOS version difference or if this is an ARM64 vs. X86 difference, and I don't have other machines to test on.
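
As a rough illustration of the two numbering schemes being compared, the sketch below computes the value placed in the syscall-number register under each convention. The a64 side follows the disassembly above (trap 0x1b is passed as -0x1b in x16). The x86-64 side assumes XNU's class encoding, where the Mach class (1) sits in the bits above bit 24; whether SYSCALL_NUM_MARKER_MACH has exactly that value is an assumption here, since its definition is not shown in this thread. All model_ and MODEL_ names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed XNU x86-64 syscall class encoding: class number shifted left by 24. */
#define MODEL_SYSCALL_CLASS_SHIFT 24
#define MODEL_SYSCALL_CLASS_MACH 1

/* Trap number for thread_self_trap, read off the disassembly above (0x1b). */
#define MODEL_THREAD_SELF_TRAP 27

/* a64 convention observed above: x16 holds the negated Mach trap number. */
static int64_t
model_mach_sysnum_a64(int trap)
{
    return -(int64_t)trap;
}

/* x86-64 convention (assumed): OR a Mach class marker into the trap number. */
static uint32_t
model_mach_sysnum_x86_64(int trap)
{
    return ((uint32_t)MODEL_SYSCALL_CLASS_MACH << MODEL_SYSCALL_CLASS_SHIFT) |
        (uint32_t)trap;
}
```

If these assumptions hold, the two encodings are simply different ways of tagging the same trap number, which would point toward an architecture difference rather than a macOS version difference. That remains unverified here.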

Contributor Author

ndrewh commented May 27, 2025

OK -- I took this opportunity to also implement dynamorio_mach_dep_syscall, so now this PR uses the raw machdep syscall instead of the _thread_set_tsd_base library call. I think I'm actually done now and this can be reviewed :)
