Skip to content

i#1973: musl: Scan stack for kernel arguments when used as library #7266

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 13 commits into
base: master
Choose a base branch
from

Conversation

ziyao233
Copy link
Member

@ziyao233 ziyao233 commented Feb 10, 2025

Musl doesn't pass any arguments to constructors. To obtain environment variables for initialization of dynamorio, we scan the stack and search for specific patterns in the auxiliary vector passed by kernel, then walk back towards the top and find the environment variables.

We choose to match AT_EXECFN, which has been passed unconditionally by the kernel from 2012. Its value should be a valid address, providing an extra check.

This makes using dynamorio as a shared library possible on musl thus fixes several testcases. The logic could be enabled as fallback on older Android platforms later as well.

Issue: #1973

Musl doesn't pass any arguments to constructors. To obtain environment
variables for initialization of dynamorio, we scan the stack and search
for specific patterns in the auxiliary vector passed by kernel, then
walk back towards the top and find the environment variables.

We choose to match AT_EXECFN, which has been passed unconditionally by
the kernel from 2012. Its value should be a valid address, providing an
extra check.

This makes using dynamorio as a shared library possible on musl thus
fixes several testcases. The logic could be enabled as fallback on older
Android platforms as well.

Issue: DynamoRIO#1973
@@ -788,6 +788,82 @@ static init_fn_t
#else
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The logic could be enabled as fallback on older Android platforms later as well.

Note that get_kernel_args() in loader_android.c for older Android finds argc, argv, and envp inside a structure pointed at by TLS which I believe is specific to Android.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIRC the stack layouts on starting a process between normal Linux kernels and Android ones should be the same, thus it should be okay to reuse it for Android.

core/unix/os.c Outdated

/* Check for AT_EXECFN entry in the auxvector, which contains pathname
* of the program and should be a readable address. */
if (p->a_type == AT_EXECFN && check_address_readable((void *)p->a_un.a_val)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this a strong enough check? An integer 31 adjacent to a valid pointer? It seems like this could happen to appear somewhere else on the stack.

Copy link
Member Author

@ziyao233 ziyao233 Mar 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Obviously it's not that strong. My original intention is to depend on prctl PR_GET_AUXV, which returns the content of auxvector, enabling an exact match. But it is relatively new (introduced in Linux 6.4).

Alpine Linux started to ship 6.6 kernel from 3.19, two versions before current LTS, so it may not be that problematic.

Do you think it's acceptable to use PR_GET_AUXV to do some exact address matching and fallback to current solution if it isn't available? Or bail out in case of absence of PR_GET_AUXV to prevent possible failures of the flaky solution?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since limited to ifdef MUSL that may be ok. Make sure to add a comment about it being new. And a comment about the fragility of the fallback if it's left in place.

core/unix/os.c Outdated

/* Check for AT_EXECFN entry in the auxvector, which contains pathname
* of the program and should be a readable address. */
if (p->a_type == AT_EXECFN && check_address_readable((void *)p->a_un.a_val)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since limited to ifdef MUSL that may be ok. Make sure to add a comment about it being new. And a comment about the fragility of the fallback if it's left in place.

@ziyao233
Copy link
Member Author

I'm too busy this month and plan to continue working on the PR in May. Thanks for your patience :)

@ziyao233 ziyao233 requested a review from derekbruening May 21, 2025 04:26
@@ -115,6 +115,8 @@ typedef struct rlimit64 rlimit64_t;
#endif

#ifdef LINUX
/* Definitions for PR_GET_AUXV */
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

style nit: Used end punctuation in new comments (no trailing . here).

@@ -182,6 +184,11 @@ char **our_environ;

#include "decode_fast.h" /* decode_cti: maybe os_handle_mov_seg should be ifdef X86? */

/* For auxvector keys (AT_*) used by search_auxvector() */
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

style nit: Used end punctuation in new comments (no trailing . here).

* pointer entries as marker when searching through the address space.
*
* TODO: PR_GET_AUXV is relatively new (introduced in Linux 6.4), implement a
* fallback when it's unavailable.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we use /proc/self/auxv which looks like it was added in 2.6.0: https://www.man7.org/linux/man-pages/man5/proc_pid_auxv.5.html

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops, this is a good idea. I'll adapt it.


# if defined(MUSL)
/* We rely on PR_GET_AUXV to obtain a copy of auxvector and take one of the
* pointer entries as marker when searching through the address space.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But below it uses AT_UID which is not a pointer entry.

if (!d_r_safe_read(p, sizeof(ELF_AUXV_TYPE), &entry))
return NULL;

if (p->a_type == AT_UID && p->a_un.a_val == uid) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AT_UID is 11. So this is just looking for 11 next to the uid: and the uid could be a small integer, even 0 for root: so this doesn't seem a very conclusive check. I think it needs to look at a pointer value much less likely to be mistaken; ideally multiple values.

* auxvector entries cannot be used as target marker when searching, or
* applications linked to libdynamorio.so, like test client.gonative, will fail
* to execute when running under early injection. Switch to use a more-reliable
* address-related entry when the problem is resolved.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But this code is DR itself: if DR intercepts the application calling SYS_prctl that interception would not apply to DR code itself which does direct syscalls, unless you added extra code to go through the interception: but if we already have the data to change the interception at this point we can avoid all that and directly use the updated app-view value and not bother with a prctl syscall. But we don't have that info: isn't this super early before DR has even initialized? So the auxv values on the stack haven't even been modified for transparency. So I don't follow this comment: it doesn't seem to apply to this code.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But this code is DR itself: if DR intercepts the application calling SYS_prctl that interception would not apply to DR code itself which does direct syscalls, unless you added extra code to go through the interception: but if we already have the data to change the interception at this point we can avoid all that and directly use the updated app-view value and not bother with a prctl syscall. But we don't have that info: isn't this super early before DR has even initialized? So the auxv values on the stack haven't even been modified for transparency. So I don't follow this comment: it doesn't seem to apply to this code.

The problem is not about the libdynamorio.so being execv'ed when doing early injection, in which case our customized _start rountine just calls privload_early_inject(), and the only caller of search_auxvector(), _init(), won't get executed actually.

The client.gonative's case is that an application links to DynamoRIO directly, i.e. ldd shows the dependency for libdynamorio.so. In this case libdynamorio.so will be loaded twice in the same address space (assuming early injection is used): once being execv'ed directly, and this copy of DynamoRIO runs natively and could perform syscalls without being intercepted; once being loaded by the dynamic loader of the application. I don't find any materials suggesting this copy could be identified by the DynamoRIO runtime (the one being execv'ed) and thus runs natively, does it?. If not, this copy is actually part of the application and runs under DynamoRIO's interception, and we'll hit the transparency problem: the auxvector bits on stack are correctly intercepted, but the ones obtained by PR_GET_AUXV aren't, here comes the mismatch and trials to match an auxvector entry will fail.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I lost the context of this change in the time gap between reviews.

Taking a step back: Trying to figure out whether we need to support this case.

This client.gonative seems to only use libdynamorio.so for dr_app_running_under_dynamorio() but does not use the start/stop interface: it seems it should use DYNAMORIO_ANNOTATE_RUNNING_ON_DYNAMORIO() instead and not need to pull in the DR library.

For drdecode: we have a static library.

I guess we do claim to provide the full shared library for standalone usage, beyond the drdecode static library: so currently the docs say you can use the full libdynamorio.so w/o the start/stop interface. And I guess we should try to support running such an app under DR.

In such a case: what does it need envp for? It looks at auxv for page size, but has a fallback; it looks for config and options in env vars, though that doesn't matter much for standalone usage.

Conclusion: this is a marginal use case, but if it doesn't take much effort to support it, it is probably worth it. If it gets difficult, we should consider scope management rather than spending too much effort here.

Re-reading your comment: it makes sense now.

if (!d_r_safe_read(p, sizeof(ELF_AUXV_TYPE), &entry))
return NULL;

if (p->a_type == AT_UID && p->a_un.a_val == uid) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The PR description says "We choose to match AT_EXECFN": please update.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants