[LibOS] Race condition triggers ASan use‑after‑poison in execve path ( release_clear_child_tid ) #2148

@forkthus

Description of the problem

exec_same fails on the Jenkins-SGX-24.04-Sanitizers job for PR #1795 with an ASan use‑after‑poison inside release_clear_child_tid(). Reproduces on main, so this is not PR‑specific.

[2025-05-07T09:06:11.907Z] [2.257] [P1:libos] error: asan: use-after-poison (unallocated SGX memory?) while trying to store 4 bytes at 0x84c0990
[2025-05-07T09:06:11.907Z] [2.257] [P1:libos] error: asan: the bad address is 0x84c0990 (0 from beginning of access)
[2025-05-07T09:06:11.907Z] [2.257] [P1:libos] error: asan: location: release_clear_child_tid at libos_futex.c, libsysdb.so+0x49ab05 (addr = 0xeb25b05)
[2025-05-07T09:06:11.907Z] [2.257] [P1:libos] error: asan: (for a full traceback, use GDB with a breakpoint at "libos_abort")
[2025-05-07T09:06:11.907Z] [2.257] [P1:libos] error: asan:
[2025-05-07T09:06:11.907Z] [2.257] [P1:libos] error: asan: shadow bytes around the bad address:
[2025-05-07T09:06:11.907Z] [2.257] [P1:libos] error: asan:   0x180010980f0: f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7
[2025-05-07T09:06:11.907Z] [2.257] [P1:libos] error: asan:   0x18001098100: f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7
[2025-05-07T09:06:11.907Z] [2.257] [P1:libos] error: asan:   0x18001098110: f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7
[2025-05-07T09:06:11.907Z] [2.257] [P1:libos] error: asan:   0x18001098120: f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7
[2025-05-07T09:06:11.907Z] [2.257] [P1:libos] error: asan: =>0x18001098130: f7 f7[f7]f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7
...
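
For anyone reading the shadow dump: the highlighted [f7] byte is the shadow byte for the faulting address. Below is a minimal sketch of the address-to-shadow arithmetic, assuming the standard ASan scale of 8 application bytes per shadow byte and a shadow base of 0x18000000000. The base is inferred from the addresses in this log, not checked against Gramine's headers, and the helper names are made up for illustration. As far as I know, 0xf7 is the value upstream ASan uses for user-poisoned memory, which matches the "use-after-poison" wording.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Assumption: standard ASan mapping, 8 application bytes per shadow byte. */
#define SHADOW_SCALE 3
/* Assumption: shadow base inferred from the log above, not from Gramine's source. */
#define SHADOW_BASE  UINT64_C(0x18000000000)
/* The dump prints 16 shadow bytes per row. */
#define ROW_BYTES    16

/* Hypothetical helper: shadow byte address for an application address. */
static uint64_t shadow_addr(uint64_t addr) {
    return (addr >> SHADOW_SCALE) + SHADOW_BASE;
}

/* Hypothetical helper: start address of the dump row holding that shadow byte. */
static uint64_t shadow_row(uint64_t addr) {
    return shadow_addr(addr) & ~(uint64_t)(ROW_BYTES - 1);
}

/* Hypothetical helper: zero-based column of the shadow byte within its row. */
static unsigned shadow_col(uint64_t addr) {
    return (unsigned)(shadow_addr(addr) & (ROW_BYTES - 1));
}

/* Prints where to look in the dump for a given faulting address. */
static void print_shadow_location(uint64_t addr) {
    printf("addr 0x%" PRIx64 " -> shadow 0x%" PRIx64 " (row 0x%" PRIx64 ", column %u)\n",
           addr, shadow_addr(addr), shadow_row(addr), shadow_col(addr));
}
```

Calling print_shadow_location(0x84c0990) points at row 0x18001098130, column 2 (zero-based), which is the bracketed byte in the line marked with "=>".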

Root cause

  1. thread_exit() enqueues a cleanup_thread task on the async‑worker queue; that task later clears *clear_child_tid:
    __atomic_store_n(clear_child_tid, 0, __ATOMIC_RELEASE);
  2. In the libos_syscall_execve() path, the VMA of the thread’s TCB is freed before the async worker gets to run.
  3. When the worker eventually stores 0 to *clear_child_tid, it writes to memory that has already been freed, which ASan reports as a use‑after‑poison.
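
To make the required ordering concrete, here is a small standalone model using pthreads. It is only a sketch of the idea "wait for the deferred store, then free", not Gramine code: struct pending_clear, worker_clear_tid(), wait_for_clear() and run_ordered_teardown() are hypothetical names, malloc/free stands in for the VMA, and a plain thread stands in for the async worker.

```c
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical model of one deferred clear_child_tid write (not a Gramine struct). */
struct pending_clear {
    int* clear_child_tid;   /* address the worker will zero */
    int done;               /* set to 1 once the store has happened */
    pthread_mutex_t lock;
    pthread_cond_t cond;
};

/* Stand-in for the async worker's cleanup task: store 0, then signal completion. */
static void* worker_clear_tid(void* arg) {
    struct pending_clear* p = arg;
    __atomic_store_n(p->clear_child_tid, 0, __ATOMIC_RELEASE);
    pthread_mutex_lock(&p->lock);
    p->done = 1;
    pthread_cond_signal(&p->cond);
    pthread_mutex_unlock(&p->lock);
    return NULL;
}

/* Stand-in for the exec path: block until the deferred store has completed. */
static void wait_for_clear(struct pending_clear* p) {
    pthread_mutex_lock(&p->lock);
    while (!p->done)
        pthread_cond_wait(&p->cond, &p->lock);
    pthread_mutex_unlock(&p->lock);
}

/* Runs one teardown in the safe order. Returns the value seen at
 * *clear_child_tid just before the memory is freed, or -1 on setup failure. */
static int run_ordered_teardown(void) {
    int* tid_slot = malloc(sizeof(*tid_slot));   /* models memory inside the VMA */
    if (!tid_slot)
        return -1;
    *tid_slot = 1234;                            /* arbitrary non-zero TID */

    struct pending_clear p = { .clear_child_tid = tid_slot, .done = 0 };
    pthread_mutex_init(&p.lock, NULL);
    pthread_cond_init(&p.cond, NULL);

    pthread_t worker;
    if (pthread_create(&worker, NULL, worker_clear_tid, &p) != 0) {
        free(tid_slot);
        return -1;
    }

    wait_for_clear(&p);          /* without this wait, the free below races with the store */
    int observed = *tid_slot;
    free(tid_slot);              /* models freeing the VMA only after the store */

    pthread_join(worker, NULL);
    pthread_cond_destroy(&p.cond);
    pthread_mutex_destroy(&p.lock);
    return observed;
}
```

Dropping the wait_for_clear() call reproduces the shape of the bug: the free can happen before the worker's store, which is the order the execve path takes here. Whether the real fix should wait in the exec path or cancel the pending task is a separate design question.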

Steps to reproduce

  1. Build Gramine with SGX, ASAN, and UBSAN.
  2. Run the exec_same test with args [arg_#1...arg_#49].

Expected results

The async‑worker thread should zero each exiting thread’s *clear_child_tid before that thread’s VMA is freed.

Actual results

libos_syscall_execve() frees the thread’s VMA first, and the async worker attempts to write to *clear_child_tid afterwards, resulting in a use‑after‑poison.

Gramine commit hash

f0f71be / ff71d7a
