Description
Issue description:
When running GPU workloads (e.g. benchmark/gpu/workload/vec_add.cu) with the bpftime CUDA agent attached, the process sometimes segfaults on exit.
The crash happens in the CUDA watcher thread, which polls ctx->cuda_shared_mem->flag1 in bpf_attach_ctx::start_cuda_watcher_thread().
Root Cause:
The agent process opens the global shared memory with bpftime_initialize_global_shm(shm_open_type::SHM_OPEN_ONLY).
In runtime/src/bpftime_shm_internal.cpp, the global destructor __destruct_shm() unconditionally calls bpftime_destroy_global_shm().
bpftime_shm::~bpftime_shm() unmaps the Boost shared memory segment and calls cudaHostUnregister(base_addr) for the whole segment, which includes the cuda::CommSharedMem holding flag1/flag2.
The CUDA watcher thread is started in bpf_attach_ctx::start_cuda_watcher_thread() and is detached. Because bpf_attach_ctx is stored in a union (bpf_attach_ctx_holder) whose destructor is empty, ~bpf_attach_ctx() never runs at process exit, so cuda_watcher_should_stop is never set to true.
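To illustrate the union point in isolation, here is a minimal C++ sketch. It is not bpftime code: FakeAttachCtx, FakeHolder, init(), destroy() and stop_flag_after_holder_scope() are hypothetical stand-ins, and the only assumption carried over from the issue is that the stop flag is set in the context's destructor. It shows that a union with an empty destructor does not destroy its member, so the flag stays false unless the member is destroyed explicitly.

```cpp
#include <atomic>
#include <cassert>
#include <new>

// Hypothetical stand-in for bpf_attach_ctx: its destructor is the only
// place that asks the watcher to stop.
struct FakeAttachCtx {
    std::atomic<bool>* should_stop;
    explicit FakeAttachCtx(std::atomic<bool>* flag) : should_stop(flag) {}
    ~FakeAttachCtx() { should_stop->store(true); }
};

// Hypothetical stand-in for bpf_attach_ctx_holder: a union whose
// destructor is empty, so the member destructor is never run implicitly.
union FakeHolder {
    FakeAttachCtx ctx;
    FakeHolder() {}
    ~FakeHolder() {}  // empty on purpose: does NOT call ctx.~FakeAttachCtx()
    void init(std::atomic<bool>* flag) { new (&ctx) FakeAttachCtx(flag); }
    void destroy() { ctx.~FakeAttachCtx(); }  // explicit teardown
};

// Returns the value of the stop flag after the holder has gone out of scope.
bool stop_flag_after_holder_scope(bool destroy_explicitly) {
    std::atomic<bool> stop{false};
    {
        FakeHolder holder;
        holder.init(&stop);
        if (destroy_explicitly) {
            holder.destroy();
        }
    }  // ~FakeHolder() runs here, but it does nothing to ctx
    return stop.load();
}
```

Calling stop_flag_after_holder_scope(false) leaves the flag unset, which corresponds to the situation described above; calling it with true sets the flag because the member destructor is invoked by hand.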