
[BUG] Finalization segfault if using Open MPI 5 #331

@eddy16112

Description

Describe the bug
We have seen a segfault during runtime finalization when using UCX + Open MPI 5.

To Reproduce
Steps to reproduce the behavior:

  1. Create a conda env with Open MPI 5.0.8:
conda create -n ompi5
conda activate ompi5
conda install conda-forge::openmpi
  2. Compile Realm with UCX:
cmake ../ -DREALM_ENABLE_UCX=ON -DREALM_ENABLE_CUDA=OFF -DREALM_ENABLE_HIP=OFF -DREALM_ENABLE_OPENMP=OFF -DREALM_ENABLE_PYTHON=ON -DREALM_ENABLE_HDF5=OFF -DREALM_BUILD_TESTS=ON
  3. Run any of the Realm test programs; I picked memspeed and removed everything except runtime init and shutdown (a minimal sketch of such a reproducer is shown after these steps):
memspeed
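
For reference, here is a minimal sketch of what such a stripped-down reproducer looks like, assuming the Realm C++ API used by tests/memspeed.cc (Runtime::init, Runtime::shutdown, Runtime::wait_for_shutdown). This is illustrative, not the exact test code:

// Minimal init/shutdown reproducer sketch (illustrative, not the exact memspeed code).
#include <realm.h>

int main(int argc, char **argv) {
  Realm::Runtime rt;
  rt.init(&argc, &argv);          // brings up the network layer (UCX / MPI bootstrap)
  rt.shutdown();                  // request shutdown immediately; no tasks are launched
  return rt.wait_for_shutdown();  // the reported segfault happens on this finalization path
}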

Expected behavior
A clean shutdown without a segfault. Here is the backtrace of the crash:

Thread 6 (Thread 0x7fffe0fddc00 (LWP 1607363) "memspeed" (Exiting)):
#0  0x00007ffff73af5b0 in ?? ()
#1  0x00007ffff7894be1 in advise_stack_range (guardsize=<optimized out>, pd=140736968121344, size=<optimized out>, mem=0x7fffe0edb000) at ./nptl/allocatestack.c:195
#2  start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:551
#3  0x00007ffff79268c0 in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81

Thread 5 (Thread 0x7ffff01a2c00 (LWP 1607362) "memspeed" (Exiting)):
#0  0x00007ffff73af5b0 in ?? ()
#1  0x00007ffff7894be1 in advise_stack_range (guardsize=<optimized out>, pd=140737221635072, size=<optimized out>, mem=0x7ffff00a0000) at ./nptl/allocatestack.c:195
#2  start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:551
#3  0x00007ffff79268c0 in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81

Thread 3 (Thread 0x7ffff13ff640 (LWP 1607357) "cuda0000380000f"):
#0  0x00007ffff7918c3f in __GI___poll (fds=0x555555f15e50, nfds=3, timeout=-1) at ../sysdeps/unix/sysv/linux/poll.c:29
#1  0x00007ffff284164f in ?? () from /lib/x86_64-linux-gnu/libcuda.so.1
#2  0x00007ffff290f18f in ?? () from /lib/x86_64-linux-gnu/libcuda.so.1
#3  0x00007ffff283c233 in ?? () from /lib/x86_64-linux-gnu/libcuda.so.1
#4  0x00007ffff7894ac3 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:442
#5  0x00007ffff79268c0 in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81

Thread 1 (Thread 0x7ffff7ab3c00 (LWP 1607353) "memspeed"):
#0  __futex_abstimed_wait_common64 (private=128, cancel=true, abstime=0x0, op=265, expected=1607363, futex_word=0x7fffe0fdded0) at ./nptl/futex-internal.c:57
#1  __futex_abstimed_wait_common (cancel=true, private=128, abstime=0x0, clockid=0, expected=1607363, futex_word=0x7fffe0fdded0) at ./nptl/futex-internal.c:87
#2  __GI___futex_abstimed_wait_cancelable64 (futex_word=futex_word@entry=0x7fffe0fdded0, expected=1607363, clockid=clockid@entry=0, abstime=abstime@entry=0x0, private=private@entry=128) at ./nptl/futex-internal.c:139
#3  0x00007ffff7896624 in __pthread_clockjoin_ex (threadid=140736968121344, thread_return=0x0, clockid=0, abstime=0x0, block=<optimized out>) at ./nptl/pthread_join_common.c:105
#4  0x000055555577b2cb in Realm::KernelThread::join (this=0x555555fadf60) at /home/weiwu/realm/src/realm/threads.cc:1062
#5  0x0000555555aaf17d in Realm::BackgroundWorkThread::join (this=0x555555d8ced0) at /home/weiwu/realm/src/realm/bgwork.cc:154
#6  0x0000555555ab0318 in Realm::BackgroundWorkManager::stop_dedicated_workers (this=0x555555d8d300) at /home/weiwu/realm/src/realm/bgwork.cc:335
#7  0x0000555555707b4d in Realm::RuntimeImpl::wait_for_shutdown (this=0x555555d8d020) at /home/weiwu/realm/src/realm/runtime_impl.cc:2826
#8  0x00005555556fc998 in Realm::Runtime::wait_for_shutdown (this=0x7fffffffde40) at /home/weiwu/realm/src/realm/runtime_impl.cc:734
#9  0x000055555559782a in main (argc=5, argv=0x7fffffffdf78) at /home/weiwu/realm/tests/memspeed.cc:584

If we remove the MPI_Finalize call, the segfault goes away.

Here is a branch that can reproduce the segfault even without UCX: https://github.com/StanfordLegion/realm/commits/debug-mpi (commit d4d4718).
In this branch, we explicitly initialize the MPI bootstrap during runtime initialization and close it during finalization; we can then reproduce the segfault with:

memspeed  -ll:networks none

The MPI bootstrap dlopens the MPI wrapper, which calls MPI_Init_thread and MPI_Finalize. If we replace the dlopen with direct calls to MPI, the error goes away.
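
For clarity, here is a minimal sketch of the dlopen-based pattern described above, as opposed to linking against MPI and calling it directly. The library and symbol names (librealm_mpi_wrapper.so, wrapper_mpi_init, wrapper_mpi_finalize) are illustrative assumptions, not the actual Realm bootstrap code:

#include <dlfcn.h>
#include <cstdio>

int main() {
  // Load the MPI wrapper at runtime instead of linking MPI directly
  // (library name is a placeholder for illustration).
  void *handle = dlopen("librealm_mpi_wrapper.so", RTLD_NOW | RTLD_GLOBAL);
  if (!handle) { std::fprintf(stderr, "dlopen failed: %s\n", dlerror()); return 1; }

  // Resolve wrapper entry points that internally call MPI_Init_thread / MPI_Finalize
  // (symbol names are hypothetical).
  using init_fn_t = int (*)(int *, char ***);
  using fini_fn_t = int (*)();
  init_fn_t wrapper_init = reinterpret_cast<init_fn_t>(dlsym(handle, "wrapper_mpi_init"));
  fini_fn_t wrapper_fini = reinterpret_cast<fini_fn_t>(dlsym(handle, "wrapper_mpi_finalize"));

  int argc = 0;
  char **argv = nullptr;
  wrapper_init(&argc, &argv);  // -> MPI_Init_thread inside the wrapper
  // ... runtime work ...
  wrapper_fini();              // -> MPI_Finalize inside the wrapper
  dlclose(handle);             // cleanup in this sketch; unloads the wrapper and, transitively, Open MPI
  return 0;
}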
