
[BUG] Finalization segfault if using Open MPI 5 #331

@eddy16112

Description

Describe the bug
We have seen a segfault during runtime finalization when using UCX + Open MPI 5.

To Reproduce
Steps to reproduce the behavior:

  1. Create a conda env with Open MPI 5.0.8:
conda create -n ompi5
conda activate ompi5
conda install conda-forge::openmpi
  2. Compile Realm with UCX:
cmake ../ -DREALM_ENABLE_UCX=ON -DREALM_ENABLE_CUDA=OFF -DREALM_ENABLE_HIP=OFF -DREALM_ENABLE_OPENMP=OFF -DREALM_ENABLE_PYTHON=ON -DREALM_ENABLE_HDF5=OFF -DREALM_BUILD_TESTS=ON
  3. Run any of the Realm test programs; I picked memspeed and removed everything except runtime init and shutdown (a minimal sketch of such a reproducer is shown after these steps):
memspeed
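
For reference, here is a minimal sketch of what such a stripped-down reproducer looks like, assuming the Realm C++ API used by tests/memspeed.cc (Runtime::init, Runtime::shutdown, Runtime::wait_for_shutdown). This is illustrative, not the exact test code:

// Minimal init/shutdown reproducer sketch (illustrative, not the exact memspeed code).
#include <realm.h>

int main(int argc, char **argv) {
  Realm::Runtime rt;
  rt.init(&argc, &argv);          // brings up the network layer (UCX / MPI bootstrap)
  rt.shutdown();                  // request shutdown immediately; no tasks are launched
  return rt.wait_for_shutdown();  // the reported segfault happens on this finalization path
}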

Expected behavior
A clean shutdown without a segfault. Here is the backtrace of the crash:

Thread 6 (Thread 0x7fffe0fddc00 (LWP 1607363) "memspeed" (Exiting)):
#0  0x00007ffff73af5b0 in ?? ()
#1  0x00007ffff7894be1 in advise_stack_range (guardsize=<optimized out>, pd=140736968121344, size=<optimized out>, mem=0x7fffe0edb000) at ./nptl/allocatestack.c:195
#2  start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:551
#3  0x00007ffff79268c0 in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81

Thread 5 (Thread 0x7ffff01a2c00 (LWP 1607362) "memspeed" (Exiting)):
#0  0x00007ffff73af5b0 in ?? ()
#1  0x00007ffff7894be1 in advise_stack_range (guardsize=<optimized out>, pd=140737221635072, size=<optimized out>, mem=0x7ffff00a0000) at ./nptl/allocatestack.c:195
#2  start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:551
#3  0x00007ffff79268c0 in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81

Thread 3 (Thread 0x7ffff13ff640 (LWP 1607357) "cuda0000380000f"):
#0  0x00007ffff7918c3f in __GI___poll (fds=0x555555f15e50, nfds=3, timeout=-1) at ../sysdeps/unix/sysv/linux/poll.c:29
#1  0x00007ffff284164f in ?? () from /lib/x86_64-linux-gnu/libcuda.so.1
#2  0x00007ffff290f18f in ?? () from /lib/x86_64-linux-gnu/libcuda.so.1
#3  0x00007ffff283c233 in ?? () from /lib/x86_64-linux-gnu/libcuda.so.1
#4  0x00007ffff7894ac3 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:442
#5  0x00007ffff79268c0 in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81

Thread 1 (Thread 0x7ffff7ab3c00 (LWP 1607353) "memspeed"):
#0  __futex_abstimed_wait_common64 (private=128, cancel=true, abstime=0x0, op=265, expected=1607363, futex_word=0x7fffe0fdded0) at ./nptl/futex-internal.c:57
#1  __futex_abstimed_wait_common (cancel=true, private=128, abstime=0x0, clockid=0, expected=1607363, futex_word=0x7fffe0fdded0) at ./nptl/futex-internal.c:87
#2  __GI___futex_abstimed_wait_cancelable64 (futex_word=futex_word@entry=0x7fffe0fdded0, expected=1607363, clockid=clockid@entry=0, abstime=abstime@entry=0x0, private=private@entry=128) at ./nptl/futex-internal.c:139
#3  0x00007ffff7896624 in __pthread_clockjoin_ex (threadid=140736968121344, thread_return=0x0, clockid=0, abstime=0x0, block=<optimized out>) at ./nptl/pthread_join_common.c:105
#4  0x000055555577b2cb in Realm::KernelThread::join (this=0x555555fadf60) at /home/weiwu/realm/src/realm/threads.cc:1062
#5  0x0000555555aaf17d in Realm::BackgroundWorkThread::join (this=0x555555d8ced0) at /home/weiwu/realm/src/realm/bgwork.cc:154
#6  0x0000555555ab0318 in Realm::BackgroundWorkManager::stop_dedicated_workers (this=0x555555d8d300) at /home/weiwu/realm/src/realm/bgwork.cc:335
#7  0x0000555555707b4d in Realm::RuntimeImpl::wait_for_shutdown (this=0x555555d8d020) at /home/weiwu/realm/src/realm/runtime_impl.cc:2826
#8  0x00005555556fc998 in Realm::Runtime::wait_for_shutdown (this=0x7fffffffde40) at /home/weiwu/realm/src/realm/runtime_impl.cc:734
#9  0x000055555559782a in main (argc=5, argv=0x7fffffffdf78) at /home/weiwu/realm/tests/memspeed.cc:584

If we remove the MPI_Finalize call, the segfault goes away.

Here is a branch that can reproduce the segfault even without UCX: https://github.com/StanfordLegion/realm/commits/debug-mpi (commit d4d4718).
In this branch, we explicitly initialize the MPI bootstrap during runtime initialization and close it during finalization; we can then reproduce the segfault with:

memspeed  -ll:networks none

The MPI bootstrap dlopens the MPI wrapper, which calls MPI_Init_thread and MPI_Finalize. If we replace the dlopen with direct calls to MPI, the error goes away.
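
For clarity, here is a minimal sketch of the dlopen-based pattern described above, as opposed to linking against MPI and calling it directly. The library and symbol names (librealm_mpi_wrapper.so, wrapper_mpi_init, wrapper_mpi_finalize) are illustrative assumptions, not the actual Realm bootstrap code:

#include <dlfcn.h>
#include <cstdio>

int main() {
  // Load the MPI wrapper at runtime instead of linking MPI directly
  // (library name is a placeholder for illustration).
  void *handle = dlopen("librealm_mpi_wrapper.so", RTLD_NOW | RTLD_GLOBAL);
  if (!handle) { std::fprintf(stderr, "dlopen failed: %s\n", dlerror()); return 1; }

  // Resolve wrapper entry points that internally call MPI_Init_thread / MPI_Finalize
  // (symbol names are hypothetical).
  using init_fn_t = int (*)(int *, char ***);
  using fini_fn_t = int (*)();
  init_fn_t wrapper_init = reinterpret_cast<init_fn_t>(dlsym(handle, "wrapper_mpi_init"));
  fini_fn_t wrapper_fini = reinterpret_cast<fini_fn_t>(dlsym(handle, "wrapper_mpi_finalize"));

  int argc = 0;
  char **argv = nullptr;
  wrapper_init(&argc, &argv);  // -> MPI_Init_thread inside the wrapper
  // ... runtime work ...
  wrapper_fini();              // -> MPI_Finalize inside the wrapper
  dlclose(handle);             // cleanup in this sketch; unloads the wrapper and, transitively, Open MPI
  return 0;
}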
