PML/UCX: simple MPI test hangs when using the UCX PML in singleton mode #13626

@hppritcha

The UCX PML doesn't work in singleton mode.

To reproduce:

  • Build Open MPI main with UCX support.
  • Build hello_c in the examples folder.
  • Run hello_c in singleton mode (launched directly, without mpirun).

The process hangs in MPI_Finalize.

The backtrace shows:

#0  0x00007ffff6d8365a in opal_common_ucx_mca_pmix_fence (worker=0x87e310) at common_ucx.c:451
#1  0x00007ffff6d83a3c in opal_common_ucx_del_procs (procs=0x7fffec000e70, count=1, my_rank=0, max_disconnect=1, worker=0x87e310) at common_ucx.c:531
#2  0x00007ffff79bd10a in mca_pml_ucx_del_procs (procs=0x7fffec000ff0, nprocs=1) at pml_ucx.c:567
#3  0x00007ffff76e8d66 in ompi_mpi_instance_cleanup_pml () at instance/instance.c:189
#4  0x00007ffff6d0f566 in opal_finalize_cleanup_domain (domain=0x7ffff70480e0 <opal_init_domain>) at runtime/opal_finalize_core.c:129
#5  0x00007ffff6cfe757 in opal_finalize () at runtime/opal_finalize.c:56
#6  0x00007ffff76e5886 in ompi_rte_finalize () at runtime/ompi_rte.c:1045
#7  0x00007ffff76eb6e1 in ompi_mpi_instance_finalize_common () at instance/instance.c:951
#8  0x00007ffff76eb958 in ompi_mpi_instance_finalize (instance=0x7ffff7d93ac0 <ompi_mpi_instance_default>) at instance/instance.c:996
#9  0x00007ffff76dfd36 in ompi_mpi_finalize () at runtime/ompi_mpi_finalize.c:294
#10 0x00007ffff7731f1e in PMPI_Finalize () at finalize_generated.c:53
#11 0x00000000004008b8 in main (argc=1, argv=0x7ffffffefc08) at hello_c.c:23

This can be fixed by patching the UCX PML to handle the singleton case. Alternatively, PMIx_Fence_nb can be patched to invoke cbfunc (if non-NULL) in its early return for the singleton case:

     /* if we are a singleton, there is nothing to do */
     if (pmix_client_globals.singleton) {
-        return PMIX_SUCCESS;
+        rc = PMIX_SUCCESS;
+        if(NULL != cbfunc) {
+            cbfunc(rc, cbdata);
+        }
+        return rc;
     }
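
To illustrate why the missing callback leads to a hang, here is a minimal, self-contained model in C. It assumes, based on frame #0 of the backtrace, that the caller polls a completion flag that only the fence callback sets. All names (model_fence_nb, model_wait_for_fence, fence_complete_cb, status_t) are hypothetical stand-ins, not the actual PMIx or Open MPI code, and the wait loop is bounded so the model terminates instead of hanging.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the PMIx status type and callback signature. */
typedef int status_t;
#define STATUS_SUCCESS 0
typedef void (*op_cbfunc_t)(status_t status, void *cbdata);

/* Model of a non-blocking fence with a singleton fast path.
 * invoke_cb_on_singleton = 0 models the behavior before the patch,
 * invoke_cb_on_singleton = 1 models the behavior after the patch. */
static status_t model_fence_nb(int singleton, int invoke_cb_on_singleton,
                               op_cbfunc_t cbfunc, void *cbdata)
{
    if (singleton) {
        status_t rc = STATUS_SUCCESS;
        if (invoke_cb_on_singleton && NULL != cbfunc) {
            cbfunc(rc, cbdata);
        }
        return rc;
    }
    /* Non-singleton path: a real implementation would hand the request to a
     * server and call cbfunc later; this model completes immediately. */
    if (NULL != cbfunc) {
        cbfunc(STATUS_SUCCESS, cbdata);
    }
    return STATUS_SUCCESS;
}

/* Completion callback: sets the flag the caller is polling. */
static void fence_complete_cb(status_t status, void *cbdata)
{
    (void) status;
    *(volatile int *) cbdata = 1;
}

/* Caller-side wait, bounded by max_spins so the model always returns.
 * Returns 1 if the fence completed, 0 if the caller would still be waiting. */
static int model_wait_for_fence(int singleton, int invoke_cb_on_singleton,
                                long max_spins)
{
    volatile int fenced = 0;
    long spins = 0;

    if (STATUS_SUCCESS != model_fence_nb(singleton, invoke_cb_on_singleton,
                                         fence_complete_cb, (void *) &fenced)) {
        return 0;
    }
    while (!fenced && spins < max_spins) {
        spins++; /* a real caller would drive communication progress here */
    }
    return fenced;
}
```

In this model, model_wait_for_fence(1, 0, 1000) returns 0 because the flag is never set, which corresponds to the hang in MPI_Finalize, while model_wait_for_fence(1, 1, 1000) returns 1 because the early return now invokes the callback.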
