
mpi4py>=4.0 prevents casampi.private.start_mpi calls from blocking non-rank-0 processes #3

@r-xue

Traditionally, the Python processes spawned by an mpicasa call split their roles when casampi.private.start_mpi is executed: rank 0 becomes the MPIclient, while every non-zero rank is placed in a holding pattern as an MPIServer.
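The role split described above can be pictured with a minimal sketch. This is not casampi's implementation: `role_for_rank` and `current_rank` are hypothetical helpers, and reading the rank from the `OMPI_COMM_WORLD_RANK` environment variable assumes the processes were launched by Open MPI.

```python
import os


def role_for_rank(rank):
    """Rank 0 acts as the client; every other rank acts as a server."""
    return "client" if rank == 0 else "server"


def current_rank(environ=os.environ):
    """Read the rank that Open MPI exports to each launched process.

    Falls back to 0 when the variable is unset (e.g. a serial run).
    """
    return int(environ.get("OMPI_COMM_WORLD_RANK", "0"))


if __name__ == "__main__":
    rank = current_rank()
    print(f"rank {rank} -> {role_for_rank(rank)}")
```

In the expected behavior, only the process mapped to "client" continues running the user script; the "server" processes wait for commands instead of executing it themselves.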

It appears that builds using conda-forge mpi4py>=4.0 with openmpi=5.0.4 alter this behavior: the non-zero ranks no longer assume their server roles after the casampi.private.start_mpi call. I have confirmed in both CASA 6.6.1 and 6.6.6 that the issue arises solely from the mpi4py version bump. In the log below, from an 8-process run, every rank goes on to start a pipeline processing session and reports is_mpi_enabled: False:

(pipe1669py38) rxue@xenon:~/Workspace/nvme/nrao/tickets/PIPE-1669/working$ casa6mpi_xvfb pipeline_flag -c ../scripts/test_working.py

======================   ALLOCATED NODES   ======================
    xenon: slots=1 max_slots=0 slots_inuse=0 state=UP
        Flags: DAEMON_LAUNCHED:LOCATION_VERIFIED
        aliases: xenon
=================================================================

======================   ALLOCATED NODES   ======================
    xenon: slots=18 max_slots=0 slots_inuse=0 state=UP
        Flags: DAEMON_LAUNCHED:LOCATION_VERIFIED:SLOTS_GIVEN
        aliases: xenon
=================================================================

======================   ALLOCATED NODES   ======================
    xenon: slots=18 max_slots=0 slots_inuse=0 state=UP
        Flags: DAEMON_LAUNCHED:LOCATION_VERIFIED:SLOTS_GIVEN
        aliases: xenon
=================================================================

========================   JOB MAP   ========================
Data for JOB prterun-xenon-3573009@1 offset 0 Total slots allocated 18
    Mapping policy: BYCORE:OVERSUBSCRIBE  Ranking policy: FILL Binding policy: NONE
    Cpu set: N/A  PPR: N/A  Cpus-per-rank: N/A  Cpu Type: CORE


Data for node: xenon    Num slots: 18   Max slots: 0    Num procs: 8
        Process jobid: prterun-xenon-3573009@1 App: 0 Process rank: 0 Bound: N/A
        Process jobid: prterun-xenon-3573009@1 App: 0 Process rank: 1 Bound: N/A
        Process jobid: prterun-xenon-3573009@1 App: 0 Process rank: 2 Bound: N/A
        Process jobid: prterun-xenon-3573009@1 App: 0 Process rank: 3 Bound: N/A
        Process jobid: prterun-xenon-3573009@1 App: 0 Process rank: 4 Bound: N/A
        Process jobid: prterun-xenon-3573009@1 App: 0 Process rank: 5 Bound: N/A
        Process jobid: prterun-xenon-3573009@1 App: 0 Process rank: 6 Bound: N/A
        Process jobid: prterun-xenon-3573009@1 App: 0 Process rank: 7 Bound: N/A

=============================================================

Using configuration file ~/.casa/config.py

Using configuration file ~/.casa/config.py

Using configuration file ~/.casa/config.py

Using configuration file ~/.casa/config.py

Using configuration file ~/.casa/config.py

Using configuration file ~/.casa/config.py

Using configuration file ~/.casa/config.py

Using configuration file ~/.casa/config.py
Using user-supplied startup.py at ~/.casa/startup.py

Using user-supplied startup.py at ~/.casa/startup.py

Using user-supplied startup.py at ~/.casa/startup.py

Using user-supplied startup.py at ~/.casa/startup.py

Using user-supplied startup.py at ~/.casa/startup.py

Using user-supplied startup.py at ~/.casa/startup.py

Using user-supplied startup.py at ~/.casa/startup.py

Using user-supplied startup.py at ~/.casa/startup.py

No event loop hook running.
No event loop hook running.
No event loop hook running.
No event loop hook running.
No event loop hook running.
No event loop hook running.
No event loop hook running.
No event loop hook running.
/zfs/nvme/Workspace/nrao/tickets/PIPE-1669/working
/zfs/nvme/Workspace/nrao/tickets/PIPE-1669/working
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Start a pipeline processing session:
casalog.num_cpus:        36
casalog.total_memory:    131798098
casalog.omp_num_threads: 1
is_mpi_enabled:          False
/zfs/nvme/Workspace/nrao/tickets/PIPE-1669/working
/zfs/nvme/Workspace/nrao/tickets/PIPE-1669/working
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Start a pipeline processing session:
casalog.num_cpus:        36
casalog.total_memory:    131798098
casalog.omp_num_threads: 1
is_mpi_enabled:          False
/zfs/nvme/Workspace/nrao/tickets/PIPE-1669/working
/zfs/nvme/Workspace/nrao/tickets/PIPE-1669/working
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Start a pipeline processing session:
casalog.num_cpus:        36
casalog.total_memory:    131798098
casalog.omp_num_threads: 1
is_mpi_enabled:          False
/zfs/nvme/Workspace/nrao/tickets/PIPE-1669/working
/zfs/nvme/Workspace/nrao/tickets/PIPE-1669/working
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Start a pipeline processing session:
casalog.num_cpus:        36
casalog.total_memory:    131798098
casalog.omp_num_threads: 1
is_mpi_enabled:          False
/zfs/nvme/Workspace/nrao/tickets/PIPE-1669/working
/zfs/nvme/Workspace/nrao/tickets/PIPE-1669/working
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Start a pipeline processing session:
casalog.num_cpus:        36
casalog.total_memory:    131798098
casalog.omp_num_threads: 1
is_mpi_enabled:          False
/zfs/nvme/Workspace/nrao/tickets/PIPE-1669/working
/zfs/nvme/Workspace/nrao/tickets/PIPE-1669/working
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Start a pipeline processing session:
casalog.num_cpus:        36
casalog.total_memory:    131798098
casalog.omp_num_threads: 1
is_mpi_enabled:          False
/zfs/nvme/Workspace/nrao/tickets/PIPE-1669/working
/zfs/nvme/Workspace/nrao/tickets/PIPE-1669/working
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Start a pipeline processing session:
casalog.num_cpus:        36
casalog.total_memory:    131798098
casalog.omp_num_threads: 1
is_mpi_enabled:          False
/zfs/nvme/Workspace/nrao/tickets/PIPE-1669/working
/zfs/nvme/Workspace/nrao/tickets/PIPE-1669/working
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Start a pipeline processing session:
casalog.num_cpus:        36
casalog.total_memory:    131798098
casalog.omp_num_threads: 1
is_mpi_enabled:          False
openmpi version:         ('Open MPI', (5, 0, 4))
--------------------------------------------------------------------------------
openmpi version:         ('Open MPI', (5, 0, 4))
--------------------------------------------------------------------------------
openmpi version:         ('Open MPI', (5, 0, 4))
--------------------------------------------------------------------------------
openmpi version:         ('Open MPI', (5, 0, 4))
--------------------------------------------------------------------------------
openmpi version:         ('Open MPI', (5, 0, 4))
--------------------------------------------------------------------------------
openmpi version:         ('Open MPI', (5, 0, 4))
--------------------------------------------------------------------------------
openmpi version:         ('Open MPI', (5, 0, 4))
--------------------------------------------------------------------------------
openmpi version:         ('Open MPI', (5, 0, 4))
--------------------------------------------------------------------------------
CASA 6.6.1.15 -- Common Astronomy Software Applications
CASA 6.6.1.15 -- Common Astronomy Software Applications
CASA 6.6.1.15 -- Common Astronomy Software Applications
CASA 6.6.1.15 -- Common Astronomy Software Applications
CASA 6.6.1.15 -- Common Astronomy Software Applications
CASA 6.6.1.15 -- Common Astronomy Software Applications
2024-10-12 20:16:52     INFO    ::casa::MPIServer-1     Using configuration file ~/.casa/config.py
2024-10-12 20:16:52     INFO    ::casa::MPIServer-5     Using configuration file ~/.casa/config.py
2024-10-12 20:16:52     INFO    ::casa::MPIServer-5     Using user-supplied startup.py at ~/.casa/startup.py
2024-10-12 20:16:52     INFO    ::casa::MPIServer-1     Using user-supplied startup.py at ~/.casa/startup.py
2024-10-12 20:16:52     INFO    ::casa::MPIServer-1
2024-10-12 20:16:52     INFO    ::casa::MPIServer-1     Checking Measures tables in data repository sub-directory /home/rxue/Workspace/nvme/nrao/casa_dist/casarundata/geodetic
2024-10-12 20:16:52     INFO    ::casa  Using configuration file ~/.casa/config.py
2024-10-12 20:16:52     INFO    ::casa  Using user-supplied startup.py at ~/.casa/startup.py
2024-10-12 20:16:52     INFO    ::casa::MPIServer-2     Using configuration file ~/.casa/config.py
2024-10-12 20:16:52     INFO    ::casa::MPIServer-2     Using user-supplied startup.py at ~/.casa/startup.py
2024-10-12 20:16:52     INFO    ::casa::MPIServer-2
2024-10-12 20:16:52     INFO    ::casa
2024-10-12 20:16:52     INFO    ::casa::MPIServer-5
2024-10-12 20:16:52     INFO    ::casa::MPIServer-2     Checking Measures tables in data repository sub-directory /home/rxue/Workspace/nvme/nrao/casa_dist/casarundata/geodetic
2024-10-12 20:16:52     INFO    ::casa::MPIServer-5     Checking Measures tables in data repository sub-directory /home/rxue/Workspace/nvme/nrao/casa_dist/casarundata/geodetic
2024-10-12 20:16:52     INFO    ::casa  Checking Measures tables in data repository sub-directory /home/rxue/Workspace/nvme/nrao/casa_dist/casarundata/geodetic
CASA 6.6.1.15 -- Common Astronomy Software Applications
CASA 6.6.1.15 -- Common Astronomy Software Applications
2024-10-12 20:16:52     INFO    ::casa::MPIServer-7     Using configuration file ~/.casa/config.py
2024-10-12 20:16:52     INFO    ::casa::MPIServer-4     Using configuration file ~/.casa/config.py
2024-10-12 20:16:52     INFO    ::casa::MPIServer-6     Using configuration file ~/.casa/config.py
2024-10-12 20:16:52     INFO    ::casa::MPIServer-7     Using user-supplied startup.py at ~/.casa/startup.py
2024-10-12 20:16:52     INFO    ::casa::MPIServer-3     Using configuration file ~/.casa/config.py
2024-10-12 20:16:52     INFO    ::casa::MPIServer-4     Using user-supplied startup.py at ~/.casa/startup.py
2024-10-12 20:16:52     INFO    ::casa::MPIServer-6     Using user-supplied startup.py at ~/.casa/startup.py
2024-10-12 20:16:52     INFO    ::casa::MPIServer-3     Using user-supplied startup.py at ~/.casa/startup.py
2024-10-12 20:16:52     INFO    ::casa::MPIServer-7
2024-10-12 20:16:52     INFO    ::casa::MPIServer-7     Checking Measures tables in data repository sub-directory /home/rxue/Workspace/nvme/nrao/casa_dist/casarundata/geodetic
2024-10-12 20:16:52     INFO    ::casa::MPIServer-3
2024-10-12 20:16:52     INFO    ::casa::MPIServer-6
2024-10-12 20:16:52     INFO    ::casa::MPIServer-4
2024-10-12 20:16:52     INFO    ::casa::MPIServer-3     Checking Measures tables in data repository sub-directory /home/rxue/Workspace/nvme/nrao/casa_dist/casarundata/geodetic
2024-10-12 20:16:52     INFO    ::casa::MPIServer-6     Checking Measures tables in data repository sub-directory /home/rxue/Workspace/nvme/nrao/casa_dist/casarundata/geodetic
2024-10-12 20:16:52     INFO    ::casa::MPIServer-4     Checking Measures tables in data repository sub-directory /home/rxue/Workspace/nvme/nrao/casa_dist/casarundata/geodetic
2024-10-12 20:16:52     INFO    ::casa::MPIServer-1       IERSeop2000 (version date, last date in table (UTC)): 2024/10/05/15:00, 2024/09/05/00:00:00
2024-10-12 20:16:52     INFO    ::casa    IERSeop2000 (version date, last date in table (UTC)): 2024/10/05/15:00, 2024/09/05/00:00:00
2024-10-12 20:16:52     INFO    ::casa::MPIServer-5       IERSeop2000 (version date, last date in table (UTC)): 2024/10/05/15:00, 2024/09/05/00:00:00
2024-10-12 20:16:52     INFO    ::casa::MPIServer-7       IERSeop2000 (version date, last date in table (UTC)): 2024/10/05/15:00, 2024/09/05/00:00:00
2024-10-12 20:16:52     INFO    ::casa::MPIServer-3       IERSeop2000 (version date, last date in table (UTC)): 2024/10/05/15:00, 2024/09/05/00:00:00
2024-10-12 20:16:52     INFO    ::casa::MPIServer-4       IERSeop2000 (version date, last date in table (UTC)): 2024/10/05/15:00, 2024/09/05/00:00:00
2024-10-12 20:16:52     INFO    ::casa::MPIServer-6       IERSeop2000 (version date, last date in table (UTC)): 2024/10/05/15:00, 2024/09/05/00:00:00
2024-10-12 20:16:52     INFO    ::casa::MPIServer-2       IERSeop2000 (version date, last date in table (UTC)): 2024/10/05/15:00, 2024/09/05/00:00:00
2024-10-12 20:16:52     INFO    ::casa::MPIServer-5       IERSeop97 (version date, last date in table (UTC)): 2024/10/05/15:00, 2024/09/05/00:00:00
2024-10-12 20:16:52     INFO    ::casa::MPIServer-7       IERSeop97 (version date, last date in table (UTC)): 2024/10/05/15:00, 2024/09/05/00:00:00
2024-10-12 20:16:52     INFO    ::casa::MPIServer-4       IERSeop97 (version date, last date in table (UTC)): 2024/10/05/15:00, 2024/09/05/00:00:00
2024-10-12 20:16:52     INFO    ::casa::MPIServer-3       IERSeop97 (version date, last date in table (UTC)): 2024/10/05/15:00, 2024/09/05/00:00:00
2024-10-12 20:16:52     INFO    ::casa    IERSeop97 (version date, last date in table (UTC)): 2024/10/05/15:00, 2024/09/05/00:00:00
2024-10-12 20:16:52     INFO    ::casa::MPIServer-6       IERSeop97 (version date, last date in table (UTC)): 2024/10/05/15:00, 2024/09/05/00:00:00
2024-10-12 20:16:52     INFO    ::casa::MPIServer-1       IERSeop97 (version date, last date in table (UTC)): 2024/10/05/15:00, 2024/09/05/00:00:00
2024-10-12 20:16:52     INFO    ::casa::MPIServer-7       IERSpredict (version date, last date in table (UTC)): 2024/10/11/15:00, 2025/01/09/00:00:00
2024-10-12 20:16:52     INFO    ::casa::MPIServer-5       IERSpredict (version date, last date in table (UTC)): 2024/10/11/15:00, 2025/01/09/00:00:00
2024-10-12 20:16:52     INFO    ::casa::MPIServer-6       IERSpredict (version date, last date in table (UTC)): 2024/10/11/15:00, 2025/01/09/00:00:00
2024-10-12 20:16:52     INFO    ::casa::MPIServer-3       IERSpredict (version date, last date in table (UTC)): 2024/10/11/15:00, 2025/01/09/00:00:00
2024-10-12 20:16:52     INFO    ::casa::MPIServer-4       IERSpredict (version date, last date in table (UTC)): 2024/10/11/15:00, 2025/01/09/00:00:00
2024-10-12 20:16:52     INFO    ::casa    IERSpredict (version date, last date in table (UTC)): 2024/10/11/15:00, 2025/01/09/00:00:00
2024-10-12 20:16:52     INFO    ::casa::MPIServer-1       IERSpredict (version date, last date in table (UTC)): 2024/10/11/15:00, 2025/01/09/00:00:00
2024-10-12 20:16:52     INFO    ::casa::MPIServer-2       IERSeop97 (version date, last date in table (UTC)): 2024/10/05/15:00, 2024/09/05/00:00:00
2024-10-12 20:16:52     INFO    ::casa::MPIServer-6       TAI_UTC (version date, last date in table (UTC)): 2024/09/29/15:00, 2017/01/01/00:00:00
2024-10-12 20:16:52     INFO    ::casa::MPIServer-3       TAI_UTC (version date, last date in table (UTC)): 2024/09/29/15:00, 2017/01/01/00:00:00
2024-10-12 20:16:52     INFO    ::casa::MPIServer-7       TAI_UTC (version date, last date in table (UTC)): 2024/09/29/15:00, 2017/01/01/00:00:00
2024-10-12 20:16:52     INFO    ::casa::MPIServer-5       TAI_UTC (version date, last date in table (UTC)): 2024/09/29/15:00, 2017/01/01/00:00:00
2024-10-12 20:16:52     INFO    ::casa::MPIServer-4       TAI_UTC (version date, last date in table (UTC)): 2024/09/29/15:00, 2017/01/01/00:00:00
2024-10-12 20:16:52     INFO    ::casa    TAI_UTC (version date, last date in table (UTC)): 2024/09/29/15:00, 2017/01/01/00:00:00
2024-10-12 20:16:52     INFO    ::casa::MPIServer-1       TAI_UTC (version date, last date in table (UTC)): 2024/09/29/15:00, 2017/01/01/00:00:00
2024-10-12 20:16:52     INFO    ::casa::MPIServer-2       IERSpredict (version date, last date in table (UTC)): 2024/10/11/15:00, 2025/01/09/00:00:00
2024-10-12 20:16:52     INFO    ::casa::MPIServer-2       TAI_UTC (version date, last date in table (UTC)): 2024/09/29/15:00, 2017/01/01/00:00:00
working? False
working? False
working? False
working? False
working? False
working? False
working? False
working? False
--------------------------------------------------------------------------

As an interim workaround, we should continue using mpi4py<4 with openmpi=5.0.3.
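Until the cause is understood, a startup script could guard against the affected version. The following is only a sketch: the helper names are hypothetical, and the rule "major version below 4" simply encodes the mpi4py<4 constraint recommended above.

```python
from importlib import metadata


def mpi4py_major(version_string):
    """Return the leading integer of a version string such as '3.1.6'."""
    return int(version_string.split(".")[0])


def is_known_good(version_string):
    """True for the mpi4py<4 series recommended as the interim workaround."""
    return mpi4py_major(version_string) < 4


def check_installed_mpi4py():
    """Return (version, ok); version is None when mpi4py is not installed."""
    try:
        installed = metadata.version("mpi4py")
    except metadata.PackageNotFoundError:
        return None, False
    return installed, is_known_good(installed)


if __name__ == "__main__":
    version, ok = check_installed_mpi4py()
    if version is None:
        print("mpi4py is not installed")
    elif not ok:
        print(f"mpi4py {version} found; mpi4py<4 is recommended for now")
    else:
        print(f"mpi4py {version} is within the recommended range")
```

The check reads package metadata only, so it does not import mpi4py or initialize MPI.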
