OpenMPI + shell runner: error if cpu-list spans 2 sockets #19

@couletj

On my clusters, nodes have 96 cores split across 2 sockets (2 * 48).

The shell runner is unable to launch a batch of tests with OpenMPI if one of the jobs is scheduled on CPUs that do not share the same socket.

More precisely, I ran the test suite with n_workers = 95.
We can see in pytest_static_sched_1.sh that one job is scheduled for execution on CPUs 47 and 48:

[some tests before]
(mpiexec --cpu-list 45,46 -np 2 python3 -u -m pytest -s --_worker --_scheduler_ip_address=XXX --_scheduler_port=XXX --_session_folder=tmpb3y7c9gv -p pytest_parallel.plugin -rfEs -s --durations=10 --scheduler=shell --n-workers=95 --_test_idx=71 {TESTPATH} > .pytest_parallel/tmpb3y7c9gv/{OUTPATH} 2>&1 ; python3 -m pytest_parallel.send_report --_scheduler_ip_address=XXX --_scheduler_port=XXX --_session_folder=tmpb3y7c9gv --_test_idx=71 --_test_name=.pytest_parallel/tmpb3y7c9gv/{TESTNAME} & \
(mpiexec --cpu-list 47,48 -np 2 python3 -u -m pytest -s --_worker --_scheduler_ip_address=XXX --_scheduler_port=XXX --_session_folder=tmpb3y7c9gv -p pytest_parallel.plugin -rfEs -s --durations=10 --scheduler=shell --n-workers=95 --_test_idx=72 {TESTPATH} > .pytest_parallel/tmpb3y7c9gv/{OUTPATH} 2>&1 ; python3 -m pytest_parallel.send_report --_scheduler_ip_address=XXX --_scheduler_port=XXX --_session_folder=tmpb3y7c9gv --_test_idx=72 --_test_name=.pytest_parallel/tmpb3y7c9gv/{TESTNAME} & \
[some tests after]
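
To illustrate why the `47,48` job is the problematic one, here is a minimal sketch. It assumes CPU ids are numbered contiguously per socket (0-47 on the first socket, 48-95 on the second), which I have not verified against the actual topology (`lscpu` or `lstopo` would confirm it). `sockets_spanned` is a hypothetical helper, not part of pytest_parallel.

```python
def sockets_spanned(cpu_list, cores_per_socket=48):
    """Return the sorted socket indices touched by the given CPU ids.

    Assumption: CPU ids are contiguous per socket, i.e. socket k owns
    ids [k * cores_per_socket, (k + 1) * cores_per_socket).
    """
    return sorted({cpu // cores_per_socket for cpu in cpu_list})


print(sockets_spanned([45, 46]))  # [0]    -> fits in one socket
print(sockets_spanned([47, 48]))  # [0, 1] -> straddles both sockets
```

Under that assumption, the job with `--_test_idx=71` stays on one socket while the one with `--_test_idx=72` crosses the boundary.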

The whole step crashes with the message
INTERNALERROR> AssertionError: FATAL ERROR in pytest_parallel early processing
and the log file of the related test contains this OpenMPI error:

--------------------------------------------------------------------------
Your job failed to map because the resulting process placement
would cause the process to be bound to CPUs in more than one
package:

  Mapping policy:  PE-LIST:NOOVERSUBSCRIBE
  Binding policy:  CORE:IF-SUPPORTED
  PE-LIST:         47,48

This configuration almost always results in a loss of performance
that can significantly impact applications. Either alter the
mapping, binding, and/or PE-LIST policies so that each process
can fit into a single package, or consider using an alternative
mapper that can handle this configuration (e.g., the rankfile mapper).
--------------------------------------------------------------------------
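
One possible direction, sketched below, would be to skip to the next socket whenever a job's CPU range would straddle a boundary. This is only an illustration under the same assumption as above (contiguous CPU ids per socket, 2 * 48 cores). `assign_cpus` is a hypothetical function and does not reflect how pytest_parallel currently builds its `--cpu-list` arguments.

```python
def assign_cpus(job_sizes, n_cpus=96, cores_per_socket=48):
    """Greedily give each job a contiguous CPU range within one socket.

    Returns one CPU-id list per job that fits in this batch; jobs that
    do not fit are left for the caller to schedule later.
    """
    assignments = []
    next_cpu = 0
    for n_procs in job_sizes:
        if n_procs > cores_per_socket:
            # Jobs larger than a socket are not handled by this sketch.
            raise ValueError("job does not fit in a single socket")
        # First CPU id of the socket following the current one.
        socket_end = (next_cpu // cores_per_socket + 1) * cores_per_socket
        if next_cpu + n_procs > socket_end:
            # The range would straddle two sockets: leave the remaining
            # CPUs of this socket idle and start on the next socket.
            next_cpu = socket_end
        if next_cpu + n_procs > n_cpus:
            break  # no room left on the node for this batch
        assignments.append(list(range(next_cpu, next_cpu + n_procs)))
        next_cpu += n_procs
    return assignments


# One 1-process job followed by 24 2-process jobs: the last pair would
# land on CPUs 47,48, so it is moved to 48,49 instead.
batch = assign_cpus([1] + [2] * 24)
print(batch[-2], batch[-1])  # [45, 46] [48, 49]
```

The cost is that CPU 47 stays idle in this example, which seems acceptable compared to the whole step failing.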
