MPI launch error when using shell scheduler #8

@BerengerBerthoul

Description

With Intel MPI and scheduler=shell, the MPI processes within a step apparently cannot be launched concurrently. They fail with the following error:

f01.1629534PSM2 can't open hfi unit: 0 (err=23)
Abort(1615247) on node 0 (rank 0 in comm 0): Fatal error in PMPI_Init_thread: Other MPI error, error stack:
MPIR_Init_thread(178)........: 
MPID_Init(1532)..............: 
MPIDI_OFI_mpi_init_hook(1552): 
create_vni_context(2131).....: OFI endpoint open failed (ofi_init.c:2131:create_vni_context:Invalid argument)

Setting n-worker to a value strictly less than the number of cores seems to solve the issue.
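A minimal sketch of the workaround, assuming the worker count is chosen programmatically before the run starts. `safe_n_workers` is a hypothetical helper, not part of the scheduler's API. It only applies the observed workaround (n-worker strictly below the core count) and does not address the underlying PSM2/OFI initialization failure.

```python
import os


def safe_n_workers(n_cores=None):
    """Return a worker count strictly below the core count (minimum 1).

    Hypothetical helper illustrating the reported workaround; it is not
    part of the scheduler's API.
    """
    if n_cores is None:
        # os.cpu_count() returns None when the count cannot be determined.
        n_cores = os.cpu_count() or 1
    # Leave at least one core unused. On a single-core machine the value
    # cannot be strictly less than the core count, so fall back to 1.
    return max(1, n_cores - 1)


if __name__ == "__main__":
    print(safe_n_workers())
```

On an 8-core node, for example, `safe_n_workers(8)` gives 7.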
