
gasnetc_ofi_init failure on Frontier/Crusher with 1 node #22

Open
@elliottslaughter


This is to document a known issue with the Slingshot 11 network. Legion runs hit the following error when launched on only 1 node:

*** FATAL ERROR (proc 0): in gasnetc_ofi_init() at .../gasnet_ofi.c:946: fi_domain failed: -38(Function not implemented)

I have been told that this is an issue with the SLURM integration, and therefore is not something that Legion/GASNet are in a position to directly address.

In the meantime, I'm aware of two workarounds:

  1. Use 2 or more nodes
  2. Run with srun --network=single_node_vni
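For job scripts that may run on either one node or several, the second workaround can be applied conditionally. The following is a minimal sketch, not a tested recipe: the helper name `srun_network_flags` and the application name `./my_legion_app` are hypothetical, and the only flag taken from this issue is `--network=single_node_vni`.

```shell
# Hypothetical helper: print the extra srun flag needed for a given node count.
# Only the single-node case needs the workaround; otherwise print nothing.
srun_network_flags() {
    nodes="$1"
    if [ "$nodes" -eq 1 ]; then
        echo "--network=single_node_vni"
    fi
}

# Example usage inside a job script (NODES and ./my_legion_app are placeholders):
#   NODES=1
#   srun -N "$NODES" $(srun_network_flags "$NODES") ./my_legion_app
```

Keeping the decision in one function means the flag can be dropped in a single place once the workaround is no longer required.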

I will update this issue when the workarounds are no longer required.

Edit: I understand that the issue is related to SLURM settings at OLCF, not necessarily to Slingshot 11 per se.
