GACODE GPU parallelization on Princeton Traverse Cluster #359

@jfparisi

I would like to run GACODE with multi-node GPU support on Princeton's Traverse cluster. When I launch with srun, the job fails with MPI_ABORT on any number of nodes; previously, mpirun ran successfully on a single node. Traverse IT support told me that running on multiple nodes requires srun. I have attached the exec and wrap files and an example Slurm submission script that reproduce the issue. I have tried several variants of wrap.TRAVERSE modeled on the existing wrap.* files for other machines.

Possibly important details: -C gpu is not supported for srun on Traverse, and OMPI variables are not supported either.
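
To illustrate the constraint about OMPI variables, here is a minimal sketch of a wrapper that picks a GPU from Slurm's node-local task ID instead. This is not the attached wrap.TRAVERSE. It assumes the job is launched with srun (so `SLURM_LOCALID` is set), that every task can see all GPUs on its node, and that the code honors `CUDA_VISIBLE_DEVICES`. The function name `pick_gpu`, the variable `GPUS_PER_NODE`, and its default of 4 are hypothetical and only for illustration.

```shell
#!/bin/bash
# Hypothetical wrapper sketch (not the attached wrap.TRAVERSE).
# Assumption: launched via srun, so Slurm sets SLURM_LOCALID
# (the node-local task ID) for each task.

# Print the GPU index for the current node-local rank, wrapping
# around when there are more ranks than GPUs on the node.
pick_gpu() {
  local gpus_per_node="$1"
  local local_rank="${SLURM_LOCALID:-0}"
  echo $(( local_rank % gpus_per_node ))
}

# GPUS_PER_NODE is an illustrative variable; 4 is an assumed default.
export CUDA_VISIBLE_DEVICES="$(pick_gpu "${GPUS_PER_NODE:-4}")"

# Run the wrapped command, if one was given.
if [ "$#" -gt 0 ]; then
  exec "$@"
fi
```

Such a wrapper would sit between srun and the executable, i.e. srun starts the wrapper and the wrapper starts the code. If Slurm is already restricting each task to its own GPU, overriding `CUDA_VISIBLE_DEVICES` like this may conflict with that binding.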

Thanks!
job.slurm_2node.txt
exec.TRAVERSE.txt
wrap.TRAVERSE.txt
