I would like to run `gacode` with multi-node GPU support on Princeton's Traverse cluster. I hit an `MPI_ABORT` when launching with `srun` on Traverse, regardless of the number of nodes (previously, `mpirun` ran successfully on a single node). Traverse IT support told me that running on multiple nodes requires `srun`. I have attached the exec and wrap files and an example Slurm submission script that reproduce the issue. I have tried several variations of `wrap.TRAVERSE`, modeled on the existing `wrap.*` files for other machines.
Possibly important details: `-C gpu` is not supported for `srun` on Traverse. `OMPI` variables are also not supported.
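
To illustrate the `OMPI` point: since those variables are not set under `srun`, a wrapper has to select the GPU from a Slurm variable instead. The sketch below shows that idea using `SLURM_LOCALID`. It is not the attached `wrap.TRAVERSE`; the function name `select_gpu` and the default of 4 GPUs per node are assumptions made for illustration.

```shell
#!/bin/bash
# Hypothetical wrapper sketch: pick a GPU from Slurm's node-local task ID.
# GPUS_PER_NODE=4 is an assumption; adjust it to the actual node layout.
GPUS_PER_NODE=${GPUS_PER_NODE:-4}

select_gpu() {
    # SLURM_LOCALID is set by srun for each task (0-based, per node).
    # Fall back to 0 when it is unset, e.g. outside of a Slurm job step.
    local local_rank=${SLURM_LOCALID:-0}
    echo $(( local_rank % GPUS_PER_NODE ))
}

# When a command is given, run it with one GPU visible to this task.
if [ "$#" -gt 0 ]; then
    export CUDA_VISIBLE_DEVICES=$(select_gpu)
    exec "$@"
fi
```

Such a wrapper would be placed between `srun` and the executable, so that each task sees only the GPU matching its local rank.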
Thanks!
job.slurm_2node.txt
exec.TRAVERSE.txt
wrap.TRAVERSE.txt