The error happens because CUDA_VISIBLE_DEVICES is being set to an empty value: $OMPI_COMM_WORLD_LOCAL_RANK in your mpirun command is not being expanded the way you expect. Most likely the launching shell expands it before mpirun starts, at which point the variable is still unset. As a result, Quokka (via AMReX) can't see any GPUs and aborts with CUDA error 100, "no CUDA-capable device is detected".
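To see the difference in when the variable gets expanded, here is a minimal sketch you can run in any bash session without MPI. It assumes the variable is unset in the launching shell (as it would be on a login node); the names early, late, and child are only for illustration, and the value 3 stands in for a rank that mpirun would normally set.

```shell
#!/bin/bash
# Simulate the launching shell, where Open MPI has not set anything yet.
unset OMPI_COMM_WORLD_LOCAL_RANK

# Double quotes (or no quotes): expanded right here, so the value is empty.
early="CUDA_VISIBLE_DEVICES=$OMPI_COMM_WORLD_LOCAL_RANK"

# Single quotes: the text is kept literally and can be expanded later.
late='CUDA_VISIBLE_DEVICES=$OMPI_COMM_WORLD_LOCAL_RANK'

# Simulate one launched process in which the launcher has set the rank
# (3 is an illustrative value), expanding the variable inside that process.
child=$(OMPI_COMM_WORLD_LOCAL_RANK=3 bash -c 'echo "CUDA_VISIBLE_DEVICES=$OMPI_COMM_WORLD_LOCAL_RANK"')

echo "$early"   # CUDA_VISIBLE_DEVICES=
echo "$late"    # CUDA_VISIBLE_DEVICES=$OMPI_COMM_WORLD_LOCAL_RANK
echo "$child"   # CUDA_VISIBLE_DEVICES=3
```

The first line of output is what the failing run effectively passes to every rank; the last line is what each rank needs to see.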

To fix this, you need to ensure each MPI process gets the correct GPU ID. The typical approach is to use a wrapper script so that the environment variable is set per process after MPI launches it. For example, create a small shell script like:

#!/bin/bash
export CUDA_VISIBLE_DEVICES=${OMPI_COMM_WORLD_LOCAL_RANK}
exec "$@"

Then launch with:

mpirun -np 8 --bind-to core --map-by core --allow-run-as-root \
  …
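If you want the wrapper to fail loudly instead of silently exporting an empty value, a slightly more defensive variant could look like the sketch below. The helper name local_rank is hypothetical, and the fallback to SLURM_LOCALID is an assumption that only matters if you launch through Slurm's srun rather than mpirun.

```shell
#!/bin/bash
# Hypothetical helper: print the node-local rank of this process,
# or return non-zero if no known launcher variable is set.
local_rank() {
  if [ -n "${OMPI_COMM_WORLD_LOCAL_RANK:-}" ]; then
    # Set by Open MPI's mpirun for each launched process.
    echo "$OMPI_COMM_WORLD_LOCAL_RANK"
  elif [ -n "${SLURM_LOCALID:-}" ]; then
    # Assumption: set by Slurm when launching with srun.
    echo "$SLURM_LOCALID"
  else
    return 1
  fi
}

# In the wrapper script itself you would then use it like this:
#   CUDA_VISIBLE_DEVICES=$(local_rank) || { echo "no local rank found" >&2; exit 1; }
#   export CUDA_VISIBLE_DEVICES
#   exec "$@"
```

With this in place, running the wrapper outside of an MPI launch stops with an error rather than reproducing CUDA error 100.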

Answer selected by tianninglyu