Try running quokka on a bare metal server #1295

tianninglyu · 2025-09-02T10:57:58Z

Hi all,

I am trying to run quokka on a bare metal server by using this command:

mpirun -np 8 --bind-to core --map-by core --allow-run-as-root --x CUDA_VISIBLE_DEVICES=$OMPI_COMM_WORLD_LOCAL_RANK \
    src/problems/HydroBlast3D/test_hydro3d_blast ../inputs/benchmark_unigrid_256.in > HDBlast3D_Uniform256.log 2>&1

And the error is:
amrex::Abort::3::CUDA error 100 in file /root/Code/quokka/extern/amrex/Src/Base/AMReX_GpuDevice.cpp line 189: no CUDA-capable device is detected !!!

One detail that might be helpful: the output file shows SET CUDA_VISIBLE_DEVICES=, which suggests that $OMPI_COMM_WORLD_LOCAL_RANK was not expanded to a rank number.

What should I do?

Answered by dosubot[bot]

tianninglyu · 2025-09-02T11:00:45Z

dosubot[bot]
bot Sep 2, 2025

The error happens because CUDA_VISIBLE_DEVICES is being set to an empty value. Your shell expands $OMPI_COMM_WORLD_LOCAL_RANK on the command line before mpirun starts, and at that point the variable is not defined, so every rank receives an empty string. As a result, Quokka (via AMReX) can't see any GPUs and aborts with CUDA error 100, "no CUDA-capable device is detected".
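
The expansion in question is done by the launching shell, before any MPI rank exists. The following is a minimal sketch of that timing difference, with no MPI involved; the rank value 3 is an arbitrary stand-in for what mpirun would normally provide to a child process.

```shell
#!/bin/bash
# Simulate a login shell in which Open MPI has not set the variable.
unset OMPI_COMM_WORLD_LOCAL_RANK

# Double quotes: the calling shell expands the variable right now, while it is empty.
early="CUDA_VISIBLE_DEVICES=$OMPI_COMM_WORLD_LOCAL_RANK"

# Single quotes: expansion is deferred to the child process, which sees the
# value from its own environment (3 is an assumed stand-in for a local rank).
late=$(OMPI_COMM_WORLD_LOCAL_RANK=3 bash -c 'echo "CUDA_VISIBLE_DEVICES=$OMPI_COMM_WORLD_LOCAL_RANK"')

echo "early: $early"
echo "late:  $late"
```

The first line printed ends in an empty value, matching the SET CUDA_VISIBLE_DEVICES= seen in the log, while the second carries the rank.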

To fix this, you need to ensure each MPI process gets the correct GPU ID. The typical approach is to use a wrapper script so that the environment variable is set per process after MPI launches it. For example, create a small shell script like:

#!/bin/bash
export CUDA_VISIBLE_DEVICES=${OMPI_COMM_WORLD_LOCAL_RANK}
exec "$@"

Then launch with:

mpirun -np 8 --bind-to core --map-by core --allow-run-as-root \
    bash wrapper.sh src/problems/HydroBlast3D/test_hydro3d_blast ../inputs/benchmark_unigrid_256.in

This way, each MPI process will have CUDA_VISIBLE_DEVICES set to its local rank, and Quokka should detect the GPUs correctly.
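
If the wrapper is ever started outside mpirun, the local rank variable is unset and the GPU list would silently become empty again. A slightly more defensive variant might look like the sketch below; the function name select_gpu and the error message are illustrative choices, not part of Quokka or Open MPI.

```shell
#!/bin/bash
# wrapper.sh variant (sketch): refuse to launch with an empty GPU list.

# select_gpu is a hypothetical helper name used only in this example.
select_gpu() {
    local rank="${OMPI_COMM_WORLD_LOCAL_RANK:-}"
    if [ -z "$rank" ]; then
        echo "OMPI_COMM_WORLD_LOCAL_RANK is not set; was this started by mpirun?" >&2
        return 1
    fi
    # One GPU per local rank, same mapping as the simple wrapper.
    export CUDA_VISIBLE_DEVICES="$rank"
}

# Only act when a command to run was passed on the command line.
if [ "$#" -gt 0 ]; then
    select_gpu || exit 1
    exec "$@"
fi
```

It is launched the same way as the simple wrapper, with the executable and its input file as arguments.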

Also, double-check that your server has CUDA-capable GPUs, the drivers and CUDA toolkit are installed, and Quokka is built with GPU support (-DAMReX_GPU_BACKEND=CUDA). If you want to isolate the issue, try running with a single MPI rank and explicitly set CUDA_VISIBLE_DEVICES=0 to confirm basic GPU functionality.
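
One way to do the first of these checks is to compare the number of GPUs the driver reports against the number of ranks. The sketch below assumes nvidia-smi is installed and that `nvidia-smi -L` prints one line per device starting with "GPU <index>:"; count_gpus and check_ranks are hypothetical helper names, and 8 is the rank count from the mpirun command above.

```shell
#!/bin/bash
# Count "GPU <n>: ..." lines, as printed by `nvidia-smi -L`, read from stdin.
count_gpus() {
    grep -c '^GPU [0-9]' || true   # grep -c prints 0 but exits 1 when nothing matches
}

# Hypothetical helper: is one GPU per local rank feasible?
check_ranks() {
    local ranks="$1" gpus="$2"
    if [ "$gpus" -ge "$ranks" ]; then
        echo "ok: $gpus GPUs for $ranks ranks"
    else
        echo "too few GPUs: $gpus GPUs for $ranks ranks"
        return 1
    fi
}

if command -v nvidia-smi >/dev/null 2>&1; then
    check_ranks 8 "$(nvidia-smi -L | count_gpus)" || true
else
    echo "nvidia-smi not found; the NVIDIA driver may not be installed"
fi
```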

Let me know if you need help with a wrapper script or further troubleshooting!

2 replies

tianninglyu Sep 2, 2025
Author

Hi @dosu, I've run into a new problem:

Initializing AMReX (25.08)...
MPI initialized with 8 MPI processes
MPI initialized with thread support level 0
Initializing CUDA...
CUDA initialized with 8 devices.
AMReX (25.08) initialized
[node64:2051830:0:2051830] cma_ep.c:85 process_vm_readv(pid=2051834 {0x13ae3c936000,524288}-->{0x58de5000000,524288}) returned -1: Bad address
==== backtrace (tid:2051830) ====
 0 /lib/libucs.so.0(ucs_handle_error+0x2ec) [0x13b0f95c429c]
 1 /lib/libucs.so.0(ucs_fatal_error_message+0xb2) [0x13b0f95c4442]
 2 /lib/libucs.so.0(ucs_log_default_handler+0x95e) [0x13b0f95c5d4e]
 3 /lib/libucs.so.0(ucs_log_dispatch+0xf6) [0x13b0f95c1776]
 4 /lib/ucx/libuct_cma.so.0(+0x3e49) [0x13b0ee457e49]
 5 /lib/ucx/libuct_cma.so.0(uct_cma_ep_tx+0x201) [0x13b0ee458071]
 6 /lib/libuct.so.0(uct_scopy_ep_progress_tx+0x6b) [0x13b0edf052eb]
 7 /lib/libucs.so.0(ucs_arbiter_dispatch_nonempty+0xb4) [0x13b0f95bf7e4]
 8 /lib/libuct.so.0(uct_scopy_iface_progress+0x88) [0x13b0edf0f658]
 9 /lib/libucs.so.0(+0x294b0) [0x13b0f95c84b0]
10 /lib/libucp.so.0(ucp_worker_progress+0x5a) [0x13b0edf6f7ba]
11 /usr/local/openmpi/lib/libopen-pal.so.80(opal_progress+0x34) [0x13b0ee4fb8f4]
12 /usr/local/openmpi/lib/libmpi.so.40(ompi_request_default_test_all+0x63) [0x13b0f9696a93]
13 /usr/local/openmpi/lib/libmpi.so.40(PMPI_Testall+0xaa) [0x13b0f96dd15a]
14 src/problems/HydroBlast3D/test_hydro3d_blast(+0x238983) [0x56b8c3654983]
15 src/problems/HydroBlast3D/test_hydro3d_blast(+0x12e1ac) [0x56b8c354a1ac]
16 src/problems/HydroBlast3D/test_hydro3d_blast(+0x170849) [0x56b8c358c849]
17 src/problems/HydroBlast3D/test_hydro3d_blast(+0x170f56) [0x56b8c358cf56]
18 src/problems/HydroBlast3D/test_hydro3d_blast(+0x171302) [0x56b8c358d302]
19 src/problems/HydroBlast3D/test_hydro3d_blast(+0x4b6794) [0x56b8c38d2794]
20 src/problems/HydroBlast3D/test_hydro3d_blast(+0x1768c9) [0x56b8c35928c9]
21 src/problems/HydroBlast3D/test_hydro3d_blast(+0x89545) [0x56b8c34a5545]
22 src/problems/HydroBlast3D/test_hydro3d_blast(+0x71f60) [0x56b8c348df60]
23 /lib/x86_64-linux-gnu/libc.so.6(+0x2a1ca) [0x13b0ee62a1ca]
24 /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x8b) [0x13b0ee62a28b]
25 src/problems/HydroBlast3D/test_hydro3d_blast(+0x771e5) [0x56b8c34931e5]

dosubot[bot] bot Sep 2, 2025

This error is coming from the UCX CMA (Cross Memory Attach) transport used by MPI for inter-process communication, not from Quokka itself. It's often caused by kernel restrictions, permissions, or incompatibilities between your OpenMPI/UCX versions and the system configuration.

To troubleshoot:

1. Try disabling the CMA transport by setting the environment variable
       export UCX_TLS=^cma
   before running your mpirun command. This tells UCX to avoid using CMA, which is often the source of these "Bad address" errors.
2. Make sure you're running as a regular user (not root), as some kernels restrict CMA for root processes.
3. Check your kernel's Yama security settings (/proc/sys/kernel/yama/ptrace_scope). If it's set to a restrictive value, CMA may not work between processes.
4. Confirm that your OpenMPI and UCX versions are compatible and CUDA-aware.
5. If the problem persists, try running with a single MPI rank to see if the error is related to multi-process communication.

Quokka doesn't control UCX or MPI transport selection, so these settings need to be managed at the environment or job launcher level. If disabling CMA resolves the issue, you can keep UCX_TLS=^cma in your environment for future runs.
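
The ptrace_scope check and the CMA exclusion can be combined into one short pre-flight script, sketched below. The value descriptions follow the Linux Yama documentation; describe_ptrace_scope is a hypothetical helper name, and the script only reports the setting without changing it.

```shell
#!/bin/bash
# Map a Yama ptrace_scope value to a short description (hypothetical helper).
describe_ptrace_scope() {
    case "$1" in
        0) echo "classic ptrace permissions" ;;
        1) echo "restricted: only descendants may be attached" ;;
        2) echo "admin-only attach" ;;
        3) echo "no attach" ;;
        *) echo "unknown value: $1" ;;
    esac
}

scope_file=/proc/sys/kernel/yama/ptrace_scope
if [ -r "$scope_file" ]; then
    scope=$(cat "$scope_file")
    echo "ptrace_scope=$scope ($(describe_ptrace_scope "$scope"))"
else
    echo "Yama ptrace_scope is not available on this system"
fi

# Exclude the CMA transport for everything launched from this shell.
export UCX_TLS='^cma'
echo "UCX_TLS=$UCX_TLS"
```

After this, mpirun can be launched from the same shell as before; adding -x UCX_TLS to the mpirun command asks Open MPI to forward the variable to the ranks explicitly.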
