Check for existence of CUDA devices prior to running accelerate tests #938

johnwikman · 2025-03-28T11:09:29Z

This PR adds a check to misc/test-spec.mc that verifies there is at least one CUDA device on which we can compile and run programs before trying to run the accelerate tests. This is necessary for containerized build environments that have nvcc installed but no access to any GPUs.
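The gating rule described above can be sketched in shell. This is only an illustration: the actual check lives in misc/test-spec.mc and is written in MCore, and the function name should_run_accelerate_tests is hypothetical. The sketch assumes the probe result is passed in as a string, which is empty when the probe failed and otherwise holds the device count.

```shell
#!/bin/sh
# Hypothetical gate: succeed only if the probe reported at least one device.
# $1 is the probe result: "" when CUDA programs could not be compiled and
# run, otherwise the number of devices as a decimal string.
should_run_accelerate_tests() {
  case "$1" in
    # Empty or non-numeric input is treated as "no usable CUDA devices".
    ''|*[!0-9]*) return 1 ;;
  esac
  # The accelerate tests need at least one device.
  [ "$1" -ge 1 ]
}
```

A caller could then write `if should_run_accelerate_tests "$count"; then ...; fi` and skip the accelerate tests in the else branch.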

The check is performed by a function cudaGetDeviceCount, which I added in cuda/sys.mc. It returns None () if it cannot run CUDA programs on the system; otherwise it returns the number of available devices wrapped in a Some. There is also a file test/examples/cuda/device_count.mc which can be used to quickly test the behavior of this function.
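One way such a probe could work is to compile and run a tiny CUDA program and read the count from its output. The following is a sketch under that assumption, not the code in cuda/sys.mc. The shell function names and the NVCC override variable are hypothetical; the embedded program uses the CUDA runtime call cudaGetDeviceCount, and the two messages are the ones shown in this PR.

```shell
#!/bin/sh
# Hypothetical probe: prints the device count and returns 0 on success
# (the "Some n" case); prints nothing and returns 1 otherwise ("None").
cuda_device_count() {
  nvcc_bin="${NVCC:-nvcc}"   # NVCC is an assumed override, handy for testing
  command -v "$nvcc_bin" >/dev/null 2>&1 || return 1
  tmpdir=$(mktemp -d) || return 1
  cat > "$tmpdir/probe.cu" <<'EOF'
#include <cstdio>
#include <cuda_runtime.h>
int main() {
  int n = 0;
  // Any runtime error is reported through a non-zero exit status.
  if (cudaGetDeviceCount(&n) != cudaSuccess) return 1;
  printf("%d\n", n);
  return 0;
}
EOF
  count=""
  # Both compiling and running must succeed for the count to be kept.
  if "$nvcc_bin" -o "$tmpdir/probe" "$tmpdir/probe.cu" >/dev/null 2>&1; then
    count=$("$tmpdir/probe" 2>/dev/null) || count=""
  fi
  rm -rf "$tmpdir"
  case "$count" in
    ''|*[!0-9]*) return 1 ;;
  esac
  printf '%s\n' "$count"
}

# Hypothetical reporter mirroring the messages printed by device_count.
report_cuda_devices() {
  if n=$(cuda_device_count); then
    echo "Found $n CUDA devices on your system."
  else
    echo "Could not compile and run CUDA programs in your environment."
  fi
}
```

Folding "nvcc missing", "compilation failed", and "runtime error" into a single failure result matches the behavior described above, where containers without nvcc and containers without GPUs both produce the same message.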

I tested this under various runtime conditions on a server that has access to 4 GPUs. The runtime conditions were constrained through containerization. The containers were launched as:

podman run --rm -it localhost/mikinglang/baseline:v8-debian12.6-linux-amd64 bash
podman run --rm -it localhost/mikinglang/baseline:v8-cuda11.4-linux-amd64 bash
podman run --rm -it --device "nvidia.com/gpu=all" localhost/mikinglang/baseline:v8-cuda11.4-linux-amd64 bash

The first container has neither GPUs nor nvcc. The second one has nvcc but no GPUs. The third one has nvcc and access to 4 GPUs.

In each container, I ran the following install script:

git clone https://github.com/johnwikman/miking.git \
&& cd miking \
&& git checkout cudacheck2 \
&& make install

I then compiled and ran the check program:

mi compile src/test/examples/cuda/device_count.mc
./device_count

This gives the expected output for each container, in the order listed above:

Could not compile and run CUDA programs in your environment.
Could not compile and run CUDA programs in your environment.
Found 4 CUDA devices on your system.

Also, running CUDA_VISIBLE_DEVICES="1,2" ./device_count in the third container instead prints Found 2 CUDA devices on your system., as expected, since only two of the four GPUs are then visible to the program.
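To repeat this kind of check for several device lists, a small loop is enough. The helper below is a sketch; the name run_with_device_lists is hypothetical, and it assumes the compiled check program is passed as the first argument.

```shell
#!/bin/sh
# Hypothetical helper: run a device-count binary once per
# CUDA_VISIBLE_DEVICES setting and label each result.
# Usage: run_with_device_lists ./device_count "0" "1,2" "0,1,2,3"
run_with_device_lists() {
  bin="$1"
  shift
  for devices in "$@"; do
    printf 'CUDA_VISIBLE_DEVICES=%s: ' "$devices"
    # The variable is set only in the environment of this one command.
    CUDA_VISIBLE_DEVICES="$devices" "$bin"
  done
}
```

Setting the variable per command, rather than exporting it, keeps each run independent of the previous one.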

johnwikman added 7 commits March 25, 2025 15:52

Add a file to check for presence of CUDA devices.

507f938

Change function name

f6222da

Add an explicit check for CUDA devices

ed1bce2

Remove the timeout check from running system commands, this does not appear to work or is otherwise not well documented.

5bc324b

Update the test spec to actually check that CUDA devices exist before running the accelerate tests

871570a

update comment

bf7e999

remove the todo

54eba23

david-broman merged commit 80cd279 into miking-lang:develop Apr 6, 2025
1 of 2 checks passed
