Check for existence of CUDA devices prior to running accelerate tests #938
This PR adds a check in `misc/test-spec.mc` that verifies there is at least one CUDA device we can compile and run programs on before trying to run the accelerate tests. This is necessary for containerized build environments, which will have `nvcc` installed but no access to any GPUs.

This check is done by a function `cudaGetDeviceCount` that I added to a file `cuda/sys.mc`. This function returns `None ()` if it cannot run CUDA programs on the system. Otherwise, it returns the number of available devices wrapped in a `Some`. There is also a file `test/examples/cuda/device_count.mc` that can be used to quickly test the behavior of this function.

I tested this under various runtime conditions on a server that has access to 4 GPUs. The runtime conditions were constrained through containerization. The containers were launched as:
```
podman run --rm -it localhost/mikinglang/baseline:v8-debian12.6-linux-amd64 bash
podman run --rm -it localhost/mikinglang/baseline:v8-cuda11.4-linux-amd64 bash
podman run --rm -it --device "nvidia.com/gpu=all" localhost/mikinglang/baseline:v8-cuda11.4-linux-amd64 bash
```

The first container has neither a GPU nor `nvcc`. The second one has `nvcc` but no GPUs. The third one has `nvcc` and access to all 4 GPUs.

We test this by running this install script:
We then compile and run the check program with:
We get the expected output for each respective container:
1. First container: `Could not compile and run CUDA programs in your environment.`
2. Second container: `Could not compile and run CUDA programs in your environment.`
3. Third container: `Found 4 CUDA devices on your system.`

Also, running `CUDA_VISIBLE_DEVICES="1,2" ./device_count` in the third container instead prints `Found 2 CUDA devices on your system.`, which is as expected, since only two of the four GPUs are then visible to the program.
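To illustrate the general idea behind such a probe, here is a minimal Python sketch. It is not the MCore code in `cuda/sys.mc`. It assumes the check works by compiling a tiny CUDA program with `nvcc` and running it, and the names `cuda_device_count`, `report`, and `should_run_accelerate_tests` are hypothetical. Like `cudaGetDeviceCount` described above, it returns `None` when CUDA programs cannot be compiled or run, and the device count otherwise.

```python
import os
import shutil
import subprocess
import tempfile
from typing import Dict, Optional

# Hypothetical probe program (an assumption, not taken from cuda/sys.mc):
# it asks the CUDA runtime how many devices are visible and exits non-zero
# if the runtime reports an error (e.g. no GPU or no driver).
PROBE_SRC = r"""
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int count = 0;
    if (cudaGetDeviceCount(&count) != cudaSuccess) {
        return 1;
    }
    std::printf("%d\n", count);
    return 0;
}
"""


def cuda_device_count(env: Optional[Dict[str, str]] = None) -> Optional[int]:
    """Sketch: return the number of visible CUDA devices, or None if a CUDA
    program cannot be compiled and run in this environment."""
    nvcc = shutil.which("nvcc")
    if nvcc is None:
        return None  # no compiler on PATH, e.g. the first container
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "probe.cu")
        exe = os.path.join(tmp, "probe")
        with open(src, "w") as f:
            f.write(PROBE_SRC)
        try:
            # Compile with the inherited environment; run with `env` so that
            # variables such as CUDA_VISIBLE_DEVICES only affect the probe.
            subprocess.run([nvcc, src, "-o", exe], check=True, capture_output=True)
            result = subprocess.run(
                [exe], check=True, capture_output=True, text=True, env=env
            )
        except (subprocess.CalledProcessError, OSError):
            return None  # compiling or running failed, e.g. nvcc but no GPU
        try:
            return int(result.stdout.strip())
        except ValueError:
            return None  # unexpected probe output


def report(count: Optional[int]) -> str:
    """Format the result like the device_count example output."""
    if count is None:
        return "Could not compile and run CUDA programs in your environment."
    return f"Found {count} CUDA devices on your system."


def should_run_accelerate_tests(count: Optional[int]) -> bool:
    """Hypothetical gate: only run the accelerate tests with >= 1 device."""
    return count is not None and count >= 1
```

For example, `report(cuda_device_count({**os.environ, "CUDA_VISIBLE_DEVICES": "1,2"}))` restricts only the probe run, analogous to the `CUDA_VISIBLE_DEVICES="1,2" ./device_count` experiment. Returning `Optional[int]` rather than `0` on failure keeps "cannot run CUDA programs at all" distinguishable from a successful query, mirroring the `None ()`/`Some` split of the MCore function.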