Commit 463f8f0
feat(launcher): expose GPUs to eval container via NVIDIA_VISIBLE_DEVICES

Benchmarks like compute-eval need to compile and execute CUDA code
inside the eval container. Without GPU access, nvcc can't detect
the target architecture and compiled binaries fail with
cudaErrorInsufficientDriver.

Export NVIDIA_VISIBLE_DEVICES=all before the eval srun and pass it
through to the container. This makes pyxis/enroot expose the parent
job's GPUs to the eval container.

Validated with compute-eval on HSG: pass@1 went from 0% (no GPU)
to 51.25% (with GPU access).

Signed-off-by: Wojciech Prazuch <wprazuch@nvidia.com>

1 parent 4d51bee
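
The diff body is not preserved on this page, so the following is only a minimal sketch of the mechanism the commit message describes, not the launcher's actual code. It assumes the executor assembles its sbatch script as text. The helper name `build_eval_step`, the image name, and the `HF_TOKEN` variable are hypothetical. The pyxis `--container-env` flag is used on the assumption that this is how the variable is passed through to the container.

```python
import shlex
from typing import Iterable


def build_eval_step(image: str, command: str, env_names: Iterable[str]) -> str:
    """Return sbatch-script lines that start the eval container with GPU access.

    Hypothetical helper for illustration; not part of nemo_evaluator_launcher.
    """
    names = list(env_names)
    # Make sure the variable is in the pass-through list exactly once.
    if "NVIDIA_VISIBLE_DEVICES" not in names:
        names.append("NVIDIA_VISIBLE_DEVICES")

    lines = [
        # Exported in the job environment before the eval srun runs.
        "export NVIDIA_VISIBLE_DEVICES=all",
        # Assumption: pyxis forwards the listed variables from the job
        # environment into the container via --container-env.
        "srun --container-image={} --container-env={} bash -c {}".format(
            shlex.quote(image), ",".join(names), shlex.quote(command)
        ),
    ]
    return "\n".join(lines)


script = build_eval_step("example/eval:latest", "nvidia-smi -L", ["HF_TOKEN"])
print(script)
```

Running `nvidia-smi -L` as the container command, as in the usage line above, is one quick way to check that the parent job's GPUs are visible inside the eval container before starting a full benchmark run.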
1 file changed, 8 additions and 1 deletion, under packages/nemo-evaluator-launcher/src/nemo_evaluator_launcher/executors/slurm
Diff (line contents not preserved): one hunk covering original lines 1014-1024 and new lines 1014-1031. One line is added at new line 1017, and original line 1021 is replaced by new lines 1022-1028.