Closed
runTheMatrix.py creates and executes jobs without any kind of GPU assignment.
On a machine with a single GPU, this is not an issue.
On a machine with more than one GPU, for example
$ rocmComputeCapabilities
0 gfx90a:sramecc+:xnack- AMD Instinct MI250X
1 gfx90a:sramecc+:xnack- AMD Instinct MI250X
2 gfx90a:sramecc+:xnack- AMD Instinct MI250X
3 gfx90a:sramecc+:xnack- AMD Instinct MI250X
4 gfx90a:sramecc+:xnack- AMD Instinct MI250X
5 gfx90a:sramecc+:xnack- AMD Instinct MI250X
6 gfx90a:sramecc+:xnack- AMD Instinct MI250X
7 gfx90a:sramecc+:xnack- AMD Instinct MI250X
or
$ cudaComputeCapabilities
0 8.9 NVIDIA L4
1 8.9 NVIDIA L4
2 8.9 NVIDIA L4
3 8.9 NVIDIA L4
the result is that all concurrent jobs try to use all GPUs at once, which is quite inefficient.
A better approach would be to assign a different GPU to each job, for example in a round-robin fashion.
If there are more concurrent jobs than GPUs, the GPUs will still be shared - but to a much lesser extent than now.
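One way to sketch this (not the actual runTheMatrix.py code; `gpu_env_for_job` is a hypothetical helper) is to pick the GPU as the job index modulo the number of GPUs, and pin each job to it through its environment. Setting both `CUDA_VISIBLE_DEVICES` and `HIP_VISIBLE_DEVICES` makes the same mechanism work on NVIDIA and AMD machines:

```python
import os

def gpu_env_for_job(job_index, num_gpus, base_env=None):
    """Build an environment that pins a job to a single GPU, round-robin.

    job_index: sequential index of the job being launched
    num_gpus:  number of GPUs available on the machine
    base_env:  environment to extend (defaults to os.environ)
    """
    env = dict(base_env if base_env is not None else os.environ)
    gpu = job_index % num_gpus  # round-robin: jobs wrap around over the GPUs
    # CUDA and HIP both honour these variables to restrict device visibility.
    env["CUDA_VISIBLE_DEVICES"] = str(gpu)
    env["HIP_VISIBLE_DEVICES"] = str(gpu)
    return env
```

The launcher would then start each job with this environment, e.g. `subprocess.Popen(cmd, env=gpu_env_for_job(i, num_gpus))`, so job 0 sees only GPU 0, job 1 only GPU 1, and with 4 GPUs job 4 wraps around to GPU 0 again.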