feature request: runTheMatrix.py should assign a different GPU to each job #47337

Closed
@fwyzard

Description

runTheMatrix.py creates and executes jobs without any kind of GPU assignment.

On a machine with a single GPU, this is not an issue.

On a machine with more than one GPU, for example

$ rocmComputeCapabilities 
   0    gfx90a:sramecc+:xnack-    AMD Instinct MI250X
   1    gfx90a:sramecc+:xnack-    AMD Instinct MI250X
   2    gfx90a:sramecc+:xnack-    AMD Instinct MI250X
   3    gfx90a:sramecc+:xnack-    AMD Instinct MI250X
   4    gfx90a:sramecc+:xnack-    AMD Instinct MI250X
   5    gfx90a:sramecc+:xnack-    AMD Instinct MI250X
   6    gfx90a:sramecc+:xnack-    AMD Instinct MI250X
   7    gfx90a:sramecc+:xnack-    AMD Instinct MI250X

or

$ cudaComputeCapabilities 
   0     8.9    NVIDIA L4
   1     8.9    NVIDIA L4
   2     8.9    NVIDIA L4
   3     8.9    NVIDIA L4

the result is that all jobs try to use all GPUs, which is quite inefficient.

A better approach would be to assign a different GPU to each job, for example in a round-robin fashion.
If there are more concurrent jobs than GPUs, some GPUs will still be shared, but to a much lesser extent than now.
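A minimal sketch of what this could look like, assuming the jobs are launched as subprocesses and that restricting device visibility through CUDA_VISIBLE_DEVICES (NVIDIA) and ROCR_VISIBLE_DEVICES (ROCm) is acceptable. The device list and job command lines below are placeholders for illustration, not the actual runTheMatrix.py internals:

    import itertools
    import os
    import subprocess

    # Placeholders: in practice the device ids would come from enumerating the
    # available GPUs (e.g. cudaComputeCapabilities / rocmComputeCapabilities),
    # and the commands would be the workflow steps built by runTheMatrix.py.
    gpu_ids = [0, 1, 2, 3]
    commands = ["cmsRun step1.py", "cmsRun step2.py", "cmsRun step3.py"]

    # Cycle over the devices so jobs are assigned round-robin.
    gpu_cycle = itertools.cycle(gpu_ids)

    processes = []
    for command in commands:
        gpu = next(gpu_cycle)
        env = os.environ.copy()
        # Restrict each job to a single device: CUDA_VISIBLE_DEVICES is honoured
        # by the NVIDIA runtime, ROCR_VISIBLE_DEVICES by the ROCm runtime.
        env["CUDA_VISIBLE_DEVICES"] = str(gpu)
        env["ROCR_VISIBLE_DEVICES"] = str(gpu)
        processes.append(subprocess.Popen(command.split(), env=env))

    for p in processes:
        p.wait()

With this scheme each job only sees one device, so the sharing when jobs outnumber GPUs is limited to the jobs that happen to land on the same device.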
