update handling of GPU workflows in runTheMatrix.py #46069

`runTheMatrix.py` has some GPU-related options:
```
GPU-related options:
  These options are only meaningful when --gpu is used, and is not set to forbidden.

  --gpu [{forbidden,optional,required}], --requires-gpu [{forbidden,optional,required}]
                        Enable GPU workflows. Possible options are "forbidden" (default), "required" (implied if no argument is given), or "optional". (default: forbidden)
  --gpu-memory GPUMEMORYMB
                        Specify the minimum amount of GPU memory required by the job, in MB. (default: 8000)
  --cuda-capabilities CUDACAPABILITIES
                        Specify a comma-separated list of CUDA "compute capabilities", or GPU hardware architectures, that the job can use. (default: 6.0,6.1,6.2,7.0,7.2,7.5,8.0,8.6)
  --cuda-runtime CUDARUNTIME
                        Specify major and minor version of the CUDA runtime used to build the application. (default: 12.4)
  --force-gpu-name GPUNAME
                        Request a specific GPU model, e.g. "Tesla T4" or "NVIDIA GeForce RTX 2080". The default behaviour is to accept any supported GPU. (default: )
  --force-cuda-driver-version CUDADRIVERVERSION
                        Request a specific CUDA driver version, e.g. 470.57.02. The default behaviour is to accept any supported CUDA driver version. (default: )
  --force-cuda-runtime-version CUDARUNTIMEVERSION
                        Request a specific CUDA runtime version, e.g. 11.4. The default behaviour is to accept any supported CUDA runtime version. (default: )
```

However, they affect only the creation of WMAgent (?) workflows, not the actual content of the workflow generated by `cmsDriver.py` and executed by `cmsRun`.

---

I would like to propose two changes:
  1. change the default for the `--gpu` option from `forbidden` to `optional`;
  2. propagate the meaning of the `--gpu` option to cmsDriver, via the `--accelerators` option.
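The first change could look roughly like the following `argparse` definition. This is a hypothetical sketch, not the actual `runTheMatrix.py` code. It only reproduces the behaviour described in the help text above: a bare `--gpu` implies `required`, and `--requires-gpu` is an alias. The only difference is that the default is switched to `optional`. The function name `make_parser` is made up for illustration.

```python
import argparse


def make_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mimicking the --gpu option of runTheMatrix.py."""
    parser = argparse.ArgumentParser(prog="runTheMatrix.py")
    group = parser.add_argument_group("GPU-related options")
    group.add_argument(
        "--gpu", "--requires-gpu",
        dest="gpu",
        nargs="?",                                   # the value is optional
        choices=["forbidden", "optional", "required"],
        const="required",                            # bare "--gpu" means "required"
        default="optional",                          # proposed default (currently "forbidden")
        help="Enable GPU workflows. (default: %(default)s)",
    )
    return parser


if __name__ == "__main__":
    for argv in ([], ["--gpu"], ["--gpu", "forbidden"]):
        print(argv, "->", make_parser().parse_args(argv).gpu)
```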

---

The first change is IMHO something we should do in its own right, but here it is motivated by minimising the impact of the second change on the cmsDriver workflows.

---

The second change proposes to map:
  - `--gpu optional` to the current behaviour, that is, no extra cmsDriver options
  - `--gpu forbidden` to `cmsDriver.py --accelerators cpu`
  - `--gpu required` to `cmsDriver.py --accelerators gpu-*`
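A minimal sketch of this mapping follows. It assumes the extra options would be appended when `runTheMatrix.py` assembles each `cmsDriver.py` command line. The names `GPU_MODE_TO_CMSDRIVER_ARGS`, `cmsdriver_accelerator_args` and `build_command` are hypothetical, and `step3` is only a placeholder argument.

```python
import shlex
from typing import Dict, List

# Hypothetical table, not part of runTheMatrix.py: extra cmsDriver.py
# arguments corresponding to each value of the --gpu option.
GPU_MODE_TO_CMSDRIVER_ARGS: Dict[str, List[str]] = {
    "optional": [],                           # current behaviour: no extra options
    "forbidden": ["--accelerators", "cpu"],   # never use GPUs
    "required": ["--accelerators", "gpu-*"],  # always use GPUs
}


def cmsdriver_accelerator_args(gpu_mode: str) -> List[str]:
    """Return the extra cmsDriver.py arguments for a given --gpu mode."""
    try:
        # return a copy, so callers cannot modify the shared table
        return list(GPU_MODE_TO_CMSDRIVER_ARGS[gpu_mode])
    except KeyError:
        raise ValueError(f"unknown --gpu mode: {gpu_mode!r}") from None


def build_command(base_args: List[str], gpu_mode: str) -> str:
    """Append the accelerator options and quote every argument for a shell."""
    return shlex.join(["cmsDriver.py", *base_args, *cmsdriver_accelerator_args(gpu_mode)])


if __name__ == "__main__":
    for mode in ("optional", "forbidden", "required"):
        print(mode, "->", build_command(["step3"], mode))
```

If the command goes through a shell, `gpu-*` has to be quoted, otherwise the shell may try to expand it as a file glob. `shlex.join` takes care of that and produces `'gpu-*'`.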

By default cmsDriver does not impose any restrictions on the usage of GPUs.
Passing `--accelerators cpu` sets the job's `process.options.accelerators` to `[ 'cpu' ]`, which prevents the use of GPUs in a CUDA or Alpaka workflow.
Passing `--accelerators gpu-*` sets the job's `process.options.accelerators` to `[ 'gpu-*' ]`, which requires the use of GPUs in a CUDA or Alpaka workflow.
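The effect of these settings can be modelled as glob-style matching of the requested patterns against the accelerators available on the node. This is a simplified model under stated assumptions, not the actual `cmsRun` implementation. The accelerator labels `cpu` and `gpu-nvidia`, and the use of `'*'` to represent "no restriction", are assumptions of the sketch. The job is modelled as failing when none of the requested accelerators is available.

```python
from fnmatch import fnmatchcase
from typing import List, Sequence


def enabled_accelerators(requested: Sequence[str], available: Sequence[str]) -> List[str]:
    """Return the available accelerators selected by the requested patterns.

    Simplified model, not the real cmsRun logic: each entry of `requested`
    (e.g. the content of process.options.accelerators) is treated as a
    shell-style glob and matched against the accelerator labels of the node.
    """
    enabled = [
        acc for acc in available
        if any(fnmatchcase(acc, pattern) for pattern in requested)
    ]
    if not enabled:
        # e.g. requesting "gpu-*" on a CPU-only node: the job cannot run
        raise RuntimeError(
            f"none of the requested accelerators {list(requested)} "
            f"is available (available: {list(available)})"
        )
    return enabled


if __name__ == "__main__":
    node = ["cpu", "gpu-nvidia"]                  # hypothetical GPU node
    print(enabled_accelerators(["*"], node))      # ['cpu', 'gpu-nvidia']
    print(enabled_accelerators(["cpu"], node))    # ['cpu']
    print(enabled_accelerators(["gpu-*"], node))  # ['gpu-nvidia']
```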

The advantage of this approach is that we no longer need to triplicate all Alpaka-related workflows: one version to run on any backend, one version to run only on CPU, one version to run only on GPUs.

---

As these changes would affect O&C and PPD operations, what is their opinion?
