Description
runTheMatrix.py has some GPU-related options:
GPU-related options:
These options are only meaningful when --gpu is used, and is not set to forbidden.
--gpu [{forbidden,optional,required}], --requires-gpu [{forbidden,optional,required}]
Enable GPU workflows. Possible options are "forbidden" (default), "required" (implied if no argument is given), or "optional". (default: forbidden)
--gpu-memory GPUMEMORYMB
Specify the minimum amount of GPU memory required by the job, in MB. (default: 8000)
--cuda-capabilities CUDACAPABILITIES
Specify a comma-separated list of CUDA "compute capabilities", or GPU hardware architectures, that the job can use. (default: 6.0,6.1,6.2,7.0,7.2,7.5,8.0,8.6)
--cuda-runtime CUDARUNTIME
Specify major and minor version of the CUDA runtime used to build the application. (default: 12.4)
--force-gpu-name GPUNAME
Request a specific GPU model, e.g. "Tesla T4" or "NVIDIA GeForce RTX 2080". The default behaviour is to accept any supported GPU. (default: )
--force-cuda-driver-version CUDADRIVERVERSION
Request a specific CUDA driver version, e.g. 470.57.02. The default behaviour is to accept any supported CUDA driver version. (default: )
--force-cuda-runtime-version CUDARUNTIMEVERSION
Request a specific CUDA runtime version, e.g. 11.4. The default behaviour is to accept any supported CUDA runtime version. (default: )
However, they affect only the creation of WMAgent (?) workflows, not the actual content of the workflow generated by cmsDriver.py and executed by cmsRun.
I would like to propose two changes:
- change the default for the --gpu option from forbidden to optional;
- propagate the meaning of the --gpu option to cmsDriver, via the --accelerators option.
The first change is IMHO something we should do in its own right, but here it is motivated by minimising the impact of the second change on the cmsDriver workflows.
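For illustration, the first change could look roughly like the argparse sketch below. This is not the actual runTheMatrix.py code: the function name build_parser and the dest name are assumptions, and only the option names, choices, and the "required if no argument is given" behaviour are taken from the help text above.

```python
import argparse


def build_parser(default_gpu="optional"):
    """Hypothetical parser mirroring the --gpu help text, with the proposed default."""
    parser = argparse.ArgumentParser(prog="runTheMatrix-sketch")
    parser.add_argument(
        "--gpu", "--requires-gpu",
        dest="gpu",                    # assumed attribute name
        choices=["forbidden", "optional", "required"],
        nargs="?",                     # the argument may be omitted
        const="required",              # value used when --gpu is given without an argument
        default=default_gpu,           # proposed: "optional" instead of "forbidden"
        help="Enable GPU workflows.",
    )
    return parser


if __name__ == "__main__":
    parser = build_parser()
    print(parser.parse_args([]).gpu)          # option absent: the default applies
    print(parser.parse_args(["--gpu"]).gpu)   # bare option: the const applies
```

With this sketch, only the default value changes; giving --gpu without an argument still selects required.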
The second change proposes to map:
- --gpu optional to the current behaviour, that is, no extra cmsDriver options
- --gpu forbidden to cmsDriver.py --accelerators cpu
- --gpu required to cmsDriver.py --accelerators gpu-*
By default cmsDriver does not impose any restrictions on the usage of GPUs.
Passing --accelerators cpu sets the job's process.options.accelerators to [ 'cpu' ], which prevents the use of GPUs in a CUDA or Alpaka workflow.
Passing --accelerators gpu-* sets the job's process.options.accelerators to [ 'gpu-*' ], which requires the use of GPUs in a CUDA or Alpaka workflow.
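The proposed mapping can be sketched as a small helper. This is only an illustration: the function names accelerators_args and expected_accelerators are hypothetical, not part of runTheMatrix.py or cmsDriver.py, and the way the extra options would actually be attached to each workflow step is left open.

```python
def accelerators_args(gpu):
    """Return the extra cmsDriver.py arguments implied by a --gpu value (hypothetical helper)."""
    mapping = {
        "optional": [],                            # current behaviour: no extra options
        "forbidden": ["--accelerators", "cpu"],    # CPU only
        "required": ["--accelerators", "gpu-*"],   # GPU backends only
    }
    if gpu not in mapping:
        raise ValueError(f"unknown --gpu value: {gpu!r}")
    return list(mapping[gpu])


def expected_accelerators(gpu):
    """Return the value process.options.accelerators would be set to,
    or None when cmsDriver's default is left untouched."""
    args = accelerators_args(gpu)
    return [args[1]] if args else None


if __name__ == "__main__":
    for value in ("optional", "forbidden", "required"):
        print(value, accelerators_args(value), expected_accelerators(value))
```

If the cmsDriver.py command is assembled as a shell string rather than an argument list, gpu-* would need quoting so that the shell does not try to expand it as a glob.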
The advantage of this approach is that we no longer need to triplicate all Alpaka-related workflows: one version to run on any backend, one version to run only on CPU, one version to run only on GPUs.
As this change would affect O&C and PPD operations, what is their opinion?