Description
RRTMGP CUDA kernels take much longer when called from ClimaAtmos than in the corresponding RRTMGP benchmark for an equivalent problem size.
The corresponding ClimaAtmos and RRTMGP builds for the equivalent all-sky problem with aerosols are listed below. The number of aerosols is reduced to two to match the RRTMGP benchmark.
ClimaAtmos PR: #3740
ClimaAtmos build: gpu_aquaplanet_dyamond - strong scaling - 1 GPU - No MPI @ https://buildkite.com/clima/climaatmos-target-gpu-simulations/builds/414#0195ce32-92da-478d-9f29-282e94952871
RRTMGP build: GPU all-sky with aerosols DYAMOND benchmark @ https://buildkite.com/clima/rrtmgp-clima-a100-pipeline/builds/46#_
In this example, the longwave and shortwave CUDA kernels called from ClimaAtmos take about 1.5x and 1.6x as long, respectively, as the full longwave and shortwave solvers in the RRTMGP benchmark.
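For anyone reproducing the comparison, here is a minimal sketch of how the slowdown factors can be computed from the two sets of timings. The function name `slowdown`, the dictionary layout, and the timing values are all assumptions made for illustration; the numbers are placeholders, not measurements from the builds linked above.

```python
def slowdown(host_time_ms: float, benchmark_time_ms: float) -> float:
    """Return how many times longer the host-model run takes than the benchmark."""
    if benchmark_time_ms <= 0:
        raise ValueError("benchmark time must be positive")
    return host_time_ms / benchmark_time_ms


# Placeholder timings in milliseconds (hypothetical, NOT taken from the builds).
# "climaatmos_kernels": summed RRTMGP CUDA kernel time inside ClimaAtmos.
# "rrtmgp_solver": full solver time in the standalone RRTMGP benchmark.
timings_ms = {
    "longwave": {"climaatmos_kernels": 3.0, "rrtmgp_solver": 2.0},
    "shortwave": {"climaatmos_kernels": 4.0, "rrtmgp_solver": 2.5},
}

ratios = {
    name: slowdown(t["climaatmos_kernels"], t["rrtmgp_solver"])
    for name, t in timings_ms.items()
}

for name, ratio in ratios.items():
    print(f"{name}: {ratio:.1f}x")
```

Replace the placeholder values with the kernel and solver times reported in the two Buildkite logs to get the actual factors.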