According to Teja, ClimaLand simulations on GPU keep the GPU busy only 60% of the time. This suggests there could be a considerable speedup from launching fewer kernels that each do more work.
The integrated model is about 2x slower than the soil model alone, but the snow model is very fast. This suggests that the canopy model is a likely source of the poor performance. The canopy model launches many kernels that each do very little computation, because each one performs a pointwise computation (not a full column of work, as in the soil model).
Ideas to test:
- refactor canopy update aux to call a single large pointwise function, and check whether this has any impact on performance
- do the same for canopy boundary fluxes
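
To make the fusion idea concrete, here is a minimal sketch in plain Python (not ClimaLand code, and not Julia). It only counts launches; it does not measure GPU time. Everything in it is hypothetical: `launch` stands in for a single GPU kernel launch, and `step_a`, `step_b`, `step_c` are placeholder pointwise steps, not real canopy physics. The point is that composing the steps into one pointwise function gives the same result with one launch instead of three.

```python
# Hypothetical illustration only: plain Python standing in for GPU kernel launches.
launch_count = 0


def launch(pointwise_fn, *arrays):
    """Stand-in for one kernel launch: apply pointwise_fn at every point."""
    global launch_count
    launch_count += 1
    return [pointwise_fn(*args) for args in zip(*arrays)]


# Three small pointwise steps (placeholders, not real canopy physics).
def step_a(x):
    return 2.0 * x


def step_b(a, x):
    return a + x


def step_c(b):
    return b * b


def update_unfused(x):
    """One launch per step, with an intermediate array after each launch."""
    a = launch(step_a, x)
    b = launch(step_b, a, x)
    return launch(step_c, b)


def fused_point(x):
    """All three steps at a single point; intermediates stay local."""
    a = step_a(x)
    b = step_b(a, x)
    return step_c(b)


def update_fused(x):
    """A single launch that does all the pointwise work."""
    return launch(fused_point, x)


def count_launches(update, x):
    """Run an update and report (result, number of launches it used)."""
    global launch_count
    launch_count = 0
    result = update(x)
    return result, launch_count
```

Calling `count_launches(update_unfused, x)` and `count_launches(update_fused, x)` on the same input should return identical results with different launch counts. Whether the real canopy code speeds up after a similar refactor is exactly what the ideas above are meant to test.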