Optimized memory usage and added multithreading to 1-D reflected light calculations #402
Nicholaswogan wants to merge 32 commits into
Note that multithreading defaults to 1 thread (i.e., it is not parallel), to stay consistent with previous versions of PICASO and to prevent multithreading within an already parallel calculation (e.g., a retrieval). The user has to explicitly turn on multithreading with
raise Exception("The only available opacity methods are: resortrebin, preweighted, and resampled")
return opacityclass

class RTSolvers:
Is this a prelude to keeping track of memory allocation?
The idea with RTSolvers is to preserve the work memory needed for any RT calculation there (attached to the inputs class). This means that work memory does not need to be allocated and deallocated on every call to a given RT algorithm, which is slow. Instead, the allocation happens once.
However, the work memory needed for the new get_reflected_1d that I wrote is very small compared to what the old version needs.
So you probably wouldn't take much of a performance hit if you just allocated and deallocated the work memory on the fly.
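The trade-off discussed in this thread can be sketched as follows. This is a minimal illustration and not the PR's code: the Workspace class, the solve_allocating and solve_noalloc functions, and the toy arithmetic are all hypothetical stand-ins for a real RT solver. It only shows scratch arrays being allocated once and then reused through NumPy's out= argument.

```python
import numpy as np


class Workspace:
    """Hypothetical scratch space, allocated once and reused across calls."""

    def __init__(self, nlayer, nwave):
        self.tmp = np.empty((nlayer, nwave))
        self.result = np.empty(nwave)


def solve_allocating(tau, w0):
    # Toy stand-in for an RT kernel: every call creates new temporaries.
    tmp = w0 * np.exp(-tau)
    return tmp.sum(axis=0)


def solve_noalloc(tau, w0, work):
    # Same arithmetic, but all intermediates are written into `work`.
    np.negative(tau, out=work.tmp)
    np.exp(work.tmp, out=work.tmp)
    np.multiply(w0, work.tmp, out=work.tmp)
    np.sum(work.tmp, axis=0, out=work.result)
    return work.result


rng = np.random.default_rng(0)
tau = rng.random((10, 50))  # toy optical depths, shape (nlayer, nwave)
w0 = rng.random((10, 50))   # toy single-scattering albedos
work = Workspace(10, 50)    # allocated once, outside the call loop

for _ in range(3):
    out = solve_noalloc(tau, w0, work)  # reuses the same scratch arrays
```

Note that solve_noalloc returns the workspace's own result array, so a caller that needs to keep a spectrum across calls would have to copy it.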
Closed, because I'm redoing this in a simpler way.
This PR should be accepted after #399.
Summary
This branch refactors the 1-D reflected-light radiative-transfer path to reduce allocations and peak memory usage while preserving the existing outputs. It also enables multithreaded reflected-light execution by parallelizing over wavelength. The PR adds unit tests that demonstrate that the new non-allocating version of get_reflected_1d produces the same results (over all if branches) as the old allocating get_reflected_1d. The new implementation in this PR uses about half the peak memory and is a factor of 2 faster using 1 thread, compared to the old implementation.
This PR focuses on get_reflected_1d, but the same approach could easily be extended to other RT calculations (3D, thermal, and transmission).
Peak memory use (and code runtimes) can be reduced further by reworking get_opacities and compute_opacities; however, the gains there are probably less substantial than those in this PR.
Relevant to #263
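As a rough illustration of parallelizing over wavelength with per-thread scratch space, here is a minimal sketch. It is not the PR's implementation, and how the PR actually implements threading is not shown here: the kernel and reflected_parallel functions and the toy arithmetic are hypothetical, and Python threads with NumPy are assumed only for the sake of a self-contained example. The nthreads=1 default mirrors the serial default described in this PR.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np


def kernel(tau, w0, out, scratch):
    # Toy per-wavelength calculation (hypothetical); writes only into
    # the preallocated `scratch` and `out` arrays.
    np.negative(tau, out=scratch)
    np.exp(scratch, out=scratch)
    np.multiply(w0, scratch, out=scratch)
    np.sum(scratch, axis=0, out=out)


def reflected_parallel(tau, w0, nthreads=1):
    nlayer, nwave = tau.shape
    result = np.empty(nwave)
    # Split the wavelength axis into one contiguous block per thread.
    bounds = np.linspace(0, nwave, nthreads + 1).astype(int)
    # One scratch array per thread, so threads never share work memory.
    scratch = [
        np.empty((nlayer, hi - lo))
        for lo, hi in zip(bounds[:-1], bounds[1:])
    ]

    def run(i):
        lo, hi = bounds[i], bounds[i + 1]
        # Each thread writes to a disjoint slice of `result`.
        kernel(tau[:, lo:hi], w0[:, lo:hi], result[lo:hi], scratch[i])

    with ThreadPoolExecutor(max_workers=nthreads) as pool:
        list(pool.map(run, range(nthreads)))  # consume to surface errors
    return result


rng = np.random.default_rng(1)
tau = rng.random((8, 50))
w0 = rng.random((8, 50))
serial = reflected_parallel(tau, w0)               # default: 1 thread
threaded = reflected_parallel(tau, w0, nthreads=4)  # explicit opt-in
```

Because each wavelength is independent in this toy kernel, the serial and threaded results can be compared directly, which is the same kind of old-versus-new check the unit tests in this PR are described as performing. Any speedup from threads in this sketch depends on how much time NumPy spends with the GIL released, so it should not be read as a performance claim.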
What changed
- picaso/fluxes_noalloc.py.
- GetReflected1D and GetReflected1DWorkspace to hold persistent results and per-thread scratch space.
- justdoit.py to call the new class-based reflected solver.
Comparison to old code
I ran a test case that computes five reflected-light spectra of a clear-sky modern Earth. The table below summarizes the runtime and memory usage of both the old get_reflected_1d and the new implementation in this PR.
get_reflected_1d