
Optimized memory usage and added multithreading to 1-D reflected light calculations #402

Closed

Nicholaswogan wants to merge 32 commits into natashabatalha:dev from Nicholaswogan:memory

Conversation

@Nicholaswogan
Collaborator

This PR should be accepted after #399.

Summary

This branch refactors the 1-D reflected-light radiative-transfer path to reduce allocations and peak memory usage while preserving the existing outputs. It also enables multithreaded reflected-light execution by parallelizing over wavelength. The PR adds unit tests demonstrating that the new non-allocating version of get_reflected_1d produces the same results (over all if branches) as the old allocating get_reflected_1d. Compared to the old implementation, the new implementation in this PR uses about half the peak memory and is a factor of 2 faster on 1 thread.

This PR focuses on get_reflected_1d, but the same approach could easily be extended to other RT calculations (3-D, thermal, and transmission).

Peak memory use (and code runtime) can be reduced further by reworking get_opacities and compute_opacities; however, the gains there are probably less substantial than those in this PR.

Relevant to #263
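The parity tests described above can be sketched as follows. This is a hedged illustration only: the function bodies are trivial stand-ins for the real get_reflected_1d, and the point is the testing pattern, comparing the old allocating implementation against a new version that writes into a caller-provided buffer.

```python
# Illustrative parity-test pattern (NOT the actual PICASO test code):
# compare an old allocating routine against a new buffer-reusing one.
import numpy as np

def reflected_1d_old(tau, w0):
    # Stand-in for the old implementation: allocates a fresh result array.
    return np.exp(-tau) * w0

def reflected_1d_new(tau, w0, out=None):
    # Stand-in for the new implementation: writes into a caller-provided
    # buffer instead of allocating, so repeated calls do not churn memory.
    if out is None:
        out = np.empty_like(tau)
    np.exp(-tau, out=out)
    out *= w0
    return out

rng = np.random.default_rng(0)
tau = rng.random((10, 50))   # (layers, wavelength points)
w0 = rng.random((10, 50))

workspace = np.empty_like(tau)
assert np.allclose(reflected_1d_old(tau, w0),
                   reflected_1d_new(tau, w0, out=workspace))
```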

What changed

  • Added a no-allocation reflected-light implementation in picaso/fluxes_noalloc.py.
  • Introduced GetReflected1D and GetReflected1DWorkspace to hold persistent results and per-thread scratch space.
  • Reworked the 1D reflected-light path to stream over wavelength and reuse workspace buffers.
  • Added thread-safe parallel execution over wavelength with per-thread workspaces.
  • Updated justdoit.py to call the new class-based reflected solver.
  • Updated the climate reflected-light path to use a single persistent reflected workspace across the Gauss-angle loop.
  • Added and updated tests for:
    • tridiagonal solver parity
    • setup routine parity
    • 1D reflected-light parity
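The workspace pattern in the list above can be sketched as below. Class names mirror the PR's GetReflected1D and GetReflected1DWorkspace, but the fields and the toy solve loop are assumptions for illustration, not the actual PICASO code.

```python
# Minimal sketch of a persistent-workspace solver: scratch buffers are
# allocated once (one workspace per thread) and reused while streaming
# over wavelength, so the hot loop performs no allocations.
import numpy as np

class GetReflected1DWorkspace:
    """Per-thread scratch space plus a persistent result buffer."""
    def __init__(self, nlayer, nwno):
        self.scratch = np.empty(nlayer)       # reused at every wavelength
        self.flux = np.empty((nlayer, nwno))  # persistent result

class GetReflected1D:
    """Holds one workspace per thread; a parallel wavelength loop
    (e.g. numba's prange) would index by thread id."""
    def __init__(self, nlayer, nwno, nthreads=1):
        self.workspaces = [GetReflected1DWorkspace(nlayer, nwno)
                           for _ in range(nthreads)]

    def solve(self, tau):
        nlayer, nwno = tau.shape
        ws = self.workspaces[0]      # serial case: thread 0's workspace
        for i in range(nwno):        # stream over wavelength
            np.exp(-tau[:, i], out=ws.scratch)  # reuse scratch buffer
            ws.flux[:, i] = ws.scratch
        return ws.flux
```

Per-thread workspaces are what make the wavelength loop thread-safe: no two threads ever write to the same scratch buffer.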

Comparison to old code

I ran a test case that computes five reflected light spectra of a clear-sky modern Earth. The table below summarizes the runtime and memory usage of both the old get_reflected_1d and the new implementation in this PR.

| Version of get_reflected_1d | Threads | Runtime | Allocations | Peak memory | Total memory |
|---|---|---|---|---|---|
| Old | 1 | 53.9 s | 25,935,610 | 5.760 GB | 166.907 GB |
| New | 1 | 25.0 s | 743,485 | 2.395 GB | 37.089 GB |
| New | 4 | 11.8 s | 743,520 | 2.395 GB | 37.089 GB |
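For readers who want to reproduce this kind of comparison, numbers like runtime and peak memory can be gathered with the standard library alone. This is a generic measurement sketch, not the benchmark script used for the table; run_case is a stand-in for the actual five-spectrum Earth calculation.

```python
# Generic runtime / peak-memory measurement using only the stdlib
# (plus NumPy for a stand-in workload).
import time
import tracemalloc
import numpy as np

def run_case():
    # Stand-in for "compute five reflected light spectra".
    return [np.exp(-np.random.random((100, 1000))) for _ in range(5)]

tracemalloc.start()
t0 = time.perf_counter()
run_case()
runtime = time.perf_counter() - t0
current, peak = tracemalloc.get_traced_memory()
tracemalloc.stop()
print(f"runtime = {runtime:.3f} s, peak memory = {peak / 1e6:.1f} MB")
```

Note that tracemalloc only sees Python-level allocations; allocations inside compiled (e.g. numba-jitted) code need an external profiler.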

@Nicholaswogan
Collaborator Author

Note that multithreading defaults to 1 thread (i.e., it is not parallel), to be consistent with previous versions of PICASO and to prevent multithreading within an already-parallel calculation (e.g., a retrieval). The user has to explicitly turn on multithreading with numba.set_num_threads(num_threads), where num_threads is the number of threads to use.

Comment thread picaso/justdoit.py
raise Exception("The only available opacity methods are: resortrebin, preweighted, and resampled")
return opacityclass

class RTSolvers:
Owner
Is this a prelude to keeping track of memory allocation?

Collaborator Author

@Nicholaswogan Apr 23, 2026

The idea with RTSolvers is to preserve the work memory there (attached to the inputs class) needed for any RT calculation. This means that work memory does not need to be allocated and deallocated on every call to a given RT algorithm (which is slow). Instead, the allocation happens once.

Collaborator Author

However, the work memory needed for the new get_reflected_1d that I wrote is very small compared to what is needed for the old version.

So you probably wouldn't take much of a performance hit if you just allocated/deallocated the work memory on the fly.
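The tradeoff being discussed (allocate-once workspace vs. allocating on the fly) can be illustrated with a toy timing loop; sizes and iteration counts are arbitrary, and real results depend on the allocator and array sizes.

```python
# Toy comparison: allocating a buffer every call vs. reusing one
# persistent workspace buffer.
import time
import numpy as np

n, iters = 1000, 2000
buf = np.empty(n)  # persistent workspace, allocated once

t0 = time.perf_counter()
for _ in range(iters):
    tmp = np.empty(n)   # fresh allocation every call
    tmp[:] = 1.0
alloc_time = time.perf_counter() - t0

t0 = time.perf_counter()
for _ in range(iters):
    buf[:] = 1.0        # reuse the persistent buffer
reuse_time = time.perf_counter() - t0

print(f"alloc: {alloc_time:.4f} s, reuse: {reuse_time:.4f} s")
```

For small buffers the two are often close, which matches the point above: when the workspace is small, allocating on the fly costs little.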

@Nicholaswogan
Collaborator Author

Nicholaswogan commented Apr 24, 2026

Closed, because I'm redoing this in a simpler way.
