Improve clouds computation #201
base: main
Conversation
Pull Request Overview
This PR improves the cloud optics graph functionality by adding the allow_rechunk parameter to the compute cloud functions and updating the example notebook to reflect improved memory handling and performance adjustments.
- Added allow_rechunk=True in the cloud optics function calls for better handling of large outputs (see the sketch after this list).
- Revised the example notebook to modify chunk sizes, include a markdown note about Dask version requirements, and update the Dask cluster setup.
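A minimal sketch of how such a flag typically reaches Dask, assuming it is forwarded through `xarray.apply_ufunc` via `dask_gufunc_kwargs` (the standard xarray mechanism); `_tau_from_lwp`, the variable names, and the formula are illustrative stand-ins, not the actual `pyrte_rrtmgp` internals:

```python
import xarray as xr


def _tau_from_lwp(lwp, rel):
    # Toy per-chunk kernel: tau ~ (3/2) * LWP / (rho_water * r_eff),
    # standing in for the real cloud optics computation.
    return 1.5 * lwp / (1000.0 * rel * 1e-6)


def compute_cloud_optics(ds: xr.Dataset) -> xr.DataArray:
    return xr.apply_ufunc(
        _tau_from_lwp,
        ds["lwp"],
        ds["rel"],
        dask="parallelized",
        output_dtypes=[ds["lwp"].dtype],
        # Without this, apply_ufunc refuses inputs whose core dimensions
        # are chunked; allowing a rechunk lets Dask consolidate them.
        dask_gufunc_kwargs={"allow_rechunk": True},
    )
```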
Reviewed Changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| pyrte_rrtmgp/rrtmgp_cloud_optics.py | Added allow_rechunk parameter to compute_cloud_optics calls. |
| examples/dyamond_clouds/example.ipynb | Updated chunk sizes, added Dask version recommendation, and cluster configuration changes. |
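The diff carries the actual numbers; purely as an illustration of the kind of change involved, opening the input with explicit Dask chunks looks like this (the file name and chunk size below are hypothetical, not taken from the notebook):

```python
import xarray as xr

# Hypothetical path and chunk size; the real values live in the notebook diff.
ds = xr.open_dataset("dyamond_clouds.nc", chunks={"site": 20_000})
```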
Comments suppressed due to low confidence (2)
examples/dyamond_clouds/example.ipynb:79
- [nitpick] Consider clarifying this instruction by briefly noting what issues may arise with older versions of Dask. This can help users understand the necessity of the version requirement.
To avoid memory issues, please use Dask version 2025.3.0 or higher. A [fix](https://docs.dask.org/en/stable/changelog.html#v2025-3-0) for `apply_ufunc` was included in that release which solves the memory issues.
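One way a notebook could enforce this (not part of the PR, just a hedged illustration) is to fail fast on older Dask versions:

```python
from packaging.version import Version

import dask

# Guard against the apply_ufunc memory regression fixed in Dask 2025.3.0.
if Version(dask.__version__) < Version("2025.3.0"):
    raise RuntimeError(
        f"dask {dask.__version__} detected; upgrade to 2025.3.0 or newer "
        "to avoid memory issues in apply_ufunc-based cloud optics."
    )
```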
pyrte_rrtmgp/rrtmgp_cloud_optics.py:174
- Verify that adding the 'allow_rechunk' parameter does not introduce unintended behavior in parallel execution. If needed, update tests and documentation to reflect its expected impact.
allow_rechunk=True,
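A regression test along the lines the reviewer suggests might look like the sketch below; it is illustrative only, reusing the hypothetical `compute_cloud_optics` sketch from earlier rather than the real package function:

```python
import numpy as np
import xarray as xr


def test_compute_cloud_optics_accepts_chunked_input():
    ds = xr.Dataset(
        {
            "lwp": (("site", "layer"), np.random.rand(8, 4)),
            "rel": (("site", "layer"), 5.0 + np.random.rand(8, 4)),
        }
    ).chunk({"site": 2})
    tau = compute_cloud_optics(ds)  # hypothetical sketch defined earlier
    # The graph should build and compute without chunk-related errors.
    assert bool(tau.compute().notnull().all())
```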
Force-pushed from 49fdb7e to 8108b74
```
/Users/brendancol/miniconda3/envs/pyrte-312/lib/python3.12/site-packages/xarray/namedarray/core.py:264: UserWarning: Duplicate dimension names present: dimensions {'ncontact'} appear more than once in dims=('nf', 'ncontact', 'ncontact'). We do not yet support duplicate dimension names, but we do allow initial construction of the object. We recommend you rename the dims immediately to become distinct, as most xarray functionality is likely to fail silently if you do not. To rename the dimensions you will need to set the ``.dims`` attribute of each variable, e.g. var.dims=('x0', 'x1').
  self._dims = self._parse_dimensions(dims)
```
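The warning is incidental to the PR, but if one wanted to silence it, xarray's own suggestion is to give the repeated axis distinct names; a minimal sketch (the variable layout and new dimension names are assumptions):

```python
import xarray as xr


def dedupe_contact_dims(var: xr.Variable) -> xr.Variable:
    # Rebuild the variable with distinct names for the repeated
    # 'ncontact' axis, since xarray does not support duplicate dims.
    return xr.Variable(("nf", "ncontact_a", "ncontact_b"), var.data)
```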
@brendancol I was able to make the processing viable by using map_blocks at a higher level; doing the same in lower-level functions did not have much effect. Other factors that slow the process down are writing to disk and having HUGE outputs; aggregating them in the way they will be used helps with that.
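A minimal sketch of that higher-level `map_blocks` pattern, assuming `ds` is the chunked input Dataset and reusing the hypothetical `compute_cloud_optics` from above; the early `sum` stands in for whatever aggregation the results actually need:

```python
import xarray as xr


def process_chunk(chunk: xr.Dataset) -> xr.Dataset:
    # Run the whole pipeline on one in-memory chunk and aggregate
    # immediately, so the graph carries small reduced results instead
    # of a huge intermediate output.
    tau = compute_cloud_optics(chunk)
    return tau.sum("layer").to_dataset(name="tau_total")


result = ds.map_blocks(process_chunk).compute()
```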
@sehnem thanks for the update. I'll pull it and run locally, and get back with any comments.
…tion for 7 worker dask cluster and process_chunk refactor
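For context, a 7-worker local cluster of the kind this commit references might be configured like this (the worker count matches the commit message; per-worker threads and memory limit are assumptions, not taken from the notebook):

```python
from dask.distributed import Client, LocalCluster

# 7 workers per the commit; threads/memory settings are illustrative.
cluster = LocalCluster(n_workers=7, threads_per_worker=1, memory_limit="2GB")
client = Client(cluster)
```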
@sehnem @brendancol Should this PR get merged? How will we handle the VERY LARGE DATA required?
@RobertPincus I have no experience with large files in git repositories; they make clones heavy. But I once saw a repository where the big files were not cloned by default, and I think it is this feature. Probably Brendan knows better.
Included `allow_rechunk` in the compute cloud functions, as the output size gets very large and Dask 2025.3 included a fix for these cases. I was able to run the full examples on 16 GB of RAM, so it should not be an issue anymore.