
Conversation

@sehnem
Collaborator

@sehnem sehnem commented Apr 19, 2025

Added allow_rechunk to the compute cloud functions, since the output size gets very large and dask 2025.3 included a fix for these cases.

I was able to run the full examples on 16 GB of RAM, so it should no longer be an issue.
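For context, a minimal sketch of a memory-bounded local Dask cluster along the lines the example notebook sets up; the worker count and per-worker limit here are illustrative assumptions, not the notebook's exact values:

```python
from dask.distributed import Client, LocalCluster

# Illustrative sizing: 7 workers x 2 GB keeps the whole cluster
# comfortably within a 16 GB machine.
cluster = LocalCluster(n_workers=7, threads_per_worker=1, memory_limit="2GB")
client = Client(cluster)
```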

@sehnem sehnem requested review from brendancol and Copilot April 19, 2025 19:24
@sehnem sehnem changed the title Improve clouds graph Improve clouds computation Apr 19, 2025

Copilot AI left a comment


Pull Request Overview

This PR improves the cloud optics graph functionality by adding the allow_rechunk parameter to the compute cloud functions and updating the example notebook to reflect improved memory handling and performance adjustments.

  • Added allow_rechunk=True in the cloud optics function calls for better handling of large outputs.
  • Revised the example notebook to modify chunk sizes, include a markdown note about Dask version requirements, and update the Dask cluster setup.

Reviewed Changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| pyrte_rrtmgp/rrtmgp_cloud_optics.py | Added the allow_rechunk parameter to compute_cloud_optics calls. |
| examples/dyamond_clouds/example.ipynb | Updated chunk sizes, added a Dask version recommendation, and changed the cluster configuration. |
Comments suppressed due to low confidence (2)

examples/dyamond_clouds/example.ipynb:79

  • [nitpick] Consider clarifying this instruction by briefly noting what issues may arise with older versions of Dask. This can help users understand the necessity of the version requirement.
To avoid memory issues, please use dask version 2025.3.0 or higher. A [fix](https://docs.dask.org/en/stable/changelog.html#v2025-3-0) for apply_ufunc that resolves the memory issues was included in that release.
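A small, hedged sketch of how a notebook cell could enforce that version requirement up front; the check itself is an illustration, not part of the PR:

```python
# Fail fast if the installed dask predates the apply_ufunc memory fix.
import dask
from packaging.version import Version

if Version(dask.__version__) < Version("2025.3.0"):
    raise RuntimeError(
        f"dask {dask.__version__} is too old; 2025.3.0+ is needed to avoid memory issues"
    )
```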

pyrte_rrtmgp/rrtmgp_cloud_optics.py:174

  • Verify that adding the 'allow_rechunk' parameter does not introduce unintended behavior in parallel execution. If needed, update tests and documentation to reflect its expected impact.
allow_rechunk=True,

@sehnem sehnem force-pushed the improve-clouds-graph branch from 49fdb7e to 8108b74 on April 19, 2025 at 19:31
@brendancol
Collaborator

brendancol commented Apr 21, 2025

@sehnem woohoo, very exciting!

I'm seeing `TypeError: apply_ufunc() got an unexpected keyword argument 'allow_rechunk'` in the CI... is that something you have on your radar?

@sehnem I moved the allow_rechunk into the dask args but let me know if that still achieves the fix you were hoping for
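A minimal sketch of what "moving allow_rechunk into the dask args" looks like with xarray's apply_ufunc; the kernel and variable names are placeholders, not the actual pyrte_rrtmgp code:

```python
import numpy as np
import xarray as xr

def _optics_kernel(lwp):
    # Placeholder for the real per-chunk cloud-optics computation.
    return lwp * 2.0

lwp = xr.DataArray(
    np.random.rand(1000, 40), dims=("column", "layer")
).chunk({"column": 100})

tau = xr.apply_ufunc(
    _optics_kernel,
    lwp,
    dask="parallelized",
    output_dtypes=[lwp.dtype],
    # allow_rechunk is not a keyword of apply_ufunc itself (hence the CI
    # TypeError above); it has to be forwarded to dask's apply_gufunc
    # via dask_gufunc_kwargs.
    dask_gufunc_kwargs={"allow_rechunk": True},
)
```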

@brendancol
Collaborator

  • Fix xarray warning:
```
/Users/brendancol/miniconda3/envs/pyrte-312/lib/python3.12/site-packages/xarray/namedarray/core.py:264: UserWarning: Duplicate dimension names present: dimensions {'ncontact'} appear more than once in dims=('nf', 'ncontact', 'ncontact'). We do not yet support duplicate dimension names, but we do allow initial construction of the object. We recommend you rename the dims immediately to become distinct, as most xarray functionality is likely to fail silently if you do not. To rename the dimensions you will need to set the ``.dims`` attribute of each variable, e.g. var.dims=('x0', 'x1').
  self._dims = self._parse_dimensions(dims)
```
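A hedged sketch of the rename the warning asks for; the array shape is made up, and only the ('nf', 'ncontact', 'ncontact') dims come from the warning text:

```python
import numpy as np
import xarray as xr

# Constructing a variable with duplicate dims emits the UserWarning above.
var = xr.Variable(("nf", "ncontact", "ncontact"), np.zeros((3, 4, 4)))

# Rename the repeated dimension to something distinct, as the warning suggests.
var.dims = ("nf", "ncontact", "ncontact2")
```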

@brendancol brendancol changed the title Improve clouds computation WIP: Improve clouds computation May 1, 2025
@sehnem
Collaborator Author

sehnem commented May 4, 2025

@brendancol I was able to make the processing viable by using map_blocks at a higher level; doing the same in lower-level functions did not have much effect. Other factors that slow down the process are writing to disk and producing huge outputs; aggregating the outputs in the way they will be used helps with that.
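To illustrate the pattern (not the actual pyrte_rrtmgp code; process_chunk and the aggregation here are stand-ins): wrap the whole per-chunk pipeline in one map_blocks call and aggregate inside it, so each chunk's output stays small.

```python
import numpy as np
import xarray as xr

def process_chunk(ds_chunk: xr.Dataset) -> xr.Dataset:
    # Stand-in for the full cloud-optics pipeline on one chunk; aggregating
    # here keeps the per-chunk output small instead of materializing a huge
    # intermediate result.
    return ds_chunk.sum(dim="layer")

ds = xr.Dataset(
    {"lwp": (("column", "layer"), np.random.rand(1000, 40))}
).chunk({"column": 100})

# template tells dask the structure of each chunk's result without eager
# computation; map_blocks then runs process_chunk once per chunk.
template = ds.sum(dim="layer").chunk({"column": 100})
result = ds.map_blocks(process_chunk, template=template).compute()
```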

@sehnem sehnem changed the title WIP: Improve clouds computation Improve clouds computation May 4, 2025
@brendancol
Collaborator

@sehnem thanks for the update. I'll pull and run locally and get back with any comments.

…tion for 7 worker dask cluster and process_chunk refactor
@RobertPincus
Member

@sehnem @brendancol Should this PR get merged? How will we handle the VERY LARGE DATA required?

@sehnem
Collaborator Author

sehnem commented Jun 14, 2025

@RobertPincus I have no experience with large files in git repositories; they make clones heavy. But I once saw a repository where the big files were not cloned by default, and I think that is this feature. Probably Brendan knows better.

