-
Notifications
You must be signed in to change notification settings - Fork 3
Cmip6Builder Ensemble bugfix & dataset open caching #510
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #510 +/- ##
=======================================
Coverage 99.14% 99.15%
=======================================
Files 17 17
Lines 1516 1531 +15
=======================================
+ Hits 1503 1518 +15
Misses 13 13 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
Remove realisation index from an asset
|
This pull request has been mentioned on ACCESS Hive Community Forum. There might be relevant details there: |
|
@dougiesquire Any chance you could take a look at this when you get the chance? |
|
Added to the todo list |
|
Sorry @charles-turner-1, I've taken only a cursory look before going on leave, but yes this looks good to me. If you wanted someone to test you might be able rope in @jemmajeffree who has asked about this functionality on CMIP data in the past |
|
What do you need me to do? |
| if cls.ensemble: | ||
| with xr.open_dataset( | ||
| file, | ||
| chunks={}, | ||
| decode_cf=False, | ||
| decode_times=False, | ||
| decode_coords=False, | ||
| ) as ds: | ||
| member_id = ds.attrs.get("realization_index", None) | ||
| if member_id is None: | ||
| raise ParserError( | ||
| f"Cannot determine member for file {file} - " | ||
| "realization_index attribute missing" | ||
| ) | ||
| ncinfo_dict["member"] = f"r{int(member_id)}" | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we use some clever refactoring of the base builder to avoid opening the file more than once?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've just pushed an update so we now use a @cache version of open_dataset.
All the builder tests still pass in my local tests and from 300 repeats of the 2 cmip6 test_builder_build tests the @cache version is ~40% quicker.
…args from open calls since there's a hashing issue otherwise
for more information, see https://pre-commit.ci
Co-authored-by: joshuatorrance <[email protected]>
|
I think I wrote this as a comment but it seems to have disappeared - I'm thinking it might be worth cherry-picking the caching stuff into a separate feature branch. NB. CI is failing because pixi doesn't seem to play nicely with commits straight from github. I'll try to work that one out ASAP. |
for more information, see https://pre-commit.ci
Change Summary
Fixes to address issues on this forum post
TLDR; updates the
Cmip6Builderto separate datasets onmemberif the ensemble kwarg is passed as true. Reasoning being that ensemble runs of a model will all be on the same grid (and so mergeable), but should not be merged as part of the same dataset.Related issue number
None, but see forum post
Please add any other relevant info below:
This is something we never really noticed as an issue before, but I think the changes to mergeability based
file_idhave created it. This will probably also need to be updated in theAccessEsm15Builder.@dougiesquire can you confirm I'm barking up the right tree with the intent of the ensemble kwarg here?