Update chunking tutorial - add stuff about disk chunks, dask graph, etc by charles-turner-1 · Pull Request #508 · COSIMA/cosima-recipes

charles-turner-1 · 2025-04-07T04:55:52Z

Renamed the tutorial to be more descriptive
Added some exposition on Dask Graphs, etc.

review-notebook-app · 2025-04-07T04:55:57Z

Check out this pull request on

See visual diffs & provide feedback on Jupyter Notebooks.

Powered by ReviewNB

Copilot

Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.

access-hive-bot · 2025-05-06T23:49:31Z

This pull request has been mentioned on ACCESS Hive Community Forum. There might be relevant details there:

https://forum.access-hive.org.au/t/intake-vs-mfdataset/4445/19

charles-turner-1 · 2025-05-07T01:26:23Z

Hey @dougiesquire, @jemmajeffree - I've updated this to include some points relevant to the chunking forum discussion - give us a shout if you think anything needs changing/clarification!

…cosima-recipes into chunking-tutorial-update

review-notebook-app · 2025-05-09T03:27:05Z

View / edit / reply to this conversation on ReviewNB

jemmajeffree commented on 2025-05-09T03:27:05Z
----------------------------------------------------------------

Line #3.    # Open up the dashboard by clicking the launch button below - it'll help you to see what dask is doing when it runs expensive operations.

I use "task stream" and "progress" most, followed by "scheduler system" and "workers memory" - Is it worth making a suggestion for smth like this given how overwhelming the selection of yellow boxes can be?

charles-turner-1 commented on 2025-05-09T03:47:54Z
----------------------------------------------------------------

I've actually never changed mine from the defaults, which are task steam, worker memory & progress for me. TBH I'm mostly interested in just being able to see something (anything) is happening & the session hasn't died

jemmajeffree commented on 2025-05-09T03:56:19Z
----------------------------------------------------------------

you got defaults!? I started with nothing and a million options, so maybe it's changed recently or this is a slightly different approach that opens things.

Anyhow, if there are defaults then it's fine to leave without further suggestions

review-notebook-app · 2025-05-09T03:27:06Z

View / edit / reply to this conversation on ReviewNB

jemmajeffree commented on 2025-05-09T03:27:06Z
----------------------------------------------------------------

Line #4.    datastore.to_dask(xarray_open_kwargs = {'decode_timedelta' : False}) # We need this to avoid a bunch of annoying warnings.

"this" -> "{'decode_timedelta': False}" ; possibly ambiguous for new users

charles-turner-1 commented on 2025-05-09T03:48:22Z
----------------------------------------------------------------

yeah, will add a little more exposition there

charles-turner-1 commented on 2025-05-14T06:04:59Z
----------------------------------------------------------------

fixed above

review-notebook-app · 2025-05-09T03:27:07Z

View / edit / reply to this conversation on ReviewNB

jemmajeffree commented on 2025-05-09T03:27:06Z
----------------------------------------------------------------

is there a reason not to merge this idea with the previous cell? ie, time the ds = and finish with ds['u']?

charles-turner-1 commented on 2025-05-09T03:49:16Z
----------------------------------------------------------------

when you use the %%timeit cell magic you can't see any of the outputs - I think they're actually deleted by the cell magic to make benchmarking less unreliable

review-notebook-app · 2025-05-09T03:27:07Z

View / edit / reply to this conversation on ReviewNB

jemmajeffree commented on 2025-05-09T03:27:07Z
----------------------------------------------------------------

Move the dot points to after the "rule of thumb" line, to make it clearer why we want each file as a single chunk?

Also replace "Lets seem what happens if" with "Let's see what chunk size we get if"

review-notebook-app · 2025-05-09T03:27:08Z

View / edit / reply to this conversation on ReviewNB

jemmajeffree commented on 2025-05-09T03:27:08Z
----------------------------------------------------------------

I'm not quite sure what you're getting at with point 2. Isn't the argument more that the chunks were too small? If they were huge, you would definitely want to split each file into chunks even though we're opening all of them

Also, shouldn't the last point be in a dot point?

I wonder if it's worth having something in here about choosing chunk sizes for whatever you're going to do next. Ie, if you know you're only looking at the southern ocean, then maybe leave the chunks split in yu_ocean, or if only looking at the shallow ocean leave them split in st_ocean? I know this isn't the right place to give a full explanation of choosing chunks for each individual analysis, but I think a brief mention would help

charles-turner-1 commented on 2025-05-14T07:38:35Z
----------------------------------------------------------------

Yeah, I'm not quite sure what I'm saying with point 2 any more. I'm gonna leave it in for now in case I figure it out & it was important. If I don't I'll excise it before we merge.

review-notebook-app · 2025-05-09T03:27:09Z

View / edit / reply to this conversation on ReviewNB

jemmajeffree commented on 2025-05-09T03:27:08Z
----------------------------------------------------------------

pedagogically, I don't really like giving a deliberately bad example before a good one, and I'm worried about confusing people. I think this point might be better made using xu_ocean:400, and then explain afterwards why xu_ocean:120 is a bad idea

review-notebook-app · 2025-05-09T03:27:09Z

View / edit / reply to this conversation on ReviewNB

jemmajeffree commented on 2025-05-09T03:27:09Z
----------------------------------------------------------------

what happens if you use time:3 instead of 2, does it help at all? ie, because won't chunk sizes of 2 break across files?

review-notebook-app · 2025-05-09T03:27:10Z

View / edit / reply to this conversation on ReviewNB

jemmajeffree commented on 2025-05-09T03:27:10Z
----------------------------------------------------------------

Maybe I missed something, but I thought that warning meant that you're not respecting the chunks on disk because you haven't got an integer multiple? In which case I'd be less comfortable encouraging people to ignore it

review-notebook-app · 2025-05-09T03:27:11Z

View / edit / reply to this conversation on ReviewNB

jemmajeffree commented on 2025-05-09T03:27:10Z
----------------------------------------------------------------

I'm getting a little lost in this section, what I'm supposed to take from each cell. What would the output be here if they were problematic?

review-notebook-app · 2025-05-09T03:27:12Z

View / edit / reply to this conversation on ReviewNB

jemmajeffree commented on 2025-05-09T03:27:11Z
----------------------------------------------------------------

they're bridging across files though, in the time dimension. Is this not an issue? And, come to think about it, aren't you going to end up with (2,1) repeatedly through each file if there's no merging of chunks between files?

review-notebook-app · 2025-05-09T03:27:12Z

View / edit / reply to this conversation on ReviewNB

jemmajeffree commented on 2025-05-09T03:27:12Z
----------------------------------------------------------------

this pattern shows up in the task stream as a lot of flashing very thin yellow and green stripes, possibly worth mentioning as something to look for (there's lots of white, which is a sure sign something can be optimised)

review-notebook-app · 2025-05-09T03:27:13Z

View / edit / reply to this conversation on ReviewNB

jemmajeffree commented on 2025-05-09T03:27:13Z
----------------------------------------------------------------

I'm slightly concerned about somebody copy-pasting 'datavars':'minimal' without understanding what it does and then discovering that they can't load data from multiple variables into a single dataset later. Can we add a 1 line explanation?

review-notebook-app · 2025-05-09T03:27:14Z

View / edit / reply to this conversation on ReviewNB

jemmajeffree commented on 2025-05-09T03:27:13Z
----------------------------------------------------------------

is there a reason to have this step separate from the above? In what situation would you use 'coords': minmal without 'compat':'override' Especially since the thing that makes the difference and does what you said was important (overriding checks) is the compat override

review-notebook-app · 2025-05-09T03:27:15Z

View / edit / reply to this conversation on ReviewNB

jemmajeffree commented on 2025-05-09T03:27:14Z
----------------------------------------------------------------

Can we topload the TLDR into the start of the section? Knowing scientists and where they look for information...

review-notebook-app · 2025-05-09T03:27:15Z

View / edit / reply to this conversation on ReviewNB

jemmajeffree commented on 2025-05-09T03:27:15Z
----------------------------------------------------------------

Incidentally, I can think of situations in which I might want to run a rolling average through the ocean like that (ie, I want to strip out high frequency signals but can't be bothered convolving with a gaussian etc) wouldn't recommend changing anything, but you might be interested

charles-turner-1 · 2025-05-09T03:47:55Z

I've actually never changed mine from the defaults, which are task steam, worker memory & progress for me. TBH I'm mostly interested in just being able to see something (anything) is happening & the session hasn't died

View entire conversation on ReviewNB

charles-turner-1 · 2025-05-09T03:48:23Z

yeah, will add a little more exposition there

View entire conversation on ReviewNB

charles-turner-1 · 2025-05-09T03:49:17Z

when you use the %%timeit cell magic you can't see any of the outputs - I think they're actually deleted by the cell magic to make benchmarking less unreliable

View entire conversation on ReviewNB

jemmajeffree · 2025-05-09T03:56:21Z

you got defaults!? I started with nothing and a million options, so maybe it's changed recently or this is a slightly different approach that opens things.

Anyhow, if there are defaults then it's fine to leave without further suggestions

View entire conversation on ReviewNB

charles-turner-1 · 2025-05-11T23:24:09Z

I think the chunking section needs a hefty rewrite TBH, being able to look directly at the chunks now has changed the approach we wanna take - will re-request a review when that's done.

…to have in there

charles-turner-1 · 2025-05-14T06:05:00Z

fixed above

View entire conversation on ReviewNB

charles-turner-1 · 2025-05-14T07:38:36Z

Yeah, I'm not quite sure what I'm saying with point 2 any more. I'm gonna leave it in for now in case I figure it out & it was important. If I don't I'll excise it before we merge.

View entire conversation on ReviewNB

Update chunking tutorial - add stuff about dask graph, etc

0c4fb66

charles-turner-1 requested a review from Copilot April 7, 2025 04:58

Copilot AI reviewed Apr 7, 2025

View reviewed changes

charles-turner-1 marked this pull request as ready for review April 7, 2025 04:58

charles-turner-1 requested a review from navidcy April 7, 2025 04:58

charles-turner-1 mentioned this pull request Apr 7, 2025

New chunking tutorial #442

Closed

Add some extra notes on chunking

2d32098

adele-morrison requested a review from jemmajeffree May 7, 2025 00:44

Update tutorial

1f11480

charles-turner-1 requested a review from dougiesquire May 7, 2025 01:25

charles-turner-1 force-pushed the chunking-tutorial-update branch from 6a2e0d1 to 1f11480 Compare May 7, 2025 01:33

charles-turner-1 and others added 3 commits May 7, 2025 12:10

Merge branch 'main' into chunking-tutorial-update

ab97690

Fix chunking info - 920 => 3

3b7c2eb

Merge branch 'chunking-tutorial-update' of https://github.com/COSIMA/…

1c77416

…cosima-recipes into chunking-tutorial-update

Update tutorial - validate_chunkspec has really changed what we need …

c7e7899

…to have in there

Rewrite for clarity

05a3d46

charles-turner-1 changed the title ~~Update chunking tutorial - add stuff about dask graph, etc~~ Update chunking tutorial - add stuff about disk chunks, dask graph, etc May 14, 2025

charles-turner-1 and others added 3 commits May 14, 2025 15:44

Fixed typo

eb0dde9

Merge branch 'main' into chunking-tutorial-update

611c12b

Fix a few things

6ad2c52

rbeucher approved these changes May 22, 2025

View reviewed changes

charles-turner-1 merged commit 8194ce2 into main May 22, 2025
3 checks passed

charles-turner-1 deleted the chunking-tutorial-update branch May 22, 2025 02:41

Conversation

charles-turner-1 commented Apr 7, 2025

Uh oh!

review-notebook-app Bot commented Apr 7, 2025

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Uh oh!

access-hive-bot commented May 6, 2025

Uh oh!

charles-turner-1 commented May 7, 2025

Uh oh!

review-notebook-app Bot commented May 9, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

review-notebook-app Bot commented May 9, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

review-notebook-app Bot commented May 9, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

review-notebook-app Bot commented May 9, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

review-notebook-app Bot commented May 9, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

review-notebook-app Bot commented May 9, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

review-notebook-app Bot commented May 9, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

review-notebook-app Bot commented May 9, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

review-notebook-app Bot commented May 9, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

review-notebook-app Bot commented May 9, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

review-notebook-app Bot commented May 9, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

review-notebook-app Bot commented May 9, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

review-notebook-app Bot commented May 9, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

review-notebook-app Bot commented May 9, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

review-notebook-app Bot commented May 9, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

charles-turner-1 commented May 9, 2025

Uh oh!

charles-turner-1 commented May 9, 2025

Uh oh!

charles-turner-1 commented May 9, 2025

Uh oh!

jemmajeffree commented May 9, 2025

Uh oh!

charles-turner-1 commented May 11, 2025

Uh oh!

charles-turner-1 commented May 14, 2025

Uh oh!

charles-turner-1 commented May 14, 2025

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

review-notebook-app Bot commented May 9, 2025 •

edited

Loading

review-notebook-app Bot commented May 9, 2025 •

edited

Loading

review-notebook-app Bot commented May 9, 2025 •

edited

Loading

review-notebook-app Bot commented May 9, 2025 •

edited

Loading

review-notebook-app Bot commented May 9, 2025 •

edited

Loading

review-notebook-app Bot commented May 9, 2025 •

edited

Loading

review-notebook-app Bot commented May 9, 2025 •

edited

Loading

review-notebook-app Bot commented May 9, 2025 •

edited

Loading

review-notebook-app Bot commented May 9, 2025 •

edited

Loading

review-notebook-app Bot commented May 9, 2025 •

edited

Loading

review-notebook-app Bot commented May 9, 2025 •

edited

Loading

review-notebook-app Bot commented May 9, 2025 •

edited

Loading

review-notebook-app Bot commented May 9, 2025 •

edited

Loading

review-notebook-app Bot commented May 9, 2025 •

edited

Loading

review-notebook-app Bot commented May 9, 2025 •

edited

Loading