Add notebook demonstrating workflow with TEMPO Level 3 data as a virtual dataset #924

danielfromearth · 2025-01-08T22:22:36Z

This change adds a tutorial notebook, which shows how to work with TEMPO Level-3 data using the new earthaccess.open_virtual_mfdataset() functionality.
Addresses #903.

📚 Documentation preview 📚: https://earthaccess--924.org.readthedocs.build/en/924/

github-actions · 2025-01-08T22:22:52Z

👈 Launch a binder notebook on this branch for commit 63bae71

I will automatically update this comment whenever this PR is modified

danielfromearth · 2025-04-04T13:41:07Z

Note: discussion of the data-related issue for the tutorial when using TEMPO Level-2 data has been happening over in zarr-developers/VirtualiZarr#487.

danielfromearth · 2025-04-11T20:59:49Z

Alright, I've changed tactics for the short term because of the various challenges (see the comments above) in opening Level-2 TEMPO data as a virtual dataset.

New notebook

While I still plan to work on that, I've created a new draft notebook that works well with Level-3 data from TEMPO. See it here. This notebook demonstrates working with a year's worth of TEMPO Level-3 data (4,867 granules), including:

- loading several netCDF groups using earthaccess.open_virtual_mfdataset with load=True so they are indexed,
- merging the groups into one Dataset,
- computing spatial and temporal means for a subset of the data, and
- creating plots of the results.

Timings

I've included %%time markings on many of the notebook cells to indicate performance along the way. The most substantial steps took:

- ~8 min. to open three netCDF groups from the year's worth of granules with earthaccess.open_virtual_mfdataset()
- ~9 min. to compute the mean over time for a plot

Would love to hear initial thoughts/comments/questions/suggestions on this!

@ayushnag, @betolink, @battistowx, @TomNicholas

danielfromearth · 2025-04-15T15:10:53Z

I think this looks okay now, but could use other pairs of eyes on it. As part of reviewing this, I would appreciate any suggestions or comments — whether on content, formatting, presentation, etc!

danielfromearth · 2025-04-18T20:42:25Z

I've shortened the notebook slightly by removing the %%time timings, since performance is being benchmarked in discussion #987 and PR #989.

TomNicholas · 2025-04-19T03:10:27Z

IIUC, then because you're passing load=True, then at no point is this code currently creating a "virtual" dataset.

danielfromearth · 2025-04-21T14:15:29Z

> IIUC, then because you're passing load=True, then at no point is this code currently creating a "virtual" dataset.

Hmmm, @TomNicholas, I think you are correct about what earthaccess.open_virtual_mfdataset() returns, but I think a virtual dataset is still being created temporarily. The way I understand dmrpp_zarr.py: regardless of whether load=True or load=False, the code creates a virtual dataset as an intermediate step (line 112 in dmrpp_zarr.py) before converting it to kerchunk references and passing those to xarray in the load block (lines 129–131). Does that sound/look right?

Either way, this may be a place where there could be improvements in documentation to make this clearer.
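
To make the control flow described above concrete, here is a minimal sketch. It is not the code in dmrpp_zarr.py: `VirtualDataset`, `LoadedDataset`, `build_virtual_dataset`, and `open_virtual` are hypothetical stand-ins, and the reference dictionaries are placeholders. It only illustrates that the virtual dataset is built on both paths, and that with load=True the caller gets back a different kind of object.

```python
from dataclasses import dataclass


@dataclass
class VirtualDataset:
    """Hypothetical stand-in: holds chunk references, not array data."""
    refs: dict


@dataclass
class LoadedDataset:
    """Hypothetical stand-in for the lazily loaded, indexed dataset."""
    refs: dict


def build_virtual_dataset(granule_urls):
    # Placeholder for parsing each granule's metadata into chunk references.
    return VirtualDataset(refs={url: {"offset": 0, "length": 0} for url in granule_urls})


def open_virtual(granule_urls, load=False):
    # The virtual dataset is always built first, whatever the value of load.
    vds = build_virtual_dataset(granule_urls)
    if not load:
        # load=False: the caller receives the virtual dataset itself.
        return vds
    # load=True: the references are converted and handed to a reader,
    # so the caller never sees the intermediate virtual dataset.
    return LoadedDataset(refs=dict(vds.refs))


virtual = open_virtual(["granule_a.nc", "granule_b.nc"], load=False)
loaded = open_virtual(["granule_a.nc", "granule_b.nc"], load=True)
```

In this sketch both calls go through `build_virtual_dataset`, but only the load=False call exposes its result.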

TomNicholas · 2025-04-21T14:58:11Z

I think that's right, yes. But that does mean that in this notebook the user never sees a virtual dataset. It's purely an internal optimization at opening-time by earthaccess.

FWIW this relates to my proposal that what earthaccess should be used for is to generate Icechunk stores of interest, such as one for all TEMPO Level 3 data (containing virtual chunks), that people then open directly. See #956. In that paradigm you use basically the same notebook to create an actually virtual dataset, commit that to Icechunk, then tell any users who want to use TEMPO Level 3 data to simply open that Icechunk store.

ayushnag · 2025-04-23T18:38:02Z

@TomNicholas @danielfromearth yes, the load=True param does create a temporary virtual reference file and then immediately loads it with kerchunk. This is because the function assumes the user is making this request for the first time, so the combined manifest file needs to be generated first. However, if you already have the manifest, you can avoid all these steps entirely and just load it with kerchunk/virtualizarr.

This was implemented before ManifestStore was added to virtualizarr; ManifestStore should be a much cleaner way of loading the dataset. Basically, we can get rid of the load param, since the user can easily access data once they have the virtual xarray dataset.

Also agree with the point that really the best way of doing this is some kind of combined Icechunk store for each collection that is constantly updated as data comes in.
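
The "reuse the manifest if you already have it" idea can be sketched roughly as below. This is an illustration under assumptions, not earthaccess or virtualizarr code: `get_references` and `fake_build_refs` are hypothetical names, the reference file is assumed to be plain JSON, and the reference layout is only loosely modelled on kerchunk-style references.

```python
import json
import tempfile
from pathlib import Path


def get_references(ref_path, build_refs):
    """Return references from ref_path, building and saving them only if the file is missing."""
    ref_path = Path(ref_path)
    if ref_path.exists():
        # Fast path: the combined reference file already exists, so skip generation.
        return json.loads(ref_path.read_text())
    # Slow path: generate the combined references once, then persist them.
    refs = build_refs()
    ref_path.write_text(json.dumps(refs))
    return refs


build_calls = []


def fake_build_refs():
    # Hypothetical builder; a real one would scan every granule's metadata.
    build_calls.append(1)
    return {"version": 1, "refs": {"time/0": ["granule_0.nc", 0, 8]}}


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "combined_refs.json"
    first = get_references(path, fake_build_refs)   # builds and saves
    second = get_references(path, fake_build_refs)  # reads the saved file
```

The expensive step runs only on the first call. A shared, continuously updated store per collection, as suggested above, would move that cost away from individual users entirely.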

danielfromearth · 2025-04-29T21:16:00Z

Well, in the short term, without Icechunk stores in Earthdata, do folks think it would still be beneficial to have this tutorial notebook here in the earthaccess repository?

@TomNicholas, @ayushnag, do you think some wording changes — to avoid confusion regarding the meaning of 'virtual dataset' — would suffice for now?

Would folks rather this demonstration notebook be put in a place outside of earthaccess, for example in the Earthdata Cloud Cookbook, or in the ASDC Data and User Services page?

battistowx · 2025-05-02T15:14:33Z

I think in the longer-term sense, it would probably be best if we had another similar notebook with icechunk and virtualizarr methods in the Cloud Cookbook, and this notebook could go in the earthaccess docs. We can elaborate much more on icechunk in the cloud cookbook notebook too.

danielfromearth · 2025-05-21T22:06:08Z

Alright, I've tightened up the text and presentation a bit more, and thanks @betolink for the help with the Markdown fix. The notebook now uses a smaller example of a week's worth of data, so each notebook cell takes 15–30 seconds to run during the CI docs build instead of multiple minutes.

Reviewers: do you think it is ready for approval?

battistowx

This is looking great!

initial draft of TEMPO virtual dataset tutorial

63bae71

danielfromearth linked an issue Jan 8, 2025 that may be closed by this pull request

Add tutorial notebook for open_virtual_dataset #903

Closed

danielfromearth added the impact: documentation Improvements or additions to documentation label Jan 24, 2025

danielfromearth self-assigned this Jan 24, 2025

add draft notebook demonstrating virtual dataset usage with TEMPO *Level 3* data

aed618c

danielfromearth added 4 commits April 14, 2025 09:40

Merge branch 'main' into issue-903-tutorial-for-open_virtual_dataset-using-TEMPO-data

34ccf9b

remove draft notebook that demonstrates working with Level-2 TEMPO data as a virtual dataset

98ee876

update summary and prerequisites text

b27ec52

use dask ProgressBar and update text

c921e3c

danielfromearth marked this pull request as ready for review April 15, 2025 15:02

danielfromearth requested review from asteiker, ayushnag, battistowx and betolink April 15, 2025 15:02

danielfromearth changed the title ~~initial draft of TEMPO virtual dataset tutorial~~ Add notebook demonstrating workflow with TEMPO Level 3 data as a virtual dataset Apr 15, 2025

update CHANGELOG.md

31b3caa

danielfromearth requested review from andypbarrett and mfisher87 April 15, 2025 15:11

remove timings from notebook

a5572e1

danielfromearth added 2 commits May 13, 2025 17:03

replace existing virtual dataset tutorial with TEMPO level-3 demo notebook

9b6dfda

Merge branch 'main' into issue-903-tutorial-for-open_virtual_dataset-using-TEMPO-data

cacff66

battistowx previously approved these changes May 14, 2025

add cartopy to docs dependency group

14bc121

danielfromearth dismissed battistowx’s stale review via 14bc121 May 15, 2025 14:22

danielfromearth added 3 commits May 20, 2025 09:37

Merge branch 'main' into issue-903-tutorial-for-open_virtual_dataset-using-TEMPO-data

40c8a43

fix typo

310dc44

don't execute TEMPO virtual dataset tutorial notebook because it will use too much memory

88ddaa0

danielfromearth requested a review from battistowx May 20, 2025 15:20

danielfromearth and others added 7 commits May 20, 2025 11:29

update note regarding load=True creating a temporary virtual dataset

2ccc2cf

change title and separate header cells

b8483c0

fix markdown

eb02ef9

remove 'execute_ignore' for tempo tutorial

1ef7ed7

remove dask progress bar to avoid bar printing every line in online docs

b30116c

fix typo of date range

c62dab5

change date range from a month to a week; add note about date range; fix subscript

3c354e5

battistowx approved these changes May 21, 2025

danielfromearth merged commit 7f6bc6e into main May 23, 2025
11 checks passed

danielfromearth deleted the issue-903-tutorial-for-open_virtual_dataset-using-TEMPO-data branch May 23, 2025 13:55

danielfromearth added this to the As an earthaccess user, I want to open a collection-level reference file as a virtual dataset milestone Jul 22, 2025

mfisher87 mentioned this pull request Nov 19, 2025

[BUG] A few broken links on readthedocs site #1148

Open

1 task
