Add notebook demonstrating workflow with TEMPO Level 3 data as a virtual dataset #924
Note: discussion of the data-related issue for the tutorial when using TEMPO Level-2 data has been happening over in zarr-developers/VirtualiZarr#487. |
|
Alright, I've changed tactics for the short term because of various challenges (see the comments above) in opening Level-2 TEMPO data as a virtual dataset.

New notebook

I still plan to work on that, but in the meantime I've created a new draft notebook that works well with Level-3 data from TEMPO. See it here. This notebook demonstrates working with a year's worth of TEMPO Level-3 data (4,867 granules), including:

Timings

I've included
Would love to hear initial thoughts/comments/questions/suggestions on this! |
|
I think this looks okay now, but could use other pairs of eyes on it. As part of reviewing this, I would appreciate any suggestions or comments — whether on content, formatting, presentation, etc! |
|
IIUC, then because you're passing |
Hmmm, @TomNicholas, I think you are correct as far as what is returned from the

Either way, this may be a place where there could be improvements in documentation to make this clearer. |
|
I think that's right, yes. But that does mean that in this notebook the user never sees a virtual dataset. It's purely an internal optimization at opening-time by

FWIW, this relates to my proposal that what earthaccess should be used for is to generate Icechunk stores of interest, such as one for all TEMPO Level 3 data (containing virtual chunks), that people then open directly. See #956. In that paradigm you use basically the same notebook to create an actually virtual dataset, commit that to Icechunk, then tell any users who want to use TEMPO Level 3 data to simply open that Icechunk store. |
|
@TomNicholas @danielfromearth yes the

This was implemented before ManifestStore was added to virtualizarr and that should be a much cleaner way of loading the dataset. Basically we can get rid of the

Also agree with the point that really the best way of doing this is some kind of combined Icechunk store for each collection that is constantly updated as data comes in. |
|
Well, in the short term, without Icechunk stores in Earthdata, do folks think it would still be beneficial to have this tutorial notebook here in the earthaccess repository? @TomNicholas, @ayushnag, do you think some wording changes — to avoid confusion regarding the meaning of 'virtual dataset' — would suffice for now? Would folks rather this demonstration notebook be put in a place outside of |
|
I think in the longer-term sense, it would probably be best if we had another similar notebook with icechunk and virtualizarr methods in the Cloud Cookbook, and this notebook could go in the |
… use too much memory
|
Alright, I've tightened up the text and presentation a bit more, and thanks @betolink for the help with the Markdown fix. The notebook now uses a smaller example, a week's worth of data, so that the notebook cells take 15–30 seconds each during the CI docs build instead of multiple minutes each. Reviewers: do you think it is ready for approval? |
battistowx
left a comment
This is looking great!
This change adds a tutorial notebook, which shows how to work with TEMPO Level-3 data using the new earthaccess.open_virtual_mfdataset() functionality.

Addresses #903.
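For readers who have not opened the notebook, the workflow might look roughly like the sketch below. This is not the notebook's code. The collection short name `TEMPO_NO2_L3`, the helper names `week_window` and `open_tempo_l3`, and the combine options passed through to xarray are all assumptions chosen for illustration, and the right options may differ between earthaccess versions.

```python
from datetime import date, timedelta


def week_window(start: date, days: int = 7) -> tuple[str, str]:
    """Return an inclusive (start, end) pair of ISO dates spanning `days` days."""
    end = start + timedelta(days=days - 1)
    return start.isoformat(), end.isoformat()


def open_tempo_l3(start: date, days: int = 7):
    """Search for TEMPO Level-3 granules and open them as one xarray dataset.

    Hypothetical helper: requires earthaccess (with its virtualizarr extras)
    and valid Earthdata Login credentials.
    """
    import earthaccess  # imported lazily so week_window() works without it

    earthaccess.login()
    granules = earthaccess.search_data(
        short_name="TEMPO_NO2_L3",  # assumed collection short name
        temporal=week_window(start, days),
    )
    # Extra keyword arguments are forwarded to xarray's combine step;
    # the values below are assumptions, not the notebook's settings.
    return earthaccess.open_virtual_mfdataset(
        granules,
        access="indirect",
        concat_dim="time",
        coords="minimal",
        compat="override",
        combine_attrs="drop_conflicts",
    )
```

Calling `open_tempo_l3(date(2024, 1, 1))` would search a one-week window, similar in size to the smaller example used for the CI docs build. Keeping the date arithmetic in its own function makes it easy to widen the window later without touching the search and open calls.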
📚 Documentation preview 📚: https://earthaccess--924.org.readthedocs.build/en/924/