
Adds open_datatree and load_datatree to the tutorial module #10082

Merged: 9 commits merged into pydata:main on Mar 18, 2025

Conversation

@eni-awowale (Collaborator) commented Feb 27, 2025

Adds open_datatree and load_datatree to the tutorial module

  • Closes #xxxx
  • Tests added
  • User visible changes (including notable bug fixes) are documented in whats-new.rst
  • New functions/methods are listed in api.rst

@keewis (Collaborator) left a comment

I've got two comments. Other than that, we'll probably have to refactor the way we call pooch, since we've now basically duplicated the code of open_dataset (doesn't have to be in this PR, though).

```diff
  cache_dir = tmp_path / tutorial._default_cache_dir_name
- ds = tutorial.open_dataset(self.testfile, cache_dir=cache_dir).load()
+ ds = tutorial.open_dataset(testfile, cache_dir=cache_dir).load()
```
@keewis (Collaborator)

it's probably better to just hard-code the dataset name into the test; there's no point in parametrizing this (to be clear, this part of the test suite is pretty old):

Suggested change:

```diff
-ds = tutorial.open_dataset(testfile, cache_dir=cache_dir).load()
+ds = tutorial.open_dataset("tiny", cache_dir=cache_dir).load()
```

```python
    url = external_urls[name]
else:
    path = pathlib.Path(name)
    if not path.suffix:
```

@keewis (Collaborator)

Do the hdf5 files work with both netcdf4 and h5netcdf? Otherwise we might need to specialize, like we do with grib.

@eni-awowale (Collaborator, Author) commented Feb 27, 2025

Yes, imerghh_730.HDF5 and imerghh_830.HDF5 work with both engines. I think if we wanted to add hdf5 files without named dimensions, we would want to specify the engine as h5netcdf.

EDIT:
Since pydata/xarray-data#32 was merged, we do have to explicitly add the extension, e.g. `xr.tutorial.open_datatree('imerghh_830.hdf5')`; otherwise it defaults to `.nc`.

```diff
-from xarray import DataArray, tutorial
-from xarray.tests import assert_identical, network
+from xarray import DataArray, DataTree, tutorial
+from xarray.testing import assert_identical
```
@eni-awowale (Collaborator, Author)

Updated this to use `assert_identical` from the `xarray.testing` module, because the one in `xarray.tests` didn't support `DataTree` objects.

@eni-awowale marked this pull request as ready for review February 27, 2025 22:03
@TomNicholas added the topic-documentation and topic-DataTree (Related to the implementation of a DataTree class) labels Feb 28, 2025
@eni-awowale (Collaborator, Author)

FYI these checks look like they are failing in main 😬

@keewis (Collaborator) commented Feb 28, 2025

yep, that's a change to array-api-strict. If you want to fix them within the next four hours, I think it should be sufficient to cast the condition of duck_array_ops.where to a bool dtype (otherwise I'll send in a PR myself).
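The suggested fix can be sketched with NumPy (a minimal illustration of the idea only, not xarray's actual `duck_array_ops` code; the wrapper name is made up):

```python
import numpy as np

def where_with_bool_cast(condition, x, y):
    # array-api-strict requires a boolean condition array, so cast it
    # explicitly instead of relying on implicit truthiness of
    # integer/float values
    condition = np.asarray(condition).astype(bool)
    return np.where(condition, x, y)

print(where_with_bool_cast([1, 0, 2], "a", "b"))
# array(['a', 'b', 'a'], dtype='<U1')
```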

@eni-awowale (Collaborator, Author)
Sure! Is there an issue for this yet? I can get a PR started and you can jump in when you're free.

@keewis (Collaborator) commented Feb 28, 2025

There's #10084, but nothing else. I think you can just open the PR.

@eni-awowale (Collaborator, Author)

Thanks @dcherian for merging the temp fix!

@keewis was there anything else you think we should add?

@dcherian dcherian requested a review from TomNicholas March 18, 2025 19:11
Comment on lines +270 to +271
* ``"imerghh_730"``: GPM IMERG Final Precipitation L3 Half Hourly 0.1 degree x 0.1 degree V07 from 2021-08-29T07:30:00.000Z
* ``"imerghh_830"``: GPM IMERG Final Precipitation L3 Half Hourly 0.1 degree x 0.1 degree V07 from 2021-08-29T08:30:00.000Z
(Member)

I think this is good to go, though for the xarray.DataTree tutorial documentation we might want to add some other datasets, because these two don't really have a structure that fully requires/shows off the use of DataTree (as I discussed with @eni-awowale the other day).

(Collaborator, Author)

Yeah that makes sense to me! I think we can modify the IMERG dataset as you suggested and maybe note that the modified version is derived from this original product.

@TomNicholas TomNicholas merged commit bd92782 into pydata:main Mar 18, 2025
31 checks passed
dcherian added a commit to dcherian/xarray that referenced this pull request Mar 19, 2025
* main: (85 commits)
  Adds open_datatree and load_datatree to the tutorial module (pydata#10082)
  Fix version in requires_zarr_v3 fixture (pydata#10145)
  Fix `open_datatree` when `decode_cf=False` (pydata#10141)
  [docs] `DataTree` cannot be constructed from `DataArray` (pydata#10142)
  Refactor datetime and timedelta encoding for increased robustness (pydata#9498)
  Fix test_distributed::test_async (pydata#10138)
  Refactor concat / combine / merge into `xarray/structure` (pydata#10134)
  Split `apply_ufunc` out of `computation.py` (pydata#10133)
  Refactor modules from `core` into `xarray.computation` (pydata#10132)
  Refactor compatibility modules into xarray.compat package (pydata#10131)
  Fix type issues from pandas stubs (pydata#10128)
  Don't skip tests when on a `mypy` branch (pydata#10129)
  Change `python_files` in `pyproject.toml` to a list (pydata#10127)
  Better `uv` compatibility (pydata#10124)
  explicitly cast the dtype of `where`'s condition parameter to `bool` (pydata#10087)
  Use `to_numpy` in time decoding (pydata#10081)
  Pin pandas stubs (pydata#10119)
  Fix broken Zarr test (pydata#10109)
  Update asv badge url in README.md (pydata#10113)
  fix and supress some test warnings (pydata#10104)
  ...