
Adds open_datatree and load_datatree to the tutorial module #10082

Merged: 9 commits merged into pydata:main on Mar 18, 2025

Conversation

@eni-awowale (Collaborator) commented Feb 27, 2025

Adds open_datatree and load_datatree to the tutorial module

  • Closes #xxxx
  • Tests added
  • User visible changes (including notable bug fixes) are documented in whats-new.rst
  • New functions/methods are listed in api.rst

@keewis (Collaborator) left a comment

I've got two comments. Other than that, we'll probably have to refactor the way we call pooch, since we've now basically duplicated the code of open_dataset (doesn't have to be in this PR, though).

```diff
  cache_dir = tmp_path / tutorial._default_cache_dir_name
- ds = tutorial.open_dataset(self.testfile, cache_dir=cache_dir).load()
+ ds = tutorial.open_dataset(testfile, cache_dir=cache_dir).load()
```
@keewis (Collaborator)

it's probably better to just hard-code the dataset name into the test; there's no point in parametrizing this (to be clear, this part of the test suite is pretty old):

Suggested change:

```diff
-ds = tutorial.open_dataset(testfile, cache_dir=cache_dir).load()
+ds = tutorial.open_dataset("tiny", cache_dir=cache_dir).load()
```

```python
    url = external_urls[name]
else:
    path = pathlib.Path(name)
    if not path.suffix:
```

@keewis (Collaborator)

Do the hdf5 files work with both netcdf4 and h5netcdf? Otherwise we might need to specialize, like we do with grib.

@eni-awowale (Collaborator, Author) commented Feb 27, 2025

Yes, imerghh_730.HDF5 and imerghh_830.HDF5 work with both engines. I think if we wanted to add hdf5 files without named dimensions, we would want to specify the engine as h5netcdf.

EDIT:
Since pydata/xarray-data#32 was merged, we do have to explicitly add the extension, e.g. `xr.tutorial.open_datatree('imerghh_830.hdf5')`; otherwise it defaults to `.nc`.

```diff
-from xarray import DataArray, tutorial
-from xarray.tests import assert_identical, network
+from xarray import DataArray, DataTree, tutorial
+from xarray.testing import assert_identical
```
@eni-awowale (Collaborator, Author)

Updated this to use `assert_identical` from the `xarray.testing` module, because the one in `xarray.tests` didn't support `DataTree` objects.

@eni-awowale marked this pull request as ready for review February 27, 2025 22:03
@TomNicholas added the topic-documentation and topic-DataTree (Related to the implementation of a DataTree class) labels Feb 28, 2025
@eni-awowale (Collaborator, Author)

FYI these checks look like they are failing in main 😬

@keewis (Collaborator) commented Feb 28, 2025

yep, that's a change to array-api-strict. If you want to fix them within the next four hours, I think it should be sufficient to cast the condition of duck_array_ops.where to a bool dtype (otherwise I'll send in a PR myself).
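The suggested fix can be sketched with NumPy (a minimal illustration of the idea only, not xarray's actual `duck_array_ops` code; the wrapper name is made up):

```python
import numpy as np

def where_with_bool_cast(condition, x, y):
    # array-api-strict requires a boolean condition array, so cast it
    # explicitly instead of relying on implicit truthiness of
    # integer/float values
    condition = np.asarray(condition).astype(bool)
    return np.where(condition, x, y)

print(where_with_bool_cast([1, 0, 2], "a", "b"))
# array(['a', 'b', 'a'], dtype='<U1')
```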

@eni-awowale (Collaborator, Author)
Sure! Is there an issue for this yet? I can get a PR started and you can jump in when you're free.

@keewis (Collaborator) commented Feb 28, 2025

There's #10084, but nothing else. I think you can just open the PR.

@eni-awowale (Collaborator, Author)

Thanks @dcherian for merging the temp fix!

@keewis was there anything else you think we should add?

@dcherian dcherian requested a review from TomNicholas March 18, 2025 19:11
Comment on lines +270 to +271
* ``"imerghh_730"``: GPM IMERG Final Precipitation L3 Half Hourly 0.1 degree x 0.1 degree V07 from 2021-08-29T07:30:00.000Z
* ``"imerghh_830"``: GPM IMERG Final Precipitation L3 Half Hourly 0.1 degree x 0.1 degree V07 from 2021-08-29T08:30:00.000Z
(Member)

I think this is good to go, though for the xarray.DataTree tutorial documentation we might want to add some other datasets, because these two don't really have a structure that fully requires/shows off the use of DataTree (as I discussed with @eni-awowale the other day).

(Collaborator, Author)

Yeah that makes sense to me! I think we can modify the IMERG dataset as you suggested and maybe note that the modified version is derived from this original product.

@TomNicholas TomNicholas merged commit bd92782 into pydata:main Mar 18, 2025
31 checks passed
dcherian added a commit to dcherian/xarray that referenced this pull request Mar 19, 2025
* main: (85 commits)
  Adds open_datatree and load_datatree to the tutorial module (pydata#10082)
  Fix version in requires_zarr_v3 fixture (pydata#10145)
  Fix `open_datatree` when `decode_cf=False` (pydata#10141)
  [docs] `DataTree` cannot be constructed from `DataArray` (pydata#10142)
  Refactor datetime and timedelta encoding for increased robustness (pydata#9498)
  Fix test_distributed::test_async (pydata#10138)
  Refactor concat / combine / merge into `xarray/structure` (pydata#10134)
  Split `apply_ufunc` out of `computation.py` (pydata#10133)
  Refactor modules from `core` into `xarray.computation` (pydata#10132)
  Refactor compatibility modules into xarray.compat package (pydata#10131)
  Fix type issues from pandas stubs (pydata#10128)
  Don't skip tests when on a `mypy` branch (pydata#10129)
  Change `python_files` in `pyproject.toml` to a list (pydata#10127)
  Better `uv` compatibility (pydata#10124)
  explicitly cast the dtype of `where`'s condition parameter to `bool` (pydata#10087)
  Use `to_numpy` in time decoding (pydata#10081)
  Pin pandas stubs (pydata#10119)
  Fix broken Zarr test (pydata#10109)
  Update asv badge url in README.md (pydata#10113)
  fix and supress some test warnings (pydata#10104)
  ...