Add SeasonGrouper, SeasonResampler #9524
base: main
First comment, but I have performed only a quick test.
1 - In your short example, it's probably: Or my GitHub knowledge is too limited and I'm not testing the right branch.
2 - Season grouper: Seems OK for everything I have tested. In particular, I can:
3 - Season resampler: Works as expected from the example. It could be useful to have a NaN value for an incomplete season: the first DJF cannot be computed, and is not. This means that the first value is not a DJF one but a MAM value, which could be a bit misleading.
4 - cftime: I have tested it with cftime calendars instead of datetime. It works with the traditional calendars (gregorian, standard), but not with others like 360_day, 365_day, julian, proleptic_gregorian:
5 - Simple data: I've built a dataset with the number of the month as a variable, so I'm sure that the computation is correct.
Thanks for these features. They are quite easy and straightforward to use. In particular, they allow working on variables, whereas xcdat features work on Datasets only, which yields a more complicated syntax.
I'm going to try to imagine further tests.
Olivier
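Olivier's month-number check (point 5) works because the expected seasonal mean is then known in advance: it is simply the mean of the month numbers in each season. A minimal stdlib sketch of that idea, assuming seasons are written as runs of consecutive month initials; `season_months` and `season_means` are hypothetical helpers, and the averaging pools all years like a plain groupby rather than calling SeasonGrouper itself.

```python
MONTH_LETTERS = "JFMAMJJASOND"


def season_months(season):
    """Month numbers (1-12) spelled by a season string, e.g. "DJF" -> [12, 1, 2].

    The letters are searched in a doubled month sequence so that seasons crossing
    the year boundary are found; the first match wins (an assumption of this sketch).
    """
    start = (MONTH_LETTERS * 2).find(season)
    if start == -1 or len(season) > 12:
        raise ValueError(f"not a run of consecutive months: {season!r}")
    return [(start + i) % 12 + 1 for i in range(len(season))]


def season_means(months, values, seasons):
    """Average values per season, pooling every year (groupby-style, not resample)."""
    lookup = {m: s for s in seasons for m in season_months(s)}
    groups = {s: [] for s in seasons}
    for month, value in zip(months, values):
        if month in lookup:
            groups[lookup[month]].append(value)
    return {s: sum(v) / len(v) for s, v in groups.items() if v}


# Two years of monthly data whose value *is* the month number.
months = [m for _ in range(2) for m in range(1, 13)]
values = [float(m) for m in months]
means = season_means(months, values, ["DJF", "MAM", "JJA", "SON"])
print(means)  # {'DJF': 5.0, 'MAM': 4.0, 'JJA': 7.0, 'SON': 10.0}
```

Because DJF here pools December of one year with January and February of another, this only checks the month-to-season assignment, not the year handling discussed later in the thread.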
Thanks @oliviermarti! This is incredibly helpful.
Yes, my mistake. I fixed the snippet.
This should not work; did you really get correct results?
The
In fact, no! Only the first value is correct. It's a bit dangerous that it returns a result rather than an error.
Olivier
Hi @dcherian, thank you for this PR! I've been looking forward to having this feature in Xarray. No guarantees on a timeline, but I plan to start looking at this PR this week. I'll experiment with this feature and see how I can leverage it to simplify xCDAT PR #423 for custom seasons. I'll also try to contribute any useful tests.
Hey @dcherian, quick question. Will this PR add support for using
For example, if I wanted to perform grouped averaging on year and custom seasons, it might look like:
ds.air.groupby(time=[ds.time.dt.year, SeasonGrouper(["JF", "MAM", "JJAS", "OND"])]).mean()
Another question: if we're defining custom seasons with months that span the calendar year, those months come from the previous year, correct? For example, for "NDJFM", "ND" should be from the previous year:
air.groupby(year=UniqueGrouper(), time=SeasonGrouper(["NDJFM"]))
Yes, it tried to be that smart.
@tomvothecoder @oliviermarti I've fixed the existing tests now; please try it out! FWIW the need to support
I'm writing a few tests right now. How do you want me to add them to your fork branch?
I noticed in a test I'm writing for the above code that "ND" is being taken from the same year, not the previous year. I think we expect the previous year's "ND" to be used instead. I will show a clear example once I add the test.
Ah, nice find. A PR to this branch would be easiest.
Gotcha, will do.
RE: my comment above about annual seasonal averaging: I've attached the Python script that compares the annual seasonal averages between Xarray and xCDAT. The custom seasons are "NDJFM" and "AMJ".
Results: Xarray (actual) uses the same year "ND" for "NDJFM", whereas xCDAT (expected) uses the previous year "ND".
import numpy as np
import xarray as xr
import xcdat as xc # noqa: F401
from xarray.groupers import SeasonGrouper, UniqueGrouper
# Create a sample dataset from 2001-01-01 to 2002-12-30
time = xr.cftime_range("2001-01-01", "2002-12-30", freq="MS", calendar="standard")
data = np.array(
    [
        1.0, 1.25, 1.5, 1.75, 2.0, 1.1, 1.35, 1.6, 1.85, 1.2, 1.45, 1.7,  # 2001
        1.95, 1.05, 1.3, 1.55, 1.8, 1.15, 1.4, 1.65, 1.9, 1.25, 1.5, 1.75,  # 2002
    ]
)
da = xr.DataArray(name="air", data=data, dims="time", coords={"time": time})
da["year"] = da.time.dt.year
# Actual (Xarray groupby with custom seasons)
# -------------------------------------------
actual = da.groupby(year=UniqueGrouper(), time=SeasonGrouper(["NDJFM", "AMJ"])).mean()
print(actual)
"""
Xarray uses the same year "ND" for "NDJFM" grouping (not expected).
<xarray.DataArray 'air' (year: 2, season: 2)> Size: 32B
array([[1.61666667, 1.38 ],
[1.5 , 1.51 ]])
Coordinates:
* year (year) int64 16B 2001 2002
* season (season) object 16B 'AMJ' 'NDJFM'
"""
# Expected (xCDAT groupby with custom seasons)
# --------------------------------------------
ds = da.to_dataset()
custom_seasons = [["Nov", "Dec", "Jan", "Feb", "Mar"], ["Apr", "May", "Jun"]]
expected = ds.temporal.group_average(
    "air",
    weighted=False,
    freq="season",
    season_config={"custom_seasons": custom_seasons},
)
print(expected)
"""
xCDAT uses the previous year "ND" for "NDJFM" grouping (expected).
<xarray.DataArray 'air' (time: 5)> Size: 40B
array([1.25 , 1.61666667, 1.49 , 1.5 , 1.625 ])
Coordinates:
* time (time) object 40B 2001-01-01 00:00:00 ... 2003-01-01 00:00:00
Attributes:
operation: temporal_avg
mode: group_average
freq: season
weighted: False
drop_incomplete_seasons: False
custom_seasons: ['NovDecJanFebMar', 'AprMayJun']
"""
print(expected.time)
"""
xCDAT represents time coords with cftime, with the middle month representing
the season.
<xarray.DataArray 'time' (time: 5)> Size: 40B
array([cftime.DatetimeGregorian(2001, 1, 1, 0, 0, 0, 0, has_year_zero=False),
cftime.DatetimeGregorian(2001, 5, 1, 0, 0, 0, 0, has_year_zero=False),
cftime.DatetimeGregorian(2002, 1, 1, 0, 0, 0, 0, has_year_zero=False),
cftime.DatetimeGregorian(2002, 5, 1, 0, 0, 0, 0, has_year_zero=False),
cftime.DatetimeGregorian(2003, 1, 1, 0, 0, 0, 0, has_year_zero=False)],
dtype=object)
Coordinates:
* time (time) object 40B 2001-01-01 00:00:00 ... 2003-01-01 00:00:00
"""
In xCDAT, I get the indices of all time coords whose months fall in the part of a season that spans the calendar year (e.g. "ND" in "NDJFM") and shift them forward one year (+1) before grouping with Xarray, since Xarray uses same-year months for grouping. I haven't looked at the Xarray grouping code yet, but there is probably a cleaner way to support seasons that span years.
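The +1 shift described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not xCDAT's or Xarray's code: samples are monthly (year, month) pairs, seasons are runs of consecutive month initials, and `season_months`, `season_label` and `seasonal_means` are hypothetical helpers. With the shift off it reproduces the Xarray (same-year) numbers from the script above; with it on, the xCDAT numbers.

```python
MONTH_LETTERS = "JFMAMJJASOND"


def season_months(season):
    """Month numbers (1-12) spelled by a season string, e.g. "NDJFM" -> [11, 12, 1, 2, 3]."""
    start = (MONTH_LETTERS * 2).find(season)  # doubled so cross-year seasons match
    if start == -1 or len(season) > 12:
        raise ValueError(f"not a run of consecutive months: {season!r}")
    return [(start + i) % 12 + 1 for i in range(len(season))]


def season_label(year, month, seasons, shift_cross_year):
    """Return (season, year) for one sample, or None if no season contains the month.

    If shift_cross_year is True, months before January in a season that crosses
    the year boundary (e.g. "ND" in "NDJFM") are assigned to the following year.
    """
    for season in seasons:
        months = season_months(season)
        if month in months:
            crosses_year = months[0] > months[-1]
            if shift_cross_year and crosses_year and month >= months[0]:
                return season, year + 1
            return season, year
    return None


def seasonal_means(times, values, seasons, shift_cross_year):
    """Average the values that share the same (season, year) label."""
    groups = {}
    for (year, month), value in zip(times, values):
        key = season_label(year, month, seasons, shift_cross_year)
        if key is not None:
            groups.setdefault(key, []).append(value)
    return {key: sum(v) / len(v) for key, v in groups.items()}


# Same monthly values as the script above, January 2001 to December 2002.
values = [
    1.0, 1.25, 1.5, 1.75, 2.0, 1.1, 1.35, 1.6, 1.85, 1.2, 1.45, 1.7,
    1.95, 1.05, 1.3, 1.55, 1.8, 1.15, 1.4, 1.65, 1.9, 1.25, 1.5, 1.75,
]
times = [(2001 + i // 12, i % 12 + 1) for i in range(24)]
seasons = ["NDJFM", "AMJ"]

same_year = seasonal_means(times, values, seasons, shift_cross_year=False)
shifted = seasonal_means(times, values, seasons, shift_cross_year=True)
for key in sorted(shifted):
    print(key, round(shifted[key], 5))
```

The shifted means correspond to the 1.25, 1.61666667, 1.49, 1.5 and 1.625 in the xCDAT output above, and `same_year` to the 1.38 / 1.61666667 (2001) and 1.51 / 1.5 (2002) in the Xarray output. The first and last NDJFM labels only contain JFM or ND, which is exactly the incomplete-season question raised earlier.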
* Add tests for SeasonalGrouper API
* Add more tests
@tomvothecoder My mistake. That is a "resampling" operation, so
da.resample(time=SeasonResampler(["NDJFM", "AMJ"], drop_incomplete=False)).mean()
gives what you want:
We can't handle grouping by
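The difference between grouping and resampling here can be sketched with a stdlib-only binning function, assuming time-ordered monthly (year, month) samples: each contiguous occurrence of a season becomes its own bin, and `drop_incomplete=True` discards bins that lack any of the season's months (such as the leading JFM-only NDJFM). `resample_seasons` is a hypothetical helper whose bins are labelled by their first sample, a choice of this sketch rather than SeasonResampler's labelling. Month numbers are used as values, following Olivier's check, so the means are easy to verify by hand.

```python
MONTH_LETTERS = "JFMAMJJASOND"


def season_months(season):
    """Month numbers (1-12) spelled by a season string, e.g. "AMJ" -> [4, 5, 6]."""
    start = (MONTH_LETTERS * 2).find(season)  # doubled so cross-year seasons match
    if start == -1 or len(season) > 12:
        raise ValueError(f"not a run of consecutive months: {season!r}")
    return [(start + i) % 12 + 1 for i in range(len(season))]


def resample_seasons(times, values, seasons, drop_incomplete=False):
    """Split time-ordered monthly samples into one bin per season occurrence.

    Returns (season, first (year, month) of the bin, mean) tuples in time order.
    """
    lookup = {m: s for s in seasons for m in season_months(s)}
    bins = []
    prev_index = None  # running month index of the previous in-season sample
    for (year, month), value in zip(times, values):
        season = lookup.get(month)
        if season is None:  # month outside every season: close the current run
            prev_index = None
            continue
        index = year * 12 + month
        starts_new = (
            not bins
            or prev_index is None
            or index != prev_index + 1  # a month is missing from the record
            or bins[-1]["season"] != season
            or month == season_months(season)[0]  # the season starts again
        )
        if starts_new:
            bins.append({"season": season, "start": (year, month), "months": set(), "values": []})
        bins[-1]["months"].add(month)
        bins[-1]["values"].append(value)
        prev_index = index

    result = []
    for b in bins:
        is_complete = b["months"] == set(season_months(b["season"]))
        if drop_incomplete and not is_complete:
            continue
        result.append((b["season"], b["start"], sum(b["values"]) / len(b["values"])))
    return result


# Two years of monthly samples whose value is the month number.
times = [(2001 + i // 12, i % 12 + 1) for i in range(24)]
values = [float(month) for _, month in times]

full = resample_seasons(times, values, ["NDJFM", "AMJ"])
complete = resample_seasons(times, values, ["NDJFM", "AMJ"], drop_incomplete=True)
print(full)
print(complete)
```

With `drop_incomplete=False` this yields five bins (JFM-only NDJFM 2001, AMJ 2001, NDJFM 2001/02, AMJ 2002, ND-only NDJFM 2002); with `drop_incomplete=True` only the three complete ones remain, which relates to the incomplete-first-season concern in Olivier's point 3.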
@oliviermarti & @tomvothecoder, are you able to do one more test run here? It's basically complete, though it could use more tests, as always.
These two groupers allow defining custom seasons, and dropping incomplete seasons from the output. Both cases are treated by adjusting the factorization -- conversion from group labels to integer codes -- appropriately.
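Here "factorization" means turning each timestamp's group label into an integer code that the reduction then bins on. A minimal stdlib sketch of that step for custom seasons, assuming seasons are runs of consecutive month initials and using -1 for "belongs to no group" (the `pandas.factorize` convention for missing values); `season_months` and `season_codes` are hypothetical helpers, not this PR's implementation.

```python
MONTH_LETTERS = "JFMAMJJASOND"


def season_months(season):
    """Month numbers (1-12) spelled by a season string, e.g. "OND" -> [10, 11, 12]."""
    start = (MONTH_LETTERS * 2).find(season)  # doubled so cross-year seasons match
    if start == -1 or len(season) > 12:
        raise ValueError(f"not a run of consecutive months: {season!r}")
    return [(start + i) % 12 + 1 for i in range(len(season))]


def season_codes(months, seasons):
    """Factorize month numbers into integer codes: i means seasons[i], -1 means no season."""
    table = [-1] * 13  # index 0 unused so that table[month] works for months 1-12
    for code, season in enumerate(seasons):
        for month in season_months(season):
            if table[month] != -1:
                raise ValueError(f"month {month} is in more than one season")
            table[month] = code
    return [table[m] for m in months]


one_year = list(range(1, 13))
print(season_codes(one_year, ["JF", "MAM", "JJAS", "OND"]))
print(season_codes(one_year, ["NDJFM", "AMJ"]))
```

In a scheme like this, dropping incomplete seasons could amount to overwriting the codes of the affected samples with -1 so the reduction never sees them; the PR's actual mechanism may differ in detail.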
Docs are here: https://xray--9524.org.readthedocs.build/en/9524/user-guide/time-series.html#handling-seasons
The last piece from #8509
whats-new.rst
api.rst
Example:
TODO:
drop_incomplete in SeasonGrouper
cc @tomvothecoder do you have time to contribute some tests? I bet we'll simplify a bunch of xcdat this way, and you probably already have tests :)