Use DatasetGroupBy.quantile for DatasetGroupBy.median for multiple groups when using dask arrays #9935
Description
Is your feature request related to a problem?
I am grouping data in a Dataset and computing statistics. I wanted to take the median while grouping by two variables, but I got the following error:
>>> ds.groupby(['x', 'y']).median()
# NotImplementedError: The da.nanmedian function only works along an axis or a subset of axes. The full algorithm is difficult to do in parallel
whereas ds.groupby(['x']).median() works without any problem.
I noticed that the error only occurs when the DataArrays are backed by dask arrays: with numpy arrays, there is no problem. In addition, if .median() is replaced by .quantile(0.5), there is no problem either. See below:
import dask.array as da
import numpy as np
import xarray as xr
rng = da.random.default_rng(0)
ds = xr.Dataset(
{'a': (('x', 'y'), rng.random((10, 10)))},
coords={'x': np.arange(5).repeat(2), 'y': np.arange(5).repeat(2)}
)
# Raises:
# NotImplementedError: The da.nanmedian function only works along an axis or a subset of axes. The full algorithm is difficult to do in parallel
try:
    ds.groupby(['x', 'y']).median()
except NotImplementedError as e:
    print(e)
# No problems with the following:
ds.groupby(['x']).median()
ds.groupby(['x', 'y']).quantile(0.5)
ds.compute().groupby(['x', 'y']).median()  # compute() loads the data into numpy arrays first
Describe the solution you'd like
A straightforward solution seems to be to have DatasetGroupBy.median() fall back to DatasetGroupBy.quantile(0.5) when grouping by multiple variables.
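The proposal relies on the median being the 0.5 quantile when the quantile is computed with linear interpolation, which I believe is the default method for quantile in xarray and numpy. Below is a minimal pure-Python sketch of that equivalence. The helper linear_quantile is hypothetical and written only for illustration; it is not xarray's or dask's implementation.

```python
import math
import statistics


def linear_quantile(values, q):
    """Return the q-quantile of values, skipping NaN.

    Hypothetical helper: interpolates linearly between the two
    neighbouring order statistics.
    """
    data = sorted(v for v in values if not math.isnan(v))
    if not data:
        return math.nan
    pos = q * (len(data) - 1)   # fractional index into the sorted data
    lo = math.floor(pos)
    hi = math.ceil(pos)
    return data[lo] + (data[hi] - data[lo]) * (pos - lo)


odd = [3.0, 1.0, 2.0]
even = [4.0, 1.0, 3.0, 2.0]
with_nan = [math.nan, 5.0, 1.0]

# For q=0.5 the interpolated quantile matches the ordinary median,
# both for odd-length and even-length samples.
print(linear_quantile(odd, 0.5), statistics.median(odd))
print(linear_quantile(even, 0.5), statistics.median(even))
# NaN values are skipped, mirroring nanmedian-style behaviour.
print(linear_quantile(with_nan, 0.5))
```

With a non-default interpolation method the two results could differ for even-length groups, so any fallback would presumably need to keep the linear method.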
Describe alternatives you've considered
No response
Additional context
My xr.show_versions():
xarray: 2024.10.0
pandas: 2.2.3
numpy: 1.26.4
scipy: 1.14.1
netCDF4: 1.6.5
pydap: None
h5netcdf: 1.4.1
h5py: 3.12.1
zarr: 2.18.3
cftime: 1.6.4.post1
nc_time_axis: None
iris: None
bottleneck: 1.4.2
dask: 2024.11.2
distributed: None
matplotlib: 3.9.2
cartopy: 0.24.0
seaborn: 0.13.2
numbagg: None
fsspec: 2024.10.0
cupy: None
pint: None
sparse: None
flox: None
numpy_groupies: None
setuptools: 75.5.0
pip: 24.3.1
conda: None
pytest: None
mypy: None
IPython: 8.29.0
sphinx: 7.4.7