
Groupby-map is slow with out of order indices #9220

Open
@mrocklin

Description

What is your issue?

I think that this is a longstanding problem. Sorry if I missed an existing GitHub issue.

I was looking at a Dask-array-backed Xarray workload with @phofl and we were both concerned about the performance we were seeing with groupby-aggregations called with out-of-order indices. Here is a minimal example:

import xarray as xr
import dask.array as da
import numpy as np
import pandas as pd

lat = np.linspace(-89.5, 89.5, 100)
lon = np.linspace(-179.375, 179.375, 100)
time = pd.date_range(
    start="1990-01-01", end="2000-12-31", freq="D",
)

arr = (
    xr.DataArray(
        da.random.random((100, 100, len(time)), chunks=(100, 100, 365)),
        dims=["lat", "lon", "time"],
        coords={"lat": lat, "lon": lon, "time": time},
        name="arr"
    )
    .to_dataset()
)

arr["arr"].data
[screenshot: dask array repr of arr["arr"].data, 12 chunks]
def f(x):
    return x

result = arr.groupby("time.dayofyear").map(f)
result["arr"].data
[screenshot: dask array repr of result["arr"].data, ~4000 small chunks]

Performance here is bad in a few ways:

  • Output chunk sizes are very small (12 chunks turn into 4000 chunks)
  • There are a lot of tasks
  • There are a lot of layers (365 new layers)

We think that what is happening here looks like this:

  1. slice underlying array with a very out-of-order array to arrange groups to be close to each other
  2. Iterate through each group and apply function
  3. slice the underlying array with the inverse array to put everything back in the right place
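
The three steps above can be sketched in plain NumPy (an illustration of our understanding, not xarray's actual code): gather the groups into contiguous runs with a stable argsort, apply `f` to each run, then scatter everything back with the inverse permutation.

```python
import numpy as np

x = np.random.random((4, 4, 20))
labels = np.random.randint(0, 3, size=20)      # stand-in for "time.dayofyear"

order = np.argsort(labels, kind="stable")      # step 1: out-of-order take
gathered = x[:, :, order]

def f(group):                                  # identity, as in the issue
    return group

# step 2: apply f to each contiguous group slice
bounds = np.searchsorted(labels[order], np.arange(labels.max() + 2))
applied = np.concatenate(
    [f(gathered[:, :, bounds[i]:bounds[i + 1]]) for i in range(labels.max() + 1)],
    axis=-1,
)

inverse = np.empty_like(order)                 # step 3: inverse take
inverse[order] = np.arange(order.size)
result = applied[:, :, inverse]

assert np.array_equal(result, x)               # identity f round-trips exactly
```

Steps (1) and (3) are the two fancy-indexing operations whose dask-array cost is reproduced below.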

For steps (1) and (3) above performance is bad in a way that we can reduce to a dask array performance issue. Here is a small reproducer for that:

x = da.random.random((100, 100, 10000))
x
[screenshot: dask array repr of x]
idx = np.random.randint(0, x.shape[2], x.shape[2])
x[:, :, idx]
[screenshot: dask array repr of x[:, :, idx], many small chunks]
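
The fragmentation is visible in the block counts directly (a sketch extending the reproducer above; exact chunk counts will vary by dask version):

```python
import numpy as np
import dask.array as da

x = da.random.random((100, 100, 10000), chunks=(100, 100, 1000))
idx = np.random.randint(0, x.shape[2], x.shape[2])

shuffled = x[:, :, idx]            # out-of-order take
monotone = x[:, :, np.sort(idx)]   # same indices, sorted

# An out-of-order take typically fragments the sliced axis into many
# small chunks; a monotone take roughly preserves the input chunking.
print("input blocks along axis 2:   ", x.numblocks[2])
print("shuffled blocks along axis 2:", shuffled.numblocks[2])
print("monotone blocks along axis 2:", monotone.numblocks[2])
```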

We think that we can make this better on our end, and can take that away as homework.

However, for step (2) we think the change probably has to happen in xarray. Ideally xarray would call something like map_blocks rather than iterating through each group. This would be a special case for Dask arrays. Is that OK?
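
A rough sketch of the kind of blockwise approach we have in mind (hypothetical, not xarray's API or a concrete proposal): after a single gather, rechunk so each group occupies one block along the grouped axis, then apply `f` once per block with `da.map_blocks` instead of once per group slice.

```python
import numpy as np
import dask.array as da

def f(block):
    return block  # the per-group function; here the identity

labels = np.random.randint(0, 365, size=4018)   # stand-in for "time.dayofyear"
order = np.argsort(labels, kind="stable")

x = da.random.random((10, 10, 4018), chunks=(10, 10, 1000))
gathered = x[:, :, order]

# rechunk so each group is exactly one block along the grouped axis
sizes = tuple(int(s) for s in np.bincount(labels) if s > 0)
gathered = gathered.rechunk({2: sizes})

applied = da.map_blocks(f, gathered, dtype=gathered.dtype)

# scatter back to the original order with the inverse permutation
inverse = np.empty_like(order)
inverse[order] = np.arange(order.size)
result = applied[:, :, inverse]

assert np.allclose(result.compute(), x.compute())  # identity f round-trips
```

This still pays for the gather/scatter takes, but the apply step becomes one `map_blocks` layer rather than one slice-plus-apply per group.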

Also, we think that this has a lot of impact throughout xarray, but we are not sure. Is this also the code path taken by sum/max/etc. (assuming that flox is not around)? Mostly we're curious how much we all should prioritize this.

Asks

Some questions:

  • Does our understanding of the situation sound right?
  • Is avoiding iteration through groups for dask arrays doable?
  • Is anyone around to do this within xarray if we're also improving slicing on the dask array side?
