-[](https://github.com/dcherian/dask_groupby/actions)[](https://github.com/dcherian/dask_groupby/actions)[](https://codecov.io/gh/dcherian/dask_groupby)
+[](https://github.com/dcherian/flox/actions)[](https://github.com/dcherian/flox/actions)[](https://codecov.io/gh/dcherian/flox)
 
-# dask_groupby
+# flox
+
+This project explores strategies for fast GroupBy reductions with dask.array. It used to be called `dask_groupby`
+
+This repo explores strategies for a distributed GroupBy with dask
+arrays. It was motivated by
+
+1. Dask Dataframe GroupBy
+   [blogpost](https://blog.dask.org/2019/10/08/df-groupby)
+2. numpy_groupies in Xarray
+   [issue](https://github.com/pydata/xarray/issues/4473)
 
 (See a
 [presentation](https://docs.google.com/presentation/d/1muj5Yzjw-zY8c6agjyNBd2JspfANadGSDvdd6nae4jg/edit?usp=sharing)
 about this package).
 
+## Acknowledgements
+
+This work was funded in part by NASA-ACCESS 80NSSC18M0156 "Community tools for analysis of NASA Earth Observing System
+Data in the Cloud" (PI J. Hamman), and [NCAR's Earth System Data Science Initiative](https://ncar.github.io/esds/).
+It was motivated by many discussions in the [Pangeo](https://pangeo.io) community.
+
 ## API
 
 There are three functions
-1. `groupby_reduce(dask_array, by_dask_array, "mean")`
+1. `flox.groupby_reduce(dask_array, by_dask_array, "mean")`
    "pure" dask array interface
-2. `xarray_groupby_reduce(groupby_object, "mean")`
-   xarray groupby interface that accepts a GroupBy object for convenience
-3. `xarray_reduce(xarray_object, by_dataarray, "mean")`
+1. `flox.xarray.xarray_reduce(xarray_object, by_dataarray, "mean")`
    "pure" xarray interface
 
 ## Implementation
 
-This repo explores strategies for a distributed GroupBy with dask
-arrays. It was motivated by
-
-1. Dask Dataframe GroupBy
-   [blogpost](https://blog.dask.org/2019/10/08/df-groupby)
-2. numpy_groupies in Xarray
-   [issue](https://github.com/pydata/xarray/issues/4473)
-
 The core GroupBy operation is outsourced to
 [numpy_groupies](https://github.com/ml31415/numpy-groupies). The GroupBy
 reduction is first applied blockwise. Those intermediate results are
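The Implementation text in this diff says the GroupBy reduction is first applied blockwise, and the hunk ends before it says what happens to the intermediate results. The following is a minimal pure-Python sketch of that idea for a `"mean"` reduction, assuming the per-block intermediates are per-group sums and counts that are then merged across blocks. That combine step is an assumption, not something the diff confirms, and the helper names (`blockwise_sum_count`, `combine`, `groupby_mean`) are hypothetical and not part of flox, dask, or numpy_groupies.

```python
from collections import defaultdict


def blockwise_sum_count(values, labels):
    """Reduce one block to per-group [sum, count] intermediates."""
    partial = defaultdict(lambda: [0.0, 0])
    for value, label in zip(values, labels):
        partial[label][0] += value
        partial[label][1] += 1
    return dict(partial)


def combine(partials):
    """Merge per-block intermediates into one [sum, count] per group (assumed step)."""
    total = defaultdict(lambda: [0.0, 0])
    for partial in partials:
        for label, (group_sum, group_count) in partial.items():
            total[label][0] += group_sum
            total[label][1] += group_count
    return dict(total)


def groupby_mean(blocks):
    """Compute a grouped mean; `blocks` is an iterable of (values, labels) pairs."""
    partials = [blockwise_sum_count(values, labels) for values, labels in blocks]
    return {label: s / n for label, (s, n) in combine(partials).items()}


# Two blocks standing in for two chunks of a chunked array.
blocks = [
    ([1.0, 2.0, 3.0], ["a", "b", "a"]),
    ([4.0, 5.0], ["b", "b"]),
]
result = groupby_mean(blocks)
print(result)
```

Carrying sums and counts rather than per-block means keeps the final result exact when a group is spread unevenly across blocks, which is why the sketch delays the division until after the merge.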