Comparing changes

base repository: xarray-contrib/flox
base: v0.10.1
...
head repository: xarray-contrib/flox
compare: main
  • 17 commits
  • 32 files changed
  • 4 contributors

Commits on Apr 3, 2025

  1. 01e2fce
  2. remove engine_no_numba

    dcherian committed Apr 3, 2025
    a7423e6

Commits on Apr 4, 2025

  1. 1b58ad1
  2. Update docstrings (#432)

    dcherian authored Apr 4, 2025
    a4be06f
  3. Allow reindexing to sparse (#430)

    * Support reindexing to a sparse array
    
    * Fix benchmarks
    
    * fix?
    
    * fix benchmarks
    dcherian authored Apr 4, 2025
    f8cfb5d

Commits on Apr 5, 2025

  1. Parallelize ravel-multi-index (#433)

    * Refactor out factorize loop
    
    * threadpool
    
    * Split out ravel_multi_index bits
    
    * Dask-ify ravel multi index
    
    * cleanup
    
    * Types
    dcherian authored Apr 5, 2025
    8dac463
  2. 091e73d
  3. c171ea0
  4. Doc updates (#436)

    dcherian authored Apr 5, 2025
    89e8238

Commits on Apr 7, 2025

  1. Fix sparse reindexing some more. (#437)

    * Allow empty groups with sparse reindexing
    
    * Fix sparse reindexing
    
    * fix docs
    
    * more test
    
    * Fix tests
    
    * Nicer error
    
    * test for errors
    dcherian authored Apr 7, 2025
    11feda2
  2. ad09efc

Commits on Apr 8, 2025

  1. [pre-commit.ci] pre-commit autoupdate (#439)

    * [pre-commit.ci] pre-commit autoupdate
    
    updates:
    - [github.com/astral-sh/ruff-pre-commit: v0.9.1 → v0.11.4](astral-sh/ruff-pre-commit@v0.9.1...v0.11.4)
    - [github.com/executablebooks/mdformat: 0.7.21 → 0.7.22](hukkin/mdformat@0.7.21...0.7.22)
    - [github.com/codespell-project/codespell: v2.3.0 → v2.4.1](codespell-project/codespell@v2.3.0...v2.4.1)
    - [github.com/abravalheri/validate-pyproject: v0.23 → v0.24.1](abravalheri/validate-pyproject@v0.23...v0.24.1)
    - [github.com/rhysd/actionlint: v1.7.6 → v1.7.7](rhysd/actionlint@v1.7.6...v1.7.7)
    
    * fix typo
    
    ---------
    
    Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
    Co-authored-by: Deepak Cherian <deepak@cherian.net>
    pre-commit-ci[bot] and dcherian authored Apr 8, 2025
    6d34d62

Commits on Apr 9, 2025

  1. ce7b9b7

Commits on May 1, 2025

  1. Bump codecov/codecov-action from 5.4.0 to 5.4.2 (#441)

    Bumps [codecov/codecov-action](https://github.com/codecov/codecov-action) from 5.4.0 to 5.4.2.
    - [Release notes](https://github.com/codecov/codecov-action/releases)
    - [Changelog](https://github.com/codecov/codecov-action/blob/main/CHANGELOG.md)
    - [Commits](codecov/codecov-action@v5.4.0...v5.4.2)
    
    ---
    updated-dependencies:
    - dependency-name: codecov/codecov-action
      dependency-version: 5.4.2
      dependency-type: direct:production
      update-type: version-update:semver-patch
    ...
    
    Signed-off-by: dependabot[bot] <support@github.com>
    Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
    dependabot[bot] authored May 1, 2025
    9f67286

Commits on May 13, 2025

  1. Fix benchmarks (#444)

    dcherian authored May 13, 2025
    a83a0c7

Commits on May 14, 2025

  1. Refactor strategies (#445)

    dcherian authored May 14, 2025
    b32a602

Commits on May 16, 2025

  1. 619a390
2 changes: 1 addition & 1 deletion .github/workflows/benchmarks.yml
@@ -10,7 +10,7 @@ jobs:
# if: ${{ contains( github.event.pull_request.labels.*.name, 'run-benchmark') && github.event_name == 'pull_request' || github.event_name == 'workflow_dispatch' }} # Run if the PR has been labelled correctly.
if: ${{ github.event_name == 'pull_request' || github.event_name == 'workflow_dispatch' }} # Always run.
name: Linux
- runs-on: ubuntu-20.04
+ runs-on: ubuntu-latest
env:
ASV_DIR: "./asv_bench"

4 changes: 2 additions & 2 deletions .github/workflows/ci-additional.yaml
@@ -77,7 +77,7 @@ jobs:
--ignore flox/tests \
--cov=./ --cov-report=xml
- name: Upload code coverage to Codecov
- uses: codecov/codecov-action@v5.4.0
+ uses: codecov/codecov-action@v5.4.2
with:
file: ./coverage.xml
flags: unittests
@@ -132,7 +132,7 @@ jobs:
python -m mypy --install-types --non-interactive --cache-dir=.mypy_cache/ --cobertura-xml-report mypy_report
- name: Upload mypy coverage to Codecov
- uses: codecov/codecov-action@v5.4.0
+ uses: codecov/codecov-action@v5.4.2
with:
file: mypy_report/cobertura.xml
flags: mypy
2 changes: 1 addition & 1 deletion .github/workflows/ci.yaml
@@ -76,7 +76,7 @@ jobs:
python -c "import xarray; xarray.show_versions()"
pytest --durations=20 --durations-min=0.5 -n auto --cov=./ --cov-report=xml --hypothesis-profile ci
- name: Upload code coverage to Codecov
- uses: codecov/codecov-action@v5.4.0
+ uses: codecov/codecov-action@v5.4.2
with:
file: ./coverage.xml
flags: unittests
3 changes: 2 additions & 1 deletion .github/workflows/upstream-dev-ci.yaml
@@ -78,7 +78,8 @@ jobs:
git+https://github.com/Unidata/cftime
python -m pip install \
git+https://github.com/dask/dask \
- git+https://github.com/ml31415/numpy-groupies
+ git+https://github.com/ml31415/numpy-groupies \
+ git+https://github.com/pydata/sparse
- name: Install flox
run: |
10 changes: 5 additions & 5 deletions .pre-commit-config.yaml
@@ -4,7 +4,7 @@ ci:
repos:
- repo: https://github.com/astral-sh/ruff-pre-commit
# Ruff version.
- rev: "v0.9.1"
+ rev: "v0.11.4"
hooks:
- id: ruff
args: ["--fix", "--show-fixes"]
@@ -24,7 +24,7 @@ repos:
- id: check-docstring-first

- repo: https://github.com/executablebooks/mdformat
- rev: 0.7.21
+ rev: 0.7.22
hooks:
- id: mdformat
additional_dependencies:
@@ -38,19 +38,19 @@ repos:
args: [--extra-keys=metadata.kernelspec metadata.language_info.version]

- repo: https://github.com/codespell-project/codespell
- rev: v2.3.0
+ rev: v2.4.1
hooks:
- id: codespell
additional_dependencies:
- tomli

- repo: https://github.com/abravalheri/validate-pyproject
- rev: v0.23
+ rev: v0.24.1
hooks:
- id: validate-pyproject

- repo: https://github.com/rhysd/actionlint
- rev: v1.7.6
+ rev: v1.7.7
hooks:
- id: actionlint
files: ".github/workflows/"
8 changes: 6 additions & 2 deletions asv_bench/benchmarks/combine.py
@@ -14,7 +14,11 @@ def _get_combine(combine):
     if combine == "grouped":
         return partial(flox.core._grouped_combine, engine="numpy")
     else:
-        return partial(flox.core._simple_combine, reindex=False)
+        try:
+            reindex = flox.ReindexStrategy(blockwise=False)
+        except AttributeError:
+            reindex = False
+        return partial(flox.core._simple_combine, reindex=reindex)


class Combine:
@@ -41,7 +45,7 @@ def peakmem_combine(self, kind, combine):
class Combine1d(Combine):
"""
Time the combine step for dask reductions,
- this is for reducting along a single dimension
+ this is for reducing along a single dimension
"""

def setup(self, *args, **kwargs) -> None:
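
Note on the `_get_combine` hunk above: the `try`/`except AttributeError` guard looks like a feature-detection fallback, presumably so the same benchmark file can run against releases that predate `ReindexStrategy`. A minimal sketch of that pattern follows. It uses stand-in namespace objects rather than the real `flox` module, and `pick_reindex`, `old_release`, and `new_release` are hypothetical names introduced only for illustration.

```python
from types import SimpleNamespace


def pick_reindex(module):
    """Return a strategy object if the module provides one, else the legacy boolean."""
    try:
        # Attribute lookup raises AttributeError when the class does not exist yet.
        return module.ReindexStrategy(blockwise=False)
    except AttributeError:
        return False


# Stand-ins for an older and a newer release; neither is the real flox module.
old_release = SimpleNamespace()
new_release = SimpleNamespace(ReindexStrategy=lambda blockwise: {"blockwise": blockwise})

legacy = pick_reindex(old_release)   # falls back to the plain boolean
modern = pick_reindex(new_release)   # uses the strategy object
```

One caveat of catching `AttributeError` this broadly is that an unrelated `AttributeError` raised inside the constructor would also trigger the fallback.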
1 change: 1 addition & 0 deletions ci/docs.yml
@@ -15,6 +15,7 @@ dependencies:
- matplotlib-base
- myst-parser
- myst-nb
- sparse
- sphinx
- sphinx-remove-toctrees
- furo>=2024.08
1 change: 1 addition & 0 deletions ci/env-numpy1.yml
@@ -11,6 +11,7 @@ dependencies:
- pandas
- numpy<2
- scipy
- sparse
- lxml # for mypy coverage report
- matplotlib
- pip
1 change: 1 addition & 0 deletions ci/environment.yml
@@ -11,6 +11,7 @@ dependencies:
- pandas
- numpy>=1.22
- scipy
- sparse
- lxml # for mypy coverage report
- matplotlib
- pip
1 change: 1 addition & 0 deletions ci/no-dask.yml
@@ -8,6 +8,7 @@ dependencies:
- cftime
- numpy>=1.22
- scipy
- sparse
- pip
- pytest
- pytest-cov
1 change: 1 addition & 0 deletions ci/no-numba.yml
@@ -11,6 +11,7 @@ dependencies:
- pandas
- numpy>=1.22
- scipy
- sparse
- lxml # for mypy coverage report
- matplotlib
- pip
1 change: 1 addition & 0 deletions ci/no-xarray.yml
@@ -7,6 +7,7 @@ dependencies:
- pandas
- numpy>=1.22
- scipy
- sparse
- pip
- pytest
- pytest-cov
8 changes: 8 additions & 0 deletions docs/source/arrays.md
@@ -1,5 +1,13 @@
# Duck Array Support

## Sparse Arrays

`sparse.COO` arrays from the `pydata/sparse` project are supported using algorithms that work on the underlying dense data.
See `aggregate_sparse.py` for details.
At the moment the following reductions are supported: `sum`, `nansum`, `min`, `nanmin`, `max`, `nanmax`, `count`.

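As a rough illustration of what "working on the underlying dense data" can mean, the sketch below computes a grouped `sum` over a 1-D vector stored in COO form, touching only the stored values. It is not taken from `aggregate_sparse.py`; it uses plain NumPy arrays in place of a `sparse.COO` object, the data are made up, and it relies on the fill value being 0 (the identity for `sum`).

```python
import numpy as np

# A 1-D sparse vector in COO form (hypothetical data): only non-fill entries are stored.
coords = np.array([0, 3, 4, 7], dtype=np.int64)  # positions of the stored values
data = np.array([1.0, 2.0, 3.0, 4.0])            # the stored values themselves
# Dense group labels, one per position of the length-8 vector.
by = np.array([0, 0, 1, 1, 1, 2, 2, 2], dtype=np.int64)
num_groups = 3

# Look up the label of each stored value, then accumulate per label.
labels_of_stored = by[coords]
grouped_sum = np.bincount(labels_of_stored, weights=data, minlength=num_groups)

# Reference result: densify first, then sum each group.
dense = np.zeros(by.size)
dense[coords] = data
expected = np.array([dense[by == g].sum() for g in range(num_groups)])
```

Reductions such as `min` and `max` need extra care, because the implicit fill values also take part in the result.
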
## Other array types

Aggregating over other array types will work if the array type supports one of the following methods: [ufunc.reduceat](https://numpy.org/doc/stable/reference/generated/numpy.ufunc.reduceat.html) or [ufunc.at](https://numpy.org/doc/stable/reference/generated/numpy.ufunc.at.html)

| Reduction | `method="numpy"` | `method="flox"` |
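
For readers unfamiliar with the two ufunc methods named above, here is a small NumPy-only sketch of a grouped sum written both ways. The data are made up, and the code only illustrates the two primitives; it is not a description of how flox dispatches between them.

```python
import numpy as np

values = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
labels = np.array([1, 0, 1, 2, 0])
num_groups = 3

# ufunc.at: unbuffered scatter-add, so repeated labels accumulate correctly.
sums_at = np.zeros(num_groups)
np.add.at(sums_at, labels, values)

# ufunc.reduceat: sort by label so each group is contiguous, then reduce
# each slice that starts where the label changes.
order = np.argsort(labels, kind="stable")
sorted_labels = labels[order]
starts = np.flatnonzero(np.r_[True, sorted_labels[1:] != sorted_labels[:-1]])
sums_reduceat = np.add.reduceat(values[order], starts)
```

The `reduceat` variant returns one entry per label that is actually present, so a label with no members would have to be filled in separately.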
20 changes: 20 additions & 0 deletions docs/source/implementation.md
@@ -110,6 +110,26 @@ width: 100%

This approach allows grouping by a dask array so group labels can be discovered at compute time, similar to `dask.dataframe.groupby`.

### reindexing to a sparse array

For large numbers of groups, we might be reducing to a very sparse array (e.g. [this issue](https://github.com/xarray-contrib/flox/issues/428)).

To control memory, we can instruct flox to reindex the intermediate results to a `sparse.COO` array using:

```python
from flox import ReindexArrayType, ReindexStrategy

ReindexStrategy(
    # do not reindex to the full output grid at the blockwise aggregation stage
    blockwise=False,
    # when combining intermediate results after blockwise aggregation, reindex to the
    # common grid using a sparse.COO array type
    array_type=ReindexArrayType.SPARSE_COO,
)
```

See [this user story](user-stories/large-zonal-stats) for more discussion.

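To see why the array type matters for memory, the following NumPy-only sketch contrasts reindexing one block's intermediate result to a dense grid with keeping it in a COO-style pair of coordinates and values. The sizes and values are hypothetical, and the two plain arrays merely stand in for a `sparse.COO` object.

```python
import numpy as np

# Hypothetical sizes, chosen only to make the contrast visible.
num_expected_groups = 1_000_000
found_groups = np.array([3, 17, 999_999], dtype=np.int64)  # labels seen in one block
block_sums = np.array([1.5, 2.0, 4.25])                     # per-group sums for that block

# Dense reindexing: allocate the full output grid, then scatter the results into it.
dense = np.zeros(num_expected_groups, dtype=block_sums.dtype)
dense[found_groups] = block_sums

# COO-style reindexing: keep only the coordinates and values of the groups found.
coo_coords, coo_data = found_groups, block_sums

dense_bytes = dense.nbytes
coo_bytes = coo_coords.nbytes + coo_data.nbytes
print(f"dense: {dense_bytes} bytes, COO-style: {coo_bytes} bytes")
```

The dense cost scales with the number of expected groups for every block, while the COO-style cost scales with the number of groups each block actually contains.
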
### Example

For example, consider `groupby("time.month")` with monthly frequency data and chunksize of 4 along `time`.
1 change: 1 addition & 0 deletions docs/source/user-stories.md
@@ -10,4 +10,5 @@
user-stories/climatology-hourly-cubed.ipynb
user-stories/custom-aggregations.ipynb
user-stories/nD-bins.ipynb
user-stories/large-zonal-stats.ipynb
```