Add xarray SPI API, modern tooling, and numba acceleration #596
base: master
Conversation
- Add numba for JIT-compiled performance kernels (see the sketch below)
- Configure mypy with strict typing overrides
- Introduce pre-commit hooks for code quality
- Update lockfile with new dependencies
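The numba pattern referenced here is typically a plain NumPy loop compiled with `numba.njit`. A minimal sketch of the idea, using a hypothetical `_rolling_sum` kernel rather than the PR's actual `_pearson_fit()`/`_minimum_possible()` functions:

```python
# Illustrative only: the kernel name and body are assumptions, not the PR's code.
import numba
import numpy as np

@numba.njit(cache=True)
def _rolling_sum(values: np.ndarray, scale: int) -> np.ndarray:
    """JIT-compiled rolling sum over a 1-D series, NaN-padded at the start."""
    out = np.full(values.shape[0], np.nan)
    for i in range(scale - 1, values.shape[0]):
        total = 0.0
        for j in range(i - scale + 1, i + 1):
            total += values[j]
        out[i] = total
    return out
```

The explicit loops look naive in NumPy terms, but numba compiles them to machine code, which is why such kernels can outperform vectorized equivalents.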
- Implement spi_xarray() using apply_ufunc for chunked computation
- Add ClimateIndicesAccessor for DataArray.climate_indices.spi() (registration pattern sketched below)
- Register accessor in package __init__.py
- Include comprehensive accessor test coverage
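For orientation, the accessor mechanism is xarray's standard registration decorator. A minimal sketch, assuming a thin wrapper that delegates to the xarray SPI entry point (the class body is an assumption, not the PR's actual code; note that the tests later in this thread expose the accessor as `.indices`):

```python
import xarray as xr

@xr.register_dataarray_accessor("climate_indices")
class ClimateIndicesAccessor:
    """Expose climate-indices methods as DataArray.climate_indices.*"""

    def __init__(self, da: xr.DataArray):
        self._da = da

    def spi(self, scale: int, **kwargs) -> xr.DataArray:
        # Delegate to the package-level xarray SPI function (assumed signature).
        from climate_indices import indices
        return indices.spi_xarray(self._da, scale=scale, **kwargs)
```

Registering this in the package `__init__.py` is what makes the method available on any `DataArray` after a bare `import climate_indices`.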
- Replace shared-memory multiprocessing with apply_ufunc pipeline
- Support optional dask parallelism via --multiprocessing flag
- Handle daily 366-day calendar transforms for leap year alignment (see the sketch below)
- Maintain backward compatibility with saved fitting parameters
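The 366-day transform mentioned above pads every year to a common length so leap and non-leap years share one axis. A sketch of the reshaping idea only (helper name and signature are assumptions; the PR's real transform runs behind `apply_ufunc`):

```python
import numpy as np

def to_366day(per_year: list[np.ndarray]) -> np.ndarray:
    """Stack variable-length years (365 or 366 days) into a flat 366-day calendar."""
    out = np.full((len(per_year), 366), np.nan)
    for i, year_values in enumerate(per_year):
        # Non-leap years fill 365 slots and leave day 366 as NaN padding.
        out[i, : year_values.size] = year_values
    return out.ravel()
```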
Reviewer's Guide

Refactors SPI computation to an xarray/NumPy/Numba-based pipeline with an xarray SPI API and accessor, replaces the legacy shared-memory multiprocessing SPI CLI with dask-friendly apply_ufunc flows, and hardens core compute/Palmer kernels with typing and JIT acceleration, while adding modern tooling (pre-commit, mypy) and documentation/examples for contributors.

Sequence diagram for the new SPI CLI xarray/dask pipeline

```mermaid
sequenceDiagram
actor User
participant CLI as spi_cli_main
participant Dask as dask_Client
participant SPI as _compute_write_index
participant XR as xarray
participant IDX as indices_spi_xarray
participant FS as netcdf_filesystem
User->>CLI: invoke spi CLI with args
CLI->>CLI: parse_args and validate
CLI->>CLI: determine multiprocessing mode
alt multiprocessing all
CLI->>Dask: create Client(n_workers)
Dask-->>CLI: client ready
end
CLI->>SPI: _compute_write_index(keyword_arguments)
SPI->>XR: open_dataset(netcdf_precip, chunks time -1)
SPI->>SPI: select precip variable and normalize units
SPI->>SPI: transpose dims based on InputType
alt periodicity daily
SPI->>XR: apply_ufunc(transform_to_366day)
XR-->>SPI: precip_366day DataArray
end
SPI->>SPI: load_or_build_fitting_dataset
loop for each scale
SPI->>SPI: build fitting_var_names for scale
alt load_params is not None
SPI->>XR: read fitting parameters from ds_fitting
else save_params is not None
SPI->>XR: apply_ufunc(sum_to_scale)
SPI->>XR: apply_ufunc(gamma_parameters or pearson_parameters)
XR-->>SPI: alphas betas prob_zero loc scale skew
SPI->>XR: write parameters into ds_fitting
end
loop for distribution in gamma pearson
SPI->>IDX: spi_xarray(precip_da, scale, distribution, fitting_params)
IDX->>XR: apply_ufunc(spi_1d, dask parallelized)
XR-->>IDX: spi_values DataArray
IDX-->>SPI: spi_values DataArray
alt periodicity daily
SPI->>XR: apply_ufunc(transform_to_gregorian)
XR-->>SPI: spi_gregorian
end
SPI->>XR: build_dataset_spi_grid_or_divisions_or_timeseries
XR-->>SPI: ds_spi
SPI->>FS: ds_spi.to_netcdf(output_path)
end
end
alt save_params is not None and ds_fitting not None
SPI->>FS: ds_fitting.to_netcdf(save_params)
end
SPI-->>CLI: success
alt multiprocessing all
CLI->>Dask: client.close()
end
CLI-->>User: SPI NetCDF files written
```
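The `apply_ufunc(spi_1d, dask parallelized)` step in the diagram is the heart of the pipeline. A minimal sketch of what such a call looks like, assuming a 1-D `spi_1d(values) -> values` kernel (names and signature are assumptions, not the PR's exact code):

```python
import numpy as np
import xarray as xr

def spi_over_grid(precip: xr.DataArray, spi_1d) -> xr.DataArray:
    # Each task receives one cell's full time series, so time must be a
    # single chunk; spatial dimensions can remain chunked for parallelism.
    return xr.apply_ufunc(
        spi_1d,
        precip.chunk({"time": -1}),
        input_core_dims=[["time"]],
        output_core_dims=[["time"]],
        vectorize=True,
        dask="parallelized",
        output_dtypes=[np.float64],
    )
```

With `dask="parallelized"`, the kernel runs lazily per chunk under any active dask `Client`, which is what the `--multiprocessing all` mode exploits.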
File-Level Changes
Possibly linked issues
Hey - I've found 6 issues, and left some high level feedback:
- In `_compute_write_index`, the `ds_fitting` logic will break when neither `load_params` nor `save_params` is provided (it remains `None` but is later accessed for `.coords` in the per-scale loop); consider either always initializing a temporary fitting dataset or fully guarding all `ds_fitting[...]` accesses behind `ds_fitting is not None`.
- The new `spi_xarray` helper in `indices.py` ends with an unreachable `return xr.apply_ufunc(...)` block after raising `ValueError`, and references `_spi_1d` that may be out of scope; this dead code should be removed or refactored to avoid confusion.
- The SPI CLI now opens all inputs with `chunks={"time": -1}` regardless of `InputType`, which disables spatial chunking that previously existed for grid/division inputs; consider preserving chunking along spatial dimensions while still enforcing a single time chunk to keep memory usage manageable on large grids (see the sketch below).
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- In `_compute_write_index`, the `ds_fitting` logic will break when neither `load_params` nor `save_params` is provided (it remains `None` but is later accessed for `.coords` in the per-scale loop); consider either always initializing a temporary fitting dataset or fully guarding all `ds_fitting[...]` accesses behind `ds_fitting is not None`.
- The new `spi_xarray` helper in `indices.py` ends with an unreachable `return xr.apply_ufunc(...)` block after raising `ValueError`, and references `_spi_1d` that may be out of scope; this dead code should be removed or refactored to avoid confusion.
- The SPI CLI now opens all inputs with `chunks={"time": -1}` regardless of `InputType`, which disables spatial chunking that previously existed for grid/division inputs; consider preserving chunking along spatial dimensions while still enforcing a single time chunk to keep memory usage manageable on large grids.
## Individual Comments
### Comment 1
<location> `src/climate_indices/palmer.py:787-790` </location>
<code_context>
for param in ["alpha", "beta", "gamma", "delta"]:
if (
param in fitting_params
- and isinstance(fitting_params[param], (list, tuple, np.ndarray))
+ and isinstance(fitting_params[param], list | tuple | np.ndarray)
and len(fitting_params[param]) == 12
):
</code_context>
<issue_to_address>
**issue (bug_risk):** Runtime use of `list | tuple | np.ndarray` in `isinstance` is invalid and will raise `TypeError`.
`isinstance` doesn’t accept PEP 604 unions; it requires a type or a tuple of types. This call will raise `TypeError: isinstance() argument 2 cannot be a union`. Use `isinstance(fitting_params[param], (list, tuple, np.ndarray))` so the check executes correctly.
</issue_to_address>
### Comment 2
<location> `src/climate_indices/indices.py:230-239` </location>
<code_context>
+def spi_xarray(
</code_context>
<issue_to_address>
**issue:** Unreachable duplicate `xr.apply_ufunc` call at end of `spi_xarray` should be removed.
Because the `ValueError` for unsupported distributions is raised just before the final `return xr.apply_ufunc(...)`, that block is dead code and duplicates the earlier `apply_ufunc` call for the `fitting_params is None` case. Please remove the trailing `return xr.apply_ufunc(...)` after the `raise ValueError` to simplify control flow and avoid confusion.
</issue_to_address>
### Comment 3
<location> `src/climate_indices/__spi__.py:319-327` </location>
<code_context>
- raise ValueError(message)
-
- # convert daily values into 366-day years
- if periodicity == compute.Periodicity.daily:
- initial_year = int(dataset_climatology["time"][0].dt.year)
- final_year = int(dataset_climatology["time"][-1].dt.year)
</code_context>
<issue_to_address>
**suggestion (bug_risk):** Daily reindexing to a synthetic `np.arange` time coordinate may desynchronize data and time metadata.
Here you reshape to a 366‑day calendar and set `time = np.arange(total_years * period_length)`, but `ds_precip['time']` remains the original Gregorian coordinate and is later used to build outputs. This desynchronizes the DataArray’s `time` from the dataset‑level `time`. Please either (a) update `ds_precip[time_dim]` to the new coordinate, or (b) keep the 366‑day calendar as a separate auxiliary time coordinate to avoid inconsistent state.
Suggested implementation:
```python
time = np.arange(total_years * period_length)
# keep dataset- and dataarray-level time coordinates in sync with the
# synthetic 366-day calendar used for daily reindexing
dataset_climatology = dataset_climatology.assign_coords(time=time)
if "time" in ds_precip.coords:
ds_precip = ds_precip.assign_coords(time=time)
```
This change assumes:
1. `time` is the correct name of the time dimension across `dataset_climatology` and `ds_precip`.
2. `np` is already imported (it must be, since `np.arange` is used).
If your code uses a different `time_dim` name, adjust `"time"` in `assign_coords` calls to that variable/name. Also ensure that all subsequent operations that rely on the original Gregorian `time` coordinate are compatible with the synthetic 366-day index; if they require the original calendar, consider storing it in a separate auxiliary coordinate (e.g., `original_time`) before overwriting `time`.
</issue_to_address>
### Comment 4
<location> `tests/test_zero_precipitation_fix.py:534` </location>
<code_context>
compute.calculate_time_step_params(insufficient_data)
except compute.DistributionFittingError as e:
# Should catch both InsufficientDataError and PearsonFittingError
- assert isinstance(e, (compute.InsufficientDataError, compute.PearsonFittingError))
+ assert isinstance(e, compute.InsufficientDataError | compute.PearsonFittingError)
assert str(e) # Should have meaningful message
</code_context>
<issue_to_address>
**issue (bug_risk):** The isinstance check using `|` will fail at runtime; it must receive a type or tuple of types.
`isinstance` doesn’t support PEP 604 unions (`A | B`) as its second argument; it only accepts a type or a tuple of types. This will raise `TypeError` when the test runs. Please revert to the tuple form so the assertion actually checks for both exception types:
```python
assert isinstance(e, (compute.InsufficientDataError, compute.PearsonFittingError))
```
</issue_to_address>
### Comment 5
<location> `tests/test_accessors.py:8-17` </location>
<code_context>
+def test_spi_accessor_round_trip(tmp_path):
</code_context>
<issue_to_address>
**suggestion (testing):** Accessor test only covers 1-D gamma/monthly; consider adding tests for other distributions and periodicities.
The current round-trip test is a solid start, but it only exercises a 1-D `DataArray` with a gamma distribution and monthly periodicity. To better validate the accessor, please add tests for:
1. Pearson distribution, to exercise the more complex fitting path.
2. Daily periodicity with appropriate time coordinates, to align with the core `spi` daily handling.
3. An invalid 2-D `DataArray` case, asserting the expected `ValueError`.
This will bring test coverage closer to the combinations supported by the core SPI code.
</issue_to_address>
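A sketch of the third case, assuming the accessor (registered as `.indices` in this test file) rejects non-1-D input with a `ValueError` (the exact validation behaviour is an assumption):

```python
import numpy as np
import pytest
import xarray as xr

from climate_indices import compute  # importing the package registers the accessor

def test_spi_accessor_rejects_2d():
    # A 2-D DataArray should be refused by the 1-D-only accessor path.
    da = xr.DataArray(np.ones((4, 3)), dims=("time", "lat"))
    with pytest.raises(ValueError):
        da.indices.spi(
            scale=3,
            distribution="gamma",
            data_start_year=2000,
            calibration_year_initial=2000,
            calibration_year_final=2001,
            periodicity=compute.Periodicity.monthly,
        )
```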
### Comment 6
<location> `tests/test_accessors.py:45-54` </location>
<code_context>
+ nc_path = tmp_path / "precip.nc"
+ ds.to_netcdf(nc_path, engine="h5netcdf")
+
+ with xr.open_dataset(nc_path) as opened:
+ precip = opened["precip"]
+ spi_da = precip.indices.spi(
+ scale=3,
+ distribution="gamma",
+ data_start_year=2000,
+ calibration_year_initial=2000,
+ calibration_year_final=2001,
+ periodicity=compute.Periodicity.monthly,
+ )
+
+ expected = indices.spi(
+ values,
+ 3,
+ indices.Distribution.gamma,
+ 2000,
+ 2000,
+ 2001,
+ compute.Periodicity.monthly,
+ )
+
+ assert isinstance(spi_da, xr.DataArray)
+ assert spi_da.dims == ("time",)
+ assert np.array_equal(spi_da["time"].values, time)
+ assert spi_da.attrs["long_name"] == "Standardized Precipitation Index"
+ assert spi_da.attrs["units"] == "unitless"
+ np.testing.assert_allclose(spi_da.values, expected, equal_nan=True)
</code_context>
<issue_to_address>
**suggestion (testing):** No tests currently cover Dask-chunked `DataArray`s or time-chunking behaviour for the SPI API.
This test only exercises the in-memory accessor path (which calls `indices.spi` directly) and never touches the `xarray.apply_ufunc`/Dask implementation used by `indices.spi_xarray` and the CLI. To cover the new API, please add tests that:
- Use `DataArray`s with explicit Dask chunking (e.g. `chunk({'time': -1})` and `chunk({'time': 10})`).
- Call `indices.spi_xarray` for both gamma and Pearson.
- Compare results to `indices.spi` on the same data, and verify that unsupported time chunking either behaves correctly or fails with a clear error.
This will exercise the parallel execution path and guard against regressions as xarray/Dask evolve.
Suggested implementation:
```python
assert isinstance(spi_da, xr.DataArray)
assert spi_da.dims == ("time",)
assert np.array_equal(spi_da["time"].values, time)
assert spi_da.attrs["long_name"] == "Standardized Precipitation Index"
assert spi_da.attrs["units"] == "unitless"
np.testing.assert_allclose(spi_da.values, expected, equal_nan=True)
@pytest.mark.parametrize("distribution_name", ["gamma", "pearson"])
@pytest.mark.parametrize("chunk_spec", [{"time": -1}, {"time": 10}])
def test_spi_xarray_with_dask_chunked_dataarray(distribution_name, chunk_spec):
"""Ensure spi_xarray works correctly with Dask-chunked DataArrays.
This exercises the xarray.apply_ufunc/Dask path used by the public API and CLI.
"""
values = np.array(
[
np.nan,
0.553276,
0.650286,
0.810409,
0.722108,
1.071896,
0.792567,
1.175593,
1.200544,
0.973729,
1.024978,
1.035279,
1.059984,
1.117864,
1.151453,
1.084299,
0.866829,
0.910031,
0.876611,
0.676108,
0.704111,
0.667759,
0.746614,
0.574251,
],
dtype=float,
)
time = np.arange(values.size)
da = xr.DataArray(values, dims=("time",), coords={"time": time}, name="precip")
# Explicitly chunk along time to trigger the Dask execution path.
da_chunked = da.chunk(chunk_spec)
expected = indices.spi(
values,
3,
getattr(indices.Distribution, distribution_name),
2000,
2000,
2001,
compute.Periodicity.monthly,
)
spi_da = indices.spi_xarray(
da_chunked,
scale=3,
distribution=distribution_name,
data_start_year=2000,
calibration_year_initial=2000,
calibration_year_final=2001,
periodicity=compute.Periodicity.monthly,
)
# spi_xarray may return a lazy Dask-backed DataArray; compute before comparison.
spi_da_computed = spi_da.compute()
assert isinstance(spi_da, xr.DataArray)
assert spi_da_computed.dims == ("time",)
assert np.array_equal(spi_da_computed["time"].values, time)
assert spi_da_computed.attrs["long_name"] == "Standardized Precipitation Index"
assert spi_da_computed.attrs["units"] == "unitless"
np.testing.assert_allclose(spi_da_computed.values, expected, equal_nan=True)
```
1. Ensure `pytest` is imported in this file if it is not already, e.g. `import pytest` near the top of `tests/test_accessors.py`.
2. If `indices.spi_xarray` requires a different argument name/signature (e.g. `dist` instead of `distribution`), adjust the call accordingly:
- Update `indices.spi_xarray(da_chunked, scale=3, distribution=distribution_name, ...)` to use the correct parameter names.
3. If the actual test suite uses a shared fixture for `values`/`time`, you may want to refactor the duplicated array literal into that fixture and reuse it in both tests to keep things DRY.
</issue_to_address>
```diff
 for param in ["alpha", "beta", "gamma", "delta"]:
     if (
         param in fitting_params
-        and isinstance(fitting_params[param], (list, tuple, np.ndarray))
+        and isinstance(fitting_params[param], list | tuple | np.ndarray)
```
issue (bug_risk): Runtime use of `list | tuple | np.ndarray` in `isinstance` is invalid and will raise `TypeError`.

`isinstance` doesn't accept PEP 604 unions; it requires a type or a tuple of types. This call will raise `TypeError: isinstance() argument 2 cannot be a union`. Use `isinstance(fitting_params[param], (list, tuple, np.ndarray))` so the check executes correctly.
```python
compute.calculate_time_step_params(insufficient_data)
except compute.DistributionFittingError as e:
    # Should catch both InsufficientDataError and PearsonFittingError
    assert isinstance(e, (compute.InsufficientDataError, compute.PearsonFittingError))
```
issue (bug_risk): The `isinstance` check using `|` will fail at runtime; it must receive a type or tuple of types.

`isinstance` doesn't support PEP 604 unions (`A | B`) as its second argument; it only accepts a type or a tuple of types. This will raise `TypeError` when the test runs. Please revert to the tuple form so the assertion actually checks for both exception types:
```python
assert isinstance(e, (compute.InsufficientDataError, compute.PearsonFittingError))
```

- Project Overview: Added xarray-native API with optional Dask parallelism
- Module Structure: Added accessors.py, __spi__.py, and updated module descriptions
- Key Design Patterns:
  - Replaced multiprocessing architecture with Xarray/Dask processing pipeline
  - Added Xarray-Native SPI API section with code examples for both function and accessor APIs
  - Added Numba Acceleration section (_pearson_fit(), _minimum_possible())
  - Added distribution fallback strategy for robust fitting
- Development Commands: Added pre-commit hook command, updated uv sync syntax
- CLI Usage:
  - Added --multiprocessing all example for Dask parallelism
  - Added parallelization options and fitting parameter persistence sections
- Development Notes:
  - Updated dependencies to include numba, mypy, pre-commit
  - Added Pre-commit Hooks section with setup instructions
  - Added Mypy Configuration section with strict settings
  - Updated performance considerations with numba JIT and Dask distributed
Removes unreachable dead code in spi_xarray after ValueError and improves memory efficiency for large gridded datasets.

Changes:
- indices.py: Remove unreachable xr.apply_ufunc block after raise
- __spi__.py: Preserve spatial chunking (lat/lon/division) while keeping time as a single chunk for rolling window correctness
- Add clarifying comment about daily time coordinate handling
- Add comprehensive accessor tests for Pearson distribution, 2-D validation, and Dask-chunked DataArrays

The spatial chunking improvement maintains correctness (single time chunk for rolling windows) while reducing memory usage on large gridded datasets by allowing Dask to chunk spatial dimensions.
Adds notebooks/00_quickstart_indices_demo.ipynb demonstrating core climate indices (SPI, SPEI, PET, PNP) using both NumPy and xarray APIs. Uses bundled test fixtures for reproducibility without external data dependencies.

Notebook includes:
- Environment setup with version tracking
- Programmatic data loading from tests/fixture/
- SPI/SPEI computation with multiple time scales
- PET calculations via Thornthwaite and Hargreaves methods
- PNP (percent-of-normal precipitation) computation
- Validation plots for visual verification

Intended as an entry point for new users to explore library capabilities interactively.
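For a flavour of what those notebook cells reduce to, a sketch using the NumPy API (the fixture path and year arguments are illustrative, not the notebook's actual values):

```python
import numpy as np
from climate_indices import compute, indices

# Monthly precipitation for one location, e.g. loaded from a bundled fixture.
precip_mm = np.load("tests/fixture/precip_monthly.npy")  # hypothetical path

spi_3month = indices.spi(
    precip_mm,
    3,                            # scale: 3-month SPI
    indices.Distribution.gamma,
    1998,                         # data_start_year
    1998,                         # calibration_year_initial
    2016,                         # calibration_year_final
    compute.Periodicity.monthly,
)
```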
…oml:
- numpy (explicit version for stability)
- matplotlib, ipykernel, notebook (for Jupyter workflows)
Add a unified context system for AI coding tools (Claude Code, Codex, Gemini CLI) following the pattern established in the cdip project.
New files:
- AGENTS.md: canonical instructions for all AI tools
- GEMINI.md: thin pointer for Gemini CLI
- context/: on-demand reference library
  - INDEX.md, architecture.md, project_brief.md, tech_stack.md
  - python_conventions.md, scientific_computing.md, uv_rules.md
  - climate_indices_reference.md, dev_workflow.md
  - plan_validation_checklist.md
- .claude/context/brief.md: brief for Claude's context window
- .codex/config.toml, INSTRUCTIONS.md: Codex sandbox config
- .gemini/GEMINI.md: Gemini-specific pointer
Updated:
- CLAUDE.md: add note pointing to AGENTS.md and context/INDEX.md
This enables consistent AI-assisted development across different tools while keeping context loading token-economical (load only what's needed).
Code review

Found 2 issues:
The following functions lack docstrings:
Both files use modern type hints but are missing the future annotations import:
🤖 Generated with Claude Code
Review findings
Description
Testing
Summary by Sourcery
Modernize SPI computation and tooling by introducing an xarray/NumPy-first API with optional Dask parallelism, adding numba-accelerated kernels and stricter typing, and simplifying the CLI away from custom multiprocessing while preserving scientific behavior.
New Features:
Enhancements:
Build:
CI:
Documentation:
Tests: