
Conversation


@monocongo monocongo commented Dec 30, 2025

Description

  • add modernization docs/guidance and floating-point comparison example for contributors
  • introduce numba/mypy/pre-commit dependencies and hook config for quality gate
  • harden compute/palmer kernels with typing and numba JIT while keeping SPI/Gamma/Pearson behavior intact
  • expose xarray-friendly SPI via indices.spi_xarray and a DataArray accessor, registered in __init__.py (see the usage sketch after this list)
  • refactor SPI CLI to use xarray/apply_ufunc (optional dask) instead of shared-memory multiprocessing; preserve grid/division/timeseries flows and daily↔366-day handling
  • expand tests for accessor, zero-precipitation fallbacks, and compute assertions; refresh uv.lock
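
For orientation, here is a usage sketch of the new xarray-first API. It assumes the argument names shown in the accessor tests quoted later in this thread (scale, distribution, data_start_year, calibration_year_initial, calibration_year_final, periodicity); the authoritative signatures live in indices.py and accessors.py.

```python
import numpy as np
import pandas as pd
import xarray as xr

# importing the package registers the DataArray accessor
from climate_indices import compute, indices

time = pd.date_range("2000-01-01", periods=24, freq="MS")
precip = xr.DataArray(
    np.random.default_rng(0).gamma(2.0, 10.0, size=24),
    dims=("time",),
    coords={"time": time},
    name="precip",
    attrs={"units": "mm"},
)

# functional form
spi_da = indices.spi_xarray(
    precip,
    scale=3,
    distribution="gamma",
    data_start_year=2000,
    calibration_year_initial=2000,
    calibration_year_final=2001,
    periodicity=compute.Periodicity.monthly,
)

# accessor form, same result with coordinates and attributes preserved
spi_acc = precip.indices.spi(
    scale=3,
    distribution="gamma",
    data_start_year=2000,
    calibration_year_initial=2000,
    calibration_year_final=2001,
    periodicity=compute.Periodicity.monthly,
)
```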

Testing

  • uv run pytest tests/test_compute.py tests/test_zero_precipitation_fix.py
  • uv run pytest tests/test_accessors.py
  • (optional) smoke-test the SPI CLI (monthly and daily) against a small NetCDF input with --multiprocessing all to exercise the dask path

Summary by Sourcery

Modernize SPI computation and tooling by introducing an xarray/NumPy-first API with optional Dask parallelism, adding numba-accelerated kernels and stricter typing, and simplifying the CLI away from custom multiprocessing while preserving scientific behavior.

New Features:

  • Expose an xarray-based SPI API via a new spi_xarray helper and a DataArray accessor for computing SPI directly on labeled data.
  • Add documentation and examples for floating-point comparison practices and a 2025 modernization roadmap for the project.

Enhancements:

  • Refactor SPI CLI and core SPI pipeline to use xarray.apply_ufunc with optional Dask instead of shared-memory multiprocessing, while retaining grid, divisions, and timeseries workflows and daily-to-366‑day handling.
  • Introduce numba-accelerated kernels for Palmer water balance and key Pearson Type III/Gamma routines, along with stronger typing and type aliases in compute to improve robustness and performance.
  • Update logging, error handling, and fallback behavior in distribution fitting to be more explicit, type-safe, and numerically stable.

Build:

  • Add numba as a runtime dependency and mypy and pre-commit as development dependencies, updating the uv lockfile accordingly.

CI:

  • Introduce a pre-commit configuration to run Ruff, Ruff-format, mypy (on compute), and YAML checks before commits.

Documentation:

  • Add contributor-facing docs outlining a 2025 modernization roadmap and an implementation plan, plus a runnable example demonstrating safe floating-point comparisons and tolerance selection.

Tests:

  • Add tests for the xarray SPI accessor to ensure round‑trip correctness and attribute preservation.
  • Tighten and extend tests around zero-precipitation handling, Pearson fallback behavior, and gamma transform edge cases, including safer floating-point assertions.
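
For context, the style of assertion these tests now favor (tolerance-based comparisons instead of exact equality) looks roughly like this; it is a generic illustration, not the project's actual test code:

```python
import numpy as np

computed = np.array([0.1 + 0.2, np.nan, 1.0000000001])
expected = np.array([0.3, np.nan, 1.0])

# exact equality fails for 0.1 + 0.2 vs 0.3; compare with tolerances instead
assert computed[0] != expected[0]
assert np.isclose(computed[0], expected[0], rtol=1e-9, atol=1e-12)

# array-wide comparison that also treats NaN positions as matching
np.testing.assert_allclose(computed, expected, rtol=1e-6, atol=1e-8, equal_nan=True)
```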

- Add numba for JIT-compiled performance kernels
- Configure mypy with strict typing overrides
- Introduce pre-commit hooks for code quality
- Update lockfile with new dependencies
- Implement spi_xarray() using apply_ufunc for chunked computation
- Add ClimateIndicesAccessor for DataArray.climate_indices.spi()
- Register accessor in package __init__.py
- Include comprehensive accessor test coverage
- Replace shared-memory multiprocessing with apply_ufunc pipeline
- Support optional dask parallelism via --multiprocessing flag
- Handle daily 366-day calendar transforms for leap year alignment
- Maintain backward compatibility with saved fitting parameters
@monocongo monocongo self-assigned this Dec 30, 2025

sourcery-ai bot commented Dec 30, 2025

Reviewer's Guide

Refactors SPI computation to an xarray/NumPy/Numba-based pipeline with an xarray SPI API and accessor, replaces the legacy shared-memory multiprocessing SPI CLI with dask-friendly apply_ufunc flows, and hardens core compute/Palmer kernels with typing and JIT acceleration while adding modern tooling (pre-commit, mypy) and documentation/examples for contributors.

Sequence diagram for the new SPI CLI xarray/dask pipeline

sequenceDiagram
    actor User
    participant CLI as spi_cli_main
    participant Dask as dask_Client
    participant SPI as _compute_write_index
    participant XR as xarray
    participant IDX as indices_spi_xarray
    participant FS as netcdf_filesystem

    User->>CLI: invoke spi CLI with args
    CLI->>CLI: parse_args and validate
    CLI->>CLI: determine multiprocessing mode
    alt multiprocessing all
        CLI->>Dask: create Client(n_workers)
        Dask-->>CLI: client ready
    end
    CLI->>SPI: _compute_write_index(keyword_arguments)

    SPI->>XR: open_dataset(netcdf_precip, chunks time -1)
    SPI->>SPI: select precip variable and normalize units
    SPI->>SPI: transpose dims based on InputType

    alt periodicity daily
        SPI->>XR: apply_ufunc(transform_to_366day)
        XR-->>SPI: precip_366day DataArray
    end

    SPI->>SPI: load_or_build_fitting_dataset

    loop for each scale
        SPI->>SPI: build fitting_var_names for scale
        alt load_params is not None
            SPI->>XR: read fitting parameters from ds_fitting
        else save_params is not None
            SPI->>XR: apply_ufunc(sum_to_scale)
            SPI->>XR: apply_ufunc(gamma_parameters or pearson_parameters)
            XR-->>SPI: alphas betas prob_zero loc scale skew
            SPI->>XR: write parameters into ds_fitting
        end

        loop for distribution in gamma pearson
            SPI->>IDX: spi_xarray(precip_da, scale, distribution, fitting_params)
            IDX->>XR: apply_ufunc(spi_1d, dask parallelized)
            XR-->>IDX: spi_values DataArray
            IDX-->>SPI: spi_values DataArray

            alt periodicity daily
                SPI->>XR: apply_ufunc(transform_to_gregorian)
                XR-->>SPI: spi_gregorian
            end

            SPI->>XR: build_dataset_spi_grid_or_divisions_or_timeseries
            XR-->>SPI: ds_spi
            SPI->>FS: ds_spi.to_netcdf(output_path)
        end
    end

    alt save_params is not None and ds_fitting not None
        SPI->>FS: ds_fitting.to_netcdf(save_params)
    end

    SPI-->>CLI: success
    alt multiprocessing all
        CLI->>Dask: client.close()
    end
    CLI-->>User: SPI NetCDF files written

File-Level Changes

Change Details Files
Refactor SPI CLI computation from shared-memory multiprocessing to xarray.apply_ufunc with optional Dask and add an xarray-based SPI helper in the indices module.
  • Remove global shared-memory arrays and multiprocessing workers used to scatter/gather SPI inputs and results.
  • Always open precipitation datasets chunked along the time dimension and normalize units to millimeters, handling inches conversion explicitly.
  • Handle daily data by transforming to/from a 366-day calendar using xarray.apply_ufunc wrappers around existing utilities.
  • Build or load gamma/Pearson fitting parameter datasets using xarray and compute them via apply_ufunc over time, reusing the new xarray SPI helper for SPI values.
  • Generate SPI datasets for grid, divisions, and timeseries inputs using helper builders that now take precomputed SPI DataArrays rather than reading from shared memory.
  • Initialize an optional dask.distributed.Client for CLI multiprocessing modes and ensure clean shutdown.
src/climate_indices/__spi__.py
src/climate_indices/__main__.py
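
To make the pipeline shape concrete, here is a minimal sketch of the apply_ufunc pattern described above. The `_spi_1d` kernel is a stand-in (the real per-cell SPI routine lives in the indices/compute modules) and the `time` dimension name is assumed:

```python
import numpy as np
import xarray as xr


def _spi_1d(precip_mm: np.ndarray) -> np.ndarray:
    """Stand-in for the per-cell SPI routine applied along the time axis."""
    std = np.nanstd(precip_mm)
    if std == 0:
        return np.full_like(precip_mm, np.nan)
    return (precip_mm - np.nanmean(precip_mm)) / std


def spi_over_time(precip: xr.DataArray) -> xr.DataArray:
    # keep "time" as a single chunk: rolling sums and distribution fitting
    # need the full series per cell, while Dask parallelizes over space
    precip = precip.chunk({"time": -1})
    return xr.apply_ufunc(
        _spi_1d,
        precip,
        input_core_dims=[["time"]],
        output_core_dims=[["time"]],
        vectorize=True,
        dask="parallelized",
        output_dtypes=[np.float64],
    )
```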
Introduce an xarray SPI API and DataArray accessor to compute SPI while preserving coordinates and metadata.
  • Add spi_xarray in the indices module that wraps the existing spi function with xarray.apply_ufunc, supporting both on-the-fly fitting and precomputed gamma/Pearson parameters and Dask-backed arrays.
  • Register an xarray DataArray accessor (IndicesAccessor) under .indices that exposes a spi method, handling distribution string/enum normalization and attribute propagation.
  • Add unit tests that verify the accessor round-trips against the scalar SPI implementation and preserves time coordinates and expected SPI attributes.
  • Expose the accessor in the package so importing climate_indices activates the registration.
src/climate_indices/indices.py
src/climate_indices/accessors.py
src/climate_indices/__init__.py
tests/test_accessors.py
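
The registration pattern behind the accessor, reduced to its essentials (the real IndicesAccessor in accessors.py adds validation and attribute handling; this is only the shape):

```python
import xarray as xr


@xr.register_dataarray_accessor("indices")
class IndicesAccessor:
    """Expose climate indices as methods on DataArray.indices (sketch only)."""

    def __init__(self, da: xr.DataArray) -> None:
        self._da = da

    def spi(self, **kwargs) -> xr.DataArray:
        # delegate to the xarray-aware helper so coordinates and attrs survive
        from climate_indices import indices

        return indices.spi_xarray(self._da, **kwargs)
```

Because the decorator runs at import time, re-exporting the module from __init__.py is what makes `some_dataarray.indices.spi(...)` available after `import climate_indices`.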
Harden core compute and Palmer kernels with type annotations and Numba JIT acceleration while preserving numerical behavior and zero-precipitation handling.
  • Introduce numpy.typing-based FloatArray/BoolArray aliases, ParamSpec/TypeVar helpers, and a _numba_jit wrapper to type-safe JIT-decorated functions.
  • Annotate key compute functions (validation, scaling, gamma/pearson parameter and transform functions, and distribution fallback utilities) with precise types and improve mask handling using np.ma.getmaskarray.
  • Refactor Pearson fitting to separate CDF evaluation from probability adjustment, JIT-compile the minimum/fit helpers, and keep the final normal transform in transform_fitted_pearson.
  • Numba-accelerate Palmer water balance helpers and factor the main monthly loop into a JIT-compiled _calc_water_balances_numba, with a thin Python wrapper retaining the existing data dict interface.
  • Adjust tests to use np.isclose and updated union-style isinstance checks and to assert expected NaN/finite behavior for zeros and sparse data scenarios.
src/climate_indices/compute.py
src/climate_indices/palmer.py
tests/test_compute.py
tests/test_zero_precipitation_fix.py
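
Sketched together, the typing pieces called out above: the FloatArray alias, the ParamSpec/TypeVar-based _numba_jit wrapper (quoted verbatim further down in this thread), and a hypothetical kernel showing how the wrapper is applied; the real kernels in compute.py and palmer.py differ:

```python
from collections.abc import Callable
from typing import ParamSpec, TypeAlias, TypeVar, cast

import numba
import numpy as np
from numpy.typing import NDArray

FloatArray: TypeAlias = NDArray[np.floating]
P = ParamSpec("P")
R = TypeVar("R")


def _numba_jit(*args: object, **kwargs: object) -> Callable[[Callable[P, R]], Callable[P, R]]:
    # cast gives mypy a typed view of numba.jit, which is itself untyped
    return cast(Callable[[Callable[P, R]], Callable[P, R]], numba.jit(*args, **kwargs))


@_numba_jit(nopython=True, cache=True)
def _rolling_sum(values: FloatArray, scale: int) -> FloatArray:
    # hypothetical kernel, only to show how the decorator is used
    out = np.full(values.shape, np.nan)
    for i in range(scale - 1, values.shape[0]):
        out[i] = np.sum(values[i - scale + 1 : i + 1])
    return out
```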
Add modernization tooling, typing configuration, and contributor-focused docs including floating-point comparison guidance and a 2025 roadmap.
  • Add numba to runtime dependencies and mypy/pre-commit to dev extras, updating the uv.lock file accordingly.
  • Configure pre-commit with ruff, ruff-format, mypy-on-compute.py, and basic YAML checks to enforce style and typing.
  • Extend mypy configuration with ignore_missing_imports for numba and selected scientific libraries and ignore_errors overrides for legacy modules that are not yet fully typed.
  • Document modernization plans and testing strategy in a 2025 modernization roadmap and an implementation-plan document oriented around agentic, incremental tasks.
  • Provide a floating_point_comparisons example script demonstrating safe use of np.isclose/np.allclose and appropriate tolerances for scientific tests.
pyproject.toml
uv.lock
.pre-commit-config.yaml
docs/2025_modernization_roadmap.md
docs/dev/2025_implementation_plan.md
docs/examples/floating_point_comparisons.py

Possibly linked issues

  • #: The PR adds spi_xarray and a dask-aware SPI CLI, fixing SPI computations that previously failed under xarray+dask.
  • ##???: The IndexError was in multiprocessing gamma fitting; this PR removes that codepath, refactoring SPI CLI to xarray/dask instead.
  • Precipitation units causing error (ValueError: Unsupported precipitation units: kg m-2 s-1) #531: PR modernizes SPI computation and explicitly fixes zero-precipitation/L-moments issues that caused missing SPI values in the issue.



@sourcery-ai sourcery-ai bot left a comment


Hey - I've found 6 issues and left some high-level feedback:

  • In _compute_write_index, the ds_fitting logic will break when neither load_params nor save_params is provided (it remains None but is later accessed for .coords in the per-scale loop); consider either always initializing a temporary fitting dataset or fully guarding all ds_fitting[...] accesses behind ds_fitting is not None.
  • The new spi_xarray helper in indices.py ends with an unreachable return xr.apply_ufunc(...) block after raising ValueError, and references _spi_1d that may be out of scope; this dead code should be removed or refactored to avoid confusion.
  • The SPI CLI now opens all inputs with chunks={"time": -1} regardless of InputType, which disables spatial chunking that previously existed for grid/division inputs; consider preserving chunking along spatial dimensions while still enforcing a single time chunk to keep memory usage manageable on large grids.
Prompt for AI Agents
Please address the comments from this code review:


## Individual Comments

### Comment 1
<location> `src/climate_indices/palmer.py:787-790` </location>
<code_context>
         for param in ["alpha", "beta", "gamma", "delta"]:
             if (
                 param in fitting_params
-                and isinstance(fitting_params[param], (list, tuple, np.ndarray))
+                and isinstance(fitting_params[param], list | tuple | np.ndarray)
                 and len(fitting_params[param]) == 12
             ):
</code_context>

<issue_to_address>
**issue (bug_risk):** Runtime use of `list | tuple | np.ndarray` in `isinstance` is invalid and will raise `TypeError`.

`isinstance` doesn’t accept PEP 604 unions; it requires a type or a tuple of types. This call will raise `TypeError: isinstance() argument 2 cannot be a union`. Use `isinstance(fitting_params[param], (list, tuple, np.ndarray))` so the check executes correctly.
</issue_to_address>

### Comment 2
<location> `src/climate_indices/indices.py:230-239` </location>
<code_context>
+def spi_xarray(
</code_context>

<issue_to_address>
**issue:** Unreachable duplicate `xr.apply_ufunc` call at end of `spi_xarray` should be removed.

Because the `ValueError` for unsupported distributions is raised just before the final `return xr.apply_ufunc(...)`, that block is dead code and duplicates the earlier `apply_ufunc` call for the `fitting_params is None` case. Please remove the trailing `return xr.apply_ufunc(...)` after the `raise ValueError` to simplify control flow and avoid confusion.
</issue_to_address>

### Comment 3
<location> `src/climate_indices/__spi__.py:319-327` </location>
<code_context>
-            raise ValueError(message)
-
-        # convert daily values into 366-day years
-        if periodicity == compute.Periodicity.daily:
-            initial_year = int(dataset_climatology["time"][0].dt.year)
-            final_year = int(dataset_climatology["time"][-1].dt.year)
</code_context>

<issue_to_address>
**suggestion (bug_risk):** Daily reindexing to a synthetic `np.arange` time coordinate may desynchronize data and time metadata.

Here you reshape to a 366‑day calendar and set `time = np.arange(total_years * period_length)`, but `ds_precip['time']` remains the original Gregorian coordinate and is later used to build outputs. This desynchronizes the DataArray’s `time` from the dataset‑level `time`. Please either (a) update `ds_precip[time_dim]` to the new coordinate, or (b) keep the 366‑day calendar as a separate auxiliary time coordinate to avoid inconsistent state.

Suggested implementation:

```python
            time = np.arange(total_years * period_length)

            # keep dataset- and dataarray-level time coordinates in sync with the
            # synthetic 366-day calendar used for daily reindexing
            dataset_climatology = dataset_climatology.assign_coords(time=time)
            if "time" in ds_precip.coords:
                ds_precip = ds_precip.assign_coords(time=time)

```

This change assumes:
1. `time` is the correct name of the time dimension across `dataset_climatology` and `ds_precip`.
2. `np` is already imported (it must be, since `np.arange` is used).

If your code uses a different `time_dim` name, adjust `"time"` in `assign_coords` calls to that variable/name. Also ensure that all subsequent operations that rely on the original Gregorian `time` coordinate are compatible with the synthetic 366-day index; if they require the original calendar, consider storing it in a separate auxiliary coordinate (e.g., `original_time`) before overwriting `time`.
</issue_to_address>

### Comment 4
<location> `tests/test_zero_precipitation_fix.py:534` </location>
<code_context>
             compute.calculate_time_step_params(insufficient_data)
         except compute.DistributionFittingError as e:
             # Should catch both InsufficientDataError and PearsonFittingError
-            assert isinstance(e, (compute.InsufficientDataError, compute.PearsonFittingError))
+            assert isinstance(e, compute.InsufficientDataError | compute.PearsonFittingError)
             assert str(e)  # Should have meaningful message

</code_context>

<issue_to_address>
**issue (bug_risk):** The isinstance check using `|` will fail at runtime; it must receive a type or tuple of types.

`isinstance` doesn’t support PEP 604 unions (`A | B`) as its second argument; it only accepts a type or a tuple of types. This will raise `TypeError` when the test runs. Please revert to the tuple form so the assertion actually checks for both exception types:

```python
assert isinstance(e, (compute.InsufficientDataError, compute.PearsonFittingError))
```
</issue_to_address>

### Comment 5
<location> `tests/test_accessors.py:8-17` </location>
<code_context>
+def test_spi_accessor_round_trip(tmp_path):
</code_context>

<issue_to_address>
**suggestion (testing):** Accessor test only covers 1-D gamma/monthly; consider adding tests for other distributions and periodicities.

The current round-trip test is a solid start, but it only exercises a 1-D `DataArray` with a gamma distribution and monthly periodicity. To better validate the accessor, please add tests for:

1. Pearson distribution, to exercise the more complex fitting path.
2. Daily periodicity with appropriate time coordinates, to align with the core `spi` daily handling.
3. An invalid 2-D `DataArray` case, asserting the expected `ValueError`.

This will bring test coverage closer to the combinations supported by the core SPI code.
</issue_to_address>

### Comment 6
<location> `tests/test_accessors.py:45-54` </location>
<code_context>
+    nc_path = tmp_path / "precip.nc"
+    ds.to_netcdf(nc_path, engine="h5netcdf")
+
+    with xr.open_dataset(nc_path) as opened:
+        precip = opened["precip"]
+        spi_da = precip.indices.spi(
+            scale=3,
+            distribution="gamma",
+            data_start_year=2000,
+            calibration_year_initial=2000,
+            calibration_year_final=2001,
+            periodicity=compute.Periodicity.monthly,
+        )
+
+    expected = indices.spi(
+        values,
+        3,
+        indices.Distribution.gamma,
+        2000,
+        2000,
+        2001,
+        compute.Periodicity.monthly,
+    )
+
+    assert isinstance(spi_da, xr.DataArray)
+    assert spi_da.dims == ("time",)
+    assert np.array_equal(spi_da["time"].values, time)
+    assert spi_da.attrs["long_name"] == "Standardized Precipitation Index"
+    assert spi_da.attrs["units"] == "unitless"
+    np.testing.assert_allclose(spi_da.values, expected, equal_nan=True)
</code_context>

<issue_to_address>
**suggestion (testing):** No tests currently cover Dask-chunked `DataArray`s or time-chunking behaviour for the SPI API.

This test only exercises the in-memory accessor path (which calls `indices.spi` directly) and never touches the `xarray.apply_ufunc`/Dask implementation used by `indices.spi_xarray` and the CLI. To cover the new API, please add tests that:

- Use `DataArray`s with explicit Dask chunking (e.g. `chunk({'time': -1})` and `chunk({'time': 10})`).
- Call `indices.spi_xarray` for both gamma and Pearson.
- Compare results to `indices.spi` on the same data, and verify that unsupported time chunking either behaves correctly or fails with a clear error.

This will exercise the parallel execution path and guard against regressions as xarray/Dask evolve.

Suggested implementation:

```python
    assert isinstance(spi_da, xr.DataArray)
    assert spi_da.dims == ("time",)
    assert np.array_equal(spi_da["time"].values, time)
    assert spi_da.attrs["long_name"] == "Standardized Precipitation Index"
    assert spi_da.attrs["units"] == "unitless"
    np.testing.assert_allclose(spi_da.values, expected, equal_nan=True)


@pytest.mark.parametrize("distribution_name", ["gamma", "pearson"])
@pytest.mark.parametrize("chunk_spec", [{"time": -1}, {"time": 10}])
def test_spi_xarray_with_dask_chunked_dataarray(distribution_name, chunk_spec):
    """Ensure spi_xarray works correctly with Dask-chunked DataArrays.

    This exercises the xarray.apply_ufunc/Dask path used by the public API and CLI.
    """
    values = np.array(
        [
            np.nan,
            0.553276,
            0.650286,
            0.810409,
            0.722108,
            1.071896,
            0.792567,
            1.175593,
            1.200544,
            0.973729,
            1.024978,
            1.035279,
            1.059984,
            1.117864,
            1.151453,
            1.084299,
            0.866829,
            0.910031,
            0.876611,
            0.676108,
            0.704111,
            0.667759,
            0.746614,
            0.574251,
        ],
        dtype=float,
    )
    time = np.arange(values.size)
    da = xr.DataArray(values, dims=("time",), coords={"time": time}, name="precip")

    # Explicitly chunk along time to trigger the Dask execution path.
    da_chunked = da.chunk(chunk_spec)

    expected = indices.spi(
        values,
        3,
        getattr(indices.Distribution, distribution_name),
        2000,
        2000,
        2001,
        compute.Periodicity.monthly,
    )

    spi_da = indices.spi_xarray(
        da_chunked,
        scale=3,
        distribution=distribution_name,
        data_start_year=2000,
        calibration_year_initial=2000,
        calibration_year_final=2001,
        periodicity=compute.Periodicity.monthly,
    )

    # spi_xarray may return a lazy Dask-backed DataArray; compute before comparison.
    spi_da_computed = spi_da.compute()

    assert isinstance(spi_da, xr.DataArray)
    assert spi_da_computed.dims == ("time",)
    assert np.array_equal(spi_da_computed["time"].values, time)
    assert spi_da_computed.attrs["long_name"] == "Standardized Precipitation Index"
    assert spi_da_computed.attrs["units"] == "unitless"
    np.testing.assert_allclose(spi_da_computed.values, expected, equal_nan=True)

```

1. Ensure `pytest` is imported in this file if it is not already, e.g. `import pytest` near the top of `tests/test_accessors.py`.
2. If `indices.spi_xarray` requires a different argument name/signature (e.g. `dist` instead of `distribution`), adjust the call accordingly:
   - Update `indices.spi_xarray(da_chunked, scale=3, distribution=distribution_name, ...)` to use the correct parameter names.
3. If the actual test suite uses a shared fixture for `values`/`time`, you may want to refactor the duplicated array literal into that fixture and reuse it in both tests to keep things DRY.
</issue_to_address>



- Project Overview: Added xarray-native API with optional Dask parallelism
- Module Structure: Added accessors.py, __spi__.py, and updated module descriptions

- Key Design Patterns:
  - Replaced multiprocessing architecture with Xarray/Dask processing pipeline
  - Added Xarray-Native SPI API section with code examples for both function and accessor APIs
  - Added Numba Acceleration section (_pearson_fit(), _minimum_possible())
  - Added distribution fallback strategy for robust fitting
- Development Commands: Added pre-commit hook command, updated uv sync syntax
- CLI Usage:
  - Added --multiprocessing all example for Dask parallelism
  - Added parallelization options and fitting parameter persistence sections
- Development Notes:
  - Updated dependencies to include numba, mypy, pre-commit
  - Added Pre-commit Hooks section with setup instructions
  - Added Mypy Configuration section with strict settings
  - Updated performance considerations with numba JIT and Dask distributed
Removes unreachable dead code in spi_xarray after ValueError and improves memory efficiency for large gridded datasets.

Changes:
- indices.py: Remove unreachable xr.apply_ufunc block after raise
- __spi__.py: Preserve spatial chunking (lat/lon/division) while keeping time as single chunk for rolling window correctness
- Add clarifying comment about daily time coordinate handling
- Add comprehensive accessor tests for Pearson distribution, 2-D validation, and Dask-chunked DataArrays

The spatial chunking improvement maintains correctness (single time chunk for rolling windows) while reducing memory usage on large gridded datasets by allowing Dask to chunk spatial dimensions.
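
Roughly, the chunking pattern that change describes; the file path, variable name, and lat/lon dimension names below are assumptions for illustration:

```python
import xarray as xr

# spatial dims get modest Dask chunks so large grids stream through memory,
# while "time" stays a single chunk so rolling sums and fitting see the full series
ds = xr.open_dataset("precip.nc", chunks={"time": -1, "lat": 90, "lon": 90})
precip = ds["prcp"]
```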
Adds notebooks/00_quickstart_indices_demo.ipynb demonstrating core climate indices (SPI, SPEI, PET, PNP) using both NumPy and xarray APIs. Uses bundled test fixtures for reproducibility without external data dependencies.

Notebook includes:
- Environment setup with version tracking
- Programmatic data loading from tests/fixture/
- SPI/SPEI computation with multiple time scales
- PET calculations via Thornthwaite and Hargreaves methods
- PNP (percent-of-normal precipitation) computation
- Validation plots for visual verification

Intended as entry point for new users to explore library capabilities interactively.
…oml:

  - numpy (explicit version for stability)
  - matplotlib, ipykernel, notebook (for Jupyter workflows)
Add a unified context system for AI coding tools (Claude Code, Codex, Gemini CLI) following the pattern established in the cdip project.

New files:
  - AGENTS.md: canonical instructions for all AI tools
  - GEMINI.md: thin pointer for Gemini CLI
  - context/: on-demand reference library
    - INDEX.md, architecture.md, project_brief.md, tech_stack.md
    - python_conventions.md, scientific_computing.md, uv_rules.md
    - climate_indices_reference.md, dev_workflow.md
    - plan_validation_checklist.md
  - .claude/context/brief.md: brief for Claude's context window
  - .codex/config.toml, INSTRUCTIONS.md: Codex sandbox config
  - .gemini/GEMINI.md: Gemini-specific pointer

Updated:
  - CLAUDE.md: add note pointing to AGENTS.md and context/INDEX.md

This enables consistent AI-assisted development across different tools while keeping context loading token-economical (load only what's needed).

sonarqubecloud bot commented Jan 1, 2026

Quality Gate failed

Failed conditions
Reliability Rating on New Code: C (required ≥ A)

See analysis details on SonarQube Cloud


@monocongo
Owner Author

Code review

Found 2 issues:

  1. Missing Google-style docstrings on 4 functions (Global CLAUDE.md says "Docstrings: All functions must have Google-style docstrings")

The following functions lack docstrings (a sample Google-style docstring is sketched after these findings):

  • _numba_jit():

        def _numba_jit(*args: object, **kwargs: object) -> Callable[[Callable[P, R]], Callable[P, R]]:
            return cast(Callable[[Callable[P, R]], Callable[P, R]], numba.jit(*args, **kwargs))

  • reshape_values():

        # +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
        def reshape_values(values: FloatArray, periodicity: Periodicity) -> FloatArray:
            if periodicity is Periodicity.monthly:
                return cast(FloatArray, utils.reshape_to_2d(values, 12))
            elif periodicity is Periodicity.daily:
                return cast(FloatArray, utils.reshape_to_2d(values, 366))
            else:
                raise ValueError(f"Invalid periodicity argument: {periodicity}")

  • validate_values_shape():

        def validate_values_shape(values: FloatArray) -> int:
            if len(values.shape) != 2 or values.shape[1] not in (12, 366):
                _log_and_raise_shape_error(shape=values.shape)
            return int(values.shape[1])

  • adjust_calibration_years():

        def adjust_calibration_years(
            data_start_year: int, data_end_year: int, calibration_start_year: int, calibration_end_year: int
        ) -> tuple[int, int]:
            if (calibration_start_year < data_start_year) or (calibration_end_year > data_end_year):
                return data_start_year, data_end_year
            return calibration_start_year, calibration_end_year
  2. Missing from __future__ import annotations import (Global CLAUDE.md says "Imports: Use from __future__ import annotations and TYPE_CHECKING for type-only imports")

Both files use modern type hints but are missing the future annotations import:

  • compute.py:

        """
        import logging
        from collections.abc import Callable
        from enum import Enum
        from typing import NoReturn, ParamSpec, TypeAlias, TypeVar, cast
        import numba
        import numpy as np
        import scipy.stats
        import scipy.version
        from numpy.typing import NDArray
        from packaging.version import Version

  • indices.py:

        """Main level API module for computing climate indices"""
        import logging
        from enum import Enum
        import numpy as np
        import xarray as xr
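
As an illustration of finding 1, a Google-style docstring for reshape_values might read as follows; the wording is a suggestion, and the body is the existing one quoted above (FloatArray, Periodicity, utils, and cast come from compute.py):

```python
def reshape_values(values: FloatArray, periodicity: Periodicity) -> FloatArray:
    """Reshape a 1-D array of values into a 2-D array of years by periods.

    Args:
        values: 1-D array of monthly or daily values.
        periodicity: Periodicity.monthly (12 columns per year) or
            Periodicity.daily (366 columns per year).

    Returns:
        2-D array with one row per year and 12 or 366 columns.

    Raises:
        ValueError: If periodicity is neither monthly nor daily.
    """
    if periodicity is Periodicity.monthly:
        return cast(FloatArray, utils.reshape_to_2d(values, 12))
    elif periodicity is Periodicity.daily:
        return cast(FloatArray, utils.reshape_to_2d(values, 366))
    else:
        raise ValueError(f"Invalid periodicity argument: {periodicity}")
```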

🤖 Generated with Claude Code


@monocongo
Owner Author

Review findings

  • [P2] Clip negatives before computing saved fitting params — /Users/james.a/git/climate_indices/src/climate_indices/__spi__.py:463-471
    • When --save_params is used, the fitting parameters are derived from scaled_values computed via compute.sum_to_scale, which skips the negative-value clipping performed by compute.scale_values.
    • If input precipitation contains negative values (common in some gridded products and bias-corrected datasets), saved alpha/beta/loc/scale/skew parameters are fit to negative totals while SPI later uses clipped values, yielding inconsistent and biased SPI output.
    • The previous multiprocessing path always clipped via scale_values, so this is a regression specific to the save-params path.
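
A minimal sketch of one way to close that gap, clipping before the totals are computed; it assumes compute.sum_to_scale takes (values, scale) as used elsewhere in this PR, and the sample data is made up:

```python
import numpy as np

from climate_indices import compute

# hypothetical monthly series with spurious negatives from a bias-corrected product
precip_values = np.array([1.2, -0.3, 0.0, 5.4, -0.1, 2.2], dtype=np.float64)

# mirror compute.scale_values: treat negative precipitation as zero before
# deriving the scaled totals that the saved fitting parameters are computed from
clipped = np.clip(precip_values, 0.0, None)
scaled_values = compute.sum_to_scale(clipped, 3)
```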
