


@monocongo monocongo commented Dec 24, 2025

Description

Fixes a bug where SPI calculation would return NaN for input values of 0.0 when using the Gamma distribution.

Fix Details

  • src/climate_indices/compute.py:
      • Updated gamma_parameters to operate on a copy of the input array, preventing side effects (in-place modification) on the user's data.
      • Updated transform_fitted_gamma to use explicit masking for zero values. It now treats zeros as a separate probability mass at zero by forcing the Gamma CDF contribution to 0.0 for zero inputs, which prevents NaN propagation.
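The zero handling described in the fix details follows the usual mixed-distribution form of SPI. With q the fraction of zeros at a time step and G the fitted Gamma CDF, the cumulative probability is H(x) = q + (1 - q)·G(x), and the SPI is the standard normal quantile of H(x). The following is a minimal sketch of that pattern, not the library code. It assumes a float 2-D (years, periods) array and already-fitted per-period `alphas`/`betas`. The name `transform_zero_aware` is hypothetical.

```python
import numpy as np
import scipy.stats


def transform_zero_aware(values, alphas, betas):
    """Hypothetical sketch of a zero-aware Gamma-to-SPI transform.

    values: float array shaped (years, periods); alphas/betas: arrays shaped (periods,).
    """
    # compute the zero mask once; the caller's array is never modified
    zero_mask = values == 0
    # fraction of zeros per period (column): the point mass at zero
    probabilities_of_zero = zero_mask.sum(axis=0) / values.shape[0]

    # working copy with zeros replaced by NaN so they stay out of the Gamma part
    values_for_fitting = values.copy()
    values_for_fitting[zero_mask] = np.nan

    # Gamma CDF for the positive values; NaN inputs come back as NaN here
    gamma_probabilities = scipy.stats.gamma.cdf(values_for_fitting, a=alphas, scale=betas)
    # zeros belong to the point mass, so their Gamma contribution is 0.0
    gamma_probabilities[zero_mask] = 0.0

    # mixed distribution: H(x) = q + (1 - q) * G(x)
    probabilities = probabilities_of_zero + (1.0 - probabilities_of_zero) * gamma_probabilities
    # map cumulative probabilities to standard normal quantiles (SPI values)
    return scipy.stats.norm.ppf(probabilities)
```

With this form, a zero input maps to the normal quantile of its column's zero fraction q, rather than to NaN.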

Verification

  • Updated tests/test_compute.py to verify that zero inputs now produce valid SPI values.
  • Verified that existing regression tests pass.

Resolves #533

Summary by Sourcery

Fix SPI Gamma transformation to handle zero inputs without mutating caller data or producing NaNs.

Bug Fixes:

  • Prevent NaN SPI outputs when transforming fitted Gamma distributions with zero-valued inputs by explicitly treating zeros as a separate probability mass with zero Gamma CDF contribution.
  • Avoid unintended in-place modification of user input arrays during Gamma parameter estimation and transformation.
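The in-place side effect can be reproduced with plain NumPy. This sketch assumes the old code wrote through a reshaped view of the caller's array, which is what NumPy's `reshape` returns for contiguous data. Under that assumption, a masked assignment on the working array also overwrites the input. Taking a copy first avoids this. Both function names are hypothetical stand-ins for the before and after behavior.

```python
import numpy as np


def replace_zeros_in_place(values, periods_per_year):
    # old pattern (assumed): reshaping a contiguous array returns a view, so the
    # masked assignment below also writes into the caller's array
    working = values.reshape(-1, periods_per_year)
    working[working == 0] = np.nan
    return working


def replace_zeros_on_copy(values, periods_per_year):
    # fixed pattern: take a copy first, then mask only the copy
    working = values.reshape(-1, periods_per_year).copy()
    working[working == 0] = np.nan
    return working


monthly = np.array([0.0, 1.5, 0.0, 2.0] * 6)  # two years of monthly totals
replace_zeros_on_copy(monthly, 12)
print(np.isnan(monthly).any())  # False: caller data unchanged
replace_zeros_in_place(monthly, 12)
print(np.isnan(monthly).any())  # True: caller data was overwritten
```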

Tests:

  • Update SPI Gamma transformation tests to compare only valid (non-NaN) fixture values and to assert that zero-precipitation inputs yield non-NaN SPI values.
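The test pattern described above does two things. It compares against the legacy fixture only where that fixture is valid, since the old behavior left NaNs at zero inputs. It then separately requires non-NaN output wherever the input was zero. The hypothetical helper below sketches that pattern. Its name and the `atol=0.001` tolerance are assumptions, not taken from tests/test_compute.py.

```python
import numpy as np


def assert_matches_on_valid(computed, expected, inputs, atol=0.001):
    """Hypothetical test helper for the two checks described above."""
    # compare only where the legacy fixture holds a valid (non-NaN) value
    valid = ~np.isnan(expected)
    np.testing.assert_allclose(
        computed[valid],
        expected[valid],
        atol=atol,
        err_msg="Transformed gamma fitted values mismatch on valid fixture values",
    )
    # wherever the input was zero, the new result must not be NaN
    zero_inputs = inputs == 0
    assert not np.any(np.isnan(computed[zero_inputs])), "SPI should not be NaN for zero precipitation"
```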

- Prevent in-place modification of input array in gamma_parameters
- Correctly handle zero values in transform_fitted_gamma by masking and explicit probability assignment
- Fixes #533
@monocongo monocongo self-assigned this Dec 24, 2025

sourcery-ai bot commented Dec 24, 2025

Reviewer's Guide

Adjusts SPI Gamma computation to avoid mutating input arrays and to correctly handle zero-valued inputs, preventing NaNs in SPI results and tightening tests accordingly.

Sequence diagram for updated SPI Gamma transform handling zero inputs

```mermaid
sequenceDiagram
    participant Caller
    participant transform_fitted_gamma
    participant gamma_parameters
    participant scipy_stats_gamma

    Caller->>transform_fitted_gamma: transform_fitted_gamma(values, alphas, betas, ...)
    activate transform_fitted_gamma

    transform_fitted_gamma->>transform_fitted_gamma: compute zeros = (values == 0).sum(axis=0)
    transform_fitted_gamma->>transform_fitted_gamma: probabilities_of_zero = zeros / values.shape[0]
    transform_fitted_gamma->>transform_fitted_gamma: values_for_fitting = values.copy()
    transform_fitted_gamma->>transform_fitted_gamma: zero_mask = (values == 0)
    transform_fitted_gamma->>transform_fitted_gamma: values_for_fitting[zero_mask] = np.nan

    alt alphas or betas not provided
        transform_fitted_gamma->>gamma_parameters: gamma_parameters(values_for_fitting, ...)
        activate gamma_parameters
        gamma_parameters-->>transform_fitted_gamma: alphas, betas
        deactivate gamma_parameters
    end

    transform_fitted_gamma->>scipy_stats_gamma: gamma.cdf(values_for_fitting, a=alphas, scale=betas)
    activate scipy_stats_gamma
    scipy_stats_gamma-->>transform_fitted_gamma: gamma_probabilities
    deactivate scipy_stats_gamma

    transform_fitted_gamma->>transform_fitted_gamma: gamma_probabilities[zero_mask] = 0.0

    transform_fitted_gamma-->>Caller: transformed SPI values
    deactivate transform_fitted_gamma
```
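The explicit `gamma_probabilities[zero_mask] = 0.0` step in the diagram matters because a NaN from the CDF flows straight through to the SPI. The scalar sketch below assumes the zero-mass combination q + (1 - q)·G and uses made-up values for q, the shape, and the scale. It shows that a zero replaced by NaN yields a NaN SPI, while the same zero with its Gamma contribution forced to 0.0 maps to the quantile of q.

```python
import numpy as np
import scipy.stats

q = 0.5  # assumed fraction of zeros at this time step
alpha, beta = 2.0, 1.0  # assumed fitted Gamma shape and scale

# a zero that was replaced by NaN before the CDF call
g_nan = scipy.stats.gamma.cdf(np.nan, a=alpha, scale=beta)
spi_nan = scipy.stats.norm.ppf(q + (1 - q) * g_nan)

# the same zero with its Gamma contribution forced to 0.0
g_zero = 0.0
spi_zero = scipy.stats.norm.ppf(q + (1 - q) * g_zero)

print(np.isnan(spi_nan))  # True: NaN propagates through the CDF and the quantile
print(abs(spi_zero) < 1e-12)  # True: H(0) = q = 0.5, the median of the normal
```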

File-Level Changes

Change: Ensure gamma parameter estimation does not mutate caller-provided data and correctly ignores zeros via a copied, validated array.
  • Change gamma_parameters to operate on a copy of the validated input array returned by _validate_array.
  • Continue replacing zeros with NaNs on the internal working array only, so caller data remains unchanged.
Files: src/climate_indices/compute.py

Change: Rework transform_fitted_gamma to treat zero inputs as a separate probability mass while keeping the Gamma CDF contribution for zeros at 0.0.
  • Compute zero counts and zero probabilities as before, but introduce a separate working copy, values_for_fitting, and a zero_mask to avoid modifying the original input.
  • Pass values_for_fitting into gamma_parameters when fitting alphas and betas, instead of the raw values array.
  • Compute the Gamma CDF on values_for_fitting, then explicitly set gamma_probabilities at zero_mask positions to 0.0 to avoid NaNs for zeros while keeping them out of the Gamma fit.
Files: src/climate_indices/compute.py

Change: Tighten the SPI Gamma regression test to validate non-NaN behavior for zero-precipitation inputs while still matching legacy results on valid fixture data.
Files: tests/test_compute.py

Assessment against linked issues

Issue #533. Objective: ensure SPI calculation using the Gamma distribution does not produce NaN (missing pixel/grid values) when input precipitation values are zero (e.g., after replacing nulls with 0 in the input dataset).




@sourcery-ai sourcery-ai bot left a comment


Hey - I've found 3 issues, and left some high level feedback:

  • There is duplicated logic for handling zeros and NaNs between gamma_parameters and transform_fitted_gamma; consider centralizing this behavior in a shared helper to reduce the chance of future divergence in how zero values are treated.
  • In the updated test_transform_fitted_gamma, you only assert that SPI values for zero precipitation are non-NaN; it may be more robust to also assert something about their expected magnitude or sign so regressions in the actual numerical result for zeros are caught, not just NaN propagation.
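For the second point, this zero treatment implies one concrete expectation. A zero input receives cumulative probability q (its column's fraction of zeros), so its SPI should equal norm.ppf(q), which is negative whenever q < 0.5. The hypothetical helper below checks both magnitude and sign. It assumes 2-D (years, periods) arrays and the q + (1 - q)·G combination.

```python
import numpy as np
import scipy.stats


def assert_zero_spi_matches_point_mass(inputs, computed, atol=1e-6):
    """Hypothetical test helper for 2-D (years, periods) arrays.

    Under the zero-as-point-mass treatment, each zero in a column should map to
    norm.ppf(q), where q is that column's fraction of zeros.
    """
    zero_mask = inputs == 0
    q = zero_mask.sum(axis=0) / inputs.shape[0]
    # broadcast the per-column expected SPI to the full grid
    expected = np.broadcast_to(scipy.stats.norm.ppf(q), inputs.shape)
    # magnitude check at every zero input (an all-zero column, q == 1, expects +inf)
    np.testing.assert_allclose(computed[zero_mask], expected[zero_mask], atol=atol)
    # sign check: where fewer than half the values are zero, zeros score below 0
    below_half = np.broadcast_to(q < 0.5, inputs.shape) & zero_mask
    assert np.all(computed[below_half] < 0)
```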
## Individual Comments

### Comment 1
<location> `src/climate_indices/compute.py:913-921` </location>
<code_context>
     zeros = (values == 0).sum(axis=0)
     probabilities_of_zero = zeros / values.shape[0]

+    # create a working copy to avoid modifying the input array
+    # and to safely replace zeros with NaNs for fitting/CDF computation
+    values_for_fitting = values.copy()
+
+    # store mask of zero values
+    zero_mask = (values == 0)
+
     # replace zeros with NaNs
</code_context>

<issue_to_address>
**suggestion (performance):** You can reuse a single zero mask instead of recomputing the equality operation.

You compute `values == 0` twice: for `zeros` and for `zero_mask`. Instead, compute it once as `zero_mask = (values == 0)` and then use `zeros = zero_mask.sum(axis=0)` to avoid redundant work and keep zero-handling logic centralized.
</issue_to_address>
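A single shared helper could address both this comment and the earlier point about centralizing zero handling. It would compute the mask once and return everything both functions need. The sketch below is hypothetical (`split_zero_mass` does not exist in the library). It uses `astype(float)`, which returns a new array by default, so the caller's data is never written to.

```python
import numpy as np


def split_zero_mass(values):
    """Hypothetical helper shared by gamma_parameters and transform_fitted_gamma.

    Returns:
      values_for_fitting: float copy of values with zeros replaced by NaN
      zero_mask: boolean array, True where the input was exactly zero
      probabilities_of_zero: per-column fraction of zeros
    """
    # evaluate the equality once and reuse it for both the counts and the mask
    zero_mask = values == 0
    probabilities_of_zero = zero_mask.sum(axis=0) / values.shape[0]
    # astype(float) copies by default, so the masked write below never reaches the caller
    values_for_fitting = values.astype(float)
    values_for_fitting[zero_mask] = np.nan
    return values_for_fitting, zero_mask, probabilities_of_zero
```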

### Comment 2
<location> `src/climate_indices/compute.py:937-942` </location>
<code_context>

     # find the gamma probability values using the gamma CDF
-    gamma_probabilities = scipy.stats.gamma.cdf(values, a=alphas, scale=betas)
+    gamma_probabilities = scipy.stats.gamma.cdf(values_for_fitting, a=alphas, scale=betas)
+
+    # where the input values were zero the CDF will have returned NaN, but since
+    # we're treating zeros as a separate probability mass we should treat the
+    # gamma probability for zeros as 0.0
+    gamma_probabilities[zero_mask] = 0.0

     # TODO explain this better
</code_context>

<issue_to_address>
**question (bug_risk):** Clarify how this zero handling interacts with columns where the gamma fit is ill-defined (e.g., all-zero data).

For all-zero columns, `values_for_fitting` becomes all-NaN, so the fitted `alphas`/`betas` (and thus `gamma.cdf`) are likely all-NaN. With `zero_mask` true everywhere, `gamma_probabilities[zero_mask] = 0.0` then forces the entire column to 0.0 instead of NaN. This alters the current behavior and may break any downstream logic that interprets NaNs as "no valid fit," so it’s worth confirming this is the desired treatment for all-zero / unfit columns.
</issue_to_address>
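The combination q + (1 - q)·G, with the forced 0.0, gives an all-zero column q = 1.0 and therefore a cumulative probability of exactly 1.0. In that case norm.ppf returns +inf rather than NaN. The sketch below reproduces this with made-up parameters and assumes the fit returns NaN alphas/betas for the all-zero column, as the comment suggests. It also shows one hypothetical policy for restoring NaN where no valid fit exists; this policy is not the merged behavior.

```python
import numpy as np
import scipy.stats

# one all-zero column (q = 1.0) and one mixed column (q = 0.5)
values = np.array([[0.0, 0.0], [0.0, 2.0], [0.0, 0.0], [0.0, 3.0]])
alphas = np.array([np.nan, 2.0])  # assumed: the fit failed for the all-zero column
betas = np.array([np.nan, 1.0])

zero_mask = values == 0
q = zero_mask.sum(axis=0) / values.shape[0]
working = values.copy()
working[zero_mask] = np.nan
g = scipy.stats.gamma.cdf(working, a=alphas, scale=betas)
g[zero_mask] = 0.0  # the step under discussion
spi = scipy.stats.norm.ppf(q + (1.0 - q) * g)
print(np.isposinf(spi[:, 0]).all())  # True: the all-zero column becomes +inf, not NaN

# one possible policy (hypothetical): keep NaN wherever no valid fit exists
unfit = np.isnan(alphas) | np.isnan(betas)
spi_guarded = np.where(unfit, np.nan, spi)
print(np.isnan(spi_guarded[:, 0]).all())  # True
```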

### Comment 3
<location> `tests/test_compute.py:95-98` </location>
<code_context>
+        err_msg="Transformed gamma fitted daily values mismatch on valid fixture values"
     )

+    # Check that values where input was zero are NOT NaN in computed result
+    mask_zeros = (precips_mm_daily == 0)
+    assert not np.any(np.isnan(computed_values[mask_zeros])), \
+            "Computed SPI should not be NaN for zero precipitation"
+
     # confirm that we can call with a calibration period out of the valid range
</code_context>

<issue_to_address>
**suggestion (testing):** Add a dedicated test case for the all-zeros (or mostly-zeros) edge case to more directly exercise the bugfix.

The current assertion only checks zero handling for this fixture. The regression here was specifically about zero-precipitation inputs producing NaNs when zeros dominate or all values are zero. To capture that explicitly and protect against regressions, consider a focused test such as:

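```python
def test_transform_fitted_gamma_all_zeros_produces_finite_spi():
    N = 10  # small number of years
    # 1-D daily input covering N years of 366 days each (reshaped to (N, 366) on validation)
    values = np.zeros(N * 366, dtype=float)
    result = compute.transform_fitted_gamma(
        values,
        data_start_year,
        calibration_year_start_daily,
        calibration_year_end_daily,
        compute.Periodicity.daily,
    )
    # np.isnan does not flag inf; use np.isfinite if the SPI must also be finite
    assert not np.any(np.isnan(result)), "SPI should not be NaN when all inputs are zero"
```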

This directly guards the `probabilities_of_zero == 1.0` case and makes future refactors less likely to reintroduce NaNs there.

Suggested implementation:

```python
def test_transform_fitted_gamma_all_zeros_produces_finite_spi():
    N = 10
    # 1-D daily input covering N years of 366 days each
    values = np.zeros(N * 366, dtype=float)

    result = compute.transform_fitted_gamma(
        values,
        data_start_year,
        calibration_year_start_daily,
        calibration_year_end_daily,
        compute.Periodicity.daily,
    )

    assert not np.any(np.isnan(result)), \
        "SPI should not be NaN when all inputs are zero"
```

The new test `test_transform_fitted_gamma_all_zeros_produces_finite_spi` must be defined at module scope (not nested inside another test). Place it alongside the other `transform_fitted_gamma` tests (e.g., immediately after the existing daily gamma test function).

This test assumes that `np`, `compute`, `data_start_year`, `calibration_year_start_daily`, and `calibration_year_end_daily` are already available in this module as in the surrounding tests. If any of these are provided via fixtures or differently named constants, adjust the arguments accordingly.
</issue_to_address>



@monocongo monocongo merged commit 8d5607f into master Dec 24, 2025
4 of 16 checks passed


Development

Successfully merging this pull request may close these issues.

Missing Pixel or Grid Value in the output while Calculating Standard Precipitation Index (SPI) using climate_indices library in Python
