@monocongo (Owner) commented Dec 25, 2025

Summary

Fixes incorrect SPI (Standardized Precipitation Index) output when precipitation data contains zeros. Previously, zero precipitation values could produce NaN results due to issues in the gamma distribution fitting process. This MR ensures zeros are handled correctly and produce meaningful drought indicators.

Problem

When computing SPI using gamma distribution fitting (transform_fitted_gamma), zero precipitation values caused two issues:

  1. Zeros passed to gamma CDF: The scipy.stats.gamma.cdf() function received zero values, which, combined with the fitted gamma parameters, could produce NaN outputs.
  2. All-zero time steps: When a calendar day/month had 100% zero precipitation in the historical record, the resulting SPI was undefined or incorrectly indicated extreme wetness (+∞)

Solution

Core fix: Proper zero-value handling in gamma fitting

  • Zeros are now explicitly excluded from gamma parameter fitting by replacing them with NaN before calling gamma_parameters()
  • A zero_mask tracks original zero positions for later probability adjustment
  • After computing gamma.cdf(), zero positions are assigned a gamma probability of 0.0 (the correct value for the lower bound of the distribution)
  • The final probability calculation correctly combines the probability-of-zero with the gamma probability
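
The four steps above can be sketched for a single time step as follows. This is a minimal illustration, not the code in compute.py: the sample values are made up, and the method-of-moments fit is only a stand-in, since this description does not say which estimator gamma_parameters() uses.

```python
import numpy as np
from scipy import stats

# Hypothetical sample: one calendar month across ten years (precipitation in mm).
values = np.array([0.0, 12.0, 3.5, 0.0, 8.1, 20.4, 5.0, 1.2, 15.3, 7.7])

# Track where the zeros are and how often they occur.
zero_mask = values == 0
probability_of_zero = zero_mask.mean()

# Fit the gamma distribution to the non-zero values only.
# Method of moments is an assumption made for this sketch.
nonzero = values[~zero_mask]
mean, var = nonzero.mean(), nonzero.var()
alpha = mean**2 / var   # shape
beta = var / mean       # scale

# Zeros are passed to the CDF as NaN, then assigned a gamma probability of 0.0.
values_for_cdf = np.where(zero_mask, np.nan, values)
gamma_probabilities = stats.gamma.cdf(values_for_cdf, a=alpha, scale=beta)
gamma_probabilities[zero_mask] = 0.0

# Combine the probability-of-zero with the gamma probability.
probabilities = probability_of_zero + (1.0 - probability_of_zero) * gamma_probabilities
spi = stats.norm.ppf(probabilities)
```

With two zeros in ten samples, the zero positions end up with a cumulative probability of 0.2, so their SPI is negative and finite rather than NaN.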

All-zero time steps produce extreme drought (-∞)

When probability_of_zero == 1.0 (all historical values are zero for a time step), the SPI now correctly returns -∞ (extreme drought) rather than NaN or +∞. This is the correct climatological interpretation: a location with no recorded precipitation is in perpetual extreme drought.
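
A small sketch of why the adjustment flips the sign of the result, assuming the flow shown in the reviewer's guide (probability-of-zero reset to 0.0 for all-zero steps, gamma probability 0.0 at zero positions). The array below is invented for illustration.

```python
import numpy as np
from scipy import stats

# Hypothetical history: 4 years x 3 time steps; the middle step is always dry.
values = np.array([
    [1.0, 0.0, 2.0],
    [0.0, 0.0, 3.0],
    [4.0, 0.0, 1.5],
    [2.5, 0.0, 0.0],
])
zero_mask = values == 0
probabilities_of_zero = zero_mask.sum(axis=0) / values.shape[0]

# Left unadjusted, an all-zero step has cumulative probability 1.0,
# and the inverse normal CDF of 1.0 is +inf (extreme wetness).
unadjusted_spi = stats.norm.ppf(1.0)

# Resetting probability-of-zero to 0.0 for those steps gives 0 + (1 - 0) * 0 = 0
# at the zero observations, and the inverse normal CDF of 0.0 is -inf.
adjusted = np.where(probabilities_of_zero == 1.0, 0.0, probabilities_of_zero)
gamma_probability_at_zero = 0.0
probability = adjusted + (1.0 - adjusted) * gamma_probability_at_zero
spi_at_zero = stats.norm.ppf(probability)
```

Here spi_at_zero is -inf for the always-dry step and a finite negative value for the two steps where zeros occur in a quarter of the history.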

Code quality improvements

  • Added _replace_zeros_with_nan() helper to centralize duplicated zero-handling logic between gamma_parameters() and transform_fitted_gamma()
  • Compute zero_mask once and reuse it (avoids redundant values == 0 comparisons)
  • Added comprehensive inline documentation explaining the all-zero edge case behavior
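
For orientation, a helper of this kind might look like the sketch below. The name without the leading underscore, the return order, and the sample array are assumptions based on the reviewer's guide, not the code merged in compute.py.

```python
import numpy as np


def replace_zeros_with_nan(values: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Return (zero_mask, copy of values with zeros replaced by NaN)."""
    zero_mask = values == 0
    # Copy as float so the caller's array is untouched and NaN can be stored.
    values_no_zeros = values.astype(float, copy=True)
    values_no_zeros[zero_mask] = np.nan
    return zero_mask, values_no_zeros


# Hypothetical input: 3 years x 2 time steps.
values = np.array([[0.0, 2.0], [1.0, 0.0], [0.0, 3.0]])
zero_mask, values_for_fitting = replace_zeros_with_nan(values)

# The mask is computed once and reused for the per-step probability of zero.
probabilities_of_zero = zero_mask.sum(axis=0) / values.shape[0]
```

Returning the mask alongside the NaN-filled copy is what lets the caller avoid a second values == 0 comparison.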

Changes

| File | Changes |
| --- | --- |
| src/climate_indices/compute.py | Added _replace_zeros_with_nan() helper; refactored gamma_parameters() and transform_fitted_gamma() to handle zeros correctly |
| tests/test_compute.py | Enhanced assertions for zero-precipitation; added dedicated test_transform_fitted_gamma_all_zeros_produces_finite_spi() |

Test plan

  • Existing test_transform_fitted_gamma passes with enhanced zero-value assertions
  • New test_transform_fitted_gamma_all_zeros_produces_finite_spi verifies all-zero input produces -∞ (not NaN)
  • All 6 tests in test_compute.py pass
  • Linting passes (ruff check)

Behavior summary

| Input condition | Previous behavior | New behavior |
| --- | --- | --- |
| Zero precipitation (zeros rare historically) | NaN | Negative SPI (drought) |
| Zero precipitation (zeros common historically) | NaN | Near-zero or positive SPI (normal) |
| All-zero time step (100% zeros in history) | NaN or +∞ | -∞ (extreme drought) |
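
As a worked illustration of the first two rows: at a zero observation the gamma term contributes 0, so the cumulative probability is just the historical probability of zero, and SPI is its inverse normal CDF. The two probabilities below are made-up examples, and the standard-library NormalDist is used here only to keep the sketch dependency-free.

```python
from statistics import NormalDist


def spi_for_zero(probability_of_zero: float) -> float:
    """SPI of a zero observation when zeros occur with the given historical frequency.

    Valid for 0 < probability_of_zero < 1; the all-zero case is handled separately.
    """
    return NormalDist().inv_cdf(probability_of_zero)


rare = spi_for_zero(0.05)    # zeros in 5% of the history: about -1.64 (drought)
common = spi_for_zero(0.60)  # zeros in 60% of the history: about 0.25 (normal)
```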

Summary by Sourcery

Handle zero-precipitation values correctly in SPI gamma-distribution transformation and centralize zero-handling logic.

Bug Fixes:

  • Prevent NaN and incorrect +∞ SPI values by correctly handling zero and all-zero precipitation cases in gamma-based SPI computation.

Enhancements:

  • Introduce a shared _replace_zeros_with_nan helper and reuse zero masks to simplify and clarify gamma fitting logic and edge-case behavior.

Tests:

  • Expand SPI zero-precipitation assertions and add a dedicated test ensuring all-zero precipitation inputs yield finite, negative SPI values rather than NaNs or +∞.

Previously, SPI calculation for time steps with 100% zero precipitation history resulted in +infinity (clipped to 3.09), incorrectly indicating extreme wetness. This change ensures that such cases return NaN (undefined), which is statistically correct for degenerate distributions.

Updated tests/test_compute.py to allow NaN values for zero-precipitation inputs when the historical probability of zero is 1.0.

Addresses code review feedback on SPI zero-precipitation handling:

- Add _replace_zeros_with_nan() helper to eliminate duplicated zero-to-NaN conversion logic between gamma_parameters() and transform_fitted_gamma()
- Compute zero_mask once and reuse it, avoiding redundant equality operations (performance improvement)
- Add detailed comment explaining all-zero column behavior: produces -infinity (extreme drought) not NaN, which is the correct interpretation for drought monitoring
- Enhance test assertions to verify zero values produce finite results (not NaN or +infinity)
- Add dedicated test for all-zeros edge case to guard against future regressions

Reduces code duplication and improves maintainability while clarifying the semantic meaning of all-zero precipitation in drought indices.

Refs #533

sourcery-ai bot commented Dec 25, 2025

Reviewer's Guide

Refactors SPI gamma-fitting to centralize zero-handling, ensures zeros are excluded from parameter fitting, and adjusts probability calculations so zero and all-zero precipitation inputs yield finite, climatologically consistent SPI values, with tests updated and expanded to cover these edge cases.

Sequence diagram for SPI transform_fitted_gamma zero-handling and probability flow

sequenceDiagram
    actor Caller
    participant TransformFittedGamma
    participant _validate_array
    participant _replace_zeros_with_nan
    participant gamma_parameters
    participant scipy_stats_gamma
    participant scipy_stats_norm

    Caller->>TransformFittedGamma: transform_fitted_gamma(values, alphas, betas, periodicity)
    TransformFittedGamma->>_validate_array: _validate_array(values, periodicity)
    _validate_array-->>TransformFittedGamma: validated_values

    TransformFittedGamma->>_replace_zeros_with_nan: _replace_zeros_with_nan(validated_values)
    _replace_zeros_with_nan-->>TransformFittedGamma: zero_mask, values_for_fitting

    TransformFittedGamma->>TransformFittedGamma: zeros = zero_mask.sum(axis=0)
    TransformFittedGamma->>TransformFittedGamma: probabilities_of_zero = zeros / n_samples
    TransformFittedGamma->>TransformFittedGamma: set probabilities_of_zero == 1.0 to 0.0

    alt alphas or betas not provided
        TransformFittedGamma->>gamma_parameters: gamma_parameters(values_for_fitting, data_start_year, data_end_year, periodicity)
        gamma_parameters->>_validate_array: _validate_array(values_for_fitting, periodicity)
        gamma_parameters->>_replace_zeros_with_nan: _replace_zeros_with_nan(validated_values)
        _replace_zeros_with_nan-->>gamma_parameters: zero_mask_fit, values_no_zeros
        gamma_parameters-->>TransformFittedGamma: alphas, betas
    end

    TransformFittedGamma->>scipy_stats_gamma: gamma.cdf(values_for_fitting, alphas, scale=betas)
    scipy_stats_gamma-->>TransformFittedGamma: gamma_probabilities

    TransformFittedGamma->>TransformFittedGamma: gamma_probabilities[zero_mask] = 0.0
    TransformFittedGamma->>TransformFittedGamma: probabilities = probabilities_of_zero + (1.0 - probabilities_of_zero) * gamma_probabilities

    TransformFittedGamma->>scipy_stats_norm: norm.ppf(probabilities)
    scipy_stats_norm-->>TransformFittedGamma: spi_values

    TransformFittedGamma-->>Caller: spi_values (finite, -infinity for all-zero history)

Class diagram for compute module gamma fitting helpers and SPI transformation

classDiagram
    class ComputeModule {
        +gamma_parameters(values, data_start_year, data_end_year, periodicity)
        +transform_fitted_gamma(values, data_start_year, data_end_year, periodicity, alphas, betas)
        +_replace_zeros_with_nan(values)
    }

    class gamma_parameters {
        +values
        +data_start_year
        +data_end_year
        +periodicity
        +returns alphas
        +returns betas
    }

    class transform_fitted_gamma {
        +values
        +data_start_year
        +data_end_year
        +periodicity
        +alphas
        +betas
        +uses zero_mask
        +uses probabilities_of_zero
        +returns spi_values
    }

    class _replace_zeros_with_nan {
        +values
        +returns zero_mask
        +returns values_copy
    }

    ComputeModule ..> gamma_parameters : defines
    ComputeModule ..> transform_fitted_gamma : defines
    ComputeModule ..> _replace_zeros_with_nan : defines

    transform_fitted_gamma --> _replace_zeros_with_nan : calls
    gamma_parameters --> _replace_zeros_with_nan : calls
    transform_fitted_gamma --> gamma_parameters : optionally calls

File-Level Changes

Change: Centralize and correct zero-precipitation handling in gamma fitting and SPI transformation logic so zeros are excluded from parameter estimation but still influence probabilities correctly, including the all-zero edge case.
Details:
  • Introduced a _replace_zeros_with_nan helper that returns a zero_mask and a copy of the input array with zeros replaced by NaN for downstream gamma fitting.
  • Updated gamma_parameters to rely on _replace_zeros_with_nan after array validation instead of doing inline zero-to-NaN replacement.
  • Refactored transform_fitted_gamma to compute zero_mask and values_for_fitting via _replace_zeros_with_nan, reuse zero_mask for probability_of_zero calculations, and avoid repeated equality checks.
  • Modified transform_fitted_gamma so that time steps with probability_of_zero == 1.0 have their probability_of_zero set to 0.0, leading to SPI = -inf (extreme drought) for all-zero histories, and documented this behavior inline.
Files: src/climate_indices/compute.py

Change: Strengthen SPI tests around zero and all-zero precipitation cases to assert finite, non-NaN, and non-+inf behavior, and add a dedicated regression test for all-zero inputs.
Details:
  • Extended test_transform_fitted_gamma to assert that SPI values at zero-precipitation positions are never NaN and are not +inf, allowing real values or -inf when appropriate.
  • Added test_transform_fitted_gamma_all_zeros_produces_finite_spi to verify that all-zero daily precipitation yields finite, non-NaN, negative SPI values, confirming the extreme drought interpretation.
  • Performed minor whitespace/style cleanup in tests to align with formatting expectations.
Files: tests/test_compute.py

Possibly linked issues

  • #Missing SPI values when there are 0mm precipitations: PR directly addresses SPI NaNs/missing values for zero precipitation that the issue reports, by fixing gamma handling.
  • #(no explicit ID provided): The PR changes gamma zero-handling to prevent NaN SPI outputs, directly addressing the null SPI values reported.

@sourcery-ai sourcery-ai bot left a comment

Hey - I've found 1 issue, and left some high level feedback:

  • In transform_fitted_gamma, the all-zeros case is currently handled indirectly by forcing NaNs through the gamma pipeline and then relying on the probabilities_of_zero == 1.0 adjustment; consider an explicit early branch for the all-zero time-step case to make the behavior clearer and less dependent on downstream side effects.
  • The _replace_zeros_with_nan helper always copies the input array, and gamma_parameters/transform_fitted_gamma also call _validate_array which may already copy; consider documenting or optimizing this to avoid unnecessary duplication for large inputs.
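
One way the explicit early branch from the first point could look, as a sketch only. The function name and arrays are hypothetical and are not part of compute.py; the idea is to detect all-zero time steps directly from the input rather than through the probability adjustment.

```python
import numpy as np


def force_all_zero_steps_to_neg_inf(values: np.ndarray, spi: np.ndarray) -> np.ndarray:
    """Hypothetical helper: set SPI to -inf for time steps whose whole history is zero.

    Both arrays are assumed to be shaped (years, time_steps).
    """
    all_zero_steps = (values == 0).all(axis=0)
    result = spi.copy()
    result[:, all_zero_steps] = -np.inf
    return result


# Hypothetical data: the first time step is zero in every year.
values = np.array([[0.0, 1.0], [0.0, 0.0], [0.0, 2.0]])
spi = np.array([[np.nan, 0.1], [np.nan, -0.5], [np.nan, 0.4]])
fixed = force_all_zero_steps_to_neg_inf(values, spi)
```

A branch like this makes the all-zero contract visible in one place, at the cost of one extra boolean reduction over the input.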

Comment on lines +158 to +167
def test_transform_fitted_gamma_all_zeros_produces_finite_spi():
    """
    Test that all-zero precipitation produces finite SPI values, not NaN.
    When all precipitation values are zero, SPI should indicate extreme drought
    (negative infinity or large negative values), not NaN.
    """
    # one year of daily data (366 days)
    n_years = 1
    n_days_per_year = 366

issue (testing): Test name/docstring imply finite SPI, but the assertions also allow -inf; align the test with the intended behavior

The MR description says all-zero time steps should yield -∞ SPI, while the test name/docstring talk about "finite SPI" and "negative infinity or large negative values", so the assertions (not np.any(np.isnan(result)) and np.all(result < 0)) don’t uniquely specify what’s correct.

Please either:

  • Rename the test and update the docstring to explicitly allow -inf, or
  • Tighten the assertions to enforce the exact contract (e.g. np.all(np.isneginf(result)) if -inf is required, or explicitly allow "finite or -inf" and assert that via a mask).

This will make the expected behavior for all-zero histories precise and prevent regressions from slipping through.
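
To make the difference between the options concrete, the sketch below compares the assertions quoted above with the two tighter alternatives. The arrays are stand-ins for the output of transform_fitted_gamma on all-zero input, and -3.09 is used as an example of a large negative finite value.

```python
import numpy as np

# Stand-ins for transform_fitted_gamma output (shape is illustrative).
neg_inf_result = np.full(366, -np.inf)
clipped_result = np.full(366, -3.09)


def passes_original_assertions(result: np.ndarray) -> bool:
    # The checks quoted in the review: no NaN and everything negative.
    return bool(not np.any(np.isnan(result)) and np.all(result < 0))


def passes_strict_contract(result: np.ndarray) -> bool:
    # Option: require exactly -inf everywhere.
    return bool(np.all(np.isneginf(result)))


def passes_finite_or_neg_inf(result: np.ndarray) -> bool:
    # Option: allow finite values or -inf, but reject NaN and +inf via a mask.
    allowed = np.isfinite(result) | np.isneginf(result)
    return bool(np.all(allowed))
```

The original assertions accept both stand-in arrays, which is why they do not pin down the contract; only the strict check separates -inf from a clipped finite value.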

Replace exact equality check (== 1.0) with np.isclose() to handle floating-point arithmetic precision issues when detecting all-zero time steps. Prevents edge cases where probability_of_zero is 0.9999999... from being missed due to rounding errors.
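
A generic illustration of the rounding problem this guards against. The accumulated value below is a textbook floating-point example, not a value taken from compute.py, and the default np.isclose tolerances are assumed.

```python
import numpy as np

# Ten additions of 0.1 do not land exactly on 1.0 in binary floating point.
accumulated = sum([0.1] * 10)
exact_match = accumulated == 1.0
close_match = bool(np.isclose(accumulated, 1.0))

# Detecting all-zero time steps with a tolerance instead of exact equality.
probabilities_of_zero = np.array([0.25, accumulated, 1.0])
all_zero_steps = np.isclose(probabilities_of_zero, 1.0)
adjusted = np.where(all_zero_steps, 0.0, probabilities_of_zero)
```

With exact equality only the last entry would be treated as an all-zero step; with np.isclose both the second and third entries are.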

Refs #533

@monocongo monocongo merged commit f7c6d44 into master Dec 25, 2025
12 checks passed