-
Notifications
You must be signed in to change notification settings - Fork 174
Fix SPI Gamma calculation for zero inputs (Issue #533) #594
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
- Prevent in-place modification of input array in gamma_parameters - Correctly handle zero values in transform_fitted_gamma by masking and explicit probability assignment - Fixes #533
Reviewer's GuideAdjusts SPI Gamma computation to avoid mutating input arrays and to correctly handle zero-valued inputs, preventing NaNs in SPI results and tightening tests accordingly. Sequence diagram for updated SPI Gamma transform handling zero inputssequenceDiagram
participant Caller
participant transform_fitted_gamma
participant gamma_parameters
participant scipy_stats_gamma
Caller->>transform_fitted_gamma: transform_fitted_gamma(values, alphas, betas, ...)
activate transform_fitted_gamma
transform_fitted_gamma->>transform_fitted_gamma: compute zeros = (values == 0).sum(axis=0)
transform_fitted_gamma->>transform_fitted_gamma: probabilities_of_zero = zeros / values.shape[0]
transform_fitted_gamma->>transform_fitted_gamma: values_for_fitting = values.copy()
transform_fitted_gamma->>transform_fitted_gamma: zero_mask = (values == 0)
transform_fitted_gamma->>transform_fitted_gamma: values_for_fitting[zero_mask] = np.nan
alt alphas or betas not provided
transform_fitted_gamma->>gamma_parameters: gamma_parameters(values_for_fitting, ...)
activate gamma_parameters
gamma_parameters-->>transform_fitted_gamma: alphas, betas
deactivate gamma_parameters
end
transform_fitted_gamma->>scipy_stats_gamma: gamma.cdf(values_for_fitting, a=alphas, scale=betas)
activate scipy_stats_gamma
scipy_stats_gamma-->>transform_fitted_gamma: gamma_probabilities
deactivate scipy_stats_gamma
transform_fitted_gamma->>transform_fitted_gamma: gamma_probabilities[zero_mask] = 0.0
transform_fitted_gamma-->>Caller: transformed SPI values
deactivate transform_fitted_gamma
File-Level Changes
Assessment against linked issues
Possibly linked issues
Tips and commandsInteracting with Sourcery
Customizing Your ExperienceAccess your dashboard to:
Getting Help
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey - I've found 3 issues, and left some high level feedback:
- There is duplicated logic for handling zeros and NaNs between
gamma_parametersandtransform_fitted_gamma; consider centralizing this behavior in a shared helper to reduce the chance of future divergence in how zero values are treated. - In the updated
test_transform_fitted_gamma, you only assert that SPI values for zero precipitation are non-NaN; it may be more robust to also assert something about their expected magnitude or sign so regressions in the actual numerical result for zeros are caught, not just NaN propagation.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- There is duplicated logic for handling zeros and NaNs between `gamma_parameters` and `transform_fitted_gamma`; consider centralizing this behavior in a shared helper to reduce the chance of future divergence in how zero values are treated.
- In the updated `test_transform_fitted_gamma`, you only assert that SPI values for zero precipitation are non-NaN; it may be more robust to also assert something about their expected magnitude or sign so regressions in the actual numerical result for zeros are caught, not just NaN propagation.
## Individual Comments
### Comment 1
<location> `src/climate_indices/compute.py:913-921` </location>
<code_context>
zeros = (values == 0).sum(axis=0)
probabilities_of_zero = zeros / values.shape[0]
+ # create a working copy to avoid modifying the input array
+ # and to safely replace zeros with NaNs for fitting/CDF computation
+ values_for_fitting = values.copy()
+
+ # store mask of zero values
+ zero_mask = (values == 0)
+
# replace zeros with NaNs
</code_context>
<issue_to_address>
**suggestion (performance):** You can reuse a single zero mask instead of recomputing the equality operation.
You compute `values == 0` twice: for `zeros` and for `zero_mask`. Instead, compute it once as `zero_mask = (values == 0)` and then use `zeros = zero_mask.sum(axis=0)` to avoid redundant work and keep zero-handling logic centralized.
</issue_to_address>
### Comment 2
<location> `src/climate_indices/compute.py:937-942` </location>
<code_context>
# find the gamma probability values using the gamma CDF
- gamma_probabilities = scipy.stats.gamma.cdf(values, a=alphas, scale=betas)
+ gamma_probabilities = scipy.stats.gamma.cdf(values_for_fitting, a=alphas, scale=betas)
+
+ # where the input values were zero the CDF will have returned NaN, but since
+ # we're treating zeros as a separate probability mass we should treat the
+ # gamma probability for zeros as 0.0
+ gamma_probabilities[zero_mask] = 0.0
# TODO explain this better
</code_context>
<issue_to_address>
**question (bug_risk):** Clarify how this zero handling interacts with columns where the gamma fit is ill-defined (e.g., all-zero data).
For all-zero columns, `values_for_fitting` becomes all-NaN, so the fitted `alphas`/`betas` (and thus `gamma.cdf`) are likely all-NaN. With `zero_mask` true everywhere, `gamma_probabilities[zero_mask] = 0.0` then forces the entire column to 0.0 instead of NaN. This alters the current behavior and may break any downstream logic that interprets NaNs as "no valid fit," so it’s worth confirming this is the desired treatment for all-zero / unfit columns.
</issue_to_address>
### Comment 3
<location> `tests/test_compute.py:95-98` </location>
<code_context>
+ err_msg="Transformed gamma fitted daily values mismatch on valid fixture values"
)
+ # Check that values where input was zero are NOT NaN in computed result
+ mask_zeros = (precips_mm_daily == 0)
+ assert not np.any(np.isnan(computed_values[mask_zeros])), \
+ "Computed SPI should not be NaN for zero precipitation"
+
# confirm that we can call with a calibration period out of the valid range
</code_context>
<issue_to_address>
**suggestion (testing):** Add a dedicated test case for the all-zeros (or mostly-zeros) edge case to more directly exercise the bugfix.
The current assertion only checks zero handling for this fixture. The regression here was specifically about zero-precipitation inputs producing NaNs when zeros dominate or all values are zero. To capture that explicitly and protect against regressions, consider a focused test such as:
```python
def test_transform_fitted_gamma_all_zeros_produces_finite_spi():
values = np.zeros((N, 1), dtype=float) # small N
result = compute.transform_fitted_gamma(
values,
data_start_year,
calibration_year_start_daily,
calibration_year_end_daily,
compute.Periodicity.daily,
)
assert not np.any(np.isnan(result)), "SPI should not be NaN when all inputs are zero"
```
This directly guards the `probability_of_zero == 1.0` case and makes future refactors less likely to reintroduce NaNs there.
Suggested implementation:
```python
# confirm that we can call with a calibration period out of the valid range
# and as a result use the full period of record as the calibration period instead
computed_values = compute.transform_fitted_gamma(
def test_transform_fitted_gamma_all_zeros_produces_finite_spi():
N = 10
values = np.zeros((N, 1), dtype=float)
result = compute.transform_fitted_gamma(
values,
data_start_year,
calibration_year_start_daily,
calibration_year_end_daily,
compute.Periodicity.daily,
)
assert not np.any(np.isnan(result)), \
"SPI should not be NaN when all inputs are zero"
```
The new test `test_transform_fitted_gamma_all_zeros_produces_finite_spi` must be defined at module scope (not nested inside another test). If the search/replace context above lands inside an existing test function, please move this new test so that it sits alongside the other `transform_fitted_gamma` tests (e.g., immediately after the existing daily gamma test function).
This test assumes that `np`, `compute`, `data_start_year`, `calibration_year_start_daily`, and `calibration_year_end_daily` are already available in this module as in the surrounding tests. If any of these are provided via fixtures or differently named constants, adjust the arguments accordingly.
</issue_to_address>Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
| zeros = (values == 0).sum(axis=0) | ||
| probabilities_of_zero = zeros / values.shape[0] | ||
|
|
||
| # create a working copy to avoid modifying the input array | ||
| # and to safely replace zeros with NaNs for fitting/CDF computation | ||
| values_for_fitting = values.copy() | ||
|
|
||
| # store mask of zero values | ||
| zero_mask = (values == 0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggestion (performance): You can reuse a single zero mask instead of recomputing the equality operation.
You compute values == 0 twice: for zeros and for zero_mask. Instead, compute it once as zero_mask = (values == 0) and then use zeros = zero_mask.sum(axis=0) to avoid redundant work and keep zero-handling logic centralized.
| gamma_probabilities = scipy.stats.gamma.cdf(values_for_fitting, a=alphas, scale=betas) | ||
|
|
||
| # where the input values were zero the CDF will have returned NaN, but since | ||
| # we're treating zeros as a separate probability mass we should treat the | ||
| # gamma probability for zeros as 0.0 | ||
| gamma_probabilities[zero_mask] = 0.0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
question (bug_risk): Clarify how this zero handling interacts with columns where the gamma fit is ill-defined (e.g., all-zero data).
For all-zero columns, values_for_fitting becomes all-NaN, so the fitted alphas/betas (and thus gamma.cdf) are likely all-NaN. With zero_mask true everywhere, gamma_probabilities[zero_mask] = 0.0 then forces the entire column to 0.0 instead of NaN. This alters the current behavior and may break any downstream logic that interprets NaNs as "no valid fit," so it’s worth confirming this is the desired treatment for all-zero / unfit columns.
| # Check that values where input was zero are NOT NaN in computed result | ||
| mask_zeros = (precips_mm_daily == 0) | ||
| assert not np.any(np.isnan(computed_values[mask_zeros])), \ | ||
| "Computed SPI should not be NaN for zero precipitation" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggestion (testing): Add a dedicated test case for the all-zeros (or mostly-zeros) edge case to more directly exercise the bugfix.
The current assertion only checks zero handling for this fixture. The regression here was specifically about zero-precipitation inputs producing NaNs when zeros dominate or all values are zero. To capture that explicitly and protect against regressions, consider a focused test such as:
def test_transform_fitted_gamma_all_zeros_produces_finite_spi():
values = np.zeros((N, 1), dtype=float) # small N
result = compute.transform_fitted_gamma(
values,
data_start_year,
calibration_year_start_daily,
calibration_year_end_daily,
compute.Periodicity.daily,
)
assert not np.any(np.isnan(result)), "SPI should not be NaN when all inputs are zero"This directly guards the probability_of_zero == 1.0 case and makes future refactors less likely to reintroduce NaNs there.
Suggested implementation:
# confirm that we can call with a calibration period out of the valid range
# and as a result use the full period of record as the calibration period instead
computed_values = compute.transform_fitted_gamma(
def test_transform_fitted_gamma_all_zeros_produces_finite_spi():
N = 10
values = np.zeros((N, 1), dtype=float)
result = compute.transform_fitted_gamma(
values,
data_start_year,
calibration_year_start_daily,
calibration_year_end_daily,
compute.Periodicity.daily,
)
assert not np.any(np.isnan(result)), \
"SPI should not be NaN when all inputs are zero"The new test test_transform_fitted_gamma_all_zeros_produces_finite_spi must be defined at module scope (not nested inside another test). If the search/replace context above lands inside an existing test function, please move this new test so that it sits alongside the other transform_fitted_gamma tests (e.g., immediately after the existing daily gamma test function).
This test assumes that np, compute, data_start_year, calibration_year_start_daily, and calibration_year_end_daily are already available in this module as in the surrounding tests. If any of these are provided via fixtures or differently named constants, adjust the arguments accordingly.



Description
Fixes a bug where SPI calculation would return
NaNfor input values of0.0when using the Gamma distribution.Fix Details
src/climate_indices/compute.py:gamma_parametersto operate on a copy of the input array, preventing side effects (in-place modification) on the user's data.transform_fitted_gammato use explicit masking for zero values. It now correctly treats the zero-probability mass by forcing the Gamma CDF contribution to0.0for zero inputs, preventingNaNpropagation.Verification
tests/test_compute.pyto verify that zero inputs now produce valid SPI values.Resolves #533
Summary by Sourcery
Fix SPI Gamma transformation to handle zero inputs without mutating caller data or producing NaNs.
Bug Fixes:
Tests: