Update SpreadSkillRatio docstrings and tests#338

Merged
sgreenbury merged 1 commit into main from add-ssr-tests
Apr 21, 2026

sgreenbury commented Apr 21, 2026

Summary

  • Adds regression tests covering lead-time behaviour of SpreadSkillRatio (monotonic decrease as the error, i.e. the RMSE "skill" term, grows with spread held fixed; near-1 for calibrated ensembles; stateful aggregation is a mean of ratios).
  • Aligns SpreadSkillRatio.__init__ with other metrics by forwarding **kwargs so reduce_all can be configured at construction.
  • Clarifies the reduction order in the metric docstring (reduce the ensemble variance and the MSE first, then take square roots, then divide spread by skill).
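The reduction order matters because square roots and means do not commute. A minimal plain-Python sketch, assuming the inputs for one sample are already per-point ensemble variances and per-point squared errors of the ensemble mean; the function names are hypothetical and this is not the autocast implementation:

```python
import math


def ssr_reduce_first(variances, sq_errors):
    """Documented order: average variance and MSE, take square roots, then divide."""
    spread = math.sqrt(sum(variances) / len(variances))
    skill = math.sqrt(sum(sq_errors) / len(sq_errors))
    return spread / skill


def ssr_sqrt_first(variances, sq_errors):
    """A different (not documented) order: per-point square roots, then average, then divide."""
    spread = sum(math.sqrt(v) for v in variances) / len(variances)
    skill = sum(math.sqrt(e) for e in sq_errors) / len(sq_errors)
    return spread / skill


variances = [4.0, 0.0]  # per-point ensemble variance (illustrative values)
sq_errors = [1.0, 1.0]  # per-point squared error of the ensemble mean
```

With these values the documented order gives √2 ≈ 1.414, whereas taking square roots per point first gives 1.0.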

Why

SSR behaviour and aggregation are easy to misinterpret when debugging calibration versus coverage; the tests lock in the intended semantics and will catch accidental changes in future.

Note on production output

Existing SSR values in rollout/eval CSVs are unchanged by this PR. The **kwargs addition is a clean-up change: src/autocast/scripts/eval/encoder_processor_decoder.py::_build_per_timestep_metric_factory already handled SSR not accepting the reduce_all kwarg via a TypeError fallback, which instantiated the metric without it and set metric.reduce_all = False afterwards. That is behaviourally equivalent.
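Why the two paths end up equivalent can be sketched as follows. `BaseMetric`, the two SSR classes and `build_metric` are hypothetical stand-ins for illustration only, not the code in `_build_per_timestep_metric_factory`:

```python
class BaseMetric:
    """Hypothetical base class that takes reduce_all at construction."""

    def __init__(self, reduce_all: bool = True) -> None:
        self.reduce_all = reduce_all


class SSRWithoutKwargs(BaseMetric):
    """Hypothetical pre-PR signature: reduce_all cannot be passed in."""

    def __init__(self) -> None:
        super().__init__()


class SSRWithKwargs(BaseMetric):
    """Hypothetical post-PR signature: **kwargs forwarded to the base class."""

    def __init__(self, **kwargs) -> None:
        super().__init__(**kwargs)


def build_metric(metric_cls: type) -> BaseMetric:
    """Pass reduce_all=False if accepted; otherwise set it after construction."""
    try:
        return metric_cls(reduce_all=False)
    except TypeError:
        # Unexpected keyword argument: construct with defaults, then override.
        metric = metric_cls()
        metric.reduce_all = False
        return metric
```

Both classes end up with `reduce_all == False`. Forwarding `**kwargs` is still the cleaner option, since a broad `except TypeError` can also swallow unrelated errors raised inside `__init__`.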

Test plan

  • uv run pytest tests/metrics/test_ensemble.py -k spread_skill_ratio

Updated the metric docstrings to explicitly describe the "mean of
per-sample ratios" aggregation convention. This should prevent future
confusion about how it differs from a macroscopic ratio computation,
and explicitly links to the expected Lola behaviour.
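A minimal sketch of the two aggregation conventions, assuming each sample has already been reduced to a (mean variance, MSE) pair. "Macroscopic" is assumed here to mean pooling those statistics over all samples before the square root and division; the class and function names are hypothetical:

```python
import math


class MeanOfRatiosSSR:
    """Hypothetical stateful metric: one ratio per update(), averaged in compute()."""

    def __init__(self) -> None:
        self.ratio_sum = 0.0
        self.count = 0

    def update(self, mean_var: float, mse: float) -> None:
        # Within a sample: sqrt of the reduced variance over sqrt of the reduced MSE.
        self.ratio_sum += math.sqrt(mean_var) / math.sqrt(mse)
        self.count += 1

    def compute(self) -> float:
        # Mean of the per-sample ratios.
        return self.ratio_sum / self.count


def pooled_ssr(mean_vars: list[float], mses: list[float]) -> float:
    """Assumed 'macroscopic' alternative: pool across samples, then sqrt and divide."""
    spread = math.sqrt(sum(mean_vars) / len(mean_vars))
    skill = math.sqrt(sum(mses) / len(mses))
    return spread / skill


samples = [(1.0, 1.0), (1.0, 16.0)]  # (mean variance, MSE) per sample, illustrative
metric = MeanOfRatiosSSR()
for mean_var, mse in samples:
    metric.update(mean_var, mse)
```

Here `metric.compute()` returns 0.625, while the pooled version gives about 0.343, so the two conventions can differ substantially when per-sample skill varies.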

Updated the `__init__` method to forward `**kwargs` to the base
class so that `reduce_all` can be passed correctly during metric
instantiation, standardizing the interface with other metrics.

Added three tests:
- SSR monotonically decreases as the skill term (RMSE) grows while spread is fixed
- A calibrated ensemble has SSR near 1.0 (with finite-ensemble correction)
- Stateful update() matches a mean of ratios (not a macroscopic ratio)
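The first two properties can be illustrated with a plain-Python sketch in pytest style. The `ssr` helper, the `ensemble[m][i]` layout and the sqrt((M + 1) / M) correction factor (a common finite-ensemble adjustment when the ensemble variance is unbiased) are assumptions; the real tests in tests/metrics/test_ensemble.py may build their ensembles differently:

```python
import math
import random


def _mean(xs):
    return sum(xs) / len(xs)


def ssr(ensemble, target, corrected=False):
    """Single-sample SSR: mean variance and MSE first, then sqrt, then divide.

    ensemble[m][i] is member m at point i (hypothetical layout).
    corrected=True applies an assumed sqrt((M + 1) / M) finite-ensemble factor.
    """
    n_members = len(ensemble)
    variances, sq_errors = [], []
    for i, t in enumerate(target):
        values = [member[i] for member in ensemble]
        m = _mean(values)
        variances.append(sum((v - m) ** 2 for v in values) / (n_members - 1))
        sq_errors.append((m - t) ** 2)
    ratio = math.sqrt(_mean(variances)) / math.sqrt(_mean(sq_errors))
    if corrected:
        ratio *= math.sqrt((n_members + 1) / n_members)
    return ratio


def test_ssr_decreases_as_error_grows_with_fixed_spread():
    target = [0.0] * 8
    ratios = []
    for offset in [0.5, 1.0, 2.0, 4.0]:
        # Two members at offset - 1 and offset + 1: spread is fixed, error grows with offset.
        ensemble = [[offset - 1.0] * 8, [offset + 1.0] * 8]
        ratios.append(ssr(ensemble, target))
    assert all(a > b for a, b in zip(ratios, ratios[1:]))


def test_calibrated_ensemble_has_ssr_near_one():
    rng = random.Random(0)
    n_members, n_points = 10, 20_000
    # Truth and members drawn from the same distribution: a calibrated ensemble.
    target = [rng.gauss(0.0, 1.0) for _ in range(n_points)]
    ensemble = [[rng.gauss(0.0, 1.0) for _ in range(n_points)] for _ in range(n_members)]
    assert abs(ssr(ensemble, target, corrected=True) - 1.0) < 0.05
```

Without the correction, the calibrated case would sit near sqrt(M / (M + 1)) ≈ 0.95 for M = 10, because the ensemble-mean error also carries the sampling noise of a finite ensemble.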
sgreenbury merged commit a14f458 into main on Apr 21, 2026
3 checks passed
sgreenbury deleted the add-ssr-tests branch on April 21, 2026 08:23