Update SpreadSkillRatio docstrings and tests by sgreenbury · Pull Request #338 · alan-turing-institute/autocast

sgreenbury · 2026-04-21T07:54:22Z

Summary

Adds regression tests covering lead-time behaviour of SpreadSkillRatio (monotonic decrease when skill grows; near-1 for calibrated ensembles; stateful aggregation is mean-of-ratios).
Aligns SpreadSkillRatio.__init__ with other metrics by forwarding **kwargs so reduce_all can be configured at construction.
Clarifies the reduction order in the metric docstring (reduce variance/MSE, then sqrt, then divide).
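
The reduction order named in the last item can be illustrated with a minimal sketch. It is not the autocast implementation: the function name, the list-based inputs, the use of the unbiased sample variance, and the omission of any finite-ensemble correction are all assumptions made for illustration.

```python
import math
from statistics import fmean, variance


def spread_skill_ratio(ensemble, truth):
    """Hypothetical helper: ensemble is a list of members, each a list of
    values per grid point; truth is a list of values per grid point."""
    per_point_var = []
    per_point_sq_err = []
    for j, target in enumerate(truth):
        values = [member[j] for member in ensemble]
        # Unbiased ensemble variance at this point (assumed convention).
        per_point_var.append(variance(values))
        # Squared error of the ensemble mean at this point.
        per_point_sq_err.append((fmean(values) - target) ** 2)
    # 1) reduce variance and MSE, 2) take square roots, 3) divide.
    spread = math.sqrt(fmean(per_point_var))
    skill = math.sqrt(fmean(per_point_sq_err))
    return spread / skill


# Two members, two grid points.
ssr = spread_skill_ratio([[0.0, 1.0], [2.0, 5.0]], [0.0, 2.0])
```

Here the per-point variances are 2 and 8 and both squared errors are 1, so the ratio is sqrt(5) / 1. Taking square roots before reducing would generally give a different value, which is why the order is worth stating in the docstring.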

Why

SSR behaviour/aggregation can be misinterpreted when debugging calibration vs coverage; the tests lock in the intended semantics and catch future accidental changes.

Note on production output

Existing SSR values in rollout/eval CSVs are unchanged by this PR. The **kwargs addition is a cleanliness change: src/autocast/scripts/eval/encoder_processor_decoder.py::_build_per_timestep_metric_factory already handled the missing kwarg via a TypeError fallback that instantiated the metric and set metric.reduce_all = False afterwards, which is behaviorally equivalent.
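
The equivalence described above can be sketched as follows. Every class and function name here is hypothetical and stands in for the real metric classes and for `_build_per_timestep_metric_factory`; only the try/except pattern is taken from the description.

```python
class BaseMetric:
    """Hypothetical base class standing in for the real metric base."""

    def __init__(self, reduce_all=True):
        self.reduce_all = reduce_all


class OldStyleSSR(BaseMetric):
    """Before the change: __init__ accepts no keyword arguments."""

    def __init__(self):
        super().__init__()


class NewStyleSSR(BaseMetric):
    """After the change: **kwargs are forwarded to the base class."""

    def __init__(self, **kwargs):
        super().__init__(**kwargs)


def build_metric(metric_cls):
    """Assumed shape of the fallback: try the kwarg, else set the attribute."""
    try:
        return metric_cls(reduce_all=False)
    except TypeError:
        # The constructor rejected reduce_all, so set it after instantiation.
        metric = metric_cls()
        metric.reduce_all = False
        return metric


old_metric = build_metric(OldStyleSSR)
new_metric = build_metric(NewStyleSSR)
```

Both paths end with `reduce_all` set to `False`, which is consistent with the statement that existing CSV values do not change.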

Test plan

uv run pytest tests/metrics/test_ensemble.py -k spread_skill_ratio

Updated the metric docstrings to explicitly describe the "mean of per-sample ratios" aggregation convention. This prevents future confusion about macroscopic computation differences and explicitly links to the expected Lola behaviour.

Updated the `__init__` method to forward `**kwargs` to the base class so that `reduce_all` can be passed correctly during metric instantiation, standardizing the interface with other metrics.

Added three comprehensive tests:
- SSR monotonically decreases when skill grows and spread is fixed
- Calibrated ensemble has SSR near 1.0 (finite-ensemble correction)
- Stateful update() matches a mean-of-ratios (not macroscopic ratio)
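
The difference between a mean of ratios and a macroscopic ratio can be seen in a small sketch. The accumulator class, its method signatures, and the per-sample granularity are assumptions for illustration and do not reproduce the autocast metric.

```python
import math


class RunningMeanOfRatios:
    """Hypothetical stateful accumulator: averages per-sample ratios."""

    def __init__(self):
        self.ratio_sum = 0.0
        self.count = 0

    def update(self, variances, mses):
        # Each sample contributes its own sqrt(variance) / sqrt(MSE).
        for var, mse in zip(variances, mses):
            self.ratio_sum += math.sqrt(var) / math.sqrt(mse)
            self.count += 1

    def compute(self):
        return self.ratio_sum / self.count


def macroscopic_ratio(variances, mses):
    """Pool variance and MSE over all samples first, then form one ratio."""
    pooled_var = sum(variances) / len(variances)
    pooled_mse = sum(mses) / len(mses)
    return math.sqrt(pooled_var) / math.sqrt(pooled_mse)


metric = RunningMeanOfRatios()
metric.update([1.0], [1.0])   # per-sample ratio 1.0
metric.update([4.0], [16.0])  # per-sample ratio 0.5
mean_of_ratios = metric.compute()                     # 0.75
pooled = macroscopic_ratio([1.0, 4.0], [1.0, 16.0])   # sqrt(2.5) / sqrt(8.5)
```

The two conventions disagree on the same data (0.75 versus roughly 0.54), so a test that pins the mean-of-ratios behaviour would catch an accidental switch to the pooled form.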

sgreenbury force-pushed the add-ssr-tests branch from 879768d to dd6f1e2 on April 21, 2026 08:20

sgreenbury merged commit a14f458 into main Apr 21, 2026
3 checks passed

sgreenbury deleted the add-ssr-tests branch April 21, 2026 08:23

sgreenbury merged 1 commit into main from add-ssr-tests