Update eval scripts and default metrics #335
Merged
sgreenbury merged 11 commits into main · Apr 20, 2026
Conversation
This pull request introduces a comprehensive set of evaluation scripts and configuration updates to support systematic comparisons. It adds six SLURM submission scripts for running evaluations in both ambient and latent rollout modes, expands the set of evaluation metrics across all relevant config files, and provides detailed documentation for running and interpreting these evaluations.
Major changes include:
1. New Evaluation Scripts for SLURM Batch Submission

Adds six scripts under `slurm_scripts/comparison/eval/` to automate evaluation of CRPS and FM models in both ambient and cached-latent (latent and ambient rollout) settings. Each script supports dry-run and real submission modes, handles batch size and metrics configuration, and manages AE checkpoint mapping for latent models.
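As a rough illustration of the dry-run/submission pattern these scripts follow, here is a minimal Python sketch. The actual scripts are shell scripts under `slurm_scripts/comparison/eval/`; the eval entry point, the Hydra-style override keys, and the SLURM resource flags below are assumptions made for illustration only.

```python
"""Illustrative sketch only: the real submission scripts are shell scripts;
the entry point, override keys, and SLURM flags here are hypothetical."""
import shlex
import subprocess
import sys


def submit_eval(checkpoint: str, metrics: list[str], batch_size: int, dry_run: bool = True) -> None:
    """Build an sbatch command for one evaluation run and either print or submit it."""
    # Hypothetical eval command with Hydra-style overrides; the keys are illustrative.
    eval_cmd = (
        "python -m autocast.scripts.eval.encoder_processor_decoder "
        f"checkpoint={checkpoint} "
        f"eval.batch_size={batch_size} "
        f"eval.metrics=[{','.join(metrics)}]"
    )
    sbatch_cmd = ["sbatch", "--time=04:00:00", "--gres=gpu:1", f"--wrap={eval_cmd}"]
    if dry_run:
        # Dry-run mode: show the command that would be submitted, but do nothing.
        print("DRY RUN:", " ".join(shlex.quote(part) for part in sbatch_cmd))
        return
    # Real submission: hand the job to SLURM.
    subprocess.run(sbatch_cmd, check=True)


if __name__ == "__main__":
    submit_eval(
        checkpoint="outputs/crps/last.ckpt",
        metrics=["nmse", "nmae", "nrmse", "vmse", "linf", "winkler"],
        batch_size=4,
        dry_run="--submit" not in sys.argv,
    )
```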
2. Documentation for Evaluation Procedures

Adds a `README.md` in the evaluation scripts directory. This documentation explains the rationale for batch sizes, the use of `eval.mode`, the structure and independence of the scripts, and instructions for dry-run and submission.
3. Expanded Evaluation Metrics in Configurations

Updates the evaluation configuration files (`default.yaml`, `encoder_processor_decoder.yaml`, `processor.yaml`) to include a more comprehensive set of metrics (e.g. `nmse`, `nmae`, `nrmse`, `vmse`, `linf`, `winkler`), ensuring consistency and richer evaluation outputs across model types and rollout modes.
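Because the same metrics list now appears in several config files, a small consistency check can be handy. The sketch below assumes the configs live under a `configs/` directory and expose the list under an `eval.metrics` key; both the paths and the key layout are assumptions, not the actual repository layout.

```python
"""Sketch: verify the expanded metrics list is present in every config.
The file paths and the eval.metrics key are assumptions about the layout."""
from pathlib import Path

import yaml

CONFIG_FILES = [
    Path("configs/default.yaml"),
    Path("configs/encoder_processor_decoder.yaml"),
    Path("configs/processor.yaml"),
]

EXPECTED_METRICS = {"nmse", "nmae", "nrmse", "vmse", "linf", "winkler"}


def metrics_in(config_path: Path) -> set[str]:
    """Load one YAML config and return the metrics it declares."""
    with config_path.open() as handle:
        config = yaml.safe_load(handle)
    # Hypothetical location of the metrics list inside each config.
    return set(config.get("eval", {}).get("metrics", []))


if __name__ == "__main__":
    for path in CONFIG_FILES:
        missing = EXPECTED_METRICS - metrics_in(path)
        status = "ok" if not missing else f"missing {sorted(missing)}"
        print(f"{path}: {status}")
```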
4. Minor Codebase Improvement

Adds `import contextlib` in `src/autocast/scripts/eval/encoder_processor_decoder.py` to support context management in the evaluation CLI (a generic sketch of this pattern appears at the end of this description).

These changes collectively enable robust, reproducible, and well-documented evaluation of both ambient and latent models, facilitating apples-to-apples comparison and more informative benchmarking.
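Regarding item 4, the snippet below is a generic sketch of the kind of pattern `contextlib` typically enables in an evaluation CLI: conditionally entering a context manager via `contextlib.nullcontext`. It is not the actual code from `src/autocast/scripts/eval/encoder_processor_decoder.py`, and the profiling use case is an assumption.

```python
"""Generic sketch of the contextlib pattern referenced in item 4; this is not
the code from src/autocast/scripts/eval/encoder_processor_decoder.py."""
import contextlib

import torch


def run_eval_step(model: torch.nn.Module, batch: torch.Tensor, profile: bool = False) -> torch.Tensor:
    """Run one forward pass, optionally under the PyTorch profiler."""
    # Fall back to a no-op context when profiling is disabled.
    profiler = torch.profiler.profile() if profile else contextlib.nullcontext()
    with profiler, torch.no_grad():
        return model(batch)
```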