Add 0.75 ensemble evaluation support and workflow enhancements by sgreenbury · Pull Request #343 · alan-turing-institute/autocast

sgreenbury · 2026-04-22T22:09:54Z

This pull request adds support for specifying a custom evaluation output subdirectory in the workflow, so that evaluation results can be organized under separate folders (e.g., for partial-schedule checkpoints). The main changes update the CLI, command logic, and tests to handle the new --output-subdir argument, and add a new SLURM script for evaluating 75%-schedule checkpoints.

Evaluation output directory customization:

Added a --output-subdir argument (default: "eval") to the eval CLI command, allowing users to specify a custom subdirectory for evaluation outputs.
Updated build_eval_overrides and eval_command in commands.py to use the specified output_subdir instead of hardcoding "eval".
Added and updated tests to verify that the custom output subdirectory is honored throughout the CLI and command layers.
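
As a rough illustration of the change described above, the sketch below shows how an --output-subdir option with a default of "eval" might be threaded from an argument parser into a list of overrides. It is not the code from this PR: build_parser, build_overrides_sketch, the --workdir option, and the eval.output_dir override key are hypothetical names chosen for the example, and the real build_eval_overrides and eval_command in commands.py may look quite different.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser; only --output-subdir and its default come from the PR text."""
    parser = argparse.ArgumentParser(prog="eval")
    # --workdir is an assumed option, included only so the example is runnable.
    parser.add_argument("--workdir", required=True)
    # Defaulting to "eval" keeps the previous behaviour when the flag is omitted.
    parser.add_argument("--output-subdir", default="eval")
    return parser


def build_overrides_sketch(workdir: str, output_subdir: str = "eval") -> list[str]:
    """Build an override list from the subdirectory instead of a hardcoded "eval".

    The "eval.output_dir" key is an assumption, not the project's real config key.
    """
    base = workdir.rstrip("/")
    return [f"eval.output_dir={base}/{output_subdir}"]


if __name__ == "__main__":
    args = build_parser().parse_args(
        ["--workdir", "runs/example", "--output-subdir", "eval_0p75"]
    )
    print(build_overrides_sketch(args.workdir, args.output_subdir))
```

Keeping "eval" as the default in both the parser and the helper means existing callers that never pass the flag would continue to write to the same location as before.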

New evaluation script for partial checkpoints:

Added slurm_scripts/ablations/ensemble_size/eval_0p75/submit_eval_crps_ambient.sh, which evaluates the 75%-schedule (third quarter) checkpoints and outputs results to an isolated eval_0p75/ directory.
Updated the documentation in README.md to explain the new script and the organization of evaluation outputs for ensemble-size ablation runs.
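
To make the isolation idea concrete, here is a minimal sketch of how the canonical and 75%-schedule evaluation outputs could resolve to sibling directories under one run. The eval_output_dir helper, the example run path, and the single-path-component check are assumptions made for illustration; the actual directory layout is the one documented in README.md.

```python
from pathlib import PurePosixPath


def eval_output_dir(run_dir: str, output_subdir: str = "eval") -> PurePosixPath:
    """Resolve the evaluation output directory for a run (hypothetical helper).

    The subdirectory must be a single relative path component so that outputs
    cannot escape the run directory. This check is an assumption of the sketch.
    """
    sub = PurePosixPath(output_subdir)
    if sub.is_absolute() or len(sub.parts) != 1 or ".." in sub.parts:
        raise ValueError(f"invalid output subdirectory: {output_subdir!r}")
    return PurePosixPath(run_dir) / sub


if __name__ == "__main__":
    run = "workdir/ensemble_size/run_a"  # hypothetical run directory
    print(eval_output_dir(run))               # canonical final-checkpoint eval
    print(eval_output_dir(run, "eval_0p75"))  # 75%-schedule checkpoint eval
```

Because the two results are distinct siblings, logs, videos, and metrics from a partial-schedule evaluation would not overwrite those from the final-checkpoint evaluation.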

Add a dedicated ensemble-size ambient eval submitter for the 75%-schedule (third-quarter) checkpoint and keep its outputs isolated under an eval_0p75 work directory. Extend the eval workflow CLI with an output subdirectory option so these partial-schedule runs can reuse the normal wrapper without mixing logs, videos, or metrics with the canonical final-checkpoint evals.

sgreenbury added 3 commits April 22, 2026 21:42

Remove bs32 from script

ca6a34d

Add comment on eval mode to script

b4d8be6

sgreenbury merged commit 7303f43 into main Apr 22, 2026
0 of 3 checks passed

sgreenbury deleted the add-eval-for-earlier-ckpt branch April 22, 2026 22:10
