Add eval.mode selector for ambient vs latent rollout by sgreenbury · Pull Request #327 · alan-turing-institute/autocast

sgreenbury · 2026-04-17T11:23:13Z

Summary

Add an explicit eval.mode config key (auto | ambient | latent, default auto) to autocast eval so users can force the rollout regime used when evaluating an EPD stack against cached latents vs raw data.
When eval.mode=ambient is requested but the datamodule yields EncodedBatch (cached latents), auto-swap in the raw-data datamodule saved alongside the cache by autocast cache-latents (<cache_dir>/autoencoder_config.yaml). An explicit datamodule=... override still wins. Passing cached latents for latent / auto continues to work unchanged.
Validate loudly: unknown modes error early, and a resolved eval path that doesn't match the requested mode (e.g. ambient but we only have cached latents and no AE checkpoint) raises instead of silently falling back.
Add unit tests for mode normalization, path resolution, the validation branches, and the datamodule auto-swap (happy path + missing-config / missing-data-path errors).
Add an end-to-end invariant test that EncoderProcessorDecoder.rollout invokes the encoder once per rollout step (via a counting PermuteConcat wrapper). This pins the contract eval.mode=ambient rests on: each step decodes and re-encodes, so decode/encode drift is included in the metrics.
Document the new knob in src/autocast/configs/eval/README.md, including an ambient-vs-latent ablation recipe.
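A minimal sketch of the mode normalization and validation described above. The function names (normalize_mode, check_resolved_path) are hypothetical, and the rule that cached latents imply the latent path while raw data implies the ambient path is a simplifying assumption, not the actual logic in autocast eval.

```python
from __future__ import annotations

VALID_MODES = ("auto", "ambient", "latent")  # values listed in the PR summary


def normalize_mode(raw: str | None) -> str:
    """Lower-case and validate a requested mode; None falls back to 'auto'."""
    mode = "auto" if raw is None else str(raw).strip().lower()
    if mode not in VALID_MODES:
        # Unknown modes fail early rather than being silently ignored.
        raise ValueError(f"Unknown eval.mode {raw!r}; expected one of {VALID_MODES}")
    return mode


def check_resolved_path(mode: str | None, *, batch_is_encoded: bool) -> str:
    """Return the rollout regime actually used, raising if it contradicts the request.

    Assumption for this sketch: an encoded batch (cached latents) resolves to
    'latent', and a raw-data batch resolves to 'ambient'.
    """
    requested = normalize_mode(mode)
    resolved = "latent" if batch_is_encoded else "ambient"
    if requested != "auto" and requested != resolved:
        # No silent fallback: a mismatch is an error.
        raise ValueError(
            f"eval.mode={requested} was requested but the resolved path is {resolved}"
        )
    return resolved
```

With this shape, check_resolved_path("auto", batch_is_encoded=True) resolves to the latent path, while check_resolved_path("ambient", batch_is_encoded=True) raises instead of falling back.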

Why

Historically, autocast eval on a processor checkpoint plus cached latents rolled out entirely in latent space and decoded only at the end for metrics. That isn't apples-to-apples with models that natively roll out in data space (e.g. CRPS baselines). eval.mode=ambient makes a like-for-like comparison possible and turns ambient-vs-latent into an easy ablation; auto preserves existing behaviour.

Test plan

pytest tests/models/test_encoder_processor_decoder.py tests/scripts/test_eval_encoder_processor_decoder.py (58 passed locally)
ruff check, ruff format, pyright via pre-commit
CI green on this PR
Follow-up: spot-check one real EPD checkpoint with eval.mode=ambient vs latent on a small cached-latents dataset to confirm the two paths produce different numbers (out of scope for this PR)

Processor checkpoints trained on cached latents were evaluated with rollout happening entirely in latent space, with the decoder applied only once before metrics. This made comparisons against data-space baselines (e.g. CRPS against a non-autoencoder model) unfair because decode/encode drift accumulated across rollout steps was hidden. Add an explicit eval.mode config (auto | ambient | latent) that forces the rollout regime and validates the resolved code path against the requested mode. When ambient is requested on a cached_latents datamodule, substitute the raw-data datamodule from autoencoder_config.yaml saved by autocast cache-latents so the encoder sees matching fields and normalization. Auto remains the default and preserves historical behavior.
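The datamodule substitution can be sketched roughly as follows. Only the file name autoencoder_config.yaml comes from this PR; the function names, the "datamodule" and "data_path" keys, and the injected load_config parser are hypothetical stand-ins used for illustration.

```python
from __future__ import annotations

from pathlib import Path
from typing import Any, Callable

CONFIG_NAME = "autoencoder_config.yaml"  # saved alongside the cache per the PR


def raw_datamodule_config(
    cache_dir: str | Path, load_config: Callable[[Path], Any]
) -> dict:
    """Read the raw-data datamodule section saved next to the latent cache."""
    config_path = Path(cache_dir) / CONFIG_NAME
    if not config_path.is_file():
        raise FileNotFoundError(f"Cannot swap datamodule: {config_path} is missing")
    config = load_config(config_path) or {}
    datamodule = config.get("datamodule")  # hypothetical key
    if not datamodule or not datamodule.get("data_path"):  # hypothetical key
        raise ValueError(f"{config_path} has no usable raw-data path")
    return dict(datamodule)


def choose_datamodule(
    mode: str,
    current: dict,
    *,
    yields_encoded: bool,
    explicit_override: bool,
    cache_dir: str | Path,
    load_config: Callable[[Path], Any],
) -> dict:
    """Swap in the raw-data datamodule only for ambient mode on cached latents."""
    if mode != "ambient" or not yields_encoded or explicit_override:
        # latent/auto keep cached latents; an explicit datamodule=... always wins.
        return current
    return raw_datamodule_config(cache_dir, load_config)
```

The parser is passed in so the sketch stays dependency-free; in practice it would be whatever YAML loader the project already uses.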

Pin that EncoderProcessorDecoder.rollout invokes the encoder once per rollout step by counting encode calls via a wrapped PermuteConcat. This guards eval.mode=ambient against a silent regression where a refactor collapses the rollout into a latent-only loop: in that case ambient and latent eval would report identical numbers, so the ablation (the whole reason the mode exists) would be meaningless.
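The invariant can be illustrated with a toy, framework-free sketch. Nothing here is the real EncoderProcessorDecoder or PermuteConcat: CountingEncoder, the scalar encoder/processor/decoder, and both rollout functions are hypothetical stand-ins chosen so the encode/decode round trip is visibly lossy.

```python
from typing import Callable, List


class CountingEncoder:
    """Wraps an encoder callable and counts how often it is invoked."""

    def __init__(self, inner: Callable[[float], float]) -> None:
        self.inner = inner
        self.calls = 0

    def __call__(self, x: float) -> float:
        self.calls += 1
        return self.inner(x)


def lossy_encode(x: float) -> float:
    # Toy encoder: rounding makes the encode/decode round trip lossy.
    return round(x, 1)


def decode(z: float) -> float:
    # Toy decoder: identity on the (already rounded) latent.
    return z


def process(z: float) -> float:
    # Toy processor: one step of growth in latent space.
    return 1.07 * z


def ambient_rollout(x: float, encode: Callable[[float], float], n_steps: int) -> List[float]:
    """Each step re-encodes the previous decoded state, so drift accumulates."""
    states = []
    for _ in range(n_steps):
        x = decode(process(encode(x)))
        states.append(x)
    return states


def latent_rollout(x: float, encode: Callable[[float], float], n_steps: int) -> List[float]:
    """Encode once, step in latent space, decode only at the end."""
    z = encode(x)
    latents = []
    for _ in range(n_steps):
        z = process(z)
        latents.append(z)
    return [decode(z) for z in latents]


n_steps = 4
ambient_encoder = CountingEncoder(lossy_encode)
latent_encoder = CountingEncoder(lossy_encode)
ambient = ambient_rollout(1.0, ambient_encoder, n_steps)
latent = latent_rollout(1.0, latent_encoder, n_steps)

# The contract being pinned: one encoder call per ambient rollout step.
assert ambient_encoder.calls == n_steps
# The two regimes diverge once the lossy round trip kicks in.
assert ambient != latent
```

In this toy the latent rollout encodes once up front; with cached latents the encoder would not be called at all. If a refactor turned ambient_rollout into the latent-only loop, the call count would drop and the two trajectories would coincide, which is the regression the real test guards against.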

sgreenbury force-pushed the add-eval-modes branch from 92e64d6 to 2a4ce25 on April 19, 2026 07:27

sgreenbury added 2 commits April 19, 2026 07:33

sgreenbury force-pushed the add-eval-modes branch from 2a4ce25 to d53d411 on April 19, 2026 07:33

sgreenbury merged commit 9830e1d into main Apr 20, 2026
3 checks passed

sgreenbury deleted the add-eval-modes branch April 20, 2026 15:35

sgreenbury added a commit that referenced this pull request Apr 20, 2026

Update scripts and README following merging #327

5f0537d

sgreenbury merged 2 commits into main from add-eval-modes
