Add encode_once eval mode and auto dispatcher by sgreenbury · Pull Request #339 · alan-turing-institute/autocast

sgreenbury · 2026-04-21T14:20:15Z

Summary

Add a new evaluation mode, encode_once, that gives a fair apples-to-apples comparison for processors trained in latent space. The processor rolls out entirely in its native latent space (no decode/encode drift charged to it), but metrics are computed against the original raw denormalized ground truth -- so latent-rollout models can be scored directly against pure-ambient baselines without either side getting an unfair penalty or advantage.

- New encode_once mode: the encoder runs once on raw inputs, the processor rolls out in latent space, and the decoder runs per step; metrics compare decoded predictions against denormalized raw batch.output_fields.
- eval.mode=auto (new default) dispatches to the faithful concrete mode per run:
  - full EPD / stateless AE -> ambient
  - processor-only + autoencoder reachable -> encode_once
  - processor-only + cached latents, no AE -> latent

  The resolved mode is logged at INFO. encode_once on a full EPD aliases back to ambient with a warning (there is no separate latent rollout to isolate).
- eval.latent_space_metrics flag replaces the previous silent "no decoder, compute in raw latent space" fallback. When eval.mode=latent cannot build a decoder, the run now fails fast and asks the user to either fix the AE path or set eval.latent_space_metrics=true for a dev sense check. The flag is rejected for auto/ambient/encode_once (those modes require a decoder by definition).
- Docs, docstrings, and test parametrizations streamlined in a follow-up refactor commit.
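The dispatch rules listed above can be sketched roughly as follows. This is an illustrative reading of the summary, not the code in this PR: the function name `resolve_eval_mode`, its boolean arguments, and the logger name are all hypothetical, and the real dispatcher also inspects the batch type, which is omitted here.

```python
import logging

logger = logging.getLogger("eval_mode_sketch")  # hypothetical logger name

VALID_MODES = ("auto", "ambient", "encode_once", "latent")


def resolve_eval_mode(requested: str, processor_only: bool, autoencoder_reachable: bool) -> str:
    """Map a requested eval.mode to a concrete mode (hypothetical helper)."""
    if requested not in VALID_MODES:
        raise ValueError(f"unknown eval.mode: {requested!r}")

    if requested == "auto":
        if not processor_only:
            resolved = "ambient"          # full EPD / stateless AE
        elif autoencoder_reachable:
            resolved = "encode_once"      # processor-only, decoder can be built
        else:
            resolved = "latent"           # processor-only, cached latents, no AE
    elif requested == "encode_once" and not processor_only:
        # On a full EPD run there is no separate latent rollout to isolate.
        logger.warning("encode_once on a full EPD run resolves to ambient")
        resolved = "ambient"
    else:
        resolved = requested              # explicit modes pass through unchanged

    logger.info("Resolved eval mode: %s", resolved)
    return resolved
```

Under these assumptions, `resolve_eval_mode("auto", processor_only=True, autoencoder_reachable=True)` returns `"encode_once"`, while an explicit `"latent"` request is passed through as-is.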

See src/autocast/configs/eval/README.md for the full mode table and auto-dispatch rules.

Test plan

- ruff check + ruff format --check clean
- pyright clean
- pytest tests/scripts/test_eval_encoder_processor_decoder.py -- 60 passed, including new coverage for:
  - auto dispatch across the (processor_only, batch_type, ae_ckpt) matrix
  - encode_once path resolution and _resolve_eval_path matrix
  - _validate_latent_space_metrics_flag rejection on auto/ambient/encode_once
  - _require_decoder_unless_latent_metrics_opt_in fail-fast vs. opt-in warning paths
  - _maybe_swap_to_ambient_datamodule swap for encode_once

Introduce a third eval path that encodes once, rolls out in latent space, then decodes against raw denormalized ground truth. This isolates processor error from autoencoder encode/decode drift while still scoring against real (not AE-reconstructed) truth. eval.mode now accepts auto|ambient|encode_once|latent, defaulting to auto. The auto dispatcher resolves to ambient for full EPD checkpoints, encode_once for processor-only runs that can build a decoder, and latent otherwise. encode_once on an EPD run is aliased back to ambient with a warning since the decoder is already in the loop.
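A minimal sketch of the encode-once rollout described above, using NumPy and toy linear maps in place of the real encoder, processor, and decoder. All names here (`encode_once_rollout`, `per_step_mse`, the toy matrices) are hypothetical, and `raw_truth` is assumed to be already denormalized; the actual evaluation code in the repository may be structured differently.

```python
import numpy as np


def encode_once_rollout(encode, process, decode, raw_inputs, n_steps):
    """Encode once, step in latent space, decode each step for scoring."""
    z = encode(raw_inputs)               # the only encoder call
    decoded_steps = []
    for _ in range(n_steps):
        z = process(z)                   # latent state is never re-encoded
        decoded_steps.append(decode(z))  # decode a copy of the state for metrics
    return np.stack(decoded_steps)       # shape: (n_steps, batch, features)


def per_step_mse(predictions, raw_truth):
    """Mean squared error per rollout step against raw ground truth."""
    return np.mean((predictions - raw_truth) ** 2, axis=(1, 2))


# Toy stand-ins for the real models (all hypothetical).
rng = np.random.default_rng(0)
enc_matrix = rng.normal(size=(4, 2))     # 4 raw features -> 2 latent features
dec_matrix = np.linalg.pinv(enc_matrix)  # 2 latent features -> 4 raw features
encoder_calls = []


def encode(x):
    encoder_calls.append(1)              # record each call to the encoder
    return x @ enc_matrix


def process(z):
    return 0.9 * z                       # placeholder latent dynamics


def decode(z):
    return z @ dec_matrix


raw_inputs = rng.normal(size=(3, 4))     # batch of 3 raw samples
raw_truth = rng.normal(size=(5, 3, 4))   # 5 steps of raw ground truth
predictions = encode_once_rollout(encode, process, decode, raw_inputs, n_steps=5)
mse = per_step_mse(predictions, raw_truth)
```

The point of the structure is that `encode` is called once regardless of `n_steps`, so any error that accumulates over the rollout comes from the processor, with the decoder applied only to produce values that can be compared against raw truth.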

Previously eval.mode=latent silently fell back to computing metrics directly in the autoencoder's raw latent space whenever no decoder could be built from the cached-latents directory. Those numbers look like evaluation results but are not comparable across runs (latent space is basis-dependent), and physics-aware metrics are not meaningful there. Remove the silent fallback. If eval.mode=latent resolves to the latent-only path, fail fast with a message pointing at the new opt-in flag eval.latent_space_metrics (default false). Set it to true alongside eval.mode=latent to skip the decoder entirely as a cheap dev sense-check for iterating on a small processor paired with an expensive autoencoder; a prominent warning flags the caveat. The flag is rejected for auto / ambient / encode_once because those modes require a decoder by definition.
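The fail-fast and opt-in behaviour described above could look something like the sketch below. The function name `choose_metric_space`, its arguments, and the message wording are hypothetical and not taken from the PR; raising when a decoder-requiring mode has no decoder is also an assumption made for completeness.

```python
import warnings


def choose_metric_space(mode: str, latent_space_metrics: bool, decoder_available: bool) -> str:
    """Return "decoded" or "raw_latent" for a resolved eval mode (hypothetical helper)."""
    if latent_space_metrics and mode != "latent":
        # auto / ambient / encode_once require a decoder by definition.
        raise ValueError(
            f"eval.latent_space_metrics=true is only valid with eval.mode=latent, got {mode!r}"
        )

    if mode == "latent" and latent_space_metrics:
        # Explicit opt-in: skip the decoder, but flag the caveat loudly.
        warnings.warn(
            "Metrics are computed in raw latent space and are not comparable across runs.",
            UserWarning,
        )
        return "raw_latent"

    if not decoder_available:
        # No silent fallback: point the user at the two ways forward.
        raise RuntimeError(
            "No decoder could be built; fix the AE path or set "
            "eval.latent_space_metrics=true with eval.mode=latent."
        )

    return "decoded"
```

The design choice being illustrated is that the raw-latent path is reachable only through an explicit flag, so a missing autoencoder checkpoint surfaces as an error instead of as numbers that look like ordinary results.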

Collapse narrative comments and docstrings that duplicated the eval README, inline the resolve-auto branches, and drop the unused RESOLVABLE_EVAL_MODES constant. Shorten user-facing error and warning strings to describe the current behaviour only. Fold two parametrized no-op tests into the nearest behavioural test so coverage is preserved while the suite is less noisy.

sgreenbury added 3 commits April 21, 2026 12:25

sgreenbury merged commit de86747 into main Apr 21, 2026
3 checks passed

sgreenbury deleted the 2026-04-20/encode-once-eval branch April 21, 2026 16:09

sgreenbury mentioned this pull request Apr 22, 2026

Refactor and update eval scripts and docs #342

Merged

sgreenbury merged 3 commits into main from 2026-04-20/encode-once-eval
