Scripts and config updates for comparisons #329
Merged
sgreenbury merged 11 commits into main on Apr 18, 2026
Conversation
Adds per-dataset local_experiment configs for the 4 target datasets:

- cache_latents/<dataset>/cache_latents.yaml bakes in the datamodule plus the encoder/decoder periodic + pixel_shuffle settings so they match the paired AE training config (ae/<dataset>/ae_dc_large.yaml) without bash overrides.
- processor/<dataset>/fm_vit_large.yaml captures the reference FM-in-latent setup: ddp_4gpu, cached_latents datamodule, flow_matching_vit processor (hid_channels=640, flow_ode_steps=50), adamw_half (lr=1e-4, warmup=0), bs=256/GPU, float32_matmul_precision=high, val_metrics disabled. A sketch of its shape follows below.
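A minimal sketch of how processor/<dataset>/fm_vit_large.yaml could be laid out, assuming typical Hydra conventions; the key names and config-group paths are illustrative, not copied from the repo, and only the values come from the description above:

```yaml
# Illustrative Hydra experiment config -- key names and group paths
# are assumptions; the values are the ones listed in this commit.
defaults:
  - /trainer: ddp_4gpu
  - /datamodule: cached_latents
  - /optimizer: adamw_half
  - _self_

processor:                     # flow_matching_vit
  hid_channels: 640
  flow_ode_steps: 50

optimizer:
  lr: 1.0e-4
  warmup: 0

datamodule:
  batch_size: 256              # per GPU

float32_matmul_precision: high
val_metrics: null              # disabled
```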
Top-level folders under local_hydra/local_experiment/ now match the autocast CLI subcommands (ae/, cache_latents/, epd/, processor/) rather than mixing experiment types (crps/) with CLI kinds. This keeps ambient-space EPD variants (CRPS and upcoming FM-in-ambient) under epd/<dataset>/ and latent-space processor variants (FM and upcoming CRPS-in-latent) under processor/<dataset>/. No content changes — pure rename. Nothing has been launched against these configs yet, so the paths can change safely.
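For reference, the resulting layout (folder names from the description above; the files per dataset are the ones introduced in this PR):

    local_hydra/local_experiment/
      ae/<dataset>/              paired AE training (ae_dc_large.yaml)
      cache_latents/<dataset>/   cache_latents.yaml
      epd/<dataset>/             ambient-space variants (CRPS, FM-in-ambient)
      processor/<dataset>/       latent-space variants (FM, CRPS-in-latent)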
Adds processor/<dataset>/crps_vit_azula_large.yaml for the 4 target datasets: AzulaViTProcessor (hidden_dim=632, n_noise_channels=1024) trained on cached latents, with n_members=8 and AlphaFairCRPSLoss / AlphaFairCRPS (matching the ambient-space CRPS head under epd/<dataset>/crps_vit_azula_large.yaml). Configs are self-contained: ddp_4gpu_slurm, cached_latents datamodule, adamw_half (lr=2e-4, warmup=0), bs=32/GPU (ProcessorModelEnsemble expands the batch by n_members internally, so 32 mirrors the ambient-space sizing), float32_matmul_precision=high, val_metrics disabled.
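Relative to the FM-in-latent sketch above, the CRPS-in-latent config changes roughly these fields (key names again illustrative):

```yaml
processor:                     # AzulaViTProcessor
  hidden_dim: 632
  n_noise_channels: 1024

n_members: 8                   # ProcessorModelEnsemble repeats the batch internally

optimizer:
  lr: 2.0e-4

datamodule:
  batch_size: 32               # per GPU; x8 members -> effective 256
```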
Adds epd/<dataset>/fm_vit_large.yaml for the 4 target datasets: the same permute_concat encoder and channels_last decoder as ambient CRPS (epd/<dataset>/crps_vit_azula_large.yaml), but with FlowMatchingProcessor plus a ViT backbone in place of AzulaViTProcessor. Configs are self-contained: ddp_4gpu_slurm, dataset datamodule, adamw_half (lr=1e-4, warmup=0), bs=32/GPU, flow_ode_steps=50, hid_channels=640, patch_size=4 (keeps 16x16=256 tokens, for architecture parity with vit_azula_large), train_in_latent_space=true (MSE on permute-concat features), val_metrics disabled. Model sizes across CRPS/FM and ambient/latent still need to be balanced; this commit just wires the variant up (see the sketch below).
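And the FM-in-ambient variant, sketched the same way (illustrative keys; values from the description):

```yaml
encoder: permute_concat        # same pairing as ambient CRPS
decoder: channels_last

processor:                     # FlowMatchingProcessor + ViT backbone
  hid_channels: 640
  flow_ode_steps: 50
  patch_size: 4                # 64x64 input -> 16x16 = 256 tokens

train_in_latent_space: true    # MSE on permute-concat features

datamodule:
  batch_size: 32               # per GPU (raised to 256 in the next commit)
```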
ProcessorModelEnsemble / EncoderProcessorDecoderEnsemble repeat the batch by n_members=8 internally, so the ambient CRPS baseline already runs at an effective 256 samples per GPU per step. FM/diffusion has no such multiplier and needs large raw batches to keep the velocity-field estimate low-variance, so matching at bs=32 understated the FM budget. This sets FM-in-ambient to bs=256/GPU so it matches the CRPS effective batch; FM-in-latent was already at 256 (no change).
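The parity rule, as arithmetic:

```math
\underbrace{8}_{n_\text{members}} \times \underbrace{32}_{\text{CRPS bs/GPU}} = 256 = \text{FM bs/GPU}
```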
Adds slurm_scripts/comparison/ containing the full SLURM submission tree for the CRPS/FM x ambient/latent x 4-datasets study:

    slurm_scripts/comparison/
      ae/              AE timing + 24h final runs
      cache_latents/   cache generation + FM-in-latent + CRPS-in-latent
      epd/             CRPS-in-ambient + FM-in-ambient (timing + large)

These live in-repo for provenance: which configs were submitted with which AE run dirs, budgets, and overrides. Scripts are launched directly from the repo (paths like $HOME/autocast/outputs/... are absolute, so the working directory doesn't matter).
Drops slurm_scripts from .gitignore so future comparison-study scripts
are tracked the same way. run_scripts stays ignored (used as a
/projects/ workspace symlink for outputs/artifacts).
…iants

Sets all 16 local_experiment configs to DiT-canonical proportions (depth=12, heads=8, head_dim≈64) targeting ~80M processor params:

- CRPS (ambient + latent): hidden_dim=568, n_layers=12, num_heads=8
- FM (ambient + latent): hid_channels=704, hid_blocks=12, attention_heads=8

Patch sizes:

- ambient variants: patch=4 -> 16x16=256 tokens on 64x64 input
- latent variants: patch=1 -> 8x8=64 tokens on 8x8 latent (CRPS-latent gets an explicit patch_size: 1 override; vit_azula_large defaults to 4, which gives only 4 tokens on an 8x8 latent)

Verified param counts by instantiation (AzulaViTProcessor / TemporalViTBackbone with dc_large latent dims: 8 channels, 8x8 spatial):

- CRPS ambient: 80.75M
- FM ambient: 80.04M
- CRPS latent: 80.72M
- FM latent: 79.91M

Adds slurm_scripts/comparison/README.md documenting the full study design: variant layout, submission order, model-size matrix, batch-parity rule, and the latent patch_size gotcha. Updates local_hydra/local_experiment/README.md with a short pointer to the comparison README.
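The resulting model-size matrix, restated as a table (all values from the commit above):

| Variant | Width | Depth | Heads | Patch | Tokens | Params |
|---|---|---|---|---|---|---|
| CRPS ambient | hidden_dim=568 | 12 | 8 | 4 | 16x16=256 | 80.75M |
| FM ambient | hid_channels=704 | 12 | 8 | 4 | 16x16=256 | 80.04M |
| CRPS latent | hidden_dim=568 | 12 | 8 | 1 | 8x8=64 | 80.72M |
| FM latent | hid_channels=704 | 12 | 8 | 1 | 8x8=64 | 79.91M |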
Updates all 8 timing + large script headers to match the DiT-aligned model matrix (hid_channels=704 / hid_blocks=12 for FM; hidden_dim=568 / n_layers=12 / num_heads=8 for CRPS; n_members=8) and points them at the authoritative yaml paths under local_hydra/local_experiment/<kind>/<dataset>/ so future drift is caught by reading the config directly.

submit_cache_latents.sh now falls back to the newest <ae_run_dir>/autocast/*/checkpoints/latest-*.ckpt when autoencoder.ckpt is not yet written. This unblocks the latent timing runs while AE training is still in progress: compute throughput is independent of AE quality, so a temporary checkpoint is fine for timing. Final large/ runs must wait for autoencoder.ckpt (warning added inline).
Default trainer callbacks now include:

- A rolling ModelCheckpoint with save_last: true (a real file, not a symlink), so last.ckpt always captures the final-epoch state even when every_n_epochs doesn't land on max_epochs.
- A best-val ModelCheckpoint (monitor=val_loss) with save_last: false, so it doesn't contend with the rolling callback for ownership of last.ckpt.
- An EMACallback with decay=0.999. The resulting half-life of ~700 steps (0.999^693 ≈ 0.5) is a sensible fraction of our shortest runs (FM-ambient, ~12k steps) without being wildly reactive on the longest (CRPS-latent, ~420k steps).

The four final-run submit scripts (FM/CRPS x ambient/latent large) now override the rolling callback to save at 25/50/75/100% of the cosine schedule (every_n_epochs = COSINE_EPOCHS / 4, save_top_k=-1). Combined with save_last: true on the rolling callback, this guarantees a final-state checkpoint plus three learning-curve snapshots per run.
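A minimal sketch of what such a callback block could look like in a Hydra trainer config. Lightning's ModelCheckpoint arguments (save_last, save_top_k, every_n_epochs, monitor, mode) are real; the surrounding key names and the EMACallback target path are assumptions:

```yaml
trainer:
  callbacks:
    rolling_checkpoint:
      _target_: lightning.pytorch.callbacks.ModelCheckpoint
      save_last: true            # real last.ckpt file, not a symlink
      save_top_k: -1             # keep every rolling snapshot
      every_n_epochs: 10         # final runs override: COSINE_EPOCHS / 4
    best_val_checkpoint:
      _target_: lightning.pytorch.callbacks.ModelCheckpoint
      monitor: val_loss
      mode: min
      save_last: false           # rolling callback owns last.ckpt
    ema:
      _target_: autocast.callbacks.EMACallback  # hypothetical path
      decay: 0.999               # half-life ~700 steps
```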
This pull request introduces scripts and standardized experiment configs for each dataset and model variant, ensuring reproducibility and alignment. The changes also update documentation to guide users on the new configuration structure.
New Experiment Configurations
Cache Latents (Latent Space Preparation)
Adds cache_latents.yaml configs for each dataset (advection_diffusion, conditioned_navier_stokes, gpe_laser_wake_only, gray_scott) to standardize latent caching, with architectures matching the corresponding autoencoder training configs.

Ambient-Space Experiments (epd/)

Adds CRPS (crps_vit_azula_large.yaml) and FM (fm_vit_large.yaml) configs for each dataset in the epd/ directory, specifying model architectures, batch sizes, optimizer settings, and metrics for both training paradigms.

Latent-Space Processor Experiments (processor/)

Adds CRPS and FM configs for each dataset in the processor/ directory, using cached latents as input and mirroring the ambient-space experiment structure for direct comparison.