
Commit 4fe6d1a

Merge pull request #340 from alan-turing-institute/2026-04-19/ablation-scripts
Update ablation scripts for ensemble and model size evaluations
2 parents de86747 + 017144d commit 4fe6d1a

30 files changed

Lines changed: 1454 additions & 0 deletions
Lines changed: 21 additions & 0 deletions
@@ -0,0 +1,21 @@
# @package _global_
defaults:
  - /local_experiment/epd/conditioned_navier_stokes/crps_vit_azula_large
  - _self_

experiment_name: ablation_model_size_crps_vit_azula_0p4x_conditioned_navier_stokes

datamodule:
  # Keep effective per-GPU batch at 256 after increasing n_members 8 -> 16.
  batch_size: 16

model:
  n_members: 16
  processor:
    # Aspect-preserving ~0.39x variant (depth 8, width aspect-matched to
    # baseline 568/12 = 47.3). Measured at ~31.6M processor params versus
    # ~80.8M for the baseline 568/12/8 model.
    hidden_dim: 376
    n_layers: 8
    num_heads: 8
    n_noise_channels: 1024
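The "aspect-matched" width in the config comment above can be reproduced with a quick back-of-envelope check. This is an illustrative reading of that comment, not taken from the repo; the rounding-to-a-multiple-of-`num_heads` rule is an assumption.

```bash
# Illustrative only: baseline aspect = hidden_dim / n_layers = 568 / 12 ≈ 47.3.
# At depth 8 the aspect-matched width is ≈ 8 × 47.3 ≈ 379; 376 is the nearest
# value divisible by num_heads = 8 (that rounding rule is an assumption).
awk 'BEGIN {
  aspect = 568 / 12
  raw    = 8 * aspect
  printf "aspect=%.1f  raw_width=%.1f  chosen=376 (=%d*8)\n", aspect, raw, 376 / 8
}'
```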
Lines changed: 20 additions & 0 deletions
@@ -0,0 +1,20 @@
# @package _global_
defaults:
  - /local_experiment/epd/conditioned_navier_stokes/crps_vit_azula_large
  - _self_

experiment_name: ablation_model_size_crps_vit_azula_2x_conditioned_navier_stokes

datamodule:
  # Keep effective per-GPU batch at 256 after increasing n_members 8 -> 16.
  batch_size: 16

model:
  n_members: 16
  processor:
    # Measured at ~169.3M processor params in the CNS ambient setup, versus
    # ~80.8M for the baseline 568/12/8 model (~2.09x).
    hidden_dim: 768
    n_layers: 16
    num_heads: 8
    n_noise_channels: 1024
Lines changed: 16 additions & 0 deletions
@@ -0,0 +1,16 @@
# @package _global_
defaults:
  - /local_experiment/epd/conditioned_navier_stokes/fm_vit_large
  - _self_

experiment_name: ablation_model_size_fm_vit_0p4x_conditioned_navier_stokes

model:
  processor:
    backbone:
      # Aspect-preserving ~0.32x variant (depth 8, width aspect-matched to
      # baseline 704/12 = 58.7). Measured at ~25.6M processor params versus
      # ~80.0M for the baseline 704/12/8 backbone.
      hid_channels: 472
      hid_blocks: 8
      attention_heads: 8
Lines changed: 15 additions & 0 deletions
@@ -0,0 +1,15 @@
# @package _global_
defaults:
  - /local_experiment/epd/conditioned_navier_stokes/fm_vit_large
  - _self_

experiment_name: ablation_model_size_fm_vit_2x_conditioned_navier_stokes

model:
  processor:
    backbone:
      # Measured at ~168.6M processor params in the CNS ambient setup, versus
      # ~80.3M for the baseline 704/12/8 backbone (~2.10x).
      hid_channels: 896
      hid_blocks: 16
      attention_heads: 8

slurm_scripts/ablations/README.md

Lines changed: 66 additions & 0 deletions
@@ -0,0 +1,66 @@
# Ablations

Sensitivity sweeps, comparisons, and ablations that sit on top of the main
4-dataset comparison in `slurm_scripts/comparison/`. "Ablation" is used
loosely here for all three — true ablations (EMA on/off), comparisons
(FM vs diffusion, ViT vs U-Net), and sweeps (ensemble size, noise
channels) — to match how ML papers usually label this section.

Most ablations are still **CNS-only for now**. The current exception is
`ensemble_size` under the `eff_bs1024` regime, which now extends to the
other three main comparison datasets (`gray_scott`,
`gpe_laser_only_wake`, `advection_diffusion`) in addition to CNS. Each
script keeps dataset coverage local so widening an ablation remains a
small edit.

## Status table

| ablation | type | datasets | runs | status |
|---|---|---|---|---|
| ensemble_size (m=16, fixed bs=32) | sweep | CNS | 1 | ready |
| ensemble_size (m=16, fixed global eff. bs=1024) | sweep | GS / GPE / CNS / AD | 4 | timing ready |
| noise_channels | sweep | CNS | 1+ | stub |
| crps_variants (AlphaFair / Fair / CRPS) | comparison | CNS | 3 | stub |
| fm_vs_diffusion | comparison | CNS | 1 | stub |
| arch_unet_fno_vit | comparison | CNS | 2 | stub |
| model_size | sweep | CNS | 2 | ready |
| cached_latent_crps | comparison | CNS | 1 (done, 2026-04-19) | stub |
| cond_global_vs_permute | comparison | CNS | 1 (done for CRPS-ViT, 2026-04-18) | stub |
| eval_only/ode_steps | eval-only | FM runs | 0 | stub |
| eval_only/ema | eval-only | EMA ckpts | 0 | stub |

"Done" entries refer to runs already produced by
`slurm_scripts/comparison/` that double as the CNS data point for this
ablation — no new training required, but they should be eval'd through
the same pipeline.

## Design notes

- **Flexible by construction.** Each ablation is a self-contained
  subdirectory. Changing the knob values, swapping to a different
  baseline, or dropping an ablation is a localized edit. Dataset
  coverage lives inside each ablation's submit scripts, so extending one
  ablation does not spill into the others.
- **Baselines stay in `local_hydra/local_experiment/{epd,processor}/`.**
  Ablation configs extend those via Hydra `defaults`. When the sweep is
  a one-liner (e.g. ensemble size → `model.n_members` +
  `datamodule.batch_size`), the submit script uses CLI overrides and no
  new config file is created. When the ablation materially changes the
  architecture (model size, arch comparison), each variant gets its own
  yaml under `local_hydra/local_experiment/ablations/<name>/<dataset>/`.
- **Timing first, then 24h schedule.** Same two-step pattern as
  `slurm_scripts/comparison/`: each ablation has a `*_timing.sh` (5-epoch
  run → `timing.ckpt`) and a `*_large.sh` (24h run with cosine epochs
  computed from timing).

## Submission workflow

1. `submit_*_timing.sh` — 5-epoch timing runs, producing `timing.ckpt`.
2. Extract per-combo `cosine_epochs` via
   `uv run autocast time-epochs --from-checkpoint <path>/timing.ckpt -b 24`
   and paste into `submit_*_large.sh` (matches `comparison/` flow).
3. `submit_*_large.sh` — 24h production runs, dry-run first.
4. Eval from the script local to the study:
   `slurm_scripts/comparison/eval/` for the canonical comparison suite, and
   `slurm_scripts/ablations/<name>/eval/` for ablation-only run sets that have
   not been promoted into the main comparison yet.
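A minimal shell sketch of steps 1–3 for one ablation (ensemble size). The script names, the `time-epochs` invocation, and the "dry-run first" step come from this README; the output directory placeholders and the `DRY_RUN` toggle are assumptions to check against the actual submit scripts.

```bash
#!/usr/bin/env bash
# Sketch only — the real scripts live under slurm_scripts/ablations/<name>/.
set -euo pipefail

# 1. 5-epoch timing runs, each producing a timing.ckpt in its output dir.
bash slurm_scripts/ablations/ensemble_size/submit_ensemble_timing.sh

# 2. Derive per-combo cosine epochs for a 24h budget from the timing checkpoint,
#    then paste the result into submit_ensemble_large.sh (matches comparison/ flow).
uv run autocast time-epochs \
  --from-checkpoint outputs/<date>/<timing_run>/timing.ckpt -b 24

# 3. 24h production runs — dry-run first (the DRY_RUN toggle is hypothetical).
DRY_RUN=1 bash slurm_scripts/ablations/ensemble_size/submit_ensemble_large.sh
bash slurm_scripts/ablations/ensemble_size/submit_ensemble_large.sh
```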
Lines changed: 37 additions & 0 deletions
@@ -0,0 +1,37 @@
# Architecture comparison: U-Net, FNO, ViT

Compare U-Net and FNO backbones against the ViT (Azula) baseline on the
CRPS ambient path.

**Status:** stub — no scripts yet.

## Baseline

`local_hydra/local_experiment/epd/conditioned_navier_stokes/crps_vit_azula_large.yaml`
(ViT-Azula, ~81M params).

## Knob

Swap the `model.processor` backbone while trying to match parameter count
(~80M) and per-epoch budget. Candidate configs to crib from:

- `local_hydra/local_experiment/epd_crps_unet_azula.yaml` — U-Net +
  CRPS.
- `local_hydra/local_experiment/epd_crps_fno.yaml` — FNO + CRPS.

Each will need a per-CNS `local_experiment/ablations/arch/<arch>.yaml`
that matches the ambient baseline's encoder/decoder/loss so only the
backbone varies.

## Datasets

CNS only for now. The table spec'd 2 datasets × 2 non-ViT archs = 4 runs
(CNS gives 2: U-Net and FNO).

## Outstanding decisions

- How to match parameter count across architectures — the comparison
  table for the main study (see `slurm_scripts/comparison/README.md`)
  locked ~80M for ViT variants; we need equivalent targets for U-Net
  and FNO.
- Whether FNO needs a different patch-size / token structure.
Lines changed: 22 additions & 0 deletions
@@ -0,0 +1,22 @@
# Cached-latent CRPS

CRPS loss trained in cached-latent space (processor-only training on
pre-encoded latents, decoded only at eval time).

**Status:** CNS data point exists —
`outputs/2026-04-19/crps_cns64_vit_azula_large_58712c4_71ba7be`.
No new training script is needed for this pass; eval is handled by
`slurm_scripts/comparison/eval/submit_eval_crps_latent.sh`.

## Baseline

`local_hydra/local_experiment/processor/conditioned_navier_stokes/crps_vit_azula_large.yaml`.

## Next steps

- When the second dataset is added, extend the `DATASETS` map in
  `submit_eval_crps_latent.sh` (sketched below) and submit a matching
  training run via
  `slurm_scripts/comparison/cached_latents/submit_crps_latent_large.sh`.
- Decide whether to include an `eval.mode=latent` ablation alongside
  `eval.mode=ambient` for this ablation specifically — it answers "how
  much of the latent-CRPS gap is decode/encode drift?".
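The `DATASETS` map mentioned in the first step might look roughly like this. The associative-array shape and the commented-out key are assumptions about `submit_eval_crps_latent.sh`, not copied from it; only the CNS run dir quoted above is real.

```bash
# Hypothetical shape of the DATASETS map in submit_eval_crps_latent.sh;
# adding the second dataset should be a one-line edit of this kind.
declare -A DATASETS=(
  # Existing CNS data point (run dir from the status line above).
  [conditioned_navier_stokes]="outputs/2026-04-19/crps_cns64_vit_azula_large_58712c4_71ba7be"
  # [gray_scott]="outputs/<date>/<gray_scott_run_dir>"  # add once that run exists
)
```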
Lines changed: 27 additions & 0 deletions
@@ -0,0 +1,27 @@
# Conditioning: global_cond (AdaLN) vs permute_concat

Swap the CRPS ambient conditioning path from `permute_concat` (spatial
channel concatenation) to an `identity` encoder + `include_global_cond:
true` (AdaLN modulation on the backbone). This makes the conditioning
flow match FM ambient, isolating the encoder effect.

**Status:** CNS data point exists for CRPS-ViT —
`outputs/2026-04-18/crps_cns64_vit_azula_large_0f89f06_cf53b48`. No new
CRPS-ViT training is needed for this pass; the U-Net equivalent is pending.

## Baselines

- CRPS-ViT with identity + global_cond:
  `local_hydra/local_experiment/epd/conditioned_navier_stokes/crps_vit_azula_large_identity_global_cond.yaml`.
- CRPS-ViT with permute_concat (main baseline):
  `.../crps_vit_azula_large.yaml`.

## Outstanding

- U-Net analogue: needs a `crps_unet_large_identity_global_cond.yaml`
  mirroring the ViT ablation. The U-Net backbone's `include_global_cond`
  path is still to be verified against the U-Net module in
  `src/autocast/processors/`.
- Eval for the existing CNS ViT ablation run is covered by
  `slurm_scripts/comparison/eval/submit_eval_crps_ambient.sh` (included
  in its RUN_DIRS).
Lines changed: 35 additions & 0 deletions
@@ -0,0 +1,35 @@
# CRPS loss variants

Compare `AlphaFairCRPS` (baseline) vs `FairCRPS` vs `CRPS`.

**Status:** stub — no scripts yet.

## Baseline

`local_hydra/local_experiment/epd/conditioned_navier_stokes/crps_vit_azula_large.yaml`
(uses `AlphaFairCRPSLoss`).

## Knob

Swap `model.loss_func._target_` and the matching `train_metrics.crps`
target:

| variant | loss_func | metric |
|---|---|---|
| AlphaFairCRPS (baseline) | `autocast.losses.ensemble.AlphaFairCRPSLoss` | `autocast.metrics.ensemble.AlphaFairCRPS` |
| FairCRPS | `autocast.losses.ensemble.FairCRPSLoss` | `autocast.metrics.ensemble.FairCRPS` |
| CRPS | `autocast.losses.ensemble.CRPSLoss` | `autocast.metrics.ensemble.CRPS` |

Exact class paths are to be verified against
`src/autocast/losses/ensemble.py` and `metrics/ensemble.py` before
scripting.

## Datasets

CNS only for now. The table spec'd 2 datasets × 3 losses = 6 runs — CNS
gives us 3 runs for this pass.

## Implementation sketch

Single-file sweep via CLI overrides in `submit_crps_variants_*.sh` with
a `LOSSES` array of `(name, loss_target, metric_target)` triples.
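A hedged sketch of that single-file sweep. The class paths are the unverified ones from the table above; the `train_metrics.crps` override location, experiment-name pattern, and final submit call are assumptions to confirm before scripting.

```bash
# Sketch of the LOSSES sweep described above; class paths and the
# train_metrics override location still need verifying against
# src/autocast/losses/ensemble.py and metrics/ensemble.py.
LOSSES=(
  "alphafair|autocast.losses.ensemble.AlphaFairCRPSLoss|autocast.metrics.ensemble.AlphaFairCRPS"
  "fair|autocast.losses.ensemble.FairCRPSLoss|autocast.metrics.ensemble.FairCRPS"
  "crps|autocast.losses.ensemble.CRPSLoss|autocast.metrics.ensemble.CRPS"
)

for entry in "${LOSSES[@]}"; do
  IFS="|" read -r name loss_target metric_target <<< "${entry}"
  overrides=(
    "experiment_name=ablation_crps_variant_${name}_conditioned_navier_stokes"
    "model.loss_func._target_=${loss_target}"
    "model.train_metrics.crps._target_=${metric_target}"
  )
  # Placeholder: wire these overrides into whatever sbatch/training call
  # the other ablation submit scripts use.
  echo "[dry-run] ${name}: ${overrides[*]}"
done
```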
Lines changed: 90 additions & 0 deletions
@@ -0,0 +1,90 @@
# Ensemble size ablation

First-pass defaults focus on `n_members=16` under two batch-size
regimes. For the current submission pass, the active scripts are pared
down to just three `eff_bs1024` runs on `gray_scott`,
`gpe_laser_only_wake`, and `advection_diffusion`; the CNS entries and
the `fixed_bs32` combo are left commented for later reuse. All runs
inherit from the matching per-dataset
`local_hydra/local_experiment/epd/<dataset>/crps_vit_azula_large.yaml`;
the ablation is a pure CLI override on `model.n_members` +
`datamodule.batch_size`, so no new experiment configs are needed.

## Knob map

The main baseline is `bs_crps=32 × n_members=8 × 4 GPUs = 1024 global
effective` (i.e. `256 effective per-GPU`).

### Fixed batch size = 32/GPU (same as baseline)

Keep `datamodule.batch_size=32` and set `n_members=16`.
This doubles the effective batch vs the baseline.

| n_members | bs_per_gpu | effective per-GPU | effective global |
|---:|---:|---:|---:|
| 16 | 32 | 512 | 2048 |

### Fixed global effective batch = 1024 (matches baseline compute budget)

Keep `bs_crps × n_members × 4 GPUs = 1024`. With `n_members=16`,
`bs_per_gpu=16`.

| n_members | bs_per_gpu | effective per-GPU | effective global |
|---:|---:|---:|---:|
| 16 | 16 | 256 | 1024 |

## Dataset coverage

| dataset | `fixed_bs32` | `eff_bs1024` |
|---|---:|---:|
| `conditioned_navier_stokes` | yes | yes |
| `gray_scott` | no | yes |
| `gpe_laser_only_wake` | no | yes |
| `advection_diffusion` | no | yes |

This keeps the original CNS pilot in reserve while the active submit
scripts target only the three compute-matched (`1024` effective global
batch) CRPS ablations on the other comparison datasets.

## Files

| file | purpose |
|---|---|
| `submit_ensemble_timing.sh` | 5-epoch timing for the three active `eff_bs1024` runs (`gray_scott`, `gpe_laser_only_wake`, `advection_diffusion`) → `timing.ckpt` per run |
| `submit_ensemble_large.sh` | 24h production runs for the same three active runs, using cached or timing-derived cosine schedules |
| `eval/submit_eval_crps_ambient.sh` | ambient eval for the current `m=16` CRPS run set (CNS `fixed_bs32` pilot plus all available `eff_bs1024` runs), with conservative `eval.batch_size=4` and explicit `eval.n_members=10` to match the comparison-study eval regime |

## Extending the sweep

Add more lines to `COMBOS` in both submit scripts (see the sketch after
this section). Invariants are checked per regime so bad tuples fail fast
before any submission:

- `fixed_bs32`: require `bs_per_gpu=32`; vary `n_members`.
- `eff_bs1024`: require `bs_per_gpu × n_members × 4 GPUs = 1024`.

Dataset coverage is controlled separately via `REGIMES_BY_DATASET` in
each submit script, so extending `eff_bs1024` without broadening
`fixed_bs32` is a one-line change per dataset.
`fixed_bs32` is a one-line change per dataset.
68+
69+
## Eval placement
70+
71+
Ensemble-size eval now lives under `slurm_scripts/ablations/ensemble_size/eval/`
72+
rather than `slurm_scripts/comparison/eval/`. The reason is organizational:
73+
the run set is still partly ablation-only (`fixed_bs32`) even though the
74+
`eff_bs1024` subset may later graduate into the main comparison baseline.
75+
76+
If that promotion happens, move the promoted run dirs into a comparison-level
77+
eval script and leave only the genuinely ablation-only runs here.
78+
79+
## Scheduling
80+
81+
`submit_ensemble_large.sh` first checks `COSINE_EPOCHS_BY_COMBO`. If a
82+
key is missing, it looks for the matching timing run
83+
`outputs/*/crps_<dataset>_<regime>_m<n_members>/timing.ckpt` and derives
84+
`trainer.max_epochs` on the fly with:
85+
86+
`uv run autocast time-epochs --from-checkpoint <path>/timing.ckpt -b 24 -m 0.02`
87+
88+
That means the added `gray_scott`, `gpe_laser_only_wake`, and
89+
`advection_diffusion` `eff_bs1024` runs become submit-ready as soon as
90+
their timing jobs finish, without another script edit.
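The fallback described above might look roughly like this. The map name, flag values, and timing-run glob come from this README; the key format and the assumption that `time-epochs` prints a bare epoch count would need checking against the actual script.

```bash
# Sketch of the cosine-schedule fallback in submit_ensemble_large.sh
# (illustrative; key format and output parsing are assumptions).
declare -A COSINE_EPOCHS_BY_COMBO=(
  # ["gray_scott|eff_bs1024|m16"]=123   # pasted here once timing has run
)

# Assumes dataset/regime/n_members were parsed from the current COMBOS entry.
combo_key="${dataset}|${regime}|m${n_members}"
if [[ -n "${COSINE_EPOCHS_BY_COMBO[${combo_key}]:-}" ]]; then
  max_epochs="${COSINE_EPOCHS_BY_COMBO[${combo_key}]}"
else
  # Fall back to the timing checkpoint and derive the 24h cosine schedule.
  timing_ckpt=$(ls outputs/*/crps_${dataset}_${regime}_m${n_members}/timing.ckpt | head -n 1)
  max_epochs=$(uv run autocast time-epochs --from-checkpoint "${timing_ckpt}" -b 24 -m 0.02)
fi
echo "trainer.max_epochs=${max_epochs}"
```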
