Add normalisation to ST dataset and data module by radka-j · Pull Request #100 · alan-turing-institute/autocast

radka-j · 2025-12-11T17:40:14Z

Closes #72

NOTES:

The Well ZScoreNormalizer is instantiated with a stats dictionary that must have the following keys: [mean, std, mean_delta, std_delta].
- It feels like we don't always want to have to calculate the delta stats to instantiate this?
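To make the question above concrete, here is a minimal sketch of a z-score normaliser that treats the delta stats as optional. It is not The Well's class: `FieldZScore`, `has_delta`, `eps` and the per-field layout of the stats dictionary (`stats["mean"][field]`) are all assumptions made for illustration.

```python
from dataclasses import dataclass

REQUIRED_KEYS = ("mean", "std")
DELTA_KEYS = ("mean_delta", "std_delta")


@dataclass
class FieldZScore:
    """Hypothetical per-field z-score normaliser (not The Well's class)."""

    stats: dict
    eps: float = 1e-8

    def __post_init__(self):
        missing = [k for k in REQUIRED_KEYS if k not in self.stats]
        if missing:
            raise KeyError(f"stats is missing required keys: {missing}")
        # Delta stats are only needed when normalising time differences,
        # so their absence is recorded rather than treated as an error.
        self.has_delta = all(k in self.stats for k in DELTA_KEYS)

    def normalize(self, values, field):
        mean, std = self.stats["mean"][field], self.stats["std"][field]
        return [(v - mean) / (std + self.eps) for v in values]

    def denormalize(self, values, field):
        mean, std = self.stats["mean"][field], self.stats["std"][field]
        return [v * (std + self.eps) + mean for v in values]

    def normalize_delta(self, deltas, field):
        if not self.has_delta:
            raise ValueError("delta stats were not provided")
        mean = self.stats["mean_delta"][field]
        std = self.stats["std_delta"][field]
        return [(d - mean) / (std + self.eps) for d in deltas]


# Only mean/std are supplied; the delta stats are left out on purpose.
stats = {"mean": {"u": 2.0}, "std": {"u": 4.0}}
norm = FieldZScore(stats)
normalised = norm.normalize([2.0, 6.0], "u")
restored = norm.denormalize(normalised, "u")
```

With this shape, the delta stats would only need to be calculated by callers that actually normalise time differences.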
The ZScoreNormalizer also requires some fields that are currently extracted from metadata
- this means we can only do normalisation after the metadata is created (don't love this)
- the extraction from metadata is currently not general enough (it wouldn't work for the well data)
- should we just read it from the stats file?
We infer normalisation from trainer.datamodule attributes. This requires the user to call trainer.fit(model, datamodule) instead of trainer.fit(model, datamodule.train_dataloader(), datamodule.val_dataloader()). If the user does not pass a datamodule to the trainer, an alternative is to set the normaliser after instantiation (trainer.norm = ...).
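The lookup described in the last note could be sketched roughly as follows. `resolve_normalizer` and the `norm` attribute on the datamodule are hypothetical names, and the `SimpleNamespace` objects are stand-ins for a real trainer so the snippet runs without Lightning. It assumes `trainer.datamodule` is `None` when only dataloaders are passed.

```python
from types import SimpleNamespace


def resolve_normalizer(trainer):
    """Return the normaliser to use, or None if the data is not normalised.

    Lookup order (an assumption for this sketch):
    1. ``trainer.datamodule.norm``, if a datamodule was passed to fit/test
    2. ``trainer.norm``, if the user set it manually after instantiation
    """
    datamodule = getattr(trainer, "datamodule", None)
    norm = getattr(datamodule, "norm", None)
    if norm is not None:
        return norm
    return getattr(trainer, "norm", None)


# Stand-ins for the three situations discussed in the notes.
with_datamodule = SimpleNamespace(datamodule=SimpleNamespace(norm="dm-norm"))
loaders_only = SimpleNamespace(datamodule=None, norm="manual-norm")
unnormalised = SimpleNamespace(datamodule=None)

from_datamodule = resolve_normalizer(with_datamodule)
from_trainer = resolve_normalizer(loaders_only)
nothing = resolve_normalizer(unnormalised)
```

Preferring the datamodule over a manually set attribute is a design choice in this sketch, not something the PR settles.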

TODO:

Normalize with stats read from file
Add denormalize handling
- Create denorm mixin
- Add logic to processors to infer whether data is normalised (The Well decides based on Dataset attributes)
- Call denorm before returning predictions if normalisation is used (training, validation and testing are all done with normalised data)
- Add option to denorm validation and test predictions
Update notebooks/scripts to use trainer.fit(model, datamodule) instead of passing in data loaders (so that normalisation can be inferred)
Ensure that predict_step works with RolloutMixin._predict (that there are no conflicts or unexpected behaviour)
Add tests
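As a rough illustration of the denorm items above, the sketch below shows a mixin that denormalises predictions only when a normaliser is present. It is plain Python rather than a Lightning module, and `DenormMixin.denormalize_output`, `ToyNorm` and `ToyModel` are hypothetical names that do not reflect the code in this branch.

```python
class ToyNorm:
    """Hypothetical normaliser with a fixed mean and std for one field."""

    def __init__(self, mean, std):
        self.mean, self.std = mean, std

    def denormalize(self, values):
        return [v * self.std + self.mean for v in values]


class DenormMixin:
    """Hypothetical mixin: denormalise outputs if a normaliser is set."""

    norm = None  # assumed to be filled in from the datamodule or by the user

    def denormalize_output(self, preds):
        # No normaliser means the data was never normalised: pass through.
        if self.norm is None:
            return preds
        return self.norm.denormalize(preds)


class ToyModel(DenormMixin):
    def forward(self, batch):
        # Placeholder "model": works entirely in normalised space.
        return [2.0 * v for v in batch]

    def predict_step(self, batch):
        # Training/validation/testing stay normalised; only predictions
        # are mapped back to physical units before being returned.
        return self.denormalize_output(self.forward(batch))


raw_model = ToyModel()
raw_preds = raw_model.predict_step([0.5, 1.0])

normed_model = ToyModel()
normed_model.norm = ToyNorm(mean=10.0, std=2.0)
physical_preds = normed_model.predict_step([0.5, 1.0])
```

An option to denormalise validation and test predictions could reuse the same `denormalize_output` call behind a flag.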

Can do later:

Normalize with stats read from file
- Improve extraction of field names for norm initialisation
- Add handling for inconsistent norm args
Normalize with stats passed as dict
Normalize with stats computed on train data

review-notebook-app · 2025-12-12T16:04:42Z

Check out this pull request on ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.

Powered by ReviewNB

radka-j · 2025-12-15T10:48:24Z

Closing as agreed to split the work here into multiple PRs (and to rethink how best to handle denormalisation for evaluation purposes).

radka-j added 12 commits December 11, 2025 17:38

add init normalization impl with stats read from yaml file

787b14d

add DenormMixin

a8b5195

split denorm into two methods

b2b3ef8

add DenorMixing to all models

8fbc2b9

fix import

36e1d34

Merge branch 'main' into normalization

70f6bef

make DenormMixin a lightning module

a4018a1

rm duplicate inheritance from lightning module

96b7680

clean up code

8963380

clean up

fb310d9

only pass datamodule to trainer.fit/test

235f262

only pass datamodule to trainer.fit/test

f80bc03

radka-j added 4 commits December 12, 2025 16:50

Merge branch 'main' into normalization

db54a9f

Merge branch 'main' into normalization

15ef5fa

add logging

abd2169

raise error if data cannot be normalized under current impl

c6cd6c1

radka-j mentioned this pull request Dec 15, 2025

Add option to compute metrics on denormalised data #115

Closed

radka-j closed this Dec 15, 2025

sgreenbury deleted the normalization branch December 19, 2025 13:45

radka-j wants to merge 16 commits into main from normalization
