
Add normalisation to ST dataset and data module#100

Closed
radka-j wants to merge 16 commits into main from normalization


@radka-j radka-j commented Dec 11, 2025

Closes #72

NOTES:

  • The Well ZScoreNormalizer is instantiated with a stats dictionary that must have the following keys:
    • [mean, std, mean_delta, std_delta].
    • We probably don't always want to have to compute the delta stats just to instantiate it. Could they be optional?
  • The ZScoreNormalizer also requires some fields that are currently extracted from metadata
    • this means we can only normalise after the metadata is created (don't love this)
    • the extraction from metadata is currently not general enough (it wouldn't work for the well data)
    • should we just read it from the stats file?
  • We infer normalisation from trainer.datamodule attributes. This requires the user to call trainer.fit(model, datamodule) rather than trainer.fit(model, datamodule.train_dataloader(), datamodule.val_dataloader()). If the user does not pass a datamodule to the trainer, the alternative is to set the normaliser after instantiation (trainer.norm = ...)
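
To make the first note concrete, here is a minimal sketch of a z-score normaliser that only requires `mean` and `std` and treats `mean_delta` / `std_delta` as optional. This is not The Well's `ZScoreNormalizer`: the class name `SimpleZScore`, the per-field nested dict layout of `stats`, and the `eps` parameter are all assumptions made for illustration.

```python
from typing import Dict, Sequence

# Assumed layout (hypothetical): stats[key][field_name] -> float
Stats = Dict[str, Dict[str, float]]


class SimpleZScore:
    """Illustrative stand-in only; not The Well's ZScoreNormalizer."""

    def __init__(self, stats: Stats, field_names: Sequence[str], eps: float = 1e-8):
        # Only mean and std are mandatory in this sketch.
        missing = [k for k in ("mean", "std") if k not in stats]
        if missing:
            raise KeyError(f"stats is missing required keys: {missing}")
        # Every requested field needs an entry in both mandatory tables.
        for name in field_names:
            if name not in stats["mean"] or name not in stats["std"]:
                raise KeyError(f"no mean/std stats for field {name!r}")
        # Delta stats are optional; record whether both were supplied.
        self.has_delta = "mean_delta" in stats and "std_delta" in stats
        self.stats = stats
        self.field_names = list(field_names)
        self.eps = eps

    def normalize(self, value: float, field: str) -> float:
        return (value - self.stats["mean"][field]) / (self.stats["std"][field] + self.eps)

    def denormalize(self, value: float, field: str) -> float:
        return value * (self.stats["std"][field] + self.eps) + self.stats["mean"][field]

    def normalize_delta(self, delta: float, field: str) -> float:
        # Fail only when delta normalisation is actually requested.
        if not self.has_delta:
            raise ValueError("delta stats were not provided")
        return (delta - self.stats["mean_delta"][field]) / (
            self.stats["std_delta"][field] + self.eps
        )


# Instantiate without any delta stats.
stats = {"mean": {"u": 1.0}, "std": {"u": 2.0}}
norm = SimpleZScore(stats, field_names=["u"])
z = norm.normalize(5.0, "u")
x = norm.denormalize(z, "u")
```

With this shape, a stats file that contains only `mean` and `std` is enough to normalise field values, and the missing delta stats only raise an error if `normalize_delta` is called.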

TODO:

  • Normalize with stats read from file
  • Add denormalize handling
    • Create denorm mixin
    • Add logic to processors to infer whether data is normalised (The Well decides based on Dataset attributes)
    • Call denorm before returning predictions if normalisation is used (training, validation and testing are all done with normalised data)
    • Add option to denorm validation and test predictions
  • Update notebooks/scripts to use trainer.fit(model, datamodule) instead of passing in data loaders (so that normalisation can be inferred)
  • Ensure that predict_step works with RolloutMixin._predict (that there are no conflicts or unexpected behaviour)
  • Add tests
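
One possible shape for the denorm mixin in the TODO list is sketched below. It is a rough illustration and not the implementation in this PR. It uses plain Python stand-ins instead of Lightning classes so it runs on its own. The names `DenormMixin`, `maybe_denormalize`, `ToyNorm` and `ToyModel`, and the `norm` attribute on the datamodule, are hypothetical. The lookup order (an explicit `trainer.norm` first, then `trainer.datamodule`) is also an assumption.

```python
from types import SimpleNamespace


class DenormMixin:
    """Hypothetical mixin: find a normaliser via the trainer and undo it."""

    def _resolve_norm(self):
        trainer = getattr(self, "trainer", None)
        if trainer is None:
            return None
        # An explicitly set trainer.norm takes priority (assumed order).
        norm = getattr(trainer, "norm", None)
        if norm is not None:
            return norm
        # Otherwise fall back to the datamodule, if one was passed to fit.
        datamodule = getattr(trainer, "datamodule", None)
        return getattr(datamodule, "norm", None)

    def maybe_denormalize(self, preds):
        norm = self._resolve_norm()
        # No normaliser found: data is treated as unnormalised.
        if norm is None:
            return preds
        return norm.denormalize(preds)


class ToyNorm:
    """Stand-in normaliser with a single scalar mean and std."""

    def __init__(self, mean, std):
        self.mean = mean
        self.std = std

    def denormalize(self, values):
        return [v * self.std + self.mean for v in values]


class ToyModel(DenormMixin):
    """Stand-in for a model whose 'prediction' is the identity."""

    def __init__(self, trainer=None):
        self.trainer = trainer

    def predict_step(self, batch):
        preds = list(batch)
        # Denormalise only at prediction time, as proposed in the TODO.
        return self.maybe_denormalize(preds)


# Case 1: normaliser inferred from the datamodule.
dm = SimpleNamespace(norm=ToyNorm(mean=1.0, std=2.0))
with_dm = ToyModel(SimpleNamespace(datamodule=dm)).predict_step([0.0, 1.0])

# Case 2: no datamodule, normaliser set directly on the trainer.
manual = ToyModel(SimpleNamespace(norm=ToyNorm(0.0, 10.0), datamodule=None))
with_manual = manual.predict_step([0.0, 1.0])

# Case 3: nothing available, predictions pass through unchanged.
plain = ToyModel().predict_step([0.0, 1.0])
```

Keeping the lookup in one helper means the training, validation and test steps can stay in normalised space, and only the prediction path (or an opt-in flag for validation and test) calls `maybe_denormalize`.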

Can do later:

  • Normalize with stats read from file
    • Improve extraction of field names for norm initialisation
    • Add handling for inconsistent norm args
  • Normalize with stats passed as dict
  • Normalize with stats computed on train data



radka-j commented Dec 15, 2025

Closing, as we agreed to split this work into multiple PRs (and to rethink how best to handle denormalisation for evaluation purposes).

@radka-j radka-j closed this Dec 15, 2025
@sgreenbury sgreenbury deleted the normalization branch December 19, 2025 13:45

Development

Successfully merging this pull request may close these issues.

Implement normalization for spatiotemporal dataset
