This repository contains code accompanying the paper "Exploring Design Choices For Autoregressive Deep Learning Climate Models".
Our work explores key design choices for autoregressive Deep Learning (DL) models to achieve stable 10-year rollouts that preserve the statistics of the reference climate. We quantitatively compare the long-term stability of three prominent model architectures (FourCastNet, SFNO, and ClimaX) trained on ERA5 reanalysis data at 5.625° resolution, and systematically assess the impact of the number of autoregressive training steps, model capacity, and the choice of prognostic variables.
- Set up environment (e.g. conda)
  - Python version 3.11.11, e.g. via `conda create -n dlclim python=3.11.11 && conda activate dlclim`
  - Install the dependencies listed in the `requirements.txt` file, e.g. via `pip install -r requirements.txt`
  - Install this package in the environment via `pip install -e .`
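Collected into one snippet, the setup steps above look as follows (assuming a working conda installation; the environment name `dlclim` is just an example):

```bash
conda create -n dlclim python=3.11.11
conda activate dlclim
pip install -r requirements.txt   # install dependencies
pip install -e .                  # install this package in editable mode
```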
- Data
  - Set an environment variable `DATA_DIR`, e.g. via `export DATA_DIR=YOUR_PATH`
  - For a quick start, download example data from this zenodo repository and extract the files into the path of `DATA_DIR`
  - Alternatively, download the full raw dataset (see section Data download) and run the script `bash example_scripts/preprocessing.sh`
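A sketch of the quick-start path; `YOUR_PATH` is a placeholder, and the archive name `example_data.zip` is an assumption (the actual file name on zenodo may differ):

```bash
export DATA_DIR=YOUR_PATH
# quick start: extract the zenodo example data into DATA_DIR
unzip example_data.zip -d "$DATA_DIR"
# alternatively, with the full raw dataset in place (see Data download):
bash example_scripts/preprocessing.sh
```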
- Training of own models
  - Set an environment variable `RESULTS_DIR`, e.g. via `export RESULTS_DIR=YOUR_PATH`
  - Run the training of a single model via `bash example_scripts/train.sh`
  - Example configurations can be found in the folder `example_configs`
  - An example hyperparameter search can be run with the script `bash example_scripts/hyperparameter_search.sh`
- Evaluation
  - Set an environment variable `RESULTS_DIR`, e.g. via `export RESULTS_DIR=YOUR_PATH`
  - Once you have trained a model, run the evaluation script `example_scripts/evaluate.sh`
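The training and evaluation steps combine into a short workflow (`YOUR_PATH` is a placeholder):

```bash
export RESULTS_DIR=YOUR_PATH
bash example_scripts/train.sh      # train a single model
bash example_scripts/evaluate.sh   # evaluate the rollouts of the trained model
```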
We use two data sources:
Download the 5.625 degree data from the TUM server (which can also be done via the command line, as detailed here) to the `DATA_DIR` directory.
For best compatibility with this repository, store the netcdf files of each variable in a separate folder, i.e.,
.
|-- DATA_DIR
|   |-- ERA5
|   |   |-- weatherbench1
|   |   |   |-- r64x32
|   |   |   |   |-- 10m_u_component_of_wind
|   |   |   |   |-- 10m_v_component_of_wind
|   |   |   |   |-- 2m_temperature
|   |   |   |   |-- constants
|   |   |   |   |-- geopotential
|   |   |   |   |-- potential_vorticity
|   |   |   |   |-- relative_humidity
|   |   |   |   |-- specific_humidity
|   |   |   |   |-- temperature
|   |   |   |   |-- toa_incident_solar_radiation
|   |   |   |   |-- total_cloud_cover
|   |   |   |   |-- total_precipitation
|   |   |   |   |-- u_component_of_wind
|   |   |   |   |-- v_component_of_wind
|   |   |   |   |-- vorticity
We provide a script `data_preprocessing/compute_normalization.py` to compute the mean and standard deviation per variable and level, which are used for normalization.
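The underlying computation can be sketched with numpy; this is a simplified stand-in for the actual script (which reads the netCDF files), and the `(time, level, lat, lon)` array layout is an assumption:

```python
import numpy as np

def compute_normalization(data: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Compute mean and std over time/lat/lon, keeping one value per level.

    `data` is assumed to have shape (time, level, lat, lon); for surface
    variables the level axis has size 1.
    """
    mean = data.mean(axis=(0, 2, 3))
    std = data.std(axis=(0, 2, 3))
    return mean, std

def normalize(data: np.ndarray, mean: np.ndarray, std: np.ndarray) -> np.ndarray:
    # Broadcast the per-level statistics over the time/lat/lon axes.
    return (data - mean[None, :, None, None]) / std[None, :, None, None]
```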
Daily total solar irradiance (TSI) is taken from an official forcing dataset for CMIP6 and is available for download here.
We provide a script `data_preprocessing/compute_tisr_heppa.py`, adapted from GraphCast, to compute top-of-the-atmosphere incident solar radiation from the TSI values.
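As a toy illustration of the idea (this is not the GraphCast-derived code, which additionally handles details such as the Earth-Sun distance and time accumulation): the instantaneous TOA flux is the TSI scaled by the cosine of the solar zenith angle, clipped at zero on the night side.

```python
import math

def cos_zenith(lat_deg: float, declination_deg: float, hour_angle_deg: float) -> float:
    """Cosine of the solar zenith angle from latitude, solar declination,
    and hour angle (all in degrees)."""
    lat = math.radians(lat_deg)
    dec = math.radians(declination_deg)
    ha = math.radians(hour_angle_deg)
    return math.sin(lat) * math.sin(dec) + math.cos(lat) * math.cos(dec) * math.cos(ha)

def toa_incident_flux(tsi: float, lat_deg: float,
                      declination_deg: float, hour_angle_deg: float) -> float:
    """Instantaneous TOA incident solar radiation in W/m^2 (zero at night)."""
    return tsi * max(0.0, cos_zenith(lat_deg, declination_deg, hour_angle_deg))
```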
The training script is based on the PyTorch Lightning CLI.
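With a LightningCLI entry point, runs are typically configured via YAML and launched with the `fit` subcommand; a hypothetical invocation (the config file name is illustrative, see `example_configs` for the real ones):

```bash
python train.py fit --config example_configs/config.yaml
# individual settings can be overridden on the command line, e.g.
python train.py fit --config example_configs/config.yaml --trainer.max_epochs 50
```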
- `train.py` starts model training
- `datamodule.py` defines a LightningDataModule which wraps a PyTorch iterable-style dataset
- `modelmodule.py` defines a LightningModule implementing the multi-step autoregressive training logic
- `metrics.py` handles metric computation and the data it requires during training
- `autoregressive_rollout.py` is a Python script to perform an inference rollout with a trained model
- the `models` folder contains code that defines the model architectures
- the `evaluation` folder contains scripts to evaluate inference rollouts
- the `data_preprocessing` folder contains scripts to derive necessary data products from the raw data (see section Data download)
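The multi-step autoregressive training idea can be illustrated with a dependency-free toy (a sketch, not the repository's implementation): a scalar model `x_{t+1} = a * x_t` is unrolled for several steps, the per-step losses are summed, and the parameter is updated by gradient descent through the whole rollout, here with a hand-derived forward-mode gradient instead of autograd.

```python
def rollout_loss_and_grad(a: float, x0: float, targets: list[float]) -> tuple[float, float]:
    """Unroll x_{t+1} = a * x_t for len(targets) steps; return the summed
    squared-error loss and its gradient w.r.t. a (through the whole rollout)."""
    loss, grad = 0.0, 0.0
    x, dx_da = x0, 0.0  # state and its derivative w.r.t. a (forward mode)
    for y in targets:
        dx_da = x + a * dx_da   # d(a*x)/da = x + a * dx/da, using the old x
        x = a * x               # one autoregressive step
        loss += (x - y) ** 2
        grad += 2.0 * (x - y) * dx_da
    return loss, grad

def train(a0: float, x0: float, targets: list[float],
          lr: float = 0.01, steps: int = 2000) -> float:
    """Fit a by gradient descent on the multi-step rollout loss."""
    a = a0
    for _ in range(steps):
        _, g = rollout_loss_and_grad(a, x0, targets)
        a -= lr * g
    return a
```

With targets generated by a true factor of 0.9, training recovers that factor; the same summed-rollout loss is what makes multi-step training penalize error accumulation, unlike single-step training.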