Software for training a climate reconstruction technique (image inpainting with partial convolutions) on numerical model output and for infilling missing values in observational datasets (e.g., HadCRUT4) using the trained models.
The following dependencies are required:
- python>=3.7
- pytorch>=1.8.0
- tqdm>=4.59.0
- torchvision>=0.2.1
- numpy>=1.20.1
- matplotlib>=3.4.3
- tensorboardX>=2.4.0
- tensorboard>=2.8.0
- xarray>=0.20.2
- netcdf4>=1.5.8
- setuptools==59.5.0
- xesmf>=0.6.2
- cartopy>=0.20.2
- numba>=0.55.1
An Anaconda environment with all the required dependencies can be created using environment.yml:
conda env create -f environment.yml
To activate the environment, use:
conda activate crai
climatereconstructionAI can be installed using pip in the current directory:
pip install .
The software can be used to:
- train a model (training)
- infill climate datasets using a trained model (evaluation)
The directory containing the climate datasets should have the following sub-directories:
- data_large and val_large for training
- test_large for evaluation
The climate datasets should be in netCDF format and placed in the corresponding sub-directories.
The missing values can be defined separately as masks. These masks should be in netCDF format and have the same dimensions as the climate dataset.
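As an illustration, such a mask could be built with xarray (a listed dependency). This is only a sketch: the file and variable names are placeholders, and the 0/1 convention should be checked against the demo data.

```python
# Sketch only: build a binary mask with the same dimensions as the climate
# dataset and store it as netCDF. File and variable names are placeholders.
import numpy as np
import xarray as xr

ds = xr.open_dataset("data_large/tas_train.nc")   # hypothetical climate dataset
tas = ds["tas"]

mask = xr.ones_like(tas)                # same dimensions and coordinates as the data
mask = mask.where(~np.isnan(tas), 0.0)  # assumed convention: 1 = observed, 0 = missing

mask.to_dataset(name="tas").to_netcdf("masks/tas_mask.nc")  # hypothetical mask directory
```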
A PyTorch model is required for the evaluation.
Once installed, the package can be used as:
- a command line interface (CLI):
- training:
crai-train
- evaluation:
crai-evaluate
- a Python library:
- training:
from climatereconstructionai import train
train()
- evaluation:
from climatereconstructionai import evaluate
evaluate()
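The library functions parse the same options as the CLI. As a sketch of programmatic use (assuming train() and evaluate() accept the path of an argument file, in the spirit of the --load-from-file option; the exact signatures should be checked in the package):

```python
# Sketch of programmatic use. Whether train()/evaluate() accept the path of an
# argument file (mirroring the --load-from-file CLI option) is an assumption;
# check the function signatures in the package. File names are placeholders.
from climatereconstructionai import evaluate, train

train("train_args.txt")        # hypothetical text file listing the training options
evaluate("evaluate_args.txt")  # hypothetical text file listing the evaluation options
```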
For more information about the arguments:
crai-train --help
usage: crai-train [-h] [--data-root-dir DATA_ROOT_DIR] [--mask-dir MASK_DIR] [--log-dir LOG_DIR] [--img-names IMG_NAMES] [--mask-names MASK_NAMES] [--data-types DATA_TYPES] [--device DEVICE] [--prev-next PREV_NEXT] [--lstm-steps LSTM_STEPS]
[--prev-next-steps PREV_NEXT_STEPS] [--encoding-layers ENCODING_LAYERS] [--pooling-layers POOLING_LAYERS] [--image-sizes IMAGE_SIZES] [--weights WEIGHTS] [--attention] [--channel-reduction-rate CHANNEL_REDUCTION_RATE]
[--disable-skip-layers] [--disable-first-last-bn] [--out-channels OUT_CHANNELS] [--snapshot-dir SNAPSHOT_DIR] [--resume-iter RESUME_ITER] [--batch-size BATCH_SIZE] [--n-threads N_THREADS] [--finetune] [--lr LR]
[--lr-finetune LR_FINETUNE] [--max-iter MAX_ITER] [--log-interval LOG_INTERVAL] [--save-snapshot-image] [--save-model-interval SAVE_MODEL_INTERVAL] [--loss-criterion LOSS_CRITERION] [--eval-timesteps EVAL_TIMESTEPS]
[--load-from-file LOAD_FROM_FILE]
optional arguments:
-h, --help show this help message and exit
--data-root-dir DATA_ROOT_DIR
Root directory containing the climate datasets
--mask-dir MASK_DIR Directory containing the mask datasets
--log-dir LOG_DIR Directory where the log files will be stored
--img-names IMG_NAMES
Comma separated list of netCDF files (climate dataset)
--mask-names MASK_NAMES
Comma separated list of netCDF files (mask dataset). If None, it extracts the masks from the climate dataset
--data-types DATA_TYPES
Comma separated list of variable types, in the same order as img-names and mask-names
--device DEVICE Device used by PyTorch (cuda or cpu)
--prev-next PREV_NEXT
--lstm-steps LSTM_STEPS
Number of considered sequences for lstm (0 = lstm module is disabled)
--prev-next-steps PREV_NEXT_STEPS
--encoding-layers ENCODING_LAYERS
Number of encoding layers in the CNN
--pooling-layers POOLING_LAYERS
Number of pooling layers in the CNN
--image-sizes IMAGE_SIZES
Spatial size of the datasets (latxlon must be of shape NxN)
--weights WEIGHTS Initialization weight
--attention Enable the attention module
--channel-reduction-rate CHANNEL_REDUCTION_RATE
Channel reduction rate for the attention module
--disable-skip-layers
Disable the skip layers
--disable-first-last-bn
Disable the batch normalization on the first and last layer
--out-channels OUT_CHANNELS
Number of channels for the output image
--snapshot-dir SNAPSHOT_DIR
Parent directory of the training checkpoints and the snapshot images
--resume-iter RESUME_ITER
Iteration step from which the training will be resumed
--batch-size BATCH_SIZE
Batch size
--n-threads N_THREADS
Number of threads
--finetune Enable the fine tuning mode (use the fine tuning parameterization and disable batch normalization)
--lr LR Learning rate
--lr-finetune LR_FINETUNE
Learning rate for fine tuning
--max-iter MAX_ITER Maximum number of iterations
--log-interval LOG_INTERVAL
Iteration step interval at which a tensorboard summary log should be written
--save-snapshot-image
Save evaluation images for the iteration steps defined in --log-interval
--save-model-interval SAVE_MODEL_INTERVAL
Iteration step interval at which the model should be saved
--loss-criterion LOSS_CRITERION
Index defining the loss function (0=original from Liu et al., 1=MAE of the hole region)
--eval-timesteps EVAL_TIMESTEPS
Iteration steps for which an evaluation is performed
--load-from-file LOAD_FROM_FILE
Load all the arguments from a text file
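For illustration, a training run with explicit options might look as follows; all paths, file names, and option values are placeholders:
crai-train --data-root-dir ./data --mask-dir ./masks --img-names tas_train.nc --mask-names tas_mask.nc --data-types tas --device cuda --batch-size 4 --max-iter 500000 --snapshot-dir ./snapshots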
crai-evaluate --help
usage: crai-evaluate [-h] [--data-root-dir DATA_ROOT_DIR] [--mask-dir MASK_DIR] [--log-dir LOG_DIR] [--img-names IMG_NAMES] [--mask-names MASK_NAMES] [--data-types DATA_TYPES] [--device DEVICE] [--prev-next PREV_NEXT] [--lstm-steps LSTM_STEPS]
[--prev-next-steps PREV_NEXT_STEPS] [--encoding-layers ENCODING_LAYERS] [--pooling-layers POOLING_LAYERS] [--image-sizes IMAGE_SIZES] [--weights WEIGHTS] [--attention] [--channel-reduction-rate CHANNEL_REDUCTION_RATE]
[--disable-skip-layers] [--disable-first-last-bn] [--out-channels OUT_CHANNELS] [--model-dir MODEL_DIR] [--model-names MODEL_NAMES] [--dataset-name DATASET_NAME] [--evaluation-dirs EVALUATION_DIRS] [--eval-names EVAL_NAMES]
[--infill {infill,test}] [--create-graph] [--original-network] [--partitions PARTITIONS] [--maxmem MAXMEM] [--load-from-file LOAD_FROM_FILE]
optional arguments:
-h, --help show this help message and exit
--data-root-dir DATA_ROOT_DIR
Root directory containing the climate datasets
--mask-dir MASK_DIR Directory containing the mask datasets
--log-dir LOG_DIR Directory where the log files will be stored
--img-names IMG_NAMES
Comma separated list of netCDF files (climate dataset)
--mask-names MASK_NAMES
Comma separated list of netCDF files (mask dataset). If None, it extracts the masks from the climate dataset
--data-types DATA_TYPES
Comma separated list of variable types, in the same order as img-names and mask-names
--device DEVICE Device used by PyTorch (cuda or cpu)
--prev-next PREV_NEXT
--lstm-steps LSTM_STEPS
Number of considered sequences for lstm (0 = lstm module is disabled)
--prev-next-steps PREV_NEXT_STEPS
--encoding-layers ENCODING_LAYERS
Number of encoding layers in the CNN
--pooling-layers POOLING_LAYERS
Number of pooling layers in the CNN
--image-sizes IMAGE_SIZES
Spatial size of the datasets (latxlon must be of shape NxN)
--weights WEIGHTS Initialization weight
--attention Enable the attention module
--channel-reduction-rate CHANNEL_REDUCTION_RATE
Channel reduction rate for the attention module
--disable-skip-layers
Disable the skip layers
--disable-first-last-bn
Disable the batch normalization on the first and last layer
--out-channels OUT_CHANNELS
Number of channels for the output image
--model-dir MODEL_DIR
Directory of the trained models
--model-names MODEL_NAMES
Model names
--dataset-name DATASET_NAME
Name of the dataset for format checking
--evaluation-dirs EVALUATION_DIRS
Directory where the output files will be stored
--eval-names EVAL_NAMES
Prefix used for the output filenames
--infill {infill,test}
Infill the climate dataset ('test' if mask order is irrelevant, 'infill' if mask order is relevant)
--create-graph Create a Tensorboard graph of the NN
--original-network Use the original network architecture (from Kadow et al.)
--partitions PARTITIONS
Split the climate dataset into several partitions along the time coordinate
--maxmem MAXMEM Maximum available memory in MB (overwrite partitions parameter)
--load-from-file LOAD_FROM_FILE
Load all the arguments from a text file
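Similarly, an evaluation run with a trained model might look as follows; all paths, file names, and option values are placeholders:
crai-evaluate --data-root-dir ./data --mask-dir ./masks --img-names tas_obs.nc --mask-names tas_mask.nc --data-types tas --model-dir ./snapshots/ckpt --model-names 500000.pth --evaluation-dirs ./output --eval-names tas_infilled --infill infill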
An example can be found in the directory demo.
The instructions to run the example are given in the README.md file.
climatereconstructionAI is licensed under the terms of the BSD 3-Clause license.
climatereconstructionAI is maintained by the Climate Informatics and Technology group at DKRZ (Deutsches Klimarechenzentrum).
- Previous contributing authors: Naoto Inoue, Christopher Kadow, Stephan Seitz
- Current contributing authors: Johannes Meuer, Étienne Plésiat