Subseasonal-to-seasonal (S2S) forecasting bridges the gap between weather forecasts (up to 2 weeks) and seasonal forecasts (3-6 months). This recipe demonstrates how to run ensemble forecasts for S2S timescales using Earth2Studio. Reflecting the need for larger ensembles and longer forecast timescales, the recipe supports multi-GPU distributed inference, along with parallel I/O to efficiently save forecast data as it is generated, and permits only saving a subset of the forecast outputs if storage space is a constraint. Among the models in Earth2studio, DLESyM and HENS-SFNO are best-posed for S2S forecasting, so this recipe supports both models; it can be extended to support other models as well.
Key features include:
- Multi-GPU inference
- Parallel, non-blocking I/O using zarr format
- Usage of one or more diagnostic models in the forecast pipeline
- Storage space reduction via regional output or coarsening
- Multi-GPU scoring of forecast outputs
- Capability to score using ECMWF AIWQ S2S metrics
To use the HENS component of this recipe, we need to download the pre-trained checkpoints, kindly provided by the authors of the method. If desired, the following command can be used to download the entire model registry from NERSC:
Warning
The entire model registry is 241 Gb.
Tip
To download the individual checkpoints on an "as need" basis, just run the recipe with out executing this command. The checkpoints will get downloaded to the registry folder defined in the config from the HuggingFace mirror. (coming soon)
wget --recursive --no-parent --no-host-directories \
--cut-dirs=4 --show-progress --reject="index.html*" \
--directory-prefix=s2s_model_registry \
https://portal.nersc.gov/cfs/m4416/hens/earth2mip_prod_registry/Using HENS in this recipe also requires the skill file for scaling the noise vector in the breeding steps of the initial condition (IC) perturbation. This can be downloaded with the following command:
wget --no-parent --no-host-directories --show-progress \
--directory-prefix=s2s_model_registry \
https://portal.nersc.gov/cfs/m4416/hens/d2m_sfno_linear_74chq_sc2_layers8_edim620_wstgl2-epoch70_seed16.ncThe HENS checkpoints have a larger memory footprint than other models supported in Earth2Studio. For the best experience the following hardware specifications are recommended:
- GPU: CUDA Compute Compatibility >8.0
- GPU Memory: >40Gb
- Storage: >512Gb NVMe SSD
Start by installing the required packages with pip:
# If using uv package manager
uv sync
# Pip
pip install -r recipe-requirements.txtTo run an example ensemble forecast with DLESyM, run
# With uv
uv run python main.py --config-name=pnw_dlesym ncheckpoints=2
# With pip
python main.py --config-name=pnw_dlesym ncheckpoints=2Here, we use a command-line config override to limit the number of model checkpoints in the forecast to just two, for speed. More instructions on configuring the ensemble can be found below in the Configuration section.
The overall structure and usage of this recipe is similar to that of the HENS recipe, since many of the features useful for HENS in medium-range forecasts are also useful for S2S ensemble forecasting (whether with HENS or other models).
This recipe includes a few reference workflows which are configured
through the use of Hydra YAML files.
These can get executed by directly invoking the main.py file.
This example takes the heatwave that occurred in June 2021 over western North America as a case-study for S2S forecasts. This event was extreme in both its duration and intensity, setting several temperature records and causing severe wildfires and substantial damage. Research has shown that S2S forecast models began indicating warm anomalies as far as about 3 weeks in advance, though none could capture the exact location and intensity of the heatwave at that time (Lin, Mo, & Vitart 2022).
Since S2S forecasts can potentially generate a lot of data with large ensembles and longer rollouts, this example showcases how the output saved to disk can be limited to just a few variables of interest subselected to a specific region ("cropbox") to cut down on storage requirements.
Execute this example by running:
uv run main.py --config-name=pnw_dlesymWe also provide a corresponding config for the HENS-SFNO model to run for this case-study:
uv run main.py --config-name=pnw_sfnoThe latter model is larger, and takes longer to complete a forecast. Both commands above may be parallelized to accelerate execution with multiple GPUs, which is highly recommended if possible. See the section on Parallelism for detailed instructions.
In developing and assessing S2S forecast models, it is important to evaluate their performance across a range of conditions rather than just individual case-studies. Since predictability of a given spatial scale rapidly decreases with lead time, it only makes sense to evaluate S2S forecasts at coarser resolution if looking at global skill metrics; a common evaluation resolution used by ECMWF and others is 1.5 degrees, which conveniently also reduces the data volume generated by S2S ensembles.
This example generates S2S forecasts for the entire globe across a range of initial conditions spanning a full year (2018 taken as a test year), so despite coarsening the data before saving to disk it still generates a substantial amount of data (about 300 GB). To make this run in a reasonable amount of time, multi-GPU execution is highly recommended.
Warning
This example generates a lot of data, roughly 300 GB.
On a system with ngpu GPUs available, run:
uv run torchrun --nproc_per_node=$ngpu --standalone main.py --config-name global_dlesymMore considerations on multi-GPU execution are discussed in the Parallelism section.
Similar to what is demonstrated in the HENS recipe, it is possible to use diagnostic models "on top of" the core prognostic model used for S2S forecasting, provided all the input variables of the diagnostic model are available in the prognostic model outputs.
For reference, we provide a config that diagnoses total precipitation using the outputs of a HENS-SFNO ensemble, again subselected to the Pacific Northwest region for efficiency:
uv run python main.py --config-name pnw_sfno_precip.yamlThis recipe is highly customisable and extensible via the use of Hydra. While
config items are documented with in-line comments in the yaml configuration files under
cfg, we describe a few key high-level configuration settings here:
The essential settings to identify a given forecast run are the project and run_id,
which are composed with the output_path to determine where all run outputs and logs
will be saved (${output_path}/${project}_${run_id}). Then, the core settings for
ensemble size and forecast duration are specified as:
nsteps: 12 # number of steps to run the forecast
nperturbed: 4 # Number of perturbations applied to the initial condition for each model checkpoint
# Total ensemble size is nperturbed * ncheckpoints
ncheckpoints: 16 # number of model checkpoints to use
# For DLESyM, ncheckpoints represents the total number of atmos/ocean model checkpoint pairs
batch_size: 4 # inference batch sizeAs mentioned in the in-line comments, this recipe creates ensembles by combining initial
condition perturbations with multiple model checkpoints to produce variability in the
ensemble. Each model checkpoint (set of trained weights) is paired with nperturbed
initial condition perturbations to produce a total of ncheckpoints*nperturbed ensemble
members. With DLESyM, there are 4 atmosphere and 4 ocean models available, leading to a
total of 16 possible checkpoints; HENS-SFNO has up to 29 model checkpoints available for
use in an ensemble.
To provide maximum flexibility and extensibility, we use a hydra defaults list to select
model and perturbation configs from the cfg/forecast_model and cfg/perturbation
folders. This allows one to add additional models/perturbations, if desired. To select a
specific combination, set the defaults in the top-level config file (see existing configs
for a reference):
defaults:
- forecast_model: dlesym
- perturbation: gaussianTip
Not all perturbation methods in Earth2studio are compatible with the DLESyM model, currently. In particular, multi-timestep perturbation methods like (normal/hemispheric-centered) bred vectors will not work.
Aside from these main config items, there are also options to choose a data_source and
configure the file_output and scoring behavior. Refer to the example config files
to see how these are set.
To reduce the storage requirements when running larger ensembles and longer rollouts, there
are several configurations available to subselect outputs if desired under the file_output
section of the config. First, one can only save variables of interest by specifying a list
in file_output.output_vars, which is a required config item. Also required is the output
resolution to save data at, which would be latlon721x1440 for default full resolution or
latlon121x240 for 1.5 degree resolution, commonly used in S2S scoring.
One can also choose to save outputs only over a specific region of the globe using the
optional file_output.cropboxes config setup. For each region, or "cropbox", you can
specify the name of the region and the min/max latitude/longitude values of it. Refer to
pnw_dlesym.yaml for a full cropbox example config.
This recipe also provides mechanisms to score model outputs using a variety of metrics. The
scoring script is score.py and is run in much the same way main.py is used to run
a forecast. It is expected that the same config file is used when scoring a given forecast
as was used when running it.
The scoring script will read data from the output of the forecast run (forecast.zarr in)
the run's output directory and write scores to a new file score.zarr. There are two
general ways of scoring model outputs:
- Using metrics defined in Earth2studio
- Using metrics defined as part of the ECMWF AI Weather Quest competition
These can be configured under the scoring section of the config file:
scoring:
aiwq:
variables: ["t2m"]
variables: ["t2m", "z500"] # variables to score
temporal_aggregation: "weekly" # temporal aggregation to apply before scoring
metrics:
crps:
_target_: earth2studio.statistics.crps
ensemble_dimension: "ensemble"
reduction_dimensions: ["lat", "lon"]
fair: True
score_path: '${output_path}/${project}_${run_id}'The aiwq subsection can be omitted if AI Weather Quest scoring is not desired. For the
more general metrics section, each metric can be instantiated using hydra, and they
will be computed for each variable specified in scoring.variables and saved to disk. As
weekly averaging is common in S2S scoring, we provide a mechanism to do so via the
scoring.temporal_aggregation. We can also take advantage of multi-GPU execution to speed
up the scoring process as before:
uv run torchrun --nproc_per_node=8 --standalone score.py --config-name global_dlesymTo use the AI Weather Quest scoring routines, you must have installed the AI-WQ-package and have registered with AI Weather Quest. Once you have registered, set your password in the following environment variable before running any scoring routines:
export AIWQ_SUBMIT_PWD=<your_AI-WQ_password>This will permit you to use ECMWF's official scoring metrics computed against the official competition verification data. The AI Weather Quest focues on RPSS scores of weekly-averaged forecast outputs, computed against climatological quintiles.
There are a number of limitations associated with using AI_WQ_package for scoring:
- The initialization date of the forecast must be on a Thursday
- The initialization date must be recent (on or after March 2025) to be able to download the corresponding verification data. This precludes using some ERA5 data sources.
- The scoring routines will download data (verification ERA5 data, weekly mean climatology, land-sea mask) when run.
The example config global_dlesym_aiwq.yaml demonstrates a config that can run and score
an ensemble appropriate for the AI Weather Quest surface temperature variable.
This recipe uses the physicsnemo.distributed.DistributedManager to handle multi-GPU
execution. Example commands in this document make use of the torchrun launcher provided
by PyTorch as it is widely available and compatible. To turn a single-GPU run (e.g., one
launched by uv run python main.py --config-name pnw_dlesym.yaml) into a multi-GPU run,
simply add torchrun and the number of GPUs $ngpu you'd like to parallelize over:
uv run torchrun --nproc_per_node=$ngpu --standalone main.py --config-name global_dlesymThe --standalone argument indicate to torchrun that all GPUs are on the same
machine/node; if running over multiple nodes refer to the torchrun guide
for how to properly configure the process group.
Note that torchrun is not required, and the DistributedManager will also work for
process groups initialized with SLURM (srun) or MPI (mpirun).
There are a number of considerations with multi-GPU execution worth highlighting here for more advanced users and developers. Primarily, certain download/caching operations in Earth2studio utilities are not thread-safe, and can cause uncontrolled behavior if not run in a coordinated fashion:
load_default_packageof an Earth2studio modelfetch_dataof some Earth2studio data sources__call__of some perturbation methods, which implicitly callfetch_data- Initialization of output files in an I/O backend
To handle these, a utility routine run_with_rank_ordered_execution is provided and used
in this recipe, which can wrap a function call and ensure that one rank (by default rank 0)
will run the function first, before the rest. This allows e.g. filesystem objects in a
cache or output directory to be created properly before other ranks acccess them.
Developers extending this recipe should use run_with_rank_ordered_execution for any
operations that might lead to race conditions; however it is important that when using it,
all ranks enter and leave the function the same number of times (i.e., execution must be
load-balanced so all ranks enter and exit the barriers in the routine as expected).
For the parallel I/O capabilities in this recipe, it is important to consider the chunk size used with the output zarr file. Execution is parallelized across GPUs over the ensemble and time (initial condition) dimensions, so for safe concurrent writes we keep the chunk size at 1 in those dimensions; I/O performance then depends on the chunking used for the remaining dimensions. A good rule of thumb for zarr chunking is to aim for chunk size of around 10MB, but other sizes may work well too in this case depending on the output grid (resolution or cropbox size) data is being saved on.
- The 2021 Western North American Heatwave and Its Subseasonal Predictions
- A Deep Learning Earth System Model for Efficient Simulation of the Observed Climate
- Huge Ensembles Part I: Design of Ensemble Weather Forecasts using Spherical Fourier Neural Operators
- Spherical Fourier Neural Operators
- NERSC Registry
