Add GraphCast (1 degree model) #256

rodrigoalmeida94 · 2025-04-11T13:24:21Z

Earth2Studio Pull Request

Description

Adds support for the GraphCast model (small). In addition, adds the ARCOExtra data source (to produce relative humidity and accumulated precipitation at 6-hour intervals).
Closes #199

Checklist

I am familiar with the Contributing Guidelines.
New or existing tests cover these changes.
The documentation is up to date with these changes.
The CHANGELOG.md is up to date with these changes.
An issue is linked to this pull request.

ToDos:

Fix mypy
Add explicit device management (torch to jax)

Dev notes

This is quite a hacky implementation based loosely on https://github.com/ecmwf-lab/ai-models-graphcast and https://github.com/ankurmahesh/earth2mip-fork/blob/70fec6cbb388ae46591a5827c5e60621b728f602/earth2mip/networks/graphcast.py
I tried to reuse as much of the original graphcast logic as possible. This results in a weird data flow: (torch.array, coords) -> xr.Dataset (creating the variables structure + generating forcings) -> running inference with the xr.Dataset in JAX -> converting back to the earth2studio (torch.array, coords) convention.
In order to make use of the graphcast data handling utilities, I had to save the number of forecast steps as a class instance variable, which is then used to create the data + iterator to produce forecasts. This means that an extra step is required when running the predictions (e.g. run.deterministic(["2019-02-01"], nsteps, model.set_nsteps(nsteps), data, io)), which is definitely not ideal.
I also made use of the ARCOLexicon (with the extra derived variables like relative humidity and total_precipitation_6hr) since graphcast expects the "long" variable names.
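
For illustration, the translation between short variable names and the "long" GraphCast names could look roughly like the sketch below. The table entries, the `to_long_name` helper, and the 13-level pressure list are assumptions made for this example; they are not the contents of ARCOLexicon.

```python
import re

# Hypothetical subset of a short-name -> long-name table (illustration only).
SURFACE = {
    "u10m": "10m_u_component_of_wind",
    "v10m": "10m_v_component_of_wind",
    "t2m": "2m_temperature",
}
PRESSURE = {
    "z": "geopotential",
    "t": "temperature",
    "u": "u_component_of_wind",
    "v": "v_component_of_wind",
}
# Assumed pressure levels in hPa, in ascending order.
LEVELS = [50, 100, 150, 200, 250, 300, 400, 500, 600, 700, 850, 925, 1000]


def to_long_name(short):
    """Return (long_name, level_hpa or None) for a short variable name."""
    if short in SURFACE:
        return SURFACE[short], None
    # Pressure-level variables are a letter prefix followed by the level, e.g. "z500".
    match = re.fullmatch(r"([a-z]+)(\d+)", short)
    if match is None or match.group(1) not in PRESSURE:
        raise KeyError(f"unknown variable: {short}")
    level = int(match.group(2))
    if level not in LEVELS:
        raise KeyError(f"unknown pressure level: {level}")
    return PRESSURE[match.group(1)], level


long_name, level = to_long_name("z500")
level_index = LEVELS.index(level)  # position of 500 hPa in the assumed level list
```

Under the assumed level ordering, `z500` maps to `geopotential` at index 7 of the level axis.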

NickGeneva · 2025-04-12T00:06:35Z

Hi @rodrigoalmeida94

This is awesome, thanks so much for adding this!
We'll start taking a look at this next week, there may be some edits coming in as we review on our side.

loliverhennigh · 2025-04-15T23:18:51Z

Hi @rodrigoalmeida94

Finishing up some other work but will start working on this next week. We are planning to add the 0.25 degree model as well. I'll start pushing changes to your branch starting tomorrow if that's ok.

rodrigoalmeida94 · 2025-04-16T08:26:22Z

@loliverhennigh sounds good! I wanted to add the device management (for jax), going to try to add something for that today or tomorrow.

nbren12 · 2025-04-22T02:59:32Z

FYI, in case it's helpful, there is a graphcast runner here that works with the 0.25 deg models: https://github.com/NVIDIA/earth2mip/blob/main/earth2mip/networks/graphcast.py.

I got it to work with a custom tisr implementation in physicsnemo so it can run beyond the duration of the input data. It's been used for inference in a few of our papers.

loliverhennigh · 2025-05-06T20:20:31Z

Hey @rodrigoalmeida94, I have some changes for this PR. Could you give me push access to your fork? If not, I will fork off it and make another PR.

rodrigoalmeida94 · 2025-05-06T20:44:32Z

@loliverhennigh just did, hope it worked?

loliverhennigh · 2025-05-15T20:07:50Z

/blossom-ci

loliverhennigh · 2025-05-15T21:20:25Z

/blossom-ci

rodrigoalmeida94 · 2025-05-16T14:33:56Z

@loliverhennigh the good news is that yes, I can run all my perturbation stuff with GraphCastMini class, everything works smoothly, which is great 🥳

The bad news is that I was trying to check the predictions against the original implementation as you suggested, and it's not adding up. See notebook here https://github.com/rodrigoalmeida94/earth2studio/blob/check-graphcast/check_graphcast.ipynb

My check was to take the example batch data that the GraphCast repo provides and compute predictions with both the original methods and our implementation. The differences in the predictions are quite large (up to 15 K in t2m), so something must be off. I was thinking maybe we are handling the lead times incorrectly somehow, but honestly I'm not sure how. Maybe I mixed something up in the notebook?

rodrigoalmeida94 · 2025-05-19T14:45:41Z

Okay so good news: I tested this again using the WB2ERA5 data source and now the predictions are the same as in the original repository (notebook 1 and 2).

I think in the previous notebooks I was handling the lead times incorrectly (because I was using a local copy of ARCO, which only covered 2022).
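
The thread does not say exactly how the lead times went wrong. As a hedged sketch of one check that can surface this kind of problem, the snippet below lists the times a rollout needs and flags any that fall outside a local archive. The helper names, the 6-hour step, and the two input states (6 h before and at the initial time) are assumptions for this example.

```python
from datetime import datetime, timedelta

STEP = timedelta(hours=6)  # assumed forecast interval


def required_times(init, n_steps, input_offsets=(-1, 0)):
    """Return (input_times, target_times) for an n_steps rollout from init.

    input_offsets are multiples of STEP; (-1, 0) assumes two input states.
    """
    inputs = [init + k * STEP for k in input_offsets]
    targets = [init + k * STEP for k in range(1, n_steps + 1)]
    return inputs, targets


def missing_times(times, start, end):
    """Return the times that fall outside the half-open window [start, end)."""
    return [t for t in times if not (start <= t < end)]


# A local archive that only covers 2022, initialised at its very first time.
inputs, targets = required_times(datetime(2022, 1, 1), n_steps=6)
missing = missing_times(inputs + targets, datetime(2022, 1, 1), datetime(2023, 1, 1))
```

With these assumptions, the input state 6 h before the initial time lies in 2021 and is reported as missing, so a 2022-only copy cannot serve that initial time.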

loliverhennigh · 2025-05-19T20:54:45Z

/blossom-ci

loliverhennigh · 2025-05-19T20:58:33Z

/blossom-ci

NickGeneva · 2025-05-21T19:19:53Z

/blossom-ci

NickGeneva · 2025-05-21T20:21:17Z

/blossom-ci

NickGeneva · 2025-05-21T23:15:26Z

Cross-checked the implementation with my own validation script, based on @rodrigoalmeida94's notebooks and the GC repo:

import os

# Set these before importing jax so the flags are picked up
os.environ["XLA_FLAGS"] = "--xla_gpu_deterministic_ops=true"
os.environ["CUDA_VISIBLE_DEVICES"] = "0"

from earth2studio.models.px import GraphCastSmall
from graphcast import data_utils
import xarray as xr
import dataclasses
from graphcast import rollout
import jax
import numpy as np
from earth2studio.data.utils import fetch_data
from earth2studio.utils.coords import map_coords

from earth2studio.data import WB2ERA5
from datetime import datetime

# Set up data and model
model = GraphCastSmall.load_model(GraphCastSmall.load_default_package())
ds = WB2ERA5(cache=True)

x, coords = fetch_data(
    source=ds,
    time=[datetime(2022,1,1)],
    variable=model.input_coords()["variable"],
    lead_time=model.input_coords()["lead_time"],
    device="cuda",
)
x_input, coords_input = map_coords(x, coords, model.input_coords())
# Named `iterator` to avoid shadowing the built-in iter()
iterator = model.create_iterator(x_input, coords_input)

n_steps = 6

# Earth2Studio wrapper forward prediction
# Collects n_steps + 1 outputs; the plots below read lead_time indices 1..n_steps
outputs = []
step = 0
for x, coords in iterator:
    print(x.shape)
    outputs.append(xr.DataArray(x.cpu(), coords=coords))
    step += 1
    if step > n_steps:
        break
prediction_e2studio = xr.concat(outputs, dim="lead_time")
prediction_e2studio.to_netcdf("e2s.nc")
del iterator

# Graphcast original prediction
batch, target_lead_times = model.from_dataarray_to_dataset(xr.DataArray(x_input.cpu(), coords=coords_input), lead_time=6*n_steps)
eval_inputs, eval_targets, eval_forcings = data_utils.extract_inputs_targets_forcings(
        batch, target_lead_times=target_lead_times,
        **dataclasses.asdict(model.ckpt.task_config))

generator = rollout.chunked_prediction_generator(model.run_forward,
    rng=jax.random.PRNGKey(0),
    inputs=eval_inputs,
    targets_template=eval_targets * np.nan,
    forcings=eval_forcings
)
prediction_graphcast = [next(generator) for _ in range(n_steps)]
prediction_graphcast = xr.concat(prediction_graphcast, dim="time")
prediction_graphcast.to_netcdf("gc.nc")

print(prediction_e2studio)
print(prediction_graphcast)

# Plot difference between two variables for each time-step
import matplotlib.pyplot as plt

fig, ax = plt.subplots(3, n_steps, figsize=(18, 8))
for i in range(n_steps):
    e2s_data = prediction_e2studio.sel(variable="u10m").isel(time=0, lead_time=i+1)
    gc_data = prediction_graphcast['10m_u_component_of_wind'].isel(time=i, batch=0)
    
    ax[0, i].imshow(e2s_data, cmap='RdBu_r', vmin=-30, vmax=30)
    ax[1, i].imshow(gc_data, cmap='RdBu_r', vmin=-30, vmax=30)
    ax[2, i].imshow(np.abs(e2s_data - gc_data), cmap='magma', vmin=0, vmax=2)
    ax[0, i].set_title(f'Timestep {i}')
    
ax[0, 0].set_ylabel('Earth2Studio')
ax[1, 0].set_ylabel('GraphCast')
ax[2, 0].set_ylabel('Diff')
plt.tight_layout()
plt.savefig("u10m.png")

plt.close("all")
fig, ax = plt.subplots(3, n_steps, figsize=(18, 8))
for i in range(n_steps):
    e2s_data = prediction_e2studio.sel(variable="z500").isel(time=0, lead_time=i+1)
    gc_data = prediction_graphcast['geopotential'].isel(time=i, batch=0, level=7)
    
    ax[0, i].imshow(e2s_data, cmap='viridis')
    ax[1, i].imshow(gc_data, cmap='viridis')
    ax[2, i].imshow(np.abs(e2s_data - gc_data), cmap='magma', vmin=0, vmax=10)
    ax[0, i].set_title(f'Timestep {i}')

ax[0, 0].set_ylabel('Earth2Studio')
ax[1, 0].set_ylabel('GraphCast')
ax[2, 0].set_ylabel('Diff')
plt.tight_layout()
plt.savefig("z500.png")

I chose one surface variable and one pressure-level variable.
This generates the following images; the results are identical for 6 time-steps.

[Image: u10m]

[Image: z500]
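
As a complement to the visual comparison, a numerical check on the same fields might look like the sketch below. The helper names and the tolerance value are assumptions for this example, and plain Python sequences are used only to keep it dependency-free; in practice the flattened values of each field would be passed in.

```python
import math


def max_abs_diff(a, b):
    """Largest absolute element-wise difference; NaN if any difference is NaN."""
    a, b = list(a), list(b)
    if len(a) != len(b):
        raise ValueError("sequences differ in length")
    diffs = [abs(x - y) for x, y in zip(a, b)]
    if any(math.isnan(d) for d in diffs):
        return math.nan
    return max(diffs, default=0.0)


def fields_match(a, b, atol=1e-3):
    """True if every element agrees within atol (illustrative tolerance)."""
    # A NaN difference compares False here, so NaNs count as a mismatch.
    return max_abs_diff(a, b) <= atol


same = fields_match([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])
off = fields_match([280.0, 281.0], [280.0, 296.0])
```

A single scalar per variable and lead time makes it easier to spot a mismatch that a fixed colour scale might hide.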

NickGeneva · 2025-05-21T23:15:36Z

/blossom-ci

NickGeneva · 2025-05-21T23:52:26Z

/blossom-ci

NickGeneva · 2025-05-22T00:46:23Z

/blossom-ci

NickGeneva · 2025-05-22T00:53:53Z

/blossom-ci

NickGeneva · 2025-05-22T01:42:48Z

Thank you for the great contribution @rodrigoalmeida94 !

We greatly appreciate it and are already working on also adding the 0.25 degree model.

rodrigoalmeida94 and others added 11 commits April 8, 2025 18:12

add working version

c7e73ec

use arcolexicon

1ab533d

add deps

49f7356

move to optional

5c0be4f

sync uv

6937c86

add graphcast deps

9299608

add arcoextra source

0167f5b

make tests pass

6d699ad

fix ruff and black

cd8bf08

fix type hints

f4e81f3

test refactor

29d7548

NickGeneva added the 1 - On Deck To be worked on next label Apr 11, 2025

NickGeneva requested review from NickGeneva and loliverhennigh April 11, 2025 16:08

Rodrigo Almeida and others added 9 commits April 24, 2025 12:20

add some handling of devices

302714b

make tests pass, drop time

9b8a252

add error for more than 1 time

0a6590d

fix dimensions

415f0ef

Merge remote-tracking branch 'origin/main' into graphcast

8bd49c7

add to init

13b037c

add graphcast to all

48f6eea

inter is interp

cb92379

merging

d5191ef

fixing lexicon to use ARCO

064c438

fixing versioning

3ac6a08

merged

a5b3b6d

NickGeneva added the ! - Release PRs or Issues relating to a release label May 20, 2025

NickGeneva added 5 commits May 21, 2025 16:36

Merge branch 'main' into graphcast

8a3133b

Makefile update

5c0e1fa

Minor api doc improvements

ba1391d

Little updates for docs

aba4cab

Renaming to graphcast small

083230f

Little fixes

ee89cb1

Bug fix

7e91f3d

NickGeneva approved these changes May 21, 2025

Test fixes

670cb2b

Fixing CI

d725652

Fixing CI

e188c9e

NickGeneva merged commit 368fba0 into NVIDIA:main May 22, 2025
11 checks passed

rodrigoalmeida94 deleted the graphcast branch May 22, 2025 08:55
