
HRRR Async Refactor #301

Merged
merged 12 commits on May 13, 2025

Conversation

NickGeneva
Collaborator

@NickGeneva NickGeneva commented May 9, 2025

Earth2Studio Pull Request

Description

Round two, this time with HRRR. As with the GFS overhaul, this refactor makes the data source fully async and reads the GRIB files directly, which gives a significant speedup.
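For readers unfamiliar with the pattern, here is a minimal sketch of what "fully async plus reading the GRIB files directly" can look like. This is not the code in this PR. It assumes a wgrib2-style `.idx` inventory (one `message:start_byte:date:variable:level:forecast:` line per GRIB message), and `fetch_range` is a hypothetical stub standing in for a real byte-range request. The sample offsets and all helper names are made up for illustration.

```python
import asyncio
from typing import Optional

# Hypothetical excerpt of a wgrib2-style .idx inventory (format assumed):
# message_number:start_byte:date:variable:level:forecast:
SAMPLE_IDX = """\
1:0:d=2023022318:REFC:entire atmosphere:anl:
2:375319:d=2023022318:TMP:2 m above ground:anl:
3:1650204:d=2023022318:UGRD:10 m above ground:anl:
"""


def parse_index(text: str) -> dict[str, tuple[int, Optional[int]]]:
    """Map 'VAR::level' keys to (start_byte, end_byte) ranges.

    A message ends one byte before the next message starts; the last
    message gets an open-ended range (None).
    """
    rows = [line.split(":") for line in text.strip().splitlines()]
    ranges = {}
    for i, row in enumerate(rows):
        start = int(row[1])
        end = int(rows[i + 1][1]) - 1 if i + 1 < len(rows) else None
        ranges[f"{row[3]}::{row[4]}"] = (start, end)
    return ranges


async def fetch_range(key: str, start: int, end: Optional[int]):
    """Hypothetical stand-in for a byte-range request; echoes its arguments."""
    await asyncio.sleep(0)  # yield control, as real network I/O would
    return key, start, end


async def fetch_all(ranges, keys):
    """Issue every request concurrently; gather preserves the input order."""
    tasks = [fetch_range(k, *ranges[k]) for k in keys]
    return await asyncio.gather(*tasks)


ranges = parse_index(SAMPLE_IDX)
results = asyncio.run(
    fetch_all(ranges, ["TMP::2 m above ground", "REFC::entire atmosphere"])
)
```

The point of the sketch is that only the bytes of the requested messages are transferred, and all requests are in flight at once instead of being issued one after another.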

Sanity check:

from datetime import datetime, timedelta
from earth2studio.data.hrrr import HRRR
from earth2studio.lexicon import HRRRLexicon
import time
import random


start = time.time()

dtime = [datetime(2023, 2, 23, 18), datetime(2024, 8, 12, 0)]

# Use every variable in the HRRR lexicon, sorted so the shuffle below is reproducible
common_variables = sorted(HRRRLexicon.VOCAB.keys())

# Shuffle the request order with a fixed seed
random.seed(0)
variables = random.sample(common_variables, len(common_variables))

ds = HRRR(cache=False)
da = ds(time=dtime, variable=variables)
print(f"Time taken: {time.time() - start:.2f} seconds")
da.to_netcdf("hrrr_new.nc", engine="h5netcdf")

Checked with:

import xarray as xr
import numpy as np

ds_old = xr.load_dataarray("hrrr_old.nc")
ds_new = xr.load_dataarray("hrrr_new.nc")

print(ds_old.data.shape)
print(ds_new.data.shape)
print(np.sum(ds_old.data - ds_new.data))

That gives:

(earth2studio) (base) local-ngeneva@ipp2-2268:~/earth2studio$ uv run test2.py 
/localhome/local-ngeneva/earth2studio/test2.py:5: FutureWarning: In a future version of xarray decode_timedelta will default to False rather than None. To silence this warning, set decode_timedelta to True, False, or a 'CFTimedeltaCoder' instance.
  ds_new = xr.load_dataarray("hrrr_new.nc")
(3, 636, 1059, 1799)
(3, 636, 1059, 1799)
0.0
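A signed sum of differences can hide errors that cancel each other out, so a stricter variant of the check above might look like the sketch below. It works on plain NumPy arrays; `compare_fields` is a hypothetical helper, not part of earth2studio or of this PR.

```python
import numpy as np


def compare_fields(old: np.ndarray, new: np.ndarray) -> dict:
    """Summarize element-wise agreement between two arrays of equal shape."""
    if old.shape != new.shape:
        raise ValueError(f"shape mismatch: {old.shape} vs {new.shape}")
    nan_old, nan_new = np.isnan(old), np.isnan(new)
    # Compare values only where both arrays hold real numbers
    valid = ~(nan_old | nan_new)
    diff = np.abs(old[valid] - new[valid])
    return {
        "nan_mask_equal": bool(np.array_equal(nan_old, nan_new)),
        "max_abs_diff": float(diff.max()) if diff.size else 0.0,
    }


# Toy data: errors of -1 and +1 cancel in a plain sum but not in max |diff|
a = np.array([1.0, 2.0, np.nan, 4.0])
b = np.array([2.0, 1.0, np.nan, 4.0])
signed_sum = float(np.nansum(a - b))
report = compare_fields(a, b)
```

With the files from this PR, the same helper could be called as `compare_fields(ds_old.data, ds_new.data)`.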

New version: 93.54 seconds
Main branch version: 2760.56 seconds (roughly 29.5x slower)

Also plotted a few for visual comparison:

[Image: comparison_2]

A similar process was done with the forecast source, using lead times of (timedelta(hours=1), timedelta(hours=3), timedelta(hours=18)) and making sure tp (APCP) is included and is the 1-hour accumulation.
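To illustrate why the 1-hour accumulation needs an explicit check: forecast inventories can list APCP more than once, for example once accumulated since the start of the run and once for the final hour only. The sketch below picks the final-hour entry. The label format (`0-3 hour acc fcst`, `2-3 hour acc fcst`), the sample lines, and the helper `find_hourly_apcp` are all assumptions for illustration, not necessarily how this PR selects the message.

```python
import re
from datetime import timedelta
from typing import Optional

# Hypothetical index lines for a 3-hour forecast (label format assumed)
IDX_LINES = [
    "71:41000000:d=2023022318:TMP:2 m above ground:3 hour fcst:",
    "84:50000000:d=2023022318:APCP:surface:0-3 hour acc fcst:",
    "90:53000000:d=2023022318:APCP:surface:2-3 hour acc fcst:",
]

ACC_PATTERN = re.compile(r"^(\d+)-(\d+) hour acc fcst$")


def find_hourly_apcp(lines: list[str], lead_time: timedelta) -> Optional[str]:
    """Return the APCP line accumulated over the final hour of lead_time."""
    lead_hours = int(lead_time.total_seconds() // 3600)
    for line in lines:
        fields = line.split(":")
        if fields[3] != "APCP":
            continue
        match = ACC_PATTERN.match(fields[5])
        if match is None:
            continue
        start, end = int(match.group(1)), int(match.group(2))
        # Keep only the window that ends at the lead time and spans one hour
        if end == lead_hours and end - start == 1:
            return line
    return None


selected = find_hourly_apcp(IDX_LINES, timedelta(hours=3))
missing = find_hourly_apcp(IDX_LINES, timedelta(hours=18))
```

Returning `None` when no 1-hour window matches makes a missing entry easy to catch, instead of silently falling back to the run-total accumulation.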

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.
  • The CHANGELOG.md is up to date with these changes.
  • An issue is linked to this pull request.

Dependencies

@NickGeneva
Collaborator Author

/blossom-ci

@NickGeneva NickGeneva self-assigned this May 10, 2025
@NickGeneva
Collaborator Author

NickGeneva commented May 13, 2025

Grid validation:

import numpy as np
from herbie import Herbie
import pyproj


def hrrr_grid() -> tuple[np.ndarray, np.ndarray]:
    """Generate the HRRR Lambert conformal projection grid coordinates.

    Creates the HRRR grid using a single-parallel Lambert conformal mapping.

    Note
    ----
    For more information about the HRRR grid see:

    - https://ntrs.nasa.gov/api/citations/20160009371/downloads/20160009371.pdf

    Returns
    -------
    tuple[np.ndarray, np.ndarray]
        (lat, lon) in degrees, each of shape (1059, 1799)
    """
    # a, b: radius of the globe, 6371229
    p1 = pyproj.CRS(
        "proj=lcc lon_0=262.5 lat_0=38.5 lat_1=38.5 lat_2=38.5 a=6371229 b=6371229"
    )
    p2 = pyproj.CRS("latlon")
    transformer = pyproj.Transformer.from_proj(p2, p1)
    itransformer = pyproj.Transformer.from_proj(p1, p2)

    # Start by getting the grid bounds from a lat/lon box (SW-NW-NE-SE).
    # The reference values seem slightly off from the actual data, so these
    # corners were grabbed from the S3 HRRR GRIB files (the reference may
    # describe cell points instead).
    lat = np.array([21.138123, 47.83862349881542, 47.84219502248866, 21.140546625419148])
    lon = np.array([237.280472, 225.90452026573686, 299.0828072281622, 287.71028150897075])

    easting, northing = transformer.transform(lat, lon)
    E, N = np.meshgrid(
        np.linspace(easting[0], easting[2], 1799),
        np.linspace(northing[0], northing[1], 1059),
    )
    lat, lon = itransformer.transform(E, N)
    # Shift negative longitudes by 360 so they are non-negative
    lon = np.where(lon < 0, lon + 360, lon)
    return lat, lon

def main():
    H = Herbie('2021-01-01 12:00', model='hrrr', product='sfc', fxx=6)
    H.download(':500 mb')
    x = H.xarray('TMP:2 m')

    lat, lon = hrrr_grid()

    print(np.mean(x.coords['latitude'].values - lat))
    print(np.mean(x.coords['longitude'].values - lon))


if __name__ == "__main__":
    main()
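The script above prints the mean signed difference, which can come out near zero even when individual grid points disagree, because positive and negative errors cancel. A stricter check could report the largest absolute error instead, with longitude differences wrapped so that the two common conventions compare equal. The helper `max_abs_grid_error` and the toy grids below are hypothetical, for illustration only.

```python
import numpy as np


def max_abs_grid_error(lat_ref, lon_ref, lat, lon) -> tuple[float, float]:
    """Return the largest absolute latitude and longitude errors in degrees.

    Longitude differences are wrapped into [-180, 180) so that the same
    meridian expressed as 237 or as -123 counts as zero error.
    """
    dlat = np.abs(np.asarray(lat_ref) - np.asarray(lat))
    dlon = (np.asarray(lon_ref) - np.asarray(lon) + 180.0) % 360.0 - 180.0
    return float(dlat.max()), float(np.abs(dlon).max())


# Toy 2x2 grids: latitude errors of +0.01 and -0.01 average out to zero
lat_ref = np.array([[21.0, 21.0], [47.0, 47.0]])
lat = lat_ref + np.array([[0.01, -0.01], [0.0, 0.0]])
lon_ref = np.array([[237.0, 299.0], [225.0, 287.0]])
lon = lon_ref - 360.0  # same meridians expressed in [-180, 180)

mean_signed = float(np.mean(lat_ref - lat))
lat_err, lon_err = max_abs_grid_error(lat_ref, lon_ref, lat, lon)
```

In the validation script this would be called with `x.coords['latitude'].values`, `x.coords['longitude'].values`, and the output of `hrrr_grid()`.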

@NickGeneva
Collaborator Author

/blossom-ci

@NickGeneva NickGeneva merged commit 836d886 into NVIDIA:main May 13, 2025
8 of 10 checks passed
@NickGeneva NickGeneva deleted the ngeneva/hrrr_refactor branch May 28, 2025 03:57