Fix multimet dataloader to support dynamic variable product names with underscores by joshsturtevant · Pull Request #238 · google-research/flood-forecasting

joshsturtevant · 2026-02-12T16:03:39Z

Summary

Fixes a bug where the CHIRPS_GEFS product name was not parsed correctly
Makes dynamic variable product name parsing in multimet.py more robust by normalizing product keys and handling known exceptions (beyond just ERA5_LAND)
Keeps existing configs compatible

Key Changes

Normalizes product names (case-insensitive, hyphens/underscores handled)
Adds explicit mapping for products that previously broke parsing (e.g., underscores)
Handles config keys like chirps_gefs, chirpsgefs, CHIRPS_GEFS, and CHIRPSGEFS
Adds logging when loading Zarr stores for both hindcast and forecast products
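
For readers skimming the thread, the normalization described above could look roughly like the sketch below. This is an illustration under stated assumptions, not the code in this PR: the contents of the alias table, the helper names `normalize_key` and `resolve_product`, and the uppercase fallback for unknown products are all hypothetical.

```python
import re

# Hypothetical alias table: normalized key -> canonical product name.
# The real table in multimet.py may contain different entries.
PRODUCT_ALIASES = {
    'chirpsgefs': 'CHIRPS_GEFS',
    'era5land': 'ERA5_LAND',
}


def normalize_key(name: str) -> str:
    """Lowercase the name and drop hyphens and underscores."""
    return re.sub(r'[-_]', '', name.strip().lower())


def resolve_product(name: str) -> str:
    """Map a config key to its canonical product name.

    Unknown products fall back to the uppercased input (an assumption).
    """
    return PRODUCT_ALIASES.get(normalize_key(name), name.upper())
```

With this sketch, chirps_gefs, chirpsgefs, CHIRPS_GEFS, and CHIRPSGEFS all resolve to the same canonical name.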

Motivation

Previously, the CHIRPS_GEFS product name was not parsed correctly because of its underscore
In contrast, the ERA5_LAND product name had a hardwired fix; this PR generalizes the solution
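
A minimal sketch of the failure mode, assuming feature strings have the form `<product>_<band>` and that a parser splits on the first underscore. The feature name `CHIRPS_GEFS_precipitation` and both function names are hypothetical and are not taken from multimet.py.

```python
def split_first_underscore(feature: str) -> tuple[str, str]:
    """Naive parse: everything before the first underscore is the product."""
    product, _, band = feature.partition('_')
    return product, band


def split_known_prefix(feature: str, products: list[str]) -> tuple[str, str]:
    """Match the longest known product prefix; the remainder is the band."""
    for product in sorted(products, key=len, reverse=True):
        prefix = product + '_'
        if feature.startswith(prefix):
            return product, feature[len(prefix):]
    raise ValueError(f'No known product prefix in {feature!r}')
```

The naive split assigns `CHIRPS_GEFS_precipitation` to a product called `CHIRPS`, while prefix matching against known products keeps `CHIRPS_GEFS` intact.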

Testing

Ran with configs using various CHIRPS_GEFS and ERA5_LAND naming variants (with and without underscores, etc.)
Loader resolves each variant to the correct product name and loads data as expected

google-cla · 2026-02-12T16:03:50Z

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.

grey-nearing

Hi Josh,

Thank you SO MUCH for being the first public contributor to GoogleHydrology!!! And thank you for your patience while we worked on the backend to make sure we had a system in place for handling public PRs!

This is a great fix for CHIRPS. Would you mind taking a look at my questions? It's possible that I've missed something important.

Thanks,
Grey

grey-nearing · 2026-02-24T13:56:18Z

googlehydrology/datasetzoo/multimet.py

-def _get_products_and_bands_from_feature_strings(
-    features: Iterable[str],
-) -> dict[str, list[str]]:
+def _get_products_and_bands_from_feature_dict(feature_dict):


The config input properties are not always dicts. They can be either dicts or lists, depending on whether you want to use a model with vs. without feature groups. This is why the cfg.hindcast_inputs and cfg.forecast_inputs are flattened in lines 182 and 183.

However, this caused me to notice that the type hints in ~/googlehydrology/utils/config.py are wrong: they only show lists, not dicts, and they should be a Union. I'm fixing this now.
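
As a rough illustration of the list-or-dict point, a parser can accept both shapes by flattening first. This is only a sketch: the `FeatureSpec` alias, the `flatten_features` helper, and the group and feature names in the usage note are hypothetical, and the actual flattening in multimet.py and the type hints in config.py may differ.

```python
from typing import Union

# Inputs are either a flat list of feature names or a dict mapping
# feature-group names to lists of feature names.
FeatureSpec = Union[list[str], dict[str, list[str]]]


def flatten_features(spec: FeatureSpec) -> list[str]:
    """Return a flat list of feature names for either input shape."""
    if isinstance(spec, dict):
        # Preserve group order, then feature order within each group.
        return [feature for group in spec.values() for feature in group]
    return list(spec)
```

For example, `flatten_features({'g1': ['a'], 'g2': ['b', 'c']})` and `flatten_features(['a', 'b', 'c'])` both yield the same flat list, so downstream parsing only needs to handle one shape.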

grey-nearing · 2026-02-24T13:57:30Z

googlehydrology/datasetzoo/multimet.py

        LOGGER.debug(f'Sample index size: {sizeof(indices) / 1024**2} MB')

+        # TODO(future) :: Move above to the scalar compute block
+        LOGGER.debug('scaler check zero scale')


Could I ask you to pull and merge the most recent changes from main? This replication of scaler saving was removed in a recent PR.

grey-nearing · 2026-02-24T13:59:35Z

googlehydrology/datasetzoo/multimet.py

 MULTIMET_MINIMUM_LEAD_TIME = 1

+# Caravan Multimet products that are available in the GCS zarr store
+KNOWN_GCS_PRODUCTS = {


I like the basic strategy here of generalizing the removal of underscores in variable names, but only to known products. Is it necessary to keep a list of known products? It looks like most of these are not used for anything functional, and instead the PRODUCT_ALIASES is doing the heavy lifting. Is there a need to keep this known products list?

grey-nearing · 2026-02-24T14:03:41Z

googlehydrology/datasetzoo/multimet.py

+
+        # If it's a known product, use canonical name
+        # (e.g., user says 'era5land', but dataset is 'ERA5_LAND')
+        for known in KNOWN_GCS_PRODUCTS:


Is this loop doing something after the mapping call in line 1111?

joshsturtevant added 2 commits February 12, 2026 08:08

fixed product name parsing to work for CHIRPS_GEFS; added general pro…

88bbff0

…duct name normalization/robustness

use consistent config access in data loading

604aa4b

joshsturtevant requested a review from amitmarkel as a code owner February 12, 2026 16:03

amitmarkel removed their request for review February 16, 2026 08:43

amitmarkel assigned grey-nearing Feb 16, 2026

joshsturtevant and others added 3 commits February 19, 2026 16:56

add gefsv12 to multimet products; add if statement to stop scaler sav…

f4f38e9

…ing during finetuning

remove gefsv12 product from gcs

2c9426d

Merge branch 'main' into js-dev

445ec6f

grey-nearing reviewed Feb 24, 2026

joshsturtevant wants to merge 5 commits into google-research:main from joshsturtevant:js-dev
