🌊 This repository implements the state-of-the-art models that power Google FloodHub.
This is not an officially supported Google product. This project is not eligible for the Google Open Source Software Vulnerability Rewards Program.
The repository provides open-source replication of Google’s global flood-forecasting models. By open-sourcing these models, we aim to foster transparency, enable in-house integration in production systems, and accelerate academic research.
This repository is a fork of NeuralHydrology, which has been heavily modified and extended to support forecast sequences using the specific model architectures that are used operationally in the Google FloodHub.
This repository contains implementations of the core models used in Google's production forecasting systems.
The Mean Embedding Forecast LSTM is a forecasting model that uses separate embedding networks for hindcast and forecast inputs. It aggregates these inputs using masked means before passing them into the respective LSTMs for the hindcast and forecast periods.
- Status: Current production model (as of December 2025) for Google FloodHub.
- Reference: Gauch, Martin, et al. "How to deal with missing input data." Hydrology and Earth System Sciences (2025).
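The masked-mean idea can be illustrated in a few lines. This is a hypothetical sketch (the function name and data layout are made up for illustration, not the repository's actual API): each available input source contributes an embedding, and the mean is taken over only the sources marked present, so missing inputs simply drop out of the average.

```python
def masked_mean(embeddings, present):
    """Average only the embeddings whose mask entry is True.

    embeddings: list of equal-length vectors (one per input source).
    present: parallel list of booleans marking which sources exist.
    """
    kept = [e for e, p in zip(embeddings, present) if p]
    if not kept:
        # No source available: fall back to a zero vector.
        return [0.0] * len(embeddings[0])
    dim = len(kept[0])
    return [sum(vec[i] for vec in kept) / len(kept) for i in range(dim)]

# Two of three sources present; the missing one is ignored, not zero-filled.
print(masked_mean([[1.0, 2.0], [3.0, 4.0], [9.0, 9.0]],
                  [True, True, False]))  # -> [2.0, 3.0]
```

Because absent sources are excluded rather than imputed, the average stays well defined whenever at least one input source is available.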
The State Handoff Forecast LSTM is a forecasting model that uses a state-handoff to transition from a hindcast sequence (LSTM) model to a forecast sequence (LSTM) model. The hindcast model runs from the past up to the present (the issue time of the forecast) and then passes the cell state and hidden state of the LSTM into a (nonlinear) handoff network, which is used to initialize a new LSTM that rolls out over the forecast period.
- Status: Former production model for Google FloodHub.
- Reference: Nearing, Grey, et al. "Global prediction of extreme floods in ungauged watersheds." Nature (2024).
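The handoff mechanism can be sketched as follows. This is a toy illustration with scalar states and made-up function names, not the repository's implementation: a hindcast cell runs up to the forecast issue time, its final state passes through a nonlinear handoff network, and the result initializes the forecast cell's rollout.

```python
import math

def toy_cell(state, x, w_s=0.9, w_x=0.1):
    # Minimal stand-in for a recurrent (LSTM-like) step:
    # blend the previous state with the current input.
    return math.tanh(w_s * state + w_x * x)

def handoff(state, weight=1.5, bias=0.1):
    # Nonlinear handoff network (a single tanh layer in this sketch).
    return math.tanh(weight * state + bias)

def forecast_rollout(hindcast_inputs, forecast_inputs):
    state = 0.0
    for x in hindcast_inputs:   # hindcast model runs up to "now"
        state = toy_cell(state, x)
    state = handoff(state)      # hand the state off
    outputs = []
    for x in forecast_inputs:   # forecast model rolls out from there
        state = toy_cell(state, x)
        outputs.append(state)
    return outputs

preds = forecast_rollout([0.2, 0.5, 0.8], [0.6, 0.4])
print(len(preds))  # -> 2, one prediction per forecast step
```

In the real model the handoff maps both the cell state and the hidden state of the hindcast LSTM into the initial states of the forecast LSTM; the sketch collapses this to a single scalar to keep the control flow visible.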
We recommend using Conda to manage dependencies like PyTorch and CUDA.
1. Create and activate the environment:

   ```bash
   # Create the environment from the file in the repo
   conda env create -f environments/conda.yml

   # Activate the environment (MANDATORY)
   conda activate googlehydrology
   ```

2. Install the package in editable mode so that changes to the source code are reflected immediately:

   ```bash
   # Run from the root of the repository
   pip install -e .
   ```
The most direct way to explore this repository is through our interactive tutorial: GoogleHydrology Tutorial Notebook.
What you will learn:
- Model Evaluation: Load pre-trained Google Hydrology models and calculate performance metrics (NSE, KGE) on real-world basin data.
- Fine-Tuning for Performance: Learn how to fine-tune the `static_attributes_fclayer`. This is a powerful technique for improving predictions on "outlier" basins (e.g., basins with unusual sizes or geology) without retraining the entire model.
- Visualizing Results: Compare model hydrographs against observed discharge data.
GoogleHydrology uses the Caravan dataset for streamflow observations and static catchment attributes.
A small sample is provided in tutorial/data/Caravan-nc. For full runs:
1. Visit the Zenodo repository.
2. Download the NetCDF version (Caravan-nc.tar.gz).
3. Unpack it locally:

   ```bash
   mkdir -p ~/data/
   tar -xvzf Caravan-nc.tar.gz -C ~/data/
   ```
The MultiMet forcing data extension is accessed directly from Google Cloud Storage. Ensure your configuration points to: gs://caravan-multimet/v1.1
The package installs the `run` command as the primary entry point.

Train a model:

```bash
run train --config-file /path/to/your/training_config_file.yml
```

Calculate performance metrics (NSE, KGE) on the test set:

```bash
run evaluate --run-dir /path/to/your/model_run/
```

Generate predictions (without skipping NaN observations):

```bash
run infer --run-dir /path/to/your/model_run/
```
Experiments are defined by YAML files. Update the following paths in your config (e.g., tutorial/training-config.yml):
- run_dir: Where weights and logs are saved.
- train_basin_file: Path to the list of basin IDs.
- targets_data_dir / statics_data_dir: Paths to your local Caravan NetCDF data.
- dynamics_data_dir: Path to forcing data (e.g., gs://caravan-multimet/v1.1).
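Putting those keys together, a minimal fragment of such a config might look like the following (all paths are placeholders for your local setup; every other setting in the example configs stays as it is):

```yaml
run_dir: /home/user/runs/my_experiment        # weights and logs are saved here
train_basin_file: /home/user/data/basins.txt  # list of basin IDs
targets_data_dir: /home/user/data/Caravan-nc  # local Caravan NetCDF data
statics_data_dir: /home/user/data/Caravan-nc
dynamics_data_dir: gs://caravan-multimet/v1.1 # MultiMet forcing data on GCS
```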
The ~/flood-forecasting/example-configs directory contains reference YAML files that define the experimental setups for different model architectures and datasets.
`floodhub-settings-config.yml`

- Model Architecture: `mean_embedding_forecast_lstm`
- Dataset: MultiMet (Global Caravan dataset)
- Description: Designed to replicate the training settings of the current (2025) operational FloodHub model as closely as possible within this open-source framework.

`handoff-forecast-lstm-config.yml`

- Model Architecture: `handoff_forecast_lstm`
- Dataset: MultiMet (Global Caravan dataset)
- Description: Provides the settings used for the former operational model. This configuration aligns with the methodology described in the Nature (2024) paper for global ungauged flood prediction.

`camels-multimet-mean-embedding-forecast-lstm-config.yml`

- Model Architecture: `mean_embedding_forecast_lstm`
- Dataset: CAMELS-US (531 basins)
- Description: A benchmarking configuration for the Mean Embedding model tailored to the CAMELS-US dataset, optimized for evaluating model stability and performance on a standard hydrological benchmark. Our team uses it as a reference point during model development and as the standard check that changes to the repository work as expected.

`camels-multimet-handoff-forecast-lstm-config.yml`

- Model Architecture: `handoff_forecast_lstm`
- Dataset: CAMELS-US (531 basins)
- Description: A benchmarking configuration for the State Handoff model tailored to the CAMELS-US dataset, used to compare the handoff approach against other architectures on US basin data.
If you encounter bugs, please use the GitHub Issue Tracker. Provide a clear description, steps to reproduce, and the expected behavior.