google-research/flood-forecasting

Flood Forecasting

🌊 This repository implements the state-of-the-art models that power Google FloodHub.

This is not an officially supported Google product. This project is not eligible for the Google Open Source Software Vulnerability Rewards Program.

The repository provides an open-source replication of Google’s global flood-forecasting models. By open-sourcing these models, we aim to foster transparency, enable in-house integration into production systems, and accelerate academic research.

This repository is a fork of NeuralHydrology that has been heavily modified and extended to support forecast sequences with the model architectures used operationally in Google FloodHub.

Models

This repository contains implementations of the core models used in Google's production forecasting systems.

Mean-Embedding-Forecast-LSTM

The Mean Embedding Forecast LSTM is a forecasting model that uses separate embedding networks for hindcast and forecast inputs. It aggregates these inputs using masked means before passing them into respective LSTMs for the hindcast and forecast periods.
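
The masked-mean step can be illustrated with a minimal sketch in plain Python. This is not the repository's implementation: the function name `masked_mean`, the list-of-vectors layout, the reading that the mean is taken across embedded input groups, and the zero-vector fallback when every input is masked are all assumptions made for illustration.

```python
from typing import List, Sequence


def masked_mean(embeddings: Sequence[Sequence[float]],
                mask: Sequence[bool]) -> List[float]:
    """Average only the embedding vectors whose mask entry is True.

    embeddings: one vector per embedded input group (hypothetical layout).
    mask: True where that input group is available at this time step.
    """
    if len(embeddings) != len(mask):
        raise ValueError("embeddings and mask must have the same length")
    dim = len(embeddings[0])
    total = [0.0] * dim
    count = 0
    for vector, valid in zip(embeddings, mask):
        if not valid:
            continue  # a masked input contributes nothing to the mean
        for i, value in enumerate(vector):
            total[i] += value
        count += 1
    if count == 0:
        # Assumption: fall back to a zero vector when nothing is available.
        return total
    return [value / count for value in total]


# Three hypothetical embeddings; the second one is masked out.
sources = [[1.0, 2.0], [100.0, 100.0], [3.0, 6.0]]
available = [True, False, True]
aggregated = masked_mean(sources, available)
```

Because the masked vector is excluded from both the sum and the count, a missing input does not pull the aggregate toward a fill value.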

Handoff-Forecast-LSTM

The State Handoff Forecast LSTM is a forecasting model that uses a state-handoff to transition from a hindcast sequence (LSTM) model to a forecast sequence (LSTM) model. The hindcast model runs from the past up to the present (the issue time of the forecast) and then passes the cell state and hidden state of the LSTM into a (nonlinear) handoff network, which is used to initialize a new LSTM that rolls out over the forecast period.
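
The handoff step alone can be sketched as below, in plain Python. This is a hedged illustration, not the repository's code: the function `handoff`, the single tanh layer standing in for the nonlinear handoff network, the weight and bias parameters, and the assumption that the hindcast and forecast LSTMs share the same state size are all hypothetical.

```python
import math
from typing import List, Tuple

Vector = List[float]
Matrix = List[Vector]


def handoff(hidden: Vector, cell: Vector,
            weights: Matrix, bias: Vector) -> Tuple[Vector, Vector]:
    """Map the final hindcast (hidden, cell) state to an initial forecast state.

    weights has shape (2 * n, 2 * n) and bias has length 2 * n, where n is
    the state size. Both are hypothetical parameters for this sketch.
    """
    state = hidden + cell  # concatenate the two state vectors
    out = [
        math.tanh(sum(w * s for w, s in zip(row, state)) + b)
        for row, b in zip(weights, bias)
    ]
    n = len(hidden)
    # First half initializes the forecast hidden state, second half the cell state.
    return out[:n], out[n:]


# Toy example: state size 1, identity weights, zero bias.
h_T, c_T = [0.5], [-0.25]
identity = [[1.0, 0.0], [0.0, 1.0]]
h0, c0 = handoff(h_T, c_T, identity, [0.0, 0.0])
```

In a full model, `h_T` and `c_T` would be the hindcast LSTM's state at the forecast issue time, and `h0` and `c0` would seed the forecast LSTM before it rolls out over the lead times.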

Installation

We recommend using Conda to manage dependencies like PyTorch and CUDA.

  1. Create and Activate the Environment:

    # Create the environment from the file in the repo  
    conda env create -f environments/conda.yml
    
    # Activate the environment (MANDATORY)  
    conda activate googlehydrology  
    
  2. Install the Package:
    Install in editable mode so that changes to the source code are reflected immediately:

    # Run from the root of the repository  
    pip install -e .
    

🚀 Tutorial Notebook

The most direct way to explore this repository is through our interactive tutorial: GoogleHydrology Tutorial Notebook.

What you will learn:

  • Model Evaluation: Load pre-trained Google Hydrology models and calculate performance metrics (NSE, KGE) on real-world basin data.
  • Fine-Tuning for Performance: Learn how to fine-tune the static_attributes_fc layer. This is a powerful technique for improving predictions on "outlier" basins (e.g., basins with unusual sizes or geology) without retraining the entire model.
  • Visualizing Results: Compare model hydrographs against observed discharge data.

Run it now: Open In Colab

Data Setup

GoogleHydrology uses the Caravan dataset for streamflow observations and static catchment attributes.

1. Download Caravan (NetCDF Version)

A small sample is provided in tutorial/data/Caravan-nc. For full runs:

  1. Visit the Zenodo repository.

  2. Download the NetCDF version (Caravan-nc.tar.gz).

  3. Unpack it locally:

    mkdir -p ~/data/  
    tar -xvzf Caravan-nc.tar.gz -C ~/data/
    

2. MultiMet Data

The MultiMet forcing data extension is accessed directly from Google Cloud Storage. Ensure your configuration points to: gs://caravan-multimet/v1.1

Usage

The package installs the run command as the primary entry point.

Training a Model

run train --config-file /path/to/your/training_config_file.yml

Evaluation

Calculate performance metrics (NSE, KGE) on the test set:

run evaluate --run-dir /path/to/your/model_run/
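
For reference, the two metrics can be computed as in the sketch below. It uses the standard definitions of the Nash-Sutcliffe Efficiency and the original (2009) Kling-Gupta Efficiency. The function names and the choice to drop time steps with a NaN in either series are assumptions for illustration, not a description of how `run evaluate` is implemented.

```python
import math
from typing import List, Sequence, Tuple


def _valid_pairs(obs: Sequence[float],
                 sim: Sequence[float]) -> List[Tuple[float, float]]:
    """Keep only time steps where both observation and simulation are finite numbers."""
    return [(o, s) for o, s in zip(obs, sim)
            if not (math.isnan(o) or math.isnan(s))]


def nse(obs: Sequence[float], sim: Sequence[float]) -> float:
    """Nash-Sutcliffe Efficiency: 1 is perfect, 0 matches the observed mean."""
    pairs = _valid_pairs(obs, sim)
    mean_obs = sum(o for o, _ in pairs) / len(pairs)
    squared_error = sum((s - o) ** 2 for o, s in pairs)
    obs_variance_sum = sum((o - mean_obs) ** 2 for o, _ in pairs)
    return 1.0 - squared_error / obs_variance_sum


def kge(obs: Sequence[float], sim: Sequence[float]) -> float:
    """Kling-Gupta Efficiency from correlation, variability ratio, and bias ratio."""
    pairs = _valid_pairs(obs, sim)
    n = len(pairs)
    mean_o = sum(o for o, _ in pairs) / n
    mean_s = sum(s for _, s in pairs) / n
    std_o = math.sqrt(sum((o - mean_o) ** 2 for o, _ in pairs) / n)
    std_s = math.sqrt(sum((s - mean_s) ** 2 for _, s in pairs) / n)
    cov = sum((o - mean_o) * (s - mean_s) for o, s in pairs) / n
    r = cov / (std_o * std_s)   # linear correlation
    alpha = std_s / std_o       # variability ratio
    beta = mean_s / mean_o      # bias ratio
    return 1.0 - math.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)


# Toy discharge series; the NaN observation is ignored by both metrics.
observed = [1.0, 2.0, float("nan"), 4.0]
simulated = [1.0, 2.0, 3.0, 4.0]
nse_score = nse(observed, simulated)
kge_score = kge(observed, simulated)
```

Note that `kge` is undefined when either series is constant, since the standard deviations appear in denominators.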

Inference

Generate predictions for every time step, including those with NaN observations:

run infer --run-dir /path/to/your/model_run/

Configuration

Experiments are defined by YAML files. Update the following paths in your config (e.g., tutorial/training-config.yml):

  • run_dir: Where weights and logs are saved.
  • train_basin_file: Path to the list of basin IDs.
  • targets_data_dir / statics_data_dir: Path to your local Caravan NetCDF data.
  • dynamics_data_dir: Path to forcing data (e.g., gs://caravan-multimet/v1.1).
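
A quick pre-flight check for these keys might look like the sketch below. It assumes the YAML file has already been parsed into a Python dictionary. The helper `missing_path_keys` and all example paths are hypothetical; only the key names and the `gs://` URL come from this README.

```python
from typing import Dict, List

# Path-like keys named in the list above.
REQUIRED_PATH_KEYS = (
    "run_dir",
    "train_basin_file",
    "targets_data_dir",
    "statics_data_dir",
    "dynamics_data_dir",
)


def missing_path_keys(config: Dict[str, object]) -> List[str]:
    """Return the required keys that are absent or empty in a parsed config."""
    return [key for key in REQUIRED_PATH_KEYS if not config.get(key)]


# Hypothetical values standing in for a parsed training config.
config = {
    "run_dir": "/path/to/runs",
    "train_basin_file": "/path/to/basins.txt",
    "targets_data_dir": "/path/to/Caravan-nc",
    "statics_data_dir": "/path/to/Caravan-nc",
    "dynamics_data_dir": "gs://caravan-multimet/v1.1",
}
missing = missing_path_keys(config)
```

An empty `missing` list means every path key is set. The check does not verify that the paths exist.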

Example Configurations

The ~/flood-forecasting/example-configs directory contains reference YAML files that define the experimental setups for different model architectures and datasets.

  • floodhub-settings-config.yml
    • Model Architecture: mean_embedding_forecast_lstm
    • Dataset: MultiMet (Global Caravan dataset)
    • Description: This configuration is designed to replicate the training settings of the current (2025) operational FloodHub model as closely as possible within this open-source framework.
  • handoff-forecast-lstm-config.yml
    • Model Architecture: handoff_forecast_lstm
    • Dataset: MultiMet (Global Caravan dataset)
    • Description: Provides the settings used for the former operational model. This configuration aligns with the methodology described in the Nature (2024) paper for global ungauged flood prediction.
  • camels-multimet-mean-embedding-forecast-lstm-config.yml
    • Model Architecture: mean_embedding_forecast_lstm
    • Dataset: CAMELS-US (531 basins)
    • Description: A benchmarking configuration for the Mean-Embedding model tailored for the CAMELS-US dataset. It is optimized for evaluating model stability and performance on a standard hydrological benchmark. Our team uses it as a reference point during model development and to verify that changes to the repository work as expected, which is why it is included here.
  • camels-multimet-handoff-forecast-lstm-config.yml
    • Model Architecture: handoff_forecast_lstm
    • Dataset: CAMELS-US (531 basins)
    • Description: A benchmarking configuration for the State Handoff model tailored for the CAMELS-US dataset, used to compare the handoff approach against other architectures on US-based basin data.

Issue Reporting

If you encounter bugs, please use the GitHub Issue Tracker. Provide a clear description, steps to reproduce, and the expected behavior.
