Global flood prediction model based on Google's Flood Hub and several spatio-temporal graph architectures. Uses a graph neural network on upstream basins to prevent information loss due to area-weighted averaging over large upstream basin geometries.
Operates on ERA5-Land data aggregated over HydroATLAS Level 7 basin geometries, predicting GRDC flow data for North America.
Future work includes aggregating over level 12 geometries for maximum granularity, as well as building a global-scale dataset.
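The core idea: each level 7 basin becomes a graph node carrying its own ERA5-Land time series, and upstream/downstream relations define the edges, so the network sees per-basin forcings rather than a single series averaged over the whole upstream area. A minimal sketch of one such spatio-temporal architecture, assuming PyTorch and PyTorch Geometric (the actual model classes in this repo may differ):

```python
import torch
import torch.nn as nn
from torch_geometric.nn import GCNConv

class SpatioTemporalGNN(nn.Module):
    """Toy spatio-temporal baseline: a GRU per basin node, followed by graph
    convolutions over the upstream-basin graph. Layer choices are illustrative only."""

    def __init__(self, n_features: int, hidden: int = 64):
        super().__init__()
        self.temporal = nn.GRU(n_features, hidden, batch_first=True)
        self.conv1 = GCNConv(hidden, hidden)
        self.conv2 = GCNConv(hidden, 1)

    def forward(self, x: torch.Tensor, edge_index: torch.Tensor) -> torch.Tensor:
        # x: [num_basins, timesteps, n_features] ERA5-Land forcings per basin node
        # edge_index: [2, num_edges] upstream -> downstream basin connectivity
        _, h = self.temporal(x)              # final hidden state per basin
        h = torch.relu(self.conv1(h.squeeze(0), edge_index))
        return self.conv2(h, edge_index)     # one discharge estimate per basin
```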
Run export/Basin_Export.ipynb in a Google Colab environment to export ERA5 data for individual basins.
Before running, create a "Basin Differentiation" folder in your Google Drive (or change the folder name in the notebook). You will also need to sign up for Google Earth Engine, create a project, and set the project name in Basin_Export to your project's name. The export takes a while; Google's Task Manager lets you track queued tasks. Because the number of concurrent tasks is capped, you may need to run the export script multiple times.
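A minimal sketch of the Earth Engine setup step, assuming the standard earthengine-api workflow (the project ID below is a placeholder, not the repo's value):

```python
import ee

# Authenticate once per Colab session, then initialize with your Cloud project.
ee.Authenticate()
ee.Initialize(project="your-project-id")   # replace with your own project name

# Exports are written to this Google Drive folder (change it if you renamed yours).
DRIVE_FOLDER = "Basin Differentiation"
```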
Specify region and level of study with HydroSHEDS parameters:
# Level 7 HydroBASINS geometries from HydroSHEDS
hydrobasins = ee.FeatureCollection("WWF/HydroSHEDS/v1/Basins/hybas_7")
# Cast HYBAS_ID to a string so the region prefix can be matched below
basinsStringed = hydrobasins.map(lambda f: f.set('HYBAS_ID', ee.String(f.get('HYBAS_ID'))))
# Keep basins whose leading region digit is 7 (North America) or 8 (Arctic)
northAmericaBasins = basinsStringed.filter(ee.Filter.Or(
    ee.Filter.stringStartsWith('HYBAS_ID', '7'),
    ee.Filter.stringStartsWith('HYBAS_ID', '8')))
# Distinct, sorted Pfafstetter IDs of the selected basins
basinIDs = northAmericaBasins.aggregate_array('PFAF_ID').distinct().sort().getInfo()
basinType = type(basinIDs[0])
The current code uses HydroSHEDS level 7 geometries and the regions North America (7) and Arctic (8).
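For orientation, the per-basin export can proceed roughly as sketched below: reduce the ERA5-Land collection over each basin geometry into a table and queue a Drive export. The dataset ID, band names, and reducer here are assumptions for illustration, not the exact Basin_Export.ipynb code.

```python
# Daily-aggregated ERA5-Land, restricted to an example period and two example bands.
era5 = (ee.ImageCollection("ECMWF/ERA5_LAND/DAILY_AGGR")
        .filterDate("1990-01-01", "2020-01-01")
        .select(["total_precipitation_sum", "temperature_2m"]))

def export_basin(pfaf_id):
    basin = northAmericaBasins.filter(ee.Filter.eq("PFAF_ID", pfaf_id)).first()
    geom = basin.geometry()

    def to_feature(img):
        # Mean of each band over the basin geometry, tagged with the image date.
        stats = img.reduceRegion(
            reducer=ee.Reducer.mean(),
            geometry=geom,
            scale=11132,          # ~0.1 degree, the native ERA5-Land resolution
            maxPixels=1e9,
        )
        return ee.Feature(None, stats).set("date", img.date().format("YYYY-MM-dd"))

    table = ee.FeatureCollection(era5.map(to_feature))
    task = ee.batch.Export.table.toDrive(
        collection=table,
        description=f"Basin_AreaWeighted_TS_{pfaf_id}",
        folder=DRIVE_FOLDER,
        fileFormat="CSV",
    )
    task.start()

# Earth Engine caps concurrent tasks, so you may need to rerun this loop
# for the basins that did not get queued.
for pfaf_id in basinIDs:
    export_basin(pfaf_id)
```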
Download both the BasinATLAS and RiverATLAS datasets from the compiled HydroATLAS dataset.
Download the GRDC time series for your region of study from the official GRDC site.
The dataset expects certain features to be in specific folders inside the data folder. To start, create the data folder (or edit the config file to use a different folder).
Unzip the BasinATLAS and RiverATLAS data folders directly into data. Next, create a folder named series inside data containing both an ERA5 and a GRDC folder. Place the exported ERA5 .csv files in the ERA5 folder and the GRDC .txt files in the GRDC folder.
Last, create a folder named joined in data; it will be populated by data.py.
The final folder structure should look something like this:
├── catalog
│   ├── BasinATLAS_Catalog_v10.pdf
│   └── HydroATLAS_TechDoc_v10.pdf
├── data
│   ├── BasinATLAS_v10_shp
│   ├── RiverATLAS_v10_shp
│   ├── series
│   │   ├── GRDC
│   │   │   ├── Q001334.cmd.txt
│   │   │   └── Q001335.cmd.txt
│   │   └── ERA5
│   │       ├── Basin_AreaWeighted_TS_7001240.csv
│   │       └── Basin_AreaWeighted_TS_7001232.csv
│   └── joined
├── train.ipynb
├── test.ipynb
└── README.md
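Once the data is in place, a quick sanity check that the BasinATLAS layer loads with geopandas (the file name assumes the standard BasinATLAS v1.0 naming; adjust the path if yours differs):

```python
import geopandas as gpd

# Load the level 7 BasinATLAS shapefile and peek at a few key attributes.
basins = gpd.read_file("data/BasinATLAS_v10_shp/BasinATLAS_v10_lev07.shp")
print(basins[["HYBAS_ID", "PFAF_ID", "UP_AREA"]].head())
```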
python3.11 -m venv venv
.\venv\Scripts\activate  # Windows; on Linux/macOS: source venv/bin/activate
python -m pip install --upgrade pip
pip install -r requirements.txt
Running this script launches a visualizer for model metrics such as loss, recall, and NSE. You can either run the script or disable the visualizer in the training notebook. The visualizer can also run on another device if the client in the training notebook is updated to connect to that machine.
Begins training. Make sure to specify the chosen config, model, and dataset. The first run will take some extra time precomputing the joins over the ERA5, GRDC, and BasinATLAS data.
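Conceptually, that precompute step merges each basin's ERA5 series with its GRDC gauge series (and static BasinATLAS attributes) before writing the result to data/joined. A rough pandas sketch, assuming hypothetical file names and columns rather than the exact schema in data.py:

```python
import pandas as pd

def join_basin(pfaf_id: str, grdc_file: str) -> pd.DataFrame:
    """Illustrative only: merge one basin's ERA5 forcings with its GRDC discharge."""
    era5 = pd.read_csv(f"data/series/ERA5/Basin_AreaWeighted_TS_{pfaf_id}.csv",
                       parse_dates=["date"])
    # GRDC daily files are '#'-commented, semicolon-separated tables.
    grdc = pd.read_csv(f"data/series/GRDC/{grdc_file}", comment="#", sep=";")
    grdc.columns = [c.strip() for c in grdc.columns]
    grdc["date"] = pd.to_datetime(grdc["YYYY-MM-DD"])
    joined = era5.merge(grdc[["date", "Value"]], on="date", how="inner")
    joined.to_csv(f"data/joined/{pfaf_id}.csv", index=False)
    return joined
```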





