Description
Challenge 31 - Flood forecasting: the power of citizen science
Stream 3 - Applied data science for weather, climate and atmosphere
Goal
Develop a Python package to facilitate the use of crowdsourced hydrological measurements for forecast validation
Mentors and skills
- Mentors: Marie-Amelie Boucher, Cinzia Mazzetti, Florian Pappenberger, Jan Seibert, Juan Colonese
- Skills required:
  - Python
  - Machine learning
  - Image analysis
  - Pattern recognition
  - Basic geomatics
Challenge description
Why do we need a solution
Floods are among the deadliest natural disasters, killing many people and destroying property. Forecasting them is essential to reducing these impacts. The key to improving these forecasts is observations, in particular new types of observation such as crowdsourced data, which offer significant opportunities.
Recently, exciting initiatives such as CrowdWater have turned information from people into incredibly rich scientific data. In crowdsourcing, people send geo-referenced pictures of streams or rivers along with the corresponding variations in water level. Thousands of observations have been gathered around the world in this way, covering areas where no other observations are available. The challenge is to convert this precious data into something that can be used in flood forecasting models, so that the information is not lost but is used to improve the models and help save lives. This project is about solving two key challenges that prevent CrowdWater information from being used in the CEMS GloFAS flood forecasting system:
- Locate the CrowdWater data points on GloFAS rivers, which are a simplified representation of true rivers.
- Convert CrowdWater information into data consistent with GloFAS.
Data and software
We plan to use CrowdWater virtual stations located on larger rivers, together with drainage networks from CEMS (EFAS and GloFAS). We plan to use OpenStreetMap to identify rivers and derive metadata. There is also the possibility of using synthetic data (i.e. data designed to replicate what CrowdWater could collect in the future) in addition to the data series that already exist.
What could be the solution
We are looking for a solution that will 1) transform water level variations into a variable that can be used to verify GloFAS and EFAS forecasts and 2) map CrowdWater virtual stations to GloFAS and EFAS points. This can be achieved through a variety of methods. One is to mimic the human mapping procedure: use image analysis and/or pattern recognition techniques to match the real river to its representation in the model, then map the stations to the correct model pixels, also exploiting additional metadata such as the station name or the river name. Another possibility is to compute each station's upstream drainage area using a digital elevation model (DEM) and geomatics tools in Python. The mapping of each station should ideally include a quality flag indicating the level of confidence in the mapping result.
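To make the idea of a mapping with a quality flag more concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration: the `GridCell` class, the `map_station` function, the 15 km search radius, the equal weighting of the three criteria and the flag thresholds are hypothetical and are not part of GloFAS, EFAS or CrowdWater. It also assumes the station's upstream drainage area has already been estimated (for example from a DEM).

```python
import math
from dataclasses import dataclass
from difflib import SequenceMatcher
from statistics import fmean


@dataclass
class GridCell:
    """Hypothetical model river cell (not an actual GloFAS/EFAS structure)."""
    lat: float
    lon: float
    upstream_area_km2: float
    river_name: str = ""


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km on a sphere of radius 6371 km."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * 6371.0 * math.asin(min(1.0, math.sqrt(a)))


def map_station(lat, lon, area_km2, river_name, cells, max_dist_km=15.0):
    """Return (best_cell, quality_flag), or (None, "unmapped") if no cell is in range."""
    best_cell, best_score = None, -1.0
    for cell in cells:
        dist = haversine_km(lat, lon, cell.lat, cell.lon)
        if dist > max_dist_km:
            continue
        # Each criterion is scaled to [0, 1]; 1 means perfect agreement.
        scores = [1.0 - dist / max_dist_km]  # proximity
        if area_km2 > 0 and cell.upstream_area_km2 > 0:
            # Ratio of the smaller to the larger upstream area.
            scores.append(min(area_km2, cell.upstream_area_km2)
                          / max(area_km2, cell.upstream_area_km2))
        if river_name and cell.river_name:
            # Fuzzy string similarity between the two river names.
            scores.append(SequenceMatcher(
                None, river_name.lower(), cell.river_name.lower()).ratio())
        score = fmean(scores)  # equal weights: an arbitrary choice
        if score > best_score:
            best_cell, best_score = cell, score
    if best_cell is None:
        return None, "unmapped"
    # Placeholder thresholds; real values would need calibration.
    if best_score >= 0.75:
        flag = "high"
    elif best_score >= 0.5:
        flag = "medium"
    else:
        flag = "low"
    return best_cell, flag
```

Comparing upstream areas guards against snapping a station on a large river to a nearby small tributary cell, which a purely distance-based match would tend to do.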
Ideas for the implementation
We envisage that implementation might include the following steps:
- Using a selection of CrowdWater stations for which an official river gauge also exists, train a machine learning (ML) model to learn the relationship between water level variations, other explanatory variables, and streamflow. Then use this ML model to translate water level variations into streamflow for all candidate CrowdWater stations (whether or not an official river gauge is also available).
- Extract the river map for the area surrounding the station and the available metadata, such as river names, from OpenStreetMap or any other open dataset. Another option is to compute the station's upstream drainage area using a DEM and geomatics tools.
- Map the station using coordinates and metadata (such as the name of the river or the name of a nearby location).
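As a rough illustration of the first step, the sketch below fits a very simple model relating water level classes to streamflow at a station that also has a gauge, then uses it for prediction. It is a deliberately simplified stand-in for the ML model described above: it uses ordinary least squares on the logarithm of streamflow, with the water level class as the only explanatory variable. The function names and the log-linear form are assumptions made for this example, not a prescribed method.

```python
import math
from statistics import fmean


def fit_log_linear(level_classes, streamflows):
    """Fit log(Q) = intercept + slope * level_class by ordinary least squares.

    level_classes: relative water level classes (numbers).
    streamflows: matching gauge streamflows, strictly positive.
    """
    if len(level_classes) != len(streamflows) or len(level_classes) < 2:
        raise ValueError("need at least two paired observations")
    if any(q <= 0 for q in streamflows):
        raise ValueError("streamflow must be strictly positive")
    log_q = [math.log(q) for q in streamflows]
    x_mean = fmean(level_classes)
    y_mean = fmean(log_q)
    sxx = sum((x - x_mean) ** 2 for x in level_classes)
    if sxx == 0:
        raise ValueError("level classes must not all be identical")
    sxy = sum((x - x_mean) * (y - y_mean)
              for x, y in zip(level_classes, log_q))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return intercept, slope


def predict_streamflow(level_class, intercept, slope):
    """Translate a water level class into streamflow with the fitted model."""
    return math.exp(intercept + slope * level_class)


# Synthetic training data, generated only to exercise the functions.
levels = [-2, -1, 0, 1, 2, 3]
flows = [10.0 * math.exp(0.5 * level) for level in levels]
intercept, slope = fit_log_linear(levels, flows)
estimated_flow = predict_streamflow(4, intercept, slope)
```

A real implementation would add further explanatory variables (for instance upstream area or season) and would need to be validated at held-out gauged stations before being applied to stations without a gauge.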