This is a final participation project for AI Saturdays Lagos Flipped Cohort 9 to develop a model that can predict pluvial (surface water) flood susceptibility. The project utilizes a dataset centered on the Ibadan Metropolis in Oyo State, Nigeria, assessing various environmental and topographical factors to predict flood risk.
Climate change and rapid urbanization have heightened the risk of flooding, posing a significant threat to lives and economies. This project aims to address the challenge of pluvial flood risk by building a machine learning model.
The goal is to develop a Susceptibility Assessment Application (SUSAP) that can detect and predict areas susceptible to pluvial floods. By analyzing several key environmental variables, the model will help manage and mitigate potential flood risks in urban areas. The dataset for this study was collected for the Ibadan Metropolis, Nigeria, from sources such as the United States Geological Survey (USGS) and the Copernicus Climate Data Store.
The dataset contains 144,401 records from five local government areas (LGAs) in Ibadan: Ibadan North, Ibadan North-East, Ibadan North-West, Ibadan South-West and Ibadan South-East. It was generated by identifying key conditioning variables related to pluvial floods and processing remotely sensed elevation data (the SRTM digital elevation model, DEM) in ArcGIS.
The primary data is contained in a single file, Pluvial_Flood_Dataset.xlsx, which includes both the features and the target variable for training and evaluation.
The dataset includes the following fields, covering each point's coordinates and the conditioning variables that influence pluvial flooding:
- X, Y: The coordinates of each point (X is longitude, Y is latitude).
- Slope: The steepness of the terrain.
- Curvature: How the surface bends; it can be concave or convex.
- Topographic Wetness Index (TWI): This quantifies the effect of topography on the location and size of saturated source areas.
- Flow Accumulation (FA): This is the number of upstream cells contributing flow to a single cell.
- Drainage: Assumed to be drainage density, i.e. the total length of streams per unit area.
- Rainfall: Precipitation data for the area.
- Aspect: The compass direction that a slope faces.
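The TWI is conventionally defined as TWI = ln(a / tan β), where a is the specific catchment area (upslope area per unit contour width) and β is the local slope, which ties it directly to the FA and Slope columns. The sketch below assumes D8 flow-accumulation cell counts, slope in degrees and a 30 m cell size (typical of 1 arc-second SRTM data, but not confirmed for this dataset). `compute_twi` is a hypothetical helper and not necessarily how the dataset's TWI values were produced.

```python
import numpy as np


def compute_twi(flow_acc, slope_deg, cell_size=30.0, min_slope_deg=0.1):
    """Topographic Wetness Index, TWI = ln(a / tan(beta)).

    flow_acc: upstream cell counts (D8 flow accumulation).
    slope_deg: local slope in degrees.
    cell_size: raster resolution in metres (30 m is an assumption).
    min_slope_deg: floor on slope so flat cells do not divide by zero.
    """
    flow_acc = np.asarray(flow_acc, dtype=float)
    slope_deg = np.asarray(slope_deg, dtype=float)
    # Specific catchment area: upslope cells plus the cell itself,
    # converted to area and divided by the contour width (one cell side).
    a = (flow_acc + 1.0) * cell_size
    # Clamp near-flat slopes, then convert to radians for tan().
    beta = np.radians(np.maximum(slope_deg, min_slope_deg))
    return np.log(a / np.tan(beta))
```

Cells with large upstream contributing areas and gentle slopes get high TWI values, which is why the index is a useful proxy for where surface water tends to collect.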
The target variable is a five-class flood susceptibility label: 0 = No Flood, 1 = Low Flood, 2 = Moderate, 3 = High, 4 = Very High.
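A minimal pandas sketch of separating the features from the target. The column names, and especially the hypothetical target column `Flood`, are assumptions based on the variable list above and should be checked against the actual headers in Pluvial_Flood_Dataset.xlsx.

```python
import pandas as pd

# Assumed column names, mirroring the variable list above.
FEATURE_COLUMNS = ["X", "Y", "Slope", "Curvature", "TWI", "FA",
                   "Drainage", "Rainfall", "Aspect"]
TARGET_COLUMN = "Flood"  # hypothetical name for the susceptibility column

# Integer class codes -> readable susceptibility levels.
SUSCEPTIBILITY_LABELS = {0: "No Flood", 1: "Low Flood", 2: "Moderate",
                         3: "High", 4: "Very High"}


def split_features_target(df: pd.DataFrame):
    """Return (X, y), failing early on missing columns or unknown codes."""
    missing = set(FEATURE_COLUMNS + [TARGET_COLUMN]) - set(df.columns)
    if missing:
        raise KeyError(f"Missing columns: {sorted(missing)}")
    X = df[FEATURE_COLUMNS]
    y = df[TARGET_COLUMN].astype(int)
    unknown = set(y.unique()) - set(SUSCEPTIBILITY_LABELS)
    if unknown:
        raise ValueError(f"Unexpected class codes: {sorted(unknown)}")
    return X, y
```

In practice `df` would come from `pd.read_excel("Pluvial_Flood_Dataset.xlsx")`, which needs the `openpyxl` package for .xlsx files, and `y.map(SUSCEPTIBILITY_LABELS)` turns the integer codes into readable labels for plots.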
The workflow is still being finalized, and this section will be updated as the team's strategy develops. It currently includes:
- Exploratory Data Analysis (EDA): The dataset was inspected to understand each conditioning variable, and visualisations such as a heatmap were used to show relationships between the columns.
- Geospatial Visualization: Maps were created to visualize the spatial patterns of rainfall, flooding and drainage.
- Feature Engineering: New features were created and the dataset was prepared for modelling.
- Model Selection: Several models will be compared using the scikit-learn (sklearn) library and possibly neural networks.
- Model Training and Hyperparameter Tuning: Each model will be trained on the training set and tuned and validated on a validation set, and the best-performing model will be selected.
- Hosting: Once the project is finalised, the selected model will be deployed as a Streamlit web app.
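The planned modelling steps could look roughly like the scikit-learn sketch below. It is illustrative only: the sine/cosine encoding of Aspect is one common way to handle a circular variable (not necessarily one of the team's engineered features), the two candidate models and the small grid are arbitrary choices, and macro-averaged F1 is used so that every susceptibility class counts equally whatever the class balance.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def add_aspect_components(X: pd.DataFrame) -> pd.DataFrame:
    """Replace Aspect (circular, in degrees) with its sine and cosine so that
    359 and 1 degrees end up close together. Assumes 0-360 degree values;
    flat cells coded with a sentinel (e.g. -1) need separate handling."""
    X = X.copy()
    radians = np.radians(X.pop("Aspect"))
    X["Aspect_sin"] = np.sin(radians)
    X["Aspect_cos"] = np.cos(radians)
    return X


def compare_models(X, y, seed=0):
    """Mean macro-F1 over stratified 5-fold cross-validation per candidate."""
    cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=seed)
    candidates = {
        # Linear baseline; scaling helps the regularised solver.
        "logistic_regression": make_pipeline(
            StandardScaler(), LogisticRegression(max_iter=1000)),
        # Tree ensemble; insensitive to feature scaling.
        "random_forest": RandomForestClassifier(
            n_estimators=100, random_state=seed, n_jobs=-1),
    }
    return {
        name: cross_val_score(model, X, y, cv=cv, scoring="f1_macro").mean()
        for name, model in candidates.items()
    }


def tune_random_forest(X, y, seed=0):
    """Small grid search; the grid values are placeholders, not tuned choices."""
    grid = {"n_estimators": [50, 100], "max_depth": [None, 20]}
    search = GridSearchCV(
        RandomForestClassifier(random_state=seed, n_jobs=-1),
        grid,
        cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=seed),
        scoring="f1_macro",
    )
    search.fit(X, y)  # refits the best configuration on all of X, y
    return search.best_estimator_, search.best_params_
```

Because neighbouring points in the dataset are spatially correlated, random folds may give optimistic scores; grouping points into spatial blocks and splitting with scikit-learn's `GroupKFold` is worth considering.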
The performance of the classification model will be evaluated using standard metrics such as:
- Accuracy
- Precision
- Recall
- F1-Score
- Confusion Matrix
- Area Under the ROC Curve (AUC)
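A sketch of computing the metrics above with `sklearn.metrics`, assuming a fitted model that provides `predict` and `predict_proba`. With five classes, precision, recall and F1 need an averaging scheme (macro averaging is assumed here), and the ROC AUC is computed one-vs-rest, one of the two multi-class options `roc_auc_score` supports. The `evaluate` helper is hypothetical.

```python
import numpy as np
from sklearn.metrics import (accuracy_score, confusion_matrix,
                             precision_recall_fscore_support, roc_auc_score)

CLASSES = [0, 1, 2, 3, 4]  # No Flood, Low Flood, Moderate, High, Very High


def evaluate(y_true, y_pred, y_proba):
    """Compute the listed metrics for the five-class problem.

    y_proba: (n_samples, 5) class probabilities with columns in CLASSES
    order, e.g. from model.predict_proba when model.classes_ == CLASSES.
    """
    # Macro averaging weights each susceptibility class equally.
    precision, recall, f1, _ = precision_recall_fscore_support(
        y_true, y_pred, labels=CLASSES, average="macro", zero_division=0)
    return {
        "accuracy": accuracy_score(y_true, y_pred),
        "precision": precision,
        "recall": recall,
        "f1": f1,
        # Rows are true classes, columns are predicted classes.
        "confusion_matrix": confusion_matrix(y_true, y_pred, labels=CLASSES),
        # One-vs-rest AUC, macro-averaged across the five classes.
        "auc_ovr": roc_auc_score(y_true, np.asarray(y_proba),
                                 multi_class="ovr", labels=CLASSES),
    }
```

For example, `evaluate(y_val, model.predict(X_val), model.predict_proba(X_val))`. One-vs-rest AUC is undefined for a class with no examples in `y_true`, so the validation set should contain all five classes.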
The team would like to thank:
- Dr. Oladapo Kayode Abiodun for creating and sharing this valuable dataset on Kaggle.
- AI Saturdays Lagos for providing the platform and opportunity for this project.