This project is my final project submission for the Data Engineering chapter of the Udacity Data Science Nanodegree program.
In this project, we built a web application that classifies messages sent during disasters into one of 36 categories. The application uses a machine learning model trained on a set of pre-labeled real-life examples. After a series of data engineering and cleaning steps, a multiclass-multioutput model is trained and stored. Through the web app, users can interact with the model and classify unseen messages; the app also provides several visualizations of the underlying data.
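As an illustration, the sketch below shows what such a multiclass-multioutput pipeline can look like with scikit-learn. This is a minimal sketch, not the project's exact configuration: the table name "DisasterResponse", the column layout and the TF-IDF + random forest setup are assumptions, and the authoritative configuration lives in models/train_classifier.py.

# Minimal sketch of a multiclass-multioutput text classification pipeline
# (illustrative only -- see models/train_classifier.py for the real configuration).
import pandas as pd
from sqlalchemy import create_engine
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.model_selection import train_test_split
from sklearn.multioutput import MultiOutputClassifier
from sklearn.pipeline import Pipeline

# Load the cleaned data produced by the ETL step; "DisasterResponse" is an
# assumed table name -- check process_data.py for the name actually used.
engine = create_engine("sqlite:///data/DisasterResponse.db")
df = pd.read_sql_table("DisasterResponse", engine)
X = df["message"]
Y = df.iloc[:, 4:]  # assumes the 36 binary category columns start at column 5

pipeline = Pipeline([
    ("vect", CountVectorizer()),    # tokenize messages and count words
    ("tfidf", TfidfTransformer()),  # re-weight counts by TF-IDF
    ("clf", MultiOutputClassifier(RandomForestClassifier())),  # one classifier per category
])

X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.2, random_state=42)
pipeline.fit(X_train, Y_train)
print(pipeline.score(X_test, Y_test))  # subset accuracy on the held-out messages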
This project addresses one of the most challenging problems in ML and data science today. Following a disaster, millions of messages, tweets and texts are typically generated, overwhelming disaster response organizations at a time when they have the least capacity. It is crucial for them to filter out the most important messages so that they can address the most pressing situations quickly and appropriately, and forward each message to the right response team depending on the topic or type of help that is needed. The provided application allows disaster response teams to analyze, filter and prioritize this vast amount of messages in a quick and automated fashion, so that they can better target their resources at the people in need after a disaster.
run.py - Python script to launch the web application.
Folder: templates - web dependency files (go.html & master.html) required to run the web application.
disaster_messages.csv - real messages sent during disaster events (provided by Figure Eight)
disaster_categories.csv - categories of the messages
process_data.py - ETL pipeline used to load, clean, extract features and store the data in a SQLite database (a short sketch of these steps follows the file list)
ETL Pipeline Preparation.ipynb - Jupyter Notebook used to prepare ETL pipeline
DisasterResponse.db - cleaned data stored in a SQLite database
train_classifier.py - ML pipeline used to load the cleaned data, train the model and save it as a pickle (.pkl) file for later use
classifier.pkl - pickle file containing the trained model
ML Pipeline Preparation.ipynb - Jupyter Notebook used to prepare ML pipeline
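As referenced above for process_data.py, the following is a minimal sketch of the ETL steps (load, merge, split the category string into 36 binary columns, drop duplicates, store in SQLite). Column and table names here are assumptions; the authoritative version is data/process_data.py.

# Sketch of the ETL steps -- illustrative only, see data/process_data.py.
import pandas as pd
from sqlalchemy import create_engine

# Load and merge the two raw CSVs; assumes both share an "id" column.
messages = pd.read_csv("data/disaster_messages.csv")
categories = pd.read_csv("data/disaster_categories.csv")
df = messages.merge(categories, on="id")

# Split the single "categories" string (e.g. "related-1;request-0;...")
# into 36 separate binary columns.
categories = df["categories"].str.split(";", expand=True)
categories.columns = categories.iloc[0].str.rsplit("-", n=1).str[0]
for column in categories.columns:
    categories[column] = categories[column].str.rsplit("-", n=1).str[1].astype(int)

df = pd.concat([df.drop(columns="categories"), categories], axis=1)
df = df.drop_duplicates()

# Persist the cleaned table to SQLite for the ML pipeline to consume.
engine = create_engine("sqlite:///data/DisasterResponse.db")
df.to_sql("DisasterResponse", engine, index=False, if_exists="replace")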
All required libraries are included in the Anaconda distribution.
- Run the following commands in the project's root directory to set up your database and model:
  - To run the ETL pipeline that cleans the data and stores it in the database:
    python data/process_data.py data/disaster_messages.csv data/disaster_categories.csv data/DisasterResponse.db
  - To run the ML pipeline that trains the classifier and saves the model:
    python models/train_classifier.py data/DisasterResponse.db models/classifier.pkl
- Run the following command in the app's directory to run your web app:
  python run.py
- Go to http://0.0.0.0:3001/