Disaster Response Pipeline Project

Requirements

python 3.10
libraries provided in requirements.txt

File descriptions

disaster_categories.csv and disaster_messages.csv are the data sources for this project containing labels and messages corresponsing to these labels respectively
process_data.py processes the above files to split the labels into their individual columns and leave 1 or 0. This is followed by a bit of cleaning and finally storage into a database
train_classifier.py uses the cleaned data, from the databasepreviously mentioned, to train a naive bayes machine learning model from scikit-learn. The data is put through a tokenizing function that splits the string into words, strips out unnecessary words like "is", "the", etc. and changes them to their base forms using lematization. The resulting model is stored in a pickle file using python's standard persistance library with the same name
run.py deploy's a flask app with plotly visualizations which uses the model stored from the previous script to classify messages and label them according to the pre-programed labels
ETL Pipeline preparation.ipynb contains the same code as process_data.py and was used to develop the script and inspect it's different stages
ML Pipeline Preparation.ipynb contains the same code as train_classifier.py and was used to develop the script and inspect it's different stages. It also contains experimentation with hyperparameter tuning using GridSearchCV. The algorithm yielded a set of better parameters but when using them on the test data no significant improvement was observed.

Motivation

This project was created to experiment with machine learning solutions offered by sci-kit-learn and visualization capabilities offered by plotly and flask.

Instructions:

Run the following commands in the project's root directory to set up your database and model.
- To run ETL pipeline that cleans data and stores in database python data/process_data.py data/disaster_messages.csv data/disaster_categories.csv data/DisasterResponse.db
- To run ML pipeline that trains classifier and saves python models/train_classifier.py data/DisasterResponse.db models/classifier.pkl
Run the following command in the app's directory to run your web app. python run.py
Follow the http address given by flask in the CLI to check the visualization

Aknowledgements

I would like to thank scikit-learn for putting together this amazing library and the programming community for doing so much knowledge sharing. Especially for this project:

Alan Jones from Medium for his post on flask and plotly
Koo Ping Shung from Medium for his post regarding accuracy, presission, recall and F1
Javed Shaikh from towardsdatascience for his post showing different different algorithms that can be used.

Name		Name	Last commit message	Last commit date
Latest commit History 15 Commits
app		app
data		data
models		models
.gitignore		.gitignore
ETL Pipeline Preparation.ipynb		ETL Pipeline Preparation.ipynb
ML Pipeline Preparation.ipynb		ML Pipeline Preparation.ipynb
README.md		README.md
requirements.txt		requirements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

Disaster Response Pipeline Project

Requirements

File descriptions

Motivation

Instructions:

Aknowledgements

About

Uh oh!

Releases

Packages

Uh oh!

Languages

CristianCapsuna/Disaster_response_project

Folders and files

Latest commit

History

Repository files navigation

Disaster Response Pipeline Project

Requirements

File descriptions

Motivation

Instructions:

Aknowledgements

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Languages

Packages