This project uses the train.csv dataset provided by Kaggle to predict who survived the Titanic. I will split the dataset into training and test portions, train the models on the training portion, and evaluate them on the held-out test portion.
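As a minimal sketch of the loading and splitting step (assuming pandas/scikit-learn and the standard Kaggle column names; the chosen features, imputation, and split ratio are my assumptions, not necessarily the exact ones used in this project):

```python
# Sketch: load the Kaggle train.csv, pick a few standard features,
# and hold out a test split.
import pandas as pd
from sklearn.model_selection import train_test_split

df = pd.read_csv("train.csv")

# Basic preprocessing assumed here: encode Sex and fill missing Age values.
df["Sex"] = df["Sex"].map({"male": 0, "female": 1})
df["Age"] = df["Age"].fillna(df["Age"].median())

features = ["Pclass", "Sex", "Age", "SibSp", "Parch", "Fare"]
X = df[features]
y = df["Survived"]

# 80/20 train/test split; the ratio and random_state are assumptions.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)
```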
I will build three models to predict survival and evaluate their performance by calculating the accuracy, precision, recall, and F-score for each one. I will also run cross-validation to see whether further splitting of the data improves the models (see the sketch after the list below).
- Logistic Regression
- Support Vector Machine (SVM)
- K-Nearest Neighbor
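Continuing the sketch above, this is one way to fit the three models, compare accuracy, precision, recall, and F-score on the held-out split, and add a 5-fold cross-validation check. The hyperparameters (e.g. k=5, SVM defaults, feature scaling) are assumptions for illustration, not necessarily the settings used in this project:

```python
# Sketch: train the three classifiers, report the four metrics on the
# test split, and run 5-fold cross-validation on the training portion.
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import cross_val_score
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

models = {
    "Logistic Regression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "SVM": make_pipeline(StandardScaler(), SVC()),
    "K-Nearest Neighbor": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5)),
}

for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    print(
        f"{name}: "
        f"accuracy={accuracy_score(y_test, pred):.3f} "
        f"precision={precision_score(y_test, pred):.3f} "
        f"recall={recall_score(y_test, pred):.3f} "
        f"f1={f1_score(y_test, pred):.3f}"
    )
    # 5-fold cross-validation as a robustness check on the comparison.
    cv = cross_val_score(model, X_train, y_train, cv=5, scoring="accuracy")
    print(f"  5-fold CV accuracy: {cv.mean():.3f} (+/- {cv.std():.3f})")
```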
The Logistic Regression model had the best performance. The SVM model performed similarly to logistic regression but was slightly less accurate and precise. K-Nearest Neighbor was the worst-performing model, which suggests it is not well suited to this type of dataset.