This project applies machine learning and deep learning techniques to predict passenger survival in the RMS Titanic disaster. Using the Titanic dataset from Kaggle, we walk through data cleaning, feature engineering, model training, and evaluation to build predictive models.
- Rows: 891 passengers
- Target: `Survived` (0 = No, 1 = Yes)
- Key Features:
  - `Pclass`: Passenger class (1st, 2nd, 3rd)
  - `Sex`: Gender
  - `Age`: Age in years (with missing values)
  - `SibSp`: Siblings/Spouses aboard
  - `Parch`: Parents/Children aboard
  - `Fare`: Ticket price
  - `Embarked`: Port of embarkation (C/Q/S)
- Filled missing `Age` with the median.
- Filled missing `Embarked` with the mode.
- Dropped `Cabin` (too many missing values).
- Created new features:
  - `FamilySize` = `SibSp` + `Parch` + 1
  - `IsAlone` (binary flag)
  - `Title` extracted from names (Mr, Mrs, Miss, etc.)
- Encoded categorical variables (`Sex`, `Embarked`, `Title`).
- Scaled continuous features (`Age`, `Fare`) for distance-based models.
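The cleaning and feature-engineering steps above can be sketched in pandas. This is a minimal illustration on a three-row toy frame whose columns match the Kaggle schema; the exact imputation, encoding, and scaling choices in the project notebook may differ:

```python
import pandas as pd

# Toy frame standing in for the Kaggle training data (columns match the schema).
df = pd.DataFrame({
    "Name": ["Braund, Mr. Owen", "Cumings, Mrs. John", "Heikkinen, Miss. Laina"],
    "Sex": ["male", "female", "female"],
    "Age": [22.0, None, 26.0],
    "SibSp": [1, 1, 0],
    "Parch": [0, 0, 0],
    "Fare": [7.25, 71.28, 7.92],
    "Embarked": ["S", None, "S"],
})

# Imputation: median Age, modal Embarked.
df["Age"] = df["Age"].fillna(df["Age"].median())
df["Embarked"] = df["Embarked"].fillna(df["Embarked"].mode()[0])

# Engineered features.
df["FamilySize"] = df["SibSp"] + df["Parch"] + 1
df["IsAlone"] = (df["FamilySize"] == 1).astype(int)
# Titles sit between the comma and the period in each name.
df["Title"] = df["Name"].str.extract(r",\s*([^\.]+)\.", expand=False)

# Encode categoricals as integer codes, then z-score the continuous columns.
for col in ["Sex", "Embarked", "Title"]:
    df[col] = df[col].astype("category").cat.codes
for col in ["Age", "Fare"]:
    df[col] = (df[col] - df[col].mean()) / df[col].std()
```

In a real run, the same transformations fitted on the training split (median, mode, scaling statistics) should be reapplied unchanged to the test split to avoid leakage.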
- Baseline Models:
  - Logistic Regression
  - Random Forest
  - Gradient Boosting
  - SVM
  - KNN
  - Naive Bayes
- Advanced Models:
  - XGBoost (with hyperparameter tuning & class imbalance handling)
  - Stacking Ensemble (Random Forest + Gradient Boosting + XGBoost, meta-learner = Logistic Regression)
  - Neural Network (Keras Sequential with hidden layers)
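The stacking ensemble can be sketched with scikit-learn's `StackingClassifier`. This minimal version uses synthetic data in place of the preprocessed Titanic features and drops the XGBoost base learner to stay dependency-free; in the project it would be added as a third estimator:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (GradientBoostingClassifier,
                              RandomForestClassifier, StackingClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the preprocessed Titanic feature matrix.
X, y = make_classification(n_samples=400, n_features=8, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

stack = StackingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(n_estimators=50, random_state=0)),
        ("gb", GradientBoostingClassifier(random_state=0)),
        # In the project, an XGBClassifier is the third base learner here.
    ],
    final_estimator=LogisticRegression(),  # meta-learner
    cv=5,  # out-of-fold base predictions feed the meta-learner
)
stack.fit(X_tr, y_tr)
acc = stack.score(X_te, y_te)
```

The `cv=5` setting matters: the meta-learner trains on out-of-fold predictions from the base models, which keeps it from simply memorizing their training-set outputs.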
| Model | Accuracy (Test Set) |
|---|---|
| Logistic Regression | 0.782 |
| Random Forest | 0.832 |
| Gradient Boosting | 0.838 |
| SVM | 0.832 |
| KNN | 0.804 |
| Naive Bayes | 0.782 |
| XGBoost (tuned) | 0.855 |
| Stacking Ensemble | 0.844 |
| Neural Network | ~0.81 |
**Best Model:** XGBoost with class imbalance handling (accuracy ≈ 85.5%)
- Sex and Title were among the strongest predictors of survival.
- Scaling significantly improved performance for SVM and KNN.
- Handling class imbalance in XGBoost boosted test accuracy.
- Ensemble methods (Stacking) gave performance close to the best single tuned model.
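On the class-imbalance point: one common approach with XGBoost is to set `scale_pos_weight` to the ratio of negative to positive training examples. A sketch of the computation, using the known class counts of the Kaggle training set (342 survivors out of 891 passengers):

```python
import numpy as np

# Kaggle training set: 549 non-survivors, 342 survivors.
y = np.array([0] * 549 + [1] * 342)

# Ratio of negatives to positives, a standard value for
# XGBoost's scale_pos_weight parameter.
scale_pos_weight = (y == 0).sum() / (y == 1).sum()
# Then: XGBClassifier(scale_pos_weight=scale_pos_weight, ...)
```

The resulting ratio (about 1.6) up-weights the minority `Survived = 1` class during boosting; whether this exact scheme was used here is an assumption, as other options (e.g. resampling) also handle imbalance.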
- Python (Pandas, NumPy, Matplotlib, Scikit-learn, XGBoost, TensorFlow/Keras)
- Jupyter Notebook for analysis and visualization
- Try more feature engineering (e.g., ticket grouping, cabin deck extraction).
- Experiment with cross-validation and ensemble blending.
- Deploy the best model with Streamlit/Flask for interactive predictions.
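As a starting point for the cross-validation experiment listed above, a minimal stratified k-fold sketch with scikit-learn (synthetic data stands in for the real feature matrix, and `GradientBoostingClassifier` is an arbitrary example estimator):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Synthetic stand-in; in the project, X and y come from the cleaned dataset.
X, y = make_classification(n_samples=300, n_features=8, random_state=42)

# Stratified folds preserve the survived/died ratio in each split.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
scores = cross_val_score(GradientBoostingClassifier(random_state=42), X, y, cv=cv)
mean_acc = scores.mean()
```

Reporting the mean and standard deviation across folds gives a more stable estimate than the single test-set accuracies in the table above.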