This project uses the famous Titanic dataset to predict passenger survival in the 1912 disaster. By applying machine learning classification techniques, the model learns from features to determine whether a passenger survived or not.


🚢 Titanic Survival Prediction

This project applies machine learning and deep learning techniques to predict passenger survival in the RMS Titanic disaster. Using the Titanic dataset from Kaggle, we work through data cleaning, feature engineering, model training, and evaluation to build predictive models.


📊 Dataset Overview

  • Rows: 891 passengers

  • Target: Survived (0 = No, 1 = Yes)

  • Key Features:

    • Pclass – Passenger class (1st, 2nd, 3rd)
    • Sex – Gender
    • Age – Age in years (with missing values)
    • SibSp – Siblings/Spouses aboard
    • Parch – Parents/Children aboard
    • Fare – Ticket price
    • Embarked – Port of embarkation (C/Q/S)
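For reference, the schema above can be sketched with a few illustrative rows (toy values for demonstration, not real passenger records), assuming the standard Kaggle column names:

```python
import pandas as pd

# Toy rows mimicking the Kaggle Titanic schema described above
# (illustrative values only, not actual passenger records).
df = pd.DataFrame({
    "Survived": [0, 1, 1, 0],
    "Pclass":   [3, 1, 3, 2],
    "Sex":      ["male", "female", "female", "male"],
    "Age":      [22.0, 38.0, None, 35.0],   # Age has missing values
    "SibSp":    [1, 1, 0, 0],
    "Parch":    [0, 0, 0, 0],
    "Fare":     [7.25, 71.28, 7.92, 13.0],
    "Embarked": ["S", "C", "S", None],      # Embarked has missing values
})

print(df.dtypes)
print(df["Survived"].value_counts())
```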

βš™οΈ Data Preprocessing

  • Filled missing Age with median.

  • Filled missing Embarked with mode.

  • Dropped Cabin (too many missing values).

  • Created new features:

    • FamilySize = SibSp + Parch + 1
    • IsAlone (binary flag)
    • Title extracted from names (Mr, Mrs, Miss, etc.).
  • Encoded categorical variables (Sex, Embarked, Title).

  • Scaled continuous features (Age, Fare) for distance-based models.
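The steps above can be sketched with pandas and scikit-learn, run here on a few illustrative toy rows (column names follow the Kaggle schema; the exact imputation and encoding choices in the project notebook may differ):

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Toy rows with the columns used below (illustrative values only).
df = pd.DataFrame({
    "Name": ["Smith, Mr. John", "Smith, Mrs. Jane", "Lee, Miss. Ann"],
    "Sex": ["male", "female", "female"],
    "Age": [40.0, None, 19.0],
    "SibSp": [1, 1, 0],
    "Parch": [0, 0, 2],
    "Fare": [20.0, 20.0, 8.0],
    "Embarked": ["S", None, "Q"],
    "Cabin": [None, "C85", None],
})

# 1. Impute: median Age, modal Embarked; drop the sparse Cabin column.
df["Age"] = df["Age"].fillna(df["Age"].median())
df["Embarked"] = df["Embarked"].fillna(df["Embarked"].mode()[0])
df = df.drop(columns=["Cabin"])

# 2. Engineer FamilySize, IsAlone, and Title (text between comma and period).
df["FamilySize"] = df["SibSp"] + df["Parch"] + 1
df["IsAlone"] = (df["FamilySize"] == 1).astype(int)
df["Title"] = df["Name"].str.extract(r",\s*([^.]+)\.", expand=False)

# 3. One-hot encode categoricals; scale continuous features for
#    distance-based models such as SVM and KNN.
df = pd.get_dummies(df.drop(columns=["Name"]), columns=["Sex", "Embarked", "Title"])
df[["Age", "Fare"]] = StandardScaler().fit_transform(df[["Age", "Fare"]])
```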


🤖 Models Used

  • Baseline Models:

    • Logistic Regression
    • Random Forest
    • Gradient Boosting
    • SVM
    • KNN
    • Naive Bayes
  • Advanced Models:

    • XGBoost (with hyperparameter tuning & class imbalance handling)
    • Stacking Ensemble (Random Forest + Gradient Boosting + XGBoost, meta-learner = Logistic Regression)
    • Neural Network (Keras Sequential with hidden layers)
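The stacking setup can be sketched with scikit-learn's `StackingClassifier`, using synthetic data as a stand-in for the preprocessed features. XGBoost is left out here to keep the sketch self-contained within scikit-learn, but an `XGBClassifier` would slot in as a third base estimator:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (GradientBoostingClassifier,
                              RandomForestClassifier, StackingClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the preprocessed Titanic features.
X, y = make_classification(n_samples=300, n_features=8, random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=42)

# Stack tree ensembles under a logistic-regression meta-learner; base
# models are fit with internal 5-fold CV to produce meta-features.
stack = StackingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(n_estimators=100, random_state=42)),
        ("gb", GradientBoostingClassifier(random_state=42)),
    ],
    final_estimator=LogisticRegression(max_iter=1000),
    cv=5,
)
stack.fit(X_tr, y_tr)
acc = stack.score(X_te, y_te)
print(f"Stacking accuracy: {acc:.3f}")
```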

📈 Results

Model                 Accuracy (Test Set)
Logistic Regression   0.782
Random Forest         0.832
Gradient Boosting     0.838
SVM                   0.832
KNN                   0.804
Naive Bayes           0.782
XGBoost (tuned)       0.855
Stacking Ensemble     0.844
Neural Network        ~0.81

✅ Best Model: XGBoost with class imbalance handling (Accuracy ≈ 85.5%)
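A common way to handle class imbalance in XGBoost (and plausibly what is meant here, though the exact settings are not shown in this README) is `scale_pos_weight`, the ratio of negative to positive training examples. With the Titanic training set's 549 non-survivors and 342 survivors:

```python
# Class counts in the Titanic training set: 549 non-survivors, 342 survivors.
n_neg, n_pos = 549, 342

# XGBoost upweights the positive class by the negative/positive ratio.
scale_pos_weight = n_neg / n_pos  # about 1.605

# The value is then passed to the classifier, e.g.:
# model = xgboost.XGBClassifier(scale_pos_weight=scale_pos_weight)
print(round(scale_pos_weight, 3))
```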


🔑 Key Insights

  • Sex and Title were among the strongest predictors of survival.
  • Scaling significantly improved performance for SVM and KNN.
  • Handling class imbalance in XGBoost boosted test accuracy.
  • Ensemble methods (Stacking) gave performance close to the best single tuned model.
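The scaling insight can be demonstrated with scikit-learn pipelines, which bundle the scaler with the model so it is applied consistently at fit and predict time; synthetic features stand in for the Titanic columns:

```python
from sklearn.datasets import make_classification
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the Titanic features.
X, y = make_classification(n_samples=200, n_features=6, random_state=0)

# Distance/margin-based models are sensitive to feature scale, so the
# scaler is wrapped into each pipeline rather than applied ad hoc.
svm = make_pipeline(StandardScaler(), SVC()).fit(X, y)
knn = make_pipeline(StandardScaler(), KNeighborsClassifier()).fit(X, y)

svm_acc = svm.score(X, y)
knn_acc = knn.score(X, y)
print(f"SVM: {svm_acc:.3f}, KNN: {knn_acc:.3f}")
```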

πŸ› οΈ Tech Stack

  • Python (Pandas, NumPy, Matplotlib, Scikit-learn, XGBoost, TensorFlow/Keras)
  • Jupyter Notebook for analysis and visualization

📌 Next Steps

  • Try more feature engineering (e.g., ticket grouping, cabin deck extraction).
  • Experiment with cross-validation and ensemble blending.
  • Deploy the best model with Streamlit/Flask for interactive predictions.
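As a sketch of the cross-validation next step, stratified k-fold accuracy with scikit-learn (synthetic data stands in for the preprocessed features):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Synthetic stand-in for the preprocessed Titanic features.
X, y = make_classification(n_samples=300, n_features=8, random_state=1)

# Stratified 5-fold CV gives a more stable accuracy estimate than a
# single train/test split, which matters on a dataset of only 891 rows.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=1)
scores = cross_val_score(GradientBoostingClassifier(random_state=1), X, y, cv=cv)
print(f"CV accuracy: {scores.mean():.3f} +/- {scores.std():.3f}")
```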
