This project predicts whether passengers survived the Titanic disaster using the Titanic dataset. Data preprocessing, feature engineering, hyperparameter tuning, and model optimization were applied to improve the model's performance.
The dataset contains passenger information from the Titanic disaster. Each passenger has a survival status (Survived) and various characteristics (age, gender, ticket class, etc.).
Dataset Columns:
- PassengerId: Passenger ID
- Survived: 0 = dead, 1 = alive
- Pclass: Ticket class (1 = 1st class, 2 = 2nd class, 3 = 3rd class)
- Name: Passenger name
- Sex: Passenger gender
- Age: Passenger age
- SibSp: Passenger number of siblings/spouses
- Parch: Passenger number of parents/children
- Ticket: Ticket number
- Fare: Ticket fare
- Cabin: Passenger cabin
- Embarked: Passenger port of embarkation (C = Cherbourg, Q = Queenstown, S = Southampton)
Missing Data Detection and Filling:
- Missing data were filled using methods such as mean, mode and median.
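The filling step can be sketched as follows. This is a minimal illustration on a small made-up sample, not the project's actual code: which statistic is applied to which column (median for Age, mean for Fare, mode for Embarked) is an assumption, since the text only names the methods.

```python
import numpy as np
import pandas as pd

# Tiny hypothetical sample that reuses the dataset's column names.
df = pd.DataFrame({
    "Age": [22.0, np.nan, 26.0, 35.0, np.nan],
    "Fare": [7.25, 71.28, np.nan, 53.10, 8.05],
    "Embarked": ["S", "C", None, "S", "S"],
})

# Detection: count missing values per column.
missing_before = df.isnull().sum()

# Numeric columns: median for Age, mean for Fare (assumed pairing).
df["Age"] = df["Age"].fillna(df["Age"].median())
df["Fare"] = df["Fare"].fillna(df["Fare"].mean())

# Categorical column: fill with the most frequent value (mode).
df["Embarked"] = df["Embarked"].fillna(df["Embarked"].mode()[0])

missing_after = df.isnull().sum()
print(missing_before, missing_after, sep="\n")
```

The median is often preferred for skewed columns because it is less sensitive to outliers than the mean.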
New Features Added:
- FamilySize: Family size (SibSp + Parch).
- IsAlone: 1 if the passenger is alone, 0 otherwise.
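The two features above could be derived roughly like this. The sample rows are invented for illustration. Because FamilySize is defined here as SibSp + Parch (without counting the passenger), a passenger is treated as alone when FamilySize is 0; that threshold is an inference from the definition, not something stated in the project.

```python
import pandas as pd

# Hypothetical sample rows with the two source columns.
df = pd.DataFrame({"SibSp": [1, 0, 0, 3], "Parch": [0, 0, 2, 1]})

# FamilySize: siblings/spouses plus parents/children.
df["FamilySize"] = df["SibSp"] + df["Parch"]

# IsAlone: 1 when the passenger has no relatives aboard, 0 otherwise.
df["IsAlone"] = (df["FamilySize"] == 0).astype(int)

print(df)
```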
Numericalization of Categorical Values:
- Categorical variables such as Sex and Embarked were converted to numerical values using the Label Encoding method.
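A minimal sketch of the encoding step, assuming scikit-learn's `LabelEncoder` is applied to each column separately. The sample values and the `encoders` dictionary are illustrative only.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical sample with the two categorical columns.
df = pd.DataFrame({
    "Sex": ["male", "female", "female", "male"],
    "Embarked": ["S", "C", "Q", "S"],
})

# Fit one encoder per column and keep it so the mapping can be inspected
# or reversed later with inverse_transform.
encoders = {}
for col in ["Sex", "Embarked"]:
    enc = LabelEncoder()
    df[col] = enc.fit_transform(df[col])
    encoders[col] = enc

print(df)
print({col: list(enc.classes_) for col, enc in encoders.items()})
```

`LabelEncoder` assigns integers in sorted order of the class labels, so the codes for Embarked imply an ordering (C < Q < S) that has no real meaning. For a linear model, one-hot encoding is a common alternative.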
- Important features were identified using Recursive Feature Elimination (RFE). The best 5 features were selected: Pclass, SibSp, Parch, FamilySize, IsAlone.
- A Logistic Regression model was created and trained on the training data.
- Hyperparameter Tuning: The hyperparameters of the model were optimized using GridSearchCV and RandomizedSearchCV.
- Data Scaling: Data was standardized using StandardScaler so that all features share a common scale, which helps the Logistic Regression solver converge faster.
- The success of the model was evaluated with metrics such as Accuracy, Precision, Recall, and F1-Score.
- The classification success of the model was visualized using a confusion matrix.
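The modelling steps above (feature selection with RFE, Logistic Regression, scaling, grid search, and evaluation) can be combined in one scikit-learn pipeline. The sketch below makes several assumptions: it uses synthetic data from `make_classification` as a stand-in for the preprocessed Titanic features, the grid over `C` is an arbitrary example, and the step order inside the pipeline is a design choice rather than the project's documented order.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             precision_score, recall_score)
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the preprocessed features (assumption).
X, y = make_classification(n_samples=400, n_features=9, n_informative=5,
                           random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y)

# Scale, keep the 5 best features via RFE, then fit Logistic Regression.
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("select", RFE(LogisticRegression(max_iter=1000),
                   n_features_to_select=5)),
    ("clf", LogisticRegression(max_iter=1000)),
])

# Example grid over the regularization strength C (values are illustrative).
param_grid = {"clf__C": [0.01, 0.1, 1, 10]}
search = GridSearchCV(pipe, param_grid, cv=5, scoring="accuracy")
search.fit(X_train, y_train)

# Evaluate the best pipeline on the held-out test split.
y_pred = search.predict(X_test)
metrics = {
    "accuracy": accuracy_score(y_test, y_pred),
    "precision": precision_score(y_test, y_pred),
    "recall": recall_score(y_test, y_pred),
    "f1": f1_score(y_test, y_pred),
}
cm = confusion_matrix(y_test, y_pred)

# Boolean mask of the features RFE kept in the best pipeline.
selected_mask = search.best_estimator_.named_steps["select"].support_

print(search.best_params_)
print(metrics)
print(cm)
```

Putting the scaler and RFE inside the pipeline means they are refit on each cross-validation fold, which avoids leaking information from the validation folds. `RandomizedSearchCV` can be swapped in with the same `fit`/`predict` interface by passing `param_distributions` and `n_iter` instead of `param_grid`, and the matrix `cm` can be plotted with, for example, `seaborn.heatmap(cm, annot=True, fmt="d")`.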
- Pandas: Data manipulation and analysis
- NumPy: Numerical computations
- Scikit-learn: Model training, hyperparameter tuning, data preprocessing
- Matplotlib and Seaborn: Data visualization