The main goal of this repository is to predict whether a passenger survived the sinking of the Titanic, based on features such as sex and age.
- STEP 1: Exploratory Data Analysis
- STEP 2: Feature Engineering
- STEP 3: Pre-Modeling Tasks
- STEP 4: Modeling
- STEP 5: Evaluating the performance of the model
- STEP 6: Predictions and submission
- Python
- Jupyter Notebook
- Scikit-learn
- NumPy
- Pandas
- Matplotlib
- Seaborn
- Kaggle (https://www.kaggle.com/c/titanic/data)
In this phase we load and explore the dataset, compute descriptive statistics, and visualize the data.
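As a rough illustration of the descriptive-statistics part of this phase, the sketch below uses a handful of made-up rows that borrow the Kaggle Titanic column names. The values are not real passenger data, and the notebook itself may explore the data differently.

```python
import pandas as pd

# Toy rows with the Kaggle Titanic column names; the values are invented
# for illustration. With the real data you would read the downloaded CSV
# instead, e.g. pd.read_csv("train.csv") (file location is an assumption).
df = pd.DataFrame({
    "Survived": [0, 1, 1, 0, 0, 1],
    "Pclass": [3, 1, 3, 3, 2, 1],
    "Sex": ["male", "female", "female", "male", "female", "male"],
    "Age": [22.0, 38.0, 26.0, float("nan"), 14.0, 54.0],
    "SibSp": [1, 1, 0, 0, 1, 0],
    "Parch": [0, 0, 0, 0, 0, 0],
    "Fare": [7.25, 71.28, 7.92, 8.05, 30.07, 51.86],
})

# Count, mean, std, min, quartiles, and max for each numeric column.
summary = df.describe()

# Number of missing values per column.
missing = df.isna().sum()

# Share of survivors within each sex.
survival_by_sex = df.groupby("Sex")["Survived"].mean()

print(summary)
print(missing)
print(survival_by_sex)
```

The same three calls work unchanged on the full training set and are a common starting point before plotting distributions with Matplotlib or Seaborn.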
- Feature engineering is the process of transforming raw data into features that are easier to interpret and that increase the predictive power of the learning algorithm.
- In this part we create new features that could improve predictions (for example, whether the passenger is traveling alone), combine existing features into more useful ones, and drop the columns that do not improve predictions.
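One possible way to derive an "alone or not" feature is sketched below. It assumes the Kaggle columns `SibSp` (siblings/spouses aboard) and `Parch` (parents/children aboard); the names `FamilySize`, `IsAlone`, and `add_family_features` are hypothetical and chosen only for this example.

```python
import pandas as pd

def add_family_features(df):
    """Return a copy of df with FamilySize and IsAlone added.

    FamilySize and IsAlone are illustrative names, not necessarily the
    ones used in the notebook.
    """
    out = df.copy()
    # Combine two existing columns into one: the passenger plus relatives aboard.
    out["FamilySize"] = out["SibSp"] + out["Parch"] + 1
    # 1 if the passenger has no relatives aboard, else 0.
    out["IsAlone"] = (out["FamilySize"] == 1).astype(int)
    # Drop the source columns once their information lives in the new features.
    return out.drop(columns=["SibSp", "Parch"])

# Made-up rows for demonstration only.
toy = pd.DataFrame({
    "SibSp": [1, 0, 0],
    "Parch": [0, 0, 2],
    "Fare": [7.25, 8.05, 30.0],
})
engineered = add_family_features(toy)
print(engineered)
```

Whether dropping `SibSp` and `Parch` actually helps depends on the model; comparing validation scores with and without them is a reasonable check.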
- Separating the independent variables from the dependent variable.
- Splitting the training data.
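The two tasks above could look roughly like this with scikit-learn's `train_test_split`. The toy rows, the 80/20 ratio, the stratification, and the `random_state` value are all assumptions made for the sketch.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Made-up rows for demonstration; "Survived" is the dependent variable.
toy = pd.DataFrame({
    "Pclass": [3, 1, 3, 1, 3, 2, 1, 3, 2, 3],
    "IsAlone": [0, 0, 1, 0, 1, 1, 1, 0, 0, 1],
    "Fare": [7.25, 71.28, 7.92, 53.1, 8.05, 13.0, 51.86, 21.07, 30.07, 8.46],
    "Survived": [0, 1, 1, 1, 0, 0, 0, 0, 1, 1],
})

# Independent variables (features) and dependent variable (target).
X = toy.drop(columns=["Survived"])
y = toy["Survived"]

# Hold out 20% of the rows for validation, keeping the class balance similar.
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

print(X_train.shape, X_val.shape)
```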
- In this part we build a Random Forest model and then tune its hyperparameters using GridSearchCV.
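A minimal sketch of tuning a Random Forest with `GridSearchCV` follows. It uses synthetic data from `make_classification` in place of the Titanic features, and the parameter grid, fold count, and scoring metric are illustrative assumptions rather than the values used in the notebook.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic binary classification data standing in for the prepared features.
X, y = make_classification(n_samples=200, n_features=6, random_state=0)

# Candidate hyperparameter values (an assumed, deliberately small grid).
param_grid = {
    "n_estimators": [25, 50],
    "max_depth": [3, 5, None],
}

# Try every combination with 5-fold cross-validation, scored by accuracy.
search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid,
    cv=5,
    scoring="accuracy",
)
search.fit(X, y)

# By default the best combination is refit on all of X, y.
best_model = search.best_estimator_
print(search.best_params_)
print(search.best_score_)
```

Grid search cost grows with the product of the grid sizes, so a coarse grid first and a finer one around the best values is often a practical compromise.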
Evaluating the machine learning model is a crucial part of any data science project. Many metrics can help us evaluate the model's performance:
- Classification Accuracy
- Classification Report
- Precision Score
- Recall Score
- Confusion Matrix
- AUC & ROC Curve
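The metrics listed above are all available in `sklearn.metrics`. The sketch below computes them on a small set of hand-written labels and scores, which are made up for illustration and are not results from this project.

```python
from sklearn.metrics import (
    accuracy_score,
    classification_report,
    confusion_matrix,
    precision_score,
    recall_score,
    roc_auc_score,
    roc_curve,
)

# Made-up ground truth, hard predictions, and predicted probabilities
# for the positive class (survived = 1).
y_true = [0, 0, 1, 1, 1, 0, 1, 0]
y_pred = [0, 1, 1, 1, 0, 0, 1, 0]
y_score = [0.1, 0.6, 0.8, 0.9, 0.4, 0.2, 0.7, 0.3]

accuracy = accuracy_score(y_true, y_pred)    # share of correct predictions
precision = precision_score(y_true, y_pred)  # TP / (TP + FP)
recall = recall_score(y_true, y_pred)        # TP / (TP + FN)
cm = confusion_matrix(y_true, y_pred)        # rows: true class, columns: predicted
report = classification_report(y_true, y_pred)

# ROC curve and AUC use the scores, not the hard predictions.
fpr, tpr, thresholds = roc_curve(y_true, y_score)
auc = roc_auc_score(y_true, y_score)

print(report)
print(cm)
print(accuracy, precision, recall, auc)
```

With a fitted classifier, `y_pred` would typically come from `model.predict(X_val)` and `y_score` from `model.predict_proba(X_val)[:, 1]`, where `model` and `X_val` are placeholder names for the tuned model and the held-out features.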






