Predicting Customer Responses to Insurance Offers Using ML
The objective is to predict which customers will respond positively to a vehicle insurance offer. The project is an entry in a binary classification challenge hosted on Kaggle, where submissions were evaluated by area under the ROC curve (AUC).
Explore full implementation here: 🔗 PS4E7 - Stacking Boosters with ANN
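As a quick illustration of the evaluation metric, the sketch below computes AUC with scikit-learn on toy values. The labels and scores are made up for the example and are not taken from the competition data.

```python
from sklearn.metrics import roc_auc_score

# Toy labels and predicted probabilities (illustrative values only).
y_true = [0, 0, 1, 1]
y_score = [0.1, 0.4, 0.35, 0.8]

# AUC depends only on how the scores rank positives against negatives:
# here 3 of the 4 positive/negative pairs are ordered correctly.
auc = roc_auc_score(y_true, y_score)
print(f"AUC = {auc:.2f}")  # AUC = 0.75
```

Because only the ranking matters, rescaling the scores monotonically leaves the AUC unchanged.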
- 📊 Data Integration & Inspection
  - Combined the official training dataset with the original insurance dataset for feature enrichment.
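A minimal sketch of how two such datasets might be combined with pandas. The tiny frames, the column names (`id`, `Age`, `Response`) and the `is_original` flag are assumptions made for illustration; the project's actual loading code is not shown here.

```python
import pandas as pd

# Hypothetical stand-ins for the competition and original datasets.
train = pd.DataFrame({"id": [0, 1], "Age": [25, 40], "Response": [0, 1]})
original = pd.DataFrame({"id": [0, 1, 2], "Age": [31, 52, 23], "Response": [1, 0, 0]})

# Tag the origin so later steps can treat the two sources differently.
train["is_original"] = 0
original["is_original"] = 1

# Stack the rows; the per-source id columns would collide, so drop them.
combined = pd.concat(
    [train.drop(columns="id"), original.drop(columns="id")],
    ignore_index=True,
)

# Basic inspection: size, dtypes and missing values.
print(combined.shape)
print(combined.dtypes)
print(combined.isna().sum())
```

Keeping a source flag is one way to let validation splits score only the competition rows while still training on both sources.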
- 🛠️ Preprocessing Pipelines
  - Used scikit-learn pipelines and transformers with scalers and encoders: StandardScaler, PowerTransformer, OneHotEncoder, OrdinalEncoder.
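The four transformers named above can be wired together with a `ColumnTransformer`, roughly as sketched below. The column names and the assignment of columns to transformers are assumptions for the example; the mapping used in the project is not stated.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import (
    OneHotEncoder,
    OrdinalEncoder,
    PowerTransformer,
    StandardScaler,
)

# Hypothetical columns and values, for illustration only.
X = pd.DataFrame({
    "Age": [25, 40, 31, 52, 23, 46],
    "Annual_Premium": [2630.0, 40454.0, 33536.0, 28619.0, 27496.0, 51200.0],
    "Gender": ["Male", "Female", "Male", "Female", "Male", "Female"],
    "Vehicle_Age": ["< 1 Year", "1-2 Year", "> 2 Years",
                    "1-2 Year", "< 1 Year", "> 2 Years"],
})

preprocess = ColumnTransformer(
    [
        ("scale", StandardScaler(), ["Age"]),
        # Yeo-Johnson (the default) reduces skew and standardizes the result.
        ("power", PowerTransformer(), ["Annual_Premium"]),
        ("onehot", OneHotEncoder(handle_unknown="ignore"), ["Gender"]),
        # An explicit category order keeps the ordinal codes meaningful.
        ("ordinal",
         OrdinalEncoder(categories=[["< 1 Year", "1-2 Year", "> 2 Years"]]),
         ["Vehicle_Age"]),
    ],
    sparse_threshold=0,  # always return a dense array
)

X_t = preprocess.fit_transform(X)
print(X_t.shape)  # 1 scaled + 1 power + 2 one-hot + 1 ordinal column
```

The fitted `preprocess` object can be placed as the first step of a scikit-learn `Pipeline`, so the same transformations are refit inside each cross-validation fold.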
- 🔍 Feature Engineering & Selection
  - Applied mutual information filtering to retain informative features.
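Mutual information filtering could look roughly like the following, using scikit-learn's `mutual_info_classif` on synthetic data. The features, the sample size and the `threshold` cutoff are assumptions; the threshold used in the project is not given.

```python
import numpy as np
from sklearn.feature_selection import mutual_info_classif

rng = np.random.default_rng(0)
n = 5000
y = rng.integers(0, 2, size=n)

# Synthetic features: one tied to the target, one pure noise.
informative = y + rng.normal(scale=0.5, size=n)
noise = rng.normal(size=n)
X = np.column_stack([informative, noise])
names = ["informative", "noise"]

# Estimated mutual information (in nats) between each feature and the target.
mi = mutual_info_classif(X, y, random_state=0)

# Assumed cutoff for the sketch; keep features scoring above it.
threshold = 0.1
kept = [name for name, score in zip(names, mi) if score > threshold]

print(dict(zip(names, np.round(mi, 3))))
print("kept:", kept)
```

The estimate is based on nearest neighbors and is noisy for small samples, so a cutoff slightly above zero is usually safer than keeping every feature with a positive score.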
- 🧰 Modeling with Ensembles
  - Trained and validated XGBoost, CatBoost, and LightGBM classifiers using stratified K-fold cross-validation.
  - Tuned hyperparameters with Optuna, aided by visual exploration tools.
- 🏋️ Submission Strategy
  - Ensembled test-set predictions by averaging across models.
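Model averaging can be sketched in a few lines of NumPy. The probabilities and the weights below are invented for the example; the project's actual predictions and any weighting it used are not shown here.

```python
import numpy as np

# Hypothetical test-set probabilities from three fitted models.
preds = {
    "xgboost": np.array([0.10, 0.80, 0.35, 0.60]),
    "catboost": np.array([0.20, 0.70, 0.30, 0.50]),
    "lightgbm": np.array([0.15, 0.90, 0.25, 0.70]),
}
stacked = np.vstack(list(preds.values()))  # shape: (n_models, n_rows)

# Equal-weight average across models.
blend = stacked.mean(axis=0)

# Weighted variant; the weights are assumed and would normally
# be chosen from validation scores.
weighted = np.average(stacked, axis=0, weights=[0.5, 0.3, 0.2])

print(blend)
print(weighted)
```

Averaging tends to help most when the models make different errors, which is one reason to combine several boosting libraries rather than several runs of one.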
- ✅ Public Leaderboard Scores: ranging from 0.50060 to 0.89727
- 🏁 Best Private Score: 0.89690
- 🥇 Rank Achieved: 70th of 2,234 teams (2,425 participants), competing solo
- 📁 Kaggle Competition: Binary Classification of Insurance Cross Selling
- 📂 Original Dataset: Health Insurance Cross Sell Prediction Data
- Language: Python 🐍
- Libraries:
  - pandas, polars, numpy for data handling
  - matplotlib, seaborn for EDA and plotting
  - scikit-learn, xgboost, catboost, lightgbm for modeling
  - optuna for hyperparameter tuning
- Tools:
  - Jupyter Notebook / Kaggle Notebooks for experimentation
  - Custom pipelines and scoring functions for AUC optimization