This repository implements and benchmarks several algorithms for the Kaggle Insurance Cross-Selling task, including a from-scratch AdaBoost implementation, classical ML models, and automated AWS SageMaker pipelines.
00_adaboost_impl.ipynb
- Visual AdaBoost demo on synthetic 2D data and a selected pair of sales features
- Examine margins via their empirical CDF (CDF sketch below)
- Optimise decision tree depth
- Focus on hard samples: modified weight updates + easy-sample removal (weight-update sketch below)
- Optimise threshold for recall
- Ensemble pruning: rank-, search-, and cluster-based methods
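A minimal sketch of the modified weight update and easy-sample removal, assuming labels in {-1, +1}; the `weight_factor` emphasis and the quantile-based pruning rule here are illustrative, not the notebook's exact code.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def boost_round(X, y, w, depth=1, weight_factor=1.5, easy_quantile=0.05):
    """One AdaBoost round with extra emphasis on misclassified samples."""
    stump = DecisionTreeClassifier(max_depth=depth).fit(X, y, sample_weight=w)
    miss = stump.predict(X) != y
    err = np.average(miss, weights=w)
    alpha = 0.5 * np.log((1 - err) / max(err, 1e-10))  # learner weight
    w = w * np.exp(alpha * weight_factor * miss)       # boost hard samples harder
    w /= w.sum()
    keep = w > np.quantile(w, easy_quantile)           # drop "easy" low-weight samples
    return stump, alpha, keep
```

The caller subsets `X`, `y`, and `w` with `keep` (renormalising `w`) before the next round, so later learners concentrate on the remaining hard samples.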
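The margin examination can be sketched as follows; `stumps`, `alphas`, and the training split are assumed to come from a boosting loop like the one above.

```python
import numpy as np
import matplotlib.pyplot as plt

def margins(stumps, alphas, X, y):
    """Normalised margins in [-1, 1]; negative means the ensemble misclassifies."""
    votes = sum(a * s.predict(X) for s, a in zip(stumps, alphas))
    return y * votes / np.sum(alphas)

m = np.sort(margins(stumps, alphas, X_train, y_train))
plt.plot(m, np.arange(1, len(m) + 1) / len(m))  # empirical CDF of margins
plt.xlabel("margin"); plt.ylabel("cumulative fraction of samples")
plt.show()
```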
01_adaboost_opt.ipynb
- AdaBoost optimisation: test ROC AUC 0.845 → 0.874
- Early stopping (patience + tolerance; sketched below)
- Data balancing: undersampling vs SMOTE vs hybrid (compared below)
- Feature engineering: interactions + derived features; prune via mean CV α
- Feature encoding + selection (manual + automated)
- Stratified k-fold CV tuning: tree depth, criterion (Gini/entropy/log loss), η, rounds, weight factor, threshold, easy + hard sample removal (loop sketched below)
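A sketch of the patience + tolerance rule; `val_auc` is a hypothetical helper returning validation ROC AUC after round `t`, and the constants are illustrative.

```python
patience, tol = 20, 1e-4           # illustrative values
best_auc, since_improve = 0.0, 0
for t in range(500):               # boosting rounds
    auc = val_auc(t)               # hypothetical: validation ROC AUC after round t
    if auc > best_auc + tol:       # improvement must exceed the tolerance
        best_auc, since_improve = auc, 0
    else:
        since_improve += 1
    if since_improve >= patience:  # no meaningful gain for `patience` rounds
        break
```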
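The three balancing strategies map onto imbalanced-learn as sketched here; the hybrid ratio (SMOTE to half the majority count, then undersample to parity) is an illustrative choice.

```python
import numpy as np
from imblearn.over_sampling import SMOTE
from imblearn.under_sampling import RandomUnderSampler

X_u, y_u = RandomUnderSampler(random_state=42).fit_resample(X_train, y_train)
X_s, y_s = SMOTE(random_state=42).fit_resample(X_train, y_train)
X_h, y_h = RandomUnderSampler(random_state=42).fit_resample(  # hybrid: SMOTE, then undersample
    *SMOTE(sampling_strategy=0.5, random_state=42).fit_resample(X_train, y_train))

for name, yy in [("undersample", y_u), ("SMOTE", y_s), ("hybrid", y_h)]:
    print(name, np.bincount(yy))  # resulting class counts
```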
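A sketch of the tuning loop over just two of the listed knobs (depth and η); `AdaBoostScratch` stands in for the from-scratch model, and its interface is an assumption.

```python
from itertools import product
import numpy as np
from sklearn.model_selection import StratifiedKFold
from sklearn.metrics import roc_auc_score

skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
scores = {}
for depth, eta in product([1, 2, 3], [0.5, 1.0]):
    fold_aucs = []
    for tr, va in skf.split(X, y):
        model = AdaBoostScratch(depth=depth, eta=eta).fit(X[tr], y[tr])
        # predict_proba is assumed to return positive-class scores
        fold_aucs.append(roc_auc_score(y[va], model.predict_proba(X[va])))
    scores[(depth, eta)] = np.mean(fold_aucs)

best = max(scores, key=scores.get)
print(best, scores[best])
```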
02_model_bench.ipynb
- Benchmark 13 models (linear, instance-based, tree, ensemble)
- Benchmark data balancing methods
- Feature selection/engineering: LightGBM importance ranking + CFS (correlation-based feature selection) pruning (ranking sketched below)
- CatBoost (best): SMOTE, undersampling, full vs LightGBM top-5 vs CFS top-5 features, Optuna tuning (sketch below) → ROC AUC 0.876
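The "LightGBM top-5" feature set can be derived roughly as below; gain-based ranking and the hyperparameters shown are illustrative assumptions.

```python
import lightgbm as lgb
import pandas as pd

lgbm = lgb.LGBMClassifier(n_estimators=200, random_state=42).fit(X_train, y_train)
gain = pd.Series(
    lgbm.booster_.feature_importance(importance_type="gain"),
    index=X_train.columns,  # assumes X_train is a DataFrame
).sort_values(ascending=False)
top5 = gain.head(5).index.tolist()  # the "LightGBM top-5" feature set
X_top5 = X_train[top5]
```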
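A sketch of the Optuna search over CatBoost; the search space, validation split, and trial count are illustrative rather than the notebook's exact configuration.

```python
import optuna
from catboost import CatBoostClassifier
from sklearn.metrics import roc_auc_score

def objective(trial):
    model = CatBoostClassifier(
        iterations=trial.suggest_int("iterations", 200, 1000),
        depth=trial.suggest_int("depth", 4, 10),
        learning_rate=trial.suggest_float("learning_rate", 0.01, 0.3, log=True),
        l2_leaf_reg=trial.suggest_float("l2_leaf_reg", 1.0, 10.0),
        eval_metric="AUC",
        verbose=False,
    )
    model.fit(X_train, y_train, eval_set=(X_val, y_val))
    return roc_auc_score(y_val, model.predict_proba(X_val)[:, 1])

study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=50)
print(study.best_params, study.best_value)
```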
03_sagemaker_catboost_autopilot.ipynb
- S3 + SageMaker: Kaggle API data import → SKLearnProcessor preprocessing (sketched below)
- Hyperparameter tuning (SageMaker HyperparameterTuner; setup sketched below):
  - Job 1: major AUC gain (analyse AUC and parameter effects)
  - Job 2: marginal gain (higher η + early stopping)
- Final CatBoost: training job → endpoint → inference (deploy sketch below) → AUC = 0.876
- Autopilot benchmark: raw data → Boto3 monitoring → batch inference
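The preprocessing step follows the standard SKLearnProcessor pattern; the script name, role, and S3 paths below are placeholders.

```python
from sagemaker.processing import ProcessingInput, ProcessingOutput
from sagemaker.sklearn.processing import SKLearnProcessor

processor = SKLearnProcessor(
    framework_version="1.2-1",
    role="<execution-role-arn>",  # placeholder
    instance_type="ml.m5.xlarge",
    instance_count=1,
)
processor.run(
    code="preprocess.py",         # hypothetical script name
    inputs=[ProcessingInput(source="s3://<bucket>/raw",
                            destination="/opt/ml/processing/input")],
    outputs=[ProcessingOutput(source="/opt/ml/processing/output",
                              destination="s3://<bucket>/processed")],
)
```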
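A sketch of the tuning-job setup; the image URI, metric regex, ranges, and job counts are placeholders for the notebook's own training container and bucket.

```python
import sagemaker
from sagemaker.estimator import Estimator
from sagemaker.tuner import ContinuousParameter, HyperparameterTuner, IntegerParameter

estimator = Estimator(
    image_uri="<training-image-uri>",  # placeholder
    role="<execution-role-arn>",       # placeholder
    instance_count=1,
    instance_type="ml.m5.xlarge",
    output_path="s3://<bucket>/output",
    sagemaker_session=sagemaker.Session(),
)
tuner = HyperparameterTuner(
    estimator,
    objective_metric_name="validation:auc",
    objective_type="Maximize",
    hyperparameter_ranges={
        "learning_rate": ContinuousParameter(0.01, 0.3),
        "depth": IntegerParameter(4, 10),
    },
    metric_definitions=[{"Name": "validation:auc", "Regex": "auc: ([0-9.]+)"}],
    max_jobs=20,
    max_parallel_jobs=4,
)
tuner.fit({"train": "s3://<bucket>/train", "validation": "s3://<bucket>/validation"})
```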
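Deploying and querying the best model can be sketched as follows; the CSV payload and raw-bytes response reflect assumptions about the inference script.

```python
from sagemaker.serializers import CSVSerializer

predictor = tuner.best_estimator().deploy(  # attach the best training job
    initial_instance_count=1,
    instance_type="ml.m5.large",
    serializer=CSVSerializer(),
)
scores = predictor.predict(X_test.to_numpy())  # raw response; format depends on the inference script
predictor.delete_endpoint()                    # avoid idle-endpoint charges
```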