jarrowsm/health-insurance-cross-sell-prediction

Health Insurance Cross-Sell Prediction

This repository implements and benchmarks several algorithms for the Kaggle Insurance Cross-Selling task, including a from-scratch AdaBoost implementation, classical ML models, and AWS SageMaker automated pipelines.

00_adaboost_impl.ipynb

  • Visualise AdaBoost on synthetic 2D data and on a selected pair of sales-data features
  • Examine margins (CDF)
  • Optimise decision tree depth
  • Focus on hard samples: modify weight updates + remove easy samples
  • Optimise threshold for recall
  • Ensemble pruning: rank-, search-, and cluster-based methods
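
To make the first two bullets concrete, here is a minimal sketch of discrete AdaBoost and its normalised margins. It is not the notebook's code: it assumes labels in {-1, +1}, uses single-threshold decision stumps in place of depth-tuned trees, and every name in it (`fit_stump`, `adaboost_fit`, `margins`, `margin_cdf`) is hypothetical.

```python
import math

def stump_predict(stump, row):
    """Predict +1/-1 with a stump given as (feature index, threshold, polarity)."""
    feature, threshold, polarity = stump
    return polarity if row[feature] <= threshold else -polarity

def fit_stump(X, y, w):
    """Exhaustively search for the stump with the lowest weighted error."""
    best_err, best_stump = float("inf"), None
    for feature in range(len(X[0])):
        for threshold in sorted({row[feature] for row in X}):
            for polarity in (1, -1):
                stump = (feature, threshold, polarity)
                err = sum(wi for row, yi, wi in zip(X, y, w)
                          if stump_predict(stump, row) != yi)
                if err < best_err:
                    best_err, best_stump = err, stump
    return best_err, best_stump

def adaboost_fit(X, y, rounds=10):
    """Return a list of (alpha, stump) pairs; y must contain -1/+1 labels."""
    n = len(X)
    w = [1.0 / n] * n                              # start from uniform sample weights
    ensemble = []
    for _ in range(rounds):
        err, stump = fit_stump(X, y, w)
        err = min(max(err, 1e-12), 1 - 1e-12)      # keep the log finite
        alpha = 0.5 * math.log((1 - err) / err)    # learner weight
        ensemble.append((alpha, stump))
        # Up-weight misclassified samples, down-weight correct ones, renormalise.
        w = [wi * math.exp(-alpha * yi * stump_predict(stump, row))
             for row, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble

def margins(ensemble, X, y):
    """Normalised margin in [-1, 1] per sample; positive means correctly classified."""
    alpha_sum = sum(alpha for alpha, _ in ensemble)
    return [yi * sum(alpha * stump_predict(stump, row) for alpha, stump in ensemble) / alpha_sum
            for row, yi in zip(X, y)]

def margin_cdf(margin_values, theta):
    """Empirical CDF: fraction of samples whose margin is at most theta."""
    return sum(1 for m in margin_values if m <= theta) / len(margin_values)

# Toy 1D data that no single stump can separate (illustrative only).
X = [[v] for v in range(10)]
y = [1, 1, 1, -1, -1, -1, 1, 1, 1, -1]
model = adaboost_fit(X, y, rounds=3)
print(margin_cdf(margins(model, X, y), 0.0))       # share of training samples not classified correctly
```

Evaluating `margin_cdf` over a grid of `theta` values gives the margin CDF curve; samples with large margins are the "easy" ones that a hard-sample strategy might drop.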

01_adaboost_opt.ipynb

  • AdaBoost optimisation: test ROC AUC 0.845 → 0.874
  • Early stopping (patience + tolerance)
  • Data balancing: undersampling vs SMOTE vs hybrid
  • Feature engineering: interactions + derived features; prune via mean CV α
  • Feature encoding + selection (manual + automated)
  • Stratified k-fold CV tuning: tree depth, criterion (Gini/entropy/log loss), η, rounds, weight factor, threshold, easy + hard sample removal
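
The early-stopping rule (patience + tolerance) can be sketched as below. This is an illustration under assumptions rather than the notebook's implementation: it assumes one validation score per boosting round where higher is better, and the function name and the example scores are made up.

```python
def early_stopping_round(scores, patience=3, tol=1e-4):
    """Return (best_round, best_score) for per-round validation scores.

    A round counts as an improvement only if it beats the best score so far
    by more than `tol`. Scanning stops after `patience` consecutive rounds
    without such an improvement. Rounds are numbered from 1.
    """
    best_round, best_score = 0, float("-inf")
    stale = 0                                # rounds since the last improvement
    for round_no, score in enumerate(scores, start=1):
        if score > best_score + tol:
            best_round, best_score, stale = round_no, score, 0
        else:
            stale += 1
            if stale >= patience:
                break                        # later rounds are never examined
    return best_round, best_score

# Made-up validation scores: the late jump at round 7 is never reached.
val_scores = [0.70, 0.74, 0.76, 0.7601, 0.7599, 0.7602, 0.80]
print(early_stopping_round(val_scores, patience=3, tol=1e-3))
```

A larger `patience` trades extra boosting rounds for a lower risk of stopping on a plateau, while `tol` filters out gains too small to matter.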

02_model_bench.ipynb

  • Benchmark 13 models (linear, instance-based, tree, ensemble)
  • Benchmark data balancing methods
  • Feature selection/engineering: LightGBM importance + CFS pruning
  • CatBoost (best): SMOTE, undersampling, full vs LightGBM top-5 vs CFS top-5, Optuna tuning → ROC AUC 0.876
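
The benchmarks above are compared on ROC AUC. As a reminder of what that number measures, here is a small dependency-free sketch based on the rank-sum formulation: the probability that a randomly chosen positive scores higher than a randomly chosen negative, with ties counted as one half. The function name is hypothetical, labels are assumed to be 0/1, and in practice a library routine such as scikit-learn's `roc_auc_score` would normally be used instead.

```python
def roc_auc(y_true, scores):
    """ROC AUC via the rank-sum (Mann-Whitney U) identity; labels must be 0 or 1."""
    pairs = sorted(zip(scores, y_true))
    n = len(pairs)
    n_pos = sum(1 for label in y_true if label == 1)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("ROC AUC needs at least one positive and one negative sample")

    rank_sum_pos = 0.0
    i = 0
    while i < n:
        # Find the group of tied scores starting at position i.
        j = i
        while j < n and pairs[j][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2.0         # mean of the 1-based ranks i+1 .. j
        positives_in_group = sum(1 for k in range(i, j) if pairs[k][1] == 1)
        rank_sum_pos += avg_rank * positives_in_group
        i = j

    # Subtract the smallest possible rank sum for the positives, then normalise.
    u_statistic = rank_sum_pos - n_pos * (n_pos + 1) / 2.0
    return u_statistic / (n_pos * n_neg)

print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```

Because the metric depends only on the ranking of scores, it is unaffected by the classification threshold, which is tuned separately for recall.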

03_sagemaker_catboost_autopilot.ipynb

  • S3 + SageMaker: Kaggle API data import → SKLearnProcessor preprocessing
  • HyperparameterTuning:
    • Job 1: major AUC gain (analyse AUC, parameter effects)
    • Job 2: marginal gain (increase η + early stopping)
  • Final CatBoost: train job → endpoint → inference → AUC = 0.876
  • Autopilot benchmark: raw data → Boto3 monitoring → batch inference
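
The "Boto3 monitoring" step could look roughly like the polling loop below. This is a sketch under assumptions, not the notebook's code: it assumes the Autopilot job was started with `create_auto_ml_job` (so `describe_auto_ml_job` is the matching call), and the function name, job name, and polling interval are hypothetical. The client is passed in, so the function itself needs no AWS credentials to import.

```python
import time

# Statuses after which an Autopilot job will not change any further.
TERMINAL_STATUSES = {"Completed", "Failed", "Stopped"}

def wait_for_autopilot_job(sm_client, job_name, poll_seconds=30, max_polls=1000,
                           sleep=time.sleep):
    """Poll an Autopilot job until it reaches a terminal status and return that status.

    `sm_client` is expected to be a SageMaker client, e.g. boto3.client("sagemaker").
    `sleep` is injectable so the loop can be exercised without real waiting.
    """
    for _ in range(max_polls):
        description = sm_client.describe_auto_ml_job(AutoMLJobName=job_name)
        status = description["AutoMLJobStatus"]
        secondary = description.get("AutoMLJobSecondaryStatus", "")
        print(f"{job_name}: {status} ({secondary})")
        if status in TERMINAL_STATUSES:
            return status
        sleep(poll_seconds)
    raise TimeoutError(f"{job_name} did not finish within {max_polls} polls")

# Usage (requires boto3 and AWS credentials; the job name is a placeholder):
#   import boto3
#   status = wait_for_autopilot_job(boto3.client("sagemaker"), "my-autopilot-job")
```

Only once the returned status is `Completed` does it make sense to proceed to batch inference with the best candidate.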
