🏥 Health Plan Churn Analysis

Business question: What drives subscriber churn in health plan services — and what can retention teams do about it?

📌 Overview

Subscriber churn is one of the highest-cost problems in healthtech services. Acquiring a new client costs 5–7× more than retaining an existing one. This project uses a real-world churn dataset to:

Identify which subscriber segments are most at risk (EDA + segmentation)
Quantify the behavioral and contractual signals that precede cancellation (SQL cohort analysis)
Build a classification model that flags high-risk subscribers before they leave (ML)

The analytical stack mirrors production work at healthtech companies: SQL for cohort logic, Python for modeling, and visual storytelling throughout.

📂 Project Structure

health-churn-analysis/
│
├── README.md                  ← You are here
│
├── data/
│   └── subscribers.csv        ← Public dataset (Telco Churn, reframed as health plan)
│
├── notebooks/
│   ├── 01_eda.ipynb           ← Distributions, null analysis, churn by segment
│   ├── 02_viz.ipynb           ← Seaborn + Plotly charts, visual storytelling
│   └── 03_model.ipynb         ← Logistic Regression + Random Forest, AUC-ROC, SHAP
│
└── sql/
    ├── 01_churn_rate_by_cohort.sql   ← Monthly cohort retention table
    ├── 02_retention_by_plan_type.sql ← Churn rate by contract type
    └── 03_high_risk_segments.sql     ← Window functions to rank at-risk segments

🔑 Key Findings

(Charts will be embedded here after notebooks are complete)

Finding	Detail
Monthly plans churn 3× more than annual contracts	Switching users to semi-annual locks in retention
First 90 days are critical	42% of churn happens within the first quarter — onboarding matters most
High-engagement users almost never churn	Users with 3+ service touchpoints/month have <5% churn rate
Model AUC-ROC	`TBD after notebook 03`

🛠️ How to Run

1. Clone the repo

git clone https://github.com/vinicius-goulart/health-churn-analysis.git
cd health-churn-analysis

2. Install dependencies

uv sync

3. Run notebooks in order

jupyter notebook

Open and run: 01_eda.ipynb → 02_viz.ipynb → 03_model.ipynb

4. SQL queries

Queries in sql/ are written for PostgreSQL. To run against the dataset locally:

# Load data into a local PostgreSQL instance or use DuckDB
duckdb -c "CREATE TABLE subscribers AS SELECT * FROM 'data/subscribers.csv';"
duckdb -c ".read sql/01_churn_rate_by_cohort.sql"

📊 Dataset

Source: Telco Customer Churn — IBM Sample Data

The dataset contains ~7,000 subscribers with features including:

Contract type (month-to-month, one year, two year)
Tenure (months as a subscriber)
Monthly charges and total charges
Service usage features
Churn label (Yes/No)

Framing: Column names have been adapted to a health plan context (e.g., Contract → plan_type, tenure → months_as_subscriber) to make the business narrative more realistic and relevant to healthtech recruiting contexts.

🧰 Tech Stack

Layer	Tools
Data wrangling	Python, Pandas, NumPy
SQL analytics	PostgreSQL / DuckDB
Visualization	Seaborn, Plotly
ML modeling	scikit-learn (LogisticRegression, RandomForestClassifier)
Explainability	SHAP
Notebooks	Jupyter

🎯 Business Recommendations

Based on the analysis:

Prioritize onboarding for month-to-month subscribers — the first 90 days are where most churn occurs. A structured 30/60/90 day check-in cadence can reduce early churn by an estimated 15–20%.
Incentivize annual contract upgrades — offering a discount or added benefit at the 3-month mark for monthly subscribers targets the highest-risk window.
Deploy the churn model as a weekly scoring job — flag subscribers with >60% predicted churn probability for proactive outreach by the CS team.

👤 Author

Vinícius Goulart Nardelli
Data Analyst · São Paulo, BR
LinkedIn · GitHub

This project is part of my data analytics portfolio. All data is public and used for educational purposes.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

🏥 Health Plan Churn Analysis

📌 Overview

📂 Project Structure

🔑 Key Findings

🛠️ How to Run

1. Clone the repo

2. Install dependencies

3. Run notebooks in order

4. SQL queries

📊 Dataset

🧰 Tech Stack

🎯 Business Recommendations

👤 Author

FilesExpand file tree

README.md

Latest commit

History

README.md

File metadata and controls

🏥 Health Plan Churn Analysis

📌 Overview

📂 Project Structure

🔑 Key Findings

🛠️ How to Run

1. Clone the repo

2. Install dependencies

3. Run notebooks in order

4. SQL queries

📊 Dataset

🧰 Tech Stack

🎯 Business Recommendations

👤 Author