🏥 Health Plan Churn Analysis

Business question: What drives subscriber churn in health plan services — and what can retention teams do about it?

📌 Overview

Subscriber churn is one of the highest-cost problems in healthtech services. Acquiring a new client costs 5–7× more than retaining an existing one. This project uses a real-world churn dataset to:

Identify which subscriber segments are most at risk (EDA + segmentation)
Quantify the behavioral and contractual signals that precede cancellation (SQL cohort analysis)
Build a classification model that flags high-risk subscribers before they leave (ML)

The analytical stack mirrors production work at healthtech companies: SQL for cohort logic, Python for modeling, and visual storytelling throughout.

📂 Project Structure

health-churn-analysis/
│
├── README.md                  ← You are here
│
├── data/
│   └── subscribers.csv        ← Public dataset (Telco Churn, reframed as health plan)
│
├── notebooks/
│   ├── 01_eda.ipynb           ← Distributions, null analysis, churn by segment
│   ├── 02_viz.ipynb           ← Seaborn + Plotly charts, visual storytelling
│   └── 03_model.ipynb         ← Logistic Regression + Random Forest, AUC-ROC, SHAP
│
└── sql/
    ├── 01_churn_rate_by_cohort.sql   ← Monthly cohort retention table
    ├── 02_retention_by_plan_type.sql ← Churn rate by contract type
    └── 03_high_risk_segments.sql     ← Window functions to rank at-risk segments

🔑 Key Findings

(Charts will be embedded here after notebooks are complete)

Finding	Detail
Monthly plans churn 3× more than annual contracts	Switching users to semi-annual locks in retention
First 90 days are critical	42% of churn happens within the first quarter — onboarding matters most
High-engagement users almost never churn	Users with 3+ service touchpoints/month have <5% churn rate
Model AUC-ROC	`TBD after notebook 03`

🛠️ How to Run

1. Clone the repo

git clone https://github.com/vinicius-goulart/health-churn-analysis.git
cd health-churn-analysis

2. Install dependencies

uv sync

3. Run notebooks in order

jupyter notebook

Open and run: 01_eda.ipynb → 02_viz.ipynb → 03_model.ipynb

4. SQL queries

Queries in sql/ are written for PostgreSQL. To run against the dataset locally:

# Load data into a local PostgreSQL instance or use DuckDB
duckdb -c "CREATE TABLE subscribers AS SELECT * FROM 'data/subscribers.csv';"
duckdb -c ".read sql/01_churn_rate_by_cohort.sql"

📊 Dataset

Source: Telco Customer Churn — IBM Sample Data

The dataset contains ~7,000 subscribers with features including:

Contract type (month-to-month, one year, two year)
Tenure (months as a subscriber)
Monthly charges and total charges
Service usage features
Churn label (Yes/No)

Framing: Column names have been adapted to a health plan context (e.g., Contract → plan_type, tenure → months_as_subscriber) to make the business narrative more realistic and relevant to healthtech recruiting contexts.

🧰 Tech Stack

Layer	Tools
Data wrangling	Python, Pandas, NumPy
SQL analytics	PostgreSQL / DuckDB
Visualization	Seaborn, Plotly
ML modeling	scikit-learn (LogisticRegression, RandomForestClassifier)
Explainability	SHAP
Notebooks	Jupyter

🎯 Business Recommendations

Based on the analysis:

Prioritize onboarding for month-to-month subscribers — the first 90 days are where most churn occurs. A structured 30/60/90 day check-in cadence can reduce early churn by an estimated 15–20%.
Incentivize annual contract upgrades — offering a discount or added benefit at the 3-month mark for monthly subscribers targets the highest-risk window.
Deploy the churn model as a weekly scoring job — flag subscribers with >60% predicted churn probability for proactive outreach by the CS team.

👤 Author

Vinícius Goulart Nardelli
Data Analyst · São Paulo, BR
LinkedIn · GitHub

This project is part of my data analytics portfolio. All data is public and used for educational purposes.

Name		Name	Last commit message	Last commit date
Latest commit History 5 Commits
data		data
notebooks		notebooks
.gitignore		.gitignore
.python-version		.python-version
README.md		README.md
pyproject.toml		pyproject.toml
uv.lock		uv.lock

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

🏥 Health Plan Churn Analysis

📌 Overview

📂 Project Structure

🔑 Key Findings

🛠️ How to Run

1. Clone the repo

2. Install dependencies

3. Run notebooks in order

4. SQL queries

📊 Dataset

🧰 Tech Stack

🎯 Business Recommendations

👤 Author

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

🏥 Health Plan Churn Analysis

📌 Overview

📂 Project Structure

🔑 Key Findings

🛠️ How to Run

1. Clone the repo

2. Install dependencies

3. Run notebooks in order

4. SQL queries

📊 Dataset

🧰 Tech Stack

🎯 Business Recommendations

👤 Author

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages