This repository contains a complete, production-ready pipeline for predicting customer churn. It demonstrates a journey from initial data analysis and feature engineering to model training, deployment, and robust monitoring. The project emphasizes a data-centric approach and showcases an innovative vision for an LLM-augmented MLOps pipeline.
The final model is a CatBoost classifier trained on a challenging, synthetically engineered dataset designed to mirror real-world complexity.
To get started with the project, run the following commands:

```bash
pip install -r requirements-dev.txt
jupyter notebook 1_data_preprocessing.ipynb
```

- End-to-End Workflow: Covers every step from data preprocessing and feature engineering to model training and deployment.
- Advanced Model Training: Utilizes CatBoost for high performance and Optuna for hyperparameter optimization.
- Model Explainability: Integrates SHAP for understanding model predictions.
- Production-Ready API: A FastAPI application (`app.py`) serves the trained model for real-time predictions.
- Containerized Deployment: Includes a `Dockerfile` for easy containerization and deployment.
- Robust Monitoring: Implements a monitoring script using `evidently` to detect data drift and ensure model health.
- Innovative MLOps Concept: Proposes a next-generation, LLM-augmented monitoring system for automated root cause analysis and proactive testing.
The repository is organized into notebooks, scripts, and artifacts, providing a clear and reproducible workflow.
- `1_data_preprocessing.ipynb`: Notebook for data loading, cleaning, extensive feature engineering, and creating the final, analysis-ready dataset.
- `2_model_training.ipynb`: Notebook for model training, hyperparameter optimization (using Optuna), evaluation, and SHAP-based explainability.
- `3_deployment_and_monitoring.ipynb`: Notebook that defines the production API with FastAPI, documents the scalable architecture, and implements the final, robust monitoring script with `evidently`.
- `app.py`: The Python script for the FastAPI prediction service.
- `Dockerfile`: Defines the container for deploying the FastAPI application.
- `requirements-deploy.txt` / `requirements-dev.txt`: Python dependencies for deployment and development, respectively.
- `.env.example`: A template for providing API keys for LLM providers used by the advanced monitoring features.
- `artifacts/`: Directory containing all output files, including the trained model, preprocessor, evaluation reports, and data files.
- `README_SUMMARY.md`: A detailed technical report summarizing the project's journey, decisions, and architecture.
This project proposes a forward-thinking MLOps design where Large Language Models (LLMs) are used to create a self-analyzing system. Instead of just flagging data drift, the pipeline can:
- Detect & Export: Automatically run an `evidently` monitoring pipeline and export a machine-readable `drift_report.json`.
- Reason & Analyze: Use an LLM agent to perform a root cause analysis on the drift report, identifying the "why" behind the issue.
- Act & Triage: Programmatically create Jira tickets and Slack alerts based on the LLM's structured analysis.
- Test & Qualify: Generate new, targeted test cases with an "Adversarial Tester" LLM and use an "LLM Judge" to get qualitative insights into model performance on new data cohorts.
This creates a full, automated loop: Detect -> Reason -> Recommend -> Test -> Qualify, representing a next-generation approach to building and maintaining machine learning systems.
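A skeleton of that loop might look like the following. The `call_llm_analyst` function, the action strings, and the report fields are stubs and assumptions standing in for the real LLM, Jira, and Slack integrations, and the JSON shape is not `evidently`'s exact export schema.

```python
import json

def call_llm_analyst(drift_report: dict) -> dict:
    """Stub for the Reason step; a real system would prompt an LLM with the
    drift report and parse a structured JSON answer back."""
    drifted = [c["column"] for c in drift_report.get("columns", []) if c.get("drifted")]
    return {
        "root_cause": f"Drift detected in: {', '.join(drifted) or 'none'}",
        "severity": "high" if len(drifted) >= 2 else "low",
    }

def triage(analysis: dict) -> list[str]:
    """Turn the structured analysis into actions (Jira/Slack stand-ins)."""
    actions = [f"slack:alert:{analysis['root_cause']}"]
    if analysis["severity"] == "high":
        actions.append("jira:create_ticket:investigate-drift")
    return actions

def run_monitoring_loop(report_json: str) -> list[str]:
    # Detect & Export happens upstream (evidently writes drift_report.json);
    # this function only covers Reason & Analyze and Act & Triage.
    report = json.loads(report_json)
    return triage(call_llm_analyst(report))
```

Because the LLM's answer is forced into a structured dict, the triage step stays deterministic and unit-testable even though the analysis text itself is model-generated.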
The project is designed with scalability in mind, proposing an architecture that can handle millions of daily predictions using technologies like Kafka for data streaming, Spark/Dask for distributed processing, and a Kubernetes-hosted Triton Inference Server for high-performance, auto-scaling model serving.
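At that scale, a streaming consumer would typically micro-batch incoming events before calling the inference server, so Triton sees fewer, larger requests. Below is a small, self-contained batching helper; the Kafka/Triton wiring is left as comments because the topic name, model name, and client setup are assumptions, not part of this repository.

```python
from typing import Iterable, Iterator

def micro_batches(events: Iterable[dict], batch_size: int) -> Iterator[list[dict]]:
    """Group a stream of prediction requests into fixed-size batches so the
    inference server handles fewer, larger calls."""
    batch: list[dict] = []
    for event in events:
        batch.append(event)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # flush the final partial batch
        yield batch

# Sketch of the surrounding wiring (hypothetical names, not runnable as-is):
#   consumer = confluent_kafka.Consumer({...})   # subscribed to a churn-events topic
#   for batch in micro_batches(poll_events(consumer), 256):
#       triton_client.infer("churn_model", inputs_from(batch))
```

A fixed batch size is the simplest policy; a production consumer would usually also flush on a timeout so low-traffic periods do not delay predictions indefinitely.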