
# Telco Customer Churn Prediction


An end-to-end machine learning pipeline for predicting customer churn, built with scikit-learn, XGBoost, LightGBM, CatBoost, FastAPI, Next.js, and deployed on Google Cloud Run and Vercel.




## Live Deployments

| Service | URL | Description |
| --- | --- | --- |
| Frontend | churn-prediction-frontend.vercel.app | Next.js dashboard |
| API | churn-prediction-api.run.app | FastAPI prediction service |
| API Docs | `/docs` | Interactive Swagger UI |
| Health Check | `/health` | Service health status |

## System Architecture

### High-Level Architecture

```mermaid
graph TB
    subgraph "Data Layer"
        A[IBM Telco Dataset] --> B[Data Loader]
        B --> C[Data Cleaner]
        C --> D[Feature Engineer]
    end

    subgraph "ML Pipeline"
        D --> E[Data Splitter]
        E --> F[Model Training]
        F --> G[Model Evaluation]
        G --> H[Best Model Selection]
    end

    subgraph "Deployment"
        H --> I[Docker Container]
        I --> J[Google Artifact Registry]
        J --> K[Google Cloud Run]
    end

    subgraph "Frontend"
        L[Next.js App] --> M[Vercel]
        M --> K
    end

    subgraph "CI/CD"
        N[GitHub Actions] --> I
        N --> L
    end
```

### Request Flow

```mermaid
sequenceDiagram
    participant User
    participant Frontend as Frontend (Vercel)
    participant API as API (Cloud Run)
    participant Model as ML Model

    User->>Frontend: Enter customer data
    Frontend->>API: POST /predict
    API->>Model: Load and predict
    Model-->>API: Prediction result
    API-->>Frontend: JSON response
    Frontend-->>User: Display churn probability
```

## Model Performance

**Best Model:** Logistic Regression (ROC-AUC: 0.8417)

### Model Comparison

| Rank | Model | ROC-AUC | F1-Score | Recall | Precision | Accuracy |
| --- | --- | --- | --- | --- | --- | --- |
| 1 | Logistic Regression | 0.8417 | 0.6173 | 0.7914 | 0.5060 | 0.7395 |
| 2 | MLP Neural Network | 0.8411 | 0.5892 | 0.6310 | 0.5528 | 0.7722 |
| 3 | Random Forest | 0.8407 | 0.6309 | 0.7540 | 0.5423 | 0.7658 |
| 4 | CatBoost | 0.8403 | 0.6198 | 0.8021 | 0.5051 | 0.7388 |
| 5 | LightGBM | 0.8357 | 0.6256 | 0.7460 | 0.5386 | 0.7630 |
| 6 | Voting Ensemble | 0.8332 | 0.6168 | 0.7273 | 0.5351 | 0.7587 |
| 7 | XGBoost | 0.8209 | 0.6168 | 0.7059 | 0.5477 | 0.7672 |
| 8 | SVM | 0.7962 | 0.6192 | 0.7914 | 0.5086 | 0.7417 |
| 9 | Decision Tree | 0.7573 | 0.5661 | 0.7273 | 0.4634 | 0.7040 |
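The F1-Score column is the harmonic mean of the precision and recall columns, so the table can be sanity-checked directly (values below are the Logistic Regression row):

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# Logistic Regression row: precision 0.5060, recall 0.7914
print(round(f1_score(0.5060, 0.7914), 4))  # → 0.6173, matching the table
```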

### Model Performance Visualization

```mermaid
xychart-beta
    title "Model ROC-AUC Comparison"
    x-axis ["LR", "MLP", "RF", "CatBoost", "LGBM", "Voting", "XGB", "SVM", "DT"]
    y-axis "ROC-AUC Score" 0.70 --> 0.90
    bar [0.8417, 0.8411, 0.8407, 0.8403, 0.8357, 0.8332, 0.8209, 0.7962, 0.7573]
```

### Top Churn Predictors (SHAP Feature Importance)

```mermaid
xychart-beta
    title "SHAP Feature Importance"
    x-axis ["Contract-Tenure", "Tenure-Charge", "Fiber Optic", "Monthly Charges", "E-Check"]
    y-axis "Importance Score" 0 --> 0.70
    bar [0.61, 0.24, 0.23, 0.21, 0.15]
```

## Business Metrics

- **Dataset:** 7,043 customers (IBM Telco dataset)
- **Churn Rate:** 26.5%
- **Model Recall:** ~79% (identifies roughly 4 out of 5 churners)
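Back-of-envelope, those figures translate into absolute customer counts as follows (a rough sketch from the stated rate and the Logistic Regression recall, not pipeline output):

```python
customers = 7043
churn_rate = 0.265
recall = 0.7914  # Logistic Regression recall from the comparison table

churners = round(customers * churn_rate)  # expected churners in the data
caught = round(churners * recall)         # churners the model would flag
print(churners, caught)  # → 1866 1477
```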

## Quick Start

### Prerequisites

- Python 3.10+
- Node.js 18+ (for the frontend)
- Docker (optional)

### Installation

```bash
# Clone the repository
git clone https://github.com/WWI2196/telecom-churn-prediction.git
cd telecom-churn-prediction

# Create a virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt
```

### Train Models

```bash
# Run the full training pipeline
python src/train_pipeline.py

# Or run a comprehensive model comparison
python src/compare_all_models.py
```

### Start API Server

```bash
# Launch the FastAPI server
uvicorn src.api.main:app --reload --port 8000

# Access API docs at http://localhost:8000/docs
```

### Run Frontend

```bash
cd frontend
npm install
npm run dev

# Access the dashboard at http://localhost:3000
```

## Project Structure

```text
telecom-churn-prediction/
├── .github/workflows/          # CI/CD pipelines
│   ├── ci.yml                  # Main CI/CD pipeline
│   ├── mlops.yml               # MLOps pipeline
│   └── frontend.yml            # Frontend deployment
├── config/
│   └── config.yaml             # Central configuration
├── data/
│   ├── raw/                    # Original dataset
│   ├── processed/              # Cleaned data
│   └── interim/                # Intermediate files
├── docs/                       # Documentation
│   ├── CI_CD_SETUP.md          # CI/CD guide
│   ├── MLOPS_PIPELINE.md       # MLOps documentation
│   ├── CONVERGENCE_GUIDE.md    # ML convergence guide
│   └── FEATURE_SELECTION_GUIDE.md
├── frontend/                   # Next.js dashboard
│   ├── src/
│   │   ├── app/                # App router pages
│   │   ├── components/         # UI components
│   │   └── lib/                # Utilities and API client
│   └── package.json
├── models/
│   ├── baseline/               # Classic ML models
│   ├── advanced/               # Boosting and DL models
│   └── best_model/             # Production model
├── reports/
│   ├── figures/                # Visualizations
│   └── metrics/                # Performance metrics
├── src/
│   ├── api/                    # FastAPI application
│   ├── data/                   # Data processing
│   ├── features/               # Feature engineering
│   ├── models/                 # Model classes
│   ├── visualization/          # Plotting utilities
│   └── train_pipeline.py       # Training orchestrator
├── tests/                      # Unit tests
├── Dockerfile.api              # API container
├── Dockerfile.streamlit        # Dashboard container
├── docker-compose.yml          # Local development
└── requirements.txt
```

## CI/CD Pipeline

### Pipeline Overview

```mermaid
flowchart LR
    subgraph "CI/CD Pipeline"
        A[Push to Main] --> B[Lint]
        B --> C[Test]
        C --> D[Security Scan]
        D --> E[Build Docker]
        E --> F[Trivy Scan]
        F --> G[Push to GHCR]
        G --> H[Push to Artifact Registry]
        H --> I[Deploy to Cloud Run]
    end
```

### Detailed Pipeline Stages

```mermaid
flowchart TB
    subgraph "Stage 1: Code Quality"
        A1[Flake8 Linting]
        A2[Black Formatting]
        A3[isort Import Sorting]
    end

    subgraph "Stage 2: Testing"
        B1[Unit Tests]
        B2[Coverage Report]
        B3[Upload to Codecov]
    end

    subgraph "Stage 3: Security"
        C1[pip-audit]
        C2[Dependency Check]
    end

    subgraph "Stage 4: Build"
        D1[Build API Image]
        D2[Build Dashboard Image]
        D3[Save Artifacts]
    end

    subgraph "Stage 5: Container Security"
        E1[Trivy Container Scan]
        E2[Trivy IaC Scan]
        E3[SARIF Upload]
    end

    subgraph "Stage 6: Push"
        F1[Push to GHCR]
        F2[Tag with SHA]
    end

    subgraph "Stage 7: Deploy"
        G1[Push to Artifact Registry]
        G2[Deploy to Cloud Run]
        G3[Health Check]
    end

    A1 --> B1
    A2 --> B1
    A3 --> B1
    B1 --> C1
    C1 --> D1
    D1 --> E1
    E1 --> F1
    F1 --> G1
```

### Workflow Files

| Workflow | File | Trigger | Purpose |
| --- | --- | --- | --- |
| CI/CD Pipeline | `ci.yml` | Push to main/develop | Build, test, deploy API |
| MLOps Pipeline | `mlops.yml` | Push, weekly schedule | Train and validate models |
| Frontend CI/CD | `frontend.yml` | Changes to `frontend/` | Deploy to Vercel |

### GitHub Secrets Required

| Secret | Description |
| --- | --- |
| `GCP_PROJECT_ID` | Google Cloud project ID |
| `GCP_SA_KEY` | Service account JSON key |
| `VERCEL_TOKEN` | Vercel deployment token |
| `VERCEL_ORG_ID` | Vercel organization ID |
| `VERCEL_PROJECT_ID` | Vercel project ID |

### GitHub Variables

| Variable | Description |
| --- | --- |
| `API_URL` | Production API URL for the frontend |

## MLOps Pipeline

### Pipeline Overview

```mermaid
flowchart TB
    subgraph "Data Ingestion"
        A[Download Dataset] --> B[Data Validation]
        B --> C[Data Cleaning]
    end

    subgraph "Feature Engineering"
        C --> D[Feature Creation]
        D --> E[Feature Selection]
        E --> F[Data Splitting]
    end

    subgraph "Model Training"
        F --> G[Baseline Models]
        F --> H[Advanced Models]
        G --> I[Model Evaluation]
        H --> I
    end

    subgraph "Model Selection"
        I --> J{ROC-AUC >= 0.75?}
        J -->|Yes| K[Save Best Model]
        J -->|No| L[Training Failed]
    end

    subgraph "Deployment"
        K --> M[Build Container]
        M --> N[Deploy to Cloud Run]
    end

    subgraph "Monitoring"
        N --> O[Performance Tracking]
        O --> P[Drift Detection]
        P -->|Drift Detected| A
    end
```
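The model-selection gate above reduces to a few lines of logic. A minimal sketch (model names and scores taken from the comparison table; the threshold mirrors `thresholds.roc_auc_minimum` in `config.yaml`):

```python
ROC_AUC_MINIMUM = 0.75  # thresholds.roc_auc_minimum in config.yaml

def select_best_model(results: dict[str, float]) -> str:
    """Pick the model with the highest ROC-AUC; fail the run below threshold."""
    best_name, best_score = max(results.items(), key=lambda kv: kv[1])
    if best_score < ROC_AUC_MINIMUM:
        raise RuntimeError(
            f"Training failed: best ROC-AUC {best_score:.4f} < {ROC_AUC_MINIMUM}"
        )
    return best_name

scores = {"Logistic Regression": 0.8417, "MLP": 0.8411, "Random Forest": 0.8407}
print(select_best_model(scores))  # → Logistic Regression
```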

### Training Pipeline Detail

```mermaid
flowchart LR
    subgraph "Data Processing"
        A[Raw Data<br/>7,043 rows] --> B[Clean Data<br/>Remove nulls]
        B --> C[Engineer Features<br/>+15 new features]
    end

    subgraph "Splitting"
        C --> D[Train Set<br/>60%]
        C --> E[Validation Set<br/>20%]
        C --> F[Test Set<br/>20%]
    end

    subgraph "Training"
        D --> G[Train 9 Models]
        E --> H[Hyperparameter Tuning]
        G --> H
    end

    subgraph "Evaluation"
        H --> I[Evaluate on Test]
        F --> I
        I --> J[Select Best Model]
    end
```
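The 60/20/20 split can be sketched as shuffling the row indices and slicing. This is a simplified illustration; the real pipeline presumably stratifies by the churn label:

```python
import random

def three_way_split(n_rows: int, val_frac: float = 0.2,
                    test_frac: float = 0.2, seed: int = 42):
    """Shuffle row indices, then carve off test and validation sets."""
    idx = list(range(n_rows))
    random.Random(seed).shuffle(idx)
    n_test = int(n_rows * test_frac)
    n_val = int(n_rows * val_frac)
    test = idx[:n_test]
    val = idx[n_test:n_test + n_val]
    train = idx[n_test + n_val:]
    return train, val, test

train, val, test = three_way_split(7043)
print(len(train), len(val), len(test))  # → 4227 1408 1408
```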

### Model Training Workflow

```mermaid
stateDiagram-v2
    [*] --> DataIngestion
    DataIngestion --> DataValidation
    DataValidation --> DataCleaning
    DataCleaning --> FeatureEngineering
    FeatureEngineering --> DataSplitting
    DataSplitting --> ModelTraining
    ModelTraining --> ModelEvaluation
    ModelEvaluation --> ThresholdCheck
    ThresholdCheck --> ModelDeployment: Pass
    ThresholdCheck --> TrainingFailed: Fail
    ModelDeployment --> Monitoring
    Monitoring --> [*]
    TrainingFailed --> [*]
```

## API Documentation

### Endpoints

| Method | Endpoint | Description |
| --- | --- | --- |
| GET | `/health` | Health check and model status |
| POST | `/predict` | Single customer prediction |
| POST | `/predict/batch` | Batch predictions |
| GET | `/docs` | Interactive API documentation |

### Request/Response Flow

```mermaid
sequenceDiagram
    participant Client
    participant FastAPI
    participant Validator
    participant Model
    participant Response

    Client->>FastAPI: POST /predict
    FastAPI->>Validator: Validate input
    Validator-->>FastAPI: Valid
    FastAPI->>Model: Load model
    Model->>Model: Preprocess features
    Model->>Model: Generate prediction
    Model-->>FastAPI: Prediction result
    FastAPI->>Response: Format JSON
    Response-->>Client: Return prediction
```

### Example Request

```bash
curl -X POST "https://<your-service>.run.app/predict" \
  -H "Content-Type: application/json" \
  -d '{
    "tenure": 12,
    "MonthlyCharges": 75.50,
    "TotalCharges": 906.0,
    "Contract": "Month-to-month",
    "InternetService": "Fiber optic",
    "PaymentMethod": "Electronic check",
    "gender": "Male",
    "SeniorCitizen": 0,
    "Partner": "No",
    "Dependents": "No",
    "PhoneService": "Yes",
    "MultipleLines": "No",
    "OnlineSecurity": "No",
    "OnlineBackup": "No",
    "DeviceProtection": "No",
    "TechSupport": "No",
    "StreamingTV": "No",
    "StreamingMovies": "No",
    "PaperlessBilling": "Yes"
  }'
```
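The same request can be issued from Python using only the standard library (the URL stays a placeholder, as in the curl example):

```python
import json
import urllib.request

API_URL = "https://<your-service>.run.app/predict"  # placeholder, as above

customer = {
    "tenure": 12,
    "MonthlyCharges": 75.50,
    "TotalCharges": 906.0,
    "Contract": "Month-to-month",
    "InternetService": "Fiber optic",
    "PaymentMethod": "Electronic check",
    # ... remaining fields as in the curl example
}

req = urllib.request.Request(
    API_URL,
    data=json.dumps(customer).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)

# Executing the call requires a live service:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp))
```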

### Example Response

```json
{
  "customer_id": "CUST-001",
  "churn_probability": 0.72,
  "churn_prediction": "Yes",
  "risk_level": "High",
  "confidence": "85%"
}
```
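The `risk_level` field is a coarse bucketing of the churn probability. The cut-offs below are illustrative assumptions, not the service's actual thresholds:

```python
def risk_level(churn_probability: float) -> str:
    """Bucket a churn probability into a coarse risk label (illustrative cut-offs)."""
    if churn_probability >= 0.6:
        return "High"
    if churn_probability >= 0.3:
        return "Medium"
    return "Low"

print(risk_level(0.72))  # → High, as in the example response
```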

## Docker Deployment

### Local Development

```bash
# Build and run all services
docker-compose up -d

# Services available:
# - API:       http://localhost:8000
# - Dashboard: http://localhost:8501
```

### Container Architecture

```mermaid
graph TB
    subgraph "Docker Compose Stack"
        A[docker-compose.yml]

        subgraph "API Container"
            B[FastAPI]
            C[Uvicorn]
            D[ML Model]
        end

        subgraph "Dashboard Container"
            E[Streamlit]
            F[Visualization]
        end

        subgraph "Monitoring Stack"
            G[Prometheus]
            H[Grafana]
        end
    end

    A --> B
    A --> E
    A --> G
    B --> D
    G --> B
    H --> G
```

### Production Deployment

```bash
# Full MLOps stack with monitoring
docker-compose -f docker-compose.monitoring.yml up -d

# Services:
# - API:        http://localhost:8000
# - Dashboard:  http://localhost:8501
# - Prometheus: http://localhost:9090
# - Grafana:    http://localhost:3000
```
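A deployment isn't really done until `/health` answers. A small stdlib polling helper, as a sketch (retry count and delay are assumptions, not pipeline settings):

```python
import time
import urllib.error
import urllib.request

def wait_until_healthy(url: str, attempts: int = 10, delay: float = 3.0) -> bool:
    """Poll a health endpoint until it returns HTTP 200 or attempts run out."""
    for _ in range(attempts):
        try:
            with urllib.request.urlopen(url, timeout=5) as resp:
                if resp.status == 200:
                    return True
        except (urllib.error.URLError, TimeoutError):
            pass  # not up yet; retry after the delay
        time.sleep(delay)
    return False

# wait_until_healthy("http://localhost:8000/health")
```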

## Monitoring

### Monitoring Architecture

```mermaid
graph LR
    subgraph "Application"
        A[FastAPI] --> B[Metrics Endpoint]
    end

    subgraph "Collection"
        B --> C[Prometheus]
    end

    subgraph "Visualization"
        C --> D[Grafana]
    end

    subgraph "Alerting"
        C --> E[AlertManager]
        E --> F[Email/Slack]
    end
```

### Metrics Tracked

```mermaid
mindmap
  root((Metrics))
    Model Performance
      ROC-AUC
      Precision
      Recall
      F1-Score
    API Health
      Request Rate
      Latency P50/P95
      Error Rate
      Active Connections
    Data Quality
      Feature Drift
      Prediction Distribution
      Missing Values
    Infrastructure
      CPU Usage
      Memory Usage
      Container Health
```
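The latency percentiles in that map (P50/P95) are plain order statistics. Prometheus estimates them from histograms; a minimal pure-Python version just shows the idea:

```python
import math

def percentile(samples: list, pct: float) -> float:
    """Nearest-rank percentile: smallest sample covering pct% of the distribution."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]

latencies_ms = [12, 15, 11, 240, 18, 14, 620, 16, 13, 17]
print(percentile(latencies_ms, 50), percentile(latencies_ms, 95))  # → 15 620
```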

### Alert Rules

| Alert | Condition | Severity |
| --- | --- | --- |
| Model Accuracy Drop | ROC-AUC < 0.75 | Critical |
| High API Latency | P95 > 500 ms | Warning |
| Data Drift Detected | PSI > 0.2 | Warning |
| High Error Rate | > 5% of requests | Critical |
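The PSI in the drift rule is the Population Stability Index, which compares a feature's live distribution against its training-time baseline over matching bins. A compact sketch over pre-binned fractions (bin choices are up to the pipeline; the sample distributions below are made up):

```python
import math

def psi(expected: list, actual: list, eps: float = 1e-6) -> float:
    """Population Stability Index over matching bins of two distributions."""
    assert len(expected) == len(actual)
    total = 0.0
    for e, a in zip(expected, actual):
        e, a = max(e, eps), max(a, eps)  # guard against empty bins
        total += (a - e) * math.log(a / e)
    return total

baseline = [0.25, 0.25, 0.25, 0.25]  # training-time feature distribution
today = [0.10, 0.20, 0.30, 0.40]     # live traffic distribution
print(round(psi(baseline, today), 3))  # → 0.228, above the 0.2 alert threshold
```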

## Documentation

| Document | Description |
| --- | --- |
| CI/CD Setup Guide | Complete CI/CD pipeline documentation |
| MLOps Pipeline Guide | ML pipeline architecture and configuration |
| Convergence Guide | Understanding model convergence |
| Feature Selection Guide | Feature engineering documentation |
| Models Guide | Model comparison and usage |
| API Reference | Interactive API documentation (see the deployed `/docs` endpoint) |

## Configuration

Edit `config/config.yaml` to customize:

```yaml
# Model settings
models:
  random_state: 42

# Data splitting
splitting:
  test_size: 0.2
  validation_size: 0.2

# Optimization
optimization:
  n_trials: 100
  cv_folds: 5

# Thresholds
thresholds:
  roc_auc_minimum: 0.75
```

## Testing

```bash
# Run all tests
pytest tests/ -v

# Run with coverage
pytest tests/ --cov=src --cov-report=html

# Run a specific test file
pytest tests/unit/test_data_loader.py -v
```
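New tests follow the plain pytest style. For instance, a hypothetical check on the TotalCharges cleaning step (blank strings for brand-new customers are a well-known quirk of this IBM dataset; the function name here is illustrative, not the repository's actual API):

```python
def coerce_total_charges(raw: list):
    """Convert TotalCharges strings to floats; blanks (new customers) become None."""
    return [float(v) if v.strip() else None for v in raw]

def test_blank_total_charges_become_none():
    assert coerce_total_charges(["906.0", " ", "29.85"]) == [906.0, None, 29.85]
```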

## License

This project is licensed under the MIT License; see the `LICENSE` file for details.

