Skip to content

Tdoggie/telecom-chrun-predictor

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

27 Commits
 
 
 
 
 
 
 
 
 
 

Repository files navigation

📞 SyriaTel Customer Churn Prediction

This README provides a high-level overview of the project approach and key findings. For detailed analysis, step-by-step methodology, code implementation, and comprehensive visualizations, please refer to the complete notebook.

🎯 Project Overview

This project develops a comprehensive machine learning solution to predict customer churn for SyriaTel, a telecommunications company. By identifying customers who are likely to discontinue their service, the company can proactively implement retention strategies to reduce revenue loss and improve customer satisfaction.

Business Problem: Customer churn is a critical challenge in the telecommunications industry, where acquiring new customers is significantly more expensive than retaining existing ones. SyriaTel needs to:

Technical Objective: Develop a binary classification model that can accurately predict which customers are likely to churn based on their usage patterns, service history, and demographic information. The model should prioritize:

  • High recall to catch most churning customers
  • Balanced precision to avoid excessive false positives
  • Interpretability to provide actionable business insights

📊 Dataset Description

The analysis uses telecommunications customer data containing:

  • Customer demographics and account information
  • Service usage patterns (calls, minutes, charges across different time periods)
  • Plan features and service options
  • Customer service interactions
  • Target variable: Binary churn indicator (churned/not churned)

🔬 Data Science Methodology

1️⃣ Data Understanding & Exploration

  • Initial Assessment: Dataset structure, missing values, and basic statistics
  • Target Analysis: Churn distribution and class imbalance evaluation
  • Feature Analysis: Identification of numerical vs. categorical variables

2️⃣ Exploratory Data Analysis (EDA)

  • Correlation Analysis: Understanding relationships between features
  • Distribution Analysis: Examining feature distributions by churn status
  • Categorical Analysis: Cross-tabulation of categorical features with churn
  • Pattern Discovery: Identifying key indicators of customer churn

3️⃣ Feature Engineering

  • New Feature Creation:
    • Average call duration metrics
    • Total usage aggregations across time periods
    • Usage intensity indicators
  • Categorical Encoding: Label encoding for categorical variables
  • Data Preprocessing: Handling data types and missing values

4️⃣ Feature Selection

  • Statistical Selection: Using F-score to identify top predictive features
  • Dimensionality Reduction: Selecting the most relevant features for modeling
  • Feature Importance Analysis: Understanding which factors drive churn

5️⃣ Model Development & Evaluation

  • Multiple Algorithm Testing:
    • Logistic Regression (with scaling)
    • Random Forest Classifier
    • Gradient Boosting Classifier
    • Decision Tree Classifier
  • Cross-Validation: 5-fold stratified cross-validation for robust evaluation
  • Performance Metrics: ROC-AUC, precision, recall, F1-score, and accuracy

6️⃣ Model Optimization

  • Hyperparameter Tuning: Grid search optimization for the best performing model
  • Performance Comparison: Evaluating improvements from tuning
  • Final Model Selection: Choosing the optimal model configuration

7️⃣ Business Impact Analysis

  • Confusion Matrix Interpretation: Understanding true/false positives and negatives
  • Cost-Benefit Analysis: Evaluating business impact of model predictions
  • Actionable Insights: Translating model results into business recommendations

🔍 Model Interpretability

The model provides insights into:

  • Feature Importance: Which customer characteristics most strongly predict churn
  • Risk Segmentation: How to categorize customers by churn probability
  • Business Drivers: Understanding the root causes of customer churn
  • Intervention Points: When and how to implement retention strategies

📈 Model Performance

The final model achieves:

  • ROC-AUC Score: Measures overall model discrimination ability
  • Precision: Percentage of predicted churners who actually churn
  • Recall: Percentage of actual churners correctly identified
  • F1-Score: Balanced measure of precision and recall

Specific performance metrics will be displayed upon running the complete pipeline.

💡 Business Recommendations

Based on the model results, the following strategies are recommended:

Immediate Actions

  1. Target High-Risk Customers: Focus retention efforts on customers identified by the model
  2. Proactive Outreach: Contact customers with high churn probability before they leave
  3. Personalized Offers: Design retention campaigns based on key churn factors

⚙️ Technical Requirements

# Core Libraries
pandas >= 1.3.0
numpy >= 1.21.0
scikit-learn >= 1.0.0
matplotlib >= 3.4.0
seaborn >= 0.11.0

# Additional Requirements
warnings (built-in)

📁 File Structure

project_directory/
│
├── Data/
│   └── telecom.csv          # Dataset file
├── index.ipynb              # Jupyter Notebook
├── README.md                # This file
└── presentation.pdf         # Non-technical Presentation

🚀 Usage Instructions

  1. Data Preparation: Ensure telecom.csv is in the Data/ directory
  2. Environment Setup: Install required libraries
  3. Run Analysis: Execute the Python script
  4. Review Results: Examine generated visualizations and performance metrics
  5. Business Implementation: Use recommendations for retention strategy

About

A comprehensive machine learning project that predicts customer churn for SyriaTel telecommunications company. Using Python and scikit-learn, this end-to-end data science pipeline analyzes customer usage patterns to identify at-risk customers, enabling proactive retention strategies and reducing revenue loss through targeted interventions.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors