CTGAN-ENN: A tabular GAN-based Hybrid Sampling Method for Imbalanced and Overlapped Data in Customer Churn Prediction

This is our PhD research project at the AIDA (Applied Intelligence and Data Analytics) lab, College of Computing, Khon Kaen University, Thailand.

Overview

CTGAN is a tabular GAN-based oversampling method that addresses class imbalance but suffers from class overlap. We combined CTGAN with the ENN (Edited Nearest Neighbours) under-sampling technique to overcome this overlap. CTGAN-ENN reduced the amount of class overlap for each feature in all datasets.
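
To illustrate the under-sampling half of the method, here is a minimal pure-Python sketch of the ENN rule: a sample is dropped when its label disagrees with the majority label of its k nearest neighbours. This is not the code used by the ctganenn package. The function name `edited_nearest_neighbours`, the Euclidean distance, k = 3, and the toy data are all assumptions chosen for illustration. In CTGAN-ENN, a cleaning step of this kind would run after CTGAN has generated the synthetic minority rows.

```python
import math
from collections import Counter


def edited_nearest_neighbours(X, y, k=3):
    """Return (X_kept, y_kept) after one pass of the ENN rule.

    A sample is kept only if its label matches the majority label
    among its k nearest neighbours (Euclidean distance, itself excluded).
    """
    keep = []
    for i, xi in enumerate(X):
        # Distances from sample i to every other sample, nearest first.
        dists = sorted(
            (math.dist(xi, xj), j) for j, xj in enumerate(X) if j != i
        )
        neighbour_labels = [y[j] for _, j in dists[:k]]
        majority_label = Counter(neighbour_labels).most_common(1)[0][0]
        if majority_label == y[i]:
            keep.append(i)
    return [X[i] for i in keep], [y[i] for i in keep]


# Toy data (hypothetical): two clusters, plus one class-1 point
# sitting inside the class-0 cluster, i.e. an overlapping sample.
X = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (0.15, 0.15),
     (5.0, 5.0), (5.1, 5.2), (5.2, 5.1), (4.9, 5.0)]
y = [0, 0, 0, 1, 1, 1, 1, 1]

X_clean, y_clean = edited_nearest_neighbours(X, y, k=3)
print(len(X), "->", len(X_clean))
```

The overlapping point at (0.15, 0.15) is removed because all three of its nearest neighbours belong to the other class, while the well-separated samples are kept.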

The Result

Best F1-Score (0.994) on the Mobile dataset with the Random Forest algorithm
Best AUC (1.000) on the Mobile dataset with the XGBoost algorithm
Best G-Mean (0.984) on the Telco 2 dataset with the Random Forest and Gradient Boosting algorithms
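
For readers unfamiliar with these metrics, the sketch below computes F1-Score and G-Mean from confusion-matrix counts. It assumes the usual binary definitions, with G-Mean as the square root of sensitivity times specificity; the source does not spell out its formulas. The function names and the counts are hypothetical and are not results from this project. AUC is left out because it needs ranked prediction scores rather than counts.

```python
import math


def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall for the positive (churn) class."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def g_mean(tp, fp, tn, fn):
    """Geometric mean of sensitivity (recall on churn) and specificity."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return math.sqrt(sensitivity * specificity)


# Hypothetical confusion-matrix counts, for illustration only.
tp, fp, tn, fn = 80, 10, 890, 20

f1 = f1_score(tp, fp, fn)
gm = g_mean(tp, fp, tn, fn)
print(round(f1, 3), round(gm, 3))
```

G-Mean drops sharply when either class is poorly recognised, which is why it is commonly reported alongside F1-Score on imbalanced data.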

CTGAN-ENN visualization in Bank Dataset

In the visualization of the Bank dataset, CTGAN-ENN clearly separates the two customer churn classes, blue (not churn) and red (churn), which makes the data easier for machine learning algorithms to learn.

Installation

Install CTGAN-ENN using pip:

pip install ctganenn

Usage

Variables

minClass: the minority-class rows of the dataset (dataframe).
majClass: the majority-class rows of the dataset (dataframe).
genData: the number of rows to generate from the minority class (int).
targetLabel: the name of the target label column in the dataset (string).

Example Usage

from ctganenn import CTGANENN

Use the CTGANENN function with its four variables:

X, y = CTGANENN(minClass, majClass, genData, targetLabel)
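
One possible way to prepare the four inputs is to split a labelled dataframe on its target column, as sketched below with pandas. The toy data, the column names (`tenure`, `monthly_charges`, `churn`), and the choice of `genData` as the gap between the two class sizes are assumptions for illustration, not requirements of the package. The sketch also assumes both dataframes keep the target column, which the `targetLabel` argument suggests but the source does not state. The CTGANENN call is left commented out because it needs the ctganenn package installed.

```python
import pandas as pd

# Made-up toy data; the column names are hypothetical.
df = pd.DataFrame({
    "tenure": [1, 24, 36, 2, 48, 60, 12, 3],
    "monthly_charges": [70.5, 40.0, 35.2, 89.9, 29.0, 25.5, 55.0, 95.1],
    "churn": [1, 0, 0, 1, 0, 0, 0, 0],
})
targetLabel = "churn"

# Identify the minority and majority labels from the class counts.
counts = df[targetLabel].value_counts()
minority_label, majority_label = counts.idxmin(), counts.idxmax()

# Split the rows by class; both dataframes keep the target column (assumption).
minClass = df[df[targetLabel] == minority_label]
majClass = df[df[targetLabel] == majority_label]

# Assumption: generate enough minority rows to match the majority size.
genData = len(majClass) - len(minClass)

# from ctganenn import CTGANENN
# X, y = CTGANENN(minClass, majClass, genData, targetLabel)
print(len(minClass), len(majClass), genData)
```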

Output

The method returns X and y:

X : all features of your dataset
y : target label of your dataset

Classification process

You can pass X and y on to the classification stage. For example, using a Decision Tree classifier:

from sklearn import tree

model = tree.DecisionTreeClassifier()
classification = model.fit(X, y)

Limitation

This version of CTGAN-ENN only works for binary classification.

Acknowledgments

This work was supported by the Khon Kaen University ASEAN GMS grant and is part of the AIDA (Applied Intelligence and Data Analytics) lab at the College of Computing, Khon Kaen University, Thailand.

Cite this work

@misc{ctganenn,
  author = {I Nyoman Mahayasa Adiputra and Paweena Wanchai},
  title = {CTGAN-ENN: A tabular GAN-based Hybrid Sampling Method for Imbalanced and Overlapped Data in Customer Churn Prediction},
  year = {2024},
  url = {https://doi.org/10.1186/s40537-024-00982-x}
}

