American Express Default Classifier: Credit Line. The project applies statistics, Python, and engineering techniques across the data science pipeline to build a model that estimates the probability of a customer defaulting.

CamiloDS16/capstone_project-amex-credit-default-


capstone_project-amex-credit-default-

Predicting Credit Card Default likelihood: American Express

Table of Contents

  1. Introduction
  2. Data Wrangling
  3. Exploratory Data Analysis
  4. Data Preprocessing and Training
  5. Modeling
  6. Conclusions and Recommendations
  7. References
  8. Installation
  9. Technologies Used
  10. Contact

Introduction

This project seeks to address the significant challenge of credit default faced by American Express by leveraging data science. By developing a predictive model, we aim to forecast the likelihood of a customer defaulting on their credit card payments, thereby aiding American Express in managing credit default risks more effectively.

Business Problem

Credit defaults pose substantial financial risks to American Express. The goal is to harness data science to create a classification model using 2021 customer data to predict credit card payment defaults, enabling proactive mitigation strategies and fostering a financially secure customer-issuer relationship.

Data Wrangling

We cleaned and prepared the data by loading it from a CSV file, standardizing naming conventions, handling missing values, and profiling the dataset so that it was ready for the later analysis steps.
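The wrangling steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the in-memory CSV, the column names, and the median-fill strategy are all assumptions made for the example.

```python
import io

import pandas as pd

# Hypothetical stand-in for the project's CSV file; the columns are invented.
raw_csv = io.StringIO(
    "Customer ID,Credit Limit,Net Yearly Income,Credit Card Default\n"
    "1,5000,42000,0\n"
    "2,,38000,1\n"
    "3,7500,,0\n"
)
df = pd.read_csv(raw_csv)

# Standardize column names: trimmed, lowercase, underscores instead of spaces.
df.columns = df.columns.str.strip().str.lower().str.replace(" ", "_", regex=False)

# Profile missing values per column before deciding how to handle them.
missing_counts = df.isna().sum()

# One simple strategy (an assumption): fill numeric gaps with the column median.
numeric_cols = df.select_dtypes(include="number").columns
df[numeric_cols] = df[numeric_cols].fillna(df[numeric_cols].median())
```

Inspecting `missing_counts` before imputing makes it easier to judge whether a column should be filled, dropped, or flagged.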

Exploratory Data Analysis

We visualized the dataset to understand its structure, identify multicollinearity among features, and quantify the class imbalance. Both findings drive the preprocessing and modeling steps that follow.
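A small sketch of the two checks described above, using synthetic data. The column names and distributions are hypothetical, and the plot uses plain Matplotlib; the project's own figures may look different.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic data with invented columns; "credit_limit" is built to track "income".
rng = np.random.default_rng(0)
n = 500
income = rng.normal(50_000, 10_000, n)
eda_df = pd.DataFrame({
    "income": income,
    "credit_limit": income * 0.2 + rng.normal(0, 500, n),
    "age": rng.integers(21, 70, n),
    "default": rng.choice([0, 1], size=n, p=[0.92, 0.08]),
})

# Class imbalance: share of each target class.
class_share = eda_df["default"].value_counts(normalize=True)

# Multicollinearity: pairwise correlation between the features.
corr = eda_df.drop(columns="default").corr()

# Correlation heatmap.
fig, ax = plt.subplots()
im = ax.imshow(corr.to_numpy(), vmin=-1, vmax=1, cmap="coolwarm")
ax.set_xticks(range(len(corr.columns)))
ax.set_xticklabels(corr.columns, rotation=45)
ax.set_yticks(range(len(corr.columns)))
ax.set_yticklabels(corr.columns)
fig.colorbar(im, ax=ax)
```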

Data Preprocessing and Training

In this stage, multicollinearity was addressed by dropping redundant features, the dataset was scaled using MinMaxScaler, and the class imbalance was handled using SMOTE to ensure a balanced dataset for effective modeling.

Multicollinearity: Feature Selection

Addressed multicollinearity by dropping features exhibiting high correlation to reduce redundancy and improve model performance.
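One common way to implement this kind of correlation-based feature selection is shown below. The helper name `drop_correlated_features` and the 0.9 threshold are assumptions for illustration; the source does not state which cutoff the project used.

```python
import numpy as np
import pandas as pd


def drop_correlated_features(features: pd.DataFrame, threshold: float = 0.9) -> pd.DataFrame:
    """Drop the later column of every pair whose absolute correlation exceeds threshold."""
    corr = features.corr().abs()
    # Keep only the strict upper triangle so each pair is inspected once.
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    to_drop = [col for col in upper.columns if (upper[col] > threshold).any()]
    return features.drop(columns=to_drop)


# Synthetic demo: "a_copy" is almost a linear function of "a".
rng = np.random.default_rng(42)
a = rng.normal(size=200)
demo = pd.DataFrame({
    "a": a,
    "a_copy": a * 2 + rng.normal(scale=0.01, size=200),
    "b": rng.normal(size=200),
})
reduced = drop_correlated_features(demo, threshold=0.9)
```

In this demo only `a_copy` is removed, since it is the second member of the single highly correlated pair.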

Scaling Dataset: MinMaxScaler

Employed MinMaxScaler to rescale every feature to a common range ([0, 1] by default), so that features with large numeric ranges do not dominate those with small ones.
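A minimal sketch of the scaling step with scikit-learn. The toy array and the train/test split are assumptions; fitting the scaler on the training rows only is a common practice to avoid leaking test-set statistics, not something the source confirms the project did.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

# Toy feature matrix with very different ranges per column (hypothetical values).
X = np.array([
    [1_000.0, 25.0],
    [5_000.0, 40.0],
    [10_000.0, 60.0],
    [2_500.0, 33.0],
])
X_train, X_test = train_test_split(X, test_size=0.25, random_state=0)

scaler = MinMaxScaler()
X_train_scaled = scaler.fit_transform(X_train)  # learn min and max from training rows only
X_test_scaled = scaler.transform(X_test)        # reuse the training statistics
```

Because the test rows are transformed with the training minimum and maximum, their scaled values can fall slightly outside [0, 1].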

Class Imbalance: SMOTE

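We used SMOTE (Synthetic Minority Over-sampling Technique) to handle class imbalance. It generates synthetic minority-class samples by interpolating between existing minority samples and their nearest neighbors, producing a balanced training set.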

Modeling

The modeling phase involved choosing evaluation metrics, tuning hyperparameters, and selecting the XGBoost Classifier for its high performance, efficiency, and suitability for imbalanced datasets.

Metrics

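Employed AUC-ROC (area under the receiver operating characteristic curve) and AUC-PRC (area under the precision-recall curve) as the primary metrics. AUC-ROC served as the baseline for model selection because it summarizes ranking quality across all classification thresholds rather than depending on a single cutoff.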
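Both metrics can be computed with scikit-learn from true labels and predicted scores. The labels and scores below are made up, and `average_precision_score` is used as one common summary of the precision-recall curve; the project may have computed AUC-PRC differently.

```python
import numpy as np
from sklearn.metrics import average_precision_score, roc_auc_score

# Hypothetical labels (1 = default) and predicted default probabilities.
y_true = np.array([0, 0, 0, 0, 0, 0, 0, 1, 0, 1])
y_score = np.array([0.05, 0.10, 0.20, 0.15, 0.30, 0.25, 0.40, 0.35, 0.60, 0.90])

auc_roc = roc_auc_score(y_true, y_score)            # threshold-independent ranking quality
auc_prc = average_precision_score(y_true, y_score)  # summary of the precision-recall curve
```

On this toy input `auc_roc` is 0.875 and `auc_prc` is 0.75. The precision-recall summary is lower because two negatives outrank one of the two positives, which halves the precision at that point.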

Hyperparameter Tuning

Used RandomizedSearchCV for efficient hyperparameter tuning, optimizing the model's learning characteristics without exhaustive computational demand.
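The sketch below shows the general shape of a randomized search scored by AUC-ROC. To keep it runnable with scikit-learn alone, it uses `GradientBoostingClassifier` as a stand-in estimator; `XGBClassifier` follows the scikit-learn estimator interface and could be swapped in, with parameter names adjusted to its own. The search space, `n_iter`, and `cv` values are assumptions.

```python
from scipy.stats import randint, uniform
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import RandomizedSearchCV

# Synthetic imbalanced data standing in for the preprocessed training set.
X, y = make_classification(
    n_samples=400, n_features=8, weights=[0.85, 0.15], random_state=0
)

# Distributions to sample from (hypothetical ranges).
param_distributions = {
    "n_estimators": randint(20, 60),
    "max_depth": randint(2, 5),
    "learning_rate": uniform(0.01, 0.29),  # samples from [0.01, 0.30]
}

search = RandomizedSearchCV(
    estimator=GradientBoostingClassifier(random_state=0),
    param_distributions=param_distributions,
    n_iter=5,           # number of sampled configurations, far fewer than a full grid
    scoring="roc_auc",
    cv=3,
    random_state=0,
)
search.fit(X, y)

best_params = search.best_params_
best_cv_auc = search.best_score_
```

Sampling a fixed number of configurations keeps the cost proportional to `n_iter` times `cv`, regardless of how large the search space is.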

Model Selection

We evaluated the candidate models on a held-out test set that was not used during training, and selected the best-performing model.
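As an illustration of this comparison, the sketch below fits two candidate models and ranks them by AUC-ROC on a held-out split. The candidates, the synthetic data, and the split ratio are assumptions; the source does not list which models were compared against XGBoost.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

X, y = make_classification(
    n_samples=600, n_features=8, weights=[0.85, 0.15], random_state=1
)
# A stratified split keeps the class ratio the same in both partitions.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=1
)

# Hypothetical candidate models.
candidates = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=1),
}

# Score every candidate on the unseen test split.
test_auc = {}
for name, model in candidates.items():
    model.fit(X_train, y_train)
    test_auc[name] = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])

best_name = max(test_auc, key=test_auc.get)
```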

Conclusions and Recommendations

The analysis underscores the critical challenge of credit defaults and highlights the efficacy of the selected model in addressing this issue. Recommendations include continuous model evaluation, further feature engineering, devising risk mitigation strategies, and ensuring model interpretability, fairness, and regulatory compliance.

References

The references section lists all the external resources and data sources referred to in the project.

Installation

The project was implemented in Python 3.8. To install the required packages, use the following command:

pip install -r requirements.txt

Technologies Used

  • Python
  • NumPy
  • pandas
  • scikit-learn
  • Matplotlib
  • Seaborn

Contact

If you have any questions, comments, or would like to contribute, please feel free to contact me at [email protected].
