title: SantanderTransactionClassifcation
emoji: 🏦
colorFrom: red
colorTo: red
sdk: streamlit
app_file: src/streamlit_app.py
pinned: false
short_description: Predict customer transaction probability with a LightGBM.
license: mit

🏦 Santander Customer Transaction Prediction (LightGBM)

This project predicts the probability that a customer will make a specific transaction in the future. It is based on the Kaggle competition Santander Customer Transaction Prediction.

🔗 Live Demo & Code

💻 GitHub Repository: [https://github.com/EnYa32/SantanderTransactionClassifcation]

🏁 Kaggle Notebook: [https://www.kaggle.com/code/enesyama/santandertransactionclassifcation]

📊 Visual Evaluation

Target Distribution: strong class imbalance, so ROC-AUC is used as the main metric
[Figure: Target Distribution]

ROC Curve: LightGBM (AUC ≈ 0.8888)
[Figure: ROC Curve]

Confusion Matrix (fixed threshold view)
[Figure: Confusion Matrix]

Top Feature Importances: LightGBM
[Figure: Feature Importance]

🔍 Evaluation Notes

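  • Metric: ROC-AUC (the competition metric), computed on predicted probabilities
  • Class imbalance: the model outputs probabilities and is scored on how well it ranks positives above negatives, so no fixed decision threshold enters the metric
  • Threshold: used only to turn probabilities into labels for display (e.g., the confusion matrix)
  • Final model: chosen by comparing ROC-AUC across the candidate models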
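
The distinction between ranking quality and thresholded labels can be illustrated with a small sketch. It is not the project's evaluation code: the helper names `roc_auc` and `confusion_counts` and the toy data are made up for illustration, and in practice a library routine such as scikit-learn's `roc_auc_score` would normally be used. The sketch computes ROC-AUC as the fraction of (positive, negative) pairs in which the positive receives the higher score.

```python
def roc_auc(labels, scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def confusion_counts(labels, scores, threshold):
    """Return (tn, fp, fn, tp) for hard labels defined by score >= threshold."""
    tn = fp = fn = tp = 0
    for y, s in zip(labels, scores):
        pred = 1 if s >= threshold else 0
        if pred == 1 and y == 1:
            tp += 1
        elif pred == 1 and y == 0:
            fp += 1
        elif pred == 0 and y == 1:
            fn += 1
        else:
            tn += 1
    return tn, fp, fn, tp


# Toy, made-up data with few positives (NOT the Santander data).
y_true = [0, 0, 0, 0, 1, 0, 0, 1, 0, 0]
y_prob = [0.05, 0.10, 0.20, 0.30, 0.35, 0.15, 0.02, 0.60, 0.40, 0.08]

auc = roc_auc(y_true, y_prob)                    # one value, no threshold involved
cm_at_05 = confusion_counts(y_true, y_prob, 0.5)
cm_at_03 = confusion_counts(y_true, y_prob, 0.3)  # different threshold, different matrix

print(auc, cm_at_05, cm_at_03)
```

Moving the threshold changes the confusion matrix while the ROC-AUC stays the same, which is why the threshold is treated here as a display choice only.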

Problem Statement

Given 200 anonymized numeric features (var_0 … var_199), predict the probability that a customer will perform a target transaction.

Challenges:

  • strong class imbalance
  • anonymized features (no domain meaning)
  • probability ranking more important than hard labels

✅ Model

We trained and compared multiple models using ROC-AUC (main metric due to class imbalance):

  • Logistic Regression: ROC-AUC ≈ 0.86
  • LightGBM (Final): ROC-AUC ≈ 0.8888
  • XGBoost: ROC-AUC ≈ 0.8807

LightGBM achieved the best ROC-AUC and was selected as the final model.

📁 Project Files

  • app.py : Streamlit application
  • lightgbm_santander_model.pkl : saved LightGBM model (joblib)
  • requirements.txt : dependencies

Important: Put lightgbm_santander_model.pkl in the same folder as app.py.
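
A minimal sketch of why the file location matters, assuming the app resolves the model path relative to its own script. The helper names `model_path` and `load_model` are hypothetical and not taken from the repository; `joblib.load` is the standard joblib call for reading a file saved with `joblib.dump`.

```python
from pathlib import Path

MODEL_FILENAME = "lightgbm_santander_model.pkl"


def model_path(app_file):
    """Path of the model file expected next to the given app script."""
    return Path(app_file).resolve().parent / MODEL_FILENAME


def load_model(app_file):
    """Load the saved model, failing early with a clear message if it is missing."""
    path = model_path(app_file)
    if not path.is_file():
        raise FileNotFoundError(
            f"Expected {MODEL_FILENAME} next to {Path(app_file).name}: {path}"
        )
    import joblib  # third-party; imported lazily so the path check runs without it
    return joblib.load(path)
```

Inside the app script this would be called as `load_model(__file__)`, so the lookup does not depend on the directory from which `streamlit run` is started.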

🚀 How to Run Locally

    pip install -r requirements.txt
    streamlit run app.py

🧪 How to Use the App
Upload a CSV file (e.g., Kaggle test.csv) containing:

  • ID_code
  • var_0 ... var_199

The app outputs:

  • probability (0–1)
  • predicted label (based on a threshold slider)

Download:

  • predictions_lightgbm.csv (probability + label)
  • submission_lightgbm.csv (Kaggle submission format: ID_code, target)
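
The input check and the two output files can be sketched as follows. This is an illustration under stated assumptions, not the app's actual code: the helper names, the `probability` and `label` column names in the predictions file, and the sample IDs and probabilities are all made up. Only the `ID_code`, `var_0` ... `var_199`, and `target` names come from the description above.

```python
import csv
import io

FEATURES = [f"var_{i}" for i in range(200)]
REQUIRED = ["ID_code"] + FEATURES


def missing_columns(header):
    """Return the required columns that are absent from an uploaded CSV header."""
    present = set(header)
    return [c for c in REQUIRED if c not in present]


def build_outputs(ids, probs, threshold=0.5):
    """Return (predictions_rows, submission_rows) as lists of dicts."""
    predictions = [
        {"ID_code": i, "probability": p, "label": int(p >= threshold)}
        for i, p in zip(ids, probs)
    ]
    # The submission keeps the raw probability in `target`; no threshold is applied.
    submission = [{"ID_code": i, "target": p} for i, p in zip(ids, probs)]
    return predictions, submission


def to_csv(rows):
    """Serialize a non-empty list of dicts to CSV text with a header row."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rows[0]), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


# Made-up IDs and probabilities standing in for model output.
ids = ["test_0", "test_1", "test_2"]
probs = [0.03, 0.72, 0.5]

predictions, submission = build_outputs(ids, probs, threshold=0.5)
submission_csv = to_csv(submission)
print(submission_csv)
```

The threshold only affects the `label` column of the predictions rows; the submission rows carry the probabilities unchanged.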

📌 Notes

  • Kaggle evaluates submissions on probabilities (ROC-AUC), so do not apply a threshold to Kaggle submissions.
  • The threshold in the app only controls the displayed label.
  • Due to platform limits, large Kaggle test files are processed locally. The hosted app demonstrates the deployed model on compatible CSV samples.
