🏦 Santander Customer Transaction Prediction (LightGBM)

title	emoji	colorFrom	colorTo	sdk	app_file	pinned	short_description	license
SantanderTransactionClassifcation	🏦	red	red	streamlit	src/streamlit_app.py	false	Predict customer transaction probability with a LightGBM.	mit

This project predicts the probability that a customer will make a specific transaction in the future. It is based on the Kaggle competition Santander Customer Transaction Prediction.

🔗 Live Demo & Code

💻 GitHub Repository: [https://github.com/EnYa32/SantanderTransactionClassifcation]

🏁 Kaggle Notebook: [https://www.kaggle.com/code/enesyama/santandertransactionclassifcation]

📊 Visual Evaluation

Target Distribution — strong class imbalance → ROC-AUC used as main metric

ROC Curve — LightGBM (AUC ≈ 0.8888)

Confusion Matrix (fixed threshold view)

Top Feature Importances — LightGBM

🔍 Evaluation Notes

Metric: ROC-AUC (competition metric)
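Strong class imbalance is handled by modeling probabilities and scoring their ranking rather than hard labels
A classification threshold is applied only for interpretability (e.g., the confusion matrix view)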
Final model selected via cross-model comparison
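
To illustrate why the threshold is only an interpretability aid, here is a minimal sketch that builds confusion-matrix counts from predicted probabilities at two thresholds. The function name `confusion_at_threshold` and the toy data are assumptions for illustration and are not taken from the project code.

```python
def confusion_at_threshold(y_true, y_prob, threshold=0.5):
    """Return (tn, fp, fn, tp) when prob >= threshold is labeled positive."""
    tn = fp = fn = tp = 0
    for y, p in zip(y_true, y_prob):
        pred = 1 if p >= threshold else 0
        if y == 1 and pred == 1:
            tp += 1
        elif y == 1:
            fn += 1
        elif pred == 1:
            fp += 1
        else:
            tn += 1
    return tn, fp, fn, tp


# Toy data (illustrative only, not from the competition).
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
y_prob = [0.05, 0.10, 0.15, 0.20, 0.25, 0.30, 0.45, 0.60, 0.40, 0.90]

# The same probabilities give different confusion matrices per threshold.
for t in (0.3, 0.5):
    print(t, confusion_at_threshold(y_true, y_prob, t))
```

The probabilities never change here; only the cut-off does, which is why a confusion matrix is tied to one threshold while ROC-AUC is not.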

Problem Statement

Given 200 anonymized numeric features (var_0 … var_199), predict the probability that a customer will perform a target transaction.

Challenges:

strong class imbalance

anonymized features (no domain meaning)

probability ranking more important than hard labels
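
The last point can be made concrete with a small, dependency-free sketch. It computes ROC-AUC as the fraction of (positive, negative) pairs in which the positive example receives the higher score, and contrasts it with accuracy on an imbalanced toy set. The helper names and the toy data are assumptions for illustration; in practice a library routine such as scikit-learn's `roc_auc_score` would normally be used.

```python
def roc_auc(y_true, y_score):
    """ROC-AUC as the share of positive/negative pairs ranked correctly.

    Ties count as half a win. This pairwise loop is O(n_pos * n_neg),
    so it is only meant for small illustrative inputs.
    """
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("ROC-AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def accuracy(y_true, y_pred):
    """Fraction of labels predicted exactly."""
    return sum(int(a == b) for a, b in zip(y_true, y_pred)) / len(y_true)


# Toy imbalanced data (illustrative only): one positive among ten rows.
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 0, 1]

# Always predicting the majority class looks accurate but ranks nothing.
always_negative = [0] * 10
constant_scores = [0.1] * 10

# Scores that place the positive above most negatives.
ranked_scores = [0.1, 0.2, 0.1, 0.3, 0.2, 0.1, 0.4, 0.2, 0.3, 0.35]

print(accuracy(y_true, always_negative))
print(roc_auc(y_true, constant_scores))
print(roc_auc(y_true, ranked_scores))
```

On this toy set the majority-class predictor reaches high accuracy while its ROC-AUC stays at chance level, which is the reason ranking quality is the more informative view under class imbalance.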

✅ Model

We trained and compared multiple models using ROC-AUC (main metric due to class imbalance):

Logistic Regression: ROC-AUC ≈ 0.86
LightGBM (Final): ROC-AUC ≈ 0.8888
XGBoost: ROC-AUC ≈ 0.8807

LightGBM achieved the best ROC-AUC and was selected as the final model.

📁 Project Files

app.py : Streamlit application
lightgbm_santander_model.pkl : saved LightGBM model (joblib)
requirements.txt : dependencies

Important: Put lightgbm_santander_model.pkl in the same folder as app.py.
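
One way to make the "same folder" requirement explicit is to resolve the model path relative to the app file instead of the current working directory. The sketch below is an assumption about how this could be done, not the code in app.py; the helper names `model_path` and `load_model` are hypothetical. It relies on `joblib.load`, since the model is saved with joblib.

```python
from pathlib import Path

MODEL_FILENAME = "lightgbm_santander_model.pkl"


def model_path(app_file):
    """Return the expected model location next to the given app file."""
    return Path(app_file).parent / MODEL_FILENAME


def load_model(app_file):
    """Load the saved model, failing early with a clear message if missing."""
    path = model_path(app_file)
    if not path.exists():
        raise FileNotFoundError(
            f"Expected {MODEL_FILENAME} next to the app file, looked in: {path.parent}"
        )
    # Imported here so the path helper can be used without joblib installed.
    import joblib

    return joblib.load(path)
```

Inside the app this would typically be called as `load_model(__file__)`, so the lookup does not depend on the directory from which `streamlit run` is started.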

🚀 How to Run Locally

pip install -r requirements.txt
streamlit run app.py

🧪 How to Use the App

Upload a CSV file (e.g., Kaggle test.csv) containing:

ID_code

var_0 ... var_199
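
A quick pre-upload check of the column layout can be sketched as follows. This is a minimal, standard-library example; the helper names are hypothetical and the check performed by the app itself may differ.

```python
import csv
import io

ID_COLUMN = "ID_code"
EXPECTED_FEATURES = [f"var_{i}" for i in range(200)]  # var_0 ... var_199


def read_header(csv_text):
    """Return the first row of a CSV document as a list of column names."""
    return next(csv.reader(io.StringIO(csv_text)))


def missing_columns(header):
    """Return the required columns that are absent from the header."""
    present = set(header)
    return [c for c in [ID_COLUMN, *EXPECTED_FEATURES] if c not in present]


# Illustrative headers only: one complete, one truncated.
good_csv = ",".join([ID_COLUMN, *EXPECTED_FEATURES]) + "\n"
bad_csv = "ID_code,var_0,var_1\n"

print(missing_columns(read_header(good_csv)))
print(len(missing_columns(read_header(bad_csv))))
```

For a file on disk, the same check can be run on the first line only, so the whole file never has to be loaded.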

The app outputs:

probability (0–1)

predicted label (based on a threshold slider)

Download:

predictions_lightgbm.csv (probability + label)

submission_lightgbm.csv (Kaggle submission format: ID_code, target)
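
The difference between the two downloads can be sketched with the standard library alone. The layout of the predictions file (an `ID_code` column plus `probability` and `label`) and the function name `build_outputs` are assumptions for illustration; only the submission columns `ID_code, target` are stated above.

```python
import csv
import io


def build_outputs(ids, probs, threshold=0.5):
    """Return (predictions_csv, submission_csv) as CSV text.

    The predictions file carries the thresholded label for display,
    while the submission file keeps the raw probability as `target`.
    """
    pred_buf = io.StringIO()
    sub_buf = io.StringIO()
    pred_writer = csv.writer(pred_buf, lineterminator="\n")
    sub_writer = csv.writer(sub_buf, lineterminator="\n")

    pred_writer.writerow(["ID_code", "probability", "label"])
    sub_writer.writerow(["ID_code", "target"])

    for id_code, p in zip(ids, probs):
        pred_writer.writerow([id_code, p, int(p >= threshold)])
        sub_writer.writerow([id_code, p])  # no threshold applied here

    return pred_buf.getvalue(), sub_buf.getvalue()


# Illustrative IDs and probabilities, not real model output.
predictions_csv, submission_csv = build_outputs(["test_0", "test_1"], [0.12, 0.87])
print(predictions_csv)
print(submission_csv)
```

Keeping the threshold out of the submission path means the slider setting cannot accidentally change what is sent to Kaggle.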

📌 Notes

Kaggle scores submissions by ROC-AUC on the predicted probabilities, so submit the raw probabilities and do not apply a threshold.

The threshold slider in the app only affects the displayed label; it does not change the predicted probabilities.

Due to platform limits, large Kaggle test files should be processed locally. The hosted app demonstrates the deployed model on smaller, compatible CSV samples.
