| title | emoji | colorFrom | colorTo | sdk | app_file | pinned | short_description | license |
|---|---|---|---|---|---|---|---|---|
| SantanderTransactionClassifcation | 🏦 | red | red | streamlit | src/streamlit_app.py | false | Predict customer transaction probability with a LightGBM. | mit |
This project predicts the probability that a customer will make a specific transaction in the future. It is based on the Kaggle competition Santander Customer Transaction Prediction.
GitHub Repository: https://github.com/EnYa32/SantanderTransactionClassifcation
Kaggle Notebook: https://www.kaggle.com/code/enesyama/santandertransactionclassifcation
Target Distribution: strong class imbalance, so ROC-AUC is used as the main metric

ROC Curve: LightGBM (AUC ≈ 0.8888)

Confusion Matrix (fixed threshold view)

Top Feature Importances: LightGBM

Evaluation Notes
- Metric: ROC-AUC (competition metric)
- Strong class imbalance handled via probability modeling
- Threshold used only for interpretability
- Final model selected via cross-model comparison
Given 200 anonymized numeric features (var_0 … var_199), predict the probability that a customer will perform a target transaction.
Challenges:
- strong class imbalance
- anonymized features (no domain meaning)
- probability ranking more important than hard labels
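The last challenge can be made concrete with a small, dependency-free sketch. ROC-AUC equals the fraction of (positive, negative) pairs in which the positive row gets the higher score, so it depends only on how rows are ranked. The helper `roc_auc` and the toy data are illustrative assumptions, not code or data from this project; in practice a library routine such as scikit-learn's `roc_auc_score` would normally be used.

```python
from itertools import product


def roc_auc(labels, scores):
    """ROC-AUC as the fraction of (positive, negative) pairs ranked correctly.

    A tie between a positive and a negative score counts as half a pair.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("ROC-AUC needs at least one positive and one negative")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p, n in product(pos, neg)
    )
    return wins / (len(pos) * len(neg))


# Toy data, made up for illustration: 2 positives among 8 rows.
labels = [0, 0, 0, 1, 0, 0, 1, 0]
scores = [0.05, 0.10, 0.30, 0.35, 0.02, 0.20, 0.15, 0.08]

auc = roc_auc(labels, scores)

# Squaring is monotonic on [0, 1]: the ranking is unchanged, so is the AUC.
auc_squared = roc_auc(labels, [s ** 2 for s in scores])

# Hard labels at a 0.5 threshold predict "no transaction" for every row,
# yet accuracy still looks respectable because negatives dominate.
hard_labels = [int(s >= 0.5) for s in scores]
accuracy = sum(int(h == y) for h, y in zip(hard_labels, labels)) / len(labels)

print(f"AUC={auc:.3f}  AUC(squared scores)={auc_squared:.3f}  accuracy={accuracy:.2f}")
```

On imbalanced data, accuracy rewards predicting the majority class, while ROC-AUC only credits a model for ranking positives above negatives.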
We trained and compared multiple models using ROC-AUC (main metric due to class imbalance):
- Logistic Regression: ROC-AUC ≈ 0.86
- LightGBM (Final): ROC-AUC = 0.8888
- XGBoost: ROC-AUC ≈ 0.8807
LightGBM achieved the best ROC-AUC and was selected as the final model.
- app.py : Streamlit application
- lightgbm_santander_model.pkl : saved LightGBM model (joblib)
- requirements.txt : dependencies
Important: Put lightgbm_santander_model.pkl in the same folder as app.py.
pip install -r requirements.txt
streamlit run app.py
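Because the model file has to sit next to app.py, one way to make loading independent of the current working directory is to resolve the path from the script's own location. This is a minimal sketch under that assumption, not the app's actual loading code; `resolve_model_path` and `load_model` are hypothetical helper names. `joblib.load` is used because the model is described above as saved with joblib.

```python
from pathlib import Path

MODEL_FILENAME = "lightgbm_santander_model.pkl"


def resolve_model_path(app_file):
    """Return the expected model path: same folder as the given script file."""
    return Path(app_file).resolve().parent / MODEL_FILENAME


def load_model(app_file):
    """Load the saved model, failing early with a clear message if it is missing."""
    model_path = resolve_model_path(app_file)
    if not model_path.is_file():
        raise FileNotFoundError(
            f"{MODEL_FILENAME} not found next to the app (looked in {model_path.parent})"
        )
    # Imported here so the path check above works even without joblib installed.
    import joblib

    return joblib.load(model_path)
```

Inside app.py this could be called as `model = load_model(__file__)`.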
🧪 How to Use the App
Upload a CSV file (e.g., Kaggle test.csv) containing:
- ID_code
- var_0 ... var_199
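A quick header check before prediction can catch incompatible uploads. The sketch below uses only the standard library and assumes the 201 columns listed above; whether and how the app validates its input is not stated here, and `missing_columns` is a hypothetical helper.

```python
import csv
import io

# ID_code plus the 200 anonymized features var_0 ... var_199.
EXPECTED_COLUMNS = ["ID_code"] + [f"var_{i}" for i in range(200)]


def missing_columns(csv_text):
    """Return the expected columns that are absent from the CSV header row."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, [])  # empty input yields an empty header
    present = set(header)
    return [name for name in EXPECTED_COLUMNS if name not in present]


# In-memory samples: a complete header, and one with var_199 dropped.
good_header = ",".join(EXPECTED_COLUMNS) + "\n"
bad_header = ",".join(EXPECTED_COLUMNS[:-1]) + "\n"

print(missing_columns(good_header))
print(missing_columns(bad_header))
```

An empty result means the file has every required column; extra columns are ignored by this check.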
The app outputs:
- probability (0–1)
- predicted label (based on a threshold slider)
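The relationship between the two outputs can be sketched as follows. The default of 0.5 and the `>=` comparison are assumptions for illustration; the app's actual slider default and comparison rule are not documented here, and `label_from_probability` is a hypothetical name.

```python
def label_from_probability(probabilities, threshold=0.5):
    """Label a row 1 when its probability reaches the threshold, else 0."""
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be between 0 and 1")
    return [int(p >= threshold) for p in probabilities]


# Made-up model outputs for four customers.
probs = [0.03, 0.12, 0.48, 0.77]

labels_default = label_from_probability(probs)        # threshold 0.5
labels_lenient = label_from_probability(probs, 0.10)  # lower threshold flags more rows

print(labels_default)
print(labels_lenient)
```

Moving the threshold changes only the labels. The probabilities, and therefore the ROC-AUC, stay the same. In Streamlit the threshold would typically come from something like `st.slider("Threshold", 0.0, 1.0, 0.5)`.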
Download:
- predictions_lightgbm.csv (probability + label)
- submission_lightgbm.csv (Kaggle submission format: ID_code, target)
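The two downloads differ in what they carry: the predictions file includes the thresholded label, while the submission file keeps the raw probability as `target`. The sketch below builds both as CSV text with the standard library. The column names `probability` and `label`, the inclusion of `ID_code` in the predictions file, and the helper `build_outputs` are assumptions, not confirmed details of the app.

```python
import csv
import io


def build_outputs(id_codes, probabilities, labels):
    """Return (predictions_csv, submission_csv) as strings."""
    predictions = io.StringIO()
    submission = io.StringIO()
    pred_writer = csv.writer(predictions, lineterminator="\n")
    sub_writer = csv.writer(submission, lineterminator="\n")

    pred_writer.writerow(["ID_code", "probability", "label"])
    sub_writer.writerow(["ID_code", "target"])

    for id_code, prob, label in zip(id_codes, probabilities, labels):
        pred_writer.writerow([id_code, prob, label])
        # Kaggle scores ROC-AUC on probabilities, so no threshold is applied here.
        sub_writer.writerow([id_code, prob])

    return predictions.getvalue(), submission.getvalue()


# Made-up sample rows.
predictions_csv, submission_csv = build_outputs(
    ["test_0", "test_1"], [0.08, 0.91], [0, 1]
)
print(predictions_csv)
print(submission_csv)
```

In a Streamlit app, each string could then be offered through `st.download_button` with the matching file name.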
Notes
Kaggle evaluation uses probabilities (ROC-AUC). Do not apply a threshold for Kaggle submissions.
The threshold in the app is only for display (label).
Because of platform limits, large Kaggle test files should be processed locally. The hosted app demonstrates the deployed model on smaller CSV samples in the same format.