🏦 Home Credit Default Risk Prediction

title	emoji	colorFrom	colorTo	sdk	pinned	license	short_description
Home Credit Default Risk Prediction	🍃	indigo	purple	docker	true	mit	ML Classification models applied to Home Credit Risk dataset

1. Project Description

This project focuses on building a machine learning pipeline to predict a client's ability to repay a loan. It is a binary classification task that uses a real-world financial dataset to identify clients who may face payment difficulties.

The project goes beyond a standard model by including a practical application that:

Preprocesses and cleans the dataset for model training.
Trains a machine learning model to predict loan repayment risk.
Deploys an interactive predictor app using Marimo, hosted on Hugging Face Spaces.
Allows users to make predictions by providing the top 10 most influential features.

This work showcases a complete end-to-end workflow, transforming raw data into a functional, user-friendly tool for risk assessment.
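
As a rough illustration of the preprocessing step in the workflow above, the sketch below scales a numeric column and one-hot encodes a categorical one using only the standard library. The project itself relies on Scikit-Learn for this; the choice of min-max scaling and one-hot encoding, the helper names, and the sample values are all assumptions made for illustration, not the project's actual code or data.

```python
def min_max_scale(values):
    """Rescale numeric values linearly to the [0, 1] range."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def one_hot_encode(values):
    """Return the sorted category list and one 0/1 indicator row per value."""
    categories = sorted(set(values))
    rows = [[1 if v == c else 0 for c in categories] for v in values]
    return categories, rows


# Hypothetical raw columns (names and values are illustrative only).
incomes = [50000.0, 100000.0, 150000.0]
contract_types = ["cash", "revolving", "cash"]

scaled = min_max_scale(incomes)
categories, encoded = one_hot_encode(contract_types)

# Combine the scaled numeric column with the indicator columns.
feature_rows = [[s] + e for s, e in zip(scaled, encoded)]
```

Each row in `feature_rows` holds the scaled income followed by one indicator per contract type, which is the kind of fully numeric input a classifier expects.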

Important

Check out the deployed app here: 👉️ Home Credit Default Risk Prediction App 👈️
Check out the Jupyter Notebook for a detailed walkthrough of the project here: 👉️ Jupyter Notebook 👈️

2. Methodology & Key Features

Model Selection: Four different models were trained and evaluated, with LightGBM selected as the final model due to its superior performance, achieving a ROC AUC score of 0.751 on the test set.
Automated Preprocessing: The data preprocessing pipeline handles common tasks such as feature scaling and categorical encoding, ensuring the model receives clean and formatted data.
Interactive Predictor: An application built with Marimo allows users to interact with the trained model directly. It uses the top 10 most important features—identified from the final LightGBM model—to generate real-time predictions.
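
To make the two ideas above more concrete, here is a minimal, dependency-free sketch of what a ROC AUC score measures and how a top-10 feature list could be derived from importance values. It is not the project's code: the function names, the toy labels and scores, and the generic feature names are hypothetical, and the label convention (1 = payment difficulties) is an assumption. In practice a library routine such as Scikit-Learn's `roc_auc_score` would be used instead of the pairwise loop.

```python
def roc_auc(labels, scores):
    """ROC AUC as the fraction of (positive, negative) pairs in which the
    positive example gets the higher score; ties count as half a win."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("ROC AUC needs at least one positive and one negative label")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def top_k_features(importances, k=10):
    """Return the k feature names with the largest importance values."""
    ranked = sorted(importances.items(), key=lambda item: item[1], reverse=True)
    return [name for name, _ in ranked[:k]]


# Hypothetical labels (assumed: 1 = payment difficulties) and predicted risk scores.
y_true = [0, 0, 1, 1]
y_score = [0.1, 0.4, 0.35, 0.8]
auc = roc_auc(y_true, y_score)

# Hypothetical importances for 15 generically named features.
toy_importances = {f"feature_{i}": float(i) for i in range(15)}
selected = top_k_features(toy_importances, k=10)
```

A score of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which gives some context for the reported 0.751. The `selected` list plays the role of the inputs that the predictor app asks the user to provide.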

3. Technology Stack

This project was built using the following technologies and libraries:

Dashboard & Hosting:

Marimo: A Python library for building interactive dashboards.
Hugging Face Spaces: Used for hosting and sharing the interactive dashboard.

Data Analysis & Visualization:

Pandas: For data manipulation and analysis.
Matplotlib: For creating static visualizations.
Seaborn: For creating statistical graphics.

Modeling & Training:

Scikit-Learn: For machine learning tasks such as preprocessing, feature engineering, and model training.
LightGBM: A gradient boosting framework that uses tree-based learning algorithms.

Development Tools:

Ruff: A fast Python linter and code formatter.
uv: A fast Python package installer and resolver.

4. Dataset

This project uses the Home Credit Default Risk dataset from Kaggle, a public dataset containing details on over 246,000 individuals who have made payments on their loans.

Source: Kaggle Dataset

iBrokeTheCode/Home_Credit_Default_Risk_Prediction