iBrokeTheCode/Home_Credit_Default_Risk_Prediction

title: Home Credit Default Risk Prediction
emoji: 🍃
colorFrom: indigo
colorTo: purple
sdk: docker
pinned: true
license: mit
short_description: ML Classification models applied to Home Credit Risk dataset

🏦 Home Credit Default Risk Prediction

Table of Contents

  1. Project Description
  2. Methodology & Key Features
  3. Technology Stack
  4. Dataset

1. Project Description

This project focuses on building a machine learning pipeline to predict a client's ability to repay a loan. It is a binary classification task that uses a real-world financial dataset to identify clients who may face payment difficulties.

The project goes beyond a standalone model and delivers a practical application. It:

  • Preprocesses and cleans the dataset for model training.
  • Trains a machine learning model to predict loan repayment risk.
  • Deploys an interactive predictor app using Marimo, hosted on Hugging Face Spaces.
  • Allows users to make predictions by providing the top 10 most influential features.

This work showcases a complete end-to-end workflow, transforming raw data into a functional, user-friendly tool for risk assessment.
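
As a rough illustration of how the "top 10 most influential features" might be picked, the sketch below ranks features by an importance score and keeps the ten largest. It assumes the scores come from a fitted tree model (for example, the `feature_importances_` attribute of a scikit-learn style estimator). The helper `top_k_features`, the placeholder feature names, and the importance values are all hypothetical and are not taken from this repository.

```python
def top_k_features(names, importances, k=10):
    """Return the k feature names with the largest importance, highest first."""
    if len(names) != len(importances):
        raise ValueError("names and importances must have the same length")
    # Pair each name with its score and sort by score, descending.
    ranked = sorted(zip(names, importances), key=lambda pair: pair[1], reverse=True)
    return [name for name, _ in ranked[:k]]


# Hypothetical placeholder features and importance scores (not real results).
names = [f"feature_{i:02d}" for i in range(15)]
importances = [3, 41, 7, 0, 25, 12, 9, 30, 1, 18, 5, 22, 14, 2, 27]

top10 = top_k_features(names, importances)
print(top10)
```

A predictor app could then request only these ten inputs from the user, at the cost of ignoring whatever signal the remaining features carry.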

Important

App

2. Methodology & Key Features

  • Model Selection: Four models were trained and evaluated. LightGBM was selected as the final model because it performed best, achieving a ROC AUC of 0.751 on the test set.
  • Automated Preprocessing: The data preprocessing pipeline handles common tasks such as feature scaling and categorical encoding, ensuring the model receives clean and formatted data.
  • Interactive Predictor: An application built with Marimo allows users to interact with the trained model directly. It uses the top 10 most important features—identified from the final LightGBM model—to generate real-time predictions.
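
The preprocessing and model-selection steps above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the repository's actual code: the data is synthetic, the column names (`income`, `loan_amount`, `contract_type`) are hypothetical, only two candidate models are compared instead of four, and scikit-learn's `GradientBoostingClassifier` stands in for LightGBM so the example needs no extra dependency.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Synthetic stand-in data; column names are hypothetical, not from the dataset.
rng = np.random.default_rng(0)
n = 1000
X = pd.DataFrame({
    "income": rng.normal(50_000, 15_000, n),
    "loan_amount": rng.normal(200_000, 50_000, n),
    "contract_type": rng.choice(["cash", "revolving"], n),
})
# Build a noisy binary target that depends on the features.
risk = (
    (X["loan_amount"] - 200_000) / 50_000
    - (X["income"] - 50_000) / 15_000
    + (X["contract_type"] == "cash").astype(float)
)
y = (risk + rng.normal(0, 1, n) > risk.median()).astype(int)

# Scale numeric columns and one-hot encode categorical ones.
preprocess = ColumnTransformer(
    [
        ("num", StandardScaler(), ["income", "loan_amount"]),
        ("cat", OneHotEncoder(handle_unknown="ignore"), ["contract_type"]),
    ],
    sparse_threshold=0.0,  # always return a dense array
)

# Candidate models; gradient boosting is a stand-in for LightGBM here.
candidates = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "gradient_boosting": GradientBoostingClassifier(random_state=0),
}

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Fit each candidate inside the same pipeline and score it by ROC AUC.
scores = {}
for name, estimator in candidates.items():
    model = Pipeline([("preprocess", preprocess), ("model", estimator)])
    model.fit(X_train, y_train)
    proba = model.predict_proba(X_test)[:, 1]
    scores[name] = roc_auc_score(y_test, proba)

best_name = max(scores, key=scores.get)
print(scores, best_name)
```

Wrapping the preprocessing and the estimator in one `Pipeline` means the scaler and encoder are fitted on the training split only, which avoids leaking test-set statistics into the evaluation.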

3. Technology Stack

This project was built using the following technologies and libraries:

Dashboard & Hosting:

  • Marimo: A reactive Python notebook that can also be run as an interactive web app.
  • Hugging Face Spaces: Used for hosting and sharing the interactive dashboard.

Data Analysis & Visualization:

  • Pandas: For data manipulation and analysis.
  • Matplotlib: For creating static visualizations.
  • Seaborn: For creating statistical graphics.

Modeling & Training:

  • Scikit-Learn: For machine learning tasks such as preprocessing, feature engineering, and model training.
  • LightGBM: A gradient boosting framework that uses tree-based learning algorithms.

Development Tools:

  • Ruff: A fast Python linter and code formatter.
  • uv: A fast Python package installer and resolver.

4. Dataset

This project uses the Home Credit Default Risk dataset from Kaggle, a public dataset containing details on over 246,000 individuals who have made payments on their loans.
