In the banking sector, lending money to high-risk clients can lead to significant financial losses. This project builds a machine learning classification model to identify potential defaulters (Class 1) before credit is approved, allowing the institution to reduce risk and optimize its portfolio.
- Language: Python
- Data Manipulation & Cleaning: Pandas, NumPy
- Machine Learning: Scikit-Learn (k-NN, SVM, StandardScaler)
- Data Visualization: Matplotlib, Seaborn
- Data Cleaning: Handled missing values and removed irrelevant identifiers to ensure model integrity.
- Exploratory Data Analysis (EDA): Identified key patterns, noting that higher interest rates strongly correlated with default rates.
- Feature Engineering: Converted categorical variables into numeric format (One-Hot Encoding) and scaled features using StandardScaler to prevent bias towards large numbers (like income).
- Model Evaluation: Tested and compared K-Nearest Neighbors (k-NN) and Support Vector Machine (SVM).
The k-NN model outperformed the linear SVM. A likely explanation is that the two classes overlap heavily and are not linearly separable, so a single linear decision boundary misses defaults that k-NN, which classifies each applicant by its nearest neighbors, can still pick up.
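The preprocessing and comparison steps described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the data is synthetic, the column names (`income`, `interest_rate`, `home_ownership`) are hypothetical, and `LinearSVC` is assumed as the linear SVM. The resulting scores say nothing about the real dataset.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.metrics import recall_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.svm import LinearSVC

# Synthetic stand-in data; column names are hypothetical, not the project's schema.
rng = np.random.default_rng(0)
n = 400
df = pd.DataFrame({
    "income": rng.normal(50_000, 15_000, n),
    "interest_rate": rng.uniform(5, 25, n),
    "home_ownership": rng.choice(["RENT", "OWN", "MORTGAGE"], n),
})
# Toy target: the chance of default (class 1) rises with the interest rate.
y = (rng.uniform(0, 1, n) < (df["interest_rate"].to_numpy() - 5) / 40).astype(int)

# Scale numeric columns and one-hot encode categorical ones.
preprocess = ColumnTransformer([
    ("num", StandardScaler(), ["income", "interest_rate"]),
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["home_ownership"]),
])

models = {
    "k-NN (k=5)": KNeighborsClassifier(n_neighbors=5),
    "Linear SVM": LinearSVC(),  # assumed stand-in for the linear SVM
}

X_train, X_test, y_train, y_test = train_test_split(
    df, y, test_size=0.25, stratify=y, random_state=0
)

# Fit each model behind the same preprocessing and compare recall on class 1.
recalls = {}
for name, model in models.items():
    pipe = Pipeline([("prep", preprocess), ("clf", model)])
    pipe.fit(X_train, y_train)
    recalls[name] = recall_score(y_test, pipe.predict(X_test), pos_label=1)
    print(f"{name}: recall on class 1 = {recalls[name]:.2f}")
```

Wrapping the scaler and encoder in a pipeline means they are fitted on the training split only, which avoids leaking test-set statistics into the model.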
- Best Model: k-NN (k=5)
- Recall (Class 1 - Defaulters): 61%
- Overall Accuracy: 89%
By correctly flagging 61% of actual defaulters, this model provides a solid baseline for risk mitigation, potentially avoiding substantial losses on bad loans.
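Recall and accuracy measure different things, which is why both are reported. The sketch below computes them from scratch on a made-up set of ten labels (not project data) to show how accuracy can stay high while some defaulters are still missed. The helper names are hypothetical.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for the given positive class."""
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if t == positive and p == positive:
            tp += 1
        elif t != positive and p == positive:
            fp += 1
        elif t == positive and p != positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def recall(tp, fn):
    """Share of actual positives (defaulters) that the model caught."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def accuracy(tp, fp, fn, tn):
    """Share of all predictions that were correct."""
    return (tp + tn) / (tp + fp + fn + tn)


# Made-up labels: 3 defaulters out of 10 applicants, one defaulter missed.
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]

tp, fp, fn, tn = confusion_counts(y_true, y_pred)
print(f"recall (class 1): {recall(tp, fn):.2f}")
print(f"accuracy: {accuracy(tp, fp, fn, tn):.2f}")
```

Because defaulters are usually the minority class, a missed defaulter barely moves accuracy but lowers recall noticeably, so recall on Class 1 is the more informative number for this use case.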
Click here to interact with the Power BI Dashboard
(If you are viewing this on mobile or without Power BI access, check the static preview below)
