Loan Default Prediction Analysis

Overview

This Jupyter notebook implements a machine learning pipeline to predict loan default risk. The analysis proceeds through data collection, visualization, preprocessing, feature engineering, model development, and evaluation, and closes with recommendations for lenders.

Data Collection

Training Data: Contains borrower features (demographics, financials) and the target label Risk_Flag (0 = no default, 1 = default).
Test Data: Contains the same input features with an identifier column Id, but no target label. Reserved for final predictions.

Data Visualization

Bar charts of default rates by categorical variables (Marital Status, House Ownership, Car Ownership, Profession, City, State).
KDE plots comparing distributions of numeric features (Income, Age, Experience, Job Tenure, House Tenure) for defaulters vs. non-defaulters.
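
The per-category default rates behind the bar charts can be sketched as follows. This is a minimal illustration on made-up rows, not the notebook's code; the column name House_Ownership and its category values are assumptions, and only Risk_Flag is taken from the data description above.

```python
import pandas as pd


def default_rate_by(df: pd.DataFrame, column: str, target: str = "Risk_Flag") -> pd.Series:
    """Share of defaulters (mean of the 0/1 target) per category, highest first."""
    return df.groupby(column)[target].mean().sort_values(ascending=False)


# Toy rows for illustration only; the column name and values are assumptions.
toy = pd.DataFrame({
    "House_Ownership": ["rented", "rented", "owned", "owned", "rented", "norent_noown"],
    "Risk_Flag": [1, 0, 0, 0, 1, 0],
})

rates = default_rate_by(toy, "House_Ownership")
print(rates)
```

With matplotlib installed, rates.plot.bar() would draw the corresponding bar chart.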

Data Preprocessing

Duplicate Removal & Column Normalization: Remove repeated records; standardize column names (lowercase, no whitespace).
ID Column Removal: Drop the Id column; it is a row identifier with no predictive meaning, and keeping it risks leakage.
Encoding: Convert categorical and boolean fields to numeric.
Out-of-Fold Target Encoding: Apply to high-cardinality features (Profession, City, State) to generate risk scores without leakage.
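
One way out-of-fold target encoding could look is sketched below. It is an assumption about the approach, not the notebook's implementation: the function name oof_target_encode, the fold count, the seed, and the toy rows are all hypothetical. The point it shows is that each row's risk score is computed from the other folds only, so a row never sees its own label.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import KFold


def oof_target_encode(df, column, target, n_splits=5, seed=42):
    """Encode `column` with the mean of `target`, computed out-of-fold."""
    encoded = pd.Series(np.nan, index=df.index, dtype=float)
    folds = KFold(n_splits=n_splits, shuffle=True, random_state=seed)
    for fit_idx, apply_idx in folds.split(df):
        fit_part = df.iloc[fit_idx]
        # Category means come only from rows outside the fold being encoded.
        means = fit_part.groupby(column)[target].mean()
        scores = df[column].iloc[apply_idx].map(means)
        # Categories absent from the fitting rows fall back to their overall rate.
        scores = scores.fillna(fit_part[target].mean())
        encoded.iloc[apply_idx] = scores.to_numpy()
    return encoded


# Toy rows for illustration only.
toy = pd.DataFrame({
    "profession": ["engineer", "pilot", "engineer", "chef", "pilot",
                   "engineer", "chef", "pilot", "engineer", "chef"],
    "risk_flag": [0, 1, 0, 1, 1, 0, 0, 1, 0, 1],
})
toy["profession_risk"] = oof_target_encode(toy, "profession", "risk_flag")
print(toy)
```

For the unlabeled test data, the usual counterpart is to map each category to its mean over the full training set, since no test labels exist to leak.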

Data Analysis

Correlation & Mutual Information

Compute Pearson correlation between features and Risk_Flag.
Assess non-linear relationships with mutual information.
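
The two measures can be compared side by side, as in the sketch below. The data is synthetic and built so that default depends on age in a U-shaped way; that setup is an assumption chosen to show why mutual information can flag a feature that Pearson correlation scores near zero.

```python
import numpy as np
import pandas as pd
from sklearn.feature_selection import mutual_info_classif

rng = np.random.default_rng(0)
n = 500
features = ["income", "age"]

# Synthetic data for illustration only.
toy = pd.DataFrame({
    "income": rng.normal(50, 10, n),
    "age": rng.uniform(21, 79, n),
})
# Default occurs at both age extremes, so the link to age is non-linear.
toy["risk_flag"] = (np.abs(toy["age"] - 50) > 20).astype(int)

# Linear association of each feature with the target.
pearson = toy[features].corrwith(toy["risk_flag"])

# Mutual information also picks up non-linear dependence.
mi = pd.Series(
    mutual_info_classif(toy[features], toy["risk_flag"], random_state=0),
    index=features,
)

print(pd.DataFrame({"pearson": pearson, "mutual_info": mi}))
```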

Interaction Heatmap

Bin Income and Job Tenure into quintiles and visualize default rates to justify interaction features.
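
The grid behind such a heatmap might be computed as in this sketch. The helper name default_rate_grid and the random toy data are assumptions; the column names follow the lowercase convention from the preprocessing step.

```python
import numpy as np
import pandas as pd


def default_rate_grid(df, row_col, col_col, target, q=5):
    """Mean of the 0/1 target in each (row quantile bin, column quantile bin) cell."""
    # duplicates="drop" merges bins when tied values (e.g. integer years) repeat an edge.
    rows = pd.qcut(df[row_col], q=q, labels=False, duplicates="drop")
    cols = pd.qcut(df[col_col], q=q, labels=False, duplicates="drop")
    return df.groupby([rows, cols])[target].mean().unstack()


# Random toy data for illustration only.
rng = np.random.default_rng(1)
n = 1000
toy = pd.DataFrame({
    "income": rng.uniform(10_000, 10_000_000, n),
    "current_job_yrs": rng.integers(0, 15, n),
    "risk_flag": (rng.random(n) < 0.12).astype(int),
})

grid = default_rate_grid(toy, "income", "current_job_yrs", "risk_flag")
print(grid.round(3))
```

Passing grid to seaborn.heatmap(grid, annot=True) would render it. Cells whose rate differs from what the row and column averages alone suggest are the ones that motivate an interaction feature.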

Feature Engineering

Job Stability: current_job_yrs / (experience + 1)
Residence Stability: current_house_yrs * (1 + house_owned)
Age Buckets: Discretize age into cohorts to capture non-linear effects.
Interaction Terms: income * job_stability and income * profession_risk.
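
The formulas above translate to pandas roughly as follows. This is a sketch under assumptions: the age bin edges, the output column names for the interaction terms, and the toy rows are hypothetical, and profession_risk is assumed to be the target-encoded score from preprocessing.

```python
import pandas as pd


def add_engineered_features(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with the stability, age-bucket, and interaction features."""
    out = df.copy()
    # The +1 avoids division by zero for borrowers with no experience.
    out["job_stability"] = out["current_job_yrs"] / (out["experience"] + 1)
    # Owners (house_owned == 1) get double weight on years at the residence.
    out["residence_stability"] = out["current_house_yrs"] * (1 + out["house_owned"])
    # Bin edges are assumptions; the source does not state the cohorts.
    out["age_bucket"] = pd.cut(
        out["age"], bins=[20, 30, 40, 50, 60, 80], labels=False, include_lowest=True
    )
    out["income_x_job_stability"] = out["income"] * out["job_stability"]
    out["income_x_profession_risk"] = out["income"] * out["profession_risk"]
    return out


# Toy rows for illustration only.
toy = pd.DataFrame({
    "current_job_yrs": [3, 0],
    "experience": [5, 0],
    "current_house_yrs": [12, 10],
    "house_owned": [1, 0],
    "age": [35, 62],
    "income": [100_000.0, 40_000.0],
    "profession_risk": [0.1, 0.2],
})

engineered = add_engineered_features(toy)
print(engineered.T)
```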

Model Development

Data Split: Stratified split of the labeled data into 60% training, 20% validation, and 20% test; SMOTE is applied to the training portion only.
Hyperparameter Tuning: Manual validation curves for:
- Logistic Regression (C)
- Decision Tree (max_depth, min_samples_leaf, max_features)
- Random Forest (n_estimators, max_depth, min_samples_leaf, max_features)
Final Model: Random Forest with 300 trees, max_depth=15, min_samples_leaf=5, max_features='sqrt'.
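
A 60/20/20 stratified split and the final configuration could be set up as sketched below. The random toy data, the seeds, and the two-step use of train_test_split are assumptions; only the proportions and the Random Forest hyperparameters come from the text above. Oversampling is left out of the code to keep it dependent on scikit-learn alone.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Random toy data for illustration only; the positive rate is arbitrary.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 4))
y = (rng.random(1000) < 0.12).astype(int)

# Hold out 20% as the test set, then split the remaining 80% into 75/25,
# which gives 60% training and 20% validation overall.
X_rest, X_test, y_rest, y_test = train_test_split(
    X, y, test_size=0.20, stratify=y, random_state=42
)
X_train, X_val, y_train, y_val = train_test_split(
    X_rest, y_rest, test_size=0.25, stratify=y_rest, random_state=42
)

# Hyperparameters as listed for the final model.
model = RandomForestClassifier(
    n_estimators=300,
    max_depth=15,
    min_samples_leaf=5,
    max_features="sqrt",
    random_state=42,
)
model.fit(X_train, y_train)

# Predicted probability of default for the validation rows.
val_scores = model.predict_proba(X_val)[:, 1]
print(len(X_train), len(X_val), len(X_test))
```

With imbalanced-learn installed, SMOTE would slot in just before model.fit, for example X_train, y_train = SMOTE(random_state=42).fit_resample(X_train, y_train). Resampling only the training portion keeps the validation and test sets at the real class balance, so their metrics stay representative.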

Model Evaluation

Metrics: ROC AUC, precision, recall, F1-score on training, validation, and test sets.
Lift Curve: Shows concentration of defaulters across risk deciles.
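
A decile lift table of the kind a lift curve is drawn from can be built as follows. The helper name lift_by_decile and the synthetic scores are assumptions; decile 1 is taken to be the highest-risk tenth of borrowers.

```python
import numpy as np
import pandas as pd


def lift_by_decile(y_true, scores) -> pd.DataFrame:
    """Default rate, lift, and cumulative capture per risk decile (1 = riskiest)."""
    df = pd.DataFrame({"y": np.asarray(y_true), "score": np.asarray(scores)})
    # Rank first so tied scores cannot collapse bin edges; rank 1 is the highest score.
    ranks = df["score"].rank(method="first", ascending=False)
    df["decile"] = pd.qcut(ranks, 10, labels=False) + 1
    table = df.groupby("decile")["y"].agg(n="count", defaulters="sum", default_rate="mean")
    # Lift compares each decile's default rate with the overall rate.
    table["lift"] = table["default_rate"] / df["y"].mean()
    # Share of all defaulters found in this decile and the riskier ones.
    table["cum_capture"] = table["defaulters"].cumsum() / df["y"].sum()
    return table


# Synthetic scores for illustration only: higher score, higher chance of default.
rng = np.random.default_rng(0)
scores = rng.random(1000)
y = (rng.random(1000) < scores).astype(int)

table = lift_by_decile(y, scores)
print(table.round(3))
```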

Results & Recommendations

Key Risk Factors: Stability ratios, age cohorts, geographic/occupational risk scores, interaction terms.
Model Performance: Training AUC ~0.98, validation AUC ~0.93, test AUC ~0.92; recall on defaults ~0.74.
Recommended Actions:
- Integrate stability ratios into underwriting.
- Tailor policies for youngest/oldest age buckets.
- Adjust pricing/documentation by profession and region.
- Monitor top risk deciles with targeted outreach.
- Retrain model quarterly to adapt to new conditions.

Usage Instructions

Install dependencies:

pip install pandas numpy scikit-learn imbalanced-learn matplotlib seaborn
