This repository contains the code and report for the Loan Default Prediction project, which aims to build a machine learning model to predict whether a customer will default on a loan based on historical data.
The project tackles the challenge of predicting loan defaults for a German bank using a dataset of 1,000 customer records with 17 columns: 16 features describing the customer and the loan, plus the target column. The primary goal is to develop an accurate prediction model that helps identify potential defaulters and manage credit risk. The columns are:
• checking_balance - Amount of money in the customer's checking account
• months_loan_duration - Duration of the loan in months
• credit_history - Credit history of the customer
• purpose - Purpose of the loan
• amount - Loan amount
• savings_balance - Balance in the customer's savings account
• employment_duration - Duration of current employment
• percent_of_income - Installment as a percentage of monthly income
• years_at_residence - Years at current residence
• age - Age of the customer
• other_credit - Other credits held by the customer
• housing - Type of housing (rent or own)
• existing_loans_count - Number of existing loans
• job - Job type
• dependents - Number of dependents
• phone - Whether the customer has a phone
• default - Default status (target column)
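As a starting point for the EDA, here is a minimal sketch, assuming the CSV sits at data/German_bank.csv with the column names listed above. The helper names are hypothetical, and the encoding of `default` (assumed here to be the strings "yes"/"no") should be checked against the actual file before use.

```python
import pandas as pd

# Assumed location of the dataset, based on the data/ folder in this repository.
DATA_PATH = "data/German_bank.csv"
TARGET = "default"  # target column listed above


def load_data(path: str = DATA_PATH) -> pd.DataFrame:
    """Read the German bank loan dataset into a DataFrame."""
    return pd.read_csv(path)


def split_features_target(df: pd.DataFrame, target: str = TARGET):
    """Separate the predictor columns (X) from the target column (y)."""
    X = df.drop(columns=[target])
    y = df[target]
    return X, y


def default_rate_by(df: pd.DataFrame, column: str,
                    positive_label: str = "yes", target: str = TARGET) -> pd.Series:
    """Share of customers who defaulted within each level of `column`.

    positive_label="yes" is an assumption about how `default` is encoded.
    Results are sorted from highest to lowest default rate.
    """
    is_default = df[target] == positive_label
    return is_default.groupby(df[column]).mean().sort_values(ascending=False)
```

For example, `y.value_counts(normalize=True)` shows the class balance, and `default_rate_by(df, "checking_balance")` shows how the default rate varies across checking-account categories.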
The repository contains:
- data/: The dataset file "German_bank.csv".
- code/: Python scripts and Jupyter notebooks for data analysis, model building, and evaluation.
- results/: Visualizations and model performance metrics.
- report/: The final project report in both PDF and Markdown formats.
- model/: Saved model files.
- readme.txt: The file you're currently reading.
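The README does not state how the files in model/ were saved. One common option for scikit-learn estimators is joblib; this sketch assumes that format and uses a hypothetical file name, loan_default_model.joblib.

```python
from pathlib import Path

import joblib

# Folder for saved models, as described above. The file name below is hypothetical.
MODEL_DIR = Path("model")
MODEL_NAME = "loan_default_model.joblib"


def save_model(model, name: str = MODEL_NAME, model_dir: Path = MODEL_DIR) -> Path:
    """Persist a fitted estimator (or any picklable object) and return its path."""
    model_dir.mkdir(parents=True, exist_ok=True)
    path = model_dir / name
    joblib.dump(model, str(path))
    return path


def load_model(name: str = MODEL_NAME, model_dir: Path = MODEL_DIR):
    """Load a previously saved estimator from model_dir."""
    return joblib.load(str(model_dir / name))
```

joblib files are pickles, so only load model files from sources you trust.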
The project report is structured as follows:
- Introduction: Provides context and outlines the project's objectives and research questions.
- Methods and Materials: Describes the data analysis process, including Exploratory Data Analysis (EDA) and model building. Includes details on the machine learning models used and their evaluations.
- Results: Reports key findings from the analysis, including insights from EDA and the performance of different machine learning models.
- Discussion: Interprets the findings, discusses implications, limitations of the study, and potential future directions.
- Conclusions: Summarizes the main takeaways and insights from the project.
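The README does not list which models the report compares. As an illustrative baseline only, not the project's chosen model, this sketch one-hot encodes the categorical columns, scales the numeric ones, fits a logistic regression, and reports hold-out metrics. The split ratio, random seed, and function names are assumptions.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer, make_column_selector
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler


def build_pipeline() -> Pipeline:
    """Baseline: one-hot encode non-numeric columns, scale numeric ones, fit logistic regression."""
    preprocess = ColumnTransformer([
        # Categories unseen during training are encoded as all zeros instead of raising.
        ("cat", OneHotEncoder(handle_unknown="ignore"), make_column_selector(dtype_exclude="number")),
        ("num", StandardScaler(), make_column_selector(dtype_include="number")),
    ])
    return Pipeline([
        ("preprocess", preprocess),
        ("model", LogisticRegression(max_iter=1000)),
    ])


def train_and_evaluate(X: pd.DataFrame, y: pd.Series, test_size: float = 0.25, seed: int = 42):
    """Stratified hold-out split, fit the baseline pipeline, and compute hold-out metrics."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, stratify=y, random_state=seed
    )
    pipeline = build_pipeline().fit(X_train, y_train)
    y_pred = pipeline.predict(X_test)
    # Column 1 of predict_proba is the score for classes_[1], the larger label,
    # which is the class roc_auc_score treats as positive for binary targets.
    y_score = pipeline.predict_proba(X_test)[:, 1]
    metrics = {
        "roc_auc": roc_auc_score(y_test, y_score),
        "report": classification_report(y_test, y_pred, zero_division=0),
    }
    return pipeline, metrics
```

Given a feature frame `X` and target `y` (for example, every column of German_bank.csv except `default`, and `default` itself), `train_and_evaluate(X, y)` returns the fitted pipeline and a dict of metrics. Stratifying the split keeps the default rate similar in the training and test sets.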
To reproduce the analysis:
- Clone the repository.
- Navigate to the code/ directory and run the Jupyter notebooks to replicate the analysis and model-building process.
- Find the generated visualizations and model performance metrics in the results/ directory.
- Read the final report in the report/ directory (Docx).
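To rerun every notebook without opening the Jupyter interface, here is a sketch using nbformat and nbconvert's ExecutePreprocessor. It assumes the notebooks live directly in code/, should run in file-name order, and use a kernel named python3; none of this is stated in the project. Note that `run_notebook` overwrites each notebook with its executed version.

```python
from pathlib import Path


def find_notebooks(code_dir: str = "code") -> list[Path]:
    """Return the notebooks in code_dir, sorted by file name (assumed run order)."""
    return sorted(Path(code_dir).glob("*.ipynb"))


def run_notebook(path: Path, timeout: int = 600) -> None:
    """Execute a notebook and save the outputs back into the same file.

    Requires the nbformat and nbconvert packages and a Jupyter kernel named "python3".
    """
    import nbformat
    from nbconvert.preprocessors import ExecutePreprocessor

    nb = nbformat.read(str(path), as_version=4)
    ep = ExecutePreprocessor(timeout=timeout, kernel_name="python3")
    # Use the notebook's own folder as the working directory so relative paths resolve.
    ep.preprocess(nb, {"metadata": {"path": str(path.parent)}})
    nbformat.write(nb, str(path))


if __name__ == "__main__":
    for notebook in find_notebooks():
        print(f"Running {notebook} ...")
        run_notebook(notebook)
```

From the command line, `jupyter nbconvert --to notebook --execute --inplace code/<notebook>.ipynb` does the same for a single notebook.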
The analysis was implemented in Python with Pandas, Matplotlib, Seaborn, and Scikit-learn, and run in Jupyter Notebook. Make sure these libraries are installed before running the code.
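A quick way to confirm these libraries are available is to look up each import name with importlib. The mapping from package names to import names (for example, scikit-learn is imported as sklearn, and Jupyter Notebook as notebook) is written out by hand here; adjust it if the notebooks use other packages.

```python
import importlib.util

# pip package name -> import name, for the libraries listed above.
REQUIRED = {
    "pandas": "pandas",
    "matplotlib": "matplotlib",
    "seaborn": "seaborn",
    "scikit-learn": "sklearn",
    "notebook": "notebook",
}


def missing_packages(required: dict[str, str] = REQUIRED) -> list[str]:
    """Return the pip names of packages whose import name cannot be found."""
    return [pip_name for pip_name, module in required.items()
            if importlib.util.find_spec(module) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Install the missing packages with: pip install " + " ".join(missing))
    else:
        print("All required packages are available.")
```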
This project was completed by Kapil Jain as part of the MSIS program at the University of Arizona. The dataset was sourced from the German bank.
For any questions or inquiries, please contact kapiljain1989@gmail.com.