This repository contains a machine learning project aimed at predicting the success of a Portuguese bank's phone-based marketing campaign. The goal is to identify strategies that will help increase client engagement and subscriptions to the bank's term deposit products in future campaigns.
The dataset includes information about the bank's previous marketing campaigns, such as client demographics, previous contact results, and the outcome of previous campaigns. By analyzing this data, the project aims to predict whether a client will subscribe to the term deposit product based on the features provided.
- Problem: Predict whether a customer will subscribe to the bank's term deposit product based on features like age, job, marital status, and contact information.
- Solution: Apply machine learning models to the dataset and use various classification algorithms to predict campaign success.
- Outcome: Improve marketing strategies by targeting the right customers and optimizing resource allocation.
- The dataset is provided in the `data/` folder.
- Features include demographic data, information about the client's previous interactions with the bank, and the outcome of those interactions.
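As a rough illustration of how such features might be prepared for the classifiers, the sketch below one-hot encodes the categorical columns and maps a yes/no target to 1/0. The column names (`age`, `job`, `marital`, `poutcome`, `y`), the sample rows, and the helper `split_features_target` are assumptions made for this example, not the project's actual preprocessing code.

```python
import pandas as pd

# Tiny made-up sample standing in for the real file; column names are assumptions.
sample = pd.DataFrame({
    "age": [30, 45, 52, 28],
    "job": ["admin.", "technician", "retired", "student"],
    "marital": ["married", "single", "married", "single"],
    "poutcome": ["unknown", "success", "failure", "unknown"],
    "y": ["no", "yes", "no", "yes"],
})


def split_features_target(df, target="y"):
    """Hypothetical helper: return one-hot encoded features and a 0/1 target."""
    y = df[target].map({"no": 0, "yes": 1})
    # get_dummies encodes the text columns and leaves numeric columns unchanged.
    X = pd.get_dummies(df.drop(columns=[target]))
    return X, y


X, y = split_features_target(sample)
print(X.columns.tolist())
print(y.tolist())
```

With the real data, the sample frame would be replaced by something like `pd.read_csv("data/bank-marketing.csv")`. The UCI distribution of this dataset is semicolon-delimited, so `sep=";"` may be needed depending on how the file was saved.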
The following machine learning models were used to solve the classification problem:
- Random Forest Classifier
- XGBoost Classifier
- AdaBoost Classifier
- Gradient Boosting Classifier
- Logistic Regression
- Support Vector Classifier (SVC)
- K-Nearest Neighbors Classifier (KNN)
- Decision Tree Classifier
- Gaussian Naive Bayes (NB)
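A comparison across these models could be set up roughly as follows. This is a minimal sketch on synthetic data from `make_classification`, not the project's training code: the model settings are scikit-learn defaults chosen for illustration, and XGBoost is left out so the example depends only on scikit-learn (`xgboost.XGBClassifier` exposes the same `fit`/`predict` interface and could be added to the dictionary).

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    AdaBoostClassifier,
    GradientBoostingClassifier,
    RandomForestClassifier,
)
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import recall_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the bank marketing data (assumption for this sketch).
X, y = make_classification(n_samples=300, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

models = {
    "Random Forest": RandomForestClassifier(random_state=0),
    "AdaBoost": AdaBoostClassifier(random_state=0),
    "Gradient Boosting": GradientBoostingClassifier(random_state=0),
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "SVC": SVC(),
    "KNN": KNeighborsClassifier(),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Gaussian NB": GaussianNB(),
}

# Fit each model and record recall on the held-out split.
recalls = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    recalls[name] = recall_score(y_test, model.predict(X_test))

for name, score in sorted(recalls.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: recall={score:.3f}")
```

Keeping the models in a dictionary makes it easy to add or drop a classifier without touching the evaluation loop.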
The models were evaluated using the following metrics:
- Accuracy: The proportion of correct predictions made by the model.
- Precision: The proportion of positive predictions that were actually correct.
- Recall: The proportion of actual positive cases that were correctly predicted.
- F1-Score: The harmonic mean of precision and recall, balancing both metrics.
- Confusion Matrix: A table showing the true positive, false positive, true negative, and false negative results.
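To make the definitions above concrete, the following sketch computes all four scores directly from the confusion-matrix counts in plain Python. The labels are invented for the example, and the function names are hypothetical; in practice `sklearn.metrics` provides equivalent functions.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, tn, fn) for binary labels where 1 is the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, fp, tn, fn


def classification_metrics(y_true, y_pred):
    """Compute accuracy, precision, recall, and F1 from the confusion counts."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Made-up labels: 1 = subscribed, 0 = did not subscribe.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]

counts = confusion_counts(y_true, y_pred)
metrics = classification_metrics(y_true, y_pred)
print(counts)   # (3, 1, 3, 1)
print(metrics)  # every score is 0.75 for this example
```

Recall matters most when missing a likely subscriber is costlier than an unnecessary call, which is one reason to rank models by it.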
After tuning the hyperparameters, the XGBoost Classifier was found to perform the best, with the highest recall score, indicating its ability to correctly identify clients who are more likely to subscribe.
Hyperparameters, particularly for `RandomForestClassifier` and XGBoost, were tuned with `GridSearchCV`.
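A grid search of this kind might look like the sketch below. The data is synthetic, and the parameter grid, the `scoring="recall"` choice, and the 3-fold split are assumptions made for illustration; the grid actually searched in this project is not documented here.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for the training data (assumption for this sketch).
X, y = make_classification(n_samples=200, n_features=8, random_state=0)

# Illustrative grid only; not the grid used by the project.
param_grid = {
    "n_estimators": [50, 100],
    "max_depth": [3, 5],
}

search = GridSearchCV(
    estimator=RandomForestClassifier(random_state=0),
    param_grid=param_grid,
    scoring="recall",  # rank candidates by recall on the positive class
    cv=3,              # 3-fold cross-validation
)
search.fit(X, y)

best_params = search.best_params_
best_score = search.best_score_
print(best_params, round(best_score, 3))
```

After fitting, `search.best_estimator_` holds a model refit on the full training data with the winning parameters, so it can be used directly for prediction.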
- Random Forest Classifier achieved an accuracy score of 94% on the test data, with a recall of 92% and an F1-score of 94%.
- XGBoost Classifier performed slightly better with an accuracy of 94.29% and higher recall and precision scores.
- Best Model: XGBoost Classifier
- Best Parameters for XGBoost:
  - `n_estimators = 200`
  - `learning_rate = 0.1`
  - `max_depth = 5`
  - `subsample = 0.9`
  - `colsample_bytree = 0.9`
- Accuracy: 94.29%
- Recall (Class 1): 92%
- F1-Score: 94%
- Confusion Matrix: Showed that the model accurately identified both the "yes" and "no" subscription cases.
You can also view the live Tableau dashboard by following the link below:
Follow these steps to set up and run the project locally:
First, clone the project repository to your local machine using the following command:
```bash
git clone https://github.com/AryanAgarwal27/Bank_Marketing_Analysis.git
cd Bank_Marketing_Analysis
```
Creating a virtual environment helps isolate the dependencies for the project. To create and activate a virtual environment, run:
On Windows:
```bash
python -m venv venv
.\venv\Scripts\activate
```
On macOS/Linux:
```bash
python -m venv venv
source venv/bin/activate
```
The project's dependencies are listed in the `requirements.txt` file. Install them using:
```bash
pip install -r requirements.txt
```
The dataset used for this project is not included in the repository. However, you can download it from the UCI Machine Learning Repository. Once downloaded, place the dataset (usually named `bank-marketing.csv`) into the `data/` folder in the project directory.
```plaintext
Bank_Marketing_Analysis/
│
├── data/
│   └── bank-marketing.csv
│
├── src/
│   ├── model.py
│   ├── data_preprocessing.py
│   └── utils.py
├── Picture/
│   ├── Accuracy_XGB.png
│   ├── Confusion_Matrix_XGB.png
│   └── Tableau_dashboard.png
├── requirements.txt
├── LICENSE
├── README.md
└── main.py
```
Once the dependencies are installed and the dataset is in place, you can run the project locally.
To run the project and train the model, execute the following command:
```bash
python main.py
```
- If you want to modify the hyperparameters or try different models, you can find the relevant code in the `src/` folder, specifically in the `model.py` and `data_preprocessing.py` files.
- For more details on the project or to contribute, feel free to open an issue or create a pull request on GitHub.
This project is licensed under the MIT License - see the LICENSE file for details.