A production-ready MLOps pipeline for predicting bank term deposit subscriptions using XGBoost.
In banking, accurate prediction of which customers are likely to subscribe to term deposits helps optimize marketing campaigns and increase conversion rates. This project provides a production-ready prediction solution that:
- Predicts the likelihood of customers subscribing to term deposits
- Handles class imbalance common in marketing datasets
- Implements feature selection to identify key factors influencing subscriptions
- Provides interactive visualizations of model performance
This project uses the Bank Marketing dataset from the UCI Machine Learning Repository. The dataset contains:
- Customer demographic information (age, job, marital status, education)
- Financial attributes (housing, loan, balance)
- Campaign details (contact channel, day, month, duration)
- Previous campaign outcomes
- Target variable: whether the client subscribed to a term deposit (yes/no)
The data loader will automatically download and cache the dataset if it's not available locally. No need to manually download the data!
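The download-and-cache behaviour described above can be sketched roughly as follows. This is a minimal illustration and not the project's actual `data_loader.py`: the function name `load_or_fetch` and the `fetch` callable are hypothetical, and the real loader may cache in a different location or format.

```python
from pathlib import Path
from typing import Callable


def load_or_fetch(cache_path: Path, fetch: Callable[[], bytes]) -> bytes:
    """Return cached bytes if present; otherwise fetch, cache, and return them.

    `fetch` is a hypothetical stand-in for whatever downloads the dataset.
    """
    if cache_path.exists():
        # Cache hit: skip the download entirely.
        return cache_path.read_bytes()
    data = fetch()
    # Create the cache directory on first use, then store the download.
    cache_path.parent.mkdir(parents=True, exist_ok=True)
    cache_path.write_bytes(data)
    return data
```

In practice `fetch` would wrap an HTTP download of the dataset; passing it in as a callable keeps the caching logic easy to test without network access.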
The project implements a complete ML pipeline with the following steps:
- Data Loading: Auto-download or load the bank marketing dataset
- Data Cleaning: Handle missing values and outliers
- Data Preprocessing: Process categorical variables, drop unnecessary columns
- Data Splitting: Split data into training and test sets
- Model Training: Train an XGBoost classifier with selected features
- Model Evaluation: Evaluate model performance and render the results as an interactive HTML visualization
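For the data splitting step, one common choice with imbalanced targets is a stratified split, so that the rare "yes" class appears in similar proportion in both sets. The sketch below uses only the standard library and is an assumption about how such a split could work; the project's `data_splitter.py` may split differently (for example with scikit-learn's `train_test_split`). The function name `stratified_split` is hypothetical.

```python
import random
from collections import defaultdict
from typing import Hashable, List, Sequence, Tuple


def stratified_split(
    labels: Sequence[Hashable], test_fraction: float = 0.2, seed: int = 42
) -> Tuple[List[int], List[int]]:
    """Return (train_indices, test_indices) with similar class shares in each."""
    rng = random.Random(seed)

    # Group row indices by their class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train, test = [], []
    for indices in by_class.values():
        # Shuffle within each class, then carve off that class's test share.
        rng.shuffle(indices)
        n_test = round(len(indices) * test_fraction)
        test.extend(indices[:n_test])
        train.extend(indices[n_test:])
    return sorted(train), sorted(test)
```

With 90 "no" rows and 10 "yes" rows and `test_fraction=0.2`, the test set receives 18 "no" rows and 2 "yes" rows, preserving the 9:1 ratio.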
This solution uses XGBoost, which is well suited to:
- Class Imbalance: Targets the common problem in marketing datasets where positive responses are rare
- Feature Importance: Automatically identifies and ranks the most influential factors
- Scalability: Efficiently processes large customer datasets
- Performance: Typically performs well against traditional classifiers on tabular prediction tasks of this type
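One widely used way to address class imbalance in XGBoost is its `scale_pos_weight` parameter, often set to the ratio of negative to positive training examples. The sketch below computes that ratio; whether this project sets the parameter this way is an assumption, and the helper name is hypothetical.

```python
from typing import Iterable


def scale_pos_weight(labels: Iterable[int]) -> float:
    """Ratio of negative to positive examples, for labels encoded as 0/1."""
    labels = list(labels)
    positives = sum(1 for y in labels if y == 1)
    negatives = len(labels) - positives
    if positives == 0:
        raise ValueError("need at least one positive example")
    return negatives / positives


# Toy labels: 8 non-subscribers (0) and 2 subscribers (1).
weight = scale_pos_weight([0, 0, 0, 0, 1, 0, 0, 1, 0, 0])
print(weight)  # 4.0
```

The resulting value could then be passed as `scale_pos_weight` when constructing an `xgboost.XGBClassifier`, so that errors on the rare positive class count more during training.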
- Python 3.9+
- ZenML installed and configured
# Clone the repository
git clone https://github.com/zenml-io/zenml-projects.git
cd zenml-projects/bank_subscription_prediction
# Install dependencies
pip install -r requirements.txt
# Initialize ZenML (if needed)
zenml init
python run.py
python run.py --config configs/more_trees.yaml
Config File | Description | Key Parameters |
---|---|---|
baseline.yaml | Default XGBoost parameters | Base estimators and depth |
more_trees.yaml | Increased number of estimators | 200 estimators |
deeper_trees.yaml | Increased maximum tree depth | Max depth of 5 |
bank_subscription_prediction/
├── configs/ # YAML Configuration files
│ ├── __init__.py
│ ├── baseline.yaml # Baseline experiment config
│ ├── more_trees.yaml # Config with more trees
│ └── deeper_trees.yaml # Config with deeper trees
├── pipelines/ # ZenML pipeline definitions
│ ├── __init__.py
│ └── training_pipeline.py
├── steps/ # ZenML pipeline steps
│ ├── __init__.py
│ ├── data_loader.py
│ ├── data_cleaner.py
│ ├── data_preprocessor.py
│ ├── data_splitter.py
│ ├── model_trainer.py
│ └── model_evaluator.py
├── utils/ # Utility functions and helpers
│ ├── __init__.py
│ └── model_utils.py
├── __init__.py
├── requirements.txt # Project dependencies
├── README.md # Project documentation
└── run.py # Main script to run the pipeline
You can create new YAML configuration files by copying and modifying existing ones:
# my_custom_config.yaml
# Start with copying an existing config and modify the values

# environment configuration
settings:
  docker:
    required_integrations:
      - sklearn
      - pandas
      - numpy
    requirements:
      - matplotlib
      - xgboost
      - plotly
      - click
      - pyarrow

# Model Control Plane config
model:
  name: bank_subscription_classifier
  version: 0.1.0
  license: MIT
  description: A bank term deposit subscription classifier
  tags: ["bank_marketing", "classifier", "xgboost"]

# Custom step parameters
steps:
  # ...other step params...
  train_xgb_model_with_feature_selection:
    n_estimators: 300
    max_depth: 4
    # ...other parameters...
As an example use case, a retail bank could use this pipeline to:
- Train models on historical marketing campaign data
- Identify key customer segments most likely to convert
- Deploy targeted campaigns to high-probability customers
- Target outcomes such as 35% higher conversion rates with 25% lower campaign costs (example figures, not benchmarked results from this project)
This solution can be integrated with existing banking systems:
- CRM Systems: Feed predictions into customer relationship management systems
- Marketing Automation: Provide segments for targeted campaign execution
- BI Dashboards: Export prediction insights to business intelligence tools
- Customer Service: Prioritize high-value potential customers for follow-up
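As a rough illustration of the hand-off to a CRM or marketing tool, the sketch below ranks customers by predicted subscription probability and emits CSV text for those above a threshold. All names here (`export_targets`, the column headers, the customer IDs) are hypothetical; the project itself does not prescribe an export format.

```python
import csv
import io
from typing import Dict


def export_targets(scores: Dict[str, float], threshold: float = 0.5) -> str:
    """Return CSV text of customers at or above `threshold`, highest score first."""
    selected = sorted(
        ((cid, p) for cid, p in scores.items() if p >= threshold),
        key=lambda item: item[1],
        reverse=True,
    )
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["customer_id", "subscription_probability"])
    for cid, p in selected:
        # Fixed precision keeps the export stable across runs.
        writer.writerow([cid, f"{p:.3f}"])
    return buffer.getvalue()
```

Writing to an in-memory buffer keeps the function independent of any particular destination; the returned text could be saved to a file or sent to whichever system consumes the segment.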
This project is based on the Jupyter notebook predict_bank_cd_subs_by_xgboost_clf_for_imbalance_dataset.ipynb from IBM's xgboost-financial-predictions repository. The original work demonstrates XGBoost classification for imbalanced datasets and has been adapted into a complete ZenML pipeline.
This project is licensed under the Apache License 2.0.