In this project, we predict whether clients will subscribe to a term deposit using the Bank Marketing dataset. We developed a logistic regression model that incorporates all available predictor variables after appropriate preprocessing. The model was evaluated using shuffled cross-validation, with an emphasis on the F1 score, which balances precision and recall. The analysis was conducted in Python using key libraries such as NumPy, pandas, and scikit-learn, with all code documented for reproducibility.
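To illustrate the workflow described above, here is a minimal sketch of a preprocessing-plus-logistic-regression pipeline scored with shuffled cross-validation. It is not the project's actual code: the data is synthetic, the column names are hypothetical stand-ins, and the choices of `StratifiedKFold`, five folds, and `class_weight="balanced"` are assumptions made for the example.

```python
import numpy as np
import pandas as pd
from sklearn.compose import make_column_transformer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Synthetic stand-in for the Bank Marketing data (hypothetical columns and values).
rng = np.random.default_rng(522)
n = 400
X = pd.DataFrame({
    "age": rng.integers(18, 90, size=n),
    "balance": rng.normal(1500, 3000, size=n),
    "job": rng.choice(["admin", "technician", "retired"], size=n),
    "contact": rng.choice(["cellular", "telephone"], size=n),
})
# Imbalanced binary target: 1 = subscribed, 0 = did not subscribe.
is_retired = (X["job"] == "retired").to_numpy()
y = (rng.random(n) < 0.15 + 0.2 * is_retired).astype(int)

# Scale numeric columns and one-hot encode categorical columns.
preprocessor = make_column_transformer(
    (StandardScaler(), ["age", "balance"]),
    (OneHotEncoder(handle_unknown="ignore"), ["job", "contact"]),
)

# class_weight="balanced" is an assumption for this sketch, not a documented project setting.
model = make_pipeline(
    preprocessor,
    LogisticRegression(max_iter=1000, class_weight="balanced"),
)

# Shuffled cross-validation; stratification keeps the class ratio similar across folds.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=522)
cv_results = cross_validate(model, X, y, cv=cv, scoring=["f1", "precision", "recall"])

# Average each metric across the folds.
mean_scores = {
    name: float(np.mean(cv_results[f"test_{name}"]))
    for name in ("f1", "precision", "recall")
}
print(mean_scores)
```

Keeping the preprocessing inside the pipeline means it is refit on each training fold, so no information from the validation fold leaks into the scaler or encoder.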
Our final classifier performed fairly well on an unseen test set, achieving an accuracy of 0.844, an F1 score of 0.551, and a ROC AUC score of 0.91. This indicates that the model is reasonably effective at identifying clients who will subscribe to a term deposit, although there is room for improvement, particularly in recall. Further refinements could involve exploring additional features, tuning hyperparameters, or experimenting with alternative modeling techniques to enhance predictive performance.
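The metrics reported above measure different things: accuracy, F1, and recall are computed from hard class predictions at a fixed threshold, while ROC AUC is computed from predicted probabilities and does not depend on a threshold. The sketch below shows one way to compute them with scikit-learn. It uses synthetic imbalanced data from `make_classification`, not the Bank Marketing dataset, so its numbers will not match the results above; all variable names are illustrative.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score, recall_score, roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic imbalanced data (assumption: roughly 12% positives, chosen for illustration).
X, y = make_classification(
    n_samples=1000, n_features=10, weights=[0.88, 0.12], random_state=522
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=522
)

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# Hard predictions use the default 0.5 probability threshold.
y_pred = clf.predict(X_test)
# Probability of the positive class, used for the threshold-free ROC AUC.
y_score = clf.predict_proba(X_test)[:, 1]

metrics = {
    "accuracy": accuracy_score(y_test, y_pred),
    "f1": f1_score(y_test, y_pred),
    "recall": recall_score(y_test, y_pred),
    "roc_auc": roc_auc_score(y_test, y_score),
}
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

On imbalanced data, a high ROC AUC alongside a modest F1 can mean the model ranks clients well but the default threshold is not well suited to the minority class, which is one possible direction for the recall improvements mentioned above.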
This project follows a modular architecture with:
- src/: Reusable, testable functions
- scripts/: CLI scripts that orchestrate workflows
- tests/: Unit tests for quality assurance
View detailed architecture diagrams
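As a rough illustration of the split between reusable functions and CLI scripts, the sketch below puts a pure function and a thin command-line wrapper side by side. The function name, the script, and the use of `argparse` are hypothetical and chosen for the example; they are not taken from this repository.

```python
import argparse


# --- would live under src/ (hypothetical module) ---
def subscription_rate(labels):
    """Return the fraction of 'yes' labels; a pure function with no I/O."""
    if not labels:
        raise ValueError("labels must not be empty")
    return sum(1 for label in labels if label == "yes") / len(labels)


# --- would live under scripts/ (hypothetical CLI wrapper) ---
def main(argv=None):
    """Parse command-line arguments, call the src/ function, and report the result."""
    parser = argparse.ArgumentParser(description="Report the subscription rate.")
    parser.add_argument("labels", nargs="+", choices=["yes", "no"])
    args = parser.parse_args(argv)
    rate = subscription_rate(args.labels)
    print(f"subscription rate: {rate:.3f}")
    return rate


# Passing an explicit argument list keeps the example independent of sys.argv.
rate = main(["yes", "no", "no", "no"])
```

Because the wrapper only parses arguments and prints, the logic in the pure function can be unit tested without touching the command line or the file system.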
- Charlene Chin
- Daniel Yorke
- Jackson Lu
- Mohammed Ibrahim
The final report can be found here.
We welcome feedback and suggestions for our project. Please see the link here for how to contribute.
Clone this repo, and using the command line, navigate to the root of this project.
git clone git@github.com:jluover9000/proj-522.git
cd proj-522
- First time running the project, run the following from the root of this repository:
make docker-up-shell
- Runs all scripts in order and renders the report in HTML and PDF
make all
To shut down the container and clean up the resources,
# Remove all generated data and results
make clean
# Run all tests
make test
# Run tests with coverage report
make test-cov
# Or use pytest directly
pytest tests/ -v
pytest tests/ --cov=src --cov-report=html
The project uses a modular structure:
- src/: Reusable functions (testable, pure functions)
- scripts/: CLI scripts that orchestrate src/ functions
- tests/: Unit tests for src/ modules
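To show why pure functions are convenient to test, here is a small sketch of a function together with pytest-style tests. The function `f1_from_counts` is a hypothetical example and not part of this repository; the tests use plain `assert` statements, which pytest collects from any function whose name starts with `test_`.

```python
def f1_from_counts(tp, fp, fn):
    """Compute the F1 score from true positive, false positive, and false negative counts.

    F1 = 2*TP / (2*TP + FP + FN). Returns 0.0 when the denominator is zero.
    """
    denominator = 2 * tp + fp + fn
    return 0.0 if denominator == 0 else 2 * tp / denominator


def test_f1_matches_equal_precision_and_recall():
    # precision = 8/10 and recall = 8/10, so F1 should also be 0.8
    assert abs(f1_from_counts(8, 2, 2) - 0.8) < 1e-12


def test_f1_is_zero_without_true_positives():
    assert f1_from_counts(0, 5, 5) == 0.0
    assert f1_from_counts(0, 0, 0) == 0.0


# Calling the tests directly lets the sketch also run as a plain script.
test_f1_matches_equal_precision_and_recall()
test_f1_is_zero_without_true_positives()
```

A function like this takes plain values and returns a plain value, so the tests need no fixtures, files, or mocks.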
- After editing environment.yml:
  - Run rm conda-lock.yml, then enter y
  - Run conda-lock lock --file environment.yml -p linux-64 -p osx-64 -p osx-arm64 -p win-64
- python>=3.10
- pandas==2.1.4
- ucimlrepo
- jupyterlab
- nb_conda_kernels
- Python and packages listed in environment.yml
- MIT License
- This dataset is licensed under a Creative Commons Attribution 4.0 International (CC BY 4.0) license.
- Moro, S., Cortez, P., & Rita, P. (2014). A data-driven approach to predict the success of bank telemarketing. Decision Support Systems, 62, 22-31.
- Iosifidis, V., & Ntoutsi, E. (2019). AdaFair: Cumulative Fairness Adaptive Boosting.
- Ross, S., Mineiro, P., & Langford, J. (2014). Normalized Online Learning.
- DSCI 571 lecture notes.
- DSCI 573 lecture notes.
We would like to thank the creators of the Bank Marketing dataset for making this valuable resource available to the research community. Their work has significantly contributed to advancements in predictive modeling and data analysis within the banking sector.