Bank Market Analysis Project

In this project, we predict whether clients will subscribe to a term deposit using the Bank Marketing dataset. We developed a logistic regression model that incorporates all available predictor variables after appropriate preprocessing. The model was evaluated using shuffled cross-validation, with an emphasis on the F1 score, which balances precision and recall. The analysis was conducted in Python using NumPy, pandas, and scikit-learn, with all code documented for reproducibility.
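As a rough illustration of this workflow (not the project's actual code), the sketch below builds a preprocessing-plus-logistic-regression pipeline and scores it with shuffled 5-fold cross-validation on F1. The synthetic data, the column names (age, balance, job), the fold count, the random seed, and the use of stratified folds are all assumptions made for the example; the real feature set and settings live in src/ and scripts/.

```python
import numpy as np
import pandas as pd
from sklearn.compose import make_column_transformer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Synthetic stand-in data; these column names are hypothetical, not the real schema.
rng = np.random.default_rng(522)
n_rows = 300
X = pd.DataFrame(
    {
        "age": rng.integers(18, 90, size=n_rows),
        "balance": rng.normal(1000.0, 500.0, size=n_rows),
        "job": rng.choice(["admin", "technician", "retired"], size=n_rows),
    }
)
# Binary target loosely tied to balance so the model has some signal to learn.
y = (X["balance"] + rng.normal(0.0, 300.0, size=n_rows) > 1200).astype(int)

# Scale numeric columns and one-hot encode categorical ones.
preprocessor = make_column_transformer(
    (StandardScaler(), ["age", "balance"]),
    (OneHotEncoder(handle_unknown="ignore"), ["job"]),
)
pipeline = make_pipeline(preprocessor, LogisticRegression(max_iter=1000))

# Shuffle rows before splitting; stratification keeps the class ratio similar per fold.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=522)
cv_scores = cross_validate(pipeline, X, y, cv=cv, scoring=["f1", "accuracy"])

print("mean F1:", cv_scores["test_f1"].mean())
print("mean accuracy:", cv_scores["test_accuracy"].mean())
```

Keeping the preprocessing inside the pipeline means it is refit on each training fold, so no information from the validation fold leaks into the scaler or encoder.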

Our final classifier performed fairly well on an unseen test set, achieving an accuracy of 0.844, an F1 score of 0.551, and a ROC AUC of 0.91. This indicates that the model is reasonably effective at identifying clients who will subscribe to a term deposit, although there is room for improvement, particularly in recall. Further refinements could include exploring additional features, tuning hyperparameters, or experimenting with alternative modeling techniques.
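To make the gap between accuracy and F1 concrete, the sketch below computes both from confusion-matrix counts. The counts are invented for illustration and are not the project's results; they only show how an imbalanced target can produce high accuracy alongside a much lower F1. The function name classification_metrics is hypothetical.

```python
def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> dict[str, float]:
    """Compute accuracy, precision, recall, and F1 from confusion-matrix counts."""
    total = tp + fp + fn + tn
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denominator = precision + recall
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / denominator if denominator else 0.0
    return {
        "accuracy": (tp + tn) / total,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Hypothetical counts: 1,000 clients, of whom 120 actually subscribed.
metrics = classification_metrics(tp=60, fp=40, fn=60, tn=840)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Because true negatives dominate the total, accuracy stays high even when half of the actual subscribers are missed, which is why F1 is the more informative headline metric for this kind of problem.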

Project Architecture

This project follows a modular architecture with:

src/ - Reusable, testable functions
scripts/ - CLI scripts that orchestrate workflows
tests/ - Unit tests for quality assurance

See ARCHITECTURE.md for detailed architecture diagrams.

List of Authors

Charlene Chin
Daniel Yorke
Jackson Lu
Mohammed Ibrahim

Report

The final report can be found here.

Contributing

We welcome feedback and suggestions for our project. Please see CONTRIBUTING.md for how to contribute.

Usage

Clone this repository and, from the command line, navigate to the project root:

git clone git@github.com:jluover9000/proj-522.git
cd proj-522

The first time you run the project, run the following from the root of this repository:

make docker-up-shell

Then run all scripts in order and render the report in HTML and PDF:

make all

Cleaning

To shut down the container and clean up the resources, run:

# Remove all generated data and results
make clean

Testing

Running Tests

# Run all tests
make test

# Run tests with coverage report
make test-cov

# Or use pytest directly
pytest tests/ -v
pytest tests/ --cov=src --cov-report=html

Code Organization

The project uses a modular structure:

src/ - Reusable functions (testable, pure functions)
scripts/ - CLI scripts that orchestrate src/ functions
tests/ - Unit tests for src/ modules
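As a sketch of what this split can look like in practice, the example below pairs a small pure function of the kind that might live in src/ with pytest-style tests of the kind that might live in tests/. The function encode_target, its "yes"/"no" label convention, and the test names are hypothetical and are not taken from this repository.

```python
def encode_target(labels: list[str]) -> list[int]:
    """Map "yes"/"no" labels to 1/0 and raise on anything unexpected."""
    mapping = {"yes": 1, "no": 0}
    unknown = sorted(set(labels) - set(mapping))
    if unknown:
        raise ValueError(f"Unexpected labels: {unknown}")
    return [mapping[label] for label in labels]


def test_encode_target_maps_labels():
    # Same input always gives the same output, so a plain equality check suffices.
    assert encode_target(["yes", "no", "yes"]) == [1, 0, 1]


def test_encode_target_rejects_unknown_labels():
    # Plain try/except keeps this sketch dependency-free.
    try:
        encode_target(["yes", "maybe"])
    except ValueError as err:
        assert "maybe" in str(err)
    else:
        raise AssertionError("expected ValueError for an unknown label")
```

Because the function takes plain inputs and returns plain outputs with no file or network access, it can be tested without fixtures, while a script in scripts/ would handle reading data and writing results around it. In a real test module, pytest.raises would be the more idiomatic way to check the error case.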

Developer

After editing environment.yml, regenerate the lock file:
Run rm conda-lock.yml, then enter y if prompted to confirm.
Run conda-lock lock --file environment.yml -p linux-64 -p osx-64 -p osx-arm64 -p win-64

Dependencies

python>=3.10
pandas==2.1.4
ucimlrepo
jupyterlab
nb_conda_kernels
Python and packages listed in environment.yml

Licenses

MIT License
The Bank Marketing dataset is licensed under a Creative Commons Attribution 4.0 International (CC BY 4.0) license.

References

UCI Machine Learning Repository: Bank Marketing Dataset.
Moro, S., Cortez, P., & Rita, P. (2014). A data-driven approach to predict the success of bank telemarketing. Decision Support Systems, 62, 22-31.
scikit-learn documentation
Bera, Suman, Deeparnab Chakrabarty, Nicolas Flores, and Maryam Negahbani. 2019. “Fair Algorithms for Clustering.”
Ziko, Imtiaz, Eric Granger, Jing Yuan, and Ismail Ayed. 2019. “Clustering with Fairness Constraints: A Flexible and Scalable Approach.”
Lamy, Alexandre, Ziyuan Zhong, Aditya Menon, and Nakul Verma. 2019. “Noise-Tolerant Fair Classification.”
Iosifidis, Vasileios, and Eirini Ntoutsi. 2019. “AdaFair: Cumulative Fairness Adaptive Boosting.”
Vaz, Afonso, Rafael Izbicki, and Rafael Stern. 2018. “Quantification under Prior Probability Shift: The Ratio Estimator and Its Extensions.”
Zhu, Zining, Jekaterina Novikova, and Frank Rudzicz. 2018. “Semi-supervised Classification by Reaching Consensus among Modalities.”
Yoon, Jinsung, William R. Zame, and Mihaela van der Schaar. 2017. “ToPs: Ensemble Learning with Trees of Predictors.”
Ross, Stéphane, Paul Mineiro, and John Langford. 2014. “Normalized Online Learning.”
DSCI 571 lecture notes.
DSCI 573 lecture notes.

Acknowledgements

We would like to thank the creators of the Bank Marketing dataset for making this valuable resource available to the research community. Their work has significantly contributed to advancements in predictive modeling and data analysis within the banking sector.
