jluover9000/Bank-Marketing-Prediction-Project

Bank Market Analysis Project

In this project we predict whether clients will subscribe to a term deposit, using the Bank Marketing dataset. We developed a logistic regression model that incorporates all available predictor variables after appropriate preprocessing. The model was evaluated using shuffled cross-validation, with an emphasis on the F1 score, which balances precision and recall. The analysis was conducted in Python using key libraries such as NumPy, pandas, and scikit-learn, with all code documented for reproducibility.
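The workflow described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the data is synthetic, the column names (`age`, `balance`, `job`) are placeholders, and the use of `StratifiedKFold` with five folds is an assumption about what "shuffled cross-validation" means here.

```python
import numpy as np
import pandas as pd
from sklearn.compose import make_column_transformer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Synthetic stand-in data; the column names are illustrative only.
rng = np.random.default_rng(0)
n = 400
X = pd.DataFrame({
    "age": rng.integers(18, 90, size=n),
    "balance": rng.normal(1000.0, 500.0, size=n),
    "job": rng.choice(["admin", "technician", "retired"], size=n),
})
# Imbalanced binary target loosely tied to one numeric feature.
y = (X["balance"] + rng.normal(0.0, 500.0, size=n) > 1500).astype(int)

# Scale numeric columns and one-hot encode the categorical column.
preprocessor = make_column_transformer(
    (StandardScaler(), ["age", "balance"]),
    (OneHotEncoder(handle_unknown="ignore"), ["job"]),
)
model = make_pipeline(preprocessor, LogisticRegression(max_iter=1000))

# Shuffled, stratified folds (an assumed choice), scored on F1, precision, recall.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
cv_results = cross_validate(
    model, X, y, cv=cv, scoring=["f1", "precision", "recall"]
)
mean_f1 = cv_results["test_f1"].mean()
print(f"mean F1 across folds: {mean_f1:.3f}")
```

Keeping the preprocessing inside the pipeline means it is refit on each training fold, so no information from the validation fold leaks into scaling or encoding.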

Our final classifier performed fairly well on an unseen test set, achieving an accuracy of 0.844, an F1 score of 0.551, and a ROC AUC of 0.91. This indicates that the model is reasonably effective at identifying clients who will subscribe to a term deposit, although there is room for improvement, particularly in recall. Further refinements could involve exploring additional features, tuning hyperparameters, or experimenting with alternative modeling techniques to improve predictive performance.
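To see how a classifier can have high accuracy but a much lower F1 score, it may help to compute both from confusion-matrix counts. The sketch below uses hypothetical counts chosen for illustration; they are not this project's results, and `classification_metrics` is a helper written for this example only.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute accuracy, precision, recall, and F1 from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Hypothetical counts for an imbalanced problem (100 positives in 1000 rows).
metrics = classification_metrics(tp=60, fp=80, fn=40, tn=820)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Because true negatives dominate an imbalanced dataset, accuracy stays high even when many positives are missed or falsely flagged, whereas F1 ignores true negatives entirely.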

Project Architecture

This project follows a modular architecture with:

  • src/ - Reusable, testable functions
  • scripts/ - CLI scripts that orchestrate workflows
  • tests/ - Unit tests for quality assurance

View detailed architecture diagrams

List of Authors

  • Charlene Chin
  • Daniel Yorke
  • Jackson Lu
  • Mohammed Ibrahim

Report

The final report can be found here.

Contributing

We welcome feedback and suggestions for our project. Please see the link here for how to contribute.

Usage

Clone this repo and, using the command line, navigate to the root of the project.

git clone git@github.com:jluover9000/proj-522.git
cd proj-522
  1. The first time you run the project, run the following from the root of this repository:
make docker-up-shell
  2. Run all scripts in order and render the report in HTML and PDF:
make all

Cleaning

To shut down the container and clean up the resources, run:

# Remove all generated data and results
make clean

Testing

Running Tests

# Run all tests
make test

# Run tests with coverage report
make test-cov

# Or use pytest directly
pytest tests/ -v
pytest tests/ --cov=src --cov-report=html

Code Organization

The project uses a modular structure:

  • src/ - Reusable functions (testable, pure functions)
  • scripts/ - CLI scripts that orchestrate src/ functions
  • tests/ - Unit tests for src/ modules
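The split between the three directories can be illustrated with a small single-file sketch. Everything here is hypothetical: `positive_rate`, the `main` entry point, and the use of `argparse` are assumptions made for illustration and are not taken from this repository.

```python
import argparse


# --- Would live in src/ (hypothetical module): a pure, testable function ---
def positive_rate(labels, positive="yes"):
    """Return the fraction of labels equal to `positive`."""
    if not labels:
        raise ValueError("labels must not be empty")
    return sum(1 for label in labels if label == positive) / len(labels)


# --- Would live in scripts/ (hypothetical CLI): parses arguments, calls src/ ---
def main(argv=None):
    parser = argparse.ArgumentParser(description="Report the share of 'yes' labels.")
    parser.add_argument("labels", nargs="+", help="labels such as yes/no")
    args = parser.parse_args(argv)
    rate = positive_rate(args.labels)
    print(f"positive rate: {rate:.2f}")
    return rate


# --- Would live in tests/ (hypothetical unit test, discoverable by pytest) ---
def test_positive_rate():
    assert positive_rate(["yes", "no", "no", "yes"]) == 0.5


if __name__ == "__main__":
    main()
```

Keeping the logic in a pure function means the test can call it directly with in-memory inputs, while the script stays a thin wrapper around argument parsing and output.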

Developer

  1. Edit environment.yml.
  2. Run rm conda-lock.yml and enter y when prompted.
  3. Regenerate the lock file by running conda-lock lock --file environment.yml -p linux-64 -p osx-64 -p osx-arm64 -p win-64

Dependencies

  • python>=3.10
  • pandas==2.1.4
  • ucimlrepo
  • jupyterlab
  • nb_conda_kernels
  • Python and packages listed in environment.yml

Licenses

  • MIT License
  • The Bank Marketing dataset is licensed under a Creative Commons Attribution 4.0 International (CC BY 4.0) license.

Acknowledgements

We would like to thank the creators of the Bank Marketing dataset for making this valuable resource available to the research community. Their work has significantly contributed to advancements in predictive modeling and data analysis within the banking sector.

About

Group 41's project for DSCI522
