Structuring your data science project around a shared standard makes it easier for your teammates to maintain and modify.
This repository provides a template that incorporates best practices to create a maintainable and reproducible data science project.
- hydra: Manage configuration files
- pdoc: Automatically create API documentation for your project
- pre-commit plugins: Automate code review and formatting
- Poetry: Dependency management
- uv: Ultra-fast Python package installer and resolver
- pip: Traditional Python package installer
Install Cookiecutter:
pip install cookiecutter
Create a project based on the template:
cookiecutter https://github.com/khuyentran1401/data-science-template
You will be prompted to choose your preferred dependency manager:
- poetry: Modern Python package and dependency manager
- uv: Ultra-fast Python package installer and resolver
- pip: Traditional Python package installer
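If you would rather skip the prompts (for example in a script or CI job), the sketch below builds a non-interactive command using Cookiecutter's `--no-input` flag and a `key=value` extra-context argument. The context key `dependency_manager` is an assumption about how the template names this variable; check the template's `cookiecutter.json` for the actual key. The helper `build_cookiecutter_command` is hypothetical and not part of the template.

```python
import shlex

TEMPLATE_URL = "https://github.com/khuyentran1401/data-science-template"
# The three choices offered by the dependency manager prompt.
DEPENDENCY_MANAGERS = ("poetry", "uv", "pip")


def build_cookiecutter_command(
    manager: str, context_key: str = "dependency_manager"
) -> list[str]:
    """Return the argument list for a non-interactive cookiecutter run.

    `context_key` is an ASSUMED variable name; the real key is whatever
    the template's cookiecutter.json defines.
    """
    if manager not in DEPENDENCY_MANAGERS:
        raise ValueError(
            f"Unknown dependency manager {manager!r}; "
            f"expected one of {DEPENDENCY_MANAGERS}"
        )
    # --no-input accepts the defaults for every prompt except the
    # values passed as key=value extra context after the template.
    return ["cookiecutter", "--no-input", TEMPLATE_URL, f"{context_key}={manager}"]


command = build_cookiecutter_command("uv")
# Print the command in a form that can be pasted into a shell.
print(shlex.join(command))
```

To run the command from Python rather than print it, you could pass the returned list to `subprocess.run(command, check=True)`; this requires Cookiecutter to be installed and network access to the template repository.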
Want to learn more about building production-ready data science projects? Check out my upcoming book:
Production Ready Data Science: From Prototyping to Production with Python
The book will cover:
- Best practices for structuring data science projects
- Tools and techniques for reproducible research
- Deploying and monitoring machine learning models
- And much more!
Sign up now to receive the first 3 chapters for free! You'll also be notified when the full book becomes available.