This release focuses on hardening the project infrastructure, modularizing code for reusability, and addressing feedback regarding data best practices.
Key Changes
1. Robust Automation with Makefile
The entire workflow is now standardized through a Makefile, so the project lifecycle can be managed with simple commands:
make all # Run the full analysis pipeline (download -> clean -> eda -> model -> report)
make test # Run the unit test suite with logging
make clean # Remove all generated artifacts and logs
make cl # Generate multi-platform lock files via conda-lock
make env # Create/Update the local environment from lock files
make up # Start Docker services
2. Modularization & Testing
- Refactoring: Core logic for data downloading and EDA has been abstracted into reusable functions located in src/.
- Unit Testing: Implemented a pytest framework. Tests are located in test/ and strictly verify the functionality of the new modules.
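As a rough illustration of the testing style, the sketch below pairs a small helper with pytest-style tests. The helper `summarize_column` and both test names are hypothetical stand-ins; the actual functions in src/ and tests in test/ may look different.

```python
from statistics import mean


# Hypothetical EDA helper; the real functions in src/ may differ.
def summarize_column(values):
    """Return count, mean, min, and max for a non-empty list of numbers."""
    if not values:
        raise ValueError("values must not be empty")
    return {
        "count": len(values),
        "mean": mean(values),
        "min": min(values),
        "max": max(values),
    }


# pytest collects functions named test_*; plain asserts are enough.
def test_summarize_column_basic():
    result = summarize_column([1, 2, 3, 4])
    assert result == {"count": 4, "mean": 2.5, "min": 1, "max": 4}


def test_summarize_column_rejects_empty():
    # Written without pytest.raises so the sketch has no third-party imports.
    try:
        summarize_column([])
    except ValueError:
        return
    raise AssertionError("expected ValueError for empty input")
```

In a real test module the helper would be imported from src/ rather than defined next to the tests, and the suite would be run through make test.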
3. Docker & Environment Fixes
- Resolved dependency conflicts (including pyarrow and vl-convert-python) to ensure successful image saving and VegaFusion acceleration.
- Fixed Docker environment compatibility to support cross-platform builds.
4. Golden Rule Fix (Feedback Addressed)
- Data Leakage: Addressed previous feedback by modifying the EDA workflow (scripts/3_eda.py). The analysis now strictly uses clean_train.csv instead of the full dataset to adhere to the Golden Rule of Machine Learning.
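The idea behind this fix can be sketched as follows: split the data first, then compute exploratory statistics from the training portion only. This is a minimal standard-library illustration, not the code in scripts/3_eda.py; the function `train_test_split_rows`, the toy records, and the split parameters are all assumptions made for the example.

```python
import random
from statistics import mean


# Hypothetical helper; the project's actual split logic may differ.
def train_test_split_rows(rows, test_fraction=0.2, seed=123):
    """Shuffle a copy of rows and split it into (train, test) lists."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# Toy records standing in for the cleaned dataset (made-up values).
rows = [{"id": i, "value": float(i)} for i in range(10)]

train_rows, test_rows = train_test_split_rows(rows)

# EDA statistics are computed from the training rows only;
# the test rows are not inspected until final evaluation.
train_mean = mean(r["value"] for r in train_rows)
```

Because the summary never touches `test_rows`, nothing learned during exploration can leak information from the held-out data into modeling decisions.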
5. Documentation
- Refined README.md structure to accurately reflect the new file organization and usage instructions.