
Milestone 4 Release (v3.0.0)



@hugokwok0119 released this on 14 Dec at 00:27.

This release focuses on hardening the project infrastructure, modularizing code for reusability, and addressing feedback regarding data best practices.

Key Changes

1. Robust Automation with Makefile

We have standardized the entire workflow in a single Makefile. You can now manage the project lifecycle with these commands:

make all    # Run the full analysis pipeline (download -> clean -> eda -> model -> report)
make test   # Run the unit test suite with logging
make clean  # Remove all generated artifacts and logs
make cl     # Generate multi-platform lock files via conda-lock
make env    # Create or update the local environment from the lock files
make up     # Start Docker services
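When you run `make all`, Make builds each prerequisite before the target that depends on it. The Python sketch below shows that depth-first ordering. It assumes a strictly linear chain matching the arrows above, and the real Makefile's prerequisites may differ. `PIPELINE_DEPS` and `build_order` are hypothetical names, and the sketch leaves out Make's timestamp checks and cycle detection.

```python
# Hypothetical prerequisite map mirroring download -> clean -> eda -> model -> report.
# The project's actual Makefile targets and prerequisites may differ.
PIPELINE_DEPS = {
    "download": [],
    "clean": ["download"],
    "eda": ["clean"],
    "model": ["eda"],
    "report": ["model"],
}


def build_order(target, deps):
    """Return the steps needed for `target`, prerequisites first (depth-first, like Make)."""
    order = []
    visited = set()

    def visit(node):
        # Skip steps already scheduled so shared prerequisites run only once.
        if node in visited:
            return
        visited.add(node)
        for prereq in deps.get(node, []):
            visit(prereq)
        order.append(node)

    visit(target)
    return order


print(build_order("report", PIPELINE_DEPS))
# ['download', 'clean', 'eda', 'model', 'report']
```

Requesting an intermediate target such as `clean` would only schedule `download` and `clean`. This mirrors how Make lets you rerun part of the pipeline.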

2. Modularization & Testing

  • Refactoring: Core logic for data downloading and EDA now lives in reusable functions in src/.
  • Unit Testing: Added a pytest test suite. The tests live in test/ and cover the new modules in src/.
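The pattern is a small, pure helper of the kind that could live in src/, paired with pytest-style tests that would sit in test/. Below is a minimal sketch. The function `summarize_column` and its behaviour are hypothetical and not taken from the repository. Both pieces appear in one block so it runs standalone. In the repo, the test file would import the helper from its src/ module, and pytest would collect any `test_*` functions.

```python
import math
from statistics import mean


def summarize_column(values):
    """Hypothetical EDA helper: total count, missing entries (None or NaN), and mean of present values."""
    present = [
        v for v in values
        if v is not None and not (isinstance(v, float) and math.isnan(v))
    ]
    return {
        "count": len(values),
        "missing": len(values) - len(present),
        "mean": mean(present) if present else None,
    }


# pytest collects functions named test_* and treats a failing assert as a test failure.
def test_summarize_column_counts_missing_values():
    summary = summarize_column([1.0, None, 3.0, float("nan")])
    assert summary["count"] == 4
    assert summary["missing"] == 2
    assert summary["mean"] == 2.0


def test_summarize_column_all_missing():
    assert summarize_column([None, None]) == {"count": 2, "missing": 2, "mean": None}
```

Keeping the helper free of file I/O means the test can pass literal lists instead of reading CSVs.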

3. Docker & Environment Fixes

  • Resolved dependency conflicts (including pyarrow and vl-convert-python) so that chart images save correctly and VegaFusion acceleration works.
  • Fixed Docker environment compatibility to support cross-platform builds.
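You can catch missing packages like these early with a quick import check inside the container. The sketch below is an illustration, not a script from the repository. The helper name `missing_modules` and the module list are assumptions. Note that import names differ from install names: vl-convert-python is imported as `vl_convert`.

```python
import importlib.util

# Assumed import names for the packages mentioned above; adjust to the real environment.
REQUIRED_MODULES = ["pyarrow", "vl_convert", "vegafusion"]


def missing_modules(names):
    """Return the module names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]
```

Calling `missing_modules(REQUIRED_MODULES)` returns an empty list when every package resolves. A non-empty result names what the image is missing before any chart export fails.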

4. Golden Rule Fix (Feedback Addressed)

  • Data Leakage: Addressed earlier feedback by modifying the EDA workflow (scripts/3_eda.py). The analysis now uses only clean_train.csv rather than the full dataset, so test data no longer influences exploratory decisions. This follows the Golden Rule of Machine Learning.
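The rule is to split first and then explore only the training portion. Below is a minimal stdlib sketch of that ordering. It uses in-memory rows instead of clean_train.csv, and the helper names `split_rows` and `eda_summary` are hypothetical. The project itself may split the data differently, for example with a library splitter.

```python
import random


def split_rows(rows, test_fraction=0.2, seed=123):
    """Shuffle a copy of rows with a fixed seed and split it into (train, test)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def eda_summary(train_rows, column):
    """EDA statistics computed from the training split only; the test split is never passed in."""
    values = [row[column] for row in train_rows]
    return {"n": len(values), "min": min(values), "max": max(values)}


rows = [{"x": i} for i in range(10)]
train, test = split_rows(rows, test_fraction=0.2, seed=0)
print(eda_summary(train, "x")["n"])  # 8
```

Because `eda_summary` only ever receives `train`, no statistic derived from the held-out rows can leak into feature or model choices.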

5. Documentation

  • Restructured README.md to reflect the new file organization and usage instructions.