📋 Deliverables Checklist

This project includes all required deliverables for a production MLOps pipeline:

✅ Core Deliverables

  • Data Pipeline

    • Raw data ingestion (data/raw/Telco-Customer-Churn.csv)
    • Feature engineering pipeline (src/data/preprocess.py)
    • Preprocessed datasets (train/test splits in data/processed/)
    • Feature metadata (artifacts/models/feature_names.json)
  • Machine Learning Models

    • Scikit-learn pipeline (artifacts/models/sklearn_pipeline_mlflow.joblib)
    • PySpark distributed model (metadata in artifacts/models/pipeline_metadata.json)
    • Model versioning via MLflow (15+ versions registered)
    • Feature importances (artifacts/models/feature_importances.json)
  • Experiment Tracking

    • MLflow setup and configuration
    • Experiment runs logged (mlruns/ directory with 5 experiments)
    • Model registry with versioning
    • Metrics tracking (artifacts/metrics/ directory)
  • Workflow Orchestration

    • Airflow DAG implementation (dags/telco_churn_dag.py)
    • Task definitions (preprocess → train → inference)
    • Airflow configuration (airflow_home/airflow.cfg)
    • DAG execution logs
  • API & Deployment

    • REST API implementation (src/api/app.py)
    • API endpoints (/ping, /predict)
    • Docker containerization (Dockerfile)
    • Production-ready configuration
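
The three tasks listed under Workflow Orchestration form a linear dependency chain. In Airflow, a chain like this is typically declared with the bitshift operator (`preprocess >> train >> inference`). The stdlib sketch below reproduces the same ordering without Airflow so it can run anywhere. The task names mirror the checklist. The task bodies and the shared `ctx` dictionary used to pass results between tasks are hypothetical placeholders, not the code in dags/telco_churn_dag.py.

```python
# Minimal sketch of the preprocess -> train -> inference chain.
# Task bodies are hypothetical placeholders; the project's real DAG
# (dags/telco_churn_dag.py) uses Airflow operators instead.
from graphlib import TopologicalSorter


def preprocess(ctx: dict) -> None:
    # Placeholder: would read data/raw/Telco-Customer-Churn.csv and write splits
    ctx["features"] = ["tenure", "MonthlyCharges", "TotalCharges"]


def train(ctx: dict) -> None:
    # Placeholder: would fit a model on the processed training split
    ctx["model"] = f"model_on_{len(ctx['features'])}_features"


def inference(ctx: dict) -> None:
    # Placeholder: would write artifacts/predictions/batch_preds.csv
    ctx["predictions"] = f"preds_from_{ctx['model']}"


TASKS = {"preprocess": preprocess, "train": train, "inference": inference}
# Each key maps to the set of upstream tasks it depends on.
DEPENDENCIES = {"preprocess": set(), "train": {"preprocess"}, "inference": {"train"}}


def run_pipeline() -> tuple[list[str], dict]:
    """Run every task once, upstream tasks first, and return the order used."""
    ctx: dict = {}
    order = list(TopologicalSorter(DEPENDENCIES).static_order())
    for name in order:
        TASKS[name](ctx)
    return order, ctx


if __name__ == "__main__":
    order, _ = run_pipeline()
    print(" -> ".join(order))  # preprocess -> train -> inference
```

Because each task reads only what its upstream task produced, a failure in `train` stops `inference` from running on stale inputs, which is the same guarantee the Airflow DAG provides.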

✅ Testing & Quality Assurance

  • Comprehensive Test Suite

    • 212 of 220 tests passing across 12 test modules
    • Unit tests (tests/test_preprocessing.py, tests/test_training.py)
    • Integration tests (tests/test_integration.py)
    • Data validation tests (tests/test_data_validation.py)
    • API tests (tests/test_inference.py)
    • Kafka tests (tests/test_consumer.py, tests/test_producer.py, tests/test_kafka_integration.py)
    • Feature scaling tests (tests/test_feature_scaling.py)
    • Schema validation tests (tests/test_schema_validator.py)
  • Code Quality

    • Type hints and documentation
    • PEP 8 compliance
    • Error handling and logging
    • Configuration management (config.py, config.yaml)
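
As an illustration of the kind of check covered by the schema validation tests, here is a minimal record-level validator with two pytest-style tests. The field names come from the public Telco Customer Churn dataset. The `SCHEMA` mapping, `validate_record`, and the error-message format are hypothetical and are not taken from tests/test_schema_validator.py.

```python
# Hypothetical record-level schema check with pytest-style tests.
# Field names follow the public Telco Customer Churn dataset; the
# project's actual schema and validator API may differ.
from typing import Any

SCHEMA: dict[str, type] = {
    "customerID": str,
    "tenure": int,
    "MonthlyCharges": float,
    "Contract": str,
}


def validate_record(record: dict[str, Any]) -> list[str]:
    """Return error messages in schema order; an empty list means the record is valid."""
    errors = []
    for field, expected in SCHEMA.items():
        if field not in record:
            errors.append(f"missing field: {field}")
        elif not isinstance(record[field], expected):
            actual = type(record[field]).__name__
            errors.append(f"{field}: expected {expected.__name__}, got {actual}")
    return errors


# pytest collects functions named test_*; plain asserts are enough.
def test_valid_record_passes():
    record = {
        "customerID": "0001-TEST",
        "tenure": 12,
        "MonthlyCharges": 70.35,
        "Contract": "Month-to-month",
    }
    assert validate_record(record) == []


def test_missing_and_mistyped_fields_are_reported():
    record = {"customerID": "0002-TEST", "tenure": "12", "Contract": "One year"}
    assert validate_record(record) == [
        "tenure: expected int, got str",
        "missing field: MonthlyCharges",
    ]
```

Returning a list of messages instead of raising on the first problem lets a Kafka consumer log every defect in a bad message at once. Note that this simplified type check would reject an integer such as `70` for `MonthlyCharges`.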

✅ Documentation

  • README.md (this file)

    • Project overview and business context
    • Complete installation instructions
    • Step-by-step usage guide
    • MLflow setup and instructions
    • Airflow setup and instructions
    • Troubleshooting guide
    • API documentation
  • Additional Documentation

    • Compliance report (compliance_report.md)
    • License file (LICENSE)
    • Requirements specification (requirements.txt)
    • Setup configuration (setup.py)
    • Jupyter notebooks (4 notebooks in notebooks/)
  • MLflow & Airflow Screenshots

    • MLflow UI screenshots (docs/screenshots_01/mlflow_runs.png, docs/screenshots_01/mlflow_model.png)
    • Airflow DAG screenshots (docs/screenshots_01/airflow_dags.png, docs/screenshots_01/airflow_run.png)
    • Kafka Batch Pipeline screenshots (docs/screenshots_02/Batch_Pipeline/)
    • Kafka Streaming Pipeline screenshots (docs/screenshots_02/Streaming_Pipeline/)
    • DAG validation screenshots (docs/screenshots_02/DAG_Validation/)
    • Screenshot instructions documented

✅ Artifacts & Outputs

  • Model Artifacts

    • Trained models (199 KB scikit-learn model, 196 KB MLflow scikit-learn pipeline; the Spark model is represented by metadata only)
    • Preprocessor pipeline (9 KB)
    • Model performance metrics (JSON files)
    • Prediction outputs (artifacts/predictions/batch_preds.csv)
  • Validation Reports

    • Full pipeline execution summary
    • Folder audit reports (before/after)
    • Test reports (212 of 220 tests passing, a 96.4% pass rate)
    • Compliance validation reports
    • End-to-end validation report (updated 2025-10-20)
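
The model performance metrics are stored as JSON files. The sketch below shows one way such a file could be produced from true and predicted churn labels. The choice of metrics (accuracy, precision, recall, F1 for churn = 1), the function names, and the output path are assumptions for illustration, not the project's metric schema.

```python
# Hypothetical sketch: compute binary classification metrics for the churn
# class and write them to a JSON file. Metric set, function names and output
# path are assumptions, not the project's actual artifacts/metrics/ format.
import json
from pathlib import Path


def binary_metrics(y_true: list[int], y_pred: list[int]) -> dict[str, float]:
    """Accuracy, precision, recall and F1 for the positive class (churn = 1)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    correct = sum(1 for t, p in pairs if t == p)
    # Guard each ratio so an all-negative batch does not divide by zero.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": correct / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


def write_metrics(metrics: dict[str, float], path: Path) -> None:
    """Write metrics as pretty-printed JSON, creating parent directories if needed."""
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(metrics, indent=2, sort_keys=True))


if __name__ == "__main__":
    m = binary_metrics([1, 0, 1, 1, 0], [1, 0, 0, 1, 1])
    write_metrics(m, Path("artifacts/metrics/example_metrics.json"))
```

For the five sample labels above there are 2 true positives, 1 false positive and 1 false negative, so accuracy is 3/5 = 0.6 and precision, recall and F1 are all 2/3.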

✅ Reproducibility

  • Environment Setup

    • Requirements file with pinned versions
    • Setup script for package installation
    • Configuration files (Python + YAML)
    • Docker image for containerized execution
  • Automation

    • Makefile with common commands
    • Automated testing via pytest
    • CI/CD ready structure
    • Automated data preprocessing
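
"Pinned versions" means each dependency is fixed with an exact `==` specifier. A small check like the one below could be wired into the Makefile or a CI job to keep requirements.txt pinned. The function name is hypothetical, and the parsing is deliberately simplified: it handles plain `name==version` lines and skips pip options. Full PEP 508 parsing (extras, environment markers, URLs) would need the `packaging` library.

```python
# Hypothetical helper: list requirement lines that are not pinned with "==".
# Simplified parsing only; it is not a full PEP 508 parser.
def unpinned_requirements(text: str) -> list[str]:
    """Return requirement lines from requirements.txt content that lack an exact pin."""
    unpinned = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop inline comments and whitespace
        if not line or line.startswith("-"):  # skip blanks and pip options (-r, -e, ...)
            continue
        if "==" not in line:
            unpinned.append(line)
    return unpinned


if __name__ == "__main__":
    import sys
    from pathlib import Path

    problems = unpinned_requirements(Path("requirements.txt").read_text())
    for req in problems:
        print(f"not pinned: {req}")
    sys.exit(1 if problems else 0)  # non-zero exit fails a CI step
```

The exit status makes the check usable as a gate: a CI step fails as soon as someone adds an unpinned dependency.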