This project includes all required deliverables for a production MLOps pipeline:
- Data Pipeline
  - Raw data ingestion (`data/raw/Telco-Customer-Churn.csv`)
  - Feature engineering pipeline (`src/data/preprocess.py`)
  - Preprocessed datasets (train/test splits in `data/processed/`)
  - Feature metadata (`artifacts/models/feature_names.json`)
- Machine Learning Models
  - Scikit-learn pipeline (`artifacts/models/sklearn_pipeline_mlflow.joblib`)
  - PySpark distributed model (metadata in `artifacts/models/pipeline_metadata.json`)
  - Model versioning via MLflow (15+ versions registered)
  - Feature importances (`artifacts/models/feature_importances.json`)
- Experiment Tracking
  - MLflow setup and configuration
  - Experiment runs logged (`mlruns/` directory with 5 experiments)
  - Model registry with versioning
  - Metrics tracking (`artifacts/metrics/` directory)
- Workflow Orchestration
  - Airflow DAG implementation (`dags/telco_churn_dag.py`)
  - Task definitions (preprocess → train → inference)
  - Airflow configuration (`airflow_home/airflow.cfg`)
  - DAG execution logs
- API & Deployment
  - REST API implementation (`src/api/app.py`)
  - API endpoints (`/ping`, `/predict`)
  - Docker containerization (`Dockerfile`)
  - Production-ready configuration
- Comprehensive Test Suite
  - 212 passing tests across 12 test modules
  - Unit tests (`tests/test_preprocessing.py`, `tests/test_training.py`)
  - Integration tests (`tests/test_integration.py`)
  - Data validation tests (`tests/test_data_validation.py`)
  - API tests (`tests/test_inference.py`)
  - Kafka tests (`tests/test_consumer.py`, `tests/test_producer.py`, `tests/test_kafka_integration.py`)
  - Feature scaling tests (`tests/test_feature_scaling.py`)
  - Schema validation tests (`tests/test_schema_validator.py`)
- Code Quality
  - Type hints and documentation
  - PEP 8 compliance
  - Error handling and logging
  - Configuration management (`config.py`, `config.yaml`)
- README.md (this file)
  - Project overview and business context
  - Complete installation instructions
  - Step-by-step usage guide
  - MLflow setup and instructions
  - Airflow setup and instructions
  - Troubleshooting guide
  - API documentation
- Additional Documentation
  - Compliance report (`compliance_report.md`)
  - License file (`LICENSE`)
  - Requirements specification (`requirements.txt`)
  - Setup configuration (`setup.py`)
  - Jupyter notebooks (4 notebooks in `notebooks/`)
- MLflow & Airflow Screenshots
  - MLflow UI screenshots (`docs/screenshots_01/mlflow_runs.png`, `docs/screenshots_01/mlflow_model.png`)
  - Airflow DAG screenshots (`docs/screenshots_01/airflow_dags.png`, `docs/screenshots_01/airflow_run.png`)
  - Kafka Batch Pipeline screenshots (`docs/screenshots_02/Batch_Pipeline/`)
  - Kafka Streaming Pipeline screenshots (`docs/screenshots_02/Streaming_Pipeline/`)
  - DAG validation screenshots (`docs/screenshots_02/DAG_Validation/`)
  - Screenshot instructions documented
- Model Artifacts
  - Trained models (199 KB sklearn, 196 KB sklearn_mlflow, metadata for Spark)
  - Preprocessor pipeline (9 KB)
  - Model performance metrics (JSON files)
  - Prediction outputs (`artifacts/predictions/batch_preds.csv`)
- Validation Reports
  - Full pipeline execution summary
  - Folder audit reports (before/after)
  - Test coverage reports (96.4% passing, 212/220 tests)
  - Compliance validation reports
  - End-to-end validation report (updated 2025-10-20)
- Environment Setup
  - Requirements file with pinned versions
  - Setup script for package installation
  - Configuration files (Python + YAML)
  - Docker image for containerized execution
- Automation
  - Makefile with common commands
  - Automated testing via pytest
  - CI/CD ready structure
  - Automated data preprocessing
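
The task order listed under Workflow Orchestration (preprocess → train → inference) can be illustrated as a small dependency graph. The sketch below is not the contents of `dags/telco_churn_dag.py` and does not use Airflow; it uses Python's standard-library `graphlib` (Python 3.9+) purely to show how the three tasks depend on one another. The task names come from the list above, while the function bodies, `TASKS`, `DEPENDENCIES`, and `run_pipeline` are hypothetical placeholders.

```python
from graphlib import TopologicalSorter

# Task names follow the checklist above. The bodies are hypothetical
# placeholders, not the project's real preprocessing, training, or
# inference code: each one only records that it ran.
def preprocess(log):
    log.append("preprocess")

def train(log):
    log.append("train")

def inference(log):
    log.append("inference")

TASKS = {"preprocess": preprocess, "train": train, "inference": inference}

# Each key maps to the set of tasks that must finish before it can start.
DEPENDENCIES = {
    "preprocess": set(),
    "train": {"preprocess"},
    "inference": {"train"},
}

def run_pipeline():
    """Run every task in dependency order and return the execution log."""
    log = []
    for name in TopologicalSorter(DEPENDENCIES).static_order():
        TASKS[name](log)
    return log

if __name__ == "__main__":
    print(run_pipeline())
```

Because the chain is linear, only one execution order satisfies the dependencies. In an orchestrator such as Airflow the same ordering would be declared between operators instead of being resolved by hand.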