Project -3 ANN with MLflow – End‑to‑End MLOps Project

mlops-01-data-versioning-dvc mlops-02-DVC-model-tracking mlops-03-ci-cd-ml-pipeline mlops-04-docker-k8s-serving

cd project1 git add . git commit -m "update" git push

git submodule add https://github.com/yourusername/project1.git git submodule add https://github.com/yourusername/project2.git

📌 MLOPS Projects Summary

🚀 Projects	🧩 Task	🎯 Objective	🛠️ Prominent Techniques / Tools
1.Mlflow Tracking Model Registry	Regression & experiment tracking	Learn MLflow tracking, inference, model registry, versioning	Python, Scikit‑learn, MLflow, Pandas, NumPy, MLflow PyFunc
2.House Price Prediction (MLflow)	End‑to‑end regression	Train, tune, compare, and register best model	Random Forest, GridSearchCV, MLflow Tracking & Registry
3.ANN with MLflow (End‑to‑End MLOps)	Neural network regression	Build production‑ready ANN with full ML lifecycle	Keras, TensorFlow, MLflow, Hyperopt (TPE), PyFunc
4.ML Pipeline with DVC & MLflow	Reproducible ML pipeline	Version data, models, and experiments together	DVC, MLflow, Random Forest, Git, DagsHub
5.Hello Docker Project	Containerization basics	Learn Docker image build & container execution	Docker, Dockerfile, Container Lifecycle
6.Airflow Math Sequence DAG	Workflow orchestration	Learn DAGs, dependencies, and XComs	Apache Airflow 2.x, TaskFlow API, Astro CLI
7.Airflow MLOps Pipeline	End‑to‑end MLOps workflow	Simulate real‑world ML pipeline with deploy decisions	Airflow, Python, MLOps Concepts, Astro CLI

Project -2 : House Price Prediction with MLflow Tracking

📌 Project Overview

This project demonstrates an end-to-end machine learning workflow for house price prediction using the California Housing dataset and MLflow for experiment tracking, hyperparameter tuning, and model registration.

The main goals of this project are:

Train a regression model with hyperparameter tuning
Track all experiments, parameters, and metrics using MLflow
Compare multiple runs in the MLflow UI
Register the best-performing model in the MLflow Model Registry

📊 Dataset

California Housing Dataset
Source: sklearn.datasets.fetch_california_housing
Number of samples: 20,640
Features: - MedInc (Median Income), - HouseAge, - AveRooms, - AveBedrms
- Population, - AveOccup, - Latitude, - Longitude
Target:
- Price (Median house value in units of $100,000)

🧠 Model

Algorithm: Random Forest Regressor
Evaluation metric: Mean Squared Error (MSE)
Hyperparameter tuning: GridSearchCV

✅ Project Workflow

1️⃣ Data Loading

Dataset loaded using fetch_california_housing
Converted into a pandas DataFrame
Target variable added as Price

2️⃣ Data Preparation

Independent variables (X) created by dropping the Price column
Dependent variable (y) set as Price
Train-test split performed (80% training / 20% testing)

3️⃣ Hyperparameter Tuning

Hyperparameter tuning implemented using GridSearchCV
Parameters tuned:
- n_estimators , - max_depth , - min_samples_split, - min_samples_leaf, - 3-fold cross-validation used
Scoring metric: neg_mean_squared_error

4️⃣ Model Training & Evaluation

Best model selected from GridSearchCV
Predictions generated on test data
Mean Squared Error calculated

5️⃣ MLflow Experiment Tracking

MLflow tracking server used (http://127.0.0.1:5000)
Logged to MLflow: Best hyperparameters, Mean Squared Error (MSE), Model artifacts
Model input/output signature inferred using infer_signature

6️⃣ Model Registration

Best-performing model registered in MLflow Model Registry
Registered model name:

Best Model: Random Forest Regressor

Best Hyperparameters: 
n_estimators: 200 
max_depth: None 
min_samples_split: 2 
min_samples_leaf: 1 
Mean Squared Error (MSE): ~0.25 
Model successfully tracked and registered in MLflow

Project -3 ANN with MLflow – End‑to‑End MLOps Project

🔍 Project Summary

Production-oriented MLOps pipeline demonstrating how to train, tune, track, register, and serve an Artificial Neural Network (ANN) using MLflow.
The project covers the full ML lifecycle from experimentation to deployment-ready inference.

💼 Why This Project Matters

This project demonstrates hands-on experience with:

✅ Experiment tracking at scale
✅ Hyperparameter optimization
✅ Model registry & versioning
✅ Reproducible ML workflows
✅ Deployment-ready model artifacts

It reflects real-world ML engineering and MLOps practices, not just model training.

🧠 Technical Highlights

Model

ANN built with Keras
Regression task (Wine Quality prediction)
Feature normalization inside the model graph
Metric-driven model selection (RMSE)

Optimization

Hyperopt + TPE for hyperparameter search
Search space:
- Learning rate (log-uniform)
- Momentum (uniform)
Best model selected automatically based on validation RMSE

🧪 Experiment Tracking & Model Management

MLflow Experiments
- Parameters
- Metrics
- Model artifacts
Nested runs for hyperparameter sweeps
MLflow Model Registry
- Versioned models
- Promotion-ready artifacts

🚀 Inference & Serving Readiness

Model loaded using MLflow PyFunc
Serving input validated prior to deployment
Compatible with:
- REST API serving
- Batch inference
- Cloud ML platforms

☁️ Deployment Readiness

MLflow-compatible model format
Can be packaged into:
- Docker containers
- Cloud-native serving endpoints
Clear separation of training, evaluation, and inference

🛠 Tech Stack

Python 3.10
Keras / TensorFlow
MLflow
Hyperopt
Scikit-learn
Pandas / NumPy

📊 Key Outcomes

Automated experiment comparison
Best-performing ANN selected and registered
Fully reproducible ML pipeline
Production-aligned workflow (training → registry → inference)

🎯 Skills Demonstrated

MLOps & ML Engineering
Experiment tracking & reproducibility
Hyperparameter optimization
Model versioning & governance
Deployment-oriented ML design

Project -4: Machine Learning Pipeline with DVC & MLflow

📌 Project Overview

This project demonstrates how to build an end-to-end machine learning pipeline using DVC (Data Version Control) for data and model versioning and MLflow for experiment tracking. The pipeline trains and evaluates a Random Forest Classifier on the Pima Indians Diabetes Dataset, following best practices for reproducibility and MLOps.

The goal of this project is to show how data, code, models, and experiments can be tracked together in a structured and scalable way. (dagshub.com) 🚀 Key Features

✅ End-to-end ML pipeline
✅ Data and model versioning with DVC
✅ Experiment tracking with MLflow
✅ Reproducible pipeline stages
✅ Modular and scalable project structure
✅ Integration-ready with cloud storage (S3 / GCS / Azure)

🧰 Tech Stack

Python
Scikit-learn
DVC – data & pipeline versioning
MLflow – experiment tracking
Git / DagsHub – code, data, and collaboration

📂 Project Structure text

machinelearningpipeline/ │ ├── data/ # Raw and processed datasets (DVC tracked) ├── src/ # Source code for pipeline stages │ ├── data_preprocessing.py │ ├── train_model.py │ └── evaluate_model.py │ ├── models/ # Trained models (DVC tracked) ├── dvc.yaml # DVC pipeline definition ├── params.yaml # Model and pipeline parameters ├── requirements.txt # Python dependencies └── README.md # Project documentation

🔄 Pipeline Stages

Data Preprocessing
    Cleans and prepares the dataset
    Splits data into training and test sets

Model Training
    Trains a Random Forest Classifier
    Logs parameters and metrics to MLflow

Model Evaluation
    Evaluates model performance
    Tracks evaluation metrics for comparison

▶️ How to Run the Project 1️⃣ Clone the Repository bash

git clone https://dagshub.com/venky.sarangi/machinelearningpipeline cd machinelearningpipeline

2️⃣ Create Environment & Install Dependencies bash

pip install -r requirements.txt

3️⃣ Run the DVC Pipeline bash

dvc repro

📊 Experiment Tracking

All experiments, metrics, and model artifacts are logged using MLflow. You can compare runs and analyze performance directly from the DagsHub UI. ☁️ Data & Model Versioning

Datasets and models are tracked using DVC
Supports remote storage (AWS S3, GCS, Azure, or DagsHub Storage)
Ensures full reproducibility across environments

🎯 Use Cases

Learning MLOps fundamentals
Building reproducible ML workflows
Experiment tracking and model comparison
Production-ready ML pipeline templates

🤝 Contributing

Contributions are welcome! Feel free to open issues or submit pull requests to improve the pipeline. 📜 License

This project is intended for educational and learning purposes. ⭐ If you find this project useful, consider starring it!

If you want, I can also:

Tailor this README for GitHub vs DagsHub
Add badges (Python, DVC, MLflow)
Create a Beginner-friendly version
Generate a project architecture diagram

Project -5 🐳 Basic Hello Docker Project

docker --version → Verify Docker installation
docker build -t welcome-app . → Build Docker image
docker run -p 5000:5000 welcome-app → Run the container
docker ps → Check running containers

Project -6.1 🚀 Apache Airflow Learning Projects (Math + MLOps)

This repository contains beginner‑to‑intermediate Apache Airflow examples designed to help you understand real‑world workflows and MLOps pipelines using modern Airflow 2.x (TaskFlow API).

✅ Can be run locally using Astronomer (Astro CLI) ✅ Easy to understand and production‑oriented ✅ Fully compatible with Airflow UI 📂 Project Structure text

. ├── dags/ │ ├── math_sequence_dag.py │ └── mlops_basic_pipeline.py ├── README.md └── requirements.txt

📌 Project 1: Math Sequence DAG 🔢 Description

This DAG demonstrates task dependencies and XCom value passing using a simple math workflow. ✅ Workflow Steps

Start with number 10
Add 5
Multiply by 2
Subtract 3
Square the final result

✅ Concepts Learned

DAG creation
TaskFlow API (@task)
Task dependencies
Automatic XComs
Logs in Airflow UI

✅ Final Result basic

10 → 15 → 30 → 27 → 729

📄 DAG File

dags/math_sequence_dag.py

Project -6.2: Real‑World MLOps Pipeline

🤖 Description

This project simulates a production‑style MLOps workflow commonly used in real companies. ✅ Workflow Steps

Extract data (mock)
Validate data
Train ML model
Evaluate model performance
Decide whether to deploy

✅ Concepts Learned

End‑to‑end ML pipelines
Data validation
Model training & evaluation
Deployment decision logic
Failure handling

📄 DAG File

dags/mlops_basic_pipeline.py

🔄 MLOps Workflow Diagram (Logical)

Extract Data ↓ Validate Data ↓ Train Model ↓ Evaluate Model ↓ Deploy Model (Yes / No)

🛠️ Prerequisites

Docker
Git
Astro CLI

✅ Install Astro CLI bash

curl -sSL https://install.astronomer.io | sudo bash

Verify: bash

astro version

▶️ How to Run the Project Locally 1️⃣ Clone the Repository bash

git clone https://github.com/your-username/airflow-learning-projects.git cd airflow-learning-projects

2️⃣ Start Airflow with Astro bash

astro dev start

3️⃣ Open Airflow UI

http://localhost:8080

Login:

Username: admin Password: admin

4️⃣ Trigger the DAGs

math_sequence_dag
mlops_basic_pipeline

🎯 What You Will Learn ✅ Apache Airflow

DAG lifecycle
TaskFlow API
Scheduling
Logging & retries
XComs
UI debugging

✅ MLOps Foundations

Pipeline design
Model validation
Deployment decision logic
Production‑style workflows

🧠 How This Maps to Real Production Systems Learning Example Real Production extract_data S3 / BigQuery / APIs validate_data Great Expectations train_model sklearn / PyTorch evaluate_model MLflow deploy_model Kubernetes / SageMaker 🚀 Future Enhancements

Planned improvements:

MLflow integration
Branching DAGs
Sensors for new data
CI/CD for DAGs
Feature engineering pipelines
Retraining schedules

🤝 Contributions

Contributions are welcome! Feel free to:

Open issues
Submit pull requests
Suggest improvements

📜 License

This project is licensed under the MIT License. ⭐ Support

If this repository helped you learn Airflow or MLOps:

⭐ Star the repo
🔁 Share it with others

Happy Learning & Orchestrating 🚀

📊 Project Summary Table Project Name Task Objective Prominent Techniques / Tools Project 1: Linear Regression with Iris Dataset (MLflow) Regression modeling & experiment tracking Learn MLflow fundamentals: tracking, inference, model registry, versioning Python, Scikit‑learn, MLflow, Pandas, NumPy, MLflow PyFunc Project 2: House Price Prediction (MLflow) End‑to‑end regression with tuning Train, tune, compare, and register best model using MLflow Random Forest, GridSearchCV, MLflow Tracking & Registry, sklearn Project 3: ANN with MLflow (End‑to‑End MLOps) Neural network regression Build production‑ready ANN with full ML lifecycle Keras/TensorFlow, MLflow, Hyperopt (TPE), PyFunc, Model Registry Project 4: ML Pipeline with DVC & MLflow Reproducible ML pipeline Version data, models, and experiments together DVC, MLflow, Random Forest, Git, DagsHub Project 5: Hello Docker Project Containerization basics Learn Docker image build & container execution Docker CLI, Dockerfile, Container lifecycle Project 6.1: Airflow Math Sequence DAG Workflow orchestration Learn DAGs, dependencies, and XComs Apache Airflow 2.x, TaskFlow API, Astro CLI Project 6.2: Airflow MLOps Pipeline MLOps workflow orchestration Simulate real‑world ML pipeline with deploy decisions Airflow, Python, MLOps concepts, Astro CLI 🎯 Skill Coverage Map

Machine Learning

Regression, Classification, ANN
Model evaluation (MSE, RMSE)
Feature handling & preprocessing

MLOps

MLflow tracking & registry
Model inference & validation
Hyperparameter optimization
Reproducibility & governance

Data & Pipeline Engineering

DVC pipelines
Versioned datasets & models
Parameterized workflows

DevOps / Platform

Docker fundamentals
Airflow orchestration
Astronomer (Astro CLI)

📌 Project Summary 🚀 Project 🧩 Task 🎯 Objective 🛠️ Prominent Techniques / Tools Linear Regression with Iris (MLflow) Regression & experiment tracking Learn MLflow tracking, inference, model registry, versioning Python, Scikit‑learn, MLflow, Pandas, NumPy, MLflow PyFunc House Price Prediction (MLflow) End‑to‑end regression Train, tune, compare, and register best model Random Forest, GridSearchCV, MLflow Tracking & Registry ANN with MLflow (End‑to‑End MLOps) Neural network regression Build production‑ready ANN with full ML lifecycle Keras, TensorFlow, MLflow, Hyperopt (TPE), PyFunc ML Pipeline with DVC & MLflow Reproducible ML pipeline Version data, models, and experiments together DVC, MLflow, Random Forest, Git, DagsHub Hello Docker Project Containerization basics Learn Docker image build & container execution Docker, Dockerfile, Container Lifecycle Airflow Math Sequence DAG Workflow orchestration Learn DAGs, dependencies, and XComs Apache Airflow 2.x, TaskFlow API, Astro CLI Airflow MLOps Pipeline End‑to‑end MLOps workflow Simulate real‑world ML pipeline with deploy decisions Airflow, Python, MLOps Concepts, Astro CLI

Project Summary

:---:

Project Name	Task	Objective	Tools
Project A	Regression	Model training	Python, MLflow
Project B	Classification	Accuracy improvement	Sklearn
Project C	Pipeline	Automation	Airflow

📌 Project Summary

🚀 Project	🧩 Task	🎯 Objective	🛠️ Prominent Techniques / Tools
Linear Regression with Iris (MLflow)	Regression & experiment tracking	Learn MLflow tracking, inference, model registry, versioning	Python, Scikit‑learn, MLflow, Pandas, NumPy, MLflow PyFunc
House Price Prediction (MLflow)	End‑to‑end regression	Train, tune, compare, and register best model	Random Forest, GridSearchCV, MLflow Tracking & Registry
ANN with MLflow (End‑to‑End MLOps)	Neural network regression	Build production‑ready ANN with full ML lifecycle	Keras, TensorFlow, MLflow, Hyperopt (TPE), PyFunc
ML Pipeline with DVC & MLflow	Reproducible ML pipeline	Version data, models, and experiments together	DVC, MLflow, Random Forest, Git, DagsHub
Hello Docker Project	Containerization basics	Learn Docker image build & container execution	Docker, Dockerfile, Container Lifecycle
Airflow Math Sequence DAG	Workflow orchestration	Learn DAGs, dependencies, and XComs	Apache Airflow 2.x, TaskFlow API, Astro CLI
Airflow MLOps Pipeline	End‑to‑end MLOps workflow	Simulate real‑world ML pipeline with deploy decisions	Airflow, Python, MLOps Concepts, Astro CLI

Name		Name	Last commit message	Last commit date
Latest commit History 44 Commits
.github/workflows		.github/workflows
MLOPs_Fraud_Detection		MLOPs_Fraud_Detection
mlops-01-mlflow-tracking_model_registry		mlops-01-mlflow-tracking_model_registry
mlops-02-DVC-model-tracking		mlops-02-DVC-model-tracking
mlops-03-flask-app-docker		mlops-03-flask-app-docker
mlops-06-AWS-Sagemaker		mlops-06-AWS-Sagemaker
.gitmodules		.gitmodules
ANN_mlflow_v1.ipynb		ANN_mlflow_v1.ipynb
ANN_mlflow_v2.ipynb		ANN_mlflow_v2.ipynb
DVC_mlpipeline_4.png		DVC_mlpipeline_4.png
MLFLow_ANN_3.png		MLFLow_ANN_3.png
MLOPS_2026.md		MLOPS_2026.md
README.md		README.md

Folders and files

Latest commit

History

Repository files navigation

📌 MLOPS Projects Summary

Project -2 : House Price Prediction with MLflow Tracking

📌 Project Overview

📊 Dataset

🧠 Model

✅ Project Workflow

1️⃣ Data Loading

2️⃣ Data Preparation

3️⃣ Hyperparameter Tuning

4️⃣ Model Training & Evaluation

5️⃣ MLflow Experiment Tracking

6️⃣ Model Registration

Best Model: Random Forest Regressor

Project -3 ANN with MLflow – End‑to‑End MLOps Project

🔍 Project Summary

💼 Why This Project Matters

🧠 Technical Highlights

Model

Optimization

🧪 Experiment Tracking & Model Management

🚀 Inference & Serving Readiness

☁️ Deployment Readiness

🛠 Tech Stack

📊 Key Outcomes

🎯 Skills Demonstrated

Project -4: Machine Learning Pipeline with DVC & MLflow

Project -5 🐳 Basic Hello Docker Project

Project -6.1 🚀 Apache Airflow Learning Projects (Math + MLOps)

Project -6.2: Real‑World MLOps Pipeline

Project Summary

📌 Project Summary

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages