Commit f0b6741 ("doc: update README", parent 5461e4b)
2 files changed: +91 -157 lines
# Production-Ready MLOps Workflow 🚀

[![CI/CD Pipeline](https://github.com/anibalrojosan/mlops-infrastructure-demo/actions/workflows/main.yml/badge.svg)](https://github.com/anibalrojosan/mlops-infrastructure-demo/actions/workflows/main.yml)
[![Python 3.12](https://img.shields.io/badge/python-3.12-blue.svg)](https://www.python.org/downloads/release/python-3120/)
[![Coverage](https://img.shields.io/badge/coverage-80%2B%25-green.svg)](#running-tests)

**An industrialized ML pipeline** that transforms an ML model into a scalable, tested, and containerized microservice.

---

## 📑 Table of Contents
- [⚡ Quick Start](#-quick-start-30-seconds)
- [🎯 Project Purpose](#-project-purpose)
- [📁 Project Structure](#-project-structure)
- [🛠️ Technical Documentation](#️-technical-documentation)
  - [Setup](#setup)
  - [Running Tests](#running-tests)
  - [Training & Execution](#training-the-model)
- [🔄 CI/CD & Quality Control](#-cicd--quality-control)
- [🚀 Future & Tech Stack](#-future--tech-stack)

---

## ⚡ Quick Start (30 seconds)

If you have Docker installed, you can spin up the entire ecosystem with a single command:

```bash
docker compose -f config/docker-compose.yml up --build
```

* **API:** `http://localhost:5000`
* **UI:** `http://localhost:8501`

## 🎯 Project Purpose

This project demonstrates **production-ready MLOps practices** rather than focusing solely on achieving state-of-the-art model performance. The Wisconsin Breast Cancer dataset is used as a **proof-of-concept** to validate the MLOps infrastructure.

The goal is to showcase best practices in:

- **Reproducible ML Pipelines**: Using scikit-learn pipelines for consistent preprocessing and inference
- **API Design**: Building robust REST APIs with proper validation and error handling
- **Containerization**: Multi-service architecture with Docker Compose
- **CI/CD**: Automated quality gates, testing, and deployment pipelines
- **Code Quality**: Type checking, linting, and comprehensive testing
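A minimal sketch of the first point: one scikit-learn `Pipeline` object carries both the preprocessing and the classifier, so inference always reuses the exact preprocessing fitted at training time (the estimator choice below is illustrative, not necessarily the one this project uses):

```python
# Illustrative sketch: preprocessing + model travel together as one object.
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)  # the Wisconsin Breast Cancer dataset

pipeline = Pipeline([
    ("scaler", StandardScaler()),               # fitted statistics live in the artifact
    ("clf", LogisticRegression(max_iter=5000)),
])
pipeline.fit(X, y)

# At inference time the pipeline applies the *same* fitted scaling automatically.
predictions = pipeline.predict(X[:5])
print(predictions)
```

Because the scaler and model are a single object, there is no way for production code to accidentally skip or re-fit the preprocessing step.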

**The real question this project answers**:

> *"How do I ensure my ML model works the same way in production as it does in development?"*

This project solves this through container immutability and environment parity.

## 📁 Project Structure

```text
production-ready-mlops-workflow/
├── .github/workflows/   # 🔄 CI/CD: Quality gates & automated deployment
├── config/              # 🐳 Docker: Multi-container orchestration
├── data/                # 📊 Dataset storage (raw & processed)
├── models/              # 🧠 Trained model artifacts (joblib)
├── notebooks/           # 📓 EDA and experimentation
├── reports/             # 📈 Generated metrics and figures
├── src/                 # 🛠️ Source code
│   ├── app.py           # 🌐 Inference API (Flask)
│   ├── schemas.py       # ✅ Data validation (Pydantic)
│   └── model/           # 🚂 Training and inference logic
├── tests/               # 🧪 Unit & Integration test suite
└── pyproject.toml       # 📦 Dependency management (uv)
```

## 🛠️ Technical Documentation

### Setup

#### Prerequisites
- Python 3.12+
- Docker and Docker Compose (for containerized deployment)
- `uv` (recommended) or `pip` for dependency management
@@ -111,8 +82,8 @@ mlops-infrastructure-demo/

1. **Clone the repository:**
```bash
git clone https://github.com/anibalrojosan/production-ready-mlops-workflow
cd production-ready-mlops-workflow
```

2. **Install `uv` globally (if needed):**
@@ -135,8 +106,8 @@ mlops-infrastructure-demo/

Note: you can run commands using `uv run` if you don't want to activate the virtual env.

<details>
<summary><b>Using pip (Alternative)</b></summary>

1. **Create and activate virtual environment:**
```bash
@@ -156,17 +127,18 @@ mlops-infrastructure-demo/

**Option A**: using the **requirements.txt** (recommended for production).

```bash
pip install -r requirements.txt
```

**Option B**: using the **pyproject.toml** (recommended for development).

```bash
pip install .
```
</details>

### Running Tests

The project includes comprehensive tests with a coverage requirement of 80%+.

@@ -185,7 +157,7 @@ uv run pytest -v
uv run pytest --cov=src --cov-report=term-missing
```

### Training the Model

Train the ML pipeline and save the model artifact:

@@ -201,7 +173,7 @@ This will:

**Note**: The model must be trained before running the API.
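The train-then-serve handoff hinges on the artifact: the training step fits the pipeline and dumps it with `joblib`, and the API later loads that identical object. A minimal sketch (the real logic lives in `src/model/`; the path and estimator below are illustrative):

```python
# Illustrative sketch: persist the fitted pipeline, reload it, and verify the
# restored artifact reproduces the original predictions exactly.
import os
import tempfile

import joblib
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
pipeline = Pipeline([
    ("scaler", StandardScaler()),
    ("clf", LogisticRegression(max_iter=5000)),
]).fit(X, y)

# The project stores this under models/model.joblib; a temp dir is used here.
artifact = os.path.join(tempfile.mkdtemp(), "model.joblib")
joblib.dump(pipeline, artifact)

restored = joblib.load(artifact)
match = bool((restored.predict(X) == pipeline.predict(X)).all())
print(match)  # True
```

Serving the serialized pipeline, rather than re-creating preprocessing in the API, is what keeps training and production behavior in lockstep.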

### Running the API

Start the Flask API locally:

@@ -213,45 +185,15 @@ The API will be accessible at `http://127.0.0.1:5000/`

### API Endpoints

The API exposes a `POST /predict` endpoint that accepts features as JSON and returns the prediction with probabilities. It also includes a `GET /` health check endpoint to verify service and model status.

For full validation details and data structures, refer to the Pydantic schemas in [`src/schemas.py`](src/schemas.py).
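A minimal sketch of what such a schema can look like, with two of the dataset's feature fields shown (the class name here is hypothetical; the authoritative definitions live in `src/schemas.py`):

```python
# Illustrative sketch: Pydantic rejects malformed payloads before inference.
from pydantic import BaseModel, ValidationError

class PredictionRequest(BaseModel):  # hypothetical name; see src/schemas.py
    radius_mean: float
    texture_mean: float
    # ... the real schema declares every feature the model expects

ok = PredictionRequest(radius_mean=17.99, texture_mean=10.38)
print(ok.radius_mean)  # 17.99

try:
    PredictionRequest(radius_mean="not a number", texture_mean=10.38)
except ValidationError as exc:
    print("rejected:", len(exc.errors()), "validation error(s)")
```

Invalid payloads therefore fail fast with a structured error instead of reaching the model and producing a misleading prediction.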

**Example using test scripts:**
- **Linux/macOS:** `./tests/integration/bash_test.sh`
- **Windows PowerShell:** `.\tests\integration\powershell_test.ps1`

### Streamlit UI

The Streamlit application provides an interactive web interface for making predictions:

@@ -261,7 +203,7 @@ streamlit run src/streamlit_app.py

Ensure the Flask API is running first. The UI will open at `http://localhost:8501`.

### Docker Deployment

The project uses Docker Compose to orchestrate both the Flask API and Streamlit UI services.

@@ -283,9 +225,18 @@ This will:
docker compose -f config/docker-compose.yml down
```

## 🔄 CI/CD & Quality Control

The project implements a continuous integration pipeline that acts as a quality filter (**Quality Gates**):

1. **Static Analysis**: `ruff` for linting and `mypy` for strict typing.
2. **Automated Testing**: `pytest` with a minimum coverage requirement of 80%.
3. **Container Security**: Multi-stage Docker builds for lightweight and secure images.
4. **Integration Tests**: Endpoint validation in isolated containers before deployment.

### Pipeline Details (GitHub Actions)

The workflow defined in `.github/workflows/main.yml` includes:

#### Quality Gates Job
1. **Linting**: `ruff` for code style and quality
@@ -301,41 +252,23 @@ The project includes a comprehensive CI/CD pipeline using GitHub Actions (`.gith
5. **Health Checks**: Verify API and UI endpoints
6. **Cleanup**: Remove test containers

This ensures that only tested and validated code reaches production.

## 🚀 Future & Tech Stack

<details>
<summary><b>🔮 Future Improvements</b></summary>

Potential enhancements to further strengthen the MLOps workflow:
- **Model Versioning**: Implement MLflow for experiment tracking and model registry.
- **Monitoring**: Add model performance monitoring and drift detection.
- **A/B Testing**: Framework for comparing model versions in production.
- **Feature Store**: Centralized feature management for multiple models.
- **Automated Retraining**: Scheduled retraining based on data drift or performance degradation.
</details>

<details>
<summary><b>📚 Technologies Used</b></summary>

- **ML Framework**: scikit-learn
- **API Framework**: Flask
@@ -346,6 +279,7 @@ Potential enhancements to further strengthen the MLOps workflow:
- **CI/CD**: GitHub Actions
- **Code Quality**: ruff, mypy
- **Dependency Management**: uv
</details>

---

doc/mlops_workflow.png (95.4 KB)