A production-ready MLOps workflow demonstrating best practices in machine learning operations, from model training to deployment. This project showcases how to build a maintainable, scalable, and reproducible ML system.
**An industrialized ML pipeline** that transforms an ML model into a scalable, tested, and containerized microservice.
If you have Docker installed, you can spin up the entire ecosystem with a single command:

```bash
docker compose -f config/docker-compose.yml up --build
```

- **API:** `http://localhost:5000`
- **UI:** `http://localhost:8501`

## 🎯 Project Purpose

This project demonstrates **production-ready MLOps practices** rather than focusing solely on achieving state-of-the-art model performance. The Wisconsin Breast Cancer dataset is used as a **proof-of-concept** to validate the MLOps infrastructure. This choice allows the project to focus on engineering practices rather than model complexity.

The goal is to showcase best practices in:

- **Reproducible ML Pipelines**: Using scikit-learn pipelines for consistent preprocessing and inference
- **API Design**: Building robust REST APIs with proper validation and error handling
- **Containerization**: Multi-service architecture with Docker Compose
- **CI/CD**: Automated quality gates, testing, and deployment pipelines
- **Code Quality**: Type checking, linting, and comprehensive testing

**Why not a more complex dataset?**

1. **Infrastructure First**: The same MLOps practices work for simple or complex models. The value is in the engineering, not the accuracy metric.
2. **Reproducibility**: A well-understood dataset ensures the infrastructure can be validated correctly before being applied to more complex problems.
3. **Learning Focus**: This project started as a learning exercise to understand MLOps principles; using a simple dataset keeps the focus on infrastructure, not feature engineering.
4. **Transferability**: The practices demonstrated here are immediately applicable to any ML project, regardless of dataset complexity.

**The real question this project answers**: *"How do I ensure my ML model works the same way in production as it does in development?"*

## 🔧 What This Project Demonstrates

### 1. Modular & Testable ML Code

- **Separation of Concerns**: Clear boundaries between data ingestion, preprocessing, training, and inference
- **Unit Tests**: Comprehensive test coverage for each component (80%+ coverage requirement)
- **Integration Tests**: End-to-end testing of the full pipeline and API

### 2. Production-Ready API

- **Input Validation**: Pydantic schemas for strict type checking and validation
- **Error Handling**: Proper HTTP status codes and meaningful error messages
- **Logging**: Structured logging for debugging and monitoring
- **Health Checks**: Monitoring endpoints for service status

### 3. DevOps Best Practices

- **Docker Containerization**: Multi-stage builds for optimized image sizes
- **Docker Compose**: Orchestration of the multi-service architecture
- **CI/CD Pipelines**: Automated testing, building, and deployment
- **Quality Gates**: Linting, type checking, and test coverage requirements before deployment
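Once the compose stack is up, the health-check endpoint offers a quick smoke test. A minimal Python sketch (assuming the services are running on the default ports listed above):

```python
import json
from urllib import request

def check_health(url: str = "http://localhost:5000/") -> dict:
    """Fetch the API's health-check endpoint and return the parsed JSON."""
    with request.urlopen(url) as resp:
        return json.load(resp)

# With the services up, this returns the service and model status,
# e.g. {"status": "healthy", "model_loaded": True}
```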
Note: you can run commands with `uv run` if you don't want to activate the virtual environment.
<details>
<summary><b>Using pip (Alternative)</b></summary>
1. **Create and activate virtual environment:**
```bash
python -m venv .venv
source .venv/bin/activate
```

2. **Install dependencies:**
**Option A**: using the **requirements.txt** (recommended for production).
```bash
pip install -r requirements.txt
```
**Option B**: using the **pyproject.toml** (recommended for development).
```bash
pip install .
```
</details>
### 🧪 Running Tests
The project includes comprehensive tests with a coverage requirement of 80%+.
```bash
uv run pytest -v
uv run pytest --cov=src --cov-report=term-missing
```
### 🚂 Training the Model
Train the ML pipeline and save the model artifact:
**Note**: The model must be trained before running the API.
### 🌐 Running the API
Start the Flask API locally:
The API will be accessible at `http://127.0.0.1:5000/`.
### API Endpoints
The API exposes a `POST /predict` endpoint that accepts features as JSON and returns the prediction with probabilities. It also includes a `GET /` health check endpoint to verify service and model status.
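As a sketch of the request/response contract, a small client can be written with the standard library alone (assuming the API is running locally; the feature payload shown is truncated — the full feature list lives in the schema):

```python
import json
from urllib import request

def predict(features: dict, url: str = "http://127.0.0.1:5000/predict") -> dict:
    """POST a feature payload to the API and return the parsed JSON prediction."""
    req = request.Request(
        url,
        data=json.dumps(features).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with request.urlopen(req) as resp:
        return json.load(resp)

# With the API running, a (truncated) payload such as
#   {"radius_mean": 17.99, "texture_mean": 10.38, ...}
# yields a response of the form
#   {"prediction": 1, "probability_benign": 0.1, "probability_malignant": 0.9}
```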
For full validation details and data structures, refer to the Pydantic schemas in [`src/schemas.py`](src/schemas.py).
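Not the actual contents of `src/schemas.py`, but a minimal sketch of the shape such Pydantic schemas take (field list truncated; names follow the dataset's features and the API's response fields):

```python
from pydantic import BaseModel

class PredictionInput(BaseModel):
    # Truncated for illustration: the real schema validates the full feature set.
    radius_mean: float
    texture_mean: float

class PredictionOutput(BaseModel):
    prediction: int
    probability_benign: float
    probability_malignant: float
```

Pydantic raises a `ValidationError` for missing or mistyped fields, which the API can translate into a proper 4xx response rather than failing inside the model.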