# 🚀 Sentiment Analysis MLOps Pipeline

An **end-to-end MLOps project** demonstrating model deployment, observability, and performance optimization using **FastAPI**, **AWS S3**, and **GitHub Actions**.

This repository shows how to take an NLP model from **fine-tuning to scalable inference** — with dynamic quantization and CI/CD automation for real-world readiness.

---

## 🧠 Project Overview

This project serves a fine-tuned **DistilBERT sentiment classifier** through a production-grade API.
It includes model loading from **AWS S3**, optional **quantized inference**, structured **logging**, **load testing**, and **CI/CD automation**.

Quantization reduced average latency by **≈60%**, proving the practical value of lightweight model optimization.

| Mode                 | Avg Latency (ms) | P95 Latency (ms) | Improvement     |
| -------------------- | ---------------- | ---------------- | --------------- |
| Without Quantization | 3274.15          | 5912.65          | —               |
| With Quantization    | **1302.57**      | **2581.50**      | ✅ ≈60% lower    |

---

## 🧩 Tech Stack

| Category         | Tools Used                                  |
| ---------------- | ------------------------------------------- |
| **Modeling**     | Hugging Face Transformers (DistilBERT)      |
| **Serving**      | FastAPI, Uvicorn                            |
| **Deployment**   | AWS EC2, S3                                 |
| **Automation**   | GitHub Actions (CI/CD)                      |
| **Monitoring**   | Custom structured logging (`logs/app.log`)  |
| **Testing**      | Pytest                                      |
| **Load Testing** | Async load simulator with aiohttp           |
| **Optimization** | PyTorch Dynamic Quantization                |

---

## ⚙️ Key Features

* 🔁 **Automated model fetch from S3** during startup
* ⚙️ **Dynamic quantization toggle** for faster CPU inference
* 📈 **Structured request logging** (latency, client IP, text length, sentiment), illustrated in the sketch below this list
* 🧪 **Pytest-based CI pipeline** for stability
* 🌐 **FastAPI endpoint** for real-time predictions
* 📊 **Load simulator** to measure performance under concurrent requests

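The logging implementation itself lives in `app/serve.py`. As a rough, hypothetical sketch of how structured request logging can be wired into FastAPI (the middleware name, log format, and logged fields below are assumptions, not the project's exact code):

```python
# Hypothetical middleware sketch; the project's real logging lives in
# app/serve.py and writes to logs/app.log. Field names are illustrative.
import json
import logging
import time

from fastapi import FastAPI, Request

logging.basicConfig(filename="logs/app.log", level=logging.INFO)
logger = logging.getLogger("sentiment-api")

app = FastAPI()

@app.middleware("http")
async def log_requests(request: Request, call_next):
    start = time.perf_counter()
    response = await call_next(request)
    latency_ms = (time.perf_counter() - start) * 1000
    # One structured line per request: path, client IP, latency, status code.
    logger.info(json.dumps({
        "path": request.url.path,
        "client_ip": request.client.host if request.client else None,
        "latency_ms": round(latency_ms, 2),
        "status_code": response.status_code,
    }))
    return response
```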
---

## 🧰 API Usage

### **Health Check**

```http
GET /ping
```

✅ Response:

```json
{"status": 200, "quantized": false}
```

### **Predict Endpoint**

```http
POST /predict
```

**Body:**

```json
{
  "text": "The movie was absolutely fantastic!",
  "quantize": true
}
```

✅ Response:

```json
{
  "sentiment": "positive",
  "latency_ms": "1320.45",
  "quantized": true
}
```

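For a quick end-to-end check from Python, the same request can be sent with `requests` (this assumes the API is running locally on port 8000, as in the Deployment section below):

```python
# Minimal client example; the URL and port assume a local server (see Deployment).
import requests

resp = requests.post(
    "http://localhost:8000/predict",
    json={"text": "The movie was absolutely fantastic!", "quantize": True},
    timeout=30,
)
resp.raise_for_status()
print(resp.json())  # e.g. {'sentiment': 'positive', 'latency_ms': '...', 'quantized': True}
```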
---

## 📦 Project Structure

```
sentiment-mlops/
├── .github/
│   └── workflows/
│       ├── ci.yml                     # Continuous Integration: pytest, lint checks
│       └── cd.yml                     # Continuous Deployment: deploy to EC2
│
├── app/
│   ├── serve.py                       # FastAPI app for serving predictions
│   └── utils.py                       # Model loading, quantization, and inference logic
│
├── logs/
│   ├── app.log                        # Application logs
│   ├── latency-stats.txt              # Performance summary
│   └── simulator.log                  # Load test results
│
├── scripts/
│   ├── load_simulator.py              # Simulates concurrent requests for stress testing
│   └── analyze_simulator_logs.py      # Parses and visualizes latency results
│
├── deploy.sh                          # Shell script for EC2 deployment
├── Makefile                           # Unified dev commands (test, run, deploy)
├── requirements.txt                   # Project dependencies
├── test_setup.py                      # Basic API and health-check tests
└── README.md                          # Documentation (you’re reading it!)
```

---

## 🔄 CI/CD Pipeline

GitHub Actions automates:

* ✅ Environment setup (Python + dependencies)
* ✅ Linting & testing via `pytest`
* ✅ Failure alerts on PRs
* ✅ (Optional) Deployment to EC2 after passing tests

This ensures the API is **always tested before merging**, mirroring how production MLOps pipelines gate changes.

---

## 📊 Load Testing

Run the async load simulation to test stability under concurrent requests:

```bash
python scripts/load_simulator.py
```

Results are logged to `logs/simulator.log` and visualized with a latency-over-time plot.

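`scripts/load_simulator.py` contains the actual simulator; the sketch below only illustrates the general pattern it follows (concurrent `aiohttp` POSTs against `/predict` with per-request latency timing). The concurrency level, payload, and reporting are assumptions:

```python
# Simplified load-simulation sketch; concurrency, payload, and output format
# are illustrative, not those of scripts/load_simulator.py.
import asyncio
import time

import aiohttp

URL = "http://localhost:8000/predict"
PAYLOAD = {"text": "The movie was absolutely fantastic!", "quantize": True}

async def one_request(session: aiohttp.ClientSession) -> float:
    start = time.perf_counter()
    async with session.post(URL, json=PAYLOAD) as resp:
        await resp.json()
    return (time.perf_counter() - start) * 1000  # latency in ms

async def main(concurrency: int = 20) -> None:
    async with aiohttp.ClientSession() as session:
        latencies = await asyncio.gather(*(one_request(session) for _ in range(concurrency)))
    latencies.sort()
    print(f"avg={sum(latencies) / len(latencies):.2f} ms  "
          f"p95={latencies[int(0.95 * len(latencies)) - 1]:.2f} ms")

if __name__ == "__main__":
    asyncio.run(main())
```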
---

## ⚡ Quantization Impact

| Metric               | Without Quantization | With Quantization |
| -------------------- | -------------------- | ----------------- |
| **Min Latency (ms)** | 331.52               | 291.97            |
| **Avg Latency (ms)** | 3274.15              | **1302.57**       |
| **P95 Latency (ms)** | 5912.65              | **2581.50**       |
| **Max Latency (ms)** | 9143.29              | **4830.67**       |

👉 Demonstrates how **PyTorch dynamic quantization** reduces model size and speeds up inference — essential for CPU-based deployments.

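Dynamic quantization is applied at load time with a single PyTorch call. A minimal sketch of the idea (the local `model/` path is a placeholder; the project's actual loading and quantization logic lives in `app/utils.py`):

```python
# Sketch of CPU dynamic quantization for a DistilBERT classifier.
# The model directory is a placeholder; the project loads its fine-tuned
# weights from S3 (see app/utils.py).
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_dir = "model/"  # local directory with the fine-tuned checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_dir)
model = AutoModelForSequenceClassification.from_pretrained(model_dir)
model.eval()

# Replace Linear layers with int8 dynamic-quantized equivalents (CPU only).
quantized_model = torch.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8
)

inputs = tokenizer("The movie was absolutely fantastic!", return_tensors="pt")
with torch.no_grad():
    logits = quantized_model(**inputs).logits
print(logits.argmax(dim=-1).item())  # predicted class index (negative/positive)
```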
---

## 🧪 Run Tests

```bash
make test
```

or

```bash
PYTHONPATH=. pytest -v
```

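The test suite itself is in `test_setup.py`. As an illustrative example of the kind of health-check test the CI pipeline runs (the assertions mirror the documented `/ping` response but are a sketch, not the actual file):

```python
# Illustrative health-check test using FastAPI's TestClient;
# the real assertions live in test_setup.py.
from fastapi.testclient import TestClient

from app.serve import app

client = TestClient(app)

def test_ping_returns_ok():
    resp = client.get("/ping")
    assert resp.status_code == 200
    body = resp.json()
    assert body["status"] == 200
    assert "quantized" in body
```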
---

## ☁️ Deployment

The app is designed for **AWS EC2 deployment**.
Model files are fetched from **S3** on first run and cached locally.

```bash
uvicorn app.serve:app --host 0.0.0.0 --port 8000
```

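The fetch-and-cache step can be reproduced with plain `boto3`. In the sketch below, the bucket name, key prefix, and local cache directory are placeholders; the project's actual download logic lives in `app/utils.py`:

```python
# Hypothetical sketch of the "download from S3 once, then reuse" pattern.
# BUCKET, PREFIX, and MODEL_DIR are placeholders, not the project's values.
from pathlib import Path

import boto3

BUCKET = "my-sentiment-models"      # assumed bucket name
PREFIX = "distilbert-sentiment/"    # assumed key prefix
MODEL_DIR = Path("model")           # local cache directory

def fetch_model_if_missing() -> Path:
    """Download model artifacts from S3 on first run; reuse the local copy after."""
    if MODEL_DIR.exists() and any(MODEL_DIR.iterdir()):
        return MODEL_DIR  # already cached locally
    MODEL_DIR.mkdir(parents=True, exist_ok=True)
    s3 = boto3.client("s3")
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=BUCKET, Prefix=PREFIX):
        for obj in page.get("Contents", []):
            target = MODEL_DIR / Path(obj["Key"]).name
            s3.download_file(BUCKET, obj["Key"], str(target))
    return MODEL_DIR
```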
---

## 🧠 MLOps Pitch

This project demonstrates a **production-ready NLP deployment pipeline** with performance optimization and automation at its core.
By integrating **AWS**, **FastAPI**, and **GitHub Actions**, it showcases how MLOps turns research models into **reliable, scalable services** — cutting average latency by **≈60%** through dynamic quantization.

---

## 👨‍💻 Author

**M. Farrukh Mehmood**

🔗 [LinkedIn](https://www.linkedin.com/in/sfarrukhm) | [GitHub](https://github.com/sfarrukhm)