
🦀 Kube Rust Vision ML

Production-ready ML object detection service built with Rust, YOLOv8, and Kubernetes

A complete end-to-end machine learning system that trains a YOLOv8 hardhat detection model, exports it to ONNX, serves predictions through a high-performance Rust web service, and deploys to Kubernetes on AWS EKS.



🎯 Overview

This project demonstrates a production-grade ML pipeline for real-time object detection:

  1. Train YOLOv8 models with MLflow tracking (Python)
  2. Export models to ONNX format for cross-platform inference
  3. Serve predictions via a blazing-fast Rust API service
  4. Deploy to Kubernetes with infrastructure as code (Terraform)
  5. Scale horizontally with multiple replicas and health checks

Use Case: Detect whether construction workers are wearing hardhats in real-time images.


🏗️ Architecture

┌─────────────────────────────────────────────────────────────┐
│                    Training Pipeline (Python)                │
│  ┌──────────┐    ┌──────────┐    ┌──────────┐              │
│  │ Dataset  │───▶│ YOLOv8   │───▶│  ONNX    │              │
│  │ (Images) │    │ Training │    │  Export  │              │
│  └──────────┘    └──────────┘    └──────────┘              │
│                        │                                     │
│                        ▼                                     │
│                   ┌──────────┐                               │
│                   │  MLflow  │ (Experiment Tracking)         │
│                   └──────────┘                               │
└─────────────────────────────────────────────────────────────┘
                            │
                            │ yolov8n_hardhat.onnx
                            ▼
┌─────────────────────────────────────────────────────────────┐
│              Inference Service (Rust + Axum)                 │
│  ┌──────────┐    ┌──────────┐    ┌──────────┐              │
│  │  Image   │───▶│  ONNX    │───▶│  JSON    │              │
│  │  Input   │    │ Runtime  │    │ Response │              │
│  └──────────┘    └──────────┘    └──────────┘              │
│                                                              │
│  Endpoints: /health, /predict                                │
└─────────────────────────────────────────────────────────────┘
                            │
                            ▼
┌─────────────────────────────────────────────────────────────┐
│        Kubernetes Deployment (AWS EKS + Terraform)           │
│  ┌──────────┐    ┌──────────┐    ┌──────────┐              │
│  │   VPC    │───▶│   EKS    │───▶│ Service  │              │
│  │ Network  │    │ Cluster  │    │  (LB)    │              │
│  └──────────┘    └──────────┘    └──────────┘              │
│                                                              │
│  Infrastructure as Code with Terraform                       │
└─────────────────────────────────────────────────────────────┘

✨ Features

Machine Learning

  • 🎯 YOLOv8n object detection for hardhat safety detection
  • 📊 MLflow integration for experiment tracking and model registry
  • 🔄 Automated ONNX export for production deployment
  • 📈 Comprehensive training configuration with data augmentation

Inference Service

  • ⚡ High-performance Rust web service with Axum framework
  • 🔥 ONNX Runtime for optimized cross-platform inference
  • 🎨 Image preprocessing with automatic resizing and normalization
  • 🎯 Non-Maximum Suppression (NMS) for accurate bounding boxes
  • ❤️ Health check endpoint for Kubernetes probes
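
To make the preprocessing bullet concrete: incoming images are resized with aspect ratio preserved and padded to the model's square input (letterboxing), and detections must be mapped back afterward. A minimal Python sketch of that geometry — illustrative only; the Rust service's actual code may differ, and 640 is assumed here because it is the YOLOv8 default input size:

```python
def letterbox_params(width: int, height: int, target: int = 640):
    """Compute scale and padding to fit (width, height) into a
    target x target square while preserving aspect ratio."""
    scale = min(target / width, target / height)
    new_w, new_h = round(width * scale), round(height * scale)
    pad_x = (target - new_w) / 2   # left/right padding in pixels
    pad_y = (target - new_h) / 2   # top/bottom padding in pixels
    return scale, pad_x, pad_y

def unletterbox(coord: float, pad: float, scale: float) -> float:
    """Map a model-space coordinate back to the original image."""
    return (coord - pad) / scale

# Example: a 1280x720 frame scales by 0.5 to 640x360,
# leaving 140 px of padding above and below.
scale, pad_x, pad_y = letterbox_params(1280, 720)
```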

DevOps & Infrastructure

  • 🐳 Multi-stage Docker builds for minimal image size
  • ☸️ Kubernetes manifests with deployments, services, and namespaces
  • 🏗️ Terraform IaC for AWS EKS cluster provisioning
  • 📦 Horizontal scaling with replica management
  • 🔍 Liveness probes for automatic recovery

🔧 Prerequisites

For Training (Python)

  • Python 3.9+
  • pip or conda

For Inference Service (Rust)

  • Rust 1.70+
  • Cargo

For Deployment

  • Docker
  • kubectl
  • Terraform 1.0+
  • AWS CLI (configured with credentials)
  • AWS Account with EKS permissions

📁 Project Structure

kube-rust-vision-ml/
├── python_training/          # ML training pipeline
│   ├── train.py              # Main training script with MLflow
│   ├── test_model.py         # Model testing utilities
│   ├── model-config.yaml     # Training hyperparameters
│   ├── requirements.txt      # Python dependencies
│   ├── hardhat/              # Dataset (images + labels)
│   │   ├── data.yaml
│   │   ├── train/
│   │   ├── valid/
│   │   └── test/
│   ├── mlruns/               # MLflow experiment tracking (gitignored)
│   └── runs/                 # Training outputs (gitignored)
│
├── rust_service/             # Rust inference API
│   ├── src/
│   │   └── main.rs           # Axum server + ONNX inference
│   ├── Cargo.toml            # Rust dependencies
│   ├── Dockerfile            # Multi-stage container build
│   └── yolov8n_hardhat.onnx  # Exported ONNX model (gitignored)
│
├── k8s/                      # Kubernetes manifests
│   ├── 1-namespace.yml       # Namespace definition
│   ├── 2-deployment.yml      # Deployment with replicas
│   └── 3-service.yml         # LoadBalancer service
│
├── terraform/                # Infrastructure as Code
│   ├── main.tf               # Provider configuration
│   ├── vpc.tf                # VPC networking
│   ├── eks.tf                # EKS cluster setup
│   ├── ect.tf                # ECR container registry
│   └── variables.tf          # Configurable variables
│
├── .gitignore                # Git ignore patterns
└── README.md                 # This file

🚀 Quick Start

1. Train the Model

# Navigate to training directory
cd python_training

# Create virtual environment
python -m venv .venv
source .venv/bin/activate  # On macOS/Linux
# .venv\Scripts\activate   # On Windows

# Install dependencies
pip install -r requirements.txt

# Start MLflow UI (in separate terminal)
mlflow server --host 127.0.0.1 --port 8080 --backend-store-uri file:./mlruns

# Train the model
python train.py

The training script will:

  • Load the hardhat dataset
  • Train YOLOv8n with configured parameters
  • Log metrics to MLflow
  • Export the model to ONNX format
  • Register the model in MLflow registry

View training progress: Open http://127.0.0.1:8080 in your browser

2. Run the Inference Service Locally

# Copy trained ONNX model to rust_service directory
cp python_training/runs/detect/yolov8n_train_and_export*/weights/yolov8n_hardhat.onnx rust_service/

# Navigate to Rust service
cd rust_service

# Build and run
cargo run --release

The service will start on http://localhost:8080. Note that the MLflow UI from step 1 also defaults to port 8080, so stop it first or run it on a different port.

3. Test the API

# Health check
curl http://localhost:8080/health

# Make a prediction (replace with your image path)
curl -X POST http://localhost:8080/predict \
  --data-binary @test_image.jpg \
  -H "Content-Type: image/jpeg"

Expected Response:

{
  "status": "success",
  "detections": [
    {
      "x_min": 123.45,
      "y_min": 67.89,
      "x_max": 234.56,
      "y_max": 345.67,
      "confidence": 0.87,
      "class_id": 0,
      "class_name": "hardhat"
    }
  ]
}
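
A minimal Python client for the endpoint above. The `parse_detections` helper is illustrative, not part of the repo, and the `requests` usage is shown as comments since it needs the service running:

```python
import json

def parse_detections(payload: dict, min_confidence: float = 0.5) -> list:
    """Pull detections out of a /predict response, dropping low-confidence boxes."""
    if payload.get("status") != "success":
        raise ValueError(f"prediction failed: {payload}")
    return [d for d in payload["detections"] if d["confidence"] >= min_confidence]

# Sending the request (requires the service running and the `requests` package):
#   import requests
#   with open("test_image.jpg", "rb") as f:
#       resp = requests.post("http://localhost:8080/predict",
#                            data=f.read(),
#                            headers={"Content-Type": "image/jpeg"})
#   hits = parse_detections(resp.json())

# Parsing the sample response shown above:
sample = json.loads("""{"status": "success", "detections": [
  {"x_min": 123.45, "y_min": 67.89, "x_max": 234.56, "y_max": 345.67,
   "confidence": 0.87, "class_id": 0, "class_name": "hardhat"}]}""")
print(parse_detections(sample)[0]["class_name"])
```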

🐳 Docker Deployment

Build Docker Image

cd rust_service

# Build the image
docker build -t rust-vision-service:latest .

# Run the container
docker run -p 8080:8080 rust-vision-service:latest

☸️ Kubernetes Deployment

Step 1: Provision Infrastructure with Terraform

cd terraform

# Initialize Terraform
terraform init

# Review the plan
terraform plan

# Apply infrastructure
terraform apply

# Configure kubectl
aws eks update-kubeconfig --region us-east-1 --name your-cluster-name

Step 2: Deploy to Kubernetes

# Apply Kubernetes manifests
kubectl apply -f k8s/1-namespace.yml
kubectl apply -f k8s/2-deployment.yml
kubectl apply -f k8s/3-service.yml

# Check deployment status
kubectl get pods -n rust-vision-app
kubectl get services -n rust-vision-app

# Get the LoadBalancer URL
kubectl get service rust-vision-service -n rust-vision-app

Step 3: Test in Production

# Get the LoadBalancer hostname (AWS ELBs expose a DNS name rather than an IP)
EXTERNAL_IP=$(kubectl get service rust-vision-service -n rust-vision-app -o jsonpath='{.status.loadBalancer.ingress[0].hostname}')

# Health check
curl http://$EXTERNAL_IP:8080/health

# Make prediction
curl -X POST http://$EXTERNAL_IP:8080/predict \
  --data-binary @test_image.jpg \
  -H "Content-Type: image/jpeg"

🛠️ Development

Training Configuration

Edit python_training/model-config.yaml to customize:

  • Model architecture (YOLOv8n, YOLOv8s, YOLOv8m, etc.)
  • Training epochs and batch size
  • Learning rate and optimizer settings
  • Data augmentation parameters
  • Export settings

Rust Service Configuration

Key constants in rust_service/src/main.rs:

const CLASSES: [&str; 2] = ["hardhat", "no-hardhat"];
const CONFIDENCE_THRESHOLD: f32 = 0.5;
const NMS_IOU_THRESHOLD: f32 = 0.45;
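
The two thresholds interact as follows: boxes below CONFIDENCE_THRESHOLD are dropped outright, then greedy NMS discards any remaining box whose IoU with an already-kept, higher-confidence box exceeds NMS_IOU_THRESHOLD. A pure-Python sketch of that algorithm (not the service's actual code):

```python
def iou(a, b):
    """Intersection-over-union of two (x_min, y_min, x_max, y_max) boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter else 0.0

def nms(boxes, confidence_threshold=0.5, iou_threshold=0.45):
    """Greedy non-maximum suppression over (box, confidence) pairs."""
    candidates = sorted(
        (c for c in boxes if c[1] >= confidence_threshold),
        key=lambda c: c[1], reverse=True)
    kept = []
    for box, conf in candidates:
        # Keep a box only if it doesn't overlap too much with a stronger one.
        if all(iou(box, k) <= iou_threshold for k, _ in kept):
            kept.append((box, conf))
    return kept
```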

Infrastructure Configuration

Edit terraform/variables.tf to customize:

  • AWS region
  • EKS cluster name
  • Node instance types
  • VPC CIDR blocks

📚 API Documentation

Endpoints

GET /health

Health check endpoint for Kubernetes liveness probes.

Response:

200 OK

POST /predict

Perform object detection on an uploaded image.

Request:

  • Content-Type: image/jpeg or image/png
  • Body: Raw image binary data

Response:

{
  "status": "success",
  "detections": [
    {
      "x_min": float,
      "y_min": float,
      "x_max": float,
      "y_max": float,
      "confidence": float,
      "class_id": int,
      "class_name": string
    }
  ]
}

Error Responses:

  • 400 Bad Request: Invalid image format
  • 500 Internal Server Error: Inference failure
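
For client code, the detection schema above maps naturally onto a typed record. A hypothetical Python dataclass mirroring it (not part of the repo):

```python
from dataclasses import dataclass

@dataclass
class Detection:
    """One bounding box from a /predict response."""
    x_min: float
    y_min: float
    x_max: float
    y_max: float
    confidence: float
    class_id: int
    class_name: str

    @property
    def area(self) -> float:
        """Box area in pixels of the model's coordinate space."""
        return (self.x_max - self.x_min) * (self.y_max - self.y_min)

# A detection dict from the JSON response unpacks directly:
d = Detection(**{"x_min": 123.45, "y_min": 67.89, "x_max": 234.56,
                 "y_max": 345.67, "confidence": 0.87,
                 "class_id": 0, "class_name": "hardhat"})
```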

⚙️ Configuration

Environment Variables

Python Training:

  • MLFLOW_TRACKING_URI: MLflow server URL (default: http://127.0.0.1:8080)

Rust Service:

  • Configured via source code constants (see Development section)

Model Configuration

The model-config.yaml file controls all aspects of training:

  • Model: Pre-trained weights and architecture
  • Data: Dataset paths and classes
  • Training: Epochs, batch size, workers, device
  • Optimizer: Learning rate, momentum, weight decay
  • Augmentation: Flip, rotate, scale, HSV adjustments
  • Export: ONNX opset version and naming
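
A hypothetical sketch of what such a file might look like — the key names below are assumptions for illustration, and the real model-config.yaml in the repo may use different keys and values:

```yaml
# Hypothetical shape of model-config.yaml; the actual file's keys may differ.
model: yolov8n.pt            # pre-trained weights / architecture
data: hardhat/data.yaml      # dataset definition (paths + class names)
training:
  epochs: 50
  batch: 16
  workers: 4
  device: cpu
optimizer:
  lr0: 0.01                  # initial learning rate
  momentum: 0.937
  weight_decay: 0.0005
augmentation:
  fliplr: 0.5                # horizontal flip probability
  degrees: 10.0              # rotation range
  scale: 0.5
  hsv_h: 0.015               # hue jitter
export:
  format: onnx
  opset: 12
  name: yolov8n_hardhat
```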

🧪 Testing

Test Model Locally

cd python_training
python test_model.py

Run Rust Tests

cd rust_service
cargo test

📈 Performance

  • Inference Speed: ~50-100ms per image (CPU)
  • Model Size: ~6MB (YOLOv8n ONNX)
  • Docker Image: ~150MB (multi-stage build)
  • Memory Usage: ~100MB per replica

🤝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

  1. Fork the repository
  2. Create your feature branch (git checkout -b feature/amazing-feature)
  3. Commit your changes (git commit -m 'Add some amazing feature')
  4. Push to the branch (git push origin feature/amazing-feature)
  5. Open a Pull Request

📝 License

This project is licensed under the MIT License - see the LICENSE file for details.


🙏 Acknowledgments

  • Ultralytics YOLOv8: For the excellent object detection framework
  • ONNX Runtime: For cross-platform ML inference
  • Axum: For the ergonomic Rust web framework
  • MLflow: For experiment tracking and model registry
  • Roboflow: For the hardhat dataset

📞 Contact

For questions or support, please open an issue on GitHub.


Built with ❤️ using Rust, Python, and Kubernetes
