# kube-rust-vision-ml

Production-ready ML object detection service built with Rust, YOLOv8, and Kubernetes.

A complete end-to-end machine learning system that trains a YOLOv8 hardhat detection model, exports it to ONNX, serves predictions through a high-performance Rust web service, and deploys to Kubernetes on AWS EKS.
## Table of Contents

- Overview
- Architecture
- Features
- Prerequisites
- Project Structure
- Quick Start
- Usage
- Deployment
- Development
- API Documentation
- Configuration
- Contributing
- License
## Overview

This project demonstrates a production-grade ML pipeline for real-time object detection:
- Train YOLOv8 models with MLflow tracking (Python)
- Export models to ONNX format for cross-platform inference
- Serve predictions via a blazing-fast Rust API service
- Deploy to Kubernetes with infrastructure as code (Terraform)
- Scale horizontally with multiple replicas and health checks
Use Case: Detect in real time whether construction workers are wearing hardhats.
## Architecture

```text
┌─────────────────────────────────────────────────────────────┐
│                 Training Pipeline (Python)                  │
│  ┌──────────┐    ┌──────────┐    ┌──────────┐               │
│  │ Dataset  │───▶│  YOLOv8  │───▶│   ONNX   │               │
│  │ (Images) │    │ Training │    │  Export  │               │
│  └──────────┘    └──────────┘    └──────────┘               │
│                       │                                     │
│                       ▼                                     │
│                  ┌──────────┐                               │
│                  │  MLflow  │  (Experiment Tracking)        │
│                  └──────────┘                               │
└─────────────────────────────────────────────────────────────┘
                        │
                        │  yolov8n_hardhat.onnx
                        ▼
┌─────────────────────────────────────────────────────────────┐
│               Inference Service (Rust + Axum)               │
│  ┌──────────┐    ┌──────────┐    ┌──────────┐               │
│  │  Image   │───▶│   ONNX   │───▶│   POST   │               │
│  │  Input   │    │ Runtime  │    │ Response │               │
│  └──────────┘    └──────────┘    └──────────┘               │
│                                                             │
│  Endpoints: /health, /predict                               │
└─────────────────────────────────────────────────────────────┘
                        │
                        ▼
┌─────────────────────────────────────────────────────────────┐
│         Kubernetes Deployment (AWS EKS + Terraform)         │
│  ┌──────────┐    ┌──────────┐    ┌──────────┐               │
│  │   VPC    │───▶│   EKS    │───▶│ Service  │               │
│  │ Network  │    │ Cluster  │    │   (LB)   │               │
│  └──────────┘    └──────────┘    └──────────┘               │
│                                                             │
│  Infrastructure as Code with Terraform                      │
└─────────────────────────────────────────────────────────────┘
```
## Features

- 🎯 YOLOv8n object detection for hardhat safety detection
- 📊 MLflow integration for experiment tracking and model registry
- 🔄 Automated ONNX export for production deployment
- 📈 Comprehensive training configuration with data augmentation
- ⚡ High-performance Rust web service with Axum framework
- 🔥 ONNX Runtime for optimized cross-platform inference
- 🎨 Image preprocessing with automatic resizing and normalization
- 🎯 Non-Maximum Suppression (NMS) for accurate bounding boxes
- ❤️ Health check endpoint for Kubernetes probes
- 🐳 Multi-stage Docker builds for minimal image size
- ☸️ Kubernetes manifests with deployments, services, and namespaces
- 🏗️ Terraform IaC for AWS EKS cluster provisioning
- 📦 Horizontal scaling with replica management
- 🔍 Liveness probes for automatic recovery
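The confidence filtering and Non-Maximum Suppression (NMS) steps listed above can be sketched in plain Rust. This is an illustrative sketch, not the service's actual code: the `Det` struct and `nms` function are hypothetical names, while the threshold values match the constants shown later in this README.

```rust
// Thresholds as documented in rust_service/src/main.rs (see Development section).
const CONFIDENCE_THRESHOLD: f32 = 0.5;
const NMS_IOU_THRESHOLD: f32 = 0.45;

// Hypothetical detection box; the real service likely carries class info too.
#[derive(Clone, Copy, Debug)]
struct Det {
    x_min: f32,
    y_min: f32,
    x_max: f32,
    y_max: f32,
    confidence: f32,
}

// Intersection-over-Union of two axis-aligned boxes.
fn iou(a: &Det, b: &Det) -> f32 {
    let iw = (a.x_max.min(b.x_max) - a.x_min.max(b.x_min)).max(0.0);
    let ih = (a.y_max.min(b.y_max) - a.y_min.max(b.y_min)).max(0.0);
    let inter = iw * ih;
    let area_a = (a.x_max - a.x_min) * (a.y_max - a.y_min);
    let area_b = (b.x_max - b.x_min) * (b.y_max - b.y_min);
    inter / (area_a + area_b - inter)
}

// Greedy NMS: drop low-confidence boxes, then walk boxes in descending
// confidence, suppressing any box that overlaps an already-kept one too much.
fn nms(mut dets: Vec<Det>) -> Vec<Det> {
    dets.retain(|d| d.confidence >= CONFIDENCE_THRESHOLD);
    dets.sort_by(|a, b| b.confidence.partial_cmp(&a.confidence).unwrap());
    let mut kept: Vec<Det> = Vec::new();
    for d in dets {
        if kept.iter().all(|k| iou(k, &d) < NMS_IOU_THRESHOLD) {
            kept.push(d);
        }
    }
    kept
}
```

Greedy NMS is the standard post-processing step for YOLO-family models: without it, one object typically yields several heavily overlapping boxes.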
## Prerequisites

- Python 3.9+
- pip or conda
- Rust 1.70+
- Cargo
- Docker
- kubectl
- Terraform 1.0+
- AWS CLI (configured with credentials)
- AWS Account with EKS permissions
## Project Structure

```text
kube-rust-vision-ml/
├── python_training/          # ML training pipeline
│   ├── train.py              # Main training script with MLflow
│   ├── test_model.py         # Model testing utilities
│   ├── model-config.yaml     # Training hyperparameters
│   ├── requirements.txt      # Python dependencies
│   ├── hardhat/              # Dataset (images + labels)
│   │   ├── data.yaml
│   │   ├── train/
│   │   ├── valid/
│   │   └── test/
│   ├── mlruns/               # MLflow experiment tracking (gitignored)
│   └── runs/                 # Training outputs (gitignored)
│
├── rust_service/             # Rust inference API
│   ├── src/
│   │   └── main.rs           # Axum server + ONNX inference
│   ├── Cargo.toml            # Rust dependencies
│   ├── Dockerfile            # Multi-stage container build
│   └── yolov8n_hardhat.onnx  # Exported ONNX model (gitignored)
│
├── k8s/                      # Kubernetes manifests
│   ├── 1-namespace.yml       # Namespace definition
│   ├── 2-deployment.yml      # Deployment with replicas
│   └── 3-service.yml         # LoadBalancer service
│
├── terraform/                # Infrastructure as Code
│   ├── main.tf               # Provider configuration
│   ├── vpc.tf                # VPC networking
│   ├── eks.tf                # EKS cluster setup
│   ├── ect.tf                # ECR container registry
│   └── variables.tf          # Configurable variables
│
├── .gitignore                # Git ignore patterns
└── README.md                 # This file
```
## Quick Start

### 1. Train the model

```bash
# Navigate to training directory
cd python_training

# Create virtual environment
python -m venv .venv
source .venv/bin/activate   # On macOS/Linux
# .venv\Scripts\activate    # On Windows

# Install dependencies
pip install -r requirements.txt

# Start MLflow UI (in separate terminal)
mlflow server --host 127.0.0.1 --port 8080 --backend-store-uri file:./mlruns

# Train the model
python train.py
```

The training script will:
- Load the hardhat dataset
- Train YOLOv8n with configured parameters
- Log metrics to MLflow
- Export the model to ONNX format
- Register the model in MLflow registry
View training progress: Open http://127.0.0.1:8080 in your browser
### 2. Run the inference service

```bash
# Copy trained ONNX model to rust_service directory
cp python_training/runs/detect/yolov8n_train_and_export*/weights/yolov8n_hardhat.onnx rust_service/

# Navigate to Rust service
cd rust_service

# Build and run
cargo run --release
```

The service will start on http://localhost:8080.
## Usage

```bash
# Health check
curl http://localhost:8080/health

# Make a prediction (replace with your image path)
curl -X POST http://localhost:8080/predict \
  --data-binary @test_image.jpg \
  -H "Content-Type: image/jpeg"
```

Expected Response:
```json
{
  "status": "success",
  "detections": [
    {
      "x_min": 123.45,
      "y_min": 67.89,
      "x_max": 234.56,
      "y_max": 345.67,
      "confidence": 0.87,
      "class_id": 0,
      "class_name": "hardhat"
    }
  ]
}
```

## Deployment

### Docker

```bash
cd rust_service

# Build the image
docker build -t rust-vision-service:latest .

# Run the container
docker run -p 8080:8080 rust-vision-service:latest
```

### Provision AWS infrastructure (Terraform)

```bash
cd terraform

# Initialize Terraform
terraform init

# Review the plan
terraform plan

# Apply infrastructure
terraform apply

# Configure kubectl
aws eks update-kubeconfig --region us-east-1 --name your-cluster-name
```

### Deploy to Kubernetes

```bash
# Apply Kubernetes manifests
kubectl apply -f k8s/1-namespace.yml
kubectl apply -f k8s/2-deployment.yml
kubectl apply -f k8s/3-service.yml

# Check deployment status
kubectl get pods -n rust-vision-app
kubectl get services -n rust-vision-app

# Get the LoadBalancer URL
kubectl get service rust-vision-service -n rust-vision-app
```

### Test the deployed service

```bash
# Get the external IP
EXTERNAL_IP=$(kubectl get service rust-vision-service -n rust-vision-app -o jsonpath='{.status.loadBalancer.ingress[0].hostname}')

# Health check
curl http://$EXTERNAL_IP:8080/health

# Make prediction
curl -X POST http://$EXTERNAL_IP:8080/predict \
  --data-binary @test_image.jpg \
  -H "Content-Type: image/jpeg"
```

## Development

Edit `python_training/model-config.yaml` to customize:
- Model architecture (YOLOv8n, YOLOv8s, YOLOv8m, etc.)
- Training epochs and batch size
- Learning rate and optimizer settings
- Data augmentation parameters
- Export settings
Key constants in `rust_service/src/main.rs`:

```rust
const CLASSES: [&str; 2] = ["hardhat", "no-hardhat"];
const CONFIDENCE_THRESHOLD: f32 = 0.5;
const NMS_IOU_THRESHOLD: f32 = 0.45;
```

Edit `terraform/variables.tf` to customize:
- AWS region
- EKS cluster name
- Node instance types
- VPC CIDR blocks
## API Documentation

### `GET /health`

Health check endpoint for Kubernetes liveness probes.

Response: `200 OK`
### `POST /predict`

Perform object detection on an uploaded image.

Request:
- Content-Type: `image/jpeg` or `image/png`
- Body: raw image binary data
Response:

```json
{
  "status": "success",
  "detections": [
    {
      "x_min": float,
      "y_min": float,
      "x_max": float,
      "y_max": float,
      "confidence": float,
      "class_id": int,
      "class_name": string
    }
  ]
}
```

Error Responses:
- `400 Bad Request`: invalid image format
- `500 Internal Server Error`: inference failure
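For illustration, the success payload can be assembled by hand in Rust. This is a minimal sketch: the `Detection` struct and `to_json` helper are hypothetical names, and the real service presumably derives serialization (e.g. via serde) rather than formatting strings manually.

```rust
// Hypothetical mirror of the response schema documented above.
struct Detection {
    x_min: f32,
    y_min: f32,
    x_max: f32,
    y_max: f32,
    confidence: f32,
    class_id: u32,
    class_name: String,
}

// Hand-rolled JSON serialization for illustration only; assumes class_name
// needs no escaping (true for "hardhat" / "no-hardhat").
fn to_json(dets: &[Detection]) -> String {
    let items: Vec<String> = dets
        .iter()
        .map(|d| {
            format!(
                "{{\"x_min\":{},\"y_min\":{},\"x_max\":{},\"y_max\":{},\
                 \"confidence\":{},\"class_id\":{},\"class_name\":\"{}\"}}",
                d.x_min, d.y_min, d.x_max, d.y_max,
                d.confidence, d.class_id, d.class_name
            )
        })
        .collect();
    format!("{{\"status\":\"success\",\"detections\":[{}]}}", items.join(","))
}
```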
## Configuration

Python Training:
- `MLFLOW_TRACKING_URI`: MLflow server URL (default: `http://127.0.0.1:8080`)

Rust Service:
- Configured via source code constants (see Development section)
The model-config.yaml file controls all aspects of training:
- Model: Pre-trained weights and architecture
- Data: Dataset paths and classes
- Training: Epochs, batch size, workers, device
- Optimizer: Learning rate, momentum, weight decay
- Augmentation: Flip, rotate, scale, HSV adjustments
- Export: ONNX opset version and naming
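As a sketch, such a file might look like the following; the field names mirror common Ultralytics-style configs and are illustrative only — check the actual `model-config.yaml` in this repo for its real schema:

```yaml
# Hypothetical sketch -- field names are illustrative, not this repo's exact schema.
model:
  weights: yolov8n.pt      # swap in yolov8s.pt / yolov8m.pt for larger variants
data:
  path: hardhat/data.yaml
training:
  epochs: 100
  batch: 16
  device: cpu
optimizer:
  lr0: 0.01                # initial learning rate
  momentum: 0.937
  weight_decay: 0.0005
augmentation:
  fliplr: 0.5              # horizontal flip probability
  hsv_h: 0.015             # hue jitter
export:
  format: onnx
  opset: 12
```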
## Testing

```bash
# Python
cd python_training
python test_model.py

# Rust
cd rust_service
cargo test
```

## Performance

- Inference Speed: ~50-100ms per image (CPU)
- Model Size: ~6MB (YOLOv8n ONNX)
- Docker Image: ~150MB (multi-stage build)
- Memory Usage: ~100MB per replica
## Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

1. Fork the repository
2. Create your feature branch (`git checkout -b feature/amazing-feature`)
3. Commit your changes (`git commit -m 'Add some amazing feature'`)
4. Push to the branch (`git push origin feature/amazing-feature`)
5. Open a Pull Request
## License

This project is licensed under the MIT License - see the LICENSE file for details.
## Acknowledgments

- Ultralytics YOLOv8: For the excellent object detection framework
- ONNX Runtime: For cross-platform ML inference
- Axum: For the ergonomic Rust web framework
- MLflow: For experiment tracking and model registry
- Roboflow: For the hardhat dataset
For questions or support, please open an issue on GitHub.
Built with ❤️ using Rust, Python, and Kubernetes