Training a YOLO model for sock detection. Training requires a GPU — on CPU it would take too long.
| Device | Flag | Time (100 epochs) |
|---|---|---|
| NVIDIA GPU (RTX 4070 Super) | --device 0 |
~15 min |
| Apple Silicon (M3 Pro) | --device mps |
~20 min |
| CPU | --device cpu |
several hours |
This project uses a dedicated NVIDIA GPU server for training.
Before using ssh blackops in the examples below, make sure this host alias is configured in ~/.ssh/config (and optionally in /etc/hosts). See: infrastructure.md
ssh blackops
# Verify GPU is available
nvidia-smi
# Install dependencies
python3 -m venv venv
source venv/bin/activate
pip install ultralyticsuv venv && uv pip install ultralyticsMPS (Metal Performance Shaders) is supported by ultralytics out of the box.
# Full training (100 epochs, GPU)
./main.py train --model yolov8n.pt --data data.yaml --epochs 100 --batch 16 --device 0
# Quick test (10 epochs, Apple Silicon)
./main.py train --model yolov8n.pt --data data.yaml --epochs 10 --batch 8 --device mps
# On CPU (slow, for verification only)
./main.py train --model yolov8n.pt --data data.yaml --epochs 10 --batch 4 --device cpuyolo task=detect mode=train model=yolov8n.pt imgsz=640 data=data.yaml epochs=100 batch=16from ultralytics import YOLO
model = YOLO("yolov8n.pt")
results = model.train(data="data.yaml", epochs=100, imgsz=640, batch=16, device="0")| Parameter | Value | Description |
|---|---|---|
--model |
yolov8n.pt |
Base model (nano — lightest, suitable for RPi) |
--data |
data.yaml |
Dataset config |
--epochs |
100 |
Number of epochs |
--imgsz |
640 |
Image size (px) |
--batch |
16 |
Batch size (reduce if not enough VRAM) |
--device |
0 / mps / cpu |
Training device |
After training, ultralytics saves results to runs/detect/train/:
runs/detect/train/
├── weights/
│ ├── best.pt # Best model (by mAP50)
│ └── last.pt # Last epoch
├── results.csv # Metrics per epoch
├── confusion_matrix.png # Confusion matrix
├── F1_curve.png # F1 curve
├── P_curve.png # Precision curve
├── R_curve.png # Recall curve
└── results.png # Metric plots
- Precision — fraction of correct detections among all detections
- Recall — fraction of found objects among all real objects
- mAP50 — mean average precision at IoU=0.5 (primary metric)
- mAP50-95 — mean average precision at IoU 0.5 to 0.95
In practice: use mAP50 as the main quick quality signal, and mAP50-95 as the stricter metric that better reflects box quality across different IoU thresholds.
| Model | mAP50 | mAP50-95 | Size | File |
|---|---|---|---|---|
| YOLOv8n (100 epochs) | 0.995 | 0.885 | 6.0 MB | models/yolo8_best.pt |
| YOLOv11n (100 epochs) | 0.995 | 0.96 | 5.2 MB | models/yolo11_best.pt |
To evaluate inference speed across different formats:
./main.py bench --model best.ptAfter training, the .pt model should be exported to a deployment format for RPi.
NCNN (Tencent) is optimized for ARM processors. On RPi 5 it delivers 15.8 FPS (pure inference) / 14.9 FPS (with preprocessing) via pip ncnn + OMP workaround, vs 3.5 FPS for PyTorch.
| Format | RPi 5 FPS | Recommendation |
|---|---|---|
| pip ncnn native (4 OMP) | 14.9–15.8 | ✅ Production on RPi (NcnnNativeDetector) |
| NCNN (ultralytics) | 11.2 | Alternative (without OMP workaround) |
| ONNX | 3.0 | Slower, crashes on RPi 4 (legacy) |
| PyTorch (.pt) | 3.5 | Development and GPU only |
Quick rule of thumb:
- use NCNN for deployment on Raspberry Pi;
- keep
.ptfor training, debugging, and GPU hosts; - use ONNX only if you specifically need cross-framework compatibility.
# 1. Train the model (on the GPU server)
ssh blackops
cd ~/work/test20250807_yolov8
python -c "
from ultralytics import YOLO
model = YOLO('yolo11n.pt')
model.train(data='data.yaml', epochs=100, batch=16, device=0)
"
# Result: runs/detect/train/weights/best.pt
# 2. Export to NCNN (on dev machine or GPU server)
python -c "
from ultralytics import YOLO
model = YOLO('runs/detect/train/weights/best.pt')
model.export(format='ncnn')
"
# Result: runs/detect/train/weights/best_ncnn_model/
# ├── model.ncnn.param (model graph, ~22 KB)
# ├── model.ncnn.bin (weights, ~10 MB)
# └── metadata.yaml (ultralytics metadata)
# 3. Copy to RPi
scp -r runs/detect/train/weights/best_ncnn_model rpi5:~/sockstank/models/yolo11_best_ncnn_model
# 4. Run SocksTank on RPi
ssh rpi5
cd ~/sockstank
sudo -E python3 main.py serve --model models/yolo11_best_ncnn_model --conf 0.5# NCNN (recommended for RPi)
yolo export model=best.pt format=ncnn imgsz=640
# ONNX (generic)
yolo export model=best.pt format=onnx imgsz=640from ultralytics import YOLO
model = YOLO("best.pt")
# NCNN
model.export(format="ncnn") # -> best_ncnn_model/
# ONNX
model.export(format="onnx") # -> best.onnx| Argument | Type | Default | Description |
|---|---|---|---|
format |
str | torchscript |
Export format: ncnn, onnx, torchscript, engine (TensorRT) |
imgsz |
int/tuple | 640 |
Input image size |
half |
bool | False |
FP16 quantization |
int8 |
bool | False |
INT8 quantization (for edge devices) |
simplify |
bool | True |
Simplify model graph (ONNX) |
dynamic |
bool | False |
Dynamic input sizes (ONNX, TensorRT) |
device |
str | None |
Device: 0 (GPU), cpu, mps |
data |
str | coco8.yaml |
Dataset path for INT8 calibration |
| Model | Format | File | Size | Purpose |
|---|---|---|---|---|
| YOLOv11n | PyTorch | models/yolo11_best.pt |
5.2 MB | GPU, development |
| YOLOv11n | NCNN | models/yolo11_best_ncnn_model/ |
10.4 MB | RPi production |
| YOLOv11n | ONNX | models/yolo11_best.onnx |
10.1 MB | Generic |
| YOLOv8n | PyTorch | models/yolo8_best.pt |
6.0 MB | Older model |
| ← Previous | README | Next → |
|---|---|---|
| Dataset Preparation | Back to README | Running the Project |