
OptiBot

Real-time visual perception system for robots combining object detection (YOLO) and monocular depth estimation (MiDaS) over low-latency WebRTC streams.

Quick Start (Windows)

For Windows users, we provide an automated setup script:

PowerShell -ExecutionPolicy Bypass -File .\start-optibot.ps1

This script will:

  • Install all required dependencies (Python 3.11, Node.js, Git, Make, uv)
  • Build the project
  • Start all services automatically
  • Open your browser to http://localhost:3000

Requirements: Administrator privileges

For manual setup or other platforms, see the sections below.

What’s included

  • FastAPI + aiortc backend streaming webcam frames
  • YOLOv8n inference (Ultralytics) on the server
  • Rough monocular distance estimate per detection
  • WebRTC DataChannel sending metadata to the client
  • React + Vite frontend showing the remote stream and detection stats
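The metadata sent over the DataChannel is defined in the backend code; as a purely illustrative sketch, a per-detection message might look like the following (the field names here are assumptions, not the project's actual schema):

```python
import json

# Hypothetical per-detection metadata message. The real schema lives in the
# backend; these field names are illustrative assumptions only.
message = {
    "track_id": 3,
    "label": "person",            # YOLO class name
    "confidence": 0.87,
    "bbox": [120, 80, 310, 420],  # x1, y1, x2, y2 in pixels
    "distance_m": 1.9,            # rough monocular depth estimate
}

payload = json.dumps(message)   # what would travel over the DataChannel
decoded = json.loads(payload)   # what the client would parse
print(decoded["label"], decoded["distance_m"])
```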

Run with Docker Compose (Linux only)

Note: Camera access in Docker only works on Linux.

make docker-compose-up

Run locally (All platforms)

Prereqs: Python 3.11

  1. Install dependencies
make dev
  2. Start the webcam service
make run-streamer-webcam

Alternatively, if you place a .mp4 file in the /backend folder (video.mp4 by default), you can start the file service instead:

make run-streamer-file
  3. Start the analyzer service (separate terminal)
make run-analyzer-local
  4. Start the frontend service (separate terminal)
make run-frontend-local

Open the URL shown in your console.

Model Management

# Download default models (YOLO pt, MiDaS cache)
make download-models

# Download and export to ONNX
make download-models-onnx

# Download individual models
make download-yolo
make download-midas
make download-depth-anything

# Export models to ONNX
make export-yolo-onnx
make export-midas-onnx

FP16 Quantization (Optional)

Export models with FP16 precision for ~50% size reduction:

ONNX_HALF_PRECISION=true make export-onnx

To start the analyzer service with ONNX backend:

DETECTOR_BACKEND=onnx DEPTH_BACKEND=onnx make run-analyzer-local

To start the analyzer service with Depth Anything V2 backend:

DEPTH_BACKEND=depth_anything_v2 make run-analyzer-local

Example production usage with custom model type:

# Set model type via environment variable
MIDAS_MODEL_TYPE=DPT_Hybrid \
cd src/backend && uv run python -m analyzer.cli \
  --yolo-model-path ./models/yolo11n.pt \
  --midas-model-path ./models/midas_cache

Available CLI flags:

  • --yolo-model-path: Path to YOLO model file (e.g., yolo11n.pt; yolov8n.pt still works)
  • --midas-model-path: Path to MiDaS model cache directory
  • --host: Host to bind to (default: 0.0.0.0)
  • --port: Port to bind to (default: 8001)
  • --reload: Enable auto-reload for development

Environment Variables

Optional environment variables:

  • CAMERA_INDEX (default 0) - select webcam device
  • REGION_SIZE (default 5) - side length of the central bounding-box region over which the depth map is averaged (should be odd for symmetry)
  • SCALE_FACTOR (default 432.0) - scaling of the relative depth map generated by MiDaS (must be determined empirically)
  • UPDATE_FREQ (default 2) - number of frames between depth updates
  • TARGET_SCALE_INIT (default 0.8) - initial downscale factor for images
  • SMOOTH_FACTOR (default 0.15) - smoothing factor for scale updates
  • MIN_SCALE (default 0.2) - minimum allowed scale
  • MAX_SCALE (default 1.0) - maximum allowed scale
  • FPS_THRESHOLD (default 15.0) - threshold FPS for skipping more frames
  • DEPTH_ANYTHING_SCALE_FACTOR (default 0.5) - tunable Depth Anything scale factor
  • CAMERA_FX/FY/CX/CY - intrinsic matrix entries in pixels (set these when you have calibrated your camera; overrides FOV-derived values)
  • CAMERA_FOV_X_DEG/CAMERA_FOV_Y_DEG - fallback field of view (used only when FX/FY are not provided)
  • DEPTH_BACKEND - torch (default), onnx, or depth_anything_v2
  • MIDAS_MODEL_TYPE - MiDaS variant to load (MiDaS_small, DPT_Hybrid, DPT_Large)
  • MIDAS_MODEL_REPO - torch.hub repo for MiDaS (default intel-isl/MiDaS)
  • MIDAS_CACHE_DIR - MiDaS cache directory (default models/midas_cache)
  • DEPTH_ANYTHING_MODEL - Hugging Face model ID for Depth Anything V2 (default depth-anything/Depth-Anything-V2-Small-hf)
  • DEPTH_ANYTHING_CACHE_DIR - Depth Anything cache directory (default models/depth_anything_cache)
  • MIDAS_ONNX_MODEL_PATH - defaults to models/midas_small.onnx
  • MIDAS_ONNX_INPUT_SIZE - input size for MiDaS ONNX preprocessing (default: 384)
  • MIDAS_ONNX_PROVIDERS - comma separated ONNX Runtime providers for depth (falls back to ONNX_PROVIDERS)
  • ONNX_SHARED_PREPROCESSING - reuse one resize step for ONNX detector + depth when sizes align (default: true)
  • DETECTOR_BACKEND - torch (default) or onnx
  • TORCH_DEVICE - force PyTorch to use cuda:0, cpu, etc. (defaults to best available)
  • TORCH_HALF_PRECISION - auto (default), true, or false
  • MODEL_PATH (default models/yolo11n.pt) - default YOLO model path (used when no CLI flag is provided)
  • ONNX_MODEL_PATH - defaults to models/yolo11n.onnx
  • ONNX_OPSET - opset used during ONNX export (default: 18 via make export-onnx)
  • ONNX_SIMPLIFY - simplify the exported ONNX graph (true/false, default: true)
  • ONNX_PROVIDERS - comma separated list such as CUDAExecutionProvider,CPUExecutionProvider
  • DETECTOR_IMAGE_SIZE, DETECTOR_CONF_THRESHOLD, DETECTOR_IOU_THRESHOLD, DETECTOR_MAX_DETECTIONS, DETECTOR_NUM_CLASSES
  • TRACKING_IOU_THRESHOLD (default 0.1) - minimum IoU to match detection to track
  • TRACKING_MAX_FRAMES_WITHOUT_DETECTION (default 10) - frames before removing stale tracks
  • TRACKING_EARLY_TERMINATION_IOU (default 0.9) - early termination threshold for matching
  • TRACKING_CONFIDENCE_DECAY (default 0.1) - confidence decay per interpolation factor
  • TRACKING_MAX_HISTORY_SIZE (default 5) - size for history of each tracked object
  • DETECTION_THRESHOLD (default 2) - minimum detections before a track becomes active/sent
  • VIDEO_FILE_PATH (default video.mp4 relative to the /backend folder) - default video file path for the file WebRTC service
  • VIDEO_SOURCE_TYPE (default webcam) - video source for the streamer (webcam or file)
  • STREAMER_OFFER_URL (default http://localhost:8000/offer) - upstream offer URL for the analyzer
  • STUN_SERVER (default stun:stun.l.google.com:19302) - STUN server for WebRTC
  • ICE_GATHERING_TIMEOUT (default 5.0) - timeout for ICE gathering
  • CORS_ORIGINS (default *) - comma separated CORS origins
  • LOG_INTRINSICS (default false) - log resolved intrinsics at runtime
  • ANALYZER_SETTINGS_FILE - path to JSON settings file (default config/analyzer.json)
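To make the TRACKING_IOU_THRESHOLD knob above concrete, here is a minimal intersection-over-union computation of the kind a tracker uses to match detections to existing tracks (a sketch of the standard formula, not the project's implementation):

```python
def iou(box_a, box_b):
    """Intersection-over-union of two (x1, y1, x2, y2) pixel boxes."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0

TRACKING_IOU_THRESHOLD = 0.1  # default from the list above

# Two half-overlapping 100x100 boxes: IoU = 2500 / 17500 ≈ 0.143,
# above the default threshold, so the detection would match the track.
print(iou((0, 0, 100, 100), (50, 50, 150, 150)))
```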

Check src/backend/common/config.py.
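When CAMERA_FX/FY are not set, focal lengths can be derived from the frame size and the fallback field of view. The standard pinhole relation is fx = W / (2 · tan(fov_x / 2)), with the principal point at the image center; a sketch under those assumptions (the backend's actual derivation may differ in detail):

```python
import math

def intrinsics_from_fov(width, height, fov_x_deg=78.0, fov_y_deg=65.0):
    """Approximate pinhole intrinsics (in pixels) from image size and FOV.

    Defaults mirror CAMERA_FOV_X_DEG / CAMERA_FOV_Y_DEG from the list above.
    """
    fx = width / (2.0 * math.tan(math.radians(fov_x_deg) / 2.0))
    fy = height / (2.0 * math.tan(math.radians(fov_y_deg) / 2.0))
    cx, cy = width / 2.0, height / 2.0  # principal point at image center
    return fx, fy, cx, cy

fx, fy, cx, cy = intrinsics_from_fov(1280, 720)
print(fx, fy, cx, cy)
```

Calibrated CAMERA_FX/FY/CX/CY values, when provided, take precedence over this FOV-derived approximation.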

Analyzer settings file (JSON)

The analyzer can load a JSON settings file on startup. If the file does not exist, it falls back to the default config values.

Default path:

  • config/analyzer.json

Override the path:

  • ANALYZER_SETTINGS_FILE=/path/to/analyzer.json

Format:

  • JSON object where keys match the config names in src/backend/common/config.py.
  • Values in the JSON override the defaults and environment variables for the analyzer.

Example config/analyzer.json:

{
  "MODEL_PATH": "models/yolo11n.pt",
  "DETECTOR_BACKEND": "onnx",
  "DEPTH_BACKEND": "depth_anything_v2",
  "DETECTOR_CONF_THRESHOLD": 0.35,
  "TRACKING_IOU_THRESHOLD": 0.2
}
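The loading logic itself lives in src/backend/common/config.py; as an illustrative sketch of the precedence described above (JSON file over environment variables over built-in defaults — the default values shown here are placeholders, not authoritative):

```python
import json
import os
import tempfile

# Placeholder defaults for illustration; the real ones are in config.py.
DEFAULTS = {"DETECTOR_BACKEND": "torch", "DEPTH_BACKEND": "torch"}

def load_settings(path, environ):
    """Merge defaults <- environment <- JSON file (highest precedence)."""
    settings = dict(DEFAULTS)
    settings.update({k: environ[k] for k in DEFAULTS if k in environ})
    try:
        with open(path) as fh:
            settings.update(json.load(fh))
    except FileNotFoundError:
        pass  # missing file: fall back to defaults / environment
    return settings

# The JSON value wins over the env var for the key it sets.
with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as fh:
    json.dump({"DEPTH_BACKEND": "depth_anything_v2"}, fh)
    path = fh.name

merged = load_settings(path, {"DETECTOR_BACKEND": "onnx"})
print(merged)
os.remove(path)
```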

Calibrate depth and XYZ

  • Set camera intrinsics: if you have calibrated values, export them to env vars (pixels): CAMERA_FX, CAMERA_FY, CAMERA_CX, CAMERA_CY. If not, set approximate FOVs: CAMERA_FOV_X_DEG=78 CAMERA_FOV_Y_DEG=65 (defaults). Intrinsics are derived from the first frame size plus these values.
  • Calibrate scale for MiDaS: place a target at a known distance D_true straight ahead, read the reported distance D_est. Update SCALE_FACTOR using SCALE_FACTOR_new = SCALE_FACTOR_old * (D_true / D_est), then restart the analyzer. Repeat once or twice until Z is correct; X/Y will align automatically.
  • Optional: to log the intrinsics resolved at runtime, set LOG_INTRINSICS=true on the analyzer process.
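The SCALE_FACTOR update above is a one-line multiplicative correction; as a worked example:

```python
def updated_scale_factor(old_scale, d_true, d_est):
    """Depth-scale calibration step: SCALE_FACTOR_new = old * (D_true / D_est)."""
    return old_scale * (d_true / d_est)

# Target placed 2.0 m away, analyzer reports 2.5 m with the default
# SCALE_FACTOR of 432.0 -> the factor shrinks by 20%.
print(updated_scale_factor(432.0, d_true=2.0, d_est=2.5))  # 345.6
```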

IMPORTANT: Please read the CONTRIBUTING.md.