
OptiBot

Real-time visual perception system for robots combining object detection (YOLO) and monocular depth estimation (MiDaS) over low-latency WebRTC streams.

Quick Start (Windows)

For Windows users, we provide an automated setup script:

PowerShell -ExecutionPolicy Bypass -File .\start-optibot.ps1

This script will:

  • Install all required dependencies (Python 3.11, Node.js, Git, Make, uv)
  • Build the project
  • Start all services automatically
  • Open your browser to http://localhost:3000

Requirements: Administrator privileges

For manual setup or other platforms, see the sections below.

What’s included

  • FastAPI + aiortc backend streaming webcam frames
  • YOLOv8n inference (Ultralytics) on the server
  • Rough monocular distance estimate per detection
  • WebRTC DataChannel sending metadata to the client
  • React + Vite frontend showing the remote stream and detection stats
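The metadata sent over the DataChannel is defined in the backend code; as a purely illustrative sketch, a per-detection message might look like the following (the field names here are assumptions, not the project's actual schema):

```python
import json

# Hypothetical per-detection metadata message. The real schema lives in the
# backend; these field names are illustrative assumptions only.
message = {
    "track_id": 3,
    "label": "person",            # YOLO class name
    "confidence": 0.87,
    "bbox": [120, 80, 310, 420],  # x1, y1, x2, y2 in pixels
    "distance_m": 1.9,            # rough monocular depth estimate
}

payload = json.dumps(message)   # what would travel over the DataChannel
decoded = json.loads(payload)   # what the client would parse
print(decoded["label"], decoded["distance_m"])
```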

Run with Docker Compose (Linux only)

Note: Camera access in Docker only works on Linux.

make docker-compose-up

Run locally (All platforms)

Prereqs: Python 3.11

  1. Install dependencies
make dev
  2. Start the webcam service
make run-streamer-webcam

Alternatively, if you place a .mp4 file in the /backend folder (video.mp4 by default), you can start the file service instead:

make run-streamer-file
  3. Start the analyzer service (separate terminal)
make run-analyzer-local
  4. Start the frontend service (separate terminal)
make run-frontend-local

Open the URL shown in your console.

Model Management

# Download default models (YOLO pt, MiDaS cache)
make download-models

# Download and export to ONNX
make download-models-onnx

# Download individual models
make download-yolo
make download-midas
make download-depth-anything

# Export models to ONNX
make export-yolo-onnx
make export-midas-onnx

FP16 Quantization (Optional)

Export models with FP16 precision for ~50% size reduction:

ONNX_HALF_PRECISION=true make export-onnx

To start the analyzer service with ONNX backend:

DETECTOR_BACKEND=onnx DEPTH_BACKEND=onnx make run-analyzer-local

To start the analyzer service with Depth Anything V2 backend:

DEPTH_BACKEND=depth_anything_v2 make run-analyzer-local

Example production usage with custom model type:

# Set model type via environment variable
MIDAS_MODEL_TYPE=DPT_Hybrid \
cd src/backend && uv run python -m analyzer.cli \
  --yolo-model-path ./models/yolo11n.pt \
  --midas-model-path ./models/midas_cache

Available CLI flags:

  • --yolo-model-path: Path to YOLO model file (e.g., yolo11n.pt; yolov8n.pt still works)
  • --midas-model-path: Path to MiDaS model cache directory
  • --host: Host to bind to (default: 0.0.0.0)
  • --port: Port to bind to (default: 8001)
  • --reload: Enable auto-reload for development

Environment Variables

Optional environment variables:

  • CAMERA_INDEX (default 0) - select webcam device
  • REGION_SIZE (default 5) - side length of the central bounding-box region over which the depth map is averaged (should be odd for symmetry)
  • SCALE_FACTOR (default 432.0) - scaling of the relative depth map generated by MiDaS (must be determined empirically)
  • UPDATE_FREQ (default 2) - number of frames between depth updates
  • TARGET_SCALE_INIT (default 0.8) - initial downscale factor for images
  • SMOOTH_FACTOR (default 0.15) - smoothing factor for scale updates
  • MIN_SCALE (default 0.2) - minimum allowed scale
  • MAX_SCALE (default 1.0) - maximum allowed scale
  • FPS_THRESHOLD (default 15.0) - threshold FPS for skipping more frames
  • DEPTH_ANYTHING_SCALE_FACTOR (default 0.5) - tunable Depth Anything scale factor
  • CAMERA_FX/FY/CX/CY - intrinsic matrix entries in pixels (set these when you have calibrated your camera; overrides FOV-derived values)
  • CAMERA_FOV_X_DEG/CAMERA_FOV_Y_DEG - fallback field of view (used only when FX/FY are not provided)
  • DEPTH_BACKEND - torch (default), onnx, or depth_anything_v2
  • MIDAS_MODEL_TYPE - MiDaS variant to load (MiDaS_small, DPT_Hybrid, DPT_Large)
  • MIDAS_MODEL_REPO - torch.hub repo for MiDaS (default intel-isl/MiDaS)
  • MIDAS_CACHE_DIR - MiDaS cache directory (default models/midas_cache)
  • DEPTH_ANYTHING_MODEL - Hugging Face model ID for Depth Anything V2 (default depth-anything/Depth-Anything-V2-Small-hf)
  • DEPTH_ANYTHING_CACHE_DIR - Depth Anything cache directory (default models/depth_anything_cache)
  • MIDAS_ONNX_MODEL_PATH - defaults to models/midas_small.onnx
  • MIDAS_ONNX_INPUT_SIZE - input size for MiDaS ONNX preprocessing (default: 384)
  • MIDAS_ONNX_PROVIDERS - comma separated ONNX Runtime providers for depth (falls back to ONNX_PROVIDERS)
  • ONNX_SHARED_PREPROCESSING - reuse one resize step for ONNX detector + depth when sizes align (default: true)
  • DETECTOR_BACKEND - torch (default) or onnx
  • TORCH_DEVICE - force PyTorch to use cuda:0, cpu, etc. (defaults to best available)
  • TORCH_HALF_PRECISION - auto (default), true, or false
  • MODEL_PATH (default models/yolo11n.pt) - default YOLO model path (used when no CLI flag is provided)
  • ONNX_MODEL_PATH - defaults to models/yolo11n.onnx
  • ONNX_OPSET - opset used during ONNX export (default: 18 via make export-onnx)
  • ONNX_SIMPLIFY - simplify the exported ONNX graph (true/false, default: true)
  • ONNX_PROVIDERS - comma separated list such as CUDAExecutionProvider,CPUExecutionProvider
  • DETECTOR_IMAGE_SIZE, DETECTOR_CONF_THRESHOLD, DETECTOR_IOU_THRESHOLD, DETECTOR_MAX_DETECTIONS, DETECTOR_NUM_CLASSES
  • TRACKING_IOU_THRESHOLD (default 0.1) - minimum IoU to match detection to track
  • TRACKING_MAX_FRAMES_WITHOUT_DETECTION (default 10) - frames before removing stale tracks
  • TRACKING_EARLY_TERMINATION_IOU (default 0.9) - early termination threshold for matching
  • TRACKING_CONFIDENCE_DECAY (default 0.1) - confidence decay per interpolation factor
  • TRACKING_MAX_HISTORY_SIZE (default 5) - size for history of each tracked object
  • DETECTION_THRESHOLD (default 2) - minimum detections before a track becomes active/sent
  • VIDEO_FILE_PATH (default video.mp4 relative to the /backend folder) - default video file path for the file WebRTC service
  • VIDEO_SOURCE_TYPE (default webcam) - video source for the streamer (webcam or file)
  • STREAMER_OFFER_URL (default http://localhost:8000/offer) - upstream offer URL for the analyzer
  • STUN_SERVER (default stun:stun.l.google.com:19302) - STUN server for WebRTC
  • ICE_GATHERING_TIMEOUT (default 5.0) - timeout for ICE gathering
  • CORS_ORIGINS (default *) - comma separated CORS origins
  • LOG_INTRINSICS (default false) - log resolved intrinsics at runtime
  • ANALYZER_SETTINGS_FILE - path to JSON settings file (default config/analyzer.json)
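To make the TRACKING_IOU_THRESHOLD knob above concrete, here is a minimal intersection-over-union computation of the kind a tracker uses to match detections to existing tracks (a sketch of the standard formula, not the project's implementation):

```python
def iou(box_a, box_b):
    """Intersection-over-union of two (x1, y1, x2, y2) pixel boxes."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0

TRACKING_IOU_THRESHOLD = 0.1  # default from the list above

# Two half-overlapping 100x100 boxes: IoU = 2500 / 17500 ≈ 0.143,
# above the default threshold, so the detection would match the track.
print(iou((0, 0, 100, 100), (50, 50, 150, 150)))
```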

Check src/backend/common/config.py.
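When CAMERA_FX/FY are not set, focal lengths can be derived from the frame size and the fallback field of view. The standard pinhole relation is fx = W / (2 · tan(fov_x / 2)), with the principal point at the image center; a sketch under those assumptions (the backend's actual derivation may differ in detail):

```python
import math

def intrinsics_from_fov(width, height, fov_x_deg=78.0, fov_y_deg=65.0):
    """Approximate pinhole intrinsics (in pixels) from image size and FOV.

    Defaults mirror CAMERA_FOV_X_DEG / CAMERA_FOV_Y_DEG from the list above.
    """
    fx = width / (2.0 * math.tan(math.radians(fov_x_deg) / 2.0))
    fy = height / (2.0 * math.tan(math.radians(fov_y_deg) / 2.0))
    cx, cy = width / 2.0, height / 2.0  # principal point at image center
    return fx, fy, cx, cy

fx, fy, cx, cy = intrinsics_from_fov(1280, 720)
print(fx, fy, cx, cy)
```

Calibrated CAMERA_FX/FY/CX/CY values, when provided, take precedence over this FOV-derived approximation.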

Analyzer settings file (JSON)

The analyzer can load a JSON settings file on startup. If the file does not exist, it falls back to the default config values.

Default path:

  • config/analyzer.json

Override the path:

  • ANALYZER_SETTINGS_FILE=/path/to/analyzer.json

Format:

  • JSON object where keys match the config names in src/backend/common/config.py.
  • Values in the JSON override the defaults and environment variables for the analyzer.

Example config/analyzer.json:

{
  "MODEL_PATH": "models/yolo11n.pt",
  "DETECTOR_BACKEND": "onnx",
  "DEPTH_BACKEND": "depth_anything_v2",
  "DETECTOR_CONF_THRESHOLD": 0.35,
  "TRACKING_IOU_THRESHOLD": 0.2
}
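The loading logic itself lives in src/backend/common/config.py; as an illustrative sketch of the precedence described above (JSON file over environment variables over built-in defaults — the default values shown here are placeholders, not authoritative):

```python
import json
import os
import tempfile

# Placeholder defaults for illustration; the real ones are in config.py.
DEFAULTS = {"DETECTOR_BACKEND": "torch", "DEPTH_BACKEND": "torch"}

def load_settings(path, environ):
    """Merge defaults <- environment <- JSON file (highest precedence)."""
    settings = dict(DEFAULTS)
    settings.update({k: environ[k] for k in DEFAULTS if k in environ})
    try:
        with open(path) as fh:
            settings.update(json.load(fh))
    except FileNotFoundError:
        pass  # missing file: fall back to defaults / environment
    return settings

# The JSON value wins over the env var for the key it sets.
with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as fh:
    json.dump({"DEPTH_BACKEND": "depth_anything_v2"}, fh)
    path = fh.name

merged = load_settings(path, {"DETECTOR_BACKEND": "onnx"})
print(merged)
os.remove(path)
```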

Calibrate depth and XYZ

  • Set camera intrinsics: if you have calibrated values, export them to env vars (pixels): CAMERA_FX, CAMERA_FY, CAMERA_CX, CAMERA_CY. If not, set approximate FOVs: CAMERA_FOV_X_DEG=78 CAMERA_FOV_Y_DEG=65 (defaults). Intrinsics are derived from the first frame size plus these values.
  • Calibrate scale for MiDaS: place a target at a known distance D_true straight ahead, read the reported distance D_est. Update SCALE_FACTOR using SCALE_FACTOR_new = SCALE_FACTOR_old * (D_true / D_est), then restart the analyzer. Repeat once or twice until Z is correct; X/Y will align automatically.
  • Optional: to log the intrinsics resolved at runtime, set LOG_INTRINSICS=true on the analyzer process.
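The SCALE_FACTOR update above is a one-line multiplicative correction; as a worked example:

```python
def updated_scale_factor(old_scale, d_true, d_est):
    """Depth-scale calibration step: SCALE_FACTOR_new = old * (D_true / D_est)."""
    return old_scale * (d_true / d_est)

# Target placed 2.0 m away, analyzer reports 2.5 m with the default
# SCALE_FACTOR of 432.0 -> the factor shrinks by 20%.
print(updated_scale_factor(432.0, d_true=2.0, d_est=2.5))  # 345.6
```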

IMPORTANT: Please read the CONTRIBUTING.md.