Real-time visual perception system for robots combining object detection (YOLO) and monocular depth estimation (MiDaS) over low-latency WebRTC streams.
For Windows users, we provide an automated setup script:
```powershell
PowerShell -ExecutionPolicy Bypass -File .\start-optibot.ps1
```

This script will:
- Install all required dependencies (Python 3.11, Node.js, Git, Make, uv)
- Build the project
- Start all services automatically
- Open your browser to http://localhost:3000
Requirements: Administrator privileges
For manual setup or other platforms, see the sections below.
- FastAPI + aiortc backend streaming webcam frames
- YOLOv8n inference (Ultralytics) on the server
- Rough monocular distance estimate per detection
- WebRTC DataChannel sending metadata to the client
- React + Vite frontend showing the remote stream and detection stats
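The detection metadata travels from the analyzer to the frontend over the WebRTC DataChannel as described above. As a rough illustration, a minimal sketch of what one such message might look like — the field names here are assumptions for illustration, not the project's actual schema:

```python
import json

# Hypothetical shape of one DataChannel metadata message; the exact
# field names are assumptions, not the project's actual schema.
message = json.dumps({
    "detections": [
        {
            "label": "person",
            "confidence": 0.91,
            "bbox": [120, 80, 310, 420],  # x1, y1, x2, y2 in pixels
            "distance_m": 1.8,            # rough monocular estimate
        }
    ],
    "fps": 24.5,
})

# The frontend parses the JSON payload and renders per-detection stats.
parsed = json.loads(message)
for det in parsed["detections"]:
    print(f"{det['label']}: ~{det['distance_m']} m (conf {det['confidence']})")
```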
Note: Camera access in Docker only works on Linux.
```sh
make docker-compose-up
```

Prereqs: Python 3.11
- Install dependencies:

  ```sh
  make dev
  ```

- Start the webcam service:

  ```sh
  make run-streamer-webcam
  ```

  Alternatively, if you place a `.mp4` file in the `/backend` folder (at the default path), you can start the file service instead:

  ```sh
  make run-streamer-file
  ```

- Start the analyzer service (in a separate terminal):

  ```sh
  make run-analyzer-local
  ```

- Start the frontend service (in a separate terminal):

  ```sh
  make run-frontend-local
  ```
Open the URL shown in the console in your browser.
```sh
# Download default models (YOLO pt, MiDaS cache)
make download-models

# Download and export to ONNX
make download-models-onnx

# Download individual models
make download-yolo
make download-midas
make download-depth-anything

# Export models to ONNX
make export-yolo-onnx
make export-midas-onnx
```

Export models with FP16 precision for ~50% size reduction:

```sh
ONNX_HALF_PRECISION=true make export-onnx
```

To start the analyzer service with the ONNX backend:

```sh
DETECTOR_BACKEND=onnx DEPTH_BACKEND=onnx make run-analyzer-local
```

To start the analyzer service with the Depth Anything V2 backend:

```sh
DEPTH_BACKEND=depth_anything_v2 make run-analyzer-local
```

Example production usage with a custom model type:
```sh
# Set model type via environment variable
cd src/backend
MIDAS_MODEL_TYPE=DPT_Hybrid uv run python -m analyzer.cli \
  --yolo-model-path ./models/yolo11n.pt \
  --midas-model-path ./models/midas_cache
```

Available CLI flags:
- `--yolo-model-path`: Path to YOLO model file (e.g., `yolo11n.pt`; `yolov8n.pt` still works)
- `--midas-model-path`: Path to MiDaS model cache directory
- `--host`: Host to bind to (default: `0.0.0.0`)
- `--port`: Port to bind to (default: `8001`)
- `--reload`: Enable auto-reload for development
Optional environment variables:
- `CAMERA_INDEX` (default 0) - select webcam device
- `REGION_SIZE` (default 5) - size of the central region of each bounding box over which the depth map is averaged (should be odd for symmetry)
- `SCALE_FACTOR` (default 432.0) - scaling of the relative depth map generated by MiDaS (must be determined empirically)
- `UPDATE_FREQ` (default 2) - number of frames between depth updates
- `TARGET_SCALE_INIT` (default 0.8) - initial downscale factor for images
- `SMOOTH_FACTOR` (default 0.15) - smoothing factor for scale updates
- `MIN_SCALE` (default 0.2) - minimum allowed scale
- `MAX_SCALE` (default 1.0) - maximum allowed scale
- `FPS_THRESHOLD` (default 15.0) - threshold FPS below which more frames are skipped
- `DEPTH_ANYTHING_SCALE_FACTOR` (default 0.5) - tunable Depth Anything scale factor
- `CAMERA_FX`/`FY`/`CX`/`CY` - intrinsic matrix entries in pixels (set these when you have calibrated your camera; overrides FOV-derived values)
- `CAMERA_FOV_X_DEG`/`CAMERA_FOV_Y_DEG` - fallback field of view (used only when FX/FY are not provided)
- `DEPTH_BACKEND` - `torch` (default), `onnx`, or `depth_anything_v2`
- `MIDAS_MODEL_TYPE` - MiDaS variant to load (`MiDaS_small`, `DPT_Hybrid`, `DPT_Large`)
- `MIDAS_MODEL_REPO` - torch.hub repo for MiDaS (default `intel-isl/MiDaS`)
- `MIDAS_CACHE_DIR` - MiDaS cache directory (default `models/midas_cache`)
- `DEPTH_ANYTHING_MODEL` - Hugging Face model ID for Depth Anything V2 (default `depth-anything/Depth-Anything-V2-Small-hf`)
- `DEPTH_ANYTHING_CACHE_DIR` - Depth Anything cache directory (default `models/depth_anything_cache`)
- `MIDAS_ONNX_MODEL_PATH` - defaults to `models/midas_small.onnx`
- `MIDAS_ONNX_INPUT_SIZE` - input size for MiDaS ONNX preprocessing (default: 384)
- `MIDAS_ONNX_PROVIDERS` - comma separated ONNX Runtime providers for depth (falls back to `ONNX_PROVIDERS`)
- `ONNX_SHARED_PREPROCESSING` - reuse one resize step for ONNX detector + depth when sizes align (default: true)
- `DETECTOR_BACKEND` - `torch` (default) or `onnx`
- `TORCH_DEVICE` - force PyTorch to use `cuda:0`, `cpu`, etc. (defaults to best available)
- `TORCH_HALF_PRECISION` - `auto` (default), `true`, or `false`
- `MODEL_PATH` (default `models/yolo11n.pt`) - default YOLO model path (used when no CLI flag is provided)
- `ONNX_MODEL_PATH` - defaults to `models/yolo11n.onnx`
- `ONNX_OPSET` - opset used during ONNX export (default: 18 via `make export-onnx`)
- `ONNX_SIMPLIFY` - simplify the exported ONNX graph (true/false, default: true)
- `ONNX_PROVIDERS` - comma separated list such as `CUDAExecutionProvider,CPUExecutionProvider`
- `DETECTOR_IMAGE_SIZE`, `DETECTOR_CONF_THRESHOLD`, `DETECTOR_IOU_THRESHOLD`, `DETECTOR_MAX_DETECTIONS`, `DETECTOR_NUM_CLASSES`
- `TRACKING_IOU_THRESHOLD` (default 0.1) - minimum IoU to match a detection to a track
- `TRACKING_MAX_FRAMES_WITHOUT_DETECTION` (default 10) - frames before removing stale tracks
- `TRACKING_EARLY_TERMINATION_IOU` (default 0.9) - early termination threshold for matching
- `TRACKING_CONFIDENCE_DECAY` (default 0.1) - confidence decay per interpolation factor
- `TRACKING_MAX_HISTORY_SIZE` (default 5) - history size for each tracked object
- `DETECTION_THRESHOLD` (default 2) - minimum detections before a track becomes active/sent
- `VIDEO_FILE_PATH` (default `video.mp4` relative to the `/backend` folder) - default video file path for the file WebRTC service
- `VIDEO_SOURCE_TYPE` (default `webcam`) - video source for the streamer (`webcam` or `file`)
- `STREAMER_OFFER_URL` (default `http://localhost:8000/offer`) - upstream offer URL for the analyzer
- `STUN_SERVER` (default `stun:stun.l.google.com:19302`) - STUN server for WebRTC
- `ICE_GATHERING_TIMEOUT` (default 5.0) - timeout for ICE gathering
- `CORS_ORIGINS` (default `*`) - comma separated CORS origins
- `LOG_INTRINSICS` (default false) - log resolved intrinsics at runtime
- `ANALYZER_SETTINGS_FILE` - path to JSON settings file (default `config/analyzer.json`)
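To make the `REGION_SIZE` setting concrete, here is a minimal sketch of averaging the depth map over a small central window of a bounding box; the function name and the synthetic data are illustrative, not the project's actual implementation:

```python
import numpy as np

def region_depth(depth_map: np.ndarray, cx: int, cy: int,
                 region_size: int = 5) -> float:
    """Mean of the depth map over an odd-sized window centered on
    (cx, cy), clipped to the image bounds."""
    half = region_size // 2
    h, w = depth_map.shape
    y0, y1 = max(0, cy - half), min(h, cy + half + 1)
    x0, x1 = max(0, cx - half), min(w, cx + half + 1)
    return float(depth_map[y0:y1, x0:x1].mean())

# A small odd-sized window (default 5x5) keeps the estimate symmetric
# around the box center and robust to single-pixel depth noise.
depth_map = np.full((480, 640), 2.0)
center_depth = region_depth(depth_map, cx=320, cy=240, region_size=5)
```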
Check `src/backend/common/config.py`.
The analyzer can load a JSON settings file on startup. If the file does not exist, it falls back to the default config values.
Default path: `config/analyzer.json`

Override the path:

```sh
ANALYZER_SETTINGS_FILE=/path/to/analyzer.json
```
Format:
- JSON object where keys match the config names in `src/backend/common/config.py`.
- Values in the JSON override the defaults and environment variables for the analyzer.
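That precedence (built-in defaults, then environment variables, then the JSON file) can be sketched as follows; the defaults shown are illustrative only — the real ones live in `src/backend/common/config.py`, and `load_settings` is a hypothetical helper, not the project's actual loader:

```python
import json
import os
from pathlib import Path

# Illustrative defaults; the real values live in src/backend/common/config.py.
DEFAULTS = {"DETECTOR_BACKEND": "torch", "DEPTH_BACKEND": "torch"}

def load_settings(default_path: str = "config/analyzer.json") -> dict:
    settings = dict(DEFAULTS)
    # Environment variables override the built-in defaults...
    for key in DEFAULTS:
        if key in os.environ:
            settings[key] = os.environ[key]
    # ...and the JSON settings file, when it exists, overrides both.
    path = Path(os.environ.get("ANALYZER_SETTINGS_FILE", default_path))
    if path.exists():
        settings.update(json.loads(path.read_text()))
    return settings
```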
Example `config/analyzer.json`:

```json
{
  "MODEL_PATH": "models/yolo11n.pt",
  "DETECTOR_BACKEND": "onnx",
  "DEPTH_BACKEND": "depth_anything_v2",
  "DETECTOR_CONF_THRESHOLD": 0.35,
  "TRACKING_IOU_THRESHOLD": 0.2
}
```

- Set camera intrinsics: if you have calibrated values, export them as env vars (in pixels): `CAMERA_FX`, `CAMERA_FY`, `CAMERA_CX`, `CAMERA_CY`. If not, set approximate FOVs: `CAMERA_FOV_X_DEG=78 CAMERA_FOV_Y_DEG=65` (the defaults). Intrinsics are derived from the first frame size plus these values.
- Calibrate the MiDaS scale: place a target at a known distance `D_true` straight ahead and read the reported distance `D_est`. Update `SCALE_FACTOR` using `SCALE_FACTOR_new = SCALE_FACTOR_old * (D_true / D_est)`, then restart the analyzer. Repeat once or twice until Z is correct; X/Y will align automatically.
- Optional: to log the intrinsics resolved at runtime, set `LOG_INTRINSICS=true` on the analyzer process.
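The FOV-based intrinsics fallback and the scale-calibration update above can be worked through numerically; the function names here are hypothetical helpers, but the formulas follow the pinhole model and the `SCALE_FACTOR` update rule stated in the text:

```python
import math

def fx_from_fov(width_px: int, fov_x_deg: float) -> float:
    # Pinhole model: fx = (W / 2) / tan(FOV_x / 2); same idea applies
    # to fy with the frame height and vertical FOV.
    return (width_px / 2) / math.tan(math.radians(fov_x_deg) / 2)

# 640px-wide frames with the default 78 degree horizontal FOV.
fx = fx_from_fov(640, 78.0)  # ~395 px

def update_scale(scale_old: float, d_true: float, d_est: float) -> float:
    # One calibration step: rescale so the reported distance matches reality.
    return scale_old * (d_true / d_est)

# Target actually at 2.0 m, analyzer reported 1.6 m:
new_scale = update_scale(432.0, d_true=2.0, d_est=1.6)  # 540.0
```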
IMPORTANT: Please read `CONTRIBUTING.md`.