Commit b483999

Merge branch 'main' into feat/201-model-align

Signed-off-by: Felix Hilgers <felix.hilgers@fau.de>

2 parents cae6709 + e2d3c87

17 files changed: 914 additions & 98 deletions

README.md

Lines changed: 76 additions & 20 deletions
````diff
@@ -84,6 +84,14 @@ make export-yolo-onnx
 make export-midas-onnx
 ```
 
+### FP16 Quantization (Optional)
+
+Export models with FP16 precision for ~50% size reduction:
+
+```bash
+ONNX_HALF_PRECISION=true make export-onnx
+```
+
 To start the analyzer service with ONNX backend:
 ```bash
 DETECTOR_BACKEND=onnx DEPTH_BACKEND=onnx make run-analyzer-local
@@ -113,32 +121,80 @@ Available CLI flags:
 ### Environment Variables
 
 Optional environment variables:
-- `CAMERA_INDEX` (default 0) – select webcam device
-- `REGION_SIZE` (default 5) – size of the central bounding box region where we take the mean of the depth map from (should be odd for symmetry)
-- `SCALE_FACTOR` (default 432.0) – scaling of the relative depth map generated by MiDaS (must be determined empirically)
-- `CAMERA_FX/FY/CX/CY` – intrinsic matrix entries in pixels (set these when you have calibrated your camera; overrides FOV-derived values)
-- `CAMERA_FOV_X_DEG/CAMERA_FOV_Y_DEG` – fallback field of view (used only when FX/FY are not provided)
-- `DEPTH_BACKEND` – `torch` (default), `onnx`, or `depth_anything_v2`
-- `MIDAS_MODEL_TYPE` – MiDaS variant to load (`MiDaS_small`, `DPT_Hybrid`, `DPT_Large`)
-- `MIDAS_MODEL_REPO` – torch.hub repo for MiDaS (default `intel-isl/MiDaS`)
-- `DEPTH_ANYTHING_MODEL` – Hugging Face model ID for Depth Anything V2 (default `depth-anything/Depth-Anything-V2-Small-hf`)
-- `MIDAS_ONNX_MODEL_PATH` – defaults to `models/midas_small.onnx`
+- `CAMERA_INDEX` (default 0) - select webcam device
+- `REGION_SIZE` (default 5) - size of the central bounding box region where we take the mean of the depth map from (should be odd for symmetry)
+- `SCALE_FACTOR` (default 432.0) - scaling of the relative depth map generated by MiDaS (must be determined empirically)
+- `UPDATE_FREQ` (default 2) - number of frames between depth updates
+- `TARGET_SCALE_INIT` (default 0.8) - initial downscale factor for images
+- `SMOOTH_FACTOR` (default 0.15) - smoothing factor for scale updates
+- `MIN_SCALE` (default 0.2) - minimum allowed scale
+- `MAX_SCALE` (default 1.0) - maximum allowed scale
+- `FPS_THRESHOLD` (default 15.0) - threshold FPS for skipping more frames
+- `DEPTH_ANYTHING_SCALE_FACTOR` (default 0.5) - tunable Depth Anything scale factor
+- `CAMERA_FX/FY/CX/CY` - intrinsic matrix entries in pixels (set these when you have calibrated your camera; overrides FOV-derived values)
+- `CAMERA_FOV_X_DEG/CAMERA_FOV_Y_DEG` - fallback field of view (used only when FX/FY are not provided)
+- `DEPTH_BACKEND` - `torch` (default), `onnx`, or `depth_anything_v2`
+- `MIDAS_MODEL_TYPE` - MiDaS variant to load (`MiDaS_small`, `DPT_Hybrid`, `DPT_Large`)
+- `MIDAS_MODEL_REPO` - torch.hub repo for MiDaS (default `intel-isl/MiDaS`)
+- `MIDAS_CACHE_DIR` - MiDaS cache directory (default `models/midas_cache`)
+- `DEPTH_ANYTHING_MODEL` - Hugging Face model ID for Depth Anything V2 (default `depth-anything/Depth-Anything-V2-Small-hf`)
+- `DEPTH_ANYTHING_CACHE_DIR` - Depth Anything cache directory (default `models/depth_anything_cache`)
+- `MIDAS_ONNX_MODEL_PATH` - defaults to `models/midas_small.onnx`
 - `MIDAS_ONNX_INPUT_SIZE` – input size for MiDaS ONNX preprocessing (default: `384`)
-- `MIDAS_ONNX_PROVIDERS` comma separated ONNX Runtime providers for depth (falls back to `ONNX_PROVIDERS`)
+- `MIDAS_ONNX_PROVIDERS` - comma separated ONNX Runtime providers for depth (falls back to `ONNX_PROVIDERS`)
 - `ONNX_SHARED_PREPROCESSING` – reuse one resize step for ONNX detector + depth when sizes align (default: `true`)
-- `DETECTOR_BACKEND` – `torch` (default) or `onnx`
-- `TORCH_DEVICE` – force PyTorch to use `cuda:0`, `cpu`, etc. (defaults to best available)
-- `TORCH_HALF_PRECISION` – `auto` (default), `true`, or `false`
-- `ONNX_MODEL_PATH` – defaults to `models/yolo11n.onnx`
-- `ONNX_OPSET` – opset used during ONNX export (default: 18 via `make export-onnx`)
-- `ONNX_SIMPLIFY` – simplify the exported ONNX graph (`true`/`false`, default: true)
-- `ONNX_PROVIDERS` – comma separated list such as `CUDAExecutionProvider,CPUExecutionProvider`
+- `DETECTOR_BACKEND` - `torch` (default) or `onnx`
+- `TORCH_DEVICE` - force PyTorch to use `cuda:0`, `cpu`, etc. (defaults to best available)
+- `TORCH_HALF_PRECISION` - `auto` (default), `true`, or `false`
+- `MODEL_PATH` (default `models/yolo11n.pt`) - default YOLO model path (used when no CLI flag is provided)
+- `ONNX_MODEL_PATH` - defaults to `models/yolo11n.onnx`
+- `ONNX_OPSET` - opset used during ONNX export (default: 18 via `make export-onnx`)
+- `ONNX_SIMPLIFY` - simplify the exported ONNX graph (`true`/`false`, default: true)
+- `ONNX_PROVIDERS` - comma separated list such as `CUDAExecutionProvider,CPUExecutionProvider`
 - `DETECTOR_IMAGE_SIZE`, `DETECTOR_CONF_THRESHOLD`, `DETECTOR_IOU_THRESHOLD`, `DETECTOR_MAX_DETECTIONS`, `DETECTOR_NUM_CLASSES`
-- `MODEL_PATH` (default `models/yolo11n.pt`) – default YOLO model path (used when no CLI flag is provided)
-- `VIDEO_FILE_PATH` (default `video.mp4` relative to the `/backend` folder) – default video file path for the file WebRTC service
+- `TRACKING_IOU_THRESHOLD` (default 0.1) - minimum IoU to match detection to track
+- `TRACKING_MAX_FRAMES_WITHOUT_DETECTION` (default 10) - frames before removing stale tracks
+- `TRACKING_EARLY_TERMINATION_IOU` (default 0.9) - early termination threshold for matching
+- `TRACKING_CONFIDENCE_DECAY` (default 0.1) - confidence decay per interpolation factor
+- `TRACKING_MAX_HISTORY_SIZE` (default 5) - size for history of each tracked object
+- `DETECTION_THRESHOLD` (default 2) - minimum detections before a track becomes active/sent
+- `VIDEO_FILE_PATH` (default `video.mp4` relative to the `/backend` folder) - default video file path for the file WebRTC service
+- `VIDEO_SOURCE_TYPE` (default `webcam`) - video source for the streamer (`webcam` or `file`)
+- `STREAMER_OFFER_URL` (default `http://localhost:8000/offer`) - upstream offer URL for the analyzer
+- `STUN_SERVER` (default `stun:stun.l.google.com:19302`) - STUN server for WebRTC
+- `ICE_GATHERING_TIMEOUT` (default 5.0) - timeout for ICE gathering
+- `CORS_ORIGINS` (default `*`) - comma separated CORS origins
+- `LOG_INTRINSICS` (default false) - log resolved intrinsics at runtime
+- `ANALYZER_SETTINGS_FILE` - path to JSON settings file (default `config/analyzer.json`)
 
 > Check `src/backend/common/config.py`.
 
+### Analyzer settings file (JSON)
+
+The analyzer can load a JSON settings file on startup. If the file does not
+exist, it falls back to the default config values.
+
+Default path:
+- `config/analyzer.json`
+
+Override the path:
+- `ANALYZER_SETTINGS_FILE=/path/to/analyzer.json`
+
+Format:
+- JSON object where keys match the config names in `src/backend/common/config.py`.
+- Values in the JSON override the defaults and environment variables for the analyzer.
+
+Example `config/analyzer.json`:
+```json
+{
+  "MODEL_PATH": "models/yolo11n.pt",
+  "DETECTOR_BACKEND": "onnx",
+  "DEPTH_BACKEND": "depth_anything_v2",
+  "DETECTOR_CONF_THRESHOLD": 0.35,
+  "TRACKING_IOU_THRESHOLD": 0.2
+}
+```
 
 ### Calibrate depth and XYZ
 - Set camera intrinsics: if you have calibrated values, export them to env vars (pixels): `CAMERA_FX`, `CAMERA_FY`, `CAMERA_CX`, `CAMERA_CY`. If not, set approximate FOVs: `CAMERA_FOV_X_DEG=78 CAMERA_FOV_Y_DEG=65` (defaults). Intrinsics are derived from the first frame size plus these values.
````

config/analyzer.json

Lines changed: 7 additions & 0 deletions
New file:

```diff
@@ -0,0 +1,7 @@
+{
+  "MODEL_PATH": "models/yolo11n.pt",
+  "DETECTOR_BACKEND": "torch",
+  "DEPTH_BACKEND": "torch",
+  "DETECTOR_CONF_THRESHOLD": 0.25,
+  "TRACKING_IOU_THRESHOLD": 0.1
+}
```

config/analyzer.json.license

Lines changed: 3 additions & 0 deletions
New file:

```diff
@@ -0,0 +1,3 @@
+SPDX-FileCopyrightText: 2025 robot-visual-perception
+
+SPDX-License-Identifier: MIT
```

scripts/download_models.py

Lines changed: 3 additions & 1 deletion
```diff
@@ -164,7 +164,8 @@ def main() -> None:
         output_path=yolo_onnx_target,
         opset=args.onnx_opset,
         imgsz=config.DETECTOR_IMAGE_SIZE,
-        simplify=args.onnx_simplify
+        simplify=args.onnx_simplify,
+        half=config.ONNX_HALF_PRECISION,
     )
 
     # --- MiDaS Processing ---
@@ -199,6 +200,7 @@ def main() -> None:
         model_repo=args.midas_repo,
         opset=args.onnx_opset,
         input_size=config.MIDAS_ONNX_INPUT_SIZE,
+        half=config.ONNX_HALF_PRECISION,
     )
 
     # --- Depth Anything Processing ---
```
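The new `half` argument drives FP16 export for both models. The "~50% size reduction" claimed in the README follows directly from element width, since weights dominate ONNX file size; a quick stdlib check (the parameter count is a rough, approximate figure for YOLO11n, not taken from this repo):

```python
import struct

# IEEE 754 single vs half precision: 4 bytes vs 2 bytes per weight
fp32_bytes = struct.calcsize("f")
fp16_bytes = struct.calcsize("e")
print(fp32_bytes, fp16_bytes)  # 4 2

# For a tensor of N parameters, storage shrinks by half:
n_params = 2_600_000  # roughly YOLO11n's parameter count (approximate)
print(n_params * fp32_bytes / 1e6, "MB ->", n_params * fp16_bytes / 1e6, "MB")
```

The trade-off is reduced numeric range and precision, which is why the flag defaults to `false`.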

src/backend/analyzer/main.py

Lines changed: 2 additions & 0 deletions
```diff
@@ -25,6 +25,8 @@
 from fastapi.middleware.cors import CORSMiddleware
 
 from common.config import config
+
+config.apply_settings_file(config.ANALYZER_SETTINGS_FILE)
 from common.core.detector import get_detector
 from common.core.depth import get_depth_estimator
 from analyzer.routes import router, on_shutdown
```
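Note the placement: `apply_settings_file` runs before the detector and depth imports, so any module that reads `config` while being imported sees the overridden values. A toy illustration of why that ordering matters (all names here are hypothetical, standing in for the real modules):

```python
class Config:
    DETECTOR_BACKEND = "torch"

config = Config()

def apply_settings(overrides: dict) -> None:
    """Stand-in for config.apply_settings_file: mutate config in place."""
    for key, value in overrides.items():
        setattr(config, key, value)

def import_detector_module() -> str:
    """Simulates a module that snapshots a config value at import time."""
    return config.DETECTOR_BACKEND

# Overrides applied BEFORE the simulated import are visible to it:
apply_settings({"DETECTOR_BACKEND": "onnx"})
backend = import_detector_module()
print(backend)  # "onnx"
```

Had the settings been applied after the imports, any import-time snapshot would still hold the defaults.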

src/backend/analyzer/manager.py

Lines changed: 36 additions & 6 deletions
```diff
@@ -274,15 +274,38 @@ async def _process_frames(self, source_track: MediaStreamTrack) -> None:
             target_scale=self.target_scale_init, source_track=source_track
         )
 
+        # Shared frame buffer coordinated via lock + event
+        latest_frame: tuple[int, np.ndarray] | None = None
+        frame_lock = asyncio.Lock()
+        frame_ready = asyncio.Event()
+
+        async def frame_receiver() -> None:
+            """Continuously receive frames and store the latest one with its id."""
+            nonlocal latest_frame
+            while self.active_connections:
+                frame_array = await self._receive_and_convert_frame(state)
+                if frame_array is None:
+                    continue
+
+                async with frame_lock:
+                    latest_frame = (state.frame_id, frame_array)
+                frame_ready.set()
+
+        receiver_task = asyncio.create_task(frame_receiver())
+
         try:
             while self.active_connections:
                 try:
-                    frame_array = await self._receive_and_convert_frame(state)
-                    if frame_array is None:
-                        continue
+                    # Wait until a new frame is available
+                    await frame_ready.wait()
+                    frame_ready.clear()
 
-                    state.frame_id += 1
-                    state.fps_counter += 1
+                    async with frame_lock:
+                        if latest_frame is None:
+                            continue
+                        current_frame_id, frame_array = latest_frame
+                    state.frame_id = current_frame_id
+                    state.fps_counter += 1
 
                     state, current_time = self._update_fps_and_scaling(state)
                     frame_small = resize_frame(frame_array, state.target_scale)
@@ -310,6 +333,12 @@ async def _process_frames(self, source_track: MediaStreamTrack) -> None:
             logger.warning("Frame processing cancelled")
         except Exception as e:
             logger.warning("Processing task error", extra={"error": str(e)})
+        finally:
+            receiver_task.cancel()
+            try:
+                await receiver_task
+            except asyncio.CancelledError:
+                pass
 
     async def _receive_and_convert_frame(
         self, state: ProcessingState
@@ -328,6 +357,7 @@ async def _receive_and_convert_frame(
 
         try:
             frame = await asyncio.wait_for(track.recv(), timeout=5.0)
+            state.frame_id += 1
             state.consecutive_errors = 0
         except asyncio.TimeoutError:
             logger.warning("Frame receive timeout, skipping")
@@ -365,7 +395,7 @@ async def _receive_and_convert_frame(
             return None
 
         try:
-            frame_array = frame.to_ndarray(format="bgr24")  # type: ignore[union-attr]
+            frame_array = frame.to_ndarray(format="bgr24").copy()  # type: ignore[union-attr]
             return frame_array
         except AttributeError:
             logger.warning(
```
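The change above decouples frame receipt from frame processing: a dedicated receiver task overwrites a single "latest frame" slot, so a slow processing loop drops stale frames instead of queueing them. The coordination pattern can be sketched in isolation (a deterministic toy, not the project's code — the producer finishes before the consumer runs, standing in for a camera that outpaces inference):

```python
import asyncio

async def main() -> list[int]:
    latest = None                 # single-slot buffer, newest value wins
    lock = asyncio.Lock()
    ready = asyncio.Event()
    processed: list[int] = []

    async def producer() -> None:
        nonlocal latest
        for frame_id in range(5):
            async with lock:
                latest = frame_id  # overwrite: older frames are discarded
            ready.set()

    async def consumer() -> None:
        await ready.wait()         # block until at least one frame arrived
        ready.clear()
        async with lock:
            processed.append(latest)

    await producer()               # fast producer runs to completion first
    await consumer()               # consumer then sees only the newest frame
    return processed

result = asyncio.run(main())
print(result)  # [4] — frames 0..3 were dropped, never queued
```

Contrast with an `asyncio.Queue`, which would buffer all five frames and force the consumer to drain stale ones; the lock+event slot trades completeness for latency, which suits live video.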

src/backend/common/config.py

Lines changed: 43 additions & 1 deletion
```diff
@@ -1,8 +1,9 @@
 # SPDX-FileCopyrightText: 2025 robot-visual-perception
 #
 # SPDX-License-Identifier: MIT
+import json
 import os
-from typing import Optional
+from typing import Optional, Any
 from pathlib import Path
 
 
@@ -81,6 +82,9 @@ class Config:
     # WebRTC settings
     STUN_SERVER: str = os.getenv("STUN_SERVER", "stun:stun.l.google.com:19302")
     ICE_GATHERING_TIMEOUT: float = float(os.getenv("ICE_GATHERING_TIMEOUT", "5.0"))
+    ANALYZER_SETTINGS_FILE: Path = Path(
+        os.getenv("ANALYZER_SETTINGS_FILE", "config/analyzer.json")
+    )
 
     # Analyzer mode (for analyzer.py)
     STREAMER_OFFER_URL: str = os.getenv(
@@ -107,6 +111,11 @@ class Config:
     DETECTOR_NUM_CLASSES: int = int(os.getenv("DETECTOR_NUM_CLASSES", "80"))
     TORCH_DEVICE: Optional[str] = os.getenv("TORCH_DEVICE")
     TORCH_HALF_PRECISION: str = os.getenv("TORCH_HALF_PRECISION", "auto")
+    ONNX_HALF_PRECISION: bool = os.getenv("ONNX_HALF_PRECISION", "false").lower() in (
+        "1",
+        "true",
+        "yes",
+    )
     ONNX_PROVIDERS: list[str] = [
         provider.strip()
         for provider in os.getenv("ONNX_PROVIDERS", "").split(",")
@@ -115,6 +124,11 @@ class Config:
     ONNX_SHARED_PREPROCESSING: bool = os.getenv(
         "ONNX_SHARED_PREPROCESSING", "true"
     ).lower() in ("1", "true", "yes")
+    ONNX_IO_BINDING: bool = os.getenv("ONNX_IO_BINDING", "false").lower() in (
+        "1",
+        "true",
+        "yes",
+    )
 
     # Tracking/interpolation settings
     # Minimum IoU to match detection to track
@@ -136,5 +150,33 @@ class Config:
     # Minimum detections before a track becomes active/sent
     DETECTION_THRESHOLD: int = int(os.getenv("DETECTION_THRESHOLD", "2"))
 
+    def apply_settings_file(self, path: Path | str | None) -> bool:
+        """Apply analyzer settings from a JSON file if present."""
+        if not path:
+            return False
+        settings_path = Path(path)
+        if not settings_path.is_file():
+            return False
+        with settings_path.open("r", encoding="utf-8") as handle:
+            data = json.load(handle)
+        if not isinstance(data, dict):
+            raise ValueError("Analyzer settings file must contain a JSON object")
+        for key, value in data.items():
+            if not hasattr(self, key):
+                continue
+            current = getattr(self, key)
+            setattr(self, key, _coerce_value(value, current))
+        return True
+
+
+def _coerce_value(value: Any, current: Any) -> Any:
+    if isinstance(current, Path):
+        if value is None:
+            return value
+        return Path(value).expanduser().resolve()
+    if isinstance(current, list) and isinstance(value, str):
+        return [item.strip() for item in value.split(",") if item.strip()]
+    return value
+
 
 config = Config()
```
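The settings loader above has three notable behaviors: unknown keys are silently skipped, values are coerced to the type of the existing attribute, and comma-separated strings become lists. A simplified, self-contained mirror of that logic (`MiniConfig` is illustrative; it omits the real code's `expanduser().resolve()` and file I/O):

```python
from pathlib import Path
from typing import Any

class MiniConfig:
    MODEL_PATH: Path = Path("models/yolo11n.pt")
    ONNX_PROVIDERS: list[str] = ["CPUExecutionProvider"]
    DETECTOR_CONF_THRESHOLD: float = 0.25

    def apply(self, data: dict[str, Any]) -> None:
        for key, value in data.items():
            if not hasattr(self, key):
                continue  # unknown keys are silently ignored
            current = getattr(self, key)
            if isinstance(current, Path):
                value = Path(value)  # coerce strings to Path
            elif isinstance(current, list) and isinstance(value, str):
                # comma-separated string -> list of stripped items
                value = [item.strip() for item in value.split(",") if item.strip()]
            setattr(self, key, value)

cfg = MiniConfig()
cfg.apply({
    "DETECTOR_CONF_THRESHOLD": 0.35,
    "ONNX_PROVIDERS": "CUDAExecutionProvider, CPUExecutionProvider",
    "NOT_A_SETTING": 1,  # dropped: no matching attribute on the config
})
print(cfg.DETECTOR_CONF_THRESHOLD, cfg.ONNX_PROVIDERS)
```

Silently skipping unknown keys keeps old settings files forward-compatible, at the cost of hiding typos in key names.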

src/backend/common/core/detector.py

Lines changed: 57 additions & 3 deletions
```diff
@@ -3,7 +3,7 @@
 # SPDX-License-Identifier: MIT
 import asyncio
 from pathlib import Path
-from typing import Optional, Callable
+from typing import Optional, Callable, Any
 import logging
 
 import numpy as np
@@ -270,12 +270,14 @@ def __init__(self, model_path: Optional[Path] = None) -> None:
         self._iou = config.DETECTOR_IOU_THRESHOLD
         self._max_det = config.DETECTOR_MAX_DETECTIONS
         self._num_classes = config.DETECTOR_NUM_CLASSES
+        self._use_io_binding = config.ONNX_IO_BINDING
+        self._io_binding: Optional[Any] = None
+        self._io_device_type, self._io_device_id = self._resolve_io_binding_device()
 
     def predict(self, frame_rgb: np.ndarray) -> list[Detection]:
         """Run ONNX Runtime inference and return scaled, filtered detections."""
         input_tensor, ratio, dwdh = self._prepare_input(frame_rgb)
-        ort_inputs = {self._input_name: input_tensor}
-        outputs = self._session.run(self._output_names, ort_inputs)[0]
+        outputs = self._run_onnx(input_tensor)[0]
         h, w = frame_rgb.shape[:2]
         return self._postprocess(outputs, (h, w), ratio, dwdh)
 
@@ -395,6 +397,58 @@ def _resolve_providers(self) -> list[str]:
         providers = [p for p in preferred if p in available]
         return providers or available
 
+    def _resolve_io_binding_device(self) -> tuple[str, int]:
+        """Pick the device type/id used for IO binding based on providers."""
+        try:
+            providers = self._session.get_providers()
+        except Exception:
+            return ("cpu", 0)
+        provider_map = {
+            "CUDAExecutionProvider": "cuda",
+            "ROCMExecutionProvider": "rocm",
+            "DmlExecutionProvider": "dml",
+        }
+        for provider in providers:
+            if provider in provider_map:
+                return (provider_map[provider], 0)
+        return ("cpu", 0)
+
+    def _run_onnx(self, input_tensor: np.ndarray) -> list[np.ndarray]:
+        """Run ONNX Runtime inference with optional IO binding."""
+        if not self._use_io_binding or self._io_device_type == "cpu":
+            ort_inputs = {self._input_name: input_tensor}
+            return self._session.run(self._output_names, ort_inputs)
+
+        if self._io_binding is None:
+            self._io_binding = self._session.io_binding()
+
+        io_binding = self._io_binding
+        if io_binding is None:
+            ort_inputs = {self._input_name: input_tensor}
+            return self._session.run(self._output_names, ort_inputs)
+
+        try:
+            io_binding.clear_binding_inputs()
+            io_binding.clear_binding_outputs()
+
+            ort_value = ort.OrtValue.ortvalue_from_numpy(
+                input_tensor, self._io_device_type, self._io_device_id
+            )
+            io_binding.bind_ortvalue_input(self._input_name, ort_value)
+            for output_name in self._output_names:
+                io_binding.bind_output(
+                    output_name, self._io_device_type, self._io_device_id
+                )
+            self._session.run_with_iobinding(io_binding)
+            return io_binding.copy_outputs_to_cpu()
+        except Exception as exc:
+            logger.warning(
+                "IO binding failed, falling back to session.run",
+                extra={"error": str(exc)},
+            )
+            ort_inputs = {self._input_name: input_tensor}
+            return self._session.run(self._output_names, ort_inputs)
+
 
 # Register built-in backends
 register_detector_backend("torch", _TorchDetector)
```
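The device-resolution step from the diff above can be exercised standalone. IO binding only pays off when inputs and outputs live on an accelerator, so a CPU-only provider list resolves to `("cpu", 0)` and `_run_onnx` then falls back to plain `session.run`. A self-contained version of that mapping (no ONNX Runtime session needed):

```python
def resolve_io_binding_device(providers: list[str]) -> tuple[str, int]:
    """Map ONNX Runtime execution providers to an IO-binding device type/id.

    First accelerator provider wins; anything else resolves to CPU.
    """
    provider_map = {
        "CUDAExecutionProvider": "cuda",
        "ROCMExecutionProvider": "rocm",
        "DmlExecutionProvider": "dml",
    }
    for provider in providers:
        if provider in provider_map:
            return (provider_map[provider], 0)
    return ("cpu", 0)

print(resolve_io_binding_device(["CUDAExecutionProvider", "CPUExecutionProvider"]))
# ('cuda', 0)
print(resolve_io_binding_device(["CPUExecutionProvider"]))
# ('cpu', 0)
```

Caching the `io_binding` object across calls (as `__init__` does via `self._io_binding`) avoids recreating bindings per frame, while the broad `except` keeps inference alive if binding fails on an exotic provider.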
