Commit f3a2c56

Merge branch 'main' into feat/178-quant

Signed-off-by: Felix Hilgers <felix.hilgers@fau.de>

2 parents: 71958d3 + a14d93e

14 files changed: 784 additions & 90 deletions
README.md

Lines changed: 68 additions & 20 deletions
````diff
@@ -121,30 +121,78 @@ Available CLI flags:
 ### Environment Variables
 
 Optional environment variables:
-- `CAMERA_INDEX` (default 0) – select webcam device
-- `REGION_SIZE` (default 5) – size of the central bounding box region where we take the mean of the depth map from (should be odd for symmetry)
-- `SCALE_FACTOR` (default 432.0) – scaling of the relative depth map generated by MiDaS (must be determined empirically)
-- `CAMERA_FX/FY/CX/CY` – intrinsic matrix entries in pixels (set these when you have calibrated your camera; overrides FOV-derived values)
-- `CAMERA_FOV_X_DEG/CAMERA_FOV_Y_DEG` – fallback field of view (used only when FX/FY are not provided)
-- `DEPTH_BACKEND` – `torch` (default), `onnx`, or `depth_anything_v2`
-- `MIDAS_MODEL_TYPE` – MiDaS variant to load (`MiDaS_small`, `DPT_Hybrid`, `DPT_Large`)
-- `MIDAS_MODEL_REPO` – torch.hub repo for MiDaS (default `intel-isl/MiDaS`)
-- `DEPTH_ANYTHING_MODEL` – Hugging Face model ID for Depth Anything V2 (default `depth-anything/Depth-Anything-V2-Small-hf`)
-- `MIDAS_ONNX_MODEL_PATH` – defaults to `models/midas_small.onnx`
-- `MIDAS_ONNX_PROVIDERS` – comma separated ONNX Runtime providers for depth (falls back to `ONNX_PROVIDERS`)
-- `DETECTOR_BACKEND` – `torch` (default) or `onnx`
-- `TORCH_DEVICE` – force PyTorch to use `cuda:0`, `cpu`, etc. (defaults to best available)
-- `TORCH_HALF_PRECISION` – `auto` (default), `true`, or `false`
-- `ONNX_MODEL_PATH` – defaults to `models/yolo11n.onnx`
-- `ONNX_OPSET` – opset used during ONNX export (default: 18 via `make export-onnx`)
-- `ONNX_SIMPLIFY` – simplify the exported ONNX graph (`true`/`false`, default: true)
-- `ONNX_PROVIDERS` – comma separated list such as `CUDAExecutionProvider,CPUExecutionProvider`
+- `CAMERA_INDEX` (default 0) - select webcam device
+- `REGION_SIZE` (default 5) - size of the central bounding box region where we take the mean of the depth map from (should be odd for symmetry)
+- `SCALE_FACTOR` (default 432.0) - scaling of the relative depth map generated by MiDaS (must be determined empirically)
+- `UPDATE_FREQ` (default 2) - number of frames between depth updates
+- `TARGET_SCALE_INIT` (default 0.8) - initial downscale factor for images
+- `SMOOTH_FACTOR` (default 0.15) - smoothing factor for scale updates
+- `MIN_SCALE` (default 0.2) - minimum allowed scale
+- `MAX_SCALE` (default 1.0) - maximum allowed scale
+- `FPS_THRESHOLD` (default 15.0) - threshold FPS for skipping more frames
+- `DEPTH_ANYTHING_SCALE_FACTOR` (default 0.5) - tunable Depth Anything scale factor
+- `CAMERA_FX/FY/CX/CY` - intrinsic matrix entries in pixels (set these when you have calibrated your camera; overrides FOV-derived values)
+- `CAMERA_FOV_X_DEG/CAMERA_FOV_Y_DEG` - fallback field of view (used only when FX/FY are not provided)
+- `DEPTH_BACKEND` - `torch` (default), `onnx`, or `depth_anything_v2`
+- `MIDAS_MODEL_TYPE` - MiDaS variant to load (`MiDaS_small`, `DPT_Hybrid`, `DPT_Large`)
+- `MIDAS_MODEL_REPO` - torch.hub repo for MiDaS (default `intel-isl/MiDaS`)
+- `MIDAS_CACHE_DIR` - MiDaS cache directory (default `models/midas_cache`)
+- `DEPTH_ANYTHING_MODEL` - Hugging Face model ID for Depth Anything V2 (default `depth-anything/Depth-Anything-V2-Small-hf`)
+- `DEPTH_ANYTHING_CACHE_DIR` - Depth Anything cache directory (default `models/depth_anything_cache`)
+- `MIDAS_ONNX_MODEL_PATH` - defaults to `models/midas_small.onnx`
+- `MIDAS_ONNX_PROVIDERS` - comma separated ONNX Runtime providers for depth (falls back to `ONNX_PROVIDERS`)
+- `DETECTOR_BACKEND` - `torch` (default) or `onnx`
+- `TORCH_DEVICE` - force PyTorch to use `cuda:0`, `cpu`, etc. (defaults to best available)
+- `TORCH_HALF_PRECISION` - `auto` (default), `true`, or `false`
+- `MODEL_PATH` (default `models/yolo11n.pt`) - default YOLO model path (used when no CLI flag is provided)
+- `ONNX_MODEL_PATH` - defaults to `models/yolo11n.onnx`
+- `ONNX_OPSET` - opset used during ONNX export (default: 18 via `make export-onnx`)
+- `ONNX_SIMPLIFY` - simplify the exported ONNX graph (`true`/`false`, default: true)
+- `ONNX_PROVIDERS` - comma separated list such as `CUDAExecutionProvider,CPUExecutionProvider`
 - `DETECTOR_IMAGE_SIZE`, `DETECTOR_CONF_THRESHOLD`, `DETECTOR_IOU_THRESHOLD`, `DETECTOR_MAX_DETECTIONS`, `DETECTOR_NUM_CLASSES`
-- `MODEL_PATH` (default `models/yolo11n.pt`) – default YOLO model path (used when no CLI flag is provided)
-- `VIDEO_FILE_PATH` (default `video.mp4` relative to the `/backend` folder) – default video file path for the file WebRTC service
+- `TRACKING_IOU_THRESHOLD` (default 0.1) - minimum IoU to match detection to track
+- `TRACKING_MAX_FRAMES_WITHOUT_DETECTION` (default 10) - frames before removing stale tracks
+- `TRACKING_EARLY_TERMINATION_IOU` (default 0.9) - early termination threshold for matching
+- `TRACKING_CONFIDENCE_DECAY` (default 0.1) - confidence decay per interpolation factor
+- `TRACKING_MAX_HISTORY_SIZE` (default 5) - history size for each tracked object
+- `DETECTION_THRESHOLD` (default 2) - minimum detections before a track becomes active/sent
+- `VIDEO_FILE_PATH` (default `video.mp4` relative to the `/backend` folder) - default video file path for the file WebRTC service
+- `VIDEO_SOURCE_TYPE` (default `webcam`) - video source for the streamer (`webcam` or `file`)
+- `STREAMER_OFFER_URL` (default `http://localhost:8000/offer`) - upstream offer URL for the analyzer
+- `STUN_SERVER` (default `stun:stun.l.google.com:19302`) - STUN server for WebRTC
+- `ICE_GATHERING_TIMEOUT` (default 5.0) - timeout for ICE gathering
+- `CORS_ORIGINS` (default `*`) - comma separated CORS origins
+- `LOG_INTRINSICS` (default false) - log resolved intrinsics at runtime
+- `ANALYZER_SETTINGS_FILE` - path to JSON settings file (default `config/analyzer.json`)
 
 > Check `src/backend/common/config.py`.
 
+### Analyzer settings file (JSON)
+
+The analyzer can load a JSON settings file on startup. If the file does not
+exist, it falls back to the default config values.
+
+Default path:
+- `config/analyzer.json`
+
+Override the path:
+- `ANALYZER_SETTINGS_FILE=/path/to/analyzer.json`
+
+Format:
+- JSON object where keys match the config names in `src/backend/common/config.py`.
+- Values in the JSON override the defaults and environment variables for the analyzer.
+
+Example `config/analyzer.json`:
+```json
+{
+  "MODEL_PATH": "models/yolo11n.pt",
+  "DETECTOR_BACKEND": "onnx",
+  "DEPTH_BACKEND": "depth_anything_v2",
+  "DETECTOR_CONF_THRESHOLD": 0.35,
+  "TRACKING_IOU_THRESHOLD": 0.2
+}
+```
 
 ### Calibrate depth and XYZ
 - Set camera intrinsics: if you have calibrated values, export them to env vars (pixels): `CAMERA_FX`, `CAMERA_FY`, `CAMERA_CX`, `CAMERA_CY`. If not, set approximate FOVs: `CAMERA_FOV_X_DEG=78 CAMERA_FOV_Y_DEG=65` (defaults). Intrinsics are derived from the first frame size plus these values.
````
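The FOV fallback described above follows the standard pinhole model. As a minimal sketch (the function name is illustrative, and the assumption that `cx`/`cy` default to the image center is mine, not confirmed by the repo):

```python
import math

def intrinsics_from_fov(width: int, height: int,
                        fov_x_deg: float, fov_y_deg: float) -> dict[str, float]:
    """Pinhole model: fx = (W/2) / tan(FOVx/2), and likewise for fy."""
    fx = (width / 2) / math.tan(math.radians(fov_x_deg) / 2)
    fy = (height / 2) / math.tan(math.radians(fov_y_deg) / 2)
    # Principal point assumed at the image center.
    return {"fx": fx, "fy": fy, "cx": width / 2, "cy": height / 2}

k = intrinsics_from_fov(640, 480, 78.0, 65.0)
print(round(k["fx"], 1))  # ≈ 395.2 for the default 78° horizontal FOV
```

Calibrated `CAMERA_FX/FY/CX/CY` values take precedence over anything derived this way.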

config/analyzer.json

Lines changed: 7 additions & 0 deletions
```diff
@@ -0,0 +1,7 @@
+{
+  "MODEL_PATH": "models/yolo11n.pt",
+  "DETECTOR_BACKEND": "torch",
+  "DEPTH_BACKEND": "torch",
+  "DETECTOR_CONF_THRESHOLD": 0.25,
+  "TRACKING_IOU_THRESHOLD": 0.1
+}
```
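Before starting the analyzer, a hand-edited settings file can be sanity-checked with the standard-library `json.tool` module; nothing project-specific is needed:

```shell
# Create (or edit) the settings file, then validate it.
mkdir -p config
cat > config/analyzer.json <<'EOF'
{
  "MODEL_PATH": "models/yolo11n.pt",
  "DETECTOR_BACKEND": "torch",
  "DEPTH_BACKEND": "torch",
  "DETECTOR_CONF_THRESHOLD": 0.25,
  "TRACKING_IOU_THRESHOLD": 0.1
}
EOF
# Pretty-prints the JSON on success; exits non-zero on a syntax error.
python -m json.tool config/analyzer.json
```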

config/analyzer.json.license

Lines changed: 3 additions & 0 deletions
```diff
@@ -0,0 +1,3 @@
+SPDX-FileCopyrightText: 2025 robot-visual-perception
+
+SPDX-License-Identifier: MIT
```

src/backend/analyzer/main.py

Lines changed: 2 additions & 0 deletions
```diff
@@ -25,6 +25,8 @@
 from fastapi.middleware.cors import CORSMiddleware
 
 from common.config import config
+
+config.apply_settings_file(config.ANALYZER_SETTINGS_FILE)
 from common.core.detector import get_detector
 from common.core.depth import get_depth_estimator
 from analyzer.routes import router, on_shutdown
```

src/backend/analyzer/manager.py

Lines changed: 36 additions & 6 deletions
```diff
@@ -259,15 +259,38 @@ async def _process_frames(self, source_track: MediaStreamTrack) -> None:
             target_scale=self.target_scale_init, source_track=source_track
         )
 
+        # Shared frame buffer coordinated via lock + event
+        latest_frame: tuple[int, np.ndarray] | None = None
+        frame_lock = asyncio.Lock()
+        frame_ready = asyncio.Event()
+
+        async def frame_receiver() -> None:
+            """Continuously receive frames and store the latest one with its id."""
+            nonlocal latest_frame
+            while self.active_connections:
+                frame_array = await self._receive_and_convert_frame(state)
+                if frame_array is None:
+                    continue
+
+                async with frame_lock:
+                    latest_frame = (state.frame_id, frame_array)
+                frame_ready.set()
+
+        receiver_task = asyncio.create_task(frame_receiver())
+
         try:
             while self.active_connections:
                 try:
-                    frame_array = await self._receive_and_convert_frame(state)
-                    if frame_array is None:
-                        continue
+                    # Wait until a new frame is available
+                    await frame_ready.wait()
+                    frame_ready.clear()
 
-                    state.frame_id += 1
-                    state.fps_counter += 1
+                    async with frame_lock:
+                        if latest_frame is None:
+                            continue
+                        current_frame_id, frame_array = latest_frame
+                    state.frame_id = current_frame_id
+                    state.fps_counter += 1
 
                     state, current_time = self._update_fps_and_scaling(state)
                     frame_small = resize_frame(frame_array, state.target_scale)
@@ -290,6 +313,12 @@ async def _process_frames(self, source_track: MediaStreamTrack) -> None:
             logger.warning("Frame processing cancelled")
         except Exception as e:
             logger.warning("Processing task error", extra={"error": str(e)})
+        finally:
+            receiver_task.cancel()
+            try:
+                await receiver_task
+            except asyncio.CancelledError:
+                pass
 
     async def _receive_and_convert_frame(
         self, state: ProcessingState
@@ -308,6 +337,7 @@ async def _receive_and_convert_frame(
 
         try:
             frame = await asyncio.wait_for(track.recv(), timeout=5.0)
+            state.frame_id += 1
             state.consecutive_errors = 0
         except asyncio.TimeoutError:
             logger.warning("Frame receive timeout, skipping")
@@ -345,7 +375,7 @@ async def _receive_and_convert_frame(
             return None
 
         try:
-            frame_array = frame.to_ndarray(format="bgr24")  # type: ignore[union-attr]
+            frame_array = frame.to_ndarray(format="bgr24").copy()  # type: ignore[union-attr]
             return frame_array
         except AttributeError:
             logger.warning(
```
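The receiver/consumer split in this diff implements frame coalescing: the receiver task always overwrites a single lock-guarded "latest frame" slot and signals an event, so a slow consumer picks up only the newest frame instead of working through a backlog. A minimal self-contained sketch of the slot (names are illustrative, integers stand in for frames):

```python
import asyncio

async def demo() -> int:
    # One-slot buffer: a lock-guarded "latest" value plus a ready event.
    latest: int | None = None
    lock = asyncio.Lock()
    ready = asyncio.Event()

    async def receiver() -> None:
        nonlocal latest
        for frame_id in range(5):  # frames arrive faster than we consume
            async with lock:
                latest = frame_id  # overwrite: older frames are dropped
            ready.set()

    await receiver()   # let five "frames" arrive before the consumer looks
    await ready.wait()
    ready.clear()
    async with lock:
        assert latest is not None
        frame_id = latest
    return frame_id

print(asyncio.run(demo()))  # 4: frames 0-3 were coalesced away
```

This is also why the diff moves `state.frame_id += 1` into `_receive_and_convert_frame` and adds `.copy()` when converting: ids must count every received frame, and the stored array must not alias a decoder buffer that the next `recv()` may reuse.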

src/backend/common/config.py

Lines changed: 38 additions & 1 deletion
```diff
@@ -1,8 +1,9 @@
 # SPDX-FileCopyrightText: 2025 robot-visual-perception
 #
 # SPDX-License-Identifier: MIT
+import json
 import os
-from typing import Optional
+from typing import Optional, Any
 from pathlib import Path
 
 
@@ -80,6 +81,9 @@ class Config:
     # WebRTC settings
     STUN_SERVER: str = os.getenv("STUN_SERVER", "stun:stun.l.google.com:19302")
     ICE_GATHERING_TIMEOUT: float = float(os.getenv("ICE_GATHERING_TIMEOUT", "5.0"))
+    ANALYZER_SETTINGS_FILE: Path = Path(
+        os.getenv("ANALYZER_SETTINGS_FILE", "config/analyzer.json")
+    )
 
     # Analyzer mode (for analyzer.py)
     STREAMER_OFFER_URL: str = os.getenv(
@@ -116,6 +120,11 @@ class Config:
         for provider in os.getenv("ONNX_PROVIDERS", "").split(",")
         if provider.strip()
     ]
+    ONNX_IO_BINDING: bool = os.getenv("ONNX_IO_BINDING", "false").lower() in (
+        "1",
+        "true",
+        "yes",
+    )
 
     # Tracking/interpolation settings
     # Minimum IoU to match detection to track
@@ -137,5 +146,33 @@ class Config:
     # Minimum detections before a track becomes active/sent
     DETECTION_THRESHOLD: int = int(os.getenv("DETECTION_THRESHOLD", "2"))
 
+    def apply_settings_file(self, path: Path | str | None) -> bool:
+        """Apply analyzer settings from a JSON file if present."""
+        if not path:
+            return False
+        settings_path = Path(path)
+        if not settings_path.is_file():
+            return False
+        with settings_path.open("r", encoding="utf-8") as handle:
+            data = json.load(handle)
+        if not isinstance(data, dict):
+            raise ValueError("Analyzer settings file must contain a JSON object")
+        for key, value in data.items():
+            if not hasattr(self, key):
+                continue
+            current = getattr(self, key)
+            setattr(self, key, _coerce_value(value, current))
+        return True
+
+
+def _coerce_value(value: Any, current: Any) -> Any:
+    if isinstance(current, Path):
+        if value is None:
+            return value
+        return Path(value).expanduser().resolve()
+    if isinstance(current, list) and isinstance(value, str):
+        return [item.strip() for item in value.split(",") if item.strip()]
+    return value
+
 
 config = Config()
```
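The override semantics of `apply_settings_file` can be exercised in isolation: unknown keys are skipped, and each value is coerced toward the type of the existing attribute (comma-separated strings become lists, path strings become `Path`s). A standalone sketch under those assumptions (the `DemoConfig` class and its attributes are illustrative, not the project's `Config`):

```python
import json
import tempfile
from pathlib import Path
from typing import Any

def coerce(value: Any, current: Any) -> Any:
    # Mirror the diff: Paths stay Paths, comma strings become lists.
    if isinstance(current, Path) and value is not None:
        return Path(value).expanduser().resolve()
    if isinstance(current, list) and isinstance(value, str):
        return [item.strip() for item in value.split(",") if item.strip()]
    return value

class DemoConfig:
    DETECTOR_CONF_THRESHOLD: float = 0.25
    ONNX_PROVIDERS: list = []

    def apply_settings_file(self, path: Path) -> bool:
        if not path.is_file():
            return False  # silently fall back to defaults
        data = json.loads(path.read_text(encoding="utf-8"))
        for key, value in data.items():
            if hasattr(self, key):  # unknown keys are ignored
                setattr(self, key, coerce(value, getattr(self, key)))
        return True

cfg = DemoConfig()
with tempfile.TemporaryDirectory() as tmp:
    settings = Path(tmp) / "analyzer.json"
    settings.write_text(json.dumps({
        "DETECTOR_CONF_THRESHOLD": 0.35,
        "ONNX_PROVIDERS": "CUDAExecutionProvider,CPUExecutionProvider",
        "UNKNOWN_KEY": 1,  # ignored: not a config attribute
    }))
    assert cfg.apply_settings_file(settings)

print(cfg.DETECTOR_CONF_THRESHOLD)  # 0.35
print(cfg.ONNX_PROVIDERS)  # ['CUDAExecutionProvider', 'CPUExecutionProvider']
```

Note that because the settings file is applied after the environment is read, JSON values win over both defaults and environment variables, matching the README.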

src/backend/common/core/detector.py

Lines changed: 57 additions & 3 deletions
```diff
@@ -3,7 +3,7 @@
 # SPDX-License-Identifier: MIT
 import asyncio
 from pathlib import Path
-from typing import Optional, Callable
+from typing import Optional, Callable, Any
 import logging
 
 import numpy as np
@@ -228,12 +228,14 @@ def __init__(self, model_path: Optional[Path] = None) -> None:
         self._iou = config.DETECTOR_IOU_THRESHOLD
         self._max_det = config.DETECTOR_MAX_DETECTIONS
         self._num_classes = config.DETECTOR_NUM_CLASSES
+        self._use_io_binding = config.ONNX_IO_BINDING
+        self._io_binding: Optional[Any] = None
+        self._io_device_type, self._io_device_id = self._resolve_io_binding_device()
 
     def predict(self, frame_rgb: np.ndarray) -> list[Detection]:
         """Run ONNX Runtime inference and return scaled, filtered detections."""
         input_tensor, ratio, dwdh = self._prepare_input(frame_rgb)
-        ort_inputs = {self._input_name: input_tensor}
-        outputs = self._session.run(self._output_names, ort_inputs)[0]
+        outputs = self._run_onnx(input_tensor)[0]
         h, w = frame_rgb.shape[:2]
         return self._postprocess(outputs, (h, w), ratio, dwdh)
 
@@ -336,6 +338,58 @@ def _resolve_providers(self) -> list[str]:
         providers = [p for p in preferred if p in available]
         return providers or available
 
+    def _resolve_io_binding_device(self) -> tuple[str, int]:
+        """Pick the device type/id used for IO binding based on providers."""
+        try:
+            providers = self._session.get_providers()
+        except Exception:
+            return ("cpu", 0)
+        provider_map = {
+            "CUDAExecutionProvider": "cuda",
+            "ROCMExecutionProvider": "rocm",
+            "DmlExecutionProvider": "dml",
+        }
+        for provider in providers:
+            if provider in provider_map:
+                return (provider_map[provider], 0)
+        return ("cpu", 0)
+
+    def _run_onnx(self, input_tensor: np.ndarray) -> list[np.ndarray]:
+        """Run ONNX Runtime inference with optional IO binding."""
+        if not self._use_io_binding or self._io_device_type == "cpu":
+            ort_inputs = {self._input_name: input_tensor}
+            return self._session.run(self._output_names, ort_inputs)
+
+        if self._io_binding is None:
+            self._io_binding = self._session.io_binding()
+
+        io_binding = self._io_binding
+        if io_binding is None:
+            ort_inputs = {self._input_name: input_tensor}
+            return self._session.run(self._output_names, ort_inputs)
+
+        try:
+            io_binding.clear_binding_inputs()
+            io_binding.clear_binding_outputs()
+
+            ort_value = ort.OrtValue.ortvalue_from_numpy(
+                input_tensor, self._io_device_type, self._io_device_id
+            )
+            io_binding.bind_ortvalue_input(self._input_name, ort_value)
+            for output_name in self._output_names:
+                io_binding.bind_output(
+                    output_name, self._io_device_type, self._io_device_id
+                )
+            self._session.run_with_iobinding(io_binding)
+            return io_binding.copy_outputs_to_cpu()
+        except Exception as exc:
+            logger.warning(
+                "IO binding failed, falling back to session.run",
+                extra={"error": str(exc)},
+            )
+            ort_inputs = {self._input_name: input_tensor}
+            return self._session.run(self._output_names, ort_inputs)
+
 
 # Register built-in backends
 register_detector_backend("torch", _TorchDetector)
```
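The device resolution in `_resolve_io_binding_device` is a first-match lookup over the session's active providers: the first accelerated provider decides where inputs and outputs are bound, and anything else falls back to CPU (where IO binding is skipped entirely). Extracted as a standalone function, the same logic behaves like this:

```python
def resolve_io_binding_device(providers: list[str]) -> tuple[str, int]:
    """First matching accelerated provider wins; otherwise bind on CPU."""
    provider_map = {
        "CUDAExecutionProvider": "cuda",
        "ROCMExecutionProvider": "rocm",
        "DmlExecutionProvider": "dml",
    }
    for provider in providers:
        if provider in provider_map:
            return (provider_map[provider], 0)
    return ("cpu", 0)

print(resolve_io_binding_device(
    ["CUDAExecutionProvider", "CPUExecutionProvider"]))  # ('cuda', 0)
print(resolve_io_binding_device(["CPUExecutionProvider"]))  # ('cpu', 0)
```

Since `ONNX_IO_BINDING` defaults to false and any failure falls back to plain `session.run`, the feature is strictly opt-in.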
