This example is the best RVC4 reference for preserving detail when the target object is small in the frame. It compares three strategies side by side: naive face detection on a low-resolution stream, two-stage person-then-face detection on high-resolution crops, and brute-force tiling over the high-resolution image.
Use it when:
- You need a two-stage detector pipeline that crops from a higher-resolution source.
- You want a reference for comparing naive inference against crop-based or tiling-based approaches.
- You need custom host nodes for selecting detections, switching outputs, and merging tiled results.
- You want a standalone-only RVC4 pipeline that stays close to the default Visualizer runtime.
Avoid it when:
- You need to run on RVC2 in host/peripheral mode.
- You need a finished custom frontend as the primary runtime path.
- You need generic object detection rather than a detail-preserving face pipeline comparison.
- You need a minimal example without multiple branches and custom host nodes.
Category: apps/focused-vision
Shape: frontend
Primary task: detail-preserving face detection via high-res cropping and tiling
Entrypoint: backend/src/main.py
Standalone path: oakapp.toml
Frontend: static frontend packaged from frontend/src/App.tsx
Runs on: RVC4 standalone only
Requires: RVC4 device with RGB camera; high-resolution RGB throughput; bundled models in backend/src/depthai_models/
Input: one RGB camera stream split into high-resolution and low-resolution branches
Output: 640x640 RGB, NN detections, Non-Focus Head Crops, Focused Vision Head Crops, and Focused with Tiling
Models: scrfd-person-detection.yaml and yunet.yaml
Visualizer / UI: packaged custom frontend backed by default DepthAI Visualizer topics
- backend/src/main.py: all three comparison branches and published topics
- backend/src/pipeline_builders.py: RGB split, cropper, and encoder helper builders
- backend/src/host_nodes/pick_largest_bbox.py: largest-detection selection
- backend/src/host_nodes/face_detection_from_gathered_data.py: 2-stage detection remapping logic
- backend/src/host_nodes/merge_img_detections.py: tiled detection merging
- backend/src/arguments.py: CLI arguments
- frontend/src/App.tsx: intended comparison UI layout
- frontend/src/constants.ts: topic-group assumptions in the frontend source
- oakapp.toml: standalone runtime path and packaged static frontend build
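The contents of backend/src/host_nodes/pick_largest_bbox.py are not reproduced here. As a rough, hypothetical sketch of the idea only, assuming each detection carries normalized xmin/ymin/xmax/ymax and a confidence (the Detection class, pick_largest function, and the 0.5 default threshold are illustrative, not the DepthAI API), largest-by-area selection with an explicit "nothing found" result could look like this:

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Detection:
    # Hypothetical stand-in for a detection; coordinates normalized to [0, 1].
    xmin: float
    ymin: float
    xmax: float
    ymax: float
    confidence: float


def area(det: Detection) -> float:
    # Clamp negative extents to zero so malformed boxes never win.
    return max(0.0, det.xmax - det.xmin) * max(0.0, det.ymax - det.ymin)


def pick_largest(dets: Sequence[Detection], min_conf: float = 0.5) -> Optional[Detection]:
    """Return the largest detection at or above min_conf, or None if there is none.

    A None result is the point where a host node would emit a black
    fallback frame so the comparison layout stays stable.
    """
    candidates = [d for d in dets if d.confidence >= min_conf]
    if not candidates:
        return None
    return max(candidates, key=area)
```

For example, given a small high-confidence face and a larger lower-confidence face that both pass the threshold, the larger one is returned; an empty list returns None.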
- The backend builds one RGB camera source and splits it into high-resolution and low-resolution outputs.
- The naive branch runs face detection directly on low-resolution RGB and crops the largest detected face.
- The 2-stage branch runs person detection, crops a high-resolution person region, remaps coordinates, then runs face detection on that crop.
- The tiling branch slices the high-resolution image into overlapping tiles, runs face detection on each tile, merges detections, and crops the largest face result.
- Custom host nodes provide black fallback frames when no detections exist so the comparison layout remains stable.
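The remap step in the 2-stage branch converts face boxes found inside the person crop back into full-frame coordinates. A minimal sketch, assuming boxes are normalized (xmin, ymin, xmax, ymax) tuples and the crop was resized to the detector input by plain stretching with no letterboxing (both are assumptions; remap_to_full_frame is a hypothetical name, not taken from face_detection_from_gathered_data.py):

```python
from typing import Tuple

# (xmin, ymin, xmax, ymax), normalized to [0, 1]
Box = Tuple[float, float, float, float]


def remap_to_full_frame(face_in_crop: Box, crop_in_frame: Box) -> Box:
    """Map a box normalized to a crop into full-frame normalized coordinates.

    Assumes the crop is an axis-aligned rectangle of the full frame and that
    it was stretched (not letterboxed) to the detector input, so the mapping
    is a pure scale plus offset on each axis.
    """
    cx0, cy0, cx1, cy1 = crop_in_frame
    cw, ch = cx1 - cx0, cy1 - cy0  # crop size as a fraction of the frame
    fx0, fy0, fx1, fy1 = face_in_crop
    return (cx0 + fx0 * cw, cy0 + fy0 * ch, cx0 + fx1 * cw, cy0 + fy1 * ch)
```

With a person crop covering the middle half of the frame horizontally, (0.25, 0.0, 0.75, 1.0), a face at (0.4, 0.1, 0.6, 0.3) inside the crop maps to roughly (0.45, 0.1, 0.55, 0.3) in the full frame. If the crop is letterboxed instead, the padding offset has to be removed before this scaling, which is one reason coordinate remapping is listed as "requires care".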
RGB low-res -> face detector -> largest bbox -> cropper -> Non-Focus Head Crops
RGB low-res -> person detector -> crop high-res person -> face detector -> remap -> cropper -> Focused Vision Head Crops
RGB high-res -> Tiling -> per-tile face detector -> merge + filter -> cropper -> Focused with Tiling
RGB low-res -> H264 encoder -> 640x640 RGB
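How the Tiling stage lays out its tiles is not spelled out here. One common layout is a regular grid of overlapping tiles whose last row and column are shifted back to sit flush with the image edge. The sketch below assumes that layout, with overlap given in pixels; tile_origins and tile_grid are hypothetical helpers, and the sizes in the example are illustrative, not the app's HIGH_RES_* values:

```python
from typing import List, Tuple


def tile_origins(length: int, tile: int, overlap_px: int) -> List[int]:
    """Start offsets along one axis so tiles of size `tile` cover `length`.

    Neighbouring tiles share `overlap_px` pixels; the final tile is shifted
    back so it ends exactly at the image edge.
    """
    if not 0 <= overlap_px < tile:
        raise ValueError("overlap_px must be in [0, tile)")
    if tile >= length:
        return [0]  # image smaller than one tile: a single (padded) tile
    stride = tile - overlap_px
    starts = list(range(0, length - tile, stride))
    starts.append(length - tile)  # last tile flush with the edge
    return starts


def tile_grid(width: int, height: int, tile_w: int, tile_h: int,
              overlap_px: int) -> List[Tuple[int, int, int, int]]:
    """Return (x, y, w, h) pixel rectangles covering the whole image."""
    return [
        (x, y, tile_w, tile_h)
        for y in tile_origins(height, tile_h, overlap_px)
        for x in tile_origins(width, tile_w, overlap_px)
    ]
```

With these assumed numbers, a 1920x1080 frame cut into 640x640 tiles with 128 px overlap yields 4 columns x 2 rows = 8 tiles, so the tiling branch runs the face detector 8 times per frame where the naive branch runs it once. That multiplier is why tile count and throughput are coupled.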
Safe to change: topic names (keep them in sync with the frontend source), crop padding, confidence thresholds, tile overlap, comparison labels
Requires care: high-res versus low-res output sizes, coordinate remapping, tile merge behavior, host-node fallback behavior
Likely to break if changed blindly: the three-branch comparison symmetry, output topic names expected by the frontend source, and tile count versus throughput assumptions
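Tile overlap and merge behavior interact: a face that straddles a tile border can be detected once per tile. The logic in backend/src/host_nodes/merge_img_detections.py is not shown here, so the following is only a hypothetical sketch of one standard strategy, offsetting tile-local pixel boxes into full-image pixels and then applying greedy IoU-based non-maximum suppression. The box format, function names, and 0.5 IoU threshold are assumptions:

```python
from typing import List, Tuple

# (xmin, ymin, xmax, ymax, score) in pixels
Box = Tuple[float, float, float, float, float]


def to_global(box: Box, tile_x: int, tile_y: int) -> Box:
    # Shift a tile-local box by the tile's top-left corner.
    x0, y0, x1, y1, s = box
    return (x0 + tile_x, y0 + tile_y, x1 + tile_x, y1 + tile_y, s)


def iou(a: Box, b: Box) -> float:
    # Intersection-over-union of two axis-aligned boxes.
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def merge_tiles(per_tile: List[Tuple[int, int, List[Box]]],
                iou_thresh: float = 0.5) -> List[Box]:
    """Combine (tile_x, tile_y, detections) from all tiles, dropping duplicates."""
    boxes = [to_global(b, tx, ty) for tx, ty, dets in per_tile for b in dets]
    boxes.sort(key=lambda b: b[4], reverse=True)  # highest score first
    kept: List[Box] = []
    for b in boxes:
        # Keep a box only if it does not overlap an already kept, higher-scoring box.
        if all(iou(b, k) < iou_thresh for k in kept):
            kept.append(b)
    return kept
```

Raising the overlap makes border faces more likely to appear whole in at least one tile, but it also increases both the tile count and the number of duplicates this step has to suppress.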
To change the person or face detector: swap the model YAMLs under backend/src/depthai_models/ and keep the same branch structure
To reuse only the 2-stage path: keep the person-detection, crop, remap, and face-detection pieces from backend/src/main.py
To experiment with higher or lower throughput: adjust HIGH_RES_*, LOW_RES_*, and --fps_limit
To build a real custom UI: start from frontend/src/App.tsx and keep its topic contracts aligned with the backend outputs
- The backend exits on RVC2; this app is intentionally RVC4-only.
- The default FPS limit is forced to 13 if not provided.
- backend/src/arguments.py defines --media_path and --api_key, but the current backend does not use either argument.
- frontend/src/MessageInput.tsx still references a Custom Service that the backend does not register.
- Although INDEX.md classifies this as a frontend app, most of the runtime state still flows through standard Visualizer topics rather than custom backend services.
- The branch comparison uses custom host nodes heavily, so this is not a good "minimal on-device only" reference.
- High-resolution cropping is the key idea for preserving object detail: the face detector input size stays small even when the source image is large, so what matters is how much of the source each detector input covers.
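A quick worked example of why the crop matters. The numbers below are hypothetical (a 3840 px wide source, a face 60 px wide in that source, a 480 px wide person crop, 640 px tiles, and a 640 px wide detector input for every branch); they are not the app's configured sizes. The face's width at the detector input is its source width scaled by detector width over the width of the region that was resized into the detector:

```python
def face_px_at_detector(face_px_in_source: float,
                        region_px_in_source: float,
                        detector_px: float) -> float:
    """Width a face occupies in the detector input after its region is resized.

    region_px_in_source is the width of whatever was fed to the detector:
    the whole frame (naive), a person crop (2-stage), or a tile (tiling).
    """
    return face_px_in_source * detector_px / region_px_in_source


# Hypothetical sizes, chosen only to illustrate the ratio.
naive = face_px_at_detector(60, 3840, 640)    # whole frame squeezed to 640 px
two_stage = face_px_at_detector(60, 480, 640) # person crop upscaled to 640 px
tiling = face_px_at_detector(60, 640, 640)    # 640 px tile fed at native size
print(naive, two_stage, tiling)
```

Under these assumptions the same face is 10 px wide in the naive branch, 80 px in the 2-stage branch, and 60 px in the tiling branch, which is the detail gap the three comparison outputs are meant to make visible.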
- apps/qr-tiling: use this when you want a live tiling UI and runtime tile control
- apps/dino-tracking: use this when you want another RVC4 standalone frontend-heavy app with backend state synchronization
- tutorials/full-fov-nn: use this when the main topic is field-of-view and resolution handling rather than multi-branch comparison
- neural-networks/face-detection/head-posture-detection: use this when the main goal is face-related inference rather than detail-preserving capture
Run: oakctl app run .
Success looks like: the Visualizer shows the RGB preview plus three comparison outputs for the naive, 2-stage, and tiling approaches
Common failure causes: the app is running on a non-RVC4 platform, the expected bundled models are missing, or the frontend topic expectations no longer match the backend outputs