This is the best standalone reference for open-vocabulary detection plus configurable automatic snap collection. Use it when you need a frontend/backend app where the UI changes classes, thresholds, prompt sources, and snapping conditions while the backend remains authoritative over capture logic.
- You need open-vocabulary detection with text prompts, image prompts, and bbox-based prompting.
- You want to auto-capture frames and metadata under configurable snap conditions.
- You need a custom frontend that restores backend state on connect.
- You want a stronger standalone app reference than the generic single-model scaffold.
Do not use it when:
- You only need a plain single-model inference example without UI.
- You need host/peripheral support on RVC2 or RVC4 as the main path.
- You need 3D measurement or pointcloud workflows.
- You need similarity tracking rather than open-vocabulary prompting.
- Category: apps/data-collection
- Shape: frontend
- Primary task: open-vocabulary snap collection with configurable conditions
- Entrypoint: backend/src/main.py
- Standalone path: oakapp.toml
- Frontend: frontend/src/App.tsx
- Runs on: RVC4 standalone only
- Requires: RVC4 device; static frontend build; bundled YOLOE model; backend YAML configs in backend/src/config/yaml_configs/
- Input: live RGB camera by default, or media input via --media_path; text prompt, image prompt, or bbox prompt from the frontend
- Output: encoded Video, Annotations, and saved snaps with metadata controlled by backend snapping logic
- Models: yoloe_v8_l_fp16.RVC4.yaml
- Visualizer / UI: custom static frontend served through the oakapp container stack
- backend/src/main.py: end-to-end pipeline and frontend service registration
- backend/src/config/system_configuration.py: how CLI args and YAMLs become runtime config
- backend/src/config/yaml_configs/config.yaml: baseline video, NN, and tracker defaults
- backend/src/config/yaml_configs/conditions.yaml: snap-condition defaults
- backend/src/config/yaml_configs/prompts_config.yaml: prompt-related defaults
- backend/src/nn/nn_detection_node.py: backend detection node
- backend/src/snapping/snapping_node.py: snap-condition handling
- backend/src/prompting/fe_services.py: frontend prompt service handlers
- frontend/src/App.tsx: stream, bbox prompt drawing, and config restore flow
- frontend/src/utils/classes/ClassSelector.tsx: text-class control
- frontend/src/utils/classes/ImageUploader.tsx: visual prompt upload
- frontend/src/utils/conditions/SnapConditionsPanel.tsx: snap-condition UI
- oakapp.toml: static frontend build and standalone packaging
- The backend builds config from CLI args plus YAML files.
- CameraSourceNode provides frames.
- NNDetectionNode runs the open-vocabulary detector.
- A tracker is built from backend/src/tracking/tracker_builder.py.
- SnappingNode decides when snaps should be emitted based on detections and tracklets.
- FrameCacheNode stores the latest frame for image-prompt workflows.
- The backend registers services for class updates, threshold updates, image upload, bbox prompting, snap-condition updates, and config export.
- camera or media -> CameraSourceNode -> NNDetectionNode -> Annotations
- NN detections + tracker output -> SnappingNode -> snap events
- latest frame cache + frontend prompt services -> detector prompt updates
- backend state -> Get App Config Service -> frontend state restore
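The first step in this flow, building runtime config from CLI args plus YAML files, can be pictured as a layered merge. The sketch below is only an illustration and is not the code in backend/src/config/system_configuration.py: the `build_config` helper, the `--confidence` flag, and the default values are hypothetical, and a plain dict stands in for the parsed YAML so the example has no third-party dependencies. Only `--media_path` is taken from this document, and the precedence shown (CLI over YAML) is an assumption.

```python
import argparse

# Stand-in for values parsed from config.yaml (hypothetical keys and values).
YAML_DEFAULTS = {"media_path": None, "confidence": 0.3}


def build_config(argv, yaml_defaults):
    """Overlay CLI args that were actually passed on top of YAML defaults."""
    parser = argparse.ArgumentParser()
    # default=None lets us tell "not passed" apart from an explicit value.
    parser.add_argument("--media_path", default=None)
    parser.add_argument("--confidence", type=float, default=None)  # hypothetical flag
    args = parser.parse_args(argv)

    config = dict(yaml_defaults)  # copy so the defaults are never mutated
    for key, value in vars(args).items():
        if value is not None:
            config[key] = value
    return config


cli_only = build_config(["--confidence", "0.5"], YAML_DEFAULTS)
yaml_only = build_config([], YAML_DEFAULTS)
```

Under this assumed precedence, editing a YAML value has no visible effect whenever the same key is also passed on the command line, which is one way "changing only one side" can surprise you.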
- Safe to change: default classes, thresholds, snap-condition defaults, frontend layout, config YAML values
- Requires care: service payload contracts, state export format, tracker-to-snapping coupling, prompt encoder wiring
- Likely to break if changed blindly: frontend restore behavior, bbox prompt coordinate handling, or condition naming shared between backend and frontend
- To change default runtime behavior: start in backend/src/config/yaml_configs/
- To add a new prompt source: extend backend/src/prompting/ and wire a matching control in frontend/src/App.tsx
- To add a new snap condition: extend backend/src/snapping/conditions.py and frontend/src/utils/conditions/
- To reuse the open-vocabulary detector without snaps: keep the NN and prompting pieces and remove backend/src/snapping/snapping_node.py
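As a rough picture of what adding a snap condition involves, the sketch below shows a condition that fires when a previously unseen track ID appears, rate-limited by a cooldown. It does not reproduce the interface in backend/src/snapping/conditions.py: the class name, the `should_snap` method, the `cooldown_s` parameter, and the `"new_track"` name are all hypothetical. The `name` field stands for the kind of identifier that has to match between backend and frontend.

```python
from dataclasses import dataclass, field


@dataclass
class NewTrackCondition:
    """Hypothetical condition: snap when a track ID is seen for the first time."""

    name: str = "new_track"  # hypothetical key that the frontend would also use
    cooldown_s: float = 2.0
    _seen: set = field(default_factory=set)
    _last_snap: float = float("-inf")

    def should_snap(self, track_ids, now):
        new_ids = set(track_ids) - self._seen
        self._seen.update(track_ids)
        if not new_ids:
            return False
        if now - self._last_snap < self.cooldown_s:
            return False  # inside the cooldown: IDs are remembered, no snap
        self._last_snap = now
        return True


cond = NewTrackCondition()
results = [
    cond.should_snap([1], now=0.0),     # first sighting of track 1
    cond.should_snap([1], now=1.0),     # nothing new
    cond.should_snap([1, 2], now=1.5),  # new track, but inside the cooldown
    cond.should_snap([3], now=3.0),     # new track, cooldown elapsed
]
```

Passing the timestamp in as an argument, rather than reading a clock inside the condition, keeps the logic easy to test against recorded tracklets.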
- This example is intentionally RVC4 standalone only.
- The backend uses serveFrontend=False, so the app depends on the static frontend build declared in oakapp.toml.
- Runtime behavior is split across CLI args and YAML config files, so changing only one side may not do what you expect.
- The frontend expects the backend to be the source of truth and rehydrates local UI state from Get App Config Service.
- The config service wraps the exported state under a data key, and the frontend parses that shape explicitly.
- Prompt updates are service-driven, not topic-driven.
- oakapp.toml bundles both the backend Python environment and the built frontend assets, so this is a stronger standalone reference than the default Visualizer apps.
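The `data` wrapper mentioned in the notes above is easy to break from either side, so it helps to see the contract in miniature. The following is a sketch under stated assumptions, not the app's code: JSON serialization, the function names, and the state fields are hypothetical, and Python is used for both sides even though the real consumer is the TypeScript frontend.

```python
import json


def export_config(state):
    """Wrap backend state under a "data" key before sending it out."""
    return json.dumps({"data": state})


def restore_config(payload):
    """Parse the envelope and fail loudly if its shape has drifted."""
    message = json.loads(payload)
    if not isinstance(message, dict) or "data" not in message:
        raise ValueError('expected an object with a "data" key')
    return message["data"]


# Hypothetical state fields, for illustration only.
backend_state = {"classes": ["person", "chair"], "confidence": 0.3}
restored = restore_config(export_config(backend_state))
```

Rejecting an unwrapped payload explicitly, instead of silently falling back to defaults, makes contract drift between backend and frontend visible on the first reconnect.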
- apps/dino-tracking: use this when you need interactive tracking rather than open-vocabulary snapping
- apps/object-volume-measurement-3d: use this when you need object clicks and a richer 3D measurement backend
- custom-frontend/open-vocabulary-object-detection: use this when you want another open-vocabulary frontend/backend pattern
- neural-networks/generic-example: use this when you want the lighter single-model baseline
- Run: oakctl app run .
- Success looks like: the frontend shows live video, class and threshold updates work, bbox/image prompts reach the backend, and snap-condition state restores correctly after reconnect
- Common failure meaning: the static frontend was not built, the RVC4-only model/runtime assumptions were violated, or frontend/backend service contracts drifted