AGENTS.md

Summary

This is the best standalone reference for open-vocabulary detection plus configurable automatic snap collection. Use it when you need a frontend/backend app where the UI changes classes, thresholds, prompt sources, and snapping conditions while the backend remains authoritative over capture logic.

Use This Example When

  • You need open-vocabulary detection with text prompts, image prompts, and bbox-based prompting.
  • You want to auto-capture frames and metadata under configurable snap conditions.
  • You need a custom frontend that restores backend state on connect.
  • You want a stronger standalone app reference than the generic single-model scaffold.

Do Not Use This Example When

  • You only need a plain single-model inference example without UI.
  • You need host/peripheral support on RVC2 or RVC4 as the main path.
  • You need 3D measurement or pointcloud workflows.
  • You need similarity tracking rather than open-vocabulary prompting.

Quick Facts

  • Category: apps/data-collection
  • Shape: frontend
  • Primary task: open-vocabulary snap collection with configurable conditions
  • Entrypoint: backend/src/main.py
  • Standalone path: oakapp.toml
  • Frontend: frontend/src/App.tsx
  • Runs on: RVC4 standalone only
  • Requires: RVC4 device; static frontend build; bundled YOLOE model; backend YAML configs in backend/src/config/yaml_configs/
  • Input: live RGB camera by default, or media input via --media_path; text prompt, image prompt, or bbox prompt from the frontend
  • Output: encoded Video, Annotations, and saved snaps with metadata controlled by backend snapping logic
  • Models: yoloe_v8_l_fp16.RVC4.yaml
  • Visualizer / UI: custom static frontend served through the oakapp container stack

Read First

Architecture

  • The backend builds config from CLI args plus YAML files.
  • CameraSourceNode provides frames.
  • NNDetectionNode runs the open-vocabulary detector.
  • A tracker is built from backend/src/tracking/tracker_builder.py.
  • SnappingNode decides when snaps should be emitted based on detections and tracklets.
  • FrameCacheNode stores the latest frame for image-prompt workflows.
  • The backend registers services for class updates, threshold updates, image upload, bbox prompting, snap-condition updates, and config export.

Data Flow

  • camera or media -> CameraSourceNode -> NNDetectionNode -> Annotations
  • NN detections + tracker output -> SnappingNode -> snap events
  • latest frame cache + frontend prompt services -> detector prompt updates
  • backend state -> Get App Config Service -> frontend state restore
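
The second flow above (detections plus tracker output feeding SnappingNode) can be pictured with a small, self-contained sketch. It does not reproduce the repo's SnappingNode: the `Tracklet` and `NewTrackCondition` classes, their fields, and the cooldown rule are all hypothetical, chosen only to show how a condition might turn tracklets into a snap decision.

```python
from dataclasses import dataclass, field


@dataclass
class Tracklet:
    """Hypothetical stand-in for one tracker output entry."""
    track_id: int
    label: str
    confidence: float


@dataclass
class NewTrackCondition:
    """Hypothetical condition: snap when a not-yet-seen track ID appears,
    rate-limited by a cooldown in seconds."""
    cooldown_s: float = 2.0
    min_confidence: float = 0.5
    _seen: set = field(default_factory=set)
    _last_snap_t: float = float("-inf")

    def should_snap(self, tracklets, now_s):
        # Keep only confident tracklets whose ID has not triggered a snap yet.
        fresh = [
            t for t in tracklets
            if t.track_id not in self._seen and t.confidence >= self.min_confidence
        ]
        # No candidates, or still inside the cooldown window: do not snap.
        if not fresh or now_s - self._last_snap_t < self.cooldown_s:
            return False
        # Record the IDs only when a snap is actually emitted.
        self._seen.update(t.track_id for t in fresh)
        self._last_snap_t = now_s
        return True
```

Because the condition owns its own state (seen IDs and last snap time), the decision stays on the backend, which matches the idea that the UI only adjusts condition parameters.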

Modification Guide

  • Safe to change: default classes, thresholds, snap-condition defaults, frontend layout, config YAML values
  • Requires care: service payload contracts, state export format, tracker-to-snapping coupling, prompt encoder wiring
  • Likely to break if changed blindly: frontend restore behavior, bbox prompt coordinate handling, or condition naming shared between backend and frontend
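
To illustrate why bbox prompt coordinate handling is easy to break, here is a minimal sketch of one possible conversion step. It assumes the frontend sends a box as normalized `(x_min, y_min, x_max, y_max)` values in the range 0 to 1; that payload shape and the function name `normalized_bbox_to_pixels` are assumptions for illustration, not the repo's actual contract.

```python
def normalized_bbox_to_pixels(bbox, frame_width, frame_height):
    """Convert a normalized (x_min, y_min, x_max, y_max) box to integer pixels.

    Hypothetical helper: the real payload format may differ.
    """
    def clamp(value):
        # Keep coordinates inside the frame even if the UI overshoots.
        return min(max(value, 0.0), 1.0)

    x0, y0, x1, y1 = bbox
    # Tolerate boxes dragged right-to-left or bottom-to-top.
    x0, x1 = sorted((x0, x1))
    y0, y1 = sorted((y0, y1))
    x0, y0, x1, y1 = (clamp(v) for v in (x0, y0, x1, y1))

    pixels = (
        round(x0 * frame_width),
        round(y0 * frame_height),
        round(x1 * frame_width),
        round(y1 * frame_height),
    )
    # Reject degenerate boxes instead of passing an empty crop downstream.
    if pixels[2] <= pixels[0] or pixels[3] <= pixels[1]:
        raise ValueError("bbox has zero area after clamping")
    return pixels
```

If either side switches between normalized and pixel coordinates, or between corner and center formats, a step like this silently produces the wrong crop, which is why the coordinate convention should be changed on the frontend and backend together.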

Common Adaptations

Constraints

  • This example is intentionally RVC4 standalone only.
  • The backend uses serveFrontend=False, so the app depends on the static frontend build declared in oakapp.toml.
  • Runtime behavior is split across CLI args and YAML config files, so changing only one side may not do what you expect.
  • The frontend expects the backend to be the source of truth and rehydrates local UI state from Get App Config Service.
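
The split between CLI args and YAML config can be sketched as a simple merge. This is only one possible precedence rule (explicitly passed CLI flags override YAML values), and it is an assumption rather than the repo's confirmed behavior. The `--media_path` flag comes from this document; the `--confidence` flag, the `build_config` name, and the dict standing in for loaded YAML values are hypothetical.

```python
import argparse


def build_config(yaml_values, argv):
    """Merge YAML-derived defaults with CLI flags (assumed: CLI wins)."""
    parser = argparse.ArgumentParser()
    # --media_path is mentioned in this document; --confidence is hypothetical.
    parser.add_argument("--media_path", default=None)
    parser.add_argument("--confidence", type=float, default=None)
    args = parser.parse_args(argv)

    # Start from the YAML values, then overlay only flags that were passed.
    config = dict(yaml_values)
    for key, value in vars(args).items():
        if value is not None:
            config[key] = value
    return config
```

Under this rule, editing a YAML default has no visible effect whenever the same setting is also passed on the command line, which is the kind of surprise the constraint above warns about.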

Non-Obvious Repo Conventions

  • The config service wraps the exported state under a data key, and the frontend parses that shape explicitly.
  • Prompt updates are service-driven, not topic-driven.
  • oakapp.toml bundles both the backend Python environment and the built frontend assets, so this is a stronger standalone reference than the default Visualizer apps.
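
The `data` wrapping convention can be sketched as follows. This assumes the service response is JSON-serializable; the helper names `wrap_state` and `unwrap_state` and the state fields used in the usage line are hypothetical, and the real frontend does its parsing in TypeScript rather than Python.

```python
import json


def wrap_state(state):
    """Wrap exported state under a `data` key before sending it out."""
    return json.dumps({"data": state})


def unwrap_state(payload):
    """Parse the envelope and fail loudly if the `data` key is missing."""
    message = json.loads(payload)
    if not isinstance(message, dict) or "data" not in message:
        raise ValueError("expected a JSON object with a 'data' key")
    return message["data"]


# Hypothetical state fields, used only to show the round trip.
restored = unwrap_state(wrap_state({"classes": ["person"], "confidence": 0.3}))
```

A consumer that reads the response without unwrapping `data` would see no state fields at the top level, so any change to this envelope needs to be made on both sides at once.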

Related Examples

Validation

  • Run: oakctl app run .
  • Success looks like: the frontend shows live video, class and threshold updates work, bbox/image prompts reach the backend, and snap-condition state restores correctly after reconnect
  • Common failure causes: the static frontend was not built, the RVC4-only model or runtime assumptions were violated, or the frontend/backend service contracts drifted apart