This is the repository reference for two-stage 3D bounding-box estimation with Objectron. Use it when you need a detect-then-crop-then-estimate pipeline for 3D keypoints rather than a plain 2D detector or a stereo depth baseline.
- You need a two-stage pipeline that turns 2D detections into 3D pose/keypoint estimates.
- You want a host-visualized reference for Objectron-style geometry output.
- You need camera or replay input support with a packaged standalone path.
Do not use it when any of the following applies:
- You need generic 2D detection only.
- You need stereo depth or spatial coordinates from a stereo pair.
- You need the broader task family described in the README rather than the current chair-only implementation.
- Category: neural-networks/3D-detection/objectron
- Shape: script+standalone
- Primary task: 2-stage chair detection plus 3D Objectron pose/keypoint estimation
- Entrypoint: main.py
- Standalone path: oakapp.toml
- Frontend: none
- Runs on: RVC2 peripheral, RVC4 peripheral, and RVC4 standalone packaging
- Requires: camera input or replay media, packaged detection and Objectron models
- Input: camera frames by default or ReplayVideo through --media_path
- Output: Video and Position
- Models: depthai_models/objectron_chair.RVC2.yaml, depthai_models/objectron_chair.RVC4.yaml, plus YOLOv6 detector YAMLs
- Visualizer / UI: DepthAI Visualizer via dai.RemoteConnection
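The two Objectron model files differ only in their platform suffix, so choosing one can be sketched as a small lookup. This is a minimal illustration, not the repository's code: the helper name `objectron_yaml` and the idea of passing the platform as a plain string are assumptions, and only the two file names come from the list above.

```python
from pathlib import Path

# Directory and file names as listed under "Models" above.
MODEL_DIR = Path("depthai_models")
SUPPORTED_PLATFORMS = ("RVC2", "RVC4")


def objectron_yaml(platform: str) -> Path:
    """Hypothetical helper: return the Objectron model YAML for a platform name."""
    if platform not in SUPPORTED_PLATFORMS:
        raise ValueError(f"unsupported platform: {platform!r}")
    return MODEL_DIR / f"objectron_chair.{platform}.yaml"


rvc4_model = objectron_yaml("RVC4")
```

Failing loudly on an unknown platform name makes a wrong-platform model easier to spot than a silent fallback would.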
- A first-stage YOLOv6 detector runs on camera or replay input.
- ImgDetectionsFilter keeps only VALID_LABELS = [56], which is the COCO chair label in the current code.
- FrameCropper extracts padded chair crops for the Objectron model.
- A second ParsingNeuralNetwork runs the objectron_chair model on those crops.
- GatherData re-associates second-stage outputs with first-stage detections, and utils/annotation_node.py draws the 3D skeleton/pose overlay.
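The filter and re-association steps above can be sketched in plain Python, without the DepthAI nodes. The `Detection` class, the `filter_detections` and `gather` helpers, and the assumption that second-stage outputs arrive in the same order as the crops are all hypothetical; the actual nodes may match messages differently.

```python
from dataclasses import dataclass

VALID_LABELS = [56]  # COCO "chair", as in the current code


@dataclass
class Detection:
    """Hypothetical stand-in for one first-stage detection (normalized box)."""
    label: int
    xmin: float
    ymin: float
    xmax: float
    ymax: float


def filter_detections(detections, valid_labels=VALID_LABELS):
    """Keep only detections whose label is in valid_labels."""
    return [d for d in detections if d.label in valid_labels]


def gather(detections, per_crop_outputs):
    """Pair each kept detection with its second-stage output.

    Assumption: outputs arrive in the same order as the crops were produced.
    """
    if len(detections) != len(per_crop_outputs):
        raise ValueError("expected one second-stage output per detection")
    return list(zip(detections, per_crop_outputs))


detections = [
    Detection(56, 0.10, 0.20, 0.40, 0.80),  # chair, kept
    Detection(0, 0.50, 0.10, 0.70, 0.90),   # person, dropped
    Detection(56, 0.60, 0.30, 0.90, 0.70),  # chair, kept
]
chairs = filter_detections(detections)
# Placeholder strings stand in for the per-crop Objectron outputs.
pairs = gather(chairs, ["keypoints_a", "keypoints_b"])
```

Each entry in `pairs` carries both the original box and the per-crop result, which is what an annotation step needs to draw the overlay on the full frame.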
- Despite the README’s broader Objectron framing, the current repo state is chair-only.
- PADDING = 0.2 is baked into both crop generation and annotation reprojection.
- Default FPS is intentionally low on RVC2 and lower than many detection examples because this is a two-stage geometry workflow.
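To see why the same padding has to appear in both places, here is a minimal sketch of padded cropping and the matching reprojection. It assumes the padding is added in normalized frame units on every side and clamped to the frame; the repository may define it differently, and `padded_box` and `crop_to_frame` are hypothetical names.

```python
PADDING = 0.2  # one constant shared by cropping and reprojection


def padded_box(xmin, ymin, xmax, ymax, padding=PADDING):
    """Expand a normalized box by `padding` on every side, clamped to [0, 1]."""
    return (
        max(0.0, xmin - padding),
        max(0.0, ymin - padding),
        min(1.0, xmax + padding),
        min(1.0, ymax + padding),
    )


def crop_to_frame(kx, ky, box):
    """Map a keypoint normalized to the crop back to full-frame normalized coordinates."""
    xmin, ymin, xmax, ymax = box
    return (xmin + kx * (xmax - xmin), ymin + ky * (ymax - ymin))


detection = (0.3, 0.3, 0.5, 0.6)
crop = padded_box(*detection)

# Correct: reproject through the padded crop the model actually saw.
corner_ok = crop_to_frame(0.0, 0.0, crop)
# Incorrect: reprojecting through the unpadded detection shifts the overlay.
corner_bad = crop_to_frame(0.0, 0.0, detection)
```

If the crop padding is changed without changing the reprojection, keypoints land on the wrong pixels, as `corner_bad` illustrates. This is why the constant matters in both code paths.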
- neural-networks/object-detection/spatial-detections: use this when you need stereo spatial detections instead of model-based 3D boxes
- neural-networks/generic-example: use this when you only need a single-model baseline
- neural-networks/pose-estimation/animal-pose: use this when you need detect-then-crop-then-keypoints without 3D box fitting
- Run: python3 main.py
- Success looks like: the Visualizer shows Video and Position, and detected chairs receive 3D keypoint/box overlays.
- Common failure meaning: the detector is not finding chairs, the wrong platform model is being used, or replay media does not match the expected frame type.