This is the repository reference for person detection plus human pose estimation with Lite-HRNet. Use it when you need body keypoints on detected people rather than a generic detector.
Use this example when:
- You need human pose estimation.
- You want a person-detection first stage with a swappable pose model.
- You need camera or replay input with packaged standalone support.

Do not use this example when:
- You need hand or animal pose instead of human pose.
- You need a single-stage detector.
- You need multi-person tracking rather than per-frame pose overlays.
- Category: neural-networks/pose-estimation/human-pose
- Shape: script+standalone
- Primary task: person detection plus human pose estimation
- Entrypoint: main.py
- Standalone path: oakapp.toml
- Frontend: none
- Runs on: RVC2 peripheral, RVC4 peripheral, and RVC4 standalone packaging
- Requires: person detector and Lite-HRNet-style pose model
- Input: camera frames by default or ReplayVideo via --media_path
- Output: Video, Detections, and Pose
- Models: YOLOv6 and Lite-HRNet YAMLs in depthai_models/
- Visualizer / UI: DepthAI Visualizer via dai.RemoteConnection
- A person detector runs first.
- ImgDetectionsFilter identifies the person class for the pose workflow.
- FrameCropper extracts padded person crops for Lite-HRNet.
- GatherData and utils/annotation_node.py merge the keypoints and skeleton back to the original detections.
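
The stages above can be sketched in plain Python to show the geometry involved. This is a minimal illustration and not the repository's code: `Box`, `keep_people`, `pad_box`, and `to_frame_coords` are hypothetical names, the padding value is an assumption, and boxes are assumed to use normalized frame coordinates. The real nodes (ImgDetectionsFilter, FrameCropper, GatherData) run on DepthAI messages instead.

```python
from dataclasses import dataclass


@dataclass
class Box:
    """Axis-aligned box in normalized [0, 1] frame coordinates (assumed layout)."""
    xmin: float
    ymin: float
    xmax: float
    ymax: float
    label: str = "person"


def keep_people(boxes, wanted="person"):
    """Filter stage: drop every detection that is not the wanted class."""
    return [b for b in boxes if b.label == wanted]


def pad_box(box, padding=0.1):
    """Crop stage: grow the box by `padding` on each side, clamped to the frame."""
    return Box(
        max(0.0, box.xmin - padding),
        max(0.0, box.ymin - padding),
        min(1.0, box.xmax + padding),
        min(1.0, box.ymax + padding),
        box.label,
    )


def to_frame_coords(crop, keypoints):
    """Merge stage: map crop-relative keypoints (0..1) back into frame coordinates."""
    width = crop.xmax - crop.xmin
    height = crop.ymax - crop.ymin
    return [(crop.xmin + x * width, crop.ymin + y * height) for x, y in keypoints]


detections = [Box(0.2, 0.1, 0.5, 0.9), Box(0.6, 0.5, 0.8, 0.7, label="dog")]
people = keep_people(detections)
crops = [pad_box(p) for p in people]
# A keypoint at the centre of the crop maps to the centre of the padded box.
centre = to_frame_coords(crops[0], [(0.5, 0.5)])[0]
```

Because the pose model only sees the padded crop, any keypoint it returns has to be scaled by the crop size and offset by the crop origin before it can be drawn on the full frame.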
- The current code path is person-specific, even though the detector could emit other classes.
- The pose parser threshold is intentionally set to 0.0 so the overlay node can do the filtering instead.
- Replay sizing and crop padding affect downstream pose quality.
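
To illustrate why a parser threshold of 0.0 pushes the filtering downstream, here is a hedged sketch of what overlay-side filtering could look like. The cutoff value, the edge list, and the function names are assumptions for illustration; the actual logic lives in utils/annotation_node.py and may differ.

```python
# Hypothetical values: the repository's actual cutoff and skeleton may differ.
CONFIDENCE_CUTOFF = 0.5
SKELETON_EDGES = [(0, 1), (1, 2), (2, 3)]


def visible_keypoints(scores, cutoff=CONFIDENCE_CUTOFF):
    """Return the indices of keypoints whose score reaches the cutoff."""
    return {i for i, score in enumerate(scores) if score >= cutoff}


def drawable_edges(scores, edges=SKELETON_EDGES, cutoff=CONFIDENCE_CUTOFF):
    """Keep only skeleton edges whose two endpoints are both visible."""
    visible = visible_keypoints(scores, cutoff)
    return [(a, b) for a, b in edges if a in visible and b in visible]


# The parser passed every keypoint through (threshold 0.0), so low scores arrive here.
scores = [0.9, 0.8, 0.2, 0.7]
edges = drawable_edges(scores)
```

Filtering at the overlay keeps the full keypoint set available, so the drawing step can decide per edge rather than receiving a list with gaps.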
- neural-networks/pose-estimation/hand-pose: use this when you need hand landmarks and gesture logic
- neural-networks/pose-estimation/animal-pose: use this when you need animal pose
- neural-networks/reidentification/human-reidentification: use this when you need to identify tracked people or faces rather than estimate pose
- Run: python3 main.py
- Success looks like: the Visualizer shows Video, Detections, and Pose, and visible people receive skeleton overlays.
- Common failure meaning: the detector is not stably finding people, crop generation drifted, or the selected pose model does not match the parser assumptions.
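
The run command uses the camera by default, and this reference names --media_path as the switch to replay input. A minimal sketch of how such a flag might select the source is shown here; only the flag name comes from this page, while the default, help text, and helper functions are assumptions and may not match main.py.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(
        description="Person detection plus human pose estimation"
    )
    # Only --media_path is named in this reference; default and help text are assumed.
    parser.add_argument(
        "--media_path",
        default=None,
        help="video file to replay instead of the camera",
    )
    return parser


def input_source(args):
    """Describe which input the pipeline would use for the parsed arguments."""
    return "camera" if args.media_path is None else f"replay:{args.media_path}"


camera_run = input_source(build_parser().parse_args([]))
replay_run = input_source(build_parser().parse_args(["--media_path", "clip.mp4"]))
```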