- Project:
project_kelpie - Codename:
KELPIE - Wave:
10 (WARDOG) - Date:
2026-04-10 - Owner:
KELPIE agent - Paper:
Prototype-Based Low Altitude UAV Semantic Segmentation (PBSeg)(arXiv:2604.01550) - Claimed code: github.com/zhangda1018/PBSeg
KELPIE delivers a segmentation module for low-altitude UAV perception, adapted from PBSeg and integrated into ANIMA constraints:
- YOLO stack baseline must use YOLO26.
- All runnable paths must support
--backend mlx|cuda|cpu|auto. - Training/eval must target UAV defense scenarios relevant to Shenzhen Robot Fair.
Because upstream code is not currently available in runnable form (empty repository), this PRD defines a reproducible in-house implementation path from paper method descriptions and a strict verification gate.
Low-altitude UAV scenes have:
- extreme scale variance (tiny drones/vehicles + large background structures),
- cluttered boundaries and occlusions,
- constrained edge compute.
We need segmentation quality good enough for downstream anti-UAV detection/tracking while preserving deployability.
- Implement PBSeg-style architecture primitives:
- Prototype-Based Cross-Attention (PBCA)
- Context-Aware Modulation (CAM)
- Efficient multi-scale decoder with deformable conv fallback
- Provide train/eval scaffolding on UAV segmentation data.
- Integrate YOLO26 hooks for joint segmentation+detection workflows.
- Keep dual-compute interface stable (MLX and CUDA).
- Reproducing exact paper SOTA numbers in Phase 1.
- Building a complete production ROS2 deployment in this initial pass.
- Modifying shared datasets outside read-only usage.
Verification details are in prds/01_verification_report.md.
Status summary:
- Paper exists on arXiv and is readable end-to-end.
- Reference GitHub repo exists but is effectively empty (
size=0, single 2-line README commit). - Required datasets are partially discoverable on shared volume (
visdronefound; others not yet found). - No independent reproductions found yet; citation count currently 0 on OpenAlex (expected for new preprint).
Decision:
- Proceed with guarded implementation (not KILLED),
- Flag for CTO review due missing runnable upstream code.
- ANIMA perception engineers integrating anti-UAV pipelines.
- Research engineers reproducing/benchmarking low-altitude segmentation.
- Train segmentation from internal + public UAV datasets.
- Evaluate mIoU and class-wise IoU.
- Produce masks to support YOLO26-guided candidate filtering and downstream tracking modules.
- Load TOML config from
configs/default.toml. - Expose backend selection by env var and CLI.
- Implement torch PBSeg baseline with:
- ResNet backbone,
- CAM pyramid decoder,
- PBCA decoder blocks,
- mask/class heads.
- Device adapter reports selected backend and device.
- Scripts accept
--backendand run on CPU/CUDA; MLX smoke path present. - Provide backend parity checker utility.
- Support segmentation datasets in directory structure with
images/+masks/. - Synthetic dataset fallback for CI and smoke tests.
- Provide CLI train script with checkpoints + JSON metrics output.
- Provide eval script for mIoU/mF1/OA.
- Provide adapter to instantiate YOLO26 checkpoints where available.
- Keep YOLO usage optional and isolated from segmentation training loop.
- Reproducibility: deterministic seeds and logged config snapshot.
- Extensibility: modular blocks for PBCA/CAM and decoder variants.
- Safety: no write operations to shared dataset volume.
- Performance target (initial): functional forward pass < 300ms on 1024x1024 (single image, CUDA).
High-level flow:
- Input image -> backbone multi-scale features.
- CAM + pyramid decoder -> refined features.
- Query embeddings + PBCA layers -> mask/class predictions.
- Optional YOLO26 detections fused downstream.
See prds/03_data_and_benchmarks.md.
Required datasets:
- Internal 1.8M UAV (primary training scale-up)
- UAVid, UDD6 (paper baseline), plus VisDrone/UAVDT/DroneVehicle/SeaDronesSee for defense adaptation.
- Verification report + CTO flag.
- Full PRD/task docs.
- Initial code scaffold and tests.
- Reproduction attempt on UAVid/UDD6 once data + upstream details are complete.
- Domain adaptation to internal 1.8M UAV + YOLO26-guided pipeline.
- ROS2/Gazebo/Docker integration and benchmark hardening.
- Missing upstream implementation details may reduce fidelity.
- Dataset labeling conventions may differ across sources.
- MLX parity for full model training may lag torch CUDA path.
Mitigations:
- Keep modular implementation and explicit assumptions.
- Add parity tests and configuration hashing.
- Track unresolved ambiguities in
NEXT_STEPS.mdand CTO review file.
- Verification checklist completed and documented.
- PRD + architecture + data + roadmap docs created.
pytestpasses for config/device/model smoke tests.python -m anima_kelpie --helpand smoke forward run succeeds.- Backend parity utility exists and runs when both backends are present.
- Python 3.10+
- torch / torchvision
- ultralytics (YOLO26-compatible checkpoints expected in environment)
- optional MLX packages on Apple Silicon
- Exact UDD6 data source mirror for reproducibility package.
- Whether to retain ResNet50 baseline or migrate to lighter backbones for edge in Phase 2.
- Fusion strategy between segmentation masks and YOLO26 detections.