Skip to content

Latest commit

 

History

History
158 lines (126 loc) · 6.11 KB

File metadata and controls

158 lines (126 loc) · 6.11 KB

KELPIE: PBSeg for UAV Defense — Product Requirements Document

Document Control

  • Project: project_kelpie
  • Codename: KELPIE
  • Wave: 10 (WARDOG)
  • Date: 2026-04-10
  • Owner: KELPIE agent
  • Paper: Prototype-Based Low Altitude UAV Semantic Segmentation (PBSeg) (arXiv:2604.01550)
  • Claimed code: github.com/zhangda1018/PBSeg

1. Executive Summary

KELPIE delivers a segmentation module for low-altitude UAV perception, adapted from PBSeg and integrated into ANIMA constraints:

  • YOLO stack baseline must use YOLO26.
  • All runnable paths must support --backend mlx|cuda|cpu|auto.
  • Training/eval must target UAV defense scenarios relevant to Shenzhen Robot Fair.

Because upstream code is not currently available in runnable form (empty repository), this PRD defines a reproducible in-house implementation path from paper method descriptions and a strict verification gate.

2. Problem Statement

Low-altitude UAV scenes have:

  • extreme scale variance (tiny drones/vehicles + large background structures),
  • cluttered boundaries and occlusions,
  • constrained edge compute.

We need segmentation quality good enough for downstream anti-UAV detection/tracking while preserving deployability.

3. Goals and Non-Goals

Goals

  • Implement PBSeg-style architecture primitives:
    • Prototype-Based Cross-Attention (PBCA)
    • Context-Aware Modulation (CAM)
    • Efficient multi-scale decoder with deformable conv fallback
  • Provide train/eval scaffolding on UAV segmentation data.
  • Integrate YOLO26 hooks for joint segmentation+detection workflows.
  • Keep dual-compute interface stable (MLX and CUDA).

Non-Goals

  • Reproducing exact paper SOTA numbers in Phase 1.
  • Building a complete production ROS2 deployment in this initial pass.
  • Modifying shared datasets outside read-only usage.

4. Paper Verification Outcome (Gate)

Verification details are in prds/01_verification_report.md.

Status summary:

  • Paper exists on arXiv and is readable end-to-end.
  • Reference GitHub repo exists but is effectively empty (size=0, single 2-line README commit).
  • Required datasets are partially discoverable on shared volume (visdrone found; others not yet found).
  • No independent reproductions found yet; citation count currently 0 on OpenAlex (expected for new preprint).

Decision:

  • Proceed with guarded implementation (not KILLED),
  • Flag for CTO review due missing runnable upstream code.

5. Users and Use Cases

Primary Users

  • ANIMA perception engineers integrating anti-UAV pipelines.
  • Research engineers reproducing/benchmarking low-altitude segmentation.

Key Use Cases

  • Train segmentation from internal + public UAV datasets.
  • Evaluate mIoU and class-wise IoU.
  • Produce masks to support YOLO26-guided candidate filtering and downstream tracking modules.

6. Functional Requirements

FR-1 Configuration

  • Load TOML config from configs/default.toml.
  • Expose backend selection by env var and CLI.

FR-2 Model

  • Implement torch PBSeg baseline with:
    • ResNet backbone,
    • CAM pyramid decoder,
    • PBCA decoder blocks,
    • mask/class heads.

FR-3 Dual Compute

  • Device adapter reports selected backend and device.
  • Scripts accept --backend and run on CPU/CUDA; MLX smoke path present.
  • Provide backend parity checker utility.

FR-4 Data

  • Support segmentation datasets in directory structure with images/ + masks/.
  • Synthetic dataset fallback for CI and smoke tests.

FR-5 Training/Eval

  • Provide CLI train script with checkpoints + JSON metrics output.
  • Provide eval script for mIoU/mF1/OA.

FR-6 YOLO26 Integration

  • Provide adapter to instantiate YOLO26 checkpoints where available.
  • Keep YOLO usage optional and isolated from segmentation training loop.

7. Non-Functional Requirements

  • Reproducibility: deterministic seeds and logged config snapshot.
  • Extensibility: modular blocks for PBCA/CAM and decoder variants.
  • Safety: no write operations to shared dataset volume.
  • Performance target (initial): functional forward pass < 300ms on 1024x1024 (single image, CUDA).

8. Architecture

See prds/02_architecture.md.

High-level flow:

  1. Input image -> backbone multi-scale features.
  2. CAM + pyramid decoder -> refined features.
  3. Query embeddings + PBCA layers -> mask/class predictions.
  4. Optional YOLO26 detections fused downstream.

9. Data Plan

See prds/03_data_and_benchmarks.md.

Required datasets:

  • Internal 1.8M UAV (primary training scale-up)
  • UAVid, UDD6 (paper baseline), plus VisDrone/UAVDT/DroneVehicle/SeaDronesSee for defense adaptation.

10. Delivery Phases

Phase 1 (this pass)

  • Verification report + CTO flag.
  • Full PRD/task docs.
  • Initial code scaffold and tests.

Phase 2

  • Reproduction attempt on UAVid/UDD6 once data + upstream details are complete.

Phase 3

  • Domain adaptation to internal 1.8M UAV + YOLO26-guided pipeline.

Phase 4

  • ROS2/Gazebo/Docker integration and benchmark hardening.

11. Risks

  • Missing upstream implementation details may reduce fidelity.
  • Dataset labeling conventions may differ across sources.
  • MLX parity for full model training may lag torch CUDA path.

Mitigations:

  • Keep modular implementation and explicit assumptions.
  • Add parity tests and configuration hashing.
  • Track unresolved ambiguities in NEXT_STEPS.md and CTO review file.

12. Acceptance Criteria (Phase 1)

  • Verification checklist completed and documented.
  • PRD + architecture + data + roadmap docs created.
  • pytest passes for config/device/model smoke tests.
  • python -m anima_kelpie --help and smoke forward run succeeds.
  • Backend parity utility exists and runs when both backends are present.

13. Dependencies

  • Python 3.10+
  • torch / torchvision
  • ultralytics (YOLO26-compatible checkpoints expected in environment)
  • optional MLX packages on Apple Silicon

14. Open Decisions

  • Exact UDD6 data source mirror for reproducibility package.
  • Whether to retain ResNet50 baseline or migrate to lighter backbones for edge in Phase 2.
  • Fusion strategy between segmentation masks and YOLO26 detections.