KELPIE: PBSeg for UAV Defense — Product Requirements Document

Document Control

Project: project_kelpie
Codename: KELPIE
Wave: 10 (WARDOG)
Date: 2026-04-10
Owner: KELPIE agent
Paper: Prototype-Based Low Altitude UAV Semantic Segmentation (PBSeg) (arXiv:2604.01550)
Claimed code: github.com/zhangda1018/PBSeg

1. Executive Summary

KELPIE delivers a segmentation module for low-altitude UAV perception, adapted from PBSeg and integrated into ANIMA constraints:

YOLO stack baseline must use YOLO26.
All runnable paths must support --backend mlx|cuda|cpu|auto.
Training/eval must target UAV defense scenarios relevant to Shenzhen Robot Fair.

Because upstream code is not currently available in runnable form (empty repository), this PRD defines a reproducible in-house implementation path from paper method descriptions and a strict verification gate.

2. Problem Statement

Low-altitude UAV scenes have:

extreme scale variance (tiny drones/vehicles + large background structures),
cluttered boundaries and occlusions,
constrained edge compute.

We need segmentation quality good enough for downstream anti-UAV detection/tracking while preserving deployability.

3. Goals and Non-Goals

Goals

Implement PBSeg-style architecture primitives:
- Prototype-Based Cross-Attention (PBCA)
- Context-Aware Modulation (CAM)
- Efficient multi-scale decoder with deformable conv fallback
Provide train/eval scaffolding on UAV segmentation data.
Integrate YOLO26 hooks for joint segmentation+detection workflows.
Keep dual-compute interface stable (MLX and CUDA).

Non-Goals

Reproducing exact paper SOTA numbers in Phase 1.
Building a complete production ROS2 deployment in this initial pass.
Modifying shared datasets outside read-only usage.

4. Paper Verification Outcome (Gate)

Verification details are in prds/01_verification_report.md.

Status summary:

Paper exists on arXiv and is readable end-to-end.
Reference GitHub repo exists but is effectively empty (size=0, single 2-line README commit).
Required datasets are partially discoverable on shared volume (visdrone found; others not yet found).
No independent reproductions found yet; citation count currently 0 on OpenAlex (expected for new preprint).

Decision:

Proceed with guarded implementation (not KILLED),
Flag for CTO review due missing runnable upstream code.

5. Users and Use Cases

Primary Users

ANIMA perception engineers integrating anti-UAV pipelines.
Research engineers reproducing/benchmarking low-altitude segmentation.

Key Use Cases

Train segmentation from internal + public UAV datasets.
Evaluate mIoU and class-wise IoU.
Produce masks to support YOLO26-guided candidate filtering and downstream tracking modules.

6. Functional Requirements

FR-1 Configuration

Load TOML config from configs/default.toml.
Expose backend selection by env var and CLI.

FR-2 Model

Implement torch PBSeg baseline with:
- ResNet backbone,
- CAM pyramid decoder,
- PBCA decoder blocks,
- mask/class heads.

FR-3 Dual Compute

Device adapter reports selected backend and device.
Scripts accept --backend and run on CPU/CUDA; MLX smoke path present.
Provide backend parity checker utility.

FR-4 Data

Support segmentation datasets in directory structure with images/ + masks/.
Synthetic dataset fallback for CI and smoke tests.

FR-5 Training/Eval

Provide CLI train script with checkpoints + JSON metrics output.
Provide eval script for mIoU/mF1/OA.

FR-6 YOLO26 Integration

Provide adapter to instantiate YOLO26 checkpoints where available.
Keep YOLO usage optional and isolated from segmentation training loop.

7. Non-Functional Requirements

Reproducibility: deterministic seeds and logged config snapshot.
Extensibility: modular blocks for PBCA/CAM and decoder variants.
Safety: no write operations to shared dataset volume.
Performance target (initial): functional forward pass < 300ms on 1024x1024 (single image, CUDA).

8. Architecture

See prds/02_architecture.md.

High-level flow:

Input image -> backbone multi-scale features.
CAM + pyramid decoder -> refined features.
Query embeddings + PBCA layers -> mask/class predictions.
Optional YOLO26 detections fused downstream.

9. Data Plan

See prds/03_data_and_benchmarks.md.

Required datasets:

Internal 1.8M UAV (primary training scale-up)
UAVid, UDD6 (paper baseline), plus VisDrone/UAVDT/DroneVehicle/SeaDronesSee for defense adaptation.

10. Delivery Phases

Phase 1 (this pass)

Verification report + CTO flag.
Full PRD/task docs.
Initial code scaffold and tests.

Phase 2

Reproduction attempt on UAVid/UDD6 once data + upstream details are complete.

Phase 3

Domain adaptation to internal 1.8M UAV + YOLO26-guided pipeline.

Phase 4

ROS2/Gazebo/Docker integration and benchmark hardening.

11. Risks

Missing upstream implementation details may reduce fidelity.
Dataset labeling conventions may differ across sources.
MLX parity for full model training may lag torch CUDA path.

Mitigations:

Keep modular implementation and explicit assumptions.
Add parity tests and configuration hashing.
Track unresolved ambiguities in NEXT_STEPS.md and CTO review file.

12. Acceptance Criteria (Phase 1)

Verification checklist completed and documented.
PRD + architecture + data + roadmap docs created.
pytest passes for config/device/model smoke tests.
python -m anima_kelpie --help and smoke forward run succeeds.
Backend parity utility exists and runs when both backends are present.

13. Dependencies

Python 3.10+
torch / torchvision
ultralytics (YOLO26-compatible checkpoints expected in environment)
optional MLX packages on Apple Silicon

14. Open Decisions

Exact UDD6 data source mirror for reproducibility package.
Whether to retain ResNet50 baseline or migrate to lighter backbones for edge in Phase 2.
Fusion strategy between segmentation masks and YOLO26 detections.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

KELPIE: PBSeg for UAV Defense — Product Requirements Document

Document Control

1. Executive Summary

2. Problem Statement

3. Goals and Non-Goals

Goals

Non-Goals

4. Paper Verification Outcome (Gate)

5. Users and Use Cases

Primary Users

Key Use Cases

6. Functional Requirements

FR-1 Configuration

FR-2 Model

FR-3 Dual Compute

FR-4 Data

FR-5 Training/Eval

FR-6 YOLO26 Integration

7. Non-Functional Requirements

8. Architecture

9. Data Plan

10. Delivery Phases

Phase 1 (this pass)

Phase 2

Phase 3

Phase 4

11. Risks

12. Acceptance Criteria (Phase 1)

13. Dependencies

14. Open Decisions

FilesExpand file tree

PRD.md

Latest commit

History

PRD.md

File metadata and controls

KELPIE: PBSeg for UAV Defense — Product Requirements Document

Document Control

1. Executive Summary

2. Problem Statement

3. Goals and Non-Goals

Goals

Non-Goals

4. Paper Verification Outcome (Gate)

5. Users and Use Cases

Primary Users

Key Use Cases

6. Functional Requirements

FR-1 Configuration

FR-2 Model

FR-3 Dual Compute

FR-4 Data

FR-5 Training/Eval

FR-6 YOLO26 Integration

7. Non-Functional Requirements

8. Architecture

9. Data Plan

10. Delivery Phases

Phase 1 (this pass)

Phase 2

Phase 3

Phase 4

11. Risks

12. Acceptance Criteria (Phase 1)

13. Dependencies

14. Open Decisions