- Title: Learning Motion Blur Robust Vision Transformers for Real-Time UAV Tracking
- ArXiv: 2407.05383
- URL: https://arxiv.org/abs/2407.05383
- Authors: You Wu, Xucheng Wang, Dan Zeng, Hengzhou Ye, Xiaolan Xie, Qijun Zhao, Shuiwang Li
Readiness: ALMOST
Why not READY:
- The public repo exists but lacks the implementation needed to reproduce the paper's results.
- An audit of the shared dataset volume did not find the paper's datasets locally.
- The exact training recipe is inherited from Aba-ViTrack and still has to be recovered.
| Model | Size | Source | Path on Server | Status |
|---|---|---|---|---|
| deit_tiny_patch16_224 | ~5.8M params | timm / DeiT upstream | /mnt/forge-data/models/deit_tiny_patch16_224.safetensors | MISSING |
| vit_tiny_patch16_224 | ~5.7M params | timm / ViT upstream | /mnt/forge-data/models/vit_tiny_patch16_224.safetensors | MISSING |
| t2t_vit_t_14 | lightweight ViT | T2T-ViT upstream | /mnt/forge-data/models/t2t_vit_t_14.safetensors | MISSING |
| yolo26m.pt | detector backbone | internal YOLO26 baseline | /mnt/forge-data/models/yolo26/yolo26m.pt | MISSING |
| yolo26m-uav.pt | detector adaptation | internal YOLO26 UAV fine-tune | /mnt/forge-data/models/yolo26/yolo26m-uav.pt | MISSING |
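The Status column can be regenerated from the filesystem rather than edited by hand. A minimal sketch, assuming the checkpoints are plain files at the listed server paths; the `audit_paths` helper and the `PRESENT` label are hypothetical and not part of any existing tooling:

```python
from pathlib import Path

# Hypothetical audit helper; paths are copied from the model table above.
MODEL_PATHS = {
    "deit_tiny_patch16_224": "/mnt/forge-data/models/deit_tiny_patch16_224.safetensors",
    "vit_tiny_patch16_224": "/mnt/forge-data/models/vit_tiny_patch16_224.safetensors",
    "t2t_vit_t_14": "/mnt/forge-data/models/t2t_vit_t_14.safetensors",
    "yolo26m.pt": "/mnt/forge-data/models/yolo26/yolo26m.pt",
    "yolo26m-uav.pt": "/mnt/forge-data/models/yolo26/yolo26m-uav.pt",
}


def audit_paths(paths: dict[str, str]) -> dict[str, str]:
    """Map each name to PRESENT if its path is an existing regular file, else MISSING."""
    return {
        name: "PRESENT" if Path(path).is_file() else "MISSING"
        for name, path in paths.items()
    }


if __name__ == "__main__":
    for name, status in audit_paths(MODEL_PATHS).items():
        print(f"{name}: {status}")
```

The dataset entries appear to be directories, so the same check would use `is_dir()` instead of `is_file()` for them.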
| Dataset | Size | Split | Source | Path | Status |
|---|---|---|---|---|---|
| MegaUAV-1.8M | ~1.8M images | train/val/test | internal | /Volumes/AIFlowDev/RobotFlowLabs/datasets/MegaUAV-1.8M | MISSING |
| UAV123 | benchmark | test | public benchmark | /Volumes/AIFlowDev/RobotFlowLabs/datasets/UAV123 | MISSING |
| UAV123@10fps | derived benchmark | test | downsampled from UAV123 | /Volumes/AIFlowDev/RobotFlowLabs/datasets/UAV123_10fps | MISSING |
| VisDrone2018 | benchmark | test | public benchmark | /Volumes/AIFlowDev/RobotFlowLabs/datasets/VisDrone2018 | MISSING |
| UAVDT | benchmark | test | public benchmark | /Volumes/AIFlowDev/RobotFlowLabs/datasets/UAVDT | MISSING |
| DroneVehicle | adaptation | train/val | public benchmark | /Volumes/AIFlowDev/RobotFlowLabs/datasets/DroneVehicle | MISSING |
| SeaDronesSee | adaptation | train/val | public benchmark | /Volumes/AIFlowDev/RobotFlowLabs/datasets/SeaDronesSee | MISSING |
| UAVTrack112_L | real-world eval | eval | paper real-world set | /Volumes/AIFlowDev/RobotFlowLabs/datasets/UAVTrack112_L | MISSING |
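The UAV123@10fps row is derived from UAV123 rather than obtained separately. A minimal sketch of that derivation, assuming the source sequences are recorded at 30 fps and the 10 fps variant keeps every third frame together with its annotation; `subsample` and `subsample_sequence` are hypothetical names:

```python
from typing import Sequence, TypeVar

T = TypeVar("T")


def subsample(items: Sequence[T], src_fps: int = 30, dst_fps: int = 10) -> list[T]:
    """Keep every (src_fps // dst_fps)-th item, starting with the first."""
    if dst_fps <= 0 or src_fps % dst_fps != 0:
        raise ValueError("src_fps must be a positive integer multiple of dst_fps")
    step = src_fps // dst_fps
    return list(items[::step])


def subsample_sequence(frames, boxes, src_fps=30, dst_fps=10):
    """Subsample frames and ground-truth boxes with the same stride so they stay aligned."""
    if len(frames) != len(boxes):
        raise ValueError("frames and boxes must have the same length")
    return subsample(frames, src_fps, dst_fps), subsample(boxes, src_fps, dst_fps)


frames = [f"{i:06d}.jpg" for i in range(1, 10)]
boxes = [(i, i, 10, 10) for i in range(1, 10)]
sub_frames, sub_boxes = subsample_sequence(frames, boxes)
print(sub_frames)  # ['000001.jpg', '000004.jpg', '000007.jpg']
```

If a reference copy of UAV123@10fps is obtained, comparing its frame lists against this output would confirm or refute the every-third-frame assumption.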
| Param | Value | Paper Section |
|---|---|---|
| template_size | 128 x 128 | §4.1 |
| search_size | 256 x 256 | §4.1 |
| loss_cls | weighted focal loss | §3.4 |
| loss_box | L1 + GIoU | §3.4 |
| eta_iou | 2.0 | Eq. 7 |
| eta_l1 | 5.0 | Eq. 7 |
| rho | 1e-4 | Eq. 7, Table 5 |
| gamma | 1e3 | Eq. 7 |
| epsilon | 0.05 | §3.3 |
| exit_threshold | 0.95 | §3.3 |
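The box-regression part of the loss can be sketched as follows. This assumes Eq. 7 weights a GIoU loss by eta_iou and an L1 loss by eta_l1, as is common in one-stream ViT trackers such as OSTrack, with boxes given as (x1, y1, x2, y2) in normalized coordinates. The weighted focal classification term is not implemented, and rho and gamma are left out because the table does not say how they enter Eq. 7. All function names are hypothetical.

```python
from typing import Sequence

ETA_IOU = 2.0  # eta_iou from the parameter table (assumed to weight the GIoU term)
ETA_L1 = 5.0   # eta_l1 from the parameter table (assumed to weight the L1 term)


def area(box: Sequence[float]) -> float:
    """Area of an (x1, y1, x2, y2) box; zero if the box is empty."""
    x1, y1, x2, y2 = box
    return max(0.0, x2 - x1) * max(0.0, y2 - y1)


def giou(pred: Sequence[float], gt: Sequence[float]) -> float:
    """Generalized IoU of two (x1, y1, x2, y2) boxes with positive area."""
    ix1, iy1 = max(pred[0], gt[0]), max(pred[1], gt[1])
    ix2, iy2 = min(pred[2], gt[2]), min(pred[3], gt[3])
    inter = area((ix1, iy1, ix2, iy2))
    union = area(pred) + area(gt) - inter
    # Smallest axis-aligned box enclosing both inputs.
    enclose = area((min(pred[0], gt[0]), min(pred[1], gt[1]),
                    max(pred[2], gt[2]), max(pred[3], gt[3])))
    return inter / union - (enclose - union) / enclose


def box_loss(pred, gt, eta_iou: float = ETA_IOU, eta_l1: float = ETA_L1) -> float:
    """eta_iou * (1 - GIoU) + eta_l1 * mean absolute coordinate error."""
    giou_term = 1.0 - giou(pred, gt)
    l1_term = sum(abs(p - g) for p, g in zip(pred, gt)) / 4.0
    return eta_iou * giou_term + eta_l1 * l1_term


# Under these assumptions the total loss would be loss_cls (not shown) + box_loss.
print(box_loss((0.0, 0.0, 1.0, 1.0), (0.0, 0.0, 1.0, 1.0)))  # 0.0
```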
| Param | Status | Note |
|---|---|---|
| optimizer | MISSING | Paper says the training pipeline follows Aba-ViTrack |
| batch_size | MISSING | Not restated in the BDTrack text |
| schedule | MISSING | Not restated in the BDTrack text |
| nenf | MISSING | Concept is described, but the exact deployed value is not given in the paper text |
| lambda_weight | MISSING | Equation includes lambda, but the text does not give the final scalar |
| tau | MISSING | Equation includes tau, but the final constant is not stated in the available paper text |
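One way to keep these gaps visible is to carry the unresolved settings as `None` in the training config and refuse to start Phase 2 until they are filled. A minimal sketch; the `BDTrackConfig` class and the `unresolved` and `require_resolved` helpers are hypothetical, the stated values are copied from the parameter table, and the types chosen for the missing settings (especially `nenf`) are guesses:

```python
from dataclasses import dataclass, fields
from typing import Any, Optional


@dataclass
class BDTrackConfig:
    # Values stated in the paper (parameter table above).
    template_size: int = 128
    search_size: int = 256
    eta_iou: float = 2.0
    eta_l1: float = 5.0
    rho: float = 1e-4
    gamma: float = 1e3
    epsilon: float = 0.05
    exit_threshold: float = 0.95
    # Not given in the BDTrack text; None until recovered from Aba-ViTrack or re-derived.
    optimizer: Optional[str] = None
    batch_size: Optional[int] = None
    schedule: Optional[str] = None
    nenf: Optional[Any] = None  # type unknown
    lambda_weight: Optional[float] = None
    tau: Optional[float] = None


def unresolved(cfg: BDTrackConfig) -> list[str]:
    """Names of settings that are still None, in declaration order."""
    return [f.name for f in fields(cfg) if getattr(cfg, f.name) is None]


def require_resolved(cfg: BDTrackConfig) -> None:
    """Raise before Phase 2 if any setting is still missing."""
    missing = unresolved(cfg)
    if missing:
        raise ValueError(f"unresolved settings: {', '.join(missing)}")
```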
| Benchmark | Metric | Paper Value | Our Target |
|---|---|---|---|
| four-benchmark average | precision | 84.4 | >= 83.0 (bootstrap reproduction) |
| four-benchmark average | success | 64.5 | >= 63.0 (bootstrap reproduction) |
| UAVDT | precision | 84.1 | >= 82.0 |
| UAVDT | success | 61.0 | >= 59.0 |
| VisDrone2018 | precision | 85.2 | >= 83.0 |
| VisDrone2018 | success | 64.3 | >= 62.0 |
| UAV123 | precision | 84.8 | >= 83.0 |
| UAV123 | success | 66.7 | >= 65.0 |
| UAV123@10fps | precision | 83.5 | >= 82.0 |
| UAV123@10fps | success | 65.9 | >= 64.0 |
| efficiency | GPU FPS | 283.4 | >= 240 on modern CUDA path |
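The four-benchmark averages in the table match the unweighted mean of the four per-benchmark values (84.4 precision; 64.475 success, reported as 64.5). A minimal sketch of a pass/fail check against the targets, assuming that unweighted mean; the dictionaries and `check_targets` are hypothetical, the numbers are copied from the table, and the FPS target is left out because it depends on the hardware:

```python
BENCHMARKS = ("UAVDT", "VisDrone2018", "UAV123", "UAV123@10fps")

# Paper values from the table above.
PAPER = {
    "UAVDT": {"precision": 84.1, "success": 61.0},
    "VisDrone2018": {"precision": 85.2, "success": 64.3},
    "UAV123": {"precision": 84.8, "success": 66.7},
    "UAV123@10fps": {"precision": 83.5, "success": 65.9},
}

# Reproduction targets from the table above; "average" is the four-benchmark mean.
TARGETS = {
    "average": {"precision": 83.0, "success": 63.0},
    "UAVDT": {"precision": 82.0, "success": 59.0},
    "VisDrone2018": {"precision": 83.0, "success": 62.0},
    "UAV123": {"precision": 83.0, "success": 65.0},
    "UAV123@10fps": {"precision": 82.0, "success": 64.0},
}


def four_benchmark_average(results: dict[str, dict[str, float]]) -> dict[str, float]:
    """Unweighted mean of each metric over the four benchmarks."""
    return {
        metric: sum(results[name][metric] for name in BENCHMARKS) / len(BENCHMARKS)
        for metric in ("precision", "success")
    }


def check_targets(results: dict[str, dict[str, float]]) -> list[str]:
    """Return one message per metric below its target; an empty list means pass."""
    measured = dict(results, average=four_benchmark_average(results))
    failures = []
    for name, metrics in TARGETS.items():
        for metric, target in metrics.items():
            value = measured[name][metric]
            if value < target:
                failures.append(f"{name} {metric}: {value:.1f} < {target}")
    return failures


print(check_targets(PAPER))  # []
```

Feeding measured results in the same shape as `PAPER` lists every metric that misses its target, which keeps the per-benchmark and averaged criteria in one place.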
- Public GitHub repo currently lacks the implementation needed for a direct reproduction.
- The paper's implementation details are partly inherited from Aba-ViTrack, so FOXHOUND must recover or re-derive the missing training settings before Phase 2.
- YOLO26 is an adaptation layer for ANIMA deployment, not part of the original BDTrack paper.