A Curated Collection of Vision-Language Navigation for Unmanned Aerial Vehicles
Organizing and showcasing state-of-the-art technologies in aerial robotics: Vision-Language Navigation (VLN), Aerial Instruction Following, UAV-Environment Interaction, and LLM-driven Drone Navigation.
We believe autonomous aerial navigation will experience its own "ChatGPT moment."
| Section | Description |
|---|---|
| Survey | Comprehensive surveys and reviews on UAV-VLN |
| UAV-VLN Models | Vision-Language Navigation for drones and aerial robots |
| End-to-End UAV-VLN Models | Trainable models that directly map vision+language to actions |
| Semantic Cognitive Mapping & Zero Shot | LLM/VLM-based semantic map construction for zero-shot UAV navigation |
| Benchmarks & Datasets | Evaluation benchmarks & metrics for UAV-VLN |
| Simulators | Simulation platforms for aerial robots |
| Sim-to-Real | Bridging simulation and real-world deployment |
| Related Works | Other awesome lists & related resources |
Note
Paper Ordering: Within each year, papers are generally listed in reverse chronological order (newer papers appear first). Particularly influential or representative works may be highlighted at the top regardless of date.
Tip
Contributing: This repository is continuously updated! If you have papers, projects, or resources not yet included, please submit a Pull Request or open an Issue. Let's build a comprehensive resource for the aerial robotics and AI community!
Entry Template:
- **[ShortName] Full Paper Title** - Author et al. *Conference Year*
  - [Paper](link) | [Code](link) | [Project](link)

Comprehensive surveys and review papers covering the landscape of UAV-VLN.
- [2026] Vision-Language Navigation for Aerial Robots: Towards the Era of Large Language Models
  - Paper
- [2025] AeroVerse-Review: Comprehensive survey on aerial embodied vision-and-language navigation
  - Paper
A complete collection of UAV-VLN methods
- [NavDreamer] NavDreamer: Video Models as Zero-Shot 3D Navigators - Huang et al.
- [OnFly] OnFly: Onboard Zero-Shot Aerial Vision-Language Navigation toward Safety and Efficiency - Zheng et al.
- [AerialVLA] AerialVLA: A Vision-Language-Action Model for UAV Navigation via Minimalist End-to-End Control - Xu et al.
- [AutoFly] AutoFly: Vision-Language-Action Model for UAV Autonomous Navigation in the Wild - Sun et al. ICLR 2026
- [Fly0] Fly0: Decoupling Semantic Grounding from Geometric Planning for Zero-Shot Aerial Navigation - Xu et al.
- [AirHunt] AirHunt: Bridging VLM Semantics and Continuous Planning for Efficient Aerial Object Navigation - Chen et al.
  - Paper
- [APEX] APEX: A Decoupled Memory-based Explorer for Asynchronous Aerial Object Goal Navigation - Zhang et al. CVPR 2026
- [IndoorUAV] IndoorUAV: Benchmarking Vision-Language UAV Navigation in Continuous Indoor Environments - Liu et al. AAAI 2026
- [OpenFly] OpenFly: A Comprehensive Platform for Aerial Vision-Language Navigation - Gao et al. ICLR 2026
- [HETT] History-Enhanced Two-Stage Transformer for Aerial Vision-and-Language Navigation - Ding et al. AAAI 2026
- [LongFly] LongFly: Long-Horizon UAV Vision-and-Language Navigation with Spatiotemporal Context Integration - Jiang et al. ICLR 2026
  - Paper
- [See, Point, Fly] See, Point, Fly: A Learning-Free VLM Framework for Universal Unmanned Aerial Navigation - Hu et al. CoRL 2025
- [CityNavAgent] CityNavAgent: Aerial Vision-and-Language Navigation with Hierarchical Semantic Planning and Global Memory - Zhang et al. ACL 2025
- [CityNav] CityNav: A Large-Scale Dataset for Real-World Aerial Navigation - Lee et al. ICCV 2025
- [OpenUAV] Towards Realistic UAV Vision-Language Navigation: Platform, Benchmark, and Methodology - Wang et al. ICLR 2025
  - Paper | Code
  - UAV-VLN platform featuring 6-DoF continuous trajectories, an assistant-guided benchmark, and MLLM-based hierarchical navigation across 22 diverse scenes (urban, rural, forest, desert).
  - Method: End-to-end | Action: Discrete (DoF = 4) | Backbone: Llama
  - Env: AirSim + UE4 | Dataset: OpenUAV
- [AeroDuo] AeroDuo: Aerial Duo for UAV-based Vision and Language Navigation - Wu et al. ACM MM 2025
- [NavAgent] NavAgent: Multi-scale Urban Street View Fusion for UAV Embodied Vision-and-Language Navigation - Liu et al.
  - Paper
- [PIVOT] PIVOT: Iterative Visual Prompting Elicits Actionable Knowledge for VLMs - Nasiriany et al.
- [EmbodiedCity] EmbodiedCity: A Benchmark Platform for Embodied Agent in Real-world City Environment - Gao et al.
- [TypeFly] TypeFly: Flying Drones with Large Language Model - Chen et al.
- [AerialVLN] AerialVLN: Vision-and-Language Navigation for UAVs - Liu et al. ICCV 2023
- [AVDN] Aerial Vision-and-Dialog Navigation - Fan et al. ACL 2023
Trainable models that directly map vision+language to actions
- [AerialVLA] AerialVLA: A Vision-Language-Action Model for UAV Navigation via Minimalist End-to-End Control - Xu et al.
- [AutoFly] AutoFly: Vision-Language-Action Model for UAV Autonomous Navigation in the Wild - Sun et al. ICLR 2026
- [APEX] APEX: A Decoupled Memory-based Explorer for Asynchronous Aerial Object Goal Navigation - Zhang et al. CVPR 2026
- [IndoorUAV] IndoorUAV: Benchmarking Vision-Language UAV Navigation in Continuous Indoor Environments - Liu et al. AAAI 2026
- [OpenFly] OpenFly: A Comprehensive Platform for Aerial Vision-Language Navigation - Gao et al. ICLR 2026
- [HETT] History-Enhanced Two-Stage Transformer for Aerial Vision-and-Language Navigation - Ding et al. AAAI 2026
- [LongFly] LongFly: Long-Horizon UAV Vision-and-Language Navigation with Spatiotemporal Context Integration - Jiang et al. ICLR 2026
  - Paper
- [CityNav] CityNav: A Large-Scale Dataset for Real-World Aerial Navigation - Lee et al. ICCV 2025
- [OpenUAV] Towards Realistic UAV Vision-Language Navigation: Platform, Benchmark, and Methodology - Wang et al. ICLR 2025
- [AeroDuo] AeroDuo: Aerial Duo for UAV-based Vision and Language Navigation - Wu et al. ACM MM 2025
- [NavAgent] NavAgent: Multi-scale Urban Street View Fusion for UAV Embodied Vision-and-Language Navigation - Liu et al.
  - Paper
- [EmbodiedCity] EmbodiedCity: A Benchmark Platform for Embodied Agent in Real-world City Environment - Gao et al.
- [AerialVLN] AerialVLN: Vision-and-Language Navigation for UAVs - Liu et al. ICCV 2023
- [AVDN] Aerial Vision-and-Dialog Navigation - Fan et al. ACL 2023
Methods that explicitly build semantic-aware spatial representations or propose a pipeline for zero-shot or training-free navigation.
- [NavDreamer] NavDreamer: Video Models as Zero-Shot 3D Navigators - Huang et al.
  - Paper
- [OnFly] OnFly: Onboard Zero-Shot Aerial Vision-Language Navigation toward Safety and Efficiency - Zheng et al.
- [Fly0] Fly0: Decoupling Semantic Grounding from Geometric Planning for Zero-Shot Aerial Navigation - Xu et al.
- [AirHunt] AirHunt: Bridging VLM Semantics and Continuous Planning for Efficient Aerial Object Navigation - Chen et al.
  - Paper
- [See, Point, Fly] See, Point, Fly: A Learning-Free VLM Framework for Universal Unmanned Aerial Navigation - Hu et al. CoRL 2025
- [CityNavAgent] CityNavAgent: Aerial Vision-and-Language Navigation with Hierarchical Semantic Planning and Global Memory - Zhang et al. ACL 2025
- [PIVOT] PIVOT: Iterative Visual Prompting Elicits Actionable Knowledge for VLMs - Nasiriany et al.
- [TypeFly] TypeFly: Flying Drones with Large Language Model - Chen et al.
Evaluation benchmarks and datasets specific to UAV-VLN.
- [OpenFly] OpenFly: A Comprehensive Platform for Aerial Vision-Language Navigation - Gao et al. ICLR 2026
- [CityNav] CityNav: A Large-Scale Dataset for Real-World Aerial Navigation - Lee et al. ICCV 2025
- [OpenUAV] Towards Realistic UAV Vision-Language Navigation: Platform, Benchmark, and Methodology - Wang et al. ICLR 2025
- [AeroDuo] AeroDuo: Aerial Duo for UAV-based Vision and Language Navigation - Wu et al. ACM MM 2025
- [EmbodiedCity] EmbodiedCity: A Benchmark Platform for Embodied Agent in Real-world City Environment - Gao et al.
- [AerialVLN] AerialVLN: Vision-and-Language Navigation for UAVs - Liu et al. ICCV 2023
- [AVDN] Aerial Vision-and-Dialog Navigation - Fan et al. ACL 2023
| Dataset | Environment | Scale | Instruction Granularity | Sensor | Type | Notes |
|---|---|---|---|---|---|---|
| AerialVLN/S | 25 urban scenes | 8,446/3,916 | Step-by-step | RGB-D | Virtual | Early 3D, pilot trajectories |
| CityNav | Cambridge + Birmingham | 32,637 | Step-by-step | RGB-D | Real | Human demo trajectories |
| OpenFly | 18 scenes | 100k | High-level | RGB-D, LiDAR, PC | Real→Virtual | Multiple engines, cross-scene |
| AVDN | Global satellite | 3,064 | Dialog-based / Mixed | RGB | Real | xView satellite |
| HaL-13k | OpenUAV | 13k | High-level | RGB, LiDAR, PC | Virtual | Multi-UAV |
| OpenUAV | 22 diverse scenes | 12k | Hybrid | RGB-D (5), LiDAR, IMU, GPS | Virtual | Urban, rural, forest, desert |
Note
- Scale: Number of trajectories.
- Instruction Granularity: Level of detail in the instructions (for example, step-by-step vs. high-level).
- Type: Virtual (simulation only) / Real (real-world captured) / Real→Virtual (both).
- Notes: Licensing, access requirements, special features.
Simulation platforms for aerial robot navigation and interaction.
| Simulator | Virtual/Real | Engine | Environment | Key Features | Sensors |
|---|---|---|---|---|---|
| AerialVLN | Virtual | UE4 + AirSim | 25 urban scenes | First UAV-VLN simulator, 870+ objects, 8 actions (4-DoF) | RGB-D |
| OpenUAV | Virtual | UE4 + AirSim | 22 suburban/natural | Urban, rural, forest, desert; 6-DoF | RGB-D, IMU, GPS |
| OpenFly | Real→Virtual | UE4/5, GTA-V, GE, 3DGS | 18 scenes, 150+ km² | Unified API, cross-scene generalization | RGB, D |
| CityNav | Real (recon) | WebGL (Potree) | Cambridge + Birmingham (8.7 km²) | 3D reconstruction, GSM | RGB, D (PC) |
| EmbodiedCity | Real (recon) | UE5 + AirSim | Beijing CBD | Dynamic (pedestrians, vehicles), high detail | RGB, Depth, Segmentation, IMU, GPS |
Note
- Virtual: Fully simulated environment (software only).
- Real: Real-world platform or dataset (hardware involved).
- Real→Virtual: Real-world data reconstructed into simulation.
Techniques and case studies for transferring policies from simulation to real-world drones.
Help Wanted: Contribute Sim-to-Real Resources
This area is currently under active development. If you know of relevant papers, projects, or case studies on Sim-to-Real for UAV Vision-Language Navigation, please consider contributing!
- Papers: Any work addressing the gap between simulation and real-world drone deployment in language-guided navigation.
- Techniques: Domain randomization, sim-to-real adaptation, or system identification methods specifically for UAVs.
- Real-world Demos: Videos or open-source code repositories demonstrating successful real-world transfer.

Please submit via Issue or Pull Request. Thank you!
Built with ❤️ for the UAV-VLN community.
Inspired by Awesome Embodied VLA / VA / VLN.