
earth-insights/awesome-uav-vln


๐Ÿš Awesome UAV-VLN

A Curated Collection of Vision-Language Navigation for Unmanned Aerial Vehicles



Organizing and showcasing state-of-the-art technologies in aerial robotics: Vision-Language Navigation (VLN), Aerial Instruction Following, UAV-Environment Interaction, and LLM-driven Drone Navigation.

We believe autonomous aerial navigation will experience its own "ChatGPT moment."



📋 Table of Contents

| Section | Description |
|---|---|
| 📚 Survey | Comprehensive surveys and reviews on UAV-VLN |
| 🚁 UAV-VLN Models | Vision-Language Navigation for drones and aerial robots |
| 🧠 End-to-End UAV-VLN Models | Trainable models that directly map vision and language inputs to actions |
| 🗺️ Semantic Cognitive Mapping & Zero Shot | LLM/VLM-based semantic map construction for zero-shot UAV navigation |
| 📊 Benchmarks & Datasets | Evaluation benchmarks and metrics for UAV-VLN |
| 🎮 Simulators | Simulation platforms for aerial robots |
| 🔄 Sim-to-Real | Bridging simulation and real-world deployment |
| 🔗 Related Works | Other awesome lists and related resources |

Note

Paper Ordering: Within each year, papers are generally listed in reverse chronological order (newer papers appear first). Particularly influential or representative works may be highlighted at the top regardless of date.

Tip

Contributing: This repository is continuously updated! If you have papers, projects, or resources not yet included, please submit a Pull Request or open an Issue. Let's build a comprehensive resource for the aerial robotics and AI community!

Entry Template:

- **[ShortName] Full Paper Title** - Author et al. *Conference Year*
  - ๐Ÿ“„ [Paper](link) | ๐Ÿ’ป [Code](link) | ๐ŸŽฅ [Project](link)
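If you keep paper metadata in a script or spreadsheet, entries can be rendered into the template programmatically. The following is a minimal sketch, not a tool shipped with this repository: `format_entry` and its parameters are hypothetical, `first_author` is assumed to hold only the first author's surname, and link labels other than Paper, Code, and Project fall back to a 🔗 icon.

```python
def format_entry(short_name, title, first_author, venue="", links=None):
    """Render one list entry following the Entry Template (hypothetical helper).

    links: optional dict mapping a label ("Paper", "Code", "Project") to a URL.
    """
    icons = {"Paper": "📄", "Code": "💻", "Project": "🎥"}
    line = f"- **[{short_name}] {title}** - {first_author} et al."
    if venue:  # e.g. "ICLR 2026"; omitted for preprints without a venue
        line += f" *{venue}*"
    if links:
        # Dict insertion order decides the order of the links.
        parts = [f"{icons.get(label, '🔗')} [{label}]({url})"
                 for label, url in links.items()]
        line += "\n  - " + " | ".join(parts)
    return line


# Placeholder URL for illustration only.
print(format_entry("OpenFly",
                   "OpenFly: A comprehensive platform for aerial vision-language navigation",
                   "Gao", "ICLR 2026", {"Paper": "https://example.org/paper"}))
```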

📚 Survey

Comprehensive surveys and review papers covering the landscape of UAV-VLN.

  • [2026] Vision-Language Navigation for Aerial Robots: Towards the Era of Large Language Models
  • [2025] AeroVerse-Review: Comprehensive survey on aerial embodied vision-and-language navigation [Paper]

๐Ÿš UAV-VLN Models

A complete collection of UAV-VLN methods

2026

  • [NavDreamer] NavDreamer: Video Models as Zero-Shot 3D Navigators - Huang et al.

    • 📄 Paper | 🌐 Web
    • 📝 Leverages a video generation model (Wan2.6) for future-view planning and π³ for trajectory waypoint extraction.
    • 🔧 Method: Zero Shot | Backbone: Qwen3-VL
  • [OnFly] OnFly: Onboard Zero-Shot Aerial Vision-Language Navigation toward Safety and Efficiency - Zheng et al.

    • 📄 Paper | 💻 Code
    • 📝 Leverages dual-agent decoupling, hybrid memory for progress monitoring, and semantic-geometric verification for safe goal refinement.
    • 🔧 Method: Zero Shot | Backbone: Qwen3-VL-4B-AWQ | 🌐 Env: UE4 + AirSim
  • [AerialVLA] AerialVLA: A Vision-Language-Action Model for UAV Navigation via Minimalist End-to-End Control - Xu et al.

    • 📄 Paper | 💻 Code
    • 🔧 Method: End-to-end | Backbone: OpenVLA | 🌐 Env: OpenUAV
  • [AutoFly] AutoFly: Vision-Language-Action Model for UAV Autonomous Navigation in the Wild - Sun et al. ICLR 2026

    • 📄 Paper | 💻 Code
    • 📝 Integrates a pseudo-depth encoder to enhance spatial reasoning from RGB-only inputs for end-to-end VLA navigation.
    • 🔧 Method: End-to-end | Backbone: LLaMA2-7B
  • [Fly0] Fly0: Decoupling Semantic Grounding from Geometric Planning for Zero-Shot Aerial Navigation - Xu et al.

    • 📄 Paper | 💻 Code
    • 📝 Decouples MLLM-based semantic grounding from geometric planning, using 2D→3D projection and Ego-Planner for zero-shot aerial navigation.
    • 🔧 Method: GSM
  • [AirHunt] AirHunt: Bridging VLM Semantics and Continuous Planning for Efficient Aerial Object Navigation - Chen et al.

  • [APEX] APEX: A Decoupled Memory-based Explorer for Asynchronous Aerial Object Goal Navigation - Zhang et al. CVPR 2026

    • 📄 Paper | 💻 Code
    • 📝 Employs a 3D dynamic spatio-semantic map with attraction, exploration, and obstacle channels, decoupled RL-based action decisions, and parallel asynchronous execution for efficient aerial object goal navigation.
    • 🔧 Method: GSM+RL
  • [IndoorUAV] IndoorUAV: Benchmarking Vision-Language UAV Navigation in Continuous Indoor Environments - Liu et al. AAAI 2026

  • [OpenFly] OpenFly: A comprehensive platform for aerial vision-language navigation - Gao et al. ICLR 2026

    • 📄 Paper | 💻 Code
    • 📝 First aerial VLN platform with automated data generation across multiple rendering engines and a keyframe-aware agent.
    • 🌐 Env: OpenFly | Dataset: OpenFly
    • 🔧 Method: End-to-end | Action: Discrete (DOF: 4) | Backbone: Llama
  • [HETT] History-Enhanced Two-Stage Transformer for Aerial Vision-and-Language Navigation - Ding et al. AAAI 2026

    • 📄 Paper | 💻 Code
    • 🔧 Method: End-to-end
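Decoupled zero-shot pipelines such as Fly0 ground a target in the image and then hand a 3D goal to a geometric planner. The generic step in between is pinhole back-projection, sketched below. This illustrates the standard technique and is not Fly0's implementation: it assumes a metric depth value is available at the chosen pixel, known intrinsics `fx, fy, cx, cy`, and a 4×4 camera-to-world pose; the function name `backproject_pixel` is hypothetical.

```python
def backproject_pixel(u, v, depth, fx, fy, cx, cy, cam_to_world):
    """Lift pixel (u, v) with metric depth (m) to a 3D point in the world frame.

    Assumes a pinhole camera with OpenCV axes (x right, y down, z forward)
    and a 4x4 homogeneous camera-to-world transform given as nested lists.
    """
    if depth <= 0:
        raise ValueError("depth must be positive")
    # Invert the pinhole projection u = fx * x / z + cx, v = fy * y / z + cy.
    p_cam = ((u - cx) * depth / fx, (v - cy) * depth / fy, depth, 1.0)
    # Apply the rigid transform and drop the homogeneous coordinate.
    return tuple(sum(cam_to_world[r][c] * p_cam[c] for c in range(4))
                 for r in range(3))


IDENTITY = [[1.0, 0.0, 0.0, 0.0],
            [0.0, 1.0, 0.0, 0.0],
            [0.0, 0.0, 1.0, 0.0],
            [0.0, 0.0, 0.0, 1.0]]
# A pixel 100 px right of the principal point, 2 m away, with fx = 100 px.
goal = backproject_pixel(150, 50, 2.0, 100.0, 100.0, 50.0, 50.0, IDENTITY)
```

Here `goal` is `(2.0, 0.0, 2.0)`. Any conversion from the camera's optical frame to the drone's body or world convention is folded into `cam_to_world`, so the planner receives a point in its own frame.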

2025

  • [LongFly] LongFly: Long-Horizon UAV Vision-and-Language Navigation with Spatiotemporal Context Integration - Jiang et al. ICLR 2026

  • [See,Point,Fly] See, Point, Fly: A Learning-Free VLM Framework for Universal Unmanned Aerial Navigation - Hu et al. CoRL 2025

  • [CityNavAgent] CityNavAgent: Aerial Vision-and-Language Navigation with Hierarchical Semantic Planning and Global Memory - Zhang et al. ACL 2025

  • [CityNav] CityNav: Language-Goal Aerial Navigation Dataset with Geographic Information - Lee et al. ICCV 2025

    • 📄 Paper | 💻 Code
    • 📝 First large-scale real-world dataset for aerial VLN.
    • 🌐 Env: CityFlight | Dataset: CityNav
    • 💡 Novel: Geographic Semantic Map
  • [OpenUAV] Towards Realistic UAV Vision-Language Navigation: Platform, Benchmark, and Methodology - Wang et al. ICLR 2025

    • 📄 Paper | 💻 Code
    • 📝 UAV VLN platform featuring 6-DoF continuous trajectories, an assistant-guided benchmark, and MLLM-based hierarchical navigation across 22 diverse scenes (urban, rural, forest, desert).
    • 🔧 Method: End-to-end | Action: Discrete (DOF: 4) | Backbone: Llama
    • 🌐 Env: AirSim + UE4 | Dataset: OpenUAV
  • [AeroDuo] AeroDuo: Aerial Duo for UAV-based Vision and Language Navigation - Wu et al. ACM MM 2025

2024

  • [NavAgent] NavAgent: Multi-scale Urban Street View Fusion For UAV Embodied Vision-and-Language Navigation - Liu et al.

  • [PIVOT] PIVOT: Iterative Visual Prompting Elicits Actionable Knowledge for VLMs - Nasiriany et al.

  • [EmbodiedCity] EmbodiedCity: A Benchmark Platform for Embodied Agent in Real-world City Environment - Gao et al.

  • [TypeFly] TypeFly: Flying Drones with Large Language Model - Chen et al.

2023

  • [AerialVLN] Vision-and-Language Navigation for UAVs - Zhang et al. ICCV 2023

    • 📄 Paper | 💻 Code
    • 📝 Proposes the first large-scale UAV-VLN benchmark, including a simulator, dataset, and baselines.
    • 🔧 Method: End-to-end | Action: Discrete (DOF: 4) | Backbone: CMA
    • 🌐 Env: AirSim + UE4 | Dataset: AerialVLN/AerialVLN-S
  • [AVDN] Aerial Vision-and-Dialog Navigation - Fan et al. ACL 2023
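Several entries above (AerialVLN, OpenUAV, OpenFly) list a discrete action space with 4 DoF: the agent's pose is tracked as position (x, y, z) plus yaw, with roll and pitch not commanded. The sketch below shows how an eight-action set of this kind can update such a pose. The action names, step sizes, and axis convention are assumptions for illustration, not any benchmark's actual values.

```python
import math

# Hypothetical step sizes for illustration only; each benchmark defines its own.
FORWARD_M, LATERAL_M, VERTICAL_M, TURN_DEG = 5.0, 5.0, 2.0, 15.0

ACTIONS = ("stop", "forward", "turn_left", "turn_right",
           "ascend", "descend", "left", "right")


def step(state, action):
    """Apply one discrete action to a 4-DoF pose (x, y, z, yaw_deg).

    Assumed convention: x east, y north, z up, yaw in degrees
    counter-clockwise from +x.
    """
    if action not in ACTIONS:
        raise ValueError(f"unknown action: {action}")
    x, y, z, yaw = state
    h = math.radians(yaw)
    if action == "forward":
        x, y = x + FORWARD_M * math.cos(h), y + FORWARD_M * math.sin(h)
    elif action == "left":   # translate along heading + 90 degrees
        x, y = x - LATERAL_M * math.sin(h), y + LATERAL_M * math.cos(h)
    elif action == "right":  # translate along heading - 90 degrees
        x, y = x + LATERAL_M * math.sin(h), y - LATERAL_M * math.cos(h)
    elif action == "ascend":
        z += VERTICAL_M
    elif action == "descend":
        z -= VERTICAL_M
    elif action == "turn_left":
        yaw = (yaw + TURN_DEG) % 360.0
    elif action == "turn_right":
        yaw = (yaw - TURN_DEG) % 360.0
    # "stop" leaves the pose unchanged; the episode loop should end on it.
    return (x, y, z, yaw)
```

Under these step sizes, stepping `(0.0, 0.0, 0.0, 0.0)` with `"forward"` yields `(5.0, 0.0, 0.0, 0.0)`. An episode loop would call `step` with each predicted action until the policy emits `"stop"`.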

🧠 End-to-End UAV-VLN Models

Trainable models that directly map vision and language inputs to actions.

  • [AerialVLA] AerialVLA: A Vision-Language-Action Model for UAV Navigation via Minimalist End-to-End Control - Xu et al.

  • [AutoFly] AutoFly: Vision-Language-Action Model for UAV Autonomous Navigation in the Wild - Sun et al. ICLR 2026

  • [APEX] APEX: A Decoupled Memory-based Explorer for Asynchronous Aerial Object Goal Navigation - Zhang et al. CVPR 2026

  • [IndoorUAV] IndoorUAV: Benchmarking Vision-Language UAV Navigation in Continuous Indoor Environments - Liu et al. AAAI 2026

  • [OpenFly] OpenFly: A comprehensive platform for aerial vision-language navigation - Gao et al. ICLR 2026

  • [HETT] History-Enhanced Two-Stage Transformer for Aerial Vision-and-Language Navigation - Ding et al. AAAI 2026

  • [LongFly] LongFly: Long-Horizon UAV Vision-and-Language Navigation with Spatiotemporal Context Integration - Jiang et al. ICLR 2026

  • [CityNav] CityNav: Language-Goal Aerial Navigation Dataset with Geographic Information - Lee et al. ICCV 2025

  • [OpenUAV] Towards Realistic UAV Vision-Language Navigation: Platform, Benchmark, and Methodology - Wang et al. ICLR 2025

  • [AeroDuo] AeroDuo: Aerial Duo for UAV-based Vision and Language Navigation - Wu et al. ACM MM 2025

  • [NavAgent] NavAgent: Multi-scale Urban Street View Fusion For UAV Embodied Vision-and-Language Navigation - Liu et al.

  • [EmbodiedCity] EmbodiedCity: A Benchmark Platform for Embodied Agent in Real-world City Environment - Gao et al.

  • [AerialVLN] Vision-and-Language Navigation for UAVs - Zhang et al. ICCV 2023

  • [AVDN] Aerial Vision-and-Dialog Navigation - Fan et al. ACL 2023

๐Ÿ—บ๏ธ Semantic Cognitive Mapping & Zero Shot

Methods that explicitly build semantic-aware spatial representations or propose a pipeline for zero-shot or training-free navigation.

  • [NavDreamer] NavDreamer: Video Models as Zero-Shot 3D Navigators - Huang et al.

  • [OnFly] OnFly: Onboard Zero-Shot Aerial Vision-Language Navigation toward Safety and Efficiency - Zheng et al.

  • [Fly0] Fly0: Decoupling Semantic Grounding from Geometric Planning for Zero-Shot Aerial Navigation - Xu et al.

  • [AirHunt] AirHunt: Bridging VLM Semantics and Continuous Planning for Efficient Aerial Object Navigation - Chen et al.

  • [See,Point,Fly] See, Point, Fly: A Learning-Free VLM Framework for Universal Unmanned Aerial Navigation - Hu et al. CoRL 2025

  • [CityNavAgent] CityNavAgent: Aerial Vision-and-Language Navigation with Hierarchical Semantic Planning and Global Memory - Zhang et al. ACL 2025

  • [PIVOT] PIVOT: Iterative Visual Prompting Elicits Actionable Knowledge for VLMs - Nasiriany et al.

  • [TypeFly] TypeFly: Flying Drones with Large Language Model - Chen et al.

📊 Benchmarks & Datasets

Evaluation benchmarks and datasets specific to UAV-VLN.

  • [OpenFly] OpenFly: A comprehensive platform for aerial vision-language navigation - Gao et al. ICLR 2026

  • [CityNav] CityNav: Language-Goal Aerial Navigation Dataset with Geographic Information - Lee et al. ICCV 2025

  • [OpenUAV] Towards Realistic UAV Vision-Language Navigation: Platform, Benchmark, and Methodology - Wang et al. ICLR 2025

  • [AeroDuo] AeroDuo: Aerial Duo for UAV-based Vision and Language Navigation - Wu et al. ACM MM 2025

  • [EmbodiedCity] EmbodiedCity: A Benchmark Platform for Embodied Agent in Real-world City Environment - Gao et al.

  • [AerialVLN] Vision-and-Language Navigation for UAVs - Zhang et al. ICCV 2023

  • [AVDN] Aerial Vision-and-Dialog Navigation - Fan et al. ACL 2023
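UAV-VLN benchmarks commonly report Navigation Error (NE, the final distance to the goal), Success Rate (SR), and Success weighted by Path Length, SPL = (1/N) Σᵢ Sᵢ · lᵢ / max(pᵢ, lᵢ), where Sᵢ is the success indicator, lᵢ the shortest-path length, and pᵢ the length actually flown. The sketch below computes all three. The episode field names are hypothetical, and the 20 m `success_radius` is only a placeholder, since each benchmark defines its own success threshold and evaluation protocol.

```python
import math


def evaluate(episodes, success_radius=20.0):
    """Return mean NE, SR, and SPL over a list of episode dicts.

    Hypothetical episode fields:
      "final"    - (x, y, z) where the agent issued STOP
      "goal"     - (x, y, z) goal position
      "path_len" - length of the trajectory actually flown (m)
      "shortest" - shortest-path length from start to goal (m)
    """
    if not episodes:
        raise ValueError("no episodes to evaluate")
    ne = sr = spl = 0.0
    for ep in episodes:
        dist = math.dist(ep["final"], ep["goal"])  # Euclidean, Python 3.8+
        success = 1.0 if dist <= success_radius else 0.0
        ne += dist
        sr += success
        longest = max(ep["path_len"], ep["shortest"])
        # SPL term: success scaled by path efficiency (1.0 if both lengths are 0).
        spl += success * (ep["shortest"] / longest if longest > 0 else 1.0)
    n = len(episodes)
    return {"NE": ne / n, "SR": sr / n, "SPL": spl / n}
```

Because SPL divides by `max(p, l)`, a successful episode that flies twice the shortest distance contributes 0.5 rather than 1.0, which is why SPL is never higher than SR.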

📦 Datasets

| Dataset | Environment | Scale | Instruction Granularity | Sensors | Real/Virtual | Notes |
|---|---|---|---|---|---|---|
| AerialVLN / AerialVLN-S | 25 urban scenes | 8,446 / 3,916 | Step-by-step | RGB-D | Virtual | Early 3D; pilot trajectories |
| CityNav | Cambridge + Birmingham | 32,637 | Step-by-step | RGB-D | Real | Human demonstration trajectories |
| OpenFly | 18 scenes | 100k | High-level | RGB-D, LiDAR, point cloud | Real→Virtual | Multiple rendering engines; cross-scene |
| AVDN | Global satellite | 3,064 | Dialog-based / Mixed | RGB | Real | xView satellite imagery |
| HaL-13k | OpenUAV | 13k | High-level | RGB, LiDAR, point cloud | Virtual | Multi-UAV |
| OpenUAV | 22 diverse scenes | 12k | Hybrid | RGB-D (5), LiDAR, IMU, GPS | Virtual | Urban, rural, forest, desert |

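Note

  • Scale: Number of trajectories.
  • Instruction Granularity: Level of detail in the instructions (step-by-step, high-level, dialog-based, or hybrid).
  • Real/Virtual: Virtual (simulation only) / Real (real-world captured) / Real→Virtual (real-world data reconstructed into simulation) / Hybrid (both).
  • Notes: Licensing, access requirements, special features.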

🎮 Simulators

Simulation platforms for aerial robot navigation and interaction.

| Simulator | Virtual/Real | Engine | Environment | Key Features | Sensors |
|---|---|---|---|---|---|
| AerialVLN | Virtual | UE4 + AirSim | 25 urban scenes | First UAV-VLN simulator; 870+ objects; 8 actions (4-DoF) | RGB-D |
| OpenUAV | Virtual | UE4 + AirSim | 22 suburban/natural scenes | Urban, rural, forest, desert; 6-DoF | RGB-D, IMU, GPS |
| OpenFly | Real→Virtual | UE4/5, GTA-V, Google Earth, 3DGS | 18 scenes, 150+ km² | Unified API; cross-scene generalization | RGB, depth |
| CityNav | Real (reconstructed) | WebGL (Potree) | Cambridge + Birmingham (8.7 km²) | 3D reconstruction; Geographic Semantic Map (GSM) | RGB, depth (point cloud) |
| EmbodiedCity | Real (reconstructed) | UE5 + AirSim | Beijing CBD | Dynamic pedestrians and vehicles; high detail | RGB, depth, segmentation, IMU, GPS |

Note

  • Virtual: Fully simulated environment (software only).
  • Real: Real-world platform or dataset (hardware involved).
  • Realโ†’Virtual: Real-world data reconstructed into simulation.

🔄 Sim-to-Real

Techniques and case studies for transferring policies from simulation to real-world drones.


📬 Help Wanted: Contribute Sim-to-Real Resources

This area is currently under active development. If you know of relevant papers, projects, or case studies on Sim-to-Real for UAV Vision-Language Navigation, please consider contributing!

  • 📄 Papers: Any work addressing the gap between simulation and real-world drone deployment in language-guided navigation.
  • 🔧 Techniques: Domain randomization, sim-to-real adaptation, or system identification methods specifically for UAVs.
  • 🎥 Real-world Demos: Videos or open-source code repositories demonstrating successful real-world transfer.

👉 Please submit via an Issue or Pull Request. Thank you! 🙏


🔗 Related Works


Built with ❤️ for the UAV-VLN community.
Inspired by Awesome Embodied VLA / VA / VLN.
