UPD from the authors: We are a bit surprised by the popularity of this paper, so the code and data will be refactored into a more convenient format.
This repository accompanies the research paper accepted to the ACM/IEEE International Conference on Human-Robot Interaction (HRI 2025).
- Abstract
- Benchmark
- Installation
- Mission Generation
- Path-Plans Creation
- Experimental Results
- Simulation Video
- Citation
The UAV-VLA (Visual-Language-Action) system is a tool designed to facilitate communication with aerial robots. By integrating satellite imagery processing with a Visual Language Model (VLM) and the powerful capabilities of GPT, UAV-VLA enables users to generate general flight-path-and-action plans through simple text requests. The system leverages the rich contextual information provided by satellite images, allowing for enhanced decision-making and mission planning. The combination of visual analysis by the VLM and natural language processing by GPT provides the user with a path-and-action set, making aerial operations more efficient and accessible. Compared to a human-generated baseline, the method showed a 22% difference in the length of the created trajectories and a mean error of 34.22 m (Euclidean distance, K-Nearest Neighbors matching) in locating the objects of interest on the map.
https://arxiv.org/abs/2501.05014
This repository includes:
- The implementation of the UAV-VLA framework.
- Dataset and benchmark details.
- Code for simulation-based experiments in Mission Planner.
The benchmark images are stored in the folder `benchmark-UAV-VLPA-nano-30/images`. The metadata files are `benchmark-UAV-VLPA-nano-30/img_lat_long_data.txt` and `benchmark-UAV-VLPA-nano-30/parsed_coordinates.csv`.
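For a quick look at the benchmark, here is a minimal sketch for loading these files (assuming `pandas` is available; no specific column layout is assumed beyond whatever the CSV header defines):

```python
import os

import pandas as pd

BENCHMARK_DIR = "benchmark-UAV-VLPA-nano-30"

# List the benchmark satellite images.
images = sorted(os.listdir(os.path.join(BENCHMARK_DIR, "images")))
print(f"{len(images)} benchmark images, e.g. {images[:3]}")

# Inspect the parsed coordinate metadata; column names come from the CSV header.
coords = pd.read_csv(os.path.join(BENCHMARK_DIR, "parsed_coordinates.csv"))
print(coords.head())

# The raw latitude/longitude metadata is a plain text file.
with open(os.path.join(BENCHMARK_DIR, "img_lat_long_data.txt")) as f:
    print(f.read().splitlines()[:3])
```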
To install requirements, run
pip install -r requirements.txt
Note: a GPU with at least 12 GB of VRAM is required.
export api_key="your_chatgpt_api_key"
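Inside a Python script, the exported key can then be read from the environment; a minimal sketch (the variable name `api_key` mirrors the export above, but check how generate_plans.py actually loads it):

```python
import os

# Read the ChatGPT API key exported above and fail early if it is missing.
api_key = os.environ.get("api_key")
if api_key is None:
    raise RuntimeError("Set the api_key environment variable before running generate_plans.py")
```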
To generate commands for the UAV, add your ChatGPT API key in generate_plans.py, then run
python3 generate_plans.py
It will produce the commands, storing the text files in the folder `/created_missions` and visualizations of the identified points on the benchmark images in the folder `/identified_new_data`.
Running this script also reports the total computation time of the UAV-VLA system, which is approximately 5 minutes and 24 seconds.
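If the generated mission files follow Mission Planner's plain-text QGC WPL 110 waypoint format (an assumption here; adjust the parser if the repository stores plans differently), they can be inspected with a short script like this:

```python
import glob

def read_wpl_waypoints(path):
    """Parse a QGC WPL 110 mission file into (lat, lon, alt) tuples."""
    with open(path) as f:
        lines = f.read().splitlines()
    waypoints = []
    # The first line is the format header, e.g. "QGC WPL 110"; each remaining
    # line is tab-separated with latitude, longitude, altitude in fields 8-10.
    for line in lines[1:]:
        fields = line.split("\t")
        if len(fields) >= 12:
            waypoints.append((float(fields[8]), float(fields[9]), float(fields[10])))
    return waypoints

# Hypothetical usage over the generated missions folder.
for mission_file in sorted(glob.glob("created_missions/*")):
    print(mission_file, len(read_wpl_waypoints(mission_file)), "waypoints")
```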
To see the results of the VLM on the benchmark, run
python3 run_vlm.py
Some examples of the generated paths can be seen below:
To view the experimental results, you need to run the main.py script. This script automates the entire process of generating coordinates, calculating trajectory lengths, and producing visualizations.
Navigate into the folder `experiments/` and run:
python3 main.py
The script performs the following steps:

- Generate Home Positions
- Generate VLM Coordinates
- Generate MP Coordinates
- Calculate Trajectory Lengths (a sketch of this step is shown after the list)
- Calculate RMSE (Root Mean Square Error)
- Plot Results
- Generate Identified Images: the script overlays the VLM and Mission Planner (human-generated) coordinates on the original images from the dataset. These identified images are saved in `identified_images_VLM/` (for VLM outputs) and `identified_images_mp/` (for Mission Planner outputs).
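The trajectory-length step, for example, reduces to summing great-circle distances between consecutive waypoints. A minimal sketch of that idea (the helper below is illustrative, not the exact implementation in experiments/):

```python
import math

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance between two lat/lon points, in metres."""
    r = 6371000.0  # mean Earth radius
    dlat = math.radians(lat2 - lat1)
    dlon = math.radians(lon2 - lon1)
    a = (math.sin(dlat / 2) ** 2
         + math.cos(math.radians(lat1)) * math.cos(math.radians(lat2)) * math.sin(dlon / 2) ** 2)
    return 2 * r * math.asin(math.sqrt(a))

def trajectory_length_m(waypoints):
    """Total path length for a list of (lat, lon) waypoints, in metres."""
    return sum(
        haversine_m(*waypoints[i], *waypoints[i + 1])
        for i in range(len(waypoints) - 1)
    )

# Example with made-up waypoints (each leg is roughly 85-111 m).
print(trajectory_length_m([(40.000, -105.000), (40.001, -105.000), (40.001, -105.001)]))
```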
After running the script, you will be able to examine:
- Text Files: Containing the generated coordinates, home positions, and RMSE data.
- Images: Showing the identified coordinates overlaid on the images.
- Plots: Comparing trajectory lengths and RMSE values.
The errors were calculated using several approaches: K-Nearest Neighbors (KNN), Dynamic Time Warping (DTW), and Linear Interpolation (a sketch of the KNN matching is shown after the table).
| # | Metric | KNN Error (m) | DTW RMSE (m) | Interpolation RMSE (m) |
|---|--------|---------------|--------------|------------------------|
| 1 | Mean   | 34.2218       | 307.265      | 409.538                |
| 2 | Median | 26.0456       | 318.462      | 395.593                |
| 3 | Max    | 112.493       | 644.574      | 727.936                |
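For reference, the KNN error in the table matches each predicted point to its nearest ground-truth object and averages the Euclidean distances. A minimal sketch of that matching (with made-up coordinates; the actual evaluation lives in experiments/):

```python
import math

import numpy as np

def latlon_to_local_m(lat, lon, lat0, lon0):
    """Project lat/lon onto a local flat (x, y) frame in metres around (lat0, lon0)."""
    r = 6371000.0
    x = math.radians(lon - lon0) * r * math.cos(math.radians(lat0))
    y = math.radians(lat - lat0) * r
    return x, y

def knn_mean_error_m(predicted, ground_truth):
    """Mean Euclidean distance (m) from each predicted point to its nearest ground-truth point."""
    lat0, lon0 = ground_truth[0]  # reference origin for the local projection
    gt = np.array([latlon_to_local_m(lat, lon, lat0, lon0) for lat, lon in ground_truth])
    errors = []
    for lat, lon in predicted:
        p = np.array(latlon_to_local_m(lat, lon, lat0, lon0))
        errors.append(np.linalg.norm(gt - p, axis=1).min())
    return float(np.mean(errors))

# Made-up example: two predicted object locations vs. three annotated ones.
pred = [(40.0001, -105.0002), (40.0005, -105.0010)]
truth = [(40.0000, -105.0000), (40.0006, -105.0011), (40.0010, -105.0020)]
print(knn_mean_error_m(pred, truth))
```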
The generated mission from the UAV-VLA framework was tested in the ArduPilot Mission Planner. The simulation can be seen below.
simulation_video.mp4
@inproceedings{10.5555/3721488.3721725,
author = {Sautenkov, Oleg and Yaqoot, Yasheerah and Lykov, Artem and Mustafa, Muhammad Ahsan and Tadevosyan, Grik and Akhmetkazy, Aibek and Altamirano Cabrera, Miguel and Martynov, Mikhail and Karaf, Sausar and Tsetserukou, Dzmitry},
title = {UAV-VLA: Vision-Language-Action System for Large Scale Aerial Mission Generation},
year = {2025},
publisher = {IEEE Press},
abstract = {The UAV-VLA (Visual-Language-Action) system is a tool designed to facilitate communication with aerial robots. By integrating satellite imagery processing with the Visual Language Model (VLM) and the powerful capabilities of GPT, UAV-VLA enables users to generate general flight paths-and-action plans through simple text requests. This system leverages the rich contextual information provided by satellite images, allowing for enhanced decision-making and mission planning. The combination of visual analysis by VLM and natural language processing by GPT can provide the user with the path-and-action set, making aerial operations more efficient and accessible. The newly developed method showed the difference in the length of the created trajectory in 22\% and the mean error in finding the objects of interest on a map in 34.22 m by Euclidean distance in the K-Nearest Neighbors (KNN) approach. Additionally, the UAV-VLA system generates all flight plans in just 5 minutes and 24 seconds, making it 6.5 times faster than an experienced human operator.},
booktitle = {Proceedings of the 2025 ACM/IEEE International Conference on Human-Robot Interaction},
pages = {1588–1592},
numpages = {5},
keywords = {drone, llm-agents, navigation, path planning, uav, vla, vlm, vlm-agents},
location = {Melbourne, Australia},
series = {HRI '25}
}