TagaVLM: Topology-Aware Global Action Reasoning for Vision-Language Navigation


Official implementation of the ICRA 2026 paper "TagaVLM: Topology-Aware Global Action Reasoning for Vision-Language Navigation".

⚠️ IMPORTANT: Code cleaning and preprocessed data release are in progress. Full release coming soon!

For details, please visit our project page.

*Figure: TagaVLM framework overview.*

Results on R2R (Val Unseen)

| Method | Backbone | NE ↓ | OSR ↑ | SR ↑ | SPL ↑ |
|---|---|---|---|---|---|
| NavCoT | LLaMA2-7B | 6.26 | 48.11 | 40.23 | 36.64 |
| MapGPT | GPT-4V | 5.62 | 57.9 | 47.7 | 38.1 |
| TagaVLM-0.5B (Ours) | Qwen2-0.5B | 5.57 | 55.09 | 45.72 | 41.91 |
| TagaVLM-7B (Ours) | Qwen2-7B | 4.97 | 60.2 | 51.09 | 47.18 |

Installation

```shell
git clone https://github.com/APEX-BJUT/Taga-VLM.git
cd Taga-VLM

conda create -n tagavlm python=3.9 -y
conda activate tagavlm
pip install --upgrade pip
pip install -e ".[train]"
```

Install the patched transformers (required for STAR-Att):

```shell
cd transformers-4.40.0 && pip install -e . && cd ..
```

Additional pinned dependencies: accelerate==0.28.0, numpy<=2.0.
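These pins can be applied in one step (same versions as listed above):

```shell
pip install "accelerate==0.28.0" "numpy<=2.0"
```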

Flash-Attention 2: Download the prebuilt `.whl` matching your CUDA, PyTorch, and Python versions from the Flash-Attention releases page (select an `abiFALSE` variant), then:

```shell
pip install flash_attn-*.whl
```

Matterport3D Simulator: Follow the build instructions in the Matterport3DSimulator repository.

Data Preparation

Download model weights and data from HuggingFace and place them as:

```
Taga-VLM
├── data
│   ├── mp3d_data
│   ├── view_images_bgr_from_mattersim.h5
│   ├── view_images_hm3d
│   ├── view_images_hm3d_pano
│   └── anno
├── model_zoo
│   ├── TagaVLM-qwen2-7b
│   └── TagaVLM-qwen2-0.5b
```
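As a sanity check before training, a small script like the following (a hypothetical helper, not part of the repo) can confirm the layout is in place:

```python
from pathlib import Path

# Paths taken from the layout above; `root` is the repo checkout directory.
EXPECTED = [
    "data/mp3d_data",
    "data/view_images_bgr_from_mattersim.h5",
    "data/view_images_hm3d",
    "data/view_images_hm3d_pano",
    "data/anno",
    "model_zoo/TagaVLM-qwen2-7b",
    "model_zoo/TagaVLM-qwen2-0.5b",
]

def missing_paths(root="."):
    """Return the expected entries that are absent under `root`."""
    root = Path(root)
    return [p for p in EXPECTED if not (root / p).exists()]

if __name__ == "__main__":
    missing = missing_paths()
    if missing:
        print("Missing:", *missing, sep="\n  ")
    else:
        print("All data and model paths found.")
```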

Training & Evaluation

```shell
# Training
bash scripts/train/finetune_TagaVLM.sh
```

```shell
# Evaluation on R2R
cd map_nav_src && bash run_r2r.sh
```

Note: Before evaluation, make sure the dtype is `torch.float16` at lines 325 and 327 of `Taga-VLM/llava/model/llava_arch.py`. For the 0.5B model, add `"vocab_size": 151936` and `"tie_word_embeddings": true` to its `config.json` after training.
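The `config.json` edit for the 0.5B model can be scripted. This is a minimal sketch; the commented path is illustrative (adjust it to your trained checkpoint directory):

```python
import json

def patch_05b_config(config_path):
    """Add the two fields the 0.5B checkpoint needs after training."""
    with open(config_path) as f:
        cfg = json.load(f)
    cfg["vocab_size"] = 151936         # full Qwen2 vocabulary size
    cfg["tie_word_embeddings"] = True  # share input/output embedding weights
    with open(config_path, "w") as f:
        json.dump(cfg, f, indent=2)
    return cfg

# Example (path is an assumption, not a file shipped with the repo):
# patch_05b_config("model_zoo/TagaVLM-qwen2-0.5b/config.json")
```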

Citation

@inproceedings{liu2026tagavlm,
  title     = {TagaVLM: Topology-Aware Global Action Reasoning for Vision-Language Navigation},
  author    = {Liu, Jiaxing and Zhang, Zexi and Li, Xiaoyan and Wang, Boyue and Hu, Yongli and Yin, Baocai},
  booktitle = {Proceedings of the IEEE International Conference on Robotics and Automation (ICRA)},
  year      = {2026}
}

Acknowledgement

This project builds upon LLaVA-NeXT and VLN-DUET. We thank the authors for open-sourcing their code.
