# Awesome VLA Benchmarks
A curated list of benchmarks, evaluation frameworks, and datasets for Vision-Language-Action (VLA) models in robotics.
VLA models take visual observations and language instructions as input, and output robot actions. This list catalogs the benchmarks used to evaluate them.
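That input-to-output contract can be made concrete with a minimal, framework-free sketch. Every name below (`Observation`, `ToyVLAPolicy`, the 7-dim action) is a hypothetical illustration of the common interface, not any particular model's API:

```python
from dataclasses import dataclass

@dataclass
class Observation:
    image: list          # H x W x C camera frame (toy stand-in for pixels)
    instruction: str     # natural-language command, e.g. "pick up the red block"

class ToyVLAPolicy:
    """Stand-in for the common VLA contract: (image, instruction) -> action.
    Real models replace this with a VLM backbone plus an action head."""
    ACTION_DIM = 7  # e.g. 6-DoF end-effector delta + 1 gripper channel

    def predict_action(self, obs: Observation) -> list:
        # A real VLA conditions on both modalities; this placeholder
        # only returns a correctly shaped zero action.
        assert obs.image and isinstance(obs.instruction, str)
        return [0.0] * self.ACTION_DIM

policy = ToyVLAPolicy()
obs = Observation(image=[[[0, 0, 0]]], instruction="pick up the red block")
action = policy.predict_action(obs)   # 7 floats: the robot command
```

The benchmarks below differ mainly in what sits behind `Observation` (simulator vs. real camera) and how `action` is scored (success detectors, human raters, or proxy metrics).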
Contributions welcome! Please read the contributing guidelines before submitting a pull request.
## VLA Models

A chronological list of published Vision-Language-Action models.

- **VLM Backbone**: the pretrained vision-language model the VLA was built on (or "from-scratch" if none).
- **Action Head**: how continuous robot actions are produced (discrete tokens, diffusion, flow matching, etc.).
- **Open**: ✓ if weights/code are publicly released; ◐ if partial (code only / weights restricted); ✗ if closed.
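As a reference point for the "discrete action tokens" entries in the table: RT-2- and OpenVLA-style models clip each continuous action dimension to a normalized range and uniformly bin it, so one action becomes a few integer tokens a language model can emit. A minimal sketch, where the ±1 range and 256 bins are illustrative defaults rather than any specific model's configuration:

```python
N_BINS = 256           # illustrative; per-dimension bin count
LOW, HIGH = -1.0, 1.0  # assumed normalized action range

def tokenize(action):
    """Map each continuous action dimension to a bin index in [0, N_BINS - 1]."""
    tokens = []
    for a in action:
        a = min(max(a, LOW), HIGH)                    # clip to range
        idx = int((a - LOW) / (HIGH - LOW) * N_BINS)  # uniform binning
        tokens.append(min(idx, N_BINS - 1))           # handle a == HIGH
    return tokens

def detokenize(tokens):
    """Invert tokenization: each token maps back to its bin center."""
    width = (HIGH - LOW) / N_BINS
    return [LOW + (t + 0.5) * width for t in tokens]

action = [0.1, -0.5, 0.0, 0.99, -1.0, 0.3, 1.0]
recovered = detokenize(tokenize(action))
half_bin = (HIGH - LOW) / N_BINS / 2
# Round-trip error is bounded by half a bin width (~0.0039 here).
assert all(abs(a - r) <= half_bin for a, r in zip(action, recovered))
```

Diffusion and flow-matching heads avoid this quantization by sampling continuous actions directly, at the cost of an iterative denoising/integration step at inference time.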
| Model | Date | Org | VLM Backbone | Action Head | Params | Open | Links |
|---|---|---|---|---|---|---|---|
| RT-1 | 2022-12 | Google | EfficientNet + Universal Sentence Encoder | Discrete action tokens (Transformer) | 35M | ✓ | Paper / Code |
| PaLM-E | 2023-03 | Google | PaLM + ViT | LLM-driven planning (text actions) | up to 562B | ✗ | Paper / Site |
| RT-2 | 2023-07 | Google DeepMind | PaLI-X / PaLM-E | Discrete action tokens (co-fine-tuned with web data) | 5B / 55B | ✗ | Paper / Site |
| RT-2-X / RT-X | 2023-10 | Open X-Embodiment collab. | PaLI-X | Discrete action tokens, cross-embodiment | 55B | ✗ | Paper / Site |
| RoboFlamingo | 2023-11 | ByteDance / Berkeley | OpenFlamingo | LSTM action head | ~3B | ✓ | Paper / Code |
| 3D-VLA | 2024-03 | UMass / MIT | 3D-LLM | Generative 3D goal + action | — | ✓ | Paper / Code |
| Octo | 2024-05 | UC Berkeley / Stanford | Transformer (from-scratch) | Diffusion head | 27M / 93M | ✓ | Paper / Code |
| OpenVLA | 2024-06 | Stanford / UC Berkeley | Llama-2-7B + DINOv2 + SigLIP | Discrete action tokens (autoregressive) | 7B | ✓ | Paper / Code |
| TinyVLA | 2024-09 | Midea / ECNU | Small VLM (Pythia-based) | Diffusion head | <1B | ✓ | Paper / Site |
| RDT-1B | 2024-10 | Tsinghua AIR | SigLIP + T5-XXL | Diffusion Transformer | 1B | ✓ | Paper / Code |
| π0 (Pi-Zero) | 2024-10 | Physical Intelligence | PaliGemma | Flow-matching action expert | 3B | ✓ | Paper / Code |
| CogACT | 2024-11 | Microsoft Research Asia | OpenVLA-style (DINOv2 + SigLIP + Llama-2) | DiT action expert (decoupled cognition/action) | 7B+ | ✓ | Paper / Code |
| π0-FAST | 2025-01 | Physical Intelligence | PaliGemma | FAST (DCT) action tokens | 3B | ✓ | Paper / Site |
| SpatialVLA | 2025-01 | Shanghai AI Lab et al. | PaliGemma2 | Ego3D position-aware action tokens | 4B | ✓ | Paper / Code |
| DexVLA | 2025-02 | Midea | Qwen2-VL | Diffusion action expert (dexterous) | 1B+ | ✓ | Paper / Site |
| Magma | 2025-02 | Microsoft | LLaVA-style | Set-of-marks + action traces | 8B | ✓ | Paper / Code |
| Helix | 2025-02 | Figure AI | S2 (VLM, ~7B) + S1 (80M visuomotor) | Dual-system; S1 runs at 200 Hz | ~7B (S2) | ✗ | Site |
| Hi Robot | 2025-02 | Physical Intelligence | π0 backbone + high-level VLM | Hierarchical (instruction → action) | 3B | ✗ | Paper / Site |
| OpenVLA-OFT | 2025-02 | Stanford | OpenVLA | Parallel decoding + continuous actions + L1 regression | 7B | ✓ | Paper / Code |
| GR00T N1 | 2025-03 | NVIDIA | Eagle-2 VLM | DiT action head (System 1+2 design) | 2B | ✓ | Paper / Code |
| Gemini Robotics | 2025-03 | Google DeepMind | Gemini 2.0 | Action decoder (closed) | — | ✗ | Paper / Site |
| GO-1 | 2025-03 | AgiBot | InternVL backbone | Latent planner + action expert (ViLLA) | — | ◐ | Site / Code |
| π0.5 | 2025-04 | Physical Intelligence | π0 + open-world co-training | Flow matching; generalizes to unseen homes | 3B | ✗ | Paper / Site |
| NORA | 2025-04 | SUTD | Qwen2.5-VL | FAST tokens | 3B | ✓ | Paper / Code |
| SmolVLA | 2025-06 | Hugging Face | SmolVLM-2 | Flow-matching action expert | 450M | ✓ | Paper / Code |
| GR00T N1.5 | 2025-06 | NVIDIA | Eagle-2 VLM | DiT action head (improved post-training) | 2B | ✓ | Code |
| WorldVLA | 2025-06 | Alibaba DAMO | Chameleon | Unified world model + action autoregression | 7B | ✓ | Paper / Code |
| Gemini Robotics On-Device | 2025-06 | Google DeepMind | Gemini Nano family | On-device action decoder | — | ✗ | Site |
| MolmoAct | 2025-08 | Allen AI (AI2) | Molmo VLM | Action reasoning + chunked action tokens | 7B | ✓ | Paper / Code |
## Simulation Benchmarks - Manipulation
| Benchmark | Year | Simulator | Tasks | Key Focus | Links |
|---|---|---|---|---|---|
| CALVIN | 2022 | PyBullet | 34 tasks, 4 envs | Long-horizon language-conditioned manipulation | Paper / Code |
| LIBERO | 2023 | robosuite | 130 tasks, 4 suites | Lifelong learning, knowledge transfer | Paper / Code |
| RLBench | 2020 | CoppeliaSim | 100 tasks | Vision-guided manipulation (RL, IL, few-shot) | Paper / Code |
| PerAct2 | 2024 | CoppeliaSim | 18 bimanual tasks | Bimanual 6-DoF coordination | Paper / Code |
| Meta-World | 2019 | MuJoCo | 50 tasks | Multi-task / meta-RL | Paper / Code |
| ManiSkill3 | 2024 | SAPIEN | 12 domains | GPU-parallel, high-throughput simulation (30K+ FPS) | Paper / Code |
| ManiSkill-HAB | 2024 | SAPIEN | Home rearrangement | Low-level home manipulation | Paper / Site |
| robosuite | 2020 | MuJoCo | 9 tasks, 10 robots | Modular manipulation framework | Paper / Code |
| RoboMimic | 2021 | MuJoCo | 5 sim + 3 real tasks | IL from human demonstrations | Paper / Code |
| VIMA | 2023 | PyBullet | 17 task types, 600K+ trajs | Multimodal prompt-conditioned manipulation | Paper / Code |
| Ravens / CLIPort | 2020/22 | PyBullet | 10 tasks | Transporter networks / language-conditioned rearrangement | Paper / Code |
| ARNOLD | 2023 | Isaac Sim | 8 tasks, 40 objects | Continuous states in realistic 3D scenes | Paper / Code |
| COLOSSEUM | 2024 | CoppeliaSim | 20 tasks × 14 perturbations | Systematic generalization testing | Paper / Code |
| VLABench | 2024 | — | 100 categories, 2000+ objects | Long-horizon reasoning | Paper / Code |
| GemBench | 2024 | CoppeliaSim | 7 primitives × 4 levels | Graded generalization levels | Paper / Code |
| ClevrSkills | 2024 | ManiSkill2 | 33 tasks, 330K trajs | Compositional reasoning | Paper |
| LoHoRavens | 2023 | PyBullet | 10 tasks | Long-horizon without step-by-step instructions | Paper / Code |
| BEHAVIOR-1K | 2022 | OmniGibson | 1000 activities | Full household activities | Paper / Code |
| RoboCasa | 2024 | robosuite | 100-365 tasks | Kitchen tasks, generalist robots | Paper / Code |
| GenManip | 2025 | — | 200 scenarios | LLM-driven instruction generalization | Paper / Code |
| Franka Kitchen | 2019 | MuJoCo | 4 subtasks | Multi-task offline RL | Paper |
| FurnitureBench | 2023 | Isaac Gym | 8 IKEA-style tasks | Long-horizon furniture assembly | Paper / Code |
| BiGym | 2024 | MuJoCo | 40 tasks | Bimanual mobile manipulation | Paper / Code |
| RoboTwin | 2024 | — | 50 tasks, 5 embodiments | Dual-arm with generative digital twins | Paper / Code |
| DexArt | 2023 | SAPIEN | Multiple | Dexterous articulated-object manipulation | Paper / Code |
| Bi-DexHands | 2022 | Isaac Gym | Thousands | Bimanual dexterous manipulation | Paper / Code |
| DOMINO | 2026 | — | 35 tasks, 110K+ trajs | Dynamic manipulation generalization | Paper / Code |
| LiLo-VLA (LIBERO-Long++ / Ultra-Long) | 2026 | robosuite | 21 tasks | Compositional long-horizon manipulation with object-centric linking | Paper |
| InstructVLA | 2026 | — | Instruction-tuning suite | Instruction tuning from understanding to manipulation (ICLR 2026) | Code |
## Simulation Benchmarks - Embodied AI / Navigation
| Benchmark | Year | Simulator | Tasks | Key Focus | Links |
|---|---|---|---|---|---|
| AI2-THOR / ManipulaTHOR | 2017 | Unity | 120+ rooms | Navigation + manipulation | Paper / Code |
| Habitat 2.0 | 2021 | Habitat Sim | Thousands of envs | Navigation + rearrangement | Paper / Site |
| EmbodiedBench | 2025 | Multi-env | 1,128 instances | MLLM-based embodied agents | Paper / Code |
## Simulation Benchmarks - Humanoid

| Benchmark | Year | Simulator | Tasks | Key Focus | Links |
|---|---|---|---|---|---|
| HumanoidBench | 2024 | MuJoCo | 27 (15 manip + 12 loco) | Whole-body locomotion & manipulation | Paper / Code |
| LeVERB | 2025 | Isaac Lab | 150+ tasks, 10 categories | Vision-language humanoid whole-body control | Paper |
| Ego Humanoid Manipulation | 2025 | Isaac Lab | 12 tasks | Egocentric-vision humanoid manipulation | Code |
| HumanoidGen (HGen-Bench) | 2025 | SAPIEN | 20 tasks | LLM-driven bimanual dexterous task generation | Paper / Code |
| Humanoid Everyday | 2025 | Real-world | 260 tasks, 10.3K trajs | Large-scale real humanoid manipulation | Paper / Data |
| OmniH2O | 2024 | Isaac Gym | 6 tasks | Human-to-humanoid teleoperation & autonomy | Paper / Code |
| SIMPLE (Psi-0) | 2026 | MuJoCo + Isaac Sim | 6+ loco-manip tasks | Open humanoid VLA benchmarking simulator | Paper / Code |
| Mimicking-Bench | 2024 | — | 6 tasks, 23K sequences | Human-to-humanoid scene interaction | Paper / Site |
## Real-World Datasets & Benchmarks
| Benchmark | Year | Embodiment | Scale | Key Focus | Links |
|---|---|---|---|---|---|
| Open X-Embodiment | 2023 | 22 robots | 1M+ trajs, 527 skills | Cross-embodiment transfer | Paper / Code |
| BridgeData V2 | 2023 | WidowX 250 | 60K trajs, 24 envs | Multi-task, cross-environment | Paper / Site |
| DROID | 2024 | 18 Frankas | 76K trajs, 564 scenes | In-the-wild manipulation | Paper / Code |
| RoboMIND | 2025 | 4 embodiments | 107K trajs, 479 tasks | Multi-embodiment with failure data | Paper / Site |
| AgiBot World | 2025 | Dual-arm | 1M+ trajs, 217 tasks | Bimanual at scale (4000 m² facility) | Paper / Code |
| RoboSet | 2023 | Franka | 7.5K trajs, 38 tasks | Kitchen multi-task | Paper / Site |
| Language-Table | 2023 | Custom | 600K trajs | Open-vocabulary pushing/rearrangement | Paper / Code |
| FMB | 2024 | Franka | 22.5K demos | Functional manipulation (grasp, assemble) | Paper / Site |
| LHManip | 2023 | Real robot | 200 episodes, 20 tasks | Long-horizon in cluttered scenes | Paper / Code |
| ALOHA / Mobile ALOHA | 2023 | Custom bimanual | 7-50 tasks | Bimanual (mobile) manipulation | Paper / Site |
| RoboVQA | 2024 | 3 embodiments | 829K pairs | VQA for robot reasoning | Paper / Site |
| MUTEX | 2023 | Franka | 100 sim + 50 real | 6-modality task specification | Paper |
## Real-to-Sim & Scalable Real-Robot Evaluation

| Benchmark | Year | Approach | Key Focus | Links |
|---|---|---|---|---|
| SimplerEnv (SIMPLER) | 2024 | Sim-as-real proxy | Evaluate real-world policies in sim | Paper / Code |
| REALM | 2025 | Real-validated sim | 15 perturbation factors, p<0.001 correlation | Paper / Code |
| RobotArena Infinity | 2025 | Real-to-sim translation | VLM scoring + human preferences | Paper / Site |
| RoboArena | 2025 | Distributed real eval | Crowd-sourced ELO-style rankings | Paper |
| RoboChallenge | 2025 | Remote real robots | 30 tasks, fleet of 10 machines | Paper / Site |
## VLA-Specific Evaluation Frameworks
| Framework | Year | Type | Key Focus | Links |
|---|---|---|---|---|
| vla-eval | 2026 | Unified harness | 17 benchmarks, 500+ models, Docker-based | Code |
| VLA-Arena | 2025 | Systematic eval | 170 tasks, 4 dimensions × 3 difficulty levels | Paper / Code |
| LADEV | 2024 | Language-driven eval | Auto-generated scenes from NL descriptions | Paper |
| ManipBench | 2025 | MCQ-based | VLM reasoning for low-level manipulation | Paper / Site |
| RoboBench | 2025 | MCQ/VQA-based | MLLM as embodied brain, 5 cognitive dims | Paper / Site |
| Eval-Actions + AutoEval | 2026 | Automated eval | Trustworthy evaluation protocol for robotic manipulation | Paper |
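Despite their different scopes, most rollout-based harnesses above share the same inner loop: run the policy in an environment for N episodes per task and report the success rate. A framework-agnostic sketch, where the `env.reset()`/`env.step()` signatures are hypothetical minimal interfaces rather than any listed framework's API:

```python
import random

def evaluate(policy, env, n_episodes=50, max_steps=200):
    """Success rate of `policy` in `env`.

    Assumed toy interfaces: env.reset() -> obs,
    env.step(action) -> (obs, done, success), policy(obs) -> action.
    """
    successes = 0
    for _ in range(n_episodes):
        obs = env.reset()
        for _ in range(max_steps):
            action = policy(obs)
            obs, done, success = env.step(action)
            if done:
                successes += success  # bool counts as 0/1
                break
    return successes / n_episodes

# Toy environment to exercise the loop: every episode ends immediately
# and succeeds with probability 0.5.
class CoinFlipEnv:
    def reset(self):
        return 0.0
    def step(self, action):
        return 0.0, True, random.random() < 0.5

rate = evaluate(lambda obs: 0.0, CoinFlipEnv(), n_episodes=200)
```

What differentiates the frameworks is everything around this loop: scene randomization, success detection, perturbation schedules, and how results are aggregated across tasks and seeds.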
## Robustness & Safety Benchmarks
| Benchmark | Year | Extends | Key Focus | Links |
|---|---|---|---|---|
| LIBERO-PRO | 2025 | LIBERO | Robustness under 4-dim perturbations | Paper / Code |
| LIBERO-Plus | 2025 | LIBERO | 7-dim × 5-level robustness analysis | Paper / Code |
| LIBERO-X | 2026 | LIBERO | Hierarchical robustness litmus test | — |
| LIBERO-Para | 2026 | LIBERO | Paraphrase robustness (22-52% degradation) | — |
| SimX-OR | 2025 | Plug-in | Observational robustness (blur, noise, etc.) | Paper / Code |
| Eva-VLA | 2025 | LIBERO | Adversarial physical variations | Paper |
| VLA-Risk | 2025 | Multiple | Safety/risk across 296 scenarios, 3 dims (object/action/space) × 2 modalities | OpenReview |
| RoboMME | 2026 | Custom | Memory-augmented VLA evaluation | Code |
| Safety-CHORES / SafeVLA | 2025 | AI2-THOR / CHORES | 5 cost categories (corner, blind_spot, fragile, critical, danger) on long-horizon nav+manip; safe RL via CMDP (NeurIPS 2025 Spotlight) | Paper / Code / Site |
| RoboCasa-Safety (via OmniGuide) | 2026 | RoboCasa | Safety-rate protocol (no collision with static furniture) + 3D SDF guidance | Paper / Site |
| Linguistic Red-Team | 2026 | Multiple | Diversity-aware adversarial instructions (SR 93% → 5.85%) | Paper |
| VLSA / AEGIS | 2026 | Plug-in | Plug-and-play CBF safety-constraint layer with theoretical guarantees | Paper |
## Platforms & Toolkits

| Platform | Year | Key Focus | Links |
|---|---|---|---|
| RoboVerse | 2025 | Cross-simulator unified platform (MetaSim) | Paper / Code |
| STAR-Gen | 2025 | Generalization taxonomy (visual, semantic, behavioral) | Paper / Site |
## Surveys

- "Vision-Language-Action Models for Robotics: A Review" - Site
- "Pure Vision Language Action (VLA) Models: A Comprehensive Survey" - Paper
- "A Survey on Vision-Language-Action Models for Embodied AI" - Paper
- "A Survey on Efficient Vision-Language-Action Models" - Paper / Site
- "A Survey on Vision-Language-Action Models: An Action Tokenization Perspective" - Paper
- "Benchmarking the Generality of Vision-Language-Action Models" - Paper
## Contributing

Please see CONTRIBUTING.md for guidelines.