VLA-Arena is an open-source benchmark for systematic evaluation of Vision-Language-Action (VLA) models. It provides a full toolchain covering scene modeling, demonstration collection, model training, and evaluation. It features 150+ tasks across 11 specialized suites, hierarchical difficulty levels (L0-L2), and comprehensive metrics for safety, generalization, and efficiency assessment.
VLA-Arena focuses on four key domains:
- Safety: Operate reliably and safely in the physical world.
- Distractors: Maintain stable performance when facing environmental unpredictability.
- Extrapolation: Generalize learned knowledge to novel situations.
- Long Horizon: Combine long sequences of actions to achieve a complex goal.
2025.09.29: VLA-Arena is officially released!
- End-to-End & Out-of-the-Box: We provide a complete and unified toolchain covering everything from scene modeling and behavior collection to model training and evaluation. With comprehensive docs and tutorials, you can get started in minutes.
- Plug-and-Play Evaluation: Seamlessly integrate and benchmark your own VLA models. The framework exposes a unified API, so evaluating a new architecture requires only minimal code changes.
- Effortless Task Customization: Use the Constrained Behavior Domain Definition Language (CBDDL) to rapidly define entirely new tasks and safety constraints. Its declarative nature lets you cover a wide range of scenarios with minimal effort.
- Systematic Difficulty Scaling: Assess model capabilities across three distinct difficulty levels (L0, L1, L2). Isolate specific skills and pinpoint failure points, from basic object manipulation to complex, long-horizon tasks.
If you find VLA-Arena useful, please cite it in your publications.
@misc{vla-arena2025,
title={VLA-Arena},
author={Jiahao Li and Borong Zhang and Jiachen Shen and Jiaming Ji and Yaodong Yang},
journal={GitHub repository},
year={2025}
}
# 1. Install VLA-Arena
pip install vla-arena
# 2. Download task suites (required)
vla-arena.download-tasks install-all --repo vla-arena/tasks
Important: To reduce the PyPI package size, task suites and asset files must be downloaded separately after installation (~850 MB).
# Clone repository (includes all tasks and assets)
git clone https://github.com/PKU-Alignment/VLA-Arena.git
cd VLA-Arena
# Create environment
conda create -n vla-arena python=3.11
conda activate vla-arena
# Install VLA-Arena
pip install -e .
- The `mujoco.dll` file may be missing from the `robosuite/utils` directory; it can be obtained from `mujoco/mujoco.dll`.
- On Windows, you need to modify the `mujoco` rendering method in `robosuite\utils\binding_utils.py`:

      if _SYSTEM == "Darwin":
          os.environ["MUJOCO_GL"] = "cgl"
      else:
          os.environ["MUJOCO_GL"] = "wgl"  # Change "egl" to "wgl"

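
The rendering change described above can also be expressed as a per-platform lookup. The following is a minimal sketch, not the code shipped in `robosuite`: the helper name `choose_mujoco_gl` is hypothetical, and the only backend values used are the ones that appear in the snippet above (`cgl`, `wgl`, and the `egl` default it replaces).

```python
import os
import platform

# Backend names come from the snippet above; the mapping and the helper
# below are hypothetical and only illustrate the selection logic.
_BACKENDS = {"Darwin": "cgl", "Windows": "wgl"}


def choose_mujoco_gl(system: str) -> str:
    """Return a MUJOCO_GL value for a platform name as given by platform.system()."""
    # Any platform not listed above falls back to "egl".
    return _BACKENDS.get(system, "egl")


if __name__ == "__main__":
    # setdefault keeps a value the user has already exported.
    os.environ.setdefault("MUJOCO_GL", choose_mujoco_gl(platform.system()))
    print(os.environ["MUJOCO_GL"])
```

A lookup like this avoids editing the file by hand on each platform, but whether it fits into `binding_utils.py` unchanged is an assumption.
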
# Collect demonstration data
python scripts/collect_demonstration.py --bddl-file tasks/your_task.bddl
This opens an interactive simulation environment where you control the robotic arm with the keyboard to complete the task specified in the BDDL file.
# Create a dedicated environment for the model
conda create -n [model_name]_vla_arena python=3.11 -y
conda activate [model_name]_vla_arena
# Install VLA-Arena and model-specific dependencies
pip install -e .
pip install vla-arena[model_name]
# Fine-tune a model (e.g., OpenVLA)
vla-arena train --model openvla --config vla_arena/configs/train/openvla.yaml
# Evaluate a model
vla-arena eval --model openvla --config vla_arena/configs/evaluation/openvla.yaml
Note: OpenPi requires a different setup process using uv for environment management. Please refer to the Model Fine-tuning and Evaluation Guide for detailed OpenPi installation and training instructions.
VLA-Arena provides 11 specialized task suites with 150+ tasks total, organized into four domains:

Safety

| Suite | Description | L0 | L1 | L2 | Total |
|---|---|---|---|---|---|
| `static_obstacles` | Static collision avoidance | 5 | 5 | 5 | 15 |
| `cautious_grasp` | Safe grasping strategies | 5 | 5 | 5 | 15 |
| `hazard_avoidance` | Hazard area avoidance | 5 | 5 | 5 | 15 |
| `state_preservation` | Object state preservation | 5 | 5 | 5 | 15 |
| `dynamic_obstacles` | Dynamic collision avoidance | 5 | 5 | 5 | 15 |

Distractors

| Suite | Description | L0 | L1 | L2 | Total |
|---|---|---|---|---|---|
| `static_distractors` | Cluttered scene manipulation | 5 | 5 | 5 | 15 |
| `dynamic_distractors` | Dynamic scene manipulation | 5 | 5 | 5 | 15 |

Extrapolation

| Suite | Description | L0 | L1 | L2 | Total |
|---|---|---|---|---|---|
| `preposition_combinations` | Spatial relationship understanding | 5 | 5 | 5 | 15 |
| `task_workflows` | Multi-step task planning | 5 | 5 | 5 | 15 |
| `unseen_objects` | Unseen object recognition | 5 | 5 | 5 | 15 |

Long Horizon

| Suite | Description | L0 | L1 | L2 | Total |
|---|---|---|---|---|---|
| `long_horizon` | Long-horizon task planning | 10 | 5 | 5 | 20 |

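
As a quick consistency check on the suite tables above, the per-level counts can be tallied by domain. This is an illustrative sketch: the counts are copied from the tables, while the grouping of suites into domains is inferred from the order of the four tables and is an assumption rather than something the tables state.

```python
# (L0, L1, L2) task counts copied from the suite tables above.
# The domain grouping is an assumption based on table order.
SUITES = {
    "Safety": {
        "static_obstacles": (5, 5, 5),
        "cautious_grasp": (5, 5, 5),
        "hazard_avoidance": (5, 5, 5),
        "state_preservation": (5, 5, 5),
        "dynamic_obstacles": (5, 5, 5),
    },
    "Distractors": {
        "static_distractors": (5, 5, 5),
        "dynamic_distractors": (5, 5, 5),
    },
    "Extrapolation": {
        "preposition_combinations": (5, 5, 5),
        "task_workflows": (5, 5, 5),
        "unseen_objects": (5, 5, 5),
    },
    "Long Horizon": {
        "long_horizon": (10, 5, 5),
    },
}


def domain_totals(suites):
    """Sum the task counts of every suite in each domain."""
    return {
        domain: sum(sum(levels) for levels in table.values())
        for domain, table in suites.items()
    }


totals = domain_totals(SUITES)
num_suites = sum(len(table) for table in SUITES.values())
print(totals)
print(num_suites, "suites,", sum(totals.values()), "tasks")
```

The tally gives 11 suites and 170 tasks, which is consistent with the "150+ tasks" figure quoted above.
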
Difficulty Levels:
- L0: Basic tasks with clear objectives
- L1: Intermediate tasks with increased complexity
- L2: Advanced tasks with challenging scenarios
Example scenes for each suite at difficulty levels L0, L1, and L2 (images not shown).

System Requirements:
- OS: Ubuntu 20.04+ or macOS 12+
- Python: 3.11 or higher
- CUDA: 11.8+ (for GPU acceleration)
# Clone repository
git clone https://github.com/PKU-Alignment/VLA-Arena.git
cd VLA-Arena
# Create environment
conda create -n vla-arena python=3.11
conda activate vla-arena
# Install dependencies
pip install --upgrade pip
pip install -e .
VLA-Arena provides comprehensive documentation for all aspects of the framework. Choose the guide that best fits your needs:
Scene Construction Guide | Chinese version
Build custom task scenarios using CBDDL (Constrained Behavior Domain Definition Language).
- CBDDL file structure and syntax
- Region, fixture, and object definitions
- Moving objects with various motion types (linear, circular, waypoint, parabolic)
- Initial and goal state specifications
- Cost constraints and safety predicates
- Image effect settings
- Asset management and registration
- Scene visualization tools
Data Collection Guide | Chinese version
Collect demonstrations in custom scenes and convert data formats.
- Interactive simulation environment with keyboard controls
- Demonstration data collection workflow
- Data format conversion (HDF5 to training dataset)
- Dataset regeneration (filtering noops and optimizing trajectories)
- Convert dataset to RLDS format (for X-embodiment frameworks)
- Convert RLDS dataset to LeRobot format (for Hugging Face LeRobot)
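
The "filtering noops" step in the list above can be illustrated with a small sketch. It is not the repository's regeneration script. It assumes a hypothetical action layout in which the last element is the gripper command and the preceding elements are arm deltas, and it treats a step as a no-op when the arm barely moves and the gripper command is unchanged from the last kept step. The function names and the threshold are illustrative.

```python
from typing import List, Optional, Sequence


def is_noop(action: Sequence[float],
            prev_action: Optional[Sequence[float]],
            threshold: float = 1e-4) -> bool:
    """Return True if the arm motion is near zero and the gripper command is unchanged.

    Assumed layout (hypothetical): action[:-1] are arm deltas, action[-1] is the gripper.
    """
    motion = sum(x * x for x in action[:-1]) ** 0.5
    if prev_action is None:
        # No earlier kept step: only the arm motion can be checked.
        return motion < threshold
    return motion < threshold and action[-1] == prev_action[-1]


def filter_noops(actions: List[Sequence[float]]) -> List[Sequence[float]]:
    """Drop no-op steps, comparing each action against the last kept action."""
    kept: List[Sequence[float]] = []
    for action in actions:
        prev = kept[-1] if kept else None
        if not is_noop(action, prev):
            kept.append(action)
    return kept


demo = [
    [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, -1.0],  # idle at start: dropped
    [0.1, 0.0, 0.0, 0.0, 0.0, 0.0, -1.0],  # arm moves: kept
    [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, -1.0],  # idle, same gripper: dropped
    [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0],   # gripper toggles: kept
    [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0],   # idle again: dropped
]
filtered = filter_noops(demo)
print(len(demo), "->", len(filtered))
```

Comparing against the last kept action rather than the previous raw action means a gripper toggle is preserved even when it follows a run of idle steps.
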
Model Fine-tuning and Evaluation Guide
Fine-tune and evaluate VLA models using VLA-Arena generated datasets.
- General models (OpenVLA, OpenVLA-OFT, UniVLA, SmolVLA): Simple installation and training workflow
- OpenPi: Special setup using `uv` for environment management
- Model-specific installation instructions (`pip install vla-arena[model_name]`)
- Training configuration and hyperparameter settings
- Evaluation scripts and metrics
- Policy server setup for inference (OpenPi)
- Standard: `finetune_openvla.sh` - Basic OpenVLA fine-tuning
- Advanced: `finetune_openvla_oft.sh` - OpenVLA-OFT with enhanced features
- English: `README_EN.md` - Complete English documentation index
- Chinese: `README_ZH.md` - Complete Chinese documentation index
After installation, you can use the following commands to view and download task suites:
# View installed tasks
vla-arena.download-tasks installed
# List available task suites
vla-arena.download-tasks list --repo vla-arena/tasks
# Install a single task suite
vla-arena.download-tasks install robustness_dynamic_distractors --repo vla-arena/tasks
# Install all task suites (recommended)
vla-arena.download-tasks install-all --repo vla-arena/tasks
# View installed tasks
python -m scripts.download_tasks installed
# Install all tasks
python -m scripts.download_tasks install-all --repo vla-arena/tasks
If you want to use your own task repository:
# Use custom HuggingFace repository
vla-arena.download-tasks install-all --repo your-username/your-task-repo
You can create and share your own task suites:
# Package a single task
vla-arena.manage-tasks pack path/to/task.bddl --output ./packages
# Package all tasks
python scripts/package_all_suites.py --output ./packages
# Upload to HuggingFace Hub
vla-arena.manage-tasks upload ./packages/my_task.vlap --repo your-username/your-repo
We compare six models across four dimensions: Safety, Distractors, Extrapolation, and Long Horizon. Performance trends over three difficulty levels (L0–L2) are shown with a unified scale (0.0–1.0) for cross-model comparison. Safety tasks report both cumulative cost (CC, shown in parentheses) and success rate (SR), while other tasks report only SR. Bold numbers mark the highest performance per difficulty level.

Safety

| Task | OpenVLA | OpenVLA-OFT | π₀ | π₀-FAST | UniVLA | SmolVLA |
|---|---|---|---|---|---|---|
| StaticObstacles | ||||||
| L0 | 1.00 (CC: 0.0) | 1.00 (CC: 0.0) | 0.98 (CC: 0.0) | 1.00 (CC: 0.0) | 0.84 (CC: 0.0) | 0.14 (CC: 0.0) |
| L1 | 0.60 (CC: 8.2) | 0.20 (CC: 45.4) | 0.74 (CC: 8.0) | 0.40 (CC: 56.0) | 0.42 (CC: 9.7) | 0.00 (CC: 8.8) |
| L2 | 0.00 (CC: 38.2) | 0.20 (CC: 49.0) | 0.32 (CC: 28.1) | 0.20 (CC: 6.8) | 0.18 (CC: 60.6) | 0.00 (CC: 2.6) |
| CautiousGrasp | ||||||
| L0 | 0.80 (CC: 6.6) | 0.60 (CC: 3.3) | 0.84 (CC: 3.5) | 0.64 (CC: 3.3) | 0.80 (CC: 3.3) | 0.52 (CC: 2.8) |
| L1 | 0.40 (CC: 120.2) | 0.50 (CC: 6.3) | 0.08 (CC: 16.4) | 0.06 (CC: 15.6) | 0.60 (CC: 52.1) | 0.28 (CC: 30.7) |
| L2 | 0.00 (CC: 50.1) | 0.00 (CC: 2.1) | 0.00 (CC: 0.5) | 0.00 (CC: 1.0) | 0.00 (CC: 8.5) | 0.04 (CC: 0.3) |
| HazardAvoidance | ||||||
| L0 | 0.20 (CC: 17.2) | 0.36 (CC: 9.4) | 0.74 (CC: 6.4) | 0.16 (CC: 10.4) | 0.70 (CC: 5.3) | 0.16 (CC: 10.4) |
| L1 | 0.02 (CC: 22.8) | 0.00 (CC: 22.9) | 0.00 (CC: 16.8) | 0.00 (CC: 15.4) | 0.12 (CC: 18.3) | 0.00 (CC: 19.5) |
| L2 | 0.20 (CC: 15.7) | 0.20 (CC: 14.7) | 0.00 (CC: 15.6) | 0.20 (CC: 13.9) | 0.04 (CC: 16.7) | 0.00 (CC: 18.0) |
| StatePreservation | ||||||
| L0 | 1.00 (CC: 0.0) | 1.00 (CC: 0.0) | 0.98 (CC: 0.0) | 0.60 (CC: 0.0) | 0.90 (CC: 0.0) | 0.50 (CC: 0.0) |
| L1 | 0.66 (CC: 6.6) | 0.76 (CC: 7.6) | 0.64 (CC: 6.4) | 0.56 (CC: 5.6) | 0.76 (CC: 7.6) | 0.18 (CC: 1.8) |
| L2 | 0.34 (CC: 21.0) | 0.20 (CC: 4.6) | 0.48 (CC: 15.8) | 0.20 (CC: 4.2) | 0.54 (CC: 16.4) | 0.08 (CC: 9.6) |
| DynamicObstacles | ||||||
| L0 | 0.60 (CC: 3.6) | 0.80 (CC: 8.8) | 0.92 (CC: 6.0) | 0.80 (CC: 3.6) | 0.26 (CC: 7.1) | 0.32 (CC: 2.1) |
| L1 | 0.60 (CC: 5.1) | 0.56 (CC: 3.7) | 0.64 (CC: 3.3) | 0.30 (CC: 8.8) | 0.58 (CC: 16.3) | 0.24 (CC: 16.6) |
| L2 | 0.26 (CC: 5.6) | 0.10 (CC: 1.8) | 0.10 (CC: 40.2) | 0.00 (CC: 21.2) | 0.08 (CC: 6.0) | 0.02 (CC: 0.9) |
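
Because Safety rows carry two numbers per model, comparing models needs a rule for combining them. The sketch below shows one possible convention, which is an assumption and not the benchmark's official ranking: sort by success rate first and break ties by lower cumulative cost. The values are copied from the StaticObstacles L1 row above; `pi0` and `pi0-FAST` are ASCII stand-ins for π₀ and π₀-FAST.

```python
# (success rate, cumulative cost) for StaticObstacles at L1, copied from the table above.
STATIC_OBSTACLES_L1 = {
    "OpenVLA": (0.60, 8.2),
    "OpenVLA-OFT": (0.20, 45.4),
    "pi0": (0.74, 8.0),
    "pi0-FAST": (0.40, 56.0),
    "UniVLA": (0.42, 9.7),
    "SmolVLA": (0.00, 8.8),
}


def rank_models(results):
    """Order model names by success rate (descending), then cumulative cost (ascending)."""
    return sorted(results, key=lambda name: (-results[name][0], results[name][1]))


ranking = rank_models(STATIC_OBSTACLES_L1)
for name in ranking:
    sr, cc = STATIC_OBSTACLES_L1[name]
    print(f"{name:12s} SR={sr:.2f} CC={cc:.1f}")
```

Other conventions are equally defensible, for example filtering out models above a cost budget before ranking by success rate.
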

Distractors

| Task | OpenVLA | OpenVLA-OFT | π₀ | π₀-FAST | UniVLA | SmolVLA |
|---|---|---|---|---|---|---|
| StaticDistractors | ||||||
| L0 | 0.80 | 1.00 | 0.92 | 1.00 | 1.00 | 0.54 |
| L1 | 0.20 | 0.00 | 0.02 | 0.22 | 0.12 | 0.00 |
| L2 | 0.00 | 0.20 | 0.02 | 0.00 | 0.00 | 0.00 |
| DynamicDistractors | ||||||
| L0 | 0.60 | 1.00 | 0.78 | 0.80 | 0.78 | 0.42 |
| L1 | 0.58 | 0.54 | 0.70 | 0.28 | 0.54 | 0.30 |
| L2 | 0.40 | 0.40 | 0.18 | 0.04 | 0.04 | 0.00 |

Extrapolation

| Task | OpenVLA | OpenVLA-OFT | π₀ | π₀-FAST | UniVLA | SmolVLA |
|---|---|---|---|---|---|---|
| PrepositionCombinations | ||||||
| L0 | 0.68 | 0.62 | 0.76 | 0.14 | 0.50 | 0.20 |
| L1 | 0.04 | 0.18 | 0.10 | 0.00 | 0.02 | 0.00 |
| L2 | 0.00 | 0.00 | 0.00 | 0.00 | 0.02 | 0.00 |
| TaskWorkflows | ||||||
| L0 | 0.82 | 0.74 | 0.72 | 0.24 | 0.76 | 0.32 |
| L1 | 0.20 | 0.00 | 0.00 | 0.00 | 0.04 | 0.04 |
| L2 | 0.16 | 0.00 | 0.00 | 0.00 | 0.20 | 0.00 |
| UnseenObjects | ||||||
| L0 | 0.80 | 0.60 | 0.80 | 0.00 | 0.34 | 0.16 |
| L1 | 0.60 | 0.40 | 0.52 | 0.00 | 0.76 | 0.18 |
| L2 | 0.00 | 0.20 | 0.04 | 0.00 | 0.16 | 0.00 |

Long Horizon

| Task | OpenVLA | OpenVLA-OFT | π₀ | π₀-FAST | UniVLA | SmolVLA |
|---|---|---|---|---|---|---|
| LongHorizon | ||||||
| L0 | 0.80 | 0.80 | 0.92 | 0.62 | 0.66 | 0.74 |
| L1 | 0.00 | 0.00 | 0.02 | 0.00 | 0.00 | 0.00 |
| L2 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
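
To make the difficulty trend in the LongHorizon rows above concrete, the success rates can be averaged across the six models for each level. This is a small illustrative calculation on the published numbers, not part of the benchmark tooling; the values are copied from the table in column order (OpenVLA, OpenVLA-OFT, π₀, π₀-FAST, UniVLA, SmolVLA).

```python
from statistics import fmean

# LongHorizon success rates per level, copied from the table above
# (column order: OpenVLA, OpenVLA-OFT, pi0, pi0-FAST, UniVLA, SmolVLA).
LONG_HORIZON = {
    "L0": [0.80, 0.80, 0.92, 0.62, 0.66, 0.74],
    "L1": [0.00, 0.00, 0.02, 0.00, 0.00, 0.00],
    "L2": [0.00, 0.00, 0.00, 0.00, 0.00, 0.00],
}


def mean_sr_by_level(table):
    """Average the success rate over all models for each difficulty level."""
    return {level: fmean(values) for level, values in table.items()}


means = mean_sr_by_level(LONG_HORIZON)
for level, value in means.items():
    print(f"{level}: mean SR = {value:.3f}")
```

The mean falls from roughly 0.76 at L0 to near zero at L1 and to exactly zero at L2, so no evaluated model solves a LongHorizon task at the hardest level.
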
This project is licensed under the Apache 2.0 license - see LICENSE for details.
- RoboSuite, LIBERO, and VLABench teams for the framework
- OpenVLA, UniVLA, OpenPi, and LeRobot teams for pioneering VLA research
- All contributors and the robotics community
VLA-Arena: Advancing Vision-Language-Action Models Through Comprehensive Evaluation
Made with ❤️ by the VLA-Arena Team

































