EVOLVE-VLA enables Vision-Language-Action (VLA) models to continuously adapt through autonomous environment interaction—learning by doing, not just watching.
Current VLA models rely on supervised fine-tuning (SFT) with extensive expert demonstrations, leading to:
- 💰 High labor costs: Hundreds of demonstrations per task
- 🔒 Rigid memorization: Policies that merely replay training trajectories
- ❌ Poor adaptation: Inability to recover from execution deviations
Our test-time training framework addresses these limitations by enabling VLAs to self-improve during deployment through online reinforcement learning, requiring minimal or even zero task-specific demonstrations.
- Continue learning during deployment through environment interaction
- Minimal supervision required (1-shot or zero-shot)
- Autonomous exploration and self-improvement via online RL
- Learned progress estimator replaces impractical oracle rewards
- Dense, continuous feedback for sample-efficient learning
- No access to ground-truth success signals needed at test time
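The reward-shaping idea behind a learned progress estimator can be pictured with a minimal sketch. Everything below is illustrative, not the released implementation: `ProgressEstimator` is a toy stand-in that maps an observation to a scalar progress estimate in [0, 1], and each step is rewarded by the *increase* in estimated progress, so the return telescopes to final-minus-initial progress with no ground-truth success signal involved.

```python
class ProgressEstimator:
    """Toy stand-in for a learned progress model: here, progress is
    simply proportional to how close a 1-D observation is to the goal."""

    def __init__(self, goal: float):
        self.goal = goal

    def __call__(self, obs: float) -> float:
        dist = abs(obs - self.goal)
        return max(0.0, 1.0 - dist / self.goal)


def dense_rewards(progress_fn, observations):
    """Reward each transition by the change in estimated progress.
    Summed over a trajectory, this telescopes to
    progress(final observation) - progress(initial observation)."""
    estimates = [progress_fn(o) for o in observations]
    return [b - a for a, b in zip(estimates, estimates[1:])]


estimator = ProgressEstimator(goal=10.0)
trajectory = [0.0, 2.0, 5.0, 9.0, 10.0]  # observations approaching the goal
rewards = dense_rewards(estimator, trajectory)
# The cumulative reward approximately equals final minus initial progress.
```

Because every transition produces a nonzero signal, the policy gets feedback at each step rather than a single sparse success bit at the end, which is what makes online RL sample-efficient in this setting.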
- Accumulative Progress Estimation: smooths noisy point-wise estimates into a stable signal through interval-based sampling and incremental aggregation
- Progressive Horizon Extension: a gradual curriculum that extends the exploration horizon, letting the policy master simpler sub-tasks before tackling complete long-horizon tasks
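Both ideas can be sketched in a few lines. This is a rough illustration under stated assumptions, not the paper's exact method: the aggregation rule (average estimates per interval, accumulate only positive increments) and the linear horizon schedule are hypothetical stand-ins chosen to show the shape of each mechanism.

```python
def accumulate_progress(pointwise, interval=3):
    """Accumulative progress sketch: average noisy point-wise estimates
    over fixed intervals, then accumulate only the positive increments
    between consecutive interval means, ignoring transient dips."""
    means = [
        sum(pointwise[i:i + interval]) / len(pointwise[i:i + interval])
        for i in range(0, len(pointwise), interval)
    ]
    progress, out, prev = 0.0, [], means[0]
    for m in means:
        progress += max(0.0, m - prev)  # discard negative (noisy) steps
        out.append(progress)
        prev = m
    return out


def horizon_schedule(total_horizon, stages=4):
    """Progressive horizon sketch: linearly extend the exploration
    horizon across curriculum stages up to the full task length."""
    return [round(total_horizon * (s + 1) / stages) for s in range(stages)]


noisy = [0.1, 0.0, 0.2, 0.3, 0.25, 0.35, 0.6, 0.55, 0.65]
smoothed = accumulate_progress(noisy, interval=3)   # non-decreasing signal
stages = horizon_schedule(200, stages=4)            # [50, 100, 150, 200]
```

The accumulated signal is non-decreasing by construction, which is what makes it usable as a stable reward; the schedule simply grows the rollout length stage by stage so early training focuses on the final sub-tasks of a long-horizon goal.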
| Setting | Improvement | Details |
|---|---|---|
| Long-Horizon Tasks | +8.6% | LIBERO-Long benchmark |
| 1-Shot Learning | +22.0% | Minimal demonstration data |
| Cross-Task Transfer | 0% → 20.8% | Zero-shot generalization without task-specific SFT |
Through autonomous test-time training, EVOLVE-VLA develops skills entirely absent from demonstrations:
- ✅ Error Recovery: Re-attempting failed grasps and adjusting strategies
- ✅ Adaptation: Handling unexpected object state changes
- ✅ Novel Strategies: Discovering alternative manipulation approaches (e.g., grasping cup body instead of handle)
Check out our project page for video demonstrations showing:
- Error recovery through repeated grasp attempts
- Adaptation to unexpected state changes
- Novel manipulation strategies learned autonomously
We are committed to releasing the following upon publication:
- 🔓 Full training code for EVOLVE-VLA
- 🔓 Inference codebase for deploying trained models
- 🔓 Pre-trained models on HuggingFace
- 🔓 Evaluation scripts for LIBERO benchmark
- 📖 Detailed documentation and tutorials
Stay tuned! Star this repo to get notified when the code is released.
If you find our work useful, please cite:
@article{bai2025evolve,
  title={EVOLVE-VLA: Test-Time Training from Environment Feedback for Vision-Language-Action Models},
  author={Bai, Zechen and Gao, Chen and Shou, Mike Zheng},
  journal={arXiv preprint arXiv:2512.14666},
  year={2025}
}

We thank the following projects and teams for their valuable contributions to the community:
- OpenVLA for the open-source VLA model and codebase
- SimpleVLA-RL for pioneering work on RL fine-tuning for VLAs
- verl for the efficient RL training framework
- VLAC for the vision-language action critic model
- LIBERO team for providing the benchmark