<h2 class="subtitle has-text-centered">
(a) Paradigms for world-model-based VLA reinforcement learning. Existing methods typically either reconstruct the environment as a 3D world or train video world models that simulate the environment. To address the imprecise action-following inherent in existing video-based simulators, we propose World-VLA-Loop, a closed-loop paradigm that jointly optimizes the world model and the VLA policy, iteratively enhancing the performance and grounding of both. (b) The real-world policy success rate improves by 36.7% after two iterations of joint optimization of the VLA model and the world model.
Recent progress in robotic world models has leveraged video diffusion transformers to predict future observations conditioned on historical states and actions. While these models can simulate realistic visual outcomes, they often exhibit poor action-following precision, hindering their utility for downstream robotic learning. In this work, we introduce World-VLA-Loop, a closed-loop framework for the joint refinement of world models and Vision-Language-Action (VLA) policies. We propose a state-aware video world model that functions as a high-fidelity interactive simulator by jointly predicting future observations and reward signals. To enhance reliability, we introduce the SANS dataset, which incorporates near-success trajectories to improve action-outcome alignment within the world model. This framework enables closed-loop reinforcement learning (RL) post-training of VLA policies entirely within a virtual environment. Crucially, our approach facilitates a co-evolving cycle: failure rollouts generated by the VLA policy are iteratively fed back to refine the world model's precision, which in turn enhances subsequent RL optimization. Evaluations across simulation and real-world tasks demonstrate that our framework significantly boosts VLA performance with minimal physical interaction, establishing a mutually beneficial relationship between world modeling and policy learning for general-purpose robotics.
<li>Curate a success-and-near-success (SANS) dataset, mainly via manual teleoperation; only a few demonstrations are needed.
</li>
<li>Fine-tune the action-conditioned world model on the SANS dataset with joint reward and video supervision.</li>
<li>Execute VLA policy rollouts within the world model and perform RL optimization with GRPO.</li>
<li>Deploy the refined policy in the real world. During deployment, new rollouts yield additional failure and success data that augment the SANS dataset, which can be used to iteratively improve the world model and the policy.</li>
</ol>
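The steps above can be sketched as a minimal closed loop. This is an illustrative sketch only, not the paper's implementation: `world_model_step`, `rollout_return`, and the toy one-dimensional dynamics and reward are hypothetical stand-ins for the learned video world model, while `grpo_advantages` shows the group-relative (critic-free) advantage normalization at the heart of GRPO.

```python
import numpy as np

rng = np.random.default_rng(0)

def world_model_step(state, action):
    # Hypothetical stand-in for the learned video world model:
    # jointly predicts the next observation and a reward signal.
    next_state = state + action
    reward = -abs(next_state)  # reward is highest when the state reaches 0
    return next_state, reward

def rollout_return(policy_mean, horizon=5):
    # One VLA policy rollout executed entirely inside the world model
    # (here the "policy" is just a Gaussian over a scalar action).
    state, total = 1.0, 0.0
    for _ in range(horizon):
        action = rng.normal(policy_mean, 0.1)
        state, r = world_model_step(state, action)
        total += r
    return total

def grpo_advantages(returns):
    # GRPO's core idea: advantages are computed relative to the group
    # of rollouts (mean/std normalization), with no value critic.
    returns = np.asarray(returns, dtype=float)
    return (returns - returns.mean()) / (returns.std() + 1e-8)

# Sample a group of rollouts from the current policy, then compute
# the group-relative advantages that would weight the policy update.
group = [rollout_return(policy_mean=-0.2) for _ in range(8)]
adv = grpo_advantages(group)
print(adv)
```

In the full pipeline, these advantages would drive the VLA policy-gradient update, and the failure rollouts would be fed back to refine the world model before the next iteration.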
This cycle enables the joint optimization of the world model and the VLA policy, iteratively enhancing the performance of both.