Commit bdedcf7

Add Pi0.5 with RTC pipeline in Robotic AI Suites (open-edge-platform#2201)

Co-authored-by: Tang, Yichong <yichong.tang@intel.com>
Co-authored-by: Jeremy Ouillette <jeremy.ouillette@intel.com>
Co-authored-by: Wiktor Iwaszko <wiktorx.iwaszko@intel.com>
Co-authored-by: Sebastian Golebiewski <sebastianx.golebiewski@intel.com>

1 parent fec5f47, commit bdedcf7
16 files changed: +5573 −0 lines

.gitmodules

Lines changed: 4 additions & 0 deletions

@@ -49,3 +49,7 @@
 [submodule "health-and-life-sciences-ai-suite/multi_modal_patient_monitoring/services/mdpnp/mdpnp"]
 	path = health-and-life-sciences-ai-suite/multi_modal_patient_monitoring/services/mdpnp/mdpnp
 	url = https://github.com/mdpnp/mdpnp.git
+[submodule "robotics-ai-suite/pipelines/pi05-rtc-ov/lerobot"]
+	path = robotics-ai-suite/pipelines/pi05-rtc-ov/lerobot
+	url = https://github.com/huggingface/lerobot.git
+	shallow = true

robotics-ai-suite/README.md

Lines changed: 1 addition & 0 deletions

@@ -26,6 +26,7 @@ The types of collection are as follows:
 | [Imitation Learning - ACT](pipelines/act-sample) | [Imitation Learning - ACT](https://docs.openedgeplatform.intel.com/dev/edge-ai-suites/robotics-ai-suite/embodied/sample_pipelines/imitation_learning_act.html) | Imitation learning pipeline using Action Chunking with Transformers(ACT) algorithm to train and evaluate in simulated or real robot environments with Intel® optimization |
 | [Improved 3D Diffusion Policy (OpenVINO Toolkit)](pipelines/idp3-ov) | [Improved 3D Diffusion Policy (OpenVINO Toolkit)](https://docs.openedgeplatform.intel.com/dev/edge-ai-suites/robotics-ai-suite/embodied/developer_tools_tutorials/model_tutorials/model_idp3.html) | Improved 3D Diffusion Policy implementation optimized with OpenVINO toolkit |
 | [LLM Robotics Demo](pipelines/llm-robotics-demo) | [LLM Robotics Demo](https://docs.openedgeplatform.intel.com/dev/edge-ai-suites/robotics-ai-suite/embodied/sample_pipelines/llm_robotics.html) | Step-by-step guide for setting up a real-time system to control a JAKA robot arm with movement commands generated using an LLM |
+| [Pi0.5 with Real-Time Chunking (OpenVINO Toolkit)](pipelines/pi05-rtc-ov) | [Pi0.5 with Real-Time Chunking (OpenVINO Toolkit)](https://docs.openedgeplatform.intel.com/dev/edge-ai-suites/robotics-ai-suite/embodied/sample_pipelines/pi05_with_rtc.html) | Implementation of Pi0.5 VLA model with Real-Time Chunking (RTC) optimized with the OpenVINO toolkit |
 | [Robotics Diffusion Transformer (OpenVINO Toolkit)](pipelines/rdt-ov) | [Robotics Diffusion Transformer (OpenVINO Toolkit)](https://docs.openedgeplatform.intel.com/dev/edge-ai-suites/robotics-ai-suite/embodied/sample_pipelines/robotics_diffusion_transformer.html) | Robotics Diffusion Transformer implementation optimized with OpenVINO toolkit |
 | [VSLAM: ORB-SLAM3](pipelines/orb-slam3-sample) | [VSLAM: ORB-SLAM3](https://docs.openedgeplatform.intel.com/dev/edge-ai-suites/robotics-ai-suite/embodied/sample_pipelines/ORB_VSLAM.html) | One of the popular real-time feature-based SLAM libraries that can perform Visual, Visual-Inertial and Multi-Map SLAM with monocular, stereo and RGB-D cameras, using pin-hole and fish-eye lens models |

Two binary image files added (63.7 KB and 268 KB).

robotics-ai-suite/docs/embodied/sample_pipelines.rst

Lines changed: 1 addition & 0 deletions

@@ -14,4 +14,5 @@ These pipelines are designed to showcase some core features of the SDK, includin
    sample_pipelines/ORB_VSLAM
    sample_pipelines/llm_robotics
    sample_pipelines/robotics_diffusion_transformer
+   sample_pipelines/pi05_with_rtc

Lines changed: 336 additions & 0 deletions

@@ -0,0 +1,336 @@
.. _pi05_rtc:

Pi0.5 with Real-Time Chunking
#############################

π₀.₅ (Pi0.5) is a Vision-Language-Action (VLA) model architecture designed by `Physical Intelligence <https://www.pi.website/>`__. It is built on the PaliGemma VLM backbone, integrating a SigLIP vision encoder (So400m) with a Gemma language-model base (e.g., 2.6B parameters) to process multimodal inputs.

Architecturally, π₀.₅ distinguishes itself through a specialized "Action Expert" head, a smaller model (e.g., Gemma 300M) that generates continuous actions using Flow Matching. Unlike traditional policy heads, this design solves an Ordinary Differential Equation (ODE) from noise to actions, enabling high-precision control.
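The Flow Matching generation described above can be sketched as a simple Euler integration of a learned velocity field from noise (t=0) to an action vector (t=1). The toy velocity field below is a stand-in for the Action Expert, used only to make the integration loop concrete:

```python
import numpy as np

def integrate_flow(velocity_fn, action_dim, num_steps=10, seed=0):
    """Euler-integrate a flow-matching ODE from Gaussian noise (t=0)
    to an action vector (t=1). `velocity_fn(x, t)` plays the role of
    the learned velocity field."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(action_dim)  # start from noise
    dt = 1.0 / num_steps
    for i in range(num_steps):
        t = i * dt
        x = x + velocity_fn(x, t) * dt   # x_{t+dt} = x_t + v(x_t, t) * dt
    return x

# Toy velocity field that transports the sample toward a fixed target action:
target = np.array([0.5, -0.2, 0.1])
v = lambda x, t: (target - x) / max(1.0 - t, 1e-6)

actions = integrate_flow(v, action_dim=3)
```

With this particular field the integrated sample lands on the target action; in the real model the velocity field is predicted by the Action Expert conditioned on the vision-language prefix.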
Key structural features of π₀.₅ include:

- **AdaRMSNorm Conditioning**: The flow timestep :math:`t` is injected directly into the normalization layers of the Action Expert via Adaptive RMS Normalization, providing more effective conditioning than standard concatenation.
- **Discretized State Tokenization**: Robot proprioceptive state is discretized and treated as text tokens within the input prefix, allowing the model to "read" its physical state using the same attention mechanisms as natural language.
- **Unified Prefix Processing**: Visual patch tokens from SigLIP and text tokens are concatenated into a single sequence, which the transformer processes holistically before passing context to the Action Expert.

.. image:: assets/images/pi05-overview.png
   :width: 85%
   :align: center

*(Figure source:* `Pi0.5 Paper <https://arxiv.org/abs/2504.16054>`__ *π₀.₅: a Vision-Language-Action Model with Open-World Generalization)*

Real-Time Chunking (RTC) is an inference strategy that enables high-frequency robotic control with high-latency flow-matching policies (e.g., Pi0, Pi0.5). Built on asynchronous inference execution, RTC employs a **Prefix Guidance** mechanism during inference. Instead of blending overlapping chunks after generation (temporal ensembling), RTC uses the unexecuted portion of the previous chunk as a constraint during the flow-matching process. By treating the transition as an inpainting problem, the model is guided to generate new trajectories that seamlessly extend the current motion, ensuring continuous control.

The synergy between Pi0.5 and RTC enables sophisticated generalist control on standard hardware by addressing two critical problems of standard VLA models: **Action Waiting** and **Action Jumping**.

1. **Eliminating Action Waiting**: RTC runs inference asynchronously in the background while the robot executes buffered actions. The robot never pauses to "think," maintaining high-frequency control (e.g., 50 Hz) despite the model's slower inference.
2. **Preventing Action Jumping**: Through **Prefix Guidance**, RTC treats trajectory generation as an inpainting task. It constrains the start of the new plan to align with the unexecuted tail of the previous plan, enforcing continuity at the generation level rather than relying on post-hoc smoothing.

.. image:: assets/images/RTC-overview.png
   :width: 85%
   :align: center

*(Figure source:* `RTC Paper <https://arxiv.org/abs/2506.07339>`__ *Real-Time Execution of Action Chunking Flow Policies)*
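A minimal sketch of the prefix-guidance idea, assuming a simple convex-combination constraint: the first steps of the in-progress chunk are pulled toward the unexecuted tail of the previous chunk, with weights that decay from hard to soft. The actual RTC update operates inside the flow-matching denoising loop and is more involved; this only illustrates the inpainting-style constraint:

```python
import numpy as np

def prefix_guided_step(x, prev_leftover, prefix_weights):
    """Pull the first len(prev_leftover) actions of the in-progress chunk
    `x` toward the unexecuted tail of the previous chunk. `prefix_weights`
    in [0, 1] decays so later prefix steps are constrained less.
    Illustrative stand-in for RTC's inpainting constraint, not the exact
    update used by the pipeline."""
    n = len(prev_leftover)
    w = prefix_weights[:, None]          # broadcast over action dimensions
    x = x.copy()
    x[:n] = w * prev_leftover + (1.0 - w) * x[:n]
    return x

chunk_size, action_dim, prefix_len = 50, 14, 5
rng = np.random.default_rng(0)
x = rng.standard_normal((chunk_size, action_dim))     # current noisy chunk
prev = rng.standard_normal((prefix_len, action_dim))  # unexecuted tail
weights = np.linspace(1.0, 0.0, prefix_len)           # hard -> soft constraint

guided = prefix_guided_step(x, prev, weights)
```

At weight 1.0 the first action matches the previous plan exactly, guaranteeing a seamless handoff; the decaying weights let the rest of the new chunk diverge as needed.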
This project demonstrates an implementation of Pi0.5 + RTC using the OpenVINO toolkit, accelerating inference on Intel platforms. It provides a comprehensive end-to-end pipeline, covering both MuJoCo simulation for policy validation and a modular workflow for deployment on real ALOHA robots.

Installation
============

This project extends the open-source project `LeRobot <https://github.com/huggingface/lerobot>`__ to provide OpenVINO acceleration and Real-Time Chunking (RTC) features on Intel compute platforms. To set up the environment, initialize and patch the submodule:

.. code-block:: bash

   git submodule update --init lerobot
   cd lerobot
   git am ../patches/*.patch
Setup Python Environment
::::::::::::::::::::::::

Install the prerequisite packages:

.. code-block:: bash

   sudo apt install -y ffmpeg libavcodec-dev libavformat-dev libavutil-dev libavdevice-dev

If you would like to use ``uv``, you can set up the environment and install dependencies by running:

.. code-block:: bash

   uv sync --extra pi-ov

.. note::

   **Usage:** You can run a Python file with: ``uv run --extra pi-ov <your_python_file>``.

Alternatively, you can create a Python virtual environment:

.. code-block:: bash

   python3 -m venv pi_env
   source pi_env/bin/activate
   pip install -e .[pi-ov] --extra-index-url https://download.pytorch.org/whl/cpu
Model Preparation
=================

Running model inference with the OpenVINO toolkit requires converting the model to the OpenVINO IR format.
For convenience, you can use the `checkpoint <https://eci.intel.com/embodied-sdk-docs/_downloads/checkpoint.tar.gz>`__ finetuned on a simulation task.
Alternatively, you can convert your own checkpoints trained with the LeRobot framework.

.. code-block:: bash

   cd examples/pi05_with_openvino
Convert Pi0.5 model without RTC
:::::::::::::::::::::::::::::::

To convert the standard Pi0.5 model to OpenVINO IR (without RTC support), use the ``convert_ov.py`` script.

**Arguments:**

* ``--torch_dir``: Path to the pretrained PyTorch model checkpoint or the Hugging Face repo. Default: ``lerobot/pi05_base``.
* ``--ov_output_dir``: Directory where the OpenVINO IR model will be saved.
* ``--dataset_path``: (Optional) Path to a local LeRobotDataset directory. If provided, the converter uses the dataset statistics and the first sample to build real preprocessed inputs (instead of random dummy inputs).
* ``--compress_int8``: (Optional) Compress weights to INT8. Requires ``nncf``.
* ``--save_fp32``: (Optional) Save the OpenVINO model in FP32 format (FP16 by default).
* ``--override``: (Optional) Overwrite existing files.
* ``--camera_num``, ``-c``: (Optional) Number of cameras (batch size for image input). Default: 4.

.. attention::

   Using the Pi0.5 model in LeRobot automatically downloads `google/paligemma-3b-pt-224 <https://huggingface.co/google/paligemma-3b-pt-224>`__ from Hugging Face. Due to author restrictions, downloading the model requires logging into your Hugging Face account.
   If you encounter download errors, follow the `instructions <https://huggingface.co/docs/huggingface_hub/quick-start#authentication>`__ on how to log in and authorize your account.
Examples (``uv``):

.. code-block:: bash

   uv run --extra pi-ov scripts/convert_ov.py \
       --torch_dir <path_to_pytorch_checkpoint> \
       --ov_output_dir pi05_lerobot_ov_ir \
       --override

Using ``--compress_int8`` requires ``nncf``:

.. code-block:: bash

   uv run --extra pi-ov --with nncf scripts/convert_ov.py \
       --torch_dir <path_to_pytorch_checkpoint> \
       --ov_output_dir pi05_lerobot_ov_ir \
       --compress_int8 \
       --override

Use a sample from a local LeRobotDataset to generate representative inputs during the export/conversion process:

.. code-block:: bash

   uv run --extra pi-ov --with nncf scripts/convert_ov.py \
       --torch_dir <path_to_pytorch_checkpoint> \
       --dataset_path <path_to_local_dataset> \
       --ov_output_dir pi05_lerobot_ov_ir \
       --compress_int8 \
       --override
Convert Pi0.5 model with RTC
::::::::::::::::::::::::::::

To convert the Pi0.5 model to OpenVINO IR with RTC support, use the ``convert_ov_rtc.py`` script. The arguments are the same as above.

Examples (``uv``):

.. code-block:: bash

   uv run --extra pi-ov --with nncf scripts/convert_ov_rtc.py \
       --torch_dir <path_to_pytorch_checkpoint> \
       --dataset_path <path_to_local_dataset> \
       --ov_output_dir pi05_rtc_lerobot_ov_ir \
       --compress_int8 \
       --override

OpenVINO models exported with RTC require two extra inputs during inference: ``prev_chunk_left_over`` and ``prefix_weights``.

.. note::

   When the RTC function is unnecessary (e.g., the first inference step, which has no previous chunk to follow), you can disable RTC by passing zero tensors to these extra inputs.
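For illustration, constructing zero tensors to disable RTC on the first step might look like the following. The shapes (chunk length by action dimension for the leftover, chunk length for the weights) are assumptions for this sketch; match them to the input shapes reported by your exported model:

```python
import numpy as np

# Assumed dimensions for illustration only; query your exported model's
# actual input shapes and use those instead.
chunk_size, action_dim = 50, 14

# Zero tensors neutralize the prefix-guidance constraint, so the model
# generates the first chunk unconditionally.
prev_chunk_left_over = np.zeros((chunk_size, action_dim), dtype=np.float32)
prefix_weights = np.zeros((chunk_size,), dtype=np.float32)

# These would be passed alongside the regular model inputs, e.g.
# inputs = {..., "prev_chunk_left_over": prev_chunk_left_over,
#           "prefix_weights": prefix_weights}
```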
Run Pipeline
============

Environment Configuration
:::::::::::::::::::::::::

Bind the ``xe`` driver to the iGPU, as it provides better performance than ``i915`` in this scenario.

- Check the kernel driver in use for the iGPU:

  .. code-block:: bash

     lspci -s 00:02.0 -vvv

- If the output does not show "Kernel driver in use: xe", run the following script to bind the ``xe`` driver:

  .. code-block:: bash

     #!/bin/bash
     set -e

     sudo systemctl stop gdm3
     # Unbind i915 from the iGPU if it shows "Kernel driver in use: i915"
     echo 0000:00:02.0 > /sys/bus/pci/drivers/i915/unbind
     # Unbind i915 from any other devices, if present.
     # Unbind xe from all other devices, if present,
     # e.g. echo 0000:03:00.0 > /sys/bus/pci/drivers/xe/unbind for a dGPU.

     # Remove the i915 module and probe the xe module
     rmmod i915
     modprobe -r xe
     echo 0 > /sys/bus/pci/drivers_autoprobe

     # The xe driver uses the same firmware as i915
     modprobe xe force_probe=7d51 enable_rc6=0 guc_firmware_path=i915/experimental/mtl_guc_70.bin dmc_firmware_path=i915/experimental/mtl_dmc.bin gsc_firmware_path=i915/experimental/mtl_gsc_1.bin
     # Bind the iGPU to xe
     echo 0000:00:02.0 > /sys/bus/pci/drivers/xe/bind
     # Bind other devices to xe, if needed
Inference Benchmarking
::::::::::::::::::::::

Run the ``benchmark_pi05_ov_rtc.py`` script to benchmark the policy inference pipeline, which includes preprocessing, model inference, and postprocessing. The script also contains usage examples of ``PI05Policy`` with OpenVINO support.

.. code-block:: bash

   uv run --extra pi-ov scripts/benchmark_pi05_ov_rtc.py \
       --model_dir <ov_model_dir> \
       --device GPU \
       --chunk_size 75 \
       -n 10

**Arguments:**

* ``--model_dir``: Directory containing the OpenVINO model (``.xml`` and ``.bin`` files).
* ``--device``: Target device for inference (e.g., ``CPU``, ``GPU``). Default: ``CPU``.
* ``-n``, ``--num_runs``: Number of inference runs averaged for benchmarking. Default: 10.
* ``--camera_num``, ``-c``: Number of cameras used in the model. **Must match the setting used during model conversion.** Default: 4.
* ``--chunk_size``: Override the model's ``chunk_size`` (and ``n_action_steps``). **Must match the setting of the pretrained checkpoint.** Default: 50.
* ``--run_torch``: (Optional) Run the original PyTorch model for output comparison.
* ``--torch_dir``: (Optional) Path to the PyTorch model directory for comparison when ``--run_torch`` is set. Default: ``lerobot/pi05_base``.
* ``--disable_rtc``: (Optional) Disable RTC functionality when loading a model exported with RTC. It has no effect when loading a model without RTC support.
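The averaging the benchmark performs can be illustrated with a generic timing helper. This is a sketch of the measurement pattern, not the benchmark script itself; the warmup count and stand-in workload are arbitrary:

```python
import time

def benchmark(fn, num_runs=10, warmup=2):
    """Average wall-clock latency of `fn` over `num_runs`, after a few
    warmup calls (first inferences are typically slower due to model
    compilation and cache effects)."""
    for _ in range(warmup):
        fn()
    start = time.perf_counter()
    for _ in range(num_runs):
        fn()
    return (time.perf_counter() - start) / num_runs

# Example with a stand-in workload in place of a model inference call:
avg_s = benchmark(lambda: sum(i * i for i in range(10_000)))
```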
Evaluation Script Overview
::::::::::::::::::::::::::

``eval_aloha.py`` provides an evaluation script for the ALOHA pipeline that can run:

* **MuJoCo simulation** (``--robot_type mujoco_aloha``) for tasks like ``sim_transfer_cube``.
* **Real ALOHA robot** (``--robot_type real_aloha``) for tasks like ``transfer_cube``.

Arguments:
""""""""""

* ``--pretrained_model_path``: Path to the pretrained model checkpoint or the Hugging Face repo ID. Default: ``lerobot/pi05_base``.
* ``--dataset_path``: (Optional) Local dataset directory used to load metadata (e.g., task language) and, if needed, dataset statistics.
* ``--stats_path``: (Optional) Path to the ``stats.json`` used for normalization. If omitted, the script attempts to load ``<pretrained_model_path>/stats.json``, falling back to ``--dataset_path`` if available.
* ``--robot_type``: ``mujoco_aloha`` or ``real_aloha``.
* ``--task``: Task name, e.g., ``sim_transfer_cube``, ``sim_insertion``, ``transfer_cube``.
* ``--num_episodes``: Number of trajectories/episodes to run. Default: ``1``.
* ``--max_steps``: Maximum steps per episode. Default: ``400``.
* ``--fps``: Control frequency (Hz). Default: ``50.0``.

**OpenVINO:**

* ``--use_ov``: Use an OpenVINO model for inference.
* ``--ov_model_path``: Path to the OpenVINO IR model directory (containing ``model.xml`` and ``model.bin``). Default: ``pi05_lerobot_ov_ir_INT8``.
* ``--ov_device``: OpenVINO device name (e.g., ``CPU``, ``GPU``, ``GPU.0``). Default: ``GPU.0``.

.. note::

   OpenVINO inference still requires ``--pretrained_model_path``. It is used to construct the model inputs (preprocessing/tokenization) and to determine model/config dimensions (e.g., action space) alongside the OpenVINO model.

   Since dataset statistics are required for normalization, provide them via ``--stats_path`` (recommended) or ``--dataset_path``. If neither is provided, the script will try to load ``stats.json`` from ``--pretrained_model_path``.
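To illustrate why the statistics matter, a typical z-score normalization step looks like the following. The ``{"mean": ..., "std": ...}`` schema and the toy values are assumptions for this sketch; consult the actual ``stats.json`` produced with your dataset:

```python
import numpy as np

def normalize(state, stats):
    """Per-dimension z-score normalization from dataset statistics,
    as is common for LeRobot-style policies (illustrative schema)."""
    mean = np.asarray(stats["mean"], dtype=np.float64)
    std = np.asarray(stats["std"], dtype=np.float64)
    return (state - mean) / np.maximum(std, 1e-8)  # guard against zero std

# Toy two-dimensional example:
stats = {"mean": [0.0, 1.0], "std": [2.0, 0.5]}
normed = normalize(np.array([2.0, 1.5]), stats)  # -> [1.0, 1.0]
```

Using the wrong statistics shifts and rescales every input the policy sees, which is why the script insists on locating a valid ``stats.json``.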
**RTC (Real-Time Chunking):**

* ``--rtc_enabled``: Enable the RTC algorithm.
* ``--rtc_horizon``: Execution horizon for RTC. Default: ``45``.
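As a rough sanity check for choosing the execution horizon (illustrative arithmetic only, not from the script; the 200 ms latency is an assumed example value): at 50 Hz control, the robot consumes one buffered action every 20 ms while the next chunk is being generated, so the horizon must exceed the number of actions consumed per inference.

```python
import math

fps = 50.0        # control frequency (Hz)
latency_ms = 200  # assumed model inference latency, for illustration
rtc_horizon = 45  # actions executed before switching chunks

# Actions consumed while the next chunk is being generated:
actions_per_inference = math.ceil(fps * latency_ms / 1000)  # -> 10

# The horizon must exceed this so the new chunk arrives before the
# buffer runs out:
feasible = rtc_horizon > actions_per_inference
```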
**Visualization/Logging:**

* ``--plot``: Enable visualization (typically used with MuJoCo).

  .. note::

     If visualization windows do not appear when using ``--plot``, install ``python3-tk`` to enable the Matplotlib interactive backend: ``sudo apt install python3-tk``.

* ``--save_traj``: Save trajectory data and plots.
* ``--save_traj_path``: Output directory for saved trajectories/plots. Default: ``trajectory_plots``.
Simulation Pipeline
:::::::::::::::::::

.. note::

   If you encounter MESA warnings, try ``sudo apt install mesa-utils libgl1-mesa-dri libglx-mesa0``.

Run ``sim_transfer_cube`` in MuJoCo, using an OpenVINO model:

.. code-block:: bash

   MUJOCO_GL=egl uv run --extra pi-ov examples/aloha/eval_aloha.py \
       --robot_type mujoco_aloha \
       --task sim_transfer_cube \
       --pretrained_model_path <path_to_pretrained_model> \
       --use_ov \
       --ov_model_path <path_to_ov_model>

Run ``sim_transfer_cube`` in MuJoCo, using an OpenVINO model with RTC:

.. code-block:: bash

   MUJOCO_GL=egl uv run --extra pi-ov examples/aloha/eval_aloha.py \
       --robot_type mujoco_aloha \
       --task sim_transfer_cube \
       --pretrained_model_path <path_to_pretrained_model> \
       --use_ov \
       --ov_model_path <path_to_ov_model> \
       --rtc_enabled \
       --rtc_horizon 45
Real-robot Pipeline
:::::::::::::::::::

The real-robot pipeline focuses on running inference on physical ALOHA hardware.

Run ``transfer_cube`` on a real ALOHA robot, using an OpenVINO model:

.. code-block:: bash

   uv run --extra pi-ov examples/aloha/eval_aloha.py \
       --robot_type real_aloha \
       --task transfer_cube \
       --max_steps 600 \
       --pretrained_model_path <path_to_pretrained_model> \
       --use_ov \
       --ov_model_path <path_to_ov_model>

Run ``transfer_cube`` on a real ALOHA robot, using an OpenVINO model with RTC:

.. code-block:: bash

   uv run --extra pi-ov examples/aloha/eval_aloha.py \
       --robot_type real_aloha \
       --task transfer_cube \
       --max_steps 600 \
       --pretrained_model_path <path_to_pretrained_model> \
       --use_ov \
       --ov_model_path <path_to_ov_model> \
       --rtc_enabled \
       --rtc_horizon 45

**Tip**: When running OpenVINO inference on Intel platforms, pinning the process to P-cores can help achieve more stable inference performance. For example, prefix your command with ``taskset``:

.. code-block:: bash

   taskset -c 0-5 uv run --extra pi-ov examples/aloha/eval_aloha.py ...