Commit 2a05d7e

Add Robotics AI Suite. (open-edge-platform#167)

1 parent db84f24 commit 2a05d7e

32 files changed: +3274 −2 lines

.github/CODEOWNERS — 2 additions, 1 deletion

@@ -9,4 +9,5 @@ SECURITY.md @xwu2intel
 /metro-ai-suite/loitering-detection/ @vagheshp @tjanczak @xwu2intel
 /manufacturing-ai-suite/pallet-defect-detection/ @ajagadi1 @sugnanprabhu @rrajore
 /manufacturing-ai-suite/weld-porosity/ @ajagadi1 @sugnanprabhu @rrajore
-/manufacturing-ai-suite/wind-turbine-anomaly-detection/ @vkb1 @sathyendranv @pooja-intel
+/manufacturing-ai-suite/wind-turbine-anomaly-detection/ @vkb1 @sathyendranv @pooja-intel
+/robotics-ai-suite/ @jouillet @jb-balaji @pirouf @mohitmeh12

README.md — 6 additions, 1 deletion

@@ -19,6 +19,9 @@ These suites simplify the creation of custom AI solutions for specific industrie
 
 [The Media & Entertainment AI Suite](media-and-entertainment-ai-suite) provides libraries and sample applications to accelerate solution development for high-performance, high-quality, live video production helping improve viewer experience.
 
+[The Robotics AI Suite](robotics-ai-suite) provides ready to use samples leveraging AI to help solve common robotics problems such as perception, navigation, simulation, and planning.
+
+
 The Edge AI Suites project hosts a collection of sample applications organized as follows:
 
 | Suite | Sample Application | Get Started | Developers Docs |
@@ -33,6 +36,8 @@ The Edge AI Suites project hosts a collection of sample applications organized a
 | Manufacturing AI Suite | [Wind Turbine Anomaly Detection](manufacturing-ai-suite/wind-turbine-anomaly-detection/) | [Link](manufacturing-ai-suite/wind-turbine-anomaly-detection/docs/user-guide/get-started.md) | [Docker deployment](manufacturing-ai-suite/wind-turbine-anomaly-detection/docs/user-guide/get-started.md#deploy-with-docker-compose-single-node) and [Helm deployment](manufacturing-ai-suite/wind-turbine-anomaly-detection/docs/user-guide/how-to-deploy-with-helm.md) |
 | Retail AI Suite | [Automated Self Checkout](https://github.com/intel-retail/automated-self-checkout) | [Link](https://github.com/intel-retail/automated-self-checkout?tab=readme-ov-file#-quickstart) | [Advanced Guide](https://intel-retail.github.io/documentation/use-cases/automated-self-checkout/automated-self-checkout.html) |
 | Retail AI Suite | [Loss Prevention](https://github.com/intel-retail/loss-prevention) | [Link](https://github.com/intel-retail/loss-prevention?tab=readme-ov-file#quickstart) | [Advanced Guide](https://intel-retail.github.io/documentation/use-cases/loss-prevention/loss-prevention.html) |
+| Robotics AI Suite | [ACT Sample](robotics-ai-suite/act-sample) | [Link](robotics-ai-suite/act-sample) | [Tutorial](robotics-ai-suite/act-sample/README.md) |
+| Robotics AI Suite | [ORB-SLAM3 Sample](robotics-ai-suite/orb-slam3-sample) | [Link](robotics-ai-suite/orb-slam3-sample) | [Tutorial](robotics-ai-suite/orb-slam3-sample/README.md) |
 
 Please visit each sample application sub-directory for respective **Getting Started**, **Customization** instructions.
 
@@ -54,4 +59,4 @@ The **Edge AI Suites** project is licensed under the [APACHE 2.0](LICENSE), exce
 |[Holographic Sensor Fusion](metro-ai-suite/holographic-sensor-fusion) | [LIMITED EDGE SOFTWARE DISTRIBUTION LICENSE AGREEMENT](metro-ai-suite/holographic-sensor-fusion/LICENSE.txt) |
 
 Last Updated Date: May 30, 2025.
-
+
robotics-ai-suite/README.md — 57 additions, 0 deletions
# Robotics AI Suite

The [Robotics AI Suite](https://eci.intel.com/embodied-sdk-docs/content/Intel_embodied_Intelligence_SDK.html) is an intuitive, easy-to-use software stack designed to streamline the development of Embodied Intelligence products and applications on Intel platforms. The suite gives developers a comprehensive environment for developing, testing, and optimizing Embodied Intelligence software and algorithms efficiently. It also provides the necessary software frameworks, libraries, tools, best-known configurations (BKCs), tutorials, and example code to facilitate AI solution development.

The Robotics AI Suite includes the following features:
- Comprehensive software platform, from BSP and acceleration libraries to SDK and reference demos, with documentation and developer tutorials
- Real-time BKC, Linux RT kernel, and optimized EtherCAT
- Traditional vision and motion-planning acceleration on CPU; reinforcement/imitation-learning-based manipulation, AI-based vision, and LLM/VLM acceleration on iGPU and NPU
- Typical workflows and examples, including ACT/DP-based manipulation, LLM task planning, Pick & Place, ORB-SLAM3, etc.

![architecture](README.assets/sdk_architecture.png)
## Description

This software architecture is designed to power Embodied Intelligence systems by integrating computer vision, AI-driven manipulation, locomotion, SLAM, and large models into a unified framework. Built on ROS2 middleware, it takes advantage of Intel’s CPU, iGPU, dGPU, and NPU to optimize performance for robotics and AI applications. The stack includes high-performance AI frameworks, real-time libraries, and system-level optimizations, making it a comprehensive solution for Embodied Intelligence products.

At the highest level, the architecture is structured around key reference pipelines and demos that demonstrate its core capabilities. These include Vision Servo, which enhances robotic perception using AI-powered vision modules, and ACT-based Manipulation, which applies reinforcement learning and imitation learning to improve robotic grasping and movement. Optimized Locomotion leverages traditional control algorithms such as MPC (Model Predictive Control) and LQR (Linear Quadratic Regulator), alongside reinforcement learning models for adaptive motion. Additionally, the ORB-SLAM3 pipeline focuses on real-time simultaneous localization and mapping, while LLM Task Planning integrates large language models for intelligent task execution.

Beneath these pipelines, the software stack includes specialized AI and robotics modules. The vision module supports CNN-based models, OpenCV, and PCL operators for optimized perception, enabling robots to interpret their surroundings efficiently. The manipulation module combines traditional motion planning with AI-driven control, allowing robots to execute complex movements. For locomotion, the system blends classic control techniques with reinforcement learning models, ensuring smooth and adaptive movement. Meanwhile, SLAM components such as GPU ORB extraction and ADBSCAN optimization enhance mapping accuracy, and BEV (Bird’s Eye View) models contribute to improved spatial awareness. The large-model module supports LLMs, Vision-Language Models (VLMs), and Vision-Language-Action models (VLAs), enabling advanced reasoning and decision-making capabilities.

At the core of the system are ROS2 middleware and acceleration frameworks, which provide a standardized foundation for robotics development. The architecture is further enhanced by Intel’s AI acceleration libraries, including Intel® OpenVINO™ for deep learning inference, Intel® LLM Library for PyTorch* (IPEX-LLM) for optimized large-model execution, and compatibility with TensorFlow*, PyTorch*, and ONNX*. The Intel® oneAPI compiler and libraries offer high-performance computing capabilities, leveraging oneMKL for mathematical operations, oneDNN for deep learning, and oneTBB for parallel processing. Additionally, Intel’s real-time libraries ensure low-latency execution, with tools for performance tuning and EtherCAT-based industrial communication.

To ensure seamless integration with robotic hardware, the suite runs on a real-time optimized Linux BSP. It includes support for optimized EtherCAT and camera drivers, along with Intel-specific features such as Speed Shift Technology and Cache Allocation to enhance power efficiency and performance. These system-level enhancements allow the software stack to deliver high responsiveness, making it suitable for real-time robotics applications.

Overall, the Robotics AI Suite provides a highly optimized, AI-driven framework for robotics and Embodied Intelligence, combining computer vision, motion planning, real-time processing, and large-scale AI models into a cohesive system. By leveraging Intel’s hardware acceleration and software ecosystem, it enables next-generation robotic applications with enhanced intelligence, efficiency, and adaptability.
## Collection

**AI Suite Pipelines:**

| Pipeline Name | Description |
| ------------- | ----------- |
| [Imitation Learning - ACT](act-sample) | Imitation learning pipeline using the Action Chunking with Transformers (ACT) algorithm to train and evaluate in a simulator or real-robot environment, with Intel optimizations |
| [VSLAM: ORB-SLAM3](orb-slam3-sample) | A popular real-time feature-based SLAM library able to perform Visual, Visual-Inertial, and Multi-Map SLAM with monocular, stereo, and RGB-D cameras, using pin-hole and fisheye lens models |
**Intel® OpenVINO™ optimized model algorithms:**

| Algorithm | Description |
| --------- | ----------- |
| [YOLOv8](https://eci.intel.com/embodied-sdk-docs/content/developer_tools_tutorials/model_tutorials.html#model-tutorials) | CNN-based object detection |
| [YOLOv12](https://eci.intel.com/embodied-sdk-docs/content/developer_tools_tutorials/model_tutorials.html#model-tutorials) | CNN-based object detection |
| [MobileNetV2](https://eci.intel.com/embodied-sdk-docs/content/developer_tools_tutorials/model_tutorials.html#model-tutorials) | CNN-based image classification |
| [SAM](https://eci.intel.com/embodied-sdk-docs/content/developer_tools_tutorials/model_tutorials.html#model-tutorials) | Transformer-based segmentation |
| [SAM2](https://eci.intel.com/embodied-sdk-docs/content/developer_tools_tutorials/model_tutorials.html#model-tutorials) | Extends SAM to video segmentation and object tracking with cross-attention to memory |
| [FastSAM](https://eci.intel.com/embodied-sdk-docs/content/developer_tools_tutorials/model_tutorials.html#model-tutorials) | Lightweight substitute for SAM |
| [MobileSAM](https://eci.intel.com/embodied-sdk-docs/content/developer_tools_tutorials/model_tutorials.html#model-tutorials) | Lightweight substitute for SAM (same model architecture as SAM; refer to the OpenVINO SAM tutorials for model export and application) |
| [U-NET](https://eci.intel.com/embodied-sdk-docs/content/developer_tools_tutorials/model_tutorials.html#model-tutorials) | CNN-based segmentation and diffusion model |
| [DETR](https://eci.intel.com/embodied-sdk-docs/content/developer_tools_tutorials/model_tutorials.html#model-tutorials) | Transformer-based object detection |
| [DETR GroundingDino](https://eci.intel.com/embodied-sdk-docs/content/developer_tools_tutorials/model_tutorials.html#model-tutorials) | Transformer-based object detection |
| [CLIP](https://eci.intel.com/embodied-sdk-docs/content/developer_tools_tutorials/model_tutorials.html#model-tutorials) | Transformer-based image classification |
| [Action Chunking with Transformers - ACT](https://eci.intel.com/embodied-sdk-docs/content/developer_tools_tutorials/model_tutorials/model_act.html#model-act) | An end-to-end imitation learning model designed for fine manipulation tasks in robotics |
| [Feature Extraction Model: SuperPoint](https://eci.intel.com/embodied-sdk-docs/content/developer_tools_tutorials/model_tutorials/model_superpoint.html#model-superpoint) | A self-supervised framework for interest-point detection and description in images, suitable for many multiple-view geometry problems in computer vision |
| [Feature Tracking Model: LightGlue](https://eci.intel.com/embodied-sdk-docs/content/developer_tools_tutorials/model_tutorials/model_lightglue.html#model-lightglue) | A model designed for efficient and accurate feature matching in computer vision tasks |
| [Bird’s Eye View Perception: Fast-BEV](https://eci.intel.com/embodied-sdk-docs/content/developer_tools_tutorials/model_tutorials/model_fastbev.html#model-fastbev) | BEV perception provides a comprehensive understanding of the spatial layout of, and relationships between, objects in a scene |
| [Monocular Depth Estimation: Depth Anything V2](https://eci.intel.com/embodied-sdk-docs/content/developer_tools_tutorials/model_tutorials/model_depthanythingv2.html#model-depthanythingv2) | A powerful tool that leverages deep learning to infer 3D information from 2D images |
robotics-ai-suite/act-sample/README.md — 179 additions, 0 deletions
# Imitation Learning - ACT

Imitation learning is a machine learning approach in which a model is trained to mimic expert behavior by observing and replicating demonstrations, enabling it to perform tasks similarly to the expert. ACT (Action Chunking with Transformers) is an action-chunking policy built on the Transformer architecture for sequence modeling, trained as a conditional VAE (CVAE) to capture the variability in human data. It significantly outperforms previous imitation learning algorithms on a range of simulated and real-world fine manipulation tasks.

We have built an imitation learning pipeline for ACT, which can be used to train and evaluate the ACT model on different tasks in both simulation and real-robot environments. This sample pipeline provides source code optimized with Intel® Extension for PyTorch and Intel® OpenVINO™ to accelerate the process.

This tutorial describes how to set up the ACT pipeline.
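The action-chunking idea can be sketched in a few lines: instead of predicting one action per timestep, the policy outputs a chunk of `k` actions from a single forward pass and executes them before being queried again. This is an illustrative sketch only, not the actual pipeline code; `policy` and `env` are hypothetical stand-ins.

```python
def rollout(policy, env, horizon, chunk_size):
    """Run one episode, querying the policy only once per action chunk.

    `policy(obs)` is assumed to return a sequence of at least `chunk_size`
    actions; `env` is assumed to expose reset() and step(action) -> next obs.
    """
    obs = env.reset()
    executed = []
    t = 0
    while t < horizon:
        chunk = policy(obs)[:chunk_size]  # one forward pass yields k actions
        for action in chunk:              # execute the chunk open-loop
            obs = env.step(action)
            executed.append(action)
            t += 1
            if t >= horizon:
                break
    return executed
```

With `--chunk_size 100` (the value used throughout this tutorial), the policy is queried far less often than a step-by-step policy, which shortens the effective horizon and reduces compounding errors.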
## Prerequisites

Please make sure you have completed the steps in [Installation & Setup](https://eci.intel.com/embodied-sdk-docs/content/installation_setup.html) and followed the [oneAPI doc](https://eci.intel.com/embodied-sdk-docs/content/developer_tools_tutorials/oneapi.html#oneapi-install-label) to set up the Intel® oneAPI packages.
## Installation

### ALOHA real robot environment setup (Optional)

Follow the [stationary ALOHA guide](https://docs.trossenrobotics.com/aloha_docs/2.0/getting_started/stationary.html) to build the real robot platform.

### Virtual environment setup

1. Create a Python 3.10 virtual environment with the following commands:

   ```shell
   $ sudo apt install python3-venv
   $ python3 -m venv act
   ```

2. Activate the virtual environment with the following command:

   ```shell
   $ source act/bin/activate
   ```
### Install Intel® Extension for PyTorch

> [!IMPORTANT]
> Intel® Extension for PyTorch workloads are incompatible with the NPU driver. For more details, please refer to the [Troubleshooting page](https://eci.intel.com/embodied-sdk-docs/content/troubleshooting.html).

Install Intel® Extension for PyTorch with the following command:

```shell
$ pip install torch==2.3.1+cxx11.abi torchvision==0.18.1+cxx11.abi torchaudio==2.3.1+cxx11.abi intel-extension-for-pytorch==2.3.110+xpu oneccl_bind_pt==2.3.100+xpu ipex-llm==2.2.0b20241224 --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
```
### Install Intel® OpenVINO™

Install Intel® OpenVINO™ with the following command:

```shell
$ pip install openvino==2024.6.0
```
### Dependencies setup

Install the dependencies with the following command:

```shell
$ pip install pyquaternion==0.9.9 pyyaml==6.0 rospkg==1.5.0 pexpect==4.8.0 mujoco==3.2.6 dm_control==1.0.26 matplotlib==3.10.0 einops==0.6.0 packaging==23.0 h5py==3.12.1 ipython==8.12.0 opencv-python==4.10.0.84 transformers==4.37.0 accelerate==0.23.0 bigdl-core-xe-21==2.6.0b2 bigdl-core-xe-addons-21==2.6.0b2 bigdl-core-xe-batch-21==2.6.0b2 huggingface-hub==0.24.7
```
### Install ACT package

The Embodied Intelligence SDK provides source code optimized for Intel® Extension for PyTorch and Intel® OpenVINO™. Get the source code with the following commands:

For Intel® Extension for PyTorch:

```shell
$ sudo apt install act-ipex
$ sudo chown -R $USER /opt/act-ipex/
```

For Intel® OpenVINO™:

```shell
$ sudo apt install act-ov
$ sudo chown -R $USER /opt/act-ov/
```
### Install DETR

Install DETR with the following commands:

```shell
$ cd <path_to_act>/detr/
$ pip install -e .
```
## Run pipeline

### Inference

1. Download our pre-trained weights from this link: [Download Link](https://eci.intel.com/embodied-sdk-docs/_downloads/sim_insertion_scripted.zip). The evaluation command uses the same arguments as training; set the `--ckpt_dir` argument to the path of the pre-trained weights.

2. Convert the model checkpoint to OpenVINO IR **(Optional)**

   `ov_convert.py` is a script provided to convert the PyTorch model to OpenVINO IR. You can find the script in the `act-ov` directory and see its usage with the following command:

   ```shell
   $ cd /opt/act-ov/
   $ python3 ov_convert.py -h
   ```

   For example, you can convert the model with the following command:

   ```shell
   $ python3 ov_convert.py --ckpt_path <your_ckpt_path> --height 480 --weight 640 --camera_num 4 --chunk_size 100
   ```
   > [!IMPORTANT]
   > Please make sure that the arguments `--chunk_size`, `--kl_weight`, `--hidden_dim`, `--dim_feedforward`, and `--camera_num` are the same as the training arguments.

3. The pipeline supports configurations with up to four cameras. You can modify the `constants.py` file in the source directory to define the number of cameras. Below are example configurations for four cameras and one camera:
   ```python
   # In /opt/act-ov/constants.py
   SIM_TASK_CONFIGS = {
       'sim_insertion_scripted': {
           'dataset_dir': DATA_DIR + '/sim_insertion_scripted',
           'num_episodes': 50,
           'episode_len': 400,
           'camera_names': ['top', 'angle', 'left_wrist', 'right_wrist']
       },
   }

   # In /opt/act-ipex/constants.py
   SIM_TASK_CONFIGS = {
       'sim_insertion_scripted': {
           'dataset_dir': DATA_DIR + '/sim_insertion_scripted',
           'num_episodes': 50,
           'episode_len': 400,
           'camera_names': ['top']
       },
   }
   ```
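   Note that the number of entries in `camera_names` must agree with the `--camera_num` value used when converting the model in step 2. A small illustrative helper (not part of the ACT sources; the `dataset_dir` path is a placeholder) makes the relationship explicit:

   ```python
   # Illustrative only: validate that a task config's camera list agrees
   # with the --camera_num used at model conversion time.
   SIM_TASK_CONFIGS = {
       'sim_insertion_scripted': {
           'dataset_dir': '/data/sim_insertion_scripted',  # placeholder
           'num_episodes': 50,
           'episode_len': 400,
           'camera_names': ['top', 'angle', 'left_wrist', 'right_wrist'],
       },
   }

   def check_camera_count(task_name, camera_num):
       """Raise if the configured cameras don't match the converted model."""
       cameras = SIM_TASK_CONFIGS[task_name]['camera_names']
       if len(cameras) != camera_num:
           raise ValueError(
               f"{task_name} defines {len(cameras)} cameras {cameras}, "
               f"but the model was converted with --camera_num {camera_num}")
       return cameras
   ```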
   Below is a camera viewer showcasing the four camera perspectives: the leftmost is the `angle` camera, the rightmost is the `top` camera, and the middle two are the `left_wrist` and `right_wrist` cameras, respectively.

   ![act-sim-cameras](README.assets/act-sim-cameras.png)
4. Evaluate the policy with the following command:

   ```shell
   $ python3 imitate_episodes.py --task_name sim_insertion_scripted --ckpt_dir <ckpt dir> --policy_class ACT --kl_weight 10 --chunk_size 100 --hidden_dim 512 --batch_size 8 --dim_feedforward 3200 --num_epochs 2000 --lr 1e-5 --seed 0 --device GPU --eval
   ```

   > [!NOTE]
   > `--eval` evaluates the policy.
   > `--device` sets the device to CPU or GPU.
   > `--temporal_agg` enables the temporal aggregation algorithm.
   > `--onscreen_render` enables onscreen rendering.

   When the `--onscreen_render` parameter is enabled, a successful inference run appears as follows:

   ![act-sim-insertion-demo](README.assets/act-sim-insertion-demo.gif)
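The `--temporal_agg` option mentioned above blends overlapping chunks: because a new chunk is predicted at every timestep, several past predictions exist for the current step, and ACT executes their exponentially weighted average, with weights w_i = exp(-m * i) where w_0 is the oldest prediction (per the ACT paper). The sketch below is illustrative only, with a default `m` chosen for demonstration:

```python
import math

def aggregate_actions(predictions, m=0.01):
    """Exponentially weighted average of overlapping action predictions.

    `predictions` holds every action predicted for the *current* timestep,
    ordered oldest first; smaller `m` weights predictions more equally,
    larger `m` favors older (more committed) predictions.
    """
    weights = [math.exp(-m * i) for i in range(len(predictions))]
    total = sum(weights)
    return sum(w * p for w, p in zip(weights, predictions)) / total
```

In the real pipeline the averaged quantity is a per-joint action vector rather than a scalar, but the weighting scheme is the same.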
### Training **(Optional)**

> [!IMPORTANT]
> Please refer to the [ALOHA paper](https://arxiv.org/abs/2304.13705) for instructions on setting up a machine with the training environment.

1. Generate 50 episodes with the following command:

   ```shell
   # Bimanual Insertion task
   $ python3 record_sim_episodes.py --task_name sim_insertion_scripted --dataset_dir <data save dir> --num_episodes 50
   ```

2. Visualize an episode with the following command:

   ```shell
   $ python3 visualize_episodes.py --dataset_dir <data save dir> --episode_idx 0
   ```

3. Train ACT with the following command:

   ```shell
   # Bimanual Insertion task
   $ python3 imitate_episodes.py --task_name sim_insertion_scripted --ckpt_dir <ckpt dir> --policy_class ACT --kl_weight 10 --chunk_size 100 --hidden_dim 512 --batch_size 8 --dim_feedforward 3200 --num_epochs 2000 --lr 1e-5 --seed 0
   ```
Lines changed: 9 additions & 0 deletions
# Patches for Action Chunking with Transformers (ACT)

The following patches are provided to enhance the ACT source available at: https://github.com/tonyzhaozh/act

| Directory | Enhancement |
| ------------ | ---------------------------- |
| [ipex](ipex) | Intel® Extension for PyTorch |
| [ov](ov) | Intel® OpenVINO™ |
