
Commit dd1f823
update docs
1 parent a0a6e0e commit dd1f823

File tree: 3 files changed (+20, -7 lines)

README.md

Lines changed: 3 additions & 3 deletions
@@ -1,8 +1,8 @@
-# <a href="https://https://demo-generation.github.io/">𝑫𝒆𝒎𝒐𝑮𝒆𝒏: Synthetic Demonstration Generation for Data-Efficient Visuomotor Policy Learning</a>
+# <a href="https://demo-generation.github.io/">𝑫𝒆𝒎𝒐𝑮𝒆𝒏: Synthetic Demonstration Generation for Data-Efficient Visuomotor Policy Learning</a>

-<a href="https://https://demo-generation.github.io/"><strong>Project Page</strong></a> | <a href="https://arxiv.org/abs/2502.16932"><strong>arXiv</strong></a> | <a href="https://x.com/ZhengrongX/status/1899134914416800123"><strong>Twitter</strong></a>
+<a href="https://demo-generation.github.io/"><strong>Project Page</strong></a> | <a href="https://arxiv.org/abs/2502.16932"><strong>arXiv</strong></a> | <a href="https://x.com/ZhengrongX/status/1899134914416800123"><strong>Twitter</strong></a>
@@ -25,7 +25,7 @@ For action generation, 𝑫𝒆𝒎𝒐𝑮𝒆𝒏 adopts the idea of Task and
* **2025/04/02**, Officially released 𝑫𝒆𝒎𝒐𝑮𝒆𝒏.

-# 🚀 Quick Try
+# 🚀 Quick Try in 5 Minutes
## 1. Minimal Installation
#### 1.0. Create conda Env
```bash

docs/1_data_collection.md

Lines changed: 5 additions & 3 deletions
@@ -1,6 +1,6 @@
# Data Collection (for Your Own Task)

-We provide some source demos under the `data/datasets/source` folder. If you only want to get a sense of how 𝑫𝒆𝒎𝒐𝑮𝒆𝒏 works, you can directly start from the provided demos, and jump to [data_generation](./2_data_generation.md). If you want to collect your own data, you can follow the steps below.
+We provide some source demos under the `data/datasets/source` folder. If you only want to get a sense of how 𝑫𝒆𝒎𝒐𝑮𝒆𝒏 works, you can start directly from the provided demos and jump to the Quick Try in the [README](../README.md) or the instructions in [data_generation](./2_data_generation.md). If you want to collect your own data, you can follow the steps below.
@@ -18,8 +18,10 @@ These dimension informations should be specified in the `shape_meta:` configurat
## Data Requirements
𝑫𝒆𝒎𝒐𝑮𝒆𝒏 can be applied to various platforms, including bimanual manipulation and dexterous-hand end-effectors. We provide an interface for collecting demos with keyboard for your reference in `real_world/collect_demo.py`.

-To facilitate synthetic generation of visual observations, 𝑫𝒆𝒎𝒐𝑮𝒆𝒏 require the access to **3D point clouds**. This asks for preliminary camera calibration of the depth camera. You can follow the procedures in a [note](https://gist.github.com/hshi74/edabc1e9bed6ea988a2abd1308e1cc96) by Haochen Shi. The camera-related parameters should be noted in the beginning of `real_world/utils/pcd_process.py`.
+To facilitate synthetic generation of visual observations, 𝑫𝒆𝒎𝒐𝑮𝒆𝒏 requires access to **3D point clouds**. This calls for preliminary calibration of the depth camera; you can follow the procedures in this [note](https://gist.github.com/hshi74/edabc1e9bed6ea988a2abd1308e1cc96) by Haochen Shi. The camera-related parameters should be specified at the beginning of `real_world/utils/pcd_process.py`.
+
+Similar to many previous works that use 3D point clouds as the visual observation, unrelated points (i.e., those from the background and the table surface) should be **cropped out**. Once the camera calibration is ready, this can easily be done by specifying a workspace bounding box and discarding all points outside the workspace.

The point cloud we use is projected from **single-view** depth image instead of multi-view, since (1) the calibration process is time-consuming, (2) single-view camera is more practical for mobile platforms, e.g., ego-centric vision on a humanoid.

-Like DP3, we recommend the use of RealSense **L515** rather than the more commonly seen D435, because L515 captures higher-quality point clouds, e.g., fewer holes on the object surface, clearer boundaries between objects and background. We add a DBSCAN clustering step to discard the outlier points in the processing pipeline, which we found could effectively improve the quality of point clouds.
+Like DP3, we recommend the RealSense **L515** over the more commonly seen D435, because the L515 captures higher-quality point clouds, e.g., fewer holes on object surfaces and clearer boundaries between objects and the background. We add a DBSCAN clustering step to the processing pipeline to discard outlier points, which we found effectively improves point cloud quality. Afterwards, the point cloud is downsampled with farthest point sampling (FPS) to a fixed number of points, e.g., `1024`.

docs/2_data_generation.md

Lines changed: 12 additions & 1 deletion
@@ -1,3 +1,14 @@
# Data Generation with 𝑫𝒆𝒎𝒐𝑮𝒆𝒏

-TBD.
+𝑫𝒆𝒎𝒐𝑮𝒆𝒏 is designed for the automatic generation of synthetic demonstrations. The unavoidable human effort in the 𝑫𝒆𝒎𝒐𝑮𝒆𝒏 pipeline lies in pre-processing, i.e., (1) segmenting the point cloud observation *only for the first frame*, and (2) parsing the source trajectory into object-centric segments.
+
+## Point Cloud Segmentation
+Once the unrelated points outside the workspace have been excluded and the point cloud processed with clustering and FPS, the points in the first-frame cloud should belong to either the robot end-effector or the object(s). In many cases, they can be easily separated by manually specifying a bounding box for the object; the rest of the cloud is then assigned to the robot end-effector.
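
As a rough illustration only (not the repository's code), such a bounding-box split of the first-frame cloud might look like the following, with NumPy and a hypothetical, task-specific object box:

```python
import numpy as np

# Hypothetical object bounding box in the workspace frame (meters); chosen per task.
OBJ_MIN = np.array([0.05, -0.10, 0.00])
OBJ_MAX = np.array([0.20, 0.10, 0.08])

def split_first_frame(points: np.ndarray):
    """Split the first-frame cloud into (object points, end-effector points)."""
    in_box = np.all((points >= OBJ_MIN) & (points <= OBJ_MAX), axis=1)
    return points[in_box], points[~in_box]
```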
+
+We also provide a more elegant way to automate this process by leveraging open-vocabulary segmentation models (e.g., [Grounded-SAM](https://github.com/IDEA-Research/Grounded-Segment-Anything) or [LangSAM](https://github.com/luca-medeiros/lang-segment-anything)). More specifically, we only need to describe the manipulated objects in natural language. Taking the language prompt as input, these models segment the corresponding objects in the RGB image. Since the RGB and depth images are pixel-aligned, we can then project the segmentation masks onto the depth image to obtain the point cloud segmentation. An implementation of this process is provided in `demo_generation/demo_generation/mask_util.py`.
+
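
For intuition, a hedged sketch of the mask-to-point-cloud projection (not the actual `mask_util.py` implementation), assuming a boolean mask already predicted on the pixel-aligned RGB image and known pinhole intrinsics:

```python
import numpy as np

def mask_to_points(depth: np.ndarray, mask: np.ndarray, K: np.ndarray) -> np.ndarray:
    """Back-project the depth pixels selected by a 2D segmentation mask into 3D.

    depth: (H, W) depth image in meters, pixel-aligned with the RGB image.
    mask:  (H, W) boolean mask from an open-vocabulary segmentation model.
    K:     (3, 3) camera intrinsics.
    """
    v, u = np.nonzero(mask & (depth > 0))   # pixel coordinates inside the mask
    z = depth[v, u]
    x = (u - K[0, 2]) * z / K[0, 0]         # X = (u - cx) * Z / fx
    y = (v - K[1, 2]) * z / K[1, 1]         # Y = (v - cy) * Z / fy
    return np.stack([x, y, z], axis=1)      # (N, 3) object points in the camera frame
```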
+## Source Trajectory Parsing
+The source trajectory needs to be parsed into object-centric segments. Each object manipulated in the task is associated with two sub-segments: (1) a *motion* segment that approaches the object, and (2) a *skill* segment that manipulates the object through contact. We offer two options for trajectory parsing. The more straightforward one is to manually specify the start frame of each sub-segment: run the demo generation code with `generation:range_name: src` and `generation:render_video: True` to obtain a rendered video of the source demonstration, then read off the parsing frames from the frame index marked in the top-left corner of the video.
+
+Alternatively, we provide a more automated way: checking whether the distance between the robot end-effector and the object point cloud falls below a threshold. While this automates the parsing process, it may require some manual tuning of the threshold, and is therefore not always as practical as manual specification. The implementations are provided in the `parse_frames_two_stage` and `parse_frames_one_stage` functions in `demo_generation/demo_generation/demogen.py`.
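
For illustration, a minimal sketch of such distance-threshold parsing (not the actual `parse_frames_one_stage` / `parse_frames_two_stage` code), assuming per-frame end-effector positions and a segmented object cloud:

```python
import numpy as np

def parse_contact_frame(eef_positions: np.ndarray,
                        object_points: np.ndarray,
                        threshold: float = 0.02) -> int:
    """Return the first frame where the end-effector comes within `threshold`
    meters of the object point cloud, i.e., the motion -> skill boundary."""
    for t, eef in enumerate(eef_positions):           # eef_positions: (T, 3)
        dists = np.linalg.norm(object_points - eef, axis=1)
        if dists.min() < threshold:
            return t
    return len(eef_positions) - 1                      # fallback: no contact detected
```

The `threshold` (2 cm here) is exactly the parameter that tends to need per-task tuning, which is why manual specification often remains the more practical choice.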
