- `.hydra/`: Configuration snapshots and Hydra management files
- `README.md`: Project documentation
## Experiment Results

### Evaluation over 60 RLBench tasks

Why do we use 60 tasks for the main evaluation?
Although the 18 RLBench tasks have been widely adopted as a benchmark since their introduction in *Perceiver-Actor: A Multi-Task Transformer for Robotic Manipulation*, they are primarily used to evaluate 3D-based hierarchical policies that depend heavily on high-precision 3D inputs and motion planners. Many of these tasks are extremely challenging for RGB-only visuomotor policies, often leading to uniformly low success rates and therefore limited discriminative power.
To enable convenient comparison with 3D-based hierarchical methods, such as RVT-2, we also report results on the RLBench-18 benchmark. Please check the appendix for more details. As you can see, there is still substantial room for RGB-only visuomotor policies to close the performance gap.
The open-source implementation slightly differs from the original version reported in the paper. We reconstructed the entire training and evaluation pipeline for better clarity and reproducibility.

During this process, a few settings were adjusted:

- Both the **latent loss** and **action loss** were changed from **L2** to **L1**.
- The **multi-token prediction head** was reduced from **5 tokens** to **2 tokens**.
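The difference between the two regression losses can be sketched in plain Python (a minimal illustration; the actual training code is assumed to operate on framework tensors rather than lists):

```python
def l1_loss(pred, target):
    # Mean absolute error: each residual contributes linearly,
    # so a single outlier prediction cannot dominate the loss.
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(pred)

def l2_loss(pred, target):
    # Mean squared error: residuals contribute quadratically,
    # which amplifies the effect of large errors.
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)

pred   = [0.9, 0.1, 2.5]   # one outlier prediction (2.5 vs 0.5)
target = [1.0, 0.0, 0.5]
print(round(l1_loss(pred, target), 3))  # 0.733
print(round(l2_loss(pred, target), 3))  # 1.34
```

With L1, the gradient magnitude does not grow with the residual, which is often reported to behave more robustly when regression targets are noisy.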
These updates generally lead to improved success rates across most tasks. As a result, your observed performance (e.g., **100% on “push button”**) may exceed the numbers reported in the paper.
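For intuition, a multi-token prediction head can be viewed as several parallel projections of a shared feature, one per predicted future token; the sketch below is illustrative only (the dimensions, names, and use of plain linear heads are assumptions, not details of the released code):

```python
import random

random.seed(0)
FEAT_DIM, TOKEN_DIM = 4, 3  # hypothetical feature and token sizes

def make_linear(out_dim, in_dim):
    # A plain weight matrix standing in for one prediction head.
    return [[random.uniform(-1.0, 1.0) for _ in range(in_dim)]
            for _ in range(out_dim)]

def apply_linear(weights, x):
    # Matrix-vector product: one output value per weight row.
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]

def multi_token_head(feature, num_tokens):
    # One independent linear head per predicted future token.
    heads = [make_linear(TOKEN_DIM, FEAT_DIM) for _ in range(num_tokens)]
    return [apply_linear(h, feature) for h in heads]

feature = [0.5, -0.2, 0.1, 0.9]
tokens = multi_token_head(feature, num_tokens=2)  # reduced from 5 to 2
print(len(tokens), len(tokens[0]))  # 2 3
```

Shrinking the head from 5 to 2 tokens only changes `num_tokens` in this picture: the model supervises fewer future steps per forward pass.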
### Updated Results (Open-Source Version)
For reference, below are the task-level success rates of the open-source implementation compared with those reported in the paper. The open-source version generally achieves higher performance due to the modified training configuration.