Commit ffa3ebb

Update README.md
1 parent 9e8de8a commit ffa3ebb

README.md

Lines changed: 15 additions & 16 deletions
@@ -97,35 +97,34 @@ You can customize these parameters by editing `src/cfgs/launch.yaml` directly, o
9797
- `.hydra/`: Configuration snapshots and Hydra management files
9898
- `README.md`: Project documentation
9999

100-
101-
## Note on Open-Source Implementation
102-
103-
The open-source implementation slightly differs from the original version reported in the paper. We reconstructed the entire training and evaluation pipeline for better clarity and reproducibility.
104-
105-
During this process, a few settings were adjusted:
106-
107-
- Both the **latent loss** and **action loss** were changed from **L2** to **L1**.
108-
- The **multi-token prediction head** was reduced from **5 tokens** to **2 tokens**.
109-
110-
These updates generally lead to improved success rates across most tasks.
111-
As a result, your observed performance (e.g., **100% on “push button”**) may exceed the numbers reported in the paper.
112-
113100
## Experiment Results
114101

115102
### Evaluation over 60 RLBench tasks
116103
Why do we use 60 tasks for the main evaluation?
117-
Although the 18 RLBench tasks have been widely adopted as a benchmark since their introduction in Perceiver-Actor: A Multi-Task Transformer for Robotic Manipulation, they are primarily used to evaluate 3D-based hierarchical policies that depend heavily on high-precision 3D inputs and motion planners. Many of these tasks are extremely challenging for RGB-only visuomotor policies, often leading to uniformly low success rates and therefore limited discriminative power.`
104+
Although the 18 RLBench tasks have been widely adopted as a benchmark since their introduction in Perceiver-Actor: A Multi-Task Transformer for Robotic Manipulation, they are primarily used to evaluate 3D-based hierarchical policies that depend heavily on high-precision 3D inputs and motion planners. Many of these tasks are extremely challenging for RGB-only visuomotor policies, often leading to uniformly low success rates and therefore limited discriminative power.
118105

119106
<img width="1105" height="473" alt="coa_performance" src="https://github.com/user-attachments/assets/b4408c9d-311b-4c42-9cdb-74decfdb91ef" />
120107

121108

122109
### Evaluation over 18 RLBench tasks
123110

124-
To enable convenient comparison with 3D-based hierarchical methods such as RVT-2, we also report results on the RLBench-18 benchmark. Please check the appendix for more details.
111+
To enable convenient comparison with 3D-based hierarchical methods such as RVT-2, we also report results on the RLBench-18 benchmark. Please check the appendix for more details. As the results show, there is still substantial room for RGB-only visuomotor policies to close the performance gap.
125112

126113
<img width="706" height="431" alt="coa_rlbench18" src="https://github.com/user-attachments/assets/3b698819-fd0a-4e6e-979e-f64ec108df52" />
127114

128-
## Updated Results (Open-Source Version)
115+
## Note on Open-Source Implementation
116+
117+
The open-source implementation slightly differs from the original version reported in the paper. We reconstructed the entire training and evaluation pipeline for better clarity and reproducibility.
118+
119+
During this process, a few settings were adjusted:
120+
121+
- Both the **latent loss** and **action loss** were changed from **L2** to **L1**.
122+
- The **multi-token prediction head** was reduced from **5 tokens** to **2 tokens**.
123+
124+
These updates generally lead to improved success rates across most tasks.
125+
As a result, your observed performance (e.g., **100% on “push button”**) may exceed the numbers reported in the paper.
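
The L2-to-L1 change listed above can be illustrated with a minimal sketch in plain Python. This is not the repository's training code: the function names and the sample values are hypothetical, chosen only to show how the two losses weight a single large error. L1 grows linearly with the error and L2 grows quadratically; whether this accounts for the improved success rates is not stated here.

```python
# Hypothetical illustration; names and values are NOT taken from the repository.

def l1_loss(pred, target):
    """Mean absolute error between two equal-length sequences."""
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(pred)


def l2_loss(pred, target):
    """Mean squared error between two equal-length sequences."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


# Three small errors and one large error (the last element).
pred = [0.0, 0.0, 0.0, 0.0]
target = [0.5, 0.5, 0.5, 4.0]

l1 = l1_loss(pred, target)
l2 = l2_loss(pred, target)

# Fraction of each total loss that comes from the single large error.
l1_outlier_share = abs(pred[-1] - target[-1]) / (l1 * len(pred))
l2_outlier_share = (pred[-1] - target[-1]) ** 2 / (l2 * len(pred))

print(f"L1 loss: {l1}, share from large error: {l1_outlier_share:.3f}")
print(f"L2 loss: {l2}, share from large error: {l2_outlier_share:.3f}")
```

In this toy example the large error accounts for a bigger share of the L2 loss than of the L1 loss, so L1 is less dominated by occasional large deviations.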
126+
127+
### Updated Results (Open-Source Version)
129128

130129
For reference, below are the task-level success rates of the open-source implementation compared with those reported in the paper.
131130
The open-source version generally achieves higher performance due to the modified training configuration.
