Commit 2e5111a

Update README.md
1 parent ffa3ebb commit 2e5111a

1 file changed

+8
-2
lines changed

README.md

Lines changed: 8 additions & 2 deletions
@@ -101,14 +101,20 @@ You can customize these parameters by editing `src/cfgs/launch.yaml` directly, o
101101

102102
### Evaluation over 60 RLBench tasks
103103
Why do we use 60 tasks for the main evaluation?
104-
Although the 18 RLBench tasks have been widely adopted as a benchmark since their introduction in Perceiver-Actor: A Multi-Task Transformer for Robotic Manipulation, they are primarily used to evaluate 3D-based hierarchical policies that depend heavily on high-precision 3D inputs and motion planners. Many of these tasks are extremely challenging for RGB-only visuomotor policies, often leading to uniformly low success rates and therefore limited discriminative power.
104+
Although the 18 RLBench tasks have been widely adopted as a benchmark since their introduction in Perceiver-Actor: A Multi-Task Transformer for Robotic Manipulation, they are primarily used to evaluate 3D-based hierarchical policies that depend heavily on high-precision 3D inputs and motion planners. Many of these tasks are extremely challenging for RGB-only visuomotor policies, often leading to uniformly low success rates and therefore limited discriminative power.
105105

106106
<img width="1105" height="473" alt="coa_performance" src="https://github.com/user-attachments/assets/b4408c9d-311b-4c42-9cdb-74decfdb91ef" />
107107

108108

109109
### Evaluation over 18 RLBench tasks
110110

111-
To enable convenient comparison with 3D-based hierarchical methods such as RVT-2, we also report results on the RLBench-18 benchmark. Please check the appendix for more details. As you can see, there is still substantial room for RGB-only visuomotor policies to close the performance gap.
111+
To enable convenient comparison with 3D-based hierarchical methods such as RVT-2, we also report results on the RLBench-18 benchmark. Please check the appendix for more details.
112+
113+
> Somehow, RLBench (the most popular 3D-policy benchmark) has gained significant traction in VLA benchmarking, yet VLAs remain far from matching 3D SOTA methods such as 3DDA. For various reasons, most VLA policies tend to avoid comparing against all relevant 3D baselines.
114+
>
115+
> — Source: [A Practitioner’s Guide to VLA Evaluation](https://mbreuss.github.io/blog_post_iclr_26_vla.html#practitioners-guide)
116+
117+
As you can see, there is still substantial room for RGB-only visuomotor policies to close the performance gap.
112118

113119
<img width="706" height="431" alt="coa_rlbench18" src="https://github.com/user-attachments/assets/3b698819-fd0a-4e6e-979e-f64ec108df52" />
114120
