Commit 3552a10

Update update_paper_list.md
1 parent a25c242 commit 3552a10

1 file changed

Lines changed: 11 additions & 0 deletions

update_template_or_data/update_paper_list.md

@@ -1,3 +1,14 @@
+- [CUARewardBench: A Benchmark for Evaluating Reward Models on Computer-using Agent](https://arxiv.org/abs/2510.18596)
+- Haojia Lin, Xiaoyu Tan, Yulei Qin, Zihan Xu, Yuchen Shi, Zongyi Li, Gang Li, Shaofei Cai, Siqi Cai, Chaoyou Fu, Ke Li, Xing Sun
+- 🏛️ Institutions: Tencent Youtu Lab, PKU, NJU
+- 📅 Date: October 21, 2025
+- 📑 Publisher: arXiv
+- 💻 Env: [Desktop]
+- 🔑 Key: [benchmark], [dataset], [reward model], [computer-using agent], [CUARewardBench], [Unanimous Prompt Ensemble (UPE)]
+- 📖 TLDR: This paper introduces CUARewardBench, the first benchmark for evaluating reward models tailored to computer-using agents. It includes step-level and trajectory-level annotations from tasks across 10 software types and 7 agent architectures. The study identifies the limitations of current reward models and proposes UPE, a prompting-based method that significantly improves evaluation accuracy.
+
+
+
 - [PolySkill: Learning Generalizable Skills Through Polymorphic Abstraction](https://arxiv.org/abs/2510.15863)
 - Simon Yu, Gang Li, Weiyan Shi, Peng Qi
 - 🏛️ Institutions: NEU, Uniphore
