Commit d3dcc6c

Merge pull request #12 from wwh0411/main
Add the paper MobileA3gent
2 parents e5e8ec4 + f541c1f commit d3dcc6c

2 files changed

Lines changed: 18 additions & 0 deletions

README.md

Lines changed: 9 additions & 0 deletions
@@ -297,6 +297,15 @@ This repo covers a variety of papers related to GUI Agents, such as:
 - 🔑 Key: [framework], [benchmark], [security], [jailbreaking], [evaluation], [OpenHands]
 - 📖 TLDR: This paper investigates why Web AI agents are significantly more susceptible to executing harmful commands compared to standalone LLMs, despite sharing the same underlying models. Through a fine-grained evaluation, the authors identify three critical factors contributing to this vulnerability: embedding user goals into system prompts, multi-step action generation, and processing of event streams from web navigation. The study introduces a five-level harmfulness evaluation framework and utilizes the OpenHands platform to systematically assess these vulnerabilities, revealing a 46.6% success rate in malicious task execution by Web AI agents versus 0% for standalone LLMs.
 
+- [MobileA3gent: Training Mobile GUI Agents Using Decentralized Self-Sourced Data from Diverse Users](https://arxiv.org/abs/2502.02982)
+- Wenhao Wang, Mengying Yuan, Zijie Yu, Guangyi Liu, Rui Ye, Tian Jin, Siheng Chen, Yanfeng Wang
+- 🏛️ Institutions: ZJU, SJTU, Shanghai AI Lab, MAGIC
+- 📅 Date: February 5, 2025
+- 📑 Publisher: arXiv
+- 💻 Env: [Mobile]
+- 🔑 Key: [framework], [Auto-Annotation], [federated learning], [privacy], [non-IID], [adapted aggregation]
+- 📖 TLDR: This paper introduces *MobileA3gent*, a framework for training mobile GUI agents using decentralized, self-sourced data from diverse users. It addresses challenges in extracting user instructions without human intervention and utilizing distributed data while preserving privacy. The framework comprises two key techniques: *Auto-Annotation*, which automatically collects high-quality datasets during users' routine phone usage, and *FedVLM-A*, which enhances federated training on non-IID user data by incorporating both episode- and step-level distributions. Experiments demonstrate that MobileA3gent achieves performance comparable to centralized human-annotated models at less than 1% of the cost, highlighting its potential for real-world applications.
+
 - [Magma: A Foundation Model for Multimodal AI Agents](https://microsoft.github.io/Magma/)
 - Jianwei Yang, Reuben Tan, Qianhui Wu, Ruijie Zheng, Baolin Peng, Yongyuan Liang, Yu Gu, Mu Cai, Seonghyeon Ye, Joel Jang, Yuquan Deng, Lars Liden, Jianfeng Gao
 - 🏛️ Institutions: Microsoft Research, Univ. of Maryland, Univ. of Wisconsin-Madison, KAIST, Univ. of Washington

update_template_or_data/update_paper_list.md

Lines changed: 9 additions & 0 deletions
@@ -312,6 +312,15 @@
 - 🔑 Key: [framework], [reinforcement learning], [distributed training], [A-RIDE], [on-device control]
 - 📖 TLDR: This paper introduces **DistRL**, a novel framework designed to enhance the efficiency of online reinforcement learning fine-tuning for mobile device control agents. DistRL employs centralized training and decentralized data acquisition to ensure efficient fine-tuning in dynamic online interactions. The framework is supported by a custom RL algorithm, A-RIDE, which balances exploration with prioritized data utilization to ensure stable and robust training. Experiments demonstrate that DistRL improves training efficiency by 3× and accelerates data collection by 2.4× compared to leading synchronous multi-machine methods. Notably, after training, DistRL achieves a 20% relative improvement in success rate on general Android tasks from an open benchmark, outperforming existing approaches while maintaining the same training time.
 
+- [MobileA3gent: Training Mobile GUI Agents Using Decentralized Self-Sourced Data from Diverse Users](https://arxiv.org/abs/2502.02982)
+- Wenhao Wang, Mengying Yuan, Zijie Yu, Guangyi Liu, Rui Ye, Tian Jin, Siheng Chen, Yanfeng Wang
+- 🏛️ Institutions: ZJU, SJTU, Shanghai AI Lab, MAGIC
+- 📅 Date: February 5, 2025
+- 📑 Publisher: arXiv
+- 💻 Env: [Mobile]
+- 🔑 Key: [framework], [Auto-Annotation], [federated learning], [privacy], [non-IID], [adapted aggregation]
+- 📖 TLDR: This paper introduces *MobileA3gent*, a framework for training mobile GUI agents using decentralized, self-sourced data from diverse users. It addresses challenges in extracting user instructions without human intervention and utilizing distributed data while preserving privacy. The framework comprises two key techniques: *Auto-Annotation*, which automatically collects high-quality datasets during users' routine phone usage, and *FedVLM-A*, which enhances federated training on non-IID user data by incorporating both episode- and step-level distributions. Experiments demonstrate that MobileA3gent achieves performance comparable to centralized human-annotated models at less than 1% of the cost, highlighting its potential for real-world applications.
+
 - [Magma: A Foundation Model for Multimodal AI Agents](https://microsoft.github.io/Magma/)
 - Jianwei Yang, Reuben Tan, Qianhui Wu, Ruijie Zheng, Baolin Peng, Yongyuan Liang, Yu Gu, Mu Cai, Seonghyeon Ye, Joel Jang, Yuquan Deng, Lars Liden, Jianfeng Gao
 - 🏛️ Institutions: Microsoft Research, Univ. of Maryland, Univ. of Wisconsin-Madison, KAIST, Univ. of Washington
