README.md (+1 −1)
@@ -49,7 +49,7 @@ If you find our work useful, please consider citing:
The **[Searchable Paper Page](https://vyokky.github.io/LLM-Brained-GUI-Agents-Survey/)** is a web-based interface that allows you to search and filter through the papers in our survey. You can also view the papers by category, platform, and date.
data/benchmark.json (+34 −2)
@@ -409,7 +409,7 @@
     },
     {
         "Name": "Beyond Pass or Fail: A Multi-dimensional Benchmark for Mobile UI Navigation",
-        "Platform": "Android",
+        "Platform": "Mobile Android",
         "Date": "January 2025",
         "Paper_Url": "https://arxiv.org/abs/2501.02863",
         "Highlight": "Provides a fully automated benchmarking suite and introduces a multi-dimensional evaluation framework.",
@@ -422,5 +422,37 @@
         "Paper_Url": "https://arxiv.org/abs/2502.08047",
         "Highlight": "First GUI benchmark designed to evaluate dynamic GUI interactions by incorporating various initial states.",
         "Code_Url": ""
+    },
+    {
+        "Name": "AEIA-MN: Evaluating the Robustness of Multimodal LLM-Powered Mobile Agents Against Active Environmental Injection Attacks",
+        "Platform": "Mobile Android",
+        "Date": "February 2025",
+        "Paper_Url": "https://arxiv.org/abs/2502.13053",
+        "Highlight": "Introduces the Active Environment Injection Attack (AEIA) framework that actively manipulates environmental elements (e.g., notifications) in mobile operating systems to mislead multimodal LLM-powered agents.",
+        "Code_Url": ""
+    },
+    {
+        "Name": "WebGames: Challenging General-Purpose Web-Browsing AI Agents",
+        "Platform": "Web",
+        "Date": "February 2025",
+        "Paper_Url": "https://arxiv.org/abs/2502.18356",
+        "Highlight": "A comprehensive benchmark designed to evaluate the capabilities of general-purpose web-browsing AI agents through 50+ interactive challenges. It uniquely provides a hermetic testing environment with verifiable ground-truth solutions.",
"Name": "AutoEval: A Practical Framework for Autonomous Evaluation of Mobile Agents",
444
+
"Platform": "Mobile Android",
445
+
"Date": "March 2025",
446
+
"Paper_Url": "https://arxiv.org/abs/2503.02403",
447
+
"Highlight": "Introduces a fully autonomous evaluation framework for mobile agents, eliminating the need for manual task reward signal definition and extensive evaluation code development.",
448
+
"Code_Url": ""
449
+
},
450
+
{
451
+
"Name": "SafeArena: Evaluating the Safety of Autonomous Web Agents",
452
+
"Platform": "Web",
453
+
"Date": "March 2025",
454
+
"Paper_Url": "https://arxiv.org/abs/2503.04957",
455
+
"Highlight": "The first benchmark specifically designed to evaluate the deliberate misuse of web agents by testing their ability to complete both safe and harmful tasks.",
"Name": "Symbiotic Cooperation for Web Agents: Harnessing Complementary Strengths of Large and Small LLMs",
500
+
"Platform": "Web",
501
+
"Date": "February 2025",
502
+
"Paper_Url": "https://arxiv.org/abs/2502.07942",
503
+
"Highlight": "Multi-agent iterative architecture & Introduces an iterative, symbiotic learning process between large and small LLMs for web automation. Enhances both data synthesis and task performance through speculative data synthesis, multi-task learning, and privacy-preserving hybrid modes.",
504
+
"Code_Url": ""
505
+
},
506
+
+    {
+        "Name": "PC-Agent: A Hierarchical Multi-Agent Collaboration Framework for Complex Task Automation on PC",
+        "Platform": "Windows computers",
+        "Date": "February 2025",
+        "Paper_Url": "https://arxiv.org/abs/2502.14282",
+        "Highlight": "PC-Agent's hierarchical multi-agent design enables efficient decomposition of complex PC tasks. Its Active Perception Module enhances fine-grained GUI understanding by combining accessibility structures, OCR, and intention grounding.",
+        "Name": "MobileSteward: Integrating Multiple App-Oriented Agents with Self-Evolution to Automate Cross-App Instructions",
+        "Platform": "Android",
+        "Date": "February 2025",
+        "Paper_Url": "https://arxiv.org/abs/2502.16796",
+        "Highlight": "Introduces an app-oriented multi-agent framework with self-evolution, overcoming the complexity of cross-app interactions by dynamically recruiting specialized agents.",
+        "Name": "Programming with Pixels: Computer-Use Meets Software Engineering",
+        "Platform": "Computers",
+        "Date": "February 2025",
+        "Paper_Url": "https://arxiv.org/abs/2502.18525",
+        "Highlight": "Shifts software engineering agents from API-based tool interactions to direct GUI-based computer use, allowing agents to interact with an IDE as a human developer would.",
+        "Code_Url": "https://programmingwithpixels.com"
+    },
+    {
+        "Name": "AppAgentX: Evolving GUI Agents as Proficient Smartphone Users",
+        "Platform": "Mobile Android",
+        "Date": "March 2025",
+        "Paper_Url": "https://arxiv.org/abs/2503.02268",
+        "Highlight": "Introduces an evolutionary mechanism that enables dynamic learning from past interactions and replaces inefficient low-level operations with high-level actions.",
+        "Code_Url": "https://appagentx.github.io/"
+    },
+    {
+        "Name": "LiteWebAgent: The Open-Source Suite for VLM-Based Web-Agent Applications",
+        "Platform": "Web",
+        "Date": "March 2025",
+        "Paper_Url": "https://arxiv.org/abs/2503.02950",
+        "Highlight": "First open-source, production-ready web agent integrating tree search for multi-step task execution.",
+        "Name": "CHOP: Mobile Operating Assistant with Constrained High-frequency Optimized Subtask Planning",
+        "Platform": "Mobile Android",
+        "Date": "March 2025",
+        "Paper_Url": "https://arxiv.org/abs/2503.03743",
+        "Highlight": "Introduces a basis subtask framework, where subtasks are predefined based on human task decomposition patterns, ensuring better executability and efficiency.",
+        "Code_Url": "https://github.com/Yuqi-Zhou/CHOP"
+    },
+    {
+        "Name": "Enhancing Language Multi-Agent Learning with Multi-Agent Credit Re-Assignment for Interactive Environment Generalization",
+        "Platform": "Mobile Android, Web",
+        "Date": "February 2025",
+        "Paper_Url": "https://arxiv.org/abs/2502.14496",
+        "Highlight": "A multi-agent reinforcement learning framework that introduces a Credit Re-Assignment (CR) strategy, using LLMs instead of environment-specific rewards to enhance performance and generalization.",
+        "Name": "Automating the enterprise with foundation models",
+        "Platform": "Web",
+        "Date": "May 2024",
+        "Paper_Url": "https://arxiv.org/abs/2405.03710",
+        "Highlight": "Eliminates the high setup costs, brittle execution, and burdensome maintenance associated with traditional RPA by learning from video and text documentation.",
data/gui-testing.json (+26 −1)
@@ -102,5 +102,30 @@
         "Paper_Url": "https://arxiv.org/abs/2411.17933",
         "Highlight": "Leverages multimodal LLMs to perform UI test transfers without requiring source code access",
         "Code_Url": ""
+    },
+    {
+        "Name": "UXAgent: An LLM Agent-Based Usability Testing Framework for Web Design",
+        "Platform": "Web platforms",
+        "Date": "February 2025",
+        "Paper_Url": "https://arxiv.org/abs/2502.12561",
+        "Highlight": "Enables LLM-powered automated usability testing by simulating thousands of user interactions, collecting both qualitative and quantitative data, and providing researchers with early feedback before real-user studies.",
+        "Code_Url": "https://uxagent.hailab.io"
+    },
+    {
+        "Name": "Guardian: A Runtime Framework for LLM-Based UI Exploration",
+        "Highlight": "Eliminates the need for pre-written test scripts by leveraging LLMs and multimodal perception to generate and execute test cases automatically.",
"Name": "VEM: Environment-Free Exploration for Training GUI Agent with Value Environment Model",
340
+
"Platform": "Mobile Android",
341
+
"Date": "February 2025",
342
+
"Paper_Url": "https://arxiv.org/abs/2502.18906",
343
+
"Highlight": "Unlike traditional RL methods that require environment interactions, VEM enables training purely on offline data with a Value Environment Model.",