LLM-Agents-Papers

✍️ Description

Last Updated Time: 2025/7/12

This repo lists papers related to LLM-based agents. It includes:

Survey
Technique For Enhancement
Interaction
Application
- Math
- Chemistry
- Biology
- Physics
- Geography
- Art
- Medicine
- Finance
- Software Engineering
- Research
Automation
- Workflow
- Automatic Evaluation
Training
- Fine tuning
- RL
- DPO
Scaling
- Single-Agent Framework
- Multi-Agent System
Stability
Infrastructure
Others

💛 Recommendation

For more comprehensive reading, we also recommend other paper lists:

zjunlp/LLMAgentPapers: Must-read Papers on Large Language Model Agents.
teacherpeterpan/self-correction-llm-papers: This is a collection of research papers for Self-Correcting Large Language Models with Automated Feedback.
Paitesanshi/LLM-Agent-Survey: A Survey on LLM-based Autonomous Agents.
woooodyy/llm-agent-paper-list: Must-read papers for LLM-based agents.
git-disl/awesome-LLM-game-agent-papers: Must-read papers for LLM-based Game agents.

📰 Papers

Survey

[2025/06/10] Measuring Data Science Automation: A Survey of Evaluation Tools for AI Assistants and Agents | [paper] | [code]
[2025/06/06] Evolutionary Perspectives on the Evaluation of LLM-Based AI Agents: A Comprehensive Survey | [paper] | [code]
[2025/05/27] Creativity in LLM-based Multi-Agent Systems: A Survey | [paper] | [code]
[2025/05/24] Multi-Party Conversational Agents: A Survey | [paper] | [code]
[2025/05/16] A Survey on the Safety and Security Threats of Computer-Using Agents: JARVIS or Ultron? | [paper] | [code]
[2025/05/02] AI agents may be worth the hype but not the resources (yet): An initial exploration of machine translation quality and costs in three language pairs in the legal and news domains | [paper] | [code]
[2025/05/01] A Survey on Large Language Model based Human-Agent Systems | [paper] | [code]
[2025/04/30] Humanizing LLMs: A Survey of Psychological Measurements with Tools, Datasets, and Human-Agent Applications | [paper] | [code]
[2025/04/22] A Comprehensive Survey in LLM(-Agent) Full Stack Safety: Data, Training and Deployment | [paper] | [code]
[2025/04/20] Meta-Thinking in LLMs via Multi-Agent Reinforcement Learning: A Survey | [paper] | [code]
[2025/04/14] A Survey of Large Language Model-Powered Spatial Intelligence Across Scales: Advances in Embodied Agents, Smart Cities, and Earth Science | [paper] | [code]
[2025/04/12] A Survey of Frontiers in LLM Reasoning: Inference Scaling, Learning to Reason, and Agentic Systems | [paper] | [code]
[2025/03/28] Evaluating LLM-based Agents for Multi-Turn Conversations: A Survey | [paper] | [code]
[2025/03/27] Large Language Model Agent: A Survey on Methodology, Applications and Challenges | [paper] | [code]
[2025/03/27] A Survey on (M)LLM-Based GUI Agents | [paper] | [code]
[2025/03/24] A Survey of Large Language Model Agents for Question Answering | [paper] | [code]
[2025/03/20] Survey on Evaluation of LLM-based Agents | [paper] | [code]
[2025/03/13] LLMs Working in Harmony: A Survey on the Technological Aspects of Building Effective LLM-Based Multi Agent Systems | [paper] | [code]
[2025/03/12] Agentic AI for Scientific Discovery: A Survey of Progress, Challenges, and Future Directions | [paper] | [code]
[2025/02/20] Beyond Self-Talk: A Communication-Centric Survey of LLM-Based Multi-Agent Systems | [paper] | [code]
[2025/02/18] Towards a Design Guideline for RPA Evaluation: A Survey of Large Language Model-Based Role-Playing Agents | [paper] | [code]
[2025/02/16] A Survey of LLM-based Agents in Medicine: How far are we from Baymax? | [paper] | [code]
[2025/01/15] Agentic Retrieval-Augmented Generation: A Survey on Agentic RAG | [paper] | [code]
[2024/12/23] A Survey on LLM-based Multi-Agent System: Recent Advances and New Frontiers in Application | [paper] | [code]
[2024/12/18] A Survey on Large Language Model-based Agents for Statistics and Data Science | [paper] | [code]
[2024/12/05] A Survey on Large Language Model-Based Social Agents in Game-Theoretic Scenarios | [paper] | [code]
[2024/12/04] From Individual to Society: A Survey on Social Simulation Driven by Large Language Model-based Agents | [paper] | [code]
[2024/11/27] Large Language Model-Brained GUI Agents: A Survey | [paper] | [code]
[2024/09/27] A Survey on Complex Tasks for Goal-Directed Interactive Agents | [paper] | [code]
[2024/09/13] Agents in Software Engineering: Survey, Landscape, and Vision | [paper] | [code]
[2024/09/04] A Survey on Emergent Language | [paper] | [code]
[2024/08/05] From LLMs to LLM-based Agents for Software Engineering: A Survey of Current, Challenges and Future | [paper] | [code]
[2024/07/26] Large Language Model Agent in Financial Trading: A Survey | [paper] | [code]
[2024/06/03] Two Tales of Persona in LLMs: A Survey of Role-Playing and Personalization | [paper] | [code]
[2024/06/01] Towards Rationality in Language and Multimodal Agents: A Survey | [paper] | [code]
[2024/04/17] Advancing Social Intelligence in AI Agents: Technical Challenges and Open Questions | [paper] | [code]
[2024/04/02] A Survey on Large Language Model-Based Game Agents | [paper] | [code]
[2024/03/26] Leveraging Large Language Models in Human-Robot Interaction: A Critical Analysis of Potential and Pitfalls | [paper] | [code]
[2024/03/07] Promising and worth-to-try future directions for advancing state-of-the-art surrogates methods of agent-based models in social and health computational sciences | [paper] | [code]
[2024/02/28] Large Language Models and Games: A Survey and Roadmap | [paper] | [code]
[2024/02/28] A Survey on Recent Advances in LLM-Based Multi-turn Dialogue Systems | [paper] | [code]
[2024/02/05] Understanding the planning of LLM agents: A survey | [paper] | [code]
[2024/01/01] If LLM Is the Wizard, Then Code Is the Wand: A Survey on How Code Empowers Large Language Models to Serve as Intelligent Agents | [paper] | [code]
[2023/12/31] A Survey of Personality, Persona, and Profile in Conversational Agents and Chatbots | [paper] | [code]
[2023/12/19] Large Language Models Empowered Agent-based Modeling and Simulation: A Survey and Perspectives | [paper] | [code]
[2023/09/14] The Rise and Potential of Large Language Model Based Agents: A Survey | [paper] | [code]
[2023/08/22] A Survey on Large Language Model based Autonomous Agents | [paper] | [code]
[2023/06/27] Next Steps for Human-Centered Generative AI: A Technical Perspective | [paper] | [code]

Technique For Enhancement

Planning

[2025/06/30] Thought-Augmented Planning for LLM-Powered Interactive Recommender Agent | [paper] | [code]
[2025/06/24] NaviAgent: Bilevel Planning on Tool Dependency Graphs for Function Calling | [paper] | [code]
[2025/06/10] Improving LLM Agent Planning with In-Context Learning via Atomic Fact Augmentation and Lookahead Search | [paper] | [code]
[2025/06/06] MAPLE: Multi-Agent Adaptive Planning with Long-Term Memory for Table Reasoning | [paper] | [code]
[2025/05/22] T1: A Tool-Oriented Conversational Dataset for Multi-Turn Agentic Planning | [paper] | [code]
[2025/05/02] PIPA: A Unified Evaluation Protocol for Diagnosing Interactive Planning Agents | [paper] | [code]
[2025/04/15] GraphicBench: A Planning Benchmark for Graphic Design with Language Agents | [paper] | [code]
[2025/03/12] Plan-and-Act: Improving Planning of Agents for Long-Horizon Tasks | [paper] | [code]
[2025/03/04] MPO: Boosting LLM Agents with Meta Plan Optimization | [paper] | [code]
[2025/03/03] Improving Retrospective Language Agents via Joint Policy Gradient Optimization | [paper] | [code]
[2025/02/08] CODESIM: Multi-Agent Code Generation and Problem Solving through Simulation-Driven Planning and Debugging | [paper] | [code]
[2025/02/06] Robotouille: An Asynchronous Planning Benchmark for LLM Agents | [paper] | [code]
[2025/01/27] MADP: Multi-Agent Deductive Planning for Enhanced Cognitive-Behavioral Mental Health Question Answer | [paper] | [code]
[2025/01/14] Talk to Right Specialists: Routing and Planning in Multi-agent System for Question Answering | [paper] | [code]
[2024/12/30] Plancraft: an evaluation dataset for planning with LLM agents | [paper] | [code]
[2024/12/28] Efficient Multi-Agent Collaboration with Tool Use for Online Planning in Complex Table Question Answering | [paper] | [code]
[2024/12/13] Script-Based Dialog Policy Planning for LLM-Powered Conversational Agents: A Basic Architecture for an "AI Therapist" | [paper] | [code]
[2024/11/13] One STEP at a time: Language Agents are Stepwise Planners | [paper] | [code]
[2024/11/05] Benchmarking Multimodal Retrieval Augmented Generation with Dynamic VQA Dataset and Self-adaptive Planning Agent | [paper] | [code]
[2024/10/12] CAMPHOR: Collaborative Agents for Multi-input Planning and High-Order Reasoning On Device | [paper] | [code]
[2024/10/01] Self-controller: Controlling LLMs with Multi-round Step-by-step Self-awareness | [paper] | [code]
[2024/09/30] Interactive Speculative Planning: Enhance Agent Efficiency through Co-design of System and User Interface | [paper] | [code]
[2024/09/28] SELP: Generating Safe and Efficient Task Plans for Robot Agents with Large Language Models | [paper] | [code]
[2024/09/25] MSI-Agent: Incorporating Multi-Scale Insight into Embodied Agents for Superior Planning and Decision-Making | [paper] | [code]
[2024/08/15] VerilogCoder: Autonomous Verilog Coding Agents with Graph-based Planning and Abstract Syntax Tree (AST)-based Waveform Tracing Tool | [paper] | [code]
[2024/08/12] Towards Autonomous Agents: Adaptive-planning, Reasoning, and Acting in Language Models | [paper] | [code]
[2024/08/01] AgentGen: Enhancing Planning Abilities for Large Language Model based Agent via Environment and Task Generation | [paper] | [code]
[2024/07/04] Controllable Conversations: Planning-Based Dialogue Agent with Large Language Models | [paper] | [code]
[2024/06/17] RePrompt: Planning by Automatic Prompt Engineering for Large Language Models Agents | [paper] | [code]
[2024/06/09] A Review of Prominent Paradigms for LLM-Based Agents: Tool Use (Including RAG), Planning, and Feedback Learning | [paper] | [code]
[2024/06/06] Tool-Planner: Task Planning with Clusters across Multiple Tools | [paper] | [code]
[2024/05/28] A Human-Like Reasoning Framework for Multi-Phases Planning Task with Large Language Models | [paper] | [code]
[2024/05/27] REVECA: Adaptive Planning and Trajectory-based Validation in Cooperative Language Agents using Information Relevance and Relative Proximity | [paper] | [code]
[2024/04/21] Socratic Planner: Inquiry-Based Zero-Shot Planning for Embodied Instruction Following | [paper] | [code]
[2024/04/17] The Landscape of Emerging AI Agent Architectures for Reasoning, Planning, and Tool Calling: A Survey | [paper] | [code]
[2024/03/11] Strength Lies in Differences! Improving Strategy Planning for Non-collaborative Dialogues via Diversified User Simulation | [paper] | [code]
[2024/03/10] TRAD: Enhancing LLM Agents with Step-Wise Thought Retrieval and Aligned Decision | [paper] | [code]
[2024/03/05] KnowAgent: Knowledge-Augmented Planning for LLM-Based Agents | [paper] | [code]
[2024/02/29] PlanGPT: Enhancing Urban Planning with Tailored Language Model and Efficient Retrieval | [paper] | [code]
[2024/02/18] What's the Plan? Evaluating and Developing Planning-Aware Techniques for Language Models | [paper] | [code]
[2024/02/18] PreAct: Prediction Enhances Agent's Planning Ability | [paper] | [code]
[2024/02/16] When is Tree Search Useful for LLM Planning? It Depends on the Discriminator | [paper] | [code]
[2024/02/15] TDAG: A Multi-Agent Framework based on Dynamic Task Decomposition and Agent Generation | [paper] | [code]
[2024/02/09] Introspective Planning: Aligning Robots' Uncertainty with Inherent Task Ambiguity | [paper] | [code]
[2024/02/06] RAP: Retrieval-Augmented Planning with Contextual Memory for Multimodal LLM Agents | [paper] | [code]
[2024/02/02] TravelPlanner: A Benchmark for Real-World Planning with Language Agents | [paper] | [code]
[2024/01/10] AutoAct: Automatic Agent Learning from Scratch for QA via Self-Planning | [paper] | [code]
[2023/11/19] TPTU-v2: Boosting Task Planning and Tool Usage of Large Language Model-based Agents in Real-world Systems | [paper] | [code]
[2023/10/12] Tree-Planner: Efficient Close-loop Task Planning with Large Language Models | [paper] | [code]
[2023/10/09] Put Your Money Where Your Mouth Is: Evaluating Strategic Planning and Execution of LLM Agents in an Auction Arena | [paper] | [code]
[2023/08/07] TPTU: Large Language Model-based AI Agents for Task Planning and Tool Usage | [paper] | [code]
[2023/08/01] SelfCheck: Using LLMs to Zero-Shot Check Their Own Step-by-Step Reasoning | [paper] | [code]
[2023/05/26] AdaPlanner: Adaptive Planning from Feedback with Language Models | [paper] | [code]
[2023/05/24] Reasoning with Language Model is Planning with World Model | [paper] | [code]
[2023/05/24] Leveraging Pre-trained Large Language Models to Construct and Utilize World Models for Model-based Task Planning | [paper] | [code]
[2023/03/29] Skill Reinforcement Learning and Planning for Open-World Long-Horizon Tasks | [paper] | [code]
[2023/02/03] Describe, Explain, Plan and Select: Interactive Planning with Large Language Models Enables Open-World Multi-Task Agents | [paper] | [code]
[2022/12/08] LLM-Planner: Few-Shot Grounded Planning for Embodied Agents with Large Language Models | [paper] | [code]

Memory Mechanism

[2025/07/10] MIRIX: Multi-Agent Memory System for LLM-Based Agents | [paper] | [code]
[2025/07/07] Evaluating Memory in LLM Agents via Incremental Multi-Turn Interactions | [paper] | [code]
[2025/07/03] MemAgent: Reshaping Long-Context LLM with Multi-Conv RL-based Memory Agent | [paper] | [code]
[2025/06/30] Ella: Embodied Social Agents with Lifelong Memory | [paper] | [code]
[2025/06/30] State and Memory is All You Need for Robust and Reliable AI Agents | [paper] | [code]
[2025/06/20] MemBench: Towards More Comprehensive Evaluation on the Memory of LLM-based Agents | [paper] | [code]
[2025/06/18] MEM1: Learning to Synergize Memory and Reasoning for Efficient Long-Horizon Agents | [paper] | [code]
[2025/06/17] Cost-Efficient Serving of LLM Agents via Test-Time Plan Caching | [paper] | [code]
[2025/06/09] G-Memory: Tracing Hierarchical Memory for Multi-Agent Systems | [paper] | [code]
[2025/06/07] Contextual Experience Replay for Self-Improvement of Language Agents | [paper] | [code]
[2025/06/06] MAPLE: Multi-Agent Adaptive Planning with Long-Term Memory for Table Reasoning | [paper] | [code]
[2025/05/26] Towards Multi-Granularity Memory Association and Selection for Long-Term Conversational Agents | [paper] | [code]
[2025/05/26] Task Memory Engine: Spatial Memory for Robust Multi-Step LLM Agents | [paper] | [code]
[2025/05/23] Collaborative Memory: Multi-User Memory Sharing in LLM Agents with Dynamic Access Control | [paper] | [code]
[2025/05/22] Embodied Agents Meet Personalization: Exploring Memory Utilization for Personalized Assistance | [paper] | [code]
[2025/04/30] LLM-Empowered Embodied Agent for Memory-Augmented Task Planning in Household Robotics | [paper] | [code]
[2025/04/28] Mem0: Building Production-Ready AI Agents with Scalable Long-Term Memory | [paper] | [code]
[2025/04/11] Task Memory Engine (TME): A Structured Memory Framework with Graph-Aware Extensions for Multi-Step LLM Agent Tasks | [paper] | [code]
[2025/03/27] MemInsight: Autonomous Memory Augmentation for LLM Agents | [paper] | [code]
[2025/03/25] MARS: Memory-Enhanced Agents with Reflective Self-improvement | [paper] | [code]
[2025/03/11] In Prospect and Retrospect: Reflective Memory Management for Long-term Personalized Dialogue Agents | [paper] | [code]
[2025/02/17] A-MEM: Agentic Memory for LLM Agents | [paper] | [code]
[2025/02/08] On Memory Construction and Retrieval for Personalized Conversational Agents | [paper] | [code]
[2025/01/20] Zep: A Temporal Knowledge Graph Architecture for Agent Memory | [paper] | [code]
[2025/01/15] Doc-Guided Sent2Sent++: A Sent2Sent++ Agent with Doc-Guided memory for Document-level Machine Translation | [paper] | [code]
[2024/12/17] On the Structural Memory of LLM Agents | [paper] | [code]
[2024/12/17] Memory-Augmented Agent Training for Business Document Understanding | [paper] | [code]
[2024/10/10] DelTA: An Online Document-Level Translation Agent Based on Multi-Level Memory | [paper] | [code]
[2024/09/28] Crafting Personalized Agents through Retrieval-Augmented Generation on Editable Memory Graphs | [paper] | [code]
[2024/09/11] Agent Workflow Memory | [paper] | [code]
[2024/09/01] Self-evolving Agents with reflective and memory-augmented abilities | [paper] | [code]
[2024/08/18] HiAgent: Hierarchical Working Memory Management for Solving Long-Horizon Agent Tasks with Large Language Model | [paper] | [code]
[2024/08/07] Optimus-1: Hybrid Multimodal Memory Empowered Agents Excel in Long-Horizon Tasks | [paper] | [code]
[2024/05/29] Toward Conversational Agents with Context and Time Sensitive Long-term Memory | [paper] | [code]
[2024/04/15] Memory Sharing for Large Language Model based Agents | [paper] | [code]
[2024/02/19] Compress to Impress: Unleashing the Potential of Compressive Memory in Real-World Long-Term Conversations | [paper] | [code]
[2024/02/07] InfLLM: Training-Free Long-Context Extrapolation for LLMs with an Efficient Context Memory | [paper] | [code]
[2024/02/06] RAP: Retrieval-Augmented Planning with Contextual Memory for Multimodal LLM Agents | [paper] | [code]
[2024/01/05] From LLM to Conversational Agent: A Memory Enhanced Architecture with Fine-Tuning of Large Language Models | [paper] | [code]
[2023/12/22] Empowering Working Memory for Large Language Model Agents | [paper] | [code]
[2023/12/22] Personalized Large Language Model Assistant with Evolving Conditional Memory | [paper] | [code]
[2023/11/10] JARVIS-1: Open-World Multi-task Agents with Memory-Augmented Multimodal Language Models | [paper] | [code]
[2023/06/06] ChatDB: Augmenting LLMs with Databases as Their Symbolic Memory | [paper] | [code]
[2023/05/23] RET-LLM: Towards a General Read-Write Memory for Large Language Models | [paper] | [code]
[2023/05/17] MemoryBank: Enhancing Large Language Models with Long-Term Memory | [paper] | [code]
[2023/05/02] The Role of Summarization in Generative Agents: A Preliminary Perspective | [paper] | [code]
[2023/05/01] Learning to Reason and Memorize with Self-Notes | [paper] | [code]
[2023/04/26] Enhancing Large Language Model with Self-Controlled Memory Framework | [paper] | [code]
[2023/04/21] Emergent and Predictable Memorization in Large Language Models | [paper] | [code]

Feedback & Reflection

[2025/07/08] Conditional Multi-Stage Failure Recovery for Embodied Agents | [paper] | [code]
[2025/06/10] Reinforce LLM Reasoning through Multi-Agent Reflection | [paper] | [code]
[2025/06/04] Debate, Reflect, and Distill: Multi-Agent Feedback with Tree-Structured Preference Optimization for Efficient Language Model Enhancement | [paper] | [code]
[2025/06/04] Graph Counselor: Adaptive Graph Exploration via Multi-Agent Synergy to Enhance LLM Reasoning | [paper] | [code]
[2025/06/03] Mitigating Manipulation and Enhancing Persuasion: A Reflective Multi-Agent Approach for Legal Argument Generation | [paper] | [code]
[2025/05/22] Optimizing LLM-Based Multi-Agent System with Textual Feedback: A Case Study on Software Development | [paper] | [code]
[2025/05/21] ReflAct: World-Grounded Decision Making in LLM Agents via Goal-State Reflection | [paper] | [code]
[2025/05/21] Aligning Dialogue Agents with Global Feedback via Large Language Model Reward Decomposition | [paper] | [code]
[2025/05/06] FRAME: Feedback-Refined Agent Methodology for Enhancing Medical Research Insights | [paper] | [code]
[2025/04/26] Stealing Creator's Workflow: A Creator-Inspired Agentic Framework with Iterative Feedback Loop for Improved Scientific Short-form Generation | [paper] | [code]
[2025/03/20] The Lighthouse of Language: Enhancing LLM Agents via Critique-Guided Improvement | [paper] | [code]
[2025/03/11] In Prospect and Retrospect: Reflective Memory Management for Long-term Personalized Dialogue Agents | [paper] | [code]
[2025/03/04] Generator-Assistant Stepwise Rollback Framework for Large Language Model Agent | [paper] | [code]
[2025/03/03] Improving Retrospective Language Agents via Joint Policy Gradient Optimization | [paper] | [code]
[2025/02/20] STeCa: Step-level Trajectory Calibration for LLM Agent Learning | [paper] | [code]
[2025/02/17] Table-Critic: A Multi-Agent Framework for Collaborative Criticism and Refinement in Table Reasoning | [paper] | [code]
[2025/02/17] A Study on Leveraging Search and Self-Feedback for Agent Reasoning | [paper] | [code]
[2025/02/03] PlotGen: Multi-Agent LLM-based Scientific Data Visualization via Multimodal Feedback | [paper] | [code]
[2025/01/26] Large Language Models as Theory of Mind Aware Generative Agents with Counterfactual Reflection | [paper] | [code]
[2025/01/23] AgentRec: Agent Recommendation Using Sentence Embeddings Aligned to Human Feedback | [paper] | [code]
[2025/01/08] InfiGUIAgent: A Multimodal Generalist GUI Agent with Native Reasoning and Reflection | [paper] | [code]
[2024/12/31] Enhancing LLM Reasoning with Multi-Path Collaborative Reactive and Reflection agents | [paper] | [code]
[2024/12/22] A Multi-AI Agent System for Autonomous Optimization of Agentic AI Solutions via Iterative Refinement and LLM-Driven Feedback Loops | [paper] | [code]
[2024/11/29] Training Agents with Weakly Supervised Feedback from Large Language Models | [paper] | [code]
[2024/11/21] Enhancing LLMs for Power System Simulations: A Feedback-driven Multi-agent Framework | [paper] | [code]
[2024/11/11] Using Generative AI and Multi-Agents to Provide Automatic Feedback | [paper] | [code]
[2024/11/04] Positive Experience Reflection for Agents in Interactive Text Environments | [paper] | [code]
[2024/10/29] Enhancing Financial Question Answering with a Multi-Agent Reflection Framework | [paper] | [code]
[2024/10/28] CRAT: A Multi-Agent Framework for Causality-Enhanced Reflective and Retrieval-Augmented Translation with Large Language Models | [paper] | [code]
[2024/10/25] OpenWebVoyager: Building Multimodal Web Agents via Iterative Real-World Exploration, Feedback and Optimization | [paper] | [code]
[2024/10/23] ReflecTool: Towards Reflection-Aware Tool-Augmented Clinical Agents | [paper] | [code]
[2024/10/20] Training Language Models to Critique With Multi-agent Feedback | [paper] | [code]
[2024/10/16] PRefLexOR: Preference-based Recursive Language Modeling for Exploratory Optimization of Reasoning and Agentic Thinking | [paper] | [code]
[2024/10/08] DataEnvGym: Data Generation Agents in Teacher Environments with Student Feedback | [paper] | [code]
[2024/10/02] ExACT: Teaching AI Agents to Explore with Reflective-MCTS and Exploratory Learning | [paper] | [code]
[2024/10/02] RGD: Multi-LLM Based Agent Debugger via Refinement and Generation Guidance | [paper] | [code]
[2024/09/18] MAgICoRe: Multi-Agent, Iterative, Coarse-to-Fine Refinement for Reasoning | [paper] | [code]
[2024/09/05] E2CL: Exploration-based Error Correction Learning for Embodied Agents | [paper] | [code]
[2024/09/01] Self-evolving Agents with reflective and memory-augmented abilities | [paper] | [code]
[2024/08/30] Tool-Assisted Agent on SQL Inspection and Refinement in Real-World Scenarios | [paper] | [code]
[2024/08/15] MAG-SQL: Multi-Agent Generative Approach with Soft Schema Linking and Iterative Sub-SQL Refinement for Text-to-SQL | [paper] | [code]
[2024/07/25] Recursive Introspection: Teaching Language Model Agents How to Self-Improve | [paper] | [code]
[2024/06/09] A Review of Prominent Paradigms for LLM-Based Agents: Tool Use (Including RAG), Planning, and Feedback Learning | [paper] | [code]
[2024/06/05] LLM-based Rewriting of Inappropriate Argumentation using Reinforcement Learning from Machine Feedback | [paper] | [code]
[2024/06/03] Re-ReST: Reflection-Reinforced Self-Training for Language Agents | [paper] | [code]
[2024/03/18] QueryAgent: A Reliable and Efficient Reasoning Framework with Environmental Feedback-based Self-Correction | [paper] | [code]
[2024/03/17] Improving Dialogue Agents by Decomposing One Global Explicit Annotation with Local Implicit Multimodal Feedback | [paper] | [code]
[2024/03/08] ChatASU: Evoking LLM's Reflexion to Truly Understand Aspect Sentiment in Dialogues | [paper] | [code]
[2024/03/04] Trial and Error: Exploration-Based Trajectory Optimization for LLM Agents | [paper] | [code]
[2024/02/27] Agent-Pro: Learning to Evolve via Policy-Level Reflection and Optimization | [paper] | [code]
[2024/02/26] SelectIT: Selective Instruction Tuning for LLMs via Uncertainty-Aware Self-Reflection | [paper] | [code]
[2024/02/22] Mirror: A Multiple-perspective Self-Reflection Method for Knowledge-rich Reasoning | [paper] | [code]
[2024/02/19] A Critical Evaluation of AI Feedback for Aligning Large Language Models | [paper] | [code]
[2024/02/06] AnyTool: Self-Reflective, Hierarchical Agents for Large-Scale API Calls | [paper] | [code]
[2024/02/02] StepCoder: Improve Code Generation with Reinforcement Learning from Compiler Feedback | [paper] | [code]
[2023/11/14] The ART of LLM Refinement: Ask, Refine, and Trust | [paper] | [code]
[2023/10/31] Learning From Mistakes Makes LLM Better Reasoner | [paper] | [code]
[2023/10/12] A Zero-Shot Language Agent for Computer Control with Structured Reflection | [paper] | [code]
[2023/07/27] PanGu-Coder2: Boosting Large Language Models for Code with Ranking Feedback | [paper] | [code]
[2023/05/22] Making Language Models Better Tool Learners with Execution Feedback | [paper] | [code]
[2023/05/17] Improving Language Model Negotiation with Self-Play and In-Context Learning from AI Feedback | [paper] | [code]
[2023/04/21] Improving Grounded Language Understanding in a Collaborative Environment by Interacting with Agents Through Help Feedback | [paper] | [code]
[2023/04/11] Teaching Large Language Models to Self-Debug | [paper] | [code]
[2023/03/30] Self-Refine: Iterative Refinement with Self-Feedback | [paper] | [code]

RAG

[2025/07/09] Multi-Agent Retrieval-Augmented Framework for Evidence-Based Counterspeech Against Health Misinformation | [paper] | [code]
[2025/07/04] AI-VaxGuide: An Agentic RAG-Based LLM for Vaccination Decisions | [paper] | [code]
[2025/06/28] Knowledge Augmented Finetuning Matters in both RAG and Agent Based Dialog Systems | [paper] | [code]
[2025/06/27] ARAG: Agentic Retrieval Augmented Generation for Personalized Recommendation | [paper] | [code]
[2025/06/12] CIIR@LiveRAG 2025: Optimizing Multi-Agent Retrieval Augmented Generation through Self-Training | [paper] | [code]
[2025/06/04] Graph Counselor: Adaptive Graph Exploration via Multi-Agent Synergy to Enhance LLM Reasoning | [paper] | [code]
[2025/05/28] Agent-UniRAG: A Trainable Open-Source LLM Agent Framework for Unified Retrieval-Augmented Generation Systems | [paper] | [code]
[2025/05/26] MA-RAG: Multi-Agent Retrieval-Augmented Generation via Collaborative Chain-of-Thought Reasoning | [paper] | [code]
[2025/05/22] O$^2$-Searcher: A Searching-based Agent Model for Open-Domain Open-Ended Question Answering | [paper] | [code]
[2025/05/22] Personalizing Student-Agent Interactions Using Log-Contextualized Retrieval Augmented Generation (RAG) | [paper] | [code]
[2025/05/22] Search Wisely: Mitigating Sub-optimal Agentic Searches By Reducing Uncertainty | [paper] | [code]
[2025/05/21] InfoDeepSeek: Benchmarking Agentic Information Seeking for Retrieval-Augmented Generation | [paper] | [code]
[2025/05/13] ALOHA: Empowering Multilingual Agent for University Orientation with Hierarchical Retrieval | [paper] | [code]
[2025/05/12] Reinforced Internal-External Knowledge Synergistic Reasoning for Efficient Adaptive Search Agent | [paper] | [code]
[2025/04/30] Talk Before You Retrieve: Agent-Led Discussions for Better RAG in Medical QA | [paper] | [code]
[2025/04/24] A RAG-Based Multi-Agent LLM System for Natural Hazard Resilience and Adaptation | [paper] | [code]
[2025/04/15] Towards Automated Safety Requirements Derivation Using Agent-based RAG | [paper] | [code]
[2025/04/13] HM-RAG: Hierarchical Multi-Agent Multimodal Retrieval Augmented Generation | [paper] | [code]
[2025/04/11] TP-RAG: Benchmarking Retrieval-Augmented Large Language Model Agents for Spatiotemporal-Aware Travel Planning | [paper] | [code]
[2025/04/10] CollEX -- A Multimodal Agentic RAG System Enabling Interactive Exploration of Scientific Collections | [paper] | [code]
[2025/03/18] Retrieval-Augmented Simulacra: Generative Agents for Up-to-date and Knowledge-Adaptive Simulations | [paper] | [code]
[2025/03/14] RAG-KG-IL: A Multi-Agent Hybrid Framework for Reducing Hallucinations and Enhancing LLM Reasoning through RAG and Incremental Knowledge Graph Learning Integration | [paper] | [code]
[2025/03/01] EXCLAIM: An Explainable Cross-Modal Agentic System for Misinformation Detection with Hierarchical Retrieval | [paper] | [code]
[2025/02/25] ViDoRAG: Visual Document Retrieval-Augmented Generation via Dynamic Iterative Reasoning Agents | [paper] | [code]
[2025/02/19] RAG-Gym: Optimizing Reasoning and Search Agents with Process Supervision | [paper] | [code]
[2025/02/08] On Memory Construction and Retrieval for Personalized Conversational Agents | [paper] | [code]
[2025/02/06] Enhancing Online Learning Efficiency Through Heterogeneous Resource Integration with a Multi-Agent RAG System | [paper] | [code]
[2025/01/25] Improving Retrieval-Augmented Generation through Multi-Agent Reinforcement Learning | [paper] | [code]
[2024/12/31] MAIN-RAG: Multi-Agent Filtering Retrieval-Augmented Generation | [paper] | [code]
[2024/12/24] GeAR: Graph-enhanced Agent for Retrieval-augmented Generation | [paper] | [code]
[2024/12/20] Towards Interpretable Radiology Report Generation via Concept Bottlenecks using a Multi-Agentic RAG | [paper] | [code]
[2024/12/16] BioRAGent: A Retrieval-Augmented Generation System for Showcasing Generative Query Expansion and Domain-Specific Search for Scientific Q&A | [paper] | [code]
[2024/12/07] SLA Management in Reconfigurable Multi-Agent RAG: A Systems Approach to Question Answering | [paper] | [code]
[2024/11/05] Benchmarking Multimodal Retrieval Augmented Generation with Dynamic VQA Dataset and Self-adaptive Planning Agent | [paper] | [code]
[2024/10/28] CRAT: A Multi-Agent Framework for Causality-Enhanced Reflective and Retrieval-Augmented Translation with Large Language Models | [paper] | [code]
[2024/10/18] Toolshed: Scale Tool-Equipped Agents with Advanced RAG-Tool Fusion and Tool Knowledge Bases | [paper] | [code]
[2024/10/01] Conversational Exploratory Search of Scholarly Publications Using Knowledge Graphs | [paper] | [code]
[2024/09/28] Crafting Personalized Agents through Retrieval-Augmented Generation on Editable Memory Graphs | [paper] | [code]
[2024/08/18] Agentic Retrieval-Augmented Generation for Time Series Analysis | [paper] | [code]
[2024/08/05] LLM Agents Improve Semantic Code Search | [paper] | [code]
[2024/08/03] MALADE: Orchestration of LLM-powered Agents with Retrieval Augmented Generation for Pharmacovigilance | [paper] | [code]
[2024/07/20] Golden-Retriever: High-Fidelity Agentic Retrieval Augmented Generation for Industrial Knowledge Base | [paper] | [code]
[2024/06/26] Geode: A Zero-shot Geospatial Question-Answering Agent with Explicit Reasoning and Precise Spatio-Temporal Retrieval | [paper] | [code]
[2024/06/19] StackRAG Agent: Improving Developer Answers with Retrieval-Augmented Generation | [paper] | [code]
[2024/06/09] A Review of Prominent Paradigms for LLM-Based Agents: Tool Use (Including RAG), Planning, and Feedback Learning | [paper] | [code]
[2024/03/05] AgentsCourt: Building Judicial Decision-Making Agents with Court Debate Simulation and Legal Knowledge Augmentation | [paper] | [code]
[2024/02/06] RAP: Retrieval-Augmented Planning with Contextual Memory for Multimodal LLM Agents | [paper] | [code]
[2023/12/27] Automating Knowledge Acquisition for Content-Centric Cognitive Agents Using LLMs | [paper] | [code]

Search

[2025/06/09] CheMatAgent: Enhancing LLMs for Chemistry and Materials Science through Tree-Search Based Tool Learning | [paper] | [code]
[2025/06/06] AgentSwift: Efficient LLM Agent Design via Value-guided Hierarchical Search | [paper] | [code]
[2025/05/26] T^2Agent: A Tool-augmented Multimodal Misinformation Detection Agent with Monte Carlo Tree Search | [paper] | [code]
[2025/05/12] Structural Entropy Guided Agent for Detecting and Repairing Knowledge Deficiencies in LLMs | [paper] | [code]
[2025/04/10] The AI Scientist-v2: Workshop-Level Automated Scientific Discovery via Agentic Tree Search | [paper] | [code]
[2025/04/04] SynWorld: Virtual Scenario Synthesis for Agentic Action Knowledge Refinement | [paper] | [code]
[2025/03/18] DARS: Dynamic Action Re-Sampling to Enhance Coding Agent Performance by Adaptive Tree Traversal | [paper] | [code]
[2025/02/20] I-MCTS: Enhancing Agentic AutoML via Introspective Monte Carlo Tree Search | [paper] | [code]
[2025/02/18] R2-KG: General-Purpose Dual-Agent Framework for Reliable Reasoning on Knowledge Graphs | [paper] | [code]
[2025/02/18] Agentic Deep Graph Reasoning Yields Self-Organizing Knowledge Networks | [paper] | [code]
[2025/02/17] A Study on Leveraging Search and Self-Feedback for Agent Reasoning | [paper] | [code]
[2025/02/05] SymAgent: A Neural-Symbolic Self-Learning Agent Framework for Complex Reasoning over Knowledge Graphs | [paper] | [code]
[2025/02/02] Efficient Multi-Agent System Training with Data Influence-Oriented Tree Search | [paper] | [code]
[2025/01/31] KBQA-o1: Agentic Knowledge Base Question Answering with Monte Carlo Tree Search | [paper] | [code]
[2025/01/09] Search-o1: Agentic Search-Enhanced Large Reasoning Models | [paper] | [code]
[2024/12/24] A Novel Task-Driven Method with Evolvable Interactive Agents Using Event Trees for Enhanced Emergency Decision Support | [paper] | [code]
[2024/12/22] Multi-Agent Sampling: Scaling Inference Compute for Data Synthesis with Tree Search-Based Agentic Collaboration | [paper] | [code]
[2024/12/05] Agent AI with LangGraph: A Modular Framework for Enhancing Machine Translation Using Large Language Models | [paper] | [code]
[2024/11/07] CodeTree: Agent-guided Tree Search for Code Generation with Large Language Models | [paper] | [code]
[2024/10/29] Synergizing LLM Agents and Knowledge Graph for Socioeconomic Prediction in LBSN | [paper] | [code]
[2024/10/25] AGENT-CQ: Automatic Generation and Evaluation of Clarifying Questions for Conversational Search with LLMs | [paper] | [code]
[2024/10/22] SELA: Tree-Search Enhanced LLM Agents for Automated Machine Learning | [paper] | [code]
[2024/10/13] Expanding Search Space with Diverse Prompting Agents: An Efficient Sampling Approach for LLM Mathematical Reasoning | [paper] | [code]
[2024/10/13] LLM-Based Multi-Agent Systems are Scalable Graph Generative Models | [paper] | [code]
[2024/10/02] ExACT: Teaching AI Agents to Explore with Reflective-MCTS and Exploratory Learning | [paper] | [code]
[2024/09/09] SciAgents: Automating scientific discovery through multi-agent intelligent graph reasoning | [paper] | [code]
[2024/07/01] Tree Search for Language Model Agents | [paper] | [code]
[2024/06/17] Input Conditioned Graph Generation for Language Agents | [paper] | [code]
[2024/02/17] KG-Agent: An Efficient Autonomous Agent Framework for Complex Reasoning over Knowledge Graph | [paper] | [code]
[2024/02/16] When is Tree Search Useful for LLM Planning? It Depends on the Discriminator | [paper] | [code]
[2024/02/09] CoSearchAgent: A Lightweight Collaborative Search Agent with Large Language Models | [paper] | [code]
[2023/05/17] Tree of Thoughts: Deliberate Problem Solving with Large Language Models | [paper] | [code]

Interaction

Role Playing

[2025/06/28] Agent-to-Agent Theory of Mind: Testing Interlocutor Awareness among Large Language Models | [paper] | [code]
[2025/06/24] MAM: Modular Multi-Agent Framework for Multi-Modal Medical Diagnosis via Role-Specialized Collaboration | [paper] | [code]
[2025/06/20] Language-Informed Synthesis of Rational Agent Models for Grounded Theory-of-Mind Reasoning On-The-Fly | [paper] | [code]
[2025/06/06] PersonaAgent: When Large Language Model Agents Meet Personalization at Test Time | [paper] | [code]
[2025/06/02] Thinking in Character: Advancing Role-Playing Agents with Role-Aware Reasoning | [paper] | [code]
[2025/05/30] Context-Aware Sentiment Forecasting via LLM-based Multi-Perspective Role-Playing Agents | [paper] | [code]
[2025/05/29] ChARM: Character-based Act-adaptive Reward Modeling for Advanced Role-Playing Language Agents | [paper] | [code]
[2025/05/26] OmniCharacter: Towards Immersive Role-Playing Agents with Seamless Speech-Language Personality Interaction | [paper] | [code]
[2025/05/20] Inter(sectional) Alia(s): Ambiguity in Voice Agent Identity via Intersectional Japanese Self-Referents | [paper] | [code]
[2025/04/29] BrAIcht, a theatrical agent that speaks like Bertolt Brecht's characters | [paper] | [code]
[2025/04/25] Exploring Personality-Aware Interactions in Salesperson Dialogue Agents | [paper] | [code]
[2025/04/13] UXAgent: A System for Simulating Usability Testing of Web Design with LLM Agents | [paper] | [code]
[2025/04/03] LLMs as Deceptive Agents: How Role-Based Prompting Induces Semantic Ambiguity in Puzzle Tasks | [paper] | [code]
[2025/03/14] AIstorian lets AI be a historian: A KG-powered multi-agent system for accurate biography generation | [paper] | [code]
[2025/02/20] InstructAgent: Building User Controllable Recommender via LLM Agent | [paper] | [code]
[2025/02/18] SEFL: Harnessing Large Language Model Agents to Improve Educational Feedback Systems | [paper] | [code]
[2025/02/17] Can LLM Agents Maintain a Persona in Discourse? | [paper] | [code]
[2025/02/17] LM Agents for Coordinating Multi-User Information Gathering | [paper] | [code]
[2025/02/16] SCALE: Towards Collaborative Content Analysis in Social Science with Large Language Model Agents and Human Intervention | [paper] | [code]
[2025/02/13] Language Agents as Digital Representatives in Collective Decision-Making | [paper] | [code]
[2025/02/06] PsyPlay: Personality-Infused Role-Playing Conversational Agents | [paper] | [code]
[2025/02/03] Plan-Then-Execute: An Empirical Study of User Trust and Team Performance When Using LLM Agents As A Daily Assistant | [paper] | [code]
[2025/01/23] AgentRec: Agent Recommendation Using Sentence Embeddings Aligned to Human Feedback | [paper] | [code]
[2025/01/15] Personality Modeling for Persuasion of Misinformation using AI Agent | [paper] | [code]
[2024/12/28] BaiJia: A Large-Scale Role-Playing Agent Corpus of Chinese Historical Characters | [paper] | [code]
[2024/12/22] Modular Conversational Agents for Surveys and Interviews | [paper] | [code]
[2024/12/11] SweetieChat: A Strategy-Enhanced Role-playing Framework for Diverse Scenarios Handling Emotional Support Agent | [paper] | [code]
[2024/12/10] My Words Imply Your Opinion: Reader Agent-Based Propagation Enhancement for Personalized Implicit Emotion Analysis | [paper] | [code]
[2024/11/21] Towards Full Delegation: Designing Ideal Agentic Behaviors for Travel Planning | [paper] | [code]
[2024/11/19] Probing the Capacity of Language Model Agents to Operationalize Disparate Experiential Context Despite Distraction | [paper] | [code]
[2024/11/12] SHARP: Unlocking Interactive Hallucination via Stance Transfer in Role-Playing Agents | [paper] | [code]
[2024/11/04] A Multi-Task Role-Playing Agent Capable of Imitating Character Linguistic Styles | [paper] | [code]
[2024/10/28] Guide-LLM: An Embodied LLM Agent and Text-Based Topological Map for Robotic Guidance of People with Visual Impairments | [paper] | [code]
[2024/10/24] Schema-Guided Culture-Aware Complex Event Simulation with Multi-Agent Role-Play | [paper] | [code]
[2024/09/23] ERABAL: Enhancing Role-Playing Agents through Boundary-Aware Learning | [paper] | [code]
[2024/09/19] FoodPuzzle: Developing Large Language Model Agents as Flavor Scientists | [paper] | [code]
[2024/09/12] TravelAgent: An AI Assistant for Personalized Travel Planning | [paper] | [code]
[2024/09/11] Using Generative Agents to Create Tip Sheets for Investigative Data Reporting | [paper] | [code]
[2024/08/28] Interactive Agents: Simulating Counselor-Client Psychological Counseling via Role-Playing LLM-to-LLM Interactions | [paper] | [code]
[2024/08/21] Drama Engine: A Framework for Narrative Agents | [paper] | [code]
[2024/06/24] The Effects of Embodiment and Personality Expression on Learning in LLM-based Educational Agents | [paper] | [code]
[2024/06/17] HoLLMwood: Unleashing the Creativity of Large Language Models in Screenwriting via Role Playing | [paper] | [code]
[2024/06/11] Agent-SiMT: Agent-assisted Simultaneous Machine Translation with Large Language Models | [paper] | [code]
[2024/06/09] Peer Review as A Multi-Turn and Long-Context Dialogue with Role-Based Interactions | [paper] | [code]
[2024/05/28] TimeChara: Evaluating Point-in-Time Character Hallucination of Role-Playing Large Language Models | [paper] | [code]
[2024/05/10] LLM Discussion: Enhancing the Creativity of Large Language Models via Discussion Framework and Role-Play | [paper] | [code]
[2024/05/08] LLMs with Personalities in Multi-issue Negotiation Games | [paper] | [code]
[2024/05/06] Large Language Models (LLMs) as Agents for Augmented Democracy | [paper] | [code]
[2024/05/02] GAIA: A General AI Assistant for Intelligent Accelerator Operations | [paper] | [code]
[2024/05/01] "Ask Me Anything": How Comcast Uses LLMs to Assist Agents in Real Time | [paper] | [code]
[2024/04/26] Large Language Model Agent as a Mechanical Designer | [paper] | [code]
[2024/04/19] Cooperative Sentiment Agents for Multimodal Sentiment Analysis | [paper] | [code]
[2024/03/31] DiffAgent: Fast and Accurate Text-to-Image API Selection with Large Language Model | [paper] | [code]
[2024/03/23] EduAgent: Generative Student Agents in Learning | [paper] | [code]
[2024/03/19] Characteristic AI Agents via Large Language Models | [paper] | [code]
[2024/03/15] VideoAgent: Long-form Video Understanding with Large Language Model as Agent | [paper] | [code]
[2024/03/13] Evaluating Large Language Models as Generative User Simulators for Conversational Recommendation | [paper] | [code]
[2024/02/29] On the Decision-Making Abilities in Role-Playing using Large Language Models | [paper] | [code]
[2024/02/28] Prospect Personalized Recommendation on Large Language Model-based Agent Platform | [paper] | [code]
[2024/02/26] Language Agents as Optimizable Graphs | [paper] | [code]
[2024/02/22] Triad: A Framework Leveraging a Multi-Role LLM-based Agent to Solve Knowledge Base Question Answering | [paper] | [code]
[2024/02/22] Large Language Models as Urban Residents: An LLM Agent Framework for Personal Mobility Generation | [paper] | [code]
[2024/02/21] Neeko: Leveraging Dynamic LoRA for Efficient Multi-Character Role-Playing Agent | [paper] | [code]
[2024/02/19] Stick to your Role! Stability of Personal Values Expressed in Large Language Models | [paper] | [code]
[2024/02/18] Modelling Political Coalition Negotiations Using LLM-based Agents | [paper] | [code]
[2024/02/06] Professional Agents -- Evolving Large Language Models into Autonomous Experts with Human-Level Competencies | [paper] | [code]
[2024/02/06] Can Generative Agents Predict Emotion? | [paper] | [code]
[2024/02/05] GUARD: Role-playing to Generate Natural-language Jailbreakings to Test Guideline Adherence of Large Language Models | [paper] | [code]
[2024/01/31] LLMs Simulate Big Five Personality Traits: Further Evidence | [paper] | [code]
[2023/12/22] Personalized Large Language Model Assistant with Evolving Conditional Memory | [paper] | [code]
[2023/12/21] ChatGPT as a commenter to the news: can LLMs generate human-like opinions? | [paper] | [code]
[2023/12/20] Machine Mindset: An MBTI Exploration of Large Language Models | [paper] | [code]
[2023/12/19] Can ChatGPT be Your Personal Medical Assistant? | [paper] | [code]
[2023/10/13] AgentCF: Collaborative Learning with Autonomous Language Agents for Recommender Systems | [paper] | [code]
[2023/10/01] RoleLLM: Benchmarking, Eliciting, and Enhancing Role-Playing Abilities of Large Language Models | [paper] | [code]
[2023/09/02] ModelScope-Agent: Building Your Customizable Agent System with Open-source Large Language Models | [paper] | [code]
[2023/08/22] Towards an On-device Agent for Text Rewriting | [paper] | [code]
[2023/08/10] LLM As DBA | [paper] | [code]
[2023/08/03] InterAct: Exploring the Potentials of ChatGPT as a Cooperative Agent | [paper] | [code]
[2023/07/11] Unleashing the Emergent Cognitive Synergy in Large Language Models: A Task-Solving Agent through Multi-Persona Self-Collaboration | [paper] | [code]
[2023/07/05] Building Cooperative Embodied Agents Modularly with Large Language Models | [paper] | [code]
[2023/05/25] Role-Play with Large Language Models | [paper] | [code]
[2023/05/09] TidyBot: Personalized Robot Assistance with Large Language Models | [paper] | [code]

Conversation

[2025/06/28] Knowledge Augmented Finetuning Matters in both RAG and Agent Based Dialog Systems | [paper] | [code]
[2025/06/24] Augmenting Multi-Agent Communication with State Delta Trajectory | [paper] | [code]
[2025/06/17] From What to Respond to When to Respond: Timely Response Generation for Open-domain Dialogue Agents | [paper] | [code]
[2025/06/17] Expectation Confirmation Preference Optimization for Multi-Turn Conversational Recommendation Agent | [paper] | [code]
[2025/06/13] The Behavior Gap: Evaluating Zero-shot LLM Agents in Complex Task-Oriented Dialogs | [paper] | [code]
[2025/06/11] Chat-of-Thought: Collaborative Multi-Agent System for Generating Domain Specific Information | [paper] | [code]
[2025/06/09] $\tau^2$-Bench: Evaluating Conversational Agents in a Dual-Control Environment | [paper] | [code]
[2025/06/04] AI Agents for Conversational Patient Triage: Preliminary Simulation-Based Evaluation with Real-World EHR Data | [paper] | [code]
[2025/06/04] CLAIM: An Intent-Driven Multi-Agent Framework for Analyzing Manipulation in Courtroom Dialogues | [paper] | [code]
[2025/05/29] A Practical Approach for Building Production-Grade Conversational Agents with Workflow Graphs | [paper] | [code]
[2025/05/28] ChatCFD: an End-to-End CFD Agent with Domain-specific Structured Thinking | [paper] | [code]
[2025/05/26] Towards Multi-Granularity Memory Association and Selection for Long-Term Conversational Agents | [paper] | [code]
[2025/05/24] Multi-Party Conversational Agents: A Survey | [paper] | [code]
[2025/05/21] Aligning Dialogue Agents with Global Feedback via Large Language Model Reward Decomposition | [paper] | [code]
[2025/04/29] BrAIcht, a theatrical agent that speaks like Bertolt Brecht's characters | [paper] | [code]
[2025/04/26] MATCHA: Can Multi-Agent Collaboration Build a Trustworthy Conversational Recommender? | [paper] | [code]
[2025/04/21] EducationQ: Evaluating LLMs' Teaching Capabilities Through Multi-Agent Dialogue Framework | [paper] | [code]
[2025/04/20] DialogueAgents: A Hybrid Agent-Based Speech Synthesis Framework for Multi-Party Dialogue | [paper] | [code]
[2025/04/12] A Multi-view Discourse Framework for Integrating Semantic and Syntactic Features in Dialog Agents | [paper] | [code]
[2025/04/07] Bridging Industrial Expertise and XR with LLM-Powered Conversational Agents | [paper] | [code]
[2025/04/07] A Desideratum for Conversational Agents: Capabilities, Challenges, and Future Directions | [paper] | [code]
[2025/03/28] Evaluating LLM-based Agents for Multi-Turn Conversations: A Survey | [paper] | [code]
[2025/03/27] EQ-Negotiator: An Emotion-Reasoning LLM Agent in Credit Dialogues | [paper] | [code]
[2025/03/26] 3MDBench: Medical Multimodal Multi-agent Dialogue Benchmark | [paper] | [code]
[2025/03/25] CoMAC: Conversational Agent for Multi-Source Auxiliary Context with Sparse and Symmetric Latent Interactions | [paper] | [code]
[2025/03/25] Substance over Style: Evaluating Proactive Conversational Coaching Agents | [paper] | [code]
[2025/03/18] Personalized Attacks of Social Engineering in Multi-turn Conversations -- LLM Agents for Simulation and Detection | [paper] | [code]
[2025/03/11] In Prospect and Retrospect: Reflective Memory Management for Long-term Personalized Dialogue Agents | [paper] | [code]
[2025/03/05] Cite Before You Speak: Enhancing Context-Response Grounding in E-commerce Conversational LLM-Agents | [paper] | [code]
[2025/02/24] Turning Conversations into Workflows: A Framework to Extract and Evaluate Dialog Workflows for Service AI Agents | [paper] | [code]
[2025/02/20] Enhancing Conversational Agents with Theory of Mind: Aligning Beliefs, Desires, and Intentions for Human-Like Interaction | [paper] | [code]
[2025/02/18] One Size doesn't Fit All: A Personalized Conversational Tutoring Agent for Mathematics Instruction | [paper] | [code]
[2025/02/18] Training Turn-by-Turn Verifiers for Dialogue Tutoring Agents: The Curious Case of LLMs as Your Coding Tutors | [paper] | [code]
[2025/02/18] You need to MIMIC to get FAME: Solving Meeting Transcript Scarcity with a Multi-Agent Conversations | [paper] | [code]
[2025/02/17] InfoQuest: Evaluating Multi-Turn Dialogue Agents for Open-Ended Conversations with Hidden Context | [paper] | [code]
[2025/02/13] Reliable Conversational Agents under ASP Control that Understand Natural Language | [paper] | [code]
[2025/02/12] Can a Single Model Master Both Multi-turn Conversations and Tool Use? CoALM: A Unified Conversational Agentic Language Model | [paper] | [code]
[2025/02/09] MTPChat: A Multimodal Time-Aware Persona Dataset for Conversational Agents | [paper] | [code]
[2025/02/09] HamRaz: A Culture-Based Persian Conversation Dataset for Person-Centered Therapy Using LLM Agents | [paper] | [code]
[2025/02/08] On Memory Construction and Retrieval for Personalized Conversational Agents | [paper] | [code]
[2025/02/06] PsyPlay: Personality-Infused Role-Playing Conversational Agents | [paper] | [code]
[2025/01/24] Unmasking Conversational Bias in AI Multiagent Systems | [paper] | [code]
[2025/01/23] Communicating Activations Between Language Model Agents | [paper] | [code]
[2025/01/19] IntellAgent: A Multi-Agent Framework for Evaluating Conversational AI Systems | [paper] | [code]
[2025/01/14] Developing Enhanced Conversational Agents for Social Virtual Worlds | [paper] | [code]
[2025/01/03] PSYCHE: A Multi-faceted Patient Simulation Framework for Evaluation of Psychiatric Assessment Conversational Agents | [paper] | [code]
[2024/12/30] Exploring and Controlling Diversity in LLM-Agent Conversation | [paper] | [code]
[2024/12/24] Extracting triples from dialogues for conversational social agents | [paper] | [code]
[2024/12/22] Modular Conversational Agents for Surveys and Interviews | [paper] | [code]
[2024/12/21] InfoTech Assistant: A Multimodal Conversational Agent for InfoTechnology Web Portal Queries | [paper] | [code]
[2024/12/13] Script-Based Dialog Policy Planning for LLM-Powered Conversational Agents: A Basic Architecture for an "AI Therapist" | [paper] | [code]
[2024/12/06] CALICO: Conversational Agent Localization via Synthetic Data Generation | [paper] | [code]
[2024/12/05] Educational-Psychological Dialogue Robot Based on Multi-Agent Collaboration | [paper] | [code]
[2024/12/01] Examining Identity Drift in Conversations of LLM Agents | [paper] | [code]
[2024/11/07] Thanos: Enhancing Conversational Agents with Skill-of-Mind-Infused Large Language Model | [paper] | [code]
[2024/11/07] Interactive Dialogue Agents via Reinforcement Learning on Hindsight Regenerations | [paper] | [code]
[2024/11/06] MRJ-Agent: An Effective Jailbreak Agent for Multi-Round Dialogue | [paper] | [code]
[2024/11/01] DARD: A Multi-Agent Approach for Task-Oriented Dialog Systems | [paper] | [code]
[2024/11/01] ReSpAct: Harmonizing Reasoning, Speaking, and Acting Towards Building Large Language Model-Based Conversational AI Agents | [paper] | [code]
[2024/10/29] MARCO: Multi-Agent Real-time Chat Orchestration | [paper] | [code]
[2024/10/25] AGENT-CQ: Automatic Generation and Evaluation of Clarifying Questions for Conversational Search with LLMs | [paper] | [code]
[2024/10/18] Coherence-Driven Multimodal Safety Dialogue with Active Learning for Embodied Agents | [paper] | [code]
[2024/10/15] HR-Agent: A Task-Oriented Dialogue (TOD) LLM Agent Tailored for HR Applications | [paper] | [code]
[2024/10/10] Rewriting Conversational Utterances with Instructed Large Language Models | [paper] | [code]
[2024/09/24] Automated test generation to evaluate tool-augmented LLMs as conversational AI agents | [paper] | [code]
[2024/09/23] Beyond Turn-Based Interfaces: Synchronous LLMs as Full-Duplex Dialogue Agents | [paper] | [code]
[2024/09/13] AI-LieDar: Examine the Trade-off Between Utility and Truthfulness in LLM Agents | [paper] | [code]
[2024/09/06] Sparse Rewards Can Self-Train Dialogue Agents | [paper] | [code]
[2024/09/02] Co-Learning: Code Learning for Multi-Agent Reinforcement Collaborative Framework with Conversational Natural Language Interfaces | [paper] | [code]
[2024/08/27] Into the Unknown Unknowns: Engaged Human Learning through Participation in Language Model Agent Conversations | [paper] | [code]
[2024/08/22] MDD-5k: A New Diagnostic Conversation Dataset for Mental Disorders Synthesized via Neuro-Symbolic LLM Agents | [paper] | [code]
[2024/08/13] What should I wear to a party in a Greek taverna? Evaluation for Conversational Agents in the Fashion Domain | [paper] | [code]
[2024/08/06] OpenOmni: A Collaborative Open Source Tool for Building Future-Ready Multimodal Conversational Agents | [paper] | [code]
[2024/08/03] Self-Emotion Blended Dialogue Generation in Social Simulation Agents | [paper] | [code]
[2024/07/31] Towards Achieving Human Parity on End-to-end Simultaneous Speech Translation via LLM Agent | [paper] | [code]
[2024/07/13] Cohesive Conversations: Enhancing Authenticity in Multi-Agent Simulated Dialogues | [paper] | [code]
[2024/07/04] Controllable Conversations: Planning-Based Dialogue Agent with Large Language Models | [paper] | [code]
[2024/07/01] Empathic Grounding: Explorations using Multimodal Interaction and Large Language Models with Conversational Agents | [paper] | [code]
[2024/06/30] CAMON: Cooperative Agents for Multi-Object Navigation with LLM-based Conversations | [paper] | [code]
[2024/06/09] Peer Review as A Multi-Turn and Long-Context Dialogue with Role-Based Interactions | [paper] | [code]
[2024/05/29] Toward Conversational Agents with Context and Time Sensitive Long-term Memory | [paper] | [code]
[2024/05/16] Speaker Verification in Agent-Generated Conversations | [paper] | [code]
[2024/04/19] Towards Human-centered Proactive Conversational Agents | [paper] | [code]
[2024/04/10] Apollonion: Profile-centric Dialog Agent | [paper] | [code]
[2024/03/17] Improving Dialogue Agents by Decomposing One Global Explicit Annotation with Local Implicit Multimodal Feedback | [paper] | [code]
[2024/03/08] ChatASU: Evoking LLM's Reflexion to Truly Understand Aspect Sentiment in Dialogues | [paper] | [code]
[2024/02/25] Understanding Public Perceptions of AI Conversational Agents: A Cross-Cultural Analysis | [paper] | [code]
[2024/02/23] On the Multi-turn Instruction Following for Conversational Web Agents | [paper] | [code]
[2024/02/20] CHATATC: Large Language Model-Driven Conversational Agents for Supporting Strategic Air Traffic Flow Management | [paper] | [code]
[2024/01/29] Assistive Large Language Model Agents for Socially-Aware Negotiation Dialogues | [paper] | [code]
[2024/01/10] Bootstrapping LLM-based Task-Oriented Dialogue Agents via Self-Talk | [paper] | [code]
[2024/01/02] CharacterEval: A Chinese Benchmark for Role-Playing Conversational Agent Evaluation | [paper] | [code]
[2023/12/21] Team Flow at DRC2023: Building Common Ground and Text-based Turn-taking in a Travel Agent Spoken Dialogue System | [paper] | [code]
[2023/11/15] ToolTalk: Evaluating Tool-Usage in a Conversational Setting | [paper] | [code]
[2023/10/01] Adapting LLM Agents Through Communication | [paper] | [code]
[2023/06/28] Inferring the Goals of Communicating Agents from Actions and Instructions | [paper] | [code]
[2023/04/26] Multi-Party Chat: Conversational Agents in Group Settings with Humans and Models | [paper] | [code]
[2023/03/31] CAMEL: Communicative Agents for "Mind" Exploration of Large Language Model Society | [paper] | [code]

Game Playing

[2025/06/30] SPIRAL: Self-Play on Zero-Sum Games Incentivizes Reasoning via Multi-Agent Multi-Turn Reinforcement Learning | [paper] | [code]
[2025/06/05] Time to Talk: LLM Agents for Asynchronous Group Communication in Mafia Games | [paper] | [code]
[2025/06/04] TextAtari: 100K Frames Game Playing with Language Agents | [paper] | [code]
[2025/05/29] The Automated but Risky Game: Modeling Agent-to-Agent Negotiations and Transactions in Consumer Markets | [paper] | [code]
[2025/05/28] First Steps Towards Overhearing LLM Agents: A Case Study With Dungeons & Dragons Gameplay | [paper] | [code]
[2025/05/25] When Ethics and Payoffs Diverge: LLM Agents in Morally Charged Social Dilemmas | [paper] | [code]
[2025/05/23] CoMet: Metaphor-Driven Covert Communication for Multi-Agent Language Games | [paper] | [code]
[2025/05/20] BAR: A Backward Reasoning based Agent for Complex Minecraft Tasks | [paper] | [code]
[2025/04/23] Monte Carlo Planning with Large Language Model for Text-Based Game Agents | [paper] | [code]
[2025/04/15] TextArena | [paper] | [code]
[2025/04/09] Persona Dynamics: Unveiling the Impact of Personality Traits on Agents in Text-Based Games | [paper] | [code]
[2025/03/08] DSGBench: A Diverse Strategic Game Benchmark for Evaluating LLM-based Agents in Complex Decision-Making Environments | [paper] | [code]
[2025/03/06] VQEL: Enabling Self-Developed Symbolic Language in Agents through Vector Quantization in Emergent Language Games | [paper] | [code]
[2025/03/06] Factorio Learning Environment | [paper] | [code]
[2025/02/05] Multimodal Transformer Models for Turn-taking Prediction: Effects on Conversational Dynamics of Human-Agent Interaction during Cooperative Gameplay | [paper] | [code]
[2025/02/01] Who's the MVP? A Game-Theoretic Evaluation Benchmark for Modular Attribution in LLM Agents | [paper] | [code]
[2025/01/24] Multi-agent KTO: Reinforcing Strategic Interactions of Large Language Model in Language Game | [paper] | [code]
[2024/12/06] TeamCraft: A Benchmark for Multi-Modal Multi-Agent Systems in Minecraft | [paper] | [code]
[2024/11/08] Game-theoretic LLM: Agent Workflow for Negotiation Games | [paper] | [code]
[2024/10/28] Can Machines Think Like Humans? A Behavioral Evaluation of LLM-Agents in Dictator Games | [paper] | [code]
[2024/09/03] An Implementation of Werewolf Agent That does not Truly Trust LLMs | [paper] | [code]
[2024/08/05] Evaluating and Enhancing LLMs Agent based on Theory of Mind in Guandan: A Multi-Player Cooperative Game under Imperfect Information | [paper] | [code]
[2024/07/23] AMONGAGENTS: Evaluating Large Language Models in the Interactive Text-Based Social Deduction Game | [paper] | [code]
[2024/07/17] A LLM Benchmark based on the Minecraft Builder Dialog Agent Task | [paper] | [code]
[2024/06/27] OmniJARVIS: Unified Vision-Language-Action Tokenization Enables Open-World Instruction Following Agents | [paper] | [code]
[2024/06/07] GameBench: Evaluating Strategic Reasoning Abilities of LLM Agents | [paper] | [code]
[2024/06/05] The Good, the Bad, and the Hulk-like GPT: Analyzing Emotional Decisions of Large Language Models in Cooperation and Bargaining Games | [paper] | [code]
[2024/05/24] Hacc-Man: An Arcade Game for Jailbreaking LLMs | [paper] | [code]
[2024/05/23] Human-Agent Cooperation in Games under Incomplete Information through Natural Language Communication | [paper] | [code]
[2024/05/08] LLMs with Personalities in Multi-issue Negotiation Games | [paper] | [code]
[2024/04/30] PANGeA: Procedural Artificial Narrative using Generative AI for Turn-Based Video Games | [paper] | [code]
[2024/04/03] Learn to Disguise: Avoid Refusal Responses in LLM's Defense via a Multi-agent Attacker-Disguiser Game | [paper] | [code]
[2024/03/28] MineLand: Simulating Large-Scale Multi-Agent Interactions with Limited Multimodal Senses and Physical Needs | [paper] | [code]
[2024/03/18] How Far Are We on the Decision-Making of LLMs? Evaluating LLMs' Gaming Ability in Multi-Agent Environments | [paper] | [code]
[2024/02/19] PsychoGAT: A Novel Psychological Measurement Paradigm through Interactive Fiction Games with LLM Agents | [paper] | [code]
[2024/02/13] Large Language Models as Minecraft Agents | [paper] | [code]
[2024/02/12] Large Language Models as Agents in Two-Player Games | [paper] | [code]
[2024/02/04] Enhance Reasoning for Large Language Models in the Game Werewolf | [paper] | [code]
[2024/02/02] PokeLLMon: A Human-Parity Agent for Pokemon Battles with Large Language Models | [paper] | [code]
[2023/12/29] Cooperation on the Fly: Exploring Language Agents for Ad Hoc Teamwork in the Avalon Game | [paper] | [code]
[2023/12/01] Deciphering Digital Detectives: Understanding LLM Behaviors and Capabilities in Multi-Agent Mystery Games | [paper] | [code]
[2023/10/31] Leveraging Word Guessing Games to Assess the Intelligence of Large Language Models | [paper] | [code]
[2023/09/29] Suspicion-Agent: Playing Imperfect Information Games with Theory of Mind Aware GPT-4 | [paper] | [code]
[2023/09/18] MindAgent: Emergent Gaming Interaction | [paper] | [code]
[2023/09/10] An Appraisal-Based Chain-Of-Emotion Architecture for Affective Language Model Game Agents | [paper] | [code]
[2023/09/09] Exploring Large Language Models for Communication Games: An Empirical Study on Werewolf | [paper] | [code]
[2023/08/23] Are ChatGPT and GPT-4 Good Poker Players? -- A Pre-Flop Analysis | [paper] | [code]
[2023/05/31] Recursive Metropolis-Hastings Naming Game: Symbol Emergence in a Multi-agent System based on Probabilistic Generative Models | [paper] | [code]
[2023/05/26] Playing repeated games with Large Language Models | [paper] | [code]
[2023/05/25] Ghost in the Minecraft: Generally Capable Agents for Open-World Environments via Large Language Models with Text-based Knowledge and Memory | [paper] | [code]
[2023/05/08] Knowledge-enhanced Agents for Interactive Text Games | [paper] | [code]
[2023/04/06] Can Large Language Models Play Text Games Well? Current State-of-the-Art and Open Questions | [paper] | [code]

Human-Agent Interaction

[2025/06/11] A Call for Collaborative Intelligence: Why Human-Agent Systems Should Precede AI Autonomy | [paper] | [code]
[2025/05/16] Talk to Your Slides: Language-Driven Agents for Efficient Slide Editing | [paper] | [code]
[2025/03/26] TAMA: A Human-AI Collaborative Thematic Analysis Framework Using Multi-Agent LLMs for Clinical Interviews | [paper] | [code]
[2025/02/17] Leveraging Dual Process Theory in Language Agent Framework for Real-time Simultaneous Human-AI Collaboration | [paper] | [code]
[2025/02/05] Multimodal Transformer Models for Turn-taking Prediction: Effects on Conversational Dynamics of Human-Agent Interaction during Cooperative Gameplay | [paper] | [code]
[2025/01/28] CowPilot: A Framework for Autonomous and Human-Agent Collaborative Web Navigation | [paper] | [code]
[2024/12/20] Collaborative Gym: A Framework for Enabling and Evaluating Human-Agent Collaboration | [paper] | [code]
[2024/06/28] Designing and Evaluating Multi-Chatbot Interface for Human-AI Communication: Preliminary Findings from a Persuasion Task | [paper] | [code]
[2024/06/11] Towards Human-AI Collaboration in Healthcare: Guided Deferral Systems with Large Language Models | [paper] | [code]
[2024/06/02] Towards a copilot in BIM authoring tool using a large language model-based agent for intelligent human-machine interaction | [paper] | [code]
[2024/03/05] ChatCite: LLM Agent with Human Workflow Guidance for Comparative Literature Summary | [paper] | [code]
[2024/02/20] Large Language Model-based Human-Agent Collaboration for Complex Task Solving | [paper] | [code]
[2024/02/18] Shaping Human-AI Collaboration: Varied Scaffolding Levels in Co-writing with Language Models | [paper] | [code]
[2024/02/17] MONAL: Model Autophagy Analysis for Modeling Human-AI Interactions | [paper] | [code]
[2023/09/22] Learning to Coordinate with Anyone | [paper] | [code]
[2023/07/31] HAGRID: A Human-LLM Collaborative Dataset for Generative Information-Seeking with Attribution | [paper] | [code]
[2023/04/26] Multi-Party Chat: Conversational Agents in Group Settings with Humans and Models | [paper] | [code]

Tool Usage

[2025/07/10] PyVision: Agentic Vision with Dynamic Tooling | [paper] | [code]
[2025/07/09] VisualTrap: A Stealthy Backdoor Attack on GUI Agents via Visual Grounding Manipulation | [paper] | [code]
[2025/07/03] WebSailor: Navigating Super-human Reasoning for Web Agent | [paper] | [code]
[2025/07/02] OpenTable-R1: A Reinforcement Learning Augmented Tool Agent for Open-Domain Table Question Answering | [paper] | [code]
[2025/06/30] LineRetriever: Planning-Aware Observation Reduction for Web Agents | [paper] | [code]
[2025/06/27] More Vulnerable than You Think: On the Stability of Tool-Integrated LLM Agents | [paper] | [code]
[2025/06/24] Doc2Agent: Scalable Generation of Tool-Using Agents from API Documentation | [paper] | [code]
[2025/06/24] NaviAgent: Bilevel Planning on Tool Dependency Graphs for Function Calling | [paper] | [code]
[2025/06/18] Understanding GUI Agent Localization Biases through Logit Sharpness | [paper] | [code]
[2025/06/18] Embodied Web Agents: Bridging Physical-Digital Realms for Integrated Agent Intelligence | [paper] | [code]
[2025/06/17] AgentSynth: Scalable Task Generation for Generalist Computer-Use Agents | [paper] | [code]
[2025/06/12] VideoDeepResearch: Long Video Understanding With Agentic Tool Using | [paper] | [code]
[2025/06/12] Build the web for agents, not agents for the web | [paper] | [code]
[2025/06/10] Atomic-to-Compositional Generalization for Mobile Agents with A New Benchmark and Scheduling System | [paper] | [code]
[2025/06/10] GUIRoboTron-Speech: Towards Automated GUI Agents Based on Speech Instructions | [paper] | [code]
[2025/06/09] CheMatAgent: Enhancing LLMs for Chemistry and Materials Science through Tree-Search Based Tool Learning | [paper] | [code]
[2025/06/04] Go-Browse: Training Web Agents with Structured Exploration | [paper] | [code]
[2025/06/03] GUI-Actor: Coordinate-Free Visual Grounding for GUI Agents | [paper] | [code]
[2025/06/02] AgentCPM-GUI: Building Mobile-Use Agents with Reinforcement Fine-Tuning | [paper] | [code]
[2025/05/30] MedOrch: Medical Diagnosis with Tool-Augmented Reasoning Agents for Flexible Extensibility | [paper] | [code]
[2025/05/28] RedTeamCUA: Realistic Adversarial Testing of Computer-Use Agents in Hybrid Web-OS Environments | [paper] | [code]
[2025/05/28] EvolveSearch: An Iterative Self-Evolving Search Agent | [paper] | [code]
[2025/05/28] UI-Evol: Automatic Knowledge Evolving for Computer Use Agents | [paper] | [code]
[2025/05/28] WebDancer: Towards Autonomous Information Seeking Agency | [paper] | [code]
[2025/05/27] BacktrackAgent: Enhancing GUI Agent with Error Detection and Backtracking Mechanism | [paper] | [code]
[2025/05/27] UI-Genie: A Self-Improving Approach for Iteratively Boosting MLLM-based Mobile GUI Agents | [paper] | [code]
[2025/05/27] ChemHAS: Hierarchical Agent Stacking for Enhancing Chemistry Tools | [paper] | [code]
[2025/05/26] T^2Agent: A Tool-augmented Multimodal Misinformation Detection Agent with Monte Carlo Tree Search | [paper] | [code]
[2025/05/26] WebCoT: Enhancing Web Agent Reasoning by Reconstructing Chain-of-Thought in Reflection, Branching, and Rollback | [paper] | [code]
[2025/05/23] Deep Video Discovery: Agentic Search with Tool Use for Long-form Video Understanding | [paper] | [code]
[2025/05/23] ProgRM: Build Better GUI Agents with Progress Rewards | [paper] | [code]
[2025/05/23] Gaming Tool Preferences in Agentic LLMs | [paper] | [code]
[2025/05/22] WebAgent-R1: Training Web Agents via End-to-End Multi-Turn Reinforcement Learning | [paper] | [code]
[2025/05/22] T1: A Tool-Oriented Conversational Dataset for Multi-Turn Agentic Planning | [paper] | [code]
[2025/05/21] Web-Shepherd: Advancing PRMs for Reinforcing Web Agents | [paper] | [code]
[2025/05/21] X-WebAgentBench: A Multilingual Interactive Web Benchmark for Evaluating Global Agentic System | [paper] | [code]
[2025/05/21] GUI-G1: Understanding R1-Zero-Like Training for Visual Grounding in GUI Agents | [paper] | [code]
[2025/05/21] AgentThink: A Unified Framework for Tool-Augmented Chain-of-Thought Reasoning in Vision-Language Models for Autonomous Driving | [paper] | [code]
[2025/05/20] Mobile-Agent-V: A Video-Guided Approach for Effortless and Efficient Operational Knowledge Injection in Mobile Automation | [paper] | [code]
[2025/05/20] Efficient Agent Training for Computer Use | [paper] | [code]
[2025/05/20] s3: You Don't Need That Much Data to Train a Search Agent via RL | [paper] | [code]
[2025/05/19] GEM: Gaussian Embedding Modeling for Out-of-Distribution Detection in GUI Agents | [paper] | [code]
[2025/05/18] Enhance Mobile Agents Thinking Process Via Iterative Preference Learning | [paper] | [code]
[2025/05/17] Demystifying and Enhancing the Efficiency of Large Language Model Based Search Agents | [paper] | [code]
[2025/05/16] EnvInjection: Environmental Prompt Injection Attack to Multi-modal Web Agents | [paper] | [code]
[2025/05/09] ScaleMCP: Dynamic and Auto-Synchronizing Model Context Protocol Tools for LLM Agents | [paper] | [code]
[2025/04/28] MICE for CATs: Model-Internal Confidence Estimation for Calibrating Agents with Tools | [paper] | [code]
[2025/04/27] AndroidGen: Building an Android Language Agent under Data Scarcity | [paper] | [code]
[2025/04/24] Toward a Human-Centered Evaluation Framework for Trustworthy LLM-Powered GUI Agents | [paper] | [code]
[2025/04/23] WebEvolver: Enhancing Web Agent Self-Improvement with Coevolving World Model | [paper] | [code]
[2025/04/22] Guiding VLM Agents with Process Rewards at Inference Time for GUI Navigation | [paper] | [code]
[2025/04/19] InfiGUI-R1: Advancing Multimodal GUI Agents from Reactive Actors to Deliberative Reasoners | [paper] | [code]
[2025/04/17] WebLists: Extracting Structured Information From Complex Interactive Websites Using Executable LLM Agents | [paper] | [code]
[2025/04/16] Enhancing Web Agents with Explicit Rollback Mechanisms | [paper] | [code]
[2025/04/15] The Obvious Invisible Threat: LLM-Powered GUI Agents' Vulnerability to Fine-Print Injections | [paper] | [code]
[2025/04/14] Breaking the Data Barrier -- Building GUI Agents Through Task Generalization | [paper] | [code]
[2025/04/14] GUI-R1: A Generalist R1-Style Vision-Language Action Model For GUI Agents | [paper] | [code]
[2025/04/09] Inducing Programmatic Skills for Agentic Tasks | [paper] | [code]
[2025/04/09] SkillWeaver: Web Agents can Self-Improve by Discovering and Honing Skills | [paper] | [code]
[2025/04/02] An Illusion of Progress? Assessing the Current State of Web Agents | [paper] | [code]
[2025/04/01] On the Robustness of Agentic Function Calling | [paper] | [code]
[2025/04/01] Agent S2: A Compositional Generalist-Specialist Framework for Computer Use Agents | [paper] | [code]
[2025/03/26] Open Deep Search: Democratizing Search with Open-source Reasoning Agents | [paper] | [code]
[2025/03/24] Safeguarding Mobile GUI Agent via Logic-based Action Verification | [paper] | [code]
[2025/03/18] PLAY2PROMPT: Zero-shot Tool Instruction Optimization for LLM Agents via Tool Play | [paper] | [code]
[2025/03/14] DeskVision: Large Scale Desktop Region Captioning for Advanced GUI Agents | [paper] | [code]
[2025/03/12] Learning to Contextualize Web Pages for Enhanced Decision Making by LLM Agents | [paper] | [code]
[2025/03/10] BEARCUBS: A benchmark for computer-using web agents | [paper] | [code]
[2025/03/06] Measuring temporal effects of agent knowledge by date-controlled tool use | [paper] | [code]
[2025/03/06] SafeArena: Evaluating the Safety of Autonomous Web Agents | [paper] | [code]
[2025/03/04] LiteWebAgent: The Open-Source Suite for VLM-Based Web-Agent Applications | [paper] | [code]
[2025/03/01] Smoothing Grounding and Reasoning for MLLM-Powered GUI Agents with Query-Oriented Pivot Tasks | [paper] | [code]
[2025/02/27] Why Are Web AI Agents More Vulnerable Than Standalone LLMs? A Security Analysis | [paper] | [code]
[2025/02/24] MobileSteward: Integrating Multiple App-Oriented Agents with Self-Evolution to Automate Cross-App Instructions | [paper] | [code]
[2025/02/24] Mobile-Agent-V: Learning Mobile Device Operation Through Video-Guided Multi-Agent Collaboration | [paper] | [code]
[2025/02/17] LLM Agents Making Agent Tools | [paper] | [code]
[2025/02/17] SMART: Self-Aware Agent for Tool Overuse Mitigation | [paper] | [code]
[2025/02/16] OctoTools: An Agentic Framework with Extensible Tools for Complex Reasoning | [paper] | [code]
[2025/02/12] Can a Single Model Master Both Multi-turn Conversations and Tool Use? CoALM: A Unified Conversational Agentic Language Model | [paper] | [code]
[2025/02/07] Agentic Reasoning: Reasoning LLMs with Tools for the Deep Research | [paper] | [code]
[2025/02/06] Division-of-Thoughts: Harnessing Hybrid Language Model Synergy for Efficient On-Device Agents | [paper] | [code]
[2025/02/05] ReachAgent: Enhancing Mobile Agent via Page Reaching and Operation | [paper] | [code]
[2025/01/28] CowPilot: A Framework for Autonomous and Human-Agent Collaborative Web Navigation | [paper] | [code]
[2025/01/21] UI-TARS: Pioneering Automated GUI Interaction with Native Agents | [paper] | [code]
[2025/01/20] Mobile-Agent-E: Self-Evolving Mobile Assistant for Complex Tasks | [paper] | [code]
[2025/01/20] PlotEdit: Natural Language-Driven Accessible Chart Editing in PDFs via Multimodal LLM Agents | [paper] | [code]
[2025/01/08] InfiGUIAgent: A Multimodal Generalist GUI Agent with Native Reasoning and Reflection | [paper] | [code]
[2025/01/08] FinSphere: A Conversational Stock Analysis Agent Equipped with Quantitative Tools based on Real-Time Database | [paper] | [code]
[2025/01/07] PPTAgent: Generating and Evaluating Presentations Beyond Text-to-Slides | [paper] | [code]
[2024/12/28] Efficient Multi-Agent Collaboration with Tool Use for Online Planning in Complex Table Question Answering | [paper] | [code]
[2024/12/21] InfoTech Assistant: A Multimodal Conversational Agent for InfoTechnology Web Portal Queries | [paper] | [code]
[2024/12/12] AgentTrek: Agent Trajectory Synthesis via Guiding Replay with Web Tutorials | [paper] | [code]
[2024/12/08] Cooperative SQL Generation for Segmented Databases By Using Multi-functional LLM Agents | [paper] | [code]
[2024/12/05] Aguvis: Unified Pure Vision Agents for Autonomous GUI Interaction | [paper] | [code]
[2024/11/26] ShowUI: One Vision-Language-Action Model for GUI Visual Agent | [paper] | [code]
[2024/11/22] ScribeAgent: Towards Specialized Web Agents Using Production-Scale Workflow Data | [paper] | [code]
[2024/11/20] AdaptAgent: Adapting Multimodal Web Agents with Few-Shot Learning from Human Demonstrations | [paper] | [code]
[2024/11/15] The Dawn of GUI Agent: A Preliminary Case Study with Claude 3.5 Computer Use | [paper] | [code]
[2024/11/04] WebRL: Training LLM Web Agents via Self-Evolving Online Curriculum Reinforcement Learning | [paper] | [code]
[2024/11/04] Attacking Vision-Language Computer Agents via Pop-ups | [paper] | [code]
[2024/11/02] Infant Agent: A Tool-Integrated, Logic-Driven Agent with Cost-Effective API Usage | [paper] | [code]
[2024/10/28] AutoGLM: Autonomous Foundation Agents for GUIs | [paper] | [code]
[2024/10/25] OpenWebVoyager: Building Multimodal Web Agents via Iterative Real-World Exploration, Feedback and Optimization | [paper] | [code]
[2024/10/24] Infogent: An Agent-Based Framework for Web Information Aggregation | [paper] | [code]
[2024/10/23] ReflecTool: Towards Reflection-Aware Tool-Augmented Clinical Agents | [paper] | [code]
[2024/10/22] Large Language Models Empowered Personalized Web Agents | [paper] | [code]
[2024/10/21] VipAct: Visual-Perception Enhancement via Specialized VLM Agent Collaboration and Tool-use | [paper] | [code]
[2024/10/21] Beyond Browsing: API-Based Web Agents | [paper] | [code]
[2024/10/18] Toolshed: Scale Tool-Equipped Agents with Advanced RAG-Tool Fusion and Tool Knowledge Bases | [paper] | [code]
[2024/10/17] Web Agents with World Models: Learning and Leveraging Environment Dynamics in Web Navigation | [paper] | [code]
[2024/10/17] MeNTi: Bridging Medical Calculator and LLM Agent with Nested Tool Calling | [paper] | [code]
[2024/10/17] MobA: A Two-Level Agent System for Efficient Mobile Task Automation | [paper] | [code]
[2024/10/17] AgentOccam: A Simple Yet Strong Baseline for LLM-Based Web Agents | [paper] | [code]
[2024/10/16] Agent Skill Acquisition for Large Language Models via CycleQD | [paper] | [code]
[2024/10/10] Agent S: An Open Agentic Framework that Uses Computers Like a Human | [paper] | [code]
[2024/10/07] Navigating the Digital World as Humans Do: Universal Visual Grounding for GUI Agents | [paper] | [code]
[2024/10/03] NNetNav: Unsupervised Learning of Browser Agents Through Environment Interaction in the Wild | [paper] | [code]
[2024/09/24] Automated test generation to evaluate tool-augmented LLMs as conversational AI agents | [paper] | [code]
[2024/09/17] EIA: Environmental Injection Attack on Generalist Web Agents for Privacy Leakage | [paper] | [code]
[2024/09/01] TinyAgent: Function Calling at the Edge | [paper] | [code]
[2024/08/30] Tool-Assisted Agent on SQL Inspection and Refinement in Real-World Scenarios | [paper] | [code]
[2024/08/15] VerilogCoder: Autonomous Verilog Coding Agents with Graph-based Planning and Abstract Syntax Tree (AST)-based Waveform Tracing Tool | [paper] | [code]
[2024/08/05] Caution for the Environment: Multimodal Agents are Susceptible to Environmental Distractions | [paper] | [code]
[2024/08/01] OmniParser for Pure Vision Based GUI Agent | [paper] | [code]
[2024/07/26] AppWorld: A Controllable World of Apps and People for Benchmarking Interactive Coding Agents | [paper] | [code]
[2024/07/22] AssistantBench: Can Web Agents Solve Realistic and Time-Consuming Tasks? | [paper] | [code]
[2024/07/11] GTA: A Benchmark for General Tool Agents | [paper] | [code]
[2024/07/01] Mobile-Bench: An Evaluation Benchmark for LLM-based Mobile Agents | [paper] | [code]
[2024/06/17] GUICourse: From General Vision Language Models to Versatile GUI Agents | [paper] | [code]
[2024/06/16] GUI-WORLD: A Dataset for GUI-oriented Multimodal LLM-based Agents | [paper] | [code]
[2024/06/06] Tool-Planner: Task Planning with Clusters across Multiple Tools | [paper] | [code]
[2024/06/03] Mobile-Agent-v2: Mobile Device Operation Assistant with Effective Navigation via Multi-Agent Collaboration | [paper] | [code]
[2024/06/02] Towards a copilot in BIM authoring tool using a large language model-based agent for intelligent human-machine interaction | [paper] | [code]
[2024/05/30] Large Language Models Can Self-Improve At Web Agent Tasks | [paper] | [code]
[2024/05/17] Latent State Estimation Helps UI Agents to Reason | [paper] | [code]
[2024/05/06] SWE-agent: Agent-Computer Interfaces Enable Automated Software Engineering | [paper] | [code]
[2024/05/02] CACTUS: Chemistry Agent Connecting Tool-Usage to Science | [paper] | [code]
[2024/05/01] Navigating WebAI: Training Agents to Complete Web Tasks with Large Language Models and Reinforcement Learning | [paper] | [code]
[2024/04/23] Evaluating Tool-Augmented Agents in Remote Sensing Platforms | [paper] | [code]
[2024/04/17] The Landscape of Emerging AI Agent Architectures for Reasoning, Planning, and Tool Calling: A Survey | [paper] | [code]
[2024/04/17] Octopus v3: Technical Report for On-device Sub-billion Multimodal AI Agent | [paper] | [code]
[2024/04/16] Grounded Language Agent for Product Search via Intelligent Web Interactions | [paper] | [code]
[2024/04/04] AutoWebGLM: A Large Language Model-based Web Navigating Agent | [paper] | [code]
[2024/04/01] Rapid Mobile App Development for Generative AI Agents on MIT App Inventor | [paper] | [code]
[2024/03/05] InjecAgent: Benchmarking Indirect Prompt Injections in Tool-Integrated Large Language Model Agents | [paper] | [code]
[2024/03/05] Android in the Zoo: Chain-of-Action-Thought for GUI Agents | [paper] | [code]
[2024/02/27] BASES: Large-scale Web Search User Simulation with Large Language Model based Agents | [paper] | [code]
[2024/02/26] Look Before You Leap: Towards Decision-Aware and Generalizable Tool-Usage for Large Language Models | [paper] | [code]
[2024/02/23] On the Multi-turn Instruction Following for Conversational Web Agents | [paper] | [code]
[2024/02/20] AgentMD: Empowering Language Agents for Risk Prediction with Large-Scale Clinical Tool Learning | [paper] | [code]
[2024/02/18] SciAgent: Tool-augmented Language Models for Scientific Reasoning | [paper] | [code]
[2024/02/16] ToolSword: Unveiling Safety Issues of Large Language Models in Tool Learning Across Three Stages | [paper] | [code]
[2024/02/08] UFO: A UI-Focused Agent for Windows OS Interaction | [paper] | [code]
[2024/02/06] AnyTool: Self-Reflective, Hierarchical Agents for Large-Scale API Calls | [paper] | [code]
[2024/01/11] EASYTOOL: Enhancing LLM-based Agents with Concise Tool Instruction | [paper] | [code]
[2024/01/03] GPT-4V(ision) is a Generalist Web Agent, if Grounded | [paper] | [code]
[2023/12/21] AppAgent: Multimodal Agents as Smartphone Users | [paper] | [code]
[2023/12/18] CLOVA: A Closed-Loop Visual Assistant with Tool Usage and Update | [paper] | [code]
[2023/12/14] CogAgent: A Visual Language Model for GUI Agents | [paper] | [code]
[2023/11/19] TPTU-v2: Boosting Task Planning and Tool Usage of Large Language Model-based Agents in Real-world Systems | [paper] | [code]
[2023/11/15] ToolTalk: Evaluating Tool-Usage in a Conversational Setting | [paper] | [code]
[2023/11/10] Smart Agent-Based Modeling: On the Use of Large Language Models in Computer Simulations | [paper] | [code]
[2023/10/12] A Zero-Shot Language Agent for Computer Control with Structured Reflection | [paper] | [code]
[2023/08/07] TPTU: Large Language Model-based AI Agents for Task Planning and Tool Usage | [paper] | [code]
[2023/06/09] Mind2Web: Towards a Generalist Agent for the Web | [paper] | [code]
[2023/05/22] Making Language Models Better Tool Learners with Execution Feedback | [paper] | [code]
[2023/05/19] ToolkenGPT: Augmenting Frozen Language Models with Massive Tools via Tool Embeddings | [paper] | [code]

Simulation

[2025/07/10] Automating MD simulations for Proteins using Large language Models: NAMD-Agent | [paper] | [code]
[2025/07/01] TransLaw: Benchmarking Large Language Models in Multi-Agent Simulation of the Collaborative Translation | [paper] | [code]
[2025/06/26] CitySim: Modeling Urban Behaviors and City Dynamics with Large-Scale LLM-Driven Agent Simulation | [paper] | [code]
[2025/06/24] LLM-Based Social Simulations Require a Boundary | [paper] | [code]
[2025/06/23] TrajTok: Technical Report for 2025 Waymo Open Sim Agents Challenge | [paper] | [code]
[2025/06/16] CAMS: A CityGPT-Powered Agentic Framework for Urban Human Mobility Simulation | [paper] | [code]
[2025/06/07] Modeling Earth-Scale Human-Like Societies with One Billion Agents | [paper] | [code]
[2025/06/03] MASTER: Enhancing Large Language Model via Multi-Agent Simulated Teaching | [paper] | [code]
[2025/06/02] LAM SIMULATOR: Advancing Data Generation for Large Action Model Training via Online Exploration and Trajectory Feedback | [paper] | [code]
[2025/05/31] Dyna-Think: Synergizing Reasoning, Acting, and World Model Simulation in AI Agents | [paper] | [code]
[2025/05/28] Scalable, Symbiotic, AI and Non-AI Agent Based Parallel Discrete Event Simulations | [paper] | [code]
[2025/05/26] Embracing Imperfection: Simulating Students with Diverse Cognitive Levels Using LLM-based Agents | [paper] | [code]
[2025/05/25] When Ethics and Payoffs Diverge: LLM Agents in Morally Charged Social Dilemmas | [paper] | [code]
[2025/05/19] Simulation Agent: A Framework for Integrating Simulation and Large Language Models for Enhanced Decision-Making | [paper] | [code]
[2025/05/11] EcoLANG: Efficient and Effective Agent Communication Language Induction for Social Simulation | [paper] | [code]
[2025/04/20] BookWorld: From Novels to Interactive Agent Societies for Creative Story Generation | [paper] | [code]
[2025/04/17] SimUSER: Simulating User Behavior with Large Language Models for Recommender System Evaluation | [paper] | [code]
[2025/04/14] SocioVerse: A World Model for Social Simulation Powered by LLM Agents and A Pool of 10 Million Real-World Users | [paper] | [code]
[2025/04/10] MOSAIC: Modeling Social AI for Content Dissemination and Regulation in Multi-Agent Simulations | [paper] | [code]
[2025/04/04] APIGen-MT: Agentic Pipeline for Multi-Turn Data Generation via Simulated Agent-Human Interplay | [paper] | [code]
[2025/04/04] Algorithmic Prompt Generation for Diverse Human-like Teaming and Communication with Large Language Models | [paper] | [code]
[2025/03/28] Self-Evolving Multi-Agent Simulations for Realistic Clinical Interactions | [paper] | [code]
[2025/03/18] Retrieval-Augmented Simulacra: Generative Agents for Up-to-date and Knowledge-Adaptive Simulations | [paper] | [code]
[2025/03/12] Can A Society of Generative Agents Simulate Human Behavior and Inform Public Health Policy? A Case Study on Vaccine Hesitancy | [paper] | [code]
[2025/02/06] Simulating the Emergence of Differential Case Marking with Communicating Neural-Network Agents | [paper] | [code]
[2025/02/03] Eliciting Language Model Behaviors with Investigator Agents | [paper] | [code]
[2025/02/03] TwinMarket: A Scalable Behavioral and Social Simulation for Financial Markets | [paper] | [code]
[2025/01/25] Are Human Interactions Replicable by Generative Agents? A Case Study on Pronoun Usage in Hierarchical Interactions | [paper] | [code]
[2025/01/19] Self-Explanation in Social AI Agents | [paper] | [code]
[2025/01/12] LLMs Model Non-WEIRD Populations: Experiments with Synthetic Cultural Agents | [paper] | [code]
[2024/12/10] Political Actor Agent: Simulating Legislative System for Roll Call Votes Prediction with Large Language Models | [paper] | [code]
[2024/11/18] OASIS: Open Agent Social Interaction Simulations with One Million Agents | [paper] | [code]
[2024/10/28] ElectionSim: Massive Population Election Simulation Powered by Large Language Model Driven Agents | [paper] | [code]
[2024/10/24] Schema-Guided Culture-Aware Complex Event Simulation with Multi-Agent Role-Play | [paper] | [code]
[2024/10/18] SRAP-Agent: Simulating and Optimizing Scarce Resource Allocation Policy with LLM-based Agent | [paper] | [code]
[2024/10/05] Large Language Models can Achieve Social Balance | [paper] | [code]
[2024/09/25] Plurals: A System for Guiding LLMs Via Simulated Social Ensembles | [paper] | [code]
[2024/09/14] Synergistic Simulations: Multi-Agent Problem Solving with Large Language Models | [paper] | [code]
[2024/09/02] Agentic Society: Merging skeleton from real world and texture from Large Language Model | [paper] | [code]
[2024/08/28] Logic-Enhanced Language Model Agents for Trustworthy Social Simulations | [paper] | [code]
[2024/08/15] AgentCourt: Simulating Court with Adversarial Evolvable Lawyer Agents | [paper] | [code]
[2024/08/03] Self-Emotion Blended Dialogue Generation in Social Simulation Agents | [paper] | [code]
[2024/06/26] Simulating The U.S. Senate: An LLM-Driven Agent Approach to Modeling Legislative Behavior and Bipartisanship | [paper] | [code]
[2024/06/20] Artificial Leviathan: Exploring Social Evolution of LLM Agents Through the Lens of Hobbesian Social Contract Theory | [paper] | [code]
[2024/06/10] Can Language Models Serve as Text-Based World Simulators? | [paper] | [code]
[2024/05/12] Exploring the Potential of Conversational AI Support for Agent-Based Social Simulation Model Design | [paper] | [code]
[2024/04/23] BattleAgent: Multi-modal Dynamic Emulation on Historical Battles to Complement Historical Analysis | [paper] | [code]
[2024/03/20] AgentGroupChat: An Interactive Group Chat Simulacra For Better Eliciting Emergent Behavior | [paper] | [code]
[2024/03/05] AgentsCourt: Building Judicial Decision-Making Agents with Court Debate Simulation and Legal Knowledge Augmentation | [paper] | [code]
[2024/02/26] Unveiling the Truth and Facilitating Change: Towards Agent-based Large-scale Social Movement Simulation | [paper] | [code]
[2024/02/20] What if LLMs Have Different World Views: Simulating Alien Civilizations with LLM-based Agents | [paper] | [code]
[2024/02/07] Can Large Language Model Agents Simulate Human Trust Behavior? | [paper] | [code]
[2024/01/08] SpeechAgents: Human-Communication Simulation with Multi-Modal Multi-Agent Systems | [paper] | [code]
[2023/12/06] LLM as OS, Agents as Apps: Envisioning AIOS, Agents and the AIOS-Agent Ecosystem | [paper] | [code]
[2023/11/28] War and Peace (WarAgent): Large Language Model-based Multi-Agent Simulation of World Wars | [paper] | [code]
[2023/10/10] MetaAgents: Simulating Interactions of Human Behaviors for LLM-based Task-oriented Coordination via Collaborative Generative Agents | [paper] | [code]
[2023/06/05] User Behavior Simulation with Large Language Model based Agents | [paper] | [code]
[2023/05/26] Training Socially Aligned Language Models on Simulated Social Interactions | [paper] | [code]
[2023/04/07] Generative Agents: Interactive Simulacra of Human Behavior | [paper] | [code]

Application

Math

[2025/05/21] ModelingAgent: Bridging LLMs and Mathematical Modeling for Real-World Challenges | [paper] | [code]
[2025/03/23] MathAgent: Leveraging a Mixture-of-Math-Agent Framework for Real-World Multimodal Mathematical Error Detection | [paper] | [code]
[2025/03/05] MA-LoT: Multi-Agent Lean-based Long Chain-of-Thought Reasoning enhances Formal Theorem Proving | [paper] | [code]
[2025/02/25] LLM Knows Geometry Better than Algebra: Numerical Understanding of LLM-Based Agents in A Trading Arena | [paper] | [code]
[2025/02/18] One Size doesn't Fit All: A Personalized Conversational Tutoring Agent for Mathematics Instruction | [paper] | [code]
[2025/02/04] Automating Mathematical Proof Generation Using Large Language Model Agents and Knowledge Graphs | [paper] | [code]
[2024/10/29] Flow-DPO: Improving LLM Mathematical Reasoning through Online Multi-Agent Learning | [paper] | [code]
[2024/10/13] Expanding Search Space with Diverse Prompting Agents: An Efficient Sampling Approach for LLM Mathematical Reasoning | [paper] | [code]
[2024/08/03] MathLearner: A Large Language Model Agent Framework for Learning to Solve Mathematical Problems | [paper] | [code]
[2024/04/10] MathVC: An LLM-Simulated Multi-Character Virtual Classroom for Mathematics Education | [paper] | [code]
[2024/04/06] MACM: Utilizing a Multi-Agent System for Condition Mining in Solving Complex Mathematical Problems | [paper] | [code]

Chemistry

[2025/05/27] ChemHAS: Hierarchical Agent Stacking for Enhancing Chemistry Tools | [paper] | [code]
[2025/04/18] System of Agentic AI for the Discovery of Metal-Organic Frameworks | [paper] | [code]
[2025/03/22] Building Resource-Constrained Language Agents: A Korean Case Study on Chemical Toxicity Information | [paper] | [code]
[2025/01/23] Hypothesis Generation for Materials Discovery and Design Using Goal-Driven and Constraint-Guided LLM Agents | [paper] | [code]
[2025/01/11] ChemAgent: Self-updating Library in Large Language Models Improves Chemical Reasoning | [paper] | [code]
[2024/08/29] HoneyComb: A Flexible LLM-Based Agent System for Materials Science | [paper] | [code]
[2024/06/26] A Review of Large Language Models and Autonomous Agents in Chemistry | [paper] | [code]

Biology

[2025/04/28] m-KAILIN: Knowledge-Driven Agentic Scientific Corpus Distillation Framework for Biomedical Large Language Models Training | [paper] | [code]
[2025/04/08] SkillFlow: Efficient Skill and Code Transfer Through Communication in Adapting AI Agents | [paper] | [code]
[2025/04/07] scAgent: Universal Single-Cell Annotation via a LLM Agent | [paper] | [code]
[2024/10/16] PRefLexOR: Preference-based Recursive Language Modeling for Exploratory Optimization of Reasoning and Agentic Thinking | [paper] | [code]
[2024/06/29] BioKGBench: A Knowledge Graph Checking Benchmark of AI Agent for Biomedical Science | [paper] | [code]
[2024/05/25] GeneAgent: Self-verification Language Agent for Gene Set Knowledge Discovery using Domain Databases | [paper] | [code]
[2024/04/27] CRISPR-GPT: An LLM Agent for Automated Design of Gene-Editing Experiments | [paper] | [code]
[2024/04/03] Empowering Biomedical Discovery with AI Agents | [paper] | [code]
[2024/01/27] ProtAgents: Protein discovery via large language model multi-agent collaborations combining physics and machine learning | [paper] | [code]

Physics

[2025/06/06] Can Theoretical Physics Research Benefit from Language Agents? | [paper] | [code]
[2025/01/23] Hypothesis Generation for Materials Discovery and Design Using Goal-Driven and Constraint-Guided LLM Agents | [paper] | [code]
[2024/12/09] StarWhisper Telescope: Agent-Based Observation Assistant System to Approach AI Astrophysicist | [paper] | [code]
[2024/08/29] HoneyComb: A Flexible LLM-Based Agent System for Materials Science | [paper] | [code]
[2024/01/27] ProtAgents: Protein discovery via large language model multi-agent collaborations combining physics and machine learning | [paper] | [code]

Geography

[2024/12/23] MineAgent: Towards Remote-Sensing Mineral Exploration with Multimodal Large Language Models | [paper] | [code]
[2024/07/13] An Autonomous GIS Agent Framework for Geospatial Data Retrieval | [paper] | [code]

Art

[2025/01/22] FilmAgent: A Multi-Agent Framework for End-to-End Film Automation in Virtual 3D Spaces | [paper] | [code]
[2024/10/02] Agent-Driven Large Language Models for Mandarin Lyric Generation | [paper] | [code]
[2024/09/05] LLM-based multi-agent poetry generation in non-cooperative environments | [paper] | [code]
[2024/08/13] What should I wear to a party in a Greek taverna? Evaluation for Conversational Agents in the Fashion Domain | [paper] | [code]
[2024/07/01] IBSEN: Director-Actor Agent Collaboration for Controllable and Interactive Drama Script Generation | [paper] | [code]
[2024/04/28] ComposerX: Multi-Agent Symbolic Music Composition with LLMs | [paper] | [code]
[2024/03/12] AesopAgent: Agent-driven Evolutionary System on Story-to-Video Production | [paper] | [code]
[2023/10/18] MusicAgent: An AI Agent for Music Understanding and Generation with Large Language Models | [paper] | [code]

Medicine

[2025/07/10] Toward Real-World Chinese Psychological Support Dialogues: CPsDD Dataset and a Co-Evolving Multi-Agent System | [paper] | [code]
[2025/07/03] RLVER: Reinforcement Learning with Verifiable Emotion Rewards for Empathetic Agents | [paper] | [code]
[2025/07/01] STELLA: Self-Evolving LLM Agent for Biomedical Research | [paper] | [code]
[2025/06/27] Exploring Modularity of Agentic Systems for Drug Discovery | [paper] | [code]
[2025/06/26] Large Language Model Agent for Modular Task Execution in Drug Discovery | [paper] | [code]
[2025/06/25] An Agentic System for Rare Disease Diagnosis with Traceable Reasoning | [paper] | [code]
[2025/06/24] MAM: Modular Multi-Agent Framework for Multi-Modal Medical Diagnosis via Role-Specialized Collaboration | [paper] | [code]
[2025/06/18] From RAG to Agentic: Validating Islamic-Medicine Responses with LLM Agents | [paper] | [code]
[2025/06/17] RadFabric: Agentic AI System with Reasoning Capability for Radiology | [paper] | [code]
[2025/06/16] Language Agents for Hypothesis-driven Clinical Decision Making with Reinforcement Learning | [paper] | [code]
[2025/06/13] Large Language Model-Powered Conversational Agent Delivering Problem-Solving Therapy (PST) for Family Caregivers: Enhancing Empathy and Therapeutic Alliance Using In-Context Learning | [paper] | [code]
[2025/06/12] Neural at ArchEHR-QA 2025: Agentic Prompt Optimization for Evidence-Grounded Clinical Question Answering | [paper] | [code]
[2025/06/11] ReasonMed: A 370K Multi-Agent Generated Dataset for Advancing Medical Reasoning | [paper] | [code]
[2025/06/04] AI Agents for Conversational Patient Triage: Preliminary Simulation-Based Evaluation with Real-World EHR Data | [paper] | [code]
[2025/06/04] MedAgentGym: Training LLM Agents for Code-Based Medical Reasoning at Scale | [paper] | [code]
[2025/05/31] MMedAgent-RL: Optimizing Multi-Agent Collaboration for Multimodal Medical Reasoning | [paper] | [code]
[2025/05/30] MedOrch: Medical Diagnosis with Tool-Augmented Reasoning Agents for Flexible Extensibility | [paper] | [code]
[2025/05/27] Silence is Not Consensus: Disrupting Agreement Bias in Multi-Agent LLMs via Catfish Agent for Clinical Decision Making | [paper] | [code]
[2025/05/27] BehaviorSFT: Behavioral Token Conditioning for Clinical Agents Across the Proactivity Spectrum | [paper] | [code]
[2025/05/24] DDO: Dual-Decision Optimization via Multi-Agent Collaboration for LLM-Based Medical Consultation | [paper] | [code]
[2025/05/21] A Risk Taxonomy for Evaluating AI-Powered Psychotherapy Agents | [paper] | [code]
[2025/05/18] MedAgentBoard: Benchmarking Multi-Agent Collaboration with Conventional Methods for Diverse Medical Tasks | [paper] | [code]
[2025/05/06] FRAME: Feedback-Refined Agent Methodology for Enhancing Medical Research Insights | [paper] | [code]
[2025/04/30] Talk Before You Retrieve: Agent-Led Discussions for Better RAG in Medical QA | [paper] | [code]
[2025/04/28] m-KAILIN: Knowledge-Driven Agentic Scientific Corpus Distillation Framework for Biomedical Large Language Models Training | [paper] | [code]
[2025/04/25] MAGI: Multi-Agent Guided Interview for Psychiatric Assessment | [paper] | [code]
[2025/04/13] EmoAgent: Assessing and Safeguarding Human-AI Interaction for Mental Health Safety | [paper] | [code]
[2025/04/08] TxGemma: Efficient and Agentic LLMs for Therapeutics | [paper] | [code]
[2025/04/04] YaleNLP @ PerAnsSumm 2025: Multi-Perspective Integration via Mixture-of-Agents for Enhanced Healthcare QA Summarization | [paper] | [code]
[2025/03/28] Self-Evolving Multi-Agent Simulations for Realistic Clinical Interactions | [paper] | [code]
[2025/03/26] TAMA: A Human-AI Collaborative Thematic Analysis Framework Using Multi-Agent LLMs for Clinical Interviews | [paper] | [code]
[2025/03/26] 3MDBench: Medical Multimodal Multi-agent Dialogue Benchmark | [paper] | [code]
[2025/03/21] Autonomous Radiotherapy Treatment Planning Using DOLA: A Privacy-Preserving, LLM-Based Optimization Agent | [paper] | [code]
[2025/03/19] When Pigs Get Sick: Multi-Agent AI for Swine Disease Detection | [paper] | [code]
[2025/03/19] EmpathyAgent: Can Embodied Agents Conduct Empathetic Actions? | [paper] | [code]
[2025/03/17] MAP: Evaluation and Multi-Agent Enhancement of Large Language Models for Inpatient Pathways | [paper] | [code]
[2025/03/10] MedAgentsBench: Benchmarking Thinking Models and Agent Frameworks for Complex Medical Reasoning | [paper] | [code]
[2025/03/07] GEMA-Score: Granular Explainable Multi-Agent Score for Radiology Report Evaluation | [paper] | [code]
[2025/03/07] Multi Agent based Medical Assistant for Edge Devices | [paper] | [code]
[2025/02/27] M^3Builder: A Multi-Agent System for Automated Machine Learning in Medical Imaging | [paper] | [code]
[2025/02/26] MEDDxAgent: A Unified Modular Agent Framework for Explainable Automatic Differential Diagnosis | [paper] | [code]
[2025/02/25] Scaffolding Empathy: Training Counselors with Simulated Patients and Utterance-level Performance Visualizations | [paper] | [code]
[2025/02/24] Improving Interactive Diagnostic Ability of a Large Language Model Agent Through Clinical Experience Learning | [paper] | [code]
[2025/02/19] LIDDIA: Language-based Intelligent Drug Discovery Agent | [paper] | [code]
[2025/02/18] An LLM-Powered Agent for Physiological Data Analysis: A Case Study on PPG-based Heart Rate Estimation | [paper] | [code]
[2025/02/18] Sleepless Nights, Sugary Days: Creating Synthetic Users with Health Conditions for Realistic Coaching Agent Interactions | [paper] | [code]
[2025/02/13] PathFinder: A Multi-Modal Multi-Agent System for Medical Diagnostic Decision-Making Applied to Histopathology | [paper] | [code]
[2025/02/09] HamRaz: A Culture-Based Persian Conversation Dataset for Person-Centered Therapy Using LLM Agents | [paper] | [code]
[2025/02/09] The Application of MATEC (Multi-AI Agent Team Care) Framework in Sepsis Care | [paper] | [code]
[2025/02/05] CAMI: A Counselor Agent Supporting Motivational Interviewing through State Inference and Topic Exploration | [paper] | [code]
[2025/02/02] Agent-Based Uncertainty Awareness Improves Automated Radiology Report Labeling with an Open-Source Large Language Model | [paper] | [code]
[2025/01/27] MADP: Multi-Agent Deductive Planning for Enhanced Cognitive-Behavioral Mental Health Question Answer | [paper] | [code]
[2025/01/16] AutoCBT: An Autonomous Multi-agent Framework for Cognitive Behavioral Therapy in Psychological Counseling | [paper] | [code]
[2025/01/03] PSYCHE: A Multi-faceted Patient Simulation Framework for Evaluation of Psychiatric Assessment Conversational Agents | [paper] | [code]
[2024/12/19] PsyDraw: A Multi-Agent Multimodal System for Mental Health Screening in Left-Behind Children | [paper] | [code]
[2024/12/17] RareAgents: Advancing Rare Disease Care through LLM-Empowered Multi-disciplinary Team | [paper] | [code]
[2024/12/16] LLMs Can Simulate Standardized Patients via Agent Coevolution | [paper] | [code]
[2024/12/13] Script-Based Dialog Policy Planning for LLM-Powered Conversational Agents: A Basic Architecture for an "AI Therapist" | [paper] | [code]
[2024/12/05] Educational-Psychological Dialogue Robot Based on Multi-Agent Collaboration | [paper] | [code]
[2024/12/02] Medchain: Bridging the Gap Between LLM Agents and Clinical Practice through Interactive Sequential Benchmarking | [paper] | [code]
[2024/11/21] PIORS: Personalized Intelligent Outpatient Reception based on Large Language Model with Multi-Agents Medical Scenario Simulation | [paper] | [code]
[2024/11/16] Towards Next-Generation Medical Agent: How o1 is Reshaping Decision-Making in Medical Scenarios | [paper] | [code]
[2024/11/03] EcoAct: Economic Agent Determines When to Register What Action | [paper] | [code]
[2024/10/25] PatentAgent: Intelligent Agent for Automated Pharmaceutical Patent Analysis | [paper] | [code]
[2024/10/23] ReflecTool: Towards Reflection-Aware Tool-Augmented Clinical Agents | [paper] | [code]
[2024/10/17] MeNTi: Bridging Medical Calculator and LLM Agent with Nested Tool Calling | [paper] | [code]
[2024/10/16] MedAide: Towards an Omni Medical Aide via Specialized LLM-based Multi-Agent Collaboration | [paper] | [code]
[2024/10/02] Zodiac: A Cardiologist-Level LLM Framework for Multi-Agent Diagnostics | [paper] | [code]
[2024/08/28] Interactive Agents: Simulating Counselor-Client Psychological Counseling via Role-Playing LLM-to-LLM Interactions | [paper] | [code]
[2024/08/23] DrugAgent: Explainable Drug Repurposing Agent with Large Language Model-based Reasoning | [paper] | [code]
[2024/08/14] Development of a Large Language Model-based Multi-Agent Clinical Decision Support System for Korean Triage and Acuity Scale (KTAS)-Based Triage and Treatment Planning in Emergency Departments | [paper] | [code]
[2024/07/18] CoD, Towards an Interpretable Medical Agent using Chain of Diagnosis | [paper] | [code]
[2024/07/10] Virtual Agents for Alcohol Use Counseling: Exploring LLM-Powered Motivational Interviewing | [paper] | [code]
[2024/07/03] MentalAgora: A Gateway to Advanced Personalized Care in Mental Health through Multi-Agent Debating and Attribute Control | [paper] | [code]
[2024/07/02] MMedAgent: Learning to Use Medical Tools with Multi-modal Agent | [paper] | [code]
[2024/04/23] ClinicalAgent: Clinical Trial Multi-Agent System with Large Language Model-based Reasoning | [paper] | [code]
[2024/04/03] Empowering Biomedical Discovery with AI Agents | [paper] | [code]
[2024/02/20] Can Large Language Models be Used to Provide Psychological Counselling? An Analysis of GPT-4-Generated Responses Using Role-play Dialogues | [paper] | [code]
[2024/02/20] AgentMD: Empowering Language Agents for Risk Prediction with Large-Scale Clinical Tool Learning | [paper] | [code]
[2024/02/15] Knowledge-Infused LLM-Powered Conversational Health Agent: A Case Study for Diabetes Patients | [paper] | [code]
[2024/02/01] Generation, Distillation and Evaluation of Motivational Interviewing-Style Reflections with a Foundational Language Model | [paper] | [code]
[2023/12/19] Can ChatGPT be Your Personal Medical Assistant? | [paper] | [code]
[2023/10/03] Exploring Collaboration Mechanisms for LLM Agents: A Social Psychology View | [paper] | [code]

Finance

[2025/07/08] ECom-Bench: Can LLM Agent Resolve Real-World E-commerce Customer Support Issues? | [paper] | [code]
[2025/07/07] MindFlow: Revolutionizing E-commerce Customer Support with Multimodal LLM Agents | [paper] | [code]
[2025/06/10] Improved LLM Agents for Financial Document Question Answering | [paper] | [code]
[2025/06/09] EconWebArena: Benchmarking Autonomous Agents on Economic Tasks in Realistic Web Environments | [paper] | [code]
[2025/05/20] Hidden Ghost Hand: Unveiling Backdoor Vulnerabilities in MLLM-Powered Mobile GUI Agents | [paper] | [code]
[2025/04/08] Are Generative AI Agents Effective Personalized Financial Advisors? | [paper] | [code]
[2025/04/07] AI for Climate Finance: Agentic Retrieval and Multi-Step Reasoning for Early Warning System Investments | [paper] | [code]
[2025/03/27] EQ-Negotiator: An Emotion-Reasoning LLM Agent in Credit Dialogues | [paper] | [code]
[2025/03/05] Cite Before You Speak: Enhancing Context-Response Grounding in E-commerce Conversational LLM-Agents | [paper] | [code]
[2025/02/25] LLM Knows Geometry Better than Algebra: Numerical Understanding of LLM-Based Agents in A Trading Arena | [paper] | [code]
[2025/02/08] Agentic AI Systems Applied to tasks in Financial Services: Modeling and model risk management crews | [paper] | [code]
[2025/02/01] MarketSenseAI 2.0: Enhancing Stock Analysis through LLM Agents | [paper] | [code]
[2025/01/08] FinSphere: A Conversational Stock Analysis Agent Equipped with Quantitative Tools based on Real-Time Database | [paper] | [code]
[2024/12/27] OS-Genesis: Automating GUI Agent Trajectory Construction via Reverse Task Synthesis | [paper] | [code]
[2024/12/19] Beyond the Sum: Unlocking AI Agents Potential Through Market Forces | [paper] | [code]
[2024/11/07] Enhancing Investment Analysis: Optimizing AI-Agent Collaboration in Financial Research | [paper] | [code]
[2024/10/29] Enhancing Financial Question Answering with a Multi-Agent Reflection Framework | [paper] | [code]
[2024/09/19] Strategic Collusion of LLM Agents: Market Division in Multi-Commodity Competitions | [paper] | [code]
[2024/07/18] dzFinNlp at AraFinNLP: Improving Intent Detection in Financial Conversational Agents | [paper] | [code]
[2024/07/09] FinCon: A Synthesized LLM Multi-Agent System with Conceptual Verbal Reinforcement for Enhanced Financial Decision Making | [paper] | [code]
[2024/07/05] Towards Automated Functional Equation Proving: A Benchmark Dataset and A Domain-Specific In-Context Agent | [paper] | [code]
[2024/05/07] Enhancing the Efficiency and Accuracy of Underlying Asset Reviews in Structured Finance: The Application of Multi-agent Framework | [paper] | [code]

Software Engineering

[2025/06/13] Agent-RLVR: Training Software Engineering Agents via Guidance and Environment Rewards | [paper] | [code]
[2025/06/04] MedAgentGym: Training LLM Agents for Code-Based Medical Reasoning at Scale | [paper] | [code]
[2025/06/03] Coding Agents with Multimodal Browsing are Generalist Problem Solvers | [paper] | [code]
[2025/05/28] Co-Saving: Resource Aware Multi-Agent Collaboration for Software Development | [paper] | [code]
[2025/05/26] Vibe Coding vs. Agentic Coding: Fundamentals and Practical Implications of Agentic AI | [paper] | [code]
[2025/05/26] SWE-rebench: An Automated Pipeline for Task Collection and Decontaminated Evaluation of Software Engineering Agents | [paper] | [code]
[2025/05/24] SEW: Self-Evolving Agentic Workflows for Automated Code Generation | [paper] | [code]
[2025/05/22] Optimizing LLM-Based Multi-Agent System with Textual Feedback: A Case Study on Software Development | [paper] | [code]
[2025/05/19] Guided Search Strategies in Non-Serializable Environments with Applications to Software Engineering Agents | [paper] | [code]
[2025/05/13] LibVulnWatch: A Deep Assessment Agent System and Leaderboard for Uncovering Hidden Vulnerabilities in Open-Source AI Libraries | [paper] | [code]
[2025/04/30] SWE-smith: Scaling Data for Software Engineering Agents | [paper] | [code]
[2025/04/28] ResearchCodeAgent: An LLM Multi-Agent System for Automated Codification of Research Methodologies | [paper] | [code]
[2025/04/18] CodeVisionary: An Agent-based Framework for Evaluating Large Language Models in Code Generation | [paper] | [code]
[2025/04/09] R2E-Gym: Procedural Environments and Hybrid Verifiers for Scaling Open-Weights SWE Agents | [paper] | [code]
[2025/03/27] GateLens: A Reasoning-Enhanced LLM Agent for Automotive Software Release Analytics | [paper] | [code]
[2025/03/24] Verbal Process Supervision Elicits Better Coding Agents | [paper] | [code]
[2025/03/18] DARS: Dynamic Action Re-Sampling to Enhance Coding Agent Performance by Adaptive Tree Traversal | [paper] | [code]
[2025/03/12] LocAgent: Graph-Guided LLM Agents for Code Localization | [paper] | [code]
[2025/03/10] ProjectEval: A Benchmark for Programming Agents Automated Evaluation on Project-Level Code Generation | [paper] | [code]
[2025/02/19] An LLM-based Agent for Reliable Docker Environment Configuration | [paper] | [code]
[2025/02/18] Training Turn-by-Turn Verifiers for Dialogue Tutoring Agents: The Curious Case of LLMs as Your Coding Tutors | [paper] | [code]
[2025/02/18] UXAgent: An LLM Agent-Based Usability Testing Framework for Web Design | [paper] | [code]
[2025/02/14] The Ann Arbor Architecture for Agent-Oriented Programming | [paper] | [code]
[2025/02/11] Multi-Agent Collaboration for Multilingual Code Instruction Tuning | [paper] | [code]
[2025/02/10] SyncMind: Measuring Agent Out-of-Sync Recovery in Collaborative Software Engineering | [paper] | [code]
[2025/02/08] CODESIM: Multi-Agent Code Generation and Problem Solving through Simulation-Driven Planning and Debugging | [paper] | [code]
[2024/12/30] Training Software Engineering Agents and Verifiers with SWE-Gym | [paper] | [code]
[2024/12/24] Molly: Making Large Language Model Agents Solve Python Problem More Logically | [paper] | [code]
[2024/12/16] Seeker: Towards Exception Safety Code Generation with Intermediate Language Agents Framework | [paper] | [code]
[2024/11/07] CodeTree: Agent-guided Tree Search for Code Generation with Large Language Models | [paper] | [code]
[2024/10/29] SceneGenAgent: Precise Industrial Scene Generation with Coding Agent | [paper] | [code]
[2024/10/09] DA-Code: Agent Data Science Code Generation Benchmark for Large Language Models | [paper] | [code]
[2024/10/09] Seeker: Enhancing Exception Handling in Code with LLM-based Multi-Agent Approach | [paper] | [code]
[2024/09/02] Co-Learning: Code Learning for Multi-Agent Reinforcement Collaborative Framework with Conversational Natural Language Interfaces | [paper] | [code]
[2024/08/19] GoNoGo: An Efficient LLM-based Multi-Agent System for Streamlining Automotive Software Release Decision-Making | [paper] | [code]
[2024/08/13] Diversity Empowers Intelligence: Integrating Expertise of Software Engineering Agents | [paper] | [code]
[2024/08/05] LLM Agents Improve Semantic Code Search | [paper] | [code]
[2024/07/26] AppWorld: A Controllable World of Apps and People for Benchmarking Interactive Coding Agents | [paper] | [code]
[2024/07/01] Agentless: Demystifying LLM-based Software Engineering Agents | [paper] | [code]
[2024/06/13] Multi-Agent Software Development through Cross-Team Collaboration | [paper] | [code]
[2024/05/06] SWE-agent: Agent-Computer Interfaces Enable Automated Software Engineering | [paper] | [code]
[2024/04/11] Behavior Trees Enable Structured Programming of Language Model Agents | [paper] | [code]
[2024/04/02] Self-Organized Agents: A LLM Multi-Agent Framework toward Ultra Large-Scale Code Generation and Optimization | [paper] | [code]
[2024/03/02] SceneCraft: An LLM Agent for Synthesizing 3D Scene as Blender Code | [paper] | [code]
[2024/02/26] RepoAgent: An LLM-Powered Open-Source Framework for Repository-level Code Documentation Generation | [paper] | [code]
[2024/02/19] WorldCoder, a Model-Based LLM Agent: Building World Models by Writing Code and Interacting with the Environment | [paper] | [code]
[2024/02/02] StepCoder: Improve Code Generation with Reinforcement Learning from Compiler Feedback | [paper] | [code]
[2024/02/01] Executable Code Actions Elicit Better LLM Agents | [paper] | [code]
[2023/12/28] Experiential Co-Learning of Software-Developing Agents | [paper] | [code]
[2023/12/20] AgentCoder: Multi-Agent-based Code Generation with Iterative Testing and Optimisation | [paper] | [code]
[2023/07/27] PanGu-Coder2: Boosting Large Language Models for Code with Ranking Feedback | [paper] | [code]
[2023/07/16] ChatDev: Communicative Agents for Software Development | [paper] | [code]
[2023/04/15] Self-collaboration Code Generation via ChatGPT | [paper] | [code]

Research

[2025/07/01] STELLA: Self-Evolving LLM Agent for Biomedical Research | [paper] | [code]
[2025/06/27] RExBench: Can coding agents autonomously implement AI research extensions? | [paper] | [code]
[2025/06/25] Language Modeling by Language Models | [paper] | [code]
[2025/06/23] From Web Search towards Agentic Deep Research: Incentivizing Search with Reasoning Agents | [paper] | [code]
[2025/06/12] VideoDeepResearch: Long Video Understanding With Agentic Tool Using | [paper] | [code]
[2025/06/06] Can Theoretical Physics Research Benefit from Language Agents? | [paper] | [code]
[2025/05/30] Unifying Language Agent Algorithms with Graph-based Orchestration Engine for Reproducible Agent Research | [paper] | [code]
[2025/05/29] Large Language Model-Based Agents for Automated Research Reproducibility: An Exploratory Study in Alzheimer's Disease | [paper] | [code]
[2025/05/26] MLR-Bench: Evaluating AI Agents on Open-Ended Machine Learning Research | [paper] | [code]
[2025/05/22] BioDSA-1K: Benchmarking Data Science Agents for Biomedical Research | [paper] | [code]
[2025/05/22] NovelSeek: When Agent Becomes the Scientist -- Building Closed-Loop System from Hypothesis to Verification | [paper] | [code]
[2025/04/28] ResearchCodeAgent: An LLM Multi-Agent System for Automated Codification of Research Methodologies | [paper] | [code]
[2025/04/21] Completing A Systematic Review in Hours instead of Months with Interactive AI Agents | [paper] | [code]
[2025/04/10] CollEX -- A Multimodal Agentic RAG System Enabling Interactive Exploration of Scientific Collections | [paper] | [code]
[2025/04/10] The AI Scientist-v2: Workshop-Level Automated Scientific Discovery via Agentic Tree Search | [paper] | [code]
[2025/04/02] Automated Survey Collection with LLM-based Conversational Agents | [paper] | [code]
[2025/03/23] AgentRxiv: Towards Collaborative Autonomous Research | [paper] | [code]
[2025/03/12] Agentic AI for Scientific Discovery: A Survey of Progress, Challenges, and Future Directions | [paper] | [code]
[2025/03/11] ReviewAgents: Bridging the Gap Between Human and AI-Generated Paper Reviews | [paper] | [code]
[2025/02/25] LAG: LLM agents for Leaderboard Auto Generation on Demanding | [paper] | [code]
[2025/02/20] MLGym: A New Framework and Benchmark for Advancing AI Research Agents | [paper] | [code]
[2025/02/07] Agentic Reasoning: Reasoning LLMs with Tools for the Deep Research | [paper] | [code]
[2025/01/08] Agent Laboratory: Using LLM Agents as Research Assistants | [paper] | [code]
[2024/10/17] Chain of Ideas: Revolutionizing Research Via Novel Idea Development with LLM Agents | [paper] | [code]
[2024/10/12] Many Heads Are Better Than One: Improved Scientific Idea Generation by A LLM-Based Multi-Agent System | [paper] | [code]
[2024/10/07] ScienceAgentBench: Toward Rigorous Assessment of Language Agents for Data-Driven Scientific Discovery | [paper] | [code]
[2024/10/07] ImProver: Agent-Based Automated Proof Optimization | [paper] | [code]
[2024/09/23] Towards a Realistic Long-Term Benchmark for Open-Web Research Agents | [paper] | [code]
[2024/09/17] CORE-Bench: Fostering the Credibility of Published Research Through a Computational Reproducibility Agent Benchmark | [paper] | [code]
[2024/09/12] DSBench: How Far Are Data Science Agents to Becoming Data Science Experts? | [paper] | [code]
[2024/09/11] SUPER: Evaluating Agents on Setting Up and Executing Tasks from Research Repositories | [paper] | [code]
[2024/09/10] Language agents achieve superhuman synthesis of scientific knowledge | [paper] | [code]
[2024/09/09] SciAgents: Automating scientific discovery through multi-agent intelligent graph reasoning | [paper] | [code]
[2024/08/26] MLR-Copilot: Autonomous Machine Learning Research based on Large Language Models Agents | [paper] | [code]
[2024/08/20] Automating Knowledge Discovery from Scientific Literature via LLMs: A Dual-Agent Approach with Progressive Ontology Prompting | [paper] | [code]
[2024/06/13] ResearchArena: Benchmarking Large Language Models' Ability to Collect and Organize Information as Research Agents | [paper] | [code]
[2024/05/02] CACTUS: Chemistry Agent Connecting Tool-Usage to Science | [paper] | [code]
[2024/04/09] SurveyAgent: A Conversational System for Personalized and Efficient Research Survey | [paper] | [code]
[2024/02/28] Data Interpreter: An LLM Agent For Data Science | [paper] | [code]
[2024/02/18] SciAgent: Tool-augmented Language Models for Scientific Reasoning | [paper] | [code]
[2024/02/06] Prioritizing Safeguarding Over Autonomy: Risks of LLM Agents for Science | [paper] | [code]
[2024/01/08] MARG: Multi-Agent Review Generation for Scientific Papers | [paper] | [code]

Automation

Workflow

[2025/06/02] Follow the Flow: Fine-grained Flowchart Attribution with Neurosymbolic Agents | [paper] | [code]
[2025/05/26] ScienceBoard: Evaluating Multimodal Autonomous Agents in Realistic Scientific Workflows | [paper] | [code]
[2025/04/17] MetaSynth: Meta-Prompting-Driven Agentic Scaffolds for Diverse Synthetic Data Generation | [paper] | [code]
[2025/02/24] Turning Conversations into Workflows: A Framework to Extract and Evaluate Dialog Workflows for Service AI Agents | [paper] | [code]
[2025/02/11] EvoFlow: Evolving Diverse Agentic Workflows On The Fly | [paper] | [code]
[2025/02/07] nvAgent: Automated Data Visualization from Natural Language via Collaborative Agent Workflow | [paper] | [code]
[2025/02/06] ScoreFlow: Mastering LLM Agent Workflows via Score-based Preference Optimization | [paper] | [code]
[2024/12/17] An Agentic Approach to Automatic Creation of P&ID Diagrams from Natural Language Descriptions | [paper] | [code]
[2024/12/15] LAW: Legal Agentic Workflows for Custody and Fund Services Contracts | [paper] | [code]
[2024/11/22] ScribeAgent: Towards Specialized Web Agents Using Production-Scale Workflow Data | [paper] | [code]
[2024/11/12] BudgetMLAgent: A Cost-Effective LLM Multi-Agent system for Automating Machine Learning Tasks | [paper] | [code]
[2024/11/08] Game-theoretic LLM: Agent Workflow for Negotiation Games | [paper] | [code]
[2024/10/24] An LLM Agent for Automatic Geospatial Data Analysis | [paper] | [code]
[2024/10/17] From Barriers to Tactics: A Behavioral Science-Informed Agentic Workflow for Personalized Nutrition Coaching | [paper] | [code]
[2024/10/17] ControlAgent: Automating Control System Design via Novel Integration of LLM Agents and Domain Expertise | [paper] | [code]
[2024/10/16] Proactive Agent: Shifting LLM Agents from Reactive Responses to Active Assistance | [paper] | [code]
[2024/10/14] AFlow: Automating Agentic Workflow Generation | [paper] | [code]
[2024/10/10] Benchmarking Agentic Workflow Generation | [paper] | [code]
[2024/10/03] AutoML-Agent: A Multi-Agent LLM Framework for Full-Pipeline AutoML | [paper] | [code]
[2024/09/11] Agent Workflow Memory | [paper] | [code]
[2024/08/16] The Fellowship of the LLMs: Multi-Agent Workflows for Synthetic Preference Optimization Dataset Generation | [paper] | [code]
[2024/07/15] Spider2-V: How Far Are Multimodal Agents From Automating Data Science and Engineering Workflows? | [paper] | [code]
[2024/07/03] AgentInstruct: Toward Generative Teaching with Agentic Flows | [paper] | [code]
[2024/07/01] AutoFlow: Automated Workflow Generation for Large Language Model Agents | [paper] | [code]
[2024/06/21] Autonomous Agents for Collaborative Task under Information Asymmetry | [paper] | [code]
[2024/03/13] AutoGuide: Automated Generation and Selection of Context-Aware Guidelines for Large Language Model Agents | [paper] | [code]
[2024/03/05] ChatCite: LLM Agent with Human Workflow Guidance for Comparative Literature Summary | [paper] | [code]

Automatic Evaluation

[2025/06/26] Mind2Web 2: Evaluating Agentic Search with Agent-as-a-Judge | [paper] | [code]
[2025/06/23] AI Agents-as-Judge: Automated Assessment of Accuracy, Consistency, Completeness and Clarity for Enterprise Documents | [paper] | [code]
[2025/06/08] Manifesto from Dagstuhl Perspectives Workshop 24352 -- Conversational Agents: A Framework for Evaluation (CAFE) | [paper] | [code]
[2025/05/22] HiMATE: A Hierarchical Multi-Agent Framework for Machine Translation Evaluation | [paper] | [code]
[2025/05/21] UrduFactCheck: An Agentic Fact-Checking Framework for Urdu with Evidence Boosting and Benchmarking | [paper] | [code]
[2025/05/21] AGENT-X: Adaptive Guideline-based Expert Network for Threshold-free AI-generated teXt detection | [paper] | [code]
[2025/05/20] CAFES: A Collaborative Multi-Agent Framework for Multi-Granular Multimodal Essay Scoring | [paper] | [code]
[2025/05/18] ESC-Judge: A Framework for Comparing Emotional Support Conversational Agents | [paper] | [code]
[2025/05/13] TRAIL: Trace Reasoning and Agentic Issue Localization | [paper] | [code]
[2025/05/05] AutoLibra: Agent Metric Induction from Open-Ended Feedback | [paper] | [code]
[2025/05/01] Sentient Agent as a Judge: Evaluating Higher-Order Social Cognition in Large Language Models | [paper] | [code]
[2025/04/21] EvalAgent: Discovering Implicit Evaluation Criteria from the Web | [paper] | [code]
[2025/04/09] A Unified Agentic Framework for Evaluating Conditional Image Generation | [paper] | [code]
[2025/04/01] VerifiAgent: a Unified Verification Agent in Language Model Reasoning | [paper] | [code]
[2025/04/01] Multi-Agent LLM Judge: automatic personalized LLM judge design for evaluating natural language generation applications | [paper] | [code]
[2025/03/07] GEMA-Score: Granular Explainable Multi-Agent Score for Radiology Report Evaluation | [paper] | [code]
[2025/02/26] Agentic Reward Modeling: Integrating Human Preferences with Verifiable Correctness Signals for Reliable Reward Systems | [paper] | [code]
[2025/02/25] Debt Collection Negotiations with Large Language Models: An Evaluation System and Optimizing Decision Making with Multi-Agent | [paper] | [code]
[2025/02/25] FACT-AUDIT: An Adaptive Multi-Agent Framework for Dynamic Fact-Checking Evaluation of Large Language Models | [paper] | [code]
[2025/02/14] Automated Hypothesis Validation with Agentic Sequential Falsifications | [paper] | [code]
[2025/01/19] IntellAgent: A Multi-Agent Framework for Evaluating Conversational AI Systems | [paper] | [code]
[2025/01/17] Agent-as-Judge for Factual Summarization of Long Narratives | [paper] | [code]
[2025/01/03] PSYCHE: A Multi-faceted Patient Simulation Framework for Evaluation of Psychiatric Assessment Conversational Agents | [paper] | [code]
[2024/12/28] M-MAD: Multidimensional Multi-Agent Debate for Advanced Machine Translation Evaluation | [paper] | [code]
[2024/12/10] Evaluation Agent: Efficient and Promptable Evaluation Framework for Visual Generative Models | [paper] | [code]
[2024/11/25] SAGEval: The frontiers of Satisfactory Agent based NLG Evaluation for reference-free open-ended text | [paper] | [code]
[2024/11/15] Large Language Models as User-Agents for Evaluating Task-Oriented-Dialogue Systems | [paper] | [code]
[2024/09/24] Automated test generation to evaluate tool-augmented LLMs as conversational AI agents | [paper] | [code]
[2024/09/22] The Ability of Large Language Models to Evaluate Constraint-satisfaction in Agent Responses to Open-ended Requests | [paper] | [code]
[2024/09/13] Safeguarding Decentralized Social Media: LLM Agents for Automating Community Rule Compliance | [paper] | [code]
[2024/05/23] ALI-Agent: Assessing LLMs' Alignment with Human Values via Agent-based Evaluation | [paper] | [code]
[2024/03/28] MATEval: A Multi-Agent Discussion Framework for Advancing Open-Ended Text Evaluation | [paper] | [code]
[2023/08/14] ChatEval: Towards Better LLM-based Evaluators through Multi-Agent Debate | [paper] | [code]

Training

Fine tuning

[2025/07/10] SAND: Boosting LLM Agents with Self-Taught Action Deliberation | [paper] | [code]
[2025/07/08] Agentic-R1: Distilled Dual-Strategy Reasoning | [paper] | [code]
[2025/06/28] Knowledge Augmented Finetuning Matters in both RAG and Agent Based Dialog Systems | [paper] | [code]
[2025/06/04] Go-Browse: Training Web Agents with Structured Exploration | [paper] | [code]
[2025/06/02] AgentCPM-GUI: Building Mobile-Use Agents with Reinforcement Fine-Tuning | [paper] | [code]
[2025/05/31] ARIA: Training Language Agents with Intention-Driven Reward Aggregation | [paper] | [code]
[2025/05/28] LaMDAgent: An Autonomous Framework for Post-Training Pipeline Optimization via LLM Agents | [paper] | [code]
[2025/05/27] BehaviorSFT: Behavioral Token Conditioning for Clinical Agents Across the Proactivity Spectrum | [paper] | [code]
[2025/05/26] Frictional Agent Alignment Framework: Slow Down and Don't Break Things | [paper] | [code]
[2025/05/26] Training LLM-Based Agents with Synthetic Self-Reflected Trajectories and Partial Masking | [paper] | [code]
[2025/05/26] MaskSearch: A Universal Pre-Training Framework to Enhance Agentic Search Capability | [paper] | [code]
[2025/03/05] MAS-GPT: Training LLMs to Build LLM-based Multi-Agent Systems | [paper] | [code]
[2025/03/05] Enhancing Collective Intelligence in Large Language Models Through Emotional Integration | [paper] | [code]
[2025/03/04] ATLaS: Agent Tuning via Learning Critical Steps | [paper] | [code]
[2025/02/24] Training a Generally Curious Agent | [paper] | [code]
[2025/02/19] UM_FHS at TREC 2024 PLABA: Exploration of Fine-tuning and AI agent approach for plain language adaptations of biomedical text | [paper] | [code]
[2025/02/18] Training Turn-by-Turn Verifiers for Dialogue Tutoring Agents: The Curious Case of LLMs as Your Coding Tutors | [paper] | [code]
[2025/02/11] Multi-Agent Collaboration for Multilingual Code Instruction Tuning | [paper] | [code]
[2025/02/10] Hephaestus: Improving Fundamental Agent Capabilities of Large Language Models through Continual Pre-Training | [paper] | [code]
[2025/01/10] Multiagent Finetuning: Self Improvement with Diverse Reasoning Chains | [paper] | [code]
[2025/01/03] AgentRefine: Enhancing Agent Generalization through Refinement Tuning | [paper] | [code]
[2024/12/30] Training Software Engineering Agents and Verifiers with SWE-Gym | [paper] | [code]
[2024/12/30] Aviary: training language agents on challenging scientific tasks | [paper] | [code]
[2024/12/16] Virtual Agent-Based Communication Skills Training to Facilitate Health Persuasion Among Peers | [paper] | [code]
[2024/11/29] Training Agents with Weakly Supervised Feedback from Large Language Models | [paper] | [code]
[2024/11/21] Star-Agents: Automatic Data Optimization with LLM Agents for Instruction Tuning | [paper] | [code]
[2024/10/20] Training Language Models to Critique With Multi-agent Feedback | [paper] | [code]
[2024/10/16] Proactive Agent: Shifting LLM Agents from Reactive Responses to Active Assistance | [paper] | [code]
[2024/10/10] AgentBank: Towards Generalized LLM Agents via Fine-Tuning on 50000+ Interaction Trajectories | [paper] | [code]
[2024/07/25] Recursive Introspection: Teaching Language Model Agents How to Self-Improve | [paper] | [code]
[2024/06/11] CoEvol: Constructing Better Responses for Instruction Finetuning through Multi-Agent Cooperation | [paper] | [code]
[2024/04/05] Social Skill Training with Large Language Models | [paper] | [code]
[2024/04/02] CMAT: A Multi-Agent Collaboration Tuning Framework for Enhancing Small Language Models | [paper] | [code]
[2024/03/29] Enhancing the General Agent Capabilities of Low-Parameter LLMs through Tuning and Multi-Branch Reasoning | [paper] | [code]
[2024/03/21] ReAct Meets ActRe: When Language Agents Enjoy Training Data Autonomy | [paper] | [code]
[2024/03/19] Agent-FLAN: Designing Data and Methods of Effective Agent Tuning for Large Language Models | [paper] | [code]
[2024/02/23] AgentOhana: Design Unified Data and Training Pipeline for Effective Agent Learning | [paper] | [code]
[2024/02/21] Neeko: Leveraging Dynamic LoRA for Efficient Multi-Character Role-Playing Agent | [paper] | [code]
[2024/02/18] Learning From Failure: Integrating Negative Examples when Fine-tuning Large Language Models as Agents | [paper] | [code]
[2024/01/10] Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training | [paper] | [code]
[2024/01/05] From LLM to Conversational Agent: A Memory Enhanced Architecture with Fine-Tuning of Large Language Models | [paper] | [code]
[2023/12/22] Pangu-Agent: A Fine-Tunable Generalist Agent with Structured Reasoning | [paper] | [code]
[2023/11/28] Embodied Multi-Modal Agent trained by an LLM from a Parallel TextWorld | [paper] | [code]
[2023/10/19] AgentTuning: Enabling Generalized Agent Abilities for LLMs | [paper] | [code]
[2023/10/09] FireAct: Toward Language Agent Fine-tuning | [paper] | [code]
[2023/05/26] Training Socially Aligned Language Models on Simulated Social Interactions | [paper] | [code]

RL

[2025/07/03] MemAgent: Reshaping Long-Context LLM with Multi-Conv RL-based Memory Agent | [paper] | [code]
[2025/07/03] RLVER: Reinforcement Learning with Verifiable Emotion Rewards for Empathetic Agents | [paper] | [code]
[2025/07/02] OpenTable-R1: A Reinforcement Learning Augmented Tool Agent for Open-Domain Table Question Answering | [paper] | [code]
[2025/06/30] L0: Reinforcement Learning to Become General Agents | [paper] | [code]
[2025/06/30] Auto-TA: Towards Scalable Automated Thematic Analysis (TA) via Multi-Agent Large Language Models with Reinforcement Learning | [paper] | [code]
[2025/06/30] SPIRAL: Self-Play on Zero-Sum Games Incentivizes Reasoning via Multi-Agent Multi-Turn Reinforcement Learning | [paper] | [code]
[2025/06/24] KnowRL: Exploring Knowledgeable Reinforcement Learning for Factuality | [paper] | [code]
[2025/06/16] Language Agents for Hypothesis-driven Clinical Decision Making with Reinforcement Learning | [paper] | [code]
[2025/06/13] Agent-RLVR: Training Software Engineering Agents via Guidance and Environment Rewards | [paper] | [code]
[2025/05/29] ML-Agent: Reinforcing LLM Agents for Autonomous Machine Learning Engineering | [paper] | [code]
[2025/05/28] WorkForceAgent-R1: Incentivizing Reasoning Capability in LLM-based Web Agents via Reinforcement Learning | [paper] | [code]
[2025/05/28] WebDancer: Towards Autonomous Information Seeking Agency | [paper] | [code]
[2025/05/27] SPA-RL: Reinforcing LLM Agents via Stepwise Progress Attribution | [paper] | [code]
[2025/05/26] DoctorAgent-RL: A Multi-Agent Collaborative Reinforcement Learning System for Multi-Turn Clinical Dialogue | [paper] | [code]
[2025/05/26] REARANK: Reasoning Re-ranking Agent via Reinforcement Learning | [paper] | [code]
[2025/05/22] WebAgent-R1: Training Web Agents via End-to-End Multi-Turn Reinforcement Learning | [paper] | [code]
[2025/05/21] An Empirical Study on Reinforcement Learning for Reasoning-Search Interleaved LLM Agents | [paper] | [code]
[2025/05/20] Reinforcing Question Answering Agents with Minimalist Policy Gradient Optimization | [paper] | [code]
[2025/05/20] s3: You Don't Need That Much Data to Train a Search Agent via RL | [paper] | [code]
[2025/05/17] Retrospex: Language Agent Meets Offline Reinforcement Learning Critic | [paper] | [code]
[2025/05/06] Divide, Optimize, Merge: Fine-Grained LLM Agent Optimization at Scale | [paper] | [code]
[2025/04/24] RAGEN: Understanding Self-Evolution in LLM Agents via Multi-Turn Reinforcement Learning | [paper] | [code]
[2025/04/20] Meta-Thinking in LLMs via Multi-Agent Reinforcement Learning: A Survey | [paper] | [code]
[2025/04/04] Learning Natural Language Constraints for Safe Reinforcement Learning of Language Agents | [paper] | [code]
[2025/03/16] LLM-Mediated Guidance of MARL Systems | [paper] | [code]
[2025/03/12] ReMA: Learning to Meta-think for LLMs with Multi-Agent Reinforcement Learning | [paper] | [code]
[2025/03/03] Improving Retrospective Language Agents via Joint Policy Gradient Optimization | [paper] | [code]
[2025/02/25] AgentRM: Enhancing Agent Generalization with Reward Modeling | [paper] | [code]
[2025/02/09] Training Language Models for Social Deduction with Multi-Agent Reinforcement Learning | [paper] | [code]
[2025/02/06] Multi-Agent Reinforcement Learning with Focal Diversity Optimization | [paper] | [code]
[2025/01/25] Improving Retrieval-Augmented Generation through Multi-Agent Reinforcement Learning | [paper] | [code]
[2024/11/26] LLM-Based Offline Learning for Embodied Agents via Consistency-Guided Reward Ensemble | [paper] | [code]
[2024/11/07] Interactive Dialogue Agents via Reinforcement Learning on Hindsight Regenerations | [paper] | [code]
[2024/11/06] From Novice to Expert: LLM Agent Policy Optimization via Step-wise Reinforcement Learning | [paper] | [code]
[2024/11/04] WebRL: Training LLM Web Agents via Self-Evolving Online Curriculum Reinforcement Learning | [paper] | [code]
[2024/10/11] Words as Beacons: Guiding RL Agents with High-Level Language Prompts | [paper] | [code]
[2024/10/10] MACPO: Weak-to-Strong Alignment via Multi-Agent Contrastive Preference Optimization | [paper] | [code]
[2024/07/02] Predicting vs. Acting: A Trade-off Between World Modeling & Agent Modeling | [paper] | [code]
[2024/06/26] Mental Modeling of Reinforcement Learning Agents by Language Models | [paper] | [code]
[2024/06/17] Input Conditioned Graph Generation for Language Agents | [paper] | [code]
[2024/06/05] LLM-based Rewriting of Inappropriate Argumentation using Reinforcement Learning from Machine Feedback | [paper] | [code]
[2024/06/03] Re-ReST: Reflection-Reinforced Self-Training for Language Agents | [paper] | [code]
[2024/05/30] Safe Multi-agent Reinforcement Learning with Natural Language Constraints | [paper] | [code]
[2024/05/17] LLM-based Multi-Agent Reinforcement Learning: Current and Future Directions | [paper] | [code]
[2024/05/16] Fine-Tuning Large Vision-Language Models as Decision-Making Agents via Reinforcement Learning | [paper] | [code]
[2024/05/01] Navigating WebAI: Training Agents to Complete Web Tasks with Large Language Models and Reinforcement Learning | [paper] | [code]
[2024/03/05] Language Guided Exploration for RL Agents in Text Environments | [paper] | [code]
[2024/02/17] Offline Training of Language Model Agents with Functions as Learnable Weights | [paper] | [code]
[2024/02/02] StepCoder: Improve Code Generation with Reinforcement Learning from Compiler Feedback | [paper] | [code]
[2023/10/25] MultiPrompter: Cooperative Prompt Optimization with Multi-Agent Reinforcement Learning | [paper] | [code]
[2023/03/29] Skill Reinforcement Learning and Planning for Open-World Long-Horizon Tasks | [paper] | [code]

DPO

[2025/06/17] Expectation Confirmation Preference Optimization for Multi-Turn Conversational Recommendation Agent | [paper] | [code]
[2025/06/04] Debate, Reflect, and Distill: Multi-Agent Feedback with Tree-Structured Preference Optimization for Efficient Language Model Enhancement | [paper] | [code]
[2025/06/02] PGPO: Enhancing Agent Reasoning via Pseudocode-style Planning Guided Preference Optimization | [paper] | [code]
[2025/05/26] MaskSearch: A Universal Pre-Training Framework to Enhance Agentic Search Capability | [paper] | [code]
[2025/05/04] Adaptive Thinking via Mode Policy Optimization for Social Language Agents | [paper] | [code]
[2025/04/27] Anyprefer: An Agentic Framework for Preference Data Synthesis | [paper] | [code]
[2025/02/26] Agentic Reward Modeling: Integrating Human Preferences with Verifiable Correctness Signals for Reliable Reward Systems | [paper] | [code]
[2025/01/03] SDPO: Segment-Level Direct Preference Optimization for Social Agents | [paper] | [code]
[2024/10/29] Flow-DPO: Improving LLM Mathematical Reasoning through Online Multi-Agent Learning | [paper] | [code]
[2024/05/31] Learning to Clarify: Multi-turn Conversations with Action-Based Contrastive Self-Training | [paper] | [code]

Scaling

Single-Agent Framework

[2025/07/08] Agent KB: Leveraging Cross-Domain Experience for Agentic Problem Solving | [paper] | [code]
[2025/07/04] GRAFT: A Graph-based Flow-aware Agentic Framework for Document-level Machine Translation | [paper] | [code]
[2025/06/29] AURA: Agent for Understanding, Reasoning, and Automated Tool Use in Voice-Driven Tasks | [paper] | [code]
[2025/06/27] A Large Language Model-Empowered Agent for Reliable and Robust Structural Analysis | [paper] | [code]
[2025/06/17] VIDEE: Visual and Interactive Decomposition, Execution, and Evaluation of Text Analytics with Intelligent Agents | [paper] | [code]
[2025/06/17] OAgents: An Empirical Study of Building Effective Agents | [paper] | [code]
[2025/06/16] Leveraging In-Context Learning for Language Model Agents | [paper] | [code]
[2025/06/14] Towards Building General Purpose Embedding Models for Industry 4.0 Agents | [paper] | [code]
[2025/06/12] AutoMind: Adaptive Knowledgeable Agent for Automated Data Science | [paper] | [code]
[2025/06/03] DIAMOND: An LLM-Driven Agent for Context-Aware Baseball Highlight Summarization | [paper] | [code]
[2025/06/03] Comparative Analysis of AI Agent Architectures for Entity Relationship Classification | [paper] | [code]
[2025/06/02] Self-Challenging Language Model Agents | [paper] | [code]
[2025/05/30] NexusSum: Hierarchical LLM Agents for Long-Form Narrative Summarization | [paper] | [code]
[2025/05/21] ViQAgent: Zero-Shot Video Question Answering via Agent with Open-Vocabulary Grounding Validation | [paper] | [code]
[2025/05/20] ContextAgent: Context-Aware Proactive LLM Agents with Open-World Sensory Perceptions | [paper] | [code]
[2025/05/12] Putting It All into Context: Simplifying Agents with LCLMs | [paper] | [code]
[2025/04/17] Pandora: A Code-Driven Large Language Model Agent for Unified Reasoning Across Diverse Structured Knowledge | [paper] | [code]
[2025/04/11] Toward Super Agent System with Hybrid AI Routers | [paper] | [code]
[2025/04/10] AgentAda: Skill-Adaptive Data Analytics for Tailored Insight Discovery | [paper] | [code]
[2025/04/07] DoCIA: An Online Document-Level Context Incorporation Agent for Speech Translation | [paper] | [code]
[2025/03/20] Do Visual Imaginations Improve Vision-and-Language Navigation Agents? | [paper] | [code]
[2025/03/14] Large Reasoning Models in Agent Scenarios: Exploring the Necessity of Reasoning Capabilities | [paper] | [code]
[2025/03/10] DatawiseAgent: A Notebook-Centric LLM Agent Framework for Automated Data Science | [paper] | [code]
[2025/03/10] ASTRA: A Negotiation Agent with Adaptive and Strategic Reasoning through Action in Dynamic Offer Optimization | [paper] | [code]
[2025/02/26] TheoremExplainAgent: Towards Multimodal Explanations for LLM Theorem Understanding | [paper] | [code]
[2025/02/14] Agentic Verification for Ambiguous Query Disambiguation | [paper] | [code]
[2025/02/12] SPeCtrum: A Grounded Framework for Multidimensional Identity Representation in LLM-Based Agent | [paper] | [code]
[2025/02/09] AutoAgent: A Fully-Automated and Zero-Code Framework for LLM Agents | [paper] | [code]
[2025/02/04] Adaptive Self-improvement LLM Agentic System for ML Library Development | [paper] | [code]
[2025/01/31] Enabling Autonomic Microservice Management through Self-Learning Agents | [paper] | [code]
[2024/12/28] OneKE: A Dockerized Schema-Guided LLM Agent-based Knowledge Extraction System | [paper] | [code]
[2024/12/21] Self-guided Knowledgeable Network of Thoughts: Amplifying Reasoning with Large Language Models | [paper] | [code]
[2024/12/15] AgentPS: Agentic Process Supervision for Multi-modal Content Quality Assurance through Multi-round QA | [paper] | [code]
[2024/12/11] A Multimodal Social Agent | [paper] | [code]
[2024/12/11] Federated In-Context LLM Agent Learning | [paper] | [code]
[2024/12/04] How to Correctly do Semantic Backpropagation on Language-based Agentic Systems | [paper] | [code]
[2024/12/02] SAUP: Situation Awareness Uncertainty Propagation on LLM Agent | [paper] | [code]
[2024/12/01] Towards Adaptive Mechanism Activation in Language Agent | [paper] | [code]
[2024/11/20] MindForge: Empowering Embodied Agents with Theory of Mind for Lifelong Collaborative Learning | [paper] | [code]
[2024/11/16] IntentGPT: Few-shot Intent Discovery with Large Language Models | [paper] | [code]
[2024/11/04] DynaSaur: Large Language Agents Beyond Predefined Actions | [paper] | [code]
[2024/11/04] CRMArena: Understanding the Capacity of LLM Agents to Perform Professional CRM Tasks in Realistic Environments | [paper] | [code]
[2024/10/29] ADAM: An Embodied Causal Agent in Open-World Environments | [paper] | [code]
[2024/10/27] TrajAgent: An Agent Framework for Unified Trajectory Modelling | [paper] | [code]
[2024/10/22] Adsorb-Agent: Autonomous Identification of Stable Adsorption Configurations via Large Language Model Agent | [paper] | [code]
[2024/10/11] Encoding Agent Trajectories as Representations with Sequence Transformers | [paper] | [code]
[2024/10/10] Agents Thinking Fast and Slow: A Talker-Reasoner Architecture | [paper] | [code]
[2024/10/08] AgentSquare: Automatic LLM Agent Search in Modular Design Space | [paper] | [code]
[2024/10/08] Applying Refusal-Vector Ablation to Llama 3.1 70B Agents | [paper] | [code]
[2024/09/24] MOSS: Enabling Code-Driven Evolution and Context Management for AI Agents | [paper] | [code]
[2024/09/19] Textualized Agent-Style Reasoning for Complex Tasks by Multiple Round LLM Generation | [paper] | [code]
[2024/09/15] Automatic Control With Human-Like Reasoning: Exploring Language Model Embodied Air Traffic Agents | [paper] | [code]
[2024/09/12] Self-Supervised Inference of Agents in Trustless Environments | [paper] | [code]
[2024/09/05] From MOOC to MAIC: Reshaping Online Teaching and Learning through LLM-driven Agents | [paper] | [code]
[2024/09/05] Rx Strategist: Prescription Verification using LLM Agents System | [paper] | [code]
[2024/09/03] AgentRE: An Agent-Based Framework for Navigating Complex Information Landscapes in Relation Extraction | [paper] | [code]
[2024/08/26] AgentMove: A Large Language Model based Agentic Framework for Zero-shot Next Location Prediction | [paper] | [code]
[2024/08/19] Anim-Director: A Large Multimodal Model Powered Agent for Controllable Animation Video Generation | [paper] | [code]
[2024/08/13] Causal Agent based on Large Language Model | [paper] | [code]
[2024/08/02] Coalitions of Large Language Models Increase the Robustness of AI Agents | [paper] | [code]
[2024/07/27] AgentPeerTalk: Empowering Students through Agentic-AI-Driven Discernment of Bullying and Joking in Peer Interactions in Schools | [paper] | [code]
[2024/07/25] Enhancing Agent Learning through World Dynamics Modeling | [paper] | [code]
[2024/07/25] RestoreAgent: Autonomous Image Restoration Agent via Multimodal Large Language Models | [paper] | [code]
[2024/07/16] Preemptive Detection and Correction of Misaligned Actions in LLM Agents | [paper] | [code]
[2024/07/15] Sibyl: Simple yet Effective Agent Framework for Complex Real-world Reasoning | [paper] | [code]
[2024/07/02] Beyond Numeric Awards: In-Context Dueling Bandits with LLM Agents | [paper] | [code]
[2024/06/24] OmAgent: A Multi-modal Agent Framework for Complex Video Understanding with Task Divide-and-Conquer | [paper] | [code]
[2024/06/07] SelfGoal: Your Language Agents Already Know How to Achieve High-level Goals | [paper] | [code]
[2024/05/25] AutoManual: Constructing Instruction Manuals by LLM Agents via Interactive Environmental Learning | [paper] | [code]
[2024/05/24] Intelligent Go-Explore: Standing on the Shoulders of Giant Foundation Models | [paper] | [code]
[2024/05/16] Agent Design Pattern Catalogue: A Collection of Architectural Patterns for Foundation Model based Agents | [paper] | [code]
[2024/04/30] Large Language Model Agent for Fake News Detection | [paper] | [code]
[2024/04/28] Logic Agent: Enhancing Validity with Logic Rule Invocation | [paper] | [code]
[2024/04/13] LLMSat: A Large Language Model-Based Goal-Oriented Agent for Autonomous Space Exploration | [paper] | [code]
[2024/04/01] TraveLER: A Modular Multi-LMM Agent Framework for Video Question-Answering | [paper] | [code]
[2024/03/29] ITCMA: A Generative Agent Based on a Computational Consciousness Structure | [paper] | [code]
[2024/02/25] Bootstrapping Cognitive Agents with a Large Language Model | [paper] | [code]
[2024/02/24] Empowering Large Language Model Agents through Action Learning | [paper] | [code]
[2024/02/20] Soft Self-Consistency Improves Language Model Agents | [paper] | [code]
[2024/02/04] NavHint: Vision and Language Navigation Agent with a Hint Generator | [paper] | [code]
[2024/01/05] AFSPP: Agent Framework for Shaping Preference and Personality with Large Language Models | [paper] | [code]
[2023/11/23] Controlling Large Language Model-based Agents for Large-Scale Decision-Making: An Actor-Critic Approach | [paper] | [code]
[2023/11/02] ProAgent: From Robotic Process Automation to Agentic Process Automation | [paper] | [code]
[2023/10/16] CLIN: A Continually Learning Language Agent for Rapid Task Adaptation and Generalization | [paper] | [code]
[2023/09/29] Reason for Future, Act for Now: A Principled Framework for Autonomous LLM Agents with Provable Sample Efficiency | [paper] | [code]
[2023/09/14] Agents: An Open-source Framework for Autonomous Language Agents | [paper] | [code]
[2023/09/08] A Versatile Graph Learning Approach through LLM-based Agent | [paper] | [code]
[2023/09/05] Cognitive Architectures for Language Agents | [paper] | [code]
[2023/05/27] SwiftSage: A Generative Agent with Fast and Slow Thinking for Complex Interactive Tasks | [paper] | [code]
[2023/05/25] Voyager: An Open-Ended Embodied Agent with Large Language Models | [paper] | [code]

Multi-Agent System

[2025/07/09] Pun Intended: Multi-Agent Translation of Wordplay with Contrastive Learning and Phonetic-Semantic Embeddings | [paper] | [code]
[2025/07/09] MIND: A Multi-agent Framework for Zero-shot Harmful Meme Detection | [paper] | [code]
[2025/06/27] GenEscape: Hierarchical Multi-Agent Generation of Escape Room Puzzles | [paper] | [code]
[2025/06/20] SysTemp: A Multi-Agent System for Template-Based Generation of SysML v2 | [paper] | [code]
[2025/06/19] StoryWriter: A Multi-Agent Framework for Long Story Generation | [paper] | [code]
[2025/06/18] AgentGroupChat-V2: Divide-and-Conquer Is What LLM-Based Multi-Agent System Need | [paper] | [code]
[2025/06/17] MAS-LitEval: Multi-Agent System for Literary Translation Quality Assessment | [paper] | [code]
[2025/06/17] Xolver: Multi-Agent Reasoning with Holistic Experience Learning Just Like an Olympiad Team | [paper] | [code]
[2025/06/13] A Hybrid Multi-Agent Prompting Approach for Simplifying Complex Sentences | [paper] | [code]
[2025/06/13] AutoGen Driven Multi Agent Framework for Iterative Crime Data Analysis and Prediction | [paper] | [code]
[2025/06/13] Investigating the Potential of Large Language Model-Based Router Multi-Agent Architectures for Foundation Design Automation: A Task Classification and Expert Selection Study | [paper] | [code]
[2025/06/12] A Multi-Agent Probabilistic Inference Framework Inspired by Kairanban-Style CoT System with IdoBata Conversation for Debiasing | [paper] | [code]
[2025/06/11] Multi-Agent Language Models: Advancing Cooperation, Coordination, and Adaptation | [paper] | [code]
[2025/06/11] ReasonMed: A 370K Multi-Agent Generated Dataset for Advancing Medical Reasoning | [paper] | [code]
[2025/06/11] Chat-of-Thought: Collaborative Multi-Agent System for Generating Domain Specific Information | [paper] | [code]
[2025/06/10] CAF-I: A Collaborative Multi-Agent Framework for Enhanced Irony Detection with Large Language Models | [paper] | [code]
[2025/06/10] Reinforce LLM Reasoning through Multi-Agent Reflection | [paper] | [code]
[2025/06/09] From Debate to Equilibrium: Belief-Driven Multi-Agent LLM Reasoning via Bayesian Nash Equilibrium | [paper] | [code]
[2025/06/08] Theorem-of-Thought: A Multi-Agent Framework for Abductive, Deductive, and Inductive Reasoning in Language Models | [paper] | [code]
[2025/06/06] MAPLE: Multi-Agent Adaptive Planning with Long-Term Memory for Table Reasoning | [paper] | [code]
[2025/06/06] Does It Run and Is That Enough? Revisiting Text-to-Chart Generation with a Multi-Agent Approach | [paper] | [code]
[2025/06/05] Demonstrations of Integrity Attacks in Multi-Agent Systems | [paper] | [code]
[2025/06/04] CLAIM: An Intent-Driven Multi-Agent Framework for Analyzing Manipulation in Courtroom Dialogues | [paper] | [code]
[2025/06/03] MASTER: Enhancing Large Language Model via Multi-Agent Simulated Teaching | [paper] | [code]
[2025/06/03] Adaptive Graph Pruning for Multi-Agent Communication | [paper] | [code]
[2025/06/03] A Multi-Agent Framework for Mitigating Dialect Biases in Privacy Policy Question-Answering Systems | [paper] | [code]
[2025/06/03] Mitigating Manipulation and Enhancing Persuasion: A Reflective Multi-Agent Approach for Legal Argument Generation | [paper] | [code]
[2025/06/03] MAEBE: Multi-Agent Emergent Behavior Framework | [paper] | [code]
[2025/06/02] STORM-BORN: A Challenging Mathematical Derivations Dataset Curated via a Human-in-the-Loop Multi-Agent Framework | [paper] | [code]
[2025/06/02] An Empirical Study of Group Conformity in Multi-Agent Systems | [paper] | [code]
[2025/05/31] Goal-Aware Identification and Rectification of Misinformation in Multi-Agent Systems | [paper] | [code]
[2025/05/31] PAKTON: A Multi-Agent Framework for Question Answering in Long Legal Agreements | [paper] | [code]
[2025/05/30] CREFT: Sequential Multi-Agent LLM for Character Relation Extraction | [paper] | [code]
[2025/05/30] Multiple LLM Agents Debate for Equitable Cultural Alignment | [paper] | [code]
[2025/05/30] An Adversary-Resistant Multi-Agent LLM System via Credibility Scoring | [paper] | [code]
[2025/05/29] Cross-Task Experiential Learning on LLM-based Multi-Agent Collaboration | [paper] | [code]
[2025/05/29] OWL: Optimized Workforce Learning for General Multi-Agent Assistance in Real-World Task Automation | [paper] | [code]
[2025/05/28] Co-Saving: Resource Aware Multi-Agent Collaboration for Software Development | [paper] | [code]
[2025/05/28] CoMaPOI: A Collaborative Multi-Agent Framework for Next POI Prediction Bridging the Gap Between Trajectory and Language | [paper] | [code]
[2025/05/28] GETReason: Enhancing Image Context Extraction through Hierarchical Multi-Agent Reasoning | [paper] | [code]
[2025/05/27] Long Context Scaling: Divide and Conquer via Multi-Agent Question-driven Collaboration | [paper] | [code]
[2025/05/27] Rethinking Information Synthesis in Multimodal Question Answering: A Multi-Agent Perspective | [paper] | [code]
[2025/05/27] Scaling External Knowledge Input Beyond Context Windows of LLMs via Multi-Agent Collaboration | [paper] | [code]
[2025/05/26] CoTGuard: Using Chain-of-Thought Triggering for Copyright Protection in Multi-Agent LLM Systems | [paper] | [code]
[2025/05/26] Multi-Agent Collaboration via Evolving Orchestration | [paper] | [code]
[2025/05/26] Select, Read, and Write: A Multi-Agent Framework of Full-Text-based Related Work Generation | [paper] | [code]
[2025/05/26] Project Riley: Multimodal Multi-Agent LLM Collaboration with Emotional Reasoning and Voting | [paper] | [code]
[2025/05/25] MetaMind: Modeling Human Social Thoughts with Metacognitive Multi-Agent Systems | [paper] | [code]
[2025/05/25] GUARDIAN: Safeguarding LLM Multi-Agent Collaborations with Temporal Graph Modeling | [paper] | [code]
[2025/05/23] ManuSearch: Democratizing Deep Search in Large Language Models with a Transparent and Open Multi-Agent Framework | [paper] | [code]
[2025/05/23] PD$^3$: A Project Duplication Detection Framework via Adapted Multi-Agent Debate | [paper] | [code]
[2025/05/22] EMULATE: A Multi-Agent Framework for Determining the Veracity of Atomic Claims by Emulating Human Actions | [paper] | [code]
[2025/05/22] X-MAS: Towards Building Multi-Agent Systems with Heterogeneous LLMs | [paper] | [code]
[2025/05/21] MAS-ZERO: Designing Multi-Agent Systems with Zero Supervision | [paper] | [code]
[2025/05/20] MAATS: A Multi-Agent Automated Translation System Based on MQM Evaluation | [paper] | [code]
[2025/05/20] MLZero: A Multi-Agent System for End-to-end Machine Learning Automation | [paper] | [code]
[2025/05/19] AD-AGENT: A Multi-agent Framework for End-to-end Anomaly Detection | [paper] | [code]
[2025/05/18] IP Leakage Attacks Targeting LLM-Based Multi-Agent Systems | [paper] | [code]
[2025/05/17] BELLE: A Bi-Level Multi-Agent Reasoning Framework for Multi-Hop Question Answering | [paper] | [code]
[2025/05/16] Connecting the Dots: A Chain-of-Collaboration Prompting Framework for LLM Agents | [paper] | [code]
[2025/05/15] Assessing Collective Reasoning in Multi-Agent LLMs via Hidden Profile Tasks | [paper] | [code]
[2025/05/12] Towards Multi-Agent Reasoning Systems for Collaborative Expertise Delegation: An Exploratory Design Study | [paper] | [code]
[2025/05/06] The Power of Stories: Narrative Priming Shapes How LLM Agents Collaborate and Compete | [paper] | [code]
[2025/04/30] Which Agent Causes Task Failures and When? On Automated Failure Attribution of LLM Multi-Agent Systems | [paper] | [code]
[2025/04/26] MATCHA: Can Multi-Agent Collaboration Build a Trustworthy Conversational Recommender? | [paper] | [code]
[2025/04/24] Collaborating Action by Action: A Multi-agent LLM Framework for Embodied Reasoning | [paper] | [code]
[2025/04/23] Less is More: Enhancing Structured Multi-Agent Reasoning via Quality-Guided Distillation | [paper] | [code]
[2025/04/21] EducationQ: Evaluating LLMs' Teaching Capabilities Through Multi-Agent Dialogue Framework | [paper] | [code]
[2025/04/17] Are AI agents the new machine translation frontier? Challenges and opportunities of single- and multi-agent systems for multilingual digital communication | [paper] | [code]
[2025/04/15] X-Teaming: Multi-Turn Jailbreaks and Defenses with Adaptive Multi-Agents | [paper] | [code]
[2025/04/11] Beyond Self-Reports: Multi-Observer Agents for Personality Assessment in Large Language Models | [paper] | [code]
[2025/04/11] DocAgent: A Multi-Agent System for Automated Code Documentation Generation | [paper] | [code]
[2025/04/08] FactGuard: Leveraging Multi-Agent Systems to Generate Answerable and Unanswerable Questions for Enhanced Long-Context LLM Extraction | [paper] | [code]
[2025/04/04] YaleNLP @ PerAnsSumm 2025: Multi-Perspective Integration via Mixture-of-Agents for Enhanced Healthcare QA Summarization | [paper] | [code]
[2025/04/02] Self-Resource Allocation in Multi-Agent LLM Systems | [paper] | [code]
[2025/04/02] Achieving Unanimous Consensus in Decision Making Using Multi-Agents | [paper] | [code]
[2025/04/01] When Persuasion Overrides Truth in Multi-Agent LLM Debates: Introducing a Confidence-Weighted Persuasion Override Rate (CW-POR) | [paper] | [code]
[2025/04/01] AI Hiring with LLMs: A Context-Aware and Explainable Multi-Agent Framework for Resume Screening | [paper] | [code]
[2025/04/01] AgentNet: Decentralized Evolutionary Coordination for LLM-based Multi-Agent Systems | [paper] | [code]
[2025/03/31] Agents Under Siege: Breaking Pragmatic Multi-Agent LLM Systems with Optimized Prompt Attacks | [paper] | [code]
[2025/03/28] WorkTeam: Constructing Workflows from Natural Language with Multi-Agents | [paper] | [code]
[2025/03/28] Self-Evolving Multi-Agent Simulations for Realistic Clinical Interactions | [paper] | [code]
[2025/03/27] Collab: Controlled Decoding using Mixture of Agents for LLM Alignment | [paper] | [code]
[2025/03/27] Debate-Driven Multi-Agent LLMs for Phishing Email Detection | [paper] | [code]
[2025/03/26] TAMA: A Human-AI Collaborative Thematic Analysis Framework Using Multi-Agent LLMs for Clinical Interviews | [paper] | [code]
[2025/03/26] 3MDBench: Medical Multimodal Multi-agent Dialogue Benchmark | [paper] | [code]
[2025/03/25] Multi-agent Application System in Office Collaboration Scenarios | [paper] | [code]
[2025/03/24] AgentDropout: Dynamic Agent Elimination for Token-Efficient and High-Performance LLM-Based Multi-Agent Collaboration | [paper] | [code]
[2025/03/23] MathAgent: Leveraging a Mixture-of-Math-Agent Framework for Real-World Multimodal Mathematical Error Detection | [paper] | [code]
[2025/03/21] ConvoGen: Enhancing Conversational AI with Synthetic Data: A Multi-Agent Approach | [paper] | [code]
[2025/03/21] MARS: A Multi-Agent Framework Incorporating Socratic Guidance for Automated Prompt Optimization | [paper] | [code]
[2025/03/19] When Pigs Get Sick: Multi-Agent AI for Swine Disease Detection | [paper] | [code]
[2025/03/19] MAMM-Refine: A Recipe for Improving Faithfulness in Generation with Multi-Agent Collaboration | [paper] | [code]
[2025/03/18] Gricean Norms as a Basis for Effective Collaboration | [paper] | [code]
[2025/03/17] Identifying Cooperative Personalities in Multi-agent Contexts through Personality Steering with Representation Engineering | [paper] | [code]
[2025/03/17] MAP: Evaluation and Multi-Agent Enhancement of Large Language Models for Inpatient Pathways | [paper] | [code]
[2025/03/16] LLM-Mediated Guidance of MARL Systems | [paper] | [code]
[2025/03/14] AIstorian lets AI be a historian: A KG-powered multi-agent system for accurate biography generation | [paper] | [code]
[2025/03/14] Prompt Injection Detection and Mitigation via AI Multi-Agent NLP Frameworks | [paper] | [code]
[2025/03/14] RAG-KG-IL: A Multi-Agent Hybrid Framework for Reducing Hallucinations and Enhancing LLM Reasoning through RAG and Incremental Knowledge Graph Learning Integration | [paper] | [code]
[2025/03/13] LLMs Working in Harmony: A Survey on the Technological Aspects of Building Effective LLM-Based Multi Agent Systems | [paper] | [code]
[2025/03/12] ReMA: Learning to Meta-think for LLMs with Multi-Agent Reinforcement Learning | [paper] | [code]
[2025/03/07] MM-StoryAgent: Immersive Narrated Storybook Video Generation with a Multi-Agent Paradigm across Text, Image and Audio | [paper] | [code]
[2025/03/07] GEMA-Score: Granular Explainable Multi-Agent Score for Radiology Report Evaluation | [paper] | [code]
[2025/03/07] Multi Agent based Medical Assistant for Edge Devices | [paper] | [code]
[2025/03/05] MA-LoT: Multi-Agent Lean-based Long Chain-of-Thought Reasoning enhances Formal Theorem Proving | [paper] | [code]
[2025/03/05] MAS-GPT: Training LLMs to Build LLM-based Multi-Agent Systems | [paper] | [code]
[2025/03/05] Multi-Agent Systems Powered by Large Language Models: Applications in Swarm Intelligence | [paper] | [code]
[2025/03/05] Preserving Cultural Identity with Context-Aware Translation Through Multi-Agent AI Systems | [paper] | [code]
[2025/03/05] Enhancing Collective Intelligence in Large Language Models Through Emotional Integration | [paper] | [code]
[2025/03/04] BRIDGE: Bootstrapping Text to Control Time-Series Generation via Multi-Agent Iterative Optimization and Diffusion Modelling | [paper] | [code]
[2025/03/04] Multi-Agent System for AI-Assisted Extraction of Narrative Arcs in TV Series | [paper] | [code]
[2025/03/01] Structured Reasoning for Fairness: A Multi-Agent Approach to Bias Detection in Textual Data | [paper] | [code]
[2025/02/28] PreMind: Multi-Agent Video Understanding for Advanced Indexing of Presentation-style Videos | [paper] | [code]
[2025/02/27] M^3Builder: A Multi-Agent System for Automated Machine Learning in Medical Imaging | [paper] | [code]
[2025/02/26] Stay Focused: Problem Drift in Multi-Agent Debate | [paper] | [code]
[2025/02/26] Voting or Consensus? Decision-Making in Multi-Agent Debate | [paper] | [code]
[2025/02/25] Enhancing Text Classification with a Novel Multi-Agent Collaboration Framework Leveraging BERT | [paper] | [code]
[2025/02/25] A Cooperative Multi-Agent Framework for Zero-Shot Named Entity Recognition | [paper] | [code]
[2025/02/25] Debt Collection Negotiations with Large Language Models: An Evaluation System and Optimizing Decision Making with Multi-Agent | [paper] | [code]
[2025/02/25] FACT-AUDIT: An Adaptive Multi-Agent Framework for Dynamic Fact-Checking Evaluation of Large Language Models | [paper] | [code]
[2025/02/24] MobileSteward: Integrating Multiple App-Oriented Agents with Self-Evolution to Automate Cross-App Instructions | [paper] | [code]
[2025/02/24] Mobile-Agent-V: Learning Mobile Device Operation Through Video-Guided Multi-Agent Collaboration | [paper] | [code]
[2025/02/24] METAL: A Multi-Agent Framework for Chart Generation with Test-Time Scaling | [paper] | [code]
[2025/02/23] The Hidden Strength of Disagreement: Unraveling the Consensus-Diversity Tradeoff in Adaptive Multi-Agent Systems | [paper] | [code]
[2025/02/20] Enhancing Language Multi-Agent Learning with Multi-Agent Credit Re-Assignment for Interactive Environment Generalization | [paper] | [code]
[2025/02/20] CORBA: Contagious Recursive Blocking Attacks on Multi-Agent Systems Based on Large Language Models | [paper] | [code]
[2025/02/17] Table-Critic: A Multi-Agent Framework for Collaborative Criticism and Refinement in Table Reasoning | [paper] | [code]
[2025/02/17] HARBOR: Exploring Persona Dynamics in Multi-Agent Competition | [paper] | [code]
[2025/02/15] Divergent Thoughts toward One Goal: LLM-based Multi-Agent Collaboration System for Electronic Design Automation | [paper] | [code]
[2025/02/13] PathFinder: A Multi-Modal Multi-Agent System for Medical Diagnostic Decision-Making Applied to Histopathology | [paper] | [code]
[2025/02/13] Mind the Gaps: Logical English, Prolog, and Multi-agent Systems for Autonomous Vehicles | [paper] | [code]
[2025/02/12] Faithful, Unfaithful or Ambiguous? Multi-Agent Debate with Initial Stance for Summary Evaluation | [paper] | [code]
[2025/02/12] If Multi-Agent Debate is the Answer, What is the Question? | [paper] | [code]
[2025/02/11] Don't Just Demo, Teach Me the Principles: A Principle-Based Multi-Agent Prompting Strategy for Text Classification | [paper] | [code]
[2025/02/11] Multi-Agent Collaboration for Multilingual Code Instruction Tuning | [paper] | [code]
[2025/02/10] KARMA: Leveraging Multi-Agent LLMs for Automated Knowledge Graph Enrichment | [paper] | [code]
[2025/02/09] Preventing Rogue Agents Improves Multi-Agent Collaboration | [paper] | [code]
[2025/02/09] The Application of MATEC (Multi-AI Agent Team Care) Framework in Sepsis Care | [paper] | [code]
[2025/02/08] CODESIM: Multi-Agent Code Generation and Problem Solving through Simulation-Driven Planning and Debugging | [paper] | [code]
[2025/02/08] Multi-Agent Simulator Drives Language Models for Legal Intensive Interaction | [paper] | [code]
[2025/02/07] S$^2$-MAD: Breaking the Token Barrier to Enhance Multi-Agent Debate Efficiency | [paper] | [code]
[2025/02/06] Multi-Agent Reinforcement Learning with Focal Diversity Optimization | [paper] | [code]
[2025/02/06] Enhancing Online Learning Efficiency Through Heterogeneous Resource Integration with a Multi-Agent RAG System | [paper] | [code]
[2025/02/06] Multi-agent Architecture Search via Agentic Supernet | [paper] | [code]
[2025/02/04] Position: Scaling LLM Agents Requires Asymptotic Analysis with LLM Primitives | [paper] | [code]
[2025/02/04] Multi-Agent Design: Optimizing Agents with Better Prompts and Topologies | [paper] | [code]
[2025/02/03] PlotGen: Multi-Agent LLM-based Scientific Data Visualization via Multimodal Feedback | [paper] | [code]
[2025/02/03] ChartCitor: Multi-Agent Framework for Fine-Grained Chart Visual Attribution | [paper] | [code]
[2025/02/02] Rethinking Mixture-of-Agents: Is Mixing Different Large Language Models Beneficial? | [paper] | [code]
[2025/02/02] Efficient Multi-Agent System Training with Data Influence-Oriented Tree Search | [paper] | [code]
[2025/01/29] Layered Chain-of-Thought Prompting for Multi-Agent LLM Systems: A Comprehensive Approach to Explainable Large Language Models | [paper] | [code]
[2025/01/27] MADP: Multi-Agent Deductive Planning for Enhanced Cognitive-Behavioral Mental Health Question Answer | [paper] | [code]
[2025/01/25] Improving Retrieval-Augmented Generation through Multi-Agent Reinforcement Learning | [paper] | [code]
[2025/01/24] Multi-agent KTO: Reinforcing Strategic Interactions of Large Language Model in Language Game | [paper] | [code]
[2025/01/24] Unmasking Conversational Bias in AI Multiagent Systems | [paper] | [code]
[2025/01/22] FilmAgent: A Multi-Agent Framework for End-to-End Film Automation in Virtual 3D Spaces | [paper] | [code]
[2025/01/19] IntellAgent: A Multi-Agent Framework for Evaluating Conversational AI Systems | [paper] | [code]
[2025/01/16] AutoCBT: An Autonomous Multi-agent Framework for Cognitive Behavioral Therapy in Psychological Counseling | [paper] | [code]
[2025/01/14] Talk to Right Specialists: Routing and Planning in Multi-agent System for Question Answering | [paper] | [code]
[2025/01/05] LatteReview: A Multi-Agent Framework for Systematic Review Automation Using Large Language Models | [paper] | [code]
[2025/01/02] Harnessing Multi-Agent LLMs for Complex Engineering Problem-Solving: A Framework for Senior Design Projects | [paper] | [code]
[2024/12/30] Distributed Mixture-of-Agents for Edge Inference with Large Language Models | [paper] | [code]
[2024/12/28] M-MAD: Multidimensional Multi-Agent Debate for Advanced Machine Translation Evaluation | [paper] | [code]
[2024/12/28] Efficient Multi-Agent Collaboration with Tool Use for Online Planning in Complex Table Question Answering | [paper] | [code]
[2024/12/24] Multi-Agents Based on Large Language Models for Knowledge-based Visual Question Answering | [paper] | [code]
[2024/12/22] Multi-Agent Sampling: Scaling Inference Compute for Data Synthesis with Tree Search-Based Agentic Collaboration | [paper] | [code]
[2024/12/22] A Multi-AI Agent System for Autonomous Optimization of Agentic AI Solutions via Iterative Refinement and LLM-Driven Feedback Loops | [paper] | [code]
[2024/12/20] Mitigating Social Bias in Large Language Models: A Multi-Objective Approach within a Multi-Agent Framework | [paper] | [code]
[2024/12/19] PsyDraw: A Multi-Agent Multimodal System for Mental Health Screening in Left-Behind Children | [paper] | [code]
[2024/12/18] Gradual Vigilance and Interval Communication: Enhancing Value Alignment in Multi-Agent Debates | [paper] | [code]
[2024/12/15] Cultural Palette: Pluralising Culture Alignment via Multi-agent Palette | [paper] | [code]
[2024/12/13] AutoPatent: A Multi-Agent Framework for Automatic Patent Generation | [paper] | [code]
[2024/12/12] DiverseAgentEntropy: Quantifying Black-Box LLM Uncertainty through Diverse Perspectives and Multi-Agent Interaction | [paper] | [code]
[2024/12/11] NAT-NL2GQL: A Novel Multi-Agent Framework for Translating Natural Language to Graph Query Language | [paper] | [code]
[2024/12/10] AutoPrep: Natural Language Question-Aware Data Preparation with a Multi-Agent Framework | [paper] | [code]
[2024/12/07] SLA Management in Reconfigurable Multi-Agent RAG: A Systems Approach to Question Answering | [paper] | [code]
[2024/12/06] Breaking Event Rumor Detection via Stance-Separated Multi-Agent Debate | [paper] | [code]
[2024/12/06] Towards Effective GenAI Multi-Agent Collaboration: Design and Evaluation for Enterprise Applications | [paper] | [code]
[2024/12/06] Enhancing LLMs for Impression Generation in Radiology Reports through a Multi-Agent System | [paper] | [code]
[2024/12/06] TeamCraft: A Benchmark for Multi-Modal Multi-Agent Systems in Minecraft | [paper] | [code]
[2024/12/05] Educational-Psychological Dialogue Robot Based on Multi-Agent Collaboration | [paper] | [code]
[2024/12/01] Multi-Agent Collaboration in Incident Response with Large Language Models | [paper] | [code]
[2024/11/28] MAG-V: A Multi-Agent Framework for Synthetic Data Generation and Verification | [paper] | [code]
[2024/11/21] PIORS: Personalized Intelligent Outpatient Reception based on Large Language Model with Multi-Agents Medical Scenario Simulation | [paper] | [code]
[2024/11/21] Enhancing LLMs for Power System Simulations: A Feedback-driven Multi-agent Framework | [paper] | [code]
[2024/11/18] The Power of Many: Multi-Agent Multimodal Models for Cultural Image Captioning | [paper] | [code]
[2024/11/12] BudgetMLAgent: A Cost-Effective LLM Multi-Agent system for Automating Machine Learning Tasks | [paper] | [code]
[2024/11/11] Using Generative AI and Multi-Agents to Provide Automatic Feedback | [paper] | [code]
[2024/11/09] Mixture of Knowledge Minigraph Agents for Literature Review Generation | [paper] | [code]
[2024/11/05] SAUCE: Synchronous and Asynchronous User-Customizable Environment for Multi-Agent LLM Interaction | [paper] | [code]
[2024/11/05] SMoA: Improving Multi-agent Large Language Models with Sparse Mixture-of-Agents | [paper] | [code]
[2024/11/01] DARD: A Multi-Agent Approach for Task-Oriented Dialog Systems | [paper] | [code]
[2024/10/30] ACC-Debate: An Actor-Critic Approach to Multi-Agent Debate | [paper] | [code]
[2024/10/29] Flow-DPO: Improving LLM Mathematical Reasoning through Online Multi-Agent Learning | [paper] | [code]
[2024/10/29] MARCO: Multi-Agent Real-time Chat Orchestration | [paper] | [code]
[2024/10/28] CRAT: A Multi-Agent Framework for Causality-Enhanced Reflective and Retrieval-Augmented Translation with Large Language Models | [paper] | [code]
[2024/10/27] AutoKaggle: A Multi-Agent Framework for Autonomous Data Science Competitions | [paper] | [code]
[2024/10/24] Schema-Guided Culture-Aware Complex Event Simulation with Multi-Agent Role-Play | [paper] | [code]
[2024/10/23] GraphTeam: Facilitating Large Language Model-based Graph Analysis via Multi-Agent Collaboration | [paper] | [code]
[2024/10/22] Decoding Time Series with LLMs: A Multi-Agent Framework for Cross-Domain Annotation | [paper] | [code]
[2024/10/19] An Electoral Approach to Diversify LLM-based Multi-Agent Collective Decision-Making | [paper] | [code]
[2024/10/18] Synthesizing Post-Training Data for LLMs through Multi-Agent Simulation | [paper] | [code]
[2024/10/17] AdaSwitch: Adaptive Switching between Small and Large Agents for Effective Cloud-Local Collaborative Learning | [paper] | [code]
[2024/10/16] PRefLexOR: Preference-based Recursive Language Modeling for Exploratory Optimization of Reasoning and Agentic Thinking | [paper] | [code]
[2024/10/13] LLM-Based Multi-Agent Systems are Scalable Graph Generative Models | [paper] | [code]
[2024/10/12] Many Heads Are Better Than One: Improved Scientific Idea Generation by A LLM-Based Multi-Agent System | [paper] | [code]
[2024/10/11] JAILJUDGE: A Comprehensive Jailbreak Judge Benchmark with Multi-Agent Enhanced Explanation Evaluation Framework | [paper] | [code]
[2024/10/11] PEAR: A Robust and Flexible Automation Framework for Ptychography Enabled by Multiple Large Language Model Agents | [paper] | [code]
[2024/10/10] AI-Press: A Multi-Agent News Generating and Feedback Simulation System Powered by Large Language Models | [paper] | [code]
[2024/10/10] Multi-Agent Collaborative Data Selection for Efficient LLM Pretraining | [paper] | [code]
[2024/10/10] Optima: Optimizing Effectiveness and Efficiency for LLM-Based Multi-Agent System | [paper] | [code]
[2024/10/10] Prompt Engineering a Schizophrenia Chatbot: Utilizing a Multi-Agent Approach for Enhanced Compliance with Prompt Instructions | [paper] | [code]
[2024/10/10] Diversity of Thought Elicits Stronger Reasoning Capabilities in Multi-Agent Debate Frameworks | [paper] | [code]
[2024/10/09] Seeker: Enhancing Exception Handling in Code with LLM-based Multi-Agent Approach | [paper] | [code]
[2024/10/07] Adversarial Multi-Agent Evaluation of Large Language Models through Iterative Debates | [paper] | [code]
[2024/10/06] MindScope: Exploring cognitive biases in large language models through Multi-Agent Systems | [paper] | [code]
[2024/10/03] Towards Implicit Bias Detection and Mitigation in Multi-Agent LLM Interactions | [paper] | [code]
[2024/10/03] Agents' Room: Narrative Generation through Multi-step Collaboration | [paper] | [code]
[2024/10/03] Can Large Language Models Grasp Legal Theories? Enhance Legal Reasoning with Insights from Multi-Agent Collaboration | [paper] | [code]
[2024/10/03] ColaCare: Enhancing Electronic Health Record Modeling through Large Language Model-Driven Multi-Agent Collaboration | [paper] | [code]
[2024/10/03] AutoML-Agent: A Multi-Agent LLM Framework for Full-Pipeline AutoML | [paper] | [code]
[2024/10/02] RGD: Multi-LLM Based Agent Debugger via Refinement and Generation Guidance | [paper] | [code]
[2024/10/02] Zodiac: A Cardiologist-Level LLM Framework for Multi-Agent Diagnostics | [paper] | [code]
[2024/09/21] Towards Automated Patent Workflows: AI-Orchestrated Multi-Agent Framework for Intellectual Property Management and Analysis | [paper] | [code]
[2024/09/21] GroupDebate: Enhancing the Efficiency of Multi-Agent Debate Using Group Discussion | [paper] | [code]
[2024/09/20] Minstrel: Structural Prompt Generation with Multi-Agents Coordination for Non-AI Experts | [paper] | [code]
[2024/09/18] MAgICoRe: Multi-Agent, Iterative, Coarse-to-Fine Refinement for Reasoning | [paper] | [code]
[2024/09/17] The Art of Storytelling: Multi-Agent Generative AI for Dynamic Multimodal Narratives | [paper] | [code]
[2024/09/16] Instigating Cooperation among LLM Agents Using Adaptive Information Modulation | [paper] | [code]
[2024/09/14] Synergistic Simulations: Multi-Agent Problem Solving with Large Language Models | [paper] | [code]
[2024/09/12] Knowledge Tagging with Large Language Model based Multi-Agent System | [paper] | [code]
[2024/09/11] Propaganda to Hate: A Multimodal Analysis of Arabic Memes with Multi-Agent LLMs | [paper] | [code]
[2024/09/09] SciAgents: Automating scientific discovery through multi-agent intelligent graph reasoning | [paper] | [code]
[2024/09/06] Using Large Language Models to Generate Authentic Multi-agent Knowledge Work Datasets | [paper] | [code]
[2024/09/05] xLAM: A Family of Large Action Models to Empower AI Agent Systems | [paper] | [code]
[2024/09/02] Co-Learning: Code Learning for Multi-Agent Reinforcement Collaborative Framework with Conversational Natural Language Interfaces | [paper] | [code]
[2024/08/28] BattleAgentBench: A Benchmark for Evaluating Cooperation and Competition Capabilities of Language Models in Multi-Agent Systems | [paper] | [code]
[2024/08/27] AgentMonitor: A Plug-and-Play Framework for Predictive and Secure Multi-Agent Systems | [paper] | [code]
[2024/08/24] Towards Human-Level Understanding of Complex Process Engineering Schematics: A Pedagogical, Introspective Multi-Agent Framework for Open-Domain Question Answering | [paper] | [code]
[2024/08/22] MuMA-ToM: Multi-modal Multi-Agent Theory of Mind | [paper] | [code]
[2024/08/21] DreamFactory: Pioneering Multi-Scene Long Video Generation with a Multi-Agent Framework | [paper] | [code]
[2024/08/16] The Fellowship of the LLMs: Multi-Agent Workflows for Synthetic Preference Optimization Dataset Generation | [paper] | [code]
[2024/08/15] MAG-SQL: Multi-Agent Generative Approach with Soft Schema Linking and Iterative Sub-SQL Refinement for Text-to-SQL | [paper] | [code]
[2024/08/15] Text2BIM: Generating Building Models Using a Large Language Model-based Multi-Agent Framework | [paper] | [code]
[2024/08/14] Development of a Large Language Model-based Multi-Agent Clinical Decision Support System for Korean Triage and Acuity Scale (KTAS)-Based Triage and Treatment Planning in Emergency Departments | [paper] | [code]
[2024/08/08] Can LLMs Beat Humans in Debating? A Dynamic Multi-agent Framework for Competitive Debate | [paper] | [code]
[2024/08/05] ReDel: A Toolkit for LLM-Powered Recursive Multi-Agent Systems | [paper] | [code]
[2024/08/05] Evaluating and Enhancing LLMs Agent based on Theory of Mind in Guandan: A Multi-Player Cooperative Game under Imperfect Information | [paper] | [code]
[2024/07/23] LawLuo: A Multi-Agent Collaborative Framework for Multi-Round Chinese Legal Consultation | [paper] | [code]
[2024/07/21] Multi-Agent Causal Discovery Using Large Language Models | [paper] | [code]
[2024/07/19] NeLLCom-X: A Comprehensive Neural-Agent Framework to Simulate Language Learning and Group Communication | [paper] | [code]
[2024/07/17] Towards Collaborative Intelligence: Propagating Intentions and Reasoning for Multi-Agent Coordination with Large Language Models | [paper] | [code]
[2024/07/16] InvAgent: A Large Language Model based Multi-Agent System for Inventory Management in Supply Chains | [paper] | [code]
[2024/07/13] Synergistic Multi-Agent Framework with Trajectory Learning for Knowledge-Intensive Tasks | [paper] | [code]
[2024/07/13] Cohesive Conversations: Enhancing Authenticity in Multi-Agent Simulated Dialogues | [paper] | [code]
[2024/07/10] Flooding Spread of Manipulated Knowledge in LLM-Based Multi-Agent Communities | [paper] | [code]
[2024/07/09] FinCon: A Synthesized LLM Multi-Agent System with Conceptual Verbal Reinforcement for Enhanced Financial Decision Making | [paper] | [code]
[2024/07/09] Internet of Agents: Weaving a Web of Heterogeneous Agents for Collaborative Intelligence | [paper] | [code]
[2024/07/04] Solving Zebra Puzzles Using Constraint-Guided Multi-Agent Systems | [paper] | [code]
[2024/07/03] MentalAgora: A Gateway to Advanced Personalized Care in Mental Health through Multi-Agent Debating and Attribute Control | [paper] | [code]
[2024/06/17] Improving Multi-Agent Debate with Sparse Communication Topology | [paper] | [code]
[2024/06/13] Multi-Agent Software Development through Cross-Team Collaboration | [paper] | [code]
[2024/06/11] CoEvol: Constructing Better Responses for Instruction Finetuning through Multi-Agent Cooperation | [paper] | [code]
[2024/06/07] Mixture-of-Agents Enhances Large Language Model Capabilities | [paper] | [code]
[2024/06/05] Towards Detecting LLMs Hallucination via Markov Chain-based Multi-agent Debate Framework | [paper] | [code]
[2024/06/04] Chain of Agents: Large Language Models Collaborating on Long-Context Tasks | [paper] | [code]
[2024/06/03] Mobile-Agent-v2: Mobile Device Operation Assistant with Effective Navigation via Multi-Agent Collaboration | [paper] | [code]
[2024/05/30] Safe Multi-agent Reinforcement Learning with Natural Language Constraints | [paper] | [code]
[2024/05/23] CityGPT: Towards Urban IoT Learning, Analysis and Interaction with Multi-Agent System | [paper] | [code]
[2024/05/20] (Perhaps) Beyond Human Translation: Harnessing Multi-Agent Collaboration for Translating Ultra-Long Literary Texts | [paper] | [code]
[2024/05/10] LLM Discussion: Enhancing the Creativity of Large Language Models via Discussion Framework and Role-Play | [paper] | [code]
[2024/05/07] Enhancing the Efficiency and Accuracy of Underlying Asset Reviews in Structured Finance: The Application of Multi-agent Framework | [paper] | [code]
[2024/05/06] Persona Inconstancy in Multi-Agent LLM Collaboration: Conformity, Confabulation, and Impersonation | [paper] | [code]
[2024/05/05] Language Evolution for Evading Social Media Regulation via LLM-based Multi-agent Simulation | [paper] | [code]
[2024/04/25] Cooperate or Collapse: Emergence of Sustainable Cooperation in a Society of LLM Agents | [paper] | [code]
[2024/04/23] ClinicalAgent: Clinical Trial Multi-Agent System with Large Language Model-based Reasoning | [paper] | [code]
[2024/04/14] Confidence Calibration and Rationalization for LLMs via Multi-Agent Deliberation | [paper] | [code]
[2024/04/12] Leveraging Multi-AI Agents for Cross-Domain Knowledge Discovery | [paper] | [code]
[2024/04/09] Foundation Models to the Rescue: Deadlock Resolution in Connected Multi-Robot Systems | [paper] | [code]
[2024/04/08] 360°REA: Towards A Reusable Experience Accumulation with 360° Assessment for Multi-Agent System | [paper] | [code]
[2024/04/06] MACM: Utilizing a Multi-Agent System for Condition Mining in Solving Complex Mathematical Problems | [paper] | [code]
[2024/04/02] Self-Organized Agents: A LLM Multi-Agent Framework toward Ultra Large-Scale Code Generation and Optimization | [paper] | [code]
[2024/04/02] CMAT: A Multi-Agent Collaboration Tuning Framework for Enhancing Small Language Models | [paper] | [code]
[2024/03/26] MAGIS: LLM-Based Multi-Agent Framework for GitHub Issue Resolution | [paper] | [code]
[2024/03/22] CACA Agent: Capability Collaboration based AI Agent | [paper] | [code]
[2024/03/21] Multi-Agent VQA: Exploring Multi-Agent Foundation Models in Zero-Shot Visual Question Answering | [paper] | [code]
[2024/03/19] Embodied LLM Agents Learn to Cooperate in Organized Teams | [paper] | [code]
[2024/03/12] Transforming Competition into Collaboration: The Revolutionary Role of Multi-Agent Systems and Language Models in Modern Organizations | [paper] | [code]
[2024/03/02] AutoDefense: Multi-Agent LLM Defense against Jailbreak Attacks | [paper] | [code]
[2024/02/28] Rethinking the Bounds of LLM Reasoning: Are Multi-Agent Discussions the Key? | [paper] | [code]
[2024/02/26] Chain-of-Discussion: A Multi-Model Framework for Complex Evidence-Based Question Answering | [paper] | [code]
[2024/02/26] LLMArena: Assessing Capabilities of Large Language Models in Dynamic Multi-Agent Environments | [paper] | [code]
[2024/02/21] LLM Based Multi-Agent Generation of Semi-structured Documents from Semantic Templates in the Public Administration Domain | [paper] | [code]
[2024/02/18] Benchmark Self-Evolving: A Multi-Agent Framework for Dynamic LLM Evaluation | [paper] | [code]
[2024/02/18] LongAgent: Scaling Language Models to 128k Context through Multi-Agent Collaboration | [paper] | [code]
[2024/02/15] TDAG: A Multi-Agent Framework based on Dynamic Task Decomposition and Agent Generation | [paper] | [code]
[2024/02/03] More Agents Is All You Need | [paper] | [code]
[2024/02/02] Reasoning Capacity in Multi-Agent Systems: Limitations, Challenges and Human-Centered Solutions | [paper] | [code]
[2024/02/02] A Multi-Agent Conversational Recommender System | [paper] | [code]
[2024/01/11] Combating Adversarial Attacks with Multi-Agent Debate | [paper] | [code]
[2024/01/08] MARG: Multi-Agent Review Generation for Scientific Papers | [paper] | [code]
[2024/01/08] SpeechAgents: Human-Communication Simulation with Multi-Modal Multi-Agent Systems | [paper] | [code]
[2024/01/08] Why Solving Multi-agent Path Finding with Large Language Model has not Succeeded Yet | [paper] | [code]
[2023/12/20] AgentCoder: Multi-Agent-based Code Generation with Iterative Testing and Optimisation | [paper] | [code]
[2023/10/31] Multi-Agent Consensus Seeking via Large Language Models | [paper] | [code]
[2023/10/25] MultiPrompter: Cooperative Prompt Optimization with Multi-Agent Reinforcement Learning | [paper] | [code]
[2023/08/22] ProAgent: Building Proactive Cooperative Agents with Large Language Models | [paper] | [code]
[2023/08/21] AgentVerse: Facilitating Multi-Agent Collaboration and Exploring Emergent Behaviors | [paper] | [code]
[2023/08/14] ChatEval: Towards Better LLM-based Evaluators through Multi-Agent Debate | [paper] | [code]
[2023/08/01] MetaGPT: Meta Programming for A Multi-Agent Collaborative Framework | [paper] | [code]
[2023/06/05] Multi-Agent Collaboration: Harnessing the Power of Intelligent LLM Agents | [paper] | [code]
[2023/05/31] Recursive Metropolis-Hastings Naming Game: Symbol Emergence in a Multi-agent System based on Probabilistic Generative Models | [paper] | [code]
[2023/05/30] Encouraging Divergent Thinking in Large Language Models through Multi-Agent Debate | [paper] | [code]
[2023/04/26] Multi-Party Chat: Conversational Agents in Group Settings with Humans and Models | [paper] | [code]
[2023/04/24] ChatLLM Network: More brains, More intelligence | [paper] | [code]

Stability

Safety

[2025/07/09] VisualTrap: A Stealthy Backdoor Attack on GUI Agents via Visual Grounding Manipulation | [paper] | [code]
[2025/07/04] LTLCrit: A Temporal Logic-based LLM Critic for Safe and Efficient Embodied Agents | [paper] | [code]
[2025/07/01] Enhancing LLM Agent Safety via Causal Influence Prompting | [paper] | [code]
[2025/07/01] GAF-Guard: An Agentic Framework for Risk Management and Governance in Large Language Models | [paper] | [code]
[2025/06/25] Model Editing as a Double-Edged Sword: Steering Agent Ethical Behavior Toward Beneficence or Harm | [paper] | [code]
[2025/06/11] Effective Red-Teaming of Policy-Adherent Agents | [paper] | [code]
[2025/06/11] Disclosure Audits for LLM Agents | [paper] | [code]
[2025/06/09] SAFEFLOW: A Principled Protocol for Trustworthy and Transactional Autonomous Agent Systems | [paper] | [code]
[2025/06/04] RedDebate: Safer Responses through Multi-Agent Red Teaming Debates | [paper] | [code]
[2025/06/01] Simple Prompt Injection Attacks Can Leak Personal Data Observed by LLM Agents During Task Execution | [paper] | [code]
[2025/05/29] AgentAlign: Navigating Safety Alignment in the Shift from Informative to Agentic Large Language Models | [paper] | [code]
[2025/05/28] RedTeamCUA: Realistic Adversarial Testing of Computer-Use Agents in Hybrid Web-OS Environments | [paper] | [code]
[2025/05/26] TrojanStego: Your Language Model Can Secretly Be A Steganographic Privacy Leaking Agent | [paper] | [code]
[2025/05/25] GUARDIAN: Safeguarding LLM Multi-Agent Collaborations with Temporal Graph Modeling | [paper] | [code]
[2025/05/18] IP Leakage Attacks Targeting LLM-Based Multi-Agent Systems | [paper] | [code]
[2025/05/16] EnvInjection: Environmental Prompt Injection Attack to Multi-modal Web Agents | [paper] | [code]
[2025/04/24] Assessing the Potential of Generative Agents in Crowdsourced Fact-Checking | [paper] | [code]
[2025/04/15] Towards Automated Safety Requirements Derivation Using Agent-based RAG | [paper] | [code]
[2025/03/26] sudo rm -rf agentic_security | [paper] | [code]
[2025/03/24] AgentSpec: Customizable Runtime Enforcement for Safe and Reliable LLM Agents | [paper] | [code]
[2025/03/06] SafeArena: Evaluating the Safety of Autonomous Web Agents | [paper] | [code]
[2025/02/20] CORBA: Contagious Recursive Blocking Attacks on Multi-Agent Systems Based on Large Language Models | [paper] | [code]
[2025/02/18] AEIA-MN: Evaluating the Robustness of Multimodal LLM-Powered Mobile Agents Against Active Environmental Injection Attacks | [paper] | [code]
[2025/02/17] "Nuclear Deployed!": Analyzing Catastrophic Risks in Decision-making of Autonomous LLM Agents | [paper] | [code]
[2025/02/01] ALU: Agentic LLM Unlearning | [paper] | [code]
[2025/01/28] Context is Key for Agent Security | [paper] | [code]
[2024/12/21] The Task Shield: Enforcing Task Alignment to Defend Against Indirect Prompt Injection in LLM Agents | [paper] | [code]
[2024/12/16] Seeker: Towards Exception Safety Code Generation with Intermediate Language Agents Framework | [paper] | [code]
[2024/12/09] The Fusion of Large Language Models and Formal Methods for Trustworthy AI Agents: A Roadmap | [paper] | [code]
[2024/11/08] Towards Low-Resource Harmful Meme Detection with LMM Agents | [paper] | [code]
[2024/11/06] MRJ-Agent: An Effective Jailbreak Agent for Multi-Round Dialogue | [paper] | [code]
[2024/11/04] Attacking Vision-Language Computer Agents via Pop-ups | [paper] | [code]
[2024/10/22] AdvWeb: Controllable Black-box Attacks on VLM-powered Web Agents | [paper] | [code]
[2024/10/18] Coherence-Driven Multimodal Safety Dialogue with Active Learning for Embodied Agents | [paper] | [code]
[2024/10/11] AgentHarm: A Benchmark for Measuring Harmfulness of LLM Agents | [paper] | [code]
[2024/10/09] I Want to Break Free! Persuasion and Anti-Social Behavior of LLMs in Multi-Agent Settings with Social Hierarchy | [paper] | [code]
[2024/09/28] SELP: Generating Safe and Efficient Task Plans for Robot Agents with Large Language Models | [paper] | [code]
[2024/09/17] EIA: Environmental Injection Attack on Generalist Web Agents for Privacy Leakage | [paper] | [code]
[2024/09/13] AI-LieDar: Examine the Trade-off Between Utility and Truthfulness in LLM Agents | [paper] | [code]
[2024/08/20] Athena: Safe Autonomous Agents with Verbal Contrastive Learning | [paper] | [code]
[2024/08/05] Caution for the Environment: Multimodal Agents are Susceptible to Environmental Distractions | [paper] | [code]
[2024/07/23] RedAgent: Red Teaming Large Language Models with Context-aware Autonomous Language Agent | [paper] | [code]
[2024/06/05] BadAgent: Inserting and Activating Backdoor Attacks in LLM Agents | [paper] | [code]
[2024/05/30] Safe Multi-agent Reinforcement Learning with Natural Language Constraints | [paper] | [code]
[2024/05/24] Hacc-Man: An Arcade Game for Jailbreaking LLMs | [paper] | [code]
[2024/03/02] AutoDefense: Multi-Agent LLM Defense against Jailbreak Attacks | [paper] | [code]
[2024/02/17] Watch Out for Your Agents! Investigating Backdoor Threats to LLM-Based Agents | [paper] | [code]
[2024/02/16] ToolSword: Unveiling Safety Issues of Large Language Models in Tool Learning Across Three Stages | [paper] | [code]
[2024/02/02] TrustAgent: Towards Safe and Trustworthy LLM-based Agents | [paper] | [code]
[2024/01/11] Combating Adversarial Attacks with Multi-Agent Debate | [paper] | [code]
[2023/11/17] Testing Language Model Agents Safely in the Wild | [paper] | [code]

Bias

[2025/05/27] Silence is Not Consensus: Disrupting Agreement Bias in Multi-Agent LLMs via Catfish Agent for Clinical Decision Making | [paper] | [code]
[2025/05/14] Language Agents Mirror Human Causal Reasoning Biases. How Can We Help Them Think Like Scientists? | [paper] | [code]
[2025/04/10] MALIBU Benchmark: Multi-Agent LLM Implicit Bias Uncovered | [paper] | [code]
[2025/03/27] Bias-Aware Agent: Enhancing Fairness in AI-Driven Knowledge Retrieval | [paper] | [code]
[2025/03/01] Structured Reasoning for Fairness: A Multi-Agent Approach to Bias Detection in Textual Data | [paper] | [code]
[2025/01/29] Actions Speak Louder than Words: Agent Decisions Reveal Implicit Biases in Language Models | [paper] | [code]
[2025/01/24] Unmasking Conversational Bias in AI Multiagent Systems | [paper] | [code]
[2024/12/20] Mitigating Social Bias in Large Language Models: A Multi-Objective Approach within a Multi-Agent Framework | [paper] | [code]
[2024/11/12] Mitigating Bias in Queer Representation within Large Language Models: A Collaborative Agent Approach | [paper] | [code]
[2024/10/06] MindScope: Exploring cognitive biases in large language models through Multi-Agent Systems | [paper] | [code]
[2024/10/03] Towards Implicit Bias Detection and Mitigation in Multi-Agent LLM Interactions | [paper] | [code]
[2024/05/23] ALI-Agent: Assessing LLMs' Alignment with Human Values via Agent-based Evaluation | [paper] | [code]
[2024/04/23] Aligning LLM Agents by Learning Latent Preference from User Edits | [paper] | [code]
[2024/02/19] Polarization of Autonomous Generative AI Agents Under Echo Chambers | [paper] | [code]
[2024/02/14] Towards better Human-Agent Alignment: Assessing Task Utility in LLM-Powered Applications | [paper] | [code]
[2024/01/09] Agent Alignment in Evolving Social Norms | [paper] | [code]

Hallucination

[2025/06/23] A Comment On "The Illusion of Thinking": Reframing the Reasoning Cliff as an Agentic Gap | [paper] | [code]
[2025/05/28] Position: Uncertainty Quantification Needs Reassessment for Large-language Model Agents | [paper] | [code]
[2025/03/14] Prompt Injection Detection and Mitigation via AI Multi-Agent NLP Frameworks | [paper] | [code]
[2025/03/14] RAG-KG-IL: A Multi-Agent Hybrid Framework for Reducing Hallucinations and Enhancing LLM Reasoning through RAG and Incremental Knowledge Graph Learning Integration | [paper] | [code]
[2025/03/01] EXCLAIM: An Explainable Cross-Modal Agentic System for Misinformation Detection with Hierarchical Retrieval | [paper] | [code]
[2025/02/26] Winning Big with Small Models: Knowledge Distillation vs. Self-Training for Reducing Hallucination in QA Agents | [paper] | [code]
[2025/02/14] Automated Hypothesis Validation with Agentic Sequential Falsifications | [paper] | [code]
[2025/02/04] Position: Stop Acting Like Language Model Agents Are Normal Agents | [paper] | [code]
[2025/02/03] SelfCheckAgent: Zero-Resource Hallucination Detection in Generative Large Language Models | [paper] | [code]
[2025/01/19] Hallucination Mitigation using Agentic AI Natural Language-Based Frameworks | [paper] | [code]
[2024/11/25] Enhancing Multi-Agent Consensus through Third-Party LLM Integration: Analyzing Uncertainty and Mitigating Hallucinations in Large Language Models | [paper] | [code]
[2024/11/12] SHARP: Unlocking Interactive Hallucination via Stance Transfer in Role-Playing Agents | [paper] | [code]
[2024/07/08] DebUnc: Mitigating Hallucinations in Large Language Model Agent Communication with Uncertainty Estimations | [paper] | [code]
[2024/06/29] BioKGBench: A Knowledge Graph Checking Benchmark of AI Agent for Biomedical Science | [paper] | [code]
[2024/06/17] Small Agent Can Also Rock! Empowering Small Language Models as Hallucination Detector | [paper] | [code]
[2024/06/05] Towards Detecting LLMs Hallucination via Markov Chain-based Multi-agent Debate Framework | [paper] | [code]
[2024/05/28] TimeChara: Evaluating Point-in-Time Character Hallucination of Role-Playing Large Language Models | [paper] | [code]
[2024/02/13] Agent Smith: A Single Image Can Jailbreak One Million Multimodal LLM Agents Exponentially Fast | [paper] | [code]

Infrastructure

Benchmark&Evaluation

[2025/07/08] ECom-Bench: Can LLM Agent Resolve Real-World E-commerce Customer Support Issues? | [paper] | [code]
[2025/07/07] Evaluating Memory in LLM Agents via Incremental Multi-Turn Interactions | [paper] | [code]
[2025/07/04] Recon, Answer, Verify: Agents in Search of Truth | [paper] | [code]
[2025/07/04] STRUCTSENSE: A Task-Agnostic Agentic Framework for Structured Information Extraction with Human-In-The-Loop Evaluation and Benchmarking | [paper] | [code]
[2025/07/01] TransLaw: Benchmarking Large Language Models in Multi-Agent Simulation of the Collaborative Translation | [paper] | [code]
[2025/06/27] Don't Trust Generative Agents to Mimic Communication on Social Networks Unless You Benchmarked their Empirical Realism | [paper] | [code]
[2025/06/27] RExBench: Can coding agents autonomously implement AI research extensions? | [paper] | [code]
[2025/06/26] Agent-RewardBench: Towards a Unified Benchmark for Reward Modeling across Perception, Planning, and Safety in Real-World Multimodal Agents | [paper] | [code]
[2025/06/25] The Decrypto Benchmark for Multi-Agent Reasoning and Theory of Mind | [paper] | [code]
[2025/06/20] MemBench: Towards More Comprehensive Evaluation on the Memory of LLM-based Agents | [paper] | [code]
[2025/06/20] Dissecting the SWE-Bench Leaderboards: Profiling Submitters and Architectures of LLM- and Agent-Based Repair Systems | [paper] | [code]
[2025/06/19] IS-Bench: Evaluating Interactive Safety of VLM-Driven Embodied Agents in Daily Household Tasks | [paper] | [code]
[2025/06/13] DeepResearch Bench: A Comprehensive Benchmark for Deep Research Agents | [paper] | [code]
[2025/06/13] The Behavior Gap: Evaluating Zero-shot LLM Agents in Complex Task-Oriented Dialogs | [paper] | [code]
[2025/06/11] Bench to the Future: A Pastcasting Benchmark for Forecasting Agents | [paper] | [code]
[2025/06/10] Atomic-to-Compositional Generalization for Mobile Agents with A New Benchmark and Scheduling System | [paper] | [code]
[2025/06/10] UTBoost: Rigorous Evaluation of Coding Agents on SWE-Bench | [paper] | [code]
[2025/06/09] EconWebArena: Benchmarking Autonomous Agents on Economic Tasks in Realistic Web Environments | [paper] | [code]
[2025/06/09] HeuriGym: An Agentic Benchmark for LLM-Crafted Heuristics in Combinatorial Optimization | [paper] | [code]
[2025/06/09] $\tau^2$-Bench: Evaluating Conversational Agents in a Dual-Control Environment | [paper] | [code]
[2025/06/05] Flex-TravelPlanner: A Benchmark for Flexible Planning with Language Agents | [paper] | [code]
[2025/06/04] AgentMisalignment: Measuring the Propensity for Misaligned Behaviour in LLM-Based Agents | [paper] | [code]
[2025/06/02] FormFactory: An Interactive Benchmarking Suite for Multimodal Form-Filling Agents | [paper] | [code]
[2025/06/02] WebChoreArena: Evaluating Web Browsing Agents on Realistic Tedious Web Tasks | [paper] | [code]
[2025/05/31] DefenderBench: A Toolkit for Evaluating Language Agents in Cybersecurity Environments | [paper] | [code]
[2025/05/30] Draw ALL Your Imagine: A Holistic Benchmark and Agent Framework for Complex Instruction-based Image Generation | [paper] | [code]
[2025/05/30] Agent-X: Evaluating Deep Multimodal Reasoning in Vision-Centric Agentic Tasks | [paper] | [code]
[2025/05/30] Open CaptchaWorld: A Comprehensive Web-based Platform for Testing and Benchmarking Multimodal LLM Agents | [paper] | [code]
[2025/05/29] GSO: Challenging Software Optimization Tasks for Evaluating SWE-Agents | [paper] | [code]
[2025/05/27] AutoJudger: An Agent-Driven Framework for Efficient Benchmarking of MLLMs | [paper] | [code]
[2025/05/26] ScienceBoard: Evaluating Multimodal Autonomous Agents in Realistic Scientific Workflows | [paper] | [code]
[2025/05/26] MLR-Bench: Evaluating AI Agents on Open-Ended Machine Learning Research | [paper] | [code]
[2025/05/26] On Path to Multimodal Historical Reasoning: HistBench and HistAgent | [paper] | [code]
[2025/05/24] CRMArena-Pro: Holistic Assessment of LLM Agents Across Diverse Business Scenarios and Interactions | [paper] | [code]
[2025/05/22] BioDSA-1K: Benchmarking Data Science Agents for Biomedical Research | [paper] | [code]
[2025/05/22] From EduVisBench to EduVisAgent: A Benchmark and Multi-Agent Framework for Reasoning-Driven Pedagogical Visualization | [paper] | [code]
[2025/05/22] AGENTIF: Benchmarking Instruction Following of Large Language Models in Agentic Scenarios | [paper] | [code]
[2025/05/21] X-WebAgentBench: A Multilingual Interactive Web Benchmark for Evaluating Global Agentic System | [paper] | [code]
[2025/05/21] BountyBench: Dollar Impact of AI Agent Attackers and Defenders on Real-World Cybersecurity Systems | [paper] | [code]
[2025/05/21] InfoDeepSeek: Benchmarking Agentic Information Seeking for Retrieval-Augmented Generation | [paper] | [code]
[2025/05/21] MAPS: A Multilingual Benchmark for Global Agent Performance and Security | [paper] | [code]
[2025/05/18] MedAgentBoard: Benchmarking Multi-Agent Collaboration with Conventional Methods for Diverse Medical Tasks | [paper] | [code]
[2025/05/17] Mobile-Bench-v2: A More Realistic and Comprehensive Benchmark for VLM-based Mobile Agents | [paper] | [code]
[2025/05/16] GuideBench: Benchmarking Domain-Oriented Guideline Following for LLM Agents | [paper] | [code]
[2025/05/16] REI-Bench: Can Embodied Agents Understand Vague Human Instructions in Task Planning? | [paper] | [code]
[2025/05/02] PIPA: A Unified Evaluation Protocol for Diagnosing Interactive Planning Agents | [paper] | [code]
[2025/04/25] Auto-SLURP: A Benchmark Dataset for Evaluating Multi-Agent Frameworks in Smart Personal Assistant | [paper] | [code]
[2025/04/24] Toward a Human-Centered Evaluation Framework for Trustworthy LLM-Powered GUI Agents | [paper] | [code]
[2025/04/21] PLANET: A Collection of Benchmarks for Evaluating LLMs' Planning Capabilities | [paper] | [code]
[2025/04/16] BrowseComp: A Simple Yet Challenging Benchmark for Browsing Agents | [paper] | [code]
[2025/04/15] GraphicBench: A Planning Benchmark for Graphic Design with Language Agents | [paper] | [code]
[2025/04/13] AgentA/B: Automated and Scalable Web A/BTesting with Interactive LLM Agents | [paper] | [code]
[2025/04/11] TP-RAG: Benchmarking Retrieval-Augmented Large Language Model Agents for Spatiotemporal-Aware Travel Planning | [paper] | [code]
[2025/04/11] AgentRewardBench: Evaluating Automatic Evaluations of Web Agent Trajectories | [paper] | [code]
[2025/04/10] MALIBU Benchmark: Multi-Agent LLM Implicit Bias Uncovered | [paper] | [code]
[2025/04/06] CO-Bench: Benchmarking Language Model Agents in Algorithm Search for Combinatorial Optimization | [paper] | [code]
[2025/04/04] How Social is It? A Benchmark for LLMs' Capabilities in Multi-user Multi-turn Social Agent Tasks | [paper] | [code]
[2025/03/31] SciReplicate-Bench: Benchmarking LLMs in Agent-driven Algorithmic Reproduction from Research Papers | [paper] | [code]
[2025/03/28] Evaluating LLM-based Agents for Multi-Turn Conversations: A Survey | [paper] | [code]
[2025/03/25] Writing as a testbed for open ended agents | [paper] | [code]
[2025/03/24] EconEvals: Benchmarks and Litmus Tests for LLM Agents in Unknown Environments | [paper] | [code]
[2025/03/20] Survey on Evaluation of LLM-based Agents | [paper] | [code]
[2025/03/16] VeriLA: A Human-Centered Evaluation Framework for Interpretable Verification of LLM Agent Failures | [paper] | [code]
[2025/03/11] AgentOrca: A Dual-System Framework to Evaluate Language Agents on Operational Routine and Constraint Adherence | [paper] | [code]
[2025/03/10] MedAgentsBench: Benchmarking Thinking Models and Agent Frameworks for Complex Medical Reasoning | [paper] | [code]
[2025/03/10] ProjectEval: A Benchmark for Programming Agents Automated Evaluation on Project-Level Code Generation | [paper] | [code]
[2025/03/10] RefactorBench: Evaluating Stateful Reasoning in Language Agents Through Code | [paper] | [code]
[2025/03/10] BEARCUBS: A benchmark for computer-using web agents | [paper] | [code]
[2025/03/08] DSGBench: A Diverse Strategic Game Benchmark for Evaluating LLM-based Agents in Complex Decision-Making Environments | [paper] | [code]
[2025/03/03] MultiAgentBench: Evaluating the Collaboration and Competition of LLM agents | [paper] | [code]
[2025/02/26] TheoremExplainAgent: Towards Multimodal Explanations for LLM Theorem Understanding | [paper] | [code]
[2025/02/25] RefuteBench 2.0 -- Agentic Benchmark for Dynamic Evaluation of LLM Responses to Refutation Instruction | [paper] | [code]
[2025/02/20] MLGym: A New Framework and Benchmark for Advancing AI Research Agents | [paper] | [code]
[2025/02/19] DataSciBench: An LLM Agent Benchmark for Data Science | [paper] | [code]
[2025/02/13] EmbodiedBench: Comprehensive Benchmarking Multi-modal Large Language Models for Vision-Driven Embodied Agents | [paper] | [code]
[2025/02/07] Evaluating Personality Traits in Large Language Models: Insights from Psychological Questionnaires | [paper] | [code]
[2025/02/06] Robotouille: An Asynchronous Planning Benchmark for LLM Agents | [paper] | [code]
[2025/02/01] Who's the MVP? A Game-Theoretic Evaluation Benchmark for Modular Attribution in LLM Agents | [paper] | [code]
[2025/01/21] EmbodiedEval: Evaluate Multimodal LLMs as Embodied Agents | [paper] | [code]
[2024/12/23] LegalAgentBench: Evaluating LLM Agents in Legal Domain | [paper] | [code]
[2024/12/19] Agent-SafetyBench: Evaluating the Safety of LLM Agents | [paper] | [code]
[2024/12/18] TheAgentCompany: Benchmarking LLM Agents on Consequential Real World Tasks | [paper] | [code]
[2024/12/18] ChinaTravel: A Real-World Benchmark for Language Agents in Chinese Travel Planning | [paper] | [code]
[2024/12/06] TeamCraft: A Benchmark for Multi-Modal Multi-Agent Systems in Minecraft | [paper] | [code]
[2024/12/02] Medchain: Bridging the Gap Between LLM Agents and Clinical Practice through Interactive Sequential Benchmarking | [paper] | [code]
[2024/11/05] Benchmarking Multimodal Retrieval Augmented Generation with Dynamic VQA Dataset and Self-adaptive Planning Agent | [paper] | [code]
[2024/10/28] Can Machines Think Like Humans? A Behavioral Evaluation of LLM-Agents in Dictator Games | [paper] | [code]
[2024/10/25] AgentSense: Benchmarking Social Intelligence of Language Agents through Interactive Scenarios | [paper] | [code]
[2024/10/25] AGENT-CQ: Automatic Generation and Evaluation of Clarifying Questions for Conversational Search with LLMs | [paper] | [code]
[2024/10/23] MobileSafetyBench: Evaluating Safety of Autonomous Agents in Mobile Device Control | [paper] | [code]
[2024/10/16] Proactive Agent: Shifting LLM Agents from Reactive Responses to Active Assistance | [paper] | [code]
[2024/10/15] Revisiting Benchmark and Assessment: An Agent-based Exploratory Dynamic Evaluation Framework for LLMs | [paper] | [code]
[2024/10/11] JAILJUDGE: A Comprehensive Jailbreak Judge Benchmark with Multi-Agent Enhanced Explanation Evaluation Framework | [paper] | [code]
[2024/10/11] AgentHarm: A Benchmark for Measuring Harmfulness of LLM Agents | [paper] | [code]
[2024/10/10] Benchmarking Agentic Workflow Generation | [paper] | [code]
[2024/10/09] MLE-bench: Evaluating Machine Learning Agents on Machine Learning Engineering | [paper] | [code]
[2024/10/09] Embodied Agent Interface: Benchmarking LLMs for Embodied Decision Making | [paper] | [code]
[2024/10/09] DA-Code: Agent Data Science Code Generation Benchmark for Large Language Models | [paper] | [code]
[2024/10/07] Adversarial Multi-Agent Evaluation of Large Language Models through Iterative Debates | [paper] | [code]
[2024/10/07] ScienceAgentBench: Toward Rigorous Assessment of Language Agents for Data-Driven Scientific Discovery | [paper] | [code]
[2024/09/23] Towards a Realistic Long-Term Benchmark for Open-Web Research Agents | [paper] | [code]
[2024/09/17] CORE-Bench: Fostering the Credibility of Published Research Through a Computational Reproducibility Agent Benchmark | [paper] | [code]
[2024/09/12] DSBench: How Far Are Data Science Agents to Becoming Data Science Experts? | [paper] | [code]
[2024/09/11] SUPER: Evaluating Agents on Setting Up and Executing Tasks from Research Repositories | [paper] | [code]
[2024/09/02] ComfyBench: Benchmarking LLM-based Agents in ComfyUI for Autonomously Designing Collaborative AI Systems | [paper] | [code]
[2024/08/28] BattleAgentBench: A Benchmark for Evaluating Cooperation and Competition Capabilities of Language Models in Multi-Agent Systems | [paper] | [code]
[2024/08/19] BLADE: Benchmarking Language Model Agents for Data-Driven Science | [paper] | [code]
[2024/08/13] What should I wear to a party in a Greek taverna? Evaluation for Conversational Agents in the Fashion Domain | [paper] | [code]
[2024/08/12] VisualAgentBench: Towards Large Multimodal Models as Visual Foundation Agents | [paper] | [code]
[2024/07/26] OfficeBench: Benchmarking Language Agents across Multiple Applications for Office Automation | [paper] | [code]
[2024/07/26] AppWorld: A Controllable World of Apps and People for Benchmarking Interactive Coding Agents | [paper] | [code]
[2024/07/25] PersonaGym: Evaluating Persona Agents and LLMs | [paper] | [code]
[2024/07/23] AMONGAGENTS: Evaluating Large Language Models in the Interactive Text-Based Social Deduction Game | [paper] | [code]
[2024/07/22] AssistantBench: Can Web Agents Solve Realistic and Time-Consuming Tasks? | [paper] | [code]
[2024/07/12] IDAT: A Multi-Modal Dataset and Toolkit for Building and Evaluating Interactive Task-Solving Agents | [paper] | [code]
[2024/07/11] GTA: A Benchmark for General Tool Agents | [paper] | [code]
[2024/07/05] Towards Automated Functional Equation Proving: A Benchmark Dataset and A Domain-Specific In-Context Agent | [paper] | [code]
[2024/07/01] MIRAI: Evaluating LLM Agents for Event Forecasting | [paper] | [code]
[2024/07/01] ProductAgent: Benchmarking Conversational Product Search Agent with Asking Clarification Questions | [paper] | [code]
[2024/07/01] Mobile-Bench: An Evaluation Benchmark for LLM-based Mobile Agents | [paper] | [code]
[2024/06/28] Designing and Evaluating Multi-Chatbot Interface for Human-AI Communication: Preliminary Findings from a Persuasion Task | [paper] | [code]
[2024/06/13] ResearchArena: Benchmarking Large Language Models' Ability to Collect and Organize Information as Research Agents | [paper] | [code]
[2024/06/13] StreamBench: Towards Benchmarking Continuous Improvement of Language Agents | [paper] | [code]
[2024/06/07] WildBench: Benchmarking LLMs with Challenging Tasks from Real Users in the Wild | [paper] | [code]
[2024/06/07] GameBench: Evaluating Strategic Reasoning Abilities of LLM Agents | [paper] | [code]
[2024/05/28] TimeChara: Evaluating Point-in-Time Character Hallucination of Role-Playing Large Language Models | [paper] | [code]
[2024/05/23] AndroidWorld: A Dynamic Benchmarking Environment for Autonomous Agents | [paper] | [code]
[2024/05/13] AgentClinic: a multimodal agent benchmark to evaluate AI in simulated clinical environments | [paper] | [code]
[2024/05/01] WorkBench: a Benchmark Dataset for Agents in a Realistic Workplace Setting | [paper] | [code]
[2024/04/23] Evaluating Tool-Augmented Agents in Remote Sensing Platforms | [paper] | [code]
[2024/04/22] How Well Can LLMs Echo Us? Evaluating AI Chatbots' Role-Play Ability with ECHO | [paper] | [code]
[2024/04/15] MMInA: Benchmarking Multihop Multimodal Internet Agents | [paper] | [code]
[2024/04/11] OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments | [paper] | [code]
[2024/04/09] AgentQuest: A Modular Benchmark Framework to Measure Progress and Improve LLM Agents | [paper] | [code]
[2024/04/05] GroundCocoa: A Benchmark for Evaluating Compositional & Conditional Reasoning in Language Models | [paper] | [code]
[2024/03/29] DataAgent: Evaluating Large Language Models' Ability to Answer Zero-Shot, Natural Language Queries | [paper] | [code]
[2024/03/26] Sharing the Cost of Success: A Game for Evaluating and Learning Collaborative Multi-Agent Instruction Giving and Following Policies | [paper] | [code]
[2024/03/20] SocialBench: Sociality Evaluation of Role-Playing Conversational Agents | [paper] | [code]
[2024/03/18] How Far Are We on the Decision-Making of LLMs? Evaluating LLMs' Gaming Ability in Multi-Agent Environments | [paper] | [code]
[2024/03/18] Tur[k]ingBench: A Challenge Benchmark for Web Agents | [paper] | [code]
[2024/03/13] Evaluating Large Language Models as Generative User Simulators for Conversational Recommendation | [paper] | [code]
[2024/03/05] InjecAgent: Benchmarking Indirect Prompt Injections in Tool-Integrated Large Language Model Agents | [paper] | [code]
[2024/02/27] Evaluating Very Long-Term Conversational Memory of LLM Agents | [paper] | [code]
[2024/02/27] Benchmarking Data Science Agents | [paper] | [code]
[2024/02/19] A Critical Evaluation of AI Feedback for Aligning Large Language Models | [paper] | [code]
[2024/02/18] Benchmark Self-Evolving: A Multi-Agent Framework for Dynamic LLM Evaluation | [paper] | [code]
[2024/02/18] MatPlotAgent: Method and Evaluation for LLM-Based Agentic Scientific Data Visualization | [paper] | [code]
[2024/02/05] LLM Agents in Interaction: Measuring Personality Consistency and Linguistic Alignment in Interacting Populations of Large Language Models | [paper] | [code]
[2024/02/02] TravelPlanner: A Benchmark for Real-World Planning with Language Agents | [paper] | [code]
[2024/01/02] CharacterEval: A Chinese Benchmark for Role-Playing Conversational Agent Evaluation | [paper] | [code]
[2023/12/28] How Far Are LLMs from Believable AI? A Benchmark for Evaluating the Believability of Human Behavior Simulation | [paper] | [code]
[2023/12/26] RoleEval: A Bilingual Role Evaluation Benchmark for Large Language Models | [paper] | [code]
[2023/11/16] ML-Bench: Evaluating Large Language Models and Agents for Machine Learning Tasks on Repository-Level Code | [paper] | [code]
[2023/11/15] ToolTalk: Evaluating Tool-Usage in a Conversational Setting | [paper] | [code]
[2023/10/24] FANToM: A Benchmark for Stress-testing Machine Theory of Mind in Interactions | [paper] | [code]
[2023/10/09] Put Your Money Where Your Mouth Is: Evaluating Strategic Planning and Execution of LLM Agents in an Auction Arena | [paper] | [code]
[2023/10/02] SmartPlay: A Benchmark for LLMs as Intelligent Agents | [paper] | [code]
[2023/10/01] RoleLLM: Benchmarking, Eliciting, and Enhancing Role-Playing Abilities of Large Language Models | [paper] | [code]
[2023/08/11] BOLAA: Benchmarking and Orchestrating LLM-augmented Autonomous Agents | [paper] | [code]
[2023/08/07] AgentBench: Evaluating LLMs as Agents | [paper] | [code]
[2023/04/27] ChatLog: Carefully Evaluating the Evolution of ChatGPT Across Time | [paper] | [code]

Environment&Platform

[2025/05/30] Open CaptchaWorld: A Comprehensive Web-based Platform for Testing and Benchmarking Multimodal LLM Agents | [paper] | [code]
[2025/05/22] Beyond Static Testbeds: An Interaction-Centric Agent Simulation Platform for Dynamic Recommender Systems | [paper] | [code]
[2025/05/22] MASLab: A Unified and Comprehensive Codebase for LLM-based Multi-Agent Systems | [paper] | [code]
[2025/04/15] TextArena | [paper] | [code]
[2025/03/14] Cerebrum (AIOS SDK): A Platform for Agent Development, Deployment, Distribution, and Discovery | [paper] | [code]
[2025/03/06] Factorio Learning Environment | [paper] | [code]
[2025/03/05] Unified Mind Model: Reimagining Autonomous Agents in the LLM Era | [paper] | [code]
[2025/03/04] LiteWebAgent: The Open-Source Suite for VLM-Based Web-Agent Applications | [paper] | [code]
[2025/02/14] The Ann Arbor Architecture for Agent-Oriented Programming | [paper] | [code]
[2024/12/30] Training Software Engineering Agents and Verifiers with SWE-Gym | [paper] | [code]
[2024/11/05] SAUCE: Synchronous and Asynchronous User-Customizable Environment for Multi-Agent LLM Interaction | [paper] | [code]
[2024/08/09] AutoGen Studio: A No-Code Developer Tool for Building and Debugging Multi-Agent Systems | [paper] | [code]
[2024/08/06] OpenOmni: A Collaborative Open Source Tool for Building Future-Ready Multimodal Conversational Agents | [paper] | [code]
[2024/07/23] OpenHands: An Open Platform for AI Software Developers as Generalist Agents | [paper] | [code]
[2024/07/14] AutoGRAMS: Autonomous Graphical Agent Modeling Software | [paper] | [code]
[2024/07/12] IDAT: A Multi-Modal Dataset and Toolkit for Building and Evaluating Interactive Task-Solving Agents | [paper] | [code]
[2024/07/08] Coding Reliable LLM-based Integrated Task and Knowledge Agents with GenieWorksheets | [paper] | [code]
[2024/06/06] AgentGym: Evolving Large Language Model-based Agents across Diverse Environments | [paper] | [code]
[2024/05/23] AndroidWorld: A Dynamic Benchmarking Environment for Autonomous Agents | [paper] | [code]
[2024/02/27] OmniACT: A Dataset and Benchmark for Enabling Multimodal Generalist Autonomous Agents for Desktop and Web | [paper] | [code]
[2023/03/14] CB2: Collaborative Natural Language Interaction Research Platform | [paper] | [code]

Dataset

[2025/07/10] Toward Real-World Chinese Psychological Support Dialogues: CPsDD Dataset and a Co-Evolving Multi-Agent System | [paper] | [code]
[2025/06/26] AgentStealth: Reinforcing Large Language Model for Anonymizing User-generated Text | [paper] | [code]
[2025/06/25] MAGPIE: A dataset for Multi-AGent contextual PrIvacy Evaluation | [paper] | [code]
[2025/06/11] ReasonMed: A 370K Multi-Agent Generated Dataset for Advancing Medical Reasoning | [paper] | [code]
[2025/06/02] STORM-BORN: A Challenging Mathematical Derivations Dataset Curated via a Human-in-the-Loop Multi-Agent Framework | [paper] | [code]
[2025/05/27] Towards Safety Reasoning in LLMs: AI-agentic Deliberation for Policy-embedded CoT Data Creation | [paper] | [code]
[2025/05/19] Scalable Video-to-Dataset Generation for Cross-Platform Mobile Agents | [paper] | [code]
[2025/02/09] MTPChat: A Multimodal Time-Aware Persona Dataset for Conversational Agents | [paper] | [code]
[2025/02/09] HamRaz: A Culture-Based Persian Conversation Dataset for Person-Centered Therapy Using LLM Agents | [paper] | [code]
[2025/01/23] Hypothesis Generation for Materials Discovery and Design Using Goal-Driven and Constraint-Guided LLM Agents | [paper] | [code]
[2025/01/14] Agent-Centric Projection of Prompting Techniques and Implications for Synthetic Training Data for Large Language Models | [paper] | [code]
[2024/12/30] Plancraft: an evaluation dataset for planning with LLM agents | [paper] | [code]
[2024/12/28] BaiJia: A Large-Scale Role-Playing Agent Corpus of Chinese Historical Characters | [paper] | [code]
[2024/12/24] Explainable Multi-Modal Data Exploration in Natural Language via LLM Agent | [paper] | [code]
[2024/12/06] CALICO: Conversational Agent Localization via Synthetic Data Generation | [paper] | [code]
[2024/11/28] MAG-V: A Multi-Agent Framework for Synthetic Data Generation and Verification | [paper] | [code]
[2024/11/21] Star-Agents: Automatic Data Optimization with LLM Agents for Instruction Tuning | [paper] | [code]
[2024/10/18] Synthesizing Post-Training Data for LLMs through Multi-Agent Simulation | [paper] | [code]
[2024/10/10] AgentBank: Towards Generalized LLM Agents via Fine-Tuning on 50000+ Interaction Trajectories | [paper] | [code]
[2024/09/06] Using Large Language Models to Generate Authentic Multi-agent Knowledge Work Datasets | [paper] | [code]
[2024/08/22] MDD-5k: A New Diagnostic Conversation Dataset for Mental Disorders Synthesized via Neuro-Symbolic LLM Agents | [paper] | [code]
[2024/08/16] The Fellowship of the LLMs: Multi-Agent Workflows for Synthetic Preference Optimization Dataset Generation | [paper] | [code]
[2024/07/12] IDAT: A Multi-Modal Dataset and Toolkit for Building and Evaluating Interactive Task-Solving Agents | [paper] | [code]
[2024/06/16] GUI-WORLD: A Dataset for GUI-oriented Multimodal LLM-based Agents | [paper] | [code]
[2024/03/19] Agent-FLAN: Designing Data and Methods of Effective Agent Tuning for Large Language Models | [paper] | [code]
[2024/02/27] OmniACT: A Dataset and Benchmark for Enabling Multimodal Generalist Autonomous Agents for Desktop and Web | [paper] | [code]
[2023/07/31] HAGRID: A Human-LLM Collaborative Dataset for Generative Information-Seeking with Attribution | [paper] | [code]

Others

[2025/07/04] Agent-Based Detection and Resolution of Incompleteness and Ambiguity in Interactions with Large Language Models | [paper] | [code]
[2025/07/02] Data Agent: A Holistic Architecture for Orchestrating Data+AI Ecosystems | [paper] | [code]
[2025/06/30] LLM Agents Are the Antidote to Walled Gardens | [paper] | [code]
[2025/06/20] UProp: Investigating the Uncertainty Propagation of LLMs in Multi-Step Agentic Decision-Making | [paper] | [code]
[2025/06/10] TACTIC: Translation Agents with Cognitive-Theoretic Interactive Collaboration | [paper] | [code]
[2025/06/06] Future of Work with AI Agents: Auditing Automation and Augmentation Potential across the U.S. Workforce | [paper] | [code]
[2025/06/02] Enhancing Interpretable Image Classification Through LLM Agents and Conditional Concept Bottleneck Models | [paper] | [code]
[2025/05/23] Distilling LLM Agent into Small Models with Retrieval and Code Tools | [paper] | [code]
[2025/05/23] Runaway is Ashamed, But Helpful: On the Early-Exit Behavior of Large Language Model-based Agents in Embodied Environments | [paper] | [code]
[2025/05/23] The Real Barrier to LLM Agent Usability is Agentic ROI | [paper] | [code]
[2025/05/20] Structured Agent Distillation for Large Language Model | [paper] | [code]
[2025/05/20] Agent Context Protocols Enhance Collective Inference | [paper] | [code]
[2025/05/15] Learning Virtual Machine Scheduling in Cloud Computing through Language Agents | [paper] | [code]
[2025/05/04] Interpretable Emergent Language Using Inter-Agent Transformers | [paper] | [code]
[2025/05/02] VTS-LLM: Domain-Adaptive LLM Agent for Enhancing Awareness in Vessel Traffic Services through Natural Language | [paper] | [code]
[2025/05/01] Self-Generated In-Context Examples Improve LLM Agents for Sequential Decision-Making Tasks | [paper] | [code]
[2025/04/23] OptimAI: Optimization from Natural Language Using LLM-Powered AI Agents | [paper] | [code]
[2025/04/04] Agentic Knowledgeable Self-awareness | [paper] | [code]
[2025/04/04] Inherent and emergent liability issues in LLM-based agentic systems: a principal-agent perspective | [paper] | [code]
[2025/04/02] Review, Refine, Repeat: Understanding Iterative Decoding of AI Agents with Dynamic Evaluation and Selection | [paper] | [code]
[2025/03/14] GNNs as Predictors of Agentic Workflow Performances | [paper] | [code]
[2025/03/14] CoLLMLight: Cooperative Large Language Model Agents for Network-Wide Traffic Signal Control | [paper] | [code]
[2025/03/14] Agent-Enhanced Large Language Models for Researching Political Institutions | [paper] | [code]
[2025/03/14] LLM Agents for Education: Advances and Applications | [paper] | [code]
[2025/02/20] Optimizing Model Selection for Compound AI Systems | [paper] | [code]
[2024/12/03] Large Multimodal Agents for Accurate Phishing Detection with Enhanced Token Optimization and Cost Reduction | [paper] | [code]
[2024/03/18] EnvGen: Generating and Adapting Environments via LLMs for Training Embodied Agents | [paper] | [code]


