
AGI-Edgerunners/LLM-Agents-Papers


LLM-Agents-Papers

✍️ Description

Last updated: 2025/02/23

This repository lists papers related to LLM-based agents.

💛 Recommendation

For more comprehensive reading, we also recommend other paper lists:

📰 Papers

Survey

  • [2025/02/20] Beyond Self-Talk: A Communication-Centric Survey of LLM-Based Multi-Agent Systems | [paper] | [code]

  • [2025/02/18] Towards a Design Guideline for RPA Evaluation: A Survey of Large Language Model-Based Role-Playing Agents | [paper] | [code]

  • [2025/02/16] A Survey of LLM-based Agents in Medicine: How far are we from Baymax? | [paper] | [code]

  • [2025/01/15] Agentic Retrieval-Augmented Generation: A Survey on Agentic RAG | [paper] | [code]

  • [2024/12/23] A Survey on LLM-based Multi-Agent System: Recent Advances and New Frontiers in Application | [paper] | [code]

  • [2024/12/18] A Survey on Large Language Model-based Agents for Statistics and Data Science | [paper] | [code]

  • [2024/12/05] A Survey on Large Language Model-Based Social Agents in Game-Theoretic Scenarios | [paper] | [code]

  • [2024/12/04] From Individual to Society: A Survey on Social Simulation Driven by Large Language Model-based Agents | [paper] | [code]

  • [2024/11/27] Large Language Model-Brained GUI Agents: A Survey | [paper] | [code]

  • [2024/09/27] A Survey on Complex Tasks for Goal-Directed Interactive Agents | [paper] | [code]

  • [2024/09/13] Agents in Software Engineering: Survey, Landscape, and Vision | [paper] | [code]

  • [2024/09/04] A Survey on Emergent Language | [paper] | [code]

  • [2024/08/05] From LLMs to LLM-based Agents for Software Engineering: A Survey of Current, Challenges and Future | [paper] | [code]

  • [2024/07/26] Large Language Model Agent in Financial Trading: A Survey | [paper] | [code]

  • [2024/06/03] Two Tales of Persona in LLMs: A Survey of Role-Playing and Personalization | [paper] | [code]

  • [2024/06/01] Towards Rationality in Language and Multimodal Agents: A Survey | [paper] | [code]

  • [2024/04/17] Advancing Social Intelligence in AI Agents: Technical Challenges and Open Questions | [paper] | [code]

  • [2024/04/02] A Survey on Large Language Model-Based Game Agents | [paper] | [code]

  • [2024/03/26] Leveraging Large Language Models in Human-Robot Interaction: A Critical Analysis of Potential and Pitfalls | [paper] | [code]

  • [2024/03/07] Promising and worth-to-try future directions for advancing state-of-the-art surrogates methods of agent-based models in social and health computational sciences | [paper] | [code]

  • [2024/02/28] Large Language Models and Games: A Survey and Roadmap | [paper] | [code]

  • [2024/02/28] A Survey on Recent Advances in LLM-Based Multi-turn Dialogue Systems | [paper] | [code]

  • [2024/02/05] Understanding the planning of LLM agents: A survey | [paper] | [code]

  • [2024/01/01] If LLM Is the Wizard, Then Code Is the Wand: A Survey on How Code Empowers Large Language Models to Serve as Intelligent Agents | [paper] | [code]

  • [2023/12/31] A Survey of Personality, Persona, and Profile in Conversational Agents and Chatbots | [paper] | [code]

  • [2023/12/19] Large Language Models Empowered Agent-based Modeling and Simulation: A Survey and Perspectives | [paper] | [code]

  • [2023/09/14] The Rise and Potential of Large Language Model Based Agents: A Survey | [paper] | [code]

  • [2023/08/22] A Survey on Large Language Model based Autonomous Agents | [paper] | [code]

  • [2023/06/27] Next Steps for Human-Centered Generative AI: A Technical Perspective | [paper] | [code]


Technique For Enhancement

Planning

  • [2025/02/08] CODESIM: Multi-Agent Code Generation and Problem Solving through Simulation-Driven Planning and Debugging | [paper] | [code]

  • [2025/02/06] Robotouille: An Asynchronous Planning Benchmark for LLM Agents | [paper] | [code]

  • [2025/01/27] MADP: Multi-Agent Deductive Planning for Enhanced Cognitive-Behavioral Mental Health Question Answer | [paper] | [code]

  • [2025/01/14] Talk to Right Specialists: Routing and Planning in Multi-agent System for Question Answering | [paper] | [code]

  • [2024/12/30] Plancraft: an evaluation dataset for planning with LLM agents | [paper] | [code]

  • [2024/12/28] Efficient Multi-Agent Collaboration with Tool Use for Online Planning in Complex Table Question Answering | [paper] | [code]

  • [2024/12/13] Script-Based Dialog Policy Planning for LLM-Powered Conversational Agents: A Basic Architecture for an "AI Therapist" | [paper] | [code]

  • [2024/11/13] One STEP at a time: Language Agents are Stepwise Planners | [paper] | [code]

  • [2024/11/05] Benchmarking Multimodal Retrieval Augmented Generation with Dynamic VQA Dataset and Self-adaptive Planning Agent | [paper] | [code]

  • [2024/10/12] CAMPHOR: Collaborative Agents for Multi-input Planning and High-Order Reasoning On Device | [paper] | [code]

  • [2024/10/01] Self-controller: Controlling LLMs with Multi-round Step-by-step Self-awareness | [paper] | [code]

  • [2024/09/30] Interactive Speculative Planning: Enhance Agent Efficiency through Co-design of System and User Interface | [paper] | [code]

  • [2024/09/28] SELP: Generating Safe and Efficient Task Plans for Robot Agents with Large Language Models | [paper] | [code]

  • [2024/09/25] MSI-Agent: Incorporating Multi-Scale Insight into Embodied Agents for Superior Planning and Decision-Making | [paper] | [code]

  • [2024/08/15] VerilogCoder: Autonomous Verilog Coding Agents with Graph-based Planning and Abstract Syntax Tree (AST)-based Waveform Tracing Tool | [paper] | [code]

  • [2024/08/12] Towards Autonomous Agents: Adaptive-planning, Reasoning, and Acting in Language Models | [paper] | [code]

  • [2024/08/01] AgentGen: Enhancing Planning Abilities for Large Language Model based Agent via Environment and Task Generation | [paper] | [code]

  • [2024/07/04] Controllable Conversations: Planning-Based Dialogue Agent with Large Language Models | [paper] | [code]

  • [2024/06/17] RePrompt: Planning by Automatic Prompt Engineering for Large Language Models Agents | [paper] | [code]

  • [2024/06/09] A Review of Prominent Paradigms for LLM-Based Agents: Tool Use (Including RAG), Planning, and Feedback Learning | [paper] | [code]

  • [2024/06/06] Tool-Planner: Task Planning with Clusters across Multiple Tools | [paper] | [code]

  • [2024/05/28] A Human-Like Reasoning Framework for Multi-Phases Planning Task with Large Language Models | [paper] | [code]

  • [2024/05/27] REVECA: Adaptive Planning and Trajectory-based Validation in Cooperative Language Agents using Information Relevance and Relative Proximity | [paper] | [code]

  • [2024/04/21] Socratic Planner: Inquiry-Based Zero-Shot Planning for Embodied Instruction Following | [paper] | [code]

  • [2024/04/17] The Landscape of Emerging AI Agent Architectures for Reasoning, Planning, and Tool Calling: A Survey | [paper] | [code]

  • [2024/03/11] Strength Lies in Differences! Improving Strategy Planning for Non-collaborative Dialogues via Diversified User Simulation | [paper] | [code]

  • [2024/03/10] TRAD: Enhancing LLM Agents with Step-Wise Thought Retrieval and Aligned Decision | [paper] | [code]

  • [2024/03/05] KnowAgent: Knowledge-Augmented Planning for LLM-Based Agents | [paper] | [code]

  • [2024/02/29] PlanGPT: Enhancing Urban Planning with Tailored Language Model and Efficient Retrieval | [paper] | [code]

  • [2024/02/18] What's the Plan? Evaluating and Developing Planning-Aware Techniques for Language Models | [paper] | [code]

  • [2024/02/18] PreAct: Prediction Enhances Agent's Planning Ability | [paper] | [code]

  • [2024/02/16] When is Tree Search Useful for LLM Planning? It Depends on the Discriminator | [paper] | [code]

  • [2024/02/15] TDAG: A Multi-Agent Framework based on Dynamic Task Decomposition and Agent Generation | [paper] | [code]

  • [2024/02/09] Introspective Planning: Aligning Robots' Uncertainty with Inherent Task Ambiguity | [paper] | [code]

  • [2024/02/06] RAP: Retrieval-Augmented Planning with Contextual Memory for Multimodal LLM Agents | [paper] | [code]

  • [2024/02/02] TravelPlanner: A Benchmark for Real-World Planning with Language Agents | [paper] | [code]

  • [2024/01/10] AutoAct: Automatic Agent Learning from Scratch for QA via Self-Planning | [paper] | [code]

  • [2023/11/19] TPTU-v2: Boosting Task Planning and Tool Usage of Large Language Model-based Agents in Real-world Systems | [paper] | [code]

  • [2023/10/12] Tree-Planner: Efficient Close-loop Task Planning with Large Language Models | [paper] | [code]

  • [2023/10/09] Put Your Money Where Your Mouth Is: Evaluating Strategic Planning and Execution of LLM Agents in an Auction Arena | [paper] | [code]

  • [2023/08/07] TPTU: Large Language Model-based AI Agents for Task Planning and Tool Usage | [paper] | [code]

  • [2023/08/01] SelfCheck: Using LLMs to Zero-Shot Check Their Own Step-by-Step Reasoning | [paper] | [code]

  • [2023/05/26] AdaPlanner: Adaptive Planning from Feedback with Language Models | [paper] | [code]

  • [2023/05/24] Reasoning with Language Model is Planning with World Model | [paper] | [code]

  • [2023/05/24] Leveraging Pre-trained Large Language Models to Construct and Utilize World Models for Model-based Task Planning | [paper] | [code]

  • [2023/03/29] Skill Reinforcement Learning and Planning for Open-World Long-Horizon Tasks | [paper] | [code]

  • [2023/02/03] Describe, Explain, Plan and Select: Interactive Planning with Large Language Models Enables Open-World Multi-Task Agents | [paper] | [code]

  • [2022/12/08] LLM-Planner: Few-Shot Grounded Planning for Embodied Agents with Large Language Models | [paper] | [code]

Memory Mechanism

  • [2025/02/17] A-MEM: Agentic Memory for LLM Agents | [paper] | [code]

  • [2025/02/08] On Memory Construction and Retrieval for Personalized Conversational Agents | [paper] | [code]

  • [2025/01/20] Zep: A Temporal Knowledge Graph Architecture for Agent Memory | [paper] | [code]

  • [2025/01/15] Doc-Guided Sent2Sent++: A Sent2Sent++ Agent with Doc-Guided memory for Document-level Machine Translation | [paper] | [code]

  • [2024/12/17] On the Structural Memory of LLM Agents | [paper] | [code]

  • [2024/12/17] Memory-Augmented Agent Training for Business Document Understanding | [paper] | [code]

  • [2024/10/10] DelTA: An Online Document-Level Translation Agent Based on Multi-Level Memory | [paper] | [code]

  • [2024/09/28] Crafting Personalized Agents through Retrieval-Augmented Generation on Editable Memory Graphs | [paper] | [code]

  • [2024/09/11] Agent Workflow Memory | [paper] | [code]

  • [2024/09/01] Self-evolving Agents with reflective and memory-augmented abilities | [paper] | [code]

  • [2024/08/18] HiAgent: Hierarchical Working Memory Management for Solving Long-Horizon Agent Tasks with Large Language Model | [paper] | [code]

  • [2024/08/07] Optimus-1: Hybrid Multimodal Memory Empowered Agents Excel in Long-Horizon Tasks | [paper] | [code]

  • [2024/05/29] Toward Conversational Agents with Context and Time Sensitive Long-term Memory | [paper] | [code]

  • [2024/04/15] Memory Sharing for Large Language Model based Agents | [paper] | [code]

  • [2024/02/19] Compress to Impress: Unleashing the Potential of Compressive Memory in Real-World Long-Term Conversations | [paper] | [code]

  • [2024/02/07] InfLLM: Training-Free Long-Context Extrapolation for LLMs with an Efficient Context Memory | [paper] | [code]

  • [2024/02/06] RAP: Retrieval-Augmented Planning with Contextual Memory for Multimodal LLM Agents | [paper] | [code]

  • [2024/01/05] From LLM to Conversational Agent: A Memory Enhanced Architecture with Fine-Tuning of Large Language Models | [paper] | [code]

  • [2023/12/22] Empowering Working Memory for Large Language Model Agents | [paper] | [code]

  • [2023/12/22] Personalized Large Language Model Assistant with Evolving Conditional Memory | [paper] | [code]

  • [2023/11/10] JARVIS-1: Open-World Multi-task Agents with Memory-Augmented Multimodal Language Models | [paper] | [code]

  • [2023/06/06] ChatDB: Augmenting LLMs with Databases as Their Symbolic Memory | [paper] | [code]

  • [2023/05/23] RET-LLM: Towards a General Read-Write Memory for Large Language Models | [paper] | [code]

  • [2023/05/17] MemoryBank: Enhancing Large Language Models with Long-Term Memory | [paper] | [code]

  • [2023/05/02] The Role of Summarization in Generative Agents: A Preliminary Perspective | [paper] | [code]

  • [2023/05/01] Learning to Reason and Memorize with Self-Notes | [paper] | [code]

  • [2023/04/26] Enhancing Large Language Model with Self-Controlled Memory Framework | [paper] | [code]

  • [2023/04/21] Emergent and Predictable Memorization in Large Language Models | [paper] | [code]

Feedback&Reflection

  • [2025/02/20] STeCa: Step-level Trajectory Calibration for LLM Agent Learning | [paper] | [code]

  • [2025/02/17] Table-Critic: A Multi-Agent Framework for Collaborative Criticism and Refinement in Table Reasoning | [paper] | [code]

  • [2025/02/17] A Study on Leveraging Search and Self-Feedback for Agent Reasoning | [paper] | [code]

  • [2025/02/03] PlotGen: Multi-Agent LLM-based Scientific Data Visualization via Multimodal Feedback | [paper] | [code]

  • [2025/01/26] Large Language Models as Theory of Mind Aware Generative Agents with Counterfactual Reflection | [paper] | [code]

  • [2025/01/23] AgentRec: Agent Recommendation Using Sentence Embeddings Aligned to Human Feedback | [paper] | [code]

  • [2025/01/08] InfiGUIAgent: A Multimodal Generalist GUI Agent with Native Reasoning and Reflection | [paper] | [code]

  • [2024/12/31] Enhancing LLM Reasoning with Multi-Path Collaborative Reactive and Reflection agents | [paper] | [code]

  • [2024/12/22] A Multi-AI Agent System for Autonomous Optimization of Agentic AI Solutions via Iterative Refinement and LLM-Driven Feedback Loops | [paper] | [code]

  • [2024/11/29] Training Agents with Weakly Supervised Feedback from Large Language Models | [paper] | [code]

  • [2024/11/21] Enhancing LLMs for Power System Simulations: A Feedback-driven Multi-agent Framework | [paper] | [code]

  • [2024/11/11] Using Generative AI and Multi-Agents to Provide Automatic Feedback | [paper] | [code]

  • [2024/11/04] Positive Experience Reflection for Agents in Interactive Text Environments | [paper] | [code]

  • [2024/10/29] Enhancing Financial Question Answering with a Multi-Agent Reflection Framework | [paper] | [code]

  • [2024/10/28] CRAT: A Multi-Agent Framework for Causality-Enhanced Reflective and Retrieval-Augmented Translation with Large Language Models | [paper] | [code]

  • [2024/10/25] OpenWebVoyager: Building Multimodal Web Agents via Iterative Real-World Exploration, Feedback and Optimization | [paper] | [code]

  • [2024/10/23] ReflecTool: Towards Reflection-Aware Tool-Augmented Clinical Agents | [paper] | [code]

  • [2024/10/20] Training Language Models to Critique With Multi-agent Feedback | [paper] | [code]

  • [2024/10/16] PRefLexOR: Preference-based Recursive Language Modeling for Exploratory Optimization of Reasoning and Agentic Thinking | [paper] | [code]

  • [2024/10/08] DataEnvGym: Data Generation Agents in Teacher Environments with Student Feedback | [paper] | [code]

  • [2024/10/02] ExACT: Teaching AI Agents to Explore with Reflective-MCTS and Exploratory Learning | [paper] | [code]

  • [2024/10/02] RGD: Multi-LLM Based Agent Debugger via Refinement and Generation Guidance | [paper] | [code]

  • [2024/09/18] MAgICoRe: Multi-Agent, Iterative, Coarse-to-Fine Refinement for Reasoning | [paper] | [code]

  • [2024/09/05] E2CL: Exploration-based Error Correction Learning for Embodied Agents | [paper] | [code]

  • [2024/09/01] Self-evolving Agents with reflective and memory-augmented abilities | [paper] | [code]

  • [2024/08/30] Tool-Assisted Agent on SQL Inspection and Refinement in Real-World Scenarios | [paper] | [code]

  • [2024/08/15] MAG-SQL: Multi-Agent Generative Approach with Soft Schema Linking and Iterative Sub-SQL Refinement for Text-to-SQL | [paper] | [code]

  • [2024/07/25] Recursive Introspection: Teaching Language Model Agents How to Self-Improve | [paper] | [code]

  • [2024/06/09] A Review of Prominent Paradigms for LLM-Based Agents: Tool Use (Including RAG), Planning, and Feedback Learning | [paper] | [code]

  • [2024/06/05] LLM-based Rewriting of Inappropriate Argumentation using Reinforcement Learning from Machine Feedback | [paper] | [code]

  • [2024/06/03] Re-ReST: Reflection-Reinforced Self-Training for Language Agents | [paper] | [code]

  • [2024/03/18] QueryAgent: A Reliable and Efficient Reasoning Framework with Environmental Feedback-based Self-Correction | [paper] | [code]

  • [2024/03/17] Improving Dialogue Agents by Decomposing One Global Explicit Annotation with Local Implicit Multimodal Feedback | [paper] | [code]

  • [2024/03/08] ChatASU: Evoking LLM's Reflexion to Truly Understand Aspect Sentiment in Dialogues | [paper] | [code]

  • [2024/03/04] Trial and Error: Exploration-Based Trajectory Optimization for LLM Agents | [paper] | [code]

  • [2024/02/27] Agent-Pro: Learning to Evolve via Policy-Level Reflection and Optimization | [paper] | [code]

  • [2024/02/26] SelectIT: Selective Instruction Tuning for LLMs via Uncertainty-Aware Self-Reflection | [paper] | [code]

  • [2024/02/22] Mirror: A Multiple-perspective Self-Reflection Method for Knowledge-rich Reasoning | [paper] | [code]

  • [2024/02/19] A Critical Evaluation of AI Feedback for Aligning Large Language Models | [paper] | [code]

  • [2024/02/06] AnyTool: Self-Reflective, Hierarchical Agents for Large-Scale API Calls | [paper] | [code]

  • [2024/02/02] StepCoder: Improve Code Generation with Reinforcement Learning from Compiler Feedback | [paper] | [code]

  • [2023/11/14] The ART of LLM Refinement: Ask, Refine, and Trust | [paper] | [code]

  • [2023/10/31] Learning From Mistakes Makes LLM Better Reasoner | [paper] | [code]

  • [2023/10/12] A Zero-Shot Language Agent for Computer Control with Structured Reflection | [paper] | [code]

  • [2023/07/27] PanGu-Coder2: Boosting Large Language Models for Code with Ranking Feedback | [paper] | [code]

  • [2023/05/22] Making Language Models Better Tool Learners with Execution Feedback | [paper] | [code]

  • [2023/05/17] Improving Language Model Negotiation with Self-Play and In-Context Learning from AI Feedback | [paper] | [code]

  • [2023/04/21] Improving Grounded Language Understanding in a Collaborative Environment by Interacting with Agents Through Help Feedback | [paper] | [code]

  • [2023/04/11] Teaching Large Language Models to Self-Debug | [paper] | [code]

  • [2023/03/30] Self-Refine: Iterative Refinement with Self-Feedback | [paper] | [code]

RAG

  • [2025/02/19] RAG-Gym: Optimizing Reasoning and Search Agents with Process Supervision | [paper] | [code]

  • [2025/02/08] On Memory Construction and Retrieval for Personalized Conversational Agents | [paper] | [code]

  • [2025/02/06] Enhancing Online Learning Efficiency Through Heterogeneous Resource Integration with a Multi-Agent RAG System | [paper] | [code]

  • [2025/01/25] Improving Retrieval-Augmented Generation through Multi-Agent Reinforcement Learning | [paper] | [code]

  • [2024/12/31] MAIN-RAG: Multi-Agent Filtering Retrieval-Augmented Generation | [paper] | [code]

  • [2024/12/24] GeAR: Graph-enhanced Agent for Retrieval-augmented Generation | [paper] | [code]

  • [2024/12/20] Towards Interpretable Radiology Report Generation via Concept Bottlenecks using a Multi-Agentic RAG | [paper] | [code]

  • [2024/12/16] BioRAGent: A Retrieval-Augmented Generation System for Showcasing Generative Query Expansion and Domain-Specific Search for Scientific Q&A | [paper] | [code]

  • [2024/12/07] SLA Management in Reconfigurable Multi-Agent RAG: A Systems Approach to Question Answering | [paper] | [code]

  • [2024/11/05] Benchmarking Multimodal Retrieval Augmented Generation with Dynamic VQA Dataset and Self-adaptive Planning Agent | [paper] | [code]

  • [2024/10/28] CRAT: A Multi-Agent Framework for Causality-Enhanced Reflective and Retrieval-Augmented Translation with Large Language Models | [paper] | [code]

  • [2024/10/18] Toolshed: Scale Tool-Equipped Agents with Advanced RAG-Tool Fusion and Tool Knowledge Bases | [paper] | [code]

  • [2024/10/01] Conversational Exploratory Search of Scholarly Publications Using Knowledge Graphs | [paper] | [code]

  • [2024/09/28] Crafting Personalized Agents through Retrieval-Augmented Generation on Editable Memory Graphs | [paper] | [code]

  • [2024/08/18] Agentic Retrieval-Augmented Generation for Time Series Analysis | [paper] | [code]

  • [2024/08/05] LLM Agents Improve Semantic Code Search | [paper] | [code]

  • [2024/08/03] MALADE: Orchestration of LLM-powered Agents with Retrieval Augmented Generation for Pharmacovigilance | [paper] | [code]

  • [2024/07/20] Golden-Retriever: High-Fidelity Agentic Retrieval Augmented Generation for Industrial Knowledge Base | [paper] | [code]

  • [2024/06/26] Geode: A Zero-shot Geospatial Question-Answering Agent with Explicit Reasoning and Precise Spatio-Temporal Retrieval | [paper] | [code]

  • [2024/06/19] StackRAG Agent: Improving Developer Answers with Retrieval-Augmented Generation | [paper] | [code]

  • [2024/06/09] A Review of Prominent Paradigms for LLM-Based Agents: Tool Use (Including RAG), Planning, and Feedback Learning | [paper] | [code]

  • [2024/03/05] AgentsCourt: Building Judicial Decision-Making Agents with Court Debate Simulation and Legal Knowledge Augmentation | [paper] | [code]

  • [2024/02/06] RAP: Retrieval-Augmented Planning with Contextual Memory for Multimodal LLM Agents | [paper] | [code]

  • [2023/12/27] Automating Knowledge Acquisition for Content-Centric Cognitive Agents Using LLMs | [paper] | [code]

Search

  • [2025/02/20] I-MCTS: Enhancing Agentic AutoML via Introspective Monte Carlo Tree Search | [paper] | [code]

  • [2025/02/18] R2-KG: General-Purpose Dual-Agent Framework for Reliable Reasoning on Knowledge Graphs | [paper] | [code]

  • [2025/02/18] Agentic Deep Graph Reasoning Yields Self-Organizing Knowledge Networks | [paper] | [code]

  • [2025/02/17] A Study on Leveraging Search and Self-Feedback for Agent Reasoning | [paper] | [code]

  • [2025/02/05] SymAgent: A Neural-Symbolic Self-Learning Agent Framework for Complex Reasoning over Knowledge Graphs | [paper] | [code]

  • [2025/02/02] Efficient Multi-Agent System Training with Data Influence-Oriented Tree Search | [paper] | [code]

  • [2025/01/31] KBQA-o1: Agentic Knowledge Base Question Answering with Monte Carlo Tree Search | [paper] | [code]

  • [2025/01/09] Search-o1: Agentic Search-Enhanced Large Reasoning Models | [paper] | [code]

  • [2024/12/24] A Novel Task-Driven Method with Evolvable Interactive Agents Using Event Trees for Enhanced Emergency Decision Support | [paper] | [code]

  • [2024/12/22] Multi-Agent Sampling: Scaling Inference Compute for Data Synthesis with Tree Search-Based Agentic Collaboration | [paper] | [code]

  • [2024/12/05] Agent AI with LangGraph: A Modular Framework for Enhancing Machine Translation Using Large Language Models | [paper] | [code]

  • [2024/11/07] CodeTree: Agent-guided Tree Search for Code Generation with Large Language Models | [paper] | [code]

  • [2024/10/29] Synergizing LLM Agents and Knowledge Graph for Socioeconomic Prediction in LBSN | [paper] | [code]

  • [2024/10/25] AGENT-CQ: Automatic Generation and Evaluation of Clarifying Questions for Conversational Search with LLMs | [paper] | [code]

  • [2024/10/22] SELA: Tree-Search Enhanced LLM Agents for Automated Machine Learning | [paper] | [code]

  • [2024/10/13] Expanding Search Space with Diverse Prompting Agents: An Efficient Sampling Approach for LLM Mathematical Reasoning | [paper] | [code]

  • [2024/10/13] LLM-Based Multi-Agent Systems are Scalable Graph Generative Models | [paper] | [code]

  • [2024/10/02] ExACT: Teaching AI Agents to Explore with Reflective-MCTS and Exploratory Learning | [paper] | [code]

  • [2024/09/09] SciAgents: Automating scientific discovery through multi-agent intelligent graph reasoning | [paper] | [code]

  • [2024/07/01] Tree Search for Language Model Agents | [paper] | [code]

  • [2024/06/17] Input Conditioned Graph Generation for Language Agents | [paper] | [code]

  • [2024/02/17] KG-Agent: An Efficient Autonomous Agent Framework for Complex Reasoning over Knowledge Graph | [paper] | [code]

  • [2024/02/16] When is Tree Search Useful for LLM Planning? It Depends on the Discriminator | [paper] | [code]

  • [2024/02/09] CoSearchAgent: A Lightweight Collaborative Search Agent with Large Language Models | [paper] | [code]

  • [2023/05/17] Tree of Thoughts: Deliberate Problem Solving with Large Language Models | [paper] | [code]

Interaction

Role Playing

  • [2025/02/20] InstructAgent: Building User Controllable Recommender via LLM Agent | [paper] | [code]

  • [2025/02/18] SEFL: Harnessing Large Language Model Agents to Improve Educational Feedback Systems | [paper] | [code]

  • [2025/02/17] Can LLM Agents Maintain a Persona in Discourse? | [paper] | [code]

  • [2025/02/17] LM Agents for Coordinating Multi-User Information Gathering | [paper] | [code]

  • [2025/02/16] SCALE: Towards Collaborative Content Analysis in Social Science with Large Language Model Agents and Human Intervention | [paper] | [code]

  • [2025/02/13] Language Agents as Digital Representatives in Collective Decision-Making | [paper] | [code]

  • [2025/02/06] PsyPlay: Personality-Infused Role-Playing Conversational Agents | [paper] | [code]

  • [2025/02/03] Plan-Then-Execute: An Empirical Study of User Trust and Team Performance When Using LLM Agents As A Daily Assistant | [paper] | [code]

  • [2025/01/23] AgentRec: Agent Recommendation Using Sentence Embeddings Aligned to Human Feedback | [paper] | [code]

  • [2025/01/15] Personality Modeling for Persuasion of Misinformation using AI Agent | [paper] | [code]

  • [2024/12/28] BaiJia: A Large-Scale Role-Playing Agent Corpus of Chinese Historical Characters | [paper] | [code]

  • [2024/12/22] Modular Conversational Agents for Surveys and Interviews | [paper] | [code]

  • [2024/12/11] SweetieChat: A Strategy-Enhanced Role-playing Framework for Diverse Scenarios Handling Emotional Support Agent | [paper] | [code]

  • [2024/12/10] My Words Imply Your Opinion: Reader Agent-Based Propagation Enhancement for Personalized Implicit Emotion Analysis | [paper] | [code]

  • [2024/11/21] Towards Full Delegation: Designing Ideal Agentic Behaviors for Travel Planning | [paper] | [code]

  • [2024/11/19] Probing the Capacity of Language Model Agents to Operationalize Disparate Experiential Context Despite Distraction | [paper] | [code]

  • [2024/11/12] SHARP: Unlocking Interactive Hallucination via Stance Transfer in Role-Playing Agents | [paper] | [code]

  • [2024/11/04] A Multi-Task Role-Playing Agent Capable of Imitating Character Linguistic Styles | [paper] | [code]

  • [2024/10/28] Guide-LLM: An Embodied LLM Agent and Text-Based Topological Map for Robotic Guidance of People with Visual Impairments | [paper] | [code]

  • [2024/10/24] Schema-Guided Culture-Aware Complex Event Simulation with Multi-Agent Role-Play | [paper] | [code]

  • [2024/09/23] ERABAL: Enhancing Role-Playing Agents through Boundary-Aware Learning | [paper] | [code]

  • [2024/09/19] FoodPuzzle: Developing Large Language Model Agents as Flavor Scientists | [paper] | [code]

  • [2024/09/12] TravelAgent: An AI Assistant for Personalized Travel Planning | [paper] | [code]

  • [2024/09/11] Using Generative Agents to Create Tip Sheets for Investigative Data Reporting | [paper] | [code]

  • [2024/08/28] Interactive Agents: Simulating Counselor-Client Psychological Counseling via Role-Playing LLM-to-LLM Interactions | [paper] | [code]

  • [2024/08/21] Drama Engine: A Framework for Narrative Agents | [paper] | [code]

  • [2024/06/24] The Effects of Embodiment and Personality Expression on Learning in LLM-based Educational Agents | [paper] | [code]

  • [2024/06/17] HoLLMwood: Unleashing the Creativity of Large Language Models in Screenwriting via Role Playing | [paper] | [code]

  • [2024/06/11] Agent-SiMT: Agent-assisted Simultaneous Machine Translation with Large Language Models | [paper] | [code]

  • [2024/06/09] Peer Review as A Multi-Turn and Long-Context Dialogue with Role-Based Interactions | [paper] | [code]

  • [2024/05/28] TimeChara: Evaluating Point-in-Time Character Hallucination of Role-Playing Large Language Models | [paper] | [code]

  • [2024/05/10] LLM Discussion: Enhancing the Creativity of Large Language Models via Discussion Framework and Role-Play | [paper] | [code]

  • [2024/05/08] LLMs with Personalities in Multi-issue Negotiation Games | [paper] | [code]

  • [2024/05/06] Large Language Models (LLMs) as Agents for Augmented Democracy | [paper] | [code]

  • [2024/05/02] GAIA: A General AI Assistant for Intelligent Accelerator Operations | [paper] | [code]

  • [2024/05/01] "Ask Me Anything": How Comcast Uses LLMs to Assist Agents in Real Time | [paper] | [code]

  • [2024/04/26] Large Language Model Agent as a Mechanical Designer | [paper] | [code]

  • [2024/04/19] Cooperative Sentiment Agents for Multimodal Sentiment Analysis | [paper] | [code]

  • [2024/03/31] DiffAgent: Fast and Accurate Text-to-Image API Selection with Large Language Model | [paper] | [code]

  • [2024/03/23] EduAgent: Generative Student Agents in Learning | [paper] | [code]

  • [2024/03/19] Characteristic AI Agents via Large Language Models | [paper] | [code]

  • [2024/03/15] VideoAgent: Long-form Video Understanding with Large Language Model as Agent | [paper] | [code]

  • [2024/03/13] Evaluating Large Language Models as Generative User Simulators for Conversational Recommendation | [paper] | [code]

  • [2024/02/29] On the Decision-Making Abilities in Role-Playing using Large Language Models | [paper] | [code]

  • [2024/02/28] Prospect Personalized Recommendation on Large Language Model-based Agent Platform | [paper] | [code]

  • [2024/02/26] Language Agents as Optimizable Graphs | [paper] | [code]

  • [2024/02/22] Triad: A Framework Leveraging a Multi-Role LLM-based Agent to Solve Knowledge Base Question Answering | [paper] | [code]

  • [2024/02/22] Large Language Models as Urban Residents: An LLM Agent Framework for Personal Mobility Generation | [paper] | [code]

  • [2024/02/21] Neeko: Leveraging Dynamic LoRA for Efficient Multi-Character Role-Playing Agent | [paper] | [code]

  • [2024/02/19] Stick to your Role! Stability of Personal Values Expressed in Large Language Models | [paper] | [code]

  • [2024/02/18] Modelling Political Coalition Negotiations Using LLM-based Agents | [paper] | [code]

  • [2024/02/06] Professional Agents -- Evolving Large Language Models into Autonomous Experts with Human-Level Competencies | [paper] | [code]

  • [2024/02/06] Can Generative Agents Predict Emotion? | [paper] | [code]

  • [2024/02/05] GUARD: Role-playing to Generate Natural-language Jailbreakings to Test Guideline Adherence of Large Language Models | [paper] | [code]

  • [2024/01/31] LLMs Simulate Big Five Personality Traits: Further Evidence | [paper] | [code]

  • [2023/12/22] Personalized Large Language Model Assistant with Evolving Conditional Memory | [paper] | [code]

  • [2023/12/21] ChatGPT as a commenter to the news: can LLMs generate human-like opinions? | [paper] | [code]

  • [2023/12/20] Machine Mindset: An MBTI Exploration of Large Language Models | [paper] | [code]

  • [2023/12/19] Can ChatGPT be Your Personal Medical Assistant? | [paper] | [code]

  • [2023/10/13] AgentCF: Collaborative Learning with Autonomous Language Agents for Recommender Systems | [paper] | [code]

  • [2023/10/01] RoleLLM: Benchmarking, Eliciting, and Enhancing Role-Playing Abilities of Large Language Models | [paper] | [code]

  • [2023/09/02] ModelScope-Agent: Building Your Customizable Agent System with Open-source Large Language Models | [paper] | [code]

  • [2023/08/22] Towards an On-device Agent for Text Rewriting | [paper] | [code]

  • [2023/08/10] LLM As DBA | [paper] | [code]

  • [2023/08/03] InterAct: Exploring the Potentials of ChatGPT as a Cooperative Agent | [paper] | [code]

  • [2023/07/11] Unleashing the Emergent Cognitive Synergy in Large Language Models: A Task-Solving Agent through Multi-Persona Self-Collaboration | [paper] | [code]

  • [2023/07/05] Building Cooperative Embodied Agents Modularly with Large Language Models | [paper] | [code]

  • [2023/05/25] Role-Play with Large Language Models | [paper] | [code]

  • [2023/05/09] TidyBot: Personalized Robot Assistance with Large Language Models | [paper] | [code]

Conversation

  • [2025/02/20] Enhancing Conversational Agents with Theory of Mind: Aligning Beliefs, Desires, and Intentions for Human-Like Interaction | [paper] | [code]

  • [2025/02/18] One Size doesn't Fit All: A Personalized Conversational Tutoring Agent for Mathematics Instruction | [paper] | [code]

  • [2025/02/18] Training Turn-by-Turn Verifiers for Dialogue Tutoring Agents: The Curious Case of LLMs as Your Coding Tutors | [paper] | [code]

  • [2025/02/18] You need to MIMIC to get FAME: Solving Meeting Transcript Scarcity with a Multi-Agent Conversations | [paper] | [code]

  • [2025/02/17] InfoQuest: Evaluating Multi-Turn Dialogue Agents for Open-Ended Conversations with Hidden Context | [paper] | [code]

  • [2025/02/13] Reliable Conversational Agents under ASP Control that Understand Natural Language | [paper] | [code]

  • [2025/02/12] Can a Single Model Master Both Multi-turn Conversations and Tool Use? CoALM: A Unified Conversational Agentic Language Model | [paper] | [code]

  • [2025/02/09] MTPChat: A Multimodal Time-Aware Persona Dataset for Conversational Agents | [paper] | [code]

  • [2025/02/09] HamRaz: A Culture-Based Persian Conversation Dataset for Person-Centered Therapy Using LLM Agents | [paper] | [code]

  • [2025/02/08] On Memory Construction and Retrieval for Personalized Conversational Agents | [paper] | [code]

  • [2025/02/06] PsyPlay: Personality-Infused Role-Playing Conversational Agents | [paper] | [code]

  • [2025/01/24] Unmasking Conversational Bias in AI Multiagent Systems | [paper] | [code]

  • [2025/01/23] Communicating Activations Between Language Model Agents | [paper] | [code]

  • [2025/01/19] IntellAgent: A Multi-Agent Framework for Evaluating Conversational AI Systems | [paper] | [code]

  • [2025/01/14] Developing Enhanced Conversational Agents for Social Virtual Worlds | [paper] | [code]

  • [2025/01/03] PSYCHE: A Multi-faceted Patient Simulation Framework for Evaluation of Psychiatric Assessment Conversational Agents | [paper] | [code]

  • [2024/12/30] Exploring and Controlling Diversity in LLM-Agent Conversation | [paper] | [code]

  • [2024/12/24] Extracting triples from dialogues for conversational social agents | [paper] | [code]

  • [2024/12/22] Modular Conversational Agents for Surveys and Interviews | [paper] | [code]

  • [2024/12/21] InfoTech Assistant : A Multimodal Conversational Agent for InfoTechnology Web Portal Queries | [paper] | [code]

  • [2024/12/13] Script-Based Dialog Policy Planning for LLM-Powered Conversational Agents: A Basic Architecture for an "AI Therapist" | [paper] | [code]

  • [2024/12/06] CALICO: Conversational Agent Localization via Synthetic Data Generation | [paper] | [code]

  • [2024/12/05] Educational-Psychological Dialogue Robot Based on Multi-Agent Collaboration | [paper] | [code]

  • [2024/12/01] Examining Identity Drift in Conversations of LLM Agents | [paper] | [code]

  • [2024/11/07] Thanos: Enhancing Conversational Agents with Skill-of-Mind-Infused Large Language Model | [paper] | [code]

  • [2024/11/07] Interactive Dialogue Agents via Reinforcement Learning on Hindsight Regenerations | [paper] | [code]

  • [2024/11/06] MRJ-Agent: An Effective Jailbreak Agent for Multi-Round Dialogue | [paper] | [code]

  • [2024/11/01] DARD: A Multi-Agent Approach for Task-Oriented Dialog Systems | [paper] | [code]

  • [2024/11/01] ReSpAct: Harmonizing Reasoning, Speaking, and Acting Towards Building Large Language Model-Based Conversational AI Agents | [paper] | [code]

  • [2024/10/29] MARCO: Multi-Agent Real-time Chat Orchestration | [paper] | [code]

  • [2024/10/25] AGENT-CQ: Automatic Generation and Evaluation of Clarifying Questions for Conversational Search with LLMs | [paper] | [code]

  • [2024/10/18] Coherence-Driven Multimodal Safety Dialogue with Active Learning for Embodied Agents | [paper] | [code]

  • [2024/10/15] HR-Agent: A Task-Oriented Dialogue (TOD) LLM Agent Tailored for HR Applications | [paper] | [code]

  • [2024/10/10] Rewriting Conversational Utterances with Instructed Large Language Models | [paper] | [code]

  • [2024/09/24] Automated test generation to evaluate tool-augmented LLMs as conversational AI agents | [paper] | [code]

  • [2024/09/23] Beyond Turn-Based Interfaces: Synchronous LLMs as Full-Duplex Dialogue Agents | [paper] | [code]

  • [2024/09/13] AI-LieDar: Examine the Trade-off Between Utility and Truthfulness in LLM Agents | [paper] | [code]

  • [2024/09/06] Sparse Rewards Can Self-Train Dialogue Agents | [paper] | [code]

  • [2024/09/02] Co-Learning: Code Learning for Multi-Agent Reinforcement Collaborative Framework with Conversational Natural Language Interfaces | [paper] | [code]

  • [2024/08/27] Into the Unknown Unknowns: Engaged Human Learning through Participation in Language Model Agent Conversations | [paper] | [code]

  • [2024/08/22] MDD-5k: A New Diagnostic Conversation Dataset for Mental Disorders Synthesized via Neuro-Symbolic LLM Agents | [paper] | [code]

  • [2024/08/13] What should I wear to a party in a Greek taverna? Evaluation for Conversational Agents in the Fashion Domain | [paper] | [code]

  • [2024/08/06] OpenOmni: A Collaborative Open Source Tool for Building Future-Ready Multimodal Conversational Agents | [paper] | [code]

  • [2024/08/03] Self-Emotion Blended Dialogue Generation in Social Simulation Agents | [paper] | [code]

  • [2024/07/31] Towards Achieving Human Parity on End-to-end Simultaneous Speech Translation via LLM Agent | [paper] | [code]

  • [2024/07/13] Cohesive Conversations: Enhancing Authenticity in Multi-Agent Simulated Dialogues | [paper] | [code]

  • [2024/07/04] Controllable Conversations: Planning-Based Dialogue Agent with Large Language Models | [paper] | [code]

  • [2024/07/01] Empathic Grounding: Explorations using Multimodal Interaction and Large Language Models with Conversational Agents | [paper] | [code]

  • [2024/06/30] CAMON: Cooperative Agents for Multi-Object Navigation with LLM-based Conversations | [paper] | [code]

  • [2024/06/09] Peer Review as A Multi-Turn and Long-Context Dialogue with Role-Based Interactions | [paper] | [code]

  • [2024/05/29] Toward Conversational Agents with Context and Time Sensitive Long-term Memory | [paper] | [code]

  • [2024/05/16] Speaker Verification in Agent-Generated Conversations | [paper] | [code]

  • [2024/04/19] Towards Human-centered Proactive Conversational Agents | [paper] | [code]

  • [2024/04/10] Apollonion: Profile-centric Dialog Agent | [paper] | [code]

  • [2024/03/17] Improving Dialogue Agents by Decomposing One Global Explicit Annotation with Local Implicit Multimodal Feedback | [paper] | [code]

  • [2024/03/08] ChatASU: Evoking LLM's Reflexion to Truly Understand Aspect Sentiment in Dialogues | [paper] | [code]

  • [2024/02/25] Understanding Public Perceptions of AI Conversational Agents: A Cross-Cultural Analysis | [paper] | [code]

  • [2024/02/23] On the Multi-turn Instruction Following for Conversational Web Agents | [paper] | [code]

  • [2024/02/20] CHATATC: Large Language Model-Driven Conversational Agents for Supporting Strategic Air Traffic Flow Management | [paper] | [code]

  • [2024/01/29] Assistive Large Language Model Agents for Socially-Aware Negotiation Dialogues | [paper] | [code]

  • [2024/01/10] Bootstrapping LLM-based Task-Oriented Dialogue Agents via Self-Talk | [paper] | [code]

  • [2024/01/02] CharacterEval: A Chinese Benchmark for Role-Playing Conversational Agent Evaluation | [paper] | [code]

  • [2023/12/21] Team Flow at DRC2023: Building Common Ground and Text-based Turn-taking in a Travel Agent Spoken Dialogue System | [paper] | [code]

  • [2023/11/15] ToolTalk: Evaluating Tool-Usage in a Conversational Setting | [paper] | [code]

  • [2023/10/01] Adapting LLM Agents Through Communication | [paper] | [code]

  • [2023/06/28] Inferring the Goals of Communicating Agents from Actions and Instructions | [paper] | [code]

  • [2023/04/26] Multi-Party Chat: Conversational Agents in Group Settings with Humans and Models | [paper] | [code]

  • [2023/03/31] CAMEL: Communicative Agents for "Mind" Exploration of Large Language Model Society | [paper] | [code]

Game Playing

  • [2025/02/01] Who's the MVP? A Game-Theoretic Evaluation Benchmark for Modular Attribution in LLM Agents | [paper] | [code]

  • [2025/01/24] Multi-agent KTO: Reinforcing Strategic Interactions of Large Language Model in Language Game | [paper] | [code]

  • [2024/12/06] TeamCraft: A Benchmark for Multi-Modal Multi-Agent Systems in Minecraft | [paper] | [code]

  • [2024/11/08] Game-theoretic LLM: Agent Workflow for Negotiation Games | [paper] | [code]

  • [2024/10/28] Can Machines Think Like Humans? A Behavioral Evaluation of LLM-Agents in Dictator Games | [paper] | [code]

  • [2024/09/03] An Implementation of Werewolf Agent That does not Truly Trust LLMs | [paper] | [code]

  • [2024/08/05] Evaluating and Enhancing LLMs Agent based on Theory of Mind in Guandan: A Multi-Player Cooperative Game under Imperfect Information | [paper] | [code]

  • [2024/07/23] AMONGAGENTS: Evaluating Large Language Models in the Interactive Text-Based Social Deduction Game | [paper] | [code]

  • [2024/07/17] A LLM Benchmark based on the Minecraft Builder Dialog Agent Task | [paper] | [code]

  • [2024/06/27] OmniJARVIS: Unified Vision-Language-Action Tokenization Enables Open-World Instruction Following Agents | [paper] | [code]

  • [2024/06/07] GameBench: Evaluating Strategic Reasoning Abilities of LLM Agents | [paper] | [code]

  • [2024/06/05] The Good, the Bad, and the Hulk-like GPT: Analyzing Emotional Decisions of Large Language Models in Cooperation and Bargaining Games | [paper] | [code]

  • [2024/05/24] Hacc-Man: An Arcade Game for Jailbreaking LLMs | [paper] | [code]

  • [2024/05/23] Human-Agent Cooperation in Games under Incomplete Information through Natural Language Communication | [paper] | [code]

  • [2024/05/08] LLMs with Personalities in Multi-issue Negotiation Games | [paper] | [code]

  • [2024/04/30] PANGeA: Procedural Artificial Narrative using Generative AI for Turn-Based Video Games | [paper] | [code]

  • [2024/04/03] Learn to Disguise: Avoid Refusal Responses in LLM's Defense via a Multi-agent Attacker-Disguiser Game | [paper] | [code]

  • [2024/03/28] MineLand: Simulating Large-Scale Multi-Agent Interactions with Limited Multimodal Senses and Physical Needs | [paper] | [code]

  • [2024/03/18] How Far Are We on the Decision-Making of LLMs? Evaluating LLMs' Gaming Ability in Multi-Agent Environments | [paper] | [code]

  • [2024/02/19] PsychoGAT: A Novel Psychological Measurement Paradigm through Interactive Fiction Games with LLM Agents | [paper] | [code]

  • [2024/02/13] Large Language Models as Minecraft Agents | [paper] | [code]

  • [2024/02/12] Large Language Models as Agents in Two-Player Games | [paper] | [code]

  • [2024/02/04] Enhance Reasoning for Large Language Models in the Game Werewolf | [paper] | [code]

  • [2024/02/02] PokeLLMon: A Human-Parity Agent for Pokemon Battles with Large Language Models | [paper] | [code]

  • [2023/12/29] Cooperation on the Fly: Exploring Language Agents for Ad Hoc Teamwork in the Avalon Game | [paper] | [code]

  • [2023/12/01] Deciphering Digital Detectives: Understanding LLM Behaviors and Capabilities in Multi-Agent Mystery Games | [paper] | [code]

  • [2023/10/31] Leveraging Word Guessing Games to Assess the Intelligence of Large Language Models | [paper] | [code]

  • [2023/09/29] Suspicion-Agent: Playing Imperfect Information Games with Theory of Mind Aware GPT-4 | [paper] | [code]

  • [2023/09/18] MindAgent: Emergent Gaming Interaction | [paper] | [code]

  • [2023/09/10] An Appraisal-Based Chain-Of-Emotion Architecture for Affective Language Model Game Agents | [paper] | [code]

  • [2023/09/09] Exploring Large Language Models for Communication Games: An Empirical Study on Werewolf | [paper] | [code]

  • [2023/08/23] Are ChatGPT and GPT-4 Good Poker Players? -- A Pre-Flop Analysis | [paper] | [code]

  • [2023/05/31] Recursive Metropolis-Hastings Naming Game: Symbol Emergence in a Multi-agent System based on Probabilistic Generative Models | [paper] | [code]

  • [2023/05/26] Playing repeated games with Large Language Models | [paper] | [code]

  • [2023/05/25] Ghost in the Minecraft: Generally Capable Agents for Open-World Environments via Large Language Models with Text-based Knowledge and Memory | [paper] | [code]

  • [2023/05/08] Knowledge-enhanced Agents for Interactive Text Games | [paper] | [code]

  • [2023/04/06] Can Large Language Models Play Text Games Well? Current State-of-the-Art and Open Questions | [paper] | [code]

Human-Agent Interaction

  • [2025/02/17] Leveraging Dual Process Theory in Language Agent Framework for Real-time Simultaneous Human-AI Collaboration | [paper] | [code]

  • [2025/01/28] CowPilot: A Framework for Autonomous and Human-Agent Collaborative Web Navigation | [paper] | [code]

  • [2024/12/20] Collaborative Gym: A Framework for Enabling and Evaluating Human-Agent Collaboration | [paper] | [code]

  • [2024/06/28] Designing and Evaluating Multi-Chatbot Interface for Human-AI Communication: Preliminary Findings from a Persuasion Task | [paper] | [code]

  • [2024/06/11] Towards Human-AI Collaboration in Healthcare: Guided Deferral Systems with Large Language Models | [paper] | [code]

  • [2024/06/02] Towards a copilot in BIM authoring tool using a large language model-based agent for intelligent human-machine interaction | [paper] | [code]

  • [2024/03/05] ChatCite: LLM Agent with Human Workflow Guidance for Comparative Literature Summary | [paper] | [code]

  • [2024/02/20] Large Language Model-based Human-Agent Collaboration for Complex Task Solving | [paper] | [code]

  • [2024/02/18] Shaping Human-AI Collaboration: Varied Scaffolding Levels in Co-writing with Language Models | [paper] | [code]

  • [2024/02/17] MONAL: Model Autophagy Analysis for Modeling Human-AI Interactions | [paper] | [code]

  • [2023/09/22] Learning to Coordinate with Anyone | [paper] | [code]

  • [2023/07/31] HAGRID: A Human-LLM Collaborative Dataset for Generative Information-Seeking with Attribution | [paper] | [code]

  • [2023/04/26] Multi-Party Chat: Conversational Agents in Group Settings with Humans and Models | [paper] | [code]

Tool Usage

  • [2025/02/17] LLM Agents Making Agent Tools | [paper] | [code]

  • [2025/02/17] SMART: Self-Aware Agent for Tool Overuse Mitigation | [paper] | [code]

  • [2025/02/16] OctoTools: An Agentic Framework with Extensible Tools for Complex Reasoning | [paper] | [code]

  • [2025/02/12] Can a Single Model Master Both Multi-turn Conversations and Tool Use? CoALM: A Unified Conversational Agentic Language Model | [paper] | [code]

  • [2025/02/07] Agentic Reasoning: Reasoning LLMs with Tools for the Deep Research | [paper] | [code]

  • [2025/02/06] Division-of-Thoughts: Harnessing Hybrid Language Model Synergy for Efficient On-Device Agents | [paper] | [code]

  • [2025/02/05] ReachAgent: Enhancing Mobile Agent via Page Reaching and Operation | [paper] | [code]

  • [2025/01/28] CowPilot: A Framework for Autonomous and Human-Agent Collaborative Web Navigation | [paper] | [code]

  • [2025/01/21] UI-TARS: Pioneering Automated GUI Interaction with Native Agents | [paper] | [code]

  • [2025/01/20] Mobile-Agent-E: Self-Evolving Mobile Assistant for Complex Tasks | [paper] | [code]

  • [2025/01/20] PlotEdit: Natural Language-Driven Accessible Chart Editing in PDFs via Multimodal LLM Agents | [paper] | [code]

  • [2025/01/08] InfiGUIAgent: A Multimodal Generalist GUI Agent with Native Reasoning and Reflection | [paper] | [code]

  • [2025/01/08] FinSphere: A Conversational Stock Analysis Agent Equipped with Quantitative Tools based on Real-Time Database | [paper] | [code]

  • [2025/01/07] PPTAgent: Generating and Evaluating Presentations Beyond Text-to-Slides | [paper] | [code]

  • [2024/12/28] Efficient Multi-Agent Collaboration with Tool Use for Online Planning in Complex Table Question Answering | [paper] | [code]

  • [2024/12/21] InfoTech Assistant : A Multimodal Conversational Agent for InfoTechnology Web Portal Queries | [paper] | [code]

  • [2024/12/12] AgentTrek: Agent Trajectory Synthesis via Guiding Replay with Web Tutorials | [paper] | [code]

  • [2024/12/08] Cooperative SQL Generation for Segmented Databases By Using Multi-functional LLM Agents | [paper] | [code]

  • [2024/12/05] Aguvis: Unified Pure Vision Agents for Autonomous GUI Interaction | [paper] | [code]

  • [2024/11/26] ShowUI: One Vision-Language-Action Model for GUI Visual Agent | [paper] | [code]

  • [2024/11/22] ScribeAgent: Towards Specialized Web Agents Using Production-Scale Workflow Data | [paper] | [code]

  • [2024/11/20] AdaptAgent: Adapting Multimodal Web Agents with Few-Shot Learning from Human Demonstrations | [paper] | [code]

  • [2024/11/15] The Dawn of GUI Agent: A Preliminary Case Study with Claude 3.5 Computer Use | [paper] | [code]

  • [2024/11/04] WebRL: Training LLM Web Agents via Self-Evolving Online Curriculum Reinforcement Learning | [paper] | [code]

  • [2024/11/04] Attacking Vision-Language Computer Agents via Pop-ups | [paper] | [code]

  • [2024/11/02] Infant Agent: A Tool-Integrated, Logic-Driven Agent with Cost-Effective API Usage | [paper] | [code]

  • [2024/10/28] AutoGLM: Autonomous Foundation Agents for GUIs | [paper] | [code]

  • [2024/10/25] OpenWebVoyager: Building Multimodal Web Agents via Iterative Real-World Exploration, Feedback and Optimization | [paper] | [code]

  • [2024/10/24] Infogent: An Agent-Based Framework for Web Information Aggregation | [paper] | [code]

  • [2024/10/23] ReflecTool: Towards Reflection-Aware Tool-Augmented Clinical Agents | [paper] | [code]

  • [2024/10/22] Large Language Models Empowered Personalized Web Agents | [paper] | [code]

  • [2024/10/21] VipAct: Visual-Perception Enhancement via Specialized VLM Agent Collaboration and Tool-use | [paper] | [code]

  • [2024/10/21] Beyond Browsing: API-Based Web Agents | [paper] | [code]

  • [2024/10/18] Toolshed: Scale Tool-Equipped Agents with Advanced RAG-Tool Fusion and Tool Knowledge Bases | [paper] | [code]

  • [2024/10/17] Web Agents with World Models: Learning and Leveraging Environment Dynamics in Web Navigation | [paper] | [code]

  • [2024/10/17] MeNTi: Bridging Medical Calculator and LLM Agent with Nested Tool Calling | [paper] | [code]

  • [2024/10/17] MobA: A Two-Level Agent System for Efficient Mobile Task Automation | [paper] | [code]

  • [2024/10/17] AgentOccam: A Simple Yet Strong Baseline for LLM-Based Web Agents | [paper] | [code]

  • [2024/10/16] Agent Skill Acquisition for Large Language Models via CycleQD | [paper] | [code]

  • [2024/10/10] Agent S: An Open Agentic Framework that Uses Computers Like a Human | [paper] | [code]

  • [2024/10/07] Navigating the Digital World as Humans Do: Universal Visual Grounding for GUI Agents | [paper] | [code]

  • [2024/10/03] NNetNav: Unsupervised Learning of Browser Agents Through Environment Interaction in the Wild | [paper] | [code]

  • [2024/09/24] Automated test generation to evaluate tool-augmented LLMs as conversational AI agents | [paper] | [code]

  • [2024/09/17] EIA: Environmental Injection Attack on Generalist Web Agents for Privacy Leakage | [paper] | [code]

  • [2024/09/01] TinyAgent: Function Calling at the Edge | [paper] | [code]

  • [2024/08/30] Tool-Assisted Agent on SQL Inspection and Refinement in Real-World Scenarios | [paper] | [code]

  • [2024/08/15] VerilogCoder: Autonomous Verilog Coding Agents with Graph-based Planning and Abstract Syntax Tree (AST)-based Waveform Tracing Tool | [paper] | [code]

  • [2024/08/05] Caution for the Environment: Multimodal Agents are Susceptible to Environmental Distractions | [paper] | [code]

  • [2024/08/01] OmniParser for Pure Vision Based GUI Agent | [paper] | [code]

  • [2024/07/26] AppWorld: A Controllable World of Apps and People for Benchmarking Interactive Coding Agents | [paper] | [code]

  • [2024/07/22] AssistantBench: Can Web Agents Solve Realistic and Time-Consuming Tasks? | [paper] | [code]

  • [2024/07/11] GTA: A Benchmark for General Tool Agents | [paper] | [code]

  • [2024/07/01] Mobile-Bench: An Evaluation Benchmark for LLM-based Mobile Agents | [paper] | [code]

  • [2024/06/17] GUICourse: From General Vision Language Models to Versatile GUI Agents | [paper] | [code]

  • [2024/06/16] GUI-WORLD: A Dataset for GUI-oriented Multimodal LLM-based Agents | [paper] | [code]

  • [2024/06/06] Tool-Planner: Task Planning with Clusters across Multiple Tools | [paper] | [code]

  • [2024/06/03] Mobile-Agent-v2: Mobile Device Operation Assistant with Effective Navigation via Multi-Agent Collaboration | [paper] | [code]

  • [2024/06/02] Towards a copilot in BIM authoring tool using a large language model-based agent for intelligent human-machine interaction | [paper] | [code]

  • [2024/05/30] Large Language Models Can Self-Improve At Web Agent Tasks | [paper] | [code]

  • [2024/05/17] Latent State Estimation Helps UI Agents to Reason | [paper] | [code]

  • [2024/05/06] SWE-agent: Agent-Computer Interfaces Enable Automated Software Engineering | [paper] | [code]

  • [2024/05/02] CACTUS: Chemistry Agent Connecting Tool-Usage to Science | [paper] | [code]

  • [2024/05/01] Navigating WebAI: Training Agents to Complete Web Tasks with Large Language Models and Reinforcement Learning | [paper] | [code]

  • [2024/04/23] Evaluating Tool-Augmented Agents in Remote Sensing Platforms | [paper] | [code]

  • [2024/04/17] The Landscape of Emerging AI Agent Architectures for Reasoning, Planning, and Tool Calling: A Survey | [paper] | [code]

  • [2024/04/17] Octopus v3: Technical Report for On-device Sub-billion Multimodal AI Agent | [paper] | [code]

  • [2024/04/16] Grounded Language Agent for Product Search via Intelligent Web Interactions | [paper] | [code]

  • [2024/04/04] AutoWebGLM: A Large Language Model-based Web Navigating Agent | [paper] | [code]

  • [2024/04/01] Rapid Mobile App Development for Generative AI Agents on MIT App Inventor | [paper] | [code]

  • [2024/03/05] InjecAgent: Benchmarking Indirect Prompt Injections in Tool-Integrated Large Language Model Agents | [paper] | [code]

  • [2024/03/05] Android in the Zoo: Chain-of-Action-Thought for GUI Agents | [paper] | [code]

  • [2024/02/27] BASES: Large-scale Web Search User Simulation with Large Language Model based Agents | [paper] | [code]

  • [2024/02/26] Look Before You Leap: Towards Decision-Aware and Generalizable Tool-Usage for Large Language Models | [paper] | [code]

  • [2024/02/23] On the Multi-turn Instruction Following for Conversational Web Agents | [paper] | [code]

  • [2024/02/20] AgentMD: Empowering Language Agents for Risk Prediction with Large-Scale Clinical Tool Learning | [paper] | [code]

  • [2024/02/18] SciAgent: Tool-augmented Language Models for Scientific Reasoning | [paper] | [code]

  • [2024/02/16] ToolSword: Unveiling Safety Issues of Large Language Models in Tool Learning Across Three Stages | [paper] | [code]

  • [2024/02/08] UFO: A UI-Focused Agent for Windows OS Interaction | [paper] | [code]

  • [2024/02/06] AnyTool: Self-Reflective, Hierarchical Agents for Large-Scale API Calls | [paper] | [code]

  • [2024/01/11] EASYTOOL: Enhancing LLM-based Agents with Concise Tool Instruction | [paper] | [code]

  • [2024/01/03] GPT-4V(ision) is a Generalist Web Agent, if Grounded | [paper] | [code]

  • [2023/12/21] AppAgent: Multimodal Agents as Smartphone Users | [paper] | [code]

  • [2023/12/18] CLOVA: A Closed-Loop Visual Assistant with Tool Usage and Update | [paper] | [code]

  • [2023/12/14] CogAgent: A Visual Language Model for GUI Agents | [paper] | [code]

  • [2023/11/19] TPTU-v2: Boosting Task Planning and Tool Usage of Large Language Model-based Agents in Real-world Systems | [paper] | [code]

  • [2023/11/15] ToolTalk: Evaluating Tool-Usage in a Conversational Setting | [paper] | [code]

  • [2023/11/10] Smart Agent-Based Modeling: On the Use of Large Language Models in Computer Simulations | [paper] | [code]

  • [2023/10/12] A Zero-Shot Language Agent for Computer Control with Structured Reflection | [paper] | [code]

  • [2023/08/07] TPTU: Large Language Model-based AI Agents for Task Planning and Tool Usage | [paper] | [code]

  • [2023/06/09] Mind2Web: Towards a Generalist Agent for the Web | [paper] | [code]

  • [2023/05/22] Making Language Models Better Tool Learners with Execution Feedback | [paper] | [code]

  • [2023/05/19] ToolkenGPT: Augmenting Frozen Language Models with Massive Tools via Tool Embeddings | [paper] | [code]

Simulation

  • [2025/02/06] Simulating the Emergence of Differential Case Marking with Communicating Neural-Network Agents | [paper] | [code]

  • [2025/02/03] Eliciting Language Model Behaviors with Investigator Agents | [paper] | [code]

  • [2025/01/25] Are Human Interactions Replicable by Generative Agents? A Case Study on Pronoun Usage in Hierarchical Interactions | [paper] | [code]

  • [2025/01/19] Self-Explanation in Social AI Agents | [paper] | [code]

  • [2025/01/12] LLMs Model Non-WEIRD Populations: Experiments with Synthetic Cultural Agents | [paper] | [code]

  • [2024/12/10] Political Actor Agent: Simulating Legislative System for Roll Call Votes Prediction with Large Language Models | [paper] | [code]

  • [2024/11/18] OASIS: Open Agent Social Interaction Simulations with One Million Agents | [paper] | [code]

  • [2024/10/28] ElectionSim: Massive Population Election Simulation Powered by Large Language Model Driven Agents | [paper] | [code]

  • [2024/10/24] Schema-Guided Culture-Aware Complex Event Simulation with Multi-Agent Role-Play | [paper] | [code]

  • [2024/10/18] SRAP-Agent: Simulating and Optimizing Scarce Resource Allocation Policy with LLM-based Agent | [paper] | [code]

  • [2024/10/05] Large Language Models can Achieve Social Balance | [paper] | [code]

  • [2024/09/25] Plurals: A System for Guiding LLMs Via Simulated Social Ensembles | [paper] | [code]

  • [2024/09/14] Synergistic Simulations: Multi-Agent Problem Solving with Large Language Models | [paper] | [code]

  • [2024/09/02] Agentic Society: Merging skeleton from real world and texture from Large Language Model | [paper] | [code]

  • [2024/08/28] Logic-Enhanced Language Model Agents for Trustworthy Social Simulations | [paper] | [code]

  • [2024/08/15] AgentCourt: Simulating Court with Adversarial Evolvable Lawyer Agents | [paper] | [code]

  • [2024/08/03] Self-Emotion Blended Dialogue Generation in Social Simulation Agents | [paper] | [code]

  • [2024/06/26] Simulating The U.S. Senate: An LLM-Driven Agent Approach to Modeling Legislative Behavior and Bipartisanship | [paper] | [code]

  • [2024/06/20] Artificial Leviathan: Exploring Social Evolution of LLM Agents Through the Lens of Hobbesian Social Contract Theory | [paper] | [code]

  • [2024/06/10] Can Language Models Serve as Text-Based World Simulators? | [paper] | [code]

  • [2024/05/12] Exploring the Potential of Conversational AI Support for Agent-Based Social Simulation Model Design | [paper] | [code]

  • [2024/04/23] BattleAgent: Multi-modal Dynamic Emulation on Historical Battles to Complement Historical Analysis | [paper] | [code]

  • [2024/03/20] AgentGroupChat: An Interactive Group Chat Simulacra For Better Eliciting Emergent Behavior | [paper] | [code]

  • [2024/03/05] AgentsCourt: Building Judicial Decision-Making Agents with Court Debate Simulation and Legal Knowledge Augmentation | [paper] | [code]

  • [2024/02/26] Unveiling the Truth and Facilitating Change: Towards Agent-based Large-scale Social Movement Simulation | [paper] | [code]

  • [2024/02/20] What if LLMs Have Different World Views: Simulating Alien Civilizations with LLM-based Agents | [paper] | [code]

  • [2024/02/07] Can Large Language Model Agents Simulate Human Trust Behavior? | [paper] | [code]

  • [2024/01/08] SpeechAgents: Human-Communication Simulation with Multi-Modal Multi-Agent Systems | [paper] | [code]

  • [2023/12/06] LLM as OS, Agents as Apps: Envisioning AIOS, Agents and the AIOS-Agent Ecosystem | [paper] | [code]

  • [2023/11/28] War and Peace (WarAgent): Large Language Model-based Multi-Agent Simulation of World Wars | [paper] | [code]

  • [2023/10/10] MetaAgents: Simulating Interactions of Human Behaviors for LLM-based Task-oriented Coordination via Collaborative Generative Agents | [paper] | [code]

  • [2023/06/05] User Behavior Simulation with Large Language Model based Agents | [paper] | [code]

  • [2023/05/26] Training Socially Aligned Language Models on Simulated Social Interactions | [paper] | [code]

  • [2023/04/07] Generative Agents: Interactive Simulacra of Human Behavior | [paper] | [code]

Application

Math

  • [2025/02/18] One Size doesn't Fit All: A Personalized Conversational Tutoring Agent for Mathematics Instruction | [paper] | [code]

  • [2024/10/29] Flow-DPO: Improving LLM Mathematical Reasoning through Online Multi-Agent Learning | [paper] | [code]

  • [2024/10/13] Expanding Search Space with Diverse Prompting Agents: An Efficient Sampling Approach for LLM Mathematical Reasoning | [paper] | [code]

  • [2024/08/03] MathLearner: A Large Language Model Agent Framework for Learning to Solve Mathematical Problems | [paper] | [code]

  • [2024/04/10] MathVC: An LLM-Simulated Multi-Character Virtual Classroom for Mathematics Education | [paper] | [code]

  • [2024/04/06] MACM: Utilizing a Multi-Agent System for Condition Mining in Solving Complex Mathematical Problems | [paper] | [code]

Chemistry

  • [2025/01/23] Hypothesis Generation for Materials Discovery and Design Using Goal-Driven and Constraint-Guided LLM Agents | [paper] | [code]

  • [2025/01/11] ChemAgent: Self-updating Library in Large Language Models Improves Chemical Reasoning | [paper] | [code]

  • [2024/08/29] HoneyComb: A Flexible LLM-Based Agent System for Materials Science | [paper] | [code]

  • [2024/06/26] A Review of Large Language Models and Autonomous Agents in Chemistry | [paper] | [code]

Biology

  • [2024/10/16] PRefLexOR: Preference-based Recursive Language Modeling for Exploratory Optimization of Reasoning and Agentic Thinking | [paper] | [code]

  • [2024/06/29] BioKGBench: A Knowledge Graph Checking Benchmark of AI Agent for Biomedical Science | [paper] | [code]

  • [2024/05/25] GeneAgent: Self-verification Language Agent for Gene Set Knowledge Discovery using Domain Databases | [paper] | [code]

  • [2024/04/27] CRISPR-GPT: An LLM Agent for Automated Design of Gene-Editing Experiments | [paper] | [code]

  • [2024/04/03] Empowering Biomedical Discovery with AI Agents | [paper] | [code]

  • [2024/01/27] ProtAgents: Protein discovery via large language model multi-agent collaborations combining physics and machine learning | [paper] | [code]

Physics

  • [2025/01/23] Hypothesis Generation for Materials Discovery and Design Using Goal-Driven and Constraint-Guided LLM Agents | [paper] | [code]

  • [2024/12/09] StarWhisper Telescope: Agent-Based Observation Assistant System to Approach AI Astrophysicist | [paper] | [code]

  • [2024/08/29] HoneyComb: A Flexible LLM-Based Agent System for Materials Science | [paper] | [code]

  • [2024/01/27] ProtAgents: Protein discovery via large language model multi-agent collaborations combining physics and machine learning | [paper] | [code]

Geography

  • [2024/12/23] MineAgent: Towards Remote-Sensing Mineral Exploration with Multimodal Large Language Models | [paper] | [code]

  • [2024/07/13] An Autonomous GIS Agent Framework for Geospatial Data Retrieval | [paper] | [code]

Art

  • [2025/01/22] FilmAgent: A Multi-Agent Framework for End-to-End Film Automation in Virtual 3D Spaces | [paper] | [code]

  • [2024/10/02] Agent-Driven Large Language Models for Mandarin Lyric Generation | [paper] | [code]

  • [2024/09/05] LLM-based multi-agent poetry generation in non-cooperative environments | [paper] | [code]

  • [2024/08/13] What should I wear to a party in a Greek taverna? Evaluation for Conversational Agents in the Fashion Domain | [paper] | [code]

  • [2024/07/01] IBSEN: Director-Actor Agent Collaboration for Controllable and Interactive Drama Script Generation | [paper] | [code]

  • [2024/04/28] ComposerX: Multi-Agent Symbolic Music Composition with LLMs | [paper] | [code]

  • [2024/03/12] AesopAgent: Agent-driven Evolutionary System on Story-to-Video Production | [paper] | [code]

  • [2023/10/18] MusicAgent: An AI Agent for Music Understanding and Generation with Large Language Models | [paper] | [code]

Medicine

  • [2025/02/19] LIDDIA: Language-based Intelligent Drug Discovery Agent | [paper] | [code]

  • [2025/02/18] An LLM-Powered Agent for Physiological Data Analysis: A Case Study on PPG-based Heart Rate Estimation | [paper] | [code]

  • [2025/02/18] Sleepless Nights, Sugary Days: Creating Synthetic Users with Health Conditions for Realistic Coaching Agent Interactions | [paper] | [code]

  • [2025/02/13] PathFinder: A Multi-Modal Multi-Agent System for Medical Diagnostic Decision-Making Applied to Histopathology | [paper] | [code]

  • [2025/02/09] HamRaz: A Culture-Based Persian Conversation Dataset for Person-Centered Therapy Using LLM Agents | [paper] | [code]

  • [2025/02/05] CAMI: A Counselor Agent Supporting Motivational Interviewing through State Inference and Topic Exploration | [paper] | [code]

  • [2025/02/02] Agent-Based Uncertainty Awareness Improves Automated Radiology Report Labeling with an Open-Source Large Language Model | [paper] | [code]

  • [2025/01/27] MADP: Multi-Agent Deductive Planning for Enhanced Cognitive-Behavioral Mental Health Question Answer | [paper] | [code]

  • [2025/01/16] AutoCBT: An Autonomous Multi-agent Framework for Cognitive Behavioral Therapy in Psychological Counseling | [paper] | [code]

  • [2025/01/03] PSYCHE: A Multi-faceted Patient Simulation Framework for Evaluation of Psychiatric Assessment Conversational Agents | [paper] | [code]

  • [2024/12/19] PsyDraw: A Multi-Agent Multimodal System for Mental Health Screening in Left-Behind Children | [paper] | [code]

  • [2024/12/17] RareAgents: Advancing Rare Disease Care through LLM-Empowered Multi-disciplinary Team | [paper] | [code]

  • [2024/12/16] LLMs Can Simulate Standardized Patients via Agent Coevolution | [paper] | [code]

  • [2024/12/13] Script-Based Dialog Policy Planning for LLM-Powered Conversational Agents: A Basic Architecture for an "AI Therapist" | [paper] | [code]

  • [2024/12/05] Educational-Psychological Dialogue Robot Based on Multi-Agent Collaboration | [paper] | [code]

  • [2024/12/02] Medchain: Bridging the Gap Between LLM Agents and Clinical Practice through Interactive Sequential Benchmarking | [paper] | [code]

  • [2024/11/21] PIORS: Personalized Intelligent Outpatient Reception based on Large Language Model with Multi-Agents Medical Scenario Simulation | [paper] | [code]

  • [2024/11/16] Towards Next-Generation Medical Agent: How o1 is Reshaping Decision-Making in Medical Scenarios | [paper] | [code]

  • [2024/11/03] EcoAct: Economic Agent Determines When to Register What Action | [paper] | [code]

  • [2024/10/25] PatentAgent: Intelligent Agent for Automated Pharmaceutical Patent Analysis | [paper] | [code]

  • [2024/10/23] ReflecTool: Towards Reflection-Aware Tool-Augmented Clinical Agents | [paper] | [code]

  • [2024/10/17] MeNTi: Bridging Medical Calculator and LLM Agent with Nested Tool Calling | [paper] | [code]

  • [2024/10/16] MedAide: Towards an Omni Medical Aide via Specialized LLM-based Multi-Agent Collaboration | [paper] | [code]

  • [2024/10/02] Zodiac: A Cardiologist-Level LLM Framework for Multi-Agent Diagnostics | [paper] | [code]

  • [2024/08/28] Interactive Agents: Simulating Counselor-Client Psychological Counseling via Role-Playing LLM-to-LLM Interactions | [paper] | [code]

  • [2024/08/23] DrugAgent: Explainable Drug Repurposing Agent with Large Language Model-based Reasoning | [paper] | [code]

  • [2024/08/14] Development of a Large Language Model-based Multi-Agent Clinical Decision Support System for Korean Triage and Acuity Scale (KTAS)-Based Triage and Treatment Planning in Emergency Departments | [paper] | [code]

  • [2024/07/18] CoD, Towards an Interpretable Medical Agent using Chain of Diagnosis | [paper] | [code]

  • [2024/07/10] Virtual Agents for Alcohol Use Counseling: Exploring LLM-Powered Motivational Interviewing | [paper] | [code]

  • [2024/07/03] MentalAgora: A Gateway to Advanced Personalized Care in Mental Health through Multi-Agent Debating and Attribute Control | [paper] | [code]

  • [2024/07/02] MMedAgent: Learning to Use Medical Tools with Multi-modal Agent | [paper] | [code]

  • [2024/04/23] ClinicalAgent: Clinical Trial Multi-Agent System with Large Language Model-based Reasoning | [paper] | [code]

  • [2024/04/03] Empowering Biomedical Discovery with AI Agents | [paper] | [code]

  • [2024/02/20] Can Large Language Models be Used to Provide Psychological Counselling? An Analysis of GPT-4-Generated Responses Using Role-play Dialogues | [paper] | [code]

  • [2024/02/20] AgentMD: Empowering Language Agents for Risk Prediction with Large-Scale Clinical Tool Learning | [paper] | [code]

  • [2024/02/15] Knowledge-Infused LLM-Powered Conversational Health Agent: A Case Study for Diabetes Patients | [paper] | [code]

  • [2024/02/01] Generation, Distillation and Evaluation of Motivational Interviewing-Style Reflections with a Foundational Language Model | [paper] | [code]

  • [2023/12/19] Can ChatGPT be Your Personal Medical Assistant? | [paper] | [code]

  • [2023/10/03] Exploring Collaboration Mechanisms for LLM Agents: A Social Psychology View | [paper] | [code]

Finance

  • [2025/02/08] Agentic AI Systems Applied to tasks in Financial Services: Modeling and model risk management crews | [paper] | [code]

  • [2025/02/01] MarketSenseAI 2.0: Enhancing Stock Analysis through LLM Agents | [paper] | [code]

  • [2025/01/08] FinSphere: A Conversational Stock Analysis Agent Equipped with Quantitative Tools based on Real-Time Database | [paper] | [code]

  • [2024/12/27] OS-Genesis: Automating GUI Agent Trajectory Construction via Reverse Task Synthesis | [paper] | [code]

  • [2024/12/19] Beyond the Sum: Unlocking AI Agents Potential Through Market Forces | [paper] | [code]

  • [2024/11/07] Enhancing Investment Analysis: Optimizing AI-Agent Collaboration in Financial Research | [paper] | [code]

  • [2024/10/29] Enhancing Financial Question Answering with a Multi-Agent Reflection Framework | [paper] | [code]

  • [2024/09/19] Strategic Collusion of LLM Agents: Market Division in Multi-Commodity Competitions | [paper] | [code]

  • [2024/07/18] dzFinNlp at AraFinNLP: Improving Intent Detection in Financial Conversational Agents | [paper] | [code]

  • [2024/07/09] FinCon: A Synthesized LLM Multi-Agent System with Conceptual Verbal Reinforcement for Enhanced Financial Decision Making | [paper] | [code]

  • [2024/07/05] Towards Automated Functional Equation Proving: A Benchmark Dataset and A Domain-Specific In-Context Agent | [paper] | [code]

  • [2024/05/07] Enhancing the Efficiency and Accuracy of Underlying Asset Reviews in Structured Finance: The Application of Multi-agent Framework | [paper] | [code]

Software Engineering

  • [2025/02/19] An LLM-based Agent for Reliable Docker Environment Configuration | [paper] | [code]

  • [2025/02/18] Training Turn-by-Turn Verifiers for Dialogue Tutoring Agents: The Curious Case of LLMs as Your Coding Tutors | [paper] | [code]

  • [2025/02/18] UXAgent: An LLM Agent-Based Usability Testing Framework for Web Design | [paper] | [code]

  • [2025/02/14] The Ann Arbor Architecture for Agent-Oriented Programming | [paper] | [code]

  • [2025/02/11] Multi-Agent Collaboration for Multilingual Code Instruction Tuning | [paper] | [code]

  • [2025/02/10] SyncMind: Measuring Agent Out-of-Sync Recovery in Collaborative Software Engineering | [paper] | [code]

  • [2025/02/08] CODESIM: Multi-Agent Code Generation and Problem Solving through Simulation-Driven Planning and Debugging | [paper] | [code]

  • [2024/12/30] Training Software Engineering Agents and Verifiers with SWE-Gym | [paper] | [code]

  • [2024/12/24] Molly: Making Large Language Model Agents Solve Python Problem More Logically | [paper] | [code]

  • [2024/12/16] Seeker: Towards Exception Safety Code Generation with Intermediate Language Agents Framework | [paper] | [code]

  • [2024/11/07] CodeTree: Agent-guided Tree Search for Code Generation with Large Language Models | [paper] | [code]

  • [2024/10/29] SceneGenAgent: Precise Industrial Scene Generation with Coding Agent | [paper] | [code]

  • [2024/10/09] DA-Code: Agent Data Science Code Generation Benchmark for Large Language Models | [paper] | [code]

  • [2024/10/09] Seeker: Enhancing Exception Handling in Code with LLM-based Multi-Agent Approach | [paper] | [code]

  • [2024/09/02] Co-Learning: Code Learning for Multi-Agent Reinforcement Collaborative Framework with Conversational Natural Language Interfaces | [paper] | [code]

  • [2024/08/19] GoNoGo: An Efficient LLM-based Multi-Agent System for Streamlining Automotive Software Release Decision-Making | [paper] | [code]

  • [2024/08/13] Diversity Empowers Intelligence: Integrating Expertise of Software Engineering Agents | [paper] | [code]

  • [2024/08/05] LLM Agents Improve Semantic Code Search | [paper] | [code]

  • [2024/07/26] AppWorld: A Controllable World of Apps and People for Benchmarking Interactive Coding Agents | [paper] | [code]

  • [2024/07/01] Agentless: Demystifying LLM-based Software Engineering Agents | [paper] | [code]

  • [2024/06/13] Multi-Agent Software Development through Cross-Team Collaboration | [paper] | [code]

  • [2024/05/06] SWE-agent: Agent-Computer Interfaces Enable Automated Software Engineering | [paper] | [code]

  • [2024/04/11] Behavior Trees Enable Structured Programming of Language Model Agents | [paper] | [code]

  • [2024/04/02] Self-Organized Agents: A LLM Multi-Agent Framework toward Ultra Large-Scale Code Generation and Optimization | [paper] | [code]

  • [2024/03/02] SceneCraft: An LLM Agent for Synthesizing 3D Scene as Blender Code | [paper] | [code]

  • [2024/02/26] RepoAgent: An LLM-Powered Open-Source Framework for Repository-level Code Documentation Generation | [paper] | [code]

  • [2024/02/19] WorldCoder, a Model-Based LLM Agent: Building World Models by Writing Code and Interacting with the Environment | [paper] | [code]

  • [2024/02/02] StepCoder: Improve Code Generation with Reinforcement Learning from Compiler Feedback | [paper] | [code]

  • [2024/02/01] Executable Code Actions Elicit Better LLM Agents | [paper] | [code]

  • [2023/12/28] Experiential Co-Learning of Software-Developing Agents | [paper] | [code]

  • [2023/12/20] AgentCoder: Multi-Agent-based Code Generation with Iterative Testing and Optimisation | [paper] | [code]

  • [2023/07/27] PanGu-Coder2: Boosting Large Language Models for Code with Ranking Feedback | [paper] | [code]

  • [2023/07/16] ChatDev: Communicative Agents for Software Development | [paper] | [code]

  • [2023/04/15] Self-collaboration Code Generation via ChatGPT | [paper] | [code]

Research

  • [2025/02/20] MLGym: A New Framework and Benchmark for Advancing AI Research Agents | [paper] | [code]

  • [2025/02/07] Agentic Reasoning: Reasoning LLMs with Tools for the Deep Research | [paper] | [code]

  • [2025/01/08] Agent Laboratory: Using LLM Agents as Research Assistants | [paper] | [code]

  • [2024/10/17] Chain of Ideas: Revolutionizing Research Via Novel Idea Development with LLM Agents | [paper] | [code]

  • [2024/10/12] Many Heads Are Better Than One: Improved Scientific Idea Generation by A LLM-Based Multi-Agent System | [paper] | [code]

  • [2024/10/07] ScienceAgentBench: Toward Rigorous Assessment of Language Agents for Data-Driven Scientific Discovery | [paper] | [code]

  • [2024/10/07] ImProver: Agent-Based Automated Proof Optimization | [paper] | [code]

  • [2024/09/23] Towards a Realistic Long-Term Benchmark for Open-Web Research Agents | [paper] | [code]

  • [2024/09/17] CORE-Bench: Fostering the Credibility of Published Research Through a Computational Reproducibility Agent Benchmark | [paper] | [code]

  • [2024/09/12] DSBench: How Far Are Data Science Agents to Becoming Data Science Experts? | [paper] | [code]

  • [2024/09/11] SUPER: Evaluating Agents on Setting Up and Executing Tasks from Research Repositories | [paper] | [code]

  • [2024/09/10] Language agents achieve superhuman synthesis of scientific knowledge | [paper] | [code]

  • [2024/09/09] SciAgents: Automating scientific discovery through multi-agent intelligent graph reasoning | [paper] | [code]

  • [2024/08/26] MLR-Copilot: Autonomous Machine Learning Research based on Large Language Models Agents | [paper] | [code]

  • [2024/08/20] Automating Knowledge Discovery from Scientific Literature via LLMs: A Dual-Agent Approach with Progressive Ontology Prompting | [paper] | [code]

  • [2024/06/13] ResearchArena: Benchmarking Large Language Models' Ability to Collect and Organize Information as Research Agents | [paper] | [code]

  • [2024/05/02] CACTUS: Chemistry Agent Connecting Tool-Usage to Science | [paper] | [code]

  • [2024/04/09] SurveyAgent: A Conversational System for Personalized and Efficient Research Survey | [paper] | [code]

  • [2024/02/28] Data Interpreter: An LLM Agent For Data Science | [paper] | [code]

  • [2024/02/18] SciAgent: Tool-augmented Language Models for Scientific Reasoning | [paper] | [code]

  • [2024/02/06] Prioritizing Safeguarding Over Autonomy: Risks of LLM Agents for Science | [paper] | [code]

  • [2024/01/08] MARG: Multi-Agent Review Generation for Scientific Papers | [paper] | [code]

Automation

Workflow

  • [2025/02/11] EvoFlow: Evolving Diverse Agentic Workflows On The Fly | [paper] | [code]

  • [2025/02/07] nvAgent: Automated Data Visualization from Natural Language via Collaborative Agent Workflow | [paper] | [code]

  • [2025/02/06] ScoreFlow: Mastering LLM Agent Workflows via Score-based Preference Optimization | [paper] | [code]

  • [2024/12/17] An Agentic Approach to Automatic Creation of P&ID Diagrams from Natural Language Descriptions | [paper] | [code]

  • [2024/12/15] LAW: Legal Agentic Workflows for Custody and Fund Services Contracts | [paper] | [code]

  • [2024/11/22] ScribeAgent: Towards Specialized Web Agents Using Production-Scale Workflow Data | [paper] | [code]

  • [2024/11/12] BudgetMLAgent: A Cost-Effective LLM Multi-Agent system for Automating Machine Learning Tasks | [paper] | [code]

  • [2024/11/08] Game-theoretic LLM: Agent Workflow for Negotiation Games | [paper] | [code]

  • [2024/10/24] An LLM Agent for Automatic Geospatial Data Analysis | [paper] | [code]

  • [2024/10/17] From Barriers to Tactics: A Behavioral Science-Informed Agentic Workflow for Personalized Nutrition Coaching | [paper] | [code]

  • [2024/10/17] ControlAgent: Automating Control System Design via Novel Integration of LLM Agents and Domain Expertise | [paper] | [code]

  • [2024/10/16] Proactive Agent: Shifting LLM Agents from Reactive Responses to Active Assistance | [paper] | [code]

  • [2024/10/14] AFlow: Automating Agentic Workflow Generation | [paper] | [code]

  • [2024/10/10] Benchmarking Agentic Workflow Generation | [paper] | [code]

  • [2024/10/03] AutoML-Agent: A Multi-Agent LLM Framework for Full-Pipeline AutoML | [paper] | [code]

  • [2024/09/11] Agent Workflow Memory | [paper] | [code]

  • [2024/08/16] The Fellowship of the LLMs: Multi-Agent Workflows for Synthetic Preference Optimization Dataset Generation | [paper] | [code]

  • [2024/07/15] Spider2-V: How Far Are Multimodal Agents From Automating Data Science and Engineering Workflows? | [paper] | [code]

  • [2024/07/03] AgentInstruct: Toward Generative Teaching with Agentic Flows | [paper] | [code]

  • [2024/07/01] AutoFlow: Automated Workflow Generation for Large Language Model Agents | [paper] | [code]

  • [2024/06/21] Autonomous Agents for Collaborative Task under Information Asymmetry | [paper] | [code]

  • [2024/03/13] AutoGuide: Automated Generation and Selection of Context-Aware Guidelines for Large Language Model Agents | [paper] | [code]

  • [2024/03/05] ChatCite: LLM Agent with Human Workflow Guidance for Comparative Literature Summary | [paper] | [code]

Automatic Evaluation

  • [2025/02/14] Automated Hypothesis Validation with Agentic Sequential Falsifications | [paper] | [code]

  • [2025/01/19] IntellAgent: A Multi-Agent Framework for Evaluating Conversational AI Systems | [paper] | [code]

  • [2025/01/17] Agent-as-Judge for Factual Summarization of Long Narratives | [paper] | [code]

  • [2025/01/03] PSYCHE: A Multi-faceted Patient Simulation Framework for Evaluation of Psychiatric Assessment Conversational Agents | [paper] | [code]

  • [2024/12/28] M-MAD: Multidimensional Multi-Agent Debate for Advanced Machine Translation Evaluation | [paper] | [code]

  • [2024/12/10] Evaluation Agent: Efficient and Promptable Evaluation Framework for Visual Generative Models | [paper] | [code]

  • [2024/11/25] SAGEval: The frontiers of Satisfactory Agent based NLG Evaluation for reference-free open-ended text | [paper] | [code]

  • [2024/11/15] Large Language Models as User-Agents for Evaluating Task-Oriented-Dialogue Systems | [paper] | [code]

  • [2024/09/24] Automated test generation to evaluate tool-augmented LLMs as conversational AI agents | [paper] | [code]

  • [2024/09/22] The Ability of Large Language Models to Evaluate Constraint-satisfaction in Agent Responses to Open-ended Requests | [paper] | [code]

  • [2024/09/13] Safeguarding Decentralized Social Media: LLM Agents for Automating Community Rule Compliance | [paper] | [code]

  • [2024/05/23] ALI-Agent: Assessing LLMs' Alignment with Human Values via Agent-based Evaluation | [paper] | [code]

  • [2024/03/28] MATEval: A Multi-Agent Discussion Framework for Advancing Open-Ended Text Evaluation | [paper] | [code]

  • [2023/08/14] ChatEval: Towards Better LLM-based Evaluators through Multi-Agent Debate | [paper] | [code]

Training

Fine-tuning

  • [2025/02/19] UM_FHS at TREC 2024 PLABA: Exploration of Fine-tuning and AI agent approach for plain language adaptations of biomedical text | [paper] | [code]

  • [2025/02/18] Training Turn-by-Turn Verifiers for Dialogue Tutoring Agents: The Curious Case of LLMs as Your Coding Tutors | [paper] | [code]

  • [2025/02/11] Multi-Agent Collaboration for Multilingual Code Instruction Tuning | [paper] | [code]

  • [2025/02/10] Hephaestus: Improving Fundamental Agent Capabilities of Large Language Models through Continual Pre-Training | [paper] | [code]

  • [2025/01/10] Multiagent Finetuning: Self Improvement with Diverse Reasoning Chains | [paper] | [code]

  • [2025/01/03] AgentRefine: Enhancing Agent Generalization through Refinement Tuning | [paper] | [code]

  • [2024/12/30] Training Software Engineering Agents and Verifiers with SWE-Gym | [paper] | [code]

  • [2024/12/30] Aviary: training language agents on challenging scientific tasks | [paper] | [code]

  • [2024/12/16] Virtual Agent-Based Communication Skills Training to Facilitate Health Persuasion Among Peers | [paper] | [code]

  • [2024/11/29] Training Agents with Weakly Supervised Feedback from Large Language Models | [paper] | [code]

  • [2024/11/21] Star-Agents: Automatic Data Optimization with LLM Agents for Instruction Tuning | [paper] | [code]

  • [2024/10/20] Training Language Models to Critique With Multi-agent Feedback | [paper] | [code]

  • [2024/10/16] Proactive Agent: Shifting LLM Agents from Reactive Responses to Active Assistance | [paper] | [code]

  • [2024/10/10] AgentBank: Towards Generalized LLM Agents via Fine-Tuning on 50000+ Interaction Trajectories | [paper] | [code]

  • [2024/07/25] Recursive Introspection: Teaching Language Model Agents How to Self-Improve | [paper] | [code]

  • [2024/06/11] CoEvol: Constructing Better Responses for Instruction Finetuning through Multi-Agent Cooperation | [paper] | [code]

  • [2024/04/05] Social Skill Training with Large Language Models | [paper] | [code]

  • [2024/04/02] CMAT: A Multi-Agent Collaboration Tuning Framework for Enhancing Small Language Models | [paper] | [code]

  • [2024/03/29] Enhancing the General Agent Capabilities of Low-Parameter LLMs through Tuning and Multi-Branch Reasoning | [paper] | [code]

  • [2024/03/21] ReAct Meets ActRe: When Language Agents Enjoy Training Data Autonomy | [paper] | [code]

  • [2024/03/19] Agent-FLAN: Designing Data and Methods of Effective Agent Tuning for Large Language Models | [paper] | [code]

  • [2024/02/23] AgentOhana: Design Unified Data and Training Pipeline for Effective Agent Learning | [paper] | [code]

  • [2024/02/21] Neeko: Leveraging Dynamic LoRA for Efficient Multi-Character Role-Playing Agent | [paper] | [code]

  • [2024/02/18] Learning From Failure: Integrating Negative Examples when Fine-tuning Large Language Models as Agents | [paper] | [code]

  • [2024/01/10] Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training | [paper] | [code]

  • [2024/01/05] From LLM to Conversational Agent: A Memory Enhanced Architecture with Fine-Tuning of Large Language Models | [paper] | [code]

  • [2023/12/22] Pangu-Agent: A Fine-Tunable Generalist Agent with Structured Reasoning | [paper] | [code]

  • [2023/11/28] Embodied Multi-Modal Agent trained by an LLM from a Parallel TextWorld | [paper] | [code]

  • [2023/10/19] AgentTuning: Enabling Generalized Agent Abilities for LLMs | [paper] | [code]

  • [2023/10/09] FireAct: Toward Language Agent Fine-tuning | [paper] | [code]

  • [2023/05/26] Training Socially Aligned Language Models on Simulated Social Interactions | [paper] | [code]

RL

  • [2025/02/09] Training Language Models for Social Deduction with Multi-Agent Reinforcement Learning | [paper] | [code]

  • [2025/02/06] Multi-Agent Reinforcement Learning with Focal Diversity Optimization | [paper] | [code]

  • [2025/01/25] Improving Retrieval-Augmented Generation through Multi-Agent Reinforcement Learning | [paper] | [code]

  • [2024/11/26] LLM-Based Offline Learning for Embodied Agents via Consistency-Guided Reward Ensemble | [paper] | [code]

  • [2024/11/07] Interactive Dialogue Agents via Reinforcement Learning on Hindsight Regenerations | [paper] | [code]

  • [2024/11/06] From Novice to Expert: LLM Agent Policy Optimization via Step-wise Reinforcement Learning | [paper] | [code]

  • [2024/11/04] WebRL: Training LLM Web Agents via Self-Evolving Online Curriculum Reinforcement Learning | [paper] | [code]

  • [2024/10/11] Words as Beacons: Guiding RL Agents with High-Level Language Prompts | [paper] | [code]

  • [2024/10/10] MACPO: Weak-to-Strong Alignment via Multi-Agent Contrastive Preference Optimization | [paper] | [code]

  • [2024/07/02] Predicting vs. Acting: A Trade-off Between World Modeling & Agent Modeling | [paper] | [code]

  • [2024/06/26] Mental Modeling of Reinforcement Learning Agents by Language Models | [paper] | [code]

  • [2024/06/17] Input Conditioned Graph Generation for Language Agents | [paper] | [code]

  • [2024/06/05] LLM-based Rewriting of Inappropriate Argumentation using Reinforcement Learning from Machine Feedback | [paper] | [code]

  • [2024/06/03] Re-ReST: Reflection-Reinforced Self-Training for Language Agents | [paper] | [code]

  • [2024/05/30] Safe Multi-agent Reinforcement Learning with Natural Language Constraints | [paper] | [code]

  • [2024/05/17] LLM-based Multi-Agent Reinforcement Learning: Current and Future Directions | [paper] | [code]

  • [2024/05/16] Fine-Tuning Large Vision-Language Models as Decision-Making Agents via Reinforcement Learning | [paper] | [code]

  • [2024/05/01] Navigating WebAI: Training Agents to Complete Web Tasks with Large Language Models and Reinforcement Learning | [paper] | [code]

  • [2024/03/05] Language Guided Exploration for RL Agents in Text Environments | [paper] | [code]

  • [2024/02/17] Offline Training of Language Model Agents with Functions as Learnable Weights | [paper] | [code]

  • [2024/02/02] StepCoder: Improve Code Generation with Reinforcement Learning from Compiler Feedback | [paper] | [code]

  • [2023/10/25] MultiPrompter: Cooperative Prompt Optimization with Multi-Agent Reinforcement Learning | [paper] | [code]

  • [2023/03/29] Skill Reinforcement Learning and Planning for Open-World Long-Horizon Tasks | [paper] | [code]

DPO

  • [2025/01/03] SDPO: Segment-Level Direct Preference Optimization for Social Agents | [paper] | [code]

  • [2024/10/29] Flow-DPO: Improving LLM Mathematical Reasoning through Online Multi-Agent Learning | [paper] | [code]

  • [2024/05/31] Learning to Clarify: Multi-turn Conversations with Action-Based Contrastive Self-Training | [paper] | [code]

Scaling

Single-Agent Framework

  • [2025/02/14] Agentic Verification for Ambiguous Query Disambiguation | [paper] | [code]

  • [2025/02/12] SPeCtrum: A Grounded Framework for Multidimensional Identity Representation in LLM-Based Agent | [paper] | [code]

  • [2025/02/09] AutoAgent: A Fully-Automated and Zero-Code Framework for LLM Agents | [paper] | [code]

  • [2025/02/04] Adaptive Self-improvement LLM Agentic System for ML Library Development | [paper] | [code]

  • [2025/01/31] Enabling Autonomic Microservice Management through Self-Learning Agents | [paper] | [code]

  • [2024/12/28] OneKE: A Dockerized Schema-Guided LLM Agent-based Knowledge Extraction System | [paper] | [code]

  • [2024/12/21] Self-guided Knowledgeable Network of Thoughts: Amplifying Reasoning with Large Language Models | [paper] | [code]

  • [2024/12/15] AgentPS: Agentic Process Supervision for Multi-modal Content Quality Assurance through Multi-round QA | [paper] | [code]

  • [2024/12/11] A Multimodal Social Agent | [paper] | [code]

  • [2024/12/11] Federated In-Context LLM Agent Learning | [paper] | [code]

  • [2024/12/04] How to Correctly do Semantic Backpropagation on Language-based Agentic Systems | [paper] | [code]

  • [2024/12/02] SAUP: Situation Awareness Uncertainty Propagation on LLM Agent | [paper] | [code]

  • [2024/12/01] Towards Adaptive Mechanism Activation in Language Agent | [paper] | [code]

  • [2024/11/20] MindForge: Empowering Embodied Agents with Theory of Mind for Lifelong Collaborative Learning | [paper] | [code]

  • [2024/11/16] IntentGPT: Few-shot Intent Discovery with Large Language Models | [paper] | [code]

  • [2024/11/04] DynaSaur: Large Language Agents Beyond Predefined Actions | [paper] | [code]

  • [2024/11/04] CRMArena: Understanding the Capacity of LLM Agents to Perform Professional CRM Tasks in Realistic Environments | [paper] | [code]

  • [2024/10/29] ADAM: An Embodied Causal Agent in Open-World Environments | [paper] | [code]

  • [2024/10/27] TrajAgent: An Agent Framework for Unified Trajectory Modelling | [paper] | [code]

  • [2024/10/22] Adsorb-Agent: Autonomous Identification of Stable Adsorption Configurations via Large Language Model Agent | [paper] | [code]

  • [2024/10/11] Encoding Agent Trajectories as Representations with Sequence Transformers | [paper] | [code]

  • [2024/10/10] Agents Thinking Fast and Slow: A Talker-Reasoner Architecture | [paper] | [code]

  • [2024/10/08] AgentSquare: Automatic LLM Agent Search in Modular Design Space | [paper] | [code]

  • [2024/10/08] Applying Refusal-Vector Ablation to Llama 3.1 70B Agents | [paper] | [code]

  • [2024/09/24] MOSS: Enabling Code-Driven Evolution and Context Management for AI Agents | [paper] | [code]

  • [2024/09/19] Textualized Agent-Style Reasoning for Complex Tasks by Multiple Round LLM Generation | [paper] | [code]

  • [2024/09/15] Automatic Control With Human-Like Reasoning: Exploring Language Model Embodied Air Traffic Agents | [paper] | [code]

  • [2024/09/12] Self-Supervised Inference of Agents in Trustless Environments | [paper] | [code]

  • [2024/09/05] From MOOC to MAIC: Reshaping Online Teaching and Learning through LLM-driven Agents | [paper] | [code]

  • [2024/09/05] Rx Strategist: Prescription Verification using LLM Agents System | [paper] | [code]

  • [2024/09/03] AgentRE: An Agent-Based Framework for Navigating Complex Information Landscapes in Relation Extraction | [paper] | [code]

  • [2024/08/26] AgentMove: A Large Language Model based Agentic Framework for Zero-shot Next Location Prediction | [paper] | [code]

  • [2024/08/19] Anim-Director: A Large Multimodal Model Powered Agent for Controllable Animation Video Generation | [paper] | [code]

  • [2024/08/13] Causal Agent based on Large Language Model | [paper] | [code]

  • [2024/08/02] Coalitions of Large Language Models Increase the Robustness of AI Agents | [paper] | [code]

  • [2024/07/27] AgentPeerTalk: Empowering Students through Agentic-AI-Driven Discernment of Bullying and Joking in Peer Interactions in Schools | [paper] | [code]

  • [2024/07/25] Enhancing Agent Learning through World Dynamics Modeling | [paper] | [code]

  • [2024/07/25] RestoreAgent: Autonomous Image Restoration Agent via Multimodal Large Language Models | [paper] | [code]

  • [2024/07/16] Preemptive Detection and Correction of Misaligned Actions in LLM Agents | [paper] | [code]

  • [2024/07/15] Sibyl: Simple yet Effective Agent Framework for Complex Real-world Reasoning | [paper] | [code]

  • [2024/07/02] Beyond Numeric Awards: In-Context Dueling Bandits with LLM Agents | [paper] | [code]

  • [2024/06/24] OmAgent: A Multi-modal Agent Framework for Complex Video Understanding with Task Divide-and-Conquer | [paper] | [code]

  • [2024/06/07] SelfGoal: Your Language Agents Already Know How to Achieve High-level Goals | [paper] | [code]

  • [2024/05/25] AutoManual: Constructing Instruction Manuals by LLM Agents via Interactive Environmental Learning | [paper] | [code]

  • [2024/05/24] Intelligent Go-Explore: Standing on the Shoulders of Giant Foundation Models | [paper] | [code]

  • [2024/05/16] Agent Design Pattern Catalogue: A Collection of Architectural Patterns for Foundation Model based Agents | [paper] | [code]

  • [2024/04/30] Large Language Model Agent for Fake News Detection | [paper] | [code]

  • [2024/04/28] Logic Agent: Enhancing Validity with Logic Rule Invocation | [paper] | [code]

  • [2024/04/13] LLMSat: A Large Language Model-Based Goal-Oriented Agent for Autonomous Space Exploration | [paper] | [code]

  • [2024/04/01] TraveLER: A Modular Multi-LMM Agent Framework for Video Question-Answering | [paper] | [code]

  • [2024/03/29] ITCMA: A Generative Agent Based on a Computational Consciousness Structure | [paper] | [code]

  • [2024/02/25] Bootstrapping Cognitive Agents with a Large Language Model | [paper] | [code]

  • [2024/02/24] Empowering Large Language Model Agents through Action Learning | [paper] | [code]

  • [2024/02/20] Soft Self-Consistency Improves Language Model Agents | [paper] | [code]

  • [2024/02/04] NavHint: Vision and Language Navigation Agent with a Hint Generator | [paper] | [code]

  • [2024/01/05] AFSPP: Agent Framework for Shaping Preference and Personality with Large Language Models | [paper] | [code]

  • [2023/11/23] Controlling Large Language Model-based Agents for Large-Scale Decision-Making: An Actor-Critic Approach | [paper] | [code]

  • [2023/11/02] ProAgent: From Robotic Process Automation to Agentic Process Automation | [paper] | [code]

  • [2023/10/16] CLIN: A Continually Learning Language Agent for Rapid Task Adaptation and Generalization | [paper] | [code]

  • [2023/09/29] Reason for Future, Act for Now: A Principled Framework for Autonomous LLM Agents with Provable Sample Efficiency | [paper] | [code]

  • [2023/09/14] Agents: An Open-source Framework for Autonomous Language Agents | [paper] | [code]

  • [2023/09/08] A Versatile Graph Learning Approach through LLM-based Agent | [paper] | [code]

  • [2023/09/05] Cognitive Architectures for Language Agents | [paper] | [code]

  • [2023/05/27] SwiftSage: A Generative Agent with Fast and Slow Thinking for Complex Interactive Tasks | [paper] | [code]

  • [2023/05/25] Voyager: An Open-Ended Embodied Agent with Large Language Models | [paper] | [code]

Multi-Agent System

  • [2025/02/20] Enhancing Language Multi-Agent Learning with Multi-Agent Credit Re-Assignment for Interactive Environment Generalization | [paper] | [code]

  • [2025/02/20] CORBA: Contagious Recursive Blocking Attacks on Multi-Agent Systems Based on Large Language Models | [paper] | [code]

  • [2025/02/17] Table-Critic: A Multi-Agent Framework for Collaborative Criticism and Refinement in Table Reasoning | [paper] | [code]

  • [2025/02/17] HARBOR: Exploring Persona Dynamics in Multi-Agent Competition | [paper] | [code]

  • [2025/02/15] Divergent Thoughts toward One Goal: LLM-based Multi-Agent Collaboration System for Electronic Design Automation | [paper] | [code]

  • [2025/02/13] PathFinder: A Multi-Modal Multi-Agent System for Medical Diagnostic Decision-Making Applied to Histopathology | [paper] | [code]

  • [2025/02/13] Mind the Gaps: Logical English, Prolog, and Multi-agent Systems for Autonomous Vehicles | [paper] | [code]

  • [2025/02/12] Faithful, Unfaithful or Ambiguous? Multi-Agent Debate with Initial Stance for Summary Evaluation | [paper] | [code]

  • [2025/02/12] If Multi-Agent Debate is the Answer, What is the Question? | [paper] | [code]

  • [2025/02/11] Don't Just Demo, Teach Me the Principles: A Principle-Based Multi-Agent Prompting Strategy for Text Classification | [paper] | [code]

  • [2025/02/11] Multi-Agent Collaboration for Multilingual Code Instruction Tuning | [paper] | [code]

  • [2025/02/10] KARMA: Leveraging Multi-Agent LLMs for Automated Knowledge Graph Enrichment | [paper] | [code]

  • [2025/02/09] Preventing Rogue Agents Improves Multi-Agent Collaboration | [paper] | [code]

  • [2025/02/08] CODESIM: Multi-Agent Code Generation and Problem Solving through Simulation-Driven Planning and Debugging | [paper] | [code]

  • [2025/02/08] Multi-Agent Simulator Drives Language Models for Legal Intensive Interaction | [paper] | [code]

  • [2025/02/07] S²-MAD: Breaking the Token Barrier to Enhance Multi-Agent Debate Efficiency | [paper] | [code]

  • [2025/02/06] Multi-Agent Reinforcement Learning with Focal Diversity Optimization | [paper] | [code]

  • [2025/02/06] Enhancing Online Learning Efficiency Through Heterogeneous Resource Integration with a Multi-Agent RAG System | [paper] | [code]

  • [2025/02/06] Multi-agent Architecture Search via Agentic Supernet | [paper] | [code]

  • [2025/02/04] Position: Scaling LLM Agents Requires Asymptotic Analysis with LLM Primitives | [paper] | [code]

  • [2025/02/04] Multi-Agent Design: Optimizing Agents with Better Prompts and Topologies | [paper] | [code]

  • [2025/02/03] PlotGen: Multi-Agent LLM-based Scientific Data Visualization via Multimodal Feedback | [paper] | [code]

  • [2025/02/03] ChartCitor: Multi-Agent Framework for Fine-Grained Chart Visual Attribution | [paper] | [code]

  • [2025/02/02] Rethinking Mixture-of-Agents: Is Mixing Different Large Language Models Beneficial? | [paper] | [code]

  • [2025/02/02] Efficient Multi-Agent System Training with Data Influence-Oriented Tree Search | [paper] | [code]

  • [2025/01/29] Layered Chain-of-Thought Prompting for Multi-Agent LLM Systems: A Comprehensive Approach to Explainable Large Language Models | [paper] | [code]

  • [2025/01/27] MADP: Multi-Agent Deductive Planning for Enhanced Cognitive-Behavioral Mental Health Question Answer | [paper] | [code]

  • [2025/01/25] Improving Retrieval-Augmented Generation through Multi-Agent Reinforcement Learning | [paper] | [code]

  • [2025/01/24] Multi-agent KTO: Reinforcing Strategic Interactions of Large Language Model in Language Game | [paper] | [code]

  • [2025/01/24] Unmasking Conversational Bias in AI Multiagent Systems | [paper] | [code]

  • [2025/01/22] FilmAgent: A Multi-Agent Framework for End-to-End Film Automation in Virtual 3D Spaces | [paper] | [code]

  • [2025/01/19] IntellAgent: A Multi-Agent Framework for Evaluating Conversational AI Systems | [paper] | [code]

  • [2025/01/16] AutoCBT: An Autonomous Multi-agent Framework for Cognitive Behavioral Therapy in Psychological Counseling | [paper] | [code]

  • [2025/01/14] Talk to Right Specialists: Routing and Planning in Multi-agent System for Question Answering | [paper] | [code]

  • [2025/01/05] LatteReview: A Multi-Agent Framework for Systematic Review Automation Using Large Language Models | [paper] | [code]

  • [2025/01/02] Harnessing Multi-Agent LLMs for Complex Engineering Problem-Solving: A Framework for Senior Design Projects | [paper] | [code]

  • [2024/12/30] Distributed Mixture-of-Agents for Edge Inference with Large Language Models | [paper] | [code]

  • [2024/12/28] M-MAD: Multidimensional Multi-Agent Debate for Advanced Machine Translation Evaluation | [paper] | [code]

  • [2024/12/28] Efficient Multi-Agent Collaboration with Tool Use for Online Planning in Complex Table Question Answering | [paper] | [code]

  • [2024/12/24] Multi-Agents Based on Large Language Models for Knowledge-based Visual Question Answering | [paper] | [code]

  • [2024/12/22] Multi-Agent Sampling: Scaling Inference Compute for Data Synthesis with Tree Search-Based Agentic Collaboration | [paper] | [code]

  • [2024/12/22] A Multi-AI Agent System for Autonomous Optimization of Agentic AI Solutions via Iterative Refinement and LLM-Driven Feedback Loops | [paper] | [code]

  • [2024/12/20] Mitigating Social Bias in Large Language Models: A Multi-Objective Approach within a Multi-Agent Framework | [paper] | [code]

  • [2024/12/19] PsyDraw: A Multi-Agent Multimodal System for Mental Health Screening in Left-Behind Children | [paper] | [code]

  • [2024/12/18] Gradual Vigilance and Interval Communication: Enhancing Value Alignment in Multi-Agent Debates | [paper] | [code]

  • [2024/12/15] Cultural Palette: Pluralising Culture Alignment via Multi-agent Palette | [paper] | [code]

  • [2024/12/13] AutoPatent: A Multi-Agent Framework for Automatic Patent Generation | [paper] | [code]

  • [2024/12/12] DiverseAgentEntropy: Quantifying Black-Box LLM Uncertainty through Diverse Perspectives and Multi-Agent Interaction | [paper] | [code]

  • [2024/12/11] NAT-NL2GQL: A Novel Multi-Agent Framework for Translating Natural Language to Graph Query Language | [paper] | [code]

  • [2024/12/10] AutoPrep: Natural Language Question-Aware Data Preparation with a Multi-Agent Framework | [paper] | [code]

  • [2024/12/07] SLA Management in Reconfigurable Multi-Agent RAG: A Systems Approach to Question Answering | [paper] | [code]

  • [2024/12/06] Breaking Event Rumor Detection via Stance-Separated Multi-Agent Debate | [paper] | [code]

  • [2024/12/06] Towards Effective GenAI Multi-Agent Collaboration: Design and Evaluation for Enterprise Applications | [paper] | [code]

  • [2024/12/06] Enhancing LLMs for Impression Generation in Radiology Reports through a Multi-Agent System | [paper] | [code]

  • [2024/12/06] TeamCraft: A Benchmark for Multi-Modal Multi-Agent Systems in Minecraft | [paper] | [code]

  • [2024/12/05] Educational-Psychological Dialogue Robot Based on Multi-Agent Collaboration | [paper] | [code]

  • [2024/12/01] Multi-Agent Collaboration in Incident Response with Large Language Models | [paper] | [code]

  • [2024/11/28] MAG-V: A Multi-Agent Framework for Synthetic Data Generation and Verification | [paper] | [code]

  • [2024/11/21] PIORS: Personalized Intelligent Outpatient Reception based on Large Language Model with Multi-Agents Medical Scenario Simulation | [paper] | [code]

  • [2024/11/21] Enhancing LLMs for Power System Simulations: A Feedback-driven Multi-agent Framework | [paper] | [code]

  • [2024/11/18] The Power of Many: Multi-Agent Multimodal Models for Cultural Image Captioning | [paper] | [code]

  • [2024/11/12] BudgetMLAgent: A Cost-Effective LLM Multi-Agent system for Automating Machine Learning Tasks | [paper] | [code]

  • [2024/11/11] Using Generative AI and Multi-Agents to Provide Automatic Feedback | [paper] | [code]

  • [2024/11/09] Mixture of Knowledge Minigraph Agents for Literature Review Generation | [paper] | [code]

  • [2024/11/05] SAUCE: Synchronous and Asynchronous User-Customizable Environment for Multi-Agent LLM Interaction | [paper] | [code]

  • [2024/11/05] SMoA: Improving Multi-agent Large Language Models with Sparse Mixture-of-Agents | [paper] | [code]

  • [2024/11/01] DARD: A Multi-Agent Approach for Task-Oriented Dialog Systems | [paper] | [code]

  • [2024/10/30] ACC-Debate: An Actor-Critic Approach to Multi-Agent Debate | [paper] | [code]

  • [2024/10/29] Flow-DPO: Improving LLM Mathematical Reasoning through Online Multi-Agent Learning | [paper] | [code]

  • [2024/10/29] MARCO: Multi-Agent Real-time Chat Orchestration | [paper] | [code]

  • [2024/10/28] CRAT: A Multi-Agent Framework for Causality-Enhanced Reflective and Retrieval-Augmented Translation with Large Language Models | [paper] | [code]

  • [2024/10/27] AutoKaggle: A Multi-Agent Framework for Autonomous Data Science Competitions | [paper] | [code]

  • [2024/10/24] Schema-Guided Culture-Aware Complex Event Simulation with Multi-Agent Role-Play | [paper] | [code]

  • [2024/10/23] GraphTeam: Facilitating Large Language Model-based Graph Analysis via Multi-Agent Collaboration | [paper] | [code]

  • [2024/10/22] Decoding Time Series with LLMs: A Multi-Agent Framework for Cross-Domain Annotation | [paper] | [code]

  • [2024/10/19] An Electoral Approach to Diversify LLM-based Multi-Agent Collective Decision-Making | [paper] | [code]

  • [2024/10/18] Synthesizing Post-Training Data for LLMs through Multi-Agent Simulation | [paper] | [code]

  • [2024/10/17] AdaSwitch: Adaptive Switching between Small and Large Agents for Effective Cloud-Local Collaborative Learning | [paper] | [code]

  • [2024/10/16] PRefLexOR: Preference-based Recursive Language Modeling for Exploratory Optimization of Reasoning and Agentic Thinking | [paper] | [code]

  • [2024/10/13] LLM-Based Multi-Agent Systems are Scalable Graph Generative Models | [paper] | [code]

  • [2024/10/12] Many Heads Are Better Than One: Improved Scientific Idea Generation by A LLM-Based Multi-Agent System | [paper] | [code]

  • [2024/10/11] JAILJUDGE: A Comprehensive Jailbreak Judge Benchmark with Multi-Agent Enhanced Explanation Evaluation Framework | [paper] | [code]

  • [2024/10/11] PEAR: A Robust and Flexible Automation Framework for Ptychography Enabled by Multiple Large Language Model Agents | [paper] | [code]

  • [2024/10/10] AI-Press: A Multi-Agent News Generating and Feedback Simulation System Powered by Large Language Models | [paper] | [code]

  • [2024/10/10] Multi-Agent Collaborative Data Selection for Efficient LLM Pretraining | [paper] | [code]

  • [2024/10/10] Optima: Optimizing Effectiveness and Efficiency for LLM-Based Multi-Agent System | [paper] | [code]

  • [2024/10/10] Prompt Engineering a Schizophrenia Chatbot: Utilizing a Multi-Agent Approach for Enhanced Compliance with Prompt Instructions | [paper] | [code]

  • [2024/10/10] Diversity of Thought Elicits Stronger Reasoning Capabilities in Multi-Agent Debate Frameworks | [paper] | [code]

  • [2024/10/09] Seeker: Enhancing Exception Handling in Code with LLM-based Multi-Agent Approach | [paper] | [code]

  • [2024/10/07] Adversarial Multi-Agent Evaluation of Large Language Models through Iterative Debates | [paper] | [code]

  • [2024/10/06] MindScope: Exploring cognitive biases in large language models through Multi-Agent Systems | [paper] | [code]

  • [2024/10/03] Towards Implicit Bias Detection and Mitigation in Multi-Agent LLM Interactions | [paper] | [code]

  • [2024/10/03] Agents' Room: Narrative Generation through Multi-step Collaboration | [paper] | [code]

  • [2024/10/03] Can Large Language Models Grasp Legal Theories? Enhance Legal Reasoning with Insights from Multi-Agent Collaboration | [paper] | [code]

  • [2024/10/03] ColaCare: Enhancing Electronic Health Record Modeling through Large Language Model-Driven Multi-Agent Collaboration | [paper] | [code]

  • [2024/10/03] AutoML-Agent: A Multi-Agent LLM Framework for Full-Pipeline AutoML | [paper] | [code]

  • [2024/10/02] RGD: Multi-LLM Based Agent Debugger via Refinement and Generation Guidance | [paper] | [code]

  • [2024/10/02] Zodiac: A Cardiologist-Level LLM Framework for Multi-Agent Diagnostics | [paper] | [code]

  • [2024/09/21] Towards Automated Patent Workflows: AI-Orchestrated Multi-Agent Framework for Intellectual Property Management and Analysis | [paper] | [code]

  • [2024/09/21] GroupDebate: Enhancing the Efficiency of Multi-Agent Debate Using Group Discussion | [paper] | [code]

  • [2024/09/20] Minstrel: Structural Prompt Generation with Multi-Agents Coordination for Non-AI Experts | [paper] | [code]

  • [2024/09/18] MAgICoRe: Multi-Agent, Iterative, Coarse-to-Fine Refinement for Reasoning | [paper] | [code]

  • [2024/09/17] The Art of Storytelling: Multi-Agent Generative AI for Dynamic Multimodal Narratives | [paper] | [code]

  • [2024/09/16] Instigating Cooperation among LLM Agents Using Adaptive Information Modulation | [paper] | [code]

  • [2024/09/14] Synergistic Simulations: Multi-Agent Problem Solving with Large Language Models | [paper] | [code]

  • [2024/09/12] Knowledge Tagging with Large Language Model based Multi-Agent System | [paper] | [code]

  • [2024/09/11] Propaganda to Hate: A Multimodal Analysis of Arabic Memes with Multi-Agent LLMs | [paper] | [code]

  • [2024/09/09] SciAgents: Automating scientific discovery through multi-agent intelligent graph reasoning | [paper] | [code]

  • [2024/09/06] Using Large Language Models to Generate Authentic Multi-agent Knowledge Work Datasets | [paper] | [code]

  • [2024/09/05] xLAM: A Family of Large Action Models to Empower AI Agent Systems | [paper] | [code]

  • [2024/09/02] Co-Learning: Code Learning for Multi-Agent Reinforcement Collaborative Framework with Conversational Natural Language Interfaces | [paper] | [code]

  • [2024/08/28] BattleAgentBench: A Benchmark for Evaluating Cooperation and Competition Capabilities of Language Models in Multi-Agent Systems | [paper] | [code]

  • [2024/08/27] AgentMonitor: A Plug-and-Play Framework for Predictive and Secure Multi-Agent Systems | [paper] | [code]

  • [2024/08/24] Towards Human-Level Understanding of Complex Process Engineering Schematics: A Pedagogical, Introspective Multi-Agent Framework for Open-Domain Question Answering | [paper] | [code]

  • [2024/08/22] MuMA-ToM: Multi-modal Multi-Agent Theory of Mind | [paper] | [code]

  • [2024/08/21] DreamFactory: Pioneering Multi-Scene Long Video Generation with a Multi-Agent Framework | [paper] | [code]

  • [2024/08/16] The Fellowship of the LLMs: Multi-Agent Workflows for Synthetic Preference Optimization Dataset Generation | [paper] | [code]

  • [2024/08/15] MAG-SQL: Multi-Agent Generative Approach with Soft Schema Linking and Iterative Sub-SQL Refinement for Text-to-SQL | [paper] | [code]

  • [2024/08/15] Text2BIM: Generating Building Models Using a Large Language Model-based Multi-Agent Framework | [paper] | [code]

  • [2024/08/14] Development of a Large Language Model-based Multi-Agent Clinical Decision Support System for Korean Triage and Acuity Scale (KTAS)-Based Triage and Treatment Planning in Emergency Departments | [paper] | [code]

  • [2024/08/08] Can LLMs Beat Humans in Debating? A Dynamic Multi-agent Framework for Competitive Debate | [paper] | [code]

  • [2024/08/05] ReDel: A Toolkit for LLM-Powered Recursive Multi-Agent Systems | [paper] | [code]

  • [2024/08/05] Evaluating and Enhancing LLMs Agent based on Theory of Mind in Guandan: A Multi-Player Cooperative Game under Imperfect Information | [paper] | [code]

  • [2024/07/23] LawLuo: A Multi-Agent Collaborative Framework for Multi-Round Chinese Legal Consultation | [paper] | [code]

  • [2024/07/21] Multi-Agent Causal Discovery Using Large Language Models | [paper] | [code]

  • [2024/07/19] NeLLCom-X: A Comprehensive Neural-Agent Framework to Simulate Language Learning and Group Communication | [paper] | [code]

  • [2024/07/17] Towards Collaborative Intelligence: Propagating Intentions and Reasoning for Multi-Agent Coordination with Large Language Models | [paper] | [code]

  • [2024/07/16] InvAgent: A Large Language Model based Multi-Agent System for Inventory Management in Supply Chains | [paper] | [code]

  • [2024/07/13] Synergistic Multi-Agent Framework with Trajectory Learning for Knowledge-Intensive Tasks | [paper] | [code]

  • [2024/07/13] Cohesive Conversations: Enhancing Authenticity in Multi-Agent Simulated Dialogues | [paper] | [code]

  • [2024/07/10] Flooding Spread of Manipulated Knowledge in LLM-Based Multi-Agent Communities | [paper] | [code]

  • [2024/07/09] FinCon: A Synthesized LLM Multi-Agent System with Conceptual Verbal Reinforcement for Enhanced Financial Decision Making | [paper] | [code]

  • [2024/07/09] Internet of Agents: Weaving a Web of Heterogeneous Agents for Collaborative Intelligence | [paper] | [code]

  • [2024/07/04] Solving Zebra Puzzles Using Constraint-Guided Multi-Agent Systems | [paper] | [code]

  • [2024/07/03] MentalAgora: A Gateway to Advanced Personalized Care in Mental Health through Multi-Agent Debating and Attribute Control | [paper] | [code]

  • [2024/06/17] Improving Multi-Agent Debate with Sparse Communication Topology | [paper] | [code]

  • [2024/06/13] Multi-Agent Software Development through Cross-Team Collaboration | [paper] | [code]

  • [2024/06/11] CoEvol: Constructing Better Responses for Instruction Finetuning through Multi-Agent Cooperation | [paper] | [code]

  • [2024/06/07] Mixture-of-Agents Enhances Large Language Model Capabilities | [paper] | [code]

  • [2024/06/05] Towards Detecting LLMs Hallucination via Markov Chain-based Multi-agent Debate Framework | [paper] | [code]

  • [2024/06/04] Chain of Agents: Large Language Models Collaborating on Long-Context Tasks | [paper] | [code]

  • [2024/06/03] Mobile-Agent-v2: Mobile Device Operation Assistant with Effective Navigation via Multi-Agent Collaboration | [paper] | [code]

  • [2024/05/30] Safe Multi-agent Reinforcement Learning with Natural Language Constraints | [paper] | [code]

  • [2024/05/23] CityGPT: Towards Urban IoT Learning, Analysis and Interaction with Multi-Agent System | [paper] | [code]

  • [2024/05/20] (Perhaps) Beyond Human Translation: Harnessing Multi-Agent Collaboration for Translating Ultra-Long Literary Texts | [paper] | [code]

  • [2024/05/10] LLM Discussion: Enhancing the Creativity of Large Language Models via Discussion Framework and Role-Play | [paper] | [code]

  • [2024/05/07] Enhancing the Efficiency and Accuracy of Underlying Asset Reviews in Structured Finance: The Application of Multi-agent Framework | [paper] | [code]

  • [2024/05/06] Persona Inconstancy in Multi-Agent LLM Collaboration: Conformity, Confabulation, and Impersonation | [paper] | [code]

  • [2024/05/05] Language Evolution for Evading Social Media Regulation via LLM-based Multi-agent Simulation | [paper] | [code]

  • [2024/04/25] Cooperate or Collapse: Emergence of Sustainable Cooperation in a Society of LLM Agents | [paper] | [code]

  • [2024/04/23] ClinicalAgent: Clinical Trial Multi-Agent System with Large Language Model-based Reasoning | [paper] | [code]

  • [2024/04/14] Confidence Calibration and Rationalization for LLMs via Multi-Agent Deliberation | [paper] | [code]

  • [2024/04/12] Leveraging Multi-AI Agents for Cross-Domain Knowledge Discovery | [paper] | [code]

  • [2024/04/09] Foundation Models to the Rescue: Deadlock Resolution in Connected Multi-Robot Systems | [paper] | [code]

  • [2024/04/08] 360°REA: Towards A Reusable Experience Accumulation with 360° Assessment for Multi-Agent System | [paper] | [code]

  • [2024/04/06] MACM: Utilizing a Multi-Agent System for Condition Mining in Solving Complex Mathematical Problems | [paper] | [code]

  • [2024/04/02] Self-Organized Agents: A LLM Multi-Agent Framework toward Ultra Large-Scale Code Generation and Optimization | [paper] | [code]

  • [2024/04/02] CMAT: A Multi-Agent Collaboration Tuning Framework for Enhancing Small Language Models | [paper] | [code]

  • [2024/03/26] MAGIS: LLM-Based Multi-Agent Framework for GitHub Issue Resolution | [paper] | [code]

  • [2024/03/22] CACA Agent: Capability Collaboration based AI Agent | [paper] | [code]

  • [2024/03/21] Multi-Agent VQA: Exploring Multi-Agent Foundation Models in Zero-Shot Visual Question Answering | [paper] | [code]

  • [2024/03/19] Embodied LLM Agents Learn to Cooperate in Organized Teams | [paper] | [code]

  • [2024/03/12] Transforming Competition into Collaboration: The Revolutionary Role of Multi-Agent Systems and Language Models in Modern Organizations | [paper] | [code]

  • [2024/03/02] AutoDefense: Multi-Agent LLM Defense against Jailbreak Attacks | [paper] | [code]

  • [2024/02/28] Rethinking the Bounds of LLM Reasoning: Are Multi-Agent Discussions the Key? | [paper] | [code]

  • [2024/02/26] Chain-of-Discussion: A Multi-Model Framework for Complex Evidence-Based Question Answering | [paper] | [code]

  • [2024/02/26] LLMArena: Assessing Capabilities of Large Language Models in Dynamic Multi-Agent Environments | [paper] | [code]

  • [2024/02/21] LLM Based Multi-Agent Generation of Semi-structured Documents from Semantic Templates in the Public Administration Domain | [paper] | [code]

  • [2024/02/18] Benchmark Self-Evolving: A Multi-Agent Framework for Dynamic LLM Evaluation | [paper] | [code]

  • [2024/02/18] LongAgent: Scaling Language Models to 128k Context through Multi-Agent Collaboration | [paper] | [code]

  • [2024/02/15] TDAG: A Multi-Agent Framework based on Dynamic Task Decomposition and Agent Generation | [paper] | [code]

  • [2024/02/03] More Agents Is All You Need | [paper] | [code]

  • [2024/02/02] Reasoning Capacity in Multi-Agent Systems: Limitations, Challenges and Human-Centered Solutions | [paper] | [code]

  • [2024/02/02] A Multi-Agent Conversational Recommender System | [paper] | [code]

  • [2024/01/11] Combating Adversarial Attacks with Multi-Agent Debate | [paper] | [code]

  • [2024/01/08] MARG: Multi-Agent Review Generation for Scientific Papers | [paper] | [code]

  • [2024/01/08] SpeechAgents: Human-Communication Simulation with Multi-Modal Multi-Agent Systems | [paper] | [code]

  • [2024/01/08] Why Solving Multi-agent Path Finding with Large Language Model has not Succeeded Yet | [paper] | [code]

  • [2023/12/20] AgentCoder: Multi-Agent-based Code Generation with Iterative Testing and Optimisation | [paper] | [code]

  • [2023/10/31] Multi-Agent Consensus Seeking via Large Language Models | [paper] | [code]

  • [2023/10/25] MultiPrompter: Cooperative Prompt Optimization with Multi-Agent Reinforcement Learning | [paper] | [code]

  • [2023/08/22] ProAgent: Building Proactive Cooperative Agents with Large Language Models | [paper] | [code]

  • [2023/08/21] AgentVerse: Facilitating Multi-Agent Collaboration and Exploring Emergent Behaviors | [paper] | [code]

  • [2023/08/14] ChatEval: Towards Better LLM-based Evaluators through Multi-Agent Debate | [paper] | [code]

  • [2023/08/01] MetaGPT: Meta Programming for A Multi-Agent Collaborative Framework | [paper] | [code]

  • [2023/06/05] Multi-Agent Collaboration: Harnessing the Power of Intelligent LLM Agents | [paper] | [code]

  • [2023/05/31] Recursive Metropolis-Hastings Naming Game: Symbol Emergence in a Multi-agent System based on Probabilistic Generative Models | [paper] | [code]

  • [2023/05/30] Encouraging Divergent Thinking in Large Language Models through Multi-Agent Debate | [paper] | [code]

  • [2023/04/26] Multi-Party Chat: Conversational Agents in Group Settings with Humans and Models | [paper] | [code]

  • [2023/04/24] ChatLLM Network: More brains, More intelligence | [paper] | [code]

Stability

Safety

  • [2025/02/20] CORBA: Contagious Recursive Blocking Attacks on Multi-Agent Systems Based on Large Language Models | [paper] | [code]

  • [2025/02/18] AEIA-MN: Evaluating the Robustness of Multimodal LLM-Powered Mobile Agents Against Active Environmental Injection Attacks | [paper] | [code]

  • [2025/02/17] "Nuclear Deployed!": Analyzing Catastrophic Risks in Decision-making of Autonomous LLM Agents | [paper] | [code]

  • [2025/02/01] ALU: Agentic LLM Unlearning | [paper] | [code]

  • [2025/01/28] Context is Key for Agent Security | [paper] | [code]

  • [2024/12/21] The Task Shield: Enforcing Task Alignment to Defend Against Indirect Prompt Injection in LLM Agents | [paper] | [code]

  • [2024/12/16] Seeker: Towards Exception Safety Code Generation with Intermediate Language Agents Framework | [paper] | [code]

  • [2024/12/09] The Fusion of Large Language Models and Formal Methods for Trustworthy AI Agents: A Roadmap | [paper] | [code]

  • [2024/11/08] Towards Low-Resource Harmful Meme Detection with LMM Agents | [paper] | [code]

  • [2024/11/06] MRJ-Agent: An Effective Jailbreak Agent for Multi-Round Dialogue | [paper] | [code]

  • [2024/11/04] Attacking Vision-Language Computer Agents via Pop-ups | [paper] | [code]

  • [2024/10/22] AdvWeb: Controllable Black-box Attacks on VLM-powered Web Agents | [paper] | [code]

  • [2024/10/18] Coherence-Driven Multimodal Safety Dialogue with Active Learning for Embodied Agents | [paper] | [code]

  • [2024/10/11] AgentHarm: A Benchmark for Measuring Harmfulness of LLM Agents | [paper] | [code]

  • [2024/10/09] I Want to Break Free! Persuasion and Anti-Social Behavior of LLMs in Multi-Agent Settings with Social Hierarchy | [paper] | [code]

  • [2024/09/28] SELP: Generating Safe and Efficient Task Plans for Robot Agents with Large Language Models | [paper] | [code]

  • [2024/09/17] EIA: Environmental Injection Attack on Generalist Web Agents for Privacy Leakage | [paper] | [code]

  • [2024/09/13] AI-LieDar: Examine the Trade-off Between Utility and Truthfulness in LLM Agents | [paper] | [code]

  • [2024/08/20] Athena: Safe Autonomous Agents with Verbal Contrastive Learning | [paper] | [code]

  • [2024/08/05] Caution for the Environment: Multimodal Agents are Susceptible to Environmental Distractions | [paper] | [code]

  • [2024/07/23] RedAgent: Red Teaming Large Language Models with Context-aware Autonomous Language Agent | [paper] | [code]

  • [2024/06/05] BadAgent: Inserting and Activating Backdoor Attacks in LLM Agents | [paper] | [code]

  • [2024/05/30] Safe Multi-agent Reinforcement Learning with Natural Language Constraints | [paper] | [code]

  • [2024/05/24] Hacc-Man: An Arcade Game for Jailbreaking LLMs | [paper] | [code]

  • [2024/03/02] AutoDefense: Multi-Agent LLM Defense against Jailbreak Attacks | [paper] | [code]

  • [2024/02/17] Watch Out for Your Agents! Investigating Backdoor Threats to LLM-Based Agents | [paper] | [code]

  • [2024/02/16] ToolSword: Unveiling Safety Issues of Large Language Models in Tool Learning Across Three Stages | [paper] | [code]

  • [2024/02/02] TrustAgent: Towards Safe and Trustworthy LLM-based Agents | [paper] | [code]

  • [2024/01/11] Combating Adversarial Attacks with Multi-Agent Debate | [paper] | [code]

  • [2023/11/17] Testing Language Model Agents Safely in the Wild | [paper] | [code]

Bias

  • [2025/01/29] Actions Speak Louder than Words: Agent Decisions Reveal Implicit Biases in Language Models | [paper] | [code]

  • [2025/01/24] Unmasking Conversational Bias in AI Multiagent Systems | [paper] | [code]

  • [2024/12/20] Mitigating Social Bias in Large Language Models: A Multi-Objective Approach within a Multi-Agent Framework | [paper] | [code]

  • [2024/11/12] Mitigating Bias in Queer Representation within Large Language Models: A Collaborative Agent Approach | [paper] | [code]

  • [2024/10/06] MindScope: Exploring cognitive biases in large language models through Multi-Agent Systems | [paper] | [code]

  • [2024/10/03] Towards Implicit Bias Detection and Mitigation in Multi-Agent LLM Interactions | [paper] | [code]

  • [2024/05/23] ALI-Agent: Assessing LLMs' Alignment with Human Values via Agent-based Evaluation | [paper] | [code]

  • [2024/04/23] Aligning LLM Agents by Learning Latent Preference from User Edits | [paper] | [code]

  • [2024/02/19] Polarization of Autonomous Generative AI Agents Under Echo Chambers | [paper] | [code]

  • [2024/02/14] Towards better Human-Agent Alignment: Assessing Task Utility in LLM-Powered Applications | [paper] | [code]

  • [2024/01/09] Agent Alignment in Evolving Social Norms | [paper] | [code]

Hallucination

  • [2025/02/14] Automated Hypothesis Validation with Agentic Sequential Falsifications | [paper] | [code]

  • [2025/02/04] Position: Stop Acting Like Language Model Agents Are Normal Agents | [paper] | [code]

  • [2025/02/03] SelfCheckAgent: Zero-Resource Hallucination Detection in Generative Large Language Models | [paper] | [code]

  • [2025/01/19] Hallucination Mitigation using Agentic AI Natural Language-Based Frameworks | [paper] | [code]

  • [2024/11/25] Enhancing Multi-Agent Consensus through Third-Party LLM Integration: Analyzing Uncertainty and Mitigating Hallucinations in Large Language Models | [paper] | [code]

  • [2024/11/12] SHARP: Unlocking Interactive Hallucination via Stance Transfer in Role-Playing Agents | [paper] | [code]

  • [2024/07/08] DebUnc: Mitigating Hallucinations in Large Language Model Agent Communication with Uncertainty Estimations | [paper] | [code]

  • [2024/06/29] BioKGBench: A Knowledge Graph Checking Benchmark of AI Agent for Biomedical Science | [paper] | [code]

  • [2024/06/17] Small Agent Can Also Rock! Empowering Small Language Models as Hallucination Detector | [paper] | [code]

  • [2024/06/05] Towards Detecting LLMs Hallucination via Markov Chain-based Multi-agent Debate Framework | [paper] | [code]

  • [2024/05/28] TimeChara: Evaluating Point-in-Time Character Hallucination of Role-Playing Large Language Models | [paper] | [code]

  • [2024/02/13] Agent Smith: A Single Image Can Jailbreak One Million Multimodal LLM Agents Exponentially Fast | [paper] | [code]

Infrastructure

Benchmark&Evaluation

  • [2025/02/20] MLGym: A New Framework and Benchmark for Advancing AI Research Agents | [paper] | [code]

  • [2025/02/19] DataSciBench: An LLM Agent Benchmark for Data Science | [paper] | [code]

  • [2025/02/13] EmbodiedBench: Comprehensive Benchmarking Multi-modal Large Language Models for Vision-Driven Embodied Agents | [paper] | [code]

  • [2025/02/07] Evaluating Personality Traits in Large Language Models: Insights from Psychological Questionnaires | [paper] | [code]

  • [2025/02/06] Robotouille: An Asynchronous Planning Benchmark for LLM Agents | [paper] | [code]

  • [2025/02/01] Who's the MVP? A Game-Theoretic Evaluation Benchmark for Modular Attribution in LLM Agents | [paper] | [code]

  • [2025/01/21] EmbodiedEval: Evaluate Multimodal LLMs as Embodied Agents | [paper] | [code]

  • [2024/12/23] LegalAgentBench: Evaluating LLM Agents in Legal Domain | [paper] | [code]

  • [2024/12/19] Agent-SafetyBench: Evaluating the Safety of LLM Agents | [paper] | [code]

  • [2024/12/18] TheAgentCompany: Benchmarking LLM Agents on Consequential Real World Tasks | [paper] | [code]

  • [2024/12/18] ChinaTravel: A Real-World Benchmark for Language Agents in Chinese Travel Planning | [paper] | [code]

  • [2024/12/06] TeamCraft: A Benchmark for Multi-Modal Multi-Agent Systems in Minecraft | [paper] | [code]

  • [2024/12/02] Medchain: Bridging the Gap Between LLM Agents and Clinical Practice through Interactive Sequential Benchmarking | [paper] | [code]

  • [2024/11/05] Benchmarking Multimodal Retrieval Augmented Generation with Dynamic VQA Dataset and Self-adaptive Planning Agent | [paper] | [code]

  • [2024/10/28] Can Machines Think Like Humans? A Behavioral Evaluation of LLM-Agents in Dictator Games | [paper] | [code]

  • [2024/10/25] AgentSense: Benchmarking Social Intelligence of Language Agents through Interactive Scenarios | [paper] | [code]

  • [2024/10/25] AGENT-CQ: Automatic Generation and Evaluation of Clarifying Questions for Conversational Search with LLMs | [paper] | [code]

  • [2024/10/23] MobileSafetyBench: Evaluating Safety of Autonomous Agents in Mobile Device Control | [paper] | [code]

  • [2024/10/16] Proactive Agent: Shifting LLM Agents from Reactive Responses to Active Assistance | [paper] | [code]

  • [2024/10/15] Revisiting Benchmark and Assessment: An Agent-based Exploratory Dynamic Evaluation Framework for LLMs | [paper] | [code]

  • [2024/10/11] JAILJUDGE: A Comprehensive Jailbreak Judge Benchmark with Multi-Agent Enhanced Explanation Evaluation Framework | [paper] | [code]

  • [2024/10/11] AgentHarm: A Benchmark for Measuring Harmfulness of LLM Agents | [paper] | [code]

  • [2024/10/10] Benchmarking Agentic Workflow Generation | [paper] | [code]

  • [2024/10/09] MLE-bench: Evaluating Machine Learning Agents on Machine Learning Engineering | [paper] | [code]

  • [2024/10/09] Embodied Agent Interface: Benchmarking LLMs for Embodied Decision Making | [paper] | [code]

  • [2024/10/09] DA-Code: Agent Data Science Code Generation Benchmark for Large Language Models | [paper] | [code]

  • [2024/10/07] Adversarial Multi-Agent Evaluation of Large Language Models through Iterative Debates | [paper] | [code]

  • [2024/10/07] ScienceAgentBench: Toward Rigorous Assessment of Language Agents for Data-Driven Scientific Discovery | [paper] | [code]

  • [2024/09/23] Towards a Realistic Long-Term Benchmark for Open-Web Research Agents | [paper] | [code]

  • [2024/09/17] CORE-Bench: Fostering the Credibility of Published Research Through a Computational Reproducibility Agent Benchmark | [paper] | [code]

  • [2024/09/12] DSBench: How Far Are Data Science Agents to Becoming Data Science Experts? | [paper] | [code]

  • [2024/09/11] SUPER: Evaluating Agents on Setting Up and Executing Tasks from Research Repositories | [paper] | [code]

  • [2024/09/02] ComfyBench: Benchmarking LLM-based Agents in ComfyUI for Autonomously Designing Collaborative AI Systems | [paper] | [code]

  • [2024/08/28] BattleAgentBench: A Benchmark for Evaluating Cooperation and Competition Capabilities of Language Models in Multi-Agent Systems | [paper] | [code]

  • [2024/08/19] BLADE: Benchmarking Language Model Agents for Data-Driven Science | [paper] | [code]

  • [2024/08/13] What should I wear to a party in a Greek taverna? Evaluation for Conversational Agents in the Fashion Domain | [paper] | [code]

  • [2024/08/12] VisualAgentBench: Towards Large Multimodal Models as Visual Foundation Agents | [paper] | [code]

  • [2024/07/26] OfficeBench: Benchmarking Language Agents across Multiple Applications for Office Automation | [paper] | [code]

  • [2024/07/26] AppWorld: A Controllable World of Apps and People for Benchmarking Interactive Coding Agents | [paper] | [code]

  • [2024/07/25] PersonaGym: Evaluating Persona Agents and LLMs | [paper] | [code]

  • [2024/07/23] AMONGAGENTS: Evaluating Large Language Models in the Interactive Text-Based Social Deduction Game | [paper] | [code]

  • [2024/07/22] AssistantBench: Can Web Agents Solve Realistic and Time-Consuming Tasks? | [paper] | [code]

  • [2024/07/12] IDAT: A Multi-Modal Dataset and Toolkit for Building and Evaluating Interactive Task-Solving Agents | [paper] | [code]

  • [2024/07/11] GTA: A Benchmark for General Tool Agents | [paper] | [code]

  • [2024/07/05] Towards Automated Functional Equation Proving: A Benchmark Dataset and A Domain-Specific In-Context Agent | [paper] | [code]

  • [2024/07/01] MIRAI: Evaluating LLM Agents for Event Forecasting | [paper] | [code]

  • [2024/07/01] ProductAgent: Benchmarking Conversational Product Search Agent with Asking Clarification Questions | [paper] | [code]

  • [2024/07/01] Mobile-Bench: An Evaluation Benchmark for LLM-based Mobile Agents | [paper] | [code]

  • [2024/06/28] Designing and Evaluating Multi-Chatbot Interface for Human-AI Communication: Preliminary Findings from a Persuasion Task | [paper] | [code]

  • [2024/06/13] ResearchArena: Benchmarking Large Language Models' Ability to Collect and Organize Information as Research Agents | [paper] | [code]

  • [2024/06/13] StreamBench: Towards Benchmarking Continuous Improvement of Language Agents | [paper] | [code]

  • [2024/06/07] WildBench: Benchmarking LLMs with Challenging Tasks from Real Users in the Wild | [paper] | [code]

  • [2024/06/07] GameBench: Evaluating Strategic Reasoning Abilities of LLM Agents | [paper] | [code]

  • [2024/05/28] TimeChara: Evaluating Point-in-Time Character Hallucination of Role-Playing Large Language Models | [paper] | [code]

  • [2024/05/23] AndroidWorld: A Dynamic Benchmarking Environment for Autonomous Agents | [paper] | [code]

  • [2024/05/13] AgentClinic: a multimodal agent benchmark to evaluate AI in simulated clinical environments | [paper] | [code]

  • [2024/05/01] WorkBench: a Benchmark Dataset for Agents in a Realistic Workplace Setting | [paper] | [code]

  • [2024/04/23] Evaluating Tool-Augmented Agents in Remote Sensing Platforms | [paper] | [code]

  • [2024/04/22] How Well Can LLMs Echo Us? Evaluating AI Chatbots' Role-Play Ability with ECHO | [paper] | [code]

  • [2024/04/15] MMInA: Benchmarking Multihop Multimodal Internet Agents | [paper] | [code]

  • [2024/04/11] OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments | [paper] | [code]

  • [2024/04/09] AgentQuest: A Modular Benchmark Framework to Measure Progress and Improve LLM Agents | [paper] | [code]

  • [2024/04/05] GroundCocoa: A Benchmark for Evaluating Compositional & Conditional Reasoning in Language Models | [paper] | [code]

  • [2024/03/29] DataAgent: Evaluating Large Language Models' Ability to Answer Zero-Shot, Natural Language Queries | [paper] | [code]

  • [2024/03/26] Sharing the Cost of Success: A Game for Evaluating and Learning Collaborative Multi-Agent Instruction Giving and Following Policies | [paper] | [code]

  • [2024/03/20] SocialBench: Sociality Evaluation of Role-Playing Conversational Agents | [paper] | [code]

  • [2024/03/18] How Far Are We on the Decision-Making of LLMs? Evaluating LLMs' Gaming Ability in Multi-Agent Environments | [paper] | [code]

  • [2024/03/18] Tur[k]ingBench: A Challenge Benchmark for Web Agents | [paper] | [code]

  • [2024/03/13] Evaluating Large Language Models as Generative User Simulators for Conversational Recommendation | [paper] | [code]

  • [2024/03/05] InjecAgent: Benchmarking Indirect Prompt Injections in Tool-Integrated Large Language Model Agents | [paper] | [code]

  • [2024/02/27] Evaluating Very Long-Term Conversational Memory of LLM Agents | [paper] | [code]

  • [2024/02/27] Benchmarking Data Science Agents | [paper] | [code]

  • [2024/02/19] A Critical Evaluation of AI Feedback for Aligning Large Language Models | [paper] | [code]

  • [2024/02/18] Benchmark Self-Evolving: A Multi-Agent Framework for Dynamic LLM Evaluation | [paper] | [code]

  • [2024/02/18] MatPlotAgent: Method and Evaluation for LLM-Based Agentic Scientific Data Visualization | [paper] | [code]

  • [2024/02/05] LLM Agents in Interaction: Measuring Personality Consistency and Linguistic Alignment in Interacting Populations of Large Language Models | [paper] | [code]

  • [2024/02/02] TravelPlanner: A Benchmark for Real-World Planning with Language Agents | [paper] | [code]

  • [2024/01/02] CharacterEval: A Chinese Benchmark for Role-Playing Conversational Agent Evaluation | [paper] | [code]

  • [2023/12/28] How Far Are LLMs from Believable AI? A Benchmark for Evaluating the Believability of Human Behavior Simulation | [paper] | [code]

  • [2023/12/26] RoleEval: A Bilingual Role Evaluation Benchmark for Large Language Models | [paper] | [code]

  • [2023/11/16] ML-Bench: Evaluating Large Language Models and Agents for Machine Learning Tasks on Repository-Level Code | [paper] | [code]

  • [2023/11/15] ToolTalk: Evaluating Tool-Usage in a Conversational Setting | [paper] | [code]

  • [2023/10/24] FANToM: A Benchmark for Stress-testing Machine Theory of Mind in Interactions | [paper] | [code]

  • [2023/10/09] Put Your Money Where Your Mouth Is: Evaluating Strategic Planning and Execution of LLM Agents in an Auction Arena | [paper] | [code]

  • [2023/10/02] SmartPlay: A Benchmark for LLMs as Intelligent Agents | [paper] | [code]

  • [2023/10/01] RoleLLM: Benchmarking, Eliciting, and Enhancing Role-Playing Abilities of Large Language Models | [paper] | [code]

  • [2023/08/11] BOLAA: Benchmarking and Orchestrating LLM-augmented Autonomous Agents | [paper] | [code]

  • [2023/08/07] AgentBench: Evaluating LLMs as Agents | [paper] | [code]

  • [2023/04/27] ChatLog: Carefully Evaluating the Evolution of ChatGPT Across Time | [paper] | [code]

Environment&Platform

  • [2025/02/14] The Ann Arbor Architecture for Agent-Oriented Programming | [paper] | [code]

  • [2024/12/30] Training Software Engineering Agents and Verifiers with SWE-Gym | [paper] | [code]

  • [2024/11/05] SAUCE: Synchronous and Asynchronous User-Customizable Environment for Multi-Agent LLM Interaction | [paper] | [code]

  • [2024/08/09] AutoGen Studio: A No-Code Developer Tool for Building and Debugging Multi-Agent Systems | [paper] | [code]

  • [2024/08/06] OpenOmni: A Collaborative Open Source Tool for Building Future-Ready Multimodal Conversational Agents | [paper] | [code]

  • [2024/07/23] OpenHands: An Open Platform for AI Software Developers as Generalist Agents | [paper] | [code]

  • [2024/07/14] AutoGRAMS: Autonomous Graphical Agent Modeling Software | [paper] | [code]

  • [2024/07/12] IDAT: A Multi-Modal Dataset and Toolkit for Building and Evaluating Interactive Task-Solving Agents | [paper] | [code]

  • [2024/07/08] Coding Reliable LLM-based Integrated Task and Knowledge Agents with GenieWorksheets | [paper] | [code]

  • [2024/06/06] AgentGym: Evolving Large Language Model-based Agents across Diverse Environments | [paper] | [code]

  • [2024/05/23] AndroidWorld: A Dynamic Benchmarking Environment for Autonomous Agents | [paper] | [code]

  • [2024/02/27] OmniACT: A Dataset and Benchmark for Enabling Multimodal Generalist Autonomous Agents for Desktop and Web | [paper] | [code]

  • [2023/03/14] CB2: Collaborative Natural Language Interaction Research Platform | [paper] | [code]

Dataset

  • [2025/02/09] MTPChat: A Multimodal Time-Aware Persona Dataset for Conversational Agents | [paper] | [code]

  • [2025/02/09] HamRaz: A Culture-Based Persian Conversation Dataset for Person-Centered Therapy Using LLM Agents | [paper] | [code]

  • [2025/01/23] Hypothesis Generation for Materials Discovery and Design Using Goal-Driven and Constraint-Guided LLM Agents | [paper] | [code]

  • [2025/01/14] Agent-Centric Projection of Prompting Techniques and Implications for Synthetic Training Data for Large Language Models | [paper] | [code]

  • [2024/12/30] Plancraft: an evaluation dataset for planning with LLM agents | [paper] | [code]

  • [2024/12/28] BaiJia: A Large-Scale Role-Playing Agent Corpus of Chinese Historical Characters | [paper] | [code]

  • [2024/12/24] Explainable Multi-Modal Data Exploration in Natural Language via LLM Agent | [paper] | [code]

  • [2024/12/06] CALICO: Conversational Agent Localization via Synthetic Data Generation | [paper] | [code]

  • [2024/11/28] MAG-V: A Multi-Agent Framework for Synthetic Data Generation and Verification | [paper] | [code]

  • [2024/11/21] Star-Agents: Automatic Data Optimization with LLM Agents for Instruction Tuning | [paper] | [code]

  • [2024/10/18] Synthesizing Post-Training Data for LLMs through Multi-Agent Simulation | [paper] | [code]

  • [2024/10/10] AgentBank: Towards Generalized LLM Agents via Fine-Tuning on 50000+ Interaction Trajectories | [paper] | [code]

  • [2024/09/06] Using Large Language Models to Generate Authentic Multi-agent Knowledge Work Datasets | [paper] | [code]

  • [2024/08/22] MDD-5k: A New Diagnostic Conversation Dataset for Mental Disorders Synthesized via Neuro-Symbolic LLM Agents | [paper] | [code]

  • [2024/08/16] The Fellowship of the LLMs: Multi-Agent Workflows for Synthetic Preference Optimization Dataset Generation | [paper] | [code]

  • [2024/07/12] IDAT: A Multi-Modal Dataset and Toolkit for Building and Evaluating Interactive Task-Solving Agents | [paper] | [code]

  • [2024/06/16] GUI-WORLD: A Dataset for GUI-oriented Multimodal LLM-based Agents | [paper] | [code]

  • [2024/03/19] Agent-FLAN: Designing Data and Methods of Effective Agent Tuning for Large Language Models | [paper] | [code]

  • [2024/02/27] OmniACT: A Dataset and Benchmark for Enabling Multimodal Generalist Autonomous Agents for Desktop and Web | [paper] | [code]

  • [2023/07/31] HAGRID: A Human-LLM Collaborative Dataset for Generative Information-Seeking with Attribution | [paper] | [code]

Others

  • [2025/02/20] Optimizing Model Selection for Compound AI Systems | [paper] | [code]

  • [2024/12/03] Large Multimodal Agents for Accurate Phishing Detection with Enhanced Token Optimization and Cost Reduction | [paper] | [code]

  • [2024/03/18] EnvGen: Generating and Adapting Environments via LLMs for Training Embodied Agents | [paper] | [code]


