
Awesome-Foundation-Agents


We maintain a curated collection of papers exploring the path towards Foundation Agents, with a focus on formulating the core concepts and navigating the research landscape.

⌛️ Coming soon: Version 2! We're continuously compiling and updating cutting-edge insights. Feel free to suggest any related work you find valuable!

Our Works Towards Foundation Agents

✨✨✨ Advances and Challenges in Foundation Agents (Paper)

Figure: the key parts of the human brain, and the framework of the Foundation Agent.

Awesome Papers

Table of Contents

Core Components of Intelligent Agents

Cognition

Cognition System

Learning

Space

Full
  • TODO: add SFT, RLHF, and PEFT entries
  • ReFT: Reasoning with Reinforced Fine-Tuning, arxiv 2024, [paper] [code]
  • Search-R1: Training LLMs to Reason and Leverage Search Engines with Reinforcement Learning [paper] [code]
  • R1-Searcher: Incentivizing the Search Capability in LLMs via Reinforcement Learning, arxiv 2025, [paper] [code]
Partial
  • Chain-of-Thought Prompting Elicits Reasoning in Large Language Models, NeurIPS 2022, [paper] [code]
  • Voyager: An Open-Ended Embodied Agent with Large Language Models, arxiv 2023, [paper] [code]
  • Reflexion: Language Agents with Verbal Reinforcement Learning, NeurIPS 2023, [paper] [code]
  • ReAct meets ActRe: Autonomous Annotations of Agent Trajectories for Contrastive Self-Training, arxiv 2024, [paper] [code]
  • Generative Agents: Interactive Simulacra of Human Behavior, ACM UIST 2023, [paper] [code]

Objective

Perception
  • CLIP: Learning Transferable Visual Models from Natural Language Supervision, ICML 2021, [paper] [code]
  • LLaVA: Visual Instruction Tuning, NeurIPS 2023, [paper] [code]
  • CogVLM: Visual Expert for Pretrained Language Models, NeurIPS 2024, [paper] [code]
  • Qwen2-Audio Technical Report, arxiv 2024, [paper] [code]
  • Search-R1: Training LLMs to Reason and Leverage Search Engines with Reinforcement Learning, arxiv 2025, [paper] [code]
Reasoning
  • Sky-T1: Train Your Own o1 Preview Model Within $450, 2025, [paper] [code]
  • Open Thoughts, 2025, [paper] [code]
  • LIMO: Less is More for Reasoning, arxiv 2025, [paper] [code]
  • STaR: Bootstrapping Reasoning with Reasoning, arxiv 2022, [paper] [code]
  • ReST: Reinforced Self-Training for Language Modeling, arxiv 2023, [paper] [code]
  • OpenR: An Open Source Framework for Advanced Reasoning with Large Language Models, arxiv 2024, [paper] [code]
  • LLaMA-Berry: Pairwise Optimization for o1-like Olympiad-level Mathematical Reasoning, arxiv 2024, [paper] [code]
  • RAGEN: Training Agents by Reinforcing Reasoning, arxiv 2025, [paper] [code]
  • Open-R1, 2025, [paper] [code]
World
  • Inner Monologue: Embodied Reasoning through Planning with Language Models, CoRL 2023, [paper] [code]
  • Self-Refine: Iterative Refinement with Self-Feedback, NeurIPS 2023, [paper] [code]
  • Reflexion: Language Agents with Verbal Reinforcement Learning, NeurIPS 2023, [paper] [code]
  • ExpeL: LLM Agents Are Experiential Learners, AAAI 2024, [paper] [code]
  • AutoManual: Generating Instruction Manuals by LLM Agents via Interactive Environmental Learning, arxiv 2024, [paper] [code]
  • ReAct meets ActRe: Autonomous Annotations of Agent Trajectories for Contrastive Self-Training, arxiv 2024, [paper] [code]

Reasoning

Structured

Dynamic
  • ReAct: Synergizing Reasoning and Acting in Language Models, ICLR 2023, [paper] [code]
  • Markov Chain of Thought for Efficient Mathematical Reasoning, arxiv 2024, [paper] [code]
  • Tree of Thoughts: Deliberate Problem Solving with Large Language Models, NeurIPS 2023, [paper] [code]
  • Language Agent Tree Search Unifies Reasoning, Acting, and Planning in Language Models, ICML 2024, [paper] [code]
  • Reasoning via Planning (RAP): Improving Language Models with World Models, EMNLP 2023, [paper] [code]
  • Graph of Thoughts: Solving Elaborate Problems with Large Language Models, AAAI 2024, [paper] [code]
  • Path of Thoughts: Extracting and Following Paths for Robust Relational Reasoning with Large Language Models, arxiv 2024, [paper] [code]
  • On the Diagram of Thought, arxiv 2024, [paper] [code]
Static
  • Self-Consistency Improves Chain of Thought Reasoning in Language Models, ICLR 2023, [paper] [code]
  • Self-Refine: Iterative Refinement with Self-Feedback, NeurIPS 2023, [paper] [code]
  • Progressive-Hint Prompting Improves Reasoning in Large Language Models, arxiv 2023, [paper] [code]
  • On the Self-Verification Limitations of Large Language Models on Reasoning and Planning Tasks, arxiv 2024, [paper] [code]
  • Chain-of-Verification Reduces Hallucination in Large Language Models, ICLR 2024 Workshop, [paper] [code]
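Several of the static methods above share a simple core: sample multiple reasoning paths and aggregate. A minimal sketch of self-consistency voting, with the model call stubbed out as a `sampler` callable (an assumed name, not taken from any of the papers):

```python
from collections import Counter

def self_consistency(question, sampler, k=20):
    # Sample k chain-of-thought completions (sampler stands in for an
    # LLM call with temperature > 0 that returns a final answer string)
    # and return the answer that appears most often.
    answers = [sampler(question) for _ in range(k)]
    return Counter(answers).most_common(1)[0][0]
```

Aggregating over sampled paths trades extra inference cost for robustness to any single faulty chain.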
Domain
  • MathPrompter: Mathematical Reasoning Using Large Language Models, ACL 2023, [paper] [code]
  • LLMs Can Find Mathematical Reasoning Mistakes by Pedagogical Chain-of-Thought, arxiv 2024, [paper] [code]
  • Physics Reasoner: Knowledge-Augmented Reasoning for Solving Physics Problems with Large Language Models, COLING 2025, [paper] [code]

Unstructured

Prompt
  • Chain of Thought Prompting Elicits Reasoning in Large Language Models, NeurIPS 2022, [paper] [code]
  • Take a Step Back: Evoking Reasoning via Abstraction in Large Language Models, ICLR 2024, [paper] [code]
  • Ask Me Anything: A Simple Strategy for Prompting Language Models, arxiv 2022, [paper] [code]
  • Chain-of-Knowledge: Grounding Large Language Models via Dynamic Knowledge Adapting over Heterogeneous Sources, arxiv 2023, [paper] [code]
  • Self-Explained Keywords Empower Large Language Models for Code Generation, arxiv 2024, [paper] [code]
Model
  • DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning, arxiv 2025, [paper] [code]
  • Claude 3.7 Sonnet, 2025, [paper] [code]
  • OpenAI o1 System Card, arxiv 2024, [paper] [code]
Implicit
  • Quiet-STaR: Language Models Can Teach Themselves to Think Before Speaking, arxiv 2024, [paper] [code]
  • Chain of Continuous Thought (Coconut): Training Large Language Models to Reason in a Continuous Latent Space, arxiv 2024, [paper] [code]

Planning

  • Describe, Explain, Plan and Select (DEPS): Interactive Planning with Large Language Models, arxiv 2023, [paper] [code]
  • ProgPrompt: Generating Situated Robot Task Plans Using Large Language Models, ICRA 2023, [paper] [code]
  • ADAPT: As-Needed Decomposition and Planning with Language Models, arxiv 2023, [paper] [code]
  • Tree of Thoughts: Deliberate Problem Solving with Large Language Models, NeurIPS 2023, [paper] [code]
  • Reasoning via Planning (RAP): Improving Language Models with World Models, EMNLP 2023, [paper] [code]
  • TravelPlanner: A Benchmark for Real-World Planning with Language Agents, ICML 2024, [paper] [code]
  • PDDL—The Planning Domain Definition Language, 1998, [paper] [code]
  • Mind2Web: Towards a Generalist Agent for the Web, NeurIPS 2023, [paper] [code]

Memory

Memory in Intelligent Agents

1. Representation

1.1 Sensory

1.1.1 Text-based
  • RecAgent (Wang et al., 2023)
  • CoPS (Zhou et al., 2024)
  • MemoryBank (Zhong et al., 2024)
  • Memory Sandbox (Huang et al., 2023)
1.1.2 Multi-modal
  • VideoAgent (Fan et al., 2024)
  • WorldGPT (Ge et al., 2024)
  • Agent S (Agashe et al., 2024)
  • OS-Copilot (Wu et al., 2024)
  • MuLan (Li et al., 2024)

1.2 Short-term

1.2.1 Context
  • MemGPT (Packer et al., 2023)
  • KARMA (Wang et al., 2024)
  • LSFS (Shi et al., 2024)
  • OSCAR (Wang et al., 2024)
  • RCI (Geunwoo et al., 2023)
1.2.2 Working
  • Generative Agent (Park et al., 2023)
  • RLP (Fischer et al., 2023)
  • CALYPSO (Zhu et al., 2023)
  • HiAgent (Hu et al., 2024)

1.3 Long-term

1.3.1 Semantic
  • AriGraph (Anokhin et al., 2024)
  • RecAgent (Wang et al., 2023)
  • HippoRAG (Gutierrez et al., 2024)
1.3.2 Episodic
  • MobileGPT (Lee et al., 2023)
  • MemoryBank (Zhong et al., 2024)
  • Episodic Verbalization (Barmann et al., 2024)
  • MrSteve (Park et al., 2024)
1.3.3 Procedural
  • AAG (Roth et al., 2024)
  • Cradle (Tan et al., 2024)
  • JARVIS-1 (Wang et al., 2024)
  • LARP (Yan et al., 2023)

2. Lifecycle

2.1 Acquisition

2.1.1 Information Compression
  • HiAgent (Hu et al., 2024)
  • LMAgent (Liu et al., 2024)
  • ReadAgent (Lee et al., 2024)
  • M²WF (Wang et al., 2025)
2.1.2 Experience Consolidation
  • ExpeL (Zhao et al., 2024)
  • MindOS (Hu et al., 2025)
  • Vanschoren et al. (2018)
  • Hou et al. (2024)

2.2 Encoding

2.2.1 Selective Attention
  • AgentCoord (Pan et al., 2024)
  • MS (Gao et al., 2024)
  • GraphVideoAgent (Chu et al., 2025)
  • A-MEM (Xu et al., 2025)
  • Ali et al. (2024)
2.2.2 Multi-modal Fusion
  • Optimus-1 (Li et al., 2024)
  • Optimus-2 (Li et al., 2025)
  • JARVIS-1 (Wang et al., 2024)

2.3 Derivation

2.3.1 Reflection
  • Agent S (Agashe et al., 2024)
  • OSCAR (Wang et al., 2024)
  • R2D2 (Huang et al., 2025)
  • Mobile-Agent-E (Wang et al., 2025)
2.3.2 Summarization
  • SummEdits (Laban et al., 2023)
  • SCM (Wang et al., 2023)
  • Healthcare Copilot (Ren et al., 2024)
  • Wang et al. (2023)
2.3.3 Knowledge Distillation
  • Knowagent (Zhu et al., 2024)
  • AoTD (Shi et al., 2024)
  • LDPD (Liu et al., 2024)
  • Sub-goal Distillation (Hashemzadeh et al., 2024)
  • MAGDi (Chen et al., 2024)
2.3.4 Selective Forgetting
  • Lyfe Agent (Kaiya et al., 2023)
  • TiM (Liu et al., 2023)
  • MemoryBank (Zhong et al., 2024)
  • S³ (Gao et al., 2023)
  • Hou et al. (2024)
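Selective forgetting in systems like MemoryBank is often described with an Ebbinghaus-style retention curve: memories decay over time unless recall strengthens them. A minimal sketch under that assumption (the decay form, threshold, and record layout are illustrative, not taken from any one paper):

```python
import math

def retention(elapsed_hours, strength):
    # Ebbinghaus-style curve: retention decays exponentially with time,
    # more slowly for memories with higher strength (e.g. more recalls).
    return math.exp(-elapsed_hours / strength)

def forget(memories, now_hours, threshold=0.1):
    # Keep only memories whose retention is still above the threshold;
    # each memory is a (text, created_at_hours, strength) tuple.
    return [m for m in memories
            if retention(now_hours - m[1], m[2]) >= threshold]
```

A recall event would then bump a memory's strength, flattening its decay on the next pass.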

2.4 Retrieval

2.4.1 Indexing
  • HippoRAG (Gutierrez et al., 2024)
  • TradingGPT (Li et al., 2023)
  • LongMemEval (Wu et al., 2024)
  • SeCom (Pan et al., 2025)
2.4.2 Matching
  • Product Keys (Lample et al., 2019)
  • OSAgent (Xu et al., 2024)
  • Bahdanau et al. (2014)
  • Hou et al. (2024)

2.5 Neural Memory

2.5.1 Associative Memory
  • Hopfield Networks (Demircigil et al., 2017; Ramsauer et al., 2020)
  • Neural Turing Machines (Graves et al., 2014)
2.5.2 Parameter Integration
  • MemoryLLM (Wang et al., 2024)
  • SELF-PARAM (Wang et al., 2024)
  • MemoRAG (Qian et al., 2024)
  • TTT-Layer (Sun et al., 2024)
  • Titans (Behrouz et al., 2024)
  • R³Mem (Wang et al., 2025)

2.6 Utilization

2.6.1 RAG
  • RAGLAB (Zhang et al., 2024)
  • Adaptive Retrieval (Mallen et al., 2023)
  • Atlas (Farahani et al., 2024)
  • Yuan et al. (2025)
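The RAG systems above follow the same skeleton: retrieve passages relevant to the query, then condition generation on them. A stdlib-only sketch using bag-of-words cosine similarity in place of dense embeddings (function names are illustrative):

```python
from collections import Counter
import math

def cosine(a, b):
    # Cosine similarity between two bag-of-words Counter vectors.
    dot = sum(a[t] * b[t] for t in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def retrieve(query, docs, k=2):
    # Rank documents by similarity to the query and return the top k.
    q = Counter(query.lower().split())
    return sorted(docs, key=lambda d: cosine(q, Counter(d.lower().split())),
                  reverse=True)[:k]

def build_prompt(query, docs):
    # Prepend retrieved context to the query; a real pipeline would use
    # dense embeddings and send this prompt to an LLM.
    context = "\n".join(retrieve(query, docs))
    return f"Context:\n{context}\n\nQuestion: {query}"
```

Swapping the scoring function for a learned embedding model recovers the usual dense-retrieval setup without changing the control flow.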
2.6.2 Long-context Modeling
  • RMT (Bulatov et al., 2022, 2023)
  • AutoCompressor (Chevalier et al., 2023)
  • ICAE (Ge et al., 2023)
  • Gist (Mu et al., 2024)
  • CompAct (Yoon et al., 2024)
2.6.3 Alleviating Hallucination
  • Lamini (Li et al., 2024)
  • Memoria (Park et al., 2023)
  • PEER (He et al., 2024)
  • Ding et al. (2024)

Perception

Perception System

Unimodal Models

Text

  • BERT (Devlin et al., 2018)
  • RoBERTa (Liu et al., 2019)
  • ALBERT (Lan et al., 2019)

Image

  • ResNet (He et al., 2016)
  • DETR (Carion et al., 2020)
  • Grounding DINO 1.5 (Ren et al., 2024)

Video

  • ViViT (Arnab et al., 2021)
  • VideoMAE (Tong et al., 2022)

Audio

  • FastSpeech 2 (Ren et al., 2020)
  • Seamless (Barrault et al., 2023)
  • wav2vec 2.0 (Baevski et al., 2020)

Other Unimodal

  • Visual ChatGPT (Wu et al., 2023)
  • HuggingGPT (Shen et al., 2024)
  • MM-REACT (Yang et al., 2023)
  • ViperGPT (Suris et al., 2023)
  • AudioGPT (Huang et al., 2024)
  • LLaVA-Plus (Liu et al., 2025)

Cross-modal Models

Text-Image

  • CLIP (Alec et al., 2021)
  • ALIGN (Jia et al., 2021)
  • DALL·E 3 (Betker et al., 2023)
  • VisualBERT (Li et al., 2019)

Text-Video

  • VideoCLIP (Xu et al., 2021)
  • Phenaki (Villegas et al., 2022)
  • Make-A-Video (Singer et al., 2022)

Text-Audio

  • Wav2CLIP (Wu et al., 2022)
  • VATT (Akbari et al., 2021)
  • AudioCLIP (Guzhov et al., 2022)

Other Cross-modal

  • CLIP-Forge (Sanghi et al., 2022)
  • Point-E (Nichol et al., 2022)

MultiModal Models

VLM (Vision-Language Models)

  • MiniGPT-v2 (Chen et al., 2023)
  • LLaVA-NeXT (Liu et al., 2024)
  • CogVLM2 (Hong et al., 2024)
  • Qwen2-VL (Wang et al., 2024)
  • Emu2 (Sun et al., 2024)
Edge-Side VLM
  • TinyGPT-V (Yuan et al., 2023)
  • MobileVLM (Chu et al., 2023)
  • MiniCPM-V (Yao et al., 2024)
  • OmniParser (Lu et al., 2024)

VLA (Vision-Language for Action)

  • CLIPort (Shridhar et al., 2022)
  • RT-1 (Brohan et al., 2022)
  • MOO (Stone et al., 2023)
  • PerAct (Shridhar et al., 2023)
  • Diffusion Policy (Chi et al., 2023)
  • PaLM-E (Driess et al., 2023)
  • MultiPLY (Hong et al., 2024)

ALM (Audio-Language Models)

  • Audio Flamingo (Kong et al., 2024)
  • SpeechVerse (Das et al., 2024)
  • UniAudio 1.5 (Yang et al., 2024)
  • Qwen2-Audio (Chu et al., 2024)
  • Audio-LLM (Li et al., 2024)
  • Mini-Omni (Xie et al., 2024)
  • SpeechGPT (Zhang et al., 2023)

AVLM (Audio-Visual-Language Models)

  • ONE-PEACE (Wang et al., 2023)
  • PandaGPT (Su et al., 2023)
  • Macaw-LLM (Lyu et al., 2023)
  • LanguageBind (Zhu et al., 2023)
  • UnIVAL (Shukor et al., 2023)
  • X-LLM (Chen et al., 2023)

Other MultiModal

  • PointLLM (Xu et al., 2025)
  • MiniGPT-3D (Tang et al., 2024)
  • NExT-GPT (Wu et al., 2023)
  • Unified-IO 2 (Lu et al., 2024)
  • CoDi-2 (Tang et al., 2024)
  • ModaVerse (Wang et al., 2024)

World Model

World Model in Foundation Agents

External Approaches

DINO-WM [358]: Video World Models on Pre-trained Visual Features Enable Zero-Shot Planning, arxiv 2024, [paper] [code]

SAPIEN [351]: A Simulated Part-based Interactive Environment, CVPR 2020, [paper] [code]

MuZero [349]: Mastering Atari, Go, Chess and Shogi by Planning with a Learned Model, Nature 2020, [paper] [code]

GR-2 [357]: A Generative Video-Language-Action Model with Web-Scale Knowledge for Robot Manipulation, arxiv 2024, [paper] [code]

COAT [356]: Discovery of the Hidden World with Large Language Models, arxiv 2024, [paper] [code]

AutoManual [108]: Generating Instruction Manuals by LLM Agents via Interactive Environmental Learning, arxiv 2024, [paper] [code]

PILCO [355]: A Model-Based and Data-Efficient Approach to Policy Search, ICML 2011, [paper] [code]

Internal Approaches

ActRe [49]: ReAct meets ActRe: Autonomous Annotations of Agent Trajectories for Contrastive Self-Training, arxiv 2024, [paper] [code]

World Models [348]: World Models, NeurIPS 2018, [paper] [code]

Dreamer [350]: Dream to Control: Learning Behaviors by Latent Imagination, ICLR 2020, [paper] [code]

Diffusion WM [353]: Diffusion for World Modeling: Visual Details Matter in Atari, arxiv 2024, [paper] [code]

GQN [354]: Neural Scene Representation and Rendering, Science 2018, [paper] [code]

Daydreamer [352]: World Models for Physical Robot Learning, CoRL 2022, [paper] [code]

Action

Papers on how agents act on their environment, organized by action space.

Action Space:

Language

Text

  • ReAct: Synergizing Reasoning and Acting in Language Models, ICLR 2023, [paper] [code]

  • AutoGPT: Build, Deploy, and Run AI Agents, GitHub, [code]

  • Reflexion: Language Agents with Verbal Reinforcement Learning, NeurIPS 2023, [paper] [code]

  • LLM+P: Empowering Large Language Models with Optimal Planning Proficiency, arXiv 2023, [paper] [code]

Code

  • MetaGPT: Meta Programming for a Multi-Agent Collaborative Framework, ICLR 2024, [paper] [code]

  • ChatDev: Communicative Agents for Software Development, ACL 2024, [paper] [code]

  • SWE-agent: Agent-Computer Interfaces Enable Automated Software Engineering, NeurIPS 2024, [paper] [code]

  • OpenHands: An Open Platform for AI Software Developers as Generalist Agents, arXiv 2024, [paper] [code]

Chat

  • Generative Agents: Interactive Simulacra of Human Behavior, UIST 2023, [paper] [code]

  • AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversation, COLM 2024, [paper] [code]

Digital

Game

  • MineDojo: Building Open-Ended Embodied Agents with Internet-Scale Knowledge, NeurIPS 2022, [paper] [code]

  • Voyager: An Open-Ended Embodied Agent with Large Language Models, TMLR 2024, [paper] [code]

  • SwarmBrain: Embodied agent for real-time strategy game StarCraft II via large language models, arXiv 2024, [paper] [code]

  • JARVIS-1: Open-World Multi-task Agents with Memory-Augmented Multimodal Language Models, NeurIPS 2025, [paper] [code]

Multimodal

  • MM-REACT: Prompting ChatGPT for Multimodal Reasoning and Action, arXiv 2023, [paper] [code]

  • ViperGPT: Visual Inference via Python Execution for Reasoning, ICCV 2023, [paper] [code]

  • Visual ChatGPT: Talking, Drawing and Editing with Visual Foundation Models, arXiv 2023, [paper] [code]

  • HuggingGPT: Solving AI Tasks with ChatGPT and its Friends in Hugging Face, NeurIPS 2023, [paper] [code]

Web

  • WebGPT: Browser-assisted question-answering with human feedback, arXiv 2021, [paper] [blog]

  • WebShop: Towards Scalable Real-World Web Interaction with Grounded Language Agents, NeurIPS 2022, [paper] [code]

  • A Real-World WebAgent with Planning, Long Context Understanding, and Program Synthesis, ICLR 2024, [paper]

  • Mind2Web: Towards a Generalist Agent for the Web, NeurIPS 2023, [paper] [code]

GUI

  • Mobile-Agent: Autonomous Multi-Modal Mobile Device Agent with Visual Perception, arXiv 2024, [paper] [code]

  • AppAgent: Multimodal Agents as Smartphone Users, arXiv 2023, [paper] [code]

  • UFO: A UI-Focused Agent for Windows OS Interaction, arXiv 2024, [paper] [code]

  • OmniParser for Pure Vision Based GUI Agent, arXiv 2024, [paper] [code]

DB & KG

  • A Survey of NL2SQL with Large Language Models: Where are we, and where are we going?, arXiv 2024, [paper] [Handbook]

  • Alpha-SQL: Zero-Shot Text-to-SQL using Monte Carlo Tree Search, arXiv 2025, [paper]

  • EllieSQL: Cost-Efficient Text-to-SQL with Complexity-Aware Routing, arXiv 2025, [paper] [code]

  • NL2SQL-Bugs: A Benchmark for Detecting Semantic Errors in NL2SQL Translation, arXiv 2025, [paper] [code]

  • nvBench 2.0: A Benchmark for Natural Language to Visualization under Ambiguity, arXiv 2025, [paper] [code]

  • The Dawn of Natural Language to SQL: Are We Fully Ready?, VLDB 2024, [paper] [code]

  • Are Large Language Models Good Statisticians?, NeurIPS 2024, [paper] [code]

  • UnifiedSKG: Unifying and Multi-Tasking Structured Knowledge Grounding with Text-to-Text Language Models, EMNLP 2022, [paper] [code]

  • Don't Generate, Discriminate: A Proposal for Grounding Language Models to Real-World Environments, ACL 2023, [paper] [code]

  • Can LLM Already Serve as A Database Interface? A Big Bench for Large-Scale Database Grounded Text-to-SQLs, NeurIPS 2023, [paper] [project]

  • Spider 2.0: Evaluating Language Models on Real-World Enterprise Text-to-SQL Workflows, ICLR 2025, [paper] [code]

  • Middleware for LLMs: Tools Are Instrumental for Language Agents in Complex Environments, EMNLP 2024, [paper] [code]

Physical

  • RT-1: Robotics Transformer for Real-World Control at Scale, RSS 2023, [paper] [project]

  • RT-2: Vision-Language-Action Models Transfer Web Knowledge to Robotic Control, CoRL 2023, [paper] [project]

  • Open X-Embodiment: Robotic Learning Datasets and RT-X Models, arXiv 2023, [paper] [project]

  • GR-2: A Generative Video-Language-Action Model with Web-Scale Knowledge for Robot Manipulation, arXiv 2024, [paper] [project]

  • π0: A Vision-Language-Action Flow Model for General Robot Control, arXiv 2024, [paper]

  • Do As I Can, Not As I Say: Grounding Language in Robotic Affordances, CoRL 2022, [paper] [project]

  • VoxPoser: Composable 3D Value Maps for Robotic Manipulation with Language Models, CoRL 2023, [paper] [code]

  • EmbodiedGPT: Vision-Language Pre-Training via Embodied Chain of Thought, NeurIPS 2023, [paper] [project]

Learning

ICL (In-Context Learning)

Prompt

  • CoT: Chain-of-Thought Prompting Elicits Reasoning in Large Language Models, NeurIPS 2022, [paper]

  • ReAct: Synergizing Reasoning and Acting in Language Models, ICLR 2023, [paper] [project]

  • Auto-CoT: Automatic Chain of Thought Prompting in Large Language Models, ICLR 2023, [paper] [code]

  • ToT: Tree of Thoughts: Deliberate Problem Solving with Large Language Models, NeurIPS 2023, [paper] [code]

  • GoT: Graph of Thoughts: Solving Elaborate Problems with Large Language Models, AAAI 2024, [paper] [code]

  • LearnAct: Empowering Large Language Model Agents through Action Learning, arXiv 2024, [paper] [code]

  • CoA: Improving Multi-Agent Debate with Sparse Communication Topology, arXiv 2024, [paper]
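ReAct's control flow can be sketched in a few lines: the model alternates free-text "Thought" lines with "Action: tool[argument]" lines, and each tool result is appended as an "Observation" before the next model call. A minimal sketch with the model and tools passed in as plain callables (the parsing format is a simplification of the paper's prompts):

```python
def react_loop(question, model, tools, max_steps=5):
    # Build up a scratchpad of Thought/Action/Observation lines; stop
    # when the model emits a final "Answer:" instead of an action.
    prompt = f"Question: {question}\n"
    for _ in range(max_steps):
        step = model(prompt)  # e.g. "Thought: ...\nAction: lookup[France]"
        prompt += step + "\n"
        if step.startswith("Answer:"):
            return step.removeprefix("Answer:").strip()
        if "Action:" in step:
            # Parse "Action: name[argument]" and run the named tool.
            name, arg = step.split("Action:")[1].strip().rstrip("]").split("[", 1)
            prompt += f"Observation: {tools[name](arg)}\n"
    return None
```

The same loop structure underlies most of the tool-using agents in this list; they differ mainly in the prompt format and the tool inventory.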

Decompose

  • Least-to-Most: Least-to-Most Prompting Enables Complex Reasoning in Large Language Models, ICLR 2023, [paper]

  • HuggingGPT: Solving AI Tasks with ChatGPT and its Friends in Hugging Face, NeurIPS 2023, [paper] [code]

  • Plan-and-Solve: Plan-and-Solve Prompting: Improving Zero-Shot Chain-of-Thought Reasoning by Large Language Models, ACL 2023, [paper] [code]

  • ProgPrompt: Generating Situated Robot Task Plans Using Large Language Models, ICRA 2023, [paper] [project]

Role-play

  • Generative Agents: Interactive Simulacra of Human Behavior, UIST 2023, [paper] [code]

  • MetaGPT: Meta Programming for a Multi-Agent Collaborative Framework, ICLR 2024, [paper] [code]

  • ChatDev: ChatDev: Communicative Agents for Software Development, ACL 2024, [paper] [code]

  • SWE-agent: Agent-Computer Interfaces Enable Automated Software Engineering, NeurIPS 2024, [paper] [project]

Refine

  • Reflexion: Language Agents with Verbal Reinforcement Learning, NeurIPS 2023, [paper] [code]

  • Self-Refine: Iterative Refinement with Self-Feedback, NeurIPS 2023, [paper] [code]

  • GPTSwarm: Language Agents as Optimizable Graphs, ICML 2024, [paper] [project]

PT & SFT (Pre-Training & Supervised Fine-Tuning)

Pre-Train

  • RT-1: Robotics Transformer for Real-World Control at Scale, arXiv 2022, [paper] [project]

  • RT-2: Vision-Language-Action Models Transfer Web Knowledge to Robotic Control, arXiv 2023, [paper] [project]

  • RT-X: Open X-Embodiment: Robotic Learning Datasets and RT-X Models, arXiv 2023, [paper] [project]

  • GR-2: A Generative Video-Language-Action Model with Web-Scale Knowledge for Robot Manipulation, arXiv 2024, [paper] [project]

  • LAM: Large Action Models: From Inception to Implementation, arXiv 2024, [paper] [code]

SFT

  • CogACT: A Foundational Vision-Language-Action Model for Synergizing Cognition and Action in Robotic Manipulation, arXiv 2024, [paper] [project]

  • RT-H: Action Hierarchies Using Language, arXiv 2024, [paper] [project]

  • OpenVLA: An Open-Source Vision-Language-Action Model, arXiv 2024, [paper] [project]

  • π0: A Vision-Language-Action Flow Model for General Robot Control, arXiv 2024, [paper] [project]

  • UniAct: Universal Actions for Enhanced Embodied Foundation Models, CVPR 2025, [paper] [code]

RL (Reinforcement Learning)

  • RLHF: Training language models to follow instructions with human feedback, NeurIPS 2022, [paper]

  • DPO: Direct preference optimization: Your language model is secretly a reward model, NeurIPS 2023, [paper]

  • RLFP: Reinforcement Learning with Foundation Priors: Let the Embodied Agent Efficiently Learn on Its Own, CoRL 2024, [paper] [project]

  • ELLM: Guiding pretraining in reinforcement learning with large language models, ICML 2023, [paper] [code]

  • GenSim: Generating Robotic Simulation Tasks via Large Language Models, arXiv 2023, [paper] [project]

  • LEA: Reinforcement learning-based recommender systems with large language models for state reward and action modeling, ACM 2024, [paper]

  • MLAQ: Empowering LLM Agents with Zero-Shot Optimal Decision-Making through Q-learning, ICLR 2025, [paper]

  • KALM: KALM: Knowledgeable Agents by Offline Reinforcement Learning from Large Language Model Rollouts, NeurIPS 2024, [paper] [project]

  • When2Ask: Enabling intelligent interactions between an agent and an LLM: A reinforcement learning approach, RLC 2024, [paper]

  • Eureka: Human-Level Reward Design via Coding Large Language Models, ICLR 2024, [paper] [project]

  • ArCHer: Training Language Model Agents via Hierarchical Multi-Turn RL, arXiv 2024, [paper] [project]

  • LLaRP: Large Language Models as Generalizable Policies for Embodied Tasks, ICLR 2024, [paper] [project]

  • GPTSwarm: Language Agents as Optimizable Graphs, ICML 2024, [paper] [project]

Reward

Reward System

Extrinsic Reward

Dense Reward

  • InstructGPT (Ouyang et al., 2022)
  • DRO (Richemond et al., 2024)
  • sDPO (Kim et al., 2024)
  • ΨPO (Azar et al., 2024)
  • β-DPO (Wu et al., 2025)
  • ORPO (Hong et al., 2024)
  • DNO (Rosset et al., 2024)
  • f-DPO (Wang et al., 2023)
  • Xu et al., 2023
  • Rafailov et al., 2024
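Many of the entries above are variants of Direct Preference Optimization (DPO), whose per-pair loss depends only on how much the policy shifts log-probability toward the chosen response relative to a frozen reference model. The core objective on scalar sequence log-probabilities (no batching, for illustration only):

```python
import math

def dpo_loss(pi_chosen, pi_rejected, ref_chosen, ref_rejected, beta=0.1):
    # DPO per-pair loss: -log sigmoid(beta * log-ratio margin), where the
    # arguments are sequence log-probabilities under the policy (pi_*)
    # and the frozen reference model (ref_*).
    margin = beta * ((pi_chosen - ref_chosen) - (pi_rejected - ref_rejected))
    return -math.log(1.0 / (1.0 + math.exp(-margin)))
```

At initialization (policy equals reference) the margin is zero and the loss is log 2; the variants above mainly change this margin term or the outer link function.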

Sparse Reward

  • PAFT (Pentyala et al., 2024)
  • SimPO (Meng et al., 2025)
  • LiPO (Liu et al., 2024)
  • RRHF (Yuan et al., 2023)
  • PRO (Song et al., 2024)
  • D²O (Duan et al., 2024)
  • NPO (Zhang et al., 2024)
  • Ahmadian et al., 2024

Delayed Reward

  • CPO (Xu et al., 2024)
  • NLHF (Munos et al., 2023)
  • Swamy et al., 2024

Adaptive Reward

  • InstructGPT (Ouyang et al., 2022)
  • DRO (Richemond et al., 2024)
  • β-DPO (Wu et al., 2025)
  • ORPO (Hong et al., 2024)
  • PAFT (Pentyala et al., 2024)
  • SimPO (Meng et al., 2025)
  • NLHF (Munos et al., 2023)
  • Swamy et al., 2024
  • f-DPO (Wang et al., 2023)

Intrinsic Reward

Curiosity-Driven Reward

  • Pathak et al., 2017
  • Pathak et al., 2019
  • Plan2Explore (Sekar et al., 2020)

Diversity Reward

  • LIIR (Du et al., 2019)

Competence-Based Reward

  • CURIOUS (Colas et al., 2019)
  • Skew-Fit (Pong et al., 2019)
  • DISCERN (Hassani et al., 2021)
  • Yuan et al., 2024
  • KTO (Ethayarajh et al., 2024)

Exploration Reward

  • Yuan et al., 2024
  • Burda et al., 2018

Information Gain Reward

  • Ton et al., 2024
  • VIME (Houthooft et al., 2016)
  • EMI (Kim et al., 2018)
  • MAX (Shyam et al., 2019)
  • KTO (Ethayarajh et al., 2024)

Hybrid Reward

Combination of Intrinsic and Extrinsic Reward

  • d-RLAIF (Lee et al., 2023)
  • Bai et al., 2022
  • Xiong et al., 2023
  • Dong et al., 2024

Hierarchical Reward

Hierarchical Reward

  • TDPO (Zeng et al., 2024)

Emotion

Self-Enhancement in Intelligent Agents

Self-evolution

Optimization Spaces

Prompt

  • Prompt Optimization in Multi-Step Tasks (PROMST): Integrating Human Feedback and Preference Alignment, EMNLP 2024 [paper]

  • StraGo: Harnessing strategic guidance for prompt optimization, EMNLP 2024 [paper]

  • Connecting large language models with evolutionary algorithms yields powerful prompt optimizers, ICLR 2024 [paper]

Workflow

Tools

Optimization Algorithms

Optimization Strategies

  • Large Language Models Are Human-Level Prompt Engineers, ICLR 2023 [paper]

  • Automatic Prompt Optimization with "Gradient Descent" and Beam Search, EMNLP 2023 [paper]

  • GPTSwarm: Language Agents as Optimizable Graphs, ICML 2024 [paper]

  • Promptbreeder: Self-Referential Self-Improvement via Prompt Evolution, ICML 2024 [paper]

  • Teaching Large Language Models to Self-Debug, ICLR 2024 [paper]

  • Large Language Models as Optimizers, ICLR 2024 [paper]

  • DSPy: Compiling Declarative Language Model Calls into Self-Improving Pipelines, ICLR 2024 [paper]

  • Prompt Engineering a Prompt Engineer, Findings of ACL 2024 [paper]

  • Prompt Optimization in Multi-Step Tasks (PROMST): Integrating Human Feedback and Preference Alignment, EMNLP 2024 [paper]

  • StraGo: Harnessing strategic guidance for prompt optimization, EMNLP 2024 [paper]

  • Optimizing Instructions and Demonstrations for Multi-Stage Language Model Programs, EMNLP 2024 [paper]

  • Trace is the Next AutoDiff: Generative Optimization with Rich Feedback, Execution Traces, and LLMs, NeurIPS 2024 [paper]

  • Optimizing Generative AI by Backpropagating Language Model Feedback, Nature [paper]

  • Are Large Language Models Good Prompt Optimizers?, arxiv [paper]
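Despite different search operators (gradients in text, evolution, beam search), most strategies above reduce to the same loop: propose a prompt variant, score it on a dev set, keep the winner. A greedy search sketch with the proposal and scoring functions left abstract (all names are illustrative):

```python
def optimize_prompt(seed_prompt, propose, score, rounds=10):
    # Greedy prompt search: each round proposes a variant (e.g. an
    # LLM-generated rewrite) and keeps it only if it scores higher
    # on the evaluation set.
    best, best_score = seed_prompt, score(seed_prompt)
    for _ in range(rounds):
        candidate = propose(best)
        s = score(candidate)
        if s > best_score:
            best, best_score = candidate, s
    return best
```

Beam search keeps the top-k candidates instead of one; evolutionary methods mutate and recombine a population, but the score-and-select backbone is unchanged.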

Theoretical Perspectives

  • An Explanation of In-context Learning as Implicit Bayesian Inference, ICLR 2022, [paper]

  • Rethinking the Role of Demonstrations: What Makes In-Context Learning Work?, EMNLP 2022, [paper]

  • What Can Transformers Learn In-Context? A Case Study of Simple Function Classes, NeurIPS 2022, [paper]

  • What Learning Algorithm Is In-Context Learning? Investigations with Linear Models, ICLR 2023, [paper]

  • Transformers Learn In-Context by Gradient Descent, ICML 2023, [paper]

  • Transformers Learn to Achieve Second-Order Convergence Rates for In-Context Linear Regression, NeurIPS 2024, [paper]

Utilization Scenario

Online Optimization

  • Reflexion: language agents with verbal reinforcement learning, NeurIPS 2023, [paper]

  • Self-refine: Iterative refinement with self-feedback, NeurIPS 2023, [paper]

  • ReAct: Synergizing Reasoning and Acting in Language Models, ICLR 2023, [paper]

  • Tree of thoughts: Deliberate problem solving with large language models, NeurIPS 2023, [paper]

  • Voyager: An Open-Ended Embodied Agent with Large Language Models, TMLR 2024, [paper]

  • Let's Verify Step by Step, ICLR 2024, [paper]

  • MetaGPT: Meta programming for multi-agent collaborative framework, ICLR 2024, [paper]

  • CAMEL: Communicative Agents for "Mind" Exploration of Large Language Model Society, NeurIPS 2023, [paper]

  • ChatDev: Communicative Agents for Software Development, ACL 2024, [paper]

  • HuggingGPT: Solving AI Tasks with ChatGPT and its Friends in Hugging Face, NeurIPS 2023, [paper]

  • Self-Taught Optimizer (STOP): Recursively Self-Improving Code Generation, COLM 2024, [paper]

  • Quiet-STaR: Language Models Can Teach Themselves to Think Before Speaking, CoRR 2024, [paper]

  • Text2Reward: Automated Dense Reward Function Generation for Reinforcement Learning, ICLR 2024, [paper]

  • Extracting prompts by inverting LLM outputs, ACL 2024, [paper]

  • Aligning Large Language Models via Self-Steering Optimization, arXiv 2024, [paper]

Offline Optimization

  • Are Large Language Models Good Statisticians?, NeurIPS 2024, [paper]

  • nvBench 2.0: A Benchmark for Natural Language to Visualization under Ambiguity, arxiv 2025, [paper]

  • SRAG: Structured Retrieval-Augmented Generation for Multi-Entity Question Answering over Wikipedia Graph, arXiv 2025, [paper]

  • Fine-Grained Retrieval-Augmented Generation for Visual Question Answering, arXiv 2025, [paper]

  • xLAM: A Family of Large Action Models to Empower AI Agent Systems, arxiv 2024, [paper]

  • Automated Design of Agentic Systems, arXiv 2024, [paper]

  • LIRE: listwise reward enhancement for preference alignment, ACL 2024, [paper]

Scientific Knowledge Discovery

Hypothesis Generation and Testing

  • Can LLMs Generate Novel Research Ideas? A Large-Scale Human Study with 100+ NLP Researchers, arXiv 2024, [paper]

  • SciAgents: Automating Scientific Discovery Through Bioinspired Multi-Agent Intelligent Graph Reasoning, Advanced Materials 2024, [paper]

  • Genesis: Towards the Automation of Systems Biology Research, arXiv 2024, [paper]

  • The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery, arXiv 2024, [paper]

  • Agent Laboratory: Using LLM Agents as Research Assistants, arXiv 2025, [paper]

  • ChemAgent: Self-updating Library in Large Language Models Improves Chemical Reasoning, arXiv 2025, [paper]

  • ChemOS 2.0: An orchestration architecture for chemical self-driving laboratories, Matter 2024, [paper]

  • Towards an AI co-scientist, arXiv 2025, [paper]

Protocol Planning and Tool Innovation

  • Autonomous mobile robots for exploratory synthetic chemistry, Nature 2024, [paper]

  • Delocalized, asynchronous, closed-loop discovery of organic laser emitters, Science 2024, [paper]

  • The Virtual Lab: AI Agents Design New SARS-CoV-2 Nanobodies with Experimental Validation, bioRxiv 2024, [paper]

Data Analysis and Implication Derivation

  • Solving olympiad geometry without human demonstrations, Nature 2024, [paper]

  • Toward a Team of AI-made Scientists for Scientific Discovery from Gene Expression Data, arXiv 2024, [paper]

  • Data Interpreter: An LLM Agent For Data Science, arXiv 2024, [paper]

Collaborative and Evolutionary Intelligent Systems

LLM-based Multi-Agent Systems

Application

Strategic Learning

  • RECONCILE (Chen et al., 2023)
  • LLM-Game-Agent (Lan et al., 2023)
  • BattleAgentBench (Wang et al., 2024)

Modeling and Simulation

  • Generative Agents (Park et al., 2023)
  • Agent Hospital (Li et al., 2024)
  • MedAgents (Tang et al., 2024)
  • MEDCO (Wei et al., 2024)

Collaborative Task Solving

  • MetaGPT (Hong et al., 2023)
  • ChatDev (Qian et al., 2024)
  • Agent Laboratory (Schmidgall et al., 2025)
  • The Virtual Lab (Swanson et al., 2024)

Composition and Protocol

Agent Composition

Homogeneous

  • CoELA (Zhang et al., 2023)
  • VillagerAgent (Dong et al., 2024)
  • LLM-Coordination (Agashe et al., 2024)

Heterogeneous

  • MetaGPT (Hong et al., 2023)
  • ChatDev (Qian et al., 2024)
  • Generative Agents (Park et al., 2023)
  • S-Agents (Chen et al., 2024)

Interaction Protocols

Message Types

  • SciAgents (Ghafarollahi et al., 2024)
  • AppAgent (Chi et al., 2023)
  • MetaGPT (Hong et al., 2023)

Communication Interfaces

  • AgentBench (Liu et al., 2023)
  • VAB (Liu et al., 2024)
  • TaskWeaver (Qiao et al., 2024)
  • HULA (Takerngsaksiri et al., 2025)

Next Generation Protocol

  • MCP (Anthropic)
  • Agora (Marro et al., 2024)
  • IoA (Chen et al., 2024)

Topology

Static Topology

  • MEDCO (Wei et al., 2024)
  • Agent Hospital (Li et al., 2024)
  • Welfare Diplomacy (Mukobi et al., 2023)
  • MedAgents (Tang et al., 2024)

Dynamic Topology

  • DyLAN (Liu et al., 2023)
  • GPTSwarm (Zhuge et al., 2024)
  • CodeR (Chen et al., 2024)
  • OASIS (Yang et al., 2024)

Collaboration

Agent-Agent Collaboration

Consensus-oriented

  • Agent Laboratory (Schmidgall et al., 2025)
  • The Virtual Lab (Swanson et al., 2024)
  • OASIS (Yang et al., 2024)

Collaborative Learning

  • Generative Agents (Park et al., 2023)
  • Welfare Diplomacy (Mukobi et al., 2023)
  • LLM-Game-Agent (Lan et al., 2023)
  • BattleAgentBench (Wang et al., 2024)

Teaching/Mentoring

  • MEDCO (Wei et al., 2024)
  • Agent Hospital (Li et al., 2024)

Task-oriented

  • MedAgents (Tang et al., 2024)
  • S-Agents (Chen et al., 2024)

Human-AI Collaboration

  • Dittos (Leong et al., 2024)
  • PRELUDE (Gao et al., 2024)

Evolution

Collective Intelligence

  • Generative Agents (Park et al., 2023)
  • Welfare Diplomacy (Mukobi et al., 2023)
  • LLM-Game-Agent (Lan et al., 2023)
  • BattleAgentBench (Wang et al., 2024)

Individual Adaptability

  • Agent Hospital (Li et al., 2024)
  • Agent Laboratory (Schmidgall et al., 2025)
  • MEDCO (Wei et al., 2024)

Evaluation

Benchmark for Specific Tasks

  • MBPP (Austin et al., 2021)
  • HotpotQA (Yang et al., 2018)
  • MATH (Hendrycks et al., 2021)
  • SVAMP (Patel et al., 2021)
  • MultiArith (Roy and Roth, 2015)

Benchmark for MAS

  • Collab-Overcooked (Sun et al., 2025)
  • REALM-Bench (Geng et al., 2025)
  • PARTNR (Chang et al., 2024)
  • VillagerBench (Dong et al., 2024)
  • AutoArena (Zhao et al., 2024)
  • MultiagentBench (Zhu et al., 2025)

Building Safe and Beneficial AI

Agent Intrinsic Safety

Safety Threats

Jailbreak

White-box Jailbreak

  • Jailbreak attacks and defenses against large language models: A survey, arXiv 2024, [paper]

  • Universal and transferable adversarial attacks on aligned language models, arXiv 2023, [paper]

  • Boosting jailbreak attack with momentum, arXiv 2024, [paper]

  • Improved techniques for optimization-based jailbreaking on large language models, arXiv 2024, [paper]

  • Jailbreak Instruction-Tuned LLMs via end-of-sentence MLP Re-weighting, arXiv 2024, [paper]

  • Open the Pandora's Box of LLMs: Jailbreaking LLMs through Representation Engineering, arXiv 2024, [paper]

  • DROJ: A Prompt-Driven Attack against Large Language Models, arXiv 2024, [paper]

  • Autodan: Generating stealthy jailbreak prompts on aligned large language models, arXiv 2023, [paper]

  • POEX: Policy Executable Embodied AI Jailbreak Attacks, arXiv 2024, [paper]

Black-box Jailbreak

  • Jailbroken: How does LLM safety training fail?, NeurIPS 2023, [paper]

  • Jailbreaking black box large language models in twenty queries, arXiv 2023, [paper]

  • Jailbreaking large language models against moderation guardrails via cipher characters, NeurIPS 2024, [paper]

  • Visual adversarial examples jailbreak aligned large language models, AAAI 2024, [paper]

  • POEX: Policy Executable Embodied AI Jailbreak Attacks, arXiv 2024, [paper]

  • Autodan: Generating stealthy jailbreak prompts on aligned large language models, arXiv 2023, [paper]

  • Guard: Role-playing to generate natural-language jailbreakings to test guideline adherence of large language models, arXiv 2024, [paper]

  • Heuristic-Induced Multimodal Risk Distribution Jailbreak Attack for Multimodal Large Language Models, arXiv 2024, [paper]

  • Rt-attack: Jailbreaking text-to-image models via random token, arXiv 2024, [paper]

Prompt Injection

Direct Prompt Injection

  • Not what you've signed up for: Compromising real-world LLM-integrated applications with indirect prompt injection, AISec@CCS 2023, [paper]

  • Automatic and universal prompt injection attacks against large language models, arXiv 2024, [paper]

  • Optimization-based prompt injection attack to LLM-as-a-judge, CCS 2024, [paper]

  • Benchmarking indirect prompt injections in tool-integrated large language model agents, arXiv 2024, [paper]

  • Trust No AI: Prompt Injection Along The CIA Security Triad, arXiv 2024, [paper]

  • Empirical analysis of large vision-language models against goal hijacking via visual prompt injection, arXiv 2024, [paper]

  • Dataset and Lessons Learned from the 2024 SaTML LLM Capture-the-Flag Competition, arXiv 2024, [paper]

  • Ignore this title and HackAPrompt: Exposing systemic vulnerabilities of LLMs through a global prompt hacking competition, EMNLP 2023, [paper]

Indirect Prompt Injection

  • Not what you've signed up for: Compromising real-world LLM-integrated applications with indirect prompt injection, AISec@CCS 2023, [paper]

  • HijackRAG: Hijacking Attacks against Retrieval-Augmented Large Language Models, arXiv 2025, [paper]

  • Backdoored Retrievers for Prompt Injection Attacks on Retrieval Augmented Generation of Large Language Models, arXiv 2024, [paper]

  • Prompt Infection: LLM-to-LLM Prompt Injection within Multi-Agent Systems, arXiv 2024, [paper]

  • Adversarial search engine optimization for large language models, arXiv 2024, [paper]

Hallucination

Knowledge-conflict Hallucination

  • Survey of hallucination in natural language generation, ACM Computing Surveys 2023, [paper]

  • A survey on hallucination in large language models: Principles, taxonomy, challenges, and open questions, arXiv 2023, [paper]

  • DELUCIONQA: Detecting Hallucinations in Domain-specific Question Answering, Findings of EMNLP 2023, [paper]

  • Deficiency of large language models in finance: An empirical examination of hallucination, Failure Modes Workshop @ NeurIPS 2023, [paper]

  • MetaGPT: Meta Programming for Multi-Agent Collaborative Framework, ICLR 2024, [paper]

  • Hallucination is inevitable: An innate limitation of large language models, arXiv 2024, [paper]

  • ERBench: An Entity-Relationship based Automatically Verifiable Hallucination Benchmark for Large Language Models, arXiv 2024, [paper]

Context-conflict Hallucination

  • Truth-Aware Context Selection: Mitigating the Hallucinations of Large Language Models Being Misled by Untruthful Contexts, arXiv 2024, [paper]

  • Large Language Models are Easily Confused: A Quantitative Metric, Security Implications and Typological Analysis, arXiv 2024, [paper]

  • HaluEval-Wild: Evaluating Hallucinations of Language Models in the Wild, arXiv 2024, [paper]

  • Analyzing and Mitigating Object Hallucination in Large Vision-Language Models, ICLR 2023, [paper]

  • Mitigating object hallucination in large vision-language models via classifier-free guidance, arXiv 2024, [paper]

  • When Large Language Models contradict humans? Large Language Models' Sycophantic Behaviour, arXiv 2023, [paper]

  • HallusionBench: an advanced diagnostic suite for entangled language hallucination and visual illusion in large vision-language models, CVPR 2024, [paper]

  • DiaHalu: A Dialogue-level Hallucination Evaluation Benchmark for Large Language Models, arXiv 2024, [paper]

Misalignment

Goal-misguided Misalignment

  • AI alignment: A comprehensive survey, arXiv 2023, [paper]

  • Specification Gaming: The Flip Side of AI Ingenuity, DeepMind Blog 2020, [paper]

  • The alignment problem from a deep learning perspective, arXiv 2022, [paper]

  • Emulated Disalignment: Safety Alignment for Large Language Models May Backfire!, arXiv 2024, [paper]

  • Agent Alignment in Evolving Social Norms, arXiv 2024, [paper]

  • Model Merging and Safety Alignment: One Bad Model Spoils the Bunch, arXiv 2024, [paper]

Capability-misused Misalignment

  • Trustworthy LLMs: A survey and guideline for evaluating large language models' alignment, arXiv 2023, [paper]

  • Assessing the brittleness of safety alignment via pruning and low-rank modifications, arXiv 2024, [paper]

  • AI alignment: A comprehensive survey, arXiv 2023, [paper]

  • Fine-tuning aligned language models compromises safety, even when users do not intend to!, arXiv 2023, [paper]

  • Fundamental limitations of alignment in large language models, arXiv 2023, [paper]

Poisoning Attacks

Model Poisoning

  • Weight poisoning attacks on pre-trained models, ACL 2020, [paper]

  • Badedit: Backdooring large language models by model editing, arXiv 2024, [paper]

  • The philosopher's stone: Trojaning plugins of large language models, arXiv 2023, [paper]

  • Obliviate: Neutralizing Task-agnostic Backdoors within the Parameter-efficient Fine-tuning Paradigm, arXiv 2024, [paper]

  • Poisoned ChatGPT finds work for idle hands: Exploring developers’ coding practices with insecure suggestions from poisoned AI models, IEEE S&P 2024, [paper]

  • Secret Collusion Among Generative AI Agents, arXiv 2024, [paper]

  • Exploiting the Vulnerability of Large Language Models via Defense-Aware Architectural Backdoor, arXiv 2024, [paper]

Data Poisoning

  • Poisoning language models during instruction tuning, ICML 2023, [paper]

  • Agentpoison: Red-teaming LLM agents via poisoning memory or knowledge bases, NeurIPS 2024, [paper]

  • Poison-RAG: Adversarial Data Poisoning Attacks on Retrieval-Augmented Generation in Recommender Systems, arXiv 2025, [paper]

  • PoisonBench: Assessing Large Language Model Vulnerability to Data Poisoning, arXiv 2024, [paper]

  • The dark side of human feedback: Poisoning large language models via user inputs, arXiv 2024, [paper]

  • Scaling laws for data poisoning in LLMs, arXiv 2024, [paper]

  • Talk too much: Poisoning large language models under token limit, arXiv 2024, [paper]

  • Best-of-Venom: Attacking RLHF by Injecting Poisoned Preference Data, arXiv 2024, [paper]

Backdoor Injection

  • Sleeper agents: Training deceptive LLMs that persist through safety training, arXiv 2024, [paper]

  • Wipi: A new web threat for LLM-driven web agents, arXiv 2024, [paper]

  • Exploring backdoor attacks against large language model-based decision making, arXiv 2024, [paper]

  • When Backdoors Speak: Understanding LLM Backdoor Attacks Through Model-Generated Explanations, arXiv 2024, [paper]

  • Backdooring instruction-tuned large language models with virtual prompt injection, NAACL 2024, [paper]

Privacy Threats

Training Data Inference

Membership Inference Attacks

  • Membership inference attacks against machine learning models, IEEE S&P 2017, [paper]

  • The secret sharer: Evaluating and testing unintended memorization in neural networks, USENIX Security 2019, [paper]

  • Label-only membership inference attacks, ICML 2021, [paper]

  • Practical membership inference attacks against fine-tuned large language models via self-prompt calibration, arXiv 2023, [paper]

  • Membership inference attacks from first principles, IEEE S&P 2022, [paper]

  • Membership inference attacks on machine learning: A survey, ACM Computing Surveys 2022, [paper]

Data Extraction Attacks

  • Extracting training data from large language models, USENIX Security 2021, [paper]

  • Special characters attack: Toward scalable training data extraction from large language models, arXiv 2024, [paper]

  • Ethicist: Targeted training data extraction through loss smoothed soft prompting and calibrated confidence estimation, arXiv 2023, [paper]

  • Language model inversion, arXiv 2023, [paper]

  • Privacy risks of general-purpose language models, IEEE S&P 2020, [paper]

  • Quantifying memorization across neural language models, arXiv 2022, [paper]

  • Stealing part of a production language model, arXiv 2024, [paper]

Interaction Data Inference

System Prompt Stealing

  • Ignore previous prompt: Attack techniques for language models, TSRML@NeurIPS 2022, [paper]

  • Prompt Stealing Attacks Against Text-to-Image Generation Models, USENIX Security 2024, [paper]

  • Safeguarding System Prompts for LLMs, arXiv 2024, [paper]

  • InputSnatch: Stealing Input in LLM Services via Timing Side-Channel Attacks, arXiv 2024, [paper]

  • Effective prompt extraction from language models, arXiv 2023, [paper]

  • Last one standing: A comparative analysis of security and privacy of soft prompt tuning, lora, and in-context learning, arXiv 2023, [paper]

  • LLM app store analysis: A vision and roadmap, ACM TOSEM 2024, [paper]

User Prompt Stealing

  • Prsa: Prompt reverse stealing attacks against large language models, arXiv 2024, [paper]

  • Prompt Leakage effect and defense strategies for multi-turn LLM interactions, arXiv 2024, [paper]

  • Investigating the prompt leakage effect and black-box defenses for multi-turn LLM interactions, arXiv 2024, [paper]

  • Why Are My Prompts Leaked? Unraveling Prompt Extraction Threats in Customized Large Language Models, arXiv 2024, [paper]

  • Pleak: Prompt leaking attacks against large language model applications, CCS 2024, [paper]

  • Stealing User Prompts from Mixture of Experts, arXiv 2024, [paper]

  • Extracting Prompts by Inverting LLM Outputs, arXiv 2024, [paper]

Threats on Non-Brain

Threats on LLM Non-Brains

Perception Safety Threats

Adversarial Attacks

Textual

  • An LLM can Fool Itself: A Prompt-Based Adversarial Attack, arXiv 2023, [paper]

  • Revisiting Character-level Adversarial Attacks for Language Models, ICML 2024, [paper]

  • Hard prompts made easy: Gradient-based discrete optimization for prompt tuning and discovery, NeurIPS 2024, [paper]

  • Universal and transferable adversarial attacks on aligned language models, arXiv 2023, [paper]

Visual

  • Image hijacks: Adversarial images can control generative models at runtime, arXiv 2023, [paper]

  • Image-based Multimodal Models as Intruders: Transferable Multimodal Attacks on Video-based MLLMs, arXiv 2025, [paper]

  • Dissecting Adversarial Robustness of Multimodal LM Agents, ICLR 2025, [paper]

  • Poltergeist: Acoustic adversarial machine learning against cameras and computer vision, IEEE S&P 2021, [paper]

Auditory

  • Inaudible adversarial perturbation: Manipulating the recognition of user speech in real time, arXiv 2023, [paper]

  • The Silent Manipulator: A Practical and Inaudible Backdoor Attack against Speech Recognition Systems, ACM Multimedia 2023, [paper]

  • Enrollment-stage backdoor attacks on speaker recognition systems via adversarial ultrasound, IEEE IoT Journal 2023, [paper]

  • Ultrabd: Backdoor attack against automatic speaker verification systems via adversarial ultrasound, ICPADS 2023, [paper]

  • DolphinAttack: Inaudible voice commands, CCS 2017, [paper]

Other Modality

  • A Survey on Adversarial Robustness of LiDAR-based Machine Learning Perception in Autonomous Vehicles, arXiv 2024, [paper]

  • Rocking drones with intentional sound noise on gyroscopic sensors, USENIX Security 2015, [paper]

  • Adversarial attacks on multi-agent communication, ICCV 2021, [paper]

  • GPS location spoofing attack detection for enhancing the security of autonomous vehicles, IEEE VTC-Fall 2021, [paper]

Misperception Issues

  • Grounding large language models in interactive environments with online reinforcement learning, ICML 2023, [paper]

  • Bias and fairness in large language models: A survey, Computational Linguistics 2024, [paper]

  • Domain generalization using causal matching, ICML 2021, [paper]

  • GEM: Glare or gloom, I can still see you—End-to-end multi-modal object detection, IEEE RA-L 2021, [paper]

  • NPHardEval: Dynamic benchmark on reasoning ability of large language models via complexity classes, arXiv 2023, [paper]

  • Modeling opinion misperception and the emergence of silence in online social system, PLOS ONE 2024, [paper]

  • Bridging the domain gap for multi-agent perception, ICRA 2023, [paper]

  • Cooperative and competitive biases for multi-agent reinforcement learning, arXiv 2021, [paper]

  • Model-agnostic multi-agent perception framework, ICRA 2023, [paper]

  • Mutual influence between language and perception in multi-agent communication games, PLOS Computational Biology 2022, [paper]

Action Safety Threats

Supply Chain Attack

  • A new era in LLM security: Exploring security concerns in real-world LLM-based systems, arXiv 2024, [paper]

  • Wipi: A new web threat for LLM-driven web agents, arXiv 2024, [paper]

  • Identifying the risks of LM agents with an LM-emulated sandbox, arXiv 2023, [paper]

  • Not what you've signed up for: Compromising real-world LLM-integrated applications with indirect prompt injection, AISec@CCS 2023, [paper]

  • Benchmarking indirect prompt injections in tool-integrated large language model agents, arXiv 2024, [paper]

Tool Use Risk

  • Identifying the risks of LM agents with an LM-emulated sandbox, arXiv 2023, [paper]

  • Toolsword: Unveiling safety issues of large language models in tool learning across three stages, arXiv 2024, [paper]

  • Benchmarking indirect prompt injections in tool-integrated large language model agents, arXiv 2024, [paper]

Agent Extrinsic Safety

Agent-Memory Interaction Threats

Retrieval Augmented Generation

  • Agentpoison: Red-teaming LLM agents via poisoning memory or knowledge bases, NeurIPS 2024, [paper]

  • ConfusedPilot: Confused deputy risks in RAG-based LLMs, arXiv 2024, [paper]

  • PoisonedRAG: Knowledge corruption attacks to retrieval-augmented generation of large language models, arXiv 2024, [paper]

  • Machine against the RAG: Jamming retrieval-augmented generation with blocker documents, arXiv 2024, [paper]

  • BadRAG: Identifying vulnerabilities in retrieval augmented generation of large language models, arXiv 2024, [paper]

  • TrojanRAG: Retrieval-Augmented Generation Can Be Backdoor Driver in Large Language Models, arXiv 2024, [paper]

  • Whispers in Grammars: Injecting Covert Backdoors to Compromise Dense Retrieval Systems, arXiv 2024, [paper]

Agent-Environment Interaction Threats

Physical Environment

  • Autonomous vehicles: Sophisticated attacks, safety issues, challenges, open topics, blockchain, and future directions, JCP 2023, [paper]

  • Engineering challenges ahead for robot teamwork in dynamic environments, Applied Sciences 2020, [paper]

  • On GPS spoofing of aerial platforms: a review of threats, challenges, methodologies, and future research directions, PeerJ Computer Science 2021, [paper]

  • Security and privacy in cyber-physical systems: A survey, IEEE Communications Surveys & Tutorials 2017, [paper]

  • Adversarial objects against LiDAR-based autonomous driving systems, arXiv 2019, [paper]

  • Learning to walk in the real world with minimal human effort, arXiv 2020, [paper]

  • Prioritizing Safeguarding Over Autonomy: Risks of LLM Agents for Science, arXiv 2024, [paper]

Digital Environment

  • A new era in LLM security: Exploring security concerns in real-world LLM-based systems, arXiv 2024, [paper]

  • Demystifying RCE vulnerabilities in LLM-integrated apps, CCS 2024, [paper]

  • Wipi: A new web threat for LLM-driven web agents, arXiv 2024, [paper]

  • Application of large language models to DDoS attack detection, SPCPS 2023, [paper]

  • Coercing LLMs to do and reveal (almost) anything, arXiv 2024, [paper]

  • Prioritizing Safeguarding Over Autonomy: Risks of LLM Agents for Science, arXiv 2024, [paper]

  • EIA: Environmental Injection Attack on Generalist Web Agents for Privacy Leakage, arXiv 2024, [paper]

  • AdvWeb: Controllable Black-Box Attacks on VLM-Powered Web Agents, arXiv 2024, [paper]

  • AGrail: A Lifelong Agent Guardrail with Effective and Adaptive Safety Detection, arXiv 2025, [paper]

Agent-Agent Interaction Threats

Competitive Interactions

  • Multi-Agent Risks from Advanced AI, arXiv 2025, [paper]

  • Hoodwinked: Deception and cooperation in a text-based game for language models, arXiv 2023, [paper]

  • Attacking deep reinforcement learning with decoupled adversarial policy, IEEE TDSC 2022, [paper]

  • Secure consensus of multi-agent systems under denial-of-service attacks, Asian Journal of Control 2023, [paper]

  • A Perfect Collusion Benchmark: How can AI agents be prevented from colluding with information-theoretic undetectability?, Multi-Agent Security Workshop @ NeurIPS 2023, [paper]

Cooperative Interactions

  • On the risk of misinformation pollution with large language models, arXiv 2023, [paper]

  • Agent Smith: A single image can jailbreak one million multimodal LLM agents exponentially fast, arXiv 2024, [paper]
