Multimodal Agentic AI Scientists

A curated list of papers on Agentic Multimodal Large Language Models (MLLMs) for Scientific Discovery

🚀 Join us in building the AI for Science community! Know a great paper we missed? Open an issue — together, let's accelerate scientific discovery with AI!

This repository accompanies our survey paper: "Exploring Agentic Multimodal Large Language Models: A Survey for AIScientists"

What is an AIScientist?

AIScientists are autonomous agents powered by multimodal large language models (MLLMs) that can understand papers, generate hypotheses, plan and conduct experiments, analyze results, and draft manuscripts across the scientific research lifecycle. Recent systems span open-ended AI research (Lu et al., 2024; Lu et al., 2026), biomedical hypothesis generation (Gottweis et al., 2026), automated biology discovery (Ghareeb et al., 2026), and empirical software generation (Aygün et al., 2026). This survey summarizes a complete pipeline for developing multimodal agentic AIScientists, with representative studies spanning 10 scientific domains.

Comparison with Related Surveys

Prior surveys examine scientific AI agents by workflow stages, autonomy levels, domain resources, or automation-to-autonomy transitions. Our survey adds a pipeline-oriented view across modalities, agent training, inference-time methods, benchmarks, and human-AI collaboration, clarifying how multimodal scientific agents are built, where costs arise, and which human checkpoints remain necessary.

| Paper | Taxonomy | Ag. | DM. | Method | HCI | Ben. | #Dom. |
|---|---|---|---|---|---|---|---|
| Zhang et al. (2024) | Domain | ✗ | Seq.+ | Train. only | ✗ | ✓ | 6 |
| Gridach et al. (2025) | Research Workflow | ✓ | ✗ | Infer. only | ✓ | ✗ | 4 |
| Luo et al. (2025) | Research Workflow | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ |
| Zhang et al. (2025) | Research Workflow | ✗ | Seq.+ | ✗ | ✗ | ✗ | ✗ |
| Ren et al. (2025) | Agent Composition | ✓ | ✗ | Train. & Infer. | ✗ | ✓ | 6+ |
| Wei et al. (2025) | Auto. & Domain | ✓ | ✗ | Infer. only | ✗ | ✓ | 4 |
| Hu et al. (2025) | Data & Domain | ✓ | ✓ | ✗ | ✗ | ✓ | 6+ |
| Zheng et al. (2025) | Research Workflow & Auto. | ✓ | ✗ | Infer. only | ✓ | ✓ | 6+ |
| Zhou et al. (2025) | Research Workflow | ✓ | ✗ | Infer. only | ✓ | ✓ | 6+ |
| Ours | ML & Research Pipeline | ✓ | ✓ | Train. & Infer. | ✓ | ✓ | 10 |

Ag. = Agentic AI; DM. = Data Modality; HCI = Human-Computer Interaction; Ben. = Benchmark; #Dom. = Number of domains; Seq.+ = Sequence and more modalities; Train. = Agent Training; Infer. = Agent Inference; Auto. = Autonomy Level

Ours: An End-to-End Developer Pipeline

Overview of our framework: the pipeline starts from diverse input and output modalities, proceeds through agent training and inference methods, and ends with evaluation benchmarks, with human-AI collaboration integrated at every stage.

⚙️ Methods for Scientific MLLM Agents

Scientific MLLM agents need more than generic instruction following: they must learn domain representations, call tools, ground decisions in evidence, and recover when execution contradicts the plan.
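
To make the requirements above concrete, here is a minimal Python sketch of a plan executor that calls tools, records each result as evidence, and recovers when a step fails. It is an illustration under stated assumptions, not code from the survey or from any listed system: the `molecular_weight` tool, the `TOOLS` registry, the `AgentTrace` record, and the `fallback` mapping are all hypothetical names, and a real agent would ask a model to replan rather than consult a fixed table.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


def molecular_weight(formula: str) -> float:
    """Toy lookup standing in for a real chemistry tool (hypothetical)."""
    table = {"H2O": 18.015, "CO2": 44.009}
    if formula not in table:
        raise ValueError(f"unknown formula: {formula}")
    return table[formula]


# Hypothetical tool registry: maps a tool name to a callable.
TOOLS: Dict[str, Callable[[str], float]] = {"molecular_weight": molecular_weight}


@dataclass
class AgentTrace:
    # Each evidence entry is (tool name, argument actually used, result).
    evidence: List[Tuple[str, str, float]] = field(default_factory=list)
    # Error messages from steps whose first attempt failed.
    failures: List[str] = field(default_factory=list)


def run_plan(plan: List[Tuple[str, str]], fallback: Dict[str, str]) -> AgentTrace:
    """Run (tool, argument) steps; on failure, retry once with a fallback argument."""
    trace = AgentTrace()
    for tool_name, arg in plan:
        tool = TOOLS[tool_name]
        try:
            result = tool(arg)
        except ValueError as err:
            trace.failures.append(str(err))
            if arg not in fallback:
                continue  # no recovery available, skip this step
            arg = fallback[arg]
            result = tool(arg)
        trace.evidence.append((tool_name, arg, result))
    return trace


trace = run_plan(
    plan=[("molecular_weight", "H2O"), ("molecular_weight", "water")],
    fallback={"water": "H2O"},
)
print(trace.evidence)
print(trace.failures)
```

Keeping evidence and failures in one trace means a later step, or a human reviewer, can check which tool output supports each decision and where the original plan had to change.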

🏋️ Agent Training

🚀 Agent Inference

Knowledge Grounding: RAG, Knowledge Graphs & ICL

ClinicalRAG: Enhancing Clinical Decision Support Through Heterogeneous Knowledge Retrieval (2024) - Lu et al.
AutoProteinEngine: A Large Language Model Driven Agent Framework for Multimodal AutoML in Protein Engineering (2025) - Liu et al.
ESCARGOT: An AI Agent Leveraging Large Language Models, Dynamic Graph of Thoughts, and Biomedical Knowledge Graphs for Enhanced Reasoning (2025) - Matsumoto et al.
A Framework for Autonomous AI-Driven Drug Discovery (2024) - Selinger et al.
Automating AI Discovery for Biomedicine Through Knowledge Graphs and LLM Agents (2025) - Aamer et al.
BioImage.IO Chatbot: A Community-Driven AI Assistant for Integrative Computational Bioimaging (2024) - Lei et al.
A Survey on In-Context Learning (2024) - Dong et al.
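
As a rough illustration of the retrieval-augmented generation (RAG) pattern this subsection is named for, the sketch below retrieves the passage most similar to a question and places it in the prompt as numbered evidence. It assumes a toy bag-of-words cosine similarity in place of a learned retriever, and the corpus sentences, `retrieve`, and `build_prompt` are hypothetical; it does not reproduce the method of any paper listed above.

```python
import math
import re
from collections import Counter
from typing import List, Tuple


def tokenize(text: str) -> List[str]:
    """Lowercase the text and keep alphanumeric runs as tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, corpus: List[str], k: int = 2) -> List[Tuple[float, str]]:
    """Return the k passages with the highest similarity to the query."""
    q = Counter(tokenize(query))
    scored = [(cosine(q, Counter(tokenize(doc))), doc) for doc in corpus]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)[:k]


def build_prompt(query: str, passages: List[Tuple[float, str]]) -> str:
    """Put the retrieved passages in front of the question as numbered evidence."""
    context = "\n".join(f"[{i + 1}] {doc}" for i, (_, doc) in enumerate(passages))
    return f"Answer using only the numbered evidence.\n{context}\nQuestion: {query}"


# Illustrative corpus written for this sketch.
CORPUS = [
    "Aspirin inhibits cyclooxygenase enzymes.",
    "CRISPR systems cut DNA at guide-specified sites.",
    "Atomic force microscopy measures surface topography.",
]

query = "Which enzymes does aspirin inhibit?"
top = retrieve(query, CORPUS, k=1)
prompt = build_prompt(query, top)
print(prompt)
```

Numbering the passages lets the model's answer cite evidence by index, which makes unsupported claims easier to spot.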

Planning, Tool Use & Workflow Control

ScienceAgentBench: Toward Rigorous Assessment of Language Agents for Data-Driven Scientific Discovery (2025) - Chen et al.
Model Context Protocol (MCP): Landscape, Security Threats, and Future Research Directions (2025) - Hou et al.
Democratizing AI Scientists Using ToolUniverse (2025) - Gao et al.
Biomni: A General-Purpose Biomedical AI Agent (2025) - Huang et al.
MedRAX: Medical Reasoning Agent for Chest X-ray (2025) - Fallahpour et al.
CRISPR-GPT: An LLM Agent for Automated Design of Gene-Editing Experiments (2024) - Huang et al.
CACTUS: Chemistry Agent Connecting Tool Usage to Science (2024) - McNaughton et al.
Augmenting Large Language Models with Chemistry Tools (2024) - Bran et al.
Omega: Harnessing the Power of Large Language Models for Bioimage Analysis (2024) - Royer
MT-Mol: Multi Agent System with Tool-Based Reasoning for Molecular Optimization (2025) - Kim et al.

Full-Loop & Self-Correcting Agents

The AI Scientist-v2: Workshop-Level Automated Scientific Discovery via Agentic Tree Search (2025) - Yamada et al.
Towards End-to-End Automation of AI Research (2026) - Lu et al.
Accelerating Scientific Discovery with Co-Scientist (2026) - Gottweis et al.
A Multi-Agent System for Automating Scientific Discovery (2026) - Ghareeb et al.
An AI System to Help Scientists Write Expert-Level Empirical Software (2026) - Aygün et al.
Reflexion: Language Agents with Verbal Reinforcement Learning (2023) - Shinn et al.
ArgMed-Agents: Explainable Clinical Decision Reasoning with LLM Discussion via Argumentation Schemes (2024) - Hong et al.
GeneAgent: Self-Verification Language Agent for Gene Set Knowledge Discovery Using Domain Databases (2024) - Wang et al.
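
A self-correcting loop can be sketched in a few lines: an actor attempts the task, an evaluator checks the attempt, and textual feedback from failed trials is passed back to the actor on the next try. The sketch below is loosely in the spirit of verbal feedback as named in the Reflexion title, but it is an assumption-laden toy rather than that paper's algorithm. `reflexion_loop`, `toy_actor`, and `toy_evaluator` are hypothetical, and in practice both roles would be played by model calls or domain verifiers.

```python
from typing import Callable, List, Optional, Tuple


def reflexion_loop(
    actor: Callable[[str, List[str]], str],
    evaluator: Callable[[str], Tuple[bool, str]],
    task: str,
    max_trials: int = 3,
) -> Tuple[Optional[str], List[str]]:
    """Retry a task, feeding textual feedback from failed trials back to the actor."""
    reflections: List[str] = []
    for _ in range(max_trials):
        attempt = actor(task, reflections)
        ok, feedback = evaluator(attempt)
        if ok:
            return attempt, reflections
        reflections.append(feedback)  # remember what went wrong
    return None, reflections  # budget exhausted without a passing attempt


def toy_actor(task: str, reflections: List[str]) -> str:
    """Hypothetical actor: omits the unit until a reflection asks for one."""
    if any("unit" in r for r in reflections):
        return "concentration = 5 mM"
    return "concentration = 5"


def toy_evaluator(attempt: str) -> Tuple[bool, str]:
    """Hypothetical verifier: accepts only answers reported in mM."""
    if attempt.endswith("mM"):
        return True, ""
    return False, "Previous answer lacked a unit; report the value in mM."


answer, memory = reflexion_loop(
    toy_actor, toy_evaluator, "Report the buffer concentration."
)
print(answer)
print(memory)
```

The trial budget (`max_trials`) bounds cost, and returning the accumulated reflections gives a human checkpoint something concrete to review when the loop gives up.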

🤝 Multi-Agent Systems

Multi-Agent Collaboration Mechanisms: A Survey of LLMs (2025) - Tran et al.
ProtAgents: Protein Discovery via Large Language Model Multi-Agent Collaborations Combining Physics and Machine Learning (2024) - Ghafarollahi & Buehler
Automating Alloy Design and Discovery with Physics-Aware Multimodal Multiagent AI (2025) - Ghafarollahi et al.
TriageAgent: Towards Better Multi-Agents Collaborations for Large Language Model-Based Clinical Triage (2024) - Lu et al.
MedAgents: Large Language Models as Collaborators for Zero-Shot Medical Reasoning (2024) - Tang et al.
MDAgents: An Adaptive Collaboration of LLMs for Medical Decision-Making (2024) - Kim et al.
ColaCare: Enhancing Electronic Health Record Modeling Through Large Language Model-Driven Multi-Agent Collaboration (2025) - Wang et al.
DrugAgent: Automating AI-Aided Drug Discovery Programming Through LLM Multi-Agent Collaboration (2024) - Liu et al.
Synthetic Arabic Medical Dialogues Using Advanced Multi-Agent LLM Techniques (2024) - ALMutairi et al.
Advancing Healthcare Automation: Multi-Agent System for Medical Necessity Justification (2024) - Pandey et al.

📈 Benchmarks & Evaluation

Learn to Explain: Multimodal Reasoning via Thought Chains for Science Question Answering (2022) - Lu et al.
DiscoveryWorld: A Virtual Environment for Developing and Evaluating Automated Scientific Discovery Agents (2024) - Jansen et al.
ScienceAgentBench: Toward Rigorous Assessment of Language Agents for Data-Driven Scientific Discovery (2025) - Chen et al.
Collaborative Gym: A Framework for Enabling and Evaluating Human-Agent Collaboration (2025) - Shao et al.
HypoBench: Towards Systematic and Principled Benchmarking for Hypothesis Generation (2025) - Liu et al.
ResearchBench: Benchmarking LLMs in Scientific Discovery via Inspiration-Based Task Decomposition (2025) - Liu et al.
Can LLMs Generate Novel Research Ideas? A Large-Scale Human Study with 100+ NLP Researchers (2024) - Si et al.
Automated Hypothesis Validation with Agentic Sequential Falsifications (2025) - Huang et al.
Detecting Hallucinations in Large Language Models Using Semantic Entropy (2024) - Farquhar et al.
LLM Hallucinations in the Wild: Large-Scale Evidence from Non-Existent Citations (2026) - Zhao et al.
Evaluating Large Language Model Agents for Automation of Atomic Force Microscopy (2025) - Mandal et al.
How to Detect and Defeat Molecular Mirage: A Metric-Driven Benchmark for Hallucination in LLM-Based Molecular Comprehension (2025) - Li et al.
Detecting and Evaluating Medical Hallucinations in Large Vision Language Models (2024) - Chen et al.
LitLLMs, LLMs for Literature Review: Are We There Yet? (2025) - Agarwal et al.
MAPS: A Multi-Agent Framework Based on Big Seven Personality and Socratic Guidance for Multimodal Scientific Problem Solving (2025) - Zhang et al.

🧑‍🔬 Human-AI Collaboration

Exploring Collaboration Patterns and Strategies in Human-AI Co-Creation Through the Lens of Agency (2025) - Zhang et al.
AI-Researcher: Autonomous Scientific Innovation (2025) - Tang et al.
Automated Statistical Model Discovery with Language Models (2024) - Li et al.
Collaborative Gym: A Framework for Enabling and Evaluating Human-Agent Collaboration (2025) - Shao et al.
The Virtual Lab of AI Agents Designs New SARS-CoV-2 Nanobodies (2025) - Swanson et al.
FutureHouse Platform: Superintelligent AI Agents for Scientific Discovery (2025) - Skarlinski et al.
ResearchAgent: Iterative Research Idea Generation over Scientific Literature with Large Language Models (2025) - Baek et al.
Agent Laboratory: Using LLM Agents as Research Assistants (2025) - Schmidgall et al.
An AI Agent for Fully Automated Multi-Omic Analyses (2024) - Zhou et al.
DrugAgent: Automating AI-Aided Drug Discovery Programming Through LLM Multi-Agent Collaboration (2024) - Liu et al.
Toward a Team of AI-Made Scientists for Scientific Discovery from Gene Expression Data (2024) - Liu et al.
CodeScientist: End-to-End Semi-Automated Scientific Discovery with Code-Based Experimentation (2025) - Jansen et al.
The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery (2024) - Lu et al.
Accelerating Scientific Discovery with Co-Scientist (2026) - Gottweis et al.
A Multi-Agent System for Automating Scientific Discovery (2026) - Ghareeb et al.
Localization, Inspection, and Reasoning Module for Autonomous Workflows in Self-Driving Laboratories (2025) - Zhou et al.
Evaluating Large Language Model Agents for Automation of Atomic Force Microscopy (2025) - Mandal et al.

License

This project is licensed under the MIT License - see the LICENSE file for details.
