A curated list of papers on Agentic Multimodal Large Language Models (MLLMs) for Scientific Discovery
🚀 Join us in building the AI for Science community! Know a great paper we missed? Open an issue, and together let's accelerate scientific discovery with AI!
This repository accompanies our survey paper: "Exploring Agentic Multimodal Large Language Models: A Survey for AIScientists"
AIScientists are autonomous agents powered by multimodal large language models (MLLMs) that can understand papers, generate hypotheses, plan and conduct experiments, analyze results, and draft manuscripts across the scientific research lifecycle. Recent systems span open-ended AI research (Lu et al., 2024; Lu et al., 2026), biomedical hypothesis generation (Gottweis et al., 2026), automated biology discovery (Ghareeb et al., 2026), and empirical software generation (Aygün et al., 2026). This survey summarizes a complete pipeline for developing multimodal agentic AIScientists, with representative studies spanning 10 scientific domains.
Prior surveys examine scientific AI agents by workflow stages, autonomy levels, domain resources, or automation-to-autonomy transitions. Our survey adds a pipeline-oriented view across modalities, agent training, inference-time methods, benchmarks, and human-AI collaboration, clarifying how multimodal scientific agents are built, where costs arise, and which human checkpoints remain necessary.
| Paper | Taxonomy | Ag. | DM. | Method | HCI | Ben. | #Dom. |
|---|---|---|---|---|---|---|---|
| Zhang et al. (2024) | Domain | ✗ | Seq.+ | Train. only | ✗ | ✓ | 6 |
| Gridach et al. (2025) | Research Workflow | ✓ | ✗ | Infer. only | ✓ | ✗ | 4 |
| Luo et al. (2025) | Research Workflow | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ |
| Zhang et al. (2025) | Research Workflow | ✗ | Seq.+ | ✗ | ✗ | ✗ | ✗ |
| Ren et al. (2025) | Agent Composition | ✓ | ✗ | Train. & Infer. | ✗ | ✓ | 6+ |
| Wei et al. (2025) | Auto. & Domain | ✓ | ✗ | Infer. only | ✗ | ✓ | 4 |
| Hu et al. (2025) | Data & Domain | ✓ | ✓ | ✗ | ✗ | ✓ | 6+ |
| Zheng et al. (2025) | Research Workflow & Auto. | ✓ | ✗ | Infer. only | ✓ | ✓ | 6+ |
| Zhou et al. (2025) | Research Workflow | ✓ | ✗ | Infer. only | ✓ | ✓ | 6+ |
| Ours | ML & Research Pipeline | ✓ | ✓ | Train. & Infer. | ✓ | ✓ | 10 |
Ag. = Agentic AI; DM. = Data Modality; HCI = Human-Computer Interaction; Ben. = Benchmark; #Dom. = Number of domains; Seq.+ = Sequence and more modalities; Train. = Agent Training; Infer. = Agent Inference; Auto. = Autonomy Level
Overview of our framework: the pipeline starts from diverse Input & Output modalities, moves through Agent Training and Inference methods, and ends with Evaluation benchmarks, with Human-AI Collaboration integrated at every stage.
Scientific MLLM agents need more than generic instruction following: they must learn domain representations, call tools, ground decisions in evidence, and recover when execution contradicts the plan.
- SciBERT: A Pretrained Language Model for Scientific Text (2019) - Beltagy et al.
- Gorilla: Large Language Model Connected with Massive APIs (2023) - Patil et al.
- ToolLLM: Facilitating Large Language Models to Master 16000+ Real-world APIs (2024) - Qin et al.
- A Multimodal Conversational Agent for DNA, RNA and Protein Tasks (2025) - de Almeida et al.
- TxGemma: Efficient and Agentic LLMs for Therapeutics (2025) - Wang et al.
- ProtAgents: Protein Discovery via Large Language Model Multi-Agent Collaborations Combining Physics and Machine Learning (2024) - Ghafarollahi & Buehler
- Training Language Models to Follow Instructions with Human Feedback (2022) - Ouyang et al.
- Direct Preference Optimization: Your Language Model is Secretly a Reward Model (2024) - Rafailov et al.
- ResearchAgent: Iterative Research Idea Generation over Scientific Literature with Large Language Models (2025) - Baek et al.
- Agent Hospital: A Simulacrum of Hospital with Evolvable Medical Agents (2024) - Li et al.
- Automating Alloy Design and Discovery with Physics-Aware Multimodal Multiagent AI (2025) - Ghafarollahi et al.
- AI Achieves Silver-Medal Standard Solving International Mathematical Olympiad Problems (2024) - AlphaProof & AlphaGeometry teams
- Generation of Rational Drug-like Molecular Structures Through a Multiple-Objective Reinforcement Learning Framework (2025) - Zhang et al.
- DrugCLIP: Contrastive Protein-Molecule Representation Learning for Virtual Screening (2023) - Gao et al.
- Triplet Contrastive Learning Framework with Adversarial Hard-Negative Sample Generation for Multimodal Remote Sensing Images (2024) - Chen et al.
- Generating Mutants of Monotone Affinity Towards Stronger Protein Complexes Through Adversarial Learning (2024) - Lan et al.
- Drug Repositioning Based on Residual Attention Network and Free Multiscale Adversarial Training (2024) - Li et al.
- EPIPDLF: A Pretrained Deep Learning Framework for Predicting Enhancer-Promoter Interactions (2025) - Xiao et al.
- Improved Techniques for Training GANs (2016) - Salimans et al.
- ClinicalRAG: Enhancing Clinical Decision Support Through Heterogeneous Knowledge Retrieval (2024) - Lu et al.
- AutoProteinEngine: A Large Language Model Driven Agent Framework for Multimodal AutoML in Protein Engineering (2025) - Liu et al.
- ESCARGOT: An AI Agent Leveraging Large Language Models, Dynamic Graph of Thoughts, and Biomedical Knowledge Graphs for Enhanced Reasoning (2025) - Matsumoto et al.
- A Framework for Autonomous AI-Driven Drug Discovery (2024) - Selinger et al.
- Automating AI Discovery for Biomedicine Through Knowledge Graphs and LLM Agents (2025) - Aamer et al.
- BioImage.IO Chatbot: A Community-Driven AI Assistant for Integrative Computational Bioimaging (2024) - Lei et al.
- A Survey on In-Context Learning (2024) - Dong et al.
- ScienceAgentBench: Toward Rigorous Assessment of Language Agents for Data-Driven Scientific Discovery (2025) - Chen et al.
- Model Context Protocol (MCP): Landscape, Security Threats, and Future Research Directions (2025) - Hou et al.
- Democratizing AI Scientists Using ToolUniverse (2025) - Gao et al.
- Biomni: A General-Purpose Biomedical AI Agent (2025) - Huang et al.
- MedRAX: Medical Reasoning Agent for Chest X-ray (2025) - Fallahpour et al.
- CRISPR-GPT: An LLM Agent for Automated Design of Gene-Editing Experiments (2024) - Huang et al.
- CACTUS: Chemistry Agent Connecting Tool Usage to Science (2024) - McNaughton et al.
- Augmenting Large Language Models with Chemistry Tools (2024) - Bran et al.
- Omega: Harnessing the Power of Large Language Models for Bioimage Analysis (2024) - Royer
- MT-Mol: Multi Agent System with Tool-Based Reasoning for Molecular Optimization (2025) - Kim et al.
- The AI Scientist-v2: Workshop-Level Automated Scientific Discovery via Agentic Tree Search (2025) - Yamada et al.
- Towards End-to-End Automation of AI Research (2026) - Lu et al.
- Accelerating Scientific Discovery with Co-Scientist (2026) - Gottweis et al.
- A Multi-Agent System for Automating Scientific Discovery (2026) - Ghareeb et al.
- An AI System to Help Scientists Write Expert-Level Empirical Software (2026) - Aygün et al.
- Reflexion: Language Agents with Verbal Reinforcement Learning (2023) - Shinn et al.
- ArgMed-Agents: Explainable Clinical Decision Reasoning with LLM Discussion via Argumentation Schemes (2024) - Hong et al.
- GeneAgent: Self-Verification Language Agent for Gene Set Knowledge Discovery Using Domain Databases (2024) - Wang et al.
- Multi-Agent Collaboration Mechanisms: A Survey of LLMs (2025) - Tran et al.
- ProtAgents: Protein Discovery via Large Language Model Multi-Agent Collaborations Combining Physics and Machine Learning (2024) - Ghafarollahi & Buehler
- Automating Alloy Design and Discovery with Physics-Aware Multimodal Multiagent AI (2025) - Ghafarollahi et al.
- TriageAgent: Towards Better Multi-Agents Collaborations for Large Language Model-Based Clinical Triage (2024) - Lu et al.
- MedAgents: Large Language Models as Collaborators for Zero-Shot Medical Reasoning (2024) - Tang et al.
- MDAgents: An Adaptive Collaboration of LLMs for Medical Decision-Making (2024) - Kim et al.
- ColaCare: Enhancing Electronic Health Record Modeling Through Large Language Model-Driven Multi-Agent Collaboration (2025) - Wang et al.
- DrugAgent: Automating AI-Aided Drug Discovery Programming Through LLM Multi-Agent Collaboration (2024) - Liu et al.
- Synthetic Arabic Medical Dialogues Using Advanced Multi-Agent LLM Techniques (2024) - ALMutairi et al.
- Advancing Healthcare Automation: Multi-Agent System for Medical Necessity Justification (2024) - Pandey et al.
- Learn to Explain: Multimodal Reasoning via Thought Chains for Science Question Answering (2022) - Lu et al.
- DiscoveryWorld: A Virtual Environment for Developing and Evaluating Automated Scientific Discovery Agents (2024) - Jansen et al.
- ScienceAgentBench: Toward Rigorous Assessment of Language Agents for Data-Driven Scientific Discovery (2025) - Chen et al.
- Collaborative Gym: A Framework for Enabling and Evaluating Human-Agent Collaboration (2025) - Shao et al.
- HypoBench: Towards Systematic and Principled Benchmarking for Hypothesis Generation (2025) - Liu et al.
- ResearchBench: Benchmarking LLMs in Scientific Discovery via Inspiration-Based Task Decomposition (2025) - Liu et al.
- Can LLMs Generate Novel Research Ideas? A Large-Scale Human Study with 100+ NLP Researchers (2024) - Si et al.
- Automated Hypothesis Validation with Agentic Sequential Falsifications (2025) - Huang et al.
- Detecting Hallucinations in Large Language Models Using Semantic Entropy (2024) - Farquhar et al.
- LLM Hallucinations in the Wild: Large-Scale Evidence from Non-Existent Citations (2026) - Zhao et al.
- Evaluating Large Language Model Agents for Automation of Atomic Force Microscopy (2025) - Mandal et al.
- How to Detect and Defeat Molecular Mirage: A Metric-Driven Benchmark for Hallucination in LLM-Based Molecular Comprehension (2025) - Li et al.
- Detecting and Evaluating Medical Hallucinations in Large Vision Language Models (2024) - Chen et al.
- LitLLMs, LLMs for Literature Review: Are We There Yet? (2025) - Agarwal et al.
- MAPS: A Multi-Agent Framework Based on Big Seven Personality and Socratic Guidance for Multimodal Scientific Problem Solving (2025) - Zhang et al.
- Exploring Collaboration Patterns and Strategies in Human-AI Co-Creation Through the Lens of Agency (2025) - Zhang et al.
- AI-Researcher: Autonomous Scientific Innovation (2025) - Tang et al.
- Automated Statistical Model Discovery with Language Models (2024) - Li et al.
- Collaborative Gym: A Framework for Enabling and Evaluating Human-Agent Collaboration (2025) - Shao et al.
- The Virtual Lab of AI Agents Designs New SARS-CoV-2 Nanobodies (2025) - Swanson et al.
- FutureHouse Platform: Superintelligent AI Agents for Scientific Discovery (2025) - Skarlinski et al.
- ResearchAgent: Iterative Research Idea Generation over Scientific Literature with Large Language Models (2025) - Baek et al.
- Agent Laboratory: Using LLM Agents as Research Assistants (2025) - Schmidgall et al.
- An AI Agent for Fully Automated Multi-Omic Analyses (2024) - Zhou et al.
- DrugAgent: Automating AI-Aided Drug Discovery Programming Through LLM Multi-Agent Collaboration (2024) - Liu et al.
- Toward a Team of AI-Made Scientists for Scientific Discovery from Gene Expression Data (2024) - Liu et al.
- CodeScientist: End-to-End Semi-Automated Scientific Discovery with Code-Based Experimentation (2025) - Jansen et al.
- The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery (2024) - Lu et al.
- Accelerating Scientific Discovery with Co-Scientist (2026) - Gottweis et al.
- A Multi-Agent System for Automating Scientific Discovery (2026) - Ghareeb et al.
- Localization, Inspection, and Reasoning Module for Autonomous Workflows in Self-Driving Laboratories (2025) - Zhou et al.
- Evaluating Large Language Model Agents for Automation of Atomic Force Microscopy (2025) - Mandal et al.
This project is licensed under the MIT License - see the LICENSE file for details.

