xqz614/Awesome-Agentic-Clinical-Dialogue

Welcome to Awesome-Agentic-Clinical-Dialogue. This repo includes papers about methods related to agentic clinical dialogue. We believe that the agentic paradigm is still a largely unexplored area, and we hope this repository will provide you with some valuable insights!
Read our survey paper here: Reinventing Clinical Dialogue: Agentic Paradigms for LLM‑Enabled Healthcare Communication
Courses & Tutorial (🤟Check it out!) | Papers | Datasets | Leading Group

📘 Overview

This framework facilitates a systematic analysis of the intrinsic trade-offs between creativity and reliability by categorizing methods into four archetypes: Latent Space Clinicians, Emergent Planners, Grounded Synthesizers, and Verifiable Workflow Automators. For each paradigm, we deconstruct the technical realization across the entire cognitive pipeline, encompassing strategic planning, memory management, action execution, collaboration, and evolution, to reveal how distinct architectural choices balance the tension between autonomy and safety. Furthermore, we bridge abstract design philosophies with the pragmatic implementation ecosystem. By mapping real-world applications to our taxonomy and systematically reviewing benchmarks and evaluation metrics specific to clinical agents, we provide a comprehensive reference for future development.

📁 Table of Contents

🔑 Key Categories

  • 🤖Latent Space Clinicians (LSC). These agents leverage the LLM's vast internal knowledge for creative synthesis and forming a coherent understanding of a clinical situation. Their philosophy is to trust the model's emergent reasoning capabilities to function like an experienced clinical assistant providing insights. For example, the zero/few-shot reasoning capabilities of Med-PaLM or MedAgents exemplify this paradigm.
  • 🤖Emergent Planners (EP). This paradigm grants the LLM a high degree of autonomy, allowing it to dynamically devise its own multi-step plan to achieve a complex clinical goal. The agent's behavior is emergent, as it independently determines the necessary steps and goals. Frameworks like AgentMD, which uses ReAct-style prompting, exemplify this paradigm.
  • 🤖Grounded Synthesizers (GS). These agents operate under the principle that LLMs should function as powerful natural language interfaces to reliable external information rather than as knowledge creators. Their primary role is to retrieve, integrate, and accurately summarize information from verifiable sources like medical databases or imaging data. Exemplars include foundational medical retrieval and indexing techniques such as Med-RAG and MA-COIR.
  • 🤖Verifiable Workflow Automators (VWA). In this paradigm, agent autonomy is strictly constrained within pre-defined, verifiable clinical workflows or decision trees. The LLM acts as a natural language front-end to a structured process, executing tasks rather than making open-ended decisions, which ensures maximum safety and predictability. This approach is exemplified by commercial triage bots, the structured conversational framework of systems like Google's AMIE, and principles from classic task-oriented dialogue systems such as MeDi-TODER.
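
The four archetypes above can be written down as a small lookup structure, which is handy when tagging papers or systems against the taxonomy. The sketch below is only an illustration: the `Paradigm`, `Profile`, and `paradigm_of` names are hypothetical, and the role strings are condensed from the descriptions in this list rather than taken from the survey's formal definitions.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Tuple


class Paradigm(Enum):
    LSC = "Latent Space Clinicians"
    EP = "Emergent Planners"
    GS = "Grounded Synthesizers"
    VWA = "Verifiable Workflow Automators"


@dataclass(frozen=True)
class Profile:
    role: str                  # one-line summary condensed from the list above
    examples: Tuple[str, ...]  # systems the list names as exemplars


# Hypothetical mapping; the example systems are the ones named in Key Categories.
PROFILES = {
    Paradigm.LSC: Profile("creative synthesis from internal model knowledge",
                          ("Med-PaLM", "MedAgents")),
    Paradigm.EP: Profile("self-devised multi-step plans with high autonomy",
                         ("AgentMD",)),
    Paradigm.GS: Profile("retrieve, integrate, and summarize external sources",
                         ("Med-RAG", "MA-COIR")),
    Paradigm.VWA: Profile("execute tasks inside pre-defined clinical workflows",
                          ("AMIE", "MeDi-TODER")),
}


def paradigm_of(system: str) -> Optional[Paradigm]:
    """Return the paradigm whose exemplar list contains `system`, if any."""
    for paradigm, profile in PROFILES.items():
        if system in profile.examples:
            return paradigm
    return None
```

For example, `paradigm_of("AgentMD")` returns `Paradigm.EP`, while a system not named in the list returns `None`.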

✳️Start with Awesome Dataset

Back to Content

I. QA Dialogue

Back to Content

| Dataset Name | Time (Pub) | Downstream Task | Brief Description | Source |
|---|---|---|---|---|
| MedQA | 2020 | Medical Examination (QA) | Large-scale multiple-choice questions collected from professional medical board exams (USMLE, Mainland China, Taiwan). | Paper, Github |
| MedMCQA | 2022 | Medical Examination (QA) | Large-scale, multiple-choice QA dataset derived from Indian medical entrance examinations (AIIMS/NEET). | Paper, Github |
| cMedQA2 | 2019 | QA / Retrieval | Chinese medical QA dataset with queries and answers from online health counseling platforms. | Paper, Github |
| CMExam | 2023 | Medical Examination (QA) | 60K+ multiple-choice questions from the Chinese National Medical Licensing Examination with detailed annotations. | Paper, Github |
| Medbullets | 2024 | Medical Examination (QA) | High-quality USMLE Step 2 & 3 style questions with expert-written explanations for reasoning evaluation. | Paper, Github |
| HeadQA | 2019 | Medical Examination (QA) | Multiple-choice questions from Spanish healthcare exams (MIR, EIR, etc.) for testing complex reasoning. | Paper, Github |
| MedCalc-Bench | 2024 | QA / Calculation | A dataset focusing on the "computational" aspect of medicine (formulas, risk scores) which LLMs typically struggle with. | Paper, Github |
| RJUA-MedDQA | 2024 | Multimodal QA | A benchmark for document-level medical reasoning, requiring models to interpret text, tables, and images in reports. | Paper, Github |
| TeleQnA | 2024 | Telemedicine QA | Real-world style doctor-patient QA benchmark aimed at evaluating LLMs in telemedicine scenarios. | Paper, Github |
| CareQA | 2025 | Medical Examination | Sourced from Spanish specialized healthcare exams (MIR), offering both closed-ended and open-ended evaluation formats. | Paper, Github |
| Huatuo-26M | 2023 | Medical Examination / QA | Massive Chinese medical QA dataset with 26 million QA pairs, used for pre-training and retrieval. | Paper, Github |
| MultiMedQA | 2022 | QA / Evaluation | The benchmark suite used for Med-PaLM, including HealthSearchQA, LiveQA, and others for safety & accuracy. | Paper, Github |
| CasiMedicos-Arg | 2024 | Medical Examination (QA) | Multilingual dataset (ES, EN, FR, IT) annotated with explanatory argumentative structures for clinical cases. | Paper, Github |
| PubMedQA | 2019 | Literature-based QA | Biomedical QA task to answer "yes/no/maybe" from PubMed abstracts. | Paper, Github |
| CliCR | 2018 | Literature-based QA | Dataset of clinical case reports designed for machine reading comprehension. | Paper, Github |
| MEDIQA-2019 | 2019 | Literature-based QA | Shared task data focusing on NLI, RQE (Recognizing Question Entailment), and QA in the medical domain. | Paper, Github |
| BioASQ | 2013-2023 | Literature-based QA | Long-running challenge series for large-scale biomedical semantic indexing and question answering. | Paper, Github |
| Medical Meadow | 2023 | Literature-based / Tuning | A collection of various medical tasks (Flashcards, Wikidoc) reformatted for instruction tuning. | Paper, Github |
| MASH-QA | 2020 | Consumer Health QA | Multiple-span extraction QA dataset for consumer health questions (e.g., from WebMD). | Paper, Github |
| HealthQA | 2019 | Consumer Health QA | Dataset focusing on reliability and helpfulness of health answers. | Paper, Github |
| AfriMed-QA | 2025 | Domain-specific QA | Pan-African medical QA benchmark (15k Qs) covering 32 specialties and local context from 16 countries. | Paper, Github |
| MedCalc-Bench | 2024 | Domain-specific QA | Benchmark for evaluating LLMs on medical calculations (formulas, scores) with patient notes. | Paper, Github |
| MedHallu | 2025 | QA / Hallucination | 10k Q-A pairs derived from PubMedQA annotated to detect and categorize medical hallucinations. | Paper, Github |
| MedicationQA | 2019 | Consumer Health QA | Dataset of consumer questions about medications (drug interactions, dosage) with expert answers. | Paper, Github |
| RJUA-MedDQA | 2024 | QA / Multimodal | Multimodal benchmark for medical document understanding (images/reports) and clinical reasoning. | Paper, Github |
| MedExQA | 2025 | QA / Explanation | Dataset covering five medical specialties, with an explanation for each QA pair. | Paper, Github |

II. Task-oriented Dialogue

Back to Content

| Dataset Name | Time (Pub) | Downstream Task | Brief Description | Source |
|---|---|---|---|---|
| MedReason | 2025 | Symptom Diagnosis | Large-scale medical reasoning dataset designed to enable explainable medical problem-solving. | Paper, Github |
| MedDialog | 2020 | Symptom Diagnosis | Massive dataset (English/Chinese) of doctor-patient conversations scraped from online platforms. | Paper, Github |
| DialoAMC | 2023 | Symptom Diagnosis | Dataset for Automated Medical Consultation focusing on symptom elicitation and diagnosis. | Paper, Github |
| MedDG | 2022 | Symptom Diagnosis | High-quality entity-annotated medical dialogue dataset for diagnosis and treatment recommendation. | Paper, Github |
| MZ (Muzhi) | 2018 | Symptom Diagnosis | Chinese medical dialogue dataset from the "Muzhi" platform for self-diagnosis agents. | Paper, Github |
| CMDD | 2019 | Symptom Diagnosis | Chinese Medical Diagnostic Dialogue dataset (Pediatrics) with symptom-disease mappings. | Paper, Github |
| DDXPlus | 2022 | Symptom Diagnosis | Large-scale automated diagnosis dataset with pathology-driven probabilistic generation (1.3M patients). | Paper, Github |
| DX (DXY) | 2019 | Symptom Diagnosis | Diagnostic dataset from DXY.cn, containing dialogue sessions with explicit symptom transitions. | Paper, Github |
| CovidDialog | 2020 | Symptom Diagnosis (COVID) | Dialogues specifically regarding COVID-19 consultations, scraped during the pandemic. | Paper, Github |
| Ext-CovidDialog | 2023 | Symptom Diagnosis (COVID) | Extended version of CovidDialog with more data covering evolving variants and scenarios. | Paper, Github |
| IMCS-21 | 2021 | Symptom Diagnosis | Interactive Medical Consultation System dataset; focuses on multi-turn diagnostic dialogue. | Paper, Github |
| BC5CDR | 2015 | Entity Recognition | BioCreative V task dataset for Chemical-Disease Relation extraction (NER/RE). | Paper, Github |
| NCBI-Disease | 2014 | Entity Recognition | Corpus of PubMed abstracts annotated with disease mentions for NER. | Paper, Github |
| PHEE | 2022 | Entity Extraction | Pharmacovigilance Event Extraction dataset for identifying adverse drug events from text. | Paper, Github |
| MedAlign | 2023 | Instruction Following | Clinician-generated dataset for instruction following (summaries, questions) based on EHR data. | Paper, Github |
| MedInstruct | 2023 | Instruction Following | Dataset of 52k medical instructions constructed from existing datasets (e.g., MedQA) for tuning. | Paper, Github |
| BianqueCorpus | 2023 | Instruction Following | Large-scale multi-turn Chinese health conversation dataset with balanced questioning/suggestions. | Paper, Github |
| MedSynth | 2025 | Generation / Summarization | Synthetic medical dialogue-note pairs designed to advance dialogue-to-note and note-to-dialogue tasks. | Paper, Github |
| MeQSum | 2019 | Summarization / Instruction | Dataset for summarizing consumer health questions into canonical medical questions. | Paper, Github |

III. Recommendation Dialogue

Back to Content

| Dataset Name | Time (Pub) | Downstream Task | Brief Description | Source |
|---|---|---|---|---|
| DialMed | 2022 | Recommendation (Drug) | Dialogue dataset designed for medication recommendation based on patient history/dialogue. | Paper, Github |
| HealthCareMagic | 2023 | Treatment Rec / QA | Massive dataset (100k) of real patient queries and doctor responses, explicitly containing treatment recommendations. | Paper, Github |
| iCliniq | 2023 | Treatment Rec / QA | 10k highly curated doctor-patient dialogues focusing on providing medical advice and recommendations. | Paper, Github |
| ReMeDi | 2021 | Recommendation | "Resources for Medical Dialogue"; focuses on movie/medical recommendation scenarios. | Paper, Github |
| MIMIC-III | 2016 | Database (Source) | Large database of de-identified health-related data (EHRs) used to construct recommendation tasks. | Paper, Source |
| DrugBank | - | Knowledge Base (Source) | Comprehensive database containing information on drugs and drug targets, used for grounding recommendations. | Source |
| ProKnow-data | 2020 | Recommendation | Data used for proactive knowledge-grounded dialogue, often adapted for medical contexts. | Paper, Github |
| DDInter | 2024 (Upd) | Drug Safety / KB | Comprehensive Drug-Drug Interaction database; critical for agents to verify safety before recommending medication. | Paper, Github |
| PromptCBLUE | 2024 | Rec / Classification | A unified benchmark where specific subtasks focus on recommending medical departments or classifying medical intents. | Paper, Github |
| CMtMed | 2024 | Hybrid / Treatment Rec | Large-scale Chinese Multi-turn Medical dialogue dataset containing explicit "Medical Advice" and treatment plan slots. | Paper, Github |

IV. Supportive Dialogue

Back to Content

| Dataset Name | Time (Pub) | Downstream Task | Brief Description | Source |
|---|---|---|---|---|
| EmpatheticDialogues | 2019 | General Empathetic | Large dataset of 25k conversations grounded in emotional situations (general domain). | Paper, Github |
| CPsyCoun | 2024 | Mental Health Support | A high-quality, multi-turn dialogue dataset reconstructed from psychological consulting reports for realistic counseling. | Paper, Github |
| PsySafe | 2024 | Mental Health / Safety | Focuses on the safety aspect of supportive agents, identifying risky or toxic responses in mental health dialogue. | Paper, Github |
| MELD | 2019 | General Empathetic | Multimodal EmotionLines Dataset; textual/audio/visual emotion recognition. | Paper, Github |
| PsyQA | 2021 | Mental Health Support | Chinese dataset of psychological health support (Q&A) with strategy annotations. | Paper, Github |
| ESConv | 2021 | Mental Health Support | Emotional Support Conversation dataset designed to train agents in empathy and support strategies. | Paper, Github |
| SoulChat-Corpus | 2023 | Mental Health Support | Large-scale Chinese dataset for single-turn and multi-turn empathetic psychological counseling. | Paper, Github |
| MTS-Dialogue | 2023 | Clinical Support/Summ. | 1.7k doctor-patient conversations paired with corresponding clinical note summaries. | Paper, Github |
| SMILECHAT | 2023 | Mental Health Support | Dataset for mental health support focusing on cognitive distortion detection and reframing. | Paper, Github |

V. Hybrid Function

Back to Content

| Dataset Name | Time (Pub) | Downstream Task | Brief Description | Source |
|---|---|---|---|---|
| MidMed | 2023 | Hybrid (Diag/Rec/Chat) | Mixed-type dialogue corpus covering diagnosis, recommendation, QA, and chitchat in one session. | Paper, Github |
| MedEval | 2023 | Evaluation Benchmark | Multi-level, multi-task benchmark spanning 35 body regions and 8 exam modalities for LLM eval. | Paper, Github |
| MedTrinity-25M | 2024 | Multimodal / Hybrid | Massive multimodal dataset (25M images) with multigranular annotations (Image-ROI-Text). | Paper, Github |
| MENTAT | 2025 | Mental Health / Hybrid | Clinician-annotated benchmark for complex psychiatric decision-making (diagnosis, triage, etc.). | Paper, Github |
| MedAlpaca | 2023 | Instruction Tuning | Collection of datasets (see Medical Meadow) used to train the MedAlpaca model series. | Paper, Github |
| NoteChat | 2023 | Generation / Hybrid | Synthetic patient-physician conversations conditioned on clinical notes (Note-to-Dialogue). | Paper, Github |

⛪ Leading Group

Back to Content

| Institution | Leading Researcher/Group | Source |
|---|---|---|
| Google | Google Health | Homepage |
| NIH | Zhiyong Lu | Homepage |
| OpenAI | Health AI Team | Homepage |
| Ant Group | AI for Science Team | Homepage |
| Alibaba | Tongyi Lab, Damo, AQ-Med Lab | Homepage, Homepage, Homepage |
| Shanghai AI Lab | AI for Science Team, AI4Med Team | Homepage, Homepage |
| Baichuan AI | AI Lab | Homepage |
| Meta | FAIR Team | Homepage |
| Tencent | Jarvislab, Xiaobin Hu | Homepage, Homepage |
| Huawei | NoAH | Homepage |
| ByteDance | Seed, AI for Science Team | Homepage |
| Microsoft Research | Hoifung Poon | Homepage |
| Harvard | Xiang Li, Faisal Mahmood Lab, Pranav Rajpurkar, Tianxi Cai | Homepage, Homepage, Homepage, Homepage |
| Maryland | Hanan Samet | Homepage |
| MIT | Paul Liang, Peter Szolovits | Homepage, Homepage |
| Oxford | Tingting Zhu, David A. Clifton, Alison Noble | Homepage, Homepage, Homepage |
| Cambridge | Vanderschaar-lab, Andreas Vlachos | Homepage, Homepage |
| NTU | Chunyan Miao | Homepage |
| Tsinghua University | Yang Liu, Hong-Yu Zhou, Weizhi Ma, Medical Informatics Lab | Homepage, Homepage, Homepage, Homepage |
| SJTU | Chaoyi Wu, Weidi Xie, MAGIC | Homepage, Homepage, Homepage |
| UNC | Tianlong Chen, Huaxiu Yao | Homepage, Homepage |
| Yale | Clinical NLP Lab | Homepage |
| UBC | Xiaoxiao Li | Homepage |
| UIUC | Jimeng Sun, Jiawei Han | Homepage, Homepage |
| ZJU | DCDmllm, Jian Wu | Homepage, Homepage |
| Notre Dame | SCLab | Homepage |
| Pennsylvania | Tianyu Han, Fenglong Ma, Lyle Ungar | Homepage, Homepage, Homepage |
| Emory | Carl Yang | Homepage |
| Stanford | SNAP, James Zou, Yejin Choi | Homepage, Homepage, Homepage |
| PKU | Liantao Ma, Yasha Wang | Homepage |
| TJU | ADM Group | Homepage |
| Edinburgh | Ewen M Harrison | Homepage |
| Virginia | Aidong Zhang, Xuan Wang | Homepage, Homepage |
| CUHK | Freedom AI, YuanWu, Michael R. Lyu, Benyou Wang | Homepage, Homepage, Homepage, Homepage |
| CityU | Xiangyu Zhao | Homepage |
| Houston Methodist | Wang Lab | Homepage |
| MBZUAI | Jianing Qiu | Homepage |
| DKFZ | German Cancer Research Center | Homepage |
| California | Yuyin Zhou | Homepage |
| ETH | Michael Moor | Homepage |
| Johns Hopkins | Suchi Saria | Homepage |
| Cornell | Fei Wang, Claire Cardie | Homepage, Homepage |
| GE Healthcare | Xiao Cao | Homepage |
| Rutgers | Mu Zhou | Homepage |
| UT | Ying Ding, Wenqi Shi | Homepage, Homepage |
| UC Berkeley | Bin Yu | Homepage |
| UW | Hannaneh Hajishirzi | Homepage |
| LMU Munich | Volker Tresp | Homepage |
| SBU | Chenyu You | Homepage |
| Fudan | Zhongyu Wei | Homepage |
| Minnesota | Rui Zhang | Homepage |
| Monash | AIM Lab | Homepage |
| USYD | Med AI Lab | Homepage |
| Queensu | Medi Lab | Homepage |
| Open Source Platform | OpenMed Lab | Homepage |

📖 Awesome Methods, Model, and Resource List

Back to Content

🤖LSC

Back to Content

📊Planning

Back to Content

  • BioGPT: generative pre-trained transformer for biomedical text generation and mining (Briefings Bioinf., 2023) paper, code

    A domain-specific generative Transformer pre-trained on large-scale biomedical literature to achieve state-of-the-art performance in text generation and mining tasks.

  • BioBART: Pretraining and Evaluation of A Biomedical Generative Language Model (BioNLP, 2022) paper, code

    Adapts the BART architecture to the biomedical domain with enhanced pre-training tasks, significantly improving performance on summarization and dialogue generation.

  • ClinicalBERT: Modeling Clinical Notes and Predicting Hospital Readmission (CHIL, 2020) paper, code

    Develops contextual embeddings specifically for clinical notes to effectively predict hospital readmission and model long-term clinical dependencies.

  • BioMegatron: Larger Biomedical Domain Language Model (EMNLP, 2020) paper, code

    Leverages the Megatron-LM infrastructure to train a large-scale biomedical language model, demonstrating improvements in named entity recognition and QA tasks.

  • Toward expert-level medical question answering with large language models (Nature, 2023) paper, code

    Introduces Med-PaLM 2, which combines instruction tuning with ensemble refinement to reach expert-level performance on USMLE-style questions.

  • CoD: Towards an Interpretable Medical Agent using Chain of Diagnosis (ICML AI4Science, 2024) paper, code

    Proposes a Chain of Diagnosis (CoD) framework that breaks down the diagnostic process into interpretable steps to enhance transparency and accuracy.

  • HuaTuo: Tuning LLaMA Model with Chinese Medical Knowledge (EMNLP Findings, 2023) paper, code

    Incorporates a structured medical knowledge graph into the LLaMA model via instruction tuning to significantly enhance Chinese medical QA capabilities.

  • Learning Causal Alignment for Reliable Disease Diagnosis (ICCV, 2023) paper, code

    Introduces a causal alignment framework to mitigate confounding biases in medical data, ensuring more reliable and generalizable disease diagnosis.

  • Reasoning with large language models for medical question answering (npj Digit. Med., 2024) paper

    Systematically evaluates different reasoning strategies (like Chain-of-Thought) in LLMs to identify the most effective methods for complex medical QA.

  • Empowering biomedical discovery with AI agents (Nature, 2024) paper

    Discusses the paradigm shift towards autonomous AI agents capable of planning and executing experiments to accelerate biomedical research and discovery.

  • A fast nonnegative autoencoder-based approach to latent feature analysis on high-dimensional and incomplete data (IEEE TNNLS, 2024) paper

    Proposes a highly efficient nonnegative autoencoder designed to extract latent features from high-dimensional, sparse, and incomplete medical datasets.

  • Multiview latent space learning with progressively fine-tuned deep features for unsupervised domain adaptation (Inf. Sci., 2024) paper

    Develops a method to align multiview latent spaces using progressively fine-tuned features, improving unsupervised domain adaptation in medical imaging analysis.

  • Autosurv: interpretable deep learning framework for cancer survival analysis incorporating clinical and multi-omics data (npj Precis. Oncol., 2023) paper, code

    A comprehensive and interpretable deep learning framework that integrates clinical and multi-omics data to improve cancer survival prediction accuracy.

  • Qilin-Med: Multi-stage Knowledge Injection Advanced Medical Large Language Model (arXiv, 2023) paper, code

    Presents a multi-stage training strategy to inject massive medical knowledge into LLMs, enhancing their reasoning and dialogue performance in Chinese medical contexts.

  • Counterfactual reasoning using causal Bayesian networks as a healthcare governance tool (Sci. Rep., 2024) paper

    Applies causal Bayesian networks to perform counterfactual analysis, providing a quantitative tool for evaluating healthcare policies and governance decisions.

  • Large Language Models for Medical Forecasting - Foresight 2 (arXiv, 2024) paper

    Introduces a generative foundation model trained on longitudinal patient records to forecast future medical events and health trajectories.

  • Ontology accelerates few-shot learning capability of large language model: A study in extraction of drug efficacy in a rare pediatric epilepsy (Comput. Methods Programs Biomed., 2025) paper

    Demonstrates that integrating domain ontologies significantly boosts the few-shot learning performance of LLMs for information extraction in rare diseases.

  • A generalist medical language model for disease diagnosis assistance (Nat. Med., 2024) paper

    Presents MedFound, a generalist medical language model pre-trained on medical text and clinical records and fine-tuned for diagnostic reasoning to assist physicians in disease diagnosis.

  • Taiyi: A Bilingual Fine-Tuned Large Language Model for Diverse Biomedical Tasks (JAMIA, 2024) paper, code

    A bilingual (English/Chinese) LLM specifically fine-tuned to handle a diverse range of biomedical tasks, including NER, RE, and QA.

🧠Memory

Back to Content

  • Focus on What Matters: Enhancing Medical Vision-Language Models with Automatic Attention Alignment Tuning (arXiv, 2025) paper

    Proposes an Automatic Attention Alignment (AAA) mechanism to align the visual attention of VLMs with clinical masks, enhancing interpretability and performance.

  • Rethinking the Role of Demonstrations: What Makes In-Context Learning Work? (EMNLP, 2022) paper, code

    Demonstrates that the ground-truth accuracy of labels in demonstrations matters less than the label space and distribution, reshaping the understanding of in-context learning.

  • HuatuoGPT-II: One-stage Training for Medical Adaption of LLMs (ACL Findings, 2024) paper, code

    Introduces a one-stage training protocol that unifies medical domain adaptation and general instruction following, simplifying the training pipeline.

  • Diagnostic reasoning prompts reveal the potential for large language model interpretability in medicine (npj Digit. Med., 2024) paper, code

    Investigates the use of structured prompting strategies to elicit and visualize the diagnostic reasoning paths of LLMs, improving transparency.

  • AttriPrompter: Auto-Prompting with Attribute Semantics for Zero-shot Nuclei Detection via Visual-Language Pre-trained Models (MICCAI, 2024) paper, code

    A zero-shot framework utilizing attribute-based text prompts to guide visual-language models in detecting nuclei without task-specific training.

  • A context-based chatbot surpasses radiologists and generic ChatGPT in following the ACR appropriateness guidelines (Sci. Rep., 2023) paper

    Develops a specialized chatbot that leverages clinical context to adhere to ACR appropriateness guidelines more accurately than human radiologists.

  • MedVH: Towards Systematic Evaluation of Hallucination for Large Vision Language Models in the Medical Context (ECCV, 2024) paper, code

    Establishes a comprehensive benchmark and evaluation dataset specifically designed to detect and analyze hallucinations in medical vision-language models.

  • The FAIIR conversational AI agent assistant for youth mental health service provision (npj Digit. Med., 2025) paper

    Presents FAIIR, a conversational agent designed to assist in the triage and service provision for youth mental health, reducing clinician workload.

  • Galactica: A Large Language Model for Science (arXiv, 2022) paper, code

    A large language model trained on a massive corpus of scientific knowledge, designed to store, reason about, and generate scientific content.

  • Clinical ModernBERT: An efficient and long context encoder for biomedical text (arXiv, 2025) paper

    Adapts the ModernBERT architecture to the clinical domain, offering a high-efficiency encoder capable of processing long-context electronic health records.

  • DK-BEHRT: Teaching language models international classification of disease (ICD) codes using known disease descriptions (CHIL, 2024) paper, code

    Enhances the BEHRT model by incorporating textual descriptions of diseases, significantly improving the accuracy of automated ICD coding.

  • Context Clues: Evaluating Long Context Models for Clinical Prediction Tasks on EHRs (arXiv, 2024) paper

    Benchmarks various long-context LLMs on their ability to extract relevant information from lengthy and complex electronic health records.

  • Recursively Summarizing Enables Long-Term Dialogue Memory in Large Language Models (ACL Findings, 2024) paper, code

    Proposes a recursive summarization technique to compress dialogue history, enabling LLMs to maintain long-term memory in medical consultations.

  • Adapted large language models can outperform medical experts in clinical text summarization (Nat. Med., 2024) paper

    Provides empirical evidence that domain-adapted LLMs generate clinical summaries that are rated higher in quality and accuracy than those by human experts.

  • BioLORD-2023: Semantic Textual Representations Fusing LLM and Clinical Knowledge Graph Insights (EMNLP Findings, 2023) paper, code

    Produces rich semantic textual representations by grounding LLM generation in definitions and relationships from clinical knowledge graphs.

  • AI-Enabled Conversational Journaling for Advancing Parkinson's Disease Symptom Tracking (arXiv, 2025) paper

    Develops a conversational agent that engages patients in journaling to track and analyze Parkinson's disease symptoms over time.
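
The recursive summarization idea listed above (compress older dialogue into a running summary, keep only recent turns verbatim) can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: `RecursiveMemory`, `window`, and `toy_summarize` are hypothetical names, and `toy_summarize` is a stand-in for what would normally be an LLM call.

```python
from typing import Callable, List

# A summarizer takes (previous_summary, new_turns) and returns an updated summary.
Summarizer = Callable[[str, List[str]], str]


class RecursiveMemory:
    """Keeps a running summary plus the most recent un-summarized turns."""

    def __init__(self, summarize: Summarizer, window: int = 4) -> None:
        self.summarize = summarize
        self.window = window          # turns buffered before folding into the summary
        self.summary = ""
        self.recent: List[str] = []

    def add_turn(self, turn: str) -> None:
        self.recent.append(turn)
        if len(self.recent) >= self.window:
            # Fold the buffered turns into the summary, then clear the buffer.
            self.summary = self.summarize(self.summary, self.recent)
            self.recent = []

    def context(self) -> str:
        """Text to prepend to the next prompt: summary first, then recent turns."""
        parts = []
        if self.summary:
            parts.append("Summary so far: " + self.summary)
        parts.extend(self.recent)
        return "\n".join(parts)


def toy_summarize(previous: str, turns: List[str]) -> str:
    # Placeholder for an LLM call: keep the first five words of each turn.
    clipped = [" ".join(t.split()[:5]) for t in turns]
    return " | ".join(([previous] if previous else []) + clipped)
```

Because the summarizer receives the previous summary as input, the prompt context stays bounded no matter how long the consultation runs; the trade-off is that details dropped by one summarization step cannot be recovered later.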

👥Cooperation

Back to Content

  • MEDCO: Medical Education Copilots Based on A Multi-Agent Framework (ECCV Workshops, 2024) paper

    Introduces a multi-agent educational copilot system comprising student, patient, and expert agents to simulate realistic clinical training scenarios.

  • ColaCare: Enhancing Electronic Health Record Modeling through Large Language Model-Driven Multi-Agent Collaboration (WWW, 2025) paper, code

    Enhances EHR predictive modeling by using a multi-agent "medical team" (DoctorAgents and MetaAgent) to collaborate on patient data analysis.

  • ReConcile: Round-Table Conference Improves Reasoning via Consensus among Diverse LLMs (ACL Findings, 2024) paper, code

    A multi-agent framework where diverse LLMs engage in round-table discussions to reach consensus, significantly improving reasoning accuracy.

  • MAM: Modular Multi-Agent Framework for Multi-Modal Medical Diagnosis via Role-Specialized Collaboration (ACL Findings, 2025) paper, code

    Decomposes diagnostic tasks into specialized agent roles (General Practitioner, Specialist, Radiologist) to handle multi-modal medical data effectively.

  • MDAgents: An Adaptive Collaboration of LLMs for Medical Decision-Making (NeurIPS, 2024) paper, code

    Dynamically adapts the collaboration structure (solo vs. group) of LLM agents based on the medical complexity of the query.

  • Self-Evolving Multi-Agent Simulations for Realistic Clinical Interactions (MedAgentSim) (MICCAI, 2025) paper, code

    Presents MedAgentSim, a framework where doctor and patient agents interact and evolve their diagnostic strategies through experience without human labeling.

  • MedAgents: Large Language Models as Collaborators for Zero-shot Medical Reasoning (arXiv, 2023) paper, code

    Leverages a multi-agent debate mechanism to enhance zero-shot clinical reasoning capabilities by simulating medical consultations.
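
Several entries above choose a collaboration structure per query, for example answering simple questions with a single agent and escalating complex ones to a group. The sketch below illustrates that routing pattern under simplifying assumptions: `route_query` and `toy_complexity` are hypothetical names, agents are plain callables standing in for LLM calls, and the group's answers are merged by majority vote, which is a substitute for the richer discussion protocols the listed papers describe.

```python
from collections import Counter
from typing import Callable, Sequence

Agent = Callable[[str], str]  # stand-in for an LLM-backed agent


def toy_complexity(query: str) -> str:
    # Hypothetical heuristic: long questions count as complex.
    # A real system would likely ask an LLM to grade the query.
    return "high" if len(query.split()) > 8 else "low"


def route_query(
    query: str,
    assess_complexity: Callable[[str], str],
    solo_agent: Agent,
    group_agents: Sequence[Agent],
) -> str:
    """Answer with one agent for simple queries, a voting group otherwise."""
    if assess_complexity(query) == "low":
        return solo_agent(query)
    answers = [agent(query) for agent in group_agents]
    # Majority vote over the group's answers (ties go to the earliest answer).
    return Counter(answers).most_common(1)[0][0]
```

The design point is that the expensive multi-agent path is only paid for when the complexity check asks for it.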

⏫Self-evolution

Back to Content

  • AlphaEvolve: A coding agent for scientific and algorithmic discovery (arXiv, 2025) paper

    An evolutionary coding agent from DeepMind capable of autonomously discovering novel algorithms and optimizing code for scientific problems.

  • Revolutionizing healthcare: the role of artificial intelligence in clinical practice (BMC Med. Educ., 2023) paper

    A comprehensive review discussing the transformative impact and ethical implications of integrating AI agents into clinical workflows.

  • Agent Hospital: A Simulacrum of Hospital with Evolvable Medical Agents (arXiv, 2024) paper, code

    Simulates a full hospital environment where doctor agents continuously evolve and improve their diagnostic skills by treating patient agents.

  • STLLaVA-Med: Self-Training Large Language and Vision Assistant for Medical Question-Answering (EMNLP, 2024) paper, code

    Uses a self-training pipeline with Direct Preference Optimization (DPO) to improve medical VLM performance using auto-generated data.

  • Darwin Godel Machine: Open-Ended Evolution of Self-Improving Agents (arXiv, 2025) paper

    Proposes a framework for open-ended agent evolution where the system can rewrite its own code to continuously improve its learning and reasoning mechanisms.

🤖EP

Back to Content

📊Planning

Back to Content

  • Towards Medical Complex Reasoning with LLMs through Medical Verifiable Problems (ACL Findings, 2025) paper

    Introduces the MedVP dataset, focusing on verifiable medical problems to benchmark and enhance the complex reasoning capabilities of LLMs.

  • Zhongjing: Enhancing the Chinese Medical Capabilities of Large Language Model through Expert Feedback (AAAI, 2024) paper, code

    Enhances Chinese medical LLMs using a complete RLHF pipeline with expert doctors involved in the feedback loop to ensure professional accuracy.

  • Advancing Biomedical Claim Verification by Using Large Language Models with Better Structured Prompting Strategies (BioNLP, 2025) paper

    Evaluates various prompting strategies, such as chain-of-thought and self-consistency, to improve the accuracy of biomedical claim verification.

  • Generating Explanations in Medical Question-Answering by Expectation Maximization Inference over Evidence (EMNLP Findings, 2023) paper, code

    Proposes a latent variable model using Expectation Maximization to select relevant evidence and generate high-quality explanations for medical questions.

  • Self-Consistency Improves Chain of Thought Reasoning in Language Models (ICLR, 2023) paper, code

    Introduces a decoding strategy that samples multiple reasoning paths and selects the most consistent answer, significantly boosting performance on reasoning tasks.

  • S2AF: An action framework to self-check the Understanding Self-Consistency of Large Language Models (Neural Netw., 2025) paper

    Develops a framework that enables LLMs to self-evaluate their understanding and consistency through an action-based checking mechanism.

  • Ranked Voting based Self-Consistency of Large Language Models (arXiv, 2025) paper

    Proposes a ranked voting mechanism to aggregate outputs from self-consistency sampling, offering better robustness than simple majority voting.

  • A comparative evaluation of chain-of-thought-based prompt engineering techniques for medical question answering (Sci. Rep., 2025) paper

    Systematically benchmarks different Chain-of-Thought prompting variations to identify the most effective strategies for medical exams.

  • Tree-Planner: Efficient Close-loop Task Planning with Large Language Models (ICLR, 2024) paper, code

    Formulates task planning as a tree search problem, allowing agents to perform efficient closed-loop planning and error correction.

  • Least-to-Most Prompting Enables Complex Reasoning in Large Language Models (ICLR, 2023) paper

    Decomposes a complex problem into a sequence of simpler sub-problems and solves them in order, so that each answer can inform the next prompt.

  • Prompt engineering in consistency and reliability with the evidence-based guideline for LLMs (npj Digit. Med., 2024) paper

    Investigates how guideline-based prompting improves the consistency and clinical reliability of LLM responses in medical decision support.

  • Cost-Effective Framework with Optimized Task Decomposition and Batch Prompting for Medical Dialogue Summary (CIKM, 2023) paper

    Proposes a framework that reduces API costs while maintaining summary quality by optimizing task decomposition and using batch prompting.

  • A brain-inspired agentic architecture to improve planning with LLMs (Nat. Commun., 2025) paper

    Draws inspiration from human cognitive processes to design an agent architecture that separates planning, execution, and monitoring for better reliability.

  • Self-critiquing models for assisting human evaluators (NeurIPS, 2022) paper

    Trains models to generate natural language critiques of their own or others' outputs, helping human annotators find errors more efficiently.

  • FRAME: Feedback-Refined Agent Methodology for Enhancing Medical Research Insights (arXiv, 2025) paper

    An agentic framework that iteratively refines its analysis of medical research papers based on structured feedback loops.

  • Agentic Feedback Loop Modeling Improves Recommendation and User Simulation (WWW, 2025) paper

    Models the interaction between recommender agents and user simulator agents as a feedback loop to improve long-term recommendation utility.
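
Several entries in this list (Self-Consistency, Ranked Voting based Self-Consistency) aggregate multiple sampled reasoning paths into one answer. The sketch below illustrates the two aggregation styles on made-up samples. It is not code from either paper: the function names are hypothetical, and the Borda count is only one plausible way to realise "ranked voting", chosen here as an assumption.

```python
from collections import Counter, defaultdict


def majority_vote(answers):
    """Return the most frequent final answer across sampled reasoning paths."""
    counts = Counter(answers)
    # most_common(1) returns [(answer, count)]; ties go to the first-seen answer
    return counts.most_common(1)[0][0]


def borda_vote(rankings):
    """Aggregate ranked candidate lists with a Borda count (illustrative assumption)."""
    scores = defaultdict(int)
    for ranking in rankings:
        n = len(ranking)
        for position, candidate in enumerate(ranking):
            # Top-ranked candidate earns n - 1 points, last-ranked earns 0
            scores[candidate] += n - 1 - position
    return max(scores, key=scores.get)


# Hypothetical final answers from five sampled chains of thought
sampled_answers = ["B", "A", "B", "C", "B"]
print(majority_vote(sampled_answers))  # B

# Hypothetical rankings from seven samples: A is the most common first choice,
# but B is ranked first or second by every sample
sampled_rankings = (
    [["A", "B", "C"]] * 3 + [["B", "C", "A"]] * 2 + [["C", "B", "A"]] * 2
)
first_choices = [ranking[0] for ranking in sampled_rankings]
print(majority_vote(first_choices))  # A
print(borda_vote(sampled_rankings))  # B
```

The second example shows why ranked aggregation can differ from simple majority voting: it uses the information in lower-ranked positions that majority voting discards.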

🧠Memory

Back to Content

  • MOTOR: A Time-To-Event Foundation Model For Structured Medical Records (MLHC, 2023) paper, code

    A foundation model pre-trained on longitudinal structured medical records to perform time-to-event prediction tasks with high accuracy.

  • Agentic LLM Workflows for Generating Patient-Friendly Medical Reports (arXiv, 2024) paper

    Proposes a multi-agent workflow that transforms complex clinical notes into patient-friendly reports, improving accessibility and understanding.

  • Insights from high and low clinical users of telemedicine: a mixed-methods study of clinician workflows, sentiments, and user experiences (npj Digit. Med., 2025) paper

    A mixed-methods study analyzing clinician workflows and sentiments to understand the factors driving high versus low adoption of telemedicine.

  • Evaluating large language model workflows in clinical decision support for triage and referral and diagnosis (npj Digit. Med., 2025) paper

    Systematically evaluates LLM-based workflows in clinical decision support systems, specifically focusing on their safety and accuracy in triage and referral.

  • SoftTiger: A Clinical Foundation Model for Healthcare Workflows (arXiv, 2024) paper, code

    Introduces a LLaMA-based clinical foundation model optimized to integrate seamlessly into various healthcare workflows, from summarization to triage.

  • STAF-LLM: A scalable and task-adaptive fine-tuning framework for large language models in medical domain (Expert Syst. Appl., 2025) paper

    Presents a scalable framework for task-adaptive fine-tuning that efficiently adapts general LLMs to specific medical tasks with limited resources.

  • Addressing Overprescribing Challenges: Fine-Tuning Large Language Models for Medication Recommendation Tasks (arXiv, 2025) paper

    Investigates fine-tuning strategies for LLMs to generate safer medication recommendations, specifically targeting the reduction of overprescribing errors.

  • From pre-training to fine-tuning: An in-depth analysis of Large Language Models in the biomedical domain (Artif. Intell. Med., 2024) paper

    Provides a comprehensive comparative analysis of pre-training versus fine-tuning strategies for adapting LLMs to biomedical downstream tasks.

  • Open-Ended Medical Visual Question Answering Through Prefix Tuning of Language Models (MICCAI, 2023) paper, code

    Utilizes prefix tuning to adapt frozen language models for medical visual question answering, achieving high performance with few trainable parameters.

  • Diagnosing Transformers: Illuminating Feature Spaces for Clinical Decision-Making (NeurIPS, 2023) paper

    Analyzes the internal feature spaces of Transformer models to interpret how they represent clinical concepts and make decisions.

  • Embedding dynamic graph attention mechanism into Clinical Knowledge Graph for enhanced diagnostic accuracy (Expert Syst. Appl., 2024) paper

    Integrates a dynamic graph attention mechanism into clinical knowledge graphs to capture evolving patient states for more accurate diagnosis.

  • HALO: Hallucination Analysis and Learning Optimization to Empower LLMs with Retrieval-Augmented Context for Guided Clinical Decision Making (AAAI, 2025) paper

    A framework designed to detect and mitigate hallucinations in clinical decision-making by optimizing the retrieval-augmented context.

  • Instruction Tuning and CoT Prompting for Contextual Medical QA with LLMs (arXiv, 2025) paper

    Explores the synergistic effect of instruction tuning and Chain-of-Thought prompting to enhance the contextual understanding of medical QA models.

  • LIFE-CRAFT: A Multi-agentic Conversational RAG Framework for Lifestyle Medicine Coaching with Context Traceability and Case-Based Evidence Synthesis (HCII, 2024) paper

    A multi-agent RAG system designed for lifestyle medicine coaching that ensures advice is traceable to case-based evidence.

👥Cooperation

Back to Content

  • MedLA: A Logic-Driven Multi-Agent Framework for Complex Medical Reasoning with Large Language Models (arXiv, 2025) paper

    Proposes a logic-driven multi-agent framework where agents organize reasoning into explicit syllogistic trees to ensure transparent and verifiable medical decision-making.

  • ConfAgents: A Conformal-Guided Multi-Agent Framework for Cost-Efficient Medical Diagnosis (arXiv, 2025) paper

    Introduces a conformal prediction-based triage mechanism that dynamically assigns cases to single agents or multi-agent teams, balancing accuracy and computational cost.

  • Advancing Healthcare Automation: Multi-Agent System for Medical Necessity Justification (BioNLP, 2024) paper

    Deploys a multi-agent system to automate the labor-intensive process of prior authorization by justifying medical necessity against clinical guidelines.

  • A Two-Stage Proactive Dialogue Generator for Efficient Clinical Information Collection Using Large Language Model (Expert Syst. Appl., 2025) paper

    Develops a diagnostic dialogue system with a two-stage recommendation structure to proactively collect critical patient information and mimic real-doctor conversational styles.

  • Mediator-Guided Multi-Agent Collaboration among Open-Source Models for Medical Decision-Making (arXiv, 2025) paper

    Utilizes a mediator agent to facilitate Socratic dialogue and reflection among open-source Vision-Language Models (VLMs), enhancing multimodal diagnostic performance.

  • DynamiCare: A Dynamic Multi-Agent Framework for Interactive and Open-Ended Medical Decision-Making (arXiv, 2025) paper

    Models clinical diagnosis as a dynamic, multi-round loop where the agent team iteratively queries a patient system (MIMIC-Patient) and adapts its strategy based on new findings.

  • MAS-PatientCare: Medical Diagnosis and Patient Management System Based on a Multi-agent Architecture (Springer CCIS, 2025) paper

    Proposes a comprehensive multi-agent architecture for remote patient monitoring that integrates diagnostic reasoning with patient management workflows.

  • Inquire, Interact, and Integrate: A Proactive Agent Collaborative Framework for Zero-Shot Multimodal Medical Reasoning (arXiv, 2024) paper

    A proactive framework that enables agents to autonomously inquire about missing modalities and integrate multimodal evidence for zero-shot medical reasoning.
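
The conformal-guided triage idea described for ConfAgents above can be sketched with standard split conformal prediction: build a prediction set of plausible diagnoses, then escalate to a team only when that set is not a single label. This is a minimal sketch under stated assumptions, not the paper's implementation. The function names, the nonconformity score `1 - p`, the routing rule, and all numbers are hypothetical.

```python
import math


def conformal_threshold(calibration_scores, alpha):
    """Split-conformal quantile of nonconformity scores at miscoverage level alpha."""
    n = len(calibration_scores)
    # Finite-sample corrected rank ceil((n + 1) * (1 - alpha)).
    # Clipping to n is a simplification; strictly the threshold is infinite there.
    rank = min(n, math.ceil((n + 1) * (1 - alpha)))
    return sorted(calibration_scores)[rank - 1]


def prediction_set(probabilities, threshold):
    """Keep every diagnosis whose nonconformity score 1 - p is within the threshold."""
    return {label for label, p in probabilities.items() if 1 - p <= threshold}


def route_case(probabilities, threshold):
    """Send unambiguous cases to one agent; escalate empty or multi-label sets."""
    labels = prediction_set(probabilities, threshold)
    return "single_agent" if len(labels) == 1 else "multi_agent_team"


# Hypothetical calibration scores: 1 - p(true diagnosis) on held-out cases
calibration = [0.05, 0.10, 0.12, 0.20, 0.25, 0.30, 0.35, 0.40, 0.55, 0.70]
threshold = conformal_threshold(calibration, alpha=0.2)

confident = {"pneumonia": 0.90, "bronchitis": 0.07, "asthma": 0.03}
ambiguous = {"pneumonia": 0.48, "bronchitis": 0.47, "asthma": 0.05}
print(route_case(confident, threshold))  # single_agent
print(route_case(ambiguous, threshold))  # multi_agent_team
```

Routing on set size keeps the expensive multi-agent discussion for cases where the single model's own uncertainty, calibrated on held-out data, leaves more than one diagnosis standing.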

⏫Self-evolution

Back to Content

  • Self-Evolving Multi-Agent Simulations for Realistic Clinical Interactions (MedAgentSim) (MICCAI, 2025) paper, code

    Introduces a simulation environment where doctor and patient agents interact and self-evolve through experience replay and feedback, significantly improving diagnostic realism.

  • Integrating Dynamical Systems Learning with Foundational Models: A Meta-Evolutionary AI Framework for Clinical Trials (arXiv, 2025) paper

    Combines dynamical systems theory with LLMs to create a meta-evolutionary framework that optimizes clinical trial designs and simulates patient trajectories.

  • MedPAO: A Protocol-Driven Agent for Structuring Medical Reports (HCII, 2025) paper

    Presents an agent that strictly follows medical protocols to structure unstructured clinical reports, ensuring high compliance and data quality.

  • Agentic Surgical AI: Surgeon Style Fingerprinting and Privacy Risk Quantification via Discrete Diffusion in a Vision-Language-Action Framework (arXiv, 2025) paper

    Explores the privacy risks of agentic surgical AI by demonstrating how "surgeon style" can be identified and protected using discrete diffusion models.

  • Improving Interactive Diagnostic Ability of a Large Language Model Agent Through Clinical Experience Learning (arXiv, 2025) paper

    Enhances the initial diagnostic capabilities of LLM agents by simulating clinical experience learning, bridging the gap between passive knowledge and active inquiry.

  • Silence is Not Consensus: Disrupting Agreement Bias in Multi-Agent LLMs via Catfish Agent for Clinical Decision Making (arXiv, 2025) paper

    Introduces a "Catfish Agent" designed to inject structured dissent into multi-agent discussions, preventing premature consensus (groupthink) in medical diagnosis.

🤖GS

Back to Content

📊Planning

Back to Content

  • HyKGE: A Hypothesis Knowledge Graph Enhanced Framework for Accurate and Reliable Medical LLMs Responses (ACL Findings, 2024) paper, code

    Constructs a hypothesis-driven knowledge graph to verify intermediate reasoning steps, ensuring LLM responses are grounded in medical facts.

  • Check Your Facts and Try Again: Improving Large Language Models with External Knowledge and Automated Feedback (ICLR, 2023) paper, code

    An iterative framework where the model retrieves external knowledge and refines its answer based on automated feedback to reduce hallucinations.

  • KGARevion: An AI Agent for Knowledge-Intensive Biomedical QA (arXiv, 2024) paper

    An agentic system capable of reviewing and refining its own retrieval and reasoning processes for high-difficulty biomedical questions.

  • EvidenceMap: Learning Evidence Analysis to Unleash the Power of Small Language Models for Biomedical Question Answering (arXiv, 2025) paper

    Maps complex evidence chains into structured representations, enabling smaller language models to perform expert-level evidence analysis.

  • Infusing Multi-Hop Medical Knowledge Into Smaller Language Models for Biomedical Question Answering (IEEE JBHI, 2025) paper

    Proposes a method to inject structured multi-hop reasoning capabilities from Knowledge Graphs into smaller models to improve efficiency.

  • Improving Retrieval-Augmented Generation in Medicine with Iterative Follow-up Questions (EMNLP Findings, 2024) paper

    Enhances RAG by generating iterative follow-up questions to clarify ambiguities and retrieve more precise medical context.

  • MedicalGLM: A Pediatric Medical Question Answering Model with a Quality Evaluation Mechanism (BMC Med. Inform. Decis. Mak., 2025) paper

    A fine-tuned GLM for pediatrics equipped with a self-evaluation module that assesses the reliability of its own generated advice.

  • A cascaded retrieval-while-reasoning multi-document comprehension framework with incremental attention for medical question answering (Expert Syst. Appl., 2024) paper

    Introduces a cascaded framework that interleaves retrieval and reasoning steps with incremental attention to handle multi-document contexts.

  • K-COMP: Retrieval-Augmented Medical Domain Question Answering With Knowledge-Injected Compressor (arXiv, 2025) paper

    Uses a knowledge-injected compressor to condense retrieved documents, reducing noise and context length while retaining critical medical facts.

  • MEPNet: Medical Entity-balanced Prompting Network for Brain CT Report Generation (arXiv, 2025) paper

    A prompting network designed to balance the generation of medical entities in CT reports, ensuring comprehensive and accurate reporting.

  • Knowledge-Induced Medicine Prescribing Network for Medication Recommendation (Artif. Intell. Med., 2025) paper

    Integrates pharmaceutical knowledge graphs into a deep learning network to provide safe and effective medication combinations.

  • Improving Clinical Question Answering with Multi-Task Learning: A Joint Approach for Answer Extraction and Medical Categorization (arXiv, 2025) paper

    A multi-task learning framework that jointly optimizes for answer extraction and medical category classification to improve overall QA performance.

  • Improving Reliability and Explainability of Medical Question Answering through Atomic Fact Checking in Retrieval-Augmented LLMs (arXiv, 2025) paper

    Decomposes model responses into atomic facts and verifies them against retrieved evidence to enhance reliability and explainability.

🧠Memory

Back to Content

  • Bias Evaluation and Mitigation in Retrieval-Augmented Medical Question-Answering Systems (arXiv, 2025) paper

    Systematically evaluates sources of bias in medical RAG systems and proposes mitigation strategies to ensure equitable healthcare advice.

  • Rationale-Guided Retrieval Augmented Generation for Medical Question Answering (NAACL, 2025) paper

    Generates rationales first to guide the retrieval process, ensuring that retrieved documents support the logical reasoning path.

  • Infusing Multi-Hop Medical Knowledge Into Smaller Language Models for Biomedical Question Answering (IEEE JBHI, 2025) paper

    (Also listed in Planning) Enhances the memory capacity of small models by embedding multi-hop relations from medical knowledge graphs.

  • Seek Inner: LLM-Enhanced Information Mining for Medical Visual Question Answering (ACM MM, 2024) paper

    Mines implicit medical knowledge from Large Language Models to supplement visual features in Medical VQA tasks.

  • MMedAgent: Learning to Use Medical Tools with Multi-modal Agent (NeurIPS, 2024) paper, code

    A multimodal agent framework that learns to retrieve and utilize external medical tools (like calculators and search) to solve complex cases.

  • ReflecTool: Towards Reflection-Aware Tool-Augmented Clinical Agents (ACL Findings, 2025) paper

    Enables clinical agents to reflect on the sufficiency of their current information and autonomously decide when to use tools.

  • RGAR: Recurrence Generation-augmented Retrieval for Factual-aware Medical Question Answering (arXiv, 2025) paper

    Introduces a recurrence mechanism where the model's own generation is used to refine subsequent retrieval queries for better factuality.

  • Adaptive Knowledge Graphs Enhance Medical Question Answering: Bridging the Gap Between LLMs and Evolving Medical Knowledge (arXiv, 2025) paper

    Proposes a framework where the medical knowledge graph is adaptively updated based on new findings to keep the QA system current.

  • MedEx: Enhancing Medical Question-Answering with First-Order Logic based Reasoning and Knowledge Injection (COLING, 2025) paper

    Combines neural generation with symbolic First-Order Logic to inject strict medical constraints and knowledge into the memory of the QA system.

  • Explainable Knowledge-Based Learning for Online Medical Question Answering (PRICAI, 2024) paper

    An online learning approach that updates the model's knowledge base continuously while providing explainable reasoning paths.

  • Efficient Medical Question Answering with Knowledge-Augmented Question Generation (ClinicalNLP, 2024) paper

    Augments the training data (memory) of QA models by generating diverse synthetic medical questions grounded in knowledge bases.

  • Leveraging long context in retrieval augmented language models for medical question answering (npj Digit. Med., 2025) paper

    Investigates the trade-offs and synergies between using long-context windows and RAG for accessing vast medical knowledge.

🧰Action

Back to Content

  • KoSEL: Knowledge subgraph enhanced large language model for medical question answering (Artif. Intell. Med., 2024) paper

    Retrieves relevant subgraphs from a medical knowledge graph to provide structured context, enhancing the LLM's reasoning for medical QA.

  • Are my answers medically accurate? Exploiting medical knowledge graphs for medical question answering (Appl. Intell., 2024) paper

    Proposes a framework that cross-references LLM-generated answers with facts extracted from medical knowledge graphs to ensure accuracy.

  • Infusing Multi-Hop Medical Knowledge Into Smaller Language Models for Biomedical Question Answering (IEEE JBHI, 2025) paper

    Enables smaller language models to perform complex biomedical QA by injecting multi-hop reasoning paths derived from knowledge graphs.

  • Improving Clinical Question Answering with Multi-Task Learning: A Joint Approach for Answer Extraction and Medical Categorization (arXiv, 2025) paper

    A multi-task learning approach that simultaneously optimizes answer extraction and question categorization to improve clinical QA performance.

  • Beyond EHRs: External Clinical knowledge and cohort Features for medication recommendation (Artif. Intell. Med., 2025) paper

    (Same as "Knowledge-Induced Medicine..." in Planning) Integrates external clinical knowledge graphs with patient cohort features for precise medication recommendation.

  • MEPNet: Medical Entity-balanced Prompting Network for Brain CT Report Generation (arXiv, 2025) paper

    A network that balances the generation of various medical entities in CT reports through specialized prompting actions.

  • Improving Reliability and Explainability of Medical Question Answering through Atomic Fact Checking in Retrieval-Augmented LLMs (arXiv, 2025) paper

    Enhances RAG systems by decomposing answers into atomic facts and verifying each against retrieved evidence for better reliability.

  • K-COMP: Retrieval-Augmented Medical Domain Question Answering With Knowledge-Injected Compressor (arXiv, 2025) paper

    Employs a compressor module injected with medical knowledge to condense retrieved documents, optimizing the context for the LLM.

  • MedCoT-RAG: Causal Chain-of-Thought RAG for Medical Question Answering (arXiv, 2025) paper

    Combines retrieval augmentation with causal chain-of-thought reasoning to explain the causal relationships behind medical answers.

  • Towards Efficient Methods in Medical Question Answering using Knowledge Graph Embeddings (IEEE BigData, 2024) paper

    Utilizes knowledge graph embeddings to efficiently retrieve relevant medical concepts, improving QA speed and accuracy.

  • MediTriR: A Triple-Driven Approach to Retrieval-Augmented Generation for Medical Question Answering Tasks (IEEE Access, 2025) paper

    A RAG approach driven by knowledge triples (Subject-Predicate-Object) to ensure the retrieval of structured and precise medical information.

  • Medical Knowledge Graph QA for Drug-Drug Interaction Prediction based on Multi-hop Machine Reading Comprehension (arXiv, 2022) paper

    Predicts drug-drug interactions by treating the task as a multi-hop machine reading comprehension problem over a knowledge graph.

  • MediSearch: Advanced Medical Web Search Engine (IEEE ICHI, 2023) paper

    A specialized search engine framework that aggregates and filters medical information from the web to provide authoritative health answers.

  • Evaluating search engines and large language models for answering health questions (npj Digit. Med., 2025) paper

    A comparative study evaluating the accuracy, safety, and completeness of traditional search engines versus LLMs in answering health queries.

  • Leveraging long context in retrieval augmented language models for medical question answering (npj Digit. Med., 2025) paper

    Examines the effectiveness of using long-context LLMs to process extensive retrieved medical documents compared to standard chunking methods.

  • Using Internet search engines to obtain medical information: a comparative study (J. Med. Internet Res., 2012) paper

    A foundational study (cited for context) comparing the efficacy of general-purpose search engines in retrieving accurate medical information.

  • Large language model agents can use tools to perform clinical calculations (npj Digit. Med., 2025) paper

    Demonstrates that LLM agents equipped with external calculator tools significantly outperform base models at computing complex clinical scores (e.g., MELD).

  • MeNTi: Bridging Medical Calculator and LLM Agent with Nested Tool Calling (arXiv, 2024) paper

    Enables LLM agents to execute nested tool calls, allowing them to handle complex medical calculations that require intermediate steps.

  • MMedAgent: Learning to Use Medical Tools with Multi-modal Agent (NeurIPS, 2024) paper, code

    A framework where multimodal agents learn to autonomously select and utilize various medical tools (search, calculators) to solve clinical problems.

  • KMTLabeler: An Interactive Knowledge-Assisted Labeling Tool for Medical Text Classification (IEEE ICASSP, 2024) paper

    An interactive tool that uses medical knowledge to assist human annotators in labeling clinical text, improving efficiency and consistency.

  • ADEPT: An advanced data exploration and processing tool for clinical data insights (Database, 2025) paper

    A comprehensive software tool designed for the exploration, cleaning, and preprocessing of large-scale clinical datasets for research.
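
Several entries above (clinical calculations with tools, MeNTi, MMedAgent) rely on the agent delegating arithmetic to an external calculator instead of computing it in free text. The sketch below shows that pattern with a tiny tool registry. It is illustrative only and is not taken from any listed paper: the registry, the call format, and the function names are hypothetical, and the MELD coefficients are the commonly cited original formula, which should be checked against an authoritative clinical source before any real use.

```python
import math


def meld_score(bilirubin_mg_dl, inr, creatinine_mg_dl):
    """Original MELD formula: inputs below 1 are raised to 1, creatinine is capped at 4."""
    bili = max(bilirubin_mg_dl, 1.0)
    inr = max(inr, 1.0)
    creat = min(max(creatinine_mg_dl, 1.0), 4.0)
    raw = 3.78 * math.log(bili) + 11.2 * math.log(inr) + 9.57 * math.log(creat) + 6.43
    return round(raw)


# Hypothetical tool registry an agent could choose from
TOOLS = {"meld": meld_score}


def run_tool_call(call):
    """Dispatch a structured tool call of the form {'tool': name, 'args': {...}}."""
    tool = TOOLS.get(call["tool"])
    if tool is None:
        raise KeyError(f"unknown tool: {call['tool']}")
    return tool(**call["args"])


# A tool call such as an LLM agent might emit (values are made up)
call = {
    "tool": "meld",
    "args": {"bilirubin_mg_dl": 2.0, "inr": 1.5, "creatinine_mg_dl": 1.2},
}
print(run_tool_call(call))  # 15
```

Keeping the formula in deterministic code means the model only has to extract the right inputs and pick the right tool, which is the division of labour these papers evaluate.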

👥Cooperation

Back to Content

  • Error Detection in Medical Note through Multi Agent Debate (BioNLP, 2025) paper

    Utilizes a multi-agent debate framework where agents critically analyze medical notes to identify and reach consensus on documentation errors.

  • Multi-modal Medical Diagnosis via Large-small Model Collaboration (IEEE, 2025) paper

    Proposes a collaborative framework where large multi-modal models guide smaller, efficient models to improve diagnostic accuracy on resource-constrained devices.

  • MACD: Multi-Agent Clinical Diagnosis with Self-Learned Knowledge for LLM (arXiv, 2025) paper

    A multi-agent system where agents self-learn from historical diagnostic cases to build a shared knowledge base, enhancing collaborative decision-making.

  • MedSentry: Understanding and Mitigating Safety Risks in Medical LLM Multi-Agent Systems (arXiv, 2025) paper

    A comprehensive study and framework for identifying, categorizing, and mitigating safety risks (e.g., toxicity, bias) arising from agent interactions.

  • MedConMA: A Confidence-Driven Multi-agent Framework for Medical Q&A (Springer, 2025) paper

    Introduces a confidence-driven mechanism where agents weigh their contributions to the final answer based on their self-assessed certainty levels.

  • MDTeamGPT: A Self-Evolving LLM-based Multi-Agent Framework for Multi-Disciplinary Team Medical Consultation (arXiv, 2025) paper

    Simulates a Multi-Disciplinary Team (MDT) consultation where agents evolve their collaborative strategies over time to solve complex cancer cases.

⏫Self-evolution

Back to Content

  • Adaptive Knowledge Graphs Enhance Medical Question Answering: Bridging the Gap Between LLMs and Evolving Medical Knowledge (arXiv, 2025) paper

    (Previously listed as Agentic Medical Knowledge Graphs...) A framework that autonomously updates its knowledge graph using agentic search to reflect the latest medical research.

  • Large language model agents can use tools to perform clinical calculations (npj Digit. Med., 2025) paper

    Demonstrates that enabling LLM agents to autonomously identify the need for and use clinical calculators significantly reduces computational errors.

  • MACD: Multi-Agent Clinical Diagnosis with Self-Learned Knowledge for LLM (arXiv, 2025) paper

    (Also listed in Cooperation) Highlights the self-evolution aspect where the system improves its diagnostic logic through self-learned knowledge accumulation.

  • MedAgent-Pro: Towards Evidence-based Multi-modal Medical Diagnosis via Reasoning Agentic Workflow (arXiv, 2025) paper

    An advanced agentic workflow that iteratively gathers multimodal evidence and refines its reasoning path to provide evidence-based diagnoses.

  • Improving Self-training with Prototypical Learning for Source-Free Domain Adaptation on Clinical Text (BioNLP, 2024) paper

    Combines self-training with prototypical learning to adapt clinical NLP models to new hospitals or domains without accessing source data.

  • ReflecTool: Towards Reflection-Aware Tool-Augmented Clinical Agents (ACL Findings, 2025) paper

    Enables agents to "reflect" on their outputs and tool usage history, allowing them to self-correct and optimize their tool selection strategies.

  • TAMA: A Human-AI Collaborative Thematic Analysis Framework Using Multi-Agent LLMs for Clinical Interviews (arXiv, 2025) paper

    A multi-agent framework that assists researchers in performing thematic analysis of clinical interviews, learning from human feedback to improve coding quality.

🤖VWA

Back to Content

📊Planning

Back to Content

  • VITA: 'Carefully Chosen and Weighted Less' Is Better in Medication Recommendation (AAAI, 2024) paper, code

    Proposes a medication recommendation framework that prioritizes selecting the most critical drugs over comprehensive but redundant lists, improving safety.

  • EMRs2CSP: Mining Clinical Status Pathway from Electronic Medical Records (ACL Findings, 2025) paper

    Extracts Clinical Status Pathways (CSP) from EHRs to model the temporal progression of patient states, aiding in proactive clinical planning.

  • HealthBranches: Synthesizing Clinically-Grounded Question Answering Datasets via Decision Pathways (arXiv, 2025) paper

    Generates synthetic QA datasets by simulating clinical decision pathways (branches), ensuring the data reflects realistic diagnostic logic.

  • Streamlining evidence based clinical recommendations with large language models (npj Digit. Med., 2025) paper, code

    A comprehensive study on using LLMs to translate clinical questions directly into evidence-based recommendations, evaluating their utility in decision support.

  • CMQCIC-Bench: A Chinese Benchmark for Evaluating Large Language Models in Medical Quality Control Indicator Calculation (arXiv, 2025) paper

    Establishes a benchmark for calculating Medical Quality Control Indicators (MQCIs) from medical records, testing LLMs' ability to perform precise administrative planning.

  • Augmenting Black-box LLMs with Medical Textbooks for Biomedical Question Answering (arXiv, 2023) paper

    Enhances black-box LLMs by retrieving relevant context from trusted medical textbooks, improving the accuracy of biomedical planning and QA.

  • M-QALM: A Benchmark to Assess Clinical Reading Comprehension and Knowledge Recall in Large Language Models via Question Answering (ACL Findings, 2024) paper

    A benchmark designed to evaluate long-context clinical reading comprehension, essential for planning based on extensive patient history.

  • Listening to Patients: Detecting and Mitigating Patient Misreport in Medical Dialogue System (ACL Findings, 2025) paper

    Addresses the planning challenge where patients provide incorrect information, proposing a mechanism to detect and mitigate these misreports during dialogue.

  • Visual and Domain Knowledge for Professional-level Graph-of-Thought Medical Reasoning (ICML, 2025) paper

    Utilizes a Graph-of-Thought approach integrated with visual and domain knowledge to achieve professional-level reasoning in medical diagnostics.

  • MedPlan: A Two-Stage RAG-Based System for Personalized Medical Plan Generation (arXiv, 2025) paper

    Generates personalized treatment plans by first retrieving general guidelines and then adapting them to specific patient data in a two-stage process.

  • PIPA: A Unified Evaluation Protocol for Diagnosing Interactive Planning Agents (arXiv, 2025) paper

    A protocol for evaluating interactive agents on their ability to plan diagnostic inquiries and gather information efficiently.

  • RGAR: Recurrence Generation-augmented Retrieval for Factual-aware Medical Question Answering (arXiv, 2025) paper

    Introduces a recurrence mechanism where the model's own generation is used to refine subsequent retrieval queries for better factuality.

  • End-to-End Agentic RAG System Training for Traceable Diagnostic Reasoning (arXiv, 2025) paper

    Trains an end-to-end agentic system that not only diagnoses but also provides a traceable reasoning path linked to retrieved evidence.

  • Labeling-free RAG-enhanced LLM for intelligent fault diagnosis via reinforcement learning (Eng. Appl. Artif. Intell., 2025) paper

    [Methodology] Integrates RAG and RL for fault diagnosis without labeled data. (Note: Domain is primarily industrial fault diagnosis, not clinical).

  • The Helicobacter pylori AI-clinician harnesses artificial intelligence to personalise H. pylori treatment recommendations (Nat. Commun., 2025) paper

    An AI-clinician system that personalizes antibiotic treatment plans for H. pylori infection, significantly improving eradication rates.

  • Continual contrastive reinforcement learning: Towards stronger agent for environment-aware fault diagnosis of aero-engines through long-term optimization under highly imbalance scenarios (Eng. Appl. Artif. Intell., 2025) paper

    [Methodology] A reinforcement learning agent for diagnosing aero-engine faults. (Note: Domain is industrial engineering, not clinical).

  • Integration of Multi-Source Medical Data for Medical Diagnosis Question Answering (IEEE Access, 2024) paper

    Proposes a method to integrate heterogeneous medical data sources (text, structured data) to answer diagnostic questions more accurately.

  • Stage-Aware Hierarchical Attentive Relational Network for Diagnosis Prediction (IEEE JBHI, 2023) paper

    A hierarchical network that captures the stage-wise progression of diseases from EHR data for precise diagnosis prediction.

  • RULE: Reliable Multimodal RAG for Factuality in Medical Vision Language Models (arXiv, 2024) paper

    Enhances the factuality of medical VLMs by retrieving and grounding responses in reliable multimodal evidence during generation.

🧠Memory

Back to Content

  • M-QALM: A Benchmark to Assess Clinical Reading Comprehension and Knowledge Recall in Large Language Models via Question Answering (ACL Findings, 2024) paper

    Establishes a benchmark specifically designed to evaluate the clinical reading comprehension and long-term knowledge recall capabilities of LLMs.

  • PIPA: A Unified Evaluation Protocol for Diagnosing Interactive Planning Agents (arXiv, 2025) paper

    (Also listed in Planning) A protocol evaluating how agents manage diagnostic history and plan information-gathering steps in interactive scenarios.

  • EMRs2CSP: Mining Clinical Status Pathway from Electronic Medical Records (ACL Findings, 2025) paper

    (Also listed in Planning) Mines Clinical Status Pathways (CSP) to represent the temporal progression and memory of patient states from EHRs.

  • Medical Graph RAG: Evidence-based Medical Large Language Model via Graph Retrieval-Augmented Generation (ACL, 2025) paper

    Enhances LLM memory by integrating a medical knowledge graph into the RAG process, ensuring generation is grounded in structured evidence.

  • MedRAG: Enhancing Retrieval-augmented Generation with Knowledge Graph-Elicited Reasoning for Healthcare Copilot (arXiv, 2025) paper

    Uses knowledge graph-elicited reasoning to optimize the retrieval component, providing a more robust memory mechanism for healthcare copilots.

  • CardioTRAP: Design of a Retrieval Augmented System (RAG) for Clinical Data in Cardiology (IEEE Access, 2025) paper

    Designs a specialized RAG system for cardiology that effectively retrieves and utilizes patient-specific clinical data (memory) for decision support.

  • CLI-RAG: A Retrieval-Augmented Framework for Clinically Structured and Context Aware Text Generation with LLMs (arXiv, 2025) paper

    A RAG framework capable of handling structured clinical data and maintaining context awareness during long-text generation.

  • HI-DR: Exploiting Health Status-Aware Attention and an EHR Graph+ for Effective Medication Recommendation (AAAI, 2025) paper

    Utilizes a health status-aware attention mechanism and an enhanced EHR graph to capture patient history memory for precise medication recommendation.

  • Listening to Patients: Detecting and Mitigating Patient Misreport in Medical Dialogue System (ACL Findings, 2025) paper

    (Also listed in Planning) Focuses on verifying the reliability of patient-provided information (memory of symptoms) during medical dialogues.

  • End-to-End Agentic RAG System Training for Traceable Diagnostic Reasoning (arXiv, 2025) paper

    (Also listed in Planning) Trains an agentic system where the retrieval (memory) and reasoning components are optimized end-to-end for traceability.

🧰Action

Back to Content

  • CardioTRAP: Design of a Retrieval Augmented System (RAG) for Clinical Data in Cardiology (IEEE Access, 2025) paper

    (Also listed in Memory) Designs a specialized RAG system tailored for cardiology that retrieves and processes patient-specific clinical data to support cardiologist decision-making.

  • CLI-RAG: A Retrieval-Augmented Framework for Clinically Structured and Context Aware Text Generation with LLMs (arXiv, 2025) paper

    (Also listed in Memory) Introduces a RAG framework capable of handling the complex structure and context of clinical texts, enabling more accurate medical report generation.

  • HI-DR: Exploiting Health Status-Aware Attention and an EHR Graph+ for Effective Medication Recommendation (AAAI, 2025) paper

    (Also listed in Memory) Uses an action-oriented recommendation engine that leverages health status-aware attention and EHR graphs to prescribe medications.

  • Medical Graph RAG: Evidence-based Medical Large Language Model via Graph Retrieval-Augmented Generation (ACL, 2025) paper

    (Also listed in Memory) Enhances the retrieval action by utilizing a medical knowledge graph to ground LLM generations in structured, evidence-based facts.

  • MedRAG: Enhancing Retrieval-augmented Generation with Knowledge Graph-Elicited Reasoning for Healthcare Copilot (arXiv, 2025) paper

    (Also listed in Memory) Optimizes the retrieval action through knowledge graph-elicited reasoning, improving the relevance and accuracy of information provided by healthcare copilots.

  • KPL: Training-Free Medical Knowledge Mining of Vision-Language Models (arXiv, 2025) paper

    Proposes a training-free method to actively mine and extract medical knowledge hidden within pre-trained Vision-Language Models.

  • End-to-End Agentic RAG System Training for Traceable Diagnostic Reasoning (arXiv, 2025) paper

    (Also listed in Planning and Memory) Trains an agentic system to perform end-to-end diagnostic actions where every reasoning step is traceable to a specific retrieved document.

  • SearchRAG: Can Search Engines Be Helpful for LLM-based Medical Question Answering? (arXiv, 2025) paper

    Investigates the utility of integrating commercial search engine actions into the RAG pipeline to supplement internal knowledge bases for medical QA.

  • Enhancing medical information retrieval: Re-engineering the tala-med search engine for improved performance and flexibility (BMC Med. Inform. Decis. Mak., 2025) paper

    Details the re-engineering of the 'tala-med' search engine, optimizing its architecture for more flexible and high-performance medical information retrieval.

  • Designing a Distributed LLM-Based Search Engine as a Foundation for Agent Discovery (IEEE, 2025) paper

    Proposes a distributed architecture for LLM-based search that serves as a foundational layer for autonomous agents to discover and access medical knowledge.

  • How the Algorithmic Transparency of Search Engines Influences Health Anxiety: The Mediating Effects of Trust in Online Health Information Search (CHI, 2025) paper

    A user study analyzing how the transparency of search engine algorithms affects user trust and health anxiety during online health information seeking.

  • Transforming Medical Data Access: The Role and Challenges of Recent Language Models in SQL Query Automation (MIPRO, 2024) paper

    Evaluates the capability of LLMs to automate SQL query generation (Text-to-SQL), facilitating easier access to medical databases for non-technical users.

  • Improving Interactive Diagnostic Ability of a Large Language Model Agent Through Clinical Experience Learning (arXiv, 2025) paper

    (Also listed in Self-evolution) Enhances the agent's diagnostic actions by allowing it to learn from simulated clinical experiences and feedback.

  • Designing VR Simulation System for Clinical Communication Training with LLMs-Based Embodied Conversational Agents (arXiv, 2025) paper

    Integrates LLM-based embodied agents into a Virtual Reality simulation to train medical students in clinical communication actions.

👥Cooperation

Back to Content

  • Enhancing Clinical Trial Patient Matching through Knowledge Augmentation and Reasoning with Multi-Agent (arXiv, 2024) paper

    Introduces MAKA, a multi-agent framework that improves patient-trial matching by dynamically augmenting criteria with domain knowledge and performing structured reasoning.

  • TeamMedAgents: Enhancing Medical Decision-Making of LLMs Through Structured Teamwork (arXiv, 2025) paper

    Integrates the "Big Five" human teamwork components (e.g., leadership, trust) into a multi-agent system to systematically improve medical decision-making.

  • ClinicalLab: Aligning Agents for Multi-Departmental Clinical Diagnostics in the Real World (ACL Findings, 2025) paper, code

    Presents a comprehensive suite for aligning and evaluating medical agents across 24 clinical departments, featuring a realistic benchmark (ClinicalBench).

  • The Optimization Paradox in Clinical AI Multi-Agent Systems (arXiv, 2025) paper

    Reveals a paradox where systems built from individually optimized "best-of-breed" components underperform due to poor information flow, advocating for end-to-end system validation.

⏫Self-evolution

Back to Content

  • EvoAgentX: An Automated Framework for Evolving Agentic Workflows (arXiv, 2025) paper, code

    An open-source platform that automates the generation and evolutionary optimization of multi-agent workflows using algorithms like TextGrad and AFlow.

  • MetaAgent: Toward Self-Evolving Agent via Tool Meta-Learning (arXiv, 2025) paper, code

    Proposes an agent that evolves through "learning by doing," autonomously creating tools and building a knowledge base from its own experiences.

  • ZERA: Zero-init Instruction Evolving Refinement Agent (EMNLP, 2025) paper, code

    An automated prompt optimization agent that evolves structured prompts from zero initial instructions using principle-based self-correction.

  • HealthBranches: Synthesizing Clinically-Grounded Question Answering Datasets via Decision Pathways (arXiv, 2025) paper

    (Also listed in Planning) A benchmark generation framework that synthesizes QA datasets from clinical decision pathways to test complex reasoning.

  • A Survey of Self-Evolving Agents: On Path to Artificial Super Intelligence (arXiv, 2025) paper

    A comprehensive survey categorizing self-evolving agents by what (model/tool/context), when, and how they evolve, positioning them as a path to ASI.

  • Evolving Collective Cognition in Human-Agent Hybrid Societies: How Agents Form Stances and Boundaries (CogSci, 2025) paper

    Investigates the emergence of collective cognition and social boundaries in hybrid societies where humans and self-evolving agents interact.

⭐ Star History of Awesome-Agentic-Clinical-Dialogue

Back to Content

Star History Chart

🤝 Contributing

Back to Content

Your contributions are always welcome! Please contact Xiaoquan Zhi or Chuang Zhao.

✍️ Citation

Back to Content

If you find this repository useful for your research, please cite our paper:

@article{zhi2025reinventing,
  title={Reinventing Clinical Dialogue: Agentic Paradigms for LLM-Enabled Healthcare Communication},
  author={ADM Lab},
  journal={arXiv preprint arXiv:2512.01453},
  year={2025}
}

