GitHub - xqz614/Awesome-Agentic-Clinical-Dialogue: Resource collection of medical agent for clinical dialogue and health


Welcome to Awesome-Agentic-Clinical-Dialogue. This repo includes papers about methods related to agentic clinical dialogue. We believe that the agentic paradigm is still a largely unexplored area, and we hope this repository will provide you with some valuable insights!
Read our survey paper here: Reinventing Clinical Dialogue: Agentic Paradigms for LLM‑Enabled Healthcare Communication
Courses&Tutorial (🤟Check it out!) • Papers • Datasets • Leading Group

📘 Overview

This framework facilitates a systematic analysis of the intrinsic trade-offs between creativity and reliability by categorizing methods into four archetypes: Latent Space Clinicians, Emergent Planners, Grounded Synthesizers, and Verifiable Workflow Automators. For each paradigm, we deconstruct the technical realization across the entire cognitive pipeline, encompassing strategic planning, memory management, action execution, collaboration, and evolution, to reveal how distinct architectural choices balance the tension between autonomy and safety. Furthermore, we bridge abstract design philosophies with the pragmatic implementation ecosystem. By mapping real-world applications to our taxonomy and systematically reviewing benchmarks and evaluation metrics specific to clinical agents, we provide a comprehensive reference for future development.

📁 Table of Contents

Key Categories
Start with Awesome Dataset
Tutorial and Courses
Leading Group
Awesome Methods, Model and Resource List
- LSC
- EP
- GS
  - Planning
  - Memory
  - Action
  - Cooperation
  - Self-evolution
- VWA
  - Planning
  - Memory
  - Action
  - Cooperation
  - Self-evolution
Contributing
Citation

🔑 Key Categories

🤖Latent Space Clinicians (LSC). These agents leverage the LLM's vast internal knowledge for creative synthesis and forming a coherent understanding of a clinical situation. Their philosophy is to trust the model's emergent reasoning capabilities to function like an experienced clinical assistant providing insights. For example, the zero/few-shot reasoning capabilities of Med-PaLM or MedAgents exemplify this paradigm.
🤖Emergent Planners (EP). This paradigm grants the LLM a high degree of autonomy, allowing it to dynamically devise its own multi-step plan to achieve a complex clinical goal. The agent's behavior is emergent, as it independently determines the necessary steps and goals. Frameworks like AgentMD, which uses ReAct-style prompting, exemplify this paradigm.
🤖Grounded Synthesizers (GS). These agents operate under the principle that LLMs should function as powerful natural language interfaces to reliable external information rather than as knowledge creators. Their primary role is to retrieve, integrate, and accurately summarize information from verifiable sources like medical databases or imaging data. Exemplars include foundational medical retrieval and indexing techniques such as Med-RAG and MA-COIR.
🤖Verifiable Workflow Automators (VWA). In this paradigm, agent autonomy is strictly constrained within pre-defined, verifiable clinical workflows or decision trees. The LLM acts as a natural language front-end to a structured process, executing tasks rather than making open-ended decisions, which ensures maximum safety and predictability. This approach is exemplified by commercial triage bots, the structured conversational framework of systems like Google's AMIE, and principles from classic task-oriented dialogue systems such as MeDi-TODER.
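
To make the VWA idea concrete, here is a minimal sketch of an agent whose autonomy is limited to a pre-defined decision tree. Everything in it is an assumption made for illustration: the `WORKFLOW` table, its questions and outcomes, and the keyword-based `normalize` function (a stand-in for the LLM front-end) are hypothetical and are not taken from any system listed in this repo. The toy questions are not clinical guidance.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Step:
    prompt: str                  # question the agent is allowed to ask
    transitions: Dict[str, str]  # allowed answer -> next step id or "OUTCOME:<label>"


# Hypothetical toy workflow, for illustration only (not clinical guidance).
WORKFLOW: Dict[str, Step] = {
    "start": Step("Are you having chest pain?",
                  {"yes": "OUTCOME:urgent", "no": "fever"}),
    "fever": Step("Do you have a fever?",
                  {"yes": "OUTCOME:routine", "no": "OUTCOME:self_care"}),
}

YES_WORDS = {"yes", "y", "yeah", "yep"}
NO_WORDS = {"no", "n", "nope"}


def normalize(utterance: str) -> Optional[str]:
    """Stand-in for the LLM front-end: map free text onto an allowed answer."""
    words = utterance.lower().replace(",", " ").replace(".", " ").split()
    if any(w in YES_WORDS for w in words):
        return "yes"
    if any(w in NO_WORDS for w in words):
        return "no"
    return None  # unrecognized input; the workflow will re-ask


def run_workflow(utterances: List[str]) -> Tuple[Optional[str], List[Tuple[str, str]]]:
    """Walk the fixed workflow; return (outcome, audit trace)."""
    state = "start"
    trace: List[Tuple[str, str]] = []
    for utterance in utterances:
        step = WORKFLOW[state]
        answer = normalize(utterance)
        if answer not in step.transitions:
            # Stay on the same step instead of improvising a new path.
            trace.append((state, "unrecognized"))
            continue
        trace.append((state, answer))
        target = step.transitions[answer]
        if target.startswith("OUTCOME:"):
            return target.split(":", 1)[1], trace
        state = target
    return None, trace  # ran out of input before reaching an outcome


if __name__ == "__main__":
    outcome, trace = run_workflow(["nope", "maybe?", "yes I think"])
    print(outcome, trace)
```

The language component only maps free text onto the answers a step permits, so every path through the dialogue can be enumerated ahead of time and the returned trace can be audited afterwards.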


✳️Start with Awesome Dataset

I. QA Dialogue

Dataset Name	Time (Pub)	Downstream Task	Brief Description
MedQA	2020	Medical Examination (QA)	Large-scale multiple-choice questions collected from professional medical board exams (USMLE, Mainland China, Taiwan).
MedMCQA	2022	Medical Examination (QA)	Large-scale, multiple-choice QA dataset derived from Indian medical entrance examinations (AIIMS/NEET).
cMedQA2	2019	QA / Retrieval	Chinese medical QA dataset with queries and answers from online health counseling platforms.
CMExam	2023	Medical Examination (QA)	60K+ multiple-choice questions from the Chinese National Medical Licensing Examination with detailed annotations.
Medbullets	2024	Medical Examination (QA)	High-quality USMLE Step 2 & 3 style questions with expert-written explanations for reasoning evaluation.
HeadQA	2019	Medical Examination (QA)	Multiple-choice questions from Spanish healthcare exams (MIR, EIR, etc.) for testing complex reasoning.
MedCalc-Bench	2024	QA / Calculation	A dataset focusing on the "computational" aspect of medicine (formulas, risk scores) which LLMs typically struggle with.
RJUA-MedDQA	2024	Multimodal QA	A benchmark for document-level medical reasoning, requiring models to interpret text, tables, and images in reports.
TeleQnA	2024	Telemedicine QA	Real-world style doctor-patient QA benchmark aimed at evaluating LLMs in telemedicine scenarios.
CareQA	2025	Medical Examination	Sourced from Spanish specialized healthcare exams (MIR), offering both closed-ended and open-ended evaluation formats.
Huatuo-26M	2023	Medical Examination / QA	Massive Chinese medical QA dataset with 26 million QA pairs, used for pre-training and retrieval.
MultiMedQA	2022	QA / Evaluation	The benchmark suite used for Med-PaLM, including HealthSearchQA, LiveQA, and others for safety & accuracy.
CasiMedicos-Arg	2024	Medical Examination (QA)	Multilingual dataset (ES, EN, FR, IT) annotated with explanatory argumentative structures for clinical cases.
PubMedQA	2019	Literature-based QA	Biomedical QA task to answer "yes/no/maybe" from PubMed abstracts.
CliCR	2018	Literature-based QA	Dataset of clinical case reports designed for machine reading comprehension.
MEDIQA-2019	2019	Literature-based QA	Shared task data focusing on NLI, RQE (Recognizing Question Entailment), and QA in the medical domain.
BioASQ	2013-2023	Literature-based QA	Long-running challenge series for large-scale biomedical semantic indexing and question answering.
Medical Meadow	2023	Literature-based / Tuning	A collection of various medical tasks (Flashcards, Wikidoc) reformatted for instruction tuning.
MASH-QA	2020	Consumer Health QA	Multiple-span extraction QA dataset for consumer health questions (e.g., from WebMD).
HealthQA	2019	Consumer Health QA	Dataset focusing on reliability and helpfulness of health answers.
AfriMed-QA	2025	Domain-specific QA	Pan-African medical QA benchmark (15k Qs) covering 32 specialties and local context from 16 countries.
MedCalc-Bench	2024	Domain-specific QA	Benchmark for evaluating LLMs on medical calculations (formulas, scores) with patient notes.
MedHallu	2025	QA / Hallucination	10k Q-A pairs derived from PubMedQA annotated to detect and categorize medical hallucinations.
MedicationQA	2019	Consumer Health QA	Dataset of consumer questions about medications (drug interactions, dosage) with expert answers.
RJUA-MedDQA	2024	QA / Multimodal	Multimodal benchmark for medical document understanding (images/reports) and clinical reasoning.
MedExQA	2025	QA / Explanation	Dataset covering five medical specialties, with an explanation for each QA pair.
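
Many of the examination-style datasets above are scored as multiple-choice accuracy. The sketch below shows one simple way to do that. It assumes the model was prompted to reply in the form "Answer: X" with options A to E; the function names and the sample responses are hypothetical, and this is not the official evaluation script of any dataset in the table.

```python
import re
from typing import Iterable, Optional, Tuple

# Matches "Answer: B", "answer is c", "Answer: (D)" and similar phrasings.
_ANSWER = re.compile(r"answer(?:\s+is)?\s*:?\s*\(?([A-E])\b", re.IGNORECASE)


def extract_choice(response: str) -> Optional[str]:
    """Return the option letter stated in a model response, or None."""
    match = _ANSWER.search(response)
    return match.group(1).upper() if match else None


def accuracy(pairs: Iterable[Tuple[str, str]]) -> float:
    """Fraction of (response, gold_letter) pairs answered correctly.

    Responses with no parseable answer count as wrong.
    """
    total = 0
    correct = 0
    for response, gold in pairs:
        total += 1
        if extract_choice(response) == gold.upper():
            correct += 1
    if total == 0:
        raise ValueError("no examples to score")
    return correct / total


if __name__ == "__main__":
    # Made-up responses for illustration only.
    samples = [
        ("Answer: B", "B"),
        ("The answer is c.", "C"),
        ("I am not sure", "A"),
        ("Answer: (D)", "A"),
    ]
    print(accuracy(samples))
```

Counting unparseable responses as wrong is a deliberate choice here: it keeps the denominator fixed, so a model cannot raise its score by declining to answer.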

II. Task-oriented Dialogue

Dataset Name	Time (Pub)	Downstream Task	Brief Description
MedReason	2025	Symptom Diagnosis	Large-scale medical reasoning dataset designed to enable explainable medical problem-solving.
MedDialog	2020	Symptom Diagnosis	Massive dataset (English/Chinese) of doctor-patient conversations scraped from online platforms.
DialoAMC	2023	Symptom Diagnosis	Dataset for Automated Medical Consultation focusing on symptom elicitation and diagnosis.
MedDG	2022	Symptom Diagnosis	High-quality entity-annotated medical dialogue dataset for diagnosis and treatment recommendation.
MZ (Muzhi)	2018	Symptom Diagnosis	Chinese medical dialogue dataset from the "Muzhi" platform for self-diagnosis agents.
CMDD	2019	Symptom Diagnosis	Chinese Medical Diagnostic Dialogue dataset (Pediatrics) with symptom-disease mappings.
DDXPlus	2022	Symptom Diagnosis	Large-scale automated diagnosis dataset with pathology-driven probabilistic generation (1.3M patients).
DX (DXY)	2019	Symptom Diagnosis	Diagnostic dataset from DXY.cn, containing dialogue sessions with explicit symptom transitions.
CovidDialog	2020	Symptom Diagnosis (COVID)	Dialogues specifically regarding COVID-19 consultations, scraped during the pandemic.
Ext-CovidDialog	2023	Symptom Diagnosis (COVID)	Extended version of CovidDialog with more data covering evolving variants and scenarios.
IMCS-21	2021	Symptom Diagnosis	Interactive Medical Consultation System dataset; focuses on multi-turn diagnostic dialogue.
BC5CDR	2015	Entity Recognition	BioCreative V task dataset for Chemical-Disease Relation extraction (NER/RE).
NCBI-Disease	2014	Entity Recognition	Corpus of PubMed abstracts annotated with disease mentions for NER.
PHEE	2022	Entity Extraction	Pharmacovigilance Event Extraction dataset for identifying adverse drug events from text.
MedAlign	2023	Instruction Following	Clinician-generated dataset for instruction following (summaries, questions) based on EHR data.
MedInstruct	2023	Instruction Following	Dataset of 52k medical instructions constructed from existing datasets (e.g., MedQA) for tuning.
BianqueCorpus	2023	Instruction Following	Large-scale multi-turn Chinese health conversation dataset with balanced questioning/suggestions.
MedSynth	2025	Generation / Summarization	Synthetic medical dialogue-note pairs designed to advance dialogue-to-note and note-to-dialogue tasks.
MeQSum	2019	Summarization / Instruction	Dataset for summarizing consumer health questions into canonical medical questions.

III. Recommendation Dialogue

Dataset Name	Time (Pub)	Downstream Task	Brief Description
DialMed	2022	Recommendation (Drug)	Dialogue dataset designed for medication recommendation based on patient history/dialogue.
HealthCareMagic	2023	Treatment Rec / QA	Massive dataset (100k) of real patient queries and doctor responses, explicitly containing treatment recommendations.
iCliniq	2023	Treatment Rec / QA	10k highly curated doctor-patient dialogues focusing on providing medical advice and recommendations.
ReMeDi	2021	Recommendation	"Resources for Medical Dialogue"; focuses on movie/medical recommendation scenarios.
MIMIC-III	2016	Database (Source)	Large database of de-identified health-related data (EHRs) used to construct recommendation tasks.
DrugBank	-	Knowledge Base (Source)	Comprehensive database containing information on drugs and drug targets, used for grounding recommendations.
ProKnow-data	2020	Recommendation	Data used for proactive knowledge-grounded dialogue, often adapted for medical contexts.
DDInter	2024 (Upd)	Drug Safety / KB	Comprehensive Drug-Drug Interaction database; critical for agents to verify safety before recommending medication.
PromptCBLUE	2024	Rec / Classification	A unified benchmark where specific subtasks focus on recommending medical departments or classifying medical intents.
CMtMed	2024	Hybrid / Treatment Rec	Large-scale Chinese Multi-turn Medical dialogue dataset containing explicit "Medical Advice" and treatment plan slots.

IV. Supportive Dialogue

Dataset Name	Time (Pub)	Downstream Task	Brief Description
EmpatheticDialogues	2019	General Empathetic	Large dataset of 25k conversations grounded in emotional situations (general domain).
CPsyCoun	2024	Mental Health Support	A high-quality, multi-turn dialogue dataset reconstructed from psychological consulting reports for realistic counseling.
PsySafe	2024	Mental Health / Safety	Focuses on the safety aspect of supportive agents, identifying risky or toxic responses in mental health dialogue.
MELD	2019	General Empathetic	Multimodal EmotionLines Dataset; textual/audio/visual emotion recognition.
PsyQA	2021	Mental Health Support	Chinese dataset of psychological health support (Q&A) with strategy annotations.
ESConv	2021	Mental Health Support	Emotional Support Conversation dataset designed to train agents in empathy and support strategies.
SoulChat-Corpus	2023	Mental Health Support	Large-scale Chinese dataset for single-turn and multi-turn empathetic psychological counseling.
MTS-Dialogue	2023	Clinical Support/Summ.	1.7k doctor-patient conversations paired with corresponding clinical note summaries.
SMILECHAT	2023	Mental Health Support	Dataset for mental health support focusing on cognitive distortion detection and reframing.

V. Hybrid Function

Dataset Name	Time (Pub)	Downstream Task	Brief Description
MidMed	2023	Hybrid (Diag/Rec/Chat)	Mixed-type dialogue corpus covering diagnosis, recommendation, QA, and chitchat in one session.
MedEval	2023	Evaluation Benchmark	Multi-level, multi-task benchmark spanning 35 body regions and 8 exam modalities for LLM eval.
MedTrinity-25M	2024	Multimodal / Hybrid	Massive multimodal dataset (25M images) with multigranular annotations (Image-ROI-Text).
MENTAT	2025	Mental Health / Hybrid	Clinician-annotated benchmark for complex psychiatric decision-making (diagnosis, triage, etc.).
MedAlpaca	2023	Instruction Tuning	Collection of datasets (see Medical Meadow) used to train the MedAlpaca model series.
NoteChat	2023	Generation / Hybrid	Synthetic patient-physician conversations conditioned on clinical notes (Note-to-Dialogue).

⛪ Leading Group

Institution	Leading Researcher/Group	Source
Google	Google Health	Homepage
NIH	Zhiyong Lu	Homepage
OpenAI	Health AI Team	Homepage
Ant Group	AI for Science Team	Homepage
Alibaba	Tongyi Lab, Damo, AQ-Med Lab	Homepage, Homepage, Homepage
Shanghai AI Lab	AI for Science Team, AI4Med Team	Homepage, Homepage
Baichuan AI	AI Lab	Homepage
Meta	FAIR Team	Homepage
Tencent	Jarvis Lab, Xiaobin Hu	Homepage, Homepage
Huawei	NoAH	Homepage
ByteDance	Seed, AI for Science Team	Homepage
Microsoft Research	Hoifung Poon	Homepage
Harvard	Xiang Li, Faisal Mahmood Lab, Pranav Rajpurkar, Tianxi Cai	Homepage, Homepage, Homepage, Homepage
Maryland	Hanan Samet	Homepage
MIT	Paul Liang, Peter Szolovits	Homepage, Homepage
Oxford	Tingting Zhu, David A. Clifton, Alison Noble	Homepage, Homepage, Homepage
Cambridge	van der Schaar Lab, Andreas Vlachos	Homepage, Homepage
NTU	Chunyan Miao	Homepage
Tsinghua University	Yang Liu, Hong-Yu Zhou, Weizhi Ma, Medical Informatics Lab	Homepage, Homepage, Homepage, Homepage
SJTU	Chaoyi Wu, Weidi Xie, MAGIC	Homepage, Homepage, Homepage
UNC	Tianlong Chen, Huaxiu Yao	Homepage, Homepage
Yale	Clinical NLP Lab	Homepage
UBC	Xiaoxiao Li	Homepage
UIUC	Jimeng Sun, Jiawei Han	Homepage, Homepage
ZJU	DCDmllm, Jian Wu	Homepage, Homepage
Notre Dame	SCLab	Homepage
Pennsylvania	Tianyu Han, Fenglong Ma, Lyle Ungar	Homepage, Homepage, Homepage
Emory	Carl Yang	Homepage
Stanford	SNAP, James Zou, Yejin Choi	Homepage, Homepage, Homepage
PKU	Liantao Ma, Yasha Wang	Homepage
TJU	ADM Group	Homepage
Edinburgh	Ewen M Harrison	Homepage
Virginia	Aidong Zhang, Xuan Wang	Homepage, Homepage
CUHK	Freedom AI, YuanWu, Michael R. Lyu, Benyou Wang	Homepage, Homepage, Homepage, Homepage
CityU	Xiangyu Zhao	Homepage
Houston Methodist	Wang Lab	Homepage
MBZUAI	Jianing Qiu	Homepage
DKFZ	German Cancer Research Center	Homepage
California	Yuyin Zhou	Homepage
ETH	Michael Moor	Homepage
Johns Hopkins	Suchi Saria	Homepage
Cornell	Fei Wang, Claire Cardie	Homepage, Homepage
GE Healthcare	Xiao Cao	Homepage
Rutgers	Mu Zhou	Homepage
UT	Ying Ding, Wenqi Shi	Homepage, Homepage
UC Berkeley	Bin Yu	Homepage
UW	Hannaneh Hajishirzi	Homepage
LMU Munich	Volker Tresp	Homepage
SBU	Chenyu You	Homepage
Fudan	Zhongyu Wei	Homepage
Minnesota	Rui Zhang	Homepage
Monash	AIM Lab	Homepage
USYD	Med AI Lab	Homepage
Queensu	Medi Lab	Homepage
Open Source Platform	OpenMed Lab	Homepage


