Vision-Language Models Encode Clinical Guidelines for Concept-Based Reasoning in Medical Image Analysis

Abstract

Concept Bottleneck Models (CBMs) are a prominent framework for interpretable AI that map learned visual features onto a set of meaningful concepts, to be used for task-specific downstream predictions. Their sequential structure enhances transparency by connecting model predictions to the underlying concepts that support them. In medical imaging, where transparency is essential, CBMs offer an appealing foundation for explainable model design. However, their discrete concept representations overlook broader clinical context such as diagnostic guidelines and expert heuristics, reducing reliability in complex cases. We propose MedCBR, a concept-based reasoning framework that integrates clinical guidelines with vision–language and reasoning models. Labeled clinical descriptors are transformed into guideline-conformant text, and a concept-based model is trained with a multi-task objective combining multi-modal contrastive alignment, concept supervision, and diagnostic classification to jointly ground image features, concepts, and pathology. A reasoning model then converts these predictions into structured clinical narratives that explain the diagnosis, emulating expert reasoning based on established guidelines. MedCBR achieves superior diagnostic and concept-level performance, with AUROCs of 94.2% on ultrasound and 84.0% on mammography. Further experiments on non-medical datasets reach 86.1% accuracy. Our framework enhances interpretability and forms an end-to-end bridge from medical image analysis to decision-making.

Dataset setup

BUSBRA & BrEaST

Download both datasets
Move the images into the correct directory
Preprocess the data by cropping to lesion ROIs

utils/preprocess.py -d busbra -src <src_directory> -dst <target_directory>
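For readers who want a sense of what cropping to a lesion ROI involves, the following is a minimal sketch, not the logic of utils/preprocess.py itself. It assumes each image comes with a binary lesion mask of the same height and width; the function name crop_to_roi and the margin parameter are hypothetical.

```python
import numpy as np


def crop_to_roi(image, mask, margin=0):
    """Crop `image` to the bounding box of the nonzero region of `mask`.

    Hypothetical helper: assumes `mask` is a 2D array aligned with `image`.
    """
    row_idx = np.flatnonzero(mask.any(axis=1))  # rows containing lesion pixels
    col_idx = np.flatnonzero(mask.any(axis=0))  # columns containing lesion pixels
    if row_idx.size == 0:
        return image  # no lesion annotated; leave the image unchanged

    # Expand the box by `margin` pixels, clamped to the image bounds.
    top = max(int(row_idx[0]) - margin, 0)
    bottom = min(int(row_idx[-1]) + margin + 1, image.shape[0])
    left = max(int(col_idx[0]) - margin, 0)
    right = min(int(col_idx[-1]) + margin + 1, image.shape[1])
    return image[top:bottom, left:right]
```

A small margin keeps some surrounding tissue in the crop, which may or may not match what the repository's preprocessing does.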

CBIS-DDSM

Download the dataset
Preprocess the data by cropping to lesion ROIs, resizing, and grouping ROIs by patient
Move the images into the correct directory
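The grouping step above can be sketched as follows. This is an illustration under stated assumptions, not the repository's preprocessing code: it assumes ROI file names contain a patient identifier of the form P_ followed by digits (as in typical CBIS-DDSM names), and group_by_patient is a hypothetical helper.

```python
import re
from collections import defaultdict

# Assumed ID pattern, e.g. "P_00001"; adjust if your file names differ.
PATIENT_RE = re.compile(r"P_\d+")


def group_by_patient(paths):
    """Map each patient ID to the list of ROI paths that mention it."""
    groups = defaultdict(list)
    for path in paths:
        match = PATIENT_RE.search(path)
        key = match.group(0) if match else "unknown"
        groups[key].append(path)
    return dict(groups)
```

Grouping by patient makes it possible to keep all ROIs from one patient in the same split, which avoids leaking a patient's images across training and test folds.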

CUB-200-2011

Download the data
Move the images into the correct directory

Model Training

Our code base supports the following models. Any model not mentioned here was trained using code provided by its creators.

Supported Models

Black-box Vision Models:

CLIP RN50
CLIP ViT-B/32
CLIP ViT-L/14
SigLIP
BiomedCLIP

Concept-based models:

Original CBM (Koh et al., 2020)
CLIP CBM

To run a model using k-fold cross-validation, we used:

scripts/run_job.sh -y <yaml_name> -g <wandb_group_name> -f <fold> -t <time>

To run a model using different random seeds, we used:

scripts/run_job.sh -y <yaml_name> -g <wandb_group_name> -s <seed> -t <time>
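When running every fold of a cross-validation experiment, it can help to generate the commands programmatically. The sketch below only builds and prints the command lines using the flags shown above. The fold count, zero-based fold indices, group name, and time format in the example are assumptions, since the README does not specify them.

```python
import shlex


def build_fold_commands(yaml_name, group, folds, time_limit):
    """Return one scripts/run_job.sh argument list per fold."""
    return [
        ["scripts/run_job.sh", "-y", yaml_name, "-g", group,
         "-f", str(fold), "-t", time_limit]
        for fold in folds
    ]


if __name__ == "__main__":
    # Hypothetical values: 5 folds, a made-up group name and time limit.
    for cmd in build_fold_commands("busbra/medcbr", "medcbr_cv", range(5), "12:00:00"):
        print(shlex.join(cmd))  # dry run; pass `cmd` to subprocess.run to launch
```

Printing the commands first gives a dry run that can be checked before anything is submitted.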

Unsupported Models

The following models must be cloned from their respective repositories and run according to the instructions provided by their authors:

PCBM, PCBMh (Yuksekgonul et al., ICLR 2023)
Label-free CBM (Oikarinen et al., ICLR 2023)
AdaCBM (Chowdhury et al., MICCAI 2024)

Synthetic Report Generation

To generate a synthetic report, we run the following code:

scripts/run_job.sh -y <dataset>/qwen2vl -g <wandb_group_name> -f <fold> -t <time>

This runs the Qwen2.5VL LVLM on the test set of the given fold. The three YAMLs busbra/qwen2vl, ddsm/qwen2vl, and cub/qwen2vl specify the hyperparameters for this run.

The src/run_qwenvl.py file contains the logic used to run the LVLM on the data to generate reports. The model and prompt are both implemented in src/models/qwen.py, and the guidelines are recorded in src/utils/clinical_guideline.py (although some of the guidelines are non-clinical).
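To illustrate how labeled descriptors and a guideline might be combined into a prompt, here is a hedged sketch. It is not the prompt in src/models/qwen.py: the function build_report_prompt, the prompt wording, and the example inputs are all hypothetical.

```python
def build_report_prompt(descriptors, guideline):
    """Compose a text prompt asking an LVLM to report findings in guideline terms.

    Hypothetical helper; `descriptors` maps a descriptor name to its labeled value.
    """
    findings = "\n".join(f"- {name}: {value}" for name, value in descriptors.items())
    return (
        "Guideline excerpt:\n"
        f"{guideline}\n\n"
        "Labeled descriptors:\n"
        f"{findings}\n\n"
        "Write a short report that describes these findings "
        "using the guideline's terminology."
    )


if __name__ == "__main__":
    # Illustrative inputs only; not taken from the repository.
    print(build_report_prompt(
        {"shape": "oval", "margin": "circumscribed", "orientation": "parallel"},
        "Describe shape, margin, and orientation of the lesion.",
    ))
```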

CLIP Concept Training

To train a multi-task concept model using CLIP, we use the following code:

scripts/run_job.sh -y <dataset>/medcbr -g <wandb_group_name> -f <fold> -t <time>

This will train a CLIP model with a ViT-L/14 backbone on the data. The hyperparameters are found in the YAMLs busbra/medcbr, ddsm/medcbr, and cub/medcbr. In particular, the hyperparameter

clip:
    ...
    use_llm_output: true
    ...

is responsible for enabling CLIP-based training on the guideline-conformant reports generated by the LVLM. The weights for the multi-task loss are also recorded as:

  clip_weight: 1.0
  det_weight: 5.0
  concept_weight: 1.0

To train a CLIP CBM baseline, use the following script:

scripts/run_job.sh -y <dataset>/clip_cbm -g <wandb_group_name> -f <fold> -t <time>

which has the following values in the YAML:

  clip_weight: 0.0
  det_weight: 1.0
  concept_weight: 1.0
  use_llm_output: false
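The three weights suggest a weighted sum of a contrastive term, a diagnostic term, and a concept term. The sketch below shows one way such an objective could be computed with NumPy. It makes several assumptions: that det_weight scales the diagnostic classification loss, that the contrastive term is a CLIP-style symmetric InfoNCE loss, and that concepts use binary cross-entropy. None of the function names come from the repository.

```python
import numpy as np


def log_softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for stability
    return x - np.log(np.exp(x).sum(axis=axis, keepdims=True))


def contrastive_loss(img_emb, txt_emb, temperature=0.07):
    """Symmetric InfoNCE over a batch of paired image/text embeddings."""
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    txt = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)
    logits = img @ txt.T / temperature
    idx = np.arange(len(logits))  # matching pairs sit on the diagonal
    loss_i2t = -log_softmax(logits, axis=1)[idx, idx].mean()
    loss_t2i = -log_softmax(logits, axis=0)[idx, idx].mean()
    return 0.5 * (loss_i2t + loss_t2i)


def bce_with_logits(logits, targets):
    """Mean binary cross-entropy, written in a numerically stable form."""
    return np.mean(
        np.maximum(logits, 0) - logits * targets + np.log1p(np.exp(-np.abs(logits)))
    )


def cross_entropy(logits, labels):
    """Mean multi-class cross-entropy for integer class labels."""
    return -log_softmax(logits)[np.arange(len(labels)), labels].mean()


def multitask_loss(l_clip, l_det, l_concept,
                   clip_weight=1.0, det_weight=5.0, concept_weight=1.0):
    """Weighted sum of the three terms; defaults mirror the medcbr YAML values."""
    return clip_weight * l_clip + det_weight * l_det + concept_weight * l_concept
```

Under these assumptions, setting clip_weight to 0.0 and det_weight to 1.0, as in the CLIP CBM baseline, drops the contrastive term and leaves an unweighted sum of the concept and diagnostic losses.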

Clinical Reasoning

Finally, to generate the reasoning, run the following:

scripts/run_job.sh -y <dataset>/medcbr_reason -g <wandb_group_name> -f <fold> -t <time>

This will instantiate the CLIP ViT-L/14 vision backbone trained in the previous step and use its predictions as input to a Qwen3 LRM. The src/run_reasoning.py file details the process of generating the clinical narratives, and the LRM is implemented in src/models/reasoning.py.
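As a rough illustration of how model predictions could be turned into text input for a reasoning model, consider the sketch below. It is not the logic of src/run_reasoning.py: the function summarize_predictions, the 0.5 threshold, and the output layout are hypothetical.

```python
def summarize_predictions(concept_probs, diagnosis, diagnosis_prob, threshold=0.5):
    """Render concept and diagnosis predictions as a short text summary.

    Hypothetical helper: concepts at or above `threshold` are listed,
    most confident first, followed by the predicted diagnosis.
    """
    present = [(name, prob) for name, prob in concept_probs.items() if prob >= threshold]
    present.sort(key=lambda item: item[1], reverse=True)

    lines = ["Predicted concepts:"]
    if present:
        lines.extend(f"- {name} (p={prob:.2f})" for name, prob in present)
    else:
        lines.append("- none above threshold")
    lines.append(f"Predicted diagnosis: {diagnosis} (p={diagnosis_prob:.2f})")
    return "\n".join(lines)
```

Including the probabilities in the text lets the downstream model weigh confident and borderline concepts differently, although whether the repository does so is not stated here.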
