mharmanani/MedCBR

Vision-Language Models Encode Clinical Guidelines for Concept-Based Reasoning in Medical Image Analysis

arXiv · Project Page · Python 3.10+ · PyTorch · License: MIT

Abstract

Concept Bottleneck Models (CBMs) are a prominent framework for interpretable AI that map learned visual features onto a set of meaningful concepts, which are then used for task-specific downstream predictions. Their sequential structure enhances transparency by connecting model predictions to the underlying concepts that support them. In medical imaging, where transparency is essential, CBMs offer an appealing foundation for explainable model design. However, their discrete concept representations overlook broader clinical context such as diagnostic guidelines and expert heuristics, reducing reliability in complex cases. We propose MedCBR, a concept-based reasoning framework that integrates clinical guidelines with vision–language and reasoning models. Labeled clinical descriptors are transformed into guideline-conformant text, and a concept-based model is trained with a multi-task objective combining multi-modal contrastive alignment, concept supervision, and diagnostic classification to jointly ground image features, concepts, and pathology. A reasoning model then converts these predictions into structured clinical narratives that explain the diagnosis, emulating expert reasoning based on established guidelines. MedCBR achieves superior diagnostic and concept-level performance, with AUROCs of 94.2% on ultrasound and 84.0% on mammography. Further experiments on non-medical datasets reach 86.1% accuracy. Our framework enhances interpretability and forms an end-to-end bridge from medical image analysis to decision-making.


Dataset setup

BUSBRA & BrEaST
  1. Download both datasets
  2. Move the images into the correct directory
  3. Preprocess the data by cropping to lesion ROIs:
     utils/preprocess.py -d busbra -src <src_directory> -dst <target_directory>
CBIS-DDSM
  1. Download the dataset
  2. Preprocess the data by cropping to lesion ROIs, resizing, and grouping ROIs by patient
  3. Move the images into the correct directory
CUB-200-2011
  1. Download the data
  2. Move the images into the correct directory
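The ROI-cropping step above can be sketched as follows. This is an illustrative stand-in, not the actual interface of `utils/preprocess.py`; the function name `crop_to_roi`, the `(x0, y0, x1, y1)` bounding-box format, and the `margin` parameter are all assumptions for the sake of the example.

```python
import numpy as np

def crop_to_roi(image: np.ndarray, bbox: tuple, margin: int = 16) -> np.ndarray:
    """Crop an image to a lesion bounding box (x0, y0, x1, y1),
    padding the box by `margin` pixels and clamping to image bounds.
    Illustrative sketch only; not the repo's preprocess.py interface."""
    x0, y0, x1, y1 = bbox
    h, w = image.shape[:2]
    # Expand the box by the margin, then clamp to valid pixel coordinates
    x0, y0 = max(0, x0 - margin), max(0, y0 - margin)
    x1, y1 = min(w, x1 + margin), min(h, y1 + margin)
    return image[y0:y1, x0:x1]
```

Clamping matters for lesions near the image border, where a fixed margin would otherwise index outside the array.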

Model Training

Our codebase supports the following models. Any model not listed here was trained using code provided by its authors.

Supported Models

Black-box Vision Models:

  • CLIP RN50
  • CLIP ViT-B/32
  • CLIP ViT-L/14
  • SigLIP
  • BiomedCLIP

Concept-based models:

  • Original CBM (Koh et al., 2020)
  • CLIP CBM

To run a model with k-fold cross-validation, use:

scripts/run_job.sh -y <yaml_name> -g <wandb_group_name> -f <fold> -t <time>

To run a model with different random seeds, use:

scripts/run_job.sh -y <yaml_name> -g <wandb_group_name> -s <seed> -t <time>
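The -f flag selects a single fold of a k-fold split. As a rough illustration of the kind of partition involved (the repo's actual split logic is not shown here, so treat `kfold_indices` and its seeding as assumptions), a deterministic k-fold over sample indices looks like:

```python
import random

def kfold_indices(n: int, k: int = 5, seed: int = 0):
    """Yield (train, val) index lists for each of k folds.
    Illustrative only; the repo's split logic may differ."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)      # deterministic shuffle per seed
    folds = [idx[i::k] for i in range(k)]  # round-robin assignment to folds
    for f in range(k):
        val = folds[f]
        train = [i for g in range(k) if g != f for i in folds[g]]
        yield train, val
```

Running the script once per fold (or once per seed) and aggregating metrics gives the cross-validated results reported above.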

Unsupported Models

The following models must be cloned from their respective repositories and run according to the instructions provided by their authors:

  • PCBM, PCBMh (Yuksekgonul et al., ICLR 2023)
  • Label-free CBM (Oikarinen et al., ICLR 2023)
  • AdaCBM (Chowdhury et al., MICCAI 2024)

Synthetic Report Generation

To generate synthetic reports, run the following command:

scripts/run_job.sh -y <dataset>/qwen2vl -g <wandb_group_name> -f <fold> -t <time>

This will run the Qwen2.5-VL LVLM on the test set of that fold. The three YAML files busbra/qwen2vl, ddsm/qwen2vl, and cub/qwen2vl specify the hyperparameters for this run.

The src/run_qwenvl.py file contains the logic used to run the LVLM on the data to generate reports. The model and prompt are both implemented in src/models/qwen.py, and the guidelines are recorded in src/utils/clinical_guideline.py (although some of the guidelines are non-clinical).
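At a high level, the prompt combines the labeled clinical descriptors with a guideline excerpt and asks the LVLM for guideline-conformant text. The sketch below is hypothetical: `build_report_prompt`, its fields, and its wording are illustrative assumptions, not the actual prompt implemented in src/models/qwen.py.

```python
def build_report_prompt(descriptors: dict, guideline: str) -> str:
    """Hypothetical prompt builder: labeled clinical descriptors plus a
    guideline excerpt become an instruction for the LVLM. Illustrative
    only; not the prompt used in src/models/qwen.py."""
    lines = [f"- {k}: {v}" for k, v in descriptors.items()]
    return (
        "You are given a lesion image and its annotated descriptors.\n"
        "Descriptors:\n" + "\n".join(lines) + "\n\n"
        "Guideline:\n" + guideline + "\n\n"
        "Write a report describing the lesion using only "
        "guideline-conformant terminology."
    )
```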

CLIP Concept Training

To train a multi-task concept model using CLIP, run the following command:

scripts/run_job.sh -y <dataset>/medcbr -g <wandb_group_name> -f <fold> -t <time>

This will train a CLIP model with a ViT-L/14 backbone on the data. The hyperparameters are found in the YAML files busbra/medcbr, ddsm/medcbr, and cub/medcbr. In particular, the hyperparameter

clip:
    ...
    use_llm_output: true
    ...

enables CLIP-based training on the guideline-conformant reports generated by the LVLM. The multi-task loss weights are also recorded as:

  clip_weight: 1.0
  det_weight: 5.0
  concept_weight: 1.0
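These weights combine the three loss terms as a weighted sum, roughly as sketched below. Both functions are illustrative, assumed stand-ins: `info_nce` is a generic CLIP-style symmetric contrastive term, and `multitask_loss` is a plain weighted combination; neither is the repo's actual implementation.

```python
import numpy as np

def info_nce(img_emb: np.ndarray, txt_emb: np.ndarray,
             temperature: float = 0.07) -> float:
    """Generic symmetric CLIP-style contrastive loss over a batch of
    paired image/text embeddings (rows assumed L2-normalized)."""
    logits = img_emb @ txt_emb.T / temperature  # (B, B) similarity matrix

    def xent(m):
        # Row-wise cross-entropy with matched pairs on the diagonal
        m = m - m.max(axis=1, keepdims=True)
        logp = m - np.log(np.exp(m).sum(axis=1, keepdims=True))
        return -np.mean(np.diag(logp))

    return 0.5 * (xent(logits) + xent(logits.T))

def multitask_loss(clip_loss, det_loss, concept_loss,
                   clip_weight=1.0, det_weight=5.0, concept_weight=1.0):
    """Weighted sum mirroring the YAML weights above (illustrative)."""
    return (clip_weight * clip_loss
            + det_weight * det_loss
            + concept_weight * concept_loss)
```

With det_weight at 5.0, diagnostic classification dominates the gradient relative to the contrastive and concept terms.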

To train a CLIP CBM baseline, use the following script:

scripts/run_job.sh -y <dataset>/clip_cbm -g <wandb_group_name> -f <fold> -t <time>

which has the following values in the YAML:

  clip_weight: 0.0
  det_weight: 1.0
  concept_weight: 1.0
  use_llm_output: false

Clinical Reasoning

Finally, to generate the reasoning, run the following:

scripts/run_job.sh -y <dataset>/medcbr_reason -g <wandb_group_name> -f <fold> -t <time>

This will instantiate a CLIP ViT-L/14 vision backbone trained using the previous step, and use its predictions as input to a Qwen3 LRM. The src/run_reasoning.py file details the process of generating the clinical narratives, and the LRM is implemented in src/models/reasoning.py.
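Conceptually, this step formats the concept model's outputs into a structured prompt for the reasoning model. The sketch below is a hypothetical illustration: `concepts_to_prompt`, the 0.5 threshold, and the prompt wording are assumptions, not the logic in src/run_reasoning.py.

```python
def concepts_to_prompt(concept_probs: dict, diagnosis: str,
                       threshold: float = 0.5) -> str:
    """Hypothetical formatting step: concept probabilities and the
    predicted pathology become a structured prompt for the LRM.
    Illustrative only; not the repo's src/run_reasoning.py logic."""
    present = [c for c, p in concept_probs.items() if p >= threshold]
    absent = [c for c, p in concept_probs.items() if p < threshold]
    return (
        f"Predicted diagnosis: {diagnosis}\n"
        f"Concepts present: {', '.join(present) or 'none'}\n"
        f"Concepts absent: {', '.join(absent) or 'none'}\n"
        "Explain, step by step, how the observed concepts support or "
        "contradict the diagnosis under the relevant clinical guideline."
    )
```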

About

[CVPR'26 Findings] Vision-Language Models Encode Clinical Guidelines for Concept-Based Medical Reasoning
