28th INTERNATIONAL CONFERENCE ON MEDICAL IMAGE COMPUTING AND COMPUTER ASSISTED INTERVENTION
Mai A. Shaaban, Tausifa Jan Saleem, Vijay Ram Papineni, Mohammad Yaqub
Mohamed bin Zayed University of Artificial Intelligence, Abu Dhabi, UAE
Sheikh Shakhbout Medical City, Abu Dhabi, UAE
- A novel training-free approach for retrieving precise contexts in a medical multimodal retrieval-augmented generation framework.
- A fine-grained visual-text alignment, which captures the underlying structures between the query and the retrieved elements, thereby improving clinical relevance.
- Automated and human expert evaluations across vision language models and medical visual question answering datasets to demonstrate the strength of our proposed approach.
- 2025/06/25: Code is released!
- 2025/06/17: Paper is accepted at MICCAI 2025 - The best conference for medical image computing!
- Clone this repository:

      git clone https://github.com/Mai-CS/MOTOR.git
      cd MOTOR

- Install dependencies (we assume a GPU device with CUDA is available):

      source install.sh

  Now, you should be all set.

- Generate grounded reports:

      cd models/
      python caption_maira.py --dataset_name "med-diff-vqa"

- Generate answers:

      cd ..
      source run_MOTOR.sh
@InProceedings{ShaMai_MOTOR_MICCAI2025,
author = { Shaaban, Mai A. and Saleem, Tausifa Jan and Papineni, Vijay Ram Kumar and Yaqub, Mohammad},
title = { { MOTOR: Multimodal Optimal Transport via Grounded Retrieval in Medical Visual Question Answering } },
booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2025},
year = {2025},
publisher = {Springer Nature Switzerland},
volume = {LNCS 15965},
month = {September},
pages = {467--477}
}
