CAMMA-public/SurgTEMP

SurgTEMP: Temporal-Aware Surgical Video Question Answering with Text-guided Visual Memory for Laparoscopic Cholecystectomy

Shi Li, Vinkle Srivastav, Nicolas Chanel, Saurav Sharma, Nabani Banik, Lorenzo Arboit, Kun Yuan, Pietro Mascagni, Nicolas Padoy

University of Strasbourg / CNRS / INSERM, ICube UMR7357 · IHU Strasbourg

arXiv · Project Page · Model Weights · Dataset · License: CC BY-NC-SA 4.0

Abstract

Surgical procedures are inherently complex and risky, requiring extensive expertise and constant focus to navigate evolving intraoperative scenes. We propose SurgTEMP, a multimodal LLM framework for surgical video question answering, featuring:

  • A Text-guided Memory Pyramid (TEMP) constructor that builds hierarchical spatial and temporal visual memory banks guided by the text query
  • A Surgical Competency Progression (SCP) training scheme that progressively builds perception, assessment, and reasoning capabilities
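
To make the idea of text-guided visual memory concrete, here is a toy sketch. It is not the TEMP constructor (the authors' code has not been released), and everything in it is an assumption made for illustration: the function names `cosine` and `build_memory_pyramid`, the `keep_per_level` parameter, and the use of plain cosine similarity between a query embedding and per-frame embeddings. It only shows the general pattern of ranking frames by relevance to the text query and keeping progressively smaller banks at each level.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_memory_pyramid(frame_feats, query_feat, keep_per_level=(3, 2, 1)):
    """Hypothetical helper: one memory bank per level, most query-relevant frames only.

    Each bank is a list of frame indices, restored to temporal order so that
    downstream consumers still see the frames in sequence.
    """
    scores = [cosine(f, query_feat) for f in frame_feats]
    # Rank frame indices from most to least similar to the text query.
    ranked = sorted(range(len(frame_feats)), key=lambda i: scores[i], reverse=True)
    # Coarser levels keep fewer frames; sort each bank back into temporal order.
    return [sorted(ranked[:k]) for k in keep_per_level]


# Made-up 2-D "embeddings" for four frames and one text query.
frames = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.9, 0.1]]
query = [1.0, 0.0]
pyramid = build_memory_pyramid(frames, query)
```

In this toy setup the frame orthogonal to the query (index 1) is dropped at every level, while the best-matching frame survives to the coarsest bank. A real system would operate on learned visual and text features and would likely pool or compress tokens rather than simply discarding frames.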

We also introduce CholeVidQA-32K, a surgical video QA dataset comprising 32K open-ended QA pairs from 3,855 laparoscopic cholecystectomy segments (~128 h total), organized across 11 tasks spanning perception, assessment, and reasoning.

Code and Data

Code, dataset, and model weights will be released soon.

  • Training and inference code
  • Pre-trained model weights
  • CholeVidQA-32K dataset

Citation

@misc{li2026surgtemptemporalawaresurgicalvideo,
      title={SurgTEMP: Temporal-Aware Surgical Video Question Answering with Text-guided Visual Memory for Laparoscopic Cholecystectomy},
      author={Shi Li and Vinkle Srivastav and Nicolas Chanel and Saurav Sharma and Nabani Banik and Lorenzo Arboit and Kun Yuan and Pietro Mascagni and Nicolas Padoy},
      year={2026},
      eprint={2603.29962},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2603.29962},
}

Acknowledgement

This work was funded by the European Union (ERC, CompSURG, 101088553) and French state funds managed by the ANR under Grants ANR-10-IAHU-02, ANR-23-IACL-0004, ANR-10-IDEX-0002, and ANR-20-SFRI-0012, with HPC resources provided by CAMMA, IHU Strasbourg, and Unistra Mesocentre.
