SurgTEMP: Temporal-Aware Surgical Video Question Answering with Text-guided Visual Memory for Laparoscopic Cholecystectomy

Shi Li, Vinkle Srivastav, Nicolas Chanel, Saurav Sharma, Nabani Banik, Lorenzo Arboit, Kun Yuan, Pietro Mascagni, Nicolas Padoy

University of Strasbourg / CNRS / INSERM, ICube UMR7357 · IHU Strasbourg

Abstract

Surgical procedures are inherently complex and risky, requiring extensive expertise and constant focus to navigate evolving intraoperative scenes. We propose SurgTEMP, a multimodal LLM framework for surgical video question answering, featuring:

A Text-guided Memory Pyramid (TEMP) constructor that builds hierarchical spatial and temporal visual memory banks guided by the text query
A Surgical Competency Progression (SCP) training scheme that progressively builds perception, assessment, and reasoning capabilities
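Since the code is not yet released, the following is only a toy illustration of the general idea of query-guided visual memory, not the authors' TEMP constructor. It assumes that frame features and the text query are already embedded in a shared vector space, scores each frame by cosine similarity to the query, and keeps the top-scoring frames at two granularities. All names (`cosine`, `select_memory`, `build_pyramid`, `sizes`) are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def select_memory(frame_feats, query_feat, k):
    """Return the indices of the k frames most similar to the query, in temporal order."""
    scores = [cosine(f, query_feat) for f in frame_feats]
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    return sorted(top)  # restore temporal order


def build_pyramid(frame_feats, query_feat, sizes=(4, 2)):
    """Hypothetical two-level 'pyramid': one memory bank per size in `sizes`."""
    return [select_memory(frame_feats, query_feat, k) for k in sizes]


if __name__ == "__main__":
    # Made-up 2-D features for five frames and one query.
    frames = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.9, 0.1], [-1.0, 0.0]]
    query = [1.0, 0.0]
    print(build_pyramid(frames, query))
```

The sketch only shows how a text query can decide which frames are kept; the actual hierarchy of spatial and temporal memory banks is described in the paper.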

We also introduce CholeVidQA-32K, a surgical video QA dataset comprising 32K open-ended QA pairs from 3,855 laparoscopic cholecystectomy segments (~128 h total), organized across 11 tasks spanning perception, assessment, and reasoning.
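To put these dataset figures in perspective, the short calculation below derives rough per-segment averages from the numbers quoted above. It treats "32K" as 32,000 and "~128 h" as exactly 128 hours, so the results are approximations rather than official statistics.

```python
# Figures quoted in the abstract (rounded in the source).
QA_PAIRS = 32_000      # "32K" open-ended QA pairs
SEGMENTS = 3_855       # laparoscopic cholecystectomy segments
TOTAL_HOURS = 128      # "~128 h total"

avg_segment_seconds = TOTAL_HOURS * 3600 / SEGMENTS
avg_qa_per_segment = QA_PAIRS / SEGMENTS

print(f"Average segment length: {avg_segment_seconds:.0f} s")
print(f"Average QA pairs per segment: {avg_qa_per_segment:.1f}")
```

Under these assumptions, a segment lasts about two minutes on average and carries roughly eight QA pairs.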

Code and Data

The code, dataset, and model weights are not yet available. The following will be released soon:

Training and inference code
Pre-trained model weights
CholeVidQA-32K dataset

Citation

@misc{li2026surgtemptemporalawaresurgicalvideo,
      title={SurgTEMP: Temporal-Aware Surgical Video Question Answering with Text-guided Visual Memory for Laparoscopic Cholecystectomy},
      author={Shi Li and Vinkle Srivastav and Nicolas Chanel and Saurav Sharma and Nabani Banik and Lorenzo Arboit and Kun Yuan and Pietro Mascagni and Nicolas Padoy},
      year={2026},
      eprint={2603.29962},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2603.29962},
}

Acknowledgement

This work was funded by the European Union (ERC, CompSURG, 101088553) and French state funds managed by the ANR under Grants ANR-10-IAHU-02, ANR-23-IACL-0004, ANR-10-IDEX-0002, and ANR-20-SFRI-0012, with HPC resources provided by CAMMA, IHU Strasbourg, and Unistra Mesocentre.
