[IEEE RA-L] Think Step by Step: Chain-of-Gesture Prompting for Error Detection in Robotic Surgical Videos

Official implementation of [COG arxiv] [IEEE link]

Motivation

Previous methods treat error detection as two separate stages: gesture recognition followed by error detection for each type of gesture. We propose an end-to-end Chain-of-Gesture (COG) prompting framework that captures the complex visual reasoning process with two reasoning modules: Gestural-Visual Reasoning (GVR) and Multi-Scale Temporal Reasoning (MSTR).

Architecture

A high-level overview of the COG architecture.

Citation

If the paper and code of COG help your research, please cite our paper ❤️. If you find this repository useful, giving it a star ⭐️ is a wonderful way to support our work. Thank you very much!

@ARTICLE{10750058,
  author={Shao, Zhimin and Xu, Jialang and Stoyanov, Danail and Mazomenos, Evangelos B. and Jin, Yueming},
  journal={IEEE Robotics and Automation Letters}, 
  title={Think Step by Step: Chain-of-Gesture Prompting for Error Detection in Robotic Surgical Videos}, 
  year={2024},
  volume={9},
  number={12},
  pages={11513-11520},
  keywords={Videos;Surgery;Cognition;Real-time systems;Transformers;Visualization;Kinematics;Training;Semantics;Robot kinematics;Medical robotics;Computer vision for medical robotics;surgical error detection;video-language learning;prompt engineering},
  doi={10.1109/LRA.2024.3495452}}

Get Started

Installation

  1. Clone COG.
git clone --recursive https://github.com/jinlab-imvr/Chain-of-Gesture
cd Chain-of-Gesture
  2. Create the environment. Here we show an example using conda.
conda env create -f environment.yml
conda activate sed
## Some packages might be deprecated or fail to install automatically. If that happens, you can install them manually with pip.
pip install git+https://github.com/openai/CLIP.git
## If you encounter issues installing CLIP, refer to the official CLIP repository (https://github.com/openai/CLIP) for troubleshooting and installation guidance.

Training

In this section, we present a short demonstration to get started with training COG.

Datasets

Download and prepare the dataset

We obtain the original data and the extended error labels following Kay Hutchinson et al. All data under the leave-one-user-out setting are in the LOSO folder.

Train the model

python train_COG.py -exp COG -t 4 -l 1e-4 -gpu_id cuda:0

Prediction postprocessing

Besides frame-level results, to ensure an explicit and fair comparison with the state-of-the-art work on surgical error detection, we follow its evaluation protocol to generate window-level metrics. Specifically, we apply a 2-second sliding window with a 1.2-second stride to the frame-level predictions. Within each window, we average the predictions and binarize them with a threshold of 0.5 to obtain the window-level labels.
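The windowing step above can be sketched as follows. This is a minimal illustration, not the repository's own evaluation code; the function name and the default frame rate of 30 fps are assumptions.

```python
import numpy as np

def window_level_predictions(frame_probs, fps=30, window_sec=2.0,
                             stride_sec=1.2, threshold=0.5):
    """Convert frame-level error probabilities to window-level binary labels.

    A sliding window of `window_sec` seconds with a `stride_sec` stride is
    applied to the frame-level predictions; the predictions inside each
    window are averaged and binarized with `threshold`.
    """
    frame_probs = np.asarray(frame_probs, dtype=float)
    win = int(round(window_sec * fps))       # window length in frames
    stride = int(round(stride_sec * fps))    # stride length in frames
    labels = []
    for start in range(0, len(frame_probs) - win + 1, stride):
        mean_prob = frame_probs[start:start + win].mean()
        labels.append(int(mean_prob >= threshold))
    return labels
```

For example, at 30 fps a 120-frame clip yields windows of 60 frames taken every 36 frames, so the clip produces two window-level labels.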

Ablation study

You can run the hyperparameter ablation studies via ablation_hyp.sh; the individual modules, such as MSTR and GVR, are implemented in models.py.
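A hyperparameter sweep along the lines of ablation_hyp.sh can be sketched as below, reusing the flags from the training command above. This is a hypothetical dry-run sketch: `echo` prints each command instead of launching training; drop it to actually run the jobs.

```shell
#!/usr/bin/env bash
# Sweep the learning rate flag (-l) across several values,
# giving each run its own experiment name (-exp).
sweep() {
  for lr in 1e-3 1e-4 1e-5; do
    echo python train_COG.py -exp "COG_lr_${lr}" -t 4 -l "${lr}" -gpu_id cuda:0
  done
}
sweep
```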

Results

Comparison with other methods and ablation of each module


Visualization

