MICCAI 2025: SurgTPGS: Semantic 3D Surgical Scene Understanding with Text Promptable Gaussian Splatting

lastbasket/SurgTPGS

SurgTPGS: Semantic 3D Surgical Scene Understanding with Text Promptable Gaussian Splatting

MICCAI 2025

Yiming Huang*, Long Bai*, Beilei Cui*, Kun Yuan,
Guankun Wang, Mobarak I. Hoque, Nicolas Padoy, Nassir Navab, Hongliang Ren

|| Paper || Project Page ||

Environment

  1. Install the CUDA toolkit on Ubuntu from the Download link, then set the following environment variables:
export PATH=/usr/local/cuda-11.7/bin:${PATH}
export LD_LIBRARY_PATH=/usr/local/cuda-11.7/lib64:$LD_LIBRARY_PATH
export CUDA_HOME=/usr/local/cuda-11.7
  2. Install the Python environment:
git clone https://github.com/lastbasket/SurgTPGS
cd SurgTPGS
conda create -n SurgTPGS python=3.7 
conda activate SurgTPGS

pip install -r requirements.txt
pip install -e submodules/depth-diff-gaussian-rasterization
pip install -e submodules/simple-knn
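
Because the two submodules are compiled against the CUDA toolkit, it may help to confirm that the toolkit is visible before running the `pip install -e` steps. The following is a minimal sketch, not part of the repository: `check_cuda_home` is a hypothetical helper name, and it only assumes that `nvcc` lives under `bin/` of the CUDA root, as in a standard toolkit install.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the SurgTPGS repository).
# Returns 0 if an executable nvcc exists under the given CUDA root,
# falling back to $CUDA_HOME when no argument is passed.
check_cuda_home() {
    local cuda_root="${1:-${CUDA_HOME:-}}"
    if [ -z "$cuda_root" ]; then
        echo "CUDA_HOME is not set" >&2
        return 1
    fi
    if [ ! -x "$cuda_root/bin/nvcc" ]; then
        echo "nvcc not found under $cuda_root/bin" >&2
        return 1
    fi
    echo "Found nvcc at $cuda_root/bin/nvcc"
}

# Example: check_cuda_home                        # uses $CUDA_HOME
# Example: check_cuda_home /usr/local/cuda-11.7   # explicit root
```

If the check fails, revisit the `export` lines from step 1 before building the submodules.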

Datasets and Pre-trained Checkpoints

  1. We provide processed versions of the CholecSeg and EndoVis 2018 datasets with disparity maps. Download the datasets from the Download Link and unzip them into the following structure:
├── data
│   ├── cholecseg_sub
│   │   ├── video01_00080
│   │   ├── video01_00240
│   │   ├── ...
│   ├── endovis_2018
│   │   ├── seq_5_sub
│   │   ├── seq_9_sub
  2. Download the SAM checkpoint and the VLM checkpoints (CLIP fine-tuned with CAT-Seg) for CholecSeg and EndoVis 2018, then place them as follows:
├── ckpts
│   ├── model_final_cholecseg.pth
│   ├── model_final_endovis.pth
│   ├── sam_vit_h_4b8939.pth
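
A quick way to confirm that the datasets and checkpoints landed in the expected places is to test for each path listed in the two trees above. This is a sketch only: `check_layout` is a hypothetical helper, not a script shipped with the repository, and it checks only the top-level dataset folders and the three checkpoint files named in this README.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the SurgTPGS repository).
# Prints every expected path that is missing under the given root
# (default: current directory) and returns non-zero if any is missing.
check_layout() {
    local root="${1:-.}"
    local missing=0
    local path
    # Paths taken from the directory trees in this README.
    for path in \
        data/cholecseg_sub \
        data/endovis_2018 \
        ckpts/model_final_cholecseg.pth \
        ckpts/model_final_endovis.pth \
        ckpts/sam_vit_h_4b8939.pth
    do
        if [ ! -e "$root/$path" ]; then
            echo "missing: $path"
            missing=1
        fi
    done
    return "$missing"
}

# Example: run from the repository root before training.
# check_layout . && echo "layout looks complete"
```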

Training

# 1. data processing for VLM and SAM features
bash pre_data.sh
# 2. use the autoencoder for the semantic features
bash pre_VL_features.sh
# 3. train SurgTPGS
bash train.sh
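
Since each step depends on the output of the previous one, it can be convenient to chain the three scripts so that a failure stops the run early. The wrapper below is a sketch under that assumption: `run_steps` is a hypothetical function, not part of the repository, and it assumes the scripts are run from the repository root and signal failure through a non-zero exit code.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper (not part of the SurgTPGS repository).
# Runs each script in order with bash and stops at the first failure.
run_steps() {
    local script
    for script in "$@"; do
        echo "==> $script"
        if ! bash "$script"; then
            echo "step failed: $script" >&2
            return 1
        fi
    done
}

# Example: run_steps pre_data.sh pre_VL_features.sh train.sh
```

The same helper could be reused for `render.sh` and `eval_fine.sh` once training has finished.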

Rendering and Evaluation

# 1. render the RGB, Depth, and semantic features
bash render.sh
# 2. evaluate semantic segmentation on novel views with text prompts
bash eval_fine.sh

Related Works

You may also be interested in our related works:

  • Endo-4DGX: Robust Endoscopic Gaussian Splatting with Illumination Correction
  • Endo2DTAM: Gaussian Splatting SLAM for Endoscopic Scene
  • Endo-4DGS: Monocular Endoscopic Scene Reconstruction with Gaussian Splatting

Citation

@misc{huang2025surgtpgssemantic3dsurgical,
      title={SurgTPGS: Semantic 3D Surgical Scene Understanding with Text Promptable Gaussian Splatting}, 
      author={Yiming Huang and Long Bai and Beilei Cui and Kun Yuan and Guankun Wang and Mobarakol Islam and Nicolas Padoy and Nassir Navab and Hongliang Ren},
      year={2025},
      eprint={2506.23309},
      archivePrefix={arXiv},
      primaryClass={eess.IV},
      url={https://arxiv.org/abs/2506.23309}, 
}
