
OmniCT: Towards a Unified Slice-Volume LVLM for Comprehensive CT Analysis

Tianwei Lin1,2*, Zhongwei Qiu2,3,1*, Wenqiao Zhang1†, Jiang Liu1, Yihan Xie1,
Mingjian Gao1, Zhenxuan Fan1, Zhaocheng Li1, Sijing Li1,2, Zhongle Xie1,
Peng Lu1, Yueting Zhuang1, Ling Zhang2, Beng Chin Ooi1, Yingda Xia2

1Zhejiang University 2DAMO Academy, Alibaba Group 3Hupan Lab

🌟 Overview

OmniCT is a unified Slice-Volume Large Vision-Language Model (LVLM) designed for medical Computed Tomography (CT) scenarios. It supports both 2D slice-based inputs and 3D volume-based data, enabling comprehensive CT understanding and analysis within a unified framework.

OmniCT Framework

This project provides:

  • 🧠 Models: OmniCT-3B, OmniCT-7B
  • 📦 Dataset: MedEval-CT-Dataset (1.7M carefully curated samples)

🔥 News

  • [2026.03.04] We have released the full weights and projection-layer weights for OmniCT-3B and OmniCT-7B.
  • [2026.01.26] 🎉🎉🎉 OmniCT has been accepted to ICLR 2026 as a poster presentation.

TODO:

  • Release environment setup instructions
  • Release training and inference code
  • Release MedEval-CT (including MedEval-CT-Dataset)
  • Release pre-trained projection weights and full OmniCT model checkpoints

🛠️ Getting Started

This section provides instructions for environment setup, inference, and training.

Step 1: Environment Setup

First, clone the repository:

git clone https://github.com/alibaba-damo-academy/OmniCT.git
cd OmniCT

Using uv for environment installation (Recommended)

uv python install 3.11
uv venv --python 3.11
uv sync
uv pip install -e . --no-deps

Or using Conda

conda create -n omnict python=3.11 -y
conda activate omnict
pip install -r requirements.txt
pip install -e . --no-deps

💡 We strongly recommend installing Flash Attention (`pip install flash-attn --no-build-isolation`) for faster training and inference.

Step 2: Model Weights Preparation

2.1 Inference: download OmniCT weights from Hugging Face

| Model | Link |
| --- | --- |
| OmniCT-3B | Download |
| OmniCT-7B | Download |

2.2 Training from scratch: download initialization weights (ViT + LLM)

| Component | Link |
| --- | --- |
| google/siglip-so400m-patch14-384 | Download |
| Qwen/Qwen2.5-3B-Instruct | Download |
| Qwen/Qwen2.5-7B-Instruct | Download |

2.3 Training from pre-train stage (Optional)

To bypass the pre-train stage, you may directly use our released projection weights:

| Model | Link |
| --- | --- |
| OmniCT-3B-projection-weights | Download |
| OmniCT-7B-projection-weights | Download |

🤖 Inference

After completing installation and weight preparation, run the following command for inference:

uv run python evaluation/infer.py \
  --model_name_or_path "/path/to/OmniCT_Weights/" \
  --vision_tower_name_or_path "google/siglip-so400m-patch14-384" \
  --training_stage "eval" \
  --bf16 "true" \
  --modality "2d" \
  --question "Describe the CT image." \
  --image_path "/path/to/input_ct/"

Set `--modality` to `2d` or `3d` to match the input. (Note: a trailing comment after a line-continuation backslash would break the command, so the option is documented here instead.)

Alternatively, configure parameters in evaluation/infer.sh and run:

uv run bash evaluation/infer.sh
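For programmatic use, e.g. sweeping over many inputs, the same flags can be assembled in Python. This is a minimal sketch: `build_infer_cmd` is our own helper, not part of the released code, and all paths are placeholders.

```python
import subprocess

# Assemble the documented evaluation/infer.py flags.
# build_infer_cmd is a hypothetical helper; paths below are placeholders.
def build_infer_cmd(model_path, image_path, question, modality="2d"):
    return [
        "python", "evaluation/infer.py",
        "--model_name_or_path", model_path,
        "--vision_tower_name_or_path", "google/siglip-so400m-patch14-384",
        "--training_stage", "eval",
        "--bf16", "true",
        "--modality", modality,  # "2d" or "3d"
        "--question", question,
        "--image_path", image_path,
    ]

cmd = build_infer_cmd("/path/to/OmniCT_Weights/", "/path/to/input_ct/",
                      "Describe the CT image.", modality="3d")
# subprocess.run(cmd, check=True)  # uncomment to launch inference
```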

📚 Training

Training consists of two stages:

  • PT (Pre-train Stage)
  • SFT (Supervised Fine-tuning Stage)

4.1 Data Preparation

We provide the following in MedEval-CT-Dataset:

  • Pre-train data
  • SFT data

After downloading, please configure image paths or perform preprocessing according to dataset_info.json.

Data Format Examples

2D Data:

{
    "messages": [
        {
            "content": "<|image2d_start|><|image2d|><|image2d_end|>\n<|organ_start|><|mask2d|><|organ_end|>\nCan you describe the CT image?",
            "role": "user"
        },
        {
            "content": "The CT image shows ...",
            "role": "assistant"
        }
    ],
    "image": "/path/to/image"
}

3D Data:

{
    "messages": [
        {
            "content": "<|image3d_start|><|image3d|><|image3d_end|>\n<|organ_start|><|mask3d|><|organ_end|>\nCan you describe the CT image?",
            "role": "user"
        },
        {
            "content": "The CT image shows ...",
            "role": "assistant"
        }
    ],
    "image": "/path/to/image"
}
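The 2D and 3D formats differ only in their special tokens, so custom records can be generated with a short script. The sketch below follows the schema shown above; `make_record` and the output file name are our own illustration, not part of the released code.

```python
import json

# Special tokens copied from the 2D/3D examples above.
TOKENS = {
    "2d": ("<|image2d_start|><|image2d|><|image2d_end|>",
           "<|organ_start|><|mask2d|><|organ_end|>"),
    "3d": ("<|image3d_start|><|image3d|><|image3d_end|>",
           "<|organ_start|><|mask3d|><|organ_end|>"),
}

def make_record(image_path, question, answer, modality="2d"):
    """Build one training record in the documented message format."""
    image_tok, mask_tok = TOKENS[modality]
    return {
        "messages": [
            {"content": f"{image_tok}\n{mask_tok}\n{question}", "role": "user"},
            {"content": answer, "role": "assistant"},
        ],
        "image": image_path,
    }

# Write a list of records to a JSON file (file name is a placeholder).
records = [make_record("/path/to/image", "Can you describe the CT image?",
                       "The CT image shows ...", modality="3d")]
with open("custom_sft_data.json", "w") as f:
    json.dump(records, f, indent=4)
```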

4.2 Single-Node Training

Configuration files:

  • configs/training_config/pt.json
  • configs/training_config/sft.json

PT:

uv run bash scripts/pt_single_node.sh
# or
uv run deepspeed \
  src/omnict/train.py \
  --config configs/training_config/pt.json \
  --deepspeed \
  --deepspeed_config configs/ds_config/zero2.json

SFT:

uv run bash scripts/sft_single_node.sh
# or
uv run deepspeed \
  src/omnict/train.py \
  --config configs/training_config/sft.json \
  --deepspeed \
  --deepspeed_config configs/ds_config/zero2.json

4.3 Multi-Node Training

PT:

uv run bash scripts/pt_multi_nodes.sh
# or
uv run torchrun \
  --master_addr=$MASTER_ADDR \
  --master_port=$MASTER_PORT \
  --nproc_per_node=$NPROC_PER_NODE \
  --nnodes=$WORLD_SIZE \
  --node_rank=$RANK \
  src/omnict/train.py \
  --config configs/training_config/pt.json \
  --deepspeed \
  --deepspeed_config configs/ds_config/zero2.json

SFT:

uv run bash scripts/sft_multi_nodes.sh
# or
uv run torchrun \
  --master_addr=$MASTER_ADDR \
  --master_port=$MASTER_PORT \
  --nproc_per_node=$NPROC_PER_NODE \
  --nnodes=$WORLD_SIZE \
  --node_rank=$RANK \
  src/omnict/train.py \
  --config configs/training_config/sft.json \
  --deepspeed \
  --deepspeed_config configs/ds_config/zero2.json

By default, checkpoints are saved to:

./checkpoint/<project>/<run_name>/
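Downstream scripts that need to locate a run's outputs can express this layout directly. A small sketch; the project and run names below are placeholders, and `checkpoint_dir` is our own helper.

```python
from pathlib import Path

# Default checkpoint layout: ./checkpoint/<project>/<run_name>/
# Both names come from the training config; the values below are placeholders.
def checkpoint_dir(project: str, run_name: str) -> Path:
    return Path("checkpoint") / project / run_name

print(checkpoint_dir("omnict", "sft_run1").as_posix())  # checkpoint/omnict/sft_run1
```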

📑 Citation

If OmniCT is helpful to your research or work, please consider citing:

@article{lin2026omnict,
  title={OmniCT: Towards a Unified Slice-Volume LVLM for Comprehensive CT Analysis},
  author={Lin, Tianwei and Qiu, Zhongwei and Zhang, Wenqiao and Liu, Jiang and Xie, Yihan and Gao, Mingjian and Fan, Zhenxuan and Li, Zhaocheng and Li, Sijing and Xie, Zhongle and others},
  journal={arXiv preprint arXiv:2602.16110},
  year={2026}
}

⭐ Acknowledgement

We sincerely thank all researchers and open-source contributors advancing the field of medical multimodal understanding.

If you find our work useful, please consider giving us a ⭐ Star!
