Tianwei Lin1,2*, Zhongwei Qiu2,3,1*, Wenqiao Zhang1†, Jiang Liu1, Yihan Xie1,
Mingjian Gao1, Zhenxuan Fan1, Zhaocheng Li1, Sijing Li1,2, Zhongle Xie1,
Peng Lu1, Yueting Zhuang1, Ling Zhang2, Beng Chin Ooi1, Yingda Xia2
1Zhejiang University 2DAMO Academy, Alibaba Group 3Hupan Lab
OmniCT is a unified Slice-Volume Large Vision-Language Model (LVLM) designed for medical Computed Tomography (CT) scenarios. It supports both 2D slice-based and 3D volume-based inputs, enabling comprehensive CT understanding and analysis within a unified framework.
This project provides:
- 🧠 Models: OmniCT-3B, OmniCT-7B
- 📦 Dataset: MedEval-CT-Dataset (1.7M carefully curated samples)
- [2026.03.04] We have released the full weights and projection-layer weights for OmniCT-3B and OmniCT-7B.
- [2026.01.26] 🎉🎉🎉 OmniCT has been accepted to ICLR 2026 as a poster presentation.
TODO:
- Release environment setup instructions
- Release training and inference code
- Release MedEval-CT (including MedEval-CT-Dataset)
- Release pre-trained projection weights and full OmniCT model checkpoints
This section provides instructions for environment setup, inference, and training.
First, clone the repository:

```shell
git clone https://github.com/alibaba-damo-academy/OmniCT.git
cd OmniCT
```

Using uv for environment installation (recommended):

```shell
uv python install 3.11
uv venv --python 3.11
uv sync
uv pip install -e . --no-deps
```

Or using Conda:

```shell
conda create -n omnict python=3.11 -y
conda activate omnict
pip install -r requirements.txt
pip install -e . --no-deps
```

💡 We strongly recommend installing Flash Attention for optimal training and inference performance.
2.1 Inference: download OmniCT weights from Hugging Face

| Model | Link |
|---|---|
| OmniCT-3B | Download |
| OmniCT-7B | Download |
2.2 Training from scratch: download initialization weights (ViT + LLM)

| Component | Link |
|---|---|
| google/siglip-so400m-patch14-384 | Download |
| Qwen/Qwen2.5-3B-Instruct | Download |
| Qwen/Qwen2.5-7B-Instruct | Download |
2.3 Training from the pre-train stage (optional)

To bypass the pre-train stage, you may directly use our released projection weights:

| Model | Link |
|---|---|
| OmniCT-3B-projection-weights | Download |
| OmniCT-7B-projection-weights | Download |
After completing installation and weight preparation, run the following command for inference:

```shell
uv run python evaluation/infer.py \
    --model_name_or_path "/path/to/OmniCT_Weights/" \
    --vision_tower_name_or_path "google/siglip-so400m-patch14-384" \
    --training_stage "eval" \
    --bf16 "true" \
    --modality "2d" \
    --question "Describe the CT image." \
    --image_path "/path/to/input_ct/"
```

The `--modality` flag accepts `2d` or `3d`.

Alternatively, configure parameters in evaluation/infer.sh and run:

```shell
uv run bash evaluation/infer.sh
```

Training consists of two stages:
- PT (Pre-train Stage)
- SFT (Supervised Fine-tuning Stage)
We provide the following in MedEval-CT-Dataset:
- Pre-train data
- SFT data
After downloading, please configure image paths or perform preprocessing according to dataset_info.json.
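The image-path configuration can be scripted. The sketch below is a hypothetical helper (not part of the released code): it assumes each record stores its scan under an `image` key, as in the format examples below, and prefixes relative paths with the directory where the downloaded images were extracted; the exact keys and layout depend on dataset_info.json.

```python
import json
from pathlib import Path

def localize_image_paths(data_file: str, image_root: str) -> list[dict]:
    """Prefix relative 'image' paths in a data file with the local image root.

    Hypothetical helper: assumes each record carries its scan under an
    'image' key; adjust to match dataset_info.json for your download.
    """
    samples = json.loads(Path(data_file).read_text())
    for sample in samples:
        image = sample.get("image", "")
        if image and not image.startswith("/"):
            # Rewrite relative paths; absolute paths are left untouched.
            sample["image"] = str(Path(image_root) / image)
    return samples
```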
Data Format Examples

2D Data:

```json
{
  "messages": [
    {
      "content": "<|image2d_start|><|image2d|><|image2d_end|>\n<|organ_start|><|mask2d|><|organ_end|>\nCan you describe the CT image?",
      "role": "user"
    },
    {
      "content": "The CT image shows ...",
      "role": "assistant"
    }
  ],
  "image": "/path/to/image"
}
```

3D Data:

```json
{
  "messages": [
    {
      "content": "<|image3d_start|><|image3d|><|image3d_end|>\n<|organ_start|><|mask3d|><|organ_end|>\nCan you describe the CT image?",
      "role": "user"
    },
    {
      "content": "The CT image shows ...",
      "role": "assistant"
    }
  ],
  "image": "/path/to/image"
}
```

Configuration files:

- configs/training_config/pt.json
- configs/training_config/sft.json
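Records in the 2D data format can also be generated programmatically. A minimal sketch (the helper function and paths are illustrative, not part of the released code; the special tokens are copied from the format above):

```python
import json

# Special tokens from the 2D data format; all paths are placeholders.
IMAGE_TOKENS = "<|image2d_start|><|image2d|><|image2d_end|>"
ORGAN_TOKENS = "<|organ_start|><|mask2d|><|organ_end|>"

def make_2d_sample(question: str, answer: str, image_path: str) -> dict:
    """Assemble one record in the 2D message format shown above."""
    return {
        "messages": [
            {"content": f"{IMAGE_TOKENS}\n{ORGAN_TOKENS}\n{question}", "role": "user"},
            {"content": answer, "role": "assistant"},
        ],
        "image": image_path,
    }

record = make_2d_sample("Can you describe the CT image?",
                        "The CT image shows ...",
                        "/path/to/image")
print(json.dumps(record, indent=2))
```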
Single-node training

PT:

```shell
uv run bash scripts/pt_single_node.sh
# or
uv run deepspeed \
    src/omnict/train.py \
    --config configs/training_config/pt.json \
    --deepspeed \
    --deepspeed_config configs/ds_config/zero2.json
```

SFT:

```shell
uv run bash scripts/sft_single_node.sh
# or
uv run deepspeed \
    src/omnict/train.py \
    --config configs/training_config/sft.json \
    --deepspeed \
    --deepspeed_config configs/ds_config/zero2.json
```

Multi-node training

PT:

```shell
uv run bash scripts/pt_multi_nodes.sh
# or
uv run torchrun \
    --master_addr=$MASTER_ADDR \
    --master_port=$MASTER_PORT \
    --nproc_per_node=$NPROC_PER_NODE \
    --nnodes=$WORLD_SIZE \
    --node_rank=$RANK \
    src/omnict/train.py \
    --config configs/training_config/pt.json \
    --deepspeed \
    --deepspeed_config configs/ds_config/zero2.json
```

SFT:

```shell
uv run bash scripts/sft_multi_nodes.sh
# or
uv run torchrun \
    --master_addr=$MASTER_ADDR \
    --master_port=$MASTER_PORT \
    --nproc_per_node=$NPROC_PER_NODE \
    --nnodes=$WORLD_SIZE \
    --node_rank=$RANK \
    src/omnict/train.py \
    --config configs/training_config/sft.json \
    --deepspeed \
    --deepspeed_config configs/ds_config/zero2.json
```

By default, checkpoints are saved to:

```
./checkpoint/<project>/<run_name>/
```

If OmniCT is helpful to your research or work, please consider citing:
```bibtex
@article{lin2026omnict,
  title={OmniCT: Towards a Unified Slice-Volume LVLM for Comprehensive CT Analysis},
  author={Lin, Tianwei and Qiu, Zhongwei and Zhang, Wenqiao and Liu, Jiang and Xie, Yihan and Gao, Mingjian and Fan, Zhenxuan and Li, Zhaocheng and Li, Sijing and Xie, Zhongle and others},
  journal={arXiv preprint arXiv:2602.16110},
  year={2026}
}
```

We sincerely thank all researchers and open-source contributors advancing the field of medical multimodal understanding.
If you find our work useful, please consider giving us a ⭐ Star!
