Tianwei Lin1,2*, Zhongwei Qiu2,3,1*, Wenqiao Zhang1†, Jiang Liu1, Yihan Xie1,
Mingjian Gao1, Zhenxuan Fan1, Zhaocheng Li1, Sijing Li1,2, Zhongle Xie1,
Peng Lu1, Yueting Zhuang1, Ling Zhang2, Beng Chin Ooi1, Yingda Xia2
1Zhejiang University 2DAMO Academy, Alibaba Group 3Hupan Lab
OmniCT is a unified Slice-Volume Large Vision-Language Model (LVLM) designed for medical Computed Tomography (CT) scenarios. It supports both 2D slice-based and 3D volume-based inputs, enabling comprehensive CT understanding and analysis within a unified framework.
This project provides:
- 🧠 Models: OmniCT-3B, OmniCT-7B
- 📦 Dataset: MedEval-CT-Dataset (1.7M carefully curated samples)
- [2026.03.04] We have released the full weights and projection-layer weights for OmniCT-3B and OmniCT-7B.
- [2026.01.26] 🎉🎉🎉 OmniCT has been accepted to ICLR 2026 as a poster presentation.
TODO:
- Release environment setup instructions
- Release training and inference code
- Release MedEval-CT (including MedEval-CT-Dataset)
- Release pre-trained projection weights and full OmniCT model checkpoints
This section provides instructions for environment setup, inference, and training.
First, clone the repository:

```shell
git clone https://github.com/alibaba-damo-academy/OmniCT.git
cd OmniCT
```

Using uv for environment installation (recommended):

```shell
uv python install 3.11
uv venv --python 3.11
uv sync
uv pip install -e . --no-deps
```

Or using Conda:

```shell
conda create -n omnict python=3.11 -y
conda activate omnict
pip install -r requirements.txt
pip install -e . --no-deps
```

💡 We strongly recommend installing Flash Attention for optimal training and inference performance.
2.1 Inference: download OmniCT weights from Hugging Face

| Model | Link |
|---|---|
| OmniCT-3B | Download |
| OmniCT-7B | Download |
2.2 Training from scratch: download initialization weights (ViT + LLM)

| Component | Link |
|---|---|
| google/siglip-so400m-patch14-384 | Download |
| Qwen/Qwen2.5-3B-Instruct | Download |
| Qwen/Qwen2.5-7B-Instruct | Download |
2.3 Training from the pre-train stage (optional)

To bypass the pre-train stage, you may directly use our released projection weights:

| Model | Link |
|---|---|
| OmniCT-3B-projection-weights | Download |
| OmniCT-7B-projection-weights | Download |
After completing installation and weight preparation, run the following command for inference:

```shell
uv run python evaluation/infer.py \
    --model_name_or_path "/path/to/OmniCT_Weights/" \
    --vision_tower_name_or_path "google/siglip-so400m-patch14-384" \
    --training_stage "eval" \
    --bf16 "true" \
    --modality "2d" \
    --question "Describe the CT image." \
    --image_path "/path/to/input_ct/"
```

The `--modality` flag accepts `2d` or `3d`.

Alternatively, configure parameters in evaluation/infer.sh and run:

```shell
uv run bash evaluation/infer.sh
```

Training consists of two stages:
- PT (Pre-train Stage)
- SFT (Supervised Fine-tuning Stage)
We provide the following in MedEval-CT-Dataset:
- Pre-train data
- SFT data
After downloading, please configure image paths or perform preprocessing according to dataset_info.json.
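The image-path configuration can be scripted. The sketch below is a hypothetical helper (not part of the released code): it assumes each record stores its scan under an `image` key, as in the format examples below, and prefixes relative paths with the directory where the downloaded images were extracted; the exact keys and layout depend on dataset_info.json.

```python
import json
from pathlib import Path

def localize_image_paths(data_file: str, image_root: str) -> list[dict]:
    """Prefix relative 'image' paths in a data file with the local image root.

    Hypothetical helper: assumes each record carries its scan under an
    'image' key; adjust to match dataset_info.json for your download.
    """
    samples = json.loads(Path(data_file).read_text())
    for sample in samples:
        image = sample.get("image", "")
        if image and not image.startswith("/"):
            # Rewrite relative paths; absolute paths are left untouched.
            sample["image"] = str(Path(image_root) / image)
    return samples
```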
Data Format Examples

2D Data:

```json
{
  "messages": [
    {
      "content": "<|image2d_start|><|image2d|><|image2d_end|>\n<|organ_start|><|mask2d|><|organ_end|>\nCan you describe the CT image?",
      "role": "user"
    },
    {
      "content": "The CT image shows ...",
      "role": "assistant"
    }
  ],
  "image": "/path/to/image"
}
```

3D Data:

```json
{
  "messages": [
    {
      "content": "<|image3d_start|><|image3d|><|image3d_end|>\n<|organ_start|><|mask3d|><|organ_end|>\nCan you describe the CT image?",
      "role": "user"
    },
    {
      "content": "The CT image shows ...",
      "role": "assistant"
    }
  ],
  "image": "/path/to/image"
}
```

Configuration files:

- configs/training_config/pt.json
- configs/training_config/sft.json
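Records in the 2D data format can also be generated programmatically. A minimal sketch (the helper function and paths are illustrative, not part of the released code; the special tokens are copied from the format above):

```python
import json

# Special tokens from the 2D data format; all paths are placeholders.
IMAGE_TOKENS = "<|image2d_start|><|image2d|><|image2d_end|>"
ORGAN_TOKENS = "<|organ_start|><|mask2d|><|organ_end|>"

def make_2d_sample(question: str, answer: str, image_path: str) -> dict:
    """Assemble one record in the 2D message format shown above."""
    return {
        "messages": [
            {"content": f"{IMAGE_TOKENS}\n{ORGAN_TOKENS}\n{question}", "role": "user"},
            {"content": answer, "role": "assistant"},
        ],
        "image": image_path,
    }

record = make_2d_sample("Can you describe the CT image?",
                        "The CT image shows ...",
                        "/path/to/image")
print(json.dumps(record, indent=2))
```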
Single-node training

PT:

```shell
uv run bash scripts/pt_single_node.sh
# or
uv run deepspeed \
    src/omnict/train.py \
    --config configs/training_config/pt.json \
    --deepspeed \
    --deepspeed_config configs/ds_config/zero2.json
```

SFT:

```shell
uv run bash scripts/sft_single_node.sh
# or
uv run deepspeed \
    src/omnict/train.py \
    --config configs/training_config/sft.json \
    --deepspeed \
    --deepspeed_config configs/ds_config/zero2.json
```

Multi-node training

PT:

```shell
uv run bash scripts/pt_multi_nodes.sh
# or
uv run torchrun \
    --master_addr=$MASTER_ADDR \
    --master_port=$MASTER_PORT \
    --nproc_per_node=$NPROC_PER_NODE \
    --nnodes=$WORLD_SIZE \
    --node_rank=$RANK \
    src/omnict/train.py \
    --config configs/training_config/pt.json \
    --deepspeed \
    --deepspeed_config configs/ds_config/zero2.json
```

SFT:

```shell
uv run bash scripts/sft_multi_nodes.sh
# or
uv run torchrun \
    --master_addr=$MASTER_ADDR \
    --master_port=$MASTER_PORT \
    --nproc_per_node=$NPROC_PER_NODE \
    --nnodes=$WORLD_SIZE \
    --node_rank=$RANK \
    src/omnict/train.py \
    --config configs/training_config/sft.json \
    --deepspeed \
    --deepspeed_config configs/ds_config/zero2.json
```

By default, checkpoints are saved to:

```
./checkpoint/<project>/<run_name>/
```

If OmniCT is helpful to your research or work, please consider citing:
```bibtex
@article{lin2026omnict,
  title={OmniCT: Towards a Unified Slice-Volume LVLM for Comprehensive CT Analysis},
  author={Lin, Tianwei and Qiu, Zhongwei and Zhang, Wenqiao and Liu, Jiang and Xie, Yihan and Gao, Mingjian and Fan, Zhenxuan and Li, Zhaocheng and Li, Sijing and Xie, Zhongle and others},
  journal={arXiv preprint arXiv:2602.16110},
  year={2026}
}
```

We sincerely thank all researchers and open-source contributors advancing the field of medical multimodal understanding.
If you find our work useful, please consider giving us a ⭐ Star!
