📖 Introduction • 🧩 Methods • 🚀 How To Use • 📄 License • 📧 Contact
PRISM is a plug-in, reproducible toolbox for training and evaluating multimodal large language models (MLLMs) under continual instruction tuning (MCIT). A single entry point (run.py) orchestrates sequential task training, inference, and evaluation across multiple benchmarks and continual-learning methods.
If you use this repository, please cite:
@article{tang2026prism,
title={Prism: A Plug-in Reproducible Infrastructure for Scalable Multimodal Continual Instruction Tuning},
author={Jun-Tao Tang and Yu-Cheng Shi and Zhen-Hao Xie and Da-Wei Zhou},
year={2026},
journal={arXiv preprint arXiv:2605.26110},
}
@inproceedings{xie2026same,
title={SAME: Stabilized Mixture-of-Experts for Multimodal Continual Instruction Tuning},
author={Xie, Zhen-Hao and Tang, Jun-Tao and Shi, Yu-Cheng and Ye, Han-Jia and Zhan, De-Chuan and Zhou, Da-Wei},
booktitle={ICML},
year={2026}
}
Multimodal large language models (MLLMs) unify diverse vision and vision–language tasks into a shared instruction-following format. In real deployments, however, data and instructions arrive as streams: models must learn new tasks sequentially without erasing earlier capabilities. Standard fine-tuning suffers from catastrophic forgetting under this setting.
Multimodal continual instruction tuning (MCIT) addresses this by training MLLMs on a sequence of instruction-tuning stages while preserving performance on prior tasks. PRISM standardizes this workflow—benchmark definitions, method integrations, checkpoint layout, and evaluation—so that MCIT methods can be compared and extended under one infrastructure.
Each method is selected with --method <id> (folder under method/custom/<id>/).
To add a method, implement method/custom/<your_method>/integration.py and register with @CLMethodFactory.register("your_method").
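The decorator-based registration mentioned above follows a common registry pattern. The sketch below is a toy illustration of that pattern only, not PRISM's actual code: `ToyMethodFactory`, its `create` helper, and the `YourMethod` constructor arguments are all hypothetical names chosen for this example, and the real `CLMethodFactory` interface may differ.

```python
from typing import Callable, Dict, Type


class ToyMethodFactory:
    """Toy name -> class registry (illustrative stand-in, not PRISM's CLMethodFactory)."""

    _registry: Dict[str, Type] = {}

    @classmethod
    def register(cls, name: str) -> Callable[[Type], Type]:
        """Return a class decorator that stores the decorated class under `name`."""
        def decorator(method_cls: Type) -> Type:
            if name in cls._registry:
                raise ValueError(f"method '{name}' is already registered")
            cls._registry[name] = method_cls
            return method_cls  # the class itself is returned unchanged
        return decorator

    @classmethod
    def create(cls, name: str, **kwargs):
        """Instantiate the class registered under `name` (hypothetical helper)."""
        if name not in cls._registry:
            known = ", ".join(sorted(cls._registry)) or "<none>"
            raise KeyError(f"unknown method '{name}'; registered: {known}")
        return cls._registry[name](**kwargs)


@ToyMethodFactory.register("your_method")
class YourMethod:
    """Hypothetical integration class; the real integration interface may differ."""

    def __init__(self, lr: float = 1e-4):
        self.lr = lr


# Looking a method up by its string id, as a --method flag would.
method = ToyMethodFactory.create("your_method", lr=2e-5)
print(type(method).__name__, method.lr)
```

Registering at import time keeps the entry point free of per-method branches: adding a method only requires a new module that applies the decorator.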
Download the backbone weights from each backbone repo’s Model Zoo, then set their paths in config/paths/llava_paths.py or config/paths/internvl_paths.py. Select the backbone via the backbone option in config/run_config.py (llava or internvl).
You can plug in additional backbones under config/backbone/ and backbone/, then register them in config/backbone/registry.py.
PRISM currently supports three benchmarks:
| Benchmark | --benchmark | Tasks | Reference |
|---|---|---|---|
| CoIN | coin | 8 | Paper · Benchmark |
| UCIT | ucit | 6 | Paper · Benchmark |
| TriGap | trigap | 10 | Paper · Benchmark |
A benchmark typically has an image folder and an instruction folder. JSON files in the instruction folder reference image paths, so your on-disk layout must match those paths.
Then set the benchmark paths in config/benchmarks/<benchmark>.py (e.g. TRIGAP_IMAGE_DIR and TRIGAP_INSTRUCTION_DIR in TriGap.py).
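Because the instruction JSON files reference image paths, a mismatch between the JSON and the on-disk layout usually surfaces only at training time. A minimal sketch of an up-front check follows. It assumes each instruction file is a JSON list of records and that the image path is stored under an `"image"` key relative to the image folder; both the key name and the `find_missing_images` helper are assumptions for illustration, not part of PRISM.

```python
import json
from pathlib import Path
from typing import List


def find_missing_images(instruction_file: Path, image_dir: Path,
                        image_key: str = "image") -> List[str]:
    """Return image paths referenced in the JSON that are not files under image_dir."""
    with open(instruction_file, "r", encoding="utf-8") as f:
        records = json.load(f)  # assumed layout: a list of dicts

    missing = []
    for record in records:
        rel_path = record.get(image_key)
        if rel_path is None:
            continue  # record without an image reference (e.g. text-only sample)
        if not (image_dir / rel_path).is_file():
            missing.append(rel_path)
    return missing
```

Calling it with the configured instruction and image directories (for example the values of TRIGAP_INSTRUCTION_DIR and TRIGAP_IMAGE_DIR) and an instruction file of your choice should return an empty list when the layout matches.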
For quick experiments, you can use smaller sub-splits: sample the instruction JSON yourself, save it with a _sub suffix (e.g. train_sub.json), and set "use_sub_dataset": true in config/run_config.py.
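The sub-split step described above can be sketched as follows. This assumes the instruction file is a JSON list of records; `write_sub_split` and its parameters are hypothetical names for this example, and the only convention taken from the text is the _sub suffix (train.json becomes train_sub.json).

```python
import json
import random
from pathlib import Path


def write_sub_split(src: Path, n_samples: int, seed: int = 0) -> Path:
    """Sample up to n_samples records from src and save them beside it with a _sub suffix."""
    with open(src, "r", encoding="utf-8") as f:
        records = json.load(f)  # assumed layout: a list of instruction records

    rng = random.Random(seed)  # fixed seed so the sub-split is reproducible
    k = min(n_samples, len(records))
    subset = rng.sample(records, k)  # sampling without replacement

    # train.json -> train_sub.json, written in the same directory
    dst = src.with_name(f"{src.stem}_sub{src.suffix}")
    with open(dst, "w", encoding="utf-8") as f:
        json.dump(subset, f, ensure_ascii=False, indent=2)
    return dst
```

Fixing the seed keeps the sub-split identical across machines, which matters when quick experiments are compared against each other.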
You can add custom benchmarks under config/benchmarks/ and register them in config/benchmarks/__init__.py.
If you are on NVIDIA RTX 5090 GPU(s) (our tested setup), a single command sets up everything from the repository root:
bash scripts/setup_env.sh
This creates the conda env prism (if missing), installs torch 2.8 + cu128, the training/eval dependencies, and flash-attn, then runs pip install -e . to install the repository in editable mode.
For other GPUs or CUDA versions, you may need to adjust PyTorch, flash-attn, and related libraries. See requirements/README.md for options (e.g. TORCH_REQUIREMENTS=requirements/torch-cu118.txt for older CUDA stacks, FLASH_ATTN_WHEEL, SKIP_FLASH_ATTN).
Activate and verify:
conda activate prism
python -c "import torch; import transformers; import deepspeed; print(torch.__version__, transformers.__version__)"
Edit backbone paths under config/paths/ and benchmark roots under config/benchmarks/. Tune runs via config/run_config.py.
After configuration, run a quick zero-shot inference on a single task to check weights, data paths, and GPUs (zeroshot uses the base MLLM checkpoint only):
python run.py infer 0 --method zeroshot
Then run continual training and evaluation:
python run.py train 0 1 2
python run.py infer 0 1 2
0, 1, 2 are task indices (see config/benchmarks/<benchmark>.py). You may train any tasks you need; stage k resumes from task k−1’s checkpoint. For inference, choose the checkpoint in config/run_config.py. CLI flags override config; omitted flags use config defaults.
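The precedence rule for settings (a CLI flag wins, an omitted flag falls back to the config default) can be sketched with argparse. This is an illustration of the rule only and does not reproduce run.py: the `RUN_CONFIG` dictionary, its default values, and `resolve_settings` are hypothetical, and only the train/infer modes, the task indices, and the --method and --benchmark flags come from the text above.

```python
import argparse
from typing import Any, Dict, List, Optional

# Hypothetical defaults standing in for config/run_config.py.
RUN_CONFIG: Dict[str, Any] = {"method": "your_method", "benchmark": "coin"}


def resolve_settings(argv: Optional[List[str]] = None) -> Dict[str, Any]:
    """Merge CLI flags over config defaults; omitted flags keep the config value."""
    parser = argparse.ArgumentParser(prog="run.py")
    parser.add_argument("mode", choices=["train", "infer"])
    parser.add_argument("tasks", type=int, nargs="+", help="task indices")
    # default=None marks "flag omitted", so it can be told apart from a real value.
    parser.add_argument("--method", default=None)
    parser.add_argument("--benchmark", default=None)
    args = parser.parse_args(argv)

    settings = dict(RUN_CONFIG)  # start from the config defaults
    for key in ("method", "benchmark"):
        value = getattr(args, key)
        if value is not None:  # only explicit flags override the config
            settings[key] = value
    settings["mode"] = args.mode
    settings["tasks"] = args.tasks
    return settings


print(resolve_settings(["infer", "0", "--method", "zeroshot"]))
print(resolve_settings(["train", "0", "1", "2", "--benchmark", "ucit"]))
```

Using None as the argparse default, rather than repeating the config value there, keeps a single source of truth for defaults in the config file.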
This project is released under the MIT License.
We thank the following projects for their benchmarks and reference implementations used in PRISM:
If you have questions or would like to propose new features, please open an issue or contact the authors: Jun-Tao Tang (juntao.tang@smail.nju.edu.cn), Yu-Cheng Shi (231250034@smail.nju.edu.cn), and Da-Wei Zhou (zhoudw@lamda.nju.edu.cn). Enjoy the code.
