PRISM: Multimodal Continual Instruction Tuning Toolbox

📖 Introduction • 🧩 Methods • 🚀 How To Use • 📄 License • 📧 Contact

PRISM is a plug-in, reproducible toolbox for training and evaluating multimodal large language models (MLLMs) under continual instruction tuning (MCIT). A single entry point (run.py) orchestrates sequential task training, inference, and evaluation across multiple benchmarks and continual-learning methods.

If you use this repository, please cite:

@article{tang2026prism,
  title={Prism: A Plug-in Reproducible Infrastructure for Scalable Multimodal Continual Instruction Tuning},
  author={Jun-Tao Tang and Yu-Cheng Shi and Zhen-Hao Xie and Da-Wei Zhou},
  year={2026},
  journal={arXiv preprint arXiv:2605.26110},
}

@inproceedings{xie2026same,
  title={SAME: Stabilized Mixture-of-Experts for Multimodal Continual Instruction Tuning},
  author={Xie, Zhen-Hao and Tang, Jun-Tao and Shi, Yu-Cheng and Ye, Han-Jia and Zhan, De-Chuan and Zhou, Da-Wei},
  booktitle={ICML},
  year={2026}
}

📖 Introduction

Multimodal large language models (MLLMs) unify diverse vision and vision–language tasks into a shared instruction-following format. In real deployments, however, data and instructions arrive as streams: models must learn new tasks sequentially without erasing earlier capabilities. Standard fine-tuning suffers from catastrophic forgetting under this setting.

Multimodal continual instruction tuning (MCIT) addresses this by training MLLMs on a sequence of instruction-tuning stages while preserving performance on prior tasks. PRISM standardizes this workflow—benchmark definitions, method integrations, checkpoint layout, and evaluation—so that MCIT methods can be compared and extended under one infrastructure.

🧩 Methods Implemented

Each method is selected with --method <id> (folder under method/custom/<id>/).

Abbr.	`--method`	Paper
HiDe-LLaVA	`hide_llava`	HiDe-LLaVA: Hierarchical Decoupling for Continual Instruction Tuning of Multimodal Large Language Model
Replay+LoRA	`replay_lora`	LoRA: Low-Rank Adaptation of Large Language Models
LoRA	`ft_lora`	LoRA: Low-Rank Adaptation of Large Language Models
O-LoRA	`olora`	Orthogonal Subspace Learning for Language Model Continual Learning
SMoLoRA	`smolora`	SMoLoRA: Exploring and Defying Dual Catastrophic Forgetting in Continual Visual Instruction Tuning
MoELoRA	`moelora`	CoIN: A Benchmark of Continual Instruction tuNing for Multimodel Large Language Model
CL-MoE	`clmoe`	CL-MoE: Enhancing Multimodal Large Language Model with Dual Momentum Mixture-of-Experts for Continual Visual Question Answering
ModalPrompt	`modal_prompt`	ModalPrompt: Towards Efficient Multimodal Continual Instruction Tuning with Dual-Modality Guided Prompt
EWC	`ewc`	Overcoming catastrophic forgetting in neural networks
DisCo	`disco`	Federated Continual Instruction Tuning
SAME	`same`	SAME: Stabilized Mixture-of-Experts for Multimodal Continual Instruction Tuning
Zero-shot	`zeroshot`	Visual Instruction Tuning

To add a method, implement method/custom/<your_method>/integration.py and register with @CLMethodFactory.register("your_method").

🚀 How To Use

Pre-trained Weights

Download from each repo’s Model Zoo, then set paths in config/paths/llava_paths.py or config/paths/internvl_paths.py. Select backbone via backbone in config/run_config.py (llava or internvl).

LLaVA — llava-v1.5-7b
InternVL — InternVL-Chat-ViT-6B-Vicuna-7B

You can plug in additional backbones under config/backbone/ and backbone/, then register them in config/backbone/registry.py.

Datasets

PRISM currently supports three benchmarks:

Benchmark	`--benchmark`	Tasks	Reference
CoIN	`coin`	8	Paper · Benchmark
UCIT	`ucit`	6	Paper · Benchmark
TriGap	`trigap`	10	Paper · Benchmark

A benchmark typically has an image folder and an instruction folder. JSON files in the instruction folder reference image paths, so your on-disk layout must match those paths.

Then set the benchmark paths in config/benchmarks/<benchmark>.py (e.g. TRIGAP_IMAGE_DIR and TRIGAP_INSTRUCTION_DIR in TriGap.py).

For quick experiments, you can use smaller sub-splits: sample the instruction JSON yourself, save it with a _sub suffix (e.g. train_sub.json), and set "use_sub_dataset": true in config/run_config.py.

You can add custom benchmarks under config/benchmarks/ and register them in config/benchmarks/__init__.py.

Environment setup (one command)

If you are on NVIDIA RTX 5090 GPU(s) (our tested setup), a single command sets up everything from the repository root:

bash scripts/setup_env.sh

This creates conda env prism (if missing), installs torch 2.8 + cu128, training/eval dependencies, flash-attn, and runs pip install -e ..

For other GPUs or CUDA versions, you may need to adjust PyTorch, flash-attn, and related libraries. See requirements/README.md for options (e.g. TORCH_REQUIREMENTS=requirements/torch-cu118.txt for older CUDA stacks, FLASH_ATTN_WHEEL, SKIP_FLASH_ATTN).

Activate and verify:

conda activate prism
python -c "import torch; import transformers; import deepspeed; print(torch.__version__, transformers.__version__)"

Paths and config

Edit backbone paths under config/paths/ and benchmark roots under config/benchmarks/. Tune runs via config/run_config.py.

After configuration, run a quick zero-shot inference on a single task to check weights, data paths, and GPUs (zeroshot uses the base MLLM checkpoint only):

python run.py infer 0 --method zeroshot

Then run continual training and evaluation:

python run.py train 0 1 2
python run.py infer 0 1 2

0, 1, 2 are task indices (see config/benchmarks/<benchmark>.py). You may train any tasks you need; stage k resumes from task k−1’s checkpoint. For inference, choose the checkpoint in config/run_config.py.

CLI flags override config; omitted flags use config defaults.

📄 License

This project is released under the MIT License.

🙏 Acknowledgments

We thank the following projects for their benchmarks and reference implementations used in PRISM:

📧 Contact

If you have any questions, please feel free to propose new features by opening an issue or contact the authors: Jun-Tao Tang (juntao.tang@smail.nju.edu.cn), Yu-Cheng Shi (231250034@smail.nju.edu.cn), and Da-Wei Zhou (zhoudw@lamda.nju.edu.cn). Enjoy the code.

Name		Name	Last commit message	Last commit date
Latest commit History 80 Commits
PEFT		PEFT
backbone		backbone
config		config
core		core
docs		docs
method		method
requirements		requirements
scripts		scripts
utils		utils
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
environment.yml		environment.yml
pyproject.toml		pyproject.toml
requirements.txt		requirements.txt
run.py		run.py
setup.cfg		setup.cfg

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

PRISM: Multimodal Continual Instruction Tuning Toolbox

📖 Introduction

🧩 Methods Implemented

🚀 How To Use

Pre-trained Weights

Datasets

Environment setup (one command)

Paths and config

📄 License

🙏 Acknowledgments

📧 Contact

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

PRISM: Multimodal Continual Instruction Tuning Toolbox

📖 Introduction

🧩 Methods Implemented

🚀 How To Use

Pre-trained Weights

Datasets

Environment setup (one command)

Paths and config

📄 License

🙏 Acknowledgments

📧 Contact

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages