MVP: Modeling Variants of Prompts for Vision-Language Models

Ao Li*, Zongfang Liu*, Xinhua Li, Jinghui Zhang, Pengwei Wang†, Hu Wang†

*Equal contribution
†Corresponding author

Our paper: arXiv (https://arxiv.org/abs/2503.08229)

Introduction

We introduce the RobustPrompt Benchmark, a systematic benchmark for evaluating how robust vision-language models (VLMs) are to different prompt templates. It includes a dataset of hundreds of carefully designed prompt templates, divided into six types and covering a wide variety of commonly used templates.
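To make "robustness to prompt templates" concrete, the sketch below summarizes how much accuracy varies when only the template changes and the class names stay fixed. It is a minimal sketch under our own assumptions: the helper name `template_robustness` and the chosen statistics (mean, standard deviation, worst case, best-minus-worst range) are illustrative and are not claimed to be the benchmark's official metrics. The accuracy values in the usage example are toy numbers, not results from the paper.

```python
from statistics import mean, pstdev
from typing import Dict


def template_robustness(acc_by_template: Dict[str, float]) -> Dict[str, float]:
    """Summarize how accuracy varies across prompt templates.

    Hypothetical helper (not part of the MVP code). acc_by_template maps a
    template string such as "a photo of a {}." to the accuracy a model
    reaches when every class name is inserted into that template.
    """
    if not acc_by_template:
        raise ValueError("need at least one template")
    accs = list(acc_by_template.values())
    return {
        "mean": mean(accs),
        "std": pstdev(accs),             # population std over templates
        "worst": min(accs),
        "best": max(accs),
        "range": max(accs) - min(accs),  # best-minus-worst gap
    }


if __name__ == "__main__":
    # Toy accuracies for illustration only (not numbers from the paper).
    toy = {
        "a photo of a {}.": 0.80,
        "{}": 0.70,
        "an image showing a {}": 0.75,
    }
    print(template_robustness(toy))
```

A small standard deviation and range indicate that a model's predictions depend little on how the prompt is phrased.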

Beyond the benchmark, we propose Modeling Variants of Prompts (MVP), a simple yet effective method that mitigates sensitivity to prompt templates by modeling variants of prompt structures. The key innovation of MVP lies in decoupling prompts into templates and class names, and using a Variational Autoencoder (VAE) to model the distribution of diverse prompt structures.
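The two ingredients of this idea can be illustrated in plain Python. The sketch below, under our own assumptions, (1) treats a prompt as a template with a `{}` slot plus a class name, so the same class can be paired with many prompt structures, and (2) shows the two standard VAE building blocks: the reparameterization trick z = mu + sigma * eps and the closed-form KL divergence between a diagonal Gaussian and a standard normal. It is not MVP's implementation, which works on learned text-encoder features in PyTorch; every name here (`compose_prompts`, `reparameterize`, `kl_to_standard_normal`) is hypothetical.

```python
import math
import random
from typing import List, Sequence


def compose_prompts(templates: Sequence[str], classnames: Sequence[str]) -> List[List[str]]:
    """Return prompts[t][c]: template t filled with class name c.

    Hypothetical helper: templates and class names are kept separate, so
    each class can be combined with every prompt structure.
    """
    return [[t.format(c) for c in classnames] for t in templates]


def reparameterize(mu: Sequence[float], logvar: Sequence[float], rng: random.Random) -> List[float]:
    """Sample z = mu + sigma * eps with eps ~ N(0, 1) per dimension."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, logvar)]


def kl_to_standard_normal(mu: Sequence[float], logvar: Sequence[float]) -> float:
    """KL(N(mu, diag(exp(logvar))) || N(0, I)) in closed form."""
    return -0.5 * sum(1.0 + lv - m * m - math.exp(lv) for m, lv in zip(mu, logvar))


if __name__ == "__main__":
    templates = ["a photo of a {}.", "a blurry photo of the {}.", "{}"]
    classnames = ["cat", "dog"]
    for row in compose_prompts(templates, classnames):
        print(row)

    # Toy latent statistics for one template embedding (illustrative values).
    mu, logvar = [0.1, -0.2], [0.0, -1.0]
    rng = random.Random(0)
    print(reparameterize(mu, logvar, rng))
    print(kl_to_standard_normal(mu, logvar))
```

The reparameterization keeps sampling differentiable with respect to mu and logvar, which is what allows a VAE to be trained by gradient descent; the KL term pulls the learned distribution toward the standard normal prior.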

Requirements

# Install dassl
git clone https://github.com/KaiyangZhou/Dassl.pytorch.git
cd Dassl.pytorch/
conda create -y -n dassl python=3.8
conda activate dassl
conda install pytorch torchvision cudatoolkit=10.2 -c pytorch
pip install -r requirements.txt
python setup.py develop

# Download our code (from outside the Dassl.pytorch/ folder)
cd ..
git clone https://github.com/xiaoyaoxinyi/MVP.git
cd MVP
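After installation, a quick sanity check can confirm that the required packages are importable from the active environment. This is a hedged convenience sketch, not part of the repository: it relies only on the standard-library function `importlib.util.find_spec`, which locates a module without importing it, and the module list (`torch`, `torchvision`, `dassl`) is our assumption about what the setup above provides.

```python
import importlib.util
from typing import Dict, Iterable


def check_modules(names: Iterable[str]) -> Dict[str, bool]:
    """Map each top-level module name to whether it can be found."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    # Assumed dependency list; adjust to match the repository's requirements.
    status = check_modules(["torch", "torchvision", "dassl"])
    for name, ok in status.items():
        print(f"{name:12s} {'found' if ok else 'MISSING'}")
```

Run it with the `dassl` conda environment activated; a `MISSING` entry usually means the corresponding install step was skipped or ran in another environment.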

How to run

# Please refer to https://github.com/KaiyangZhou/CoOp
bash scripts/promptclip/main.sh dataset rn50 16
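The positional arguments are not documented here. Following the usual CoOp convention, we assume they are the dataset name, the backbone config (`rn50`, a ResNet-50 CLIP image encoder), and the number of shots per class; please check `main.sh` before relying on this. The hypothetical helper below only builds and prints command lines for a few datasets (a dry run), assuming the script lives under `scripts/promptclip/` in the repository. The dataset names are CoOp's config names and are likewise an assumption for MVP.

```python
import shlex
from typing import List

# Assumed location of the training script, relative to the MVP repository root.
TRAIN_SCRIPT = "scripts/promptclip/main.sh"


def build_train_cmd(dataset: str, backbone: str = "rn50", shots: int = 16) -> List[str]:
    """Return the argv list for one training run (assumed argument order)."""
    if shots <= 0:
        raise ValueError("shots must be a positive integer")
    return ["bash", TRAIN_SCRIPT, dataset, backbone, str(shots)]


if __name__ == "__main__":
    # Dry run: print the commands instead of executing them.
    for name in ["caltech101", "oxford_pets", "dtd"]:
        print(shlex.join(build_train_cmd(name)))
```

To actually launch a run, pass the returned list to `subprocess.run(cmd, check=True)` from the repository root; returning an argument list rather than a single string avoids shell-quoting issues.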

How to evaluate

# Please refer to https://github.com/KaiyangZhou/CoOp
bash scripts/promptclip/eval.sh dataset rn50
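Evaluation takes a dataset name and a backbone config. To evaluate several datasets in one go, a small loop over Python's `subprocess.run` is enough. This is a hedged sketch: the script path (assumed to be `scripts/promptclip/eval.sh`), the argument order, and the dataset names are assumptions, and the helper `run_eval_sweep` is hypothetical. With `dry_run=True` (the default) it only returns the commands.

```python
import subprocess
from typing import List, Sequence

EVAL_SCRIPT = "scripts/promptclip/eval.sh"  # assumed path, relative to the repo root


def run_eval_sweep(datasets: Sequence[str], backbone: str = "rn50", dry_run: bool = True) -> List[List[str]]:
    """Build (and optionally run) one evaluation command per dataset.

    With dry_run=False each command runs in order, and a non-zero exit
    status raises subprocess.CalledProcessError, stopping the sweep.
    """
    cmds = [["bash", EVAL_SCRIPT, name, backbone] for name in datasets]
    if not dry_run:
        for cmd in cmds:
            subprocess.run(cmd, check=True)
    return cmds


if __name__ == "__main__":
    for cmd in run_eval_sweep(["caltech101", "oxford_pets"]):
        print(" ".join(cmd))
```

Stopping on the first failure (`check=True`) makes a broken dataset path visible immediately instead of silently skipping it.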

Cite us

@misc{li2025modelingvariantspromptsvisionlanguage,
      title={Modeling Variants of Prompts for Vision-Language Models}, 
      author={Ao Li and Zongfang Liu and Xinhua Li and Jinghui Zhang and Pengwei Wang and Hu Wang},
      year={2025},
      eprint={2503.08229},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2503.08229}, 
}

