This repository contains a fork of the original VLM2Vec codebase, modified for easy Pyserini integration and repackaged as a PyPI package.
Current version: 0.1.5
All 24 Visual Document Retrieval tasks are supported, covering ViDoRE, ViDoRE v2, VisRAG, ViDoSeek, and MMLongBench.
Any VL model with a qwen2-vl, gme, or lamra backbone is supported, including gme-Qwen2-VL-2B/7B-Instruct, VLM2Vec/VLM2Vec-V2.0, code-kunkun/LamRA-Ret, and more.
Install the package directly from PyPI:

```bash
pip install vlm2vec-for-pyserini
```

Or, install from source:
```bash
git clone https://github.com/castorini/VLM2Vec-for-Pyserini.git
cd VLM2Vec-for-Pyserini
pip install .
```
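Either way, you can confirm the installation by querying the installed distribution's version with the standard library (the distribution name comes from the pip command above):

```python
# Print the installed version of the package, e.g. 0.1.5.
from importlib.metadata import version

print(version("vlm2vec-for-pyserini"))
```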
Assuming that you have cloned the repository and you are in the root directory:

- Download the visdoc datasets from HuggingFace and convert the `corpus`, `topics`, and `queries` to the format ready for Pyserini:

  ```bash
  bash src/pyserini_integration/prepare_dataset.sh
  ```

- Run encoding, indexing, and search, followed by evaluation and results aggregation, using the following script (see the sketches after this list):

  ```bash
  bash src/pyserini_integration/experiments.sh
  ```

If you want to use the PyPI package instead, take a look at the download_visdoc.py, save_pyserini_data.py, and quick_start_demo.py files under src/pyserini_integration/ as sample code.
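To see what the search step works with, here is a minimal sketch that queries a Pyserini-built dense index directly with FAISS. The index directory, task name, and queries.npy file are hypothetical stand-ins, but Pyserini dense indexes do store the FAISS index in a file named `index` and one document ID per line in `docid`:

```python
# Minimal sketch (not the packaged pipeline): search a Pyserini-built dense
# index with precomputed query embeddings. All paths here are hypothetical.
import faiss
import numpy as np

index_dir = "indexes/vidore-arxivqa"            # hypothetical per-task index directory
index = faiss.read_index(f"{index_dir}/index")  # Pyserini stores the FAISS index as 'index'
with open(f"{index_dir}/docid") as f:           # ...and document IDs, one per line, in 'docid'
    docids = [line.strip() for line in f]

# Hypothetical file of precomputed VLM2Vec query embeddings, shape (num_queries, dim).
queries = np.load("queries.npy").astype("float32")
scores, ids = index.search(queries, 10)         # top-10 documents per query

for rank, (i, score) in enumerate(zip(ids[0], scores[0]), start=1):
    print(f"{rank:2d} {docids[i]} {score:.4f}")
```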
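The evaluation step can be reproduced with Pyserini's bundled trec_eval, assuming a TREC-format qrels file and a run file produced by the search step (the file names below are hypothetical; nDCG@10 is the metric commonly reported for these benchmarks):

```python
# Score a TREC-format run against its qrels using Pyserini's trec_eval wrapper.
import subprocess
import sys

subprocess.run(
    [
        sys.executable, "-m", "pyserini.eval.trec_eval",
        "-c", "-m", "ndcg_cut.10",
        "qrels/qrels.vidore-arxivqa.txt",  # hypothetical qrels path
        "runs/run.vidore-arxivqa.txt",     # hypothetical run file from the search step
    ],
    check=True,
)
```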
For questions about the Pyserini integration, please email Sahel Sharifymoghaddam.
For questions about the original VLM2Vec codebase, please email the authors of the original repository.
If you use this work with Pyserini, please cite Pyserini in addition to the original VLM2Vec papers:
```bibtex
@inproceedings{Lin_etal_SIGIR2021_Pyserini,
  author    = {Jimmy Lin and Xueguang Ma and Sheng-Chieh Lin and Jheng-Hong Yang and Ronak Pradeep and Rodrigo Nogueira},
  title     = {{Pyserini}: A {Python} Toolkit for Reproducible Information Retrieval Research with Sparse and Dense Representations},
  booktitle = {Proceedings of the 44th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2021)},
  year      = {2021},
  pages     = {2356--2362}
}

@article{jiang2024vlm2vec,
  title   = {VLM2Vec: Training Vision-Language Models for Massive Multimodal Embedding Tasks},
  author  = {Jiang, Ziyan and Meng, Rui and Yang, Xinyi and Yavuz, Semih and Zhou, Yingbo and Chen, Wenhu},
  journal = {arXiv preprint arXiv:2410.05160},
  year    = {2024}
}

@article{meng2025vlm2vecv2,
  title   = {VLM2Vec-V2: Advancing Multimodal Embedding for Videos, Images, and Visual Documents},
  author  = {Rui Meng and Ziyan Jiang and Ye Liu and Mingyi Su and Xinyi Yang and Yuepeng Fu and Can Qin and Zeyuan Chen and Ran Xu and Caiming Xiong and Yingbo Zhou and Wenhu Chen and Semih Yavuz},
  journal = {arXiv preprint arXiv:2507.04590},
  year    = {2025}
}
```

This project is licensed under the Apache 2.0 License. See the LICENSE file for details.