Authors: Pritam Sarkar and Ali Etemad
This repository provides the official implementation of VCRBench.
Clone the repository and navigate to the VCRBench directory:
git clone https://github.com/pritamqu/VCRBench
cd VCRBench
This repository supports several LVLMs for direct evaluation on VCRBench.
Our data can be accessed via this link: [VCRBench]
Please download the videos and questions from the link and save them in your local directory.
mkdir HF_DATA # create a dir where you want to download the data
cd HF_DATA # go to that dir
git lfs install
git clone https://huggingface.co/datasets/pritamqu/VCRBench
Please make sure to update the video-folder and question-file entries in the inference scripts to match your local paths.
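As a sketch, the paths might be set as shell variables like the ones below. The directory layout and file names here are assumptions based on the download step above; match them to the actual video-folder and question-file entries in the inference scripts.

```shell
# Hypothetical paths, assuming the dataset was cloned into HF_DATA as above.
# Adjust these to the actual locations on your machine.
VIDEO_FOLDER="$HOME/HF_DATA/VCRBench/videos"
QUESTION_FILE="$HOME/HF_DATA/VCRBench/questions.json"
echo "video-folder:  $VIDEO_FOLDER"
echo "question-file: $QUESTION_FILE"
```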
See our leaderboard here.
If you want to add your model to our leaderboard, please send model responses to [email protected], in the same format as the provided sample response.
You can download the open-source weights using:
git lfs install
git clone [email protected]:Qwen/Qwen2.5-VL-72B-Instruct
Alternatively, you can evaluate models through their APIs, as done for Gemini and GPT-4o.
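API-based evaluation typically requires credentials to be available in the environment. The variable names below are assumptions (they are the conventional ones for these providers); check the inference scripts for the names they actually read.

```shell
# Hypothetical: export provider credentials before running API-based evaluation.
export OPENAI_API_KEY="sk-..."   # for GPT-4o (assumed variable name)
export GOOGLE_API_KEY="..."      # for Gemini (assumed variable name)
```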
conda create -n vcr python=3.10 -y
conda activate vcr
pip install -r requirements.txt
We provide scripts to directly evaluate several open-source (e.g., Qwen2.5-VL-Instruct, InternVL2_5, VideoLLaMA3, VideoLLaVA) and closed-source (e.g., Gemini, GPT-4o) models on VCRBench.
Evaluation scripts are located here. For example, to evaluate Qwen2.5-VL-72B-Instruct:
bash scripts/qwen_2_5_vl/inference72.sh
You can use the given evaluation scripts as a reference for evaluating other models.
We also provide scripts to test open-source models equipped with RRD.
For example, to evaluate Qwen2.5-VL-72B-Instruct with RRD:
bash scripts/qwen_2_5_vl/rrd72.sh
If you find this work useful, please consider citing our paper:
@misc{sarkar2025vcrbench,
title={VCRBench: Exploring Long-form Causal Reasoning Capabilities of Large Video Language Models},
author={Pritam Sarkar and Ali Etemad},
year={2025},
eprint={2505.08455},
archivePrefix={arXiv},
primaryClass={cs.CV},
}
This project incorporates datasets and model checkpoints that are subject to their respective original licenses.
Users must adhere to the terms and conditions specified by these licenses.
Assets used in this work include, but are not limited to, CrossTask.
This project does not impose any additional constraints beyond those stipulated in the original licenses. Users must ensure their usage complies with all applicable laws and regulations.
This repository is released under the MIT License. See LICENSE for details.
For any issues or questions, please open an issue or contact Pritam Sarkar at [email protected]!