📑 Paper · 🌎 Project Page · 💻 Training Code
Official PyTorch implementation of ResidualViT for Efficient Temporally Dense Video Encoding, accepted at ICCV 2025 (highlight paper).
This repository provides the testing code for the Natural Language Temporal Video Grounding (NLTVG) task.
This repository uses PyTorch Lightning, and Hydra to manage run configurations. We provide a conda environment for quick setup. Assuming conda is installed, run:
```bash
conda env create -f environment.yml
conda activate sm
pip install --no-deps git+https://github.com/Soldelli/residualvit
export PYTHONPATH="$PYTHONPATH:$PWD"
```

To evaluate a model on a particular dataset, follow the template below:
```bash
python -m aligner command=evaluate encoder=$MODEL data=$DATASET output_dir=$OUTPUT_DIR
```

| Dataset | Annotations | Videos |
|---|---|---|
| Charades-STA | Download | Website |
| ActivityNet-Captions | Download | Website |
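As a concrete instance of the evaluation template, the sketch below fills in the three placeholders. The specific encoder and dataset config names are assumptions; check the files under `configs/` for the exact values supported by this repository.

```shell
# Hypothetical values: verify the real config names under configs/.
MODEL=openclip_vit_b_32
DATASET=charades_sta
OUTPUT_DIR=output/charades_openclip_b32

# Compose the Hydra-style command line from the template.
CMD="python -m aligner command=evaluate encoder=$MODEL data=$DATASET output_dir=$OUTPUT_DIR"
echo "$CMD"
# Run it once the conda environment is active:
# eval "$CMD"
```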
- OpenCLIP. This repository supports the following OpenCLIP encoders: `openclip_vit_b_32`, `openclip_vit_b_16`, `openclip_vit_l_14`.
- ResidualViT. See the scripts for examples of how to use this encoder, and visit the official ResidualViT codebase (here) for the training code.
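The encoder names appear to follow OpenCLIP's architecture naming. A minimal sketch of the presumed correspondence (the mapping and the helper function are assumptions for illustration, not part of this repository's API):

```python
# Hypothetical mapping from this repo's encoder config names to
# OpenCLIP architecture identifiers; check configs/ for the real values.
ENCODER_TO_ARCH = {
    "openclip_vit_b_32": "ViT-B-32",
    "openclip_vit_b_16": "ViT-B-16",
    "openclip_vit_l_14": "ViT-L-14",
}

def arch_for(encoder_name: str) -> str:
    """Return the OpenCLIP architecture string for a supported encoder name."""
    try:
        return ENCODER_TO_ARCH[encoder_name]
    except KeyError:
        raise ValueError(f"Unsupported encoder: {encoder_name!r}")
```

With such a mapping, the architecture string could then be passed to `open_clip.create_model_and_transforms` to instantiate the backbone.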
```
zs-video-eval/
├── aligner/          # Core source code
├── configs/          # Config files for encoders and datasets
├── scripts/          # Training scripts
├── environment.yml   # Python dependencies
├── LICENSE.md        # Project license
├── README.md         # Project documentation
└── ...
```

If you use this code or find it helpful in your research, please cite our papers:
```bibtex
@inproceedings{soldan2025residualvit,
    title={ResidualViT for Efficient Temporally Dense Video Encoding},
    author={Soldan, Mattia and Caba Heilbron, Fabian and Ghanem, Bernard and Sivic, Josef and Russell, Bryan},
    booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
    year={2025}
}
```

```bibtex
@article{castro2022fitclip,
    title={FitCLIP: Refining large-scale pretrained image-text models for zero-shot video understanding tasks},
    author={Castro, Santiago and Heilbron, Fabian Caba},
    journal={arXiv preprint arXiv:2203.13371},
    year={2022}
}
```

This repository is built on top of FitCLIP; thanks to our collaborators and the open-source community.
This project is licensed under the ADOBE RESEARCH LICENSE.