adobe-research/zs-video-eval

Zero Shot Natural Language Temporal Video Grounding.

📑 Paper · 🌎 Project Page · 💻 Training Code

Official PyTorch implementation of ResidualViT for Efficient Temporally Dense Video Encoding, accepted at ICCV 2025 (highlight paper).
This repository provides the testing code for the Natural Language Temporal Video Grounding (NLTVG) task.

🚀 Installation

This repository uses PyTorch Lightning and relies on Hydra to manage run configurations. We provide a conda environment file for quick setup. Assuming conda is installed, run:

conda env create -f environment.yml
conda activate sm
pip install --no-dependencies git+https://github.com/Soldelli/residualvit
export PYTHONPATH="$PYTHONPATH:$PWD"

🎯 Zero Shot Evaluation

To evaluate a model on a particular dataset, follow the template below:

python -m aligner command=evaluate encoder=$MODEL data=$DATASET output_dir=$OUTPUT_DIR
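
As a hedged illustration, the template can be filled in as sketched below. The encoder name is one of the supported OpenCLIP encoders; the dataset key `charades_sta`, the output path, and the helper `build_eval_cmd` are assumptions made for this example, so check `configs/` for the actual names. The helper only prints the command (a dry run); nothing is launched until you run the printed line yourself.

```shell
# Hypothetical helper (not part of this repository): prints the evaluation
# command for one encoder/dataset pair without executing it.
build_eval_cmd() {
  model="$1"; dataset="$2"; output_dir="$3"
  printf 'python -m aligner command=evaluate encoder=%s data=%s output_dir=%s\n' \
    "$model" "$dataset" "$output_dir"
}

# "charades_sta" is an assumed dataset config name, not confirmed by this README.
build_eval_cmd openclip_vit_b_32 charades_sta outputs/openclip_vit_b_32
```

Printing the command first makes it easy to confirm the Hydra overrides before starting a long evaluation run.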

📊 Supported datasets

| Dataset              | Annotations | Videos  |
|----------------------|-------------|---------|
| Charades-STA         | Download    | Website |
| ActivityNet-Captions | Download    | Website |

🤖 Supported encoders

  • OpenCLIP. This repository supports the following OpenCLIP models.
    Available encoders: openclip_vit_b_32, openclip_vit_b_16, openclip_vit_l_14.
  • ResidualViT. See scripts/ for examples of how to use this encoder, and visit the official ResidualViT codebase for the training code.
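
To compare the three OpenCLIP encoders on one dataset, a loop along these lines may be convenient. This is a minimal sketch: the encoder names come from the list above, while the default dataset key `charades_sta`, the `outputs/` layout, and the function name `list_eval_cmds` are assumptions for illustration. The `echo` keeps it a dry run.

```shell
# Hypothetical helper: prints one evaluation command per OpenCLIP encoder.
list_eval_cmds() {
  # Assumed dataset config name; override via the first argument.
  dataset="${1:-charades_sta}"
  for model in openclip_vit_b_32 openclip_vit_b_16 openclip_vit_l_14; do
    output_dir="outputs/${dataset}/${model}"   # assumed output layout
    # Remove "echo" to actually launch each evaluation.
    echo python -m aligner command=evaluate encoder="$model" data="$dataset" output_dir="$output_dir"
  done
}

list_eval_cmds
```

Giving each encoder its own output directory keeps the results of separate runs from overwriting each other.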

📂 Repository Structure

zs-video-eval/
├── aligner/          # Core source code
├── configs/          # Config files for encoders and datasets
├── scripts/          # Example scripts for running the encoders
├── environment.yml   # Python dependencies
├── LICENSE.md        # Project license
├── README.md         # Project documentation
└── ...

💡 Citation

If you use this code or find it helpful in your research, please cite our papers:

@inproceedings{soldan2025residualvit,
  title={ResidualViT for Efficient Temporally Dense Video Encoding},
  author={Soldan, Mattia and Caba Heilbron, Fabian and Ghanem, Bernard and Sivic, Josef and Russell, Bryan},
  booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
  year={2025}
}

@article{castro2022fitclip,
  title={{FitCLIP}: Refining large-scale pretrained image-text models for zero-shot video understanding tasks},
  author={Castro, Santiago and Heilbron, Fabian Caba},
  journal={arXiv preprint arXiv:2203.13371},
  year={2022}
}

🙏 Acknowledgements

This repository is built on top of FitCLIP. Thanks to our collaborators and the open-source community.

📜 License

This project is licensed under the ADOBE RESEARCH LICENSE.
