Egoinstructor retrieval module

Official PyTorch implementation of the cross-view retrieval module in Egoinstructor (CVPR 2024).

Retrieval-Augmented Egocentric Video Captioning
Jilan Xu, Yifei Huang, Junlin Hou, Guo Chen, Yuejie Zhang, Rui Feng, Weidi Xie
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024

Paper Project Page

The retrieval module is trained on pseudo-paired egocentric videos (Ego4D) and exocentric videos (HowTo100M) using an EgoExoNCE loss.
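For readers unfamiliar with contrastive training, the sketch below shows a generic symmetric InfoNCE loss over a batch of paired ego/exo feature vectors. It is not the EgoExoNCE loss itself: this README only names that loss, and its exact definition is in the paper. The function names, the temperature value of 0.07, and the assumption that row i of each batch forms the positive pair are all illustrative. It is written in plain Python for readability rather than PyTorch.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (zero vectors are left unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def cross_entropy_row(logits, target):
    """Numerically stable -log(softmax(logits)[target])."""
    peak = max(logits)
    log_sum_exp = peak + math.log(sum(math.exp(z - peak) for z in logits))
    return log_sum_exp - logits[target]


def symmetric_info_nce(ego_feats, exo_feats, temperature=0.07):
    """Generic symmetric InfoNCE; assumes ego_feats[i] pairs with exo_feats[i]."""
    ego = [l2_normalize(v) for v in ego_feats]
    exo = [l2_normalize(v) for v in exo_feats]
    n = len(ego)
    # Cosine similarity between every ego/exo pair, scaled by the temperature.
    sim = [[sum(a * b for a, b in zip(e, x)) / temperature for x in exo] for e in ego]
    # Ego-to-exo direction: each row should score its own column highest.
    ego_to_exo = sum(cross_entropy_row(sim[i], i) for i in range(n)) / n
    # Exo-to-ego direction: each column should score its own row highest.
    exo_to_ego = sum(
        cross_entropy_row([sim[j][i] for j in range(n)], i) for i in range(n)
    ) / n
    return 0.5 * (ego_to_exo + exo_to_ego)
```

With correctly aligned pairs the loss approaches zero, and it grows when the pairing is shuffled, which is the signal that pulls matching ego and exo features together during training.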

Preparing Pretrain Data

Please refer to docs/data.md. To extract the video features yourself, see feature_extraction/feature_extraction.md.

Training and Evaluation

To train with the Slurm script, run

./scripts/train_slurm.sh

or run

python main_pretrain_contrastive.py --config ./configs/egohowto.yml

To evaluate the model's retrieval performance, modify the resume checkpoint path in ./configs/test.yml

resume: /path/to/the/trained/checkpoint.pt

and run

python main_pretrain_contrastive.py --config ./configs/test.yml
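If you evaluate several checkpoints, editing the config by hand gets tedious. The helper below is a minimal sketch for rewriting the `resume:` entry in the config text. It assumes `resume` is a top-level key that starts its own line, as in the snippet above; the rest of the layout of ./configs/test.yml is not shown in this README, and `set_resume_path` is a hypothetical name, not part of this repository.

```python
import re


def set_resume_path(config_text, checkpoint_path):
    """Return config_text with the top-level `resume:` value replaced.

    Assumes `resume` is a top-level key at the start of a line. If the key
    is missing, a new `resume:` line is appended instead.
    """
    new_line = f"resume: {checkpoint_path}"
    pattern = re.compile(r"^resume:.*$", flags=re.MULTILINE)
    if pattern.search(config_text):
        # A callable replacement keeps backslashes in the path literal.
        return pattern.sub(lambda _match: new_line, config_text, count=1)
    separator = "" if (not config_text or config_text.endswith("\n")) else "\n"
    return f"{config_text}{separator}{new_line}\n"
```

To apply it, read ./configs/test.yml as text, pass the contents and the checkpoint path to `set_resume_path`, write the result back, and then run the evaluation command above.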

Pretrained Model

Model | Link | Size
Crossview Retrieval Module | 🤗 HF link | 1.83G