Official PyTorch implementation of the crossview retrieval module in EgoInstructor (CVPR 2024)
Retrieval-Augmented Egocentric Video Captioning
Jilan Xu, Yifei Huang, Junlin Hou, Guo Chen, Yuejie Zhang, Rui Feng, Weidi Xie
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024
The retrieval module is trained on pseudo-paired egocentric videos (Ego4D) and exocentric videos (HowTo100M) using an EgoExoNCE loss.
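For readers unfamiliar with contrastive objectives, the following is a minimal, framework-free sketch of a generic multi-positive InfoNCE-style loss. It is not the EgoExoNCE implementation in this repository: the function names, the boolean positive mask, and the temperature value are all assumptions made for illustration, and the actual training code operates on PyTorch tensors.

```python
import math


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    peak = max(values)
    return peak + math.log(sum(math.exp(v - peak) for v in values))


def multi_positive_info_nce(sim, positive_mask, temperature=0.07):
    """Mean over rows of -log(sum_pos exp(s/t) / sum_all exp(s/t)).

    sim: N x M nested list of similarity scores (hypothetical input,
         e.g. cosine similarity between video and text features).
    positive_mask: N x M nested list of booleans; True marks a positive pair.
    temperature: softmax temperature (0.07 is an assumed default).
    """
    losses = []
    for scores, mask in zip(sim, positive_mask):
        logits = [s / temperature for s in scores]
        positives = [l for l, is_pos in zip(logits, mask) if is_pos]
        if not positives:
            raise ValueError("every row needs at least one positive")
        # log(sum over all) - log(sum over positives) equals the -log ratio.
        losses.append(log_sum_exp(logits) - log_sum_exp(positives))
    return sum(losses) / len(losses)


if __name__ == "__main__":
    sim = [[1.0, 0.0], [0.0, 1.0]]
    mask = [[True, False], [False, True]]
    print(multi_positive_info_nce(sim, mask, temperature=1.0))
```

Allowing several positives per row is what lets samples from different views share a positive set; how positives are actually selected for EgoExoNCE is described in the paper, not in this sketch.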
For data preparation, please refer to docs/data.md. If you would like to extract the video features on your own, please refer to feature_extraction/feature_extraction.md.
Train with the Slurm script:
./scripts/train_slurm.sh
or run
python main_pretrain_contrastive.py --config ./configs/egohowto.yml
To evaluate the model's retrieval performance, set the resume checkpoint path in ./configs/test.yml:
resume: /path/to/the/trained/checkpoint.pt
and run
python main_pretrain_contrastive.py --config ./configs/test.yml
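Retrieval performance is commonly summarized with recall@K. The sketch below shows how such a number can be computed from a query-by-candidate similarity matrix. It assumes that query i is paired with candidate i, and the function name `recall_at_k` is hypothetical; the metrics actually reported by the evaluation script may differ.

```python
def recall_at_k(sim, k):
    """Fraction of queries whose paired candidate ranks within the top k.

    sim: N x N nested list; sim[i][j] is the similarity between query i
         and candidate j. The true match of query i is assumed to be
         candidate i (an assumption of this sketch).
    """
    hits = 0
    for i, scores in enumerate(sim):
        # Rank = number of candidates scoring strictly higher than the match.
        rank = sum(1 for s in scores if s > scores[i])
        if rank < k:
            hits += 1
    return hits / len(sim)


if __name__ == "__main__":
    sim = [
        [0.9, 0.1, 0.2],
        [0.8, 0.3, 0.1],
        [0.2, 0.1, 0.7],
    ]
    print(recall_at_k(sim, 1), recall_at_k(sim, 2))
```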
| Model | Link | Size |
|---|---|---|
| Crossview Retrieval Module | 🤗 HF link | 1.83G |