Code for the ICCE-Asia 2023 paper "Dual Encoding++: Optimization of Text-Video Retrieval via Fine-tuning and Pruning".
- Ubuntu: 20.04 LTS
- Python: 3.8
- PyTorch: 2.0
- CUDA: 11.7.1
- cuDNN: 8.6.0
MSR-VTT official split
- Train: 6,513 clips, 130,260 captions
- Validation: 497 clips, 9,940 captions
- Evaluation: 2,990 clips, 59,800 captions
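Each MSR-VTT clip comes with 20 captions, so the caption counts above follow directly from the clip counts. A quick sanity check:

```python
# Each MSR-VTT clip has 20 captions, so captions = clips * 20 for every split.
splits = {
    "train": (6513, 130260),
    "val": (497, 9940),
    "test": (2990, 59800),
}

for name, (clips, captions) in splits.items():
    assert captions == clips * 20, f"{name}: unexpected caption count"
    print(f"{name}: {clips} clips * 20 = {captions} captions")
```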
We used Miniconda to set up our deep learning workspace. After installing Miniconda, you can create the conda environment using conda_env_cuda.yaml.
conda env create --file conda_env_cuda.yaml
We used the same MSR-VTT video features as the original Dual Encoding: the concatenation of ResNeXt-101 and ResNet-152 features. Please download the pre-trained video features msrvtt10k.tar.gz (4.24 GB) from this Google Drive URL. After downloading it, extract the archive with the following command and place the resulting directory under data/. For more information, you can refer here.
tar -xzvf msrvtt10k.tar.gz
We also used pretrained word2vec embeddings trained on the English tags of 30M Flickr images, provided by this paper. Please download word2vec.tar.gz (3 GB) from this Google Drive URL (same URL as above). After downloading it, extract the archive with the following command and place the resulting directory under data/. For more information, you can refer here.
tar -xzvf word2vec.tar.gz
After these steps, the data/ directory structure should be as follows.
data
├── msrvtt10k
│   ├── FeatureData
│   │   └── resnext101-resnet152
│   │       ├── feature.bin
│   │       ├── id.txt
│   │       ├── shape.txt
│   │       └── video2frames.txt
│   └── TextData
│       ├── msrvtt10ktest.caption.txt
│       ├── msrvtt10ktrain.caption.txt
│       └── msrvtt10kval.caption.txt
└── word2vec
    ├── feature.bin
    ├── id.txt
    └── shape.txt
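To catch missing files before training, the layout above can be verified with a small script (a hypothetical helper, not part of the repo; run it from the repository root):

```python
from pathlib import Path

# Hypothetical helper (not part of the repo): check the expected data/ layout.
EXPECTED = [
    "msrvtt10k/FeatureData/resnext101-resnet152/feature.bin",
    "msrvtt10k/FeatureData/resnext101-resnet152/id.txt",
    "msrvtt10k/FeatureData/resnext101-resnet152/shape.txt",
    "msrvtt10k/FeatureData/resnext101-resnet152/video2frames.txt",
    "msrvtt10k/TextData/msrvtt10ktest.caption.txt",
    "msrvtt10k/TextData/msrvtt10ktrain.caption.txt",
    "msrvtt10k/TextData/msrvtt10kval.caption.txt",
    "word2vec/feature.bin",
    "word2vec/id.txt",
    "word2vec/shape.txt",
]

def missing_files(root="data"):
    """Return the expected files that are absent under `root`."""
    return [p for p in EXPECTED if not (Path(root) / p).is_file()]

missing = missing_files("data")
print("data/ layout OK" if not missing else f"missing {len(missing)} file(s): {missing}")
```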
- To create the vocabulary from the training dataset's captions, please run bash run.sh vocab. This creates vocab.json.
- To create tags (concept features), please run bash run.sh tags. This creates tag_vocab.json and video_tag.txt.
- To create word2vec embeddings, please run bash run.sh word2vec. This creates pretrained_weight.npy used for training.
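The generated pretrained_weight.npy is a plain NumPy array of shape (vocab_size, embedding_dim), one row per vocabulary word. As a sketch of how such a file round-trips (the array below is a fabricated stand-in with illustrative sizes; the real file comes from bash run.sh word2vec):

```python
import os
import tempfile

import numpy as np

# Fabricated stand-in for pretrained_weight.npy: a (vocab_size, embedding_dim)
# float32 matrix, one row per vocabulary word. Sizes here are illustrative only.
vocab_size, embedding_dim = 7, 500
weights = np.random.rand(vocab_size, embedding_dim).astype(np.float32)

path = os.path.join(tempfile.gettempdir(), "pretrained_weight_demo.npy")
np.save(path, weights)

loaded = np.load(path)
print(loaded.shape, loaded.dtype)  # (7, 500) float32
```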
- To train the model, please run bash run.sh train_hybrid_cuda. (Training automatically includes evaluation at the end.)
- To evaluate the model, please run bash run.sh test_hybrid_cuda $MODEL_PATH.
- For debug mode, first create captions for debugging with bash run.sh tiny, then run bash run.sh train_hybrid_cpu_debug.
You can check out all available commands in run.sh.
- Higher R@K and Sum R, and lower mean rank (mean r), indicate better performance.
- [B] shows considerable performance improvement compared to Dual Encoding.
- Final optimized models [E] and [F] have smaller sizes and better overall performance than Dual Encoding.
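For reference, these metrics can be derived from a caption-to-video similarity matrix as in the sketch below (a minimal illustration, not the repo's evaluation code; it assumes caption i's ground-truth match is video i):

```python
import numpy as np

def retrieval_metrics(sim):
    """sim[i, j] = similarity of caption i to video j; ground truth for caption i is video i."""
    order = np.argsort(-sim, axis=1)  # per caption, video indices ranked best-first
    # rank of the ground-truth video for each caption (1 = best)
    ranks = np.array([int(np.where(order[i] == i)[0][0]) + 1 for i in range(sim.shape[0])])
    metrics = {f"R@{k}": float(np.mean(ranks <= k) * 100) for k in (1, 5, 10)}
    metrics["Sum R"] = sum(metrics[f"R@{k}"] for k in (1, 5, 10))
    metrics["mean r"] = float(ranks.mean())
    return metrics

# Toy 4x4 example
sim = np.array([
    [0.9, 0.1, 0.2, 0.1],  # correct video ranked 1st
    [0.8, 0.7, 0.1, 0.0],  # correct video ranked 2nd
    [0.1, 0.2, 0.9, 0.3],  # correct video ranked 1st
    [0.5, 0.4, 0.3, 0.2],  # correct video ranked 4th
])
print(retrieval_metrics(sim))
```

R@K is the percentage of captions whose correct video appears in the top K results, and Sum R adds R@1, R@5, and R@10.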

