The image features are extracted using the bottom-up-attention strategy, with each image represented as a dynamic number (k=[10,100]) of 2048-D features. The features for each image are stored in a .npz file. You can prepare the visual features yourself or download the pre-extracted features from OneDrive or BaiduYun. The download contains three files (train2014.tar.gz, val2014.tar.gz, and test2015.tar.gz), corresponding to the features of the train/val/test images of VQA-v2, respectively. Run the following commands to unzip the features:
$ mkdir data/vqa/bua-r101-max100
$ tar -xzvf train2014.tar.gz -C data/vqa/bua-r101-max100/
$ tar -xzvf val2014.tar.gz -C data/vqa/bua-r101-max100/
$ tar -xzvf test2015.tar.gz -C data/vqa/bua-r101-max100/

Then download the QA files for VQA-v2. In addition, we use the VQA samples from the Visual Genome (VG) dataset to expand the training set. Following existing strategies, we preprocess the VG samples with two rules:
- Select the QA pairs whose corresponding images appear in the MSCOCO train and val splits.
- Select the QA pairs whose answers appear in the processed answer list (i.e., answers that occur more than 8 times among all VQA-v2 answers).
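The two rules above can be sketched as a small filter. This is an illustrative sketch only: the field names (`coco_id`, `answer`) are assumptions, not the repo's actual VG schema.

```python
# Hypothetical sketch of the two VG filtering rules; the field names
# ("coco_id", "answer") are assumptions, not the repo's real schema.
from collections import Counter

def filter_vg_pairs(vg_pairs, coco_image_ids, vqa_answers, min_count=8):
    """Keep VG QA pairs whose image is in the MSCOCO train/val splits and
    whose answer occurs more than `min_count` times among VQA-v2 answers."""
    answer_counts = Counter(vqa_answers)
    kept = []
    for pair in vg_pairs:
        # Rule 1: the image must appear in the MSCOCO train/val splits.
        if pair["coco_id"] not in coco_image_ids:
            continue
        # Rule 2: the answer must occur more than min_count times in VQA-v2.
        if answer_counts[pair["answer"]] <= min_count:
            continue
        kept.append(pair)
    return kept
```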
For convenience, we provide our processed VG question and annotation files; you can download them from OneDrive or BaiduYun. Place all annotation files in the data/vqa/annotations folder.
Finally, the data folder will have the following structure:
|-- data
|-- vqa
|-- bua-r101-max100
| |-- train2014
| | |-- COCO_train2014_...jpg.npz
| | |-- ...
| |-- val2014
| | |-- COCO_val2014_...jpg.npz
| | |-- ...
| |-- test2015
| | |-- COCO_test2015_...jpg.npz
| | |-- ...
|-- annotations
| |-- v2_OpenEnded_mscoco_train2014_questions.json
| |-- v2_OpenEnded_mscoco_val2014_questions.json
| |-- v2_OpenEnded_mscoco_test2015_questions.json
| |-- v2_OpenEnded_mscoco_test-dev2015_questions.json
| |-- v2_mscoco_train2014_annotations.json
| |-- v2_mscoco_val2014_annotations.json
| |-- VG_questions.json
| |-- VG_annotations.json
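Each .npz file above holds the features for one image. A quick way to sanity-check a downloaded file is to list the arrays it contains; the key name inside the archive is not specified here, so the sketch below simply enumerates whatever keys are present.

```python
# Sketch: inspect one extracted feature .npz. The key name inside the
# archive is unknown here, so we just enumerate arr.files.
import numpy as np

def inspect_features(npz_path):
    """Return {key: shape} for every array stored in a feature .npz."""
    with np.load(npz_path) as arr:
        # For a VQA feature file, expect a (k, 2048) matrix with k in [10, 100].
        return {key: arr[key].shape for key in arr.files}
```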
The image features for the VGD task are also extracted using the bottom-up-attention strategy, with each image represented as a fixed number (k=100) of 2048-D features from a pretrained Faster R-CNN model. When training the Faster R-CNN model, we exclude the images that overlap with RefCOCO/RefCOCO+/RefCOCOg to avoid contaminating the visual grounding datasets. As with VQA, the features for each image are stored in a .npz file. We provide the extracted features on OneDrive. After downloading the zipped files, run the following commands to place the features in the right location:
$ cat vgd-bua-fix100.tar.gz* | tar xz
$ mv vgd-bua-fix100 data/vgd/bua-r101-fix100

The annotation files for RefCOCO, RefCOCO+, and RefCOCOg can be downloaded from their original repository here. We provide the following scripts to preprocess them into our desired format:
$ python tools/ref_process.py
$ python tools/ref_process_plus.py
$ python tools/ref_process_g.py

Finally, the data folder will have the following structure:
|-- data
|-- vgd
|-- bua-r101-fix100
|-- refcoco
| |-- train.json
| |-- val.json
| |-- testA.json
| |-- testB.json
|-- refcoco+
| |-- train.json
| |-- val.json
| |-- testA.json
| |-- testB.json
|-- refcocog
| |-- train.json
| |-- val.json
| |-- test.json
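The `cat vgd-bua-fix100.tar.gz* | tar xz` step above works because the download is a split archive: concatenating the parts in name order reproduces the original gzip'd tar. A minimal Python equivalent, for environments without `cat`/`tar`:

```python
# Python equivalent of `cat <prefix>* | tar xz`: concatenate the split
# archive parts in name order, then extract the joined gzip'd tar.
import glob
import io
import tarfile

def extract_split_archive(prefix, dest="."):
    """Join every `prefix`* part file in sorted order and extract it."""
    parts = sorted(glob.glob(prefix + "*"))
    blob = b"".join(open(p, "rb").read() for p in parts)
    with tarfile.open(fileobj=io.BytesIO(blob), mode="r:gz") as tar:
        tar.extractall(dest)
```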
Additionally, you need to build the extension in mmnas/utils as follows:
$ cd mmnas/utils
$ python3 setup.py build
$ cp build/lib.*/*.so .
$ cd ../..
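The `cp build/lib.*/*.so .` step moves the compiled extensions out of the setuptools build directory so they sit next to the Python sources. A sketch of the same copy in Python (the `build/lib.*` layout is standard setuptools behavior; the destination is assumed to be the current directory, as in the commands above):

```python
# Sketch of what `cp build/lib.*/*.so .` does: copy every compiled
# extension out of the setuptools build directory into `dest`.
import glob
import shutil

def copy_built_extensions(build_dir="build", dest="."):
    """Copy all *.so files from build/lib.*/ into `dest`; return the copies."""
    copied = []
    for so in glob.glob(f"{build_dir}/lib.*/*.so"):
        copied.append(shutil.copy(so, dest))
    return copied
```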
Following the strategy in SCAN, the image features for ITM are also extracted using the bottom-up-attention strategy, with each image represented as a fixed number (k=36) of 2048-D features. The features for each image are stored in a .npz file. We provide the extracted features on OneDrive. After downloading the zipped files, run the following commands to place the features in the right location:
$ cat itm-bua-fix36.tar.gz* | tar xz
$ mv itm-bua-fix36 data/itm/flickr_bua-r101-fix36

The annotation files of the Flickr30K dataset can be downloaded here and here to obtain the f30k_precomp folder and the dataset_flickr30k.json file, respectively.
Finally, the data folder will have the following structure:
|-- data
|-- itm
|-- flickr_bua-r101-fix36
|-- dataset_flickr30k.json
|-- f30k_precomp
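Because k is fixed at 36 for ITM (unlike the variable-length VQA features), per-image feature matrices can be stacked into a single batch tensor with no padding. A minimal sketch, assuming each matrix has already been loaded from its .npz file:

```python
# Because k is fixed at 36 for ITM, per-image (36, 2048) feature matrices
# stack into one (batch, 36, 2048) tensor without any padding.
import numpy as np

def batch_features(feature_arrays):
    """Stack equally-sized (36, 2048) feature matrices into one batch."""
    return np.stack(feature_arrays, axis=0)
```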