This is the official repository for the Immersive Question-directed Visual Attention (IQVA) dataset. It provides the first visual attention dataset that takes the correctness of attention into account, along with a framework for simultaneously predicting both correct and incorrect attention. An example illustrating the correctness of attention in the Immersive Question Answering context is shown below:
If you use our code or data, please cite our paper:
@InProceedings{IQVA,
author = {Jiang, Ming and Chen, Shi and Yang, Jinhui and Zhao, Qi},
title = {Fantastic Answers and Where to Find Them: Immersive Question-Directed Visual Attention},
booktitle = {The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
year = {2020}
}
We adopt the Skip-Thought implementation from this repository. Please refer to these links for further README information.
- Requirements for PyTorch. We use PyTorch 1.2.0 in our experiments.
- Requirements for TensorFlow. We only use TensorBoard for visualization.
- Python 3.6+
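One possible environment setup is sketched below. Only the PyTorch version is stated explicitly above; the TensorBoard package chosen here is an assumption, so defer to the linked requirement pages for the exact dependencies.

pip install torch==1.2.0 tensorboard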
All the questions, answers, YouTube video IDs, and ground-truth attention maps are available for download. For each trial (i.e., a single question), the saliency maps and raw fixation maps are stored in different folders. We provide both the maps aggregated across all participants and, where applicable, the maps for different groups (i.e., participants with correct and incorrect answers). Information about the questions and data splits is stored in question_info.json and split_info.json, respectively. In our experiments on correctness-aware attention prediction, we only use questions with human accuracy between 20% and 80%; these are highlighted with valid_correctness=1 in split_info.json. For the experiments on attention prediction regardless of correctness, we use all data.
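As a quick illustration, the snippet below filters the questions used for correctness-aware prediction. The exact schema of split_info.json is an assumption here (a mapping from question IDs to metadata dicts), so adapt the field access as needed.

```python
import json

# Minimal sketch: we assume split_info.json maps each question ID to a
# metadata dict containing the valid_correctness flag described above.
with open("./data_info/split_info.json") as f:
    split_info = json.load(f)

# Questions with human accuracy between 20% and 80%, i.e., valid_correctness = 1.
valid_questions = [qid for qid, meta in split_info.items()
                   if meta.get("valid_correctness") == 1]
print(f"{len(valid_questions)} questions usable for correctness-aware prediction")
```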
Note that we do not provide the raw YouTube videos, but instead include their video IDs in question_info.json. The saliency maps and fixation maps are named after the frame IDs of the corresponding videos, so it should be straightforward to retrieve the corresponding video inputs.
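A minimal sketch of how such a pairing could look, assuming the ground-truth maps are stored per trial under ./data and the extracted frames under $IMG_DIR/<video_id>/ with matching frame IDs (the exact folder layout and file extensions are assumptions):

```python
import glob
import os

IMG_DIR = os.environ.get("IMG_DIR", "./frames")  # directory with extracted JPG frames

def paired_samples(trial_dir, video_id):
    """Pair each ground-truth map of a trial with the video frame it was computed on."""
    pairs = []
    for sal_path in sorted(glob.glob(os.path.join(trial_dir, "saliency", "*.png"))):
        frame_id = os.path.splitext(os.path.basename(sal_path))[0]  # map name = frame ID
        img_path = os.path.join(IMG_DIR, video_id, frame_id + ".jpg")
        if os.path.exists(img_path):
            pairs.append((img_path, sal_path))
    return pairs
```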
- Download our IQVA dataset and unzip it to the root directory of this project.
- Download the corresponding videos and retrieve the video frames used in our dataset (stored as JPG images in $IMG_DIR); see the frame-extraction sketch after this list.
- Pre-process the questions to obtain a word dictionary:
python process_question.py --que_dir $QUESTION_FILE
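The frame-extraction step referenced in the list above could look like the sketch below. OpenCV is used here only as one convenient option, and the frame-naming scheme is an assumption; match it to the frame IDs used in the saliency/fixation map filenames.

```python
import os
import cv2

def extract_frames(video_path, video_id, img_dir):
    """Dump every frame of a video as <img_dir>/<video_id>/<frame_id>.jpg."""
    out_dir = os.path.join(img_dir, video_id)
    os.makedirs(out_dir, exist_ok=True)
    cap = cv2.VideoCapture(video_path)
    frame_id = 0
    while True:
        ret, frame = cap.read()
        if not ret:
            break
        cv2.imwrite(os.path.join(out_dir, "%d.jpg" % frame_id), frame)
        frame_id += 1
    cap.release()

# e.g., extract_frames("videos/VIDEO_ID.mp4", "VIDEO_ID", os.environ["IMG_DIR"])
```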
To train our model to simultaneously predict the visual attention for correct and incorrect answers:
python main_corr.py --mode train --img_dir $IMG_DIR --sal_dir ./data --que_file ./data_info/question_info.json --word2idx ./data_info/word2idx.json --checkpoint $CHECKPOINT_DIR --split_info ./data_info/split_info.json
To evaluate the performance on the test set, simply set --mode eval.
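For example (assuming the remaining arguments stay the same as in training):

python main_corr.py --mode eval --img_dir $IMG_DIR --sal_dir ./data --que_file ./data_info/question_info.json --word2idx ./data_info/word2idx.json --checkpoint $CHECKPOINT_DIR --split_info ./data_info/split_info.json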
The model is initialized with the weights previously trained for correctness-aware attention prediction, so please follow the instructions above to fully train that model first. After that, copy the weights to a new checkpoint directory:
cp $CHECKPOINT_SOURCE/model_best.pth $CHECKPOINT_TARGET/pretrained.pth
Then the training process can be started:
python main_agg.py --mode train --img_dir $IMG_DIR --sal_dir ./data --que_file ./data_info/question_info.json --word2idx ./data_info/word2idx.json --checkpoint $CHECKPOINT_DIR --split_info ./data_info/split_info.json
The evaluation process is the same as for correctness-aware attention prediction.
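For example (again assuming the same arguments as in training):

python main_agg.py --mode eval --img_dir $IMG_DIR --sal_dir ./data --que_file ./data_info/question_info.json --word2idx ./data_info/word2idx.json --checkpoint $CHECKPOINT_DIR --split_info ./data_info/split_info.json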