Please refer to the original README.md to set up the environment.
Download the foreground bounding-box annotations and the annotations for all visible objects, and place them at xxx.
The following commands are used in our study:
We generate ResNet-152 features with Caffe. To replicate our approach, first download the Caffe ResNet-152 weights and save them into the models directory. We use weights pretrained on ImageNet for EnvDrop, FAST and PREVALENT, and weights finetuned on the Places365 dataset for Recurrent-VLN-BERT.
We use the following commands to generate the precomputed image features for the environment ablations:
# Install caffe-gpu
conda install caffe-gpu
CUDA_VISIBLE_DEVICES=0 python scripts/generate_img_features.py --image_feature [IMAGE_FEATURE] --mode [MODE]
IMAGE_FEATURE:
- imagenet: ImageNet ResNet-152
- places365: Places365 ResNet-152

MODE:
- all_visible: mask all visible objects in the environment
- foreground: mask only foreground objects
- foreground_controlled_trial: the controlled trial for foreground masking
- flip: horizontally flip the image at each viewpoint
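To produce every ablation in one go, the command above can be repeated for each IMAGE_FEATURE and MODE pair. The sketch below is a minimal driver, assuming it is launched from the repository root and that the script accepts exactly the two flags shown above; build_commands and run_all are hypothetical helpers, not part of the repository.

```python
import itertools
import os
import subprocess

# Option values listed above for scripts/generate_img_features.py
IMAGE_FEATURES = ["imagenet", "places365"]
MODES = ["all_visible", "foreground", "foreground_controlled_trial", "flip"]


def build_commands(gpu="0"):
    """Return (env, argv) pairs, one per IMAGE_FEATURE x MODE combination."""
    # Copy the current environment and pin the job to a single GPU.
    env = dict(os.environ, CUDA_VISIBLE_DEVICES=gpu)
    commands = []
    for feature, mode in itertools.product(IMAGE_FEATURES, MODES):
        argv = ["python", "scripts/generate_img_features.py",
                "--image_feature", feature, "--mode", mode]
        commands.append((env, argv))
    return commands


def run_all(gpu="0"):
    """Hypothetical helper: run every combination sequentially, stopping on the first failure."""
    for env, argv in build_commands(gpu):
        print("Running:", " ".join(argv))
        subprocess.run(argv, env=env, check=True)
```

Calling run_all() executes the eight runs one after another on GPU 0; running them sequentially keeps a single Caffe process on the device at a time.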
Alternatively, skip the generation and run the following script to download and extract our generated tsv files into ~/Diagnose_VLN/r2r/data/img_features/. The compressed file is about 28.3 GB and contains the ImageNet and Places365 ResNet-152 features for all the ablation settings listed above.
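Once extracted, the tsv files can be loaded into a dictionary keyed by viewpoint. The reader below is a sketch that assumes the column layout and encoding used by the public R2R feature files (scanId, viewpointId, image_w, image_h, vfov, and a base64-encoded float32 features column holding 36 views of 2048-dimensional ResNet-152 features); the field names and read_img_features are assumptions, not taken from this repository.

```python
import base64
import csv

import numpy as np

# Assumed column layout, following the public R2R precomputed-feature tsv files.
TSV_FIELDNAMES = ["scanId", "viewpointId", "image_w", "image_h", "vfov", "features"]


def read_img_features(path, views=36, feature_size=2048):
    """Map 'scanId_viewpointId' to a float32 array of shape (views, feature_size)."""
    # The base64 feature column is far longer than csv's default field size limit.
    csv.field_size_limit(2**31 - 1)
    features = {}
    with open(path, newline="") as f:
        reader = csv.DictReader(f, delimiter="\t", fieldnames=TSV_FIELDNAMES)
        for item in reader:
            key = f"{item['scanId']}_{item['viewpointId']}"
            raw = base64.b64decode(item["features"])
            features[key] = np.frombuffer(raw, dtype=np.float32).reshape(views, feature_size)
    return features
```

For the CLIP ViT-B/32 files further below, the same reader would apply with a different feature_size, provided those files use the same layout.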
cd Diagnose_VLN/
python data_processing/download_data.py --download_image_features --image_fearture_dataset r2r
The CLIP-ViT features are used for CLIP-ViL-VLN and VLN-HAMT.
We use the following commands to generate the precomputed image features for the environment ablations:
CUDA_VISIBLE_DEVICES=0 python scripts/generate_clip_img_features.py --mode [MODE]
MODE: choose from all_visible, foreground, foreground_controlled_trial or flip (same meanings as above).
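The masking and flipping modes can be pictured on a single view image. The sketch below is only an illustration under stated assumptions, not the scripts' implementation: boxes are hypothetical (x_min, y_min, x_max, y_max) pixel coordinates with exclusive max bounds, masked pixels are filled with a constant (0 here), and the foreground controlled trial is not reproduced.

```python
import numpy as np


def mask_boxes(image, boxes, fill=0):
    """Return a copy of an H x W x C image with every box region set to `fill`.

    `boxes` holds hypothetical (x_min, y_min, x_max, y_max) pixel coordinates,
    with x_max and y_max exclusive.
    """
    masked = image.copy()
    for x_min, y_min, x_max, y_max in boxes:
        masked[y_min:y_max, x_min:x_max] = fill
    return masked


def flip_horizontal(image):
    """Mirror an H x W x C image left to right."""
    return image[:, ::-1].copy()
```

Under these assumptions, all_visible would pass the boxes of every annotated object to mask_boxes, foreground only the foreground boxes, and flip would apply flip_horizontal to each view before feature extraction.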
Alternatively, skip the generation and run the following script to download and extract our generated tsv files into ~/Diagnose_VLN/rxr/data/img_features/. The compressed file is about 2.4 GB and contains the CLIP ViT-B/32 features for all the ablation settings listed above.
cd Diagnose_VLN/
python data_processing/download_data.py --download_image_features --image_fearture_dataset rxr