Conversation
i think this is good to merge for now (assuming my comments are addressed 😄), and we can revisit if we feel the model needs some changes
might be good for @stephenroller to give a quick glance, more so to weigh in on directory structure (i.e. where to put the agent files)
@@ -96,6 +96,7 @@ def __init__(
        image_encoder_num_layers=1,
        image_features_dim=2048,
        image_combination_mode='append',
        n_image_tokens=1,
if we don't end up seeing any gain from varying this I would say we can just keep it at 1 and reduce some unnecessary complexity later
@@ -0,0 +1,184 @@
#!/usr/bin/env python3
Hmm, I think you're onto something, especially since they both use the ContextWithImageEncoder. Not sure what the right move is here - perhaps combine them in one folder, multimodal/image_seq2seq and multimodal/image_polyencoder, though this is a bit opaque. I would be OK with leaving as is unless @stephenroller feels strongly.
'--image-combination-mode',
type=str,
default='prepend',
choices=['add', 'append', 'prepend'],
Once we have a solid grasp of which method is best, let's add a recommended='<method>' flag here (could just mark as TODO for now).
Added TODO for this
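To make the three choices concrete, here is a hedged sketch of how 'add', 'append', and 'prepend' could combine an encoded image token with a sequence of context encodings. This is plain Python rather than ParlAI's actual torch implementation; the function name, the list-of-floats representation, and the reading of 'add' as a broadcast sum are all illustrative assumptions.

```python
# Illustrative sketch only: the real encoder operates on torch tensors.
# Each "encoding" here is a list of floats; the context is a list of such
# vectors and the image encoding is a single vector.
def combine_encodings(context_enc, image_enc, mode):
    if mode == 'add':
        # one plausible reading of 'add': sum the image vector into
        # every context position
        return [[c + i for c, i in zip(tok, image_enc)] for tok in context_enc]
    elif mode == 'prepend':
        # image token comes before the context tokens
        return [image_enc] + context_enc
    elif mode == 'append':
        # image token comes after the context tokens
        return context_enc + [image_enc]
    raise ValueError(f'unknown mode: {mode}')

context = [[1.0, 2.0], [3.0, 4.0]]
image = [0.5, 0.5]
print(combine_encodings(context, image, 'prepend'))  # [[0.5, 0.5], [1.0, 2.0], [3.0, 4.0]]
print(combine_encodings(context, image, 'add'))      # [[1.5, 2.5], [3.5, 4.5]]
```

Note that 'add' preserves the sequence length while 'prepend'/'append' grow it by one token, which is why the choice interacts with masking and positional embeddings downstream.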
""" | ||
return ImagePolyencoderModule(self.opt, self.dict, self.NULL_IDX) | ||
|
||
def batchify_image_features(self, batch: Batch) -> Batch: |
Could you maybe elaborate a bit on how this differs from the batchify_image_features in ImageSeq2seq? If it's not too different, it might be good to include it in the TorchImageAgent.
Okay, I've added a note about this. Yes, I agree that it'd be good to test out whether ImageSeq2seq really requires a separate batchify method now. I've created an issue for this: #2461
return batch

def _get_batch_size(self, batch) -> int:
Yeah, I have a fix for this in #2371, but fine for now.
""" | ||
Override to account for weights used for image features. | ||
""" | ||
for tensor in ['dummy_image_enc', 'ones_mask']: |
I believe it is.
self._process_image_features() will likely be useful for this.
"""
raise NotImplementedError(
    'Subclasses must implement method for batching images!'
is there not any shared functionality that could go here?
Perhaps - that's what I've created #2461 to solve. I can try to look into unifying the batchify_image_features() methods now if it's really important, but given that there may be image_seq2seq edge cases that we have to account for, my preference would be to make this a separate PR.
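For readers following along, the pattern under discussion looks roughly like this. Only the class name `TorchImageAgent`, the method name `batchify_image_features`, and the NotImplementedError message come from the snippets above; everything else is a minimal illustrative sketch, not ParlAI's real class hierarchy.

```python
# Sketch of a shared base class exposing an image-batching hook that each
# agent overrides; the real ParlAI agents carry far more state than this.
class TorchImageAgent:
    def batchify_image_features(self, batch):
        # shared fallback: force subclasses to decide how to batch images
        raise NotImplementedError(
            'Subclasses must implement method for batching images!'
        )

class ImagePolyencoderAgent(TorchImageAgent):
    def batchify_image_features(self, batch):
        # agent-specific handling of batch.image would go here
        return batch
```

The open question in the thread is whether the subclass bodies are similar enough that the shared base can hold a concrete default instead of raising.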
@testing_utils.retry(ntries=3)
@testing_utils.skipUnlessTorch
@testing_utils.skipUnlessGPU
def test_image_task(self):
❤️
There's a large number of surprising mypy errors. If you really think they are noise, then GA, but some of them look legit (e.g. one type is
Added 3 tests in
This seems good. We can nit forever, no reason to hold this up.
TOKEN_IMAGE = '__image__'
TOKEN_NO_IMAGE = '__no_image__'


-class ImageSeq2seqAgent(TransformerGeneratorAgent):
+class ImageSeq2seqAgent(TransformerGeneratorAgent, TorchImageAgent):
FYI, the ordering of the mixins determines the order of the super() calls.
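A small self-contained illustration of that point (class names hypothetical, standing in for TransformerGeneratorAgent and TorchImageAgent): with cooperative super() calls, Python's method resolution order visits bases left to right, so swapping the mixin order changes which class's code runs first.

```python
# Two mixin orderings over the same bases yield different MROs, and
# therefore different super() call chains.
class Base:
    def setup(self):
        return ['Base']

class A(Base):
    def setup(self):
        return ['A'] + super().setup()

class B(Base):
    def setup(self):
        return ['B'] + super().setup()

class AB(A, B):  # MRO: AB -> A -> B -> Base
    pass

class BA(B, A):  # MRO: BA -> B -> A -> Base
    pass

print(AB().setup())  # ['A', 'B', 'Base']
print(BA().setup())  # ['B', 'A', 'Base']
```

This is why listing TransformerGeneratorAgent before TorchImageAgent (or the reverse) is a meaningful choice, not just style.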
tests/test_transformers.py
self.assertGreater(
    valid['accuracy'],
    0.5,
    f'ImagePolyencoderAgent val-set accuracy on a trivally simple task was {valid["accuracy"].value():0.2f}.',
You can do assert valid['accuracy'] > 0.5 now, and it will print very cleanly, with local variables and such.
(this is new since we switched to pytest)
Yes - changed
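As a quick illustration of the suggestion (the dictionary and its value are hypothetical stand-ins for the real validation report): under pytest, a bare assert is rewritten so that a failure reports the compared values and surrounding locals, making the hand-built f-string message unnecessary.

```python
# With pytest's assertion rewriting, a failing bare assert would report
# something like "assert 0.42 > 0.5" along with the local variables,
# so no explicit failure message needs to be constructed.
def test_image_task_accuracy():
    valid = {'accuracy': 0.62}  # stand-in for the real validation metrics
    assert valid['accuracy'] > 0.5

test_image_task_accuracy()
```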
Yes - went through these one by one and fixed all of the ones that I don't think are noise.
Patch description
Enable training the polyencoder on datasets that have image features (in batch.image). These image features, once encoded, can be combined with the context encoding by addition, prepending, or appending. If image features are not included with any element of a batch, a tensor of zeros is used in their place.

Removing unused code:
- the 'none_with_pos_embs' reduction type from the Transformer code
- the --polyencoder-attention-keys flag from the polyencoder. This means that we don't have to pass the positional embedding around the polyencoder anymore.

Also, add two teachers used in the BlendedSkillTalk paper: one used to add WoW topics to ConvAI2 data and one used to add personas to WoW data. I use these when evaluating my polyencoder models trained on ImageChat. (I can move these into a separate PR if it's annoying to include them here :) )
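The zeros-placeholder behavior described in the patch description can be sketched as follows. This uses plain Python lists rather than torch tensors, and the function and field names are illustrative assumptions, not ParlAI's actual API.

```python
# Sketch: when an example carries no image features, substitute a zero
# vector so every row of the batch has the same shape.
def gather_image_features(examples, image_features_dim=4):
    zeros = [0.0] * image_features_dim
    return [ex.get('image_features') or zeros for ex in examples]

batch = gather_image_features([
    {'text': 'a dog on a beach', 'image_features': [0.1, 0.2, 0.3, 0.4]},
    {'text': 'no image for this example'},
])
print(batch)  # [[0.1, 0.2, 0.3, 0.4], [0.0, 0.0, 0.0, 0.0]]
```

Padding with zeros keeps batches rectangular while contributing nothing to the 'add' combination; the discussion of dummy_image_enc and ones_mask above suggests the real implementation also tracks a mask for these dummy rows.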
Testing steps
Example of a command to train the polyencoder on ImageChat and BlendedSkillTalk:
Example of a command to evaluate a polyencoder model on ImageChat and BlendedSkillTalk:
Hits@1,100 results: BST: 76.20%, ImageChat: 46.75%