This repository has been archived by the owner on Nov 3, 2023. It is now read-only.

Image features in polyencoder #2412

Merged 115 commits into master from polyencoder-with-image on Mar 11, 2020

Conversation

@EricMichaelSmith (Contributor) commented Feb 18, 2020

Patch description
Enable training the polyencoder on datasets that have image features (in batch.image). Once encoded, these image features can be combined with the context encoding by addition, prepending, or appending. If no element of a batch includes image features, a tensor of zeros is used in their place.
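For illustration, the three --image-combination-mode options behave roughly as follows. This is a minimal standalone sketch, not ParlAI's implementation: encodings are shown as plain lists of per-token vectors rather than tensors, and the image encoding is assumed to already be projected to the token dimensionality.

```python
# Illustrative sketch of the three image-combination modes (not ParlAI code).
def combine(context_enc, image_enc, mode):
    """Combine an encoded image with a context encoding.

    context_enc: list of T token vectors (each a list of floats)
    image_enc: a single image vector (list of floats), assumed already
               projected to the same dimensionality as the token vectors
    """
    if mode == 'add':
        # add the image vector element-wise to every context token vector
        return [[c + i for c, i in zip(tok, image_enc)] for tok in context_enc]
    elif mode == 'prepend':
        # treat the image as one extra token before the context
        return [list(image_enc)] + context_enc
    elif mode == 'append':
        # treat the image as one extra token after the context
        return context_enc + [list(image_enc)]
    raise ValueError(f'Unknown mode: {mode}')
```

Note that 'prepend' and 'append' lengthen the sequence by one position, while 'add' leaves the sequence length unchanged.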

Removing unused code:

  • The 'none_with_pos_embs' reduction type from the Transformer code
  • The --polyencoder-attention-keys flag from the polyencoder. This means that we don't have to pass the positional embedding around the polyencoder anymore

Also, add two teachers used in the BlendedSkillTalk paper: one used to add WoW topics to ConvAI2 data and one used to add personas to WoW data. I use these when evaluating my polyencoder models trained on ImageChat. (I can move these into a separate PR if it's annoying to include them here :) )

Testing steps
Example of a command to train the polyencoder on ImageChat and BlendedSkillTalk:

python -u examples/train_model.py -veps 0.5 -vme 8000 --activation gelu --attention-dropout 0.3 --batchsize 16 --eval-batchsize 16 --candidates batch --data-parallel True --dict-endtoken __start__ --dict-file ./data/models/pretrained_transformers/model_bi.dict --dict-lower True --dict-tokenizer bpe --dropout 0.05 --embedding-size 768 --embeddings-scale False --ffn-size 3072 --fp16 True --history-size 20 --init-model /checkpoint/marywilliamson/all_in_one_dialogue/scripts/multitask_ft_aio_20191205/001/63a_jobid=15/model --label-truncate 72 --learn-embeddings True --learn-positional-embeddings True --log_every_n_secs 20 -lr 3e-06 --lr-scheduler reduceonplateau --lr-scheduler-decay 0.3 --lr-scheduler-patience 1 --max_train_time 200000 --model transformer/image_polyencoder --multitask-weights 1,0.33 --n-heads 12 --n-layers 12 --n-positions 1024 --n-segments 2 --num-epochs 100.0 --optimizer adamax --output-scaling 0.04 --poly-attention-type basic --poly-n-codes 300 --polyencoder-type n_first --reduction-type mean --relu-dropout 0.1 --save-after-valid True --share-encoders False --text-truncate 360 --validation-metric accuracy --validation-metric-mode max --variant xlm --warmup_updates 100 -t internal:blended_skill_talk,internal:comment_battle:imageDialog --image-mode-internal uru --swap-in-ccby-images True --image-encoder-num-layers 1 --image-features-dim 2048 --image-combination-mode add --model-file /checkpoint/ems/all_in_one_dialogue/scripts/s2020_02_12__imagechat_with_polyencoder/02_multitask/000/5e1cd_jobid=1/model

Example of a command to evaluate a polyencoder model on ImageChat and BlendedSkillTalk:

python examples/eval_model.py --activation gelu --attention-dropout 0.1 --batchsize 16 --candidates inline --data-parallel True --dict-endtoken __start__ --dict-file ./data/models/pretrained_transformers/model_bi.dict --dict-lower True --dict-tokenizer bpe --dropout 0.15 --embedding-size 768 --embeddings-scale False --ffn-size 3072 --fp16 True --history-size 20 --init-model zoo:pretrained_transformers/poly_model_huge_reddit/model --label-truncate 72 --learn-embeddings True --learn-positional-embeddings True --log_every_n_secs 20 -lr 1e-05 --lr-scheduler reduceonplateau --lr-scheduler-decay 0.2 --lr-scheduler-patience 1 --model transformer/image_polyencoder --multitask-weights 1,2,1,3,100 --n-heads 12 --n-layers 12 --n-positions 1024 --n-segments 2 --optimizer adamax --output-scaling 0.045 --poly-attention-type basic --poly-n-codes 300 --polyencoder-type n_first --reduction-type max --relu-dropout 0.0 --share-encoders False --text-truncate 360 --variant xlm --warmup_updates 100 -t internal:blended_skill_talk,internal:comment_battle:imageDialog --image-mode-internal uru --swap-in-ccby-images True --image-encoder-num-layers 1 --image-features-dim 2048 --image-combination-mode prepend --model-file /checkpoint/ems/all_in_one_dialogue/scripts/s2020_02_12__imagechat_with_polyencoder/02_multitask/003/9df_jobid=22/model

Hits@1,100 results: BST: 76.20%, ImageChat: 46.75%

@klshuster (Contributor) left a comment:

i think this is good to merge for now (assuming my comments are addressed 😄), and we can revisit if we feel the model needs some changes

might be good for @stephenroller to give a quick glance, more so to weigh in on directory structure (i.e. where to put the agent files)

@@ -96,6 +96,7 @@ def __init__(
image_encoder_num_layers=1,
image_features_dim=2048,
image_combination_mode='append',
n_image_tokens=1,
Contributor:

if we don't end up seeing any gain from varying this I would say we can just keep it at 1 and reduce some unnecessary complexity later

parlai/agents/image_seq2seq/modules.py (outdated; resolved)
@@ -0,0 +1,184 @@
#!/usr/bin/env python3
Contributor:

hmm i think you're onto something, especially since they both use the ContextWithImageEncoder.

Not sure what the right move is here - perhaps combine in one folder multimodal/image_seq2seq and multimodal/image_polyencoder, though this is a bit opaque.

I would be ok with leaving as is unless @stephenroller feels strongly

parlai/agents/transformer/image_polyencoder.py (outdated; resolved)
'--image-combination-mode',
type=str,
default='prepend',
choices=['add', 'append', 'prepend'],
Contributor:

once we have a solid grasp of which method is best, let's add a recommended='<method>' flag here (could just mark as TODO for now)

Contributor (Author):

Added TODO for this

"""
return ImagePolyencoderModule(self.opt, self.dict, self.NULL_IDX)

def batchify_image_features(self, batch: Batch) -> Batch:
Contributor:

could you maybe elaborate a bit about how this differs from the batchify_image_features in ImageSeq2seq? if it's not too much different, it might be good to include in the TorchImageAgent

Contributor (Author):

Okay, I've added a note about this. Yes, I agree that it'd be good to test out whether ImageSeq2seq really requires a separate batchify method now. I've created an issue for this: #2461
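As a rough illustration of the zero-fill behavior under discussion, a hypothetical standalone sketch (not the PR's actual batchify_image_features; the 2048 default mirrors the --image-features-dim flag, and plain lists stand in for tensors):

```python
# Hypothetical sketch: substitute zero feature vectors for examples
# without images so the batch has a uniform shape for the encoder.
def batchify_image_features(images, batch_size, features_dim=2048):
    """images: list of per-example feature vectors; entries may be None."""
    zero = [0.0] * features_dim
    if all(img is None for img in images):
        # no image anywhere in the batch: use zeros for every slot
        return [list(zero) for _ in range(batch_size)]
    # otherwise zero-fill only the missing slots
    return [list(zero) if img is None else img for img in images]
```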


return batch

def _get_batch_size(self, batch) -> int:
Contributor:

yeah i have a fix for this in #2371 but fine for now

"""
Override to account for weights used for image features.
"""
for tensor in ['dummy_image_enc', 'ones_mask']:
Contributor:

i believe it is

self._process_image_features() will likely be useful for this.
"""
raise NotImplementedError(
'Subclasses must implement method for batching images!'
Contributor:

is there not any shared functionality that could go here?

Contributor (Author):

Perhaps - that's what I've created #2461 to solve. I can try to look into unifying the batchify_image_features() methods now if it's really important, but given that there may be image_seq2seq edge cases that we have to account for, my preference would be to make this a separate PR

@testing_utils.retry(ntries=3)
@testing_utils.skipUnlessTorch
@testing_utils.skipUnlessGPU
def test_image_task(self):
Contributor:

❤️

@stephenroller (Contributor):

There's a large number of surprising mypy errors. If you really think they are noise, then GA, but some of them look legit (e.g. one type is tensor instead of Tensor). They are enough that they could easily be catching a bug.

@EricMichaelSmith (Author):

> (Also I wouldn't mind seeing this thing have some tests!)

Added 3 tests in test_transformers.py analogous to those in test_image_seq2seq.py

@stephenroller (Contributor) left a comment:

This seems good. We can nit forever, no reason to hold this up.



TOKEN_IMAGE = '__image__'
TOKEN_NO_IMAGE = '__no_image__'


class ImageSeq2seqAgent(TransformerGeneratorAgent):
class ImageSeq2seqAgent(TransformerGeneratorAgent, TorchImageAgent):
Contributor:

fyi, the ordering of the mixin determines the order of supers().
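A toy example of why that ordering matters (illustrative class names, unrelated to ParlAI's agents): Python's C3 linearization walks the bases left to right, so swapping the mixins swaps the order in which cooperative super() calls run.

```python
# Toy demonstration of how base-class order determines the MRO
# and therefore the order of cooperative super() calls.
class Base:
    def setup(self):
        return ['Base']

class GeneratorMixin(Base):
    def setup(self):
        return ['Generator'] + super().setup()

class ImageMixin(Base):
    def setup(self):
        return ['Image'] + super().setup()

class AgentA(GeneratorMixin, ImageMixin):  # Generator's setup runs first
    pass

class AgentB(ImageMixin, GeneratorMixin):  # Image's setup runs first
    pass
```

AgentA().setup() yields ['Generator', 'Image', 'Base'], while AgentB().setup() yields ['Image', 'Generator', 'Base'].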

self.assertGreater(
valid['accuracy'],
0.5,
f'ImagePolyencoderAgent val-set accuracy on a trivially simple task was {valid["accuracy"].value():0.2f}.',
Contributor:

You can do assert valid['accuracy'] > 0.5 now, and it will print very cleanly with like, local variables and such.
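The suggested pytest style can be sketched like this (a hypothetical test; the dict stands in for the validation report object):

```python
# Hypothetical illustration of the suggested pytest style.
def test_image_polyencoder_accuracy():
    valid = {'accuracy': 0.73}  # stand-in for the validation report
    # unittest style, with a hand-written failure message:
    #   self.assertGreater(valid['accuracy'], 0.5, '...long message...')
    # pytest style: a bare assert; pytest's assertion rewriting prints
    # the operand values automatically when it fails
    assert valid['accuracy'] > 0.5
```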

Contributor:

(this is new since we switched to pytest)

Contributor (Author):

Yes - changed

@EricMichaelSmith (Author):

> There's a large number of surprising mypy errors. If you really think they are noise, then GA, but some of them look legit (e.g. one type is tensor instead of Tensor). They are enough that they could easily be catching a bug.

Yes - went through these one by one and fixed all of the ones that I don't think are noise

@EricMichaelSmith EricMichaelSmith merged commit af10ca5 into master Mar 11, 2020
@EricMichaelSmith EricMichaelSmith deleted the polyencoder-with-image branch March 11, 2020 19:03
5 participants