Conversation
i think this is good to merge for now (assuming my comments are addressed 😄), and we can revisit if we feel the model needs some changes
might be good for @stephenroller to give a quick glance, more so to weigh in on directory structure (i.e. where to put the agent files)
@@ -96,6 +96,7 @@ def __init__(
        image_encoder_num_layers=1,
        image_features_dim=2048,
        image_combination_mode='append',
        n_image_tokens=1,
if we don't end up seeing any gain from varying this I would say we can just keep it at 1 and reduce some unnecessary complexity later
@@ -0,0 +1,184 @@
#!/usr/bin/env python3
Hmm, I think you're onto something, especially since they both use the ContextWithImageEncoder. Not sure what the right move is here - perhaps combine them in one folder, multimodal/image_seq2seq and multimodal/image_polyencoder, though this is a bit opaque. I would be OK with leaving as is unless @stephenroller feels strongly.
'--image-combination-mode',
type=str,
default='prepend',
choices=['add', 'append', 'prepend'],
Once we have a solid grasp of which method is best, let's add a recommended='<method>' flag here (could just mark as TODO for now).
Added TODO for this
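To make the three choices concrete, here is a hedged sketch of how 'add', 'append', and 'prepend' could combine an encoded image token with a sequence of context encodings. This is plain Python rather than ParlAI's actual torch implementation; the function name, the list-of-floats representation, and the reading of 'add' as a broadcast sum are all illustrative assumptions.

```python
# Illustrative sketch only: the real encoder operates on torch tensors.
# Each "encoding" here is a list of floats; the context is a list of such
# vectors and the image encoding is a single vector.
def combine_encodings(context_enc, image_enc, mode):
    if mode == 'add':
        # one plausible reading of 'add': sum the image vector into
        # every context position
        return [[c + i for c, i in zip(tok, image_enc)] for tok in context_enc]
    elif mode == 'prepend':
        # image token comes before the context tokens
        return [image_enc] + context_enc
    elif mode == 'append':
        # image token comes after the context tokens
        return context_enc + [image_enc]
    raise ValueError(f'unknown mode: {mode}')

context = [[1.0, 2.0], [3.0, 4.0]]
image = [0.5, 0.5]
print(combine_encodings(context, image, 'prepend'))  # [[0.5, 0.5], [1.0, 2.0], [3.0, 4.0]]
print(combine_encodings(context, image, 'add'))      # [[1.5, 2.5], [3.5, 4.5]]
```

Note that 'add' preserves the sequence length while 'prepend'/'append' grow it by one token, which is why the choice interacts with masking and positional embeddings downstream.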
""" | ||
return ImagePolyencoderModule(self.opt, self.dict, self.NULL_IDX) | ||
|
||
def batchify_image_features(self, batch: Batch) -> Batch: |
Could you maybe elaborate a bit on how this differs from the batchify_image_features in ImageSeq2seq? If it's not too different, it might be good to include it in the TorchImageAgent.
Okay, I've added a note about this. Yes, I agree that it'd be good to test out whether ImageSeq2seq really requires a separate batchify method now. I've created an issue for this: #2461
return batch

def _get_batch_size(self, batch) -> int:
Yeah, I have a fix for this in #2371, but fine for now.
""" | ||
Override to account for weights used for image features. | ||
""" | ||
for tensor in ['dummy_image_enc', 'ones_mask']: |
I believe it is.
self._process_image_features() will likely be useful for this.
"""
raise NotImplementedError(
    'Subclasses must implement method for batching images!'
is there not any shared functionality that could go here?
Perhaps - that's what I've created #2461 to solve. I can try to look into unifying the batchify_image_features() methods now if it's really important, but given that there may be image_seq2seq edge cases that we have to account for, my preference would be to make this a separate PR.
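For readers following along, the pattern under discussion looks roughly like this. Only the class name `TorchImageAgent`, the method name `batchify_image_features`, and the NotImplementedError message come from the snippets above; everything else is a minimal illustrative sketch, not ParlAI's real class hierarchy.

```python
# Sketch of a shared base class exposing an image-batching hook that each
# agent overrides; the real ParlAI agents carry far more state than this.
class TorchImageAgent:
    def batchify_image_features(self, batch):
        # shared fallback: force subclasses to decide how to batch images
        raise NotImplementedError(
            'Subclasses must implement method for batching images!'
        )

class ImagePolyencoderAgent(TorchImageAgent):
    def batchify_image_features(self, batch):
        # agent-specific handling of batch.image would go here
        return batch
```

The open question in the thread is whether the subclass bodies are similar enough that the shared base can hold a concrete default instead of raising.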
@testing_utils.retry(ntries=3)
@testing_utils.skipUnlessTorch
@testing_utils.skipUnlessGPU
def test_image_task(self):
❤️
There's a large number of surprising mypy errors. If you really think they are noise, then GA, but some of them look legit (e.g. one type is
Added 3 tests in
This seems good. We can nit forever, no reason to hold this up.
TOKEN_IMAGE = '__image__'
TOKEN_NO_IMAGE = '__no_image__'


-class ImageSeq2seqAgent(TransformerGeneratorAgent):
+class ImageSeq2seqAgent(TransformerGeneratorAgent, TorchImageAgent):
FYI, the ordering of the mixins determines the order of the super() calls.
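A small self-contained illustration of that point (class names hypothetical, standing in for TransformerGeneratorAgent and TorchImageAgent): with cooperative super() calls, Python's method resolution order visits bases left to right, so swapping the mixin order changes which class's code runs first.

```python
# Two mixin orderings over the same bases yield different MROs, and
# therefore different super() call chains.
class Base:
    def setup(self):
        return ['Base']

class A(Base):
    def setup(self):
        return ['A'] + super().setup()

class B(Base):
    def setup(self):
        return ['B'] + super().setup()

class AB(A, B):  # MRO: AB -> A -> B -> Base
    pass

class BA(B, A):  # MRO: BA -> B -> A -> Base
    pass

print(AB().setup())  # ['A', 'B', 'Base']
print(BA().setup())  # ['B', 'A', 'Base']
```

This is why listing TransformerGeneratorAgent before TorchImageAgent (or the reverse) is a meaningful choice, not just style.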
tests/test_transformers.py
self.assertGreater(
    valid['accuracy'],
    0.5,
    f'ImagePolyencoderAgent val-set accuracy on a trivally simple task was {valid["accuracy"].value():0.2f}.',
You can do assert valid['accuracy'] > 0.5 now, and it will print very cleanly, with local variables and such.
(this is new since we switched to pytest)
Yes - changed
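As a quick illustration of the suggestion (the dictionary and its value are hypothetical stand-ins for the real validation report): under pytest, a bare assert is rewritten so that a failure reports the compared values and surrounding locals, making the hand-built f-string message unnecessary.

```python
# With pytest's assertion rewriting, a failing bare assert would report
# something like "assert 0.42 > 0.5" along with the local variables,
# so no explicit failure message needs to be constructed.
def test_image_task_accuracy():
    valid = {'accuracy': 0.62}  # stand-in for the real validation metrics
    assert valid['accuracy'] > 0.5

test_image_task_accuracy()
```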
Yes - went through these one by one and fixed all of the ones that I don't think are noise.
Patch description
Enable training the polyencoder on datasets that have image features (in batch.image). These image features, once encoded, can be combined with the context encoding by addition, prepending, or appending. If image features are not included with any element of a batch, a tensor of zeros is used in their place.

Removing unused code:
- the 'none_with_pos_embs' reduction type from the Transformer code
- the --polyencoder-attention-keys flag from the polyencoder. This means that we don't have to pass the positional embedding around the polyencoder anymore.

Also, add two teachers used in the BlendedSkillTalk paper: one used to add WoW topics to ConvAI2 data and one used to add personas to WoW data. I use these when evaluating my polyencoder models trained on ImageChat. (I can move these into a separate PR if it's annoying to include them here :) )
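The zeros-placeholder behavior described in the patch description can be sketched as follows. This uses plain Python lists rather than torch tensors, and the function and field names are illustrative assumptions, not ParlAI's actual API.

```python
# Sketch: when an example carries no image features, substitute a zero
# vector so every row of the batch has the same shape.
def gather_image_features(examples, image_features_dim=4):
    zeros = [0.0] * image_features_dim
    return [ex.get('image_features') or zeros for ex in examples]

batch = gather_image_features([
    {'text': 'a dog on a beach', 'image_features': [0.1, 0.2, 0.3, 0.4]},
    {'text': 'no image for this example'},
])
print(batch)  # [[0.1, 0.2, 0.3, 0.4], [0.0, 0.0, 0.0, 0.0]]
```

Padding with zeros keeps batches rectangular while contributing nothing to the 'add' combination; the discussion of dummy_image_enc and ones_mask above suggests the real implementation also tracks a mask for these dummy rows.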
Testing steps
Example of a command to train the polyencoder on ImageChat and BlendedSkillTalk:
Example of a command to evaluate a polyencoder model on ImageChat and BlendedSkillTalk:
Hits@1,100 results: BST: 76.20%, ImageChat: 46.75%