This repository has been archived by the owner on Nov 3, 2023. It is now read-only.

Image features in polyencoder #2412

Merged
merged 115 commits into from
Mar 11, 2020
Changes from 3 commits
Commits
86b0ea3
Start to pass through image features
EricMichaelSmith Feb 12, 2020
109733c
Work on image features in polyencoder
EricMichaelSmith Feb 12, 2020
e4929bf
Work on revamping
EricMichaelSmith Feb 12, 2020
744f4ca
Simplify image encoding
EricMichaelSmith Feb 12, 2020
5584ac9
Tweax
EricMichaelSmith Feb 12, 2020
bf36218
Prepend mode
EricMichaelSmith Feb 12, 2020
fb9f38e
Add mode
EricMichaelSmith Feb 12, 2020
744a33a
Flag fix
Feb 12, 2020
85cc617
Minor fixes
Feb 12, 2020
0043ea4
Setup for merged encoder
EricMichaelSmith Feb 14, 2020
f3430d1
Merge branch 'polyencoder-with-image' of github.com:facebookresearch/…
EricMichaelSmith Feb 14, 2020
3990ebc
Getting new encoder to match old
EricMichaelSmith Feb 14, 2020
ba9188d
Remove unneeded things
EricMichaelSmith Feb 14, 2020
09a8b4c
Revert a bit
EricMichaelSmith Feb 14, 2020
92b73c9
Multiple trivial reduction types
EricMichaelSmith Feb 14, 2020
1dd717d
Fix pos
EricMichaelSmith Feb 14, 2020
e88c411
Add postpend mode
EricMichaelSmith Feb 14, 2020
bf32454
Change param name back
EricMichaelSmith Feb 14, 2020
c49b7cb
Rip out old code
EricMichaelSmith Feb 14, 2020
8627c72
New cat()
EricMichaelSmith Feb 14, 2020
29d390a
Smart add
EricMichaelSmith Feb 14, 2020
e7f4108
Fix forward()
EricMichaelSmith Feb 14, 2020
1896dd3
Comment
EricMichaelSmith Feb 14, 2020
54af92f
Minor
EricMichaelSmith Feb 14, 2020
5990018
Minor
EricMichaelSmith Feb 14, 2020
8d4fdfc
Finally, merge in new ContextWithImage module
EricMichaelSmith Feb 14, 2020
172ec95
Import bugs
Feb 14, 2020
431104d
Fixes
Feb 14, 2020
225f90d
Breakpoint
Feb 14, 2020
3ae3b08
Minor, but maybe will help test to pass
Feb 14, 2020
e7f1283
Remove breakpoint
Feb 14, 2020
fd5fd8d
Dealing with new params
Feb 15, 2020
dad3b36
DataParallel fix
Feb 15, 2020
1cc2a01
Register untrained params as buffers
Feb 15, 2020
b17cc0f
Add persona+WoW teachers
Feb 17, 2020
a00f01e
Ugh handle no image
Feb 17, 2020
730ad07
Fill in Nones
Feb 17, 2020
2a01122
Cleanup and enforce always having image features
Feb 18, 2020
929b20a
Let's be tough on tensors
Feb 18, 2020
b82678a
Always use dummy features
Feb 18, 2020
00f42d2
Typo
Feb 18, 2020
2d3bf27
Delete additional teachers for now
EricMichaelSmith Feb 19, 2020
6c146d6
postpend -> append
EricMichaelSmith Feb 19, 2020
f0c1e1f
Private methods
EricMichaelSmith Feb 19, 2020
f5d489c
Simplify
EricMichaelSmith Feb 19, 2020
431793f
Dump ImagePolyencoderAgent
EricMichaelSmith Feb 19, 2020
b954cdd
Split functionality
EricMichaelSmith Feb 19, 2020
a7748b4
Remove duplicated methods
EricMichaelSmith Feb 19, 2020
d0bd597
Split remaining Agent code
EricMichaelSmith Feb 19, 2020
bec4d63
Move Agent
EricMichaelSmith Feb 19, 2020
e6a5e11
Dump new model
EricMichaelSmith Feb 19, 2020
54f6377
Refer to new model
EricMichaelSmith Feb 19, 2020
0373ad5
Don't duplicate code
EricMichaelSmith Feb 19, 2020
b2cbefe
Start to separate model
EricMichaelSmith Feb 19, 2020
683ac81
Start to split .encode()
EricMichaelSmith Feb 19, 2020
d1b0bbc
More DRY
EricMichaelSmith Feb 19, 2020
355fcf6
End of splitting
EricMichaelSmith Feb 19, 2020
33fc1fb
Remove flag
EricMichaelSmith Feb 19, 2020
51f96a1
Tweak defaults
EricMichaelSmith Feb 19, 2020
bbd6460
Yes - added
EricMichaelSmith Feb 19, 2020
ca03db3
warn_once
EricMichaelSmith Feb 19, 2020
ddc6f02
Remove unneeded giant function
EricMichaelSmith Feb 19, 2020
1b266a7
Cosmetic
EricMichaelSmith Feb 19, 2020
4d2f125
Avoid a lot of repeated code
EricMichaelSmith Feb 19, 2020
5eb08fe
Revert old code
EricMichaelSmith Feb 19, 2020
e3b1f7a
Name change
EricMichaelSmith Feb 21, 2020
b2b8f72
Update docstring
EricMichaelSmith Feb 21, 2020
56608ab
Split into separate module
EricMichaelSmith Feb 21, 2020
9c35126
Fix kwarg issue
EricMichaelSmith Feb 21, 2020
8c8c799
Change a variable named "dict"
Feb 21, 2020
6ee97b3
Reduction type fix
Feb 21, 2020
3f9272f
No, have to use kwargs
Feb 21, 2020
b85b06f
Fix getattr() bug
Feb 23, 2020
8cf287b
Same trick for the context encoder
Feb 23, 2020
9e00af8
Use right encoder types
Feb 23, 2020
b98194c
Remove unneeded method
Feb 23, 2020
2d19a3a
Autoformat
Feb 24, 2020
5a95902
Multi-layer image encoder fix
Feb 24, 2020
139c6cf
TorchImageAgent
EricMichaelSmith Feb 24, 2020
32a1b26
Merge branch 'master' into polyencoder-with-image
Feb 24, 2020
bdf02d1
Fix adding args
Feb 24, 2020
de1b6b1
CPU fix
Feb 25, 2020
f9acfc9
Allow multiple image tokens
Feb 28, 2020
aed9983
Formatting
Feb 28, 2020
7cc5822
Name consistency
Feb 29, 2020
f329f56
Partial fix for token-dim issue
Feb 29, 2020
e65c426
Rest of fp16 function
Mar 1, 2020
b0931e3
Dump previous test
EricMichaelSmith Mar 3, 2020
8442ce4
Revamp basic task
EricMichaelSmith Mar 3, 2020
fa301e9
Set up image polyencoder test
Mar 4, 2020
da37eea
Change output message
Mar 4, 2020
2b1897a
Minor test tweaks
Mar 4, 2020
76ac292
Bad mode
Mar 4, 2020
4028578
Token fix
Mar 6, 2020
1ad5826
Fix
Mar 6, 2020
a0a70e5
Deal with no image features
Mar 7, 2020
61a9be6
Merge branch 'master' into polyencoder-with-image
EricMichaelSmith Mar 10, 2020
003f302
Segments fix
Mar 10, 2020
75c2b96
Fix test
Mar 10, 2020
81ceabf
autoformat
Mar 10, 2020
2213d42
Add commit string
EricMichaelSmith Mar 10, 2020
f5f58d8
Silently delete param
EricMichaelSmith Mar 10, 2020
adc95a1
Minor
EricMichaelSmith Mar 10, 2020
aa0bf20
mypy
EricMichaelSmith Mar 10, 2020
b08b979
Overhaul task
Mar 11, 2020
c1f549b
More tasks
Mar 11, 2020
eb54800
fp16 in new function
Mar 11, 2020
ff7dd95
Minor
Mar 11, 2020
0f6f4fc
Cosmetic
Mar 11, 2020
9737599
More description
Mar 11, 2020
9e5049b
[long] Assert
Mar 11, 2020
ed1d59c
mypy tweaks
Mar 11, 2020
7b4fd65
Make tests faster
Mar 11, 2020
6d0b994
Increase time image test has
Mar 11, 2020
b053eb2
Make test easier
Mar 11, 2020
40 changes: 25 additions & 15 deletions parlai/agents/image_seq2seq/modules.py
@@ -96,6 +96,7 @@ def __init__(
         image_encoder_num_layers=1,
         image_features_dim=2048,
         image_combination_mode='append',
+        n_image_tokens=1,
Contributor

What do you use in practice? This looks like it could cause the multiple-of-8 issue I was mentioning.

Contributor Author

EricMichaelSmith, Feb 29, 2020

So, my time steps definitely aren't multiples of 8. Given what you said, I'm sure that's causing a major slowdown. I first have to address some issues with the training itself not going as expected (image features aren't being given weight when ImageChat personalities are also present), but then I'll take a look at this :)

Contributor

If we don't end up seeing any gain from varying this, I would say we can just keep it at 1 and remove the unnecessary complexity later.

Contributor Author

Hmm - so, preliminary sweeps show that it might be giving us a few-point boost, presumably due to the image features just taking up more space in the final encoding. I'd prefer to keep it in unless you all really hate it :)
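
For context on the multiple-of-8 concern in this thread: fp16 matrix multiplies on Tensor Core GPUs generally run fastest when tensor dimensions are multiples of 8, and prepending or appending `n_image_tokens` image tokens changes the sequence length. The following is only a sketch of that arithmetic. `pad_length_to_multiple` and `combined_length` are hypothetical helpers, not ParlAI functions, and the text length of 64 is an arbitrary example.

```python
def pad_length_to_multiple(seq_len: int, multiple: int = 8) -> int:
    """Return the smallest length >= seq_len that is a multiple of `multiple`."""
    if seq_len < 0 or multiple <= 0:
        raise ValueError('seq_len must be >= 0 and multiple must be > 0')
    # Ceiling division using integer arithmetic only
    return -(-seq_len // multiple) * multiple


def combined_length(n_text_tokens: int, n_image_tokens: int = 1) -> int:
    """Sequence length after prepending or appending the image tokens."""
    return n_text_tokens + n_image_tokens


# Hypothetical numbers: a 64-token context plus 1 or 8 image tokens
text_len = 64
for n_image_tokens in (1, 8):
    total = combined_length(text_len, n_image_tokens)
    padding = pad_length_to_multiple(total) - total
    print(f'n_image_tokens={n_image_tokens}: length {total}, padding needed {padding}')
```

With these example numbers, a single image token pushes an already aligned sequence off a multiple of 8, while 8 image tokens keep it aligned.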

     ):
         """
         Override TransformerEncoder __init__.
@@ -107,6 +108,7 @@ def __init__(
         self.n_img_layers = image_encoder_num_layers
         self.img_dim = image_features_dim
         self.image_combination_mode = image_combination_mode
+        self.n_image_tokens = n_image_tokens
         reduction_type = None  # Must pass back unreduced encoding and mask
         super().__init__(
             n_heads=n_heads,
@@ -128,17 +130,22 @@ def __init__(
             n_segments=n_segments,
             output_scaling=output_scaling,
         )
+        self.full_embedding_size = self.embedding_size * self.n_image_tokens
+        # Images will be embedded to this size, and then the embedding will be folded
+        # into however many tokens are needed
         self._build_image_encoder()
-        self.register_buffer('dummy_image_enc', torch.zeros((self.embedding_size,)))
-        self.register_buffer('ones_mask', torch.ones(1).bool())
+        self.register_buffer(
+            'dummy_image_enc', torch.zeros((self.full_embedding_size,))
+        )
+        self.register_buffer('ones_mask', torch.ones(self.n_image_tokens).bool())

     def _build_image_encoder(self):
-        image_layers = [nn.Linear(self.img_dim, self.embedding_size)]
+        image_layers = [nn.Linear(self.img_dim, self.full_embedding_size)]
         for _ in range(self.n_img_layers - 1):
             image_layers += [
                 nn.ReLU(),
                 nn.Dropout(p=self.dropout_frac),
-                nn.Linear(self.embedding_size, self.embedding_size),
+                nn.Linear(self.full_embedding_size, self.full_embedding_size),
             ]
         self.image_encoder = nn.Sequential(*image_layers)

@@ -158,9 +165,9 @@ def encode_images(
         :return:
             a (image_encoded, image_mask) tuple, where:

-            - image_enc is a torch.Tensor of dim N x self.img_dim,
-              representing the encoded batch of images
-            - image_mask is a torch.Tensor of dim N x 1
+            - image_encoded is a torch.Tensor of dim N x self.n_image_tokens x
+              self.embedding_size, representing the encoded batch of images
+            - image_mask is a torch.Tensor of dim N x self.n_image_tokens
         """
         image_masks = image_encoded = None
         valid_inds = [
@@ -170,24 +177,27 @@ def encode_images(
         ]

         if valid_inds:
-            image_masks = []
-            image_encoded = []
+            image_mask_list = []
+            image_encoded_list = []

             valid_imgs = torch.stack([images[i] for i in valid_inds])
             valid_img_enc = self.image_encoder(valid_imgs)

             img_num = 0
             for i in range(len(images)):
                 if i in valid_inds:
-                    image_masks.append(self.ones_mask)
-                    image_encoded.append(valid_img_enc[img_num, :])
+                    image_mask_list.append(self.ones_mask)
+                    image_encoded_list.append(valid_img_enc[img_num, :])
                     img_num += 1
                 else:
-                    image_masks.append(~self.ones_mask)
-                    image_encoded.append(self.dummy_image_enc)
+                    image_mask_list.append(~self.ones_mask)
+                    image_encoded_list.append(self.dummy_image_enc)

-            image_masks = torch.stack(image_masks)
-            image_encoded = torch.stack(image_encoded).unsqueeze(1)
+            image_masks = torch.stack(image_mask_list)
+            image_encoded = torch.stack(image_encoded_list).reshape(
+                (len(images), self.n_image_tokens, self.embedding_size)
+            )
+            assert image_masks.shape == image_encoded.shape[:2]

         return image_encoded, image_masks
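
The folding step in `encode_images` can be illustrated in isolation. This is a minimal sketch rather than the PR's code: `fold_image_encoding` is a hypothetical standalone function, the sizes are arbitrary, and every image is assumed to be valid (the dummy-feature branch is left out). It projects each feature vector to `n_image_tokens * embedding_size` and reshapes the result into `n_image_tokens` tokens of size `embedding_size`, with a matching mask.

```python
import torch
from torch import nn


def fold_image_encoding(image_features, image_encoder, n_image_tokens, embedding_size):
    """Encode N image feature vectors and fold each into n_image_tokens tokens."""
    # N x (n_image_tokens * embedding_size)
    encoded = image_encoder(image_features)
    batch_size = image_features.shape[0]
    # N x n_image_tokens x embedding_size
    folded = encoded.reshape(batch_size, n_image_tokens, embedding_size)
    # Every token of a valid image is unmasked
    mask = torch.ones(batch_size, n_image_tokens, dtype=torch.bool)
    assert mask.shape == folded.shape[:2]
    return folded, mask


# Arbitrary example sizes; 2048 matches the default image_features_dim in the diff
img_dim, embedding_size, n_image_tokens = 2048, 16, 4
encoder = nn.Linear(img_dim, embedding_size * n_image_tokens)
features = torch.randn(3, img_dim)
folded, mask = fold_image_encoding(features, encoder, n_image_tokens, embedding_size)
print(folded.shape, mask.shape)
```

Because the linear layer outputs the full `n_image_tokens * embedding_size` vector, each token is a different learned projection of the same image features rather than a copy.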

10 changes: 10 additions & 0 deletions parlai/agents/transformer/image_polyencoder.py
@@ -42,6 +42,15 @@ def add_cmdline_args(cls, argparser):
             choices=['add', 'append', 'prepend'],
Contributor

Once we have a solid grasp of which method is best, let's add a recommended='<method>' flag here (could just mark as a TODO for now).

Contributor Author

Added TODO for this

             help='How to combine image embedding (if used) with context embedding',
         )
+        agent.add_argument(
+            '--n-image-tokens',
+            type=int,
+            default=1,
+            help=(
+                'Number of tokens that the image encoding will consist of (when adding '
+                'or prepending)'
+            ),
+        )
         agent.set_defaults(reduction_type=None)
         # This agent doesn't support any encoder output reductions
         return agent
@@ -162,6 +171,7 @@ def get_encoder(self, opt, dict_, null_idx, reduction_type, for_context: bool):
                 image_encoder_num_layers=opt['image_encoder_num_layers'],
                 image_features_dim=opt['image_features_dim'],
                 image_combination_mode=opt['image_combination_mode'],
+                n_image_tokens=opt['n_image_tokens'],
             )
         else:
             # The candidate encoder is the same as for PolyEncoderModule
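
To show how `image_combination_mode` and `n_image_tokens` interact, here is a rough sketch of the 'prepend' and 'append' modes. It assumes both modes concatenate the image tokens with the context tokens along the time dimension, which this diff does not show directly. `combine_image_and_context` is a hypothetical helper, and the 'add' mode is omitted because its exact behavior is not visible here.

```python
import torch


def combine_image_and_context(context_enc, context_mask, image_enc, image_mask, mode):
    """Concatenate image tokens with context tokens along the time dimension.

    Hypothetical helper for illustration; not the PR's implementation.
    """
    if mode == 'prepend':
        encodings, masks = [image_enc, context_enc], [image_mask, context_mask]
    elif mode == 'append':
        encodings, masks = [context_enc, image_enc], [context_mask, image_mask]
    else:
        raise ValueError(f'Unsupported mode in this sketch: {mode}')
    return torch.cat(encodings, dim=1), torch.cat(masks, dim=1)


# Arbitrary example sizes
batch_size, n_text_tokens, n_image_tokens, embedding_size = 2, 5, 3, 16
context_enc = torch.randn(batch_size, n_text_tokens, embedding_size)
context_mask = torch.ones(batch_size, n_text_tokens, dtype=torch.bool)
image_enc = torch.randn(batch_size, n_image_tokens, embedding_size)
image_mask = torch.ones(batch_size, n_image_tokens, dtype=torch.bool)

combined_enc, combined_mask = combine_image_and_context(
    context_enc, context_mask, image_enc, image_mask, mode='prepend'
)
print(combined_enc.shape, combined_mask.shape)
```

Under this assumption, the combined sequence grows by `n_image_tokens` positions, which is why a larger value gives the image more room in the final encoding.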