Hi,
Thank you for this tremendously useful codebase! I am playing around with extending the TextTokenizer vocabulary and noticed that the size of the text embedding table, i.e. model.encoder.embed_tokens.weight.shape[0], is smaller than the size of the vocabulary, i.e. len(model.tokenizer.token_from_subword). Here is the code I am using to get those numbers:
import torch
from min_dalle import MinDalle

model = MinDalle(
    models_root='./pretrained',
    dtype=torch.float32,
    device='cuda',
    is_mega=False,
    is_reusable=True
)
print(model.encoder.embed_tokens.weight.shape, len(model.tokenizer.token_from_subword))
The output is as follows:
torch.Size([50264, 1024]) 50265
In the case of DALL-E Mega (is_mega=True), the embedding table is instead larger than the vocabulary size:
torch.Size([50272, 2048]) 50265
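To make the off-by-one concrete, here is a self-contained snippet (independent of min_dalle) showing that an embedding table with 50264 rows cannot look up token ID 50264:

import torch

embed = torch.nn.Embedding(50264, 1024)  # same row count as the mini model's table
try:
    embed(torch.tensor([50264]))  # one past the last valid row
except IndexError as err:
    print('lookup failed:', err)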
In practice, these discrepancies can be worked around by bounding the text token IDs to the size of the embedding table (a sketch of what I mean is below), so I am not too concerned about it. I just wanted to flag it as a potential issue. Thanks!
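For reference, the bounding I have in mind is just clamping every token ID to the last valid row of the embedding table before the lookup; a minimal sketch (the bound_tokens helper is purely illustrative, not something that exists in min_dalle):

def bound_tokens(token_ids, embed_tokens):
    # Clamp each ID so it never indexes past the embedding table.
    limit = embed_tokens.weight.shape[0]
    return [min(token_id, limit - 1) for token_id in token_ids]

print(bound_tokens([0, 50263, 50264], model.encoder.embed_tokens))  # [0, 50263, 50263] for the mini model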