encoder.embed_tokens.weight is smaller than the size of the TextTokenizer vocabulary #93


@adymaharana

Hi,

Thank you for this tremendously useful codebase! I am playing around with extending the TextTokenizer vocabulary and found that the size of the text embedding table, i.e. model.encoder.embed_tokens.weight.shape[0], is smaller than the size of the vocabulary, i.e. len(model.tokenizer.token_from_subword). Here's the code I am using to get those numbers.

    import torch
    from min_dalle import MinDalle

    model = MinDalle(
        models_root='./pretrained',
        dtype=torch.float32,
        device='cuda',
        is_mega=False,
        is_reusable=True
    )
    print(model.encoder.embed_tokens.weight.shape, len(model.tokenizer.token_from_subword))

The output is as follows:

torch.Size([50264, 1024]) 50265

For DALL-E Mega, on the other hand, the embedding table is larger than the vocabulary:

torch.Size([50272, 2048]) 50265

In practice, these discrepancies can be worked around by bounding the text token IDs to the size of the embedding table (see the sketch below), so I am not too concerned about it. I just wanted to flag it as a potential issue. Thanks!
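
For reference, here is a rough sketch of what I mean by bounding the tokens. The bound_text_tokens helper name is purely illustrative, and where exactly to apply it depends on how the tokenizer is extended:

    import torch

    def bound_text_tokens(token_ids, embed_tokens):
        # Clamp any token id that would index past the end of the
        # embedding table (embed_tokens.weight.shape[0] rows).
        vocab_limit = embed_tokens.weight.shape[0]
        return [min(token_id, vocab_limit - 1) for token_id in token_ids]

    # Illustrative usage, assuming `model` from the snippet above and a
    # list of token ids produced by the (extended) tokenizer:
    # tokens = bound_text_tokens(tokens, model.encoder.embed_tokens)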
