
Difficulty reproducing TiTok-VQ-BL128 #114

@dzj441

Description


@yucornetto @TACJu
Hi, nice work! I think TiTok-VQ-BL-128 was trained with the TA-TiTok setup, since it doesn't need MaskGIT as a pixel decoder. However, I find it hard to reproduce the reported rFID of 1.49. My current result is an rFID of about 5, using a config adapted from TA-TiTok-BL-128 with the text-related settings removed. Could you please shed some light on how to reproduce this version?

The main differences between the official setting and ours are:

  1. TA-TiTok uses DataComp with `aspect_ratio_filtering`, while I use ImageNet-1k without the filtering.
  2. TA-TiTok uses a global batch size of 1024 (4 × 8 × 32), while I use 512 (64 × 8).
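For reference, here is the batch-size arithmetic as a minimal Python sketch. It assumes "4 × 8 × 32" means 4 nodes × 8 GPUs × 32 images per GPU and "64 × 8" means 64 images per GPU × 8 GPUs; the helper `global_batch_size` is hypothetical and not part of the TiTok codebase.

```python
def global_batch_size(per_device: int, devices_per_node: int,
                      nodes: int = 1, grad_accum_steps: int = 1) -> int:
    """Effective number of samples contributing to one optimizer step."""
    return per_device * devices_per_node * nodes * grad_accum_steps


# Assumed reading of "4 x 8 x 32": 4 nodes x 8 GPUs x 32 images per GPU.
official = global_batch_size(per_device=32, devices_per_node=8, nodes=4)

# Assumed reading of "64 x 8": 64 images per GPU x 8 GPUs on a single node.
ours = global_batch_size(per_device=64, devices_per_node=8)

# Gradient accumulation steps that would match the official global batch
# on the 8-GPU setup (the two sizes divide evenly here).
accum_needed = official // ours

print(official, ours, accum_needed)  # prints: 1024 512 2
```

Under these assumptions, 2 accumulation steps would reach the same effective batch of 1024, though this does not guarantee identical training dynamics if other settings (e.g. learning rate or schedule) were tuned for the per-step batch.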

Do you think the gap comes mainly from the data scale, the batch size, or something else?


Lastly, in your paper, TiTok-BB-64 VQ achieves an rFID of 2.43. I think this version may use the same setting as TiTok-BL-128-VQ. Could you share some information about its config?

Thanks for your patience, and I look forward to your reply!
