difficulty & problems in reproducing TiTok-VQ-BL128

@yucornetto @TACJu \
Hi, nice work. I think that TiTok-VQ-BL-128 was trained with the TA-TiTok setup, since it doesn't need MaskGIT as a pixel decoder. However, I find it hard to reproduce the rFID of 1.49. My current result is 5, using a config modified from TA-TiTok-bl-128 with the text-related settings excluded. Could you please shed some light on how to reproduce this version?

The main differences between the official setting and my setting are currently:
1. TA-TiTok uses DataComp with aspect_ratio_filtering, while I use ImageNet-1k without the filtering.
2. TA-TiTok uses a global batch size of 1024 (4 \* 8 \* 32), while I use 512 (64 \* 8).
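
For reference, the batch-size arithmetic in item 2 can be checked with a short Python sketch. I have not stated what each factor stands for (for example nodes, GPUs per node, or per-GPU batch), so the helper below simply multiplies whatever factors it is given. `global_batch_size` is a hypothetical name for illustration and is not part of the TiTok codebase.

```python
from math import prod


def global_batch_size(*factors: int) -> int:
    """Multiply the factors that make up a global batch size.

    What each factor means (nodes, GPUs per node, per-GPU batch, ...)
    is an assumption left to the caller; only the product matters here.
    """
    return prod(factors)


# Factors quoted above for the official TA-TiTok setting.
official = global_batch_size(4, 8, 32)

# Factors quoted above for my run.
mine = global_batch_size(64, 8)

# How many times more samples the official setting processes per optimizer step.
ratio = official / mine

print(f"official={official}, mine={mine}, ratio={ratio}")
```

If both runs use the same number of optimizer steps (an assumption, since I have not compared the schedules here), the official setting would see twice as many samples in total, on top of the dataset difference in item 1.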

Do you think the gap comes mainly from the data scale, the batch size, or something else?

<img width="732" height="213" alt="Image" src="https://github.com/user-attachments/assets/1db67b4f-5e20-479b-b7e6-ac9fa9f2a9c5" />

Lastly, your paper reports an rFID of 2.43 for TiTok-BB-64 VQ. I think this version may use the same setting as TiTok-BL-128-VQ. Could you share some information on its config?

Thanks for your patience and looking forward to your reply!

difficulty & problems in reproducing TiTok-VQ-BL128 #114
