@yucornetto @TACJu
Hi, nice work. I think TiTok-VQ-BL-128 was trained by the TA-TiTok group, since it doesn't need MaskGIT as the pixel decoder. However, I am finding it hard to reproduce the rFID of 1.49. My current result is 5, using a config modified from TA-TiTok-bl-128 with the text-related settings removed. Could you please shed some light on how to reproduce this version?
The main differences between the official setting and my setting are currently:
- TA-TiTok uses DataComp with aspect_ratio_filtering, while I use ImageNet-1k without the filtering.
- TA-TiTok uses a global batch size of 1024 (4 * 8 * 32), while I use 512 (64 * 8).
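
To make the batch-size comparison concrete, here is a minimal sketch of the arithmetic. The helper name `global_batch_size` is hypothetical, and I am not asserting what each factor stands for (nodes, GPUs, per-GPU batch, or gradient accumulation); only the products come from the two settings above.

```python
from math import prod


def global_batch_size(*factors: int) -> int:
    """Multiply the factors that make up a global batch size.

    Hypothetical helper: the meaning of each factor is left open on purpose.
    """
    return prod(factors)


# Factors as quoted for the official TA-TiTok setting.
official = global_batch_size(4, 8, 32)
# Factors as quoted for my run.
mine = global_batch_size(64, 8)
# How many times larger the official global batch is.
ratio = official // mine

print(f"official={official}, mine={mine}, ratio={ratio}")
```

So my run sees half as many samples per optimizer step as the official one.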
Do you think the gap is mainly due to the data scale, the batch size, or something else?
Lastly, in your paper, TiTok-BB-64 VQ reaches an rFID of 2.43. I think this version may use the same setting as TiTok-BL-128-VQ. Could you share some information on this config?
Thanks for your patience and looking forward to your reply!