Hello, thanks for your great work!
I used the provided f16d32 VAVAE checkpoint (https://huggingface.co/hustvl/vavae-imagenet256-f16d32-dinov2/blob/main/vavae-imagenet256-f16d32-dinov2.pt) directly and trained a LightningDiT-XL model with the provided config file (https://github.com/hustvl/LightningDiT/blob/main/configs/lightningdit_xl_vavae_f16d32.yaml) for 80K training steps on 8 H100 GPUs.
However, the FID-50K I get is 20.65 without CFG and 42.57 with a CFG scale of 10.0.
I expected the FID to be around 8.22 (Table 1) or 5.14 (Table 3), or at least at a similar level.
Is there anything I did wrong?
Thanks in advance!