FLOPS count vs FID #13

@arijit-hub

Description

Hi,

Congratulations on the amazing work.

I had a small curiosity question: in the velocity decoder's AdaLN modulation, instead of a single conditioning token you now have K tokens (K = 256 for a 256x256 image with patch size 2). So the adaLN_modulation linear layer, which previously computed the scale and shift for just one token, now has to compute them for all K tokens, and I assume its FLOPs grow by a factor of K. For a 256x256 image that would be a 256x increase in those layers of the velocity decoder. I was wondering whether you have any numbers showing FLOPs vs. FID, with SiT as a baseline.
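Just to make the scaling I mean concrete, here is a back-of-envelope sketch (all numbers are my assumptions, not from the paper: hidden width d = 1152 as in DiT-XL, 6 modulation vectors as in AdaLN-Zero, and K = 256 tokens from a 32x32 latent with patch size 2):

```python
# Rough FLOPs for a hypothetical adaLN_modulation layer, Linear(d, n_mod * d).
# Assumed values: d = 1152 (DiT-XL width), n_mod = 6 (shift/scale/gate for
# attention and MLP, as in AdaLN-Zero), K = 256 tokens (256x256 image,
# 32x32 latent, patch size 2).

def adaln_linear_flops(d: int, n_mod: int, tokens: int) -> int:
    """FLOPs of Linear(d, n_mod * d) applied to `tokens` tokens:
    2 * in_features * out_features multiply-adds per token."""
    return 2 * d * (n_mod * d) * tokens

d, n_mod, K = 1152, 6, 256
single = adaln_linear_flops(d, n_mod, tokens=1)    # standard DiT/SiT: one global token
per_token = adaln_linear_flops(d, n_mod, tokens=K) # per-token modulation

print(f"single-token: {single / 1e9:.3f} GFLOPs per block")
print(f"per-token (K={K}): {per_token / 1e9:.3f} GFLOPs per block")
print(f"ratio: {per_token // single}x")  # linear in K
```

So under these assumptions the modulation layer itself scales exactly K-fold, though whether that is significant depends on how it compares to the attention and MLP FLOPs of the same block.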

Thanks a lot for the cool work; I'm really eager to hear your thoughts.
