Hi,
Congratulations on the amazing work.
I had a small curiosity question: for the velocity decoder, in AdaLN modulation, instead of a single conditioning token you now have K tokens (K = 256 for a 256x256 image, i.e., a 32x32 latent with patch size 2). As such, the `adaLN_modulation` linear layer, which previously computed the scale and shift for just one token, now has to compute a scale and shift for each of the K tokens. I assume this grows the FLOPs of that layer by a factor of K, so for a 256x256 image the cost would grow 256x for the layers that belong to the velocity decoder (see the sketch below). I was therefore wondering whether you have any numbers showing FLOPs vs. FID with SiT as a baseline.
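To make the FLOPs comparison concrete, here is a minimal sketch of what I have in mind (my own naming and shapes, assuming a standard DiT/SiT-style `adaLN_modulation` that regresses shift/scale/gate parameters; not taken from your code):

```python
import torch
import torch.nn as nn

B, K, D = 4, 256, 1152  # batch, tokens (32x32 latent / patch 2), hidden size (assumed, e.g. SiT-XL)

# Standard DiT/SiT AdaLN: one conditioning vector per sample; the
# shift/scale/gate parameters are computed once and broadcast to all K tokens.
adaLN_global = nn.Sequential(nn.SiLU(), nn.Linear(D, 6 * D))
c = torch.randn(B, D)
mod_global = adaLN_global(c)            # (B, 6*D): one D -> 6D projection per sample

# Per-token variant (my understanding of the velocity decoder): each of the
# K tokens carries its own conditioning, so the same projection runs K times.
adaLN_per_token = nn.Sequential(nn.SiLU(), nn.Linear(D, 6 * D))
c_tokens = torch.randn(B, K, D)
mod_tokens = adaLN_per_token(c_tokens)  # (B, K, 6*D): K projections per sample

# The modulation layer's FLOPs thus scale from ~6*D^2 to ~K * 6*D^2
# multiply-adds per sample, i.e. a factor of K (= 256 here).
```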
Thanks a lot for the cool work; I'm really eager to hear your thoughts.