# Zero-out adaLN modulation layers in DDT blocks (adaLN-Zero):
for block in self.blocks:
    nn.init.constant_(block.adaLN_modulation[-1].weight, 0)
    nn.init.constant_(block.adaLN_modulation[-1].bias, 0)
I noticed that unlike the standard DiT architecture, which uses zero-initialization in every transformer block (e.g., via the adaLN-Zero mechanism), your implementation only applies zero-initialization to the final projection head. Could you share the reasoning behind this design choice?
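For concreteness, here is a minimal sketch of the two initialization schemes being contrasted; the `final_layer` / `linear` attribute names are assumptions following the public DiT reference code, not necessarily your implementation:

import torch.nn as nn

def init_adaln_zero(model):
    # DiT-style adaLN-Zero: zero the last Linear of every block's
    # modulation MLP so each residual branch starts as an identity map.
    for block in model.blocks:
        nn.init.constant_(block.adaLN_modulation[-1].weight, 0)
        nn.init.constant_(block.adaLN_modulation[-1].bias, 0)
    # DiT also zeroes the final layer's modulation and output projection.
    nn.init.constant_(model.final_layer.adaLN_modulation[-1].weight, 0)
    nn.init.constant_(model.final_layer.adaLN_modulation[-1].bias, 0)
    nn.init.constant_(model.final_layer.linear.weight, 0)
    nn.init.constant_(model.final_layer.linear.bias, 0)

def init_head_only(model):
    # Head-only variant described above: per-block modulation layers keep
    # their default init; only the final projection head is zeroed.
    nn.init.constant_(model.final_layer.linear.weight, 0)
    nn.init.constant_(model.final_layer.linear.bias, 0)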