Patch Diffusion can roughly 2x training speed, even on 256x256 ImageNet. If that speedup composes with Mosaic Diffusion's, that's potentially a ~10x cumulative boost. The issue is that the two repos have different training scripts, so I'm thinking of porting the Mosaic features into Patch-Diffusion by copy+paste. For now, I'm only asking where to find the relevant code for the following (rough sketches of what I mean for each item follow the list):
- xFormers + FlashAttention (I'll be trying to swap FlashAttention-1 for FlashAttention-2)
- Precomputing latents
- Low Precision LayerNorm and GroupNorm
- FSDP
- Scheduled EMA
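For the FlashAttention-2 swap, here's a minimal sketch of what I have in mind: calling `flash_attn_func` from flash-attn 2 directly inside a self-attention block. The module layout and names are illustrative, not Mosaic's or Patch-Diffusion's actual code; flash-attn expects (batch, seqlen, heads, head_dim) tensors in fp16/bf16.

```python
import torch
import torch.nn as nn
from flash_attn import flash_attn_func  # flash-attn >= 2.0

class FlashSelfAttention(nn.Module):
    """Self-attention that calls FlashAttention-2 directly (illustrative)."""

    def __init__(self, dim: int, num_heads: int):
        super().__init__()
        assert dim % num_heads == 0
        self.num_heads = num_heads
        self.head_dim = dim // num_heads
        self.qkv = nn.Linear(dim, 3 * dim)
        self.proj = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seqlen, dim); inputs must be fp16/bf16 for flash-attn
        b, s, _ = x.shape
        qkv = self.qkv(x).view(b, s, 3, self.num_heads, self.head_dim)
        q, k, v = qkv.unbind(dim=2)  # each (b, s, heads, head_dim)
        out = flash_attn_func(q, k, v, causal=False)  # non-causal for diffusion U-Nets
        return self.proj(out.reshape(b, s, -1))
```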
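For precomputing latents, the idea as I understand it is to run the VAE encoder once over the dataset and save the results, so training never pays for the encoder again. A sketch assuming a diffusers VAE; the checkpoint name, scaling factor, and file layout are my assumptions, not necessarily what Mosaic does.

```python
import torch
from diffusers import AutoencoderKL

@torch.no_grad()
def precompute_latents(dataloader, out_path: str, device: str = "cuda"):
    # Encode every image once; training then reads the saved latents directly.
    vae = AutoencoderKL.from_pretrained("stabilityai/sd-vae-ft-mse").to(device).eval()
    latents, labels = [], []
    for images, ys in dataloader:  # images scaled to [-1, 1], shape (B, 3, 256, 256)
        dist = vae.encode(images.to(device)).latent_dist
        latents.append((dist.sample() * 0.18215).cpu())  # SD's latent scale factor
        labels.append(ys)
    torch.save({"latents": torch.cat(latents), "labels": torch.cat(labels)}, out_path)
```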
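For the low-precision norms, my reading of the technique (in the spirit of Composer's LowPrecisionLayerNorm/GroupNorm surgeries, not a copy of their code) is that autocast normally upcasts norms to fp32, so the wrapper casts to the autocast dtype and disables autocast around the functional call. Shown here for GroupNorm:

```python
import torch
import torch.nn.functional as F

class LowPrecisionGroupNorm(torch.nn.GroupNorm):
    """GroupNorm that stays in the autocast dtype instead of fp32."""

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        if not torch.is_autocast_enabled():
            return super().forward(x)
        dtype = torch.get_autocast_gpu_dtype()  # e.g. fp16 or bf16
        w = self.weight.to(dtype) if self.weight is not None else None
        b = self.bias.to(dtype) if self.bias is not None else None
        # Disable autocast so F.group_norm isn't upcast back to fp32.
        with torch.autocast(device_type=x.device.type, enabled=False):
            return F.group_norm(x.to(dtype), self.num_groups, w, b, self.eps)
```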
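For FSDP, the plain PyTorch wrap is roughly the sketch below. The bf16 mixed-precision config and the size-based wrap threshold are placeholders to tune, not Mosaic's settings, and `torch.distributed.init_process_group` must have been called first.

```python
import functools
import torch
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP, MixedPrecision
from torch.distributed.fsdp.wrap import size_based_auto_wrap_policy

def wrap_with_fsdp(model: torch.nn.Module) -> FSDP:
    # Shard params, grads, and optimizer state across ranks; keep compute in bf16.
    mp = MixedPrecision(param_dtype=torch.bfloat16,
                        reduce_dtype=torch.bfloat16,
                        buffer_dtype=torch.bfloat16)
    policy = functools.partial(size_based_auto_wrap_policy, min_num_params=1_000_000)
    return FSDP(model,
                auto_wrap_policy=policy,
                mixed_precision=mp,
                device_id=torch.cuda.current_device())
```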
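For scheduled EMA, my assumption is that "scheduled" means the EMA only starts tracking late in training, to save the cost of maintaining the average early on. A sketch under that assumption; the step threshold and decay are illustrative values.

```python
import copy
import torch

class ScheduledEMA:
    """EMA of model weights that only starts averaging after `start_step`."""

    def __init__(self, model: torch.nn.Module, start_step: int = 400_000,
                 decay: float = 0.9999):
        self.ema = copy.deepcopy(model).eval()
        for p in self.ema.parameters():
            p.requires_grad_(False)
        self.start_step = start_step
        self.decay = decay

    @torch.no_grad()
    def update(self, model: torch.nn.Module, step: int):
        if step < self.start_step:
            # Before the schedule kicks in, just mirror the raw weights.
            for ema_p, p in zip(self.ema.parameters(), model.parameters()):
                ema_p.copy_(p)
        else:
            for ema_p, p in zip(self.ema.parameters(), model.parameters()):
                ema_p.lerp_(p, 1.0 - self.decay)  # ema = decay*ema + (1-decay)*p
```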