Patch Diffusion can roughly 2x training speed, even on 256x256 ImageNet. If that speedup composes with Mosaic Diffusion's, that's potentially a ~10x cumulative boost. The issue is that the two repos have different training scripts, so I'm thinking of porting the Mosaic features into Patch-Diffusion by copy+paste. For now, I'm only asking where to find the relevant code for the following (rough sketches of what I mean for each item follow the list):
- xFormers + FlashAttention (I'll be trying to swap FlashAttention-1 for FlashAttention-2)
- Precomputing latents
- Low Precision LayerNorm and GroupNorm
- FSDP
- Scheduled EMA
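For the FlashAttention-2 swap, here's a minimal sketch of what I have in mind: calling `flash_attn_func` from flash-attn 2 directly inside a self-attention block. The module layout and names are illustrative, not Mosaic's or Patch-Diffusion's actual code; flash-attn expects (batch, seqlen, heads, head_dim) tensors in fp16/bf16.

```python
import torch
import torch.nn as nn
from flash_attn import flash_attn_func  # flash-attn >= 2.0

class FlashSelfAttention(nn.Module):
    """Self-attention that calls FlashAttention-2 directly (illustrative)."""

    def __init__(self, dim: int, num_heads: int):
        super().__init__()
        assert dim % num_heads == 0
        self.num_heads = num_heads
        self.head_dim = dim // num_heads
        self.qkv = nn.Linear(dim, 3 * dim)
        self.proj = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seqlen, dim); inputs must be fp16/bf16 for flash-attn
        b, s, _ = x.shape
        qkv = self.qkv(x).view(b, s, 3, self.num_heads, self.head_dim)
        q, k, v = qkv.unbind(dim=2)  # each (b, s, heads, head_dim)
        out = flash_attn_func(q, k, v, causal=False)  # non-causal for diffusion U-Nets
        return self.proj(out.reshape(b, s, -1))
```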
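For precomputing latents, the idea as I understand it is to run the VAE encoder once over the dataset and save the results, so training never pays for the encoder again. A sketch assuming a diffusers VAE; the checkpoint name, scaling factor, and file layout are my assumptions, not necessarily what Mosaic does.

```python
import torch
from diffusers import AutoencoderKL

@torch.no_grad()
def precompute_latents(dataloader, out_path: str, device: str = "cuda"):
    # Encode every image once; training then reads the saved latents directly.
    vae = AutoencoderKL.from_pretrained("stabilityai/sd-vae-ft-mse").to(device).eval()
    latents, labels = [], []
    for images, ys in dataloader:  # images scaled to [-1, 1], shape (B, 3, 256, 256)
        dist = vae.encode(images.to(device)).latent_dist
        latents.append((dist.sample() * 0.18215).cpu())  # SD's latent scale factor
        labels.append(ys)
    torch.save({"latents": torch.cat(latents), "labels": torch.cat(labels)}, out_path)
```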
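For the low-precision norms, my reading of the technique (in the spirit of Composer's LowPrecisionLayerNorm/GroupNorm surgeries, not a copy of their code) is that autocast normally upcasts norms to fp32, so the wrapper casts to the autocast dtype and disables autocast around the functional call. Shown here for GroupNorm:

```python
import torch
import torch.nn.functional as F

class LowPrecisionGroupNorm(torch.nn.GroupNorm):
    """GroupNorm that stays in the autocast dtype instead of fp32."""

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        if not torch.is_autocast_enabled():
            return super().forward(x)
        dtype = torch.get_autocast_gpu_dtype()  # e.g. fp16 or bf16
        w = self.weight.to(dtype) if self.weight is not None else None
        b = self.bias.to(dtype) if self.bias is not None else None
        # Disable autocast so F.group_norm isn't upcast back to fp32.
        with torch.autocast(device_type=x.device.type, enabled=False):
            return F.group_norm(x.to(dtype), self.num_groups, w, b, self.eps)
```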
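For FSDP, the plain PyTorch wrap is roughly the sketch below. The bf16 mixed-precision config and the size-based wrap threshold are placeholders to tune, not Mosaic's settings, and `torch.distributed.init_process_group` must have been called first.

```python
import functools
import torch
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP, MixedPrecision
from torch.distributed.fsdp.wrap import size_based_auto_wrap_policy

def wrap_with_fsdp(model: torch.nn.Module) -> FSDP:
    # Shard params, grads, and optimizer state across ranks; keep compute in bf16.
    mp = MixedPrecision(param_dtype=torch.bfloat16,
                        reduce_dtype=torch.bfloat16,
                        buffer_dtype=torch.bfloat16)
    policy = functools.partial(size_based_auto_wrap_policy, min_num_params=1_000_000)
    return FSDP(model,
                auto_wrap_policy=policy,
                mixed_precision=mp,
                device_id=torch.cuda.current_device())
```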
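For scheduled EMA, my assumption is that "scheduled" means the EMA only starts tracking late in training, to save the cost of maintaining the average early on. A sketch under that assumption; the step threshold and decay are illustrative values.

```python
import copy
import torch

class ScheduledEMA:
    """EMA of model weights that only starts averaging after `start_step`."""

    def __init__(self, model: torch.nn.Module, start_step: int = 400_000,
                 decay: float = 0.9999):
        self.ema = copy.deepcopy(model).eval()
        for p in self.ema.parameters():
            p.requires_grad_(False)
        self.start_step = start_step
        self.decay = decay

    @torch.no_grad()
    def update(self, model: torch.nn.Module, step: int):
        if step < self.start_step:
            # Before the schedule kicks in, just mirror the raw weights.
            for ema_p, p in zip(self.ema.parameters(), model.parameters()):
                ema_p.copy_(p)
        else:
            for ema_p, p in zip(self.ema.parameters(), model.parameters()):
                ema_p.lerp_(p, 1.0 - self.decay)  # ema = decay*ema + (1-decay)*p
```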