# Stable Diffusion with Colossal-AI

*[Colossal-AI](https://github.com/hpcaitech/ColossalAI) provides a faster and lower cost solution for pretraining and fine-tuning for AIGC (AI-Generated Content) applications such as the model [stable-diffusion](https://github.com/CompVis/stable-diffusion) from [Stability AI](https://stability.ai/).*

We take advantage of [Colossal-AI](https://github.com/hpcaitech/ColossalAI) to exploit multiple optimization strategies, e.g. data parallelism, tensor parallelism, mixed precision & ZeRO, to scale the training to multiple GPUs.
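Among these strategies, ZeRO reduces per-GPU memory by sharding optimizer states across data-parallel ranks. Below is a toy, framework-free sketch of that idea; the function names, the 8-way world size, and the fp32 Adam assumptions are ours for illustration, not Colossal-AI's actual API.

```python
# Conceptual sketch of ZeRO-style optimizer-state sharding (stage-1 idea):
# each of N data-parallel ranks keeps the full model parameters but stores
# only ~1/N of the optimizer states (e.g. Adam's two moment buffers).

def shard_sizes(num_params: int, world_size: int) -> list[int]:
    """Split num_params as evenly as possible across world_size ranks."""
    base, rem = divmod(num_params, world_size)
    return [base + (1 if r < rem else 0) for r in range(world_size)]

def adam_state_bytes_per_rank(num_params: int, world_size: int,
                              bytes_per_value: int = 4) -> list[int]:
    # Adam keeps two fp32 moment buffers per parameter; with sharding,
    # each rank holds moments only for its own slice of the parameters.
    return [2 * s * bytes_per_value
            for s in shard_sizes(num_params, world_size)]

if __name__ == "__main__":
    unet_params = 860_000_000          # Stable Diffusion's UNet parameter count
    full = 2 * unet_params * 4         # unsharded fp32 Adam states, in bytes
    sharded = adam_state_bytes_per_rank(unet_params, world_size=8)
    print(f"unsharded: {full / 1e9:.1f} GB, "
          f"per-rank with 8-way sharding: {sharded[0] / 1e9:.2f} GB")
```

The even split is the whole trick: optimizer-state memory drops roughly linearly with the number of data-parallel ranks, while each rank's parameter memory stays unchanged.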
## Stable Diffusion

[Stable Diffusion](https://huggingface.co/CompVis/stable-diffusion) is a latent text-to-image diffusion model.
Thanks to a generous compute donation from [Stability AI](https://stability.ai/) and support from [LAION](https://laion.ai/), we were able to train a Latent Diffusion Model on 512x512 images from a subset of the [LAION-5B](https://laion.ai/blog/laion-5b/) database.
Similar to Google's [Imagen](https://arxiv.org/abs/2205.11487), this model uses a frozen CLIP ViT-L/14 text encoder to condition the model on text prompts.

[Stable Diffusion with Colossal-AI](https://github.com/hpcaitech/ColossalAI/tree/main/examples/images/diffusion) provides **6.5x faster training and pretraining cost savings; the hardware cost of fine-tuning can be almost 7x cheaper** (from RTX3090/4090 24GB to RTX3050/2070 8GB).
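A quick arithmetic sketch shows why a *latent* diffusion model is comparatively cheap to train: diffusion runs in the VAE's latent space, not on raw pixels. The 8x spatial downsampling factor and 4 latent channels below are assumptions taken from the public Stable Diffusion configuration.

```python
# Sketch: size of the latent tensor the diffusion UNet actually operates on,
# versus the raw RGB image (assumed 8x VAE downsampling, 4 latent channels).

def latent_shape(height: int, width: int,
                 downsample: int = 8, channels: int = 4) -> tuple[int, int, int]:
    assert height % downsample == 0 and width % downsample == 0
    return (channels, height // downsample, width // downsample)

if __name__ == "__main__":
    c, h, w = latent_shape(512, 512)
    pixel_values = 3 * 512 * 512           # raw RGB values
    latent_values = c * h * w
    # 786432 / 16384 = 48: the UNet sees ~48x fewer values per image.
    print((c, h, w), pixel_values / latent_values)
```

Working on a tensor roughly 48x smaller than the image is a large part of why the model fits and trains on commodity GPUs at all; the parallelism and ZeRO optimizations above then stretch that further.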
[Stable Diffusion](https://github.com/CompVis/stable-diffusion) and [Hugging Face](https://huggingface.co/CompVis/stable-diffusion).
Thanks for open-sourcing!

- The implementation of the transformer encoder is from [x-transformers](https://github.com/lucidrains/x-transformers) by [lucidrains](https://github.com/lucidrains?tab=repositories).
- The implementation of [flash attention](https://github.com/HazyResearch/flash-attention) is from [HazyResearch](https://github.com/HazyResearch).
## BibTeX

```
@article{bian2021colossal,
  title={Colossal-AI: A Unified Deep Learning System For Large-Scale Parallel Training},
  author={Bian, Zhengda and Liu, Hongxin and Wang, Boxiang and Huang, Haichen and Li, Yongbin and Wang, Chuanrui and Cui, Fan and You, Yang},
  journal={arXiv preprint arXiv:2110.14883},
  year={2021}
}
@misc{rombach2021highresolution,
  title={High-Resolution Image Synthesis with Latent Diffusion Models},
  author={Robin Rombach and Andreas Blattmann and Dominik Lorenz and Patrick Esser and Björn Ommer},
  year={2021},
  eprint={2112.10752},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}
```