
Commit 4ac7d3e

[doc] polish diffusion README (#1840)
1 parent 9d3124a commit 4ac7d3e

3 files changed: +27 -158 lines

Binary file not shown (-3.82 MB).

examples/images/diffusion/README.md

Lines changed: 27 additions & 14 deletions
@@ -1,21 +1,27 @@
-# ColoDiffusion
-*[ColoDiffusion](https://github.com/hpcaitech/ColoDiffusion) is a Faster Train implementation of the model [stable-diffusion](https://github.com/CompVis/stable-diffusion) from [Stability AI](https://stability.ai/)*
+# Stable Diffusion with Colossal-AI
+*[Colossal-AI](https://github.com/hpcaitech/ColossalAI) provides a faster and lower-cost solution for pretraining and
+fine-tuning for AIGC (AI-Generated Content) applications such as the model [stable-diffusion](https://github.com/CompVis/stable-diffusion) from [Stability AI](https://stability.ai/).*
 
-We take advantage of Colosssal-AI to exploit multiple optimization strategies
+We take advantage of [Colossal-AI](https://github.com/hpcaitech/ColossalAI) to exploit multiple optimization strategies
 , e.g. data parallelism, tensor parallelism, mixed precision & ZeRO, to scale the training to multiple GPUs.
 
-
-![](./Merged-0001.png)
-
-[Stable Diffusion](#stable-diffusion-v1) is a latent text-to-image diffusion
+## Stable Diffusion
+[Stable Diffusion](https://huggingface.co/CompVis/stable-diffusion) is a latent text-to-image diffusion
 model.
 Thanks to a generous compute donation from [Stability AI](https://stability.ai/) and support from [LAION](https://laion.ai/), we were able to train a Latent Diffusion Model on 512x512 images from a subset of the [LAION-5B](https://laion.ai/blog/laion-5b/) database.
 Similar to Google's [Imagen](https://arxiv.org/abs/2205.11487),
 this model uses a frozen CLIP ViT-L/14 text encoder to condition the model on text prompts.
-With its 860M UNet and 123M text encoder, the model is relatively lightweight and runs on a GPU with at least 10GB VRAM.
-See [this section](#stable-diffusion-v1) below and the [model card](https://huggingface.co/CompVis/stable-diffusion).
 
-
+<p id="diffusion_train" align="center">
+<img src="https://raw.githubusercontent.com/hpcaitech/public_assets/main/colossalai/img/diffusion_train.png" width=800/>
+</p>
+
+[Stable Diffusion with Colossal-AI](https://github.com/hpcaitech/ColossalAI/tree/main/examples/images/diffusion) provides **6.5x faster training and pretraining cost savings; the hardware cost of fine-tuning can be almost 7x lower** (from RTX 3090/4090 24 GB down to RTX 3050/2070 8 GB).
+
+<p id="diffusion_demo" align="center">
+<img src="https://raw.githubusercontent.com/hpcaitech/public_assets/main/colossalai/img/diffusion_demo.png" width=800/>
+</p>
+
 ## Requirements
 A suitable [conda](https://conda.io/) environment named `ldm` can be created
 and activated with:
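The creation commands themselves fall outside this hunk; the usual setup looks roughly like the following (a sketch only, assuming the repository ships an `environment.yaml` at its root, as the upstream CompVis/stable-diffusion repo does):

```
# Sketch: assumes an environment.yaml at the repository root,
# as in the upstream CompVis/stable-diffusion repo.
conda env create -f environment.yaml
conda activate ldm
```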
@@ -33,15 +39,15 @@ pip install transformers==4.19.2 diffusers invisible-watermark
 pip install -e .
 ```
 
-### Install ColossalAI
+### Install Colossal-AI
 
 ```
 git clone https://github.com/hpcaitech/ColossalAI.git
 git checkout v0.1.10
 pip install .
 ```
 
-### Install colossalai lightning
+### Install Colossal-AI [Lightning](https://github.com/Lightning-AI/lightning)
 ```
 git clone -b colossalai https://github.com/Fazziekey/lightning.git
 pip install .
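Both install snippets clone a repository and then run `pip install .`, which presumably requires changing into the cloned directory first. A sketch of the intended sequence (the directory names are simply git's default clone targets, not something this diff specifies):

```
# Colossal-AI at the pinned release; assumes git's default clone directory name.
git clone https://github.com/hpcaitech/ColossalAI.git
cd ColossalAI
git checkout v0.1.10
pip install .
cd ..

# Colossal-AI-enabled Lightning fork; again assumes the default directory name.
git clone -b colossalai https://github.com/Fazziekey/lightning.git
cd lightning
pip install .
```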
@@ -74,16 +80,23 @@ you can change the training config in the yaml file
 ## Comments
 
 - Our codebase for the diffusion models builds heavily on [OpenAI's ADM codebase](https://github.com/openai/guided-diffusion)
-and [https://github.com/lucidrains/denoising-diffusion-pytorch](https://github.com/lucidrains/denoising-diffusion-pytorch).
+, [https://github.com/lucidrains/denoising-diffusion-pytorch](https://github.com/lucidrains/denoising-diffusion-pytorch),
+[Stable Diffusion](https://github.com/CompVis/stable-diffusion) and [Hugging Face](https://huggingface.co/CompVis/stable-diffusion).
 Thanks for open-sourcing!
 
 - The implementation of the transformer encoder is from [x-transformers](https://github.com/lucidrains/x-transformers) by [lucidrains](https://github.com/lucidrains?tab=repositories).
 
-- the implementation of [flash attention](https://github.com/HazyResearch/flash-attention) is from [HazyResearch](https://github.com/HazyResearch)
+- The implementation of [flash attention](https://github.com/HazyResearch/flash-attention) is from [HazyResearch](https://github.com/HazyResearch).
 
 ## BibTeX
 
 ```
+@article{bian2021colossal,
+      title={Colossal-AI: A Unified Deep Learning System For Large-Scale Parallel Training},
+      author={Bian, Zhengda and Liu, Hongxin and Wang, Boxiang and Huang, Haichen and Li, Yongbin and Wang, Chuanrui and Cui, Fan and You, Yang},
+      journal={arXiv preprint arXiv:2110.14883},
+      year={2021}
+}
 @misc{rombach2021highresolution,
       title={High-Resolution Image Synthesis with Latent Diffusion Models},
       author={Robin Rombach and Andreas Blattmann and Dominik Lorenz and Patrick Esser and Björn Ommer},
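The hunk header above notes that the training config can be changed in the yaml file, but the launch command is not part of this diff. A hypothetical invocation in the style of the upstream stable-diffusion `main.py` entry point (the script name, config path, and flags are all assumptions, not taken from this commit):

```
# Hypothetical launch command; script name, config path and flags are assumptions.
python main.py --base configs/train_colossalai.yaml --train --gpus 0,1,2,3
```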

examples/images/diffusion/Stable_Diffusion_v1_Model_Card.md

Lines changed: 0 additions & 144 deletions
This file was deleted.
