examples/advanced_diffusion_training/README.md

@@ -80,8 +80,7 @@ To do so, just specify `--train_text_encoder_ti` while launching training (for r

Please keep the following points in mind:

* SDXL has two text encoders. So, we fine-tune both using LoRA.
* When not fine-tuning the text encoders, we ALWAYS precompute the text embeddings to save memory.

### 3D icon example

@@ -234,6 +233,32 @@ In ComfyUI we will load a LoRA and a textual embedding at the same time.

SDXL's VAE is known to suffer from numerical instability issues. This is why we also expose a CLI argument, `--pretrained_vae_model_name_or_path`, that lets you specify the location of a better VAE (such as [this one](https://huggingface.co/madebyollin/sdxl-vae-fp16-fix)).
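
For illustration, a minimal sketch of passing that VAE to the advanced training script might look like the following (everything except the VAE flag is a placeholder here; in practice keep the full launch command from the training examples earlier in this README):

```bash
# Sketch only: the relevant addition is --pretrained_vae_model_name_or_path;
# the remaining flags and values are placeholders for your own setup.
accelerate launch train_dreambooth_lora_sdxl_advanced.py \
  --pretrained_model_name_or_path="stabilityai/stable-diffusion-xl-base-1.0" \
  --pretrained_vae_model_name_or_path="madebyollin/sdxl-vae-fp16-fix" \
  --instance_data_dir="./my-training-images" \
  --instance_prompt="photo of TOK" \
  --output_dir="my-sdxl-lora" \
  --mixed_precision="bf16" \
  --resolution=1024 \
  --train_batch_size=1 \
  --max_train_steps=1000
```
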
### DoRA training
The advanced script now supports DoRA training too!
> Proposed in [DoRA: Weight-Decomposed Low-Rank Adaptation](https://arxiv.org/abs/2402.09353),
**DoRA** is very similar to LoRA, except it decomposes the pre-trained weight into two components, **magnitude** and **direction**, and employs LoRA for _directional_ updates to efficiently minimize the number of trainable parameters.
The authors found that by using DoRA, both the learning capacity and training stability of LoRA are enhanced without any additional overhead during inference.
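
To make the magnitude/direction decomposition a bit more concrete, here is a small, self-contained `peft` sketch (a toy model rather than the SDXL UNet; it simply shows that, at the `peft` level, DoRA is a regular `LoraConfig` with `use_dora=True`):

```python
# Toy, self-contained example: wrap a small model with a DoRA adapter.
# `use_dora=True` makes peft learn the weight's magnitude separately and
# apply the low-rank (LoRA) update to its direction component.
import torch.nn as nn
from peft import LoraConfig, get_peft_model

base = nn.Sequential(nn.Linear(128, 256), nn.ReLU(), nn.Linear(256, 128))

config = LoraConfig(
    r=8,                        # low-rank dimension, same meaning as in LoRA
    lora_alpha=8,
    target_modules=["0", "2"],  # the two Linear layers in the toy model
    use_dora=True,              # decompose weights into magnitude + direction
)

model = get_peft_model(base, config)
model.print_trainable_parameters()  # only the adapter parameters are trainable
```

When using the advanced training script you don't build this config yourself; the **Usage** steps below cover the script-side setup.
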
> [!NOTE]
> 💡 DoRA training is still _experimental_
> and is likely to require different hyperparameter values to perform best compared to a LoRA.
> Specifically, we've noticed 2 differences to take into account when training:
> 1. **LoRA seems to converge faster than DoRA** (so a set of parameters that may lead to overfitting when training a LoRA may work well for a DoRA)
> 2. **DoRA quality is superior to LoRA, especially at lower ranks**: the difference in quality between a DoRA of rank 8 and a LoRA of rank 8 appears to be more significant than at ranks of 32 or 64, for example.
> This is also aligned with some of the quantitative analysis shown in the paper.

**Usage**
1. To use DoRA you need to install `peft` from main:
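
For example, installing directly from the GitHub repository (a standard way to get the current `main` branch):

```bash
pip install git+https://github.com/huggingface/peft.git
```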