How To Do Stable Diffusion LORA Training By Using Web UI On Different Models - Tested SD 1.5, SD 2.1 #304
FurkanGozukara announced in Tutorials
Full tutorial: https://www.youtube.com/watch?v=mfaqqL5yOO4
Our Discord: https://discord.gg/HbqgGaZVmr. The ultimate guide to LoRA training. If I have been of assistance to you and you would like to support my work, please consider becoming a patron on 🥰 https://www.patreon.com/SECourses
Playlist of Stable Diffusion Tutorials, #Automatic1111 and Google Colab Guides, DreamBooth, Textual Inversion / Embedding, #LoRA, AI Upscaling, Pix2Pix, Img2Img:
https://www.youtube.com/playlist?list=PL_pbwdIyffsmclLl0O144nQRnezKlNdx3
Welcome to the ultimate beginner's guide to training #StableDiffusion models using the Automatic1111 Web UI. In this video, we will walk you through the entire process of setting up and training a Stable Diffusion model, from installing the DreamBooth extension (which provides LoRA training) to preparing your training set and tuning your training parameters. We'll also cover advanced training options and show you how to generate new images using your trained model. By the end of this video, you'll have a solid understanding of how to use Stable Diffusion to train your own custom models and generate high-quality images.
You should watch these two videos prior to this one if you don't have sufficient knowledge about Stable Diffusion or Automatic1111 Web UI:
1 - Easiest Way to Install & Run Stable Diffusion Web UI on PC by Using Open Source Automatic Installer - https://youtu.be/AZg6vzWHOTA
2 - How to Use SD 2.1 & Custom Models on Google Colab for Training with Dreambooth & Image Generation - https://youtu.be/AZg6vzWHOTA
00:00:00 Introduction speech
00:01:07 How to install the DreamBooth extension (which provides LoRA training) to the Stable Diffusion Web UI
00:02:36 Preparing training set images by cropping them to the proper size
00:02:54 How to crop images using Paint .NET, a free image editing program
00:05:02 What is Low-Rank Adaptation (LoRA)
00:05:35 Starting preparation for training using the DreamBooth tab - LoRA
00:06:50 Explanation of all training parameters, settings, and options
00:08:27 How many training steps equal one epoch
00:09:09 Checkpoint save frequency
00:09:48 Save a preview of training images after certain steps or epochs
00:10:04 What is batch size in training settings
00:11:56 Where to set LoRA training in SD Web UI
00:13:45 Explanation of Concepts tab in training section of SD Web UI
00:14:00 How to set the path for training images
00:14:28 Classification Dataset Directory
00:15:22 Training prompt - how to set what to teach the model
00:15:55 What the Class Prompt and Sample Image Prompt are in SD training
00:17:57 What the Image Generation settings are and why we need classification image generation in SD training
00:19:40 Starting the training process
00:21:03 How and why to tune your Class Prompt (generating generic training images)
00:22:39 Why we generate generic regularization images from the class prompt
00:23:27 Recap of the setup process for training parameters, options, and settings
00:29:23 How much GPU, CPU, and RAM the class regularization image generation uses
00:29:57 Training process starts after class image generation is completed
00:30:04 Displaying the generated class regularization images folder for SD 2.1
00:30:31 The speed of the training process - how many seconds per iteration on an RTX 3060 GPU
00:31:19 Where LoRA training checkpoints (weights) are saved
00:32:36 Where training preview images are saved and our first training preview image
00:33:10 How we decide when to stop training
00:34:09 How to resume training after it crashes or you close it
00:36:49 Lifetime vs. session training steps
00:37:54 After 30 epochs, images resembling the subject start to appear in the preview folder
00:38:19 Some messages printed on the command line are incorrect
00:39:05 Training speed, measured in seconds per iteration (s/it)
00:39:44 How I pick a checkpoint to generate a full model .ckpt file
00:40:23 How to generate a full model .ckpt file from a LoRA checkpoint .pt file
00:41:17 Generated/saved file name is incorrect, but it is generated from the correct selected .pt file
00:42:01 Doing inference (generating new images) using the text2img tab with our newly trained and generated model
00:42:47 The results of the SD 2.1 768-pixel model after training with the LoRA method and teaching a human face
00:44:38 Setting up the training parameters/options for SD version 1.5 this time
00:48:35 Re-generating class regularization images since SD 1.5 uses 512 pixel resolution
00:49:11 Displaying the generated class regularization images folder for SD 1.5
00:50:16 Training of Stable Diffusion 1.5 using the LoRA methodology and teaching a face has been completed and the results are displayed
00:51:09 The inference (text2img) results with SD 1.5 training
00:51:19 You have to do more inference with LoRA since it has less precision than DreamBooth
00:51:39 How to give more attention/emphasis to certain keywords in the SD Web UI
00:52:51 How to generate more than 100 images
00:54:46 How to check PNG info to see used prompts and settings
00:55:24 How to upscale using AI models
00:56:12 Fixing face image quality, especially eyes, with GFPGAN visibility
00:56:32 How to batch post-process
00:57:00 Where batch-generated images are saved
Video Transcription
00:00:00 Greetings, everyone. Welcome to the most beginner-friendly guide to training Stable
00:00:06 Diffusion models using the Automatic1111 web UI. In this tutorial, I will train on portrait images of
00:00:12 my brother using the Low-Rank Adaptation training method, also known as LoRA, on the Stable
00:00:18 Diffusion 2.1 768-pixel model. If you do not have prior knowledge, please watch these two
00:00:25 videos on our channel first. On our channel, go to the Playlists section, and here you see we
00:00:32 have the Stable Diffusion DreamBooth playlist. In it, first watch the video on the easiest way to install and run
00:00:41 Stable Diffusion web UI on PC. This will teach you how to install the web UI on a PC and how to run it.
00:00:50 Then watch the video on how to use Stable Diffusion version 2.1 and different models in the web UI. This will
00:00:57 teach you how to download and install different models and use them with the web UI. After that,
00:01:03 you are ready to watch this tutorial and follow me. To be able to train with LoRA,
00:01:10 you need to go to the Extensions tab here, install the DreamBooth extension,
00:01:14 and check for updates. If you don't know how to install from the Available tab, first go to the Available tab,
00:01:22 click Load from, and search for DreamBooth here. Since I am currently hiding installed extensions,
00:01:30 it is not showing. But when I disable that filter, you see DreamBooth is already installed, but it has
00:01:36 updates. So I am going to update it. OK, to update, we just click Apply and restart
00:01:42 UI, and it updates. Now, when we check for updates, you see we have the latest version. And, as I
00:01:48 said, to install it, go to Available. When I toggle this installed filter, it shows that DreamBooth is
00:01:54 here; click Install. OK, that's all. After you restart your application
00:02:00 (you may need to do a full restart for the DreamBooth tab to appear), you will get this tab. OK, and once
00:02:08 you are here and have version 2.1 selected from the models, you are ready to follow me. You see,
00:02:18 the current Stable Diffusion checkpoint is 2.1. OK, first of all, before starting our training,
00:02:24 we need to prepare our images. Since I am going to use the 768-pixel version, I need to make my images
00:02:33 768 by 768 pixels. My images are in this folder, and I haven't set their resolution yet. So
00:02:44 first I will show you how you can crop them with a free program: Paint .NET. Let
00:02:51 me show you Paint .NET. OK, this is Paint .NET, and you can download Paint .NET from its
00:02:58 official website here. It's a free, .NET-based program. Alternatively, you can use this
00:03:04 free website to resize and crop your images, but I prefer Paint .NET. I will show how
00:03:10 to crop one of the images. I am going to drag and drop this image here and click Open.
00:03:16 Here you see there is Rectangle Select, and I set it to Fixed Ratio, one hundred by
00:03:23 one hundred (a square), like this. Then I select the part of the image I want. I press Ctrl+C to copy
00:03:30 and Ctrl+R to resize. I type something smaller and press Enter. Then I press
00:03:38 Ctrl+V and expand the canvas. Then I press Ctrl+R again and resize it to exactly 768
00:03:48 pixels. Then I save it at 100 percent quality. You can use either PNG or JPG images.
00:03:58 Alternatively, you can use BIRME (birme.net) as well. However, you may not trust this website; it is up
00:04:06 to you. I will select two images from here and upload them. Then I am going to select
00:04:12 768 by 768 pixels, like this. OK, and you can set where you want the crop to be, like this. Then
00:04:23 click Save as ZIP. It will download a ZIP file like this. You can open it and then
00:04:31 extract the images into your folder, overwriting the existing files, and they will be exactly the right size. Let me
00:04:39 show you: 768 pixels. I will open one with Paint .NET, and you see it is 768 pixels. So this is
00:04:48 how you need to prepare your images. OK, all images are cropped to 768 by 768 pixels.
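If you have many photos, you can also script the manual cropping steps. The sketch below is a minimal example that assumes Python 3 and the Pillow library (`pip install pillow`); the video uses neither. It takes a *centered* square crop, whereas in the video the crop position is chosen by hand, so check that faces are not cut off. `center_square_box` and `crop_folder` are hypothetical helper names.

```python
from pathlib import Path


def center_square_box(width: int, height: int) -> tuple:
    """Return (left, top, right, bottom) of the largest centered square."""
    side = min(width, height)
    left = (width - side) // 2
    top = (height - side) // 2
    return (left, top, left + side, top + side)


def crop_folder(src: str, dst: str, size: int = 768) -> None:
    """Center-crop every JPG/PNG in src to a square, then resize it to size x size.

    Use size=768 for SD 2.1 768 models and size=512 for SD 1.x models.
    """
    from PIL import Image  # Pillow is an assumed extra dependency

    out_dir = Path(dst)
    out_dir.mkdir(parents=True, exist_ok=True)
    for path in sorted(Path(src).iterdir()):
        if path.suffix.lower() not in {".jpg", ".jpeg", ".png"}:
            continue
        with Image.open(path) as img:
            square = img.convert("RGB").crop(center_square_box(*img.size))
            square = square.resize((size, size), Image.LANCZOS)
            # quality=100 matches the "100 percent quality" JPG save; PNG ignores it
            square.save(out_dir / path.name, quality=100)
```

Usage, with hypothetical folder names: `crop_folder("brother_raw", "brother_768", size=768)`. Writing to a separate output folder keeps the original photos untouched.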
00:04:56 Now we are ready to train, so let's go to our Stable Diffusion web UI. You may wonder what
00:05:05 LoRA is. LoRA, Low-Rank Adaptation, is a method for faster fine-tuning of text-to-image diffusion models. It can be applied to both the
00:05:12 UNet and the CLIP text encoder. It is faster than DreamBooth, and its checkpoints are much smaller than the full
00:05:18 checkpoints DreamBooth produces. When you save a checkpoint with DreamBooth, it generates a full .ckpt file.
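Why are LoRA files so small? Instead of updating a full d × k weight matrix W, LoRA trains two thin matrices: B (d × r) and A (r × k). Their product is the update, so LoRA stores r(d + k) numbers instead of dk. The sketch below only does that arithmetic. The 768 × 768 layer and rank 4 are illustrative assumptions, not values taken from the extension. It also shows why FP16 (2 bytes per value) halves the size of an FP32 (4 bytes per value) file.

```python
BYTES_PER_VALUE = {"fp32": 4, "fp16": 2}


def full_params(d: int, k: int) -> int:
    """Values stored when a d x k weight matrix is saved in full (DreamBooth-style)."""
    return d * k


def lora_params(d: int, k: int, rank: int) -> int:
    """Values stored for a rank-r LoRA update: B is d x r and A is r x k."""
    return rank * (d + k)


def size_mb(num_values: int, dtype: str = "fp32") -> float:
    """Approximate size in megabytes (1 MB = 1,000,000 bytes)."""
    return num_values * BYTES_PER_VALUE[dtype] / 1_000_000


# Illustrative layer: 768 x 768 projection with LoRA rank 4 (assumed values)
d, k, r = 768, 768, 4
print(full_params(d, k))     # 589824 values
print(lora_params(d, k, r))  # 6144 values, about 1% of the full matrix
print(size_mb(full_params(d, k)), size_mb(full_params(d, k), "fp16"))
```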
00:05:27 LoRA, however, generates much smaller files, and when you are done,
00:05:32 you can generate the full checkpoint file from them. To train, we go to the DreamBooth tab here
00:05:38 and first need to create our model. I am going to name it my brother. I need to select the
00:05:45 source checkpoint. I have selected version 2.1 like this; I am using the EMA version,
00:05:51 and I am not going to check this box, since it is not necessary. OK, now just click the Create button.
00:06:01 After you click the Create button, you see it downloading the necessary files from the internet
00:06:08 like this, so you need to wait for the download. If you don't see anything in the web UI,
00:06:14 always check the running command-line window to see what is happening, like this.
00:06:22 OK, the model has been generated. You see: checkpoint successfully extracted to Models >
00:06:27 DreamBooth > my brother > working. We can also check it in our installation folder. Let's go to the C drive,
00:06:33 where I installed the Stable Diffusion web UI, then into Models and then StableDiffusion. Actually,
00:06:42 no, not StableDiffusion but the DreamBooth folder, and here you see there is the my brother directory,
00:06:47 with the working directory inside it, as you can see. OK, let's return to our interface. Here
00:06:54 you see there is LoRA Weight. This defines what percentage of the LoRA weights should be applied to the
00:07:01 UNet when training or creating a checkpoint, and Text Weight does the same for the text encoder. Setting this to 1 may
00:07:09 cause overtraining. However, since we are going to generate portrait images,
00:07:19 our own portrait images, and we are teaching just one face, this is fine for now. You can check this
00:07:26 Half Model option. It enables FP16 precision, which results in a smaller checkpoint with minimal loss
00:07:33 in quality. But we don't need this for LoRA, since the checkpoints are already small.
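Here is one way to picture the LoRA Weight setting. In the standard LoRA formulation, the learned update B·A is added to the frozen base weight, and a scale factor s controls how much of it is applied: W' = W + s·(B·A). In that formulation, baking this sum into the base weights is also what producing a full checkpoint from a LoRA file amounts to. The pure-Python sketch below assumes the UNet/Text Weight values act as that s. How the extension applies them internally is not shown in the video.

```python
def matmul(a, b):
    """Plain matrix product a @ b for small list-of-lists matrices."""
    inner = len(b)
    return [
        [sum(a[i][t] * b[t][j] for t in range(inner)) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def merge_lora(w, b, a, scale):
    """Return W + scale * (B @ A); scale=0 leaves the base weight unchanged."""
    delta = matmul(b, a)
    return [
        [w[i][j] + scale * delta[i][j] for j in range(len(w[0]))]
        for i in range(len(w))
    ]


base = [[1.0, 0.0], [0.0, 1.0]]  # frozen 2 x 2 base weight
b = [[1.0], [2.0]]               # d x r with rank r = 1
a = [[3.0, 4.0]]                 # r x k
print(merge_lora(base, b, a, 0.5))  # [[2.5, 2.0], [3.0, 5.0]]
print(merge_lora(base, b, a, 1.0))  # [[4.0, 4.0], [6.0, 9.0]]
```

A smaller scale pulls the merged weights back toward the base model. That matches the idea that a weight of 1 applies the full learned change and risks overtraining.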
00:07:39 And when you check this one, checkpoints will be saved to a subdirectory in the selected checkpoints
00:07:44 folder. Now I am going to click Training Wizard (Person). OK, let's set up our parameters.
00:07:52 This is really important. How many training steps do we want to do for each image? How many
00:07:59 images do I have? Let me show you once again: 16 in total. OK, since I am going to compare
00:08:08 checkpoint quality, I am going to set this very high, because I will stop the training early
00:08:16 once I decide I have trained enough. OK, I am setting Max Training Steps to zero,
00:08:23 Pause After N Epochs to zero, and then there is the amount of time to pause between epochs. By the way, one epoch equals 16
00:08:31 steps, because I have 16 images. OK, I am not going to set any pause between epochs. Next,
00:08:40 Use Lifetime Steps/Epochs When Saving: let's say you have stopped or paused your training
00:08:47 and then continue it later. Using lifetime means it will also count your previous
00:08:55 training steps and epochs. However, if you uncheck this, it will use only this session's
00:09:02 training steps and epochs when saving. So I am unchecking it. This is really important.
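You can write the step arithmetic in this part of the settings down directly. That includes the 160-step interval for saving every 10 epochs. The sketch assumes one optimizer step per batch, and it assumes a final partial batch still counts as a step (`math.ceil`). The helper names are hypothetical, and the extension may round differently when the batch size doesn't divide the image count.

```python
import math


def steps_per_epoch(num_images: int, batch_size: int = 1) -> int:
    """One epoch is one pass over every training image."""
    return math.ceil(num_images / batch_size)


def save_interval_steps(num_images: int, every_n_epochs: int, batch_size: int = 1) -> int:
    """Convert a 'save every N epochs' setting into a step interval."""
    return steps_per_epoch(num_images, batch_size) * every_n_epochs


print(steps_per_epoch(16))                # 16 steps per epoch with 16 images, batch size 1
print(save_interval_steps(16, 10))        # 160: saving every 10 epochs means every 160 steps
print(steps_per_epoch(16, batch_size=2))  # 8: a larger batch means fewer steps per epoch
```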
00:09:10 Save Checkpoint Frequency: since I didn't check this, it will count in N
00:09:18 steps. Since I have 16 images, if I set this to 16, it will save a checkpoint after each epoch. OK,
00:09:30 if that is confusing, you can just check this and set it to 10, so it will save
00:09:35 checkpoints every 10 epochs. In this case, since I have 16 images, that will be every 160 training
00:09:47 steps. OK, this is fine. I will also save a preview image after each checkpoint
00:09:53 so that I can decide whether that checkpoint is good or not. I will explain this later in the video,
00:10:01 so don't worry about that; you will understand it. Batch Size: how many images to
00:10:07 process at once per training step? We are going to process one image per training step, and we will
00:10:16 do the same for the number of class regularization images to generate at once. If your GPU has enough VRAM,
00:10:24 you can increase the batch size to process several images in parallel. Learning rate and
00:10:31 the other rates: I am not going to touch them, but you can experiment to find better learning rates or
00:10:38 text encoder rates. You can also scale the learning rate, but I am leaving them at their defaults. OK,
00:10:47 Image Processing: this is important. Since I am using the 768-pixel version, I am setting the resolution
00:10:55 to 768. However, if you use another version, like version 1.5, you need to
00:11:04 use 512 pixels. So this resolution depends on your Stable Diffusion model's version and type.
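The resolution rule given here can be turned into a quick pre-flight check: 512 pixels for SD 1.4/1.5-based and 512-pixel 2.x models, and 768 for SD 2.1 768 models. In this sketch, the model labels are made-up dictionary keys, and the image sizes are passed in as a dictionary. Reading the sizes from disk is left out to keep the check dependency-free; Pillow's `Image.open(path).size` is one option.

```python
# Hypothetical labels; the pixel values follow the rule given in the video
NATIVE_RESOLUTION = {
    "sd-1.4": 512,
    "sd-1.5": 512,
    "sd-2.x-512": 512,
    "sd-2.1-768": 768,
}


def wrong_size_images(sizes: dict, model: str) -> list:
    """Return names of images that are not square at the model's native resolution."""
    target = NATIVE_RESOLUTION[model]
    return sorted(name for name, (w, h) in sizes.items() if (w, h) != (target, target))


sizes = {"01.jpg": (768, 768), "02.jpg": (768, 1024), "03.jpg": (512, 512)}
print(wrong_size_images(sizes, "sd-2.1-768"))  # ['02.jpg', '03.jpg']
print(wrong_size_images(sizes, "sd-1.5"))      # ['01.jpg', '02.jpg']
```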
00:11:13 I am not going to enable any cropping, since I have already cropped the images. Apply Horizontal Flip: this means that
00:11:21 the images will also be randomly flipped, which adds more variation to your images. You can enable
00:11:28 this. Do we have a Pretrained VAE Name or Path? No, we don't, so we will use the base model's VAE.
00:11:35 If you watch my previous videos, you will learn what a VAE is and how to set one. OK, Concept List:
00:11:42 I am not going to use a concept list either, since I am just training the model to learn one
00:11:50 face from portrait images. Now the Advanced tab: this is important. This is where we set
00:11:55 our training method. We are going to use the LoRA method. Use 8bit Adam:
00:12:03 enable this to save VRAM. If your graphics card doesn't have much VRAM, or let's say you
00:12:10 have run into out-of-VRAM errors while training, you can enable this. I am going to enable
00:12:16 it for now because I'm not sure how much VRAM training is going to take. You can also set FP16
00:12:24 for Mixed Precision. You probably want this to be FP16, and if you are using xformers, you definitely want this
00:12:32 to be FP16. If you have watched my previous videos, you know that we are already using
00:12:38 xformers to speed up our inference and training, so I am going to set this. Memory Attention:
00:12:45 I am going to set this to xformers. My graphics card is an RTX 3060, and it supports that. OK, Don't Cache
00:12:55 Latents: when I hover my mouse over it, you see, a tooltip appears and explains
00:13:03 what that checkbox does. When this box is checked, latents will not be cached. When latents
00:13:10 are not cached, you save a bit of VRAM but train slightly slower. So for lower VRAM usage,
00:13:17 I am checking this. Train Text Encoder: enabling this provides better results and editability
00:13:24 but costs more VRAM. Yes, we are enabling this. I am not changing any of the default parameters,
00:13:33 and I am not changing these parameters either. So we are done with the Parameters tab. By the way,
00:13:41 I will set this to an arbitrarily high number, because I will stop the training myself.
00:13:47 OK, Concepts: this is really important. This is where we set what to teach. OK,
00:13:56 Max Training Steps is -1, so it will never end. OK, Dataset Directory: the path to
00:14:04 the directory with the input images. To get the directory of my input images, I right-click
00:14:11 one of the images and left-click Properties, and here you see it shows the location.
00:14:16 I copy it like this. Alternatively, you can click the address bar here and
00:14:24 select and copy the entire path. I paste it here. Classification Dataset Directory:
00:14:32 this is the path to the directory with the classification regularization images. Let's also set a path
00:14:39 for this, so we can see what it contains. However, you shouldn't put this folder inside the training-image folder, because in that
00:14:47 case it uses, I think, all of the images in all of the subfolders. So let's call it brother
00:14:57 classification. OK, let's go into it and copy the path like this. File Words:
00:15:08 we are not going to use any file words in this training, because we are not training a hypernetwork or,
00:15:16 let me show you what it was, an embedding. Therefore, you can just leave this empty. Next,
00:15:24 Prompts: this is where we need to enter a unique instance prompt for the face, or any other thing
00:15:33 that we want to teach the model. I will give it a unique name for my brother, like this.
00:15:43 You can use any unique name. It should be unique enough that it does not appear in the original training
00:15:51 data set. To be sure, you can also make it longer, like this. Class Prompt: now this is important.
00:15:57 What am I teaching the model? I am teaching the face of a man, so I will enter face of a man.
00:16:06 OK, like this. Classification Image Negative Prompt: you may give a negative prompt to
00:16:16 generate better-quality classification images. These images will be generated to improve your
00:16:23 training results, and they are generated automatically. So let's enter a good negative
00:16:28 prompt here. I have a decent negative prompt that I prepared previously; I used ChatGPT to
00:16:36 expand some of the well-known negative prompts. I will put this in the comment section of the video,
00:16:44 so don't worry, you will be able to copy and paste it. I'm copying and pasting it here.
00:16:51 I will also explain what the class prompt and the classification images
00:16:58 are for. Sample Image Prompt: this is important. Why? Because during training we want to see
00:17:05 how the training is going, and for that we will generate sample images. The sample image prompt
00:17:12 will be like this: first we give the instance prompt so that we will be able to see the face of
00:17:22 the person we are teaching. Then you can append some good keywords here to obtain better
00:17:30 results. But if you just want to see how much the model has learned, you can leave this as only
00:17:37 the instance prompt. That way you will get a better idea of how well the face has been learned by the
00:17:47 model. Sample Prompt Template: we don't need a prompt template right now. Sample Image
00:17:53 Negative Prompt: you can just copy and paste the negative prompt here as well. Okay, Image Generation:
00:18:00 during training we will generate, let's say, generic images, generic face of a man
00:18:08 images, to make our training more generalized and improve its success rate. So I will
00:18:20 generate 10 class images per input image, like this. And these are the other things you need to
00:18:30 set for generating these generic images. Number of Samples
00:18:36 to Generate: one. Okay, it looks good. You can just enter 10 times your image count; it's up to you.
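Two parts of this setup are easy to check before clicking Train. First, the classification folder should not sit inside the training-image folder. Second, the number of class images to generate is the per-image multiplier (10 here) times the number of training images: 10 × 16 = 160. The sketch below assumes that class images already in the folder count toward that total. That matches how the resumed run later reuses the existing 160 images instead of regenerating them. The function names and the image-extension list are hypothetical.

```python
from pathlib import Path

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".webp"}


def is_inside(child: str, parent: str) -> bool:
    """True if child is parent itself or any folder below it."""
    c, p = Path(child).resolve(), Path(parent).resolve()
    return c == p or p in c.parents


def count_images(folder: str) -> int:
    """Count image files directly in folder (.txt caption files are ignored)."""
    path = Path(folder)
    if not path.is_dir():
        return 0
    return sum(1 for f in path.iterdir() if f.is_file() and f.suffix.lower() in IMAGE_SUFFIXES)


def class_images_to_generate(instance_dir: str, class_dir: str, per_image: int = 10) -> int:
    """How many more class images this concept still needs."""
    if is_inside(class_dir, instance_dir):
        raise ValueError("put the classification folder outside the training-image folder")
    needed = count_images(instance_dir) * per_image
    return max(0, needed - count_images(class_dir))
```

Take the 16 training photos and an empty brother classification folder: this returns 160. Once those 160 images exist, it returns 0.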
00:18:43 And we are ready to train. But I will also show you one other thing in Settings. In Settings,
00:18:53 when you go to the Training section here, you can reduce VRAM usage by checking this checkbox.
00:19:03 I think it will probably reduce your training speed. You can also turn on Pin Memory for
00:19:10 DataLoader, which makes training slightly faster but increases memory usage. So you can play with these
00:19:15 settings to get the best possible training speed. I have 12 GB of VRAM, so I could enable this one,
00:19:28 but I won't enable that one. The others are just fine. Okay, let's click the Train button,
00:19:39 since we are ready. When we click the Train button, it will first generate the generic
00:19:49 face of a man images. Why face of a man? Because we have entered our
00:19:58 class prompt as face of a man. Where will they be saved? I think they will
00:20:02 be saved here, in brother classification. So it is starting to generate our generic face images to
00:20:13 add more variety to our data set. You see, it is generating face of a man images with the given
00:20:20 prompt. You can also improve this prompt by adding other keywords, let's say, styling keywords,
00:20:29 quality keywords, anything you want. If you watch my previous videos on the channel,
00:20:37 in the playlist, you will understand what I mean. This is actually the same as doing inference
00:20:43 with text2img here to generate images. It does exactly that to improve the variety of our
00:20:57 classification images. You see, they are really bad quality right now,
00:21:04 so maybe we should tune our class prompt.
00:21:12 To do that, I will cancel the training by clicking the Cancel button. You see, the training is
00:21:20 cancelled. And these are the images it has generated.
00:21:27 I have deleted all of them. I will modify the class prompt by adding some keywords. To
00:21:39 decide what to enter, I moved to the text2img tab and typed: portrait photo of a man,
00:21:46 HDR, 8K, sharp; keywords you would expect to find for real photos of people. I have also entered my
00:21:56 negative prompt, and this is the image that was generated. It is pretty good. So I am
00:22:02 returning to my DreamBooth tab, and here I am changing the class prompt like this,
00:22:10 and now I am clicking Train again. Now it will generate 160 class images for training, basically
00:22:23 the generic images to improve my training quality. Let's see what kind of results we are going to
00:22:30 get. We should get results similar to what we got in the text2img tab. Yeah, it's a decent
00:22:37 face photo of a man. Why are we doing this? As I said, to increase variation. When
00:22:51 you have different styles and variations of photos, it prevents overtraining and
00:23:01 forces the model to learn the face of the person you want to teach. OK, you see, now we are getting
00:23:09 really decent face images of men, and they will help our model learn better.
00:23:21 OK, while the image generation is going on, let's
00:23:28 quickly recap. First, we created our model with a unique name, like this, and selected the
00:23:35 source checkpoint. You can use any model from the internet as the source checkpoint, and
00:23:42 it will work exactly the same as version 2.1. The only thing that may differ is that if your model
00:23:51 is based on Stable Diffusion 1.5 or 1.4, it uses 512-pixel images. Therefore, the only
00:24:02 thing you need to change is the image size. Where was it? Let me show you.
00:24:14 Here: Image Processing, Resolution. If you use a checkpoint model based on 1.5, 1.4, or a 512-pixel
00:24:24 2.x version, then you need to change this to 512 pixels. But if you are using a model based on version 2.1
00:24:35 with native 768-pixel resolution, then you need to set it to 768. Other than that,
00:24:46 in Parameters, Training Steps Per Image: I have set this to a very large number because
00:24:51 I will stop the training myself at a certain point. I will show you when I stop it and
00:24:58 how I decide to stop training. And this is important: I will save checkpoints and generate preview images every
00:25:07 10 epochs. One epoch is complete when it has processed all of the images
00:25:14 in my training folder. I have 16 images in my training folder, which is here. It will also add,
00:25:23 I think, flipped images, so it will be 32 images. We will see that when training starts;
00:25:29 it is currently still generating the generic images I requested, like this. OK, so I will be able
00:25:39 to decide whether the model has learned enough for me to stop and start using it. OK,
00:25:48 so these save preview and save checkpoint settings are really important for seeing the progress of training.
00:25:56 The batch size depends on your GPU: if you have a very strong GPU
00:26:04 that can process two images in parallel, that is, if it has enough VRAM, you can
00:26:12 increase this. But if your graphics card can only process one image at a time, then you should
00:26:19 leave both of these at one. I didn't change any of the learning rates or other things; I left
00:26:28 them at their defaults. I have also enabled Apply Horizontal Flip, which randomly flips images horizontally so
00:26:36 that it adds more variation to the training data set. I don't have any VAE or concept list.
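To see what random horizontal flipping does, take a toy image stored as rows of pixel values. This sketch assumes each image is flipped independently, with probability 0.5, each time it is loaded; that is the usual meaning of random flip augmentation. Under that assumption the data set stays at 16 examples per epoch instead of becoming 32, which is what the training log later reports (Number of examples: 16). The helpers are hypothetical, not the extension's code.

```python
import random


def hflip(pixels):
    """Mirror an image left to right (each row reversed)."""
    return [list(reversed(row)) for row in pixels]


def maybe_flip(pixels, rng: random.Random, p: float = 0.5):
    """Flip with probability p; otherwise return the image unchanged."""
    return hflip(pixels) if rng.random() < p else pixels


def epoch_views(dataset, rng: random.Random, p: float = 0.5):
    """One (possibly flipped) view per image, so the epoch size does not change."""
    return [maybe_flip(img, rng, p) for img in dataset]


toy = [[1, 2, 3], [4, 5, 6]]
print(hflip(toy))  # [[3, 2, 1], [6, 5, 4]]
rng = random.Random(0)
print(len(epoch_views([toy] * 16, rng)))  # 16
```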
00:26:46 I am using LoRA because it uses less VRAM than DreamBooth, and the saved files
00:26:57 will be more than 1,000 times smaller than DreamBooth's, because DreamBooth generates full-size model
00:27:04 files for checkpoints, while LoRA generates minimal files. Then, after we
00:27:12 are satisfied with the training, we will generate a full model from those files. I am using 8bit Adam to save
00:27:20 VRAM, FP16 mixed precision, and xformers memory attention, and I didn't change any other parameters. In
00:27:27 Concepts, I set my dataset directory and the classification dataset directory; I have
00:27:33 already shown you them. We are not using any file words, because we are not doing general concept
00:27:40 training; that is outside the scope of this video. I may make another video on training hypernetworks or
00:27:50 textual inversion embeddings. The instance prompt is really important, because this keyword is what is being
00:28:00 taught to our model. When I do inference with the new, tuned model, it will
00:28:09 know that this keyword means my brother's face. That is why this is so important.
00:28:18 This is the generic class prompt, which I have already explained. And these are arbitrary
00:28:26 numbers; actually, just this one is an arbitrary number I entered. I didn't change the other things.
00:28:33 These only affect the images generated here, nothing else. So this part
00:28:43 matters only for these images. It will generate 160 images in this folder. You see,
00:28:50 it is also generating text files with the same names, saving a description of each image. You
00:29:00 could also modify these descriptions, but I think it is not very important for LoRA
00:29:04 training. It matters for hypernetworks and especially for textual inversion embeddings. Now I will pause
00:29:12 the video until the image generation is done and the training has started. OK, meanwhile,
00:29:19 the class image generation is almost at 50 percent so far. It says there are still about 20 minutes
00:29:27 remaining. It is using 95 percent of my GPU, almost 9 gigabytes of my
00:29:37 GPU memory, and about 20 percent of my CPU. So these are the resources it uses for just
00:29:45 class image generation; let's see how much it will use for training. The class image
00:29:53 generation speed is 14.58 seconds per iteration (s/it). OK, the training process has started after generating all
00:30:04 of the images. Let me show them to you. Once you have generated these generic images, you don't have to
00:30:11 generate them again. You can stop and restart training and reuse these generic images.
00:30:19 However, an error occurred, so unfortunately my web interface is no longer being updated.
00:30:29 But the training is still running, as you can see. Currently the speed is two iterations,
00:30:34 sorry, two seconds per iteration. It has done 145 iterations so far.
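The speeds quoted here make the waiting time easy to estimate: remaining time = remaining iterations × seconds per iteration. The sketch assumes the speed stays constant and that one iteration of class generation produces one image. The numbers come from the video: 160 class images at 14.58 s/it, then training at 2 s/it with 145 steps done and a save every 160 steps. The helper names are hypothetical. It also converts s/it into it/s, the other form progress bars use, which is the mix-up corrected above.

```python
def remaining_minutes(total: int, done: int, seconds_per_it: float) -> float:
    """Time left at a constant speed given in seconds per iteration (s/it)."""
    return (total - done) * seconds_per_it / 60


def it_per_second(seconds_per_it: float) -> float:
    """Convert s/it into iterations per second (it/s)."""
    return 1 / seconds_per_it


# Class images: 160 in total, about half done, at 14.58 s/it
print(round(remaining_minutes(160, 80, 14.58), 1))  # 19.4, i.e. "about 20 minutes"
# Training: next checkpoint at step 160, 145 steps done, at 2 s/it
print(remaining_minutes(160, 145, 2.0) * 60)        # 30.0 seconds
print(it_per_second(2.0))                           # 0.5 it/s
```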
00:30:47 Let's check the VRAM. Oh, you see, my VRAM is almost entirely full. It says
00:30:56 how much memory is allocated and reserved, but I see full VRAM usage on my graphics card.
00:31:04 After 10 epochs we are supposed to get our first training output. OK, it says
00:31:15 LoRA weights successfully saved to my Stable Diffusion web UI folder on the C drive,
00:31:20 inside models > LoRA. So let's go there and check it out: Stable Diffusion web UI > models > LoRA.
00:31:33 OK, you see, this is the checkpoint file it has generated, and it is only three megabytes, so
00:31:40 I could save a checkpoint file even every epoch. If we were using DreamBooth
00:31:48 instead of LoRA, each one would be at least 4, 5, or 6 gigabytes, depending on
00:31:56 the model you are using as the initial checkpoint. For version 2.1, it would be
00:32:07 at least five gigabytes, because the base model is five gigabytes. If we were using DreamBooth
00:32:13 instead of LoRA, every checkpoint would be five gigabytes. But now we are only getting
00:32:20 three-megabyte checkpoints, less than 1/1000 of the size. OK, we should also have gotten the first
00:32:34 image output of our training. If you are wondering where it is saved, I think it is inside
00:32:42 DreamBooth > my brother > samples. Yes, so this is the first sample image it has
00:32:50 generated, after ten epochs. In this folder, as time passes, we are going to see images that
00:32:59 look more and more like my brother's photos. Let me show you our training data set images.
00:33:09 Here they are once again. Once we get images as close as possible to our training data set,
00:33:17 we will generate a full model checkpoint file from the matching LoRA
00:33:27 file. Which file? Let me open the folder on the C drive once again
00:33:37 to explain better: here, in models > LoRA. Once we get a good image, we are
00:33:46 going to pick the matching file; you see, the file name contains 160. We are going to select that same file here, and we will
00:33:54 generate a full model checkpoint from it, and then we will be able to generate images of
00:34:01 the person we trained it on. OK, now I will pause the video until we get some good results.
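Picking the checkpoint that matches a good preview image comes down to matching step numbers. With 16 steps per epoch, the file saved at step 160 is the epoch-10 checkpoint. The sketch below makes two hypothetical assumptions: the step count is the last number in each LoRA file name (as with the 160 file shown here), and the files use the .pt extension. Other files in the folder are skipped.

```python
import re
from pathlib import Path
from typing import List, Optional


def checkpoint_step(path: Path) -> Optional[int]:
    """Last integer in the file name, assumed to be the training step."""
    numbers = re.findall(r"\d+", path.stem)
    return int(numbers[-1]) if numbers else None


def checkpoints_for_step(folder: str, step: int, suffix: str = ".pt") -> List[Path]:
    """All checkpoint files in folder that were saved at the given step."""
    return sorted(p for p in Path(folder).glob(f"*{suffix}") if checkpoint_step(p) == step)


def step_to_epoch(step: int, steps_per_epoch: int) -> float:
    """Convert a step count into epochs."""
    return step / steps_per_epoch


print(step_to_epoch(160, 16))  # 10.0
```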
00:34:10 Oh, by the way, it seems that, yes, the process has stopped, so I have to
00:34:17 resume it. Probably an error has occurred. Yeah, an error occurred. So here is what we need to do.
00:34:26 Let me show you how to continue from where it stopped. I am refreshing the web interface. This
00:34:34 error may occur from time to time. Then I go back to the DreamBooth tab, and here, you see,
00:34:41 let's refresh. We have my brother as the LoRA model. OK, let's click Load Params and
00:34:53 see if it loads. OK, it says loading. Maybe I need to restart the application.
00:35:04 Yeah, I probably need to restart the application. You see, when you interact with the web UI while
00:35:10 training, these kinds of errors may occur. So I will now restart the application. OK,
00:35:16 after I close the command line window, you see, a connection error occurred. So let's go back to our Stable
00:35:24 Diffusion web UI folder and run webui-user.bat. OK, I have restarted the web UI. Now I refresh
00:35:33 the page, go to the DreamBooth tab, pick the LoRA model, and click Load Params. It says: Please specify model
00:35:41 to load. You see, the my brother model is now here. After the restart, once I click Load Params,
00:35:49 it says loaded config, and when I click Train, it should continue from where it left off. OK, you see,
00:35:57 it is starting. Concept requires 160 images. It has loaded the same images, so it is not
00:36:03 regenerating the classification images, and I think it is loading the weights from where it left off.
00:36:12 So the number of examples: 16. Number of batches per epoch: 16. Correct. Number of epochs: one
00:36:19 million, as we set it, so the total number of optimization steps is 16 million. OK, everything looks correct.
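The numbers in the console follow from simple arithmetic: batches per epoch times epochs. The sketch below assumes one optimizer step per batch and no gradient accumulation. This matches what the console shows, since 16 examples give 16 batches, which implies batch size 1:

```python
import math


def batches_per_epoch(num_examples: int, batch_size: int = 1) -> int:
    # A final, partially filled batch still counts as one batch.
    return math.ceil(num_examples / batch_size)


def total_optimization_steps(num_examples: int, epochs: int, batch_size: int = 1) -> int:
    # Assumes one optimizer step per batch (no gradient accumulation).
    return batches_per_epoch(num_examples, batch_size) * epochs


print(batches_per_epoch(16))                    # 16
print(total_optimization_steps(16, 1_000_000))  # 16000000
print(batches_per_epoch(16, batch_size=4))      # 4
```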
00:36:28 And yes, it is now continuing from where it left off. This is great. So if an error occurs,
00:36:37 this is how you continue your training. You can also further train your
00:36:44 model from any checkpoint. And here, let me zoom in.
00:36:50 OK, I zoomed in too much. The training step is 23; this is the current session. And, you see,
00:36:59 this is the lifetime count. This relates to the setting here,
00:37:06 Use Lifetime Steps/Epochs When Saving. We didn't check it, so only the current
00:37:13 session's steps are taken into account for saving checkpoints and previews. But this number is the lifetime total. OK,
00:37:20 now I will pause the video again. OK, so you see, it has passed the second save and is continuing
00:37:28 training. Errors may sometimes happen, even though they shouldn't. If an error happens,
00:37:36 just restart the application, as I have shown, and continue training. The samples are
00:37:44 being produced. I hope it doesn't take too long to teach my brother's face to the model.
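The difference between the session count and the lifetime count matters after a resume. The sketch below is a hypothetical model of the Use Lifetime Steps/Epochs When Saving checkbox. The function name and exact logic are assumptions, not the extension's code. It shows how the same save interval fires at different points depending on which counter is used:

```python
def should_save(session_epoch: int, lifetime_epoch: int, save_every: int,
                use_lifetime: bool) -> bool:
    """Hypothetical save trigger: fire every `save_every` epochs on the chosen counter."""
    epoch = lifetime_epoch if use_lifetime else session_epoch
    return epoch > 0 and epoch % save_every == 0


# Hypothetical resume: the model already had 15 epochs, and this session has run 10 more.
session, lifetime = 10, 25
print(should_save(session, lifetime, save_every=10, use_lifetime=False))  # True
print(should_save(session, lifetime, save_every=10, use_lifetime=True))   # False
```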
00:37:55 OK, it has been only 30 epochs so far, and we already got a somewhat
00:38:02 similar picture in the third sample. You see, this is epoch 30, after 480 steps
00:38:09 in total. And this is my brother. As you can see, there is a similarity. OK,
00:38:18 I have noticed another mistake. You see, the command line interface says the LoRA
00:38:25 weights have been saved to my brother_160.pt. However, they are being saved correctly in the
00:38:33 folder. So this printed message is incorrect, but the saved file names are correct,
00:38:40 as you can see. And what I called epoch 30 here is actually epoch 40:
00:38:47 since we have 16 images, dividing the step count by 16 gives 40 epochs. And these are the
00:38:54 images generated so far. You see, they resemble him more and more as the training continues.
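To convert the step numbers in the sample and checkpoint file names into epochs, divide by the number of steps per epoch. The sketch below assumes 16 training images and batch size 1, as in this run:

```python
import math


def steps_to_epochs(steps: int, num_images: int, batch_size: int = 1) -> float:
    """Epochs completed after `steps` optimizer steps (one step per batch)."""
    steps_per_epoch = math.ceil(num_images / batch_size)
    return steps / steps_per_epoch


# Step counts mentioned in the video, with 16 training images and batch size 1.
for steps in (480, 2400, 5600):
    print(steps, steps_to_epochs(steps, num_images=16))
# 480 30.0
# 2400 150.0
# 5600 350.0
```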
00:39:00 OK, it has been over 1423 steps so far, and the training speed is 1.60 seconds per
00:39:14 iteration. It is still going, and we are getting closer to our target image, as you can see here.
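With a fixed speed in seconds per iteration, you can estimate training time before committing to a run. This is a rough sketch. It assumes the 1.60 s/it shown in the console stays constant. It ignores time spent on class image generation, sample generation, and checkpoint saving:

```python
def format_duration(seconds: float) -> str:
    """Format seconds as 'Hh MMm SSs' (rounded to the nearest second)."""
    h, rem = divmod(int(round(seconds)), 3600)
    m, s = divmod(rem, 60)
    return f"{h}h {m:02d}m {s:02d}s"


SECONDS_PER_STEP = 1.60  # speed shown in the console for the SD 2.1 run

for steps in (1423, 5600):
    print(steps, format_duration(steps * SECONDS_PER_STEP))
# 1423 0h 37m 57s
# 5600 2h 29m 20s
```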
00:39:24 OK, it has been over 5600 steps so far, which makes 350 epochs. For
00:39:37 this tutorial, I am now going to cancel the training and generate a checkpoint
00:39:45 from the step that looks best to me, which is sample 2400. You can
00:39:57 continue training until you are satisfied with the results. But these results are just
00:40:03 a preview of what it has learned; with a good prompt you can obtain much better photos. And
00:40:11 it also depends on your data set quality. If you prepare better images than in this example,
00:40:18 you can obtain even better results. I think this is a decent one. So let's generate our
00:40:26 model checkpoint. Here is how we generate a model checkpoint to use later.
00:40:32 You see, here is the LoRA model selection, and now we are going to generate a checkpoint from our step 2400 file.
00:40:42 I am entering the name I want to give the model here. Let's say my
00:40:48 brother test one, and I click Generate Ckpt. OK, it is generating. You can see that it is
00:40:58 loading the LoRA from the checkpoint selected here and applying the weight. As the tooltip here says,
00:41:07 LoRA weight is what percentage of the LoRA weight should be applied to the UNET when training
00:41:12 or creating a checkpoint. It applies the text weight as well, and then it saves. However,
00:41:19 the saved file name is not correct. You see, it has appended the latest training step number, even though
00:41:28 it loaded 2400, so I think there is a simple bug in the web UI. As for where it is saved,
00:41:39 it is inside the Stable Diffusion installation folder, then models, then the Stable Diffusion
00:41:46 folder. OK, I am going to rename it to the name that I want. Let's say LoRA.
00:41:54 And you see, it also has a YAML file, which must have the same name. OK, I renamed them with F2.
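The LoRA weight setting decides how much of the learned update is added to the base UNET weights when the checkpoint is created. Here is a minimal sketch of that idea: merged W = W + weight × (B @ A), written in plain Python on tiny matrices. This is a simplification and an assumption about the extension's behavior. Many LoRA implementations also scale the update by alpha / rank, which is omitted here:

```python
def matmul(a: list[list[float]], b: list[list[float]]) -> list[list[float]]:
    """Plain-Python matrix product of a (n x k) and b (k x m)."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def merge_lora(w, b, a, weight: float = 1.0):
    """Return W + weight * (B @ A).

    Simplified: real LoRA code often also scales by alpha / rank; that is omitted here.
    """
    delta = matmul(b, a)
    return [[w[i][j] + weight * delta[i][j] for j in range(len(w[0]))]
            for i in range(len(w))]


w = [[1.0, 0.0], [0.0, 1.0]]  # base weight (2 x 2)
b = [[1.0], [2.0]]            # LoRA "up" matrix (2 x 1), rank 1
a = [[3.0, 4.0]]              # LoRA "down" matrix (1 x 2)
print(merge_lora(w, b, a, weight=0.5))       # [[2.5, 2.0], [3.0, 5.0]]
print(merge_lora(w, b, a, weight=0.0) == w)  # True
```

At weight 0.0 the merged model equals the base model. At 1.0 the full learned update is applied.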
00:42:02 OK, now I can use txt2img and generate new images based on our training. Here is how we do
00:42:10 that: first click refresh here, and my new model appears here. It is now loading; you
00:42:19 can follow the loading in this command window: loading config and loading other parameters. OK,
00:42:27 the model has been loaded. Now, what prompt are we going to use? The prompt
00:42:35 is the one we gave here, which is my brother face. This is our
00:42:42 unique keyword, and then we append the other keywords that we want. Even though our model has
00:42:51 learned the face very well, as soon as we add new keywords to obtain different styles of
00:43:01 the learned face, it produces totally different images. Unfortunately, no matter how many times
00:43:09 I tried, all my attempts failed. It always produces different faces, not the face
00:43:17 it has learned. If I only give my instance prompt, yes, it produces my brother's face. But then
00:43:26 what is the purpose of training, if I am not able to modify it, change the style, or produce
00:43:33 different styles? Therefore, I will now do another training with SD version 1.5 and see the
00:43:41 difference between SD versions 2.1 and 1.5 for face training. Since SD version 1.5
00:43:51 uses a native resolution of 512 pixels, I am recropping the images, as you can see. I am cropping them again,
00:44:03 and I have removed some of the very old images; actually, I only removed two of them. Okay,
00:44:10 this is how I am setting up the images for SD version 1.5 training. Okay, yes,
00:44:21 like this. Save as zip, open the downloaded file, and extract the images into the folder.
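The video crops the images with a web-based tool and downloads them as a zip. A minimal local alternative, assuming a plain center crop is acceptable, computes the largest centered square and resizes it to 512×512. In practice you may want to shift the crop so the face stays in frame. The Pillow calls are shown only in a comment, so the sketch runs without extra packages:

```python
def center_square_box(width: int, height: int) -> tuple[int, int, int, int]:
    """(left, upper, right, lower) of the largest square centered in the image."""
    side = min(width, height)
    left = (width - side) // 2
    top = (height - side) // 2
    return left, top, left + side, top + side


# With Pillow installed, the box could be used as:
#   img.crop(center_square_box(*img.size)).resize((512, 512))
print(center_square_box(1920, 1080))  # (420, 0, 1500, 1080)
print(center_square_box(3000, 4000))  # (0, 500, 3000, 3500)
```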
00:44:33 Okay, here, like this. I will overwrite, and the training set is now ready. Okay, for 1.5,
00:44:40 I am first changing the model to 1.5. Then I go to the DreamBooth tab, and here we are
00:44:49 going to create a new model. Let's call it brother SD 15, like this. The
00:44:58 source checkpoint will be 1.5 because we are starting a new model. Let's create the model
00:45:06 like this. Okay, it is preparing the model file for training.
00:45:12 Everything else is the same. Then I click Training Wizard (Person). It will
00:45:17 set the parameters. Oh, I think I clicked it too early. Yeah, I didn't wait for the process to finish.
00:45:26 Okay, now I will set it again. Now the parameters are set and the model is also set. All right, you see,
00:45:33 the model has arrived here for training. I am just doing the same things as before. By the way, we now
00:45:41 need to generate new class images to improve accuracy. Also, this Apply Horizontal
00:45:53 Flip option means that at runtime it will sometimes feed horizontally flipped images into the
00:46:01 training. It won't write new images to the folder; it does that at runtime. Okay,
00:46:09 I am selecting LoRA. I am using 8bit Adam with FP16 and xFormers, and Don't Cache Latents.
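Runtime flipping means each image is mirrored in memory with some probability as it is loaded for a training step. Nothing new is saved to the instance folder. Here is a minimal sketch on a tiny "image" stored as pixel rows. The 0.5 probability is an assumption, not a value read from the extension:

```python
import random


def maybe_hflip(rows, p=0.5, rng=random):
    """Mirror an image (list of pixel rows) left-right with probability p.

    Returns a new list when flipping; the input (and any file on disk) is never modified.
    """
    if rng.random() < p:
        return [row[::-1] for row in rows]
    return rows


image = [[1, 2, 3],
         [4, 5, 6]]
print(maybe_hflip(image, p=1.0))  # [[3, 2, 1], [6, 5, 4]]
print(maybe_hflip(image, p=0.0))  # [[1, 2, 3], [4, 5, 6]]
print(image)                      # [[1, 2, 3], [4, 5, 6]]
print(maybe_hflip(image, rng=random.Random(0)))  # seeded: reproducible choice
```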
00:46:25 I am not changing other settings, because it was already learning well;
00:46:29 we just weren't able to generate good images, and I think that was due to
00:46:34 the model version, SD version 2.1. Okay, here is the path for our VAE. Then classification: we need a
00:46:48 new classification set, so I will make another folder, like this, and enter it here.
00:46:58 All right, we are leaving this empty because we are only teaching one face. I will give the
00:47:05 instance prompt the same name as the model. Now the class prompt. Choosing the class
00:47:14 prompt is important, so I will do a few tests here. Okay, with a simple prompt such as face photo of a man,
00:47:21 8K HDR, smooth, sharp focus, cinematography, we got decent faces. So this will be our class
00:47:32 prompt. Let's go back to our DreamBooth training; the class prompt will be like this.
00:47:39 Then the classification image negative prompt: let's copy and paste that too. By the way, don't worry,
00:47:45 I will share these in the comments. The sample image prompt will be the same as before.
00:47:55 Should I also provide a negative prompt for the samples? Yes, let's provide it. Okay,
00:48:03 how many class images do we want this time? How many images do we have in the folder? Let's check
00:48:12 once again. Okay, we have 14, so I will generate 140. I'm not touching
00:48:23 the rest. The parameters are set and everything looks good. Let's start another training.
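The class image count in this video follows a ratio of 10 per instance image: 16 images gave 160 earlier, and 14 images give 140 now. That ratio is the video's choice, not a fixed rule. Earlier, the resumed SD 2.1 run reused the class images already on disk. The sketch below assumes the extension only generates the missing difference:

```python
def class_images_needed(num_instance_images: int, per_instance: int = 10) -> int:
    # 10 class images per instance image is the ratio used in this video, not a fixed rule.
    return num_instance_images * per_instance


def class_images_to_generate(target: int, existing: int) -> int:
    # Assumption: images already in the classification folder are reused, not regenerated.
    return max(0, target - existing)


print(class_images_needed(16), class_images_needed(14))  # 160 140
print(class_images_to_generate(160, existing=160))       # 0 (resumed SD 2.1 run)
print(class_images_to_generate(140, existing=0))         # 140 (new, empty SD 1.5 folder)
```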
00:48:35 So you see, since we don't have any class images for this concept yet, it will first generate our class
00:48:43 images, as before. But this time we are using 512 pixel resolution. This is really important
00:48:50 because our base model is now version 1.5, whose native resolution is 512 pixels.
00:49:00 Generating class images is much faster now, you see, because the images are smaller.
00:49:12 Okay, the classification image set has been completed, and now the training has started. So
00:49:20 far we are at training step 50. You see, it is much faster than before because we are simply
00:49:29 working with about 0.44, which means 44% of the pixel count we had before. Before, we were working at
00:49:41 768 pixels; now we are working at 512 pixels. Therefore, it is more than two times faster.
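The 0.44 figure is the ratio of pixel counts for square images: (512 / 768)² = 4/9. Stable Diffusion actually trains on latents downsampled 8× per side, so the latent sizes are 64×64 and 96×96. The ratio between them is the same. Real speedups will not match the ratio exactly, because some costs do not scale with image size:

```python
def pixel_ratio(new_side: int, old_side: int) -> float:
    """Ratio of pixel counts for square images of the given side lengths."""
    return (new_side * new_side) / (old_side * old_side)


r = pixel_ratio(512, 768)
print(f"{r:.3f}")      # 0.444 (= 4/9)
print(f"{1 / r:.2f}")  # 2.25 times fewer pixels per image
# Latents are downsampled 8x per side: 64x64 vs 96x96, the same 4/9 ratio.
print((64 * 64) / (96 * 96) == r)  # True
```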
00:49:50 The new model's training checkpoints are also being saved under the models
00:49:57 LoRA folder, with the name that I have given, as you can see here.
00:50:01 Also, a new folder has been created under DreamBooth for the new training, here: brother
00:50:09 SD15. In it we can see the samples it is generating; so far, nothing resembles him at all.
00:50:17 Okay, the training has been completed. I let it run during the night while I was sleeping, so
00:50:23 it generated many checkpoints. Now I am going to use this particular checkpoint to generate our
00:50:31 .ckpt file from here, the same way as before. I have selected the model,
00:50:41 selected the checkpoint, given a name, and clicked Generate Ckpt. Now
00:50:51 I am going to load the newly generated ckpt file. To do that, just click refresh.
00:50:58 And it should appear. Yes, there it is. Now it is loading that model checkpoint.
00:51:05 It is done, and now we can do our tests. OK, I have generated over 600 images, and some of them
00:51:15 are really good and closely resemble the face we taught it. So the key thing is that you need to
00:51:23 generate more images with LoRA, because I think it is not as precise as DreamBooth. The prompt I have
00:51:31 used is portrait photo of brother SD 15, where brother SD 15 is my instance prompt, with weight 1.2. A weight of 1.2
00:51:40 means that the model gives more importance to this keyword. On the official Automatic1111
00:51:48 Stable Diffusion web UI wiki Features page, you can see the attention/emphasis section, which explains
00:51:55 how to give more attention to a word. You can wrap it in parentheses, like (word), or you
00:52:02 can set its weight directly, like (word:1.2). Either way works; it is up to you. So I have
00:52:10 given more importance to the instance prompt, and I have also written photo of brother SD 15.
00:52:17 Then I used generic keywords to generate images as close as possible to our instance prompt, such as
00:52:26 8K HDR, smooth, sharp focus, cinematic. I am going to share all these keywords in the comments of the
00:52:34 video. I have also entered a lengthy negative prompt. I used Euler a as the sampling method
00:52:44 with 25 steps, at 512 pixels, the native resolution of SD 1.5. So how can you generate more than 100
00:52:55 images? Set the batch count to 100, then go to the bottom. Here you will see the Script section.
00:53:03 By default it is set to None, but you can choose Prompts from file or textbox and just
00:53:10 paste your prompts. It will read each line and keep generating images
00:53:18 until all of the lines have been executed. This way, you can generate many more images. There
00:53:27 are other options here as well. For example, you can make X/Y plots: you give
00:53:35 X values and Y values and, as the hint says, you separate the values for each axis with commas.
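Here is a simplified model of how these two scripts expand into generation jobs. Prompts from file or textbox treats each non-empty line as one prompt. This sketch assumes the batch count applies to every line. X/Y plot splits each axis on commas and pairs every X value with every Y value. The real scripts may support additional syntax, such as numeric ranges, that is not modeled here. The prompt lines and axis values are hypothetical examples written in the style of this video:

```python
def prompts_from_text(text: str) -> list[str]:
    """One prompt per non-empty line (a simplified model of the script's behavior)."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def xy_cells(x_values: str, y_values: str) -> list[tuple[str, str]]:
    """Split comma-separated axis values and pair every X value with every Y value."""
    xs = [v.strip() for v in x_values.split(",") if v.strip()]
    ys = [v.strip() for v in y_values.split(",") if v.strip()]
    return [(x, y) for y in ys for x in xs]


# Hypothetical pasted prompts; (word:1.2) is the attention/emphasis syntax.
pasted = """portrait photo of (brother SD 15:1.2), 8K HDR, smooth, sharp focus, cinematic

photo of (brother SD 15:1.2), oil painting
"""
prompts = prompts_from_text(pasted)
batch_count = 100
print(len(prompts), len(prompts) * batch_count)  # 2 200

cells = xy_cells("oil painting, watercolor", "portrait, full body")
print(len(cells))  # 4
print(cells[0])    # ('oil painting', 'portrait')
```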
00:53:42 You can play with these to, for example, generate images in different styles by putting, let's say,
00:53:52 artist names or style names in your X values, while the Y values are like your regular
00:54:00 prompt. Okay, I have selected a few of the images, and now I will show you how to upscale them. The
00:54:09 resemblance is not as good as with DreamBooth, unfortunately, so you can also do DreamBooth
00:54:16 training instead. The only difference between DreamBooth and LoRA training is in the advanced settings.
00:54:22 You just don't check this LoRA option, and it will do DreamBooth training. Also be careful:
00:54:27 when you are doing DreamBooth training, it will generate 4 GB or 5 GB files at each checkpoint save.
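To see why the save interval matters for DreamBooth, multiply the number of saves by the checkpoint size. This is a rough estimate. It assumes exactly one save every N epochs, no extra final save, and about 5 GB per file. It uses the 350 epochs from the SD 2.1 run above as an example length:

```python
def checkpoint_disk_usage_gb(total_epochs: int, save_every: int, size_gb: float) -> float:
    """Disk space for periodic saves, assuming one save every `save_every` epochs."""
    saves = total_epochs // save_every
    return saves * size_gb


# 350 epochs (the length of the SD 2.1 run) with ~5 GB DreamBooth checkpoints.
print(checkpoint_disk_usage_gb(350, 10, 5.0))  # 175.0
print(checkpoint_disk_usage_gb(350, 50, 5.0))  # 35.0
```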
00:54:35 So you may want to save checkpoints less often, for example every 50 epochs instead of every 10.
00:54:43 It depends on your hard drive space. Okay, what I am going to do first
00:54:50 is check out PNG Info. Open this tab, then go to Pictures, into the brother selected folder, and pick
00:54:57 one of the images. You see, the web UI embeds the generation parameters as metadata. So if you have
00:55:05 an original image generated by the web UI, you can use PNG Info to extract the
00:55:13 parameters from it. Another thing I am going to show you is the Extras tab. In Extras,
00:55:20 let's first try a single image and upscale it. The best upscaling algorithm I have
00:55:28 found is R-ESRGAN 4x+; I like it a lot. Let's upscale by 3x. The
00:55:39 first time you do this, it may download the necessary models. Since I have done it before,
00:55:44 it didn't download anything this time. Okay, the upscaling is done. As you can see,
00:55:51 this is the upscaled version. You can also set GFPGAN visibility. GFPGAN is another model that
00:56:00 improves human faces. Let's apply it and see the difference.
00:56:10 Okay, it is done. And yes, now you see, it looks more like a real human. It fixes
00:56:21 the eyes very well, especially if they are misaligned or not symmetric.
00:56:29 So you may want to apply this as well.
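GFPGAN visibility behaves like a blend strength. My understanding is that the restored image is mixed with the unrestored one, similar to Pillow's `Image.blend`. This is an assumption, not checked against the web UI source. The sketch shows that linear mix on a few hypothetical grayscale pixel values:

```python
def blend(original: list[float], restored: list[float], visibility: float) -> list[float]:
    """Linear mix: visibility 0 keeps the original pixels, 1 uses the restored pixels."""
    return [o + (r - o) * visibility for o, r in zip(original, restored)]


original = [100.0, 200.0, 50.0]  # hypothetical grayscale pixel values
restored = [200.0, 100.0, 50.0]
print(blend(original, restored, 0.0))  # [100.0, 200.0, 50.0]
print(blend(original, restored, 0.5))  # [150.0, 150.0, 50.0]
print(blend(original, restored, 1.0))  # [200.0, 100.0, 50.0]
```

A value between 0 and 1 keeps some of the original texture while still correcting the face.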
00:56:34 You can also do batch processing. For batch processing, just open the folder, press Ctrl-A
00:56:41 to select all, and click Open. The images are loaded like this. Then set the parameters you want
00:56:48 here and click Generate, and it will apply them to all of the images as a batch. The
00:56:58 results of the batch are saved in a folder. If you want to open that folder,
00:57:03 just click this folder icon here. You see, it says Open images output directory, and the batch processing
00:57:08 results appear here. You can copy them directly and do whatever you want with them.
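Batch mode applies one set of settings to every selected file and writes each result to the output folder. The sketch below only plans that mapping. The accepted suffixes, the output folder name, and the output naming are assumptions for illustration. The web UI uses its own numbered file names:

```python
from pathlib import PurePosixPath

IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".webp"}  # assumed set of accepted formats


def plan_batch(filenames: list[str], output_dir: str) -> list[tuple[str, str]]:
    """Pair every selected image with an output path; the same settings apply to each."""
    out = PurePosixPath(output_dir)
    pairs = []
    for name in sorted(filenames):
        src = PurePosixPath(name)
        if src.suffix.lower() in IMAGE_SUFFIXES:
            pairs.append((str(src), str(out / f"{src.stem}.png")))
    return pairs


# Hypothetical selection (as if every file in the folder was picked with Ctrl-A).
selected = ["b.JPG", "a.png", "notes.txt"]
for src, dst in plan_batch(selected, "outputs/extras-images"):
    print(src, "->", dst)
# a.png -> outputs/extras-images/a.png
# b.JPG -> outputs/extras-images/b.png
```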
00:57:17 Okay, this is all for today. Please ask any questions that you might have.
00:57:23 To improve the results, I suggest using DreamBooth instead of LoRA training. Also,
00:57:31 you can improve your data set. Our data set was not very good: you see,
00:57:40 the images were captured at almost the same time, in the same pose. If you add more variety to your data set,
00:57:48 you will most likely obtain better results. Also, please like, share, and subscribe to our channel
00:57:56 if you enjoyed this video. And if you support us on Patreon, we would appreciate it very much. Currently,
00:58:02 we have one Patreon supporter, as you can see. Thank you very much to our beloved patron, by the way,
00:58:08 and I hope you will support us as well. Hopefully, more advanced videos about Stable Diffusion will
00:58:15 follow. If there is something about Stable Diffusion you would like to learn, let me
00:58:22 know in the comments, and hopefully I will make videos about it. See you in another video.