Zero To Hero Stable Diffusion DreamBooth Tutorial By Using Automatic1111 Web UI - Ultra Detailed #302
FurkanGozukara
announced in
Tutorials
Full tutorial: https://www.youtube.com/watch?v=Bdl-jWR3Ukc
Our Discord: https://discord.gg/HbqgGaZVmr. The most advanced tutorial of Stable Diffusion DreamBooth training. If I have been of assistance to you and you would like to show your support for my work, please consider becoming a patron on 🥰 https://www.patreon.com/SECourses
Playlist of Stable Diffusion Tutorials, Automatic1111 and Google Colab Guides, DreamBooth, Textual Inversion / Embedding, LoRA, AI Upscaling, Pix2Pix, Img2Img:
https://www.youtube.com/playlist?list=PL_pbwdIyffsmclLl0O144nQRnezKlNdx3
I am explaining, from scratch to a very advanced level, how to use the #Automatic1111 Web UI and D8ahazard's #DreamBooth extension to teach new subjects, e.g. your face, to a model. Moreover, I am showing how to inject your taught face into a completely new model, e.g. Protogen x3.4, to produce awesome-quality images without wasting too much time on finding correct prompts.
Automatic1111
https://github.com/AUTOMATIC1111/stable-diffusion-webui
How to install Web UI: https://youtu.be/AZg6vzWHOTA
How to use #StableDiffusion different models on Web UI:
https://youtu.be/aAyvsX-EpG4
Official SD v1-5-pruned : https://bit.ly/sd15ckpt
How To Do LoRA Training: https://youtu.be/mfaqqL5yOO4
Wiki Ram memory: http://bit.ly/3IqFUeW
Rare tokens: https://bit.ly/SDRareTokens
Rare tokens list: https://bit.ly/SDRareTokensList
Basics wiki: http://bit.ly/3Yy78pn
DreamBooth paper
https://arxiv.org/pdf/2208.12242.pdf
Best caption: https://bit.ly/bestcaption2
00:00:00 Introduction to Grand Master yet most beginner friendly Stable Diffusion Dreambooth tutorial by using Automatic1111 Web UI
00:03:11 How to install DreamBooth extension to the Web UI
00:04:09 How to update installed extensions on the Web UI
00:04:35 Introduction to DreamBooth extension tab
00:04:45 Training model generation for DreamBooth
00:05:34 How to download official SD model files
00:06:21 Training model selection and settings tab of the DreamBooth extension
00:07:36 What are training steps per image (epochs)
00:08:24 Checkpoint saving frequency
00:09:15 What is training batch size in DreamBooth training and how to set it properly
00:10:47 Set gradients to none when zeroing
00:11:24 Gradient checkpoint
00:12:04 Image processing and resolution
00:12:39 Horizontal flip and Center crop
00:12:50 What is Sanity sample prompt and how to utilize it to understand overtraining
00:13:30 Best options to set in Advanced tab of DreamBooth extension
00:14:22 Step Ratio of Text Encoder Training
00:14:49 Concepts tab of the DreamBooth extension
00:15:27 How to crop images from any position with Paint .NET or use Birme .NET
00:17:22 Setting training dataset directory
00:17:44 What are classification images
00:18:46 What is Instance prompt
00:19:05 How to and why to pick your instance prompt as a very rare word (very crucial)
00:21:52 Class of the subject
00:22:15 Everything about class prompt
00:22:55 Sample prompt
00:23:30 Class images per instance
00:25:00 Number of samples to generate
00:26:27 Teach multiple concepts in 1 run
00:28:24 Saving tab
00:29:10 How to generate checkpoints during training
00:30:52 Generating class images before start training
00:33:28 What is batch size in txt2img tab
00:36:09 Start training
00:38:25 First samples/previews of training
00:39:13 Sanity prompt sample
00:39:54 How to understand overtraining with sanity samples
00:40:34 How to properly prepare your training dataset images
00:43:15 Checkpoint saving during training
00:44:30 What is Lr displayed in cmd during training
00:45:38 How to continue / resume training if an error occurs or you cancel it
00:46:41 When we started overtraining and how we detected it
00:48:24 How to start generating our subject (face) images from best trained checkpoint
00:50:09 What is prompt strength / attention / emphasis and how to increase it
00:51:17 How to increase image quality with negative prompts
00:51:50 How to get your taught subject with the correct prompting
00:52:31 What is CFG and why should we increase it
00:52:54 How to try multiple CFG scale values by using X/Y prompting
00:54:54 Analyzing CFG effect
00:56:03 How to test different artist styles with different CFG scales by using X/Y plot
01:00:47 How to use prompt matrix
01:02:54 Prompts from file or text box to test many different prompts
01:03:57 Generate thousands of images while sleeping
01:04:22 PNG info to learn used prompts, CFG, seed and others
01:07:00 Extras tab to upscale images by using AI models with awesome quality
01:09:54 How to improve eyes and face quality by using GFPGAN
01:11:35 How to continue training from any saved ckpt checkpoint
01:12:06 How to upload your trained model to Google Colab to use
01:14:19 How to teach a new subject to your already trained model
01:15:55 How to use filewords for training
01:21:52 What is fine tuning and how it is done
01:23:10 Hybrid training
01:24:39 How to understand out of memory error
01:25:39 Lowest GPU VRAM settings
01:27:35 How to batch preprocess images
01:31:47 How to generate very correct descriptions by using GIT large model
01:33:19 How to inject your trained subject into any custom / new model
01:37:36 Where is model hash written and how to compare
Video Transcription
00:00:02 Greetings everyone.
00:00:03 Welcome to the most beginner-friendly and yet the most advanced and up-to-date Stable
00:00:07 Diffusion DreamBooth model training tutorial.
00:00:09 In this guide video, I am going to use the latest Automatic1111 web UI and the DreamBooth
00:00:14 extension.
00:00:16 The interface and the features of the DreamBooth plugin have been significantly changed, so
00:00:20 all other tutorials are now obsolete.
00:00:23 I have been experimenting for over 7 days to find the best settings and the training
00:00:27 parameters.
00:00:28 Moreover, I tried to learn what each option does and I have explained everything in this
00:00:33 video.
00:00:34 Before starting, let me provide some quick info.
00:00:37 Stable Diffusion is a text-to-image generative public AI model, and the Automatic1111 web
00:00:42 UI is a tool developed by the open source community to use Stable Diffusion easily.
00:00:47 DreamBooth is an AI algorithm that allows you to teach new subjects or even styles to
00:00:52 existing Stable Diffusion models very successfully, such as teaching the face of a person.
00:00:57 In this tutorial, I am going to use freshly installed Automatic1111 web UI to teach my
00:01:02 face by using Stable Diffusion 1.5 official version.
00:01:05 I will also show how you can do the same training on Stable Diffusion version 2.1 as well.
00:01:11 Moreover, I will show you how you can inject your trained subject, in this case my face,
00:01:16 into any custom model and obtain amazing results.
00:01:19 I will demonstrate an example by using the very popular and very high-quality custom
00:01:24 model Protogen x3.4.
00:01:26 With this injection methodology, you can use any namely released custom model and obtain
00:01:32 even better results.
00:01:33 You won't even need to retrain your subject for this to work.
00:01:37 This method provides such high-quality images that you cannot obtain even on paid services
00:01:42 like Lensa or Midjourney.
00:01:44 The Automatic1111 web UI is getting constantly updated, so let me show you the version I
00:01:49 am using from official repository.
00:01:53 This is the official repository of the Stable Diffusion web UI.
00:01:57 It has been recently taken down, but it is now back again.
00:02:00 So if you can't find this URL, just check out the video and I will update the description
00:02:05 of the video and the comment of the video so you will find the latest link of the Automatic1111.
00:02:11 So the commit we are using is published 9 hours ago, January 7, 2023.
00:02:20 If you don't know how to install Automatic1111 web UI, I have a great tutorial for that.
00:02:25 So this is the homepage of our YouTube channel.
00:02:27 Go to playlist and in here you will see Stable Diffusion DreamBooth playlist and in this
00:02:34 playlist, easiest ways to install and run Stable Diffusion web UI on PC.
00:02:38 I will put the link of this video to the description and also you can watch how to use Stable Diffusion
00:02:43 version 2.1 and different models in the web UI.
00:02:46 This is also very important.
00:02:47 I will also put the link of this video to the description as well.
00:02:51 One more thing.
00:02:52 This is commonly asked.
00:02:54 If you encounter any problem, go to about page of our channel and in here you will see
00:03:00 our Discord channel link.
00:03:01 As you can see, I am currently hovering that.
00:03:03 You can join our Discord channel and ask me any questions that you encounter.
00:03:08 So this is our beginning screen of the Stable Diffusion.
00:03:11 And first let's start with installing our extension, DreamBooth.
00:03:14 To do that, go to the extensions tab, click Available, click Load from, and in here you will see the DreamBooth
00:03:21 extension.
00:03:23 When you type DreamBooth, it is listed in here.
00:03:26 I am just clicking install and it is getting installed.
00:03:29 You should see a message here: OK, it has been installed.
00:03:35 We have one error, but it is not a problem.
00:03:38 It still works.
00:03:39 So you see, we have a message in the CMD window, and it has been installed into the web UI tutorial
00:03:45 extensions folder as the DreamBooth extension.
00:03:47 Now we have to restart the CMD window because we are installing for the first time and it is
00:03:53 a necessity.
00:03:54 Otherwise it won't work.
00:03:56 Let's close.
00:03:58 Let's restart.
00:03:59 OK, restart has been completed.
00:04:02 Let's just refresh and then go back to extensions and check for updates every time you start.
00:04:09 OK, it has just been updated.
00:04:12 So I'm just clicking apply and restart UI.
00:04:14 OK, it is done.
00:04:16 After the first-time installation,
00:04:19 you don't need to restart the CMD window again.
00:04:23 So you see, this is how frequently this stuff gets updated.
00:04:26 Literally, it has been updated just now, as you can see.
00:04:30 So you should always check the latest version.
00:04:33 Now we can start our tutorial.
00:04:35 Now we see the DreamBooth tab in the interface.
00:04:38 We click that.
00:04:39 This is the interface where we are going to generate our model and train our face or
00:04:44 new subject.
00:04:46 First of all, we need to generate our model.
00:04:51 You can simply enter any name here.
00:04:53 It doesn't matter.
00:04:54 So I will enter 'web UI' as the name, and the identifier prompt of my model, which will be ohwx.
00:05:02 I will explain why it will be ohwx.
00:05:05 Then we need to choose the source checkpoint.
00:05:08 You can also import from Hugging Face, but I don't suggest that; it is not necessary.
00:05:12 I am choosing the version 1.5 pruned ckpt.
00:05:17 The version 1.5 pruned ckpt is available in the official repository of Stable Diffusion 1.5.
00:05:24 You can just download it from here.
00:05:26 Why are we using the pruned ckpt, not the pruned-emaonly ckpt? Because it is better for training
00:05:33 new subjects.
00:05:34 When you click here, you can just download it by clicking here.
00:05:39 And after you put that into your model folder, it will be also available here, as you can
00:05:44 see.
00:05:46 OK.
00:05:48 Then just click the create model button.
00:05:51 OK, you see.
00:05:54 We have a message checkpoint successfully extracted to this folder.
00:05:59 Where it is.
00:06:00 Let me show you.
00:06:01 It is inside Web UI Tutorial.
00:06:04 And let's go to our models and inside DreamBooth, inside Web UI ohwx and in here working.
00:06:11 And these are actually weights of the model that we have just composed.
00:06:17 Let's continue.
00:06:19 Now this model is selected here.
00:06:21 This is where we make the selection.
00:06:24 After we make this selection, we will train the selected model.
00:06:28 Yes.
00:06:29 OK.
00:06:30 Now let's go to the settings tab in here.
00:06:33 First click performance wizard.
00:06:35 It will set the parameters according to the VRAM of your GPU.
00:06:39 If you have less than 12 GB of GPU VRAM, it is really hard to use DreamBooth.
00:06:44 Unfortunately.
00:06:45 You can use LoRA, but it is a topic of another video.
00:06:48 Actually, it is almost the same as this video, but there are
00:06:52 just a few tricks, and I already have a video for LoRA.
00:06:56 So after watching this video, if you watch that LoRA video,
00:07:00 you can easily apply LoRA to your training.
00:07:05 It is in here.
00:07:06 You see how to do Stable Diffusion LoRA training by using.
00:07:08 I will also put the link of this video to the description as well.
00:07:13 So, training steps per image (epochs).
00:07:16 First of all, let me explain what an epoch is.
00:07:19 We will have a training data set, the pictures of the subject that we are going to teach.
00:07:26 In this case, I am going to teach myself.
00:07:29 I will use 12 images of myself.
00:07:33 Therefore, one epoch means 12 steps.
00:07:38 So each step is a training step, and each epoch is training on all of the training images one
00:07:44 time.
00:07:45 So one epoch means 12 steps in my case, because I have 12 training images.
00:07:50 And how many epochs do we want?
00:07:52 For teaching faces, 150 is usually suggested.
00:07:57 So when you go to the concepts, just click training with a person.
00:08:00 It will set the most appropriate values for person.
00:08:05 So you see, now it is set to 150.
00:08:07 However, you can set this as high as you want, and you can use certain checkpoints.
00:08:13 I will explain that.
00:08:14 So I'm just going to make it 300.
00:08:17 And how much time do you want to wait between each epoch? Zero. This is also zero.
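As a quick sanity check on the epoch arithmetic above, here is a minimal Python sketch; the numbers are the ones used in this tutorial (12 images, 300 epochs), not requirements:

```python
# Hypothetical numbers from this tutorial: 12 training images, 300 epochs.
num_training_images = 12   # one epoch = one full pass over the training images
num_epochs = 300

steps_per_epoch = num_training_images        # 12 steps per epoch at batch size 1
total_steps = steps_per_epoch * num_epochs   # 12 * 300 = 3600 training steps
print(f"{steps_per_epoch} steps/epoch, {total_steps} total steps")
```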
00:08:22 OK, this is important.
00:08:24 How frequently we want to save our training.
00:08:29 You know, if your computer crashes, if you cancel your training, if whatever happens,
00:08:34 you will be able to continue from your latest saved model.
00:08:40 Therefore, this is important.
00:08:42 Also, if you do over training and you want to use previous training checkpoint, you also
00:08:48 need to have a save.
00:08:49 So I'm going to set this as 10.
00:08:50 Be careful: when you are doing DreamBooth training, each save usually takes about 4 to
00:08:55 5 gigabytes.
00:08:58 So if you don't have much hard drive space, you may need to set this to a higher number.
00:09:04 This is saving preview images each epoch, for example, or for whatever the number of
00:09:08 epochs you want.
00:09:09 This doesn't take space, but this will slow you down.
00:09:13 So I'm just going to leave this as five.
00:09:17 Batch size: Now, this is very important.
00:09:19 If you increase batch size, it will speed up your training significantly.
00:09:23 However, this will also increase your GPU memory usage significantly as well.
00:09:29 If you increase these numbers, you need to increase both of them equally to obtain the
00:09:35 best results.
00:09:36 So now, for example, it will be almost four times faster.
00:09:40 Also, make sure that your training image count is divisible by this number.
00:09:45 So two multiplied by two makes four, and your number of training images must be divisible
00:09:54 by four.
00:09:55 So it can be four images, eight images, 12 images, 16 images, 20 images, but it shouldn't
00:10:02 be 17 images.
00:10:03 OK, this is the formula.
00:10:06 Let's say you have 16 gigabytes of GPU RAM, then you can make this three by three.
00:10:12 And then you should have nine or 18 or 27 or 36 images.
00:10:17 That is the formula.
00:10:18 I'm just going to leave this one by one for now.
00:10:22 Also, another thing: if you make this two and two, like this, it will be four times faster.
00:10:28 Then you need to also increase the learning rate by four times, like this and this.
00:10:35 Otherwise it will be very slow.
00:10:36 Increasing these also requires speeding up the learning rate
00:10:40 proportionally.
00:10:42 Since I will use one by one, I am just going to leave the default learning rate.
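A minimal sketch of the rule of thumb just described. Treating "two and two" as batch size times gradient accumulation steps is my assumption about the two fields shown on screen, and the proportional learning-rate scaling is the video's heuristic, not a hard rule:

```python
# Rule of thumb from the video: if the effective batch size goes up 4x,
# scale the learning rate up ~4x too, and keep the training image count
# divisible by the effective batch size.
base_lr = 2e-6          # the default DreamBooth learning rate shown in the UI
batch_size = 2          # "two and two" from the example above (assumption)
grad_accum_steps = 2

effective_batch = batch_size * grad_accum_steps   # 4
scaled_lr = base_lr * effective_batch             # 8e-6

num_images = 12
assert num_images % effective_batch == 0, "image count should divide evenly"
print(f"effective batch {effective_batch}, scaled learning rate {scaled_lr:.0e}")
```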
00:10:46 OK, set gradients to none when zeroing.
00:10:50 If you select this, it will increase the GPU RAM usage.
00:10:55 How can you know that?
00:10:57 The DreamBooth extension has a wiki, and in there they have RAM usage settings.
00:11:03 Let me show you.
00:11:06 OK in here: settings known to use more VRAM.
00:11:10 High batch size, as I just explained.
00:11:12 Setting gradients to none when zeroing, which is these settings in here.
00:11:18 So when you check this, it will use more VRAM and then use EMA.
00:11:23 OK.
00:11:24 Now let's continue.
00:11:25 And I will explain.
00:11:26 Gradient checkpointing: This is a technique to reduce memory usage by clearing activations.
00:11:31 So it is good to check it.
00:11:35 And then we are just passing over the next options.
00:11:37 These are more advanced things to play with.
00:11:42 After you get used to how to use the DreamBooth, you can just change them, but in the learning
00:11:47 stage just leave them as they are.
00:11:49 If you set these too high, the model will train too fast;
00:11:53 however, it will also overtrain easily.
00:11:56 If you set them too low, then you may never get it trained.
00:12:01 So this is an experimental thing; you need to do a lot of experimentation.
00:12:05 Image processing and resolution.
00:12:07 This is important.
00:12:08 When you use a model based on version 1.x, it is 512 pixels.
00:12:18 If you use version 2.1, there is also a 768-pixel version.
00:12:24 So you need to set this according to the version of your base model.
00:12:28 OK, the base model, the source checkpoint.
00:12:30 We checked here.
00:12:32 Since we are using version 1.5, official version.
00:12:35 It is 512 pixels.
00:12:39 Don't apply horizontal flip.
00:12:40 This is not good for faces.
00:12:42 Center crop.
00:12:44 If your images are not cropped, you should check this out.
00:12:46 I will explain how to set your images.
00:12:49 Since my images are center cropped, I am not checking this.
00:12:53 Sanity sample prompt.
00:12:54 OK, this is important.
00:12:55 We are going to use this prompt to see the overall state of the training of the model,
00:13:02 in terms of whether it is overtraining or not,
00:13:07 during the training. I will explain.
00:13:11 So I am going to enter here photo of ohwx man by Tomer Hanuka.
00:13:16 I will explain why I entered this prompt.
00:13:20 And by Tomer Hanuka.
00:13:22 You will understand it.
00:13:23 Miscellaneous, pre-trained VAE path:
00:13:26 these are advanced things that you don't currently need.
00:13:28 OK.
00:13:29 OK, advanced stuff.
00:13:31 This is important.
00:13:33 If you check the Use EMA box, then it will improve your training quality.
00:13:38 However, it also increases the RAM usage significantly.
00:13:41 Use eight bit Adam: This will reduce the RAM usage.
00:13:45 BF16: This is also.
00:13:47 This will also reduce RAM usage.
00:13:50 xFormers: This will significantly increase your training speed.
00:13:54 Cache Latent: This will also reduce the VRAM usage.
00:13:59 All of these are actually written in this page.
00:14:02 The out-of-memory topic of the wiki.
00:14:05 I will put this into the description.
00:14:08 So you see these are all decreasing the RAM usage.
00:14:11 Actually, it says that caching latents increases VRAM usage, but as far as I know it does not.
00:14:19 But you can test that.
00:14:22 So the Step Ratio of Text Encoder Training.
00:14:25 This will improve your training quality.
00:14:27 However, it will also increase the RAM usage of the graphic card.
00:14:31 So if you encounter an out-of-memory error, you should set this to zero.
00:14:37 But the optimal value for faces is 0.7, for style 0.2.
00:14:44 You don't need to play with the other things.
00:14:46 They are more advanced stuff.
00:14:49 OK, now the concepts.
00:14:51 This is the very important part.
00:14:54 You can set [filewords], prompts, and directories.
00:14:59 So first of all we have to set our training data set.
00:15:03 Training data set directory.
00:15:04 Where is my training dataset?
00:15:07 It is inside my pictures folder and it is in here Best DB.
00:15:12 So all of these images are now 512 by 512 pixels.
00:15:19 Let me show their original version.
00:15:21 So their original version is here.
00:15:26 How did I set them like this?
00:15:27 I have used Paint.NET to crop them the way I want.
00:15:31 For example.
00:15:32 Let me show you: paint dot net is a free tool, by the way.
00:15:36 You can find it via Google.
00:15:40 Just click like this, and then I crop them to a square.
00:15:46 So I click rectangle select, then click here, then set a fixed ratio in here, like this. Then
00:15:52 you can pick any part of the image you want.
00:15:55 Just for example, here.
00:15:57 Then you can press ctrl-C, then ctrl-N, and it will paste into a new canvas.
00:16:01 You can save it.
00:16:03 Or, in here,
00:16:04 you can resize it to a low resolution like this with ctrl-R; it will open the resize dialog. Type the size
00:16:10 like this, then ctrl-V and expand.
00:16:13 You see, now it is cropped.
00:16:15 Alternatively, you can use Birme .NET.
00:16:17 Birme dot net is a famous site to crop images.
00:16:22 It is commonly used in the community.
00:16:25 You can just, for example, upload any image there and crop them.
00:16:30 For example, let's upload this image.
00:16:33 These are currently squared, but if they are not square, it will also automatically let
00:16:37 you square them.
00:16:38 Let me show: OK, you see, both of these images are not cropped.
00:16:42 So you are able to crop them with your mouse like this: set the position, then set the
00:16:48 resolution from here: 512, 512.
00:16:51 If you use SD version 2.1, then they should be 768 pixels.
00:16:55 OK, you can also use auto detect image focal point.
00:16:58 Do not resize.
00:16:59 And you can click here.
00:17:01 If you check 'do not resize',
00:17:03 they won't be resized to this resolution.
00:17:06 Then 'save as zip', and all of them will be saved as a zip.
00:17:08 Then you can extract them with the software you have.
00:17:12 If you don't have any software like WinRAR, Windows is still able to extract them.
00:17:17 All right.
00:17:18 If you can't make them, just join Discord and I will help you, hopefully.
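If you would rather script the cropping than use Paint.NET or Birme, here is a minimal sketch using the Pillow library. The folder names are placeholders, and it does a simple center crop rather than the hand-picked crop shown above:

```python
from pathlib import Path

from PIL import Image  # pip install Pillow

SRC = Path("raw_photos")   # placeholder input folder
DST = Path("Best DB")      # placeholder output folder, named as in the video
DST.mkdir(exist_ok=True)

for path in SRC.glob("*.jpg"):
    img = Image.open(path)
    side = min(img.size)                         # largest centered square
    left = (img.width - side) // 2
    top = (img.height - side) // 2
    img = img.crop((left, top, left + side, top + side))
    img = img.resize((512, 512), Image.LANCZOS)  # use 768 for SD 2.1 768 models
    img.save(DST / path.name)
```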
00:17:22 So data set directory.
00:17:23 When your images are ready, we will enter the path to them.
00:17:27 So this is mine.
00:17:28 Let me enter the folder directory.
00:17:32 I click here and you see I am able to select the path.
00:17:34 I do control-C to copy it, paste it here (ctrl-v).
00:17:38 So this is the directory where my training images are located.
00:17:43 Classification directory: Now, what is classification?
00:17:46 Classification images are generic images that we will use to avoid overtraining our model and also to
00:17:54 keep the inner sanity of the model,
00:17:58 so that the entire model does not start looking like us.
00:18:02 OK.
00:18:03 So for this I will just generate a new folder.
00:18:06 Yes, I have copy pasted the path.
00:18:10 I will set it as web UI tutorial.
00:18:13 You can also enter another existing directory.
00:18:16 It is fine.
00:18:18 Instance token: Now, [filewords] are used to set a different description for each training
00:18:25 image.
00:18:26 This is very, very advanced and hard to do.
00:18:29 So I will explain this in the later parts of the tutorial video.
00:18:34 For now I will just skip them.
00:18:35 You can also skip to that part in the video, because I will put the sections of the video
00:18:42 into the description.
00:18:43 Now prompts: This is very important.
00:18:46 The instance prompt is used to define the keyword that will activate our new subject
00:18:53 that we taught to the model.
00:18:56 So in here you have to pick a unique word, but it has to be very specific and rare.
00:19:05 Whatever you enter into the model
00:19:07 will get turned into tokens.
00:19:10 It will be split into tokens.
00:19:11 So there is a reddit thread that explains the rare tokens.
00:19:15 I will put the link of this page in the description, and in here the rarity of the tokens is listed.
00:19:24 So, for example, say you have entered 'mill'.
00:19:29 It is a single token, but 'mill' probably appears a lot in real life.
00:19:34 Therefore, you have to go to the bottom and try to find rare tokens that you can't make
00:19:40 sense of.
00:19:42 For example, these.
00:19:43 Also, beware that these tokens may be used in other languages as well.
00:19:49 For example, from here: ohwx is a very famous token because it is a token that almost
00:19:57 does not exist anywhere.
00:19:59 When I type ohwx into Google, you see all unrelated things.
00:20:06 They look like spam.
00:20:07 So this is a good token, and, for example, you can also try other tokens here that look
00:20:14 weird to you.
00:20:15 Maybe this one?
00:20:16 Yes, this one, OK.
00:20:17 I'm not sure if this is a real name or not, so you can verify it, but ohwx works very
00:20:27 well and the token you pick is extremely important.
00:20:31 Because your training will begin from that token, you want to inject a new token that
00:20:37 does not exist in the database; everything you enter will be split into tokens
00:20:44 that the model already knows.
00:20:46 Even if you make up a new keyword, such as SECourses, the model will not see this as
00:20:54 'SECourses'.
00:20:55 How will it see it?
00:20:57 First it will look at 'S', then 'SE'.
00:21:01 So 'SE' does exist, OK.
00:21:03 Then it will look at 'sec'.
00:21:05 So, yes, 'sec' also exists.
00:21:10 And then it will look at 'seco'.
00:21:12 OK, there is no 'seco', so it will be split at 'sec'.
00:21:16 And then it will check the remaining characters,
00:21:23 so they will all get split too; our SECourses will probably become 'sec', 'our', 'ses' or something
00:21:33 like that.
00:21:34 You see, you understand it now, I hope.
00:21:37 So the keyword you enter will get split into tokens, no matter what you enter.
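You can check this splitting yourself. Here is a minimal sketch assuming the Hugging Face transformers library; SD 1.x uses OpenAI's CLIP ViT-L/14 text encoder, whose tokenizer is public. The exact split quoted in the video is the speaker's guess, so print it rather than trusting it:

```python
from transformers import CLIPTokenizer  # pip install transformers

# SD 1.x uses the tokenizer of OpenAI's CLIP ViT-L/14 text encoder.
tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-large-patch14")

for word in ["ohwx", "SECourses", "mill"]:
    print(word, "->", tokenizer.tokenize(word))
# A good rare instance token should come out as a single token, while an
# invented word like "SECourses" splits into several already-known pieces.
```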
00:21:44 Therefore, we are picking a single token that is very rare from this list and I have done
00:21:51 many tests.
00:21:52 So ohwx is working very well and then we need to enter the class of the subject we are going
00:21:58 to teach.
00:21:59 What am I going to teach?
00:22:00 I am going to teach my own face.
00:22:03 So it's the face of man.
00:22:04 Therefore, I am just entering man.
00:22:07 So this is really important.
00:22:09 It will use the underlying knowledge of man in the model to learn my face.
00:22:15 Class prompt: now, as I said, this will be used to keep sanity of our model and prevent
00:22:21 overtraining.
00:22:22 When you also hover it, it says: read me for more info.
00:22:27 I wonder if they have added it to the wiki yet.
00:22:31 In the basics perhaps?
00:22:33 OK, in the wiki, in the basics they have a small explanation.
00:22:38 A class specific prior preservation loss is also introduced to prevent overfitting and
00:22:44 encourage the generation of diverse instances of the same class.
00:22:49 They have made an example like this.
00:22:51 So in class prompt I am going to enter photo of man.
00:22:55 OK, you see, these two are the same. And the sample prompt:
00:23:00 This will be used to generate preview images during the training so we will be able to
00:23:04 see how the training is going on and if it is becoming too overtrained or not.
00:23:11 So in here I am going to enter photo of ohwx man.
00:23:15 OK, I am not entering any negative prompts and I'm not using any sample prompt template.
00:23:22 These are more, let's say, advanced things that you can also play with after you have
00:23:28 learned the basics.
00:23:30 And in here, class images per instance.
00:23:32 In the community it is usually said to have a minimum of 300 images in total.
00:23:39 In the official DreamBooth paper,
00:23:42 which is here.
00:23:45 I will also put the link of this paper to the description.
00:23:48 They have used 200 classification images.
00:23:53 I have made some tests but I can't say for sure how much minimum is necessary.
00:23:59 So I am just going to follow the community, and to reach the 300 images I need to enter:
00:24:06 let's easily calculate 300 divided by the number of training images.
00:24:12 I have 12, so 25.
00:24:14 You can also calculate like this.
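The calculation above is simple division; here it is as a one-liner, rounding up when the numbers don't divide evenly:

```python
import math

target_class_images = 300   # the community rule of thumb mentioned above
training_images = 12
per_instance = math.ceil(target_class_images / training_images)
print(per_instance)         # 25 class images per instance image
```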
00:24:16 So, classification CFG scale.
00:24:18 This is the same as the txt2img CFG scale: how much CFG scale do you want to use for
00:24:25 generating classification images?
00:24:27 By the way, you can also use text2image tab to generate your classification images.
00:24:32 Put them into the folder that we set here.
00:24:35 Then the extension will not generate any new images.
00:24:40 It is up to you.
00:24:41 You can use both ways, but if you use this way, it will also generate a text description
00:24:47 file with the same name as the image, and it will put the description you typed here inside it.
00:24:54 That I will show in a moment.
00:24:56 Classification steps.
00:24:58 So this is equal to the number of sampling steps in here.
00:25:01 Sampling steps: OK. And number of samples to generate.
00:25:05 So this is the number of samples that we want to be generated during the training to see
00:25:10 how the training is going on.
00:25:12 You can set this to 1, 2, 3, 4, whatever you want.
00:25:15 Sample seed: -1.
00:25:17 It means that every image generated for samples will be different, with a random
00:25:23 seed. And the sample CFG scale: 7.5.
00:25:27 You don't need to change this.
00:25:30 These are just same as the text2image.
00:25:32 You will make sense of it after you get used to text to image.
00:25:36 OK, and now let's return back here: how many images do we want to generate for classification
00:25:45 at the same time, in parallel?
00:25:47 So I have 12 GB of VRAM.
00:25:50 Therefore, I am able to generate 10 images per batch, so it will take less time
00:25:56 to generate classification images.
00:25:58 By the way, you only need to generate classification images one time for each class prompt.
00:26:06 So if you don't change 'photo of man', if you don't change your subject class, then
00:26:11 you don't need to generate them again.
00:26:14 To show you, I will just set this to five, and you will understand:
00:26:20 it will generate images in batches of five.
00:26:24 OK.
00:26:25 And one more thing: you can teach up to three concepts at a time to the model.
00:26:34 So the first concept is, let's say, me, and in here I can also teach my wife's picture,
00:26:42 for example.
00:26:43 It can be like wife DB.
00:26:45 So, another folder. And can its classification dataset be exactly the same as the other one?
00:26:52 No, it wouldn't be, because it would need to be related to women,
00:26:55 since the subject will be a woman, not a man.
00:26:59 Therefore, let's say woman images, and in here you need to use another keyword for that.
00:27:06 So it is important to find a rare keyword from this list.
00:27:13 I don't know which ones are very rare, but 'ske' is commonly used as another instance token.
00:27:21 So it can be like 'ske woman'.
00:27:26 And in here it will be 'photo of woman', and the sample will be 'photo of ske woman'.
00:27:38 OK, and the rest is same.
00:27:40 And you can also add another concept here.
00:27:43 But the only things that matter are the class of the other subject,
00:27:50 whether it is a cat or a dog or a tree, whatever you are teaching, and the instance prompt,
00:27:59 so that you can call them separately, and you can use both of them in a single picture.
00:28:05 For example, you can generate pictures of your wife and yourself in the same picture,
00:28:09 or your dog and yourself in the same picture.
00:28:12 But for this tutorial I am not going to teach multiple concepts, so it is up to you to teach
00:28:18 or not.
00:28:19 I will just teach a single concept.
00:28:23 All right.
00:28:24 Now we are moving to saving tab.
00:28:27 In here you can enter a custom model name for saving checkpoints and LoRA models.
00:28:33 You can check the half-model option.
00:28:34 They say that it doesn't decrease the quality, but the checkpoints are smaller.
00:28:41 I didn't test it, so I can't say if that is 100 percent correct or not.
00:28:46 So to keep the quality at max, I won't check it.
00:28:51 Save checkpoints to subdirectory:
00:28:53 you should check this checkbox
00:28:55 so that the saved checkpoints go under web UI ohwx;
00:29:01 they won't land in the same directory.
00:29:04 Now this is important to set.
00:29:06 Generate a ckpt file when saving during training.
00:29:10 If you don't check this, then you won't be able to load back and test
00:29:17 the model at epoch 20 or 40 or 60.
00:29:22 So you should check this.
00:29:24 You can also continue from that point, using that checkpoint as a base model.
00:29:27 And you can also load that model and you can do test inference on that.
00:29:35 So this is important, but this will increase your hard drive usage.
00:29:40 Be careful with that.
00:29:41 Generate a ckpt file when training completes.
00:29:43 Yes.
00:29:44 Generate a ckpt file when training is canceled.
00:29:46 I'm not checking this because when I cancel I don't want it to generate a ckpt.
00:29:53 After canceling you can just load the model and click ckpt and it will generate a ckpt
00:29:58 file from the last saved weights.
00:30:00 Now weights.
00:30:02 You see there is also option to save separate diffuser snapshots when saving during training.
00:30:08 This option will generate weight files, like you see here.
00:30:14 So for demonstration purposes I will also select this; from a later point you can just
00:30:21 turn those snapshots into a new model folder and then continue your training from there.
00:30:27 Alternatively, I believe you can create a new model from your saved ckpt file as the
00:30:35 new source checkpoint and continue from that saved ckpt file.
00:30:41 I think both should be the same.
00:30:44 OK.
00:30:45 After you have done the settings, just click save settings.
00:30:48 When you click train, I think it is automatically also saving.
00:30:51 Now I will generate the class images before starting training.
00:30:56 This will use the settings that I did set in these options.
00:31:02 And let's see what kind of class images we are going to get.
00:31:05 OK, so you see, it is generating 300 class images for training.
00:31:10 Why?
00:31:11 Because currently I have no images in here. But, as you can see, it is not working right
00:31:18 now.
00:31:19 So there is a mistake, obviously.
00:31:20 To solve this mistake, I will just restart the application.
00:31:25 OK, restart is completed.
00:31:28 Let's refresh.
00:31:29 Go back to our extensions tab.
00:31:32 Check for updates.
00:31:33 If there is an update.
00:31:34 Yes, there is a new update during the video.
00:31:38 The updates are coming, So let's just refresh.
00:31:41 OK, refreshed.
00:31:43 Let's go back to extensions.
00:31:44 Check for updates.
00:31:45 OK, we are at the latest.
00:31:46 Then let's go to DreamBooth, select our model load settings.
00:31:52 Go to the generate.
00:31:54 Before generating, I will delete these incorrect images first.
00:31:59 Let me do that.
00:32:00 Go to the pictures and in here, go to the web UI tutorial.
00:32:07 Ctrl-a shift-delete.
00:32:08 Yes, all deleted.
00:32:11 And just click generate class images.
00:32:13 OK, let's see if any error again.
00:32:16 OK, I think the error continues.
00:32:20 So instead of these methods, I will use txt2image tab to generate images.
00:32:27 The only difference between this and using txt2img is the following. Let me show you.
00:32:33 Meanwhile, let's just restart the application.
00:32:37 When you use 'generate class images' like this, it will also generate a text file with the same name as the
00:32:44 image, and inside it, it will write 'photo of man' as the description.
00:32:51 So this is useful when you do [filewords] training or when you do LoRA training, But
00:33:00 for now it is not necessary for us.
00:33:04 I just reported this bug also to the developer, so I believe it will get fixed really quickly.
00:33:11 OK, so we are going to generate our class images from here.
00:33:19 Classification images: photo of man.
00:33:21 I'm just typing that, setting the sampling steps count to 40, setting CFG to 7.5.
00:33:26 So this batch size means processing multiple images at the same time.
00:33:35 It will use more GPU RAM, but it will make it faster.
00:33:39 And how many I need?
00:33:40 I need 300.
00:33:41 Therefore, I am going to set this as 38, like this, and then just click generate.
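The 38 here is just the batch count needed to reach 300 images. A tiny sketch of the arithmetic; the batch size of 8 is my assumption inferred from 300/38, since the exact on-screen value isn't stated in the transcript:

```python
import math

needed = 300        # classification images we still need
batch_size = 8      # assumption: the txt2img batch size set on screen
batch_count = math.ceil(needed / batch_size)
print(batch_count)  # 38 batches -> 304 images, just over the 300 target
```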
00:33:50 So now it will generate images.
00:33:52 But make sure that the selected model here you see, is same as the model that you used
00:34:00 to generate your model.
00:34:02 So in here, when you select your model for training, it shows the base model source checkpoint.
00:34:08 You see Stable Diffusion 1.5 pruned, and currently I am generating same images from this model.
00:34:14 So the generated images will be saved in text to image folder.
00:34:20 Let's open it by clicking here.
00:34:22 OK, when I have clicked open folder in here.
00:34:27 It didn't open because it says in the CMD window: text to image images does not exist.
00:34:34 After you create an image,
00:34:35 the folder will be created, because, as I said, this is a fresh installation to demonstrate for you.
00:34:40 Therefore, all of my settings here are also default.
00:34:42 I didn't change any of them.
00:34:46 And there is one other thing that I want to mention.
00:34:48 In the DreamBooth model selection, you will see that the SD 1.x versions either have
00:34:56 EMA or not.
00:34:57 If they have EMA, it will improve your further training, your fine-tuning of the model.
00:35:05 So you should pick models that have an EMA version.
00:35:09 It only exists in the 1.x versions.
00:35:11 I think in SD 2.0
00:35:14 and 2.1 there is no released model that has EMA weights.
00:35:20 OK, the first batch has been completed.
00:35:22 Let's open the folder.
00:35:23 Now the folder is opened.
00:35:26 So these are photo of man.
00:35:28 You see, there will be very weird images, bad quality images, but they don't matter
00:35:33 much.
00:35:34 They are not very important as long as they are generated by our checkpoint model.
00:35:40 OK, after all of the images have been generated, just select them all and copy with ctrl-C, then
00:35:48 go back to the folder where you want to save them: web UI tutorial.
00:35:54 I am just going to copy-paste them into the folder.
00:35:58 OK, let's return back to our DreamBooth and load settings.
00:36:04 So now we have the sufficient amount of classification images.
00:36:09 Now we are ready to click start training.
00:36:12 OK, when we start training, it will first start by caching them out.
00:36:19 We will see that.
00:36:23 So you see, it says that it has found 300 regularization images.
00:36:28 Therefore, it is not going to generate any more images.
00:36:32 Currently it is caching them.
00:36:35 OK, after the caching has been completed, you will see the training has been started.
00:36:42 It is progressing step by step.
00:36:45 You see 13, 14.
00:36:48 If you get out of memory error, then you need to try further decreasing memory usage.
00:36:55 All of the low memory settings and high memory settings are stated in the wiki.
00:37:01 I will put this into the description.
00:37:03 Also, you are seeing right now.
00:37:05 High batch size, set gradients.
00:37:08 These will increase your memory usage and these will decrease your memory usage.
00:37:13 There is not much else that you can do. One other thing is that the developers
00:37:18 are constantly trying to optimize and improve the extension to reduce memory usage.
00:37:26 So, when you watch this video, or maybe one month later, your card could
00:37:33 perhaps be able to do DreamBooth training.
00:37:37 So that's another possibility.
00:37:41 And after how many steps are we going to see our first sample images?
00:37:45 We can calculate it easily.
00:37:47 In the settings tab
00:37:49 we set 10 epochs for saving, and how many training images do we have?
00:37:53 We have 12, you see, in here.
00:37:56 Therefore, after 120 steps it will save the checkpoint.
00:38:05 And after 60 steps, because we set the sample frequency to 5 epochs,
00:38:12 we are going to see the first sample image.
00:38:15 And 60 steps have been completed.
00:38:20 So it is generating preview images at step 60.
00:38:24 Ok, the first samples have been generated.
00:38:28 Let's open the samples folder.
00:38:29 So where they were saved, they were saved under our model.
00:38:34 Let me show: Ok, I have so many same tabs.
00:38:41 Ok, inside our installation folder, go to the models and in here go to the DreamBooth
00:38:47 and in here you see the same name as our training model name.
00:38:50 Enter there.
00:38:52 In here you will see samples.
00:38:53 When you click here you will see the samples.
00:38:56 So the first sample is generated with this sample prompt with ohwx man.
00:39:05 So this is our class and this is the unique instance prompt we have set.
00:39:09 Ok, so there is another image.
00:39:11 You see.
00:39:12 This is generated with photo of ohwx man by Tomer Hanuka.
00:39:19 Why did I set this and where did I set this?
00:39:22 I did set this in here.
00:39:24 If you remember, the second prompt you see in here, named with a 1, is the sanity
00:39:35 sample prompt.
00:39:38 The number here is the step count at which it was generated, and the other
00:39:43 part is the prompt used to generate it.
00:39:47 After we progress in the training, you will understand why we are using this.
00:39:54 As long as this image looks like us, with a different style, it means that our model
00:40:00 is learning well; and when it becomes exactly like us, not styled like this, that would
00:40:07 mean that our model is overtrained, and then we can't apply styles.
00:40:12 Our aim is teaching our shape without overtraining, without
00:40:22 disturbing the underlying context, the knowledge of the model, without overriding it completely.
00:40:29 So after we progress in the training, we will understand better.
00:40:32 OK, now let me explain to you how to prepare your training dataset images.
00:40:40 What is important in the selection of the images?
00:40:45 The subject that we want to teach
00:40:50 is the most important part.
00:40:52 I want to teach my face.
00:40:54 Therefore, other than my face, everything must be different, or, let's say, should be
00:40:59 different in each of the images.
00:41:01 So, other than face, what can be different?
00:41:03 My clothes and the background can be different.
00:41:07 So if you are teaching your face other than your face, all of the backgrounds and the
00:41:14 clothes should be different as much as possible.
00:41:17 As you can see in my pictures, I have made sure that all of the backgrounds and the clothes
00:41:23 are different or the clothes are not visible.
00:41:27 So if you make your clothes different and your backgrounds are different, then the model
00:41:34 will learn your face, not your clothes or not the backgrounds.
00:41:37 That is what we want.
00:41:38 We want to teach our face, not the other things in the pictures.
00:41:42 If you use the same clothes, then the model will not be able to tell which part is the face and which
00:41:49 is the clothes; the model will learn both of them at the same time, and it will reduce
00:41:53 your ability to stylize your face.
00:41:57 Therefore, the key point of preparing training images is having different things other than
00:42:04 the subject.
00:42:05 So if the subject is face, the other things must be different.
00:42:09 Also, you should have different angles of photos and different distances of photos.
00:42:18 It will make the model learn different angles and different distances to generate different
00:42:25 kinds of different styles, more variety of images.
00:42:30 So, about how you make your images: I can't say my dataset is the best available dataset.
00:42:37 You can expand your data set with more variety of images, more variety of poses, more variety
00:42:42 of angles, more variety of lighting.
00:42:46 Lighting also matters.
00:42:47 It would be better.
00:42:49 However, this is a small data set and I think it is working pretty decently.
00:42:55 But if you expand this data set, your training data set, with more variety, then it is better.
00:43:00 It will learn your face or subject in a more generalized manner, and that way we will
00:43:08 be able to produce different kinds of artistic images more easily.
00:43:14 Okay, so you see, currently it is compiling a checkpoint ckpt file and you can just load
00:43:21 the ckpt file directly and do inference on that checkpoint.
00:43:27 It is compiling the checkpoint at step 360, which is epoch 30. And where are these
00:43:34 checkpoint files located?
00:43:37 They are located in models, inside our folder, and you see the ckpt file and the
00:43:45 yaml file here.
00:43:48 If you don't know what yaml files are, just watch my how to use Stable Diffusion 2.1 and
00:43:55 different models in the web UI tutorial video.
00:43:59 I will put the link as usual, and let's check out our so far samples.
00:44:07 So this image is like me, but no other samples are like us.
00:44:13 We just need to do more training.
00:44:15 And also in this screen you will see 5.5 or 3.7.
00:44:22 This shows how many iterations are done per second (or seconds per iteration).
00:44:30 However, these values are not very correctly displayed, so there is also loss and this
00:44:37 lr is important.
00:44:38 This shows your learning rate.
00:44:40 So 2e-6, what does that mean?
00:44:43 It means the number is written in scientific e-notation.
00:44:47 When you type 2e-6 into Google and go to the first result, for example, it will
00:44:54 show you that it is equal to this number.
00:44:57 OK, so this is the number
00:44:58 we actually set in our settings as our learning rate, you see.
00:45:04 So this is the decimal equivalent of the e-notation number.
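E-notation is just standard scientific notation, and Python reads it natively; a trivial check:

```python
lr = 2e-6              # the learning rate as shown in the CMD window
print(lr)              # 2e-06
print(f"{lr:.6f}")     # 0.000002 -- the same value written out in decimal
assert lr == 2 / 10**6
```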
00:45:10 If you pick a changing learning rate scheduler from here (you see there are options like polynomial,
00:45:17 constant, and other learning rate schedules), then you will see different numbers in here, and
00:45:25 it also shows the GPU usage.
00:45:27 However, this is also not very accurate.
00:45:30 It says that 9.5 gigabytes currently is being used.
00:45:34 OK, it has been 72... 82 epochs.
00:45:41 Now I will show you how you can continue training if an error occurs.
00:45:46 To illustrate that, I will just crash the application by closing here.
00:45:51 When you close from here, it won't save any checkpoint or anything.
00:45:55 You see the error: connection error.
00:45:58 Then just restart the application and after the restart is done, just refresh your interface.
00:46:08 Go to the DreamBooth tab, select the model, click load settings; actually, it will be
00:46:14 loaded automatically.
00:46:15 And then just click train.
00:46:17 It will continue from the last checkpoint, which is 80 epochs.
00:46:23 Let's wait.
00:46:25 OK, you see,
00:46:28 it is continuing from wherever it left off, as you can see here.
00:46:34 Also, in the CMD window it shows first resume epoch and first resume step,
00:46:42 as you can see here.
00:46:43 OK, we are over 168 epochs and we are already doing a lot of overtraining.
00:46:52 How do I know?
00:46:54 As I told you in the beginning, I entered a sanity prompt.
00:47:03 So the samples numbered with -1 are the sanity samples.
00:47:09 And let's look at how the sanity samples change.
00:47:12 So the sanity samples started like this.
00:47:15 Then in here you see the sanity sample is resembling me, and also here resembling me,
00:47:23 OK, resembling me somehow.
00:47:26 And after a certain point, actually after 1368 steps, the sanity samples become just like
00:47:37 me.
00:47:38 You see, it is not styled anymore; like this, like this, and this is almost exactly like
00:47:45 me, and you see, they are no longer styled like here.
00:47:50 Styling is completely gone here and in here.
00:47:53 Therefore, now we are sure that we are overtraining.
00:47:59 So I am just going to stop training with cancel, and I am going to use different checkpoints,
00:48:08 test them out to see how they are performing.
00:48:11 Now the hard part is coming: the proper, correct prompting to obtain
00:48:17 good results.
00:48:19 So the training has been cancelled.
00:48:22 Let's look for the closest one.
00:48:25 I am refreshing here and in here, yes, this one looks like the closest one: 1308.
00:48:32 Then go to the text2image tab.
00:48:35 So how are we going to generate our own image?
00:48:39 We are going to use 'photo of'.
00:48:41 These two keywords are also associated with us right now, but not as strongly as our
00:48:47 instance prompt:
00:48:49 ohwx, and man.
00:48:50 Also man is very much associated, okay, so when we type like this and hit the generate
00:48:57 button, it will generate our own image.
00:49:00 Okay, the image is ready.
00:49:02 You see, it is like us and now we need to style it.
00:49:06 So let's add a style keyword and let's see what kind of result we are going to get.
00:49:12 OK, as you can see, we didn't get much styling, so therefore I am going to show
00:49:21 you an extension named web UI prompt generator.
00:49:26 You can install it from the Available tab.
00:49:29 Just click load, and in here just search for 'prompt' and you will see prompt generator;
00:49:34 just click install, and then apply and restart the UI.
00:49:38 After that you will see the prompt generator tab here.
00:49:41 So let's get some extra additional keywords from prompt generator and let's click generate.
00:49:47 OK, there are a lot of results here, but this one looked to me like it could work, so I copied
00:49:55 it and pasted it in here and let's see the result we are going to get.
00:50:00 Okay, we got somewhat decent results, but it is still not very much like us.
00:50:06 Therefore, we need to increase the prompt strength.
00:50:10 So what is prompt strength?
00:50:12 Prompt attention.
00:50:13 This is from the official wiki of the Automatic1111.
00:50:17 So if you want to increase attention to a word by a factor of 1.1, you can put the word
00:50:24 inside one pair of parentheses.
00:50:26 If you want to increase the attention even more, by a factor of 1.21, you can just
00:50:34 put it inside two pairs, like this. Alternatively, you can use an easier way. Let me show
00:50:40 you, let me also zoom in: just type the weight explicitly, like this. OK, so this will increase the attention.
00:50:47 This will force the model to generate an image that is more like us, and it will somewhat ignore
00:50:56 the rest.
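The parenthesis rule from the Automatic1111 wiki is multiplicative: each pair of plain parentheses multiplies attention by 1.1, and `(word:w)` sets the weight directly. A tiny sketch of just that arithmetic (this is not the web UI's actual parser, only the weighting rule it documents):

```python
def paren_weight(levels: int, base: float = 1.1) -> float:
    """Each nesting level of plain ( ) multiplies attention by 1.1."""
    return base ** levels

print(paren_weight(1))   # (word)   -> 1.1
print(paren_weight(2))   # ((word)) -> ~1.21 (floating point gives 1.2100...2)
# The explicit form "(word:1.5)" sets the weight to 1.5 directly.
```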
00:50:57 Also, in this prompt there are so many things that would be unrelated to the Disney style.
00:51:05 So what would be related to the Disney style? For example, CGI. And let's also add some
00:51:13 other keywords.
00:51:15 Okay, here are results.
00:51:17 Not very much like us and not very good quality.
00:51:21 We need to improve the prompt by adding some negative prompts as well.
00:51:28 OK, here I have added some negative prompts, and now you see we have a much better artwork,
00:51:35 but still not resembling me very much.
00:51:39 So I am going to try another prompt, also increasing the emphasis of our unique
00:51:47 keyword, which is ohwx, and man.
00:51:51 In every prompt you should probably have ohwx man with some increased strength to get your
00:51:58 own face, also adding 'photo of'.
00:52:01 Why?
00:52:02 Because during the training we used 'photo of man' as the class prompt.
00:52:09 Therefore, now these three keywords are also associated with us, but the most association
00:52:15 is coming from ohwx, OK.
00:52:18 OK, so I am going to try with an emphasis of 1.5 and a new prompt like this.
00:52:27 Let's see the results.
00:52:28 Okay, we got an image that is not very stylized.
00:52:32 Therefore, we need to increase CFG.
00:52:35 So what is CFG?
00:52:36 CFG is the classifier-free guidance scale: how strongly the image should conform to the prompt.
00:52:43 Lower values produce more creative results.
00:52:45 We want the model to obey our prompt because we are providing a very detailed prompt.
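Under the hood, CFG mixes a conditional and an unconditional noise prediction at every denoising step. A minimal sketch of the widely published formula, not code taken from the web UI; numpy is used here only to make it runnable:

```python
import numpy as np

def cfg_combine(noise_uncond, noise_cond, cfg_scale):
    """Classifier-free guidance: push the prediction toward the prompt."""
    return noise_uncond + cfg_scale * (noise_cond - noise_uncond)

# A scale of 1 keeps the plain conditional prediction; larger values follow
# the prompt more strictly, at the cost of diversity (and eventually quality).
uncond, cond = np.zeros(4), np.ones(4)
print(cfg_combine(uncond, cond, 7.5))
```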
00:52:54 Therefore, we need to increase the scale and try it.
00:52:59 So I will show you how you can try multiple scale values.
00:53:04 Go to the bottom on here and go to the x/y plot.
00:53:09 So in the x/y plot there are x and y values.
00:53:14 Currently we only need the x value.
00:53:15 For the x value I am going to select CFG scale, and in here I am just typing seven, eight,
00:53:22 nine, ten, eleven, twelve, thirteen, fourteen, fifteen, sixteen. And I want to use the same
00:53:29 seed for all of the inputs so that I can see the changes, and I will generate four images
00:53:37 in each iteration, in each step.
00:53:42 My graphics card is able to process four images.
00:53:45 If you don't have much VRAM, you can't do that;
00:53:48 then you should lower this.
00:53:50 OK, if you check this 'keep -1 for seeds' option,
00:53:54 then each image in each generation would be different.
00:53:58 However, I want to see the CFG effect laid out in a legend.
00:54:04 Therefore, I'm keeping it like this, and then just click generate.
00:54:09 So currently, in the CMD window, it is actually generating four images at each iteration.
00:54:19 So you see, for the 20 steps, it is actually processing 80 steps.
00:54:23 So four of them are being processed in parallel, since I set the batch size to four.
00:54:32 OK, the CFG images,
00:54:33 the different CFG images, have been generated.
00:54:35 I have modified the input, because the previous input, it turns out, was not very good.
00:54:42 But it is not important, because when you are working with Stable Diffusion, you have
00:54:47 to generate a lot of images to find the good ones that you would
00:54:55 like to obtain.
00:54:57 So let's look at the effect of the CFG.
00:55:01 So this is our seed value.
00:55:04 If you use this seed value, you will always generate similar images in each generation,
00:55:11 as long as you keep the same value, same model.
00:55:16 So this is the CFG scale seven.
00:55:18 At the CFG scale seven, there is not much resemblance.
00:55:23 At the CFG scale eight, a little bit resemblance.
00:55:28 Look at how the images are changing.
00:55:31 This is CFG scale nine.
00:55:33 There is some resemblance in these two.
00:55:36 OK, and at CFG scale 10,
00:55:39 now this one also has some resemblance, and in here, OK, you see, the resemblance is increasing; and
00:55:47 at CFG scale 14, actually, there is really good resemblance in this image and in this
00:55:53 image. And so it goes; after a certain CFG scale, I think the
00:56:00 quality starts to decrease.
00:56:03 So therefore, the CFG scale makes a difference.
00:56:06 Now let's say you want to test out different artists' styles with different CFG scales.
00:56:14 How can you do that?
00:56:17 I am putting here a special keyword that I am going to replace: replacekw.
00:56:23 OK, then the rest is anything you want, and at the bottom, this time I am going
00:56:30 to select Prompt S/R.
00:56:33 OK, so Prompt S/R works like this: separate a list of words with commas, and the first word
00:56:39 will be used as the keyword.
00:56:40 The script will search for this word in the prompt and replace it with the others.
00:56:45 So this keyword will be replaced with whatever I type here.
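Conceptually, Prompt S/R is a plain search-and-replace over the prompt string. A minimal sketch of the idea, not the script's actual source, and the prompt below is a made-up example:

```python
prompt = "photo of ohwx man by replacekw, highly detailed"  # made-up example
sr_values = ["replacekw", "wlob", "artgerm"]  # first entry is the search key

key, *replacements = sr_values
for artist in [key] + replacements:  # the unmodified key itself is run first
    print(prompt.replace(key, artist))
```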
00:56:49 So let's say wlob, and then artgerm, and then whatever other artists you want to test.
00:56:57 OK, I have added two more artists, so we have four artists.
00:57:02 Let's also test 4 CFG values: 10, 11, 12 and 13.
00:57:10 Perhaps let's start from 11, OK. And we could keep seeds at minus one, but then
00:57:23 we couldn't compare the CFG or the style.
00:57:27 Therefore, let's keep the same seed, okay. And you see, there are restore faces, tiling
00:57:34 and highres fix options, so you could also pick them to improve your output, but that would take
00:57:40 extra time, and you can do these in the Extras tab, which i will show.
00:57:45 And the batch count is one and batch size is four.
00:57:48 Let's see what kind of results we are going to get.
00:57:50 By the way, these other keywords will also heavily affect the artist style.
00:57:56 Therefore, if you want to check out only the artist style, then you should reduce the number
00:58:03 of extra keywords here. Let's see what we are going to get.
00:58:08 Okay, i got a runtime error.
00:58:11 Why?
00:58:12 Because i forgot to put the keyword in the prompt.
00:58:16 The first word of the Prompt S/R list has to appear in the prompt.
00:58:18 Now i need to run it again.
00:58:20 Okay, now the generation started.
00:58:23 You should always check the CMD window to see what is happening there.
00:58:28 If you get an error, then you should fix it, obviously.
00:58:32 Okay, this is the kind of grid that we are going to get.
00:58:35 Actually, it is pretty useful.
00:58:37 So, you see, along the top is the CFG scale and on the left we got the art style. By the way, it
00:58:45 also produces results with the literal replacekw keyword, and those are not representing the style or me very well.
00:58:56 Therefore, perhaps we can remove many of the keywords that take away the style, like these:
00:59:07 let me do that.
00:59:10 Okay, this time we have more styling, as you can see here: this is the default,
00:59:17 this is wlop, this is artgerm.
00:59:21 This is Robert S Duncanson and this is Karol Bak.
00:59:25 Especially the Karol Bak style is pretty distinct and significant, as you can see.
00:59:31 So the key point here is, with Stable Diffusion, that you have to generate a lot of images,
00:59:38 and some of them will be very good, while maybe the majority of them will not be good or
00:59:44 useful.
00:59:45 This is the nature of AI based art generation, especially if you are trying to generate art
00:59:55 based on your subject, a new subject. Also, when we were doing the training here,
01:00:04 you could use more classification images.
01:00:08 That can help.
01:00:09 I said that the community is using 300 in total, but that is not a hard limit.
01:00:18 You can use 200 class images per training image, and that may help you to improve your
01:00:25 style.
01:00:26 Actually, it is also the number used in the official paper, as i said.
01:00:30 So it is up to you.
01:00:31 You have to do experimenting.
01:00:33 The numbers and the quality you get will also totally depend on your training data set.
01:00:39 If you have a training data set with much variety, as i have explained, then your model
01:00:47 can learn much better.
01:00:49 I will show you another thing here.
01:00:50 There is a prompt matrix script that will generate combinations of prompts.
01:00:57 Okay, so when you type your query like this and select prompt matrix, this query will
01:01:05 become (face photo of ohwx man:1.3), like this.
01:01:10 And then the parts will get combined like this.
01:01:15 So this will generate all of the combinations of the written text separated with the, let
01:01:24 me tell you once again, vertical pipe character.
01:01:28 It will generate all of these keyword combinations like this.
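The combination logic behind the prompt matrix can be sketched like this in Python; the example prompt is a placeholder, and the exact separator the script joins with may differ:

```python
from itertools import combinations

def prompt_matrix(prompt: str) -> list[str]:
    # The first pipe-separated part is always kept; the script then
    # generates every subset of the remaining optional parts.
    base, *options = [p.strip() for p in prompt.split("|")]
    results = []
    for n in range(len(options) + 1):
        for combo in combinations(options, n):
            results.append(", ".join([base, *combo]))
    return results

# 3 optional parts -> 2**3 = 8 prompt combinations.
for p in prompt_matrix("face photo of ohwx man|cyberpunk|detailed|sharp focus"):
    print(p)
```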
01:01:32 Okay, i will show another thing.
01:01:34 Let's say you are going to sleep and you want your computer to generate many different styles
01:01:40 of images for you during your sleep.
01:01:44 For that, i will show you an easy way to do it.
01:01:49 So our first prompt is face photo of ohwx with, let's say, weight 1.4.
01:01:58 Then let's add certain keywords to get a certain kind of prompt.
01:02:06 Okay, i have typed it like this and generated 20 inputs like this, and it has generated
01:02:14 a lot of results for me.
01:02:15 I am going to copy all of this into a notepad file and paste it, so you see they are actually
01:02:23 copied as one line each.
01:02:26 Then i will generate several more.
01:02:28 Okay, i keep copy-pasting the newly generated inputs there.
01:02:35 Okay, now i have 60 lines of inputs like this.
01:02:40 I am going to save it.
01:02:45 Let's go to the Pictures folder and save it as nightly prompts.
01:02:49 Okay, then go back to the txt2img tab and in here select the prompts from file or textbox script.
01:03:00 You can paste all of them here or you can upload them from here.
01:03:05 So i will upload them from the text file, and they are all uploaded.
01:03:13 I am going to check use random seed for all lines, because i want to get as many
01:03:19 different results as possible.
01:03:22 And then i choose how many images i want to generate for each line.
01:03:29 I want to generate, let's say, eight images in parallel, and they will use the
01:03:36 CFG value i am going to set here, 14.
01:03:39 So with 60 lines and a batch size of 8 images, we are going to get 480 images.
01:03:48 Let's say you want to generate 4000 or however many you want.
01:03:53 So if i set the batch count to 20, we are going to get exactly 20 multiplied by 8 and multiplied by
01:04:02 the number of lines we have:
01:04:05 9600 images during the night, with a lot of different inputs and variation, and among them
01:04:13 you can pick whatever you want and use it as you want.
01:04:17 This is one of the options that you can use.
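The arithmetic behind that estimate, as a tiny sketch:

```python
# Total images = prompt lines x batch count x batch size.
num_lines = 60    # prompts saved in the text file
batch_size = 8    # images generated in parallel per batch
batch_count = 1   # one batch per line -> 60 * 1 * 8 = 480 images

print(num_lines * batch_count * batch_size)   # 480

batch_count = 20  # raise the batch count for an overnight run
print(num_lines * batch_count * batch_size)   # 9600
```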
01:04:20 Okay, after i click it, it starts generating images.
01:04:25 For example, it generated this one, and if you wonder how this image was made, you go to the PNG
01:04:31 Info tab, drag and drop the image in here, and it will show
01:04:39 you all of the parameters it has.
01:04:42 So this is the prompt input, and this is the negative prompt input it has, and the number
01:04:47 of steps used.
01:04:48 The sampler used, the CFG scale used, the seed; with this seed you can reproduce this
01:04:55 generated image.
01:04:57 You can use this seed and change the CFG value to generate other variations of it, and there are
01:05:02 the size and the model hash.
01:05:04 The model hash, of course, will change, since we are using our custom trained model.
01:05:10 The batch size and the batch position are also important.
01:05:15 To get exactly this image, you need to generate again with batch size 8, and the image at this batch position
01:05:23 will be this one,
01:05:24 if you use this seed and this CFG value and this sampler.
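If you prefer to read those parameters from a script instead of the PNG Info tab, the web UI stores them in a PNG text chunk named parameters. A minimal sketch with Pillow; the file path is a placeholder:

```python
# Read the generation parameters that the web UI embeds in its PNGs.
# The file name below is a placeholder.
from PIL import Image

img = Image.open("outputs/txt2img-images/00001.png")
# Prompt, negative prompt, steps, sampler, CFG scale, seed, size and
# model hash all live in a text chunk named "parameters".
print(img.text.get("parameters", "no parameters chunk found"))
```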
01:05:28 We are getting some decent photos, and i will leave it running during my sleep. Tomorrow,
01:05:34 which will of course be just a moment for you,
01:05:38 we are going to see what kind of good images we got.
01:05:43 Okay, here you see, some of the images i have generated during my sleep.
01:05:49 They are pretty good quality, but they are very similar.
01:05:52 Why?
01:05:53 Because it appears that the inputs i have used to generate them were not much different.
01:05:59 However, some of them are really high quality.
01:06:02 For example, this image: you see, it has almost perfect eyes, perfect shape.
01:06:07 It's a really good quality image.
01:06:10 So your training data set and the keywords, the prompts you use, will one hundred percent
01:06:16 affect the outcome that you are going to get, and you really need to style your prompt
01:06:23 according to what you want to get.
01:06:25 Now let me show you a few of the prompts used for generating these images.
01:06:30 To do that, i am going to PNG Info, okay, and then i will drag and drop.
01:06:36 For example, let's first see a 3d-like image.
01:06:41 Okay, and you see this used blender, zbrush, autodesk maya, unreal engine, colored, because
01:06:51 if you want to generate a 3d-like image, then you need to use these kinds of keywords.
01:06:57 Then you can send these to other tabs.
01:06:59 For example, let's go to the Extras tab.
01:07:02 In the Extras tab i can upscale this image to a bigger size.
01:07:08 After my testing, i have found that R-ESRGAN 4x+ works best.
01:07:14 There is also an anime version.
01:07:18 LDSR also works very well, but it requires a lot of gpu memory.
01:07:25 So when i click generate, the first time you generate, it is going to download the
01:07:31 model that is necessary for R-ESRGAN 4x+.
01:07:35 You can see it here, and now we will see the upscaled image.
01:07:41 So this is the upscaled image.
01:07:43 The upscaled image and the original will not be exactly the same, but let's compare them, okay.
01:07:48 Let's view them not zoomed in, okay.
01:07:54 So you see, both of these are really similar.
01:07:59 There is a little bit of quality loss.
01:08:02 Let's also try the anime version.
01:08:10 Okay, now we got the anime version.
01:08:14 So if you want to make your images anime-like, then you can use that.
01:08:19 This is extremely useful.
01:08:23 You can also upscale an entire folder.
01:08:26 For example, i will just press ctrl+a to select all, then i will drag and drop them here.
01:08:32 All of them are now here.
01:08:33 Now i can upscale all of them at once.
01:08:37 Let me show.
01:08:38 During the operation you will see they are getting tiled like this to generate bigger
01:08:45 size images.
01:08:47 The results of upscaling in the Extras tab will actually be inside another folder.
01:08:52 When i click it, you will see they are arriving here, and all of these images are now upscaled.
01:08:58 For example, let's open this: this is a Pixar style image, actually.
01:09:04 Okay, this is another Pixar style image and, as you can see, this is also another Pixar style
01:09:13 image.
01:09:17 I have trained these on the Google Colab, and now i will show you how you can upload your
01:09:23 model to Google Colab and generate images there, probably faster than with your gpu,
01:09:31 because the Google Colab gpu is really strong, able to process a lot of images at once in
01:09:40 a parallel way.
01:09:41 Okay, you see, all of these are getting upscaled; let's look at some of them like this,
01:09:52 as you can see, okay.
01:10:02 Now i will show another cool thing.
01:10:06 Usually you may not get very good looking eyes, or there may be some errors in the face, and there
01:10:13 is a very good way to improve the eyes or the overall structure of the face.
01:10:20 It uses another AI model, so let's try improving this image.
01:10:26 Usually, my images already had really good eyes.
01:10:31 Okay, to test it,
01:10:33 i am not going to upscale, but i am going to use GFPGAN.
01:10:38 So this GFPGAN is a model that restores faces and improves the eyes.
01:10:42 Let's test it.
01:10:43 The first time you use it, it will download the necessary model.
01:10:48 Okay, now let's compare the result.
01:10:50 This is the original image and this is the fixed image.
01:10:53 Now let's also apply an upscale, okay.
01:10:59 Okay, after applying the upscale and GFPGAN, you see it is now looking much better
01:11:07 in terms of quality and correctness.
01:11:09 This will seriously improve the eyes.
01:11:13 Let's open them like this.
01:11:14 Okay, let's zoom in.
01:11:16 So you see the difference is huge: much better quality, styling.
01:11:22 You can apply this to your generated images as a batch as well.
01:11:26 Just go to batch process, select the options from here, and it will do everything.
01:11:31 You can also try these other options.
01:11:33 I didn't find them very useful actually, and there is also no description for them.
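For batch work outside the UI, the standalone GFPGAN package exposes the same restoration model. A rough sketch, assuming you install it with pip install gfpgan and download the weights yourself; all paths are placeholders:

```python
# Face restoration with the standalone GFPGAN package.
# Image and model paths are placeholder assumptions.
import cv2
from gfpgan import GFPGANer

restorer = GFPGANer(
    model_path="GFPGANv1.4.pth",  # downloaded weights (placeholder path)
    upscale=2,                    # also upscale the output 2x
    arch="clean",
    channel_multiplier=2,
)

img = cv2.imread("generated.png", cv2.IMREAD_COLOR)
# enhance() returns cropped faces, restored faces, and the full image
# with the restored faces pasted back in.
_, _, restored = restorer.enhance(
    img, has_aligned=False, only_center_face=False, paste_back=True
)
cv2.imwrite("generated_restored.png", restored)
```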
01:11:39 Okay, now i will show you how you can continue training from any checkpoint that you
01:11:44 saved.
01:11:45 Just go to the checkpoint search and you will see your saved checkpoints here. By the way,
01:11:49 to get them saved, you need to check the generate a ckpt file when saving
01:11:57 during training option in the Saving tab. Then, if you generate a new model from that checkpoint, you will
01:12:03 basically continue training from that certain checkpoint.
01:12:08 Now i will show you how you can use these ckpt files directly in Google Colab.
01:12:16 If you have watched my previous video about transforming yourself into a stunning ai avatar,
01:12:22 that tutorial shows how to do training on Google Colab, and everything is explained there for
01:12:30 using your ckpt file in Google Colab.
01:12:34 It is so, so easy.
01:12:35 First we are going to generate a new model from our desired checkpoint.
01:12:41 Let's say i want to use step 1380 as the checkpoint.
01:12:47 Then i am giving it a name, Colab image,
01:12:52 okay, and nothing else.
01:12:54 Just click create model.
01:12:56 Okay, it has generated a new model for the Colab image inside the working
01:13:02 directory. You just need to upload this to Google Drive and then give its
01:13:10 path.
01:13:11 So i will name it my image.
01:13:15 Okay.
01:13:17 Let's also add our keyword to the name, move the files inside here, and then go
01:13:26 to your Drive folder like this, where you are running your DreamBooth or Stable
01:13:34 Diffusion, then drag and drop this directory here.
01:13:41 It will upload all of the files, as you can see here.
01:13:45 Once the upload is completed, all we need to do is change the model path in the inference
01:13:52 tab of the Google Colab notebook.
01:13:55 This is linked in the description of the tutorial.
01:13:59 So you need to change it like this: /content/drive/MyDrive/ and then the ohwx image
01:14:06 folder name that i have given, since i am uploading to the main folder
01:14:13 of my Google Drive.
01:14:15 Then, in the Google Colab, you will be able to use your trained ckpt file right away.
01:14:21 So what if
01:14:23 you want to teach another face?
01:14:27 Just generate a new model like this, and this time, in the Concepts tab, set the dataset directory
01:14:34 and the classification directory for your new subject.
01:14:38 However, be careful with something.
01:14:41 Currently, my model is trained with ohwx man as the instance prompt and photo of man as
01:14:48 the class prompt.
01:14:50 So if i am going to teach another person, a male, then i have to pick another keyword,
01:14:56 for example ske or another rare keyword, and it will teach this man into the model
01:15:06 as well.
01:15:07 So we will be able to use both of them.
01:15:09 However, you will probably get mixed results, because the man keyword was already trained on
01:15:17 my own image, and when i introduce another man's images they will get mixed.
01:15:24 So it could be a problem, but you can try it.
01:15:27 Test it, and if you generate a sufficient number of images, then i think you
01:15:31 can still obtain good results.
01:15:34 However, if you inject another class, like a woman, then it shouldn't be much of a problem,
01:15:42 and you should be able to teach multiple different subjects easily.
01:15:47 Now i will explain more advanced stuff.
01:15:50 For example, the directories, the data set directory.
01:15:54 Okay, to be able to use [filewords], you need to have a training data set named like this,
01:16:02 okay.
01:16:03 So for each image, you are also going to have a text file with the same name.
01:16:08 The extension will be txt, like this, and you need to write the description of that
01:16:13 image in the file properly.
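A small sanity-check script for such a [filewords] data set might look like this; the directory and class word are placeholder assumptions:

```python
# Sanity-check a [filewords] data set: every image should have a .txt
# caption with the same name, and the caption should contain the class.
# Directory and class word below are placeholders.
from pathlib import Path

dataset = Path("train/ohwx_man")
class_word = "man"

for image in sorted(dataset.glob("*.png")):
    caption_file = image.with_suffix(".txt")
    if not caption_file.exists():
        print(f"missing caption: {caption_file.name}")
        continue
    caption = caption_file.read_text(encoding="utf-8").strip()
    if class_word not in caption:
        print(f"class '{class_word}' missing in: {caption_file.name}")
```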
01:16:16 There is a new AI model for captioning images.
01:16:20 It is not implemented in Automatic1111 yet, but i believe it will be.
01:16:25 I will put the link to it in the description.
01:16:28 You can also run it locally.
01:16:31 And if you don't know how to run it locally, then you need to watch this video
01:16:38 on our channel.
01:16:39 In this video, i am explaining how to locally run HuggingFace files.
01:16:45 Okay, i will just use the online demo right now because it is not very busy.
01:16:51 So, the first image, i will just drag and drop here.
01:16:56 Sorry about that.
01:16:58 Okay, like this, and click submit.
01:17:01 It will generate the description for this image.
01:17:03 You see, you should use the caption generated by GIT large.
01:17:08 This is the best one.
01:17:09 A man with dark hair and glasses is smiling.
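If you want to run this kind of captioning locally rather than in the online demo, a GIT large checkpoint published on HuggingFace can be used via the transformers library. A hedged sketch (microsoft/git-large-coco is one such published checkpoint; the image path is a placeholder):

```python
# Caption an image locally with a GIT large checkpoint from HuggingFace.
# The image path below is a placeholder.
from PIL import Image
from transformers import AutoModelForCausalLM, AutoProcessor

processor = AutoProcessor.from_pretrained("microsoft/git-large-coco")
model = AutoModelForCausalLM.from_pretrained("microsoft/git-large-coco")

image = Image.open("train/ohwx_man/img1.png").convert("RGB")
pixel_values = processor(images=image, return_tensors="pt").pixel_values
generated_ids = model.generate(pixel_values=pixel_values, max_length=50)
caption = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(caption)  # e.g. "a man with dark hair and glasses is smiling"
```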
01:17:13 Okay, so let's just change this text
01:17:18 description, like this.
01:17:20 However, there is one key issue: you have to have the class of this image inside the
01:17:27 description.
01:17:28 So my class is man, and therefore it is there.
01:17:31 Okay, let's continue
01:17:32 then.
01:17:33 This is another image that we want to caption, so let's submit it.
01:17:40 Okay, and then another image description is here.
01:17:44 Let's open the description: a cat with long whiskers looking at the camera.
01:17:50 And the class of this one is cat, and it is inside here as well.
01:17:54 Yes, correct, and the same goes for the dog images as well.
01:17:58 Now for classification images,
01:18:01 you need to do the same.
01:18:03 When you generate classification images, you also need to have each classification image and its
01:18:08 description.
01:18:10 Let's say this is my classification image and it is generated with photo of man.
01:18:16 Therefore, i need to generate a description file with the same name, like this, and inside it i need
01:18:23 to type photo of man.
01:18:26 When this tab gets fixed (let me show you, maybe it is already fixed, i am not sure):
01:18:33 in here, you see, we have generate class images, and when you use that feature, it should be
01:18:40 able to do this. Let's try it, actually, okay.
01:18:44 And let's, yeah, it doesn't matter, okay.
01:18:48 And when we type the class prompt here, photo of man, i think it will generate with it.
01:18:56 Let's try it. Okay, it is not working.
01:19:01 It shows a message; okay, it's still not working.
01:19:06 When this becomes working, then you can easily generate them.
01:19:10 Otherwise, you need to generate the descriptions like this, photo of man, and it will generate images
01:19:17 like that, or photo of cat or photo of dog.
01:19:20 So this will be your classification directory with descriptions like this, and this will be
01:19:27 your classification directory with naming like this.
01:19:29 This way, you can teach multiple subjects in one run, and you can also possibly improve
01:19:37 your training quality if you provide a better description defining more things.
01:19:45 By the way, when writing descriptions, you should specify in the description the subject you want
01:19:53 to teach.
01:19:54 If you want to teach a face, then you should mostly describe the face.
01:19:58 Okay, one other thing: once you have prepared your folders,
01:20:05 here is the way to do it.
01:20:08 First of all, we are defining the data set directory as usual.
01:20:14 Okay, let's set it.
01:20:16 And let's also set the classification directory like this.
01:20:21 And for [filewords], we need to define the instance
01:20:29 token. Okay, this will be used to define the subject.
01:20:35 It has to be a single word.
01:20:37 Therefore, i am entering ohwx. And then the class token:
01:20:41 this will also be a single word.
01:20:45 By the way, it won't actually be very precise if you use the class token this way.
01:20:55 It looks like if you teach multiple different classes, then you may not get very
01:21:02 good performance, for example teaching a face, a cat, a dog and a man, because
01:21:08 they are conflicting with the current setup.
01:21:12 So using separate concepts is better, but let me also explain this to you.
01:21:17 So this will be man.
01:21:18 And in the prompts, you are just going to type [filewords], and for the class prompt
01:21:23 you are just going to type [filewords] too. The hint says leave blank to use instance prompt, and optionally
01:21:30 use [filewords] to base sample captions on instance images.
01:21:33 You can also just use [filewords] to see what it is generating.
01:21:40 This is called mixed in the basics page of the DreamBooth extension wiki.
01:21:47 So you see, there is DreamBooth regular training, which i have shown in this tutorial.
01:21:53 Then there is fine tuning.
01:21:55 Fine tuning is the standard approach for big data sets.
01:21:58 Only the captions of the images are used
01:22:00 as [filewords]; class images are not used.
01:22:02 This results in a model that doesn't need an instance token and reacts to any prompt.
01:22:07 So in this case you are training everything.
01:22:10 What does that mean?
01:22:11 That means that, let's say, in your [filewords] you have cars, you have cats, you have dogs,
01:22:18 you have men.
01:22:19 You are training all of these words.
01:22:22 And this is how the custom models you see are usually trained.
01:22:28 Let me show an example.
01:22:29 So, for example, Protogen x3.4 is a custom model and it works pretty well.
01:22:37 How did they train it?
01:22:38 They probably trained it with fine tuning.
01:22:41 So in fine tuning, they precisely prepared the descriptions of each training image.
01:22:48 They didn't use any classification images, and they changed the overall underlying
01:22:53 context, data and knowledge of the model.
01:22:56 So when you now use man, it produces quality man images depending on their new fine-tuned
01:23:04 data set, or car or castle or whatever you are improving your model on.
01:23:10 And there is hybrid.
01:23:12 Okay, actually i said mixed earlier, but it is hybrid.
01:23:15 Hybrid, for lack of a better term, is achieved by using the instance token in combination with [filewords]
01:23:20 as the instance prompt.
01:23:21 The trained data set will be linked to that instance token.
01:23:24 This minimizes the bleed but requires the token in every prompt, as you can see here.
01:23:29 So you have to use ohwx french bulldog or ohwx whatever you have taught.
01:23:37 Also, you see the class token is person.
01:23:39 So with the hybrid approach with [filewords], if you don't do fine tuning but only teach
01:23:45 subjects, the subjects should, i think, be of the same class.
01:23:49 They can't be from different classes.
01:23:51 So you can teach multiple persons in a single run, maybe 10 persons, just by providing
01:23:59 correct [filewords] and their descriptions.
01:24:03 So for this person you need to add, let's say, a man personA.
01:24:09 Okay, this will define personA.
01:24:11 For person B, you need to add personB, and for person C, you need personC.
01:24:16 But you are not going to add the instance token into these descriptions.
01:24:24 Okay, you don't need to type the instance token into the [filewords], into the descriptions
01:24:30 of the training images or into the descriptions of the classification images.
01:24:37 Okay, this is important.
01:24:39 Okay, now i will show you how to understand the out of memory error.
01:24:46 So it is easy.
01:24:47 I'm just going to load the settings for our existing data set.
01:24:50 You see, i have an error.
01:24:52 So it looks like i had an error in the cmd.
01:24:54 I just need to restart.
01:24:56 Okay, i did restart, and in the settings i set use EMA.
01:25:03 This actually improves our result quality, but it costs more vram.
01:25:08 Then i just click train, and let's see how we are going to get an out of memory error.
01:25:14 Okay, we got our error.
01:25:18 Let me show you how to understand the out of memory error.
01:25:22 You will see RuntimeError: CUDA out of memory.
01:25:24 If you are seeing this error, all of the other messages are not important.
01:25:28 This means that with the current settings you are trying to train with, your graphics
01:25:34 card is not enough, and you need to reduce the vram usage.
01:25:38 Now let me show you all of the settings for reducing the vram usage.
01:25:43 Okay, so for minimal vram usage you need to pick LoRA. With LoRA,
01:25:48 there is just a little bit of difference:
01:25:52 it is only different when you try to do inference and generate new images from the generated LoRA
01:26:00 file,
01:26:01 and when you watch this video you will learn that. Okay, LoRA will significantly reduce
01:26:06 vram usage.
01:26:08 Other than that, always make sure that your batch size and gradient accumulation steps
01:26:12 are one, and in the Advanced tab you need to pick use 8bit adam, select
01:26:20 bf16 and select xformers.
01:26:23 For xformers to be usable, you need to set your starting arguments to --xformers and
01:26:31 --no-half.
01:26:33 These will allow you to use it.
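On a typical Windows install, these launch arguments usually go into webui-user.bat, roughly like this:

```
rem In webui-user.bat, set the launch arguments before starting the web UI:
set COMMANDLINE_ARGS=--xformers --no-half
```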
01:26:35 Cache latents:
01:26:36 actually, this one
01:26:37 is still not clear.
01:26:39 You should try it both checked and unchecked,
01:26:42 because some say that it increases vram usage and some say that it decreases it. Also, Step Ratio
01:26:49 of Text Encoder Training:
01:26:50 this should be zero, because text encoder training increases quality but also increases the
01:26:54 vram usage.
01:26:56 And other than these, there is not much else that you can do.
01:27:02 These are the lowest possible settings.
01:27:04 Also, you need to uncheck this checkbox and you need to check this checkbox:
01:27:12 when you check the first checkbox it will increase your vram usage, but when you check the second one
01:27:18 it will reduce your vram usage.
01:27:21 Actually, these settings are written in the troubleshooting part of the DreamBooth
01:27:26 extension wiki, in the OOM section, and there are also overtraining and other topics.
01:27:32 Actually, the overtraining section is still a work in progress, and i have already shown you how to
01:27:37 recognize overtraining.
01:27:39 One other cool thing that i am going to show you is preprocessing your images.
01:27:45 With image preprocessing, you can easily generate descriptions for both your training
01:27:52 images and your classification images.
01:27:55 Of course they won't be very accurate, so let me show you.
01:28:00 I am picking my best db 512 as the source directory, and the destination directory will be the same.
01:28:09 In here you can even define the target resolution and change it, but i prefer manually
01:28:16 changing the images and captioning them.
01:28:19 So for the existing caption action, i am just going to select ignore, so it will generate new captions, and
01:28:26 i am going to use deepbooru for captioning.
01:28:29 You can also create flipped copies, split oversized images, and use auto focal point crop.
01:28:35 So let's say you have tens of thousands of images; then these options will be extremely
01:28:41 useful for you.
01:28:42 However, if you are only going to train your face, then you should manually prepare your
01:28:47 training data set to be the best, and then i am going to generate captions for them.
01:28:53 I am just going to click preprocess.
01:28:54 It shouldn't change the width and height, because they are already 512 pixels, and it is downloading
01:29:03 the deepbooru model for captioning.
01:29:04 This is another model, just as i have shown you in here.
01:29:09 Deepbooru is not as good as the caption generated by GIT large, but it is still useful, and in
01:29:15 a moment we are going to see.
01:29:17 Okay, it has thrown an error.
01:29:19 It says the same directory is specified as the source and destination directory.
01:29:22 Obviously, this is not allowed.
01:29:25 Actually, it's a good thing that they don't allow it.
01:29:28 So i'm just going to change the destination to processed, so that you don't overwrite your original images,
01:29:36 and let's click preprocess.
01:29:39 Okay, the models are only downloaded one time, and all images are preprocessed.
01:29:45 So let's check out the preprocessed images.
01:29:48 Okay, you see, the same images with descriptions.
01:29:52 Let's look at the description.
01:29:53 So the description is: 1boy, black hair, facial hair, grey pants, jacket, long sleeves,
01:29:58 male focus, pants, realistic, solo, track jacket, track pants.
01:30:05 So it's a pretty good description.
01:30:07 You can also manually modify them.
01:30:10 Let's also preprocess our classification images, so that it will generate all of the descriptions
01:30:17 of the classification images.
01:30:18 By the way, this is useful, as i said, when you use [filewords].
01:30:22 If you are not using [filewords], then these won't get used.
01:30:26 This is also very useful if you use a hypernetwork or embeddings, and i will
01:30:32 also hopefully make a video about embeddings.
01:30:35 Hypernetworks are not very good, but embeddings are really, really good.
01:30:39 Okay, let's preprocess our classification folder.
01:30:45 The preprocess feature is in the Train tab.
01:30:47 This is a feature of Automatic1111.
01:30:50 Okay, and preprocess it.
01:30:53 It is also pretty fast.
01:30:57 So this will be extremely useful for captioning.
01:31:00 Also, if your images are not properly cropped and you have tens of thousands of
01:31:07 images, as i said, manual work would take a huge amount of time.
01:31:10 You can just use this.
01:31:12 As a beginner, you can also use this to make your job easier and see the results, how it
01:31:17 is performing.
01:31:18 Let's say you picked hundreds of images of yourself and you don't want to spend time;
01:31:23 then you can preprocess the images like this, try the training on them,
01:31:31 and see the results.
01:31:32 If you can get good results, then why spend more time on them?
01:31:37 But if you want to get perfect results, then you need to manually crop your images and
01:31:43 write your descriptions.
01:31:47 So let's see the preprocessed images now.
01:31:49 Every image has a description.
01:31:51 Let's look at them.
01:31:52 Okay, for example, it defined this man as a girl, which is very incorrect, and also
01:31:59 3d, asian, black shirt.
01:32:01 Okay, this is a completely incorrect description, as you can see.
01:32:06 It completely failed.
01:32:07 And now let's compare this with GIT large, which i have shown.
01:32:12 Okay, i wonder what kind of result we are going to get with GIT large, so i'm just going
01:32:18 to drag and drop.
01:32:21 By the way, as i said, i have suggested adding this model to Automatic1111 to get better
01:32:27 results, and GIT large generated: a portrait of a man with a beard.
01:32:32 Yes, absolutely, fantastically correct when compared to that trashy description, as you
01:32:41 can see.
01:32:42 Okay, as a final thing, i suggest you look at the ELI5 training page.
01:32:48 It is being updated by experienced people, and, for example, for the instance
01:32:55 token they give an example: alexa is a bad instance token, because the underlying
01:33:01 data for alexa is extensive and it would be hard to override it.
01:33:06 Another example is bad because the tokenizer splits it into pieces, like ohwx, great.
01:33:12 The class token is also important.
01:33:15 I have already experienced these, but you can also check these pages.
01:33:19 I will put the links to these pages in the description.
01:33:24 Now i will show you another very cool thing.
01:33:26 You see, this Protogen x3.4 is a custom model that has been generated by using multiple
01:33:33 models and a lot of training, and you see, if you train your face or subject into this model,
01:33:40 it won't produce good results,
01:33:43 because the underlying data has been significantly changed.
01:33:48 So how can we inject our face into this model?
01:33:53 There is a way to do that, and now i am going to show you.
01:33:56 We go to the Checkpoint Merger, and as the primary model we are selecting our target
01:34:05 model, which is Protogen x3.4.
01:34:09 The secondary model will be the model that we trained, which will be this
01:34:16 one: ohwx 1308.
01:34:19 And there is the tertiary model.
01:34:21 The tertiary model will be SD version 1.5.
01:34:24 This is the base model of our model, and what we are going to do
01:34:29 is extract our subject from the base model and apply it to
01:34:37 our new target model.
01:34:39 Let's give it a name, ohwx protogen 3.4, okay, and set the multiplier to 0.75.
01:34:49 This is 75%.
01:34:50 You may ask: how did you come up with this value?
01:34:53 I asked the community, and according to the experience of the community, 75% is a good
01:35:00 point.
01:35:01 You can, of course, try multiple different values.
01:35:03 You can try your different checkpoints to see how they perform.
01:35:08 Also, select add difference.
01:35:10 This will extract our face information from our base model and inject our
01:35:16 face information into our new target model without breaking the underlying context and
01:35:25 information.
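The add difference operation itself is simple: result = A + M * (B - C), where A is the primary model, B the secondary, C the tertiary and M the multiplier. A minimal torch sketch of that formula, with placeholder file names (the UI does the same thing with extra handling for mismatched keys and dtypes):

```python
# Sketch of the "add difference" checkpoint merge: A + M * (B - C).
# File names below are placeholders.
import torch

a = torch.load("protogen_x3.4.ckpt", map_location="cpu")["state_dict"]  # primary (target style)
b = torch.load("ohwx_1308.ckpt", map_location="cpu")["state_dict"]      # secondary (our trained model)
c = torch.load("v1-5-pruned.ckpt", map_location="cpu")["state_dict"]    # tertiary (base it was trained from)

m = 0.75  # multiplier, i.e. 75%
merged = {
    # Keys missing from B or C are simply skipped in this sketch.
    k: a[k] + m * (b[k].float() - c[k].float()).to(a[k].dtype)
    for k in a
    if k in b and k in c
}
torch.save({"state_dict": merged}, "ohwx_protogen_3.4.ckpt")
```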
01:35:26 We are going to generate a ckpt with add difference, so just click run.
01:35:31 In the cmd window you will see messages like this, and checkpoint saved. Then refresh
01:35:36 here and just select our new model, which is ohwx protogen.
01:35:42 Now we can produce images by using the Protogen model and our face, same as usual.
01:35:51 Okay, everyone, i have done a few tests and the results are just amazing.
01:35:58 So you see, these are some of the images that i have selected from the results.
01:36:03 And let me show you something.
01:36:05 So you see, this is generated by protogen and this is my original, real image.
01:36:12 And this is the generated image.
01:36:14 You see the quality.
01:36:15 It is just amazing.
01:36:17 And what kind of test did i do?
01:36:20 For testing, i used the x/y plot: i entered different CFG values as the x values, and i
01:36:29 entered Prompt S/R for the weights.
01:36:32 So how did i do it?
01:36:33 You see the ohwx man, and then we are entering a weight here, right, to give importance
01:36:41 to it.
01:36:42 So i entered a keyword here, changeweight, and i used changeweight here
01:36:49 in the Prompt S/R.
01:36:50 So the Automatic1111 ui then changed the weight for me and tested the different
01:36:59 weights.
01:37:00 Now i can see the properties of this particular generated image to see which
01:37:09 values were used.
01:37:10 Then, based on that, i can generate anything i want.
01:37:14 So the weight used was 1.4 and the CFG scale was 8.
01:37:20 So by using weight 1.4 and CFG scale 8, i can generate much higher quality images.
01:37:28 So these two parameters work well with my merged model.
01:37:35 By the way, i also used something else.
01:37:39 You see, there is a model hash, and the hash written here is also displayed here.
01:37:46 This 95 means that i have generated another checkpoint, but this time i used a 95%
01:37:53 multiplier.
01:37:54 This worked better for me.
01:37:57 So in the beginning you can start with 75%, and if you are not getting good images, then
01:38:03 you can increase it, make different model merges, and then test them.
01:38:11 So this is the way to test and find the well-working parameters for your model,
01:38:17 and then use those parameters to generate more stylized images as you want.
01:38:22 The results are just simply amazing.
01:38:25 You just can't get these results so easily with the default Stable Diffusion model.
01:38:30 So you can inject your trained model, your trained face, into any custom model out there and
01:38:37 generate beautiful images as you want.
01:38:42 So let's also upscale this image.
01:38:45 To do that, i am just going to send it to extras and i will upscale it with R-ESRGAN
01:38:51 4x+.
01:38:53 And here is the result: it is just beautiful.
01:38:56 Let's also apply GFPGAN to get better face quality.
01:39:02 Okay, now, amazing, as you can see, amazing quality, amazing image.
01:39:07 There is only one artefact here, as you can see. So if i generated more such images,
01:39:13 i could also get rid of this artefact.
01:39:15 I think i have covered pretty much everything.
01:39:19 As i said in the beginning, just join our Discord channel
01:39:22 from our about page; also, in here you will see the link.
01:39:29 Just click the official Discord. Please also share, like and subscribe, and if you support
01:39:35 us on our Patreon, it would be greatly appreciated.
01:39:39 Currently we have three patrons.
01:39:42 I thank them a lot for becoming patrons and supporting our work.
01:39:48 You can also join our channel and support us from here, as you can see.
01:39:53 I would appreciate every bit of your support.
01:39:56 Hopefully see you in another video.
01:39:59 Please leave comments and ask questions.
01:40:02 Suggest the topics that you want to see as a new tutorial.
01:40:07 Thank you very much.
01:40:09 Hopefully see you later.