How to Inject Your Trained Subject e.g. Your Face Into Any Custom Stable Diffusion Model By Web UI #301
FurkanGozukara
announced in
Tutorials
Full tutorial: https://www.youtube.com/watch?v=s25hcW4zq4M
Our Discord: https://discord.gg/HbqgGaZVmr. In this video, I am explaining how to extract the taught subject information from your trained model and inject it into any new custom Stable Diffusion model. If I have been of assistance to you and you would like to show your support for my work, please consider becoming a patron on 🥰 https://www.patreon.com/SECourses
Playlist of Stable Diffusion Tutorials, Automatic1111 and Google Colab Guides, DreamBooth, Textual Inversion / Embedding, LoRA, AI Upscaling, Pix2Pix, Img2Img:
https://www.youtube.com/playlist?list=PL_pbwdIyffsmclLl0O144nQRnezKlNdx3
Easiest Way to Install & Run Stable Diffusion Automatic1111 Web UI on PC by Using Open Source Automatic Installer
https://www.youtube.com/watch?v=AZg6vzWHOTA
How to use Stable Diffusion V2.1 and Different Models in the Automatic1111 Web UI - SD 1.5 vs 2.1 vs Anything V3
https://www.youtube.com/watch?v=aAyvsX-EpG4
Zero To Hero Stable Diffusion DreamBooth Tutorial By Using Automatic1111 Web UI - Ultra Detailed
https://www.youtube.com/watch?v=Bdl-jWR3Ukc
How To Do Stable Diffusion LORA Training By Using Automatic1111 Web UI On Different Models - Tested SD 1.5, SD 2.1
https://www.youtube.com/watch?v=mfaqqL5yOO4
How to Do DreamBooth training on a Google Colab without needing any GPU and download trained ckpt model file
https://www.youtube.com/watch?v=mnCY8uM7E50
How to Use SD 2.1 & Custom Models on Google Colab for Training with Dreambooth & Image Generation
https://www.youtube.com/watch?v=2yGGorOxtbA
00:00:00 Introduction to how to inject / merge / combine your models by using checkpoint merger
00:01:48 Start of the tutorial
00:01:57 My face trained model used training dataset
00:02:12 The image quality of the default trained model (SD 1.5 official version)
00:02:44 How to inject your trained info from your trained model into a new custom model
00:03:04 What are primary model, secondary model and tertiary model
00:03:32 What is the strategy for extracting your trained subject from trained model and inject into a new custom model
00:04:31 What is Checkpoint Merger multiplier
00:05:01 Add Difference selection
00:05:25 How to use newly merged model
00:05:54 How to select proper prompt strength and CFG value for the new subject injected model
00:09:22 How to join our discord channel to ask anything and get support for free
Stable Diffusion is a powerful deep learning model that has the ability to generate highly detailed images from text descriptions. The model was developed by the CompVis group at LMU Munich, and was released in 2022 by a collaboration of several organizations, including Stability AI, CompVis LMU, and Runway, with support from EleutherAI and LAION.
One of the key features of Stable Diffusion is that it is a latent diffusion model, a type of deep generative neural network. This allows the model to generate a wide range of images from a given text prompt, and to make use of a large amount of data in order to improve its performance. The model is also highly efficient, and can run on most consumer hardware that is equipped with a moderate GPU with at least 8 GB VRAM, unlike many previous proprietary text-to-image models such as DALL-E and Midjourney which could only be accessed through cloud services.
In addition to its impressive capabilities, Stable Diffusion is also highly accessible to users. The model's code and model weights have been made publicly available, meaning that anyone can use the model and experiment with its capabilities. This opens up a wide range of possibilities for researchers and developers, who can use the model to explore new applications and improve its performance.
Another important feature of #StableDiffusion is the #DreamBooth approach which is a new way for personalizing text-to-image models. This method allows users to fine-tune a pre-trained text-to-image model like Imagen with just a few images of a specific subject. By doing this, the model learns to associate a unique identifier with that subject, allowing the generation of fully new and photorealistic images of the subject in a variety of contexts, poses and lighting conditions, while preserving the subject's unique features. This opens up a wide range of possibilities for the field, including text-guided view synthesis, appearance modification, and artistic rendering.
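The DreamBooth idea described above can be sketched with a toy example. This is a hypothetical illustration, not code from any actual DreamBooth implementation; the token `ohwx` matches the identifier used for the trained model later in this video, but the exact prompt wording is an assumption.

```python
# DreamBooth binds a rare identifier token to your subject while a class
# prompt (used with regularization images) preserves the general class.
# All names here are illustrative.

instance_token = "ohwx"   # rare token the model learns to associate with the subject
subject_class = "man"     # broad class the subject belongs to

# Prompt paired with your ~12 subject photos during fine-tuning:
instance_prompt = f"photo of {instance_token} {subject_class}"
# Prompt paired with regularization images of the generic class:
class_prompt = f"photo of {subject_class}"

print(instance_prompt)  # photo of ohwx man
print(class_prompt)     # photo of man
```

After training, using "ohwx man" in a prompt generates the specific subject, while "man" alone still behaves like the base model.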
In October 2022, Stability AI raised US$101 million in a round led by Lightspeed Venture Partners and Coatue Management, a testament to how promising investors find the technology. Stable Diffusion is a powerful new tool that is poised to revolutionize the field of deep learning and text-to-image generation.
#Midjourney is a text-to-image generative model developed by the independent research lab Midjourney, known for its impressive capabilities in generating detailed images from text prompts. It is often compared with OpenAI's DALL-E, a deep learning model that was trained on a massive dataset of images and captions, allowing it to learn the relationships between text and image data.
Video Transcription
00:00:01 Greetings everyone.
00:00:02 In this video I am going to show you how you can inject your subject, in this case my face,
00:00:08 into any custom model by using Automatic1111 Web UI.
00:00:13 So you see, the center image is my original image and the other images are generated by
00:00:17 using this injected model, which is Protogen x3.4.
00:00:22 It's an awesome model to generate hyperrealistic pictures.
00:00:26 This is the official site.
00:00:28 So if you don't know how to use Automatic 1111, I have great tutorials.
00:00:33 First of all, I have shown how to install Automatic 1111 and run it in this video.
00:00:40 And I have shown how to use different models with Automatic1111 in this video.
00:00:45 And recently my published video shows how you can train your subject, e.g. your face
00:00:51 or anything, by using Automatic 1111 and DreamBooth extension.
00:00:56 And if your graphic card is not very good but can support LoRA training, then I have
00:01:01 shown how to do LoRA training in this video.
00:01:04 And even if you are still not able to train your subject by using LoRA because you don't
00:01:11 have a decent graphic card, then don't worry.
00:01:14 I have shown how to use Google Colab to teach your subject and then you can generate ckpt
00:01:20 file on Google Colab, download it and can use it on Automatic1111 web UI.
00:01:26 And in this video I have shown how to use custom models on Google Colab for training.
00:01:31 Now we can start.
00:01:33 By the way, I will put all of these videos links into the description and they are all
00:01:36 available in our channel.
00:01:38 Go to our channel main page and in here you will see Stable Diffusion DreamBooth playlist
00:01:42 and everything is available here.
00:01:45 I will also put the link of this to the description.
00:01:48 OK, now we can start.
00:01:49 So currently the selected model is my trained model from my last video.
00:01:57 I have trained my face by using 12 images.
00:02:00 So this is the training data set.
00:02:03 If you wonder what kind of data set was used, it's a pretty simple and small data
00:02:09 set: only 12 images.
00:02:12 OK, this is the image that the trained model can generate.
00:02:16 By the way, the trained model is based on the SD 1.5 official version.
00:02:21 You see, it is pretty bad quality actually.
00:02:25 So to get very good, very high quality images on the default SD 1.5 base model, you
00:02:31 have to generate an insane amount of images and you have to try a lot of prompts.
00:02:38 So you see, this is a pretty simple prompt actually, and this is also a pretty simple
00:02:43 negative prompt.
00:02:45 So how are we going to inject our trained model into a custom model?
00:02:52 First of all, as the primary model, we are going to select the target model, which is Protogen
00:02:57 x3.4 in this case.
00:03:00 A photorealism model, a very high quality model.
00:03:03 I have selected it from this drop-down box: Protogen x3.4, official.
00:03:10 Then the secondary model will be the model from which we will extract our trained subject,
00:03:18 which is web ui ohwx, my face trained model.
00:03:23 OK, let me select it.
00:03:27 And the tertiary model.
00:03:28 So what is tertiary model?
00:03:30 The strategy we are going to use is like this: the base model, which is the tertiary model,
00:03:37 is model C.
00:03:38 As the base model, we are going to select the base model that we used to train our subject,
00:03:43 which is the version 1.5 pruned ckpt.
00:03:48 And the trained model is model B, which is web ui ohwx, and the target model is model
00:03:53 A. So what we are going to do is subtract model C from model B. And what
00:04:02 will be left?
00:04:04 Only our trained subject will be left.
00:04:06 OK, because we have trained our subject, in my case my face, the trained
00:04:14 model now has my face, and the rest is supposed to be the same as the base model.
00:04:19 And then we are going to inject our face into our target model, which is Protogen.
00:04:25 So how are we going to do that?
00:04:27 First of all, we need to define a custom name.
00:04:29 I will say my face Protogen, like this. So, the multiplier:
00:04:35 what kind of model do we want to get?
00:04:38 This defines the strength of our injection.
00:04:43 For faces the community suggests 75 percent, which is equal to zero point seventy five.
00:04:50 I have made a lot of tests and I have seen that for my particular trained model, 95 percent
00:04:56 works better.
00:04:58 Then you select Add difference here, not Weighted sum.
00:05:01 We are going to use Add difference, because this is the strategy that we are going
00:05:08 to use, and then generate the ckpt.
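The Add difference strategy described in the steps above can be sketched in code. This is a hypothetical illustration of the merge arithmetic (merged = A + (B − C) × multiplier), not Automatic1111's actual implementation; real checkpoints are dictionaries of tensors, and the scalar weights below are stand-ins.

```python
# "Add difference" checkpoint merge sketch:
#   A = target custom model (e.g. Protogen x3.4)
#   B = subject-trained model (e.g. web ui ohwx)
#   C = base model used for training (SD 1.5 pruned)
# merged = A + (B - C) * multiplier -- (B - C) isolates the trained subject.

def add_difference(a, b, c, multiplier=0.95):
    """Merge state dicts: keep A, add the scaled subject delta (B - C)."""
    merged = {}
    for key in a:
        if key in b and key in c:
            merged[key] = a[key] + (b[key] - c[key]) * multiplier
        else:
            merged[key] = a[key]  # layers missing from B or C are copied from A
    return merged

# Toy example with scalar "weights" instead of real tensors:
A = {"w": 1.0}   # target model
B = {"w": 0.8}   # subject-trained model
C = {"w": 0.5}   # base model
print(add_difference(A, B, C, 0.95)["w"])  # 1.0 + (0.8 - 0.5) * 0.95 ≈ 1.285
```

With multiplier 0 you get model A back unchanged; with 1.0 the full subject delta is injected, which is why the video tunes this value (0.75 community default vs. 0.95 here).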
00:05:11 You don't need to save as float16. Just click Run.
00:05:15 OK.
00:05:17 In the CMD window you will see that it is loading the models, then generating the target
00:05:22 model, like this: my face protogen ckpt, and the checkpoint is saved.
00:05:26 Now let's go back to the txt2img tab and click refresh.
00:05:32 And now we can see my face protogen ckpt.
00:05:36 OK.
00:05:38 Now let's generate an image to see what kind of quality we are going to get.
00:05:42 OK, this is the quality we got.
00:05:45 With a few more tries you can generate hundreds of images and get much higher quality
00:05:50 results.
00:05:51 The eye color is not matching, therefore I should also add brown eyes here.
00:05:56 But there are several key issues.
00:05:59 The first key issue is: how did I come up with this prompt strength of 1.4, and how did
00:06:06 I pick the CFG value?
00:06:09 Doing that
00:06:10 is very easy.
00:06:11 First of all, let's define a weight keyword here, weight_val, like this, as you can see. OK,
00:06:18 this will be the keyword that we are going to use, and for the CFG value, we will set it here.
00:06:23 So I am going to X/Y plot.
00:06:27 So for the first value I am going to select CFG.
00:06:31 To find the optimal CFG and weight values for your newly merged or injected
00:06:40 model,
00:06:41 you should make an X/Y plot and see which combination produces the best outcome for you.
00:06:49 And for the Y values we are going to use Prompt S/R, like this.
00:06:54 And in here we are going to put this as a keyword.
00:06:58 And then we are going to set the prompt strength, like 1.0, 1.1, 1.2, like this, and then 1.3.
00:07:10 Okay, 1.4.
00:07:15 All right, then do not check this box, because we are going to compare the effect of prompt
00:07:23 strength and CFG values.
00:07:24 Therefore, do not check keep -1 for seeds.
00:07:28 For example, let's pick the batch size as 4; it will speed things up if your graphic card VRAM
00:07:34 is sufficient.
00:07:35 I think this should be sufficient to see the comparison results and see which
00:07:41 of the weight and CFG values performs best on our newly generated model.
00:07:49 After the results are generated, just compare them and see which CFG value and
00:07:54 which weight produced the best outcome. And then you can use that weight value
00:08:02 and that CFG value in your prompts.
00:08:04 So this methodology will work on any custom model that is using the same base model version
00:08:13 as you are using.
00:08:14 So if your custom model and your subject-trained model
00:08:22 are both trained on version 1.5, then it should work.
00:08:26 You can try different weight values here and different
00:08:31 CFG values here to see how the model performs, as I have just shown with the CFG
00:08:37 scheme.
00:08:38 So you see it is going to take two hours, because it will generate combinations of these values
00:08:44 with batch size 4.
00:08:46 If we calculate, we get a 12 by 11 grid,
00:08:53 and for each cell of the grid it will generate 4 images.
00:08:59 So in total it will be 12 multiplied by 11 multiplied by 4, which is 528 images, and among them you will
00:09:07 be able to find the best combination.
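The grid arithmetic above can be sketched as follows. This is a hypothetical illustration of what the X/Y plot does: it crosses each CFG value with each Prompt S/R substitution of the keyword. The concrete CFG range and strength values are assumptions, chosen only to reproduce the 12 × 11 grid mentioned in the video.

```python
# X/Y plot sketch: X axis = CFG scale, Y axis = Prompt S/R, which replaces
# the placeholder keyword (weight_val) in the prompt with each listed value.
# The prompt and value lists below are illustrative, not the video's exact ones.

prompt_template = "photo of ohwx man, (ohwx man:weight_val)"
cfg_values = [4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15]   # 12 CFG values (assumed range)
strengths = ["1.0", "1.1", "1.2", "1.3", "1.4", "1.5",
             "1.6", "1.7", "1.8", "1.9", "2.0"]           # 11 prompt strengths (assumed)
batch_size = 4

jobs = []
for cfg in cfg_values:
    for s in strengths:
        # Prompt S/R: substitute the current strength for the keyword
        jobs.append((cfg, prompt_template.replace("weight_val", s)))

total_images = len(jobs) * batch_size
print(len(jobs), total_images)  # 132 grid cells, 528 images total, as in the video
```

Each grid cell shares the same fixed seed (hence unchecking "Keep -1 for seeds"), so differences between cells come only from the CFG and weight values.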
00:09:08 So this is all for this video.
00:09:11 If you want to support us, you can become our patron on Patreon.
00:09:15 Please subscribe, like, share and write in the comments what you want to see
00:09:22 next.
00:09:23 And you can also join our official discord channel from about page of our channel and
00:09:28 just click the official discord channel.
00:09:31 So far, we have 4 patrons.
00:09:33 I thank them very much.
00:09:35 This helps us keep producing high quality videos.
00:09:39 And, as I said, just go to our channel and check out the Stable Diffusion playlist.
00:09:46 Everything is in here.
00:09:47 I will also put this video and the next videos in here as well.
00:09:51 Thank you very much.
00:09:52 Hopefully see you later.