DreamBooth Got Buffed - 22 January Update - Much Better Success Train Stable Diffusion Models Web UI #298
FurkanGozukara
announced in
Tutorials
Full tutorial: https://www.youtube.com/watch?v=KwxNcGhHuLY
Our Discord: https://discord.gg/HbqgGaZVmr. The newest update of the DreamBooth extension for Automatic1111 brought a huge improvement in quality and success rate. If I have been of assistance to you and you would like to show your support for my work, please consider becoming a patron on 🥰 https://www.patreon.com/SECourses
Playlist of Stable Diffusion Tutorials, #Automatic1111 and Google Colab Guides, DreamBooth, Textual Inversion / Embedding, #LoRA, AI Upscaling, Pix2Pix, Img2Img:
https://www.youtube.com/playlist?list=PL_pbwdIyffsmclLl0O144nQRnezKlNdx3
In this video, I explain how to use the newest DreamBooth update of the Automatic1111 Web UI extension. With the new update, teaching your subjects to any Stable Diffusion model is much more successful.
The update was released today: 22 January 2023
Zero To Hero Stable Diffusion DreamBooth Tutorial By Using Automatic1111 Web UI - Ultra Detailed
https://youtu.be/Bdl-jWR3Ukc
#Dreambooth revision: fd51c0b2ed20566c60affa853a32ebce1b0a1139
SD-WebUI revision: d8f8bcb821fa62e943eb95ee05b8a949317326fe
AUTOMATIC1111: https://github.com/AUTOMATIC1111/stable-diffusion-webui
Git Bash download link : https://git-scm.com/downloads
How To Do Stable Diffusion Textual Inversion (TI) / Text Embeddings By Automatic1111 Web UI Tutorial
https://youtu.be/dNOpWt-epdQ
Our discord channel : https://discord.com/invite/HbqgGaZVmr
00:00:00 Introduction to the new buffed DreamBooth extension
00:00:30 How to check out the SD and DreamBooth versions used in this video by commit hash IDs
00:01:40 How to compose DreamBooth training model
00:02:13 Best configuration of settings tab of DreamBooth extension training
00:03:37 Lowest VRAM settings to use DreamBooth extension and do DreamBooth training
00:03:59 Why not to use --no-half on SD 1.5 but use it on SD 2.1
00:04:46 New setting AdamW Weight Decay
00:05:10 New setting Scale Prior Loss
00:06:14 How exactly filewords work in Stable Diffusion DreamBooth training
00:08:53 Sample images generated during training
00:09:30 How prompting differs in the new DreamBooth extension compared to previous versions
00:10:25 How to test different checkpoints saved during training by X/Y plot script
From official paper : https://arxiv.org/pdf/2208.12242.pdf
Our new approach, DreamBooth, addresses the limitation of current text-to-image models by allowing for "personalization" of these models to better fit the specific needs of users. By providing just a few images of a subject as input, DreamBooth fine-tunes a pre-trained text-to-image model (such as Imagen) to learn to associate a unique identifier with that subject. This allows for the generation of novel, photorealistic images of the subject in various scenes, poses, views, and lighting conditions, even those not present in the reference images.
Our technique utilizes a new autogenous class-specific prior preservation loss which enables the preservation of the subject's key features while still allowing for diverse synthesis of the subject. This opens up possibilities for a wide range of previously unassailable tasks such as subject recontextualization, text-guided view synthesis, appearance modification, and artistic rendering.
Imagine your own dog traveling the world, your favorite bag on display in the most exclusive showrooms, or your parrot as the main character of an illustrated storybook. These are just a few examples of the type of creative and unique content that can be generated using DreamBooth. Our approach allows for the natural and seamless integration of specific subjects into new and diverse contexts, making the impossible possible.
Our goal is to use just a few casually captured images of a specific subject, without any textual description, to generate new images of the subject with high detail fidelity and variations guided by text prompts. The input images can be captured in varying settings and contexts and the output variations can include changes in the subject's location, properties such as color, shape, and species, as well as modifications to the subject's pose, expression, material, and other semantic changes. Our approach utilizes the powerful prior of text-to-image models to enable a wide range of modifications.
To accomplish this, we first implant the subject instance into the output domain of the model and assign it a unique identifier. We present a new method for fine-tuning the model to use its prior for the specific subject instance while also addressing issues of overfitting and language drift. Our approach includes an autogenous class-specific prior preservation loss which encourages the model to generate diverse instances of the same class as the subject.
Our goal is to add a new key-value pair to the text-to-image model's "dictionary" that will allow us to generate fully-novel images of a specific subject with meaningful semantic modifications guided by a text prompt. We achieve this by fine-tuning the model with a small number of images of the subject. The question then becomes how to guide this process.
Video Transcription
00:00:01 Greetings everyone. In this video I will show you the newest update of the DreamBooth
00:00:05 extension of the Automatic1111 web UI. The new update brought a huge improvement in teaching our
00:00:11 subjects to Stable Diffusion models. It is like 10x better than before. This will
00:00:16 be a short video on the best settings. Therefore, if you are interested in a detailed tutorial,
00:00:21 please first watch my very thorough DreamBooth tutorial. Let me show it to you. This is the video
00:00:27 that you need to watch. So let me show you which commit, which version of DreamBooth and
00:00:31 SD web UI I am using. This is the DreamBooth revision and this is the SD web UI revision.
00:00:37 To use these specific versions, just open Git Bash. You need to install it first (search for it on Google), and then
00:00:44 move to your installation folder, like this: type cd and drag and drop the folder. Then use the git checkout
00:00:52 command like this and paste the commit ID. And now it will check out that version.
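The checkout step described above can be sketched in Python as follows. This is a minimal sketch, not the author's exact commands: the commit hashes are the ones listed in the video description, while the install folder name `stable-diffusion-webui`, the extension subfolder `extensions/sd_dreambooth_extension`, and the helper names are assumptions for illustration. By default it only prints the commands (dry run).

```python
import subprocess
from pathlib import Path

# Commit hashes quoted in the video description.
WEBUI_COMMIT = "d8f8bcb821fa62e943eb95ee05b8a949317326fe"
DREAMBOOTH_COMMIT = "fd51c0b2ed20566c60affa853a32ebce1b0a1139"


def checkout_commands(webui_dir, extension_subdir="extensions/sd_dreambooth_extension"):
    """Return (working_directory, argv) pairs; nothing is executed here.

    The extension subfolder is an assumed default location.
    """
    webui = Path(webui_dir)
    return [
        (webui, ["git", "checkout", WEBUI_COMMIT]),
        (webui / extension_subdir, ["git", "checkout", DREAMBOOTH_COMMIT]),
    ]


def run_checkouts(webui_dir, dry_run=True):
    """Print each command and, unless dry_run is set, run it with git."""
    for cwd, argv in checkout_commands(webui_dir):
        print(f"(in {cwd}) {' '.join(argv)}")
        if not dry_run:
            subprocess.run(argv, cwd=cwd, check=True)


# Dry run: only prints the two git commands.
run_checkouts("stable-diffusion-webui")
```

Passing `dry_run=False` would actually run git in both folders, so the paths should be adjusted to your own installation first.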
00:01:00 With the newest update, you are able to obtain highly stylized and very good results with
00:01:06 a very simple prompt like this. You see, I just added Tomer Hanuka to my token, ohwx, and,
00:01:15 as usual, I am using a negative prompt and it is able to stylize it very well. This is the data
00:01:21 set I have used. It is not great, but it is also not very bad. All of the backgrounds and
00:01:26 the clothes are different, with different angles. Okay, now let me show you the best settings.
00:01:33 This is not necessary, but make sure that you have selected the version 1.5 pruned ckpt here,
00:01:39 or whatever model you want to train on. Go to the DreamBooth tab and, in the create menu, give it
00:01:45 a name like tutorial. Now we check whether it is a 512 pixel model or not. Source checkpoint,
00:01:55 this is the important one: version 1.5 pruned, and then create model. You should get a message like this:
00:02:02 checkpoint successfully extracted to our working folders. Go to the settings tab and in here click
00:02:08 performance wizard first. There is also a new LoRA option. However, I couldn't get good results
00:02:14 with LoRA yet. Hopefully I will make another new video for LoRA, so I am not picking it. Do not
00:02:20 check this checkbox. It is usually not working. By the way, with the new update,
00:02:28 it is working much faster than before in terms of the number of epochs required to train your subject.
00:02:35 For example, I was able to teach my face with just 30 epochs, whereas previously I was using
00:02:42 around 150 epochs. So still, you can set this as 200, set this as 0, set this as 5, because it is
00:02:50 getting overtrained very quickly. With the newest update I think they have fixed a lot of things and
00:02:57 now this is not automatically checked for me. I have an RTX 3060. When you check this it
00:03:04 will actually reduce the memory usage, but it will slow down your process, and this is also for saving
00:03:11 VRAM. So this is constant with warmup. It is fine. I didn't change the learning rate. It works
00:03:16 fine. Do not check this box and do not check this checkbox as well. Here, this is very important to
00:03:23 understand whether you started overtraining or not. So photo of ohwx man by Tomer Hanuka. The
00:03:33 ohwx is our token. It is a very rare token and man is the class token. Then in the advanced tab, if you
00:03:40 have more than 12GB VRAM, you can check this "use EMA". It will improve your success. Use 8bit Adam.
00:03:46 We have bf16 and xformers. By the way, let me also show you the command line arguments I used. The
00:03:52 only command line argument I used is --xformers, and I am also using --disable-safe-unpickle. But
00:03:58 this is not necessary. I didn't add --no-half, because this is only necessary for SD version 2.1.
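The advice on command line arguments can be summarized in a small Python sketch. It assumes the flags are passed as a single space-separated string (for example through the COMMANDLINE_ARGS variable of the web UI launcher script); the function name and the version strings are hypothetical and only illustrate the rule stated in the video: always --xformers, and --no-half only for SD 2.1.

```python
def commandline_args(sd_version, use_xformers=True):
    """Compose launch flags following the advice in the transcript."""
    args = []
    if use_xformers:
        args.append("--xformers")
    # --no-half is only added for SD 2.1, where the video says it is needed;
    # on other versions it would just slow down generation and training.
    if sd_version == "2.1":
        args.append("--no-half")
    return " ".join(args)


print(commandline_args("1.5"))
print(commandline_args("2.1"))
```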
00:04:09 Adding --no-half will slow down your image generation and also training, and it is only necessary
00:04:15 for SD 2.1. So if you are not working with SD 2.1 version, do not add this. It will slow down your
00:04:21 process. I am checking cache latents. This will speed up the training but it will increase the
00:04:27 VRAM usage. Since I have 12GB, it works fine. I am also training UNET. If you don't have enough VRAM
00:04:34 then you should uncheck this. Okay, it gave an error when I unchecked it. I think they will fix
00:04:42 it. So I am leaving these at default. There is a new setting, as you can see, the AdamW
00:04:49 Weight Decay. It says that this will generalize your images more as you make this number bigger, so keep it low if you
00:04:56 want your subject to be as close to you as possible. By default it is 0.01. I left it at the default and
00:05:03 it worked very well. And the pad tokens: these are for when you are using the [filewords] and
00:05:08 shuffle tags, which are related to image captions. There is also another new option: scale prior loss. I asked
00:05:15 the developer of the extension about this. The answer of the developer is this: As you can see,
00:05:21 Scale prior loss: decrease the prior loss weight as training progresses. When you enable it,
00:05:26 you get a "minimum prior loss" setting and a "prior loss target". The target is: at what epoch
00:05:30 should the prior loss weight reach the minimum. He also commented that he is not sure if it matters or helps,
00:05:38 but it stands to reason that as we train our model, we want the weight of the class images
00:05:43 to be lower than that of the instance images, as the model should already know the subject better. I
00:05:49 tested with this enabled and disabled. I think when this is not enabled, it worked better
00:05:54 for me, but it is up to you to test it. Then we go to the concept. Okay, in the directories,
00:06:00 as usual, we are setting our first training set directory. This is my training set directory.
00:06:06 So I copied it and pasted it. If you have prior classification images,
00:06:12 like I have, you can give their directory or you can give a new directory like this.
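The "Scale prior loss" option described earlier (a prior loss weight that decreases during training and reaches a minimum at a target epoch) could be sketched as below. This is only an illustration under assumptions: the transcript does not say which decay curve the extension uses, so a linear decay is assumed here, and the function name and the example numbers are hypothetical.

```python
def prior_loss_weight(epoch, start_weight, min_weight, target_epoch):
    """Linearly decay the prior loss weight, then hold it at the minimum.

    Linear decay is an assumption; the extension may use another curve.
    """
    if target_epoch <= 0 or epoch >= target_epoch:
        return min_weight
    progress = epoch / target_epoch
    return start_weight + (min_weight - start_weight) * progress


# Hypothetical values: start at 1.0, reach 0.5 at epoch 100.
for epoch in (0, 50, 100, 150):
    weight = prior_loss_weight(epoch, start_weight=1.0, min_weight=0.5, target_epoch=100)
    print(epoch, weight)
```

With such a schedule the class (prior) images count for less as training progresses, which matches the developer's reasoning quoted in the video.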
00:06:18 Okay, now [filewords]. People are getting very confused with this. Let me explain
00:06:23 to you what this actually is in this video. Let's say you have typed ohwx here and
00:06:30 you have typed [filewords] here. So this will become like this when it is processing.
00:06:36 For example, let's say I am using image captions, like in this example, and the caption of the
00:06:42 image is like this. Okay, let me show you. So when I use the configuration like this,
00:06:52 it will add the instance token I have written to the beginning of the [filewords]. Then it will
00:06:57 read the captions here. So the final prompt will be like this. If I also add a word here,
00:07:04 let's say example word, then this example word will be appended here. So this is also equal to
00:07:12 using it like this. When you use it like this, it will actually become exactly like this. So
00:07:19 this is how [filewords] and the instance token work. I am not using any [filewords] or image captions.
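Since the on-screen examples are not visible in the transcript, here is a rough Python sketch of the [filewords] behavior as described. It is not the extension's actual code: the function name, the caption text, and the exact placement of extra words are assumptions. It assumes [filewords] is replaced by the image caption and the instance token is put in front of the result.

```python
def build_training_prompt(instance_token, instance_prompt, caption):
    """Sketch of the [filewords] expansion described in the transcript.

    Assumption: [filewords] is replaced by the image caption and the
    instance token is placed in front of the result.
    """
    expanded = instance_prompt.replace("[filewords]", caption)
    return f"{instance_token} {expanded}".strip()


caption = "a man standing in a park"  # hypothetical caption text

# Instance token "ohwx" plus a prompt that is only [filewords].
print(build_training_prompt("ohwx", "[filewords]", caption))

# Typing the token directly into the prompt gives the same string.
print(build_training_prompt("", "ohwx [filewords]", caption))

# An extra word typed into the prompt lands where it was typed.
print(build_training_prompt("ohwx", "[filewords] example word", caption))
```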
00:07:26 So I am setting the instance token as ohwx man. So ohwx is our token and man is our class.
00:07:37 In this video I have very clearly, in great detail, and very technically explained
00:07:43 how Stable Diffusion works, how it is composed of vectors and different tokens.
00:07:49 For class prompt, we are using photo of man, and for sample image prompt we are using
00:07:57 photo of ohwx man, as you can see. Then, for class images per instance, I have used 48 images, and in
00:08:10 the saving tab, make sure that you are generating a ckpt file and saving during training because,
00:08:16 when you see it is overtraining, you will be able to use certain checkpoints. Then
00:08:22 click save settings. Okay, I think everything is pretty much ready.
00:08:30 Just hit train and it will start generating the classification images, and after that it
00:08:37 will start training the model. You see, the classification images are very weird. If
00:08:43 you handpick the good classification images then it may improve your success. It is up to you to
00:08:49 test. But I didn't touch the classification images actually. I just used whatever it generated.
00:08:55 Okay, here are the training samples generated during my training. So, as you can see,
00:09:01 even at the 10th epoch (I have saved preview images and checkpoints every 10 epochs)
00:09:08 it has already learned my subject, my face, very well. And after 30 epochs we are losing the
00:09:15 styling, as the sanity samples show. As you can see, in the sanity prompt we used the by Tomer Hanuka style. So I
00:09:23 decided to do my tests on just the 30 epochs and it worked very well, as I have shown you.
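The relation between epochs, steps, and saved checkpoints in this run can be worked out with a short Python sketch. The figures 30 epochs, 720 steps, a 200 epoch limit, and saving every 10 epochs come from the video; the helper names are hypothetical, and the step count per epoch is inferred from those figures rather than stated by the author.

```python
def steps_per_epoch(total_steps, epochs):
    """Infer steps per epoch from a reported (steps, epochs) pair."""
    if epochs <= 0 or total_steps % epochs != 0:
        raise ValueError("total_steps must be a positive multiple of epochs")
    return total_steps // epochs


def checkpoint_epochs(max_epochs, save_every):
    """Epochs at which a checkpoint and preview images are saved."""
    return list(range(save_every, max_epochs + 1, save_every))


per_epoch = steps_per_epoch(total_steps=720, epochs=30)
print("steps per epoch:", per_epoch)

# Step count reached at each of the first few saved checkpoints.
for epoch in checkpoint_epochs(max_epochs=200, save_every=10)[:5]:
    print(f"epoch {epoch}: step {epoch * per_epoch}")
```

Listing the step count per saved checkpoint makes it easier to match checkpoint file names to epochs when comparing them later.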
00:09:31 Also, when we analyze it, you see, now we don't even need to increase the prompt
00:09:38 emphasis of our token; just using, like this, 1.1 emphasis, it is working very well.
00:09:45 And with 30 epochs, that is, with 720 steps of training, it is able to fully stylize my face and able
00:09:56 to generate very beautiful images. This is different from the previous trainings
00:10:01 of the DreamBooth extension of Automatic1111. So now things are really different,
00:10:07 really improved. I think what they did is that now they are properly able to keep up the prior loss, and
00:10:14 previously they were not able to keep that. So now, with very few epochs, the
00:10:22 model is able to learn our subject very well. You can still test different epochs. To do
00:10:31 that, you just need to use the X/Y plot and in here you can give the checkpoint names. You see
00:10:40 it's showing all of the names like this, and you just delete the ones that you don't want to test. And for
00:10:45 the Y axis, I suggest you test the CFG value. You can test 7, 8, 9, 10, 11, 12 and whatever
00:10:52 you want. And when you do the test you can see which epoch is working best for you. And I am really
00:11:02 happy with the newest update because it has certainly improved the training quality. So check
00:11:09 out the newest DreamBooth extension. Thank you very much for watching. Please like, subscribe and
00:11:15 leave a comment. You can join our discord channel and discuss everything and ask any questions. Go
00:11:21 to the about tab of our YouTube channel and at the bottom you will see the official discord
00:11:25 channel link. And if you support us on Patreon, I would appreciate it very much. This keeps
00:11:30 me doing more research and producing better quality videos. Hopefully see you later in another video.