Ultimate RunPod Tutorial For Stable Diffusion - Automatic1111 - Data Transfers, Extensions, CivitAI #287
FurkanGozukara
announced in
Tutorials
Ultimate RunPod Tutorial For Stable Diffusion - Automatic1111 - Data Transfers, Extensions, CivitAI
Full tutorial: https://www.youtube.com/watch?v=QN1vdGhjcRc
Sign up for RunPod: https://bit.ly/RunPodIO
This is the Grand Master tutorial for running Stable Diffusion via the Web UI on RunPod cloud services. If I have been of assistance to you and you would like to show your support for my work, please consider becoming a patron on 🥰 https://www.patreon.com/SECourses
SECourses Discord To Get Full Support⤵️
https://discord.com/servers/software-engineering-courses-secourses-772774097734074388
#RunPod discord: https://discord.gg/pJ3P2DbUUq
Colab Tutorial 1: https://youtu.be/mnCY8uM7E50
Colab Tutorial 2: https://youtu.be/kIyqAdd_i10
Automatic1111 Command Line: https://bit.ly/StartArguments
Best DreamBooth Tutorial: https://youtu.be/Bdl-jWR3Ukc
DreamBooth second tutorial: https://youtu.be/KwxNcGhHuLY
RunPodCTL GitHub: https://github.com/runpod/runpodctl
Pre-trained models repo link: https://huggingface.co/lllyasviel/ControlNet
Web UI install tutorial on PC: https://youtu.be/AZg6vzWHOTA
How To Use Different Models Automatic1111: https://youtu.be/aAyvsX-EpG4
Textual Inversion Training Tutorial: https://youtu.be/dNOpWt-epdQ
ControlNet Tutorial Video: https://youtu.be/vhqqmkTBMlU
ControlNet extension: http://bit.ly/3IxBYc6
ControlNet Model Files: https://bit.ly/CTRLNETModels
ControlNet Native Script: https://youtu.be/YJebdQ30UZQ
Upgrade xformers Commands: https://bit.ly/UPxformers
Kohya GUI: http://bit.ly/3ICvsB7
Cloud sync: http://bit.ly/40Zf44C
00:00:00 Intro
00:01:32 How to register RunPod.io and charge your credits
00:02:34 How to deploy a pod - start a server for Stable Diffusion 1.5 Automatic1111 Web UI
00:03:30 How to select deployment template for Stable Diffusion Web UI in RunPod
00:04:00 Explanation of temporary disk and persistent volume
00:04:44 Explanation of credit spending per minute for storage usage in RunPod
00:08:10 My Pods section
00:08:30 Connect to the started Pod
00:08:41 Start SD 2.1 Version Web UI Pod
00:09:25 Why pick a less-used Pod
00:10:53 Bidding system of RunPod.io
00:13:11 Where and how to see scheduled maintenance
00:13:31 Stop Pod vs Terminate (delete) Pod
00:14:24 Where to see logs to debug and understand errors
00:15:08 Connect your Pod via a Jupyter Lab interface
00:15:16 How to change Automatic1111 Web UI command line arguments and restart it
00:17:54 First prompt in RunPod Automatic1111 Web UI
00:18:45 Where to see logs, find error logs, debug them
00:19:35 How to install DreamBooth extension of Automatic1111 Web UI
00:20:58 Where the generated images are saved
00:21:10 How to download generated images
00:21:38 How to update installed extensions
00:21:55 How to notice a port error and fix it
00:23:04 How to install runpodctl latest version to transfer files very quickly between Pods and PC
00:23:55 How to download a ckpt file very fast from Hugging Face repo
00:25:10 Start DreamBooth training with best model and settings
00:30:41 How to upload your training dataset images
00:34:15 How to upload thousands of images (big data) from your computer to RunPod via runpodctl
00:34:28 How to install RunPodCTL on your Windows computer
00:35:06 How to send files from your PC to RunPod via runpodctl
00:39:38 Where to find generated checkpoints and sample images during DreamBooth training
00:41:30 How to delete non-empty folder
00:41:51 How xformers breaks training even when not selected, and how to fix it
00:42:29 How to download a folder from RunPod to your PC via runpodctl very quickly
00:43:09 How to add runpodctl to environment path to use from every folder
00:47:25 How to continue/resume DreamBooth training
00:48:20 Test all training checkpoints with x/y plot to find best one
00:52:09 How to set correct command line arguments for SD 2.1
00:52:55 Where to see currently spent credits per hour
00:54:05 How to do DreamBooth training on SD 2.1 - 768 pixel version with best possible settings
00:57:42 How to generate classification images manually very fast
01:00:26 Why SD 1.5 is superior to 2.1
01:04:34 How to download custom models very fast from CivitAI
01:08:45 How to do Textual Inversion training with some optimal settings
01:13:00 Where Textual Inversion training samples and checkpoints are saved
01:14:07 How to use Textual Inversion checkpoints
01:15:55 Move generated SD 2.1 classification images into correct folder
01:19:26 How to install and run ControlNet extension on RunPod IO
01:21:11 How to download your trained model files (ckpt) into your PC very fast via runpodctl
01:25:00 How to upgrade xformers to 0.0.17 for DreamBooth SD 2.1 training
01:26:04 How to expand runtime disk space
01:27:21 Best settings for SD 2.1 with xformers
01:31:30 What is Stable Diffusion fine tuning and how to do fine tuning with DreamBooth
01:39:20 Best settings quick recap for SD 2.1 for 24 GB VRAM
01:40:34 How to install and run Kohya GUI on RunPod
01:44:16 How to enable public Gradio link for Kohya GUI
01:44:52 How to start RunPods without GPU
01:46:53 Cloud syncing your Pod data / content
Thumbnail: Freepik, Macrovector
Video Transcription
00:00:00 Greetings everyone.
00:00:01 In this video, I am going to show how to use the Automatic1111 Web UI for Stable Diffusion
00:00:07 tasks on RunPod.io as if you were using it on your own computer.
00:00:11 I will cover many topics such as how to upload and download files quickly, how to delete
00:00:17 directories, how to install and run extensions, how to quickly download and use custom models,
00:00:23 how to do DreamBooth training on Stable Diffusion 1.5 or 2.1 versions, how to do fine tuning
00:00:30 via DreamBooth extension, how to do Textual Inversion training.
00:00:34 I will also explain how their pricing system works, how you can use bidding, how you can
00:00:39 transfer files from Pod to Pod or from Computer to Pod and vice versa, how you can install
00:00:45 custom other scripts such as famous Kohya graphical user interface.
00:00:50 I will also demonstrate how you can use new famous ControlNet on RunPod.io.
00:00:56 So why RunPod.io?
00:00:58 Because their system charges you per minute, and they have great Discord support.
00:01:03 They are also easier to use thanks to the tools they provide.
00:01:07 But still, if you are interested in free cloud services for Stable Diffusion, I have two
00:01:12 great tutorials for Google Colab.
00:01:14 The first one is this one and the second one is this one.
00:01:18 And if you don't know how to use the Automatic1111 Web UI, or what Stable Diffusion
00:01:23 and the Automatic1111 Web UI are, I have a great tutorial series for them.
00:01:27 For example, you can begin by watching this video, and you can check out the other videos in
00:01:31 this playlist.
00:01:32 So let's begin the Grandmaster RunPod.io tutorial by signing up for a new account.
00:01:38 Click the sign up button.
00:01:39 For sign up I will use my Google account.
00:01:42 You can also enter your email and password if you wish.
00:01:45 Choose your account to sign up.
00:01:47 Click "I have read and agreed to the RunPod Terms and Services."
00:01:51 Click Continue.
00:01:52 And yes, we are ready to start.
00:01:54 First of all, you need to charge some credits to start using the pods.
00:01:59 Click your balance here, as you can see in the top right menu, and it will show your
00:02:04 available balance.
00:02:06 From here you can pay with a card.
00:02:08 You can change the amount that you want to charge.
00:02:10 To have automatic payments you can add a card.
00:02:13 They also allow you to pay with crypto.
00:02:16 Just click this icon.
00:02:17 They also show recent transactions, recent charges, and everything is very transparent.
00:02:23 OK, now I have logged in to my account, where I have my credits.
00:02:28 Now we can start using our Pods.
00:02:30 To do that, go to the browse servers tab in here and in here you will see the available
00:02:37 servers.
00:02:38 If you are going to do training, then I suggest getting a server with a minimum of
00:02:45 24 gigabytes of VRAM.
00:02:46 Because currently the latest officially released xformers does not work very well for training.
00:02:53 They have a nightly version that works well, but for training we won't use xformers.
00:03:00 And if you are not going to use xformers, then you should get a server with a minimum
00:03:06 of 24 gigabytes of VRAM.
00:03:07 I find that the RTX A5000 is a very decent GPU at a lower price.
00:03:14 As you can see, it is only 0.32 dollars per hour.
00:03:19 So I am going to deploy RTX A5000 GPU.
00:03:23 When you click the deploy icon, this interface will appear to you.
00:03:28 So in this interface, you should select your template.
00:03:31 There are many templates.
00:03:33 When you type Stable Diffusion, you see there are two very popular templates for Stable
00:03:39 Diffusion.
00:03:40 RunPod Stable Diffusion 1.5 and RunPod Stable Diffusion 2.1.
00:03:43 I will start both of them and do training on both of them simultaneously.
00:03:49 So let's begin with RunPod Stable Diffusion 1.5 as a template.
00:03:54 So it will also download the official 1.5 version when it starts.
00:03:59 In here it shows us the other features.
00:04:01 They are very decent.
00:04:03 The temporary disk is the disk where the operating system will run.
00:04:08 You don't need to increase this.
00:04:10 And the persistent volume.
00:04:11 Now this is really important.
00:04:13 The persistent volume will remain as long as you don't delete your Pod.
00:04:20 So when you close your Pod, it will remain as it is.
00:04:23 It is like your hard drive.
00:04:24 It is persistent.
00:04:26 Everything you have generated or downloaded will remain there.
00:04:30 So this should be a sufficient amount of disk space based on your needs.
00:04:36 I am going to set it as 100, and when you set it, it will increase your per-minute credit spending.
00:04:44 So when you hover your mouse over this icon, it shows $0.10 per gigabyte per month
00:04:52 for total disk on running Pods, and $0.20 per gigabyte per month for volumes on exited Pods.
00:05:00 I know that this may sound confusing in the beginning, so I have prepared an example
00:05:06 for you which I will explain step by step.
00:05:09 So we have 105 gigabytes while running.
00:05:13 Why?
00:05:14 Persistent volume is 100 gigabytes and temporary disk is 5 gigabytes.
00:05:17 So while running, we are going to spend like this.
00:05:21 Let's say our Pod ran for 75 minutes.
00:05:25 So: 105 multiplied by 0.1, which is the per-gigabyte price per month.
00:05:33 How many days are there in a month?
00:05:36 30 days.
00:05:37 So we divide by 30 days.
00:05:40 How many hours are there in a day?
00:05:42 24 hours.
00:05:43 So we divide by 24 hours.
00:05:46 How many minutes are there in an hour?
00:05:48 There are 60 minutes.
00:05:49 So this is the price per minute of running, and since we are running for 75 minutes, it is
00:05:56 going to take a total of 0.018 dollars from our credit.
00:06:02 You can also copy this.
00:06:04 Open your calculator by typing calculator in your search bar, paste it and hit enter,
00:06:09 and you will get the result like this as you can see.
00:06:12 So below, I am giving an example of a Pod when it is not running.
00:06:18 When the Pod is not running, we are going to use the 100 gigabyte persistent volume, and
00:06:25 let's say our Pod remained stopped for two days.
00:06:30 So when the Pod is not running, the price is 0.20 dollars per gigabyte per month.
00:06:37 So since we have 100 gigabytes, that is 100 multiplied by 0.2. Then let's delete this part to avoid
00:06:45 further confusion. In a month
00:06:48 we have 30 days, and we are going to use two days.
00:06:52 So this will be our spending.
00:06:54 So you can also open the calculator and copy-paste it, hit enter and you will get the price.
00:07:01 So this part is the price of one day offline for your Pod with 100 gigabytes.
00:07:09 And since it will be offline for two days, this is the credit that we are going to use.
00:07:13 The very important thing is that these credits will be deducted from your account per minute.
00:07:20 So if you keep using RunPod.io service for 10 minutes, you will be charged for 10 minutes.
00:07:26 So if it remains offline for 10 minutes, then you will be charged for 10 minutes.
00:07:31 It is not like taking your credits per day, per hour, or per month.
00:07:37 It is using your credits for every minute.
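The two storage-cost calculations above can be sketched as a quick shell check, using the example rates quoted here ($0.10 per GB per month while running, $0.20 per GB per month while stopped); the numbers are the video's example figures, not live prices:

```shell
# Storage-cost sanity check; rates and figures are the example values from the video.
# 105 GB total disk (100 GB volume + 5 GB temporary disk), running for 75 minutes:
awk 'BEGIN { printf "running 75 min: $%.3f\n", 105 * 0.10 / 30 / 24 / 60 * 75 }'
# 100 GB persistent volume, stopped for 2 days:
awk 'BEGIN { printf "stopped 2 days: $%.3f\n", 100 * 0.20 / 30 * 2 }'
```

The first line reproduces the 0.018 dollar figure from the example; remember that the GPU's own per-hour price is charged on top of this storage cost.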
00:07:40 When you hover your mouse over encrypt volume, you will see the message.
00:07:44 Encrypted volumes provide better data security, but will incur a performance penalty and cannot
00:07:49 be resized later.
00:07:50 So unless you need this, don't check this box.
00:07:54 Start Jupyter Notebook.
00:07:55 This will make your life much easier.
00:07:58 And this is the price per hour for our GPU.
00:08:02 So this price will be added these volume prices as well.
00:08:07 After you click the deploy button, you will see an interface like this.
00:08:11 You can go to the My Pods section and you will see on demand community cloud is being
00:08:17 prepared.
00:08:18 When I click in here, you see it is showing me the messages of the Pod that is being prepared,
00:08:27 what is happening on the Pod.
00:08:28 And once it becomes ready, we will see the connect button here.
00:08:33 So it is initializing the Pod with the necessary installation and the Pod is now ready and
00:08:39 it is running.
00:08:40 Now I will start SD 2.1 version Pod simultaneously.
00:08:45 To do that I am clicking browse servers, and when you open the browse servers tab, you will
00:08:49 see in the right tab how many credits you are spending right now.
00:08:54 Because currently my other Pod is running, as you can see in My Pods tab.
00:09:00 So let's return back the browse servers and in here there are several options.
00:09:04 So you see there are one GPU Pods, two GPU Pods, large Pods, four GPU or x large Pods,
00:09:12 eight GPUs.
00:09:13 So if you need multiple GPUs, then you can filter them with this.
00:09:16 Also in each Pod, you will see their location, their available upload and download speeds,
00:09:22 their available disks and other things.
00:09:25 Choosing a less-used server is better, because if your server later becomes fully used, you
00:09:34 won't be able to get a GPU on it.
00:09:37 What happens then? To use your existing files, you need to create a new Pod and transfer
00:09:45 your files.
00:09:46 So availability is really important when choosing your Pod.
00:09:50 If you choose a highly preferred server, then you will have less chance of getting it, and that will
00:09:57 make things harder for you.
00:09:59 So based on this fact, you should choose your Pod.
00:10:03 So for the SD 2.1 version, I am going to pick another RTX A5000.
00:10:09 When you click more RTX A5000, it displays other locations as well.
00:10:16 You see the upload and download speeds change, and the available space changes.
00:10:22 More available space probably means that it is used less.
00:10:27 So it looks like this particular Canada server is not very much preferred.
00:10:33 So there is also Norway server.
00:10:35 You see it has great upload and download speeds.
00:10:38 It has decent hard drive space as well.
00:10:40 So it is probably also not used very much.
00:10:43 However, it is more expensive than the others.
00:10:46 So I think I will go with this Canada server.
00:10:50 Its speeds are also decent.
00:10:52 Click deploy.
00:10:54 There is one more thing as well that I need to explain.
00:10:57 Community cloud.
00:10:58 So what does community cloud mean?
00:11:01 In the community cloud section, you will be able to bid for shared servers.
00:11:06 All of the servers are shared, but here you bid, and if someone overbids
00:11:11 you, they get your GPU.
00:11:14 So in here you see the prices will be lower.
00:11:17 When I click RTX A5000 select and then I click continue.
00:11:22 So you see currently this is selected.
00:11:24 RunPod Stable Diffusion 1.5.
00:11:27 I can also change it from this template.
00:11:29 Don't forget to change template.
00:11:32 When I click continue, you see now I am getting pricing summary and advanced.
00:11:36 When I click advanced, it will allow me to bid for a spot.
00:11:41 So you see the current bid is 0.198.
00:11:44 When I bid this, I will overbid the other person who has bid less than this.
00:11:51 So I am going to get their GPU if there are no other available GPUs.
00:11:56 So let's say we bid like this and we started our Pod.
00:12:00 Then someone else comes and bids 0.2, and they will get our GPU.
00:12:06 Then our pod will not have any GPU to do inference or training and our training will be also
00:12:13 halted.
00:12:14 So be careful with this.
00:12:15 If you are not going to do training, if you are only going to do image generation, then
00:12:20 you can go with this option and spend less.
00:12:23 The running disk cost and exited disk cost also change slightly.
00:12:28 You can recalculate the cost.
00:12:30 So this is how you do bidding and this is how you use community cloud servers.
00:12:36 Since I am going to do training, I am going to use on demand server and I am going to
00:12:41 pick on demand server from here.
00:12:44 This Canada server.
00:12:46 Let's check again.
00:12:47 Yes, I am going to use this Canada server because it has the most available disk space.
00:12:52 Therefore, I am assuming that it is used less than the others.
00:12:57 Click deploy and we have selected RunPod Stable Diffusion 2.1 version.
00:13:01 Let's set our persistent volume as 100 GB and let's also deploy it so it will get deployed.
00:13:08 When I click My Pods, I will see them in here.
00:13:11 OK, when you go to My Pods, it is going to show you whether there will be maintenance or
00:13:18 not.
00:13:19 So you should be careful with this maintenance.
00:13:21 It says that it will start at this local time.
00:13:24 Therefore, I think I will delete this Pod.
00:13:27 So I will just click stop Pod and then I will delete it.
00:13:31 So when you stop your Pod, it will remain as it is.
00:13:34 However, if you click this terminate, then the Pod will be permanently deleted and you
00:13:39 won't be able to recover or access any of your data.
00:13:43 So now it is gone.
00:13:45 Let's go back to the browse servers tab and let's pick another server from here.
00:13:51 Maybe that is why it was being used less.
00:13:54 So I will pick this one.
00:13:56 OK, 2.1 version 100 GB.
00:13:59 Let's deploy.
00:14:00 Let's go to the My Pods and it is being deployed.
00:14:03 The first one we started is running.
00:14:06 The other one is being initialized and this is my per hour using credits right now.
00:14:12 OK, let's connect our first Pod.
00:14:14 To connect our first Pod.
00:14:16 I am clicking My Pods.
00:14:18 Let's refresh so you will see the interface as it is.
00:14:20 OK, I am clicking here.
00:14:22 It will open the interface.
00:14:24 When you click logs, it will show you the logs screen.
00:14:27 This is really important to debug the errors that you might encounter.
00:14:31 So it started with xformers and the workspace 1.5 emaonly CKPT file.
00:14:36 Actually, this is not the best CKPT file for training, so I will download the best one
00:14:43 and it is running on xformers 0.0.16.
00:14:47 This xformers is not compatible with DreamBooth training or Textual Inversion training, unfortunately,
00:14:54 so we won't use xformers during training and the other things are also displayed here.
00:14:59 When you click system logs, it will also show you the system logs.
00:15:02 When you click this refresh icon, it will refresh and when you click this X, it will
00:15:07 close it.
00:15:08 So let's click connect and I will connect it via Jupyter Lab, which will make our life
00:15:14 much easier.
00:15:15 OK, so our Jupyter has started like this.
00:15:19 The first thing that I am going to show you is how to change starting command line arguments.
00:15:25 To change them, I am zooming in for you to see more easily.
00:15:28 That is webui-user.sh.
00:15:33 So this is the file where the command line arguments are provided.
00:15:38 You see it is starting with default port 3000.
00:15:41 It is starting with xformers.
00:15:43 The default CKPT is provided like this and there is a listen and enable insecure access.
00:15:50 So if you wonder what these arguments are doing, there is a wiki page for the Automatic1111
00:15:55 Web UI, and you can search for the commands by copying and pasting them, and it will show
00:16:02 you: launch gradio with 0.0.0.0 as server name, allowing it to respond to network requests.
00:16:07 Actually, I am also going to add --share to be able to use it from my browser like this,
00:16:14 and enable insecure extension access means that we will be able to install extensions.
00:16:19 Make sure that these arguments are already enabled.
00:16:23 Otherwise, you won't be able to install extensions and I think we are ready.
00:16:28 I will also change the port so it does not conflict with the initially started instance.
00:16:34 Just save.
00:16:35 When you save, you will see in the bottom saving completed.
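After these edits, the arguments line in webui-user.sh might look roughly like this. This is a sketch, not the template's exact default: the 3010 port value and the precise flag set are assumptions based on what is described here, and the template's original line (including any --ckpt argument) may differ on your Pod:

```shell
# Illustrative webui-user.sh arguments line; port value is an assumption.
# --listen and --enable-insecure-extension-access were already present in the
# template; --share and the changed port are the edits described above.
export COMMANDLINE_ARGS="--port 3010 --xformers --listen --share --enable-insecure-extension-access"
```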
00:16:37 Then go to the running terminals and kernels.
00:16:40 Shut down all of the running terminals and then go back to the file browser.
00:16:46 Make sure that you are inside Stable Diffusion web UI folder.
00:16:50 Then start the terminal.
00:16:51 When you start the terminal, it will start with the folder that you are currently in.
00:16:56 You see it is the same as the folder that we are in and in here we will use relauncher.py.
00:17:03 To do that, just type python, then copy and paste the name relauncher.py, hit enter, and
00:17:10 it will restart our Web UI with the newly set command line arguments.
00:17:15 We should be able to see them in here.
00:17:17 Yes, we are seeing --port 3010 and --xformers.
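Written out as terminal commands, the restart procedure looks like this; the /workspace install path is an assumption based on the RunPod template, so adjust it if your Pod differs:

```shell
# First shut down all running terminals from Jupyter's
# "Running Terminals and Kernels" panel, then in a fresh terminal:
cd /workspace/stable-diffusion-webui   # assumed template install path
python relauncher.py                   # relaunch the Web UI with the new arguments
```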
00:17:21 In this way, you can also start multiple instances of the Web UI.
00:17:27 If you are a professional, then you can do that.
00:17:29 But if you are not, I don't suggest doing that.
00:17:32 Now we can access it from this public URL.
00:17:35 This public URL is currently not secured by a password.
00:17:39 You can also add a password in here I think.
00:17:42 Let me show you.
00:17:43 Yes, you can also set a username and password.
00:17:46 However, if you are not giving this URL to anyone, then it should be safe.
00:17:51 As you can see, our interface is started.
00:17:54 Let's start with typing a simple prompt and see what happens.
00:17:58 OK, I have prepared my prompt.
00:18:01 I hit generate and in My Pods now you will see the GPU memory used is being increased.
00:18:07 GPU utilization will also increase as it generates the images and image is already generated.
00:18:14 Let's set the batch size as eight and batch count as one hundred.
00:18:18 And let's see how it is using our GPU.
00:18:21 So let's hit the refresh.
00:18:23 So it is showing like ten seconds ago.
00:18:25 OK, now you see the GPU utilization is one hundred percent.
00:18:29 The GPU memory used is still quite low because it is also using xformers, even though
00:18:37 we are generating images in batches with eight as the batch size.
00:18:42 So each time it will generate eight images.
00:18:45 So where are these files are being saved?
00:18:49 And how can I see if any error occurs?
00:18:52 You see in the My Pods, just click the logs and you will see all of the logs here.
00:18:58 This is really important to debug the logs.
00:19:00 And in here in the terminal window, you will see what is happening.
00:19:04 So how can you open the terminal?
00:19:05 To open the terminal, go to the running terminals and kernels.
00:19:08 And let's say I have closed the terminal.
00:19:11 I double click the terminal and it will show me the terminal as here.
00:19:16 As you can see in here.
00:19:17 This is equivalent to the terminal that we have when we are running it locally
00:19:23 on our computer.
00:19:24 This is the it/s, iterations per second.
00:19:26 However, since we are generating eight images at a time, it is effectively over 24
00:19:31 it/s.
00:19:33 You need to multiply this by eight.
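That multiplication can be sanity-checked in the shell; the 3 it/s figure below is an illustrative on-screen rate, not a guaranteed number:

```shell
# The progress bar reports it/s for the whole batch, so the effective
# per-image rate is that number times the batch size (8 in this run).
awk 'BEGIN { printf "%d it/s effective\n", 3 * 8 }'
```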
00:19:35 OK, let's hit the interrupt.
00:19:37 Now I will install the DreamBooth extension.
00:19:39 To do that go to the extension tab.
00:19:41 Go to the available tab and hit load from.
00:19:44 Search DreamBooth, hit install.
00:19:47 Meanwhile, my 2.1 version Pod is also spending my time and my credit.
00:19:53 So I will just stop it.
00:19:55 So when you click stop Pod you are going to get this message, you should read it and understand
00:20:01 it.
00:20:02 OK, stopped Pod.
00:20:03 Basically, what it says is that everything that is not saved in your workspace
00:20:09 will be lost.
00:20:11 So whatever you have in your workspace will be saved.
00:20:15 OK, let's see the status of the installation.
00:20:18 OK, it says that it was installed into workspace/stable-diffusion-webui/extensions/
00:20:24 sd_dreambooth_extension.
00:20:25 Now I will restart my terminal, because when you install DreamBooth for the first time, you really
00:20:30 need to restart the terminal so that it can install the necessary dependencies.
00:20:35 So I am going to shut down all terminals.
00:20:39 Then I am going to Stable Diffusion Web UI folder and in here I will open a new terminal.
00:20:46 Same as before, I will type Python and relauncher.py and hit enter.
00:20:51 So the Web UI has been restarted and now we got a new link.
00:20:56 Let's copy and paste it.
00:20:58 Meanwhile, it is being loaded let's check out the generated images.
00:21:01 So they are saved in the outputs folder, in the txt2img-images folder.
00:21:07 And yes, they are in here.
00:21:09 So how to download them?
00:21:10 You can download them one by one, right click and download.
00:21:13 Then it will download like this.
00:21:16 You can alternatively right click and download current folder as an archive.
00:21:20 It will first make archive and it will download all of the images like this.
00:21:25 It is a decent speed and it has downloaded all of these images.
00:21:31 121 files so far.
00:21:32 OK, the interface has been reloaded and now we are seeing the DreamBooth extension.
00:21:38 When we go to the extension tab, check for updates.
00:21:41 We should see the latest version in here.
00:21:44 Actually, it says that it is behind.
00:21:46 So let's click apply and restart UI.
00:21:49 And once we do that, we get an error.
00:21:53 It is relaunching in two seconds.
00:21:55 OK, when relaunching, we are getting a port error because the previous one crashed.
00:22:05 So what I'm going to do is shut down all of the terminals.
00:22:05 Go back to the file browser.
00:22:07 In the first installation, you may encounter such errors.
00:22:10 Go to the webui-user.sh file and change the port here, and then go to the terminal tab.
00:22:18 Open a new terminal like this and type Python relauncher.py.
00:22:22 It will restart, and when restarting, it now shows us the DreamBooth revision and
00:22:28 the SD Web UI revision like this.
00:22:30 I will just start training.
00:22:31 OK, it has been restarted.
00:22:33 Let's open the new URL.
00:22:35 OK, currently it is selected as 1.5 pruned emaonly CKPT and in the DreamBooth tab.
00:22:42 When we are going to generate a new training model, this is the only available model.
00:22:48 However, 1.5 pruned CKPT is better than emaonly for training.
00:22:53 Therefore, I am going to download this CKPT file.
00:22:56 So how am I going to download it?
00:22:58 You see there is a download button in here.
00:23:01 I am right clicking and copying link address.
00:23:04 But before doing that, let's start a new terminal.
00:23:06 To do that, I am going to click this new plus icon here.
00:23:10 It will open a new launcher.
00:23:11 Hit terminal.
00:23:13 For a fast download, I am going to use runpodctl.
00:23:17 runpodctl allows us to quickly download or upload files from Pod to Pod, or
00:23:24 from Windows to a Pod and vice versa.
00:23:27 There are different versions.
00:23:29 I am going to install the Linux one on my RunPod.
00:23:32 So I am selecting it like this and copying it.
00:23:35 Then in my terminal I am pasting it with control V and I am hitting enter.
00:23:41 It will install the latest RunPod CTL.
00:23:44 After this command, type runpodctl and hit enter, and you should get a message like this.
00:23:50 That means that it has been successfully installed or it was already installed.
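Once installed, a runpodctl transfer follows the one-time-code pattern from the runpodctl README; the file name and code below are placeholders:

```shell
# On the machine that has the file:
runpodctl send my_dataset.zip    # prints a one-time code to share
# On the machine that should receive it (code shown is a placeholder):
runpodctl receive 8338-galileo-collect-fidel
```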
00:23:55 Now, how are we going to download this pruned CKPT file?
00:23:57 To download it, first enter the folder where you want to download it, which is inside models, inside
00:24:05 Stable Diffusion.
00:24:06 And here, where we want to download our model file, I am going to click this
00:24:12 plus new launcher icon, launch a new terminal, and in this new terminal, this is the folder
00:24:18 where we are right now.
00:24:19 Now, to download, type wget, copy this URL, paste it, and hit enter, and it will get downloaded
00:24:29 inside this folder.
00:24:30 By the way, runpodctl is not necessary to download this file, but we will use it to
00:24:37 send data and get data from RunPod to our computer, or from computer to RunPod, or from
00:24:44 RunPod to RunPod.
00:24:45 wget is a Unix command, and an alternative is available on Windows as well.
00:24:52 So with this wget command, you can quickly download files into your RunPod folders like
00:24:59 this.
00:25:00 So you see, it is currently downloading at 90 megabytes per second, which is a pretty decent
00:25:05 speed.
00:25:06 Okay the download has been completed and now the file is located in here.
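The wget pattern used here, sketched with a placeholder URL; copy the real address from the file's download button on Hugging Face (right click, Copy link address):

```shell
# Download a checkpoint straight into the Web UI's model folder.
# The path is assumed from the RunPod template; the URL is a placeholder.
cd /workspace/stable-diffusion-webui/models/Stable-diffusion
wget "https://huggingface.co/<repo>/resolve/main/<model>.ckpt"
```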
00:25:10 Then what we are going to do is hit the refresh button here, and now I can see the 1.5 pruned
00:25:18 CKPT as well.
00:25:19 This is the way to download models from Hugging Face or wherever they are hosted.
00:25:24 If you can get direct link of it I will show examples.
00:25:28 Don't worry.
00:25:29 So now I will start DreamBooth training with the best possible settings.
00:25:33 First let's switch to 1.5 pruned CKPT.
00:25:36 This is not strictly necessary, but I am not sure it works as expected otherwise.
00:25:40 So I am making sure I have selected the target model here as well.
00:25:45 So it has been loaded.
00:25:47 If it doesn't get loaded.
00:25:48 You should check the terminal window.
00:25:50 It is running on here.
00:25:52 It will show what is happening and you can also check the logs window in here.
00:25:57 It will show what is happening.
00:25:59 Okay now let's give a name to our training.
00:26:01 Let's say test SD 15, and check the source checkpoint.
00:26:05 So you see it is not seeing my latest checkpoint.
00:26:07 I am clicking refresh and I am checking the latest checkpoint.
00:26:11 This is very good for teaching faces.
00:26:14 The 1.5 pruned CKPT 512x model is selected, and I hit create model.
00:26:20 I am not changing other parameters because optimal parameters are currently selected.
00:26:26 These are more experimental things, or things for more professional people.
00:26:31 And in the terminal you see it is downloading the necessary files right now.
00:26:36 That is why it is waiting.
00:26:37 Okay it says that checkpoint successfully extracted.
00:26:41 So the model has been generated.
00:26:42 However as you can see, the interface is frozen.
00:26:46 Unfortunately, this is a problem with Gradio.
00:26:49 So what we are going to do is refresh and reload this page, and now it says no interface
00:26:54 is running.
00:26:55 It looks like the interface has been terminated unexpectedly.
00:27:01 And what do we see in the terminal in here in the system logs.
00:27:05 Okay it doesn't show anything and it doesn't show anything in here either.
00:27:09 So let's check out our terminals.
00:27:11 Terminal one, which is our main terminal, and yes, it is not showing anything.
00:27:17 So what can we do?
00:27:19 We need to restart.
00:27:20 To restart, I will shut down all terminals and follow the same procedure.
00:27:24 Open terminal.
00:27:26 However, currently we are inside the models/Stable-diffusion folder, so it won't work.
00:27:29 We need to move to the parent folder.
00:27:32 To move to the parent folder,
00:27:33 I am closing this terminal and going to the folders tab.
00:27:37 I am navigating like this and opening a new terminal:
00:27:40 python relauncher.py. In My Pods, the current GPU memory usage is only 11 percent.
00:27:46 That is good; it means no other terminal or instance of the Web UI is running.
00:27:52 There are also some warning messages here.
00:27:55 I think we can ignore them.
00:27:57 Okay it has started.
00:27:58 I am opening this URL.
00:28:00 I am going to the DreamBooth tab and now I will select my model, because I already created it, and
00:28:06 it is selected.
00:28:07 Let's set up the settings.
00:28:09 Okay, I won't check this checkbox because it usually causes me problems.
00:28:14 How many steps per image?
00:28:15 I am going to use 12 images and I am going to train for up to 200 epochs.
00:28:20 I will save the model every 10 epochs.
00:28:24 Be careful with this, because each save will take about five gigabytes of space, and with a save every
00:28:32 10 epochs, it is going to make 20 saves.
00:28:35 So it is going to fill my entire disk.
00:28:38 So I think I will make this 180 or 160.
00:28:42 This should be sufficient.
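The disk-space concern can be checked with quick shell arithmetic, using the video's rough figure of about 5 GB per saved checkpoint:

```shell
# Saves = epochs / save-frequency; each save writes roughly a full ckpt.
epochs=200
save_every=10
gb_per_ckpt=5
saves=$(( epochs / save_every ))
echo "saves: $saves, approx disk: $(( saves * gb_per_ckpt )) GB"
# prints: saves: 20, approx disk: 100 GB
```

At 200 epochs that is about 100 GB, which is why dropping to 160 or 180 epochs, or saving less frequently, matters on a small pod volume.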
00:28:44 If you don't know what these parameters are or how I am setting them,
00:28:48 I have an excellent DreamBooth tutorial on my YouTube channel.
00:28:53 You should watch this definitely to learn more about DreamBooth training.
00:28:57 Okay, the batch size is one.
00:28:59 Gradient accumulation steps are one. The class batch size determines how many images
00:29:04 are generated at a time for the classification images; it is not related to training.
00:29:11 I will set this to 16 because this graphics card has huge VRAM, but if we get an error, I
00:29:17 will reduce it.
00:29:18 Set gradients to none when zeroing.
00:29:20 Okay correct.
00:29:21 I am going to use half the learning rate.
00:29:24 I am going to use the sanity prompt photo of ohwx man by Tomer Hanuka.
00:29:30 I explain what these are for;
00:29:33 actually, I am explaining what they are for in detail in this tutorial.
00:29:37 The sanity prompt is for checking whether the model is over-trained or not.
00:29:41 And here I am going to use EMA.
00:29:43 This will improve my training success rate, and I have 24 gigabytes of VRAM.
00:29:48 I will use 8-bit Adam.
00:29:50 I am going to use mixed precision, and I am going to use fp16, because bf16 is not
00:29:57 supported by all graphics cards.
00:29:58 It is only supported by newer cards, such as the RTX 3000 series.
00:30:03 I am not sure about this card either.
00:30:05 So fp16 is the safest option for all cards.
00:30:09 I am not going to use xformers.
00:30:11 This is important, because the current xformers does not support DreamBooth training
00:30:17 or Textual Inversion training.
00:30:18 You see, this is xformers 0.0.16.
00:30:21 I think it will become compatible with xformers 0.0.17 when it is officially released.
00:30:28 Currently the nightly version supports it as well, as far as I know.
00:30:32 Cache latents.
00:30:33 Yes it will improve speed.
00:30:35 Train UNET.
00:30:36 Okay these are the optimal settings actually, so no need to change them.
00:30:40 And in here concepts.
00:30:41 Okay first we need to upload our training data set.
00:30:44 To do that, go to the stable-diffusion-webui folder or to workspace.
00:30:48 It doesn't matter; I will upload them to workspace.
00:30:51 In here create new folder training data set.
00:30:55 I have named the folder like this.
00:30:57 Enter inside folder and click upload files.
00:31:00 Select the files from your computer.
00:31:03 Since I don't have many files, I am going to use this method for now, and you see I
00:31:08 have only nine images, which are pretty close-up shots.
00:31:12 No repeated backgrounds,
00:31:13 no repeated clothes, as you can see.
00:31:16 I explain what makes a good training data set in this video, and they are getting
00:31:21 uploaded.
00:31:22 We could also use runpodctl.
00:31:24 However, since there aren't many files, I am using this method for this task, and our
00:31:31 training data set is ready.
00:31:33 Okay, now we need to give the path of it.
00:31:35 To give the path of it,
00:31:36 go back to the workspace like this: right click, copy path, paste it like this, and put
00:31:42 a slash at the beginning of it. And where do we want the regularization images to be generated?
00:31:48 I am copy-pasting like this and I will type classification images.
00:31:53 Okay, filewords.
00:31:55 For training faces I am not using filewords.
00:31:57 They are more useful when fine-tuning your model with lots of tokens and lots of good
00:32:05 images.
00:32:06 If you wonder how filewords work,
00:32:09 I explain how they actually work in this short video.
00:32:13 So I'm just skipping filewords and going to prompts.
00:32:17 So our instance prompt will be ohwx man.
00:32:20 Ohwx is our rare token and man is our class.
00:32:24 Class prompt will be photo of man since I am teaching a face of a man.
00:32:29 Sample prompt will be simply photo of ohwx man.
00:32:33 I am not going to set negative prompt or other things.
00:32:36 How many classification/regularization images do we want per training image?
00:32:42 I have nine training images and I want 50 per image.
00:32:46 This is actually a debated topic; there is no precise answer for how many is best.
00:32:52 In the official DreamBooth paper, the authors used 200, so you can also try 100
00:32:58 like this as well.
00:32:59 Okay, then go to the saving tab: generate a ckpt file when saving during training.
00:33:04 So we will be able to generate a checkpoint every 10 epochs, and then we will be able
00:33:11 to compare them to see which checkpoint is performing best, which checkpoint
00:33:19 has learned our subject best, and this way you can avoid over-training.
00:33:26 And once you are ready, click save settings and hit train.
00:33:30 First it will start with generating class images.
00:33:32 In my pod I will see GPU utilization and memory usage.
00:33:35 Okay, it says: exception training model, no executable batch size found, reached zero.
00:33:42 Why did we get this error? Because we set the classification images batch size pretty big.
00:33:50 Let's make it, say, six and try again.
00:33:54 And now I am seeing that it is generating six images at a time.
00:33:59 The it/s looks pretty low, actually, only 12, but we need to multiply this by six, and we are
00:34:07 seeing the images are being generated.
00:34:09 They will be saved in the workspace, in the classification images directory, like this. If you have previously
00:34:18 generated images on your computer, then you can alternatively upload them.
00:34:22 To upload them, I will install runpodctl on my Windows machine.
00:34:28 To do that, I am going to run this command in Windows PowerShell.
00:34:33 Type powershell, right click, and hit enter.
00:34:37 Okay, the installation has been completed; runpodctl is now available in my command
00:34:42 prompt. Let's see: runpodctl, and now I am seeing it.
00:34:48 So I have previously generated 2400 images on my hard drive.
00:34:54 I am going to share this with runpodctl to download them in RunPod.
00:34:59 Alternatively, you can use the upload method as well.
00:35:03 It also works, but for bigger files, runpodctl is better.
00:35:08 So to share the folder, type runpodctl send and the folder path, like this.
00:35:16 To get the folder path more easily, copy the folder path from here, paste it into Notepad
00:35:22 like this,
00:35:23 put quotation marks at the beginning and end, and type it in your cmd.
00:35:29 I will show it from the beginning once again.
00:35:32 Open cmd, type runpodctl send, paste the path like this, and it will prepare the archive.
00:35:39 It says that photo of man zip already exists, because we used it in another cmd window.
00:35:46 So I need to delete this file.
00:35:49 Okay, this zip file is generated inside local disk c users and my username directory.
00:35:56 I am just going to delete it and I will run the command once again.
00:36:00 It will quickly prepare all of the files and now share link is generated I am copying this,
00:36:07 selecting it ctrl c or select it right click from here and copy, then go back to your Jupyter
00:36:14 Lab where your RunPod is running and in here I will make a new folder like this: ready
00:36:21 class.
00:36:22 I will enter inside ready class folder, then I will open a new terminal like this and I
00:36:28 will paste the command.
00:36:31 You see runpodctl receive the URL it has generated, hit enter.
00:36:36 It will connect to my computer and it will start downloading all of the files very quickly.
00:36:41 So this is how you can upload files from your computer to the remote RunPod.
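The whole send/receive round trip can be summarized like this; the folder paths are examples, and the one-time code shown is made up, since runpodctl prints the real code when you run send:

```shell
# On the machine that has the files (here: a Windows cmd prompt or any shell).
# Quote the path if it contains spaces; send prints a one-time receive code.
runpodctl send "C:\Users\me\photo of man"

# On the receiving side (a RunPod Jupyter terminal), first cd into the
# folder where the files should land, then paste the printed command:
cd /workspace/ready_class      # example folder, created beforehand
runpodctl receive 1234-example-made-up-code   # placeholder code
```

The same pattern works in every direction: computer to pod, pod to computer, and pod to pod.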
00:36:48 The same thing works RunPod to RunPod, and all of this works vice versa:
00:36:53 RunPod to computer, RunPod to RunPod, computer to RunPod.
00:36:57 You can send and receive files like this.
00:37:00 This of course totally depends on my upload speed.
00:37:03 So when I open my task manager I see that it is using all of my available upload speed
00:37:09 like this.
00:37:10 This is pretty useful and convenient.
00:37:13 Instead of generating new classification images each time which uses your GPU time and consumes
00:37:20 your credits, you can prepare them on your computer and then quickly upload them to your
00:37:25 RunPod.
00:37:27 You can also upload them to any hosting, website, or other places that has better upload speed
00:37:33 and download them with the wget command as I have shown to download ckpt file.
00:37:41 RunPodCTL is extremely useful to upload and download files as you can see.
00:37:48 Okay 2400 photo of man.
00:37:51 The classification regularization images upload have been completed.
00:37:55 Now I see that it is uploaded as a zip here.
00:38:01 I need to extract them. Oh,
00:38:05 it has been automatically extracted, as you can see after a refresh.
00:38:05 Now they are here.
00:38:07 So what am I going to do is I will cancel training and I will give this folder.
00:38:13 So I will just skip image generation.
00:38:16 So it has been cancelled.
00:38:17 Let's give the new folder.
00:38:20 In concepts, type the new folder name here and click save settings. Okay, it looks like the
00:38:27 train button has not appeared.
00:38:29 So what we need to do is refresh and reload.
00:38:34 Okay reloaded.
00:38:35 Go to the DreamBooth select the model, hit load settings, verify the settings are properly
00:38:41 loaded.
00:38:42 Okay, this setting is not being saved, so you should uncheck it.
00:38:45 Okay, all settings are looking good; click train.
00:38:49 Now it won't generate any new classification/regularization images, because we already provided them.
00:38:54 We can see that in the terminal window in here.
00:38:58 So you see it is processing the uploaded photo of man images.
00:39:02 Then it is going to cache the classification images with caching latents.
00:39:07 Okay, the training has started.
00:39:09 It has a pretty good speed as you can see.
00:39:13 It is supposed to do 180 epochs in less than 15 minutes.
00:39:17 However, this will take a little bit more time because it will generate ckpt during
00:39:22 the training.
00:39:23 We can also watch the training here.
00:39:25 However, you may get disconnected from the Gradio interface.
00:39:30 If that happens, you can just watch the command line interface from here to know the status
00:39:35 of the training.
00:39:37 Okay, 10 epochs have been completed so it started generating the initial images as you
00:39:42 can see.
00:39:43 It also generated a checkpoint.
00:39:45 Where can we see the checkpoint?
00:39:47 Go to the workspace, go to the Stable Diffusion Web UI, go to the models folder, go to the
00:39:52 Stable Diffusion folder, and in here you will see our training name, go inside that folder
00:39:58 and now we can see the checkpoints being generated.
00:40:01 Then we will test each one of them with x/y plot and see how they are performing.
00:40:07 So if you want to see the generated samples during training, go to the models folder,
00:40:12 go to the DreamBooth folder, go to your training named folder, and in here you will see samples.
00:40:18 So these are the samples being generated during training and when you click the txt file,
00:40:24 you will see which prompt was used to generate this image.
00:40:27 When you double click the image, it will open the image like this.
00:40:30 So far it does not look like me.
00:40:34 When you go to the My Pods, you can see the GPU utilization and GPU memory being used.
00:40:39 The GPU memory is almost full because we are using EMA and we are not using xformers.
00:40:45 That is because in the settings tab we checked use EMA, and for memory attention we didn't
00:40:49 use xformers.
00:40:50 These two heavily increase the memory usage.
00:40:55 Also, we didn't check gradient checkpointing.
00:40:58 That also reduces VRAM usage;
00:41:01 however, if you have a sufficient amount of VRAM, you shouldn't check it either.
00:41:05 Okay, even after 130 epochs, it is still not learning even though it shows a good loss
00:41:11 rate.
00:41:12 That means that there is a bug currently with DreamBooth extension.
00:41:16 Therefore, I have cancelled the training.
00:41:18 Now I will delete the folder to free up space.
00:41:22 Right click the folder.
00:41:23 Delete it.
00:41:24 It says that it is not empty, so you can't delete it.
00:41:27 However, we can.
00:41:28 Now I will show you how to do it.
00:41:30 Click new, open a new terminal, and type rm -r and the directory name, test sd15.
00:41:39 It will recursively delete all of the files and the folder.
00:41:42 After we refresh, it is gone.
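The same rm -r trick can be reproduced in any scratch directory, which is a safe way to try it before pointing it at a real training folder:

```shell
# Recreate the situation: a non-empty folder that the Jupyter
# file browser refuses to delete.
mkdir -p test_sd15/checkpoints
touch test_sd15/checkpoints/dummy.ckpt

rm -r test_sd15        # -r removes the folder and everything inside it

[ ! -d test_sd15 ] && echo "test_sd15 is gone"
# prints: test_sd15 is gone
```

Be careful with the path you pass: rm -r does not ask for confirmation.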
00:41:44 Now I will figure out the problem and show you the working settings and setup.
00:41:50 So I have figured out the problem and the problem was exactly as I have guessed it.
00:41:55 It was using xformers even though we didn't select use xformers.
00:42:02 In the settings, we had used memory attention default.
00:42:06 However, it was still using xformers.
00:42:09 So what did I do to fix this problem?
00:42:12 It is simple.
00:42:13 I opened the webui-user.sh file and removed --xformers
00:42:21 from the command line arguments.
00:42:22 Then I restarted my Web UI.
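In webui-user.sh the fix amounts to editing the COMMANDLINE_ARGS line; the remaining flags shown here are an assumption, since the video does not list the full argument string:

```shell
# webui-user.sh (fragment)
# Before:
#   export COMMANDLINE_ARGS="--share --xformers"
# After, with --xformers removed so DreamBooth training is not affected:
export COMMANDLINE_ARGS="--share"
```

After saving, restart the Web UI (python relauncher.py) so the new arguments take effect.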
00:42:25 Then I composed a new training with exactly the same parameters, and it worked very
00:42:31 well.
00:42:32 The training has been completed, so let's download the samples and check them out on
00:42:37 our computer.
00:42:38 To download the folder of samples, I will use runpodctl command.
00:42:42 So what I need to do is I will enter the samples folders.
00:42:47 So to do that, go to the models folder, go to the DreamBooth, go to the training folder
00:42:52 name so the samples are located here.
00:42:55 Open a new command terminal, write runpodctl send samples which is the folder name and
00:43:03 it will zip the samples folder and generate a receive command.
00:43:07 Copy it with ctrl c.
00:43:09 First, I need to add the path of runpodctl to my environment.
00:43:15 Currently, runpodctl.exe is located inside my user folder.
00:43:21 Go to Users, then your username, and I will copy the runpodctl yaml and runpodctl exe files.
00:43:27 Copy them.
00:43:28 Then I will make a new folder on my C drive called runpod exe.
00:43:33 Paste them there.
00:43:34 Then in the search bar, search for environment. It will open Edit environment variables, like
00:43:41 here.
00:43:42 In here I am going to add a path entry to the system variables: go to Path, click edit, and
00:43:49 in here click browse, and select the folder where you copy-pasted the files, which is on the C drive:
00:43:56 runpod exe. Click OK; now the runpod exe folder is registered in my path.
00:44:02 Click OK, click OK, click OK, and now runpodctl should be available to call from anywhere.
00:44:09 Where do I want to download?
00:44:10 I want to download the files inside my Pictures folder, inside test samples.
00:44:16 I type cmd here.
00:44:18 So currently this is where I am.
00:44:20 Now I will copy and paste this command into my cmd window.
00:44:25 And yes, it is running as expected and the files are being copied into my folder.
00:44:33 And then they are automatically extracted with the folder name.
00:44:36 So in here we are able to see the generated sample images.
00:44:40 I can say that after 800 steps it started to resemble me, and we trained
00:44:48 it for a total of 160 epochs, 3200 steps; we can see the examples here.
00:44:55 Okay, this is pretty much like me, so with good prompting I think we can get good results.
00:45:03 So let's try all of the checkpoints to see which one is working best.
00:45:07 How are we going to do that?
00:45:09 We are going to do that with text to image tab and in here we are going to use x/y/z
00:45:14 plot.
00:45:15 Okay, it didn't appear.
00:45:16 Let's refresh.
00:45:17 Oh, looks like our instance is closed so I will restart.
00:45:22 So before restarting make sure that you have closed all of the running terminals and I
00:45:27 will also close all of the open tabs.
00:45:29 Okay, all of the tabs and terminals are closed.
00:45:33 Okay, Web UI is restarted.
00:45:35 Let's open it.
00:45:36 Okay, now we can also see the checkpoints in here so you can test particularly one of
00:45:43 them.
00:45:44 But I am going to do xyz plot test.
00:45:47 But before that, let's decide our testing prompt.
00:45:51 So I am going to make my tests on 2200 step checkpoint.
00:45:58 I am going to select it from here.
00:46:00 First, let's see the raw prompt.
00:46:02 Ohwx man.
00:46:03 Okay, this is the raw prompt and it looks pretty decent.
00:46:08 This is the training data set you see.
00:46:11 It looks pretty decent, but it looks like there is some memorization.
00:46:15 Actually, not exactly memorization:
00:46:18 the clothing is similar but not exactly the same.
00:46:20 Okay, while doing testing, my Web UI has been killed.
00:46:24 So I have checked the terminal to see the message.
00:46:28 So you should be careful: if some error happens,
00:46:31 make sure to check the terminal to see what is happening behind the scenes. And
00:46:36 now it is not able to restart.
00:46:39 Therefore, I will close all of the terminals and start with a different port.
00:46:46 To do that, you need to go to the terminals tab, shut down all, and change the webui-user.sh
00:46:53 file: change the port from here.
00:46:56 Save and restart.
00:46:57 Okay, I composed a simple prompt like this: photo of (ohwx man:1.2). You can learn about emphasis
00:47:05 on the wiki page of Automatic1111.
00:47:09 Just pause the video and read it there if you don't know.
00:47:12 And digital painting, artstation, masterpiece.
00:47:14 I don't have any negative prompts.
00:47:17 The picture is not exactly like me.
00:47:19 So now we are ready to do the test and see if the model is trained enough.
00:47:23 If it is not trained enough, then go to the DreamBooth tab, select the model load settings
00:47:29 and continue training.
00:47:31 It will continue training for the number of steps that you have defined in here.
00:47:36 Okay, I started continuing the training, and it will start from this model revision, which means
00:47:42 it will start from 3200 steps and continue training for the number of epochs
00:47:50 that we have defined here.
00:47:52 However, my Gradio crashed once again, and I am able to watch the continuing training from
00:47:59 here.
00:48:00 Now let's test the current checkpoints and see whether they are trained enough or not
00:48:05 and decide upon that to continue training or not.
00:48:08 However, since my Gradio crashed, I have to restart the terminal, because there is no
00:48:14 way to cancel the training right now.
00:48:16 Yes, there is no other way.
00:48:18 Okay, I did a restart.
00:48:20 So how are we going to test different checkpoints,
00:48:24 prompt emphasis, and CFG values?
00:48:26 Go to the bottom, pick x/y/z plot and in here you see there are different type of parameters.
00:48:33 So first parameter will be checkpoint name.
00:48:36 When you click this icon it will paste the available checkpoints.
00:48:40 I am going to start picking from 1600 steps, which means 80 epochs for me.
00:48:46 It depends on your training dataset size. I will test the remaining ones as well, like
00:48:52 this.
00:48:53 It is also displaying the calculated hash value.
00:48:56 Okay, as a second thing, I am going to test the prompt strength.
00:49:00 To do that, I am going to use prompt S/R.
00:49:02 So I am going to put a placeholder keyword, prsr, in the prompt.
00:49:07 The first value here will be prsr.
00:49:10 Then I will type the prompt strengths: 1.1, 1.2, 1.3; let's also try 1.0, 1.4, 1.5,
00:49:20 1.6, and 1.7. Okay, as a third comparison, I am going to test the CFG value.
00:49:28 So for the CFG values, I am going to test 7, 7.5, 8, 8.5,
00:49:34 9, 9.5, and 10.
00:49:37 If you keep -1 for seeds, then you won't be able to compare them very well.
00:49:42 So do not check this checkbox; that way it will use the same seed for all of the comparisons, and then
00:49:48 when you click generate, it will process all of them.
00:49:52 You can see the process in the command line interface.
00:49:56 Now meanwhile this is running I will start my 2.1 version RunPod as well.
00:50:02 Okay, it says that there is no available GPU for this RunPod right now so I can start it
00:50:09 without a GPU and transfer my files with runpodctl.
00:50:14 However, I do not have any files on it so I will just delete it because I didn't even
00:50:20 start it yet and I will start a new one.
00:50:22 Okay, I am going to use this one.
00:50:26 Select the template from here.
00:50:27 I will pick the Stable Diffusion 2.1 version. I will start with 100 gigabytes and deploy my
00:50:34 pod. It is being initialized, and my other pod is currently running at this kind of
00:50:40 it/s.
00:50:41 By the way, xformers is still not enabled right now, so if you enable it, this will
00:50:46 become even faster.
00:50:47 But for training, make sure that you have disabled it.
00:50:50 And images are being generated in here.
00:50:52 We will download all of them and check all of them later.
00:50:56 Okay, 2.1 version is being generated and getting ready.
00:51:01 Okay, 2.1 is now ready.
00:51:03 Just click connect.
00:51:05 Connect to the Jupyter.
00:51:06 Okay, it says that it cannot connect yet so it is probably still not ready.
00:51:10 Let's wait.
00:51:11 Try again.
00:51:13 Okay, let's refresh the page.
00:51:14 Maybe the URL is incorrect.
00:51:17 Yes, after the refresh I think it is fixed or it is just started.
00:51:21 So just be patient a little bit.
00:51:23 It is getting loaded and yes, 2.1 version is started.
00:51:27 It is exactly the same as the previous one.
00:51:31 We are editing the command line arguments here.
00:51:33 I will add --share so I can use it as I want.
00:51:37 And I will also remove xformers, because it prevents training.
00:51:42 I will set the port to 3001.
00:51:45 Save it.
00:51:46 Then, since there are no open terminals,
00:51:49 let's open a new launcher terminal: python relauncher.py. Our comparison of the SD 1.5 trained
00:51:57 models is continuing.
00:51:58 Okay, 2.1 RunPod is ready.
00:52:02 Let's start it.
00:52:03 Okay, currently selected model is 2.1 version.
00:52:06 Let's test it.
00:52:07 Okay, I have written my prompt; the output resolution is 768 by 768.
00:52:14 Looks like we got a problem.
00:52:17 It says that a tensor with all NaNs was produced in Unet.
00:52:21 So we need to add the --no-half argument to the command line, because otherwise
00:52:28 it won't work with this graphics card.
00:52:29 So let's go back to the RunPod.
00:52:31 Open webui-user.sh.
00:52:34 So for SD 2.1 version, make sure that you are using these command line arguments.
00:52:41 These may be necessary for some of the custom models as well.
00:52:44 So check the messages that you see in here.
00:52:47 This message should be available also in the terminal window.
00:52:51 Yes, you can also see the error in here as well.
00:52:55 So I will close the terminal and restart it.
00:52:58 Currently I am spending 0.669 dollars per hour.
00:53:03 Both of my RunPods are running right now.
00:53:06 Okay, it looks like I have mistyped --precision.
00:53:12 It says: argument --precision: expected one argument.
00:53:15 I will just fix it quickly.
00:53:17 To fix it, I am opening the file and setting --precision full, saving
00:53:24 it, and restarting.
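For the 2.1 pod, the resulting argument line looks something like the sketch below; --no-half and --precision full are the fixes discussed, while --share and --port 3001 come from the earlier setup and are assumptions about the full string:

```shell
# webui-user.sh (fragment) for the SD 2.1 pod on this GPU.
# --no-half avoids the "A tensor with all NaNs was produced in Unet" error.
# --precision needs an explicit value ("full"); leaving it bare triggers
# "argument --precision: expected one argument" from the argument parser.
export COMMANDLINE_ARGS="--share --port 3001 --no-half --precision full"
```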
00:53:26 Make sure that you only have one active running terminal, otherwise other terminals will also
00:53:32 consume your VRAM memory.
00:53:35 You can also see the VRAM memory usage in your My Pods tab and you can see the logs
00:53:41 from here.
00:53:42 This is really important to debug the errors.
00:53:44 Okay, it is started with these command line arguments exactly like this.
00:53:49 Let's open the Gradio window.
00:53:51 Okay, let's hit generate with our written prompt and it is getting generated.
00:53:57 And we got our tank image.
00:54:00 Now I will install the extension exactly the same way as I did before.
00:54:04 Okay, 2.1 version is ready with DreamBooth now.
00:54:07 Go to the DreamBooth tab, make a new model, which I will name test, and select the source checkpoint.
00:54:13 Uncheck the 512 model option.
00:54:14 Hit create.
00:54:15 The first time you hit create, it downloads the necessary files, same as
00:54:19 before, because this is a new RunPod, so they are not connected.
00:54:23 This is a fresh installation. And: checkpoint successfully extracted, so it is ready.
00:54:29 Okay, we didn't get any error so we can continue.
00:54:32 So for the 2.1 version you usually need more epochs, so I will set this to 300.
00:54:38 However, now it will also use more space
00:54:42 due to the extra epochs, so I need to reduce the save model frequency.
00:54:47 I think I will save every 20 epochs.
00:54:50 Batch size one, gradient accumulation one, class batch size will be four.
00:54:55 I am not going to set gradient checkpointing.
00:54:58 You can also leave the learning rate as default.
00:55:01 Raising it would make it learn faster; however, it may also not learn very well, or it may
00:55:07 get over-trained quickly.
00:55:09 So I will make this one.
00:55:10 But you can also leave it as default.
00:55:13 So the other things are same.
00:55:14 Now, with the 2.1 version, I don't know if 24 gigabytes will be enough without xformers when we use
00:55:20 EMA, so I will test it.
00:55:23 Okay, let's make it like this.
00:55:26 Actually, we should click the performance wizard so it will set the optimal values for us.
00:55:32 Okay, I am leaving the settings like this.
00:55:35 Let's also set the memory attention to default and see if it will work.
00:55:39 By the way, we also need to re-upload our training images, and these training images
00:55:44 have to be 768 pixels, because this is a 768-pixel model.
00:55:52 So to upload them I am following just the same things.
00:55:56 Here my 768 pixel images.
00:55:59 I'm just going to use drag and drop, but you can use runpodctl as well, as I have demonstrated.
00:56:06 Okay they are ready.
00:56:07 So I am right-clicking, copy path, pasting it, and adding a slash at the beginning.
00:56:12 Copy this, and let's say class 768.
00:56:17 All other settings are the same.
00:56:19 Ohwx man, photo of man, photo of ohwx man, and I will use only 12 class images per image, because I want
00:56:28 training to start quickly, but you should use a bigger number.
00:56:32 I am checking generate ckpt during training, click save settings, and hit train.
00:56:39 So it will start with generating class images.
00:56:42 So for each image we are generating 12.
00:56:44 Okay, we got an error.
00:56:46 Therefore, we need to decrease the class batch size.
00:56:49 Let's hit train again.
00:56:50 Okay, it looks like our Gradio was killed, so it has to be restarted.
00:56:55 You may get these errors.
00:56:56 Okay, during the restart it throws an error because the port is still in use.
00:57:01 So I am going to close the terminal, change the port, and restart it manually myself.
00:57:07 Okay, restart has been completed.
00:57:09 Let's go to the DreamBooth, select model load settings, just quickly verify settings.
00:57:14 I am unchecking this because it is usually problematic.
00:57:17 Class batch size is two and let's hit train.
00:57:21 You can also generate classification images from text to image directly yourself.
00:57:25 Cut the generated images and put them into a new folder.
00:57:28 Okay, we got error once again.
00:57:30 This is a memory error actually.
00:57:32 When we check the command line interface, we can see the memory error.
00:57:37 So it looks like our only option is class batch size one.
00:57:40 Let's click train.
00:57:41 Okay, it is working.
00:57:42 However, this will be very slow.
00:57:44 So what am I going to do is?
00:57:45 I will enable xformers, manually generate from text to image and use them as classification
00:57:51 images which will save our time significantly.
00:57:55 So follow me how am I doing.
00:57:57 First, I will just terminate the terminal from here.
00:58:00 I will add dash dash xformers, change the port and restart python relauncher.py I would
00:58:08 also clear text to images tab so you can directly use it so I will just rename it.
00:58:14 It will generate a new folder for me and new app is started with xformers.
00:58:19 Let's open it!
00:58:20 So our class prompt is photo of man.
00:58:23 I am typing photo of man.
00:58:25 I am going to set the sampling steps to 30, which is decent enough, and I am leaving
00:58:30 all other options the same. I will use a batch size of eight. And how many images in total
00:58:37 do we need?
00:58:38 Let's say 50 images per training image; since I have nine images, I am going to generate
00:58:44 450 images.
00:58:46 Therefore I need to set the batch count to a minimum of 57, then hit generate, and let's see if we will
00:58:53 get out of memory error.
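The required batch count is just a ceiling division, which you can check in the shell:

```shell
# Batches needed to cover 9 training images x 50 class images each,
# generating 8 images per batch (ceiling division).
train_images=9
per_image=50
batch_size=8
needed=$(( train_images * per_image ))                 # 450 images
batches=$(( (needed + batch_size - 1) / batch_size ))  # ceil(450 / 8) = 57
echo "need $needed images -> $batches batches of $batch_size"
```

57 batches of 8 actually yields 456 images, slightly more than the 450 needed, which is fine for classification images.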
00:58:55 And you see, from the text to image tab we are not getting an out of memory error even when
00:59:01 the batch size is eight.
00:59:03 So it will very quickly generate all of these images for us, much faster than generating the classification
00:59:11 images inside the DreamBooth extension.
00:59:15 If you wonder why it is generating images like this, or why we are using these kinds of
00:59:19 images: in this video I explain all of it, so we keep the underlying contextual
00:59:25 data of the model.
00:59:27 You could also use more beautiful images in your classification training data set.
00:59:32 However, it would break your model's conceptual grounding, so your model would become more biased
00:59:40 toward the images that you have used.
00:59:42 Also, your face would be biased to the images that you use.
00:59:46 With this methodology, we are using the underlying contextual knowledge of the model and we are
00:59:53 trying to keep it as much as possible.
00:59:55 However, this is up to you.
00:59:58 So if you use all handsome images, all full colored, professional real images, then your
01:00:06 model would become more biased to them.
01:00:08 This is how custom models are usually made.
01:00:12 They are being cooked to those kinds of images.
01:00:15 So whatever you type, you are getting beautiful images because all of the other underlying
01:00:20 conceptual data of the model is lost during the training.
01:00:25 Actually, according to the ControlNet developer, the SD 2.1 version is inferior to SD 1.5 due
01:00:35 to the CLIP model used.
01:00:36 You can pause the video right now and read this.
01:00:39 Okay, looks like our 1.5 version experiment has ended.
01:00:44 Let's go to the outputs and in here there are text to image grids and you see there
01:00:51 is a grid file.
01:00:53 35 megabytes.
01:00:54 Let's open it.
01:00:55 Actually I will download this and there is also 228 megabytes.
01:01:00 So for downloading, let's use runpodctl.
01:01:04 I am going to open a new command line in here:
01:01:07 runpodctl send, then the text to image grids folder.
01:01:11 Hit enter and it will generate download link for us.
01:01:15 Go to the folder where you want to download.
01:01:17 I will download it in here: type cmd, copy paste the link like this.
01:01:22 So it is going to download 265 megabyte grid output.
01:01:27 This is much faster than downloading from the Jupyter notebook.
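The transfer above boils down to two commands. As a hedged sketch: the folder name txt2img-grids is an assumption based on a default Automatic1111 output layout, and the one-time code below is a placeholder for whatever the send command actually prints:

```shell
# On the pod: send the grid folder; runpodctl prints a one-time receive code.
runpodctl send outputs/txt2img-grids

# On your PC, inside the folder you want to download into,
# paste the command the pod printed, for example:
runpodctl receive 1234-alpha-bravo-charlie
```

The same send/receive pair works for any file or folder on the pod, which is why it reappears later for downloading trained checkpoints.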
01:01:31 Okay, the grid images are downloaded.
01:01:34 And in here this is the newest grid image that is generated.
01:01:39 It is over 200 megabytes, it is over 35 000 pixels and now we are able to compare different
01:01:47 checkpoints with different prompt emphasis and with different CFG scale.
01:01:53 So this is for CFG scale 7.
01:01:55 These are the checkpoints and these are the prompt emphasis.
01:02:00 Let's find the best one that we like and that is similar to us.
01:02:05 You see these faces are not like me but in here I am seeing faces like me.
01:02:12 So with prompt strength 1.4 in these checkpoints I am starting to get faces similar to
01:02:19 mine.
01:02:20 I think this one is very similar to me.
01:02:22 So with prompt strength 1.4 for CFG scale 7 and for checkpoint 3000 steps.
01:02:30 Yeah I like it.
01:02:31 So you should also compare for yourself.
01:02:34 And after prompt strength 1.4 the image becomes very very bad.
01:02:39 So let's also look at the other CFG scales and checkpoints.
01:02:44 Okay now I will show you slowly what is happening from CFG scale 10 to 7 and this is the prompt
01:02:51 strength 1.4.
01:02:53 This is how the images are changing.
01:02:55 This would of course depend on your training data set, how it is trained and I can see
01:03:01 that they are not very good at all because we also didn't use any negative prompts.
01:03:08 Our aim here is finding the sweet spot of prompt strength and the checkpoint and the
01:03:16 CFG possibly.
01:03:18 Okay I think this model is still not trained enough.
01:03:22 Because with only 1.4 strength and at 3200 steps, it is providing the best results.
01:03:31 So therefore I will train this model even further with more steps and then do another
01:03:38 experiment.
01:03:39 However, currently we could use 1.4 strength with checkpoint 3200.
01:03:45 I suggest you test --no-half and --precision full training for the SD 1.5 version as well without
01:03:54 xformers and compare whether it is learning better or not.
01:04:00 Depending on the graphics card used, this could make a difference, and you can test using
01:04:06 8-bit Adam or not.
01:04:08 You can test mixed precision no versus fp16 and bf16 so these all things could improve
01:04:16 your training success rate.
01:04:18 You should experiment with them and currently I do not have time to test all of them.
01:04:24 I am showing some of the settings that are widely used but you should also experiment
01:04:30 with them.
01:04:31 Like options like this or like this or like this or like this.
01:04:36 Now I will show you how to download custom models from CivitAI.com and use them in your
01:04:43 RunPod.io pod.
01:04:44 So I am going to show example of Protogen x3.4.
01:04:49 Right click download latest, copy the link, go to your RunPod.io Jupyter interface,
01:04:57 and in here go to the folder where the model files are downloaded.
01:05:02 So in this folder, models/Stable-diffusion, where you are supposed to put your model files,
01:05:09 open a new launcher, open launcher, type wget, paste the link and hit enter and it will start
01:05:16 downloading the model file.
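The download step amounts to two commands. A hedged sketch: the path below is the usual RunPod template location for A1111 models, and the URL is a placeholder, so copy the real link from the model's download button:

```shell
# Download a CivitAI model directly into the folder A1111 reads models from.
cd /workspace/stable-diffusion-webui/models/Stable-diffusion
wget "https://civitai.com/api/download/models/XXXX"   # placeholder link
```

Running wget on the pod uses the datacenter's connection, which is why it is much faster than downloading to your PC and re-uploading.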
01:05:18 So you see, 5.6 gigabytes, and you see there is no more space left on my hard drive.
01:05:26 What I need to do is delete some of the models.
01:05:31 So I am going to delete some of the training checkpoints.
01:05:34 They are located inside models, inside Stable Diffusion, inside my training folder, and in
01:05:41 here I am going to delete some of them.
01:05:43 You can also delete a whole directory: right click and delete.
01:05:47 You can also select them and hit delete button on your keyboard.
01:05:51 Okay, I think we got now sufficient space so I will just rerun the prompt.
01:05:56 So to bring back the latest executed command I just hit the up arrow and hit enter, and now
01:06:02 it will start downloading.
01:06:03 Currently it will be downloaded in this folder where we had opened this terminal.
01:06:10 Let's go back to there.
01:06:11 models/Stable-diffusion, and now this file is being downloaded with the name 4048.
01:06:19 Then I will rename it.
01:06:21 Meanwhile, 2.1 version classification regularization images are still being generated.
01:06:26 We can see the process in the terminal of it.
01:06:30 You see it has generated over 160 images so far.
01:06:34 Okay, it is downloading the custom model file with 50 megabytes per second.
01:06:39 You can also upload those files from your computer or you can download from Hugging
01:06:45 Face as I have shown you already.
01:06:48 So this is how you can download files fast on your Pod.
01:06:52 Okay, the file has been downloaded and saved as 4048.
01:06:57 I will rename it: right click, rename, and let's say protogen x34. It is renamed.
01:07:05 Then let's go back to our Stable Diffusion interface.
01:07:08 Click, refresh folder.
01:07:10 It is not appearing because the model file extension is not correct.
01:07:14 Right click,
01:07:15 and when renaming, add .ckpt to the end of it like this and then refresh again.
01:07:23 Okay, now we see the model here.
01:07:25 Let's test it.
01:07:26 Okay, it didn't load even though I have selected.
01:07:29 Let's look at the command line interface.
01:07:31 Okay, it says that we should add --disable-safe-unpickle because we have downloaded it like
01:07:38 that.
01:07:39 So I will add this to the command line arguments and restart like this.
01:07:44 Let's also change the port.
01:07:46 Just close all of the terminals.
01:07:47 Okay, restart has been completed with disable safe unpickle.
01:07:51 Let's open the interface.
01:07:53 Okay, let's try with protogen.
01:07:55 Okay, we got an error once again, because what we downloaded is a safetensors file,
01:08:02 not a ckpt.
01:08:03 Therefore, we have to rename it once again with the .safetensors extension like this and
01:08:11 try again.
01:08:12 Let's hit refresh.
01:08:13 Now there is safetensors.
01:08:15 Okay, it is loaded.
01:08:16 Let's test it and protogen is working as expected.
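The two failed attempts above come down to one rule: the extensionless wget download must be renamed with the extension that matches the file's real format. A minimal sketch in a throwaway folder (on the pod the real file sits in models/Stable-diffusion, and the target name here is just illustrative):

```shell
# wget saved the model under its raw name "4048"; A1111 only lists files
# ending in .ckpt or .safetensors, and this CivitAI download is safetensors.
cd "$(mktemp -d)"
touch 4048                          # stand-in for the downloaded model file
mv 4048 protogen_x34.safetensors    # now the Web UI can see and load it
ls
```

If a model really is a pickled checkpoint, .ckpt is the correct extension instead, possibly together with the --disable-safe-unpickle argument mentioned above.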
01:08:20 You see: awesome, intricate, fantastic castle in a forest, and this is what I got.
01:08:25 Let's run again.
01:08:26 And yes, this is definitely protogen.
01:08:28 Let me run it on 1.5 version official as well.
01:08:32 Okay, 1.5 version is loaded and this is the result on 1.5 version official.
01:08:38 So this is how you can use custom models on RunPod io.
01:08:43 2.1 image generation is still going on.
01:08:45 Now I will show you how to do Textual Inversion training.
01:08:49 To do that, let's go to the train tab.
01:08:52 By the way before doing that, let's go to the settings and in here in training, move
01:08:56 VAE and CLIP to RAM when training if possible.
01:09:00 You can pick this option to reduce VRAM usage.
01:09:03 You can also turn on pin memory for data loader.
01:09:06 Makes training slightly faster, but it can increase memory usage.
01:09:09 You can also pick this depending on your machine's RAM memory.
01:09:13 However, since we have 24 gigabytes, I am not going to pick them.
01:09:17 So let's give the name as test. The initialization text is none.
01:09:21 Number of vectors is two.
01:09:24 You can watch my excellent how to do Stable Diffusion Textual Inversion video.
01:09:30 I am explaining in great details in this video and you can learn many of the things related
01:09:37 to the Textual Inversion from this video.
01:09:40 Hit create embedding and it is already created.
01:09:43 Let's go to the train tab, pick the embedding.
01:09:46 We also need to set dataset directory.
01:09:49 So our data set directory is like this.
01:09:52 We don't need classification images for Textual Inversion training.
01:09:56 You can reduce the learning rate or leave it as default.
01:10:00 You can test it.
01:10:01 Okay, we need a style file word for Textual Inversion.
01:10:06 When you watch this video, you will understand it better.
01:10:10 So this text file is located inside Stable Diffusion, inside Textual Inversion templates.
01:10:16 In here, I'm going to edit the none template so that it reads [name].
01:10:20 You need this otherwise it won't work.
01:10:22 This is the name of the Textual Inversion.
01:10:24 This is basically going to use the unique tokens that it generates, so I'm going to pick
01:10:30 none from here.
01:10:32 My width and height are 512 pixels.
01:10:35 Max number of steps.
01:10:37 You can leave it as this because it will generate pretty small files, but since we are already
01:10:42 using a lot of space, I will delete my older checkpoints from DreamBooth, Stable Diffusion
01:10:48 Web UI inside models inside Stable Diffusion and inside test2 folder.
01:10:54 Okay, for selecting: hold the left Shift key, select the first item, then go to the very bottom
01:11:00 and, while holding Shift, click there; it will select all of them.
01:11:04 Then, while holding the left Control key, deselect the ones that you don't want to delete,
01:11:10 right click and hit delete.
01:11:12 It will delete all these files and open a space for me.
01:11:16 Okay, now we are ready.
01:11:17 I want to save checkpoints every 10 epochs.
01:11:23 How many training images I have.
01:11:25 I have nine training images.
01:11:26 Therefore, one epoch means nine steps.
01:11:30 Five epochs means 45 steps.
01:11:33 So I am going to save for every five epochs.
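The epoch-to-step conversion above, as a quick sketch:

```shell
# One epoch = one pass over the 9 training images = 9 steps,
# so saving every 5 epochs means saving every 45 steps.
IMAGES=9
SAVE_EVERY_EPOCHS=5
SAVE_EVERY_STEPS=$((SAVE_EVERY_EPOCHS * IMAGES))
echo "$SAVE_EVERY_STEPS"   # 45
```

The same ratio explains the later observation that 2700 steps corresponds to 300 epochs.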
01:11:36 I don't need this and I will pick deterministic.
01:11:40 This is the best option and we are ready.
01:11:43 Just hit train embedding.
01:11:45 Okay, it has started training.
01:11:48 By the way currently xformers is enabled.
01:11:51 Therefore, I will disable it and restart again because there is a bug as I have just shown
01:11:58 and it is preventing good training.
01:12:01 Also, in settings, this is unchecked:
01:12:04 use cross attention optimizations. But it could still be using them due to a bug.
01:12:10 So best thing is just disabling the xformers and restarting the training.
01:12:16 However, it looks like it is learning right now, I think.
01:12:19 So probably there is no bug for this one unlike the DreamBooth.
01:12:24 The loss rate is also pretty low and it is pretty fast.
01:12:28 Okay, it already started learning my face.
01:12:32 Not very good, but there is a resemblance as you can see, and it is really, really fast;
01:12:38 the number of steps it is taking per second is really high.
01:12:41 This is how fast it is, you see.
01:12:44 Training Textual Inversion: the epochs, the training speed in iterations per second, and it is learning.
01:12:52 However, which one is best needs to be checked from the text to image tab with an x/y
01:13:00 plot, and as you can see it is learning.
01:13:03 So all these samples are being saved inside.
01:13:07 Let's go to the Stable Diffusion Web UI folder; inside here, textual inversion; inside here
01:13:13 you will see the training date; inside here, the name of the Textual Inversion training;
01:13:17 inside here, images; and these are the images named with the epoch number.
01:13:24 You can check them like this, or you can download them and check all of them.
01:13:29 Okay, 2700 steps looks a little bit decent.
01:13:34 It is actually equal to 300 epochs.
01:13:38 Maybe it may get better over time or we may need to use more vector count, but since I
01:13:44 am just trying to explain, I will use this and show you how you can use this checkpoint
01:13:51 in your queries in your text to image tab.
01:13:54 First I will cancel the training.
01:13:55 This one also looks like a decent one.
01:13:59 Hit interrupt: yeah.
01:14:01 3240 also looking decent so it may get even better over time as we do more training, but
01:14:08 I don't have too much time.
01:14:10 Okay, so to be able to use these embeddings first, we need to copy the generated pt file
01:14:16 which is the checkpoint.
01:14:18 To do that, go to the Textual Inversion inside your main folder, go to the date that you
01:14:23 did training, go to the training name, go to the embeddings, and in here you will see
01:14:28 the dot pt files.
01:14:30 Pick the checkpoints that you want to test right, click, copy, then go back to the main
01:14:36 installation folder and in here you will see embeddings folder and paste them there like
01:14:42 this so it is pasted now here.
01:14:44 So to activate this Textual Inversion, we are going to type it like this.
01:14:50 By the way, there is one very important thing when you do training, it will train based
01:14:56 on the model selected here.
01:14:57 Therefore this will be most compatible with this selected model and just hit generate
01:15:04 and you see our face is generated trained subject.
01:15:07 Now we can try stylizing.
01:15:09 Okay, I did a simple test awesome, intricate, 3d artstation, cinematic lightning and generated
01:15:16 batch size as eight and these are the generated images.
01:15:20 So with better prompting it should be possible to get better results.
01:15:25 You can do the same training on Protogen or any other custom model as well; just select it
01:15:31 from here, make a new embedding and do training.
01:15:35 The Textual Inversion training works pretty decent on custom models as well.
01:15:40 However, custom models are not working very well with DreamBooth training.
01:15:44 Okay, so our image generation for classification data set for SD 2.1 is completed.
01:15:52 Now we will put them into the correct folder so all of the images are now generated inside
01:15:59 this folder.
01:16:00 How am I gonna do that?
01:16:01 I will right click cut, then I will go to the workspace, right, click paste and then
01:16:07 I will rename as class 768 version 2 like this.
01:16:14 Then I will go to the DreamBooth tab, I will open my test, load settings, go to the settings
01:16:21 and in here I will set the concept the classification data set directory as class 768 version 2
01:16:30 and now I have 50 images per instance.
01:16:34 Okay, everything else is same.
01:16:36 Just save settings and hit train and let's see if we will get out of memory error or
01:16:41 not.
01:16:42 So it is preprocessing class images.
01:16:44 We can see the command line interface. Okay, so it looks like Gradio, our web app, got
01:16:51 killed.
01:16:52 Therefore, we need to restart it.
01:16:55 By the way, we also need to disable xformers, otherwise it won't work for training.
01:16:59 So I am disabling xformers, saving, closing all of the terminals and starting a new instance
01:17:06 of the web ui.
01:17:08 Okay, restart is done.
01:17:09 You see, these are the command line arguments that I have used to start the 2.1 version Web
01:17:16 UI. Let's open it.
01:17:18 Go to the DreamBooth select model, click load settings.
01:17:22 Just verify settings quickly if they are correct or not.
01:17:25 Okay, all looking good and let's click train to see how it works.
01:17:30 Okay, preprocessing class.
01:17:32 Let's also see the cmd window from here.
01:17:35 Okay, you see it says nothing to generate, because we already have a sufficient number
01:17:40 of classification images in our folder: 456, and we need 450 images.
01:17:47 So it is caching right now.
01:17:49 Okay, after caching it is killed once again and trying to relaunch.
01:17:55 Okay, we got out of memory error so we need to enable some more of the memory optimization
01:18:03 and I already unchecked the EMA.
01:18:07 Therefore, looks like we need some more optimization.
01:18:10 So I will pick fp16, but we are not using mixed precision so it is probably being ignored.
01:18:17 What else we can do for more optimization?
01:18:21 Gradient checkpointing yes, we can do this and let's save settings, load settings, and
01:18:28 hit train once again.
01:18:30 Okay, looks like I had to refresh load settings.
01:18:34 Hit train. Okay, yeah, it says that a change in precision was detected.
01:18:39 Please restart Web UI entirely to use new precision.
01:18:43 All right, so we will restart it.
01:18:46 Okay.
01:18:47 Restart is done.
01:18:48 Let's go to DreamBooth select model load settings and now gradient checkpointing enabled.
01:18:54 Use 8-bit adam fp16, memory attention default, cache latents and let's see if we will get
01:19:02 any error or not.
01:19:03 Okay, training started this time.
01:19:05 I hope we don't get any error during preview generation because it also uses GPU and we
01:19:11 can see our GPU is being used 95 percent already.
01:19:16 You can also see other utilization parameters here volume, container, and this is my other
01:19:21 running pod and this is how much I have spent and how much I am spending.
01:19:26 So now I will show you how to install ControlNet on SD 1.5 version.
01:19:33 If you don't know what ControlNet is and how to install and use it,
01:19:37 I already have a great tutorial on my channel.
01:19:40 So this is the extension that we are going to install.
01:19:42 Copy the extension URL.
01:19:45 You can also find this in the description.
01:19:47 Go to the extension tabs, go to the install from URL, copy paste it, and click install.
01:19:53 Then once it is installed, go to the installed tab, apply and restart UI.
01:19:58 After we clicked it and unfortunately the Gradio is died again.
01:20:01 So I will relaunch it and since I am not going to do any training, I am enabling xformers
01:20:07 once again because it will speed up my image generation.
01:20:11 Okay, after restart, go to the text to image tab and in the bottom you should see ControlNet
01:20:16 like this.
01:20:17 Now we need to download ControlNet model which is hosted on Hugging Face in here.
01:20:24 Go to the files and versions and just download whichever model you want to use,
01:20:29 because each model file is about five gigabytes.
01:20:32 I'm going to show scribble as an example.
01:20:35 All the others are exactly the same, and when you watch this video you will learn more about
01:20:40 them.
01:20:41 Okay, right
01:20:43 click the download button, copy the link path, go to your RunPod.
01:20:46 So these files will be put inside another folder.
01:20:50 Go to the extensions, go to the sd Web UI control net, go to the models.
01:20:55 We are going to put them inside here.
01:20:58 So in here I will open new launcher, open terminal wget and copy paste the link and
01:21:05 you see it has started downloading file from Hugging Face with an incredible speed.
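The download step can be sketched as below; the extension folder name follows the sd-webui-controlnet repo layout, and the scribble model URL points at the lllyasviel/ControlNet files linked in the description, so verify the exact link from the model page before using it:

```shell
# ControlNet models go into the extension's own models folder, not the
# main Stable Diffusion model folder.
cd /workspace/stable-diffusion-webui/extensions/sd-webui-controlnet/models
wget "https://huggingface.co/lllyasviel/ControlNet/resolve/main/models/control_sd15_scribble.pth"
```

Any of the other ControlNet models (canny, depth, openpose, and so on) installs the same way with its own URL.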
01:21:10 Meanwhile I will show something else how you can download your trained models into your
01:21:15 computer.
01:21:17 So to download your trained DreamBooth model, go to the models, go to the Stable Diffusion,
01:21:23 go to the training and let's say you want to download this ckpt.
01:21:27 You can right click and download.
01:21:30 Or you can use runpodctl as we already shown multiple times.
01:21:34 But let's just show it once again: runpodctl send with the checkpoint file's full name, not the
01:21:41 directory, and it generated the download command like this. Go to the folder where
01:21:47 you want to download.
01:21:48 So let's say I want to download here.
01:21:50 Open cmd, right
01:21:51 click, paste and hit enter, and that model file will be downloaded onto your computer
01:21:57 with a great speed like this as you can see.
01:22:00 It is downloading with 70 megabits per second and my maximum internet is 100 megabits per
01:22:06 second.
01:22:07 So this will of course totally depend on how other users are currently using the pod network.
01:22:14 Okay, meanwhile ControlNet file is downloaded and saved in the folder.
01:22:19 Let's verify it.
01:22:20 Go to extensions, sd-webui-controlnet, inside models.
01:22:24 I see the pth file.
01:22:26 Let's go back to the ControlNet and in here.
01:22:29 When we refresh models we should see it.
01:22:33 Yes it is here and there is also pre-processor, then upload your file into this canvas that
01:22:40 you want to use.
01:22:41 I will do a scribble.
01:22:43 I am going to use this file.
01:22:46 Let's set the canvas width and height like this, and also set your target resolution.
01:22:50 I will use the native resolution of the provided image which is 866 and 684.
01:22:58 Then type your prompt here, and you can use any model from here.
01:23:03 Let's use Protogen model so my prompt is dragon, awesome, intricate, cinematic, artstation.
01:23:08 Let's type some negatives: low, bad, worse.
01:23:11 Hit generate.
01:23:12 Okay, we didn't get the output because we didn't enable the ControlNet.
01:23:16 Don't forget that.
01:23:18 And don't forget to check scribble mode and invert colors. Now this is the map it
01:23:25 generated, and this is the output we got.
01:23:28 So you can play with different prompts and different models and generate different images.
01:23:35 It works pretty fast and pretty correct.
01:23:38 Just watch this video to learn more.
01:23:39 Actually, I have another control net video as well which is based on the native released
01:23:45 scripts from the official author.
01:23:47 You can also watch this video to learn even more about ControlNet.
01:23:51 Our SD 2.1 version training is going on.
01:23:55 However, it looks like there are some problems because generated image is not correct.
01:24:00 Okay, I have done a lot of research and looks like there is no way to do SD 2.1 version
01:24:07 768 pixels training with DreamBooth without using xformers.
01:24:15 I wanted to avoid xformers during training because it reduces the quality of the training.
01:24:21 However, 24 gigabytes VRAM is just not enough.
01:24:24 So we need to downgrade the xformers version to 0.0.14. I already have an excellent tutorial
01:24:33 video for that for Windows installation, so now I will show it on Linux on RunPod.
01:24:40 Alternatively, you can go to the browse servers and in here you can deploy a RunPod with 48
01:24:49 gigabytes VRAM or 40 gigabytes VRAM.
01:24:52 It is up to you, but they cost more.
01:24:55 Therefore, we will just downgrade the xformers version.
01:25:00 Now, follow me very carefully to learn how to downgrade xformers on RunPod io.
01:25:07 First close all of the running kernels and terminals.
01:25:11 Then inside python 3.10 folder, start a new terminal.
01:25:16 First, we are going to run this command.
01:25:19 pip uninstall torch torchvision.
01:25:22 Paste it and hit yes, and hit yes.
01:25:25 Okay, it is uninstalled.
01:25:26 Then we are going to run pip uninstall torchaudio.
01:25:31 Paste it.
01:25:32 Okay, it is done.
01:25:33 Then we are going to use pip uninstall xformers.
01:25:36 Hit yes and it is done.
01:25:39 You know,
01:25:40 currently I am inside workspace/venv/lib/python3.10.
01:25:44 The folder where you are currently located makes a huge difference.
01:25:49 Make sure that you are inside the same folder.
01:25:51 You can also apply this to SD 1.5 version as well.
01:25:56 It is just same thing.
01:25:57 Then we are going to install torch and torchvision.
01:25:59 Just copy this and paste it and hit enter.
01:26:02 Okay, I got error.
01:26:04 It says that there is no space left on the device because currently we started with five
01:26:11 gigabyte space for runtime.
01:26:14 Therefore, I will stop the pod like this.
01:26:17 I will edit the disk space.
01:26:21 Click here.
01:26:22 More actions.
01:26:23 Click edit pod, and in here increase the container disk size.
01:26:27 Save it, run it, start it, and reconnect to Jupyter lab.
01:26:32 Enter the same folder, venv/lib/python3.10, open a terminal, and make sure that you
01:26:41 run all of the commands once again to be sure.
01:26:44 pip uninstall, hit yes if they are installed, once again, then pip uninstall torchaudio,
01:26:52 then pip uninstall xformers.
01:26:54 Okay, it is done, then we will install this one.
01:26:58 As you can see, I have changed it because this is the one that is working.
01:27:02 Copy, paste, and hit
01:27:03 enter, and it is going to install.
01:27:06 So once the full version of 0.0.17 is released, it will work with DreamBooth.
01:27:11 Currently this is a development version as you can see and it is installed.
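Collected in one place, the downgrade sequence looks roughly like this. It must be run from the workspace/venv/lib/python3.10 terminal as shown in the video, and the two install lines are assumptions: take the exact version pins and index URLs from the "Upgrade xformers Commands" link in the description:

```shell
# Remove the current torch / torchvision / torchaudio / xformers builds.
pip uninstall -y torch torchvision
pip uninstall -y torchaudio
pip uninstall -y xformers

# Reinstall torch + torchvision, then the xformers development build named
# in the video (version strings here are assumptions; use the linked commands).
pip install torch torchvision --extra-index-url https://download.pytorch.org/whl/cu116
pip install xformers==0.0.17.dev448
```

If a pip install fails with "no space left on device", increase the container disk size as shown above and rerun the whole sequence.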
01:27:16 Now we are ready to run our web UI as usual and it should support DreamBooth training
01:27:22 with xformers.
01:27:23 Before starting, I am going to edit the command line arguments: --xformers,
01:27:29 and I am going to add back the full precision flags: --no-half and --precision
01:27:37 full and --no-half-vae.
01:27:41 Save it, run on a different port, shut down all of the terminals start a new terminal,
01:27:47 relaunch the Web UI like this.
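The final set of launch arguments amounts to something like the following; on the RunPod template the arguments are edited in the launcher script, so the file name and port below are assumptions:

```shell
# Full-precision training flags plus xformers, then relaunch the Web UI.
export COMMANDLINE_ARGS="--xformers --no-half --precision full --no-half-vae --port 3010"
cd /workspace/stable-diffusion-webui
python relauncher.py
```

Changing the port on each relaunch, as done throughout the video, avoids colliding with the previous, possibly still-bound instance.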
01:27:49 Okay, so our application is now starting with the 0.0.17.dev448 version of xformers, and these
01:27:59 are the torch, torch vision, diffusers, and other versions.
01:28:03 Okay, it is started now.
01:28:04 Time to test whether it is working correctly or not for SD 2.1 DreamBooth training: okay,
01:28:12 I am loading my model, load settings and in here.
01:28:16 Let me show you quickly the latest settings.
01:28:19 So let's make the amount of time to pause between epochs zero.
01:28:23 I will save for every 20 epochs.
01:28:26 I am unchecking gradient checkpointing.
01:28:28 I will make learning rate as default.
01:28:30 Actually, let's try it.
01:28:32 Okay photo of ohwx man by tomer hanuka for sanity prompt and in advanced tab: now this
01:28:38 is important.
01:28:39 I will use EMA, and for the mixed precision, I am going to use fp16.
01:28:44 Some cards also support bf16, but to be sure, use fp16.
01:28:49 And when you hover your mouse, it also tells you that it is required when using xformers, and
01:28:55 in here I am going to use xformers.
01:28:56 This is important.
01:28:57 Cache latents: okay, then go to the concepts tab.
01:29:02 They are set.
01:29:03 Everything is looking good and in saving, generate a ckpt file when saving during training
01:29:09 and hit train.
01:29:10 By the way, we should have clicked save settings before, but I think it is automatically saved.
01:29:16 If it doesn't work right away, just click save settings then hit train.
01:29:20 Okay, let's watch the terminal.
01:29:22 I hope that we won't get any more
01:29:24 out of memory errors.
01:29:26 Okay, it is killed so I will test one more time.
01:29:30 Refresh the Gradio, DreamBooth select model load settings.
01:29:35 Now this time I will set gradient checkpointing because it looks like necessary.
01:29:40 Fp16 use EMA and yes, everything is same and let's try again with save settings.
01:29:47 Train: okay, we got another error so this time I won't use EMA.
01:29:51 Refresh the interface DreamBooth, model load settings uncheck gradient checkpointing and
01:29:58 uncheck use EMA.
01:29:59 This is significantly increasing the VRAM usage.
01:30:02 Save settings hit train okay.
01:30:04 Finally, the training has started and now time to wait and see how well it is learning
01:30:10 and training.
01:30:12 The Gradio is still responsive.
01:30:13 That is very good and it is using this much of GPU memory so you see how much GPU memory
01:30:20 usage the EMA is increasing when we check the EMA option.
01:30:27 Meanwhile, SD 2.1 version training continues.
01:30:29 I will explain what is fine tuning with DreamBooth.
01:30:34 Okay, before I show how to do fine tuning.
01:30:37 We got an error during the SD 2.1 version training at the 400 steps which means when
01:30:44 it is generating a ckpt from the 20th epoch checkpoint.
01:30:49 Therefore, I will restart the training with one change one parameter, change load settings,
01:30:56 and go to the settings tab and enable gradient checkpointing.
01:31:01 The rest is same like this so it should just work fine this time I think.
01:31:07 Save settings hit train okay, this time we didn't get any error.
01:31:11 During SD 2.1 version training, we got a sample: yes, somewhat similar.
01:31:17 This is the first one at the 20th epoch and we got our sanity prompt as well.
01:31:21 This is the loss rate which is very erratic as you can see and this is the VRAM usage
01:31:27 like this.
01:31:28 Now I can start showing you fine tuning.
01:31:30 I have opened my 1.5 version RunPod so what is the difference of fine tuning.
01:31:38 In the fine tuning we are not going to use classification images and we are going to
01:31:44 use file words.
01:31:45 Fine tuning is basically using a lot of good images with proper captions and not using
01:31:53 any classification images.
01:31:54 The rest is same.
01:31:56 So every one of the keywords, every one of the tokens in the captions of the images will
01:32:02 be trained and they will become like the images that you use for fine tuning.
01:32:09 First of all, we need to process image files and add captions to them.
01:32:15 So go to the training tab, go to the preprocess images, set the source directory.
01:32:20 I don't have a good data set for fine tuning; you need a lot of images. So I will use
01:32:24 my own pictures that I used for training and set a destination like
01:32:32 training captioned, and in here use BLIP for captioning. If those images are not 512 by
01:32:41 512 pixels,
01:32:42 or if you are going to fine tune the SD 2.1 version with 768 pixels, then you need to change this
01:32:49 resolution as well.
01:32:50 You can also crop them with autofocal point crop but manually cropping them and preparing
01:32:57 them is better and then click preprocess.
01:33:00 When the first time you run it.
01:33:01 It will download the BLIP model from internet.
01:33:04 Okay, preprocessing has been completed now.
01:33:07 Training captioned folder is generated.
01:33:10 Now you see there are txt files named same as the image file.
01:33:16 When you open them, you will see this captioning.
01:33:19 So what does this mean?
01:33:21 In fine tuning, all of these words, these tokens, will be improved by the image that
01:33:30 has the same name.
01:33:32 So all of these words will be improved towards this image.
01:33:37 This is what is fine tuning.
01:33:39 Let's say you want to improve castle images, then you should have good castle images and
01:33:44 inside their description, you should have castle word.
01:33:48 And if you want to associate those pictures with other words such as beautiful, intricate,
01:33:53 high quality, then you should also put those words in.
01:33:56 So put here whatever words you want to improve in your model, related to the
01:34:03 picture they are associated with. Then, once you have prepared good captions and images
01:34:09 inside your folder, copy the path of the new folder, go back to your DreamBooth tab and
01:34:15 make this setup: in concepts, workspace, training captioned data directory.
01:34:21 Now this is important.
01:34:23 In the prompt just type [filewords] and nothing else.
01:34:28 This means that whenever it is training on that particular image, it will load whatever is
01:34:36 written inside here and replace the instance prompt with it.
01:34:40 That's it.
01:34:41 So this will be equal to this prompt for the particular image it is going to train on.
01:34:49 For the class prompt, we are not using any classification or class prompt.
01:34:53 In the sample prompt, you can use the [filewords] to see what kind of images it is generating
01:34:59 and make sure that class images per instance is zero.
01:35:03 Because we are not trying to keep the previous context of the model; we want its underlying
01:35:10 context, its latent space, to be improved.
01:35:14 And that's it; everything else is the same.
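To recap, the concept settings used above can be written down as a cheat sheet. The field names and the directory path below are paraphrased from this walkthrough, not exact UI strings, so treat this as a sketch:

```shell
# DreamBooth concept settings used above (paraphrased, not exact UI labels):
#   Data directory             -> workspace/training captioned  (the preprocessed folder)
#   Instance prompt            -> [filewords]   # loads each image's caption .txt file
#   Class prompt               -> (empty; no classification images)
#   Sample prompt              -> [filewords]   # to preview what it generates
#   Class images per instance  -> 0
```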
01:35:16 So for fine tuning you need a lot of good images, good quality images with good captions.
01:35:23 Those captions will be improved.
01:35:25 It will also improve the Unet of the model, so it will become overall better and overall
01:35:30 cooked, we can say.
01:35:33 Because if you show it fewer images than it was trained on, it will lose a lot
01:35:38 of the contextual knowledge it has.
01:35:42 Therefore, these cooked custom models are not good to train your faces on, because
01:35:47 they don't have as much information as the 1.5 pruned ckpt has.
01:35:52 For example, this model was trained on 5 billion images, as far as I know.
01:35:59 However, those custom models may be trained on 1,000 images, or maybe 10,000 images.
01:36:06 So their Unet has become like those 10,000 images instead of being trained on 5 billion
01:36:14 images.
01:36:15 That is why they are so good, but they have much less knowledge in their underlying
01:36:21 context, in their latent space.
01:36:23 So this is basically how fine tuning is done.
01:36:27 If you want to be exactly the same as the official Stable Diffusion training,
01:36:33 you can also remove text encoder training by setting this parameter to zero.
01:36:40 This way the tokens won't be improved.
01:36:44 Only the Unet will be improved.
01:36:46 However, you don't want that for fine tuning.
01:36:49 That is more like using hundreds of thousands of images and training your model from scratch.
01:36:57 So you should keep it at perhaps one and train the Unet as well.
01:37:01 So you will train both the text encoder and the Unet and improve all of those keywords together.
01:37:09 Hopefully I will make another very technical video about how training works, what is Unet,
01:37:16 what is text encoder, how they are being changed during training, and it will explain a lot
01:37:22 of the questions that are not very well answered in the community.
01:37:27 So stay subscribed.
01:37:29 Open notifications to not miss it.
01:37:31 So let's check out our 2.1 version training.
01:37:34 Okay, our sanity prompt already looks like it has lost its stylizing ability, and the sample
01:37:41 is not looking very good either.
01:37:43 However, I have seen that it was learning, so let's open the directory.
01:37:47 Okay, inside DreamBooth, inside samples, let's look at each one of the samples.
01:37:53 So this is the 20 epoch.
01:37:54 Yes, it has a resemblance.
01:37:56 It is not very good.
01:37:57 This is the 40 epoch.
01:37:59 Very minor resemblance.
01:38:01 Let's check out the sanity prompt.
01:38:03 The sanity prompt is much better.
01:38:05 So this is somewhat similar to me, but stylized in Tomer Hanuka style.
01:38:10 So the sanity prompt of the 60 epoch is not good at all.
01:38:14 It lost its stylizing ability.
01:38:17 The sample is also not very related, but this is SD 2.1 so it is harder to train and obtain
01:38:23 good images.
01:38:24 So you see this is the 80 epoch.
01:38:26 This is almost like me.
01:38:28 Let me show you for comparison.
01:38:31 With the 80 epoch 2.1 version, it has started learning my face very well.
01:38:37 Let's check out the sanity prompt.
01:38:39 However, the sanity prompt also lost its ability to stylize, so our learning rate could be too
01:38:46 high.
01:38:47 Perhaps we should try half of it.
01:38:49 Based on your training data set, the learning rate may change, and the number of steps and
01:38:55 epochs that you need for training may change.
01:38:58 So it is up to you to do multiple trainings and compare how well they are working with
01:39:05 x/y/z plots as I have shown.
01:39:07 However, the training is working very well.
01:39:10 It is learning the subject very well, so we managed to make it work very well for the SD 2.1
01:39:17 768 version model.
01:39:21 Let me show you the parameters once again.
01:39:24 So I will slowly scroll down and you will be able to see all of the settings.
01:39:30 This totally depends on your learning rate and how many training images you
01:39:35 use.
01:39:36 You should also save multiple checkpoints during training and compare them: batch size
01:39:41 one and gradient accumulation one.
01:39:42 If you increase this, it will significantly increase your VRAM usage.
01:39:47 Also, we can't say bigger batch size is better.
01:39:50 It's a debated topic.
01:39:52 Mini batches versus full batches.
01:39:54 These two are checked.
01:39:55 Otherwise, we get a VRAM error on 24 gigabytes.
01:39:58 This is my current learning rate.
01:40:01 This may be too fast, so you may try half of it or even lower.
01:40:05 This is the resolution.
01:40:07 This is the sanity prompt to see how well it stylized.
01:40:10 So don't check EMA, because you will get a VRAM error even when using xformers.
01:40:16 Use 8-bit adam and use fp16, making sure that it is supported on your graphics card.
01:40:22 Use xformers, cache latents, train Unet, train text encoder; these other settings are just
01:40:29 the defaults.
01:40:30 Okay, now I will show you how to install and run Kohya LoRA training, the Kohya GUI, on RunPod.
01:40:37 To do that we are going to use the kohya ss linux fork
01:40:41 of the official repository of kohya ss.
01:40:48 This fork is modified to run on Linux.
01:40:51 So first of all, we are going to clone the repository into our RunPod.
01:40:56 So this is my 1.5 RunPod.
01:40:58 I am inside workspace.
01:41:00 I have closed everything.
01:41:02 Open a new terminal and copy paste the git clone command.
01:41:05 It will clone into the kohya ss linux folder. Then move into kohya ss linux: type cd
01:41:12 ko and press Tab.
01:41:14 Hit Enter, and now I am inside kohya ss linux.
01:41:18 Then we will generate the virtual environment folder with this command.
01:41:23 Copy it and hit Enter inside this folder.
01:41:26 Okay, it is generated.
01:41:28 Let's also move it in here.
01:41:30 Now we will run the next command, which activates and enters that virtual
01:41:36 environment.
01:41:37 It is actually the source venv command: copy and paste it.
01:41:40 Hit
01:41:41 Enter, and now you see venv here.
01:41:44 That means we are currently running inside the newly generated virtual environment
01:41:50 folder.
01:41:51 Next, we are going to install requirements.
01:41:53 This is only necessary one time.
01:41:56 The requirements file is located inside here and currently we are also inside that folder
01:42:01 so it should work.
01:42:02 The requirements installation may take some time.
01:42:06 These installations will not affect your other installations because everything being installed
01:42:12 here will be only installed inside this folder.
01:42:16 Okay, we got an error that says no space left on the drive, so I will
01:42:21 just close the RunPod with stop pod and increase the container disk size
01:42:26 to 10 gigabytes.
01:42:28 To do that, click here, edit pod and run it once again.
01:42:31 Start then click connect.
01:42:33 Connect to Jupyter lab.
01:42:35 Okay, it is still being launched, so be patient.
01:42:40 Okay, the notebook started once again, so I will just delete the venv folder with rm -r venv
01:42:49 and start from the beginning:
01:42:51 python -m venv venv, then the source command to activate, then the install command.
01:42:58 It will install the requirements.
01:43:00 Okay, all requirements have been installed.
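The sequence that finally worked can be sketched as the snippet below. The clone step is left as a comment because the fork URL comes from the video description, and a temporary directory stands in for the cloned repository in this sketch:

```shell
#!/bin/sh
set -e
# git clone <kohya ss linux fork URL from the video description>
# cd into the cloned folder; a temporary directory stands in for it here
cd "$(mktemp -d)"
python3 -m venv venv              # generate the virtual environment folder
. venv/bin/activate               # same as: source venv/bin/activate
python -c 'import sys; print(sys.prefix)'   # prints a path ending in /venv
# pip install -r requirements.txt  # one-time install, local to this venv
```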
01:43:03 As the author here noted, it requires Python 3.10 and doesn't work on 3.11. Since the RunPod
01:43:12 runs on 3.10.9 for Stable Diffusion, it is just fine.
01:43:18 Then we will set the accelerate config.
01:43:20 I am copying this and pasting it in here.
01:43:23 We are still inside that venv folder.
01:43:27 So now it will ask us a bunch of questions.
01:43:29 Select This machine and hit Enter, select No distributed training and hit Enter, type no to this question,
01:43:36 then type no to this question as well,
01:43:39 and type no to this question as well.
01:43:42 And type all for this question.
01:43:45 Do you wish to use fp16 or bf16?
01:43:49 Select fp16.
01:43:51 It will speed up your training and also use less VRAM.
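For reference, the answers given above can be written down as a cheat sheet. The prompt wording varies between accelerate versions, so treat this as a sketch rather than exact question text:

```shell
# Answer sheet for `accelerate config` as used above (prompts paraphrased):
#   compute environment          -> This machine
#   distributed type             -> No distributed training
#   next three yes/no questions  -> no, no, no
#   which GPU(s) to use          -> all
#   mixed precision (fp16/bf16)  -> fp16
```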
01:43:56 Okay and everything is ready.
01:43:58 We are already activated with the source command, so we don't need to run that again.
01:44:03 We will just run this command and it should start our GUI.
01:44:08 Okay, it is running on localhost, so we need to run it with a shared link
01:44:15 to enable a public Gradio link, as we do in the Web UI.
01:44:19 Open kohyagui.py, go to the interface launch call here, add share=True after the comma, save
01:44:28 it and start it once again.
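If you prefer making that edit from the terminal, a sed one-liner along these lines can do it. The file name kohyagui.py and the `interface.launch(` call are taken from this walkthrough; the exact launch line may differ in your checkout, so check it first:

```shell
# Demonstrate the edit on a stand-in file; against the real repo you would
# run only the sed line on kohyagui.py.
printf 'interface.launch()\n' > /tmp/kohyagui_demo.py
# Insert share=True as the first argument of the launch call.
sed -i 's/interface\.launch(/interface.launch(share=True, /' /tmp/kohyagui_demo.py
cat /tmp/kohyagui_demo.py   # -> interface.launch(share=True, )
```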
01:44:30 So new terminal open first.
01:44:33 Activate the venv like this and just run this command and now it has given us a Gradio link.
01:44:40 When we run it, the famous Kohya GUI is loaded and ready to do training.
01:44:46 The training with it is another topic and I won't cover it in this video.
01:44:50 Now I will stop my running RunPods, and when I stop them, nothing will happen.
01:44:55 They will just remain as they are.
01:44:57 I can also start them without any GPU.
01:45:00 So from here you can select zero GPUs and start your RunPod to back up or download
01:45:07 your data without using any GPUs.
01:45:09 So when you run them on CPU, on top of the disk cost it uses 0.16 dollars per hour.
01:45:18 So it is still costing something.
01:45:20 I think it is costing half of the original GPU price.
01:45:24 However, sometimes you may not get a GPU.
01:45:27 Sometimes all of the GPUs may be full on RunPod, so you will have to run it without
01:45:33 a GPU.
01:45:35 So this is how you start it without using any GPU and there is also terminate.
01:45:41 When you hit terminate it will delete your RunPod permanently.
01:45:45 I already said this but I am saying it again.
01:45:49 So do not hit the terminate button unless you are 100% sure, because it will delete everything
01:45:55 on this RunPod; and until you terminate and delete your RunPod, it will continue using
01:46:02 your credits.
01:46:04 Currently I have two RunPods not running and it is using 0.056 dollars per hour.
01:46:14 So this is the cost of keeping these two RunPods on my account.
01:46:20 And when I delete them you will see this will get decreased.
01:46:23 Let's delete first one with terminate pod.
01:46:26 Okay and now this should get decreased.
01:46:29 Let's go to the my pods.
01:46:30 Let's refresh.
01:46:32 Okay now you see currently it is decreasing.
01:46:35 0.028 dollars per hour.
01:46:39 This is charged per minute by the way, not per hour and I will also delete this and it
01:46:45 will become zero.
01:46:47 And now my credits are remaining as they are until I start another pod.
01:46:52 There is one final thing that I want to show you: the cloud sync button here.
01:46:57 So with cloud sync you can synchronize your data in your server to these cloud services
01:47:04 and there is a great tutorial on the RunPod blog.
01:47:08 I will share this link in the description as well, so you can read it here, set up your
01:47:15 cloud storage, and set up synchronization with your RunPod, and everything generated in your
01:47:22 RunPod will be synchronized with your cloud.
01:47:25 Also, you can use runpodctl, which I have shown multiple times, to download
01:47:32 or upload your data.
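As a reminder, the runpodctl pattern shown earlier in this tutorial looks like this; the file name is just an example, and the one-time code is printed by the send command:

```shell
# On the pod: send a file; runpodctl prints a one-time code.
#   runpodctl send trained_model.ckpt
# On the receiving machine: paste that code to download the file.
#   runpodctl receive <one-time code>
```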
01:47:34 It is up to you how you want to use it.
01:47:36 I think I have covered everything that I have mentioned in the beginning.
01:47:41 I hope you have enjoyed.
01:47:42 Please like, subscribe and leave a comment on this tutorial. Also join our Discord
01:47:48 channel and ask any questions that you can't solve.
01:47:52 Also, please support us on Patreon.
01:47:55 It is really important.
01:47:56 The Patreon link and the Discord link will be in the comments and description.
01:48:01 All of the links we have used in this video will be in the description.
01:48:05 You can also find our Patreon page on the About tab of our YouTube channel.
01:48:06 So far we have 26 patrons.
01:48:07 I thank them very much.
01:48:08 I hope you also become a patron.
01:48:09 Hopefully see you in another awesome video!
01:48:10 Thank you so much.