8 GB LoRA Training - Fix CUDA & xformers For DreamBooth and Textual Inversion in Automatic1111 SD UI #297

FurkanGozukara · 2025-10-26T23:49:01Z

FurkanGozukara
Oct 26, 2025
Maintainer

8 GB LoRA Training - Fix CUDA & xformers For DreamBooth and Textual Inversion in Automatic1111 SD UI

Full tutorial: https://www.youtube.com/watch?v=O01BrQwOd-Q

Updated tutorial: https://youtu.be/pom3nQejaTs - Our Discord: https://discord.gg/HbqgGaZVmr. In this video I show how to downgrade the CUDA and xformers versions for proper training, and how to do LoRA training with an 8 GB GPU. If I have been of assistance to you and you would like to show your support for my work, please consider becoming a patron on 🥰 https://www.patreon.com/SECourses

Playlist of Stable Diffusion Tutorials, #Automatic1111 and Google Colab Guides, DreamBooth, Textual Inversion / Embedding, #LoRA, AI Upscaling, Pix2Pix, Img2Img:

https://www.youtube.com/playlist?list=PL_pbwdIyffsmclLl0O144nQRnezKlNdx3

This CUDA downgrade will probably not be necessary after the extensions get updated. However, it is not certain when they will be updated. Meanwhile, you can downgrade and use CUDA 11.6.

The commands you need to execute, in order, to downgrade CUDA

https://gist.github.com/FurkanGozukara/e2db853d2016a4a9ae2cc32dc41d730a

Run CMD as administrator if you get an error

1:

activate

2:

pip uninstall torch torchvision

3:

pip uninstall torchaudio

4:

pip uninstall xformers

5:

pip install torch torchvision --extra-index-url https://download.pytorch.org/whl/cu116

6:

pip install -U -I --no-deps https://github.com/C43H66N12O12S2/stable-diffusion-webui/releases/download/torch13/xformers-0.0.14.dev0-cp310-cp310-win_amd64.whl
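If you would rather drive steps 2 to 6 from a script than paste them one by one, the sequence can be sketched roughly as below. This is only an illustration: the function name `build_downgrade_commands` is made up for this example, the `-y` flag is added on the assumption that you do not want the interactive "proceed?" prompt, and calling pip through the venv's own Python stands in for step 1 (`activate`). The URLs are the ones listed above.

```python
import sys
from typing import List

# Values copied from the numbered steps above.
CU116_INDEX = "https://download.pytorch.org/whl/cu116"
XFORMERS_WHEEL = (
    "https://github.com/C43H66N12O12S2/stable-diffusion-webui/releases/"
    "download/torch13/xformers-0.0.14.dev0-cp310-cp310-win_amd64.whl"
)


def build_downgrade_commands(python_exe: str = sys.executable) -> List[List[str]]:
    """Return the pip invocations for steps 2-6, in the same order.

    Hypothetical helper: running pip as `<python> -m pip` with the venv's
    interpreter replaces step 1 (`activate`). `-y` skips the uninstall
    confirmation (an assumption, not part of the original commands).
    """
    pip = [python_exe, "-m", "pip"]
    return [
        pip + ["uninstall", "-y", "torch", "torchvision"],
        pip + ["uninstall", "-y", "torchaudio"],
        pip + ["uninstall", "-y", "xformers"],
        pip + ["install", "torch", "torchvision", "--extra-index-url", CU116_INDEX],
        pip + ["install", "-U", "-I", "--no-deps", XFORMERS_WHEEL],
    ]


if __name__ == "__main__":
    # Dry run: print each command instead of executing it.
    for cmd in build_downgrade_commands():
        print(" ".join(cmd))
```

To actually execute the steps, each list could be passed to `subprocess.run(cmd, check=True)`, with the web UI's CMD windows closed first.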

The hashes below are the specific ones used in the video, but it is not necessary to use them. You can install the newest versions of both DreamBooth and Automatic1111 and just downgrade CUDA with the above commands.

Automatic 1111 commit : dc8d1f4f8beb546089abd107db3432e03339c9c0

Dreambooth commit : c544ee11aee0085a7fbb7fdda65898dea2145f0c
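If you do want to reproduce the exact versions from the video, the two hashes above can be checked out with git. The sketch below only builds the commands; the folder paths in the usage part are hypothetical and depend on where you installed the web UI and the extension, and `checkout_command` is a name invented for this example.

```python
from typing import Dict, List

# Commit hashes quoted above.
PINNED_COMMITS: Dict[str, str] = {
    "automatic1111": "dc8d1f4f8beb546089abd107db3432e03339c9c0",
    "dreambooth": "c544ee11aee0085a7fbb7fdda65898dea2145f0c",
}


def checkout_command(repo_dir: str, commit: str) -> List[str]:
    """Build `git -C <repo_dir> checkout <commit>` for a full 40-character hash."""
    is_hex = all(c in "0123456789abcdef" for c in commit)
    if len(commit) != 40 or not is_hex:
        raise ValueError(f"not a full commit hash: {commit!r}")
    return ["git", "-C", repo_dir, "checkout", commit]


if __name__ == "__main__":
    # Hypothetical folder names; adjust them to your own installation.
    repos = {
        "automatic1111": "stable-diffusion-webui",
        "dreambooth": "stable-diffusion-webui/extensions/sd_dreambooth_extension",
    }
    for name, path in repos.items():
        print(" ".join(checkout_command(path, PINNED_COMMITS[name])))
```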

Watch this video to learn how to use FileWords:

https://youtu.be/KwxNcGhHuLY

#xformers

OUTLINE

00:00:00 Introduction to How to downgrade CUDA version

00:01:46 Automatic1111 will ask you to upgrade CUDA. Don't yet.

00:02:03 How to downgrade your CUDA version in your Automatic1111 installation folder

00:04:30 How to install DreamBooth extension

00:05:07 How to install and use dev branch of DreamBooth extension

00:06:42 How to stash local changes to checkout different git branch

00:07:13 How to start LoRA training for 8 GB VRAM GPUs

00:08:22 Settings and setup for LoRA training

00:13:36 How to generate ckpt file from LoRA training checkpoint

Some additional details on how transformers can be used with CUDA-enabled NVIDIA hardware:

Transfer learning: Transfer learning is a technique that can be used to leverage pre-trained transformer models, such as BERT or GPT-2, to improve the performance of NLP tasks with limited training data. NVIDIA's hardware and software can be used to fine-tune these pre-trained models on specific NLP tasks, allowing for faster convergence and higher accuracy.

Customization and optimization: The flexibility of transformers allows for a wide range of customization options and optimization techniques. NVIDIA's software libraries can be used to implement custom activation functions, weight initialization schemes, and other architectural modifications to improve model performance. In addition, CUDA enables developers to optimize the transformer models for specific hardware configurations, such as different numbers of GPUs, to achieve the best performance.

Real-time applications: Transformers can be used for real-time NLP applications, such as chatbots and speech recognition, which require low latency and high throughput. NVIDIA's hardware and software can be used to optimize transformer models for real-time applications by reducing inference time and increasing throughput.

Natural language generation: Transformers can be used for natural language generation (NLG) tasks, such as text summarization and language translation. NVIDIA's hardware and software can be used to optimize transformer models for NLG tasks, by improving the generation speed and quality of the output.

Deployment: NVIDIA's software libraries, such as TensorRT, can be used to optimize and deploy transformer models to various production environments, such as cloud-based services and edge devices. This allows for the efficient deployment of transformer models in a variety of real-world applications.

Overall, transformers and CUDA-enabled NVIDIA hardware provide a powerful combination for accelerating NLP tasks, including training and inference of transformer models, transfer learning, customization and optimization, real-time applications, natural language generation, and deployment to production environments.

Video Transcription

00:00:00 Greetings everyone. This will be a short video to explain how to use CUDA 11.6 version after
00:00:07 the latest Automatic1111 update to be able to do training correctly by using either DreamBooth or
00:00:15 Textual Inversion. Moreover, I will show how to use dev branch of DreamBooth extension to
00:00:20 be able to use LoRA if you have a GPU with 8 GB of VRAM. If you are interested in learning more,
00:00:26 I have several very detailed videos. So this is the playlist of my Stable Diffusion related videos
00:00:32 on my channel. If you are interested in learning more, I suggest you watch them in
00:00:39 this order: first Zero to Hero Stable Diffusion. Then how to do Stable Diffusion Textual Inversion.
00:00:45 Then how to inject your training subject, then DreamBooth Got Buffed 22 January Update. This
00:00:52 will teach you a lot of information related to Stable Diffusion and finally, you can watch my
00:00:58 older how to do Stable Diffusion LoRA training video. But this is not very up to date at the
00:01:03 moment and hopefully I will make a much more up-to-date one. So Automatic1111 recently updated its Torch
00:01:11 version and xformers version to the latest ones, or at least more recent ones. You see the Torch version is
00:01:17 now 1.13 and the CUDA version is 11.7. However, this is currently not very well supported by DreamBooth
00:01:25 or Textual Inversion training. How do I know? There are several issues, topics on the GitHub of
00:01:31 Automatic1111 and you see, don't use Torch 13. It is breaking the functionality. Or CUDA, use CUDA
00:01:39 11.6. So in this video I will show you how you can revert to an older version of CUDA after
00:01:45 you have upgraded. By the way, it will ask you to upgrade your Torch version with this command line
00:01:52 argument. If you have already updated, watch this video to learn how to downgrade. Or if
00:01:58 you are doing a fresh installation, watch this video to learn how to downgrade.
00:02:03 So to downgrade our Torch and CUDA versions, we enter our installation folder,
00:02:10 as you can see, Stable Diffusion Web UI master, then enter the venv folder and, inside it,
00:02:16 enter the scripts folder. Let me zoom in to show you. This is the folder you need:
00:02:21 first the installation folder, inside that the venv folder, and inside that the scripts
00:02:27 folder. Then in here, type CMD. It will open a CMD window at that path, as you can see right now.
00:02:36 Then, in the following order, we are going to execute each one of these commands inside
00:02:41 here. I will put all of these commands into the description of the video. So don't worry about
00:02:46 that. I am just copying and pasting them like this and hitting enter, one by one.
00:02:52 It will ask you to proceed: type the letter y and hit enter. By the way,
00:03:00 I got an error because both of my CMD windows for running Stable Diffusion are open. Make sure
00:03:05 that you have closed them first. Once you have closed your CMD windows, you will see that it successfully
00:03:12 uninstalled Torch Vision. Then execute the next command like this: and it's executed. Then execute
00:03:21 the next one like this: OK, it is asking. Type y and hit enter. Then we are going
00:03:28 to execute this command. This will take some time because it will download the CUDA version.
00:03:36 If you get a warning like this, it is just fine. Just ignore it. OK, in the end you will get a
00:03:42 message like this: Ignore this error message and focus on this message. You see successfully
00:03:48 installed Torch 1.13 and CUDA 11.6. Then the next command is the xformers installation.
00:03:56 Just copy and paste it in here and hit enter. OK, let me copy paste again. OK, now it is installing
00:04:06 that one as well. OK, now we are all ready. We can now start our application as usual.
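Before starting the web UI, it can help to confirm that the venv really ended up on Torch 1.13 with the CUDA 11.6 build. The following is a minimal sketch, assuming the installed package reports a version string of the form `1.13.1+cu116` (base version, then a `+cuXYZ` tag); the helper names are invented for this example. Run it with the Python from the venv.

```python
from importlib import metadata
from typing import Optional, Tuple


def split_torch_version(version: str) -> Tuple[str, Optional[str]]:
    """Split e.g. '1.13.1+cu116' into ('1.13.1', 'cu116'); the tag is None if absent."""
    base, _, local = version.partition("+")
    return base, (local or None)


def matches_target(version: str, torch_prefix: str = "1.13", cuda_tag: str = "cu116") -> bool:
    """True when the base version is 1.13 or 1.13.x and the build tag is cu116."""
    base, tag = split_torch_version(version)
    base_ok = base == torch_prefix or base.startswith(torch_prefix + ".")
    return base_ok and tag == cuda_tag


if __name__ == "__main__":
    try:
        installed = metadata.version("torch")
    except metadata.PackageNotFoundError:
        print("torch is not installed in this environment")
    else:
        status = "OK" if matches_target(installed) else "does not match 1.13 + cu116"
        print(installed, status)
```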
00:04:13 I'm just starting with xformers and I am using the latest commit of
00:04:19 Automatic1111. Let me show you which one it is. It was updated 12 minutes ago and its hash is this:
00:04:27 I will put this into the description of the video as well. OK, now we have started, like this,
00:04:32 with the newest installation and let's go to the extensions and click available, load from. And in
00:04:41 here let's install the DreamBooth extension, like this. You see, while installing, I am seeing now
00:04:49 the checking DreamBooth requirements and it is showing me the installed things. Torch version
00:04:54 is 1.13 and CUDA is 11.6 and Torch Vision is this one. So currently we are on the correct versions, and after
00:05:03 the installation is completed we need to restart the CMD window. But before starting again now, I
00:05:09 will show you how to move to the developer branch of the DreamBooth extension. From here. Go to the
00:05:16 installer, click here. It will open the GitHub repository of the extension. And in here currently
00:05:21 you are seeing the main branch. By default, it is installing the main branch. However, there is also
00:05:27 a development branch, which is the most up-to-date branch. Actually, I think he just merged it with
00:05:34 the main, but I will still show you the developer branch because in future you may need it. Yes,
00:05:43 he just updated it while I was starting the video. So how are we going to switch to the development
00:05:49 branch? We are going to enter our extensions folder. By the way, to do a fresh installation,
00:05:55 just delete this folder and you can then install your extension fresh. Enter inside here,
00:06:02 and in here we are running CMD. By the way, for git commands to work you need to have installed
00:06:08 Git Bash or any other git client. For example, if you type Git Bash, you can see its
00:06:16 link in here and you can download it and install it. Then the git commands will work and then we
00:06:23 will pull the development branch: git pull origin dev. It will pull the development branch. For me,
00:06:32 it says it's already up to date. Then you need to do git checkout dev. OK, now we are already in the
00:06:41 development branch. This is how you check out. If you encounter an error, you can just do git
00:06:47 stash and it will stash the local changes. Then you will be able to check out the development
00:06:53 branch. Now I will check out the main again. So we can use the main and after doing that,
00:07:00 you see it is telling me switched to branch main. I will just restart CMD window. Which one?
00:07:09 Oh, we still didn't start the CMD yet. OK, sorry about that. Let's just click the start and now I
00:07:14 will show you how to do LoRA training and generate ckpt from saved Checkpoint. OK,
00:07:22 we have finally started, with the correct Torch Vision and xformers. Currently, we will use this CUDA
00:07:29 version. However, I am pretty sure that the developer will fix the problem with CUDA 11.7
00:07:35 in future. Then you won't need to downgrade your CUDA version. Let's refresh our stable,
00:07:42 refresh our Automatic1111 web UI. Go to the DreamBooth tab. And now for LoRA to appear,
00:07:50 first we need to pick LoRA and now the LoRA drop-downs will appear. Of course, we will first generate a
00:07:57 LoRA model as test one and you will see a new experimental thing: unfreeze model. Currently
00:08:05 I am working on figuring out the best settings to do LoRA training. However, it is taking time. I am
00:08:11 making this video to show you the latest changes. And when I get more information to train a better
00:08:19 LoRA model, hopefully I will make another video. So I will use just the default settings for
00:08:25 now and just create the model. However, you can still play with the unfreeze model option. You see,
00:08:33 it says that it unfreezes model layers and allows for potentially better training, but makes increased
00:08:38 VRAM usage more likely. Okay, once the model is generated, you will see this model is selected
00:08:45 here and we still didn't start the training. Therefore it is not appearing here. Then in
00:08:51 here, I think they fixed this: generate classification images using text2image.
00:08:57 Let's also try that. Let's say 500 epochs, zero, and let's save model previews and model weights
00:09:06 every five epochs. And you see that these are the default learning rates. Actually, these are
00:09:15 not very optimal right now. When I figure out the optimal ones, hopefully I will make another video.
00:09:20 Let's type our usual sanity prompt: photo of ohwx man by Tomer Hanuka. If you watch my more detailed
00:09:29 videos you will learn more about why we use a sanity sample prompt and the other settings.
00:09:36 Okay, in here I am now selecting FP16 because FP16 has better precision than BF16. Actually,
00:09:44 I had this wrong in my previous videos. So FP16 is supposed to have better
00:09:50 precision and better performance. If you check this cache latents option, it will use more VRAM. So
00:09:56 if you have a GPU with 8 GB of VRAM, then you may not want to check this, but I suggest you first
00:10:04 try it. If you get an out of memory error, then uncheck this. And we are going to train UNET.
00:10:10 I think without this it is using about 7 GB of VRAM. So you can still train UNET with this.
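On the FP16 versus BF16 point above: FP16 keeps 10 fraction bits while BF16 keeps only 7, so FP16 resolves finer differences, whereas BF16 covers a much wider numeric range. The sketch below illustrates this with plain Python and no GPU. It is an emulation for finite inputs only; the helper names are made up for this example, and it says nothing about training speed or quality.

```python
import struct


def round_to_fp16(x: float) -> float:
    """Round x to the nearest IEEE half-precision value (10 fraction bits)."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def round_to_bf16(x: float) -> float:
    """Round a finite x to the nearest bfloat16 value (7 fraction bits).

    bfloat16 is the top 16 bits of a float32, so we round the float32 bit
    pattern to its upper half (round to nearest, ties to even).
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    bits += 0x7FFF + ((bits >> 16) & 1)
    bits &= 0xFFFF0000
    return struct.unpack("<f", struct.pack("<I", bits))[0]


if __name__ == "__main__":
    # Precision: FP16 keeps a small offset from 1.0 that BF16 rounds away.
    print(round_to_fp16(1.001), round_to_bf16(1.001))
    # Range: 70000 is beyond FP16's largest finite value but fits in BF16.
    print(round_to_bf16(70000.0))
    try:
        round_to_fp16(70000.0)
    except OverflowError as err:
        print("fp16 overflow:", err)
```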
00:10:18 And there is also another experimental thing which is freeze clip normalization layers.
00:10:24 Keep the normalization layers of clip frozen during training. Advanced usage, may increase
00:10:29 model performance and editability. However, again, this is very experimental and I am yet to figure
00:10:35 out the best working settings. I have been working on them for over two days and I still have not
00:10:42 figured out the best settings. And, as usual, let's set up our training directory and other
00:10:48 things. So I am going to use this training data set. It has 9 images. They all have
00:10:53 different backgrounds and different clothes. Okay, classification let's say example, okay,
00:11:01 instance token. And FileWords. So in my previous video I explained how to use FileWords. I am not
00:11:08 going to repeat it here. Let's just say, ohwx man and photo of man and photo of ohwx man. Okay,
00:11:19 these are the classical things. And let's say 100 images per instance image. Okay, and in saving
00:11:28 you can generate a ckpt when training completes. But 500 epochs is very likely to overtrain.
00:11:35 Actually it becomes overtrained too fast with default settings. LoRA rank: this is also another
00:11:42 new thing. And as you increase LoRA rank, it is supposed to hold more data. But I tested that
00:11:48 and when I increased it to maximum, the results were much worse than the default four. So still,
00:11:57 I am yet to figure out the best settings and hopefully when I figure out I will make
00:12:01 another video. Generate LoRA weights when saving during training: this way it will generate
00:12:07 checkpoints for us. Then later we will be able to pick the checkpoint and generate a ckpt from that.
00:12:12 Okay, click save settings and click train. Let's see if it will generate the classification images.
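To get a feel for what these settings add up to, the numbers mentioned above (9 training images, 100 classification images per instance image, 500 epochs, saving every 5 epochs) can be worked through as follows. This is a rough sketch under the assumption that one epoch is one pass over the instance images at batch size 1; the extension may count steps differently, and the function names are invented for this example.

```python
import math


def class_image_count(instance_images: int, class_per_instance: int) -> int:
    """Total classification images requested for the data set."""
    return instance_images * class_per_instance


def steps_per_epoch(instance_images: int, batch_size: int = 1) -> int:
    """Steps for one pass over the instance images (assumed definition of an epoch)."""
    return math.ceil(instance_images / batch_size)


def total_steps(epochs: int, instance_images: int, batch_size: int = 1) -> int:
    """Steps for the whole run under the same assumption."""
    return epochs * steps_per_epoch(instance_images, batch_size)


if __name__ == "__main__":
    images = 9
    print("classification images:", class_image_count(images, 100))
    print("steps per epoch:", steps_per_epoch(images))
    print("total steps for 500 epochs:", total_steps(500, images))
    print("steps between saves (every 5 epochs):", total_steps(5, images))
```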
00:12:26 [Inaudible] Okay, in here it is showing correctly the number of steps. So I am thinking that it
00:12:34 will generate now. Yes, it generated. There is the text2img tab, so you can customize the image
00:12:44 generation from here. Alternatively, you can batch generate from here and give the folder path. I
00:12:51 will show you my previous training results because I just did a training before starting this video.
00:12:57 Okay, this is from my previous training with LoRA with the same settings as I have just shown,
00:13:03 and you see, I lost stylizing even after 187 steps, and when you divide that by 9,
00:13:13 it was just over 20 epochs and, as you see, the results are not very good and it takes a lot of
00:13:21 tries to get stylized results. I have used this specific checkpoint 1356 to generate ckpt and
00:13:34 generate images from that. So how do we generate ckpt? Go to the DreamBooth and in here select the
00:13:41 model. Okay, then make sure that you have selected use LoRA, and then you will see the generated
00:13:49 checkpoints. And in here, select the checkpoint that you want to generate a ckpt from: select the
00:13:57 LoRA model checkpoint, then make sure that you first click load settings, then click save
00:14:06 settings, otherwise it does not work. Actually, the last time I tried, it was not working.
00:14:12 Once you see config saved, click generate ckpt and in here you will see the messages:
00:14:21 Okay, you see it has loaded. First test one 1356 checkpoint. However, it generated a ckpt file
00:14:31 name with the latest step of the training. This is incorrect. I have reported this to
00:14:37 the developer and I am hoping that it will get fixed soon. After a ckpt is generated,
00:14:42 you can just click refresh and you can now start using your LoRA trained model.
00:14:50 And now let me show you the results I got from my previous tries. I have used these
00:14:57 as the prompts. This is the positive prompt and this is the negative prompt. And now let me
00:15:03 show you the output. So you see, the outputs are all my face, the subject I taught, but
00:15:10 the stylizing is very poor and the quality is also poor. I think it is already pretty overtrained and
00:15:20 so we need better settings. It certainly learns your subject, your face. However, it loses its
00:15:28 ability to stylize your face as in DreamBooth, because the last video I made for
00:15:35 DreamBooth was extremely successful, which you can watch here actually. So LoRA
00:15:44 is currently much inferior to DreamBooth with the default settings, but
00:15:50 with these new experimental settings, I am hoping that it will become much better once we figure
00:15:57 out the optimal settings. Still, you can stylize it, but it is much harder than DreamBooth. You
00:16:03 need to generate a lot of images and you need to test different cfg and perhaps checkpoints.
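Testing different cfg values and checkpoints, as suggested above, amounts to a small grid search. The sketch below only enumerates the combinations so they can be worked through systematically; it does not call the web UI. The checkpoint labels other than 1356 and all the cfg values are hypothetical placeholders, and `build_test_grid` is a name invented for this example.

```python
from itertools import product
from typing import Dict, List, Sequence, Union

Job = Dict[str, Union[str, float]]


def build_test_grid(
    checkpoints: Sequence[str],
    cfg_scales: Sequence[float],
    prompt: str,
) -> List[Job]:
    """Return one job per (checkpoint, cfg scale) pair, checkpoints varying slowest."""
    return [
        {"checkpoint": ckpt, "cfg_scale": cfg, "prompt": prompt}
        for ckpt, cfg in product(checkpoints, cfg_scales)
    ]


if __name__ == "__main__":
    jobs = build_test_grid(
        checkpoints=["1356", "earlier-step", "later-step"],  # only 1356 is from the video
        cfg_scales=[5.0, 7.0, 9.0],  # placeholder values, not recommendations
        prompt="photo of ohwx man by Tomer Hanuka",
    )
    for job in jobs:
        print(job)
```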
00:16:13 I hope you have enjoyed it. Please subscribe, like, comment, share and hopefully I will let
00:16:19 you know the news. If you also support us on patreon, I would appreciate it very much.
00:16:24 Currently we have 12 patrons and I appreciate them very much for supporting us. They enable me to
00:16:34 continue producing more quality content. You can also join our discord channel from here and I
00:16:40 will also put the discord channel link into the description. Hopefully see you in another video.
