CausVid LoRA V2 of Wan 2.1 Brings Massive Quality Improvements, Better Colors and Saturation #99
FurkanGozukara announced in Tutorials
Full tutorial: https://www.youtube.com/watch?v=1rAwZv0hEcU
CausVid LoRA V2 of Wan 2.1 is just amazing. In this tutorial video I will show you how to use the most powerful video generation model, Wan 2.1, with the CausVid LoRA effortlessly. Normally, Wan 2.1 requires 50 steps to get excellent results. With the CausVid LoRA we get equally excellent results in only 8 steps. Moreover, with the newest version 2, the quality is now almost identical to base Wan 2.1. I will show how to download and use it in SwarmUI with 1-click presets. We will also leverage ComfyUI and the fastest attention (Sage Attention).
🔗 Follow the link below to download the zip file that contains the SwarmUI installer and AI models downloader Gradio App - the one used in the tutorial⤵️
🔗 Follow the link below to download the zip file that contains the ComfyUI 1-click installer with Flash Attention, Sage Attention, xFormers, Triton, DeepSpeed, and RTX 5000 series support⤵️
🔗 Python, Git, CUDA, C++, FFmpeg, MSVC installation tutorial - needed for ComfyUI⤵️
🔗 SECourses Official Discord 10500+ Members⤵️
🔗 Stable Diffusion, FLUX, Generative AI Tutorials and Resources GitHub⤵️
🔗 SECourses Official Reddit - Stay Subscribed To Learn All The News and More⤵️
Wan 2.1 and CausVid with CausVid LoRA
In the rapidly evolving field of video generation, two models have made significant strides: Wan 2.1 and CausVid. Wan 2.1, developed by Alibaba Group, is a large-scale video generative model that sets new benchmarks in quality and diversity. CausVid, designed for fast and interactive causal video generation, introduces an autoregressive approach to overcome the limitations of traditional models. A key innovation is the CausVid LoRA (Low-Rank Adaptation), which reduces the computational steps required for video generation with Wan 2.1 from 50 to just 8, while maintaining exceptional quality.
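As a back-of-envelope illustration of why the step reduction matters (assuming, as a simplification, that generation time scales linearly with step count; real timings depend on hardware and resolution):

```python
def estimated_speedup(base_steps: int, distilled_steps: int) -> float:
    """Rough speedup when sampling cost scales with the number of steps."""
    return base_steps / distilled_steps

# Wan 2.1 default (50 steps) vs. the CausVid LoRA (8 steps), per the figures above.
print(estimated_speedup(50, 8))  # 6.25
```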
CausVid: Speed and Interactivity
CausVid adapts a pretrained bidirectional diffusion transformer into an autoregressive transformer, generating frames sequentially. This reduces initial latency to 1.3 seconds and enables continuous frame generation at 9.4 FPS. Using Distribution Matching Distillation (DMD), CausVid distills a 50-step diffusion model into a few-step generator, which is what makes the 8-step CausVid LoRA possible.
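Using the figures above (1.3 s initial latency, then streaming at 9.4 FPS), a simple latency model looks like this; the linear model is my own simplification for illustration, not taken from the CausVid paper:

```python
def time_to_n_frames(n_frames: int, initial_latency_s: float = 1.3,
                     stream_fps: float = 9.4) -> float:
    """Seconds until the n-th frame is available: the first frame arrives
    after the initial latency, the rest stream at stream_fps."""
    return initial_latency_s + (n_frames - 1) / stream_fps

print(round(time_to_n_frames(1), 2))   # first frame: 1.3 s
print(round(time_to_n_frames(95), 1))  # 95 frames: about 11.3 s
```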
Video Chapters
00:00:00 Intro: CausVid LoRA v2 vs v1 - Huge Quality Leap
00:00:17 Unveiling Massive Quality Boost in Local Video AI (Wan 2.1 & CausVid LoRA)
00:00:40 Deep Dive: CausVid LoRA v2 - 8 Steps, Speed & Enhanced Quality
00:01:17 Tutorial Goal: One-Click Install & Use New LoRA v2 in SwarmUI
00:01:56 For Existing Users & Full Walkthrough Start
00:02:07 Step 1: Download & Extract SwarmUI Model Downloader
00:02:29 Step 2: Running the Model Downloader Script
00:02:42 Step 3: Downloading Wan 2.1 Core Models (Includes LoRA v2)
00:03:04 Model Downloader: Advanced Features & Customization
00:03:42 Step 4: Update SwarmUI to Latest Version
00:03:58 Step 5: Importing SwarmUI Presets for LoRA v2
00:04:23 Step 6: Applying "Fast CausVid with Wan 2.1" Preset
00:04:42 Step 7: Image-to-Video - Image Setup & Aspect Ratio
00:05:01 Model Selection for Image-to-Video (Wan 2.1 Variants)
00:05:28 Step 8: Critical Settings for Image-to-Video (Creativity, Prompt, Frames, RIFE)
00:05:51 Pro Tip: Monitor GPU Watt Usage with nvitop for Optimal Performance
00:06:30 GPU Optimization: "Reserve VRAM" Trick in SwarmUI Server Settings
00:06:57 Monitoring Generation Progress, Speed & HD Resolution Example
00:07:25 Image-to-Video Result: Excellent Quality in Under 2.5 Minutes
00:07:38 Text-to-Video: Setup with CausVid LoRA v2 & Model Selection
00:08:12 Text-to-Video Tips: Using Sage Attention, No TeaCache with Fast LoRA
00:08:47 Troubleshooting Text-to-Video: The Importance of Selecting the LoRA Model!
00:09:04 Mastering LoRAs in SwarmUI: Adjusting Weights, Scale & Impact
00:09:30 Advanced LoRA Usage: Selecting and Weighting Multiple LoRAs
00:09:53 Text-to-Video Result with LoRA: Significant Improvement, Prompting Tips
00:10:09 Sneak Peek Part 1: The Ultimate Video Upscaler (In Development)
00:10:21 Upscaler Deep Dive: Diffusion-Based, Frame/Sliding Window, Flicker Prevention
00:10:49 Upscaler Features: Auto Scene Splitting, CogVLM2 Captioning, Batch, FPS Control
00:11:15 Upscaler Tool: Output Comparison Video Generation
00:11:47 Sneak Peek Part 2: Local Video Comparison Slider Application
00:12:11 Slider Demo: Visualizing LoRA v1 vs v2 Quality Improvement
00:12:27 Upscaler & Comparison App Development: Call for Feedback & Suggestions
00:12:57 Conclusion & Future Release Plans for New Tools
Some background music by NoCopyrightSounds : https://gist.github.com/FurkanGozukara/681667e5d7051b073f2e795794c46170
Video Transcription
00:00:00 And you see on the right, it is CausVid LoRA version 1. On the left, it is
00:00:05 CausVid LoRA version 2. There is a huge quality difference, a huge improvement. So this
00:00:11 is just amazing. With just 8 steps, we get this much quality improvement. This is just amazing.
00:00:17 Greetings everyone. Today, I am going to introduce you to a massive quality improvement in local
00:00:24 video generation models. You will like this. These videos are generated with Wan 2.1, our
00:00:31 most powerful model, with the CausVid LoRA. I made a video about this. This is a LoRA that
00:00:40 allows you to generate videos only in 8 steps. Therefore, it is super fast, and you are seeing
00:00:49 right now on the left, the first version of the CausVid LoRA with 8 steps, and on the right,
00:00:57 you are seeing the new version, version 2 of the CausVid LoRA with 8 steps. As you are seeing
00:01:04 right now, there is a huge, huge improvement in terms of quality, in terms of colors, saturation,
00:01:12 everything, everything has been significantly improved, as you are seeing right now.
00:01:17 Let's also see the difference in this video. You see, this is the beginning of the video. It is
00:01:21 really, really bad compared to the new version. On the left, we see CausVid LoRA version 1,
00:01:28 and on the right, we are seeing CausVid LoRA version 2. So,
00:01:33 today I will show you how to use this LoRA with one click to download and install.
00:01:38 We can say that this is the second video, the follow-up to my previous
00:01:43 Wan 2.1 CausVid LoRA tutorial, and I will show
00:01:50 how we are going to use it in our SwarmUI. It is exactly the same as before. So,
00:01:56 if you already have watched that video, if you already know how to do it, just download
00:02:01 the new model and use it. But let me show you one more time how to use this amazing model.
00:02:07 First of all, follow the link in the description and open this page. Download
00:02:11 the latest SwarmUI model downloader zip file. Move it into your SwarmUI folder. Right-click,
00:02:17 and I will extract all of them here. Overwrite all the files. If it didn't overwrite automatically,
00:02:23 just extract and overwrite them yourself. Then, you see we have the Windows start download
00:02:29 models up.bat file. Double-click, more info, run anyway. It will start the latest version of the
00:02:35 model downloader. We have updated this, we have added new models, and now everything is a button
00:02:41 because it was requested. All you need to do is just download Wan 2.1 core models bundle.
00:02:47 This will download the new model file, which is the new LoRA file. You can see that it is here:
00:02:53 Wan 2.1 CausVid with text to video image to video LoRA version 2. You can follow the logs here,
00:02:59 what is happening. It will just download the new file and skip the existing ones. You can
00:03:04 also follow the speed on the CMD window. You can see that it is super fast because we are
00:03:10 using all of the optimizations. Moreover, we support custom model download folders,
00:03:16 so you can give any model folder, you can give ComfyUI, or wherever you are using. For ComfyUI,
00:03:21 you can check this checkbox. If you get errors, you can just start again and resume. This is the
00:03:27 ultimate model downloader. It supports so many models. You can just type a name here,
00:03:32 like Flux, and it will list all of the Flux models. You see the Flux bundles we have. Moreover,
00:03:37 we have image generation models, Flux models, and it lists all of the models like this.
00:03:42 Okay, you see all the models have been downloaded. We can see the queue size is zero. There are no
00:03:47 errors. Now, what we need to do is update our SwarmUI. As I said, if you didn't
00:03:53 follow the previous videos, follow them. So I will use Windows update SwarmUI.bat file,
00:03:58 more info, run anyway. It will update it to the latest version and it will automatically start.
00:04:03 So, how are we going to use this LoRA? I have prepared presets for you. So, click import
00:04:09 presets, choose file, go back to the installation folder and you will see amazing SwarmUI presets.
00:04:16 Click overwrite and import, and it will import the presets. It will not overwrite your existing or
00:04:23 previous presets. Then, you will see that here in the quick tools, there is reset params to default.
00:04:30 I recommend that, and it will set everything to the default. Then select this fast CausVid with
00:04:35 Wan 2.1. Click this icon and direct apply, and it will apply everything directly like this. Then I
00:04:42 will show image to video because it is most widely used, but you can also do the text to video,
00:04:48 totally the same. So, choose file. Let's choose our file like this. Then click this res button and use
00:04:55 closest aspect ratio. It will set the aspect ratio according to the model you selected. Currently,
00:05:01 the Wan 2.1 720p model is selected. You can use any model. There are a lot of variants
00:05:09 of the Wan 2.1 like Q6, Q5. We support all of them. Let me show you one more time. When I type
00:05:15 here Wan 2.1, and when I go to video generation models, Wan 2.1, you can see that we support FP16,
00:05:22 FP8. You can see the description is here. So you can download any one of them. After that, set init
00:05:28 image creativity to zero. This is super important. Then type your prompt, like "a talking child",
00:05:34 whatever you want. Then you can change the number of frames you want. This will impact the length
00:05:39 of the video. Currently, it is by default set to 49 frames, so it will generate 3 seconds video,
00:05:46 and you can make the FPS double with RIFE like this. Then click generate.
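The frame-count math mentioned above can be sketched as follows. Wan 2.1 outputs at 16 FPS (which is why 49 frames is about 3 seconds), and RIFE interpolation is modeled here as inserting one in-between frame per pair, so n frames become 2n - 1 at double the FPS:

```python
def video_duration_s(num_frames: int, fps: float = 16.0) -> float:
    """Clip length in seconds at the given frame rate."""
    return num_frames / fps

def rife_double(num_frames: int, fps: float = 16.0):
    """RIFE-style 2x interpolation: n frames -> 2n - 1 frames, fps doubles."""
    return 2 * num_frames - 1, fps * 2

print(round(video_duration_s(49), 2))  # ~3.06 s, the "3 seconds" from the video
frames, fps = rife_double(49)
print(frames, fps)  # 97 frames at 32.0 FPS, same duration but smoother
```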
00:05:51 There is one more important thing that I need to tell you. As I have shown in my previous
00:05:56 tutorials, you should follow your GPU watt usage. What does this mean? Let me open a new terminal,
00:06:03 type pip install nvitop, then type nvitop and follow how much watt your GPU is using. It should
00:06:12 be close to the maximum. If it is not close to the maximum, that means that it is using shared VRAM.
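The "watts close to maximum" check can be expressed as a tiny helper. In practice the live numbers come from nvitop or NVML (e.g. `nvmlDeviceGetPowerUsage`); the 80% threshold below is my own rule of thumb, not a figure from the video:

```python
def looks_power_bound(draw_watts: float, limit_watts: float,
                      threshold: float = 0.8) -> bool:
    """Heuristic: a GPU drawing >= threshold of its power limit is likely
    compute-bound; a much lower draw suggests spilling into shared VRAM."""
    return draw_watts >= threshold * limit_watts

print(looks_power_bound(520, 575))  # True: healthy, near the 575 W limit
print(looks_power_bound(90, 575))   # False: likely stalled on shared VRAM
```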
00:06:18 We don't want that. Since we are using ComfyUI as a backend, by default, it supports
00:06:23 block swapping. Therefore, it should be extremely optimized. You see, currently I'm drawing close to
00:06:30 the 575 W maximum. So it is working perfectly fine. And there is one trick: if it is not working like
00:06:37 this, even after you restart your computer, go to the backends in the server tab and add this:
00:06:43 reserve VRAM. So, this will tell the application to reserve 3 GB of VRAM at all times,
00:06:50 and this fixed my issue. Sometimes you don't need this, but sometimes you need this. Unfortunately,
00:06:57 I couldn't pinpoint the reason for the error. So if it is using shared VRAM, then set this. Don't
00:07:02 forget. And it will generate our video in the most optimized way. You can go to the logs,
00:07:07 click debug here, and you can see the step speed. Even though I am recording a video right now,
00:07:11 it is generating at this speed at an HD resolution. You can see the resolution
00:07:16 here: 784 by 1136. So this is a really high resolution, and we should have it in a moment.
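The "use closest aspect ratio" step shown earlier presumably does something like the following: from the model's supported resolutions, pick the one whose aspect ratio best matches the input image. The candidate list below is illustrative, not SwarmUI's actual table; 784x1136 is the portrait size seen in the video:

```python
def closest_resolution(img_w, img_h, candidates):
    """Pick the candidate (w, h) whose aspect ratio is nearest the image's."""
    target = img_w / img_h
    return min(candidates, key=lambda wh: abs(wh[0] / wh[1] - target))

# Hypothetical candidate list for a portrait input image.
sizes = [(1280, 720), (720, 1280), (784, 1136), (1024, 1024)]
print(closest_resolution(700, 1000, sizes))  # (784, 1136)
```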
00:07:25 And our video has been generated as you are seeing right now, with maximum quality in a
00:07:30 very short time. This video took only 2.3 minutes, and this is excellent quality. So let's also see
00:07:38 how it performs on the text to video model. To do that, I will click quick tools here,
00:07:43 reset params to default again. Then I will change my model. I will change my model to
00:07:49 text to video. So let's type text here. Let's select text to video GGUF Q8. Then go to presets
00:07:56 and we need to apply our fast CausVid with Wan 2.1 text to video direct apply. It is applied,
00:08:02 as you can see. Just type our prompt. This is a prompt that I have taken from here. I don't know
00:08:07 how it will work, so let's hit generate and let's see what we will get. You see,
00:08:12 since I clicked reset params to default, it disabled everything else and it set
00:08:17 everything according to the preset accurately. This is super important.
00:08:21 With the fast CausVid LoRA, we are not using TeaCache because we are already doing
00:08:27 only 8 steps instead of 50 steps, but we can always use the Sage Attention,
00:08:33 which I am using, and it is automatically installed with my installer, as I have shown
00:08:38 in my previous videos. Sage Attention is the fastest attention for video generation,
00:08:44 so you can use it and you are seeing that it is really, really high quality.
00:08:47 Okay, video has been generated, but I made a mistake. I didn't select the LoRA. Therefore,
00:08:53 the video quality is terrible with 8 steps. So let's select the LoRA and generate again. Do
00:08:59 not make this mistake. You need to select both the base model and the LoRA model,
00:09:04 like here. Also, I am frequently asked how you can change the LoRA weight, LoRA power, or
00:09:11 LoRA scale. You see, it says 1 here. This is the LoRA scale, the LoRA power, the LoRA
00:09:17 impact it is using. You can change it from here. This applies to all LoRAs. Whether it is a Flux LoRA
00:09:24 or a video generation model LoRA, whatever it is, this is how you can set the selected LoRA. You can
00:09:30 also select multiple LoRAs. For example, I can select both of them, and I can set LoRA weight,
00:09:36 LoRA scale, for each one of them like this. So this is how SwarmUI works. I'm just showing you some
00:09:42 general knowledge which can be very useful, because I am getting asked about this all the
00:09:47 time. This is how you can set LoRA weight, LoRA scale, LoRA impact, whichever you may call it.
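Under the hood, a LoRA weight/scale slider like the one described above multiplies the low-rank update before it is added to the base weights. A minimal sketch of the math, in pure Python with tiny matrices for illustration:

```python
def apply_lora(W, A, B, scale=1.0):
    """Effective weights: W_eff = W + scale * (B @ A),
    with B of shape (out, r) and A of shape (r, in); r is the LoRA rank."""
    rows, cols = len(W), len(W[0])
    r = len(A)
    return [[W[i][j] + scale * sum(B[i][k] * A[k][j] for k in range(r))
             for j in range(cols)] for i in range(rows)]

W = [[1.0, 0.0], [0.0, 1.0]]  # base weight matrix
A = [[1.0, 1.0]]              # rank-1 down-projection
B = [[0.5], [0.5]]            # rank-1 up-projection
print(apply_lora(W, A, B, scale=1.0))  # [[1.5, 0.5], [0.5, 1.5]]
print(apply_lora(W, A, B, scale=0.0))  # scale 0 leaves the base weights unchanged
```

Setting the slider to 0 disables the LoRA entirely; values above 1 exaggerate its effect.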
00:09:53 Okay, so the video has been generated. Let's see the difference. Okay, this is a much,
00:09:58 much better video. It could be better with better prompting. I'm just using some random
00:10:03 prompt. Probably it was made for image to video. And there is one other thing I would like to
00:10:09 show you. This is our ultimate video upscaler. It will have so many features. I am still developing it,
00:10:14 adding so many features. This is industry-level upscaling. It is diffusion based. It
00:10:21 supports frame window based upscaling, which means that by default it processes 32 frames
00:10:29 at a time. Therefore, it prevents flickering and other issues. It also supports
00:10:36 sliding window upscaling, which means that transitions between processed chunks have
00:10:43 more consistency. It supports everything. This is literally going to be the ultimate video upscaler,
00:10:49 many times better than Topaz AI, because we automatically split every video into scenes
00:10:57 and automatically caption them, because since this is diffusion based,
00:11:01 the caption matters a lot. Currently, we have CogVLM2 captioning with 4-bit quantization,
00:11:08 working amazingly. And we have so many features: batch processing, FPS decrease and increase. Why?
00:11:15 Because you can decrease FPS to half and make your upscaling 2 times faster, then you can use RIFE
00:11:23 frame interpolation to increase FPS. Also, we have output comparison, and I have developed
00:11:30 a feature here where you can upload any two videos, like this one, and generate
00:11:35 a comparison video according to their aspect ratio. So let's click manual comparison video,
00:11:41 and let's click download. This is how I made this comparison video. And it is really, really good
00:11:47 for seeing the differences between videos. But this is not all yet. There is also another application
00:11:53 that I have developed: start video comparison up file. Choose the files from here. This is a video
00:12:00 slider. You see? This is just amazing, and it is working really, really fast and efficient. All
00:12:06 of these are running locally. I will share everything once all of them are completed.
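The frame-window and sliding-window processing described for the upscaler can be sketched as a chunking function. The 32-frame window comes from the video; the `overlap` parameter is my own illustration of how a sliding window makes consecutive chunks share frames so transitions stay consistent:

```python
def frame_windows(num_frames, window=32, overlap=0):
    """Return (start, end) index ranges covering num_frames frames.
    overlap > 0 makes consecutive windows share frames (sliding-window mode)."""
    step = window - overlap
    assert step > 0, "overlap must be smaller than the window"
    spans, start = [], 0
    while start < num_frames:
        spans.append((start, min(start + window, num_frames)))
        if start + window >= num_frames:
            break
        start += step
    return spans

print(frame_windows(80, window=32))             # [(0, 32), (32, 64), (64, 80)]
print(frame_windows(80, window=32, overlap=8))  # [(0, 32), (24, 56), (48, 80)]
```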
00:12:11 And you see on the right, it is CausVid LoRA version 1. On the left,
00:12:16 it is CausVid LoRA version 2. There is a huge quality difference, a huge improvement. So
00:12:22 this is just amazing. With just 8 steps, we get this much quality improvement. This is
00:12:27 just amazing. This application is not published yet because I am still developing it. I am still
00:12:33 adding features, such as presets, saving your config, and loading your config. There may
00:12:38 be other new features. Therefore, if you have any recommendations or suggestions, please make
00:12:45 your recommendations to me. From time to time, I am sharing the progress on our Patreon and also on our
00:12:51 Reddit. So it is really important for you to make your recommendations to me. Hopefully,
00:12:57 I will publish it as soon as possible because I know that so many people are waiting for this,
00:13:02 and hopefully it will be amazing. So thank you so much. See you later.