WAN 2.1 FusionX is the New Best of Local Video Generation with Only 8 Steps + FLUX Upscaling Guide #96
FurkanGozukara announced in Tutorials
Full tutorial: https://www.youtube.com/watch?v=Xbn93GRQKsQ
FusionX: The BEST AI Video Model? + FLUX Hyper-Realistic Upscaling (One-Click Setup!). Struggling to create high-quality AI videos and hyper-realistic images? This tutorial is your ultimate solution! I'm introducing the incredible new Wan 2.1 FusionX model and a game-changing 2x latent upscaler for the FLUX model, all made incredibly simple with my custom one-click presets for SwarmUI.
🔗Follow below link to download the zip file that contains SwarmUI installer and AI models downloader Gradio App - the one used in the tutorial⤵️
🔗Follow below link to download the zip file that contains ComfyUI 1-click installer that has all the Flash Attention, Sage Attention, xFormers, Triton, DeepSpeed, RTX 5000 series support⤵️
🔗 Python, Git, CUDA, C++, FFMPEG, MSVC installation tutorial - needed for ComfyUI⤵️
🔗 SECourses Official Discord 10500+ Members⤵️
🔗 Stable Diffusion, FLUX, Generative AI Tutorials and Resources GitHub⤵️
🔗 SECourses Official Reddit - Stay Subscribed To Learn All The News and More⤵️
Video Chapters
00:00:00 Introduction to the New FusionX Video Model & FLUX Upscaling
00:00:30 One-Click Presets & The SwarmUI Model Downloader Explained
00:01:07 Achieving Hyper-Realism with the FLUX 2x Latent Upscale Preset
00:01:58 How to Download & Install the SwarmUI Model Downloader
00:02:49 Downloading Full Models vs. Downloading Just The LoRAs
00:03:48 Final Setup: Updating SwarmUI & Importing The New Presets
00:04:32 Generating a Video: Applying the FusionX Image-to-Video Preset
00:05:03 Critical Step: Correcting The Model's Native Resolution Metadata
00:05:55 Finalizing Image-to-Video Settings (Frame Count & RIFE Interpolation)
00:06:49 Troubleshooting Performance: Identifying Low GPU Usage & Shared VRAM Bug
00:08:35 The Solution: Disabling Sage Attention for Image-to-Video Models
00:10:02 Final Result: Showcasing The Amazing HD Quality Animation
00:10:40 How to Use the FusionX Text-to-Video Model with Presets
00:11:49 Text-to-Video Result & Quality Comparison
00:12:08 How to Use the FusionX LoRA with the Base Wan 2.1 Model
00:13:07 FLUX Tutorial: Downloading The Required Upscaler & Face Models
00:13:48 Generating a High-Quality Image with The Official FLUX Preset
00:14:50 Using Automatic Face Segmentation & Inpainting with FLUX
00:16:05 The Ultimate Upgrade: Applying The FLUX 2x Latent Upscaler Preset
00:16:32 Final Result: Comparing Standard vs. 2x Upscaled Image Quality
00:16:50 Outro & Sneak Peek of The New Ultimate Video Processing App
Discover FusionX, a powerful model that generates stunning videos from text or images in as few as 8 steps. I'll guide you through the entire process, from downloading the model (or just the LoRA) with our custom high-speed downloader to applying the optimized presets. We'll even tackle a common performance bug to ensure you're getting maximum speed from your GPU.
Then, take your FLUX generations to a level you never thought possible! If you've had trouble getting realistic results, my new presets are here to help. Learn how to generate a great base image and then apply a powerful 2x latent upscale workflow that adds breathtaking detail, quality, and realism. You won't believe the before-and-after difference.
🔥 In This Tutorial, You Will Learn:
Introducing FusionX: A deep dive into the new Wan 2.1 video model.
Easy Setup: How to use the custom Model Downloader to get all necessary files (FusionX, FLUX, Upscalers, Face Models) with a single click.
Image-to-Video Mastery: A step-by-step guide to animating images with FusionX for mind-blowing results.
Text-to-Video Made Simple: How to use the text-to-video model and LoRA for amazing animations.
FLUX Hyper-Realism: Generate stunningly realistic images using the official FLUX Dev preset.
Ultimate Upscaling: Apply the 2x Latent Upscaler preset to add incredible detail and quality to your FLUX images.
Troubleshooting: How to fix common performance issues (like the shared VRAM bug) for optimal generation speed.
This guide provides everything you need to start creating professional-grade AI content today. No more guesswork, just incredible results.
Thank you for watching! If this tutorial helped you, please leave a Like, Subscribe for more advanced AI content, and share your amazing creations in the comments below!
Some background music by NoCopyrightSounds : https://gist.github.com/FurkanGozukara/681667e5d7051b073f2e795794c46170
Video Transcription
00:00:00 Greetings everyone. Today I am going to introduce you to a new Wan 2.1 model which is
00:00:05 called FusionX. This model has become extremely popular recently. In addition,
00:00:11 it also has LoRAs, so you can use it with your base model. I have done a lot of experimentation
00:00:17 to find the very best parameters for this new model and prepared one-click-to-use
00:00:24 and one-click-to-download presets for you. With the model downloader application that we have,
00:00:30 you will be able to download this model, import presets into the SwarmUI and directly use this
00:00:37 model with the highest quality. This model only requires 8 steps to generate videos from text
00:00:45 or images and its quality is just mind-blowingly amazing. Additionally, I have prepared presets for
00:00:52 the FLUX Dev model as well because people have been asking me for this. There is an official FLUX Dev model
00:00:58 preset and there is a 2x latent upscale preset. With our model downloader, you will be able to download
00:01:07 the face restoration plus upscaler models, then by applying the preset that I have developed,
00:01:14 you will be able to generate amazing quality images. The 2x upscale will add huge realism,
00:01:21 quality, details to your generations as you are seeing right now. Just look at these newly
00:01:27 generated details in the image. It is just mind-blowingly amazing and it is super real.
00:01:33 So if you were having issues generating realistic images with FLUX and if you are
00:01:38 struggling to upscale them properly, this is the tutorial that you were looking for.
00:01:42 We are going to use SwarmUI and ComfyUI as a backend, and if you don't know how to install
00:01:48 them, I already have an amazing tutorial here, so you can watch this. The link will be also
00:01:54 in the description of the video. So, this is the main link that you are going to use. You see the
00:01:58 tutorial is also linked here. The link will be in the description of the video. First of all,
00:02:03 download the latest SwarmUI model downloader zip file. Move it into your SwarmUI installation
00:02:10 like this, then right click and extract all. I will extract like this, then move and overwrite
00:02:17 all of the files in the main folder like this, replace. Then double-click the Windows start download
00:02:23 models app .bat file, more info, run anyway. Pay attention to the version you are seeing here.
00:02:29 It should be at least version 46. Always use the latest version. Click download bundles and you can
00:02:36 download the FusionX FP16 bundle or FusionX FP8 bundle. Either of them is fine. When you click
00:02:44 the download bundle, it will download all of the necessary models. But if you don't want to
00:02:49 download new big models, what you can do is scroll down and you will see there are LoRA models. Click
00:02:55 there, click various LoRAs, and you will see that there is Wan 2.1 14 billion FusionX LoRA
00:03:01 for image-to-video and text-to-video. You can click each one of them and all of the downloads
00:03:06 will start with the maximum speed. You will see the queue here. Currently we have a queue of 3. You
00:03:12 can also watch the progress here. It will use your entire bandwidth to download. This downloader is
00:03:19 extremely optimized and it uses the latest Hugging Face libraries, and I am downloading from Hugging
00:03:25 Face so they are super fast. Moreover, if you have previously downloaded a file, it will just skip
00:03:30 that file. And if your download fails for any reason, it will try to resume the download.
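The skip-and-resume behavior described above can be pictured with a small sketch. This is not the downloader's actual code: `plan_downloads`, the `.part` suffix for unfinished files, and the size check are all assumptions made for illustration.

```python
from pathlib import Path


def plan_downloads(wanted, models_dir):
    """Split wanted files into ones to skip and ones to fetch.

    `wanted` is a list of (file_name, expected_size_in_bytes) pairs.
    Hypothetical helper: the real downloader may decide differently.
    """
    models_dir = Path(models_dir)
    to_skip, to_fetch = [], []
    for name, expected_size in wanted:
        target = models_dir / name
        partial = models_dir / (name + ".part")  # assumed naming for unfinished files
        if target.exists() and target.stat().st_size == expected_size:
            # Already fully downloaded: nothing to do.
            to_skip.append(name)
        else:
            # Resume from the bytes already on disk, or start from zero.
            resume_from = partial.stat().st_size if partial.exists() else 0
            to_fetch.append((name, resume_from))
    return to_skip, to_fetch
```

Each entry in the second list carries the byte offset a resumed download would continue from, which is the part that saves time after a failed transfer.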
00:03:35 Alright, so all of the models have been downloaded. We can see from the logs and from the
00:03:41 CMD window. So, as a next step, what we need to do is just return back to your installation and use
00:03:48 Windows update swarmui.bat file. This will update the SwarmUI to the latest version and start it.
00:03:55 You see, automatically updated and now starting, and it is started. So, how are we going to use
00:04:01 these new models? To use them, first of all, we should have the presets imported. To import
00:04:08 presets, you can delete your existing presets because I changed some of their names, but if you
00:04:14 want to keep all of them, click Import Presets, click Choose File, return back to the installation
00:04:19 folder and select amazing swarmui presets version 1. Check this overwrite existing presets and
00:04:25 import. And it will import all of these best configuration presets. Then let's generate an
00:04:32 example video. So first of all, I will click Quick Tools, Reset Params to Default so that I will
00:04:37 have the default parameters. Then, which model should we use? Select Wan 2.1 FusionX Image to Video,
00:04:45 direct apply, and it will apply all of the settings. Now you see that it is selecting init
00:04:50 image because the bug has been fixed and it has the init image creativity as zero. So click choose
00:04:56 file, choose the image that you want to animate like this, then select the model from here. Why?
00:05:03 Because there is one important thing. To find the model, let's type image or let's type FusionX, it
00:05:10 is easier. And select the FusionX Image to Video and click Res closest aspect ratio. You see the
00:05:17 resolution is selected like this. This is wrong because this model has 960 resolution. So click
00:05:24 this hamburger menu, Edit Metadata, set it as 960, 960. But you can keep it as 640 if you want to
00:05:32 generate at lower resolution. However, this is the native resolution, so change models and it will
00:05:37 get fixed like this. Okay, we are ready. What else can you change? You can make the steps count 10,
00:05:43 it works better with 10 but 8 is also sufficient. You can go to image to video settings and make
00:05:49 sure that you have changed the model from here. This is not saved in the preset for some reason
00:05:55 or this may not be. So select this FusionX image to video and you need to set your frame count.
00:06:02 16 frames is 1 second, so let's generate 49 frames, which is 3 seconds plus
00:06:09 one extra frame, as you are seeing. And I can also apply RIFE frame interpolation to increase FPS to 32. Okay,
00:06:16 that's it. I also need to type a prompt for animating this image. For this task, I have used the
00:06:23 O3 model. I just said write a single-line detailed prompt. Okay, this is the prompt it has written. Copy
00:06:29 paste it, but you can also write your own prompt. And let's generate. I have RTX 5090, so let's
00:06:35 monitor what's happening. nvitop. I have to be sure that it is using almost the entire GPU wattage,
00:06:43 otherwise that means that it is using shared VRAM and if it uses shared VRAM, it will be extremely
00:06:49 slow. So I need to be sure that it is properly doing block swapping and it is using almost the entire
00:06:56 GPU. So far it is looking good, 460 watts. Okay, 100, this is bad. Let's see what happens. Okay,
00:07:03 150 watts. So currently there is a bug somewhere because maybe I am using a lot of other stuff with
00:07:09 the VRAM. So how can I fix this? I can restart and try again, but let's just wait a little
00:07:15 bit more to be sure. We can also see from task manager and shared VRAM usage is 500 megabytes.
00:07:22 Okay, it is getting better, 300 watts, 320 watts. I am also recording a video, therefore it is also
00:07:29 using some VRAM. So what can I do? It is still slow. I can go to server backends and I can
00:07:35 change the reserve VRAM like 5 gigabytes. Okay, save. It will cancel the task. I need to wait for
00:07:42 backends to be reloaded. This normally shouldn't happen to you if you just restart your computer,
00:07:48 do not run anything else, just run the SwarmUI. This shouldn't happen to you. But if it happens like
00:07:53 it did for me because of some other applications, you need to do the same, just click generate again and we got an
00:07:59 error because we just restarted, it is fine. Let's see the watt usage. Okay, it is again definitely
00:08:05 using shared VRAM. This is a bug in ComfyUI. I couldn't find the reason yet. If I restart my PC,
00:08:13 it would get fixed, but I don't want to set up everything again. So I will just wait
00:08:18 for the video to be generated, then I will move to the next step of the tutorial,
00:08:23 but you understand the logic. If this happens, that means that you should restart your computer,
00:08:28 do not open another application, and you should try to get the watt usage as high as possible.
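The wattage check above is done by eye in nvitop. As a rough sketch, the same idea can be scripted with `nvidia-smi`, whose `power.draw` and `power.limit` query fields report watts. The 60% threshold and the helper names are assumptions for illustration, not a documented rule, and low power draw is only a hint of shared VRAM use, not proof.

```python
import subprocess

# Query current power draw and power limit in watts, one CSV line per GPU.
QUERY = [
    "nvidia-smi",
    "--query-gpu=power.draw,power.limit",
    "--format=csv,noheader,nounits",
]


def parse_power_line(line):
    """Turn a line such as '460.00, 575.00' into (draw_watts, limit_watts)."""
    draw, limit = (float(part) for part in line.split(","))
    return draw, limit


def looks_starved(draw_watts, limit_watts, min_fraction=0.6):
    """True when the GPU draws well below its limit (threshold is an assumption)."""
    return draw_watts < min_fraction * limit_watts


def check_first_gpu():
    """Run nvidia-smi and report (draw, limit, starved) for the first GPU."""
    out = subprocess.run(QUERY, capture_output=True, text=True, check=True).stdout
    draw, limit = parse_power_line(out.splitlines()[0])
    return draw, limit, looks_starved(draw, limit)
```

With a 575 W limit as an example value, the 150 W reading mentioned in the video would be flagged, while the 460 W reading would not.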
00:08:35 Okay guys, I have solved the problem of using shared VRAM. I have reinstalled my ComfyUI
00:08:41 backend and I discovered that with image-to-video models currently, Sage Attention is broken. I have
00:08:50 reported this error. It works with text-to-video and also text-to-image models. However,
00:08:56 with image-to-video, it is broken currently. This is probably because of the latest ComfyUI version,
00:09:04 but now it is properly generating the video. It is using the correct amount of VRAM. The speed
00:09:11 is very decent. Let me show you while I am recording a video. So the speed is around 20
00:09:17 second/it. It is for HD resolution. The resolution is 960 by 960. Okay,
00:09:25 I have forgotten it, so I need to reset. Then the frame count is 49. So with these settings,
00:09:31 this is the speed we are getting. Okay, this is a 960 by 960 video. Now I am generating another
00:09:39 one with the accurate aspect ratio after I set it. So we will see it in a moment.
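To illustrate what picking a resolution at the closest aspect ratio might involve, here is a minimal sketch. It assumes the goal is to keep roughly the pixel count of the 960 by 960 native resolution while matching the input image's shape, and that sides are snapped to multiples of 16. Both are assumptions; SwarmUI's own logic may differ.

```python
import math


def fit_resolution(image_w, image_h, native_side=960, multiple=16):
    """Pick a width and height near native_side**2 pixels with the image's aspect ratio.

    Illustrative only: the pixel budget rule and the snapping multiple are assumed.
    """
    target_area = native_side * native_side
    aspect = image_w / image_h
    height = math.sqrt(target_area / aspect)
    width = height * aspect

    def snap(value):
        # Round to the nearest multiple, never below one multiple.
        return max(multiple, round(value / multiple) * multiple)

    return snap(width), snap(height)
```

Under these assumptions a square image stays at 960 by 960, and a 1920 by 1080 image maps to 1280 by 720, which has the same pixel count.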
00:09:44 Let's also see the speed. Yes, the speed is around the same, 20 second/it right now on an
00:09:50 RTX 5090 while I am recording a video and this is the VRAM usage currently.
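The numbers quoted so far (49 frames at 16 frames per second, 8 steps at about 20 seconds per step, RIFE 2x) can be tied together with a little arithmetic. The function names below are made up for illustration, and the sampling estimate ignores model loading, decoding, and interpolation time.

```python
def clip_seconds(frame_count, base_fps=16):
    """Length of the clip, not counting the one extra frame."""
    return (frame_count - 1) / base_fps


def interpolated_fps(base_fps=16, multiplier=2):
    """Playback rate after frame interpolation by the given multiplier."""
    return base_fps * multiplier


def sampling_seconds(steps, seconds_per_step):
    """Time spent in the sampling loop only."""
    return steps * seconds_per_step


print(clip_seconds(49))         # 3.0
print(interpolated_fps())       # 32
print(sampling_seconds(8, 20))  # 160
```

160 seconds of sampling is a bit under three minutes, which is in line with the total generation time reported in the video once loading and post-processing are added.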
00:09:56 Okay, so the video has been generated. It only took 3 minutes and let's see the
00:10:02 quality. The quality is just amazing, mind-blowing. You see the realism,
00:10:07 you see the high definition. This is HD resolution and this is the initial
00:10:13 image we have used. It already has some mistakes, anatomically and from other parts,
00:10:19 but the quality of the generated video is just mind-blowingly amazing with only 8 steps and
00:10:26 it only took 3 minutes. I also applied RIFE 2x frame interpolation, so it is also looking very,
00:10:34 very high FPS. So how do we use text-to-video model? To use the text-to-video model,
00:10:40 click Quick Tools, Reset Params to Default, then just use Wan 2.1 FusionX Text to Video,
00:10:47 direct apply. Then I will just type it and I don't need to type anything else, but if you want to
00:10:53 set RIFE frame interpolation, you need to go to the text-to-video and enable it from here. You
00:11:00 can also change the number of frames from here and let's see the difference. By the way, we also need
00:11:06 to select the model, so interrupt and let's change it to text-to-video model from here and generate
00:11:13 and that's it. It will load text-to-video model right now, then it will start the generation.
00:11:18 Okay, text-to-video generation started. The speed is looking almost the same, decent speed,
00:11:24 like 20 second/it. By the way, I am using the FP16 version of the model. Therefore,
00:11:31 it requires a significant amount of block swapping. It is not fitting into
00:11:36 the VRAM. You can also use the FP8 version if you don't have a powerful GPU and a lot of RAM,
00:11:43 but the FP16 version may yield better quality compared to the FP8 version.
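A back-of-the-envelope estimate shows why the FP16 model needs block swapping. The sketch below counts only the weights of a 14-billion-parameter model; the text encoder, VAE, and activations are ignored, so real usage is higher. The function name is illustrative.

```python
def weight_gigabytes(param_count, bytes_per_param):
    """Approximate size of the model weights in GB (10**9 bytes)."""
    return param_count * bytes_per_param / 1e9


PARAMS = 14e9  # the 14 billion parameter Wan 2.1 model

print(weight_gigabytes(PARAMS, 2))  # FP16: 28.0
print(weight_gigabytes(PARAMS, 1))  # FP8: 14.0
```

At roughly 28 GB for the weights alone, the FP16 model leaves almost no room on a 32 GB RTX 5090 for anything else, while the FP8 version at roughly 14 GB leaves far more headroom.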
00:11:49 Okay, so text-to-video has also been completed. This is text-to-video generation. It is pretty
00:11:55 good, pretty decent, but image-to-video yields better results if you ask my opinion. You can
00:12:02 control more, but it is working. How about if we want to use the LoRA? It is exactly the same.
00:12:08 Instead of this model, you need to select base Wan 2.1 model. You can use GGUF or other variants as
00:12:16 in the other videos. For example, you can select Wan 2.1 text-to-video GGUF and from the LoRAs,
00:12:23 all you need to do is just select Wan 2.1 text-to-video FusionX LoRA and you can change
00:12:29 its strength, its scale, its power from here. Let's try as a default. Let's also try with the
00:12:36 same seed and see if there will be significant difference or not. Okay, let's generate.
00:12:43 Okay, we got the output with LoRA as well and the results are astonishingly similar. You see, this
00:12:52 is with LoRA, almost the same, and this is the base model. I think both of them are looking amazing.
00:12:59 Okay, now as a next step, I'm going to show you how to use FLUX presets and also how to upscale.
00:13:07 So reopen our Windows start download models app .bat file. You can download the FLUX bundle
00:13:13 from here. You see FLUX models bundle. Download it if you are lacking that and additionally,
00:13:20 what you should download is, you will see that there are other models, image upscaling models.
00:13:26 This is what I recommend. So click the download best upscaler models. These are image upscaling
00:13:33 models. Moreover, download face segment masking models. These two are the extra
00:13:39 models that you need to download in addition to FLUX. Since I have already downloaded them,
00:13:44 they are automatically skipped. So Quick Tools, Reset Params to Default,
00:13:48 then go to the models and let's select FLUX. So I have this model that I have trained myself and
00:13:55 go to the presets and you see FLUX Dev Official preset. So, direct apply and all the parameters
00:14:02 are set. So I'm going to use this prompt on the model that I have trained myself and I
00:14:09 will generate an image. This will generate the default 1024 by 1024. The FLUX generation on RTX 5090 is
00:14:19 already really fast. So without even using the Sage Attention, let's see the speed. Okay, it
00:14:25 is 2 it/s, almost 1.8 it/s and I am using 16-bit precision, not even 8-bit precision, and it is
00:14:36 getting generated very quickly. It is also going to do a face inpainting which doubles the time
00:14:43 that it takes. Okay, it is weird that it shows like this, but let's see. Interesting, we got an
00:14:50 error with the face inpainting for some reason. Now I am generating without face inpainting,
00:14:56 segmentation, face inpainting. Oh, by the way, we can modify this prompt and use the default
00:15:02 model of the SwarmUI. So let's try it. Segment, then it will ask us face, then we can just type
00:15:10 our prompt like photograph of ohwx man and let's use the same seed. Okay, this is generated. This
00:15:18 image doesn't require face inpainting, but let me show you. I'm just going to show you. Okay,
00:15:23 it is going to generate. The automatic segmentation is determined from here. You see,
00:15:29 segment refining, you can decide the order. You can change the other parameters like segment blur,
00:15:36 grow, oversize, threshold. Okay, after this, it should automatically segment. I think it
00:15:41 is using CLIP model for segmentation. Let's see from the server logs what is happening. Yeah,
00:15:47 it didn't show. Okay, it shows now. Yes. Now it is accurately processing the segment. Yeah,
00:15:54 it is working. Let's compare images from this to this. Yes, I can see that the face is looking
00:16:01 better right now. Yes. So how are we going to upscale? Go to presets and this time we
00:16:05 are going to apply FLUX Dev Official 2x Latent Upscaler, direct apply. What it is going to do is,
00:16:13 it is just going to enable refine upscale and set these settings. This is the model that I prefer.
00:16:19 And then let's hit generate. This will increase the time by four times because it is upscaling.
00:16:26 And also when you upscale, you usually don't need face inpainting. But let's see the result.
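The "four times" figure for the 2x upscale follows from simple pixel arithmetic, sketched below. Treating the extra time as proportional to pixel count is a rough assumption, not a measurement; the actual slowdown depends on the refine settings.

```python
def upscaled_size(width, height, factor=2):
    """Resolution after upscaling each side by the given factor."""
    return width * factor, height * factor


def pixel_ratio(factor=2):
    """How many times more pixels the upscaled image has."""
    return factor ** 2


print(upscaled_size(1024, 1024))  # (2048, 2048)
print(pixel_ratio(2))             # 4
```

Doubling each side of the default 1024 by 1024 image gives 2048 by 2048, which is four times as many pixels to process.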
00:16:32 And the upscale has been completed. Let's see the difference from this image to this image.
00:16:38 It has a huge, huge amount of quality difference as you are seeing right now. It is looking way,
00:16:44 way better as you can see. Before ending this tutorial, I have one more thing to show you.
00:16:50 I'm working on an ultimate video processing application. Let me show you how
00:16:55 far it has come and soon hopefully it will be published. Before publishing,
00:17:01 I'm going to raise the Bronze, Premium prices, so you can subscribe right now. We have made
00:17:06 so many changes. We have added so many new features. One of the new features we added is
00:17:13 image-based upscalers which are extremely fast. Hopefully see you in another tutorial video.