MultiTalk Levelled Up - Way Better Animation Compared to Before with New Workflows - Image to Video #93
FurkanGozukara
announced in
Tutorials
Full tutorial: https://www.youtube.com/watch?v=wgCtUeog41g
MultiTalk is greatly upgraded. After more than a full day of additional research with MultiTalk on 8x A6000 48 GB GPUs, I have significantly improved the MultiTalk workflows, and I am now sharing 4 different workflow categories with you. VRAM usage and speed are the same, but quality and animation are better. Moreover, I am introducing a new app: image and video comparison sliders. It is ultra fast and lightweight, runs as an HTML app, and requires no GPU.
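As a rough illustration of what a comparison slider does under the hood, here is a minimal pure-Python sketch: columns left of the slider come from one image, columns right of it from the other. The nested-list pixel representation and the function name are illustrative only, not the app's actual code.

```python
def slider_composite(img_a, img_b, split):
    """Composite two equal-size images at a vertical split line.

    img_a, img_b: nested lists of pixel values (rows), same dimensions.
    split: column index; columns < split come from img_a, the rest from img_b.
    """
    if len(img_a) != len(img_b) or any(
        len(ra) != len(rb) for ra, rb in zip(img_a, img_b)
    ):
        raise ValueError("images must have identical dimensions")
    # Stitch each row: left part from image A, right part from image B.
    return [row_a[:split] + row_b[split:] for row_a, row_b in zip(img_a, img_b)]

# Tiny 2x4 "images": 0 marks a pixel from the original, 1 from the upscale.
a = [[0, 0, 0, 0], [0, 0, 0, 0]]
b = [[1, 1, 1, 1], [1, 1, 1, 1]]
print(slider_composite(a, b, 2))  # left half from a, right half from b
```

Dragging the slider in the real app just re-renders this composite at a new `split` position, which is why it stays fast and needs no GPU.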
🔗 Main Tutorial That You Have To Watch⤵️
🔗 Follow the link below to download the zip file that contains the MultiTalk bundle downloader Gradio App - the one used in the tutorial⤵️
🔗 Follow the link below to download the zip file that contains the ComfyUI 1-click installer and the WORKFLOW shown in the tutorial, with full Flash Attention, Sage Attention, xFormers, Triton, DeepSpeed, and RTX 5000 series support⤵️
🔗 Follow the link below to download the zip file that contains the Image and Video Comparison Slider App⤵️
🔗 Python, Git, CUDA, C++, FFMPEG, MSVC installation tutorial - needed for ComfyUI⤵️
🔗 SECourses Official Discord 10500+ Members⤵️
🔗 Stable Diffusion, FLUX, Generative AI Tutorials and Resources GitHub⤵️
🔗 SECourses Official Reddit - Stay Subscribed To Learn All The News and More⤵️
Video Chapters
00:00:00 Introduction to the MultiTalk Tutorial
00:00:12 One-Click ComfyUI and MultiTalk Installation
00:00:29 Demonstration of MultiTalk's Singing Animation Capabilities
00:00:58 VRAM Requirements and Workflow Optimizations
00:01:12 Overview of the Tutorial Content
00:01:35 Improvements and New Workflow Options
00:01:52 How to Update and Use the New SwarmUI and MultiTalk Bundle
00:02:24 Exploring the New Workflow Presets in ComfyUI
00:03:08 Downloading and Using the Demo Videos with Embedded Workflows
00:03:36 Introduction to the New Video and Image Comparison Application
00:04:00 How to Use the Image Comparison Tool
00:04:33 How to Use the Video Comparison Tool
00:05:24 Advanced Upscaling and Comparison Demonstration
00:06:11 Final Remarks and Where to Find Installation Instructions
MultiTalk: Bringing Avatars to Life with Lip-Sync
Complementing WAN 2.1 is MultiTalk, a specialized model for generating talking avatars from images and text or audio inputs. Available on platforms like fal.ai, it offers variants such as single-text for solo avatars, multi-text for conversations, and audio-based syncing. By converting text to speech and ensuring natural lip movements, MultiTalk addresses a key challenge in AI video: realistic dialogue delivery.
When paired with WAN 2.1 in ComfyUI workflows, MultiTalk achieves Veo 3-level lip-sync, enabling local AI video projects with enhanced expressiveness. This integration has been hailed for solving lip-sync issues that plagued earlier models, allowing creators to produce dynamic talking-head videos from static portraits. For instance, workflows turn three images into videos in minutes, ideal for animations or virtual influencers.
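ComfyUI workflows like these are commonly shared embedded in the metadata of the generated media itself, which is why dragging a demo file into ComfyUI loads the workflow. For PNG outputs, the workflow JSON typically lives in a `tEXt` chunk under the `workflow` keyword; that keyword is an assumption about ComfyUI's current convention, and `extract_png_text` is a hypothetical helper, not part of any library. A stdlib-only sketch of pulling it out:

```python
import json
import struct

def extract_png_text(png_bytes, keyword):
    """Return the value of the tEXt chunk with the given keyword, or None.

    ComfyUI-style outputs are assumed to store the workflow JSON under
    the 'workflow' tEXt keyword. CRCs are read but not validated here.
    """
    assert png_bytes[:8] == b"\x89PNG\r\n\x1a\n", "not a PNG file"
    pos = 8
    while pos + 8 <= len(png_bytes):
        # Each chunk: 4-byte big-endian length, 4-byte type, data, 4-byte CRC.
        length, ctype = struct.unpack(">I4s", png_bytes[pos:pos + 8])
        data = png_bytes[pos + 8:pos + 8 + length]
        if ctype == b"tEXt":
            # tEXt payload is keyword, NUL separator, then the text value.
            key, _, value = data.partition(b"\x00")
            if key.decode("latin-1") == keyword:
                return value.decode("latin-1")
        if ctype == b"IEND":
            break
        pos += 12 + length
    return None

# Usage sketch (path is illustrative):
# workflow = json.loads(extract_png_text(open("out.png", "rb").read(), "workflow"))
```

For video demos the same idea applies, but the workflow is stored in container-level metadata rather than PNG chunks, so extraction would go through a media metadata tool instead.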
Veo 3: Google's Leap in Photo-to-Video with Audio
Google's Veo 3, integrated into the Gemini app, marks a milestone in I2V by generating eight-second clips with sound from uploaded photos. Launched in May 2025 and expanded to over 150 countries, it allows users to describe scenes, animate objects, or add audio effects, transforming stills into immersive videos.
Key features include groundbreaking audio generation with perfect lip-sync, improved motion controls, physics simulation, and style precision. Available to Google AI Pro subscribers, it has generated over 40 million videos, from fairy tale reimaginings to ASMR experiments. Safety measures like SynthID watermarks and red teaming ensure responsible use.
Comparative Insights and Future Impact
While WAN 2.1 offers open-source flexibility and multi-task efficiency, MultiTalk adds specialized avatar animation, and Veo 3 excels in seamless audio integration and user-friendliness. Together, they democratize video creation, rivaling professional tools. WAN 2.1's local runnability contrasts with Veo 3's cloud-based approach, but both achieve high fidelity.
As AI evolves, these models hint at a future where anyone can create cinematic content from a single image, blending creativity with technology. With ongoing updates, expect even longer videos and better realism ahead.
Some background music by NoCopyrightSounds : https://gist.github.com/FurkanGozukara/681667e5d7051b073f2e795794c46170
Video Transcription
00:00:00 Greetings everyone. Welcome to the Wan 2.1 based MultiTalk tutorial. In this tutorial,
00:00:06 I will show you how to literally one-click to install ComfyUI and MultiTalk and right away
00:00:12 start generating amazing animations from static images. MultiTalk generates not only speaking
00:00:19 videos but also impressive singing performances. Moreover, MultiTalk can even generate 30-second
00:00:26 videos like this upcoming one. [Music/Singing]
00:00:29 SECourses shows the way, generating, composing, and play. Images and
00:00:44 videos flow, animation 3D glow. Learning AI every day, SECourses shows the way.
00:00:58 With our optimizations, one-click installers and modified workflows,
00:01:02 generating 480p videos requires only 8 gigabytes of VRAM, making it possible to generate amazing
00:01:09 videos even on weaker GPUs. To prepare this tutorial, I have literally tested different
00:01:16 parameters more than one day and found the very best workflow for you. In this tutorial, I will
00:01:22 show local Windows, cloud Massed Compute, and RunPod. Moreover, the tutorial will have manually
00:01:29 written subtitles and also video chapters. So as you have seen, we have significantly
00:01:35 improved our workflows. We made it possible for maximum resemblance or maximum animation. If you
00:01:42 haven't watched the main tutorial, please watch it because after watching this tutorial, you will
00:01:48 understand how to use this. All you need to do is just get the latest SwarmUI model downloader, start
00:01:55 it as usual, and you will notice something. We have updated our ComfyUI MultiTalk bundle. It
00:02:01 now includes the FusionX GGUF Q8. Moreover, in the video generation models, we did split the Wan 2.1
00:02:10 models, official models, FusionX models like this, and LORAs. So it will just download the new model.
00:02:17 After that, download the latest ComfyUI version 38 zip file. It can be a bigger version too.
00:02:24 So what is different with this new zip file? When you enter inside the workflows, inside
00:02:29 Kijai MultiTalk, now you will see four presets, which are Super Loyal Official, which was in the
00:02:36 initial tutorial, then Loyal Medium Animated, as I have shown you the difference, or Lesser Loyal
00:02:44 Super Animated, or Less Loyal More Animated. So depending on your case, you can choose
00:02:49 the different workflows and generate videos. The examples I have shown you were all first
00:02:56 generations. None of them were cherry-picked. So therefore, as you generate with different prompts,
00:03:01 different seeds, you may get much better results. Moreover, I have shared a link in this TXT file
00:03:08 where you will be able to download the demo videos. Once you download the demo videos and
00:03:12 extract the zip file, you will get all these demo videos. These videos contain the workflows,
00:03:19 so you can drag and drop these into your ComfyUI and it will show you the workflow. For example,
00:03:24 let me demonstrate. Let's just drag and drop this into our ComfyUI
00:03:29 and it will load the workflow as you are seeing. Moreover, I have shared a new amazing application.
00:03:36 You see the link is here. Go to there. This is for comparing videos and images. Just download
00:03:42 it. This is super lightweight. Extract it into any folder, any drive where you want. Then install
00:03:48 update bat file. More info, run anyway. It will install it almost instantly. Then you can start
00:03:55 image comparison or video comparison. Let me show you image comparison first. More info,
00:04:00 run anyway. So this is the image comparison slider. This is so easy to use. Just select all
00:04:06 your images and then click full screen and you will be able to compare images. You can switch
00:04:13 them from here directly. You see, 512 resolution versus 2048 pixel resolution. It also has zoom
00:04:21 feature, so you can zoom the part that you want to see like this. Then you can just change the
00:04:28 slider like this and make a great comparison. So this is an amazing tool to compare images.
00:04:33 Let me show you also video comparison. More info, run anyway. So this is the same logic.
00:04:39 Let's upload some of the videos. For example, let's compare these ones.
00:04:43 So drag and drop them into here or you can click here and choose from folder, either way works. It
00:04:50 also shows previews. Then click full screen and here the video. So let's, for example,
00:04:55 select the base at the left and select the most animated at the right. Then click play.
00:05:02 With our optimizations, one-click installers and modified workflows,
00:05:06 generating 480p videos requires only 8 gigabytes of VRAM, making it possible
00:05:12 to generate amazing videos even on weaker GPUs. So as you are seeing, it is working amazing. You
00:05:18 can also zoom this comparison slider as well. So this is working that way. This is extremely
00:05:24 useful when you upscale or change minimal things. Let me demonstrate. So let's go back to our test.
00:05:30 I am doing a lot of upscale testing as well. So let's select two of the videos that I want to
00:05:36 show you. Then I will select them from here like this and here, then full screen. Okay. So you see,
00:05:44 this is upscaled with the upcoming star model. It is really, really good, but it is professional
00:05:51 level. Why? Because it is requiring a lot of VRAM, unfortunately. However, don't worry,
00:05:55 I will also add low VRAM alternatives as well to the application. So let's also play this and see.
00:06:01 Greetings everyone. Welcome to the Wan 2.1 based MultiTalk tutorial.
00:06:06 Greetings everyone. Welcome to the Wan 2.1 based MultiTalk tutorial.
00:06:11 So as you are seeing, we needed this video comparison application as well
00:06:16 and it is just working perfect as you are seeing right now. That's all. As I said,
00:06:21 please watch this main tutorial. Everything has been explained in this one. I'm not going to
00:06:26 repeat that. In this tutorial, I have shown how to install ComfyUI and the workflow on Windows,
00:06:31 then on Massed Compute, and then on RunPod. All three of them are
00:06:35 shown. All of them are up to date. Moreover, in the ComfyUI installation,
00:06:39 I have updated the Windows update file to be more robust. Also on RunPod and Massed Compute,
00:06:46 just follow the instructions and run the installer again when you want to update and it
00:06:51 will update the installation fully. So that's it. Hopefully see you in the future amazing tutorials.