Wan 2.2 training and general discussion -- advice, ideas, questions #455
-
I just started with Wan 2.2 myself! As for your questions, the only one I really have an answer for is 3, and in my opinion it's a solid no. The final size of a LoRA is a function of its rank/dimension and the number of parameters in the base model. Wan 2.2 is much larger than SDXL, so a LoRA of a given rank is MUCH bigger and more capable. I personally almost always train Wan at rank 16/alpha 16, sometimes even less for a simple character LoRA. A smaller rank is not only easier to share, it plays nicer with other LoRA and can be less prone to learning parts of the training data you aren't targeting. With smaller models like SDXL or SD, bumping the rank was beneficial to increase the learning capacity, but it's just not really necessary here. This applies to all the "larger" diffusion transformers like Wan 2.1/2.2, Hunyuan, Flux, etc. If you were using the 5B Wan 2.2 or the 1.3B Wan 2.1, then it can be beneficial, but not for the biggies! Of course, feel free to experiment!

So anyway, I have 16GB of VRAM on my 4070 Ti Super. For my first foray last night, I used the same exact settings I'd been using for Wan 2.1 to train Wan 2.2 high. The training ran for about 8 hours; the optimized compile I'm working on really helps! Loss looks great, and initial results are quite promising, though unrefined without the low noise LoRA. I need to train the low noise model tonight to fully see how it turns out. A few of those options are unique to my Blissful Tuner (https://github.com/Sarania/blissful-tuner/); specifically, I used multiple resolutions on the same video dataset.

For the low noise model tonight, I was thinking of removing the lower-res 480x272x65f bucket and adding an 848x480x21f bucket to show a little more detail to it. The idea is to play to the model's way of working: the high noise model decides overall structure and layout, while the low noise model refines it into something aesthetically pleasing and adds fine details. I don't know if this is beneficial, but I'm just following my intuition as always. That's where I am currently! I'll definitely come back to this thread and share more when I have more results, cheers!
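To put a rough number on the size point above, here is a quick back-of-the-envelope sketch. The layer shapes are made-up placeholders, not Wan's or SDXL's actual dimensions; the takeaway is just that adapter size scales linearly with rank and with how large and numerous the wrapped layers are:

```python
# Back-of-the-envelope LoRA size estimate: bytes scale linearly with rank
# and with the shapes/count of the layers the adapter wraps.
def lora_param_count(layer_shapes, rank):
    # Each adapted weight W (out x in) gets two low-rank factors:
    # A with shape (rank x in) and B with shape (out x rank).
    return sum(rank * (d_in + d_out) for d_out, d_in in layer_shapes)

# Hypothetical layer lists, NOT the real Wan architectures.
big_model = [(5120, 5120)] * 160     # stand-in for a 14B-class DiT
small_model = [(1536, 1536)] * 60    # stand-in for a 1.3B-class model

for rank in (16, 32, 64):
    big_mb = lora_param_count(big_model, rank) * 2 / 1e6    # 2 bytes per bf16 weight
    small_mb = lora_param_count(small_model, rank) * 2 / 1e6
    print(f"rank {rank}: ~{big_mb:.0f} MB (14B-class) vs ~{small_mb:.0f} MB (1.3B-class)")
```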
-
So that run didn't really work out. I tried a new run last night where I did 3e-5 with a loraplus ratio of 4 for 1600 steps with a shift of 12. This lets me train high and low back to back in a single night; it took about 12 hours! It's looking pretty good too. I did 360x360x65 and 512x512x33 for the high noise model, and for the low noise model I added in 640x640x21 to show some more detail. The rest was the same as above. It's looking promising, and I'm especially pleased that I can train both high and low noise in a single night on 16GB!
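For anyone wondering what the loraplus ratio refers to: in LoRA+ the B (up) matrices get a higher learning rate than the A (down) matrices. A minimal sketch of the idea, with placeholder parameter names rather than whatever naming your trainer actually uses:

```python
import torch

def loraplus_param_groups(named_params, base_lr=3e-5, ratio=4.0):
    # LoRA+ idea: give the B/up matrices a learning rate that is ratio times
    # higher than the A/down matrices. The "lora_B"/"lora_up" substrings are
    # illustrative; check how your trainer names its LoRA parameters.
    a_params, b_params = [], []
    for name, p in named_params:
        (b_params if ("lora_B" in name or "lora_up" in name) else a_params).append(p)
    return [
        {"params": a_params, "lr": base_lr},
        {"params": b_params, "lr": base_lr * ratio},
    ]

# Usage sketch, assuming `network` holds the LoRA modules:
# optimizer = torch.optim.AdamW(loraplus_param_groups(network.named_parameters()))
```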
-
I don't know what a typical configuration example for the Schedule Free Optimizer looks like. I kind of understand that the other settings aren't necessary, but what should I do about --learning_rate?
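For what it's worth, schedule-free optimizers drop the LR scheduler, not the learning rate: you still need to pass a learning rate as usual. A minimal sketch using the schedulefree package directly (the lr value is a placeholder, and how a given trainer wires this up via its CLI flags may differ):

```python
import torch
import schedulefree  # pip install schedulefree

model = torch.nn.Linear(128, 128)        # stand-in for the trainable LoRA params
opt = schedulefree.AdamWScheduleFree(model.parameters(), lr=2e-4)  # lr is a placeholder

opt.train()                              # required before training steps
for _ in range(10):
    loss = model(torch.randn(4, 128)).pow(2).mean()
    loss.backward()
    opt.step()
    opt.zero_grad()
opt.eval()                               # required before evaluating or saving
```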
-
Everyone is training in fp16, but bf16 normally behaves better for training since it has much more dynamic range. Has anyone tested bf16? The original Wan models are fp32, so I think they can be converted to bf16.
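If anyone wants to try it, casting fp32 weights down to bf16 is straightforward. A minimal sketch with PyTorch and safetensors, using made-up file names (whether bf16 actually trains better than fp16 here is exactly the open question):

```python
import torch
from safetensors.torch import load_file, save_file

# Hypothetical file names; adjust to your checkpoints.
state = load_file("wan2.2_t2v_high_noise_fp32.safetensors")
state_bf16 = {
    k: v.to(torch.bfloat16) if v.is_floating_point() else v  # leave integer buffers alone
    for k, v in state.items()
}
save_file(state_bf16, "wan2.2_t2v_high_noise_bf16.safetensors")
```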
-
Advice and general discussion are appreciated, both on the config provided and more broadly! Hopefully we can share knowledge here to improve everyone's runs.
A few questions to start a discussion:
- `--timestep_sampling shift --discrete_flow_shift 12.0`? (There's a small illustration of what the shift does at the end of this post.)

I'll start by sharing my current config, just doing a test run on images only on t2v 14B high noise. This is with a 24GB VRAM card, running on Linux Mint. The images are batch size 4 with resolution [768, 768]. I managed to squeeze it all into VRAM by running
`export PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True` in the terminal (only works on Linux, I think).
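If you'd rather not export that in every shell, the same tweak can be set at the top of a launcher script. A minimal sketch; the only requirement is that it's set before torch initializes CUDA, and as noted it seems to be Linux-only:

```python
import os

# Must be set before torch initializes CUDA, so set it before importing torch.
os.environ.setdefault("PYTORCH_CUDA_ALLOC_CONF", "expandable_segments:True")

import torch  # imported after the env var on purpose

print(torch.cuda.is_available())
```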
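On the `--discrete_flow_shift` question above, here is a rough illustration of what the shift does to the sampled timesteps, assuming the usual flow-matching shift formula t' = s*t / (1 + (s - 1)*t); check your trainer's source for the exact sampling it implements:

```python
import torch

def shift_timesteps(t, s):
    # Usual flow-matching timestep shift; s > 1 pushes samples toward t = 1 (high noise).
    return (s * t) / (1 + (s - 1) * t)

t = torch.rand(100_000)  # uniform draws in [0, 1)
for s in (1.0, 3.0, 12.0):
    frac_high = (shift_timesteps(t, s) > 0.9).float().mean().item()
    print(f"shift={s:>4}: {frac_high:.1%} of samples land above t = 0.9")
```

With a shift of 12, most of the training signal concentrates at the high-noise end, which seems to line up with why a large shift is used when training the high noise expert.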