Replies: 1 comment
I understand perfectly. I'm very cautious about dataset quality myself. I can't really help you much because I haven't found the perfect methodology / settings to produce good results either, especially for body movements. I use some tips and tricks found in a Civitai tutorial, and this is my (current) configuration. I'm not sure the Prodigy optimizer is very useful here, but it works well with the Cosine Annealing LR scheduler: the learning rate is set at 1 and decreases to 0 over the whole training run in a cosine fashion, while Prodigy manages the effective LR itself. Here is the …; the tutorial uses …; for my current training session, I use the …; the training starts at an average loss of …

The only advice I can give, even if it remains a feeling, is that mixing image and video training is a bad idea. It seems that a small image dataset is perfect for a character LoRA, but it loses certain physical attributes and character traits (mouth movement, overall bearing, ...).

I'd like to experiment with a method that involves training a multitude of LoRAs on limited datasets: portraits, full-body shots, single images in specific poses and attitudes, then training on short videos for specific body movements. Next, I'd like to try merging the LoRAs to see if it's worth the effort.

Obviously, large datasets are a bad idea. In my case, 27 images already seem like too many; 13 to 17 images seem reasonable, provided you don't overdo it with near-identical images. Front, profile, close-up, half-body, and full-body shots are enough to achieve good results. The same goes for video-based training: there's no need to produce long videos. Keeping a few seconds of a movement (1 to 3 seconds) is more than enough. I tried long, chunked videos, but it had no impact other than slowing down the training and wasting GPU power.

So I'm looking for a method to merge LoRAs trained on the Hunyuan model to experiment with this idea.
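For reference, here is a minimal sketch of the optimizer/scheduler pairing described above (Prodigy at LR 1 with cosine annealing down to 0) in plain PyTorch, assuming the `prodigyopt` package. The model, step count, and weight decay are illustrative assumptions, not the actual trainer configuration.

```python
# Minimal sketch: Prodigy + cosine annealing, as described above.
# Assumes the prodigyopt package (pip install prodigyopt); the model,
# step count, and weight decay are illustrative, not the real config.
import torch
from prodigyopt import Prodigy
from torch.optim.lr_scheduler import CosineAnnealingLR

model = torch.nn.Linear(128, 128)   # stand-in for the LoRA parameters
total_steps = 2400                  # illustrative step count

# Prodigy is given lr=1.0 and estimates the effective step size itself.
optimizer = Prodigy(model.parameters(), lr=1.0, weight_decay=0.01)

# Cosine annealing then scales that lr from 1 down to 0 over training.
scheduler = CosineAnnealingLR(optimizer, T_max=total_steps, eta_min=0.0)

for step in range(total_steps):
    optimizer.zero_grad()
    loss = model(torch.randn(4, 128)).pow(2).mean()   # dummy loss
    loss.backward()
    optimizer.step()
    scheduler.step()
```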
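On the merging experiment: a simple starting point is a naive weighted sum of matching keys across two LoRA safetensors files, sketched below. The file names and the 0.5/0.5 ratio are hypothetical, and note that summing the low-rank factors is not equivalent to summing the effective weight deltas, so whether this behaves well on Hunyuan LoRAs is something to test rather than assume.

```python
# Naive weighted merge of two LoRA checkpoints in safetensors format.
# File names and the 0.5/0.5 ratio are hypothetical; summing the
# lora_down/lora_up factors is only an approximation of merging the
# effective deltas, so results need to be checked empirically.
from safetensors.torch import load_file, save_file

lora_a = load_file("character_portraits_lora.safetensors")
lora_b = load_file("body_motion_lora.safetensors")
w_a, w_b = 0.5, 0.5

merged = {}
for key in lora_a.keys() | lora_b.keys():
    if key in lora_a and key in lora_b:
        merged[key] = w_a * lora_a[key] + w_b * lora_b[key]
    else:
        # Keep keys that exist in only one of the two LoRAs unchanged.
        merged[key] = lora_a.get(key, lora_b.get(key))

save_file(merged, "merged_lora.safetensors")
```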
Hi all, I've recently returned to trying to train Hunyuan after having great success with Wan and... I just can't get decent results. I've tried all kinds of different combinations of settings, and what got me closest are the same settings I use for Wan: Dim 16, Alpha 16, LR 2e-5, constant_with_warmup for 2400 steps. That's closer to what I'm after, but still nothing like the stellar results I've been getting with Wan; there are a lot of issues with fine hand movements and physical interactions. I've been considering revisiting cosine scheduling at this point. I've tried LRs from 2e-5 to 2e-4, Dim 8-32, Alpha = 1, half of dim, or equal to dim, with and without LoRA+, etc.

My datasets are all very high quality (I'm kind of obsessive about that): 20-30 videos, hand-captioned in a descriptive style, cropped to remove watermarks and provide aspect-ratio variation, and normalized to 24 fps. I've read lots of guides and have even written my own at this point, but I'm just not getting the results I've seen others achieve. Any help is appreciated!
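For anyone comparing the two schedules mentioned here, the sketch below traces constant_with_warmup against cosine-with-warmup over 2400 steps using the `transformers` scheduler helpers. The 240-step warmup and 2e-5 base LR are illustrative assumptions, not values taken from the post.

```python
# Sketch: shape of constant_with_warmup vs. cosine over 2400 steps.
# The 240-step warmup and 2e-5 base LR are illustrative assumptions.
import torch
from transformers import (
    get_constant_schedule_with_warmup,
    get_cosine_schedule_with_warmup,
)

def lr_curve(make_scheduler, steps=2400, base_lr=2e-5):
    """Record the learning rate at every step for a given scheduler."""
    opt = torch.optim.AdamW([torch.nn.Parameter(torch.zeros(1))], lr=base_lr)
    sched = make_scheduler(opt)
    lrs = []
    for _ in range(steps):
        lrs.append(opt.param_groups[0]["lr"])
        opt.step()
        sched.step()
    return lrs

constant = lr_curve(
    lambda o: get_constant_schedule_with_warmup(o, num_warmup_steps=240)
)
cosine = lr_curve(
    lambda o: get_cosine_schedule_with_warmup(
        o, num_warmup_steps=240, num_training_steps=2400
    )
)

print(constant[0], constant[-1])  # starts at 0 during warmup, ends flat at 2e-5
print(cosine[0], cosine[-1])      # starts at 0, decays back toward 0 by the end
```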