Replies: 2 comments
Tried different batch size
Bigger batch size clearly makes training faster. Not sure whether it will affect final quality.
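For a rough sense of why a bigger batch makes an epoch faster, here is a small sketch (the dataset size and repeat count below are hypothetical, not from my runs):

```python
# Hypothetical arithmetic: a larger batch size means fewer optimizer steps per epoch,
# so wall-clock time per epoch drops as long as each step still fits in VRAM.
import math

num_images = 200    # hypothetical dataset size
num_repeats = 1     # hypothetical repeat count
for batch_size in (1, 2, 4):
    steps = math.ceil(num_images * num_repeats / batch_size)
    print(f"batch_size={batch_size}: {steps} optimizer steps per epoch")
```

Each step processes more images, so there are proportionally fewer steps per epoch; the open question is only whether that changes the final quality.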
Although the training results have not been verified, it is likely that training quality will also improve, judging from the inference quality. You may have already seen it, but please also see #564.
Did some testing with Qwen Image (not Edit) LoRA args. I have an RTX 3090 Ti with 24GB VRAM.
Qwen training docs suggest using
`--blocks_to_swap 16 --fp8_base --fp8_scaled`

I measured VRAM usage and iteration time with the following argument combinations:

- `--blocks_to_swap 16 --fp8_base --fp8_scaled`
- `--blocks_to_swap 8 --fp8_base --fp8_scaled`
- `--blocks_to_swap 8 --fp8_base`
- `--fp8_base`
- `--fp8_base --fp8_scaled`

Note that ~750MB of VRAM was already being used by Windows.
So I am able to train on 24GB with no block swapping.
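Incidentally, here is a minimal sketch of how I watch VRAM from a separate process while training runs (assuming the nvidia-ml-py / pynvml package, which is not part of musubi-tuner):

```python
# Minimal VRAM sampler; run in a separate terminal alongside training.
# Assumes the nvidia-ml-py (pynvml) package: pip install nvidia-ml-py
import time
import pynvml

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)  # GPU 0

peak_mib = 0
try:
    while True:
        info = pynvml.nvmlDeviceGetMemoryInfo(handle)
        used_mib = info.used // (1024 * 1024)
        peak_mib = max(peak_mib, used_mib)
        print(f"used: {used_mib} MiB  peak: {peak_mib} MiB", end="\r")
        time.sleep(1)
except KeyboardInterrupt:
    print(f"\npeak VRAM observed: {peak_mib} MiB")
finally:
    pynvml.nvmlShutdown()
```

This reports total memory in use on the GPU, so it includes the ~750MB that Windows already holds.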
`--fp8_base` seems to be fastest if you consider sampling.

Should I always be using `--fp8_scaled` with `--fp8_base` because of some quality implications? It seems to be adding ~700MB instead of reducing VRAM usage.

Full command:
accelerate launch --num_cpu_threads_per_process 1 --mixed_precision bf16 src/musubi_tuner/qwen_image_train_network.py --dit "F:\musubi-tuner\TRAINING\models\qwen_image_bf16.safetensors" --vae "F:\musubi-tuner\TRAINING\models\diffusion_pytorch_model.safetensors" --text_encoder "F:\musubi-tuner\TRAINING\models\qwen_2.5_vl_7b.safetensors" --dataset_config "F:\musubi-tuner\TRAINING\dataset.toml" --output_dir F:\musubi-tuner\TRAINING\output --output_name my_qwen_lora --network_module networks.lora_qwen_image --mixed_precision bf16 --gradient_checkpointing --optimizer_type adamw8bit --network_dim 16 --max_train_epochs 16 --save_every_n_epochs 1 --max_data_loader_n_workers 2 --persistent_data_loader_workers --seed 42 --sample_prompts "F:\musubi-tuner\TRAINING\prompts.txt" --sample_every_n_epochs 1 --learning_rate 2e-4 --sdpa --network_weights "F:\musubi-tuner\TRAINING\output\820.safetensors" <insert specific args from table above here>