Fixing multiple GPU Qwen Image Fine tuning training #674
base: main
Conversation
@kohya-ss Good news: I tested training on this (1x GPU) and it works perfectly, no issues.
It appears that this file has been unintentionally modified.
src/musubi_tuner/hv_train_network.py
Outdated
if args.ddp_gradient_as_bucket_view or args.ddp_static_graph
else None
DistributedDataParallelKwargs(
    find_unused_parameters=True,
According to the PyTorch documentation, specifying find_unused_parameters=True when it is not necessary will slow down the training:
https://docs.pytorch.org/docs/stable/notes/ddp.html#internal-design
Therefore, as with other DDP-related options, it would be preferable to be able to specify it as an argument (for example, --ddp_find_unused_parameters).
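A minimal sketch of what such an opt-in flag could look like; the argument name follows the comment above, but the wiring (parser, kwargs handler list) is illustrative and assumed, not the repository's actual code:

```python
import argparse

from accelerate import Accelerator, DistributedDataParallelKwargs

parser = argparse.ArgumentParser()
# Hypothetical flag mirroring the suggestion above; off by default so the
# slower unused-parameter search is only enabled when actually needed.
parser.add_argument(
    "--ddp_find_unused_parameters",
    action="store_true",
    help="pass find_unused_parameters=True to DistributedDataParallel",
)
args = parser.parse_args()

ddp_kwargs = (
    DistributedDataParallelKwargs(find_unused_parameters=True)
    if args.ddp_find_unused_parameters
    else None
)
# Drop the None entry so Accelerator only receives real kwargs handlers.
accelerator = Accelerator(kwargs_handlers=[k for k in [ddp_kwargs] if k is not None])
```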
src/musubi_tuner/hv_train_network.py
Outdated
# Ensure DDP is properly configured for models with unused parameters
if hasattr(transformer, 'module') and hasattr(transformer.module, 'find_unused_parameters'):
    transformer.module.find_unused_parameters = True
elif hasattr(transformer, 'find_unused_parameters'):
    transformer.find_unused_parameters = True
There seems to be no point in overriding find_unused_parameters here; it will already be True if configured correctly.
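For reference, a small sketch of where the flag actually takes effect (assuming a multi-GPU Accelerate run; the model here is a stand-in, not the repository's transformer): the setting is passed to Accelerator before prepare(), so by the time prepare() returns the DDP-wrapped model it is already baked in, and re-assigning the attribute afterwards adds nothing.

```python
import torch
from accelerate import Accelerator, DistributedDataParallelKwargs

# find_unused_parameters is configured here, before prepare()...
accelerator = Accelerator(
    kwargs_handlers=[DistributedDataParallelKwargs(find_unused_parameters=True)]
)
model = accelerator.prepare(torch.nn.Linear(8, 8))
# ...so on multi-GPU runs the DDP wrapper returned by prepare() already carries
# find_unused_parameters=True; setting the attribute again here is redundant.
```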
src/musubi_tuner/hv_train_network.py
Outdated
unwrapped_transformer = accelerator.unwrap_model(transformer)
logger.info(f"DiT dtype: {unwrapped_transformer.dtype}, device: {unwrapped_transformer.device}")
Defining a new local variable unwrapped_transformer may prevent garbage collection later; it is better to call it directly: accelerator.unwrap_model(transformer).dtype and accelerator.unwrap_model(transformer).device.
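A minimal sketch of the suggested pattern (the helper name is hypothetical; logger, accelerator, and transformer stand for the objects in the surrounding code):

```python
import logging

logger = logging.getLogger(__name__)

def log_dit_info(accelerator, transformer):
    # Call unwrap_model at the point of use instead of binding the unwrapped
    # model to a local variable that would keep the reference alive afterwards.
    logger.info(
        f"DiT dtype: {accelerator.unwrap_model(transformer).dtype}, "
        f"device: {accelerator.unwrap_model(transformer).device}"
    )
```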
…ddp_find_unused_parameters argument to hv_train.py and hv_train_network.py
- Remove unnecessary override of find_unused_parameters after accelerator.prepare()
- Fix unwrapped_transformer variable to prevent garbage collection issues
- Revert unintentional changes to qwen_image_train_network.py
- Make DDP configuration consistent across all training scripts
@kohya-ss I just tried to make the changes you requested. I am OK with any way of fixing it, thank you.
…ddp_find_unused_parameters argument to hv_train.py and hv_train_network.py
- Remove unnecessary override of find_unused_parameters after accelerator.prepare()
- Fix unwrapped_transformer variable to prevent garbage collection issues
- Keep intentional fix in qwen_image_train_network.py for multi-GPU training
- Make DDP configuration consistent across all training scripts
Thank you for the update. Ruff's lint check is reporting an error, so please format it using […]. Also, it seems that parts of […]
Sure, I will work on that, but have you seen this error? Error 2
The modification on […]
I am not 100% sure if this is the correct way, but after this I was able to train.
I haven't tested the result of the training yet.
This fixes the issue mentioned here: #672