-
@jferments don't use --full_bf16! Use --fp8_base and --fp8_scaled instead. That said, I'm not confident in the multi-GPU implementation here in general: using diffusion-pipe I get way less VRAM usage and seemingly faster speeds. I'm not technical enough to know why, but this project uses Accelerate and diffusion-pipe uses DeepSpeed.
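For anyone landing here later, a minimal sketch of where those flags would sit in a launch command. The script name, process count, and dataset flag/path are assumptions of mine for illustration, not taken from this thread:

```sh
# Hypothetical invocation -- only --fp8_base / --fp8_scaled are the flags
# recommended above; the script name and dataset arguments are placeholders.
accelerate launch --num_processes 2 qwen_image_train.py \
  --dataset_config dataset.toml \
  --fp8_base \
  --fp8_scaled \
  --blocks_to_swap 24
```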
-
I am trying to run a full fine-tune on my 2x RTX 4090 machine. It has 512GB of system RAM, so I'm fine using as much memory as needed there to offload layers. But I keep getting CUDA OOM errors when trying to use Accelerate to run the Qwen-Image training script from PR #492.
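(Back-of-envelope, assuming Qwen-Image's roughly 20B parameters: 20B × 2 bytes ≈ 40 GB for the bf16 weights alone, before gradients and optimizer state, so I can see why a single 24GB card can't hold the model unsharded. I expected sharding plus offloading to cover the rest, which is why the OOMs surprise me.)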
I tried running it both with and without FSDP enabled, and with and without "--blocks_to_swap 24" (I also tried 36, 48, and 59 blocks); I get OOM errors either way. I tried both FSDP v1 and v2, and neither worked. This is my first time using musubi-tuner, so I very well might be doing something stupid in my command.
Here is the command that I am trying to run for my test dataset:
Here are my "accelerate config" choices:
(Side note: one question I had here was whether I should use "TRANSFORMER_BASED_WRAP" instead of "SIZE_BASED_WRAP", but I wasn't sure how to answer the prompt: "Specify the comma-separated list of transformer layer class names (case-sensitive) to wrap, e.g., BertLayer, GPTJBlock, T5Block, BertLayer,BertEmbeddings,BertSelfOutput ...")
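My understanding is that TRANSFORMER_BASED_WRAP makes FSDP wrap each instance of the named block class as its own shard unit, so the answer to that prompt would be the class name of the model's repeated transformer block. A sketch of how the relevant section of the accelerate config might look; the block class name is a pure guess on my part:

```yaml
# Sketch of the fsdp_config section of an accelerate config file -- not my
# actual config. The class name below is an assumption; verify it against
# the Qwen-Image model definition in musubi-tuner before using it.
distributed_type: FSDP
num_processes: 2
fsdp_config:
  fsdp_auto_wrap_policy: TRANSFORMER_BASED_WRAP
  fsdp_transformer_layer_cls_to_wrap: QwenImageTransformerBlock  # guessed name
  fsdp_sharding_strategy: FULL_SHARD
  fsdp_offload_params: true
```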
And here is the error I'm getting when I run the command above:
Any ideas how to prevent OOM errors on my machine?