Replies: 1 comment
@exnx Sorry, I don't understand. FP8 has its own independent groups to keep the reduce accurate. Could the problem be in your FP8 group and pipeline group settings?
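One way to sanity-check the group settings mentioned above is to enumerate the rank groups on paper and confirm they line up. The sketch below is pure Python and does not call Megatron-LM. The function names `build_groups` and `is_partition` are hypothetical, the rank layout (tensor-parallel index varying fastest, then data-parallel, then pipeline) is an assumption, and so is the idea that the FP8 group spans all ranks within one pipeline stage. Compare its output against the groups your own job actually creates.

```python
def build_groups(world_size, tp, pp):
    """Enumerate rank groups for an ASSUMED layout: tp fastest, then dp, then pp."""
    if world_size % (tp * pp) != 0:
        raise ValueError("world_size must be divisible by tp * pp")
    dp = world_size // (tp * pp)

    def rank(t, d, p):
        # Global rank for tensor index t, data index d, pipeline stage p.
        return t + tp * (d + dp * p)

    tensor = [[rank(t, d, p) for t in range(tp)]
              for p in range(pp) for d in range(dp)]
    pipeline = [[rank(t, d, p) for p in range(pp)]
                for d in range(dp) for t in range(tp)]
    # Assumption: the FP8 group covers every rank in the same pipeline stage,
    # so it never spans two stages.
    fp8 = [[rank(t, d, p) for d in range(dp) for t in range(tp)]
           for p in range(pp)]
    return {"tensor": tensor, "pipeline": pipeline, "fp8": fp8}


def is_partition(groups, world_size):
    """True if every rank appears in exactly one group of this kind."""
    return sorted(r for g in groups for r in g) == list(range(world_size))


if __name__ == "__main__":
    groups = build_groups(world_size=8, tp=2, pp=2)
    for kind, members in groups.items():
        print(kind, members, "partition:", is_partition(members, 8))
```

If a rank is missing from a group, or ranks disagree about who belongs to a group, a collective on that group can block until NCCL times out, which would look like the hang described in the question.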
Your question
Ask a clear and concise question about Megatron-LM.
Hello, can FP8 and pipeline parallelism be used together? When I try to use both, training hangs and then times out in NCCL. The training code starts up, but no logging updates appear.
I can use FP8 with model parallelism without problems, though.
Has anyone else noticed this?
Thanks!