Open
Description
I have pulled the code from the train branch. Is there a way to train or fine-tune the GPT-2 model with data parallelism across multiple GPUs? Thanks for your help.