GSPMD "2D Partitioning" Training Speed #1201
agemagician asked this question in Q&A
Hello,
I am testing training speed with 2D partitioning and getting approximately 9.5 seconds per step, whereas the paper reports approximately 6.5 seconds per step for a similar model size.
I am testing it with "scalable_t5" while using the following related parameters:
Paper "Table 2":
https://arxiv.org/pdf/2105.04663.pdf
@adarob Is it possible to share the configuration that achieved this speed on TPU v3?