How long should training take? #178
I know training time depends on hardware specs such as the number of GPUs and CPUs and the amount of memory, but roughly how long should it take to train a model? I tried training with MP-20 on two Nvidia A100s, each with 80 GB of memory, using the command shown below. I canceled the training after 46 hours and 408 epochs. At this rate, it seems like it would take 2-3 days to train this model. Is that a reasonable amount of time, or am I doing something wrong with the configuration settings? How long did it take to train the models provided in the repository, and what were the hardware specs used to generate the pre-trained models?

Edit: In the Supplementary Information for the paper I found the following:

So how long did it take to train with eight A100s?
Hi @wigging,

Training took around 5 hours (one epoch typically took a bit less than 20 seconds).
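For anyone comparing their own runs against these figures, here is a quick back-of-the-envelope check using only the numbers quoted in this thread (46 hours for 408 epochs on the asker's 2x A100 setup, versus the maintainer's ~20 seconds per epoch); the actual configured epoch count and batch size are not stated here:

```python
# Per-epoch rate reported by the asker: 408 epochs in 46 hours on 2x A100 (80 GB).
asker_sec_per_epoch = 46 * 3600 / 408      # ~406 s per epoch
# Per-epoch rate reported by the maintainer: a bit less than 20 s per epoch.
maint_sec_per_epoch = 20

# How much slower the asker's run is than the reported rate.
slowdown = asker_sec_per_epoch / maint_sec_per_epoch
print(f"asker: {asker_sec_per_epoch:.0f} s/epoch, ~{slowdown:.0f}x slower")

# At ~20 s/epoch, a 5-hour training run corresponds to roughly this many epochs.
implied_epochs = 5 * 3600 / maint_sec_per_epoch
print(f"~{implied_epochs:.0f} epochs in 5 hours")
```

The roughly 20x gap per epoch suggests the difference is not just epoch count but throughput, so it may be worth checking data-loading, batch size, and whether both GPUs are actually being used.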