Hi team,
Would you be able to share the post-training loss curves and training time estimates for each post-training stage of the OLMo-2 1B model on an 8xH100 cluster? I cannot seem to find these in the OLMo-2 paper.
I've reviewed #485 and the OLMo-3 paper (which mentions a 9-day estimate), but both reference either larger models or more GPUs, so I'm hoping to get figures specific to this setup.
Thanks so much for any guidance!