Hi team,
Would you be able to share the post-training loss curves and training time estimates for each post-training stage of the OLMo-2 1B model on an 8xH100 cluster? I cannot seem to find these in the OLMo-2 paper.
I've reviewed #485 and the OLMo-3 paper (which mentions a 9-day estimate), but both reference either larger models or more GPUs, so I'm hoping to get figures specific to this setup.
Thanks so much for any guidance!