
Post-training loss curves and training time estimates for OLMo-2 1B #1296

@BrownianNotion

Description


Hi team,
Would you be able to share the post-training loss curves and training-time estimates for each post-training stage of the OLMo-2 1B model on an 8xH100 cluster? I can't find these in the OLMo-2 paper.

I've reviewed #485 and the OLMo-3 paper (which mentions a 9-day estimate), but both reference either larger models or more GPUs, so I'm hoping to get figures specific to this setup.
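In the meantime, I've been sanity-checking with a rough back-of-envelope calculation based on the standard 6ND FLOPs approximation. To be clear, every number below (token count, MFU, stage size) is my own assumption for illustration, not an official OLMo-2 figure:

```python
# Back-of-envelope wall-clock estimate using the common 6*N*D FLOPs rule.
# All concrete numbers here are illustrative assumptions, NOT OLMo-2 figures.

def estimate_hours(n_params, n_tokens, n_gpus, peak_flops_per_gpu, mfu):
    """Estimate wall-clock hours for one training stage.

    n_params            -- model parameter count
    n_tokens            -- tokens processed in this stage
    n_gpus              -- number of GPUs in the cluster
    peak_flops_per_gpu  -- theoretical peak FLOP/s per GPU
    mfu                 -- assumed model FLOPs utilization (0..1)
    """
    total_flops = 6 * n_params * n_tokens          # forward + backward pass
    effective = n_gpus * peak_flops_per_gpu * mfu  # sustained cluster FLOP/s
    return total_flops / effective / 3600

# Hypothetical SFT-like stage: 1B params, 1B tokens, 8xH100
# (~989 TFLOP/s BF16 dense peak each), assuming 35% MFU.
hours = estimate_hours(1e9, 1e9, 8, 989e12, 0.35)
print(f"~{hours:.2f} hours")
```

This obviously ignores data loading, evaluation, and checkpointing overhead, so I'd still much prefer your measured numbers.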

Thanks so much for any guidance!
