Hi there!
Thank you so much for all your work; it is incredibly helpful for the community.
I have a question: for the pre-training phase of OLMo-3, is there any way to identify which documents were used to train the model up to a specific checkpoint?
More specifically, I’m interested in the Wikipedia articles used during training. If I download a specific checkpoint (for example, checkpoint 3000), is there a way to know exactly which documents from the Wikipedia subset were seen by the model up to that point?
Thanks in advance for your help!
Best regards,
Mario