Hi,
I have finished reading your excellent paper and am trying to run and reproduce your code. I would like to ask a few questions:
- I noticed that the saved weights are still stored in FP32 format. Why are they not saved in FP8?
- Apart from the optimizer states, where can I find the implementation of the FP8 layers and operators? Is it possible to observe the gradients, activations, weights, etc. in FP8 format within the model?
- I noticed that in the paper you mention training llama2-7B on the MAmmoth dataset for 3 epochs and then testing it on four benchmarks, including NumGLUE. May I ask which evaluation toolkit you used?
- How should I measure the memory footprint the model occupies during training?
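For context on the first two questions, here is a minimal sketch of how one might inspect which dtypes a saved checkpoint actually contains (assuming a standard PyTorch `state_dict` checkpoint; the path in the comment is hypothetical):

```python
import torch

def summarize_dtypes(state_dict):
    """Count how many tensors of each dtype a state_dict contains."""
    counts = {}
    for name, tensor in state_dict.items():
        key = str(tensor.dtype)
        counts[key] = counts.get(key, 0) + 1
    return counts

# Hypothetical path -- replace with the actual saved checkpoint file.
# state_dict = torch.load("checkpoint/model.pt", map_location="cpu")
# print(summarize_dtypes(state_dict))
```

If every entry reports `torch.float32`, that would confirm the weights were cast back (or kept as a master copy) before saving, rather than stored in an FP8 format.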
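On the memory question, a rough back-of-the-envelope estimate can be derived from the parameter count and the bytes per element. This is a sketch under assumed values (~7B parameters for llama2-7B, and Adam-style optimizer states kept as two extra FP32 copies), not the paper's actual accounting:

```python
def estimate_memory_gb(num_params, weight_bytes,
                       optim_bytes_per_state=4, num_optim_states=2):
    """Rough memory estimate (GB) for weights plus optimizer states.

    weight_bytes: bytes per weight element (4 for FP32, 1 for FP8).
    optim_bytes_per_state / num_optim_states: Adam-style first and
    second moments, assumed here to stay in FP32.
    """
    weights = num_params * weight_bytes
    optim = num_params * optim_bytes_per_state * num_optim_states
    return (weights + optim) / 1e9

# ~7e9 parameters, as in llama2-7B.
print(estimate_memory_gb(7e9, 4))  # FP32 weights + FP32 Adam states -> 84.0
print(estimate_memory_gb(7e9, 1))  # FP8 weights + FP32 Adam states -> 63.0
```

The real footprint also includes activations, gradients, and any FP32 master weights, so this is only a lower bound; for a measured figure, `torch.cuda.max_memory_allocated()` during a training step is one option.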
Thank you very much for your time!
Best Regards!