Hi,
I have finished reading your excellent paper and am trying to run and reproduce your code. I would like to ask a few questions:
- I noticed that the saved weights are still stored in FP32 format. Why are they not saved in FP8?
- Apart from the optimizer states, where can I find the implementation of the FP8 layers and operators? Is it possible to observe the gradients, activations, weights, etc. in FP8 format within the model?
- I noticed that in the paper you mention training llama2-7B on the MAmmoth dataset for 3 epochs and then testing it on four benchmarks, including NumGLUE. May I ask which evaluation toolkit you used?
- How should I measure the memory footprint the model occupies during training?
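For context on the first two questions, here is a minimal sketch of how one might inspect which dtypes a saved checkpoint actually contains (assuming a standard PyTorch `state_dict` checkpoint; the path in the comment is hypothetical):

```python
import torch

def summarize_dtypes(state_dict):
    """Count how many tensors of each dtype a state_dict contains."""
    counts = {}
    for name, tensor in state_dict.items():
        key = str(tensor.dtype)
        counts[key] = counts.get(key, 0) + 1
    return counts

# Hypothetical path -- replace with the actual saved checkpoint file.
# state_dict = torch.load("checkpoint/model.pt", map_location="cpu")
# print(summarize_dtypes(state_dict))
```

If every entry reports `torch.float32`, that would confirm the weights were cast back (or kept as a master copy) before saving, rather than stored in an FP8 format.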
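On the memory question, a rough back-of-the-envelope estimate can be derived from the parameter count and the bytes per element. This is a sketch under assumed values (~7B parameters for llama2-7B, and Adam-style optimizer states kept as two extra FP32 copies), not the paper's actual accounting:

```python
def estimate_memory_gb(num_params, weight_bytes,
                       optim_bytes_per_state=4, num_optim_states=2):
    """Rough memory estimate (GB) for weights plus optimizer states.

    weight_bytes: bytes per weight element (4 for FP32, 1 for FP8).
    optim_bytes_per_state / num_optim_states: Adam-style first and
    second moments, assumed here to stay in FP32.
    """
    weights = num_params * weight_bytes
    optim = num_params * optim_bytes_per_state * num_optim_states
    return (weights + optim) / 1e9

# ~7e9 parameters, as in llama2-7B.
print(estimate_memory_gb(7e9, 4))  # FP32 weights + FP32 Adam states -> 84.0
print(estimate_memory_gb(7e9, 1))  # FP8 weights + FP32 Adam states -> 63.0
```

The real footprint also includes activations, gradients, and any FP32 master weights, so this is only a lower bound; for a measured figure, `torch.cuda.max_memory_allocated()` during a training step is one option.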
Thank you very much for your time!
Best Regards!