Error when saving optimizer state #21

@zyuanzi

Hi,

I have read your excellent paper and am trying to run your code and reproduce the results. I have a few questions:

  1. I noticed that the saved weights are still stored in FP32 format. Why are they not saved in FP8?
  2. Apart from the optimizer states, where is the code that implements the layers and operators that use FP8? Is it possible to inspect gradients, activations, weights, etc., in FP8 format inside the model?
  3. In the paper, you mention training llama2-7B on the MAmmoTH dataset for 3 epochs and then testing its performance on four datasets, including NumGLUE. Which evaluation toolkit did you use?
  4. How should I measure the memory footprint?

Thank you very much for your time!

Best Regards!
