Hi,
I am trying to run the quantized model in real-quant mode (as opposed to fake quant). Is load_quantized_model from quantize.int_linear_real the correct function for loading the model? I am running into issues when executing it. Also, since the accuracy reported by eval.py is measured with fake quant (running in FP16), can we expect the same accuracy when running the real W8A8 quantized model?