For example, here:
https://github.com/AniZpZ/AutoSmoothQuant/blob/main/autosmoothquant/models/llama.py#L89
int8_module.q_proj = W8A8BFP32OFP32Linear.from_float(
    module.q_proj, attn_input_scale, act_quant=int8_module.q_quant_type)
int8_module.o_proj = W8A8BFP32OFP32LinearWithQuantScale.from_float(
    module.o_proj, out_input_scale, act_quant=int8_module.o_quant_type)
Is the difference between the two classes whether quant_scale is involved or not?
quant_scale is for the activation x, and dequant_scale is for the weight, right?
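To make my question concrete, here is a minimal NumPy sketch of how I currently understand the two scales in a W8A8 linear layer. The function and variable names here are my own for illustration, not the actual AutoSmoothQuant internals:

```python
import numpy as np

def w8a8_linear(x_fp32, w_int8, quant_scale, dequant_scale, bias_fp32):
    """Illustrative W8A8 linear: int8 weights, int8 activations, fp32 output.

    Names are hypothetical; this is only how I understand the two scales.
    """
    # quant_scale: quantizes the fp32 activation x down to int8
    x_int8 = np.clip(np.round(x_fp32 / quant_scale), -128, 127).astype(np.int8)
    # int8 x int8 matmul, accumulated in int32 to avoid overflow
    acc_int32 = x_int8.astype(np.int32) @ w_int8.astype(np.int32).T
    # dequant_scale: maps the int32 accumulator back to fp32
    # (typically activation scale * weight scale folded together)
    return acc_int32.astype(np.float32) * dequant_scale + bias_fp32

# Usage: quantize a weight, run the layer, compare to the fp32 reference
rng = np.random.default_rng(0)
x = rng.normal(size=(2, 4)).astype(np.float32)
w = rng.normal(size=(3, 4)).astype(np.float32)
b = np.zeros(3, dtype=np.float32)

w_scale = np.abs(w).max() / 127.0
w_int8 = np.clip(np.round(w / w_scale), -128, 127).astype(np.int8)
q_scale = np.abs(x).max() / 127.0

out = w8a8_linear(x, w_int8, q_scale, q_scale * w_scale, b)
ref = x @ w.T  # fp32 reference; out should be close to this
```

Is that the right mental model for what WithQuantScale adds on the o_proj path?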