Description
Hi, when I run the inference script, I get the following error. How can I solve it?
k-cache quantization: set model.layers.26.self_attn.apply_rotary_pos_emb_qk_rotation_wrapper as 4-bit 128 groupsize static symmetric quantization
k-cache quantization: set model.layers.27.self_attn.apply_rotary_pos_emb_qk_rotation_wrapper as 4-bit 128 groupsize static symmetric quantization
Loading pre-computed quantized weights...
```
Traceback (most recent call last):
  File "/code/PrefixQuant/eval.py", line 113, in <module>
    main()
  File "/code/PrefixQuant/eval.py", line 105, in main
    load_checkpoint_in_model(model, checkpoint=args.quant_model_path, device_map=device_map, dtype=torch.float16)
  File "/root/miniconda3/envs/zxd_prefix/lib/python3.9/site-packages/accelerate/utils/modeling.py", line 1700, in load_checkpoint_in_model
    set_module_tensor_to_device(
  File "/root/miniconda3/envs/zxd_prefix/lib/python3.9/site-packages/accelerate/utils/modeling.py", line 314, in set_module_tensor_to_device
    new_module = getattr(module, split)
  File "/root/miniconda3/envs/zxd_prefix/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1931, in __getattr__
    raise AttributeError(
AttributeError: 'Qwen2RMSNorm' object has no attribute 'output_quantizer'
```
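
For context, the traceback suggests the saved checkpoint contains quantizer state (e.g. an `output_quantizer` submodule hanging off the norm layers), while the freshly built model has plain `Qwen2RMSNorm` modules, so accelerate's `getattr` walk over the checkpoint keys fails partway through. Here is a minimal sketch of that mechanism; `Qwen2RMSNormLike` and the key name are hypothetical stand-ins, not PrefixQuant code:

```python
import torch
import torch.nn as nn

class Qwen2RMSNormLike(nn.Module):
    """Stand-in for Qwen2RMSNorm: a plain norm module with no quantizer attached."""
    def __init__(self, hidden_size=8):
        super().__init__()
        self.weight = nn.Parameter(torch.ones(hidden_size))

norm = Qwen2RMSNormLike()

# A checkpoint key like "...input_layernorm.output_quantizer.scale" is walked
# one dot-separated segment at a time, exactly as in set_module_tensor_to_device.
key = "output_quantizer.scale"
try:
    module = norm
    for split in key.split(".")[:-1]:
        module = getattr(module, split)  # fails on 'output_quantizer'
except AttributeError as e:
    print(e)  # ... object has no attribute 'output_quantizer'
```

If that is the cause, the model presumably needs to be wrapped with the same quantization modules (the step that prints the "k-cache quantization: set ..." lines above) before `load_checkpoint_in_model` is called, so that the module tree matches the checkpoint keys.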