I'm trying to quantize Llama2 7b using the instructions in the readme, but get this:
start trans into int8, this might take a while
Instantiating Int8LlamaAttention without passing `layer_idx` is not recommended and will lead to errors during the forward call, if caching is used. Please make sure to provide a `layer_idx` when creating this class.
Traceback (most recent call last):
  File "/home/anon/disk1/AutoSmoothQuant/autosmoothquant/examples/smoothquant_model.py", line 117, in <module>
    main()
  File "/home/anon/micromamba/envs/testing/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/anon/disk1/AutoSmoothQuant/autosmoothquant/examples/smoothquant_model.py", line 112, in main
    int8_model = quant_model_class.from_float(model, decoder_layer_scales, quant_config)
  File "/home/anon/disk1/AutoSmoothQuant/autosmoothquant/models/llama.py", line 245, in from_float
    int8_module.model = Int8LlamaModel.from_float(
  File "/home/anon/disk1/AutoSmoothQuant/autosmoothquant/models/llama.py", line 216, in from_float
    int8_module.layers[i] = Int8LlamaDecoderLayer.from_float(
  File "/home/anon/disk1/AutoSmoothQuant/autosmoothquant/models/llama.py", line 174, in from_float
    int8_module.input_layernorm = Int8LlamaRMSNorm.from_float(
  File "/home/anon/disk1/AutoSmoothQuant/autosmoothquant/models/llama.py", line 27, in from_float
    int8_module.weight = module.weight / output_scale
  File "/home/anon/micromamba/envs/testing/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1708, in __setattr__
    raise TypeError(f"cannot assign '{torch.typename(value)}' as parameter '{name}' "
TypeError: cannot assign 'torch.cuda.HalfTensor' as parameter 'weight' (torch.nn.Parameter or None expected)
The scales themselves are generated correctly.
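For what it's worth, the failure looks like it comes from PyTorch's `Module.__setattr__`: dividing a `Parameter` by a scale produces a plain tensor (`torch.cuda.HalfTensor` here), and assigning a plain tensor to a registered parameter slot like `weight` raises exactly this `TypeError`. A minimal sketch of the pattern and two possible workarounds, using a stand-in module rather than the real `Int8LlamaRMSNorm`:

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for the module handled in Int8LlamaRMSNorm.from_float;
# only the weight-assignment pattern from the traceback is reproduced here.
norm = nn.LayerNorm(8)
output_scale = 2.0

# Fails: Parameter / scalar yields a plain torch.Tensor, and
# Module.__setattr__ only accepts nn.Parameter (or None) for 'weight'.
#   norm.weight = norm.weight / output_scale   # TypeError

# Possible fix 1: rewrap the scaled tensor as an nn.Parameter.
norm.weight = nn.Parameter(norm.weight / output_scale, requires_grad=False)

# Possible fix 2: scale the existing Parameter's storage in place,
# which never goes through __setattr__. (Applied here only as a second
# demonstration; in real code you would use one fix or the other.)
with torch.no_grad():
    norm.weight.div_(output_scale)
```

Patching line 27 of `autosmoothquant/models/llama.py` along either of these lines may get past the `TypeError`, though whether the rest of the pipeline expects the weight in a particular form is something the maintainers would know better.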