Description
Following up from a chat with @jainapurva:
For private internal model enablement purposes, we would like to request support for bias quantization when loading prequantized checkpoints. At the moment we are doing a manual source transformation after loading the prequantized checkpoint (https://github.com/pytorch/executorch/blob/main/examples/models/llama/source_transformation/pre_quantization.py#L40) into the deprecated Int8DynActInt4WeightLinear, which does not support bias quantization.
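For context, here is a minimal sketch of the kind of manual module swap we do today, analogous to what the linked pre_quantization.py performs. The helper name, the import path, and the exact `Int8DynActInt4WeightLinear` constructor arguments below are illustrative assumptions, not the actual ExecuTorch/torchao API:

```python
# Illustrative sketch only: a manual source transformation applied after loading a
# prequantized checkpoint. The import path and constructor arguments are assumptions
# for illustration and may not match the current torchao signature exactly.
import torch
import torch.nn as nn

from torchao.quantization.GPTQ import Int8DynActInt4WeightLinear  # deprecated module


def replace_linear_with_prequantized(module: nn.Module, groupsize: int = 32) -> None:
    """Recursively swap nn.Linear for Int8DynActInt4WeightLinear.

    The swapped modules are later populated with int4 weights and scales from the
    prequantized checkpoint via load_state_dict.
    """
    for name, child in module.named_children():
        if isinstance(child, nn.Linear):
            # Int8DynActInt4WeightLinear does not support a (quantized) bias, so any
            # bias tensors in the checkpoint have to be handled separately -- this is
            # the gap we are asking to close.
            quantized = Int8DynActInt4WeightLinear(
                child.in_features,
                child.out_features,
                bias=False,
                groupsize=groupsize,
                precision=torch.float32,
            )
            setattr(module, name, quantized)
        else:
            replace_linear_with_prequantized(child, groupsize)


# Usage sketch: swap modules first, then load the prequantized state dict so the
# int4 weights / scales land in the new modules; quantized bias tensors from the
# checkpoint currently have nowhere to go.
# model = MyTransformer(...)
# replace_linear_with_prequantized(model)
# model.load_state_dict(prequantized_checkpoint, strict=False)
```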