Description
Following up from a chat with @jainapurva:
For private internal model enablement purposes, we would like to request support for bias quantization when loading prequantized checkpoints. At the moment we are doing a manual source transformation after loading the prequantized checkpoint (https://github.com/pytorch/executorch/blob/main/examples/models/llama/source_transformation/pre_quantization.py#L40) into the deprecated Int8DynActInt4WeightLinear, which does not support bias quantization.
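For context, here is a minimal sketch of the kind of manual module swap we do today, analogous to what the linked pre_quantization.py performs. The helper name, the import path, and the exact `Int8DynActInt4WeightLinear` constructor arguments below are illustrative assumptions, not the actual ExecuTorch/torchao API:

```python
# Illustrative sketch only: a manual source transformation applied after loading a
# prequantized checkpoint. The import path and constructor arguments are assumptions
# for illustration and may not match the current torchao signature exactly.
import torch
import torch.nn as nn

from torchao.quantization.GPTQ import Int8DynActInt4WeightLinear  # deprecated module


def replace_linear_with_prequantized(module: nn.Module, groupsize: int = 32) -> None:
    """Recursively swap nn.Linear for Int8DynActInt4WeightLinear.

    The swapped modules are later populated with int4 weights and scales from the
    prequantized checkpoint via load_state_dict.
    """
    for name, child in module.named_children():
        if isinstance(child, nn.Linear):
            # Int8DynActInt4WeightLinear does not support a (quantized) bias, so any
            # bias tensors in the checkpoint have to be handled separately -- this is
            # the gap we are asking to close.
            quantized = Int8DynActInt4WeightLinear(
                child.in_features,
                child.out_features,
                bias=False,
                groupsize=groupsize,
                precision=torch.float32,
            )
            setattr(module, name, quantized)
        else:
            replace_linear_with_prequantized(child, groupsize)


# Usage sketch: swap modules first, then load the prequantized state dict so the
# int4 weights / scales land in the new modules; quantized bias tensors from the
# checkpoint currently have nowhere to go.
# model = MyTransformer(...)
# replace_linear_with_prequantized(model)
# model.load_state_dict(prequantized_checkpoint, strict=False)
```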