
After performing quantization on llama3_2 using tune run quantize --config custom_quantization.yaml, how should I proceed with inference? #2935

@Begoogh

Description


I am a beginner. While using the quantize.py script to convert and quantize a model trained with Quantization-Aware Training (QAT), I ran into an issue: the saved quantized checkpoint contains only the weights, not the model structure. As a result, when I load it into the original llama3_2 model structure, the checkpoint parameters do not match the structure. After I manually added the missing quantization keys, such as 'layers.0.attn.q_proj.scales' and 'layers.0.attn.k_proj.scales', so that the parameters fully matched, the inference output became garbled. The model worked correctly before the quantization conversion.

What should I do in this case, or could this indicate that my QAT training failed? I followed this documentation to perform the quantization: https://meta-pytorch.org/torchtune/main/tutorials/qat_finetune.html
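For reference, my understanding from the tutorial is that the quantized checkpoint is meant to be loaded into a model that has first been converted by the same quantizer used at quantize time, so the module structure already contains the 'scales' buffers, rather than adding those keys by hand. Below is a minimal sketch of that loading flow; the model builder, groupsize, and checkpoint filename here are assumptions on my part (the groupsize should match whatever is in custom_quantization.yaml):

```python
import torch
from torchtune.models.llama3_2 import llama3_2_1b
from torchtune.training.quantization import Int8DynActInt4WeightQuantizer

# Build the original float model structure first.
# llama3_2_1b is assumed here; use the builder matching your checkpoint.
model = llama3_2_1b()

# Apply the same quantizer used by `tune run quantize` so the module
# structure gains the quantized layers (and their 'scales' buffers)
# before the state dict is loaded. groupsize=256 is an assumption
# based on the tutorial's example config.
quantizer = Int8DynActInt4WeightQuantizer(groupsize=256)
model = quantizer.quantize(model)

# Now checkpoint keys such as 'layers.0.attn.q_proj.scales' should
# line up with the quantized module structure. The filename below is
# a placeholder for whatever `tune run quantize` produced.
state_dict = torch.load("quantized_model.pt", map_location="cpu")
model.load_state_dict(state_dict, assign=True)
model.eval()
```

Is this the intended flow, or is there a recommended recipe (e.g. `tune run generate` with a quantizer section in the config) that I should use instead?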
