
Possible PrefixQuant issues on Smaller LLMs? #23

@sasha-hailo

Dear @ChenMnZ,
Thank you for the paper and for sharing the code.
I'm very interested in the idea. Especially encouraging is the prospect of reaching great accuracy with static per-channel / per-tensor quantization!

I evaluated PrefixQuant on a few HF models and was able to reproduce the good results from your paper with Llama-2-7B, Llama-3-8B, and Mistral-7B (and a couple of other models).
[I mention this to indicate that my settings are probably OK and are not the cause of the issues below.]

However, with smaller models (0.5B–3B), the results I'm getting are catastrophic (multi-digit perplexity on WikiText2; a sketch of my evaluation setup follows the model list below).
The models I tested that resulted in PPL disasters are:

  • Qwen2-0.5B
  • Llama-3.2-1B
  • Qwen2-1.5B-Instruct
  • Llama-3.2-3B-Instruct
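
For reference, this is roughly how I measure perplexity: a minimal sketch using the standard non-overlapping 2048-token-window evaluation on WikiText2. The model name and window length here are illustrative, and this may not match your repo's eval script exactly:

```python
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen2-0.5B"  # one of the problematic models
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16
).cuda().eval()

# Tokenize the full WikiText2 test split as one long stream.
test = load_dataset("wikitext", "wikitext-2-raw-v1", split="test")
ids = tok("\n\n".join(test["text"]), return_tensors="pt").input_ids

seqlen, nlls = 2048, []
for i in range(ids.shape[1] // seqlen):
    batch = ids[:, i * seqlen:(i + 1) * seqlen].cuda()
    with torch.no_grad():
        # labels=batch makes HF compute the mean next-token cross-entropy.
        loss = model(batch, labels=batch).loss
    nlls.append(loss.float() * seqlen)
ppl = torch.exp(torch.stack(nlls).sum() / (len(nlls) * seqlen))
print(f"WikiText2 PPL: {ppl.item():.2f}")
```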

A couple of remarks:

  • Llama-3.2-xx models are not supported by the older transformers library version your repo requires (4.40.1).
    I encountered minor compatibility issues when running your repo with a more recent transformers version, but resolved them by explicitly configuring use_cache=True (see the sketch after this list). Even after that, however, the models would not quantize properly.
  • For the Qwen2-0.5B model, there was an issue creating the online Hadamard matrix for the down_proj input (get_hadK() does not support the model's intermediate feature size; see the size check sketched after this list). I worked around it by disabling the down_online_had configuration option, but that did not restore reasonable quantization accuracy.
  • As a sanity check, I also tried quantizing one of the problematic models listed above with additional fine-tuning; that did not help either.
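
Regarding the transformers compatibility point, this is the kind of workaround I mean: a minimal sketch (model name is illustrative) that forces use_cache=True on the HF config before loading, since newer transformers versions changed some cache-related defaults relative to 4.40.1:

```python
from transformers import AutoConfig, AutoModelForCausalLM

model_id = "meta-llama/Llama-3.2-1B"  # illustrative
config = AutoConfig.from_pretrained(model_id)
config.use_cache = True  # restore the behavior the repo's code paths expect
model = AutoModelForCausalLM.from_pretrained(model_id, config=config)
```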
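
And regarding the get_hadK() point: as far as I understand from the QuaRot-style Hadamard utilities this repo builds on, a dimension n is only supported if it factors as a power of two times one of a fixed set of special Hadamard sizes. The set below is my assumption based on the QuaRot implementation; PrefixQuant's may differ:

```python
# Special Hadamard sizes handled by QuaRot-style get_hadK() (assumed set);
# 1 covers the pure power-of-two case.
SUPPORTED_K = {1, 12, 20, 28, 36, 40, 52, 60, 108, 140, 156, 172}

def hadamard_supported(n: int) -> bool:
    """Check whether n = 2**k * K for some supported Hadamard size K."""
    while n % 2 == 0:
        n //= 2
    return n in SUPPORTED_K

print(hadamard_supported(4864))   # False: 4864 = 2**8 * 19, and 19 is unsupported
print(hadamard_supported(11008))  # True: 11008 = 2**6 * 172 (Llama-2-7B down_proj)
```

Qwen2-0.5B's intermediate size is 4864, which would explain why the online Hadamard for down_proj cannot be built.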

Do you have any insight into what could help such small language models quantize well?

Thanks in advance!
