Description
Dear @ChenMnZ,

Thank you for the paper and for sharing the code. I'm very interested in the idea. Especially encouraging is the prospect of reaching great accuracy using static per-channel / per-tensor quantization!
I evaluated PrefixQuant with a few HF models and was able to reproduce the good results of your paper with `Llama-2-7B`, `Llama-3-8B`, and `Mistral-7B` (and a couple of other models). [I'm writing this to indicate that my settings are probably OK and are not the cause of the issues below.]
However, with smaller models (0.5B - 3B), the results I'm getting are catastrophic (multi-digit perplexity numbers on WikiText2; a sketch of my evaluation loop is included after the list below). The models I tested that resulted in PPL disasters are:

- `Qwen2-0.5B`
- `Llama-3.2-1B`
- `Qwen2-1.5B-Instruct`
- `Llama-3.2-3B-Instruct`
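
For reference, this is roughly how I measure WikiText2 perplexity (a minimal sketch using standard HF APIs; the non-overlapping 2048-token chunks and the `wikitext-2-raw-v1` split are my own choices and may not match your evaluation code exactly; `model` is the already-quantized model, assumed to be on `device`):

```python
import torch
from datasets import load_dataset

@torch.no_grad()
def wikitext2_ppl(model, tokenizer, seqlen=2048, device="cuda"):
    # Tokenize the whole WikiText-2 test split as one long stream.
    test = load_dataset("wikitext", "wikitext-2-raw-v1", split="test")
    input_ids = tokenizer("\n\n".join(test["text"]), return_tensors="pt").input_ids.to(device)

    # Score non-overlapping chunks of `seqlen` tokens and average the NLL.
    nsamples = input_ids.shape[1] // seqlen
    nlls = []
    for i in range(nsamples):
        batch = input_ids[:, i * seqlen : (i + 1) * seqlen]
        # With labels=batch, HF causal LMs return the mean token cross-entropy.
        loss = model(batch, labels=batch).loss
        nlls.append(loss.float() * seqlen)
    return torch.exp(torch.stack(nlls).sum() / (nsamples * seqlen)).item()
```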
A couple of remarks:

- `Llama-3.2-xx` models are not supported by the older `transformers` version required by your repo (4.40.1). I encountered minor compatibility issues when running your repo with a more recent `transformers` version, but solved them by explicitly configuring `use_cache=True` (see the first sketch after this list). Even after solving those, however, the models would not quantize properly.
- For the `Qwen2-0.5B` model, there was an issue with creating the online Hadamard matrix for the `down_proj` input: the `get_hadK()` function does not support the intermediate feature size (see the second sketch after this list). I worked around it by disabling the `down_online_had` configuration option, but that did not help reach reasonable quantization accuracy.
- As a sanity check, I tried to quantize one of the problematic models listed above with additional fine-tuning; this did not help either.
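
To make the first remark concrete, this is roughly the workaround I applied when running with a newer `transformers` version (a sketch of my own patch; `meta-llama/Llama-3.2-1B` is just one of the problematic models, and the exact place where `use_cache` has to be forced may differ in your pipeline):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-3.2-1B"  # one of the problematic small models
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto")
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Newer transformers releases changed the KV-cache handling, and some code paths in
# the repo expect past_key_values to be returned, so I force caching on explicitly
# before running the quantization pipeline.
model.config.use_cache = True
```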
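On the second remark, the following check reproduces the `get_hadK()` failure mode I hit, assuming the function follows the usual QuaRot-style logic where a dimension must factor as a power of two times one of a fixed table of Hadamard block sizes (the table below is my assumption, not copied from your code):

```python
# Assumed table of non-power-of-two Hadamard block sizes (QuaRot-style); may differ
# from the actual set supported by get_hadK() in this repo.
KNOWN_HAD_SIZES = (172, 156, 140, 108, 60, 52, 44, 40, 36, 28, 20, 12)

def is_pow2(n: int) -> bool:
    return n > 0 and (n & (n - 1)) == 0

def supports_online_hadamard(dim: int) -> bool:
    """True if dim is a power of two, or K * 2^m for some known Hadamard block size K."""
    return is_pow2(dim) or any(dim % k == 0 and is_pow2(dim // k) for k in KNOWN_HAD_SIZES)

print(supports_online_hadamard(14336))  # Llama-3-8B down_proj input (2^9 * 28)  -> True
print(supports_online_hadamard(4864))   # Qwen2-0.5B down_proj input (2^8 * 19) -> False
```

If that is indeed the constraint, the 4864-wide `down_proj` input of `Qwen2-0.5B` has no matching factorization, which would explain why only that model triggers the error.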
Do you have any suggestions on what could help such small language models quantize well?
Thanks in advance!