
Benchmark LLM #1054

Open

Description

@Giuseppe5

Is your feature request related to a problem? Please describe.
We have grown to support quite a few PTQ techniques within our LLM entrypoint, with even more possible combinations of them.
Although some minor benchmarking has been performed, it would be good to run systematic experiments to understand which techniques combine well, which to avoid, etc.

Describe the solution you'd like
An exhaustive search is not feasible; a few suggestions:

  • Weight Only 4b/8b, W8A8, W4A8, W4A4
  • MXFp8/6/4 for Weights/Activations
  • Combination of HQO for zero point + MSE for scale (might require writing custom quantizers)
  • GPxQ (with/without HQO, also with/without MSE), weight only/weight + activations
  • GPxQ (as above) with/without activation quantization

A few suggestions on the model side (a rough sweep sketch combining both lists follows):

  • Llama 3.1/3.2
  • Mistral
  • Phi3
  • MoE (currently untested)
  • ...
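For reference, a minimal sketch of how such a sweep could be scripted, assuming each configuration is launched as a separate run of the LLM entrypoint CLI. The entrypoint path, flag names (--model, --weight-bit-width, --input-bit-width), and model identifiers below are illustrative assumptions and would need to be aligned with the actual entrypoint arguments:

```python
# Rough sweep sketch: enumerate (model, config) pairs and launch one run each.
# Entrypoint path, flag names and model identifiers are placeholders/assumptions.
import itertools
import subprocess

MODELS = [
    "meta-llama/Llama-3.1-8B",
    "meta-llama/Llama-3.2-3B",
    "mistralai/Mistral-7B-v0.3",
    "microsoft/Phi-3-mini-4k-instruct",
]

# (label, extra CLI arguments) pairs; argument names are assumptions.
CONFIGS = [
    ("w4-only", ["--weight-bit-width", "4"]),
    ("w8-only", ["--weight-bit-width", "8"]),
    ("w8a8", ["--weight-bit-width", "8", "--input-bit-width", "8"]),
    ("w4a8", ["--weight-bit-width", "4", "--input-bit-width", "8"]),
    ("w4a4", ["--weight-bit-width", "4", "--input-bit-width", "4"]),
]

for model, (label, extra_args) in itertools.product(MODELS, CONFIGS):
    cmd = ["python", "brevitas_examples/llm/main.py", "--model", model, *extra_args]
    print(f"[{label}] {model}: {' '.join(cmd)}")
    # subprocess.run(cmd, check=True)  # uncomment to actually launch each run
```

GPxQ, HQO/MSE and MXFP variants could be added as further CONFIGS entries once the corresponding flags are settled.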

Additional context
Reach out for further clarifications.
