
Benchmark LLM #1054

Open

Description

@Giuseppe5

Is your feature request related to a problem? Please describe.
We have grown to support quite a few PTQ techniques within our LLM entrypoint, with even more possible combinations of them.
Although some minor benchmarking has been performed, it would be good to run systematic experiments to understand which techniques combine well, which to avoid, etc.

Describe the solution you'd like
An exhaustive search is not feasible; a few suggestions:

  • Weight Only 4b/8b, W8A8, W4A8, W4A4
  • MXFp8/6/4 for Weights/Activations
  • Combination of HQO for zero point + MSE for scale (might require writing custom quantizers)
  • GPxQ (with/without HQO, also with/without MSE), weight only/weight + activations
  • GPxQ (as above) with/without activation quantization

A few suggestions on the model side (a rough sweep sketch combining both lists follows):

  • Llama 3.1/3.2
  • Mistral
  • Phi3
  • MoE (currently untested)
  • ...
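For reference, a minimal sketch of how such a sweep could be scripted, assuming each configuration is launched as a separate run of the LLM entrypoint CLI. The entrypoint path, flag names (--model, --weight-bit-width, --input-bit-width), and model identifiers below are illustrative assumptions and would need to be aligned with the actual entrypoint arguments:

```python
# Rough sweep sketch: enumerate (model, config) pairs and launch one run each.
# Entrypoint path, flag names and model identifiers are placeholders/assumptions.
import itertools
import subprocess

MODELS = [
    "meta-llama/Llama-3.1-8B",
    "meta-llama/Llama-3.2-3B",
    "mistralai/Mistral-7B-v0.3",
    "microsoft/Phi-3-mini-4k-instruct",
]

# (label, extra CLI arguments) pairs; argument names are assumptions.
CONFIGS = [
    ("w4-only", ["--weight-bit-width", "4"]),
    ("w8-only", ["--weight-bit-width", "8"]),
    ("w8a8", ["--weight-bit-width", "8", "--input-bit-width", "8"]),
    ("w4a8", ["--weight-bit-width", "4", "--input-bit-width", "8"]),
    ("w4a4", ["--weight-bit-width", "4", "--input-bit-width", "4"]),
]

for model, (label, extra_args) in itertools.product(MODELS, CONFIGS):
    cmd = ["python", "brevitas_examples/llm/main.py", "--model", model, *extra_args]
    print(f"[{label}] {model}: {' '.join(cmd)}")
    # subprocess.run(cmd, check=True)  # uncomment to actually launch each run
```

GPxQ, HQO/MSE and MXFP variants could be added as further CONFIGS entries once the corresponding flags are settled.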

Additional context
Reach out for further clarifications.
