
Qwen2.5-1.5B-Instruct Mixed Precision Quantization

This recipe demonstrates how to use Olive to perform mixed precision (INT4/INT8) quantization, export the model to ONNX, and evaluate it with lm-evaluation-harness. For more details, refer to "Exploring Optimal Quantization Settings for Small Language Models with Olive".
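To illustrate the trade-off behind mixing bit widths, here is a minimal sketch of plain symmetric per-tensor quantization in Python. It is not Olive's quantization algorithm. The weights are made-up sample values, and `quantize_symmetric` and `mean_abs_error` are hypothetical helpers written only for this illustration.

```python
def quantize_symmetric(weights, bits):
    """Quantize to signed integers of the given bit width, then dequantize."""
    qmax = 2 ** (bits - 1) - 1  # 7 for 4-bit, 127 for 8-bit
    scale = max(abs(w) for w in weights) / qmax
    if scale == 0:
        return [0.0 for _ in weights]
    dequantized = []
    for w in weights:
        # Round to the nearest integer code and clamp to the signed range.
        q = max(-qmax - 1, min(qmax, round(w / scale)))
        dequantized.append(q * scale)
    return dequantized


def mean_abs_error(original, approx):
    """Average absolute difference between original and dequantized weights."""
    return sum(abs(a - b) for a, b in zip(original, approx)) / len(original)


# Hypothetical sample weights, for illustration only.
weights = [0.12, -0.50, 0.33, 0.07, -0.21, 0.45, -0.02, 0.28]
err_int4 = mean_abs_error(weights, quantize_symmetric(weights, 4))
err_int8 = mean_abs_error(weights, quantize_symmetric(weights, 8))
print(f"4-bit error: {err_int4:.5f}, 8-bit error: {err_int8:.5f}")
```

On the same weights, 4-bit codes introduce a much larger rounding error than 8-bit codes while using half the storage, which is why a mixed scheme may keep more sensitive weights at the higher bit width.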

Prerequisites

Install Olive and other dependencies:

pip install -r requirements.txt

Run

To run the mixed precision quantization recipe, execute the following command:

olive run --config mixed.json

Note: Evaluation requires a machine with a CUDA-enabled GPU. If you don't have a GPU, you can skip the evaluation step by removing the "evaluator": "evaluator" line from mixed.json.
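If you prefer not to edit mixed.json by hand, the following minimal Python sketch writes a copy of the config with every "evaluator": "evaluator" pair removed, wherever it appears in the JSON tree. The helper names and the output filename mixed_no_eval.json are hypothetical choices for this illustration; the evaluator definitions themselves (keys whose value is an object rather than the string "evaluator") are left untouched.

```python
import json


def strip_evaluator(node):
    """Recursively drop every "evaluator": "evaluator" pair from a parsed config."""
    if isinstance(node, dict):
        return {
            key: strip_evaluator(value)
            for key, value in node.items()
            if not (key == "evaluator" and value == "evaluator")
        }
    if isinstance(node, list):
        return [strip_evaluator(item) for item in node]
    return node


def write_config_without_evaluation(src="mixed.json", dst="mixed_no_eval.json"):
    """Read the recipe config, strip the evaluator reference, and save a copy."""
    with open(src, encoding="utf-8") as f:
        config = json.load(f)
    with open(dst, "w", encoding="utf-8") as f:
        json.dump(strip_evaluator(config), f, indent=4)
    return dst
```

After calling write_config_without_evaluation(), run the recipe with olive run --config mixed_no_eval.json.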

Results

Model            arc_challenge  arc_easy  mmlu   hellaswag  mmlu_stem  openbookqa  Model size (GB)
Original (fp16)  0.465          0.760     0.601  0.683      0.539      0.404       3.318
Mixed            0.487          0.772     0.592  0.670      0.533      0.410       1.479
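The headline trade-off can be derived from the results with a few lines of arithmetic. The Python sketch below copies the values from the results table and computes per-task score deltas, an unweighted mean across the six tasks, and the size reduction. The unweighted mean is an illustrative summary chosen here, not a metric reported by the recipe, and it double-counts STEM subjects because mmlu_stem is a subset of mmlu.

```python
tasks = ["arc_challenge", "arc_easy", "mmlu", "hellaswag", "mmlu_stem", "openbookqa"]
original = {"scores": [0.465, 0.760, 0.601, 0.683, 0.539, 0.404], "size_gb": 3.318}
mixed = {"scores": [0.487, 0.772, 0.592, 0.670, 0.533, 0.410], "size_gb": 1.479}

# Per-task change in score (positive means the mixed model scored higher).
deltas = {
    task: round(after - before, 3)
    for task, before, after in zip(tasks, original["scores"], mixed["scores"])
}
for task, delta in deltas.items():
    print(f"{task:<14} {delta:+.3f}")

# Unweighted means across the six tasks and the on-disk size reduction.
mean_original = sum(original["scores"]) / len(tasks)
mean_mixed = sum(mixed["scores"]) / len(tasks)
compression = original["size_gb"] / mixed["size_gb"]
reduction_pct = 100 * (1 - mixed["size_gb"] / original["size_gb"])
print(f"mean score: {mean_original:.4f} -> {mean_mixed:.4f}")
print(f"compression: {compression:.2f}x ({reduction_pct:.1f}% smaller)")
```

By this measure the mixed model is about 2.24x smaller (roughly 55% less disk space), gains on arc_challenge, arc_easy, and openbookqa, and loses at most 0.013 on the remaining tasks.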