Phi-4-mini-instruct Mixed Precision Quantization

This recipe demonstrates how to use Olive to perform mixed precision (INT4/INT8) quantization, export the model to ONNX, and evaluate it using lm-evaluation-harness. For more details, see "Exploring Optimal Quantization Settings for Small Language Models with Olive".

Pre-requisites

Install Olive and other dependencies:

pip install -r requirements.txt

Run

To run the mixed precision quantization recipe, execute the following command:

olive run --config mixed.json

To run the mixed precision quantization with embedding quantization and weight tying, execute the following command:

olive run --config mixed-tied.json

Note: Evaluation requires a machine with a CUDA-enabled GPU. If you don't have a GPU, you can skip the evaluation step by removing the "evaluator": "evaluator" line from the config JSON file.
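If you prefer not to edit the file by hand, the removal can be scripted. The following is a minimal sketch, assuming the "evaluator": "evaluator" entry sits at the top level of the config; the helper names and the output filename `mixed-no-eval.json` are hypothetical and not part of the recipe.

```python
import json
from pathlib import Path


def strip_evaluator(config: dict) -> dict:
    """Return a shallow copy of the config without the "evaluator" reference."""
    cleaned = dict(config)
    # Assumption: the reference is a top-level key. Other keys, including
    # any evaluator definitions elsewhere in the file, are left untouched.
    cleaned.pop("evaluator", None)
    return cleaned


def write_config_without_evaluation(src: str, dst: str) -> None:
    """Read the config at src and write a copy without the evaluation step to dst."""
    config = json.loads(Path(src).read_text(encoding="utf-8"))
    cleaned = strip_evaluator(config)
    Path(dst).write_text(json.dumps(cleaned, indent=4) + "\n", encoding="utf-8")


# Example usage (hypothetical output name):
# write_config_without_evaluation("mixed.json", "mixed-no-eval.json")
```

The copy can then be passed to the same command, for example `olive run --config mixed-no-eval.json`, which leaves the original config intact for machines that do have a GPU.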

Results

model	arc_challenge	arc_easy	mmlu	hellaswag	mmlu_stem	openbookqa	model_size_gb
Original (fp16)	0.585	0.803	0.669	0.728	0.598	0.426	8.314
Mixed	0.593	0.801	0.664	0.721	0.592	0.424	3.844
Mixed Tied	0.578	0.806	0.649	0.721	0.594	0.426	3.285
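
To put the model_size_gb column in perspective, the size reduction relative to the fp16 baseline can be computed directly from the table. This is a small illustrative sketch; the sizes are copied from the rows above, and the function name is hypothetical.

```python
# Sizes in GB, copied from the model_size_gb column of the results table.
FP16_SIZE_GB = 8.314
QUANTIZED_SIZES_GB = {"Mixed": 3.844, "Mixed Tied": 3.285}


def size_reduction_percent(original_gb: float, quantized_gb: float) -> float:
    """Percentage by which the quantized model is smaller than the original."""
    return (1.0 - quantized_gb / original_gb) * 100.0


for name, size_gb in QUANTIZED_SIZES_GB.items():
    reduction = size_reduction_percent(FP16_SIZE_GB, size_gb)
    print(f"{name}: {reduction:.1f}% smaller than fp16")
```

By this calculation, the Mixed model is roughly 54% smaller than the fp16 original and the Mixed Tied model roughly 60% smaller.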
