Commit cd61fed

rM-planet and Ronak Mahawar authored
Phi4 mini instruct QNN recipe (#517)
Co-authored-by: Ronak Mahawar <rmahawar@qti.qualcomm.com>
1 parent 7b0bade commit cd61fed

14 files changed

Lines changed: 4562 additions & 0 deletions

Lines changed: 78 additions & 0 deletions
@@ -0,0 +1,78 @@
# Phi-4-mini-instruct Model Optimization

This directory demonstrates the optimization of the [Phi-4-mini-instruct](https://huggingface.co/microsoft/Phi-4-mini-instruct) model using various AIMET quantization techniques.

## Overview

After quantization, the QAIRT GenAIBuilder API is utilized to apply additional model transformations, perform conversion, and compile the model for execution on the HTP backend.

Finally, a prepared QAIRT DLC is encapsulated in an ONNX protobuf and exported to a directory compatible with onnxruntime-genai.

## Requirements

**Validated host configuration:**
* Ubuntu 22.04
* Python 3.10.12
* qairt-dev 0.8.1
* QAIRT 2.45.40

**Validated target configuration:**
* HTP backend on SC8480XP

Other configurations may work but have not been validated.

## Preparation Instructions

1. Authenticate with Hugging Face

```bash
huggingface-cli login # Recommended: stores credentials securely, avoids shell history
# Alternative: export HF_TOKEN=<your_hugging_face_token>
```

2. Prepare Environment

```bash
pip install --no-deps -r requirements.txt
pip install --no-build-isolation git+https://github.com/microsoft/Olive.git@f7efd41ab24a2eb07be7edc6d84d0f6304b46598
pip install --no-deps qairt-dev==0.8.1 # Install the proper qairt-dev version, if not installed
```

3. Use qairt-vm to install a non-default version of QAIRT and set QAIRT_SDK_ROOT

```bash
# List available QAIRT SDK versions
qairt-vm fetch --list

# Download non-default version of QAIRT SDK
qairt-vm fetch -v <version>

# Set QAIRT_SDK_ROOT to download location of QAIRT SDK
# By default, /opt/qcom/aistack/qairt/<version>
# Note: No further QAIRT SDK installation steps are required when using qairt-dev
export QAIRT_SDK_ROOT=/path/to/qairt/sdk
```

4. Run Olive recipe

```bash
# For X Elite:
olive run --config htp_sc8380xp.json
```
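
Before launching the recipe, it can help to confirm that the environment from steps 2 and 3 is in place. The sketch below is not part of the recipe: `preflight` is a hypothetical helper that only checks that `QAIRT_SDK_ROOT` is set and points at an existing directory, and that the config file passed to `olive run` exists. It does not validate the SDK contents or version.

```python
import os
from pathlib import Path
from typing import Mapping


def preflight(config_path: str, env: Mapping[str, str] = os.environ) -> list[str]:
    """Return human-readable problems; an empty list means every check passed.

    Hypothetical helper for illustration; it is not provided by Olive or QAIRT.
    """
    problems = []

    # Step 3 exports QAIRT_SDK_ROOT; it must name an existing directory.
    sdk_root = env.get("QAIRT_SDK_ROOT", "")
    if not sdk_root:
        problems.append("QAIRT_SDK_ROOT is not set")
    elif not Path(sdk_root).is_dir():
        problems.append(f"QAIRT_SDK_ROOT is not a directory: {sdk_root}")

    # Step 4 passes this file to `olive run --config`.
    if not Path(config_path).is_file():
        problems.append(f"Olive config not found: {config_path}")

    return problems


# Usage: report anything missing before running step 4.
# for problem in preflight("htp_sc8380xp.json"):
#     print("preflight:", problem)
```

Returning a list rather than raising on the first failure lets all missing pieces be reported in one pass.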

## Execution Instructions

The Olive recipe above outputs a directory compatible with the following versions of onnxruntime-genai and onnxruntime-qnn.

```bash
# Quote the specifiers so the shell does not treat ">" as an output redirection
pip install "onnxruntime-genai>=0.13"
pip install "onnxruntime-qnn>=2.1.0"
```

Please see the following script in the onnxruntime-genai repository for [an example of how to run this model directory](https://github.com/microsoft/onnxruntime-genai/blob/main/examples/python/model-qa.py).
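
To confirm that the installed runtime packages meet the minimum versions listed above, a check along these lines can be run first. It is a sketch under stated assumptions: `numeric_prefix`, `meets_minimum`, and `check_runtime` are hypothetical helpers, the comparison looks only at the leading numeric components of each version string (so a pre-release such as `0.13.0rc1` is treated like `0.13.0`), and the distribution names are taken from the `pip install` commands in this section.

```python
import re
from importlib import metadata

# Minimum versions quoted from the install commands in this section.
MINIMUMS = {"onnxruntime-genai": "0.13", "onnxruntime-qnn": "2.1.0"}


def numeric_prefix(version: str) -> tuple[int, ...]:
    """Turn '0.13.1.dev0' into (0, 13, 1), stopping at the first non-numeric part."""
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if not match:
            break
        parts.append(int(match.group()))
        if match.group() != piece:  # e.g. '0rc1': keep the 0, ignore the suffix
            break
    return tuple(parts)


def meets_minimum(installed: str, minimum: str) -> bool:
    """Compare versions numerically, padding the shorter one with zeros."""
    a, b = numeric_prefix(installed), numeric_prefix(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def check_runtime(minimums: dict[str, str] = MINIMUMS) -> dict[str, str]:
    """Map each package to 'ok', 'too old (<version>)', or 'not installed'."""
    report = {}
    for name, minimum in minimums.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "not installed"
            continue
        report[name] = "ok" if meets_minimum(installed, minimum) else f"too old ({installed})"
    return report
```

Comparing integer tuples instead of raw strings matters here: as strings, `"0.9"` sorts after `"0.13"`, which would wrongly accept an older release.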

## Known Issues

### `AttributeError: module 'pydantic._internal._typing_extra' has no attribute 'add_module_globals'`

This error can occasionally occur on the first invocation of the recipe. If encountered, re-running the recipe is sufficient as a workaround.
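
Since re-running is the stated workaround, the retry can be automated. The following is a minimal sketch, not part of the recipe: `run_with_retry` is a hypothetical helper that runs a command and repeats it if the attempt exits with a non-zero status. As a simplification, it retries on any failure rather than only on this `AttributeError`.

```python
import subprocess
from typing import Callable, Sequence


def _run(cmd: Sequence[str]) -> int:
    """Default runner: execute the command and return its exit code."""
    return subprocess.run(cmd).returncode


def run_with_retry(
    cmd: Sequence[str],
    attempts: int = 2,
    runner: Callable[[Sequence[str]], int] = _run,
) -> int:
    """Run cmd up to `attempts` times; return the exit code of the last attempt."""
    code = 1  # reported if attempts is zero and nothing runs
    for attempt in range(1, attempts + 1):
        code = runner(cmd)
        if code == 0:
            break
        print(f"attempt {attempt} of {attempts} failed with exit code {code}")
    return code


# Usage (this would actually launch the recipe):
# exit_code = run_with_retry(["olive", "run", "--config", "htp_sc8380xp.json"])
```

The `runner` parameter exists so the retry logic can be exercised without invoking `olive`.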
Lines changed: 70 additions & 0 deletions
@@ -0,0 +1,70 @@
{
  "module_list": [
    {
      "module_name": "QuantizedRmsNorm",
      "exceptions": {
        "param_exceptions": { "asymmetric": true, "bitwidth": 16 },
        "input_exceptions": null,
        "output_exceptions": null
      }
    }
  ],
  "name_list": [
    {
      "module_name": "\\w*model_embed_tokens_Gather",
      "exceptions": {
        "param_exceptions": { "bitwidth": 16, "asymmetric": true },
        "input_exceptions": null,
        "output_exceptions": null
      }
    },
    {
      "module_name": "\\w*lm_head_(MatMul|conv_Conv|conv2d_Conv|Conv)",
      "exceptions": {
        "param_exceptions": { "bitwidth": 16 },
        "input_exceptions": null,
        "output_exceptions": null
      }
    },
    {
      "module_name": "\\w*norm_(Mul_1|Mul_1.module)",
      "exceptions": {
        "param_exceptions": null,
        "input_exceptions": [ { "input_index": 0, "bitwidth": 16, "asymmetric": true } ],
        "output_exceptions": null
      }
    },
    {
      "module_name": "\\w*norm_(Pow|Pow.module|ReduceMean|Add|Sqrt|Div|Mul)",
      "exceptions": {
        "param_exceptions": null,
        "input_exceptions": null,
        "output_exceptions": [ { "output_index": 0, "enabled": false } ]
      }
    },
    {
      "module_name": "\\w*self_attn_Concat_1",
      "exceptions": {
        "param_exceptions": null,
        "input_exceptions": null,
        "output_exceptions": [ { "output_index": 0, "bitwidth": 16, "asymmetric": false } ]
      }
    },
    {
      "module_name": "\\w*self_attn_Concat_4",
      "exceptions": {
        "param_exceptions": null,
        "input_exceptions": null,
        "output_exceptions": [ { "output_index": 0, "bitwidth": 16, "asymmetric": false } ]
      }
    },
    {
      "module_name": "\\w*v_proj_(MatMul|conv_Conv|conv2d_Conv|Conv)(\\.base_layer)?",
      "exceptions": {
        "param_exceptions": null,
        "input_exceptions": null,
        "output_exceptions": [ { "output_index": 0, "bitwidth": 16, "asymmetric": false } ]
      }
    }
  ]
}
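
To see which graph nodes a `name_list` entry would select, the patterns can be tried against candidate node names. The sketch below is illustrative only: the node names are invented, the labels in `PATTERNS` are hypothetical, and it assumes each `module_name` is applied as a Python regular expression that must match the entire node name (`re.fullmatch`). The tool that consumes this file may match differently, so its documentation is the authority.

```python
import re

# A subset of the name_list patterns above (JSON "\\w" decodes to the regex \w).
# The dictionary keys are hypothetical labels added for readability.
PATTERNS = {
    "embed_tokens": r"\w*model_embed_tokens_Gather",
    "lm_head": r"\w*lm_head_(MatMul|conv_Conv|conv2d_Conv|Conv)",
    "norm_mul_1": r"\w*norm_(Mul_1|Mul_1.module)",
    "norm_internal": r"\w*norm_(Pow|Pow.module|ReduceMean|Add|Sqrt|Div|Mul)",
    "v_proj": r"\w*v_proj_(MatMul|conv_Conv|conv2d_Conv|Conv)(\.base_layer)?",
}


def matching_rules(node_name: str) -> list[str]:
    """Return the labels of all patterns that match the whole node name."""
    return [
        label
        for label, pattern in PATTERNS.items()
        if re.fullmatch(pattern, node_name)
    ]


# Invented node names, for illustration only.
for name in [
    "model_layers_0_input_layernorm_Mul_1",
    "model_layers_0_input_layernorm_Mul",
    "model_layers_0_self_attn_v_proj_MatMul.base_layer",
    "model_layers_0_self_attn_q_proj_MatMul",
]:
    print(name, "->", matching_rules(name))
```

The matching mode matters: under prefix matching (`re.match`), the first name would also be selected by the `norm_internal` pattern, because `Mul` is a prefix of `Mul_1`.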
