Commit e74cd86
[QNN-EP] Add MatMulNBits translation for GPU (microsoft#26340)
### Description
Add support for translation of MatMulNBits contrib op to
QNN with FullyConnected operation with INT4 BlockQuantized weights
Implementation details:
- Translate MatMulNBits to FullyConnected in OpBuilder
- Support QNN_QUANTIZATION_ENCODING_BLOCK for INT4 weights
- Pass INT4 weights and quant params as BlockQuantization encoding
params in QNN
Testing:
- Added new unit tests for MNB -> QNN-GPU
- Validated all OnnxRuntime tests
- Validated the following LLMs through Olive and ORT-GenAI execution
flow
- LlaMA3.2 1B
- Qwen2.5
- DeepSeek-R1-Qwen 1.5b
- Phi3.5-mini-instruct
### Motivation and Context
LLMs with INT4 quantization pass in Olive will generate a model with
MatMulMBits contrib ops.
To run these ops via QNN-EP, MatMulNBits is translated to QNN
FullyConnected op with INT4 weights.
---------
Co-authored-by: tirupath-qti <tirupath@qti.qualcomm.com>1 parent 4aad8c7 commit e74cd86
File tree
11 files changed
+568
-16
lines changed- onnxruntime
- core/providers/qnn/builder
- opbuilder
- test/contrib_ops
11 files changed
+568
-16
lines changedLines changed: 4 additions & 0 deletions
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
227 | 227 | | |
228 | 228 | | |
229 | 229 | | |
| 230 | + | |
| 231 | + | |
| 232 | + | |
| 233 | + | |
230 | 234 | | |
231 | 235 | | |
232 | 236 | | |
| |||
Lines changed: 1 addition & 0 deletions
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
127 | 127 | | |
128 | 128 | | |
129 | 129 | | |
| 130 | + | |
130 | 131 | | |
131 | 132 | | |
132 | 133 | | |
| |||
Lines changed: 381 additions & 0 deletions
Large diffs are not rendered by default.
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
456 | 456 | | |
457 | 457 | | |
458 | 458 | | |
459 | | - | |
| 459 | + | |
460 | 460 | | |
461 | 461 | | |
462 | 462 | | |
| |||
Lines changed: 6 additions & 4 deletions
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
881 | 881 | | |
882 | 882 | | |
883 | 883 | | |
884 | | - | |
| 884 | + | |
| 885 | + | |
885 | 886 | | |
886 | 887 | | |
887 | 888 | | |
| |||
891 | 892 | | |
892 | 893 | | |
893 | 894 | | |
894 | | - | |
895 | | - | |
| 895 | + | |
| 896 | + | |
| 897 | + | |
896 | 898 | | |
897 | 899 | | |
898 | 900 | | |
899 | | - | |
| 901 | + | |
900 | 902 | | |
901 | 903 | | |
902 | 904 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
245 | 245 | | |
246 | 246 | | |
247 | 247 | | |
248 | | - | |
| 248 | + | |
| 249 | + | |
249 | 250 | | |
250 | 251 | | |
251 | 252 | | |
| |||
Lines changed: 79 additions & 6 deletions
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
20 | 20 | | |
21 | 21 | | |
22 | 22 | | |
| 23 | + | |
| 24 | + | |
| 25 | + | |
23 | 26 | | |
24 | | - | |
| 27 | + | |
25 | 28 | | |
26 | 29 | | |
27 | 30 | | |
| |||
30 | 33 | | |
31 | 34 | | |
32 | 35 | | |
| 36 | + | |
| 37 | + | |
| 38 | + | |
33 | 39 | | |
34 | | - | |
| 40 | + | |
35 | 41 | | |
36 | 42 | | |
37 | 43 | | |
| |||
156 | 162 | | |
157 | 163 | | |
158 | 164 | | |
| 165 | + | |
| 166 | + | |
| 167 | + | |
| 168 | + | |
| 169 | + | |
| 170 | + | |
| 171 | + | |
| 172 | + | |
| 173 | + | |
| 174 | + | |
| 175 | + | |
| 176 | + | |
| 177 | + | |
| 178 | + | |
| 179 | + | |
| 180 | + | |
| 181 | + | |
| 182 | + | |
| 183 | + | |
| 184 | + | |
| 185 | + | |
| 186 | + | |
| 187 | + | |
| 188 | + | |
| 189 | + | |
| 190 | + | |
| 191 | + | |
| 192 | + | |
| 193 | + | |
| 194 | + | |
| 195 | + | |
| 196 | + | |
| 197 | + | |
159 | 198 | | |
160 | 199 | | |
161 | 200 | | |
| |||
195 | 234 | | |
196 | 235 | | |
197 | 236 | | |
| 237 | + | |
| 238 | + | |
| 239 | + | |
| 240 | + | |
| 241 | + | |
| 242 | + | |
| 243 | + | |
| 244 | + | |
| 245 | + | |
| 246 | + | |
| 247 | + | |
| 248 | + | |
198 | 249 | | |
199 | 250 | | |
200 | 251 | | |
| |||
208 | 259 | | |
209 | 260 | | |
210 | 261 | | |
211 | | - | |
| 262 | + | |
212 | 263 | | |
213 | 264 | | |
214 | 265 | | |
| |||
278 | 329 | | |
279 | 330 | | |
280 | 331 | | |
281 | | - | |
| 332 | + | |
282 | 333 | | |
283 | 334 | | |
284 | 335 | | |
| |||
291 | 342 | | |
292 | 343 | | |
293 | 344 | | |
294 | | - | |
| 345 | + | |
295 | 346 | | |
296 | 347 | | |
297 | 348 | | |
| |||
301 | 352 | | |
302 | 353 | | |
303 | 354 | | |
304 | | - | |
| 355 | + | |
305 | 356 | | |
306 | 357 | | |
307 | 358 | | |
| |||
310 | 361 | | |
311 | 362 | | |
312 | 363 | | |
| 364 | + | |
| 365 | + | |
| 366 | + | |
| 367 | + | |
| 368 | + | |
| 369 | + | |
| 370 | + | |
| 371 | + | |
| 372 | + | |
| 373 | + | |
| 374 | + | |
| 375 | + | |
| 376 | + | |
| 377 | + | |
| 378 | + | |
| 379 | + | |
| 380 | + | |
| 381 | + | |
| 382 | + | |
| 383 | + | |
| 384 | + | |
| 385 | + | |
313 | 386 | | |
314 | 387 | | |
315 | 388 | | |
| |||
Lines changed: 17 additions & 1 deletion
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
34 | 34 | | |
35 | 35 | | |
36 | 36 | | |
| 37 | + | |
| 38 | + | |
| 39 | + | |
| 40 | + | |
| 41 | + | |
37 | 42 | | |
38 | 43 | | |
39 | 44 | | |
40 | 45 | | |
41 | | - | |
| 46 | + | |
42 | 47 | | |
43 | 48 | | |
44 | 49 | | |
| |||
67 | 72 | | |
68 | 73 | | |
69 | 74 | | |
| 75 | + | |
| 76 | + | |
| 77 | + | |
| 78 | + | |
| 79 | + | |
70 | 80 | | |
71 | 81 | | |
72 | 82 | | |
| |||
163 | 173 | | |
164 | 174 | | |
165 | 175 | | |
| 176 | + | |
| 177 | + | |
| 178 | + | |
| 179 | + | |
| 180 | + | |
| 181 | + | |
166 | 182 | | |
167 | 183 | | |
168 | 184 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
36 | 36 | | |
37 | 37 | | |
38 | 38 | | |
| 39 | + | |
39 | 40 | | |
40 | 41 | | |
41 | 42 | | |
| 43 | + | |
42 | 44 | | |
43 | 45 | | |
44 | 46 | | |
| |||
105 | 107 | | |
106 | 108 | | |
107 | 109 | | |
| 110 | + | |
| 111 | + | |
| 112 | + | |
| 113 | + | |
| 114 | + | |
| 115 | + | |
| 116 | + | |
108 | 117 | | |
109 | 118 | | |
110 | 119 | | |
111 | | - | |
112 | | - | |
| 120 | + | |
| 121 | + | |
| 122 | + | |
| 123 | + | |
| 124 | + | |
| 125 | + | |
| 126 | + | |
| 127 | + | |
| 128 | + | |
113 | 129 | | |
114 | 130 | | |
115 | 131 | | |
| |||
999 | 1015 | | |
1000 | 1016 | | |
1001 | 1017 | | |
1002 | | - | |
| 1018 | + | |
1003 | 1019 | | |
1004 | 1020 | | |
1005 | 1021 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
78 | 78 | | |
79 | 79 | | |
80 | 80 | | |
| 81 | + | |
81 | 82 | | |
| 83 | + | |
82 | 84 | | |
83 | 85 | | |
84 | 86 | | |
| |||
0 commit comments