Commit 35add0e
This commit adds model-side integration for the previously introduced experimental AscendC fused operator DispatchGmmCombineDecode, used in MoE decoding.
The operator implementation itself was added earlier, in PR #4139.
This change only adapts the model execution path to optionally use the fused operator.
When the environment variable VLLM_ASCEND_ENABLE_FUSED_MC2=1 is set, the original MC2 path composed of multiple operators (A8W8 dispatch → GMM → SwiGLU → GMM → combine) is replaced by the single fused operator DispatchGmmCombineDecode.
By default, the existing multi-operator MC2 implementation is preserved.
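The opt-in switch described above can be pictured with a minimal sketch. It is not the code from this commit: the helper names `fused_mc2_enabled` and `select_moe_path` are hypothetical, and the step labels are placeholders for the real operators. Only the environment variable name, the operator name DispatchGmmCombineDecode, and the step order come from the commit message.

```python
import os
from typing import List, Mapping, Optional

ENV_FLAG = "VLLM_ASCEND_ENABLE_FUSED_MC2"  # name taken from the commit message


def fused_mc2_enabled(env: Optional[Mapping[str, str]] = None) -> bool:
    """Return True only when the flag is explicitly set to "1" (hypothetical helper)."""
    env = os.environ if env is None else env
    return env.get(ENV_FLAG, "0") == "1"


def select_moe_path(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return the sequence of operator steps the decode path would run.

    The strings are illustrative labels, not real kernel calls.
    """
    if fused_mc2_enabled(env):
        # Opt-in: a single fused operator replaces the whole chain.
        return ["DispatchGmmCombineDecode"]
    # Default: the existing multi-operator MC2 path is preserved.
    return ["a8w8_dispatch", "gmm", "swiglu", "gmm", "combine"]
```

Passing the environment as an argument keeps the sketch testable without mutating `os.environ`; with no argument it falls back to the real process environment.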
Signed-off-by: wangqiankun <[email protected]>
1 parent df7e0fe, commit 35add0e
6 files changed, +85 -6 lines changed
File tree
- vllm_ascend
  - ops
    - fused_moe
  - quantization
  - worker