Commit a86f7d3
Stable Diffusion 3.x and Flux Optimization (#22986)
### Description
It depends on the following PR:
- #23297
Optimize the ONNX pipeline for Stable Diffusion 3.x and Flux 1.0 models
(fp32 or fp16).
- [x] Update optimize_pipeline script
- [x] Update benchmark script
- [x] Update documentation for Stable Diffusion 3.x and Flux 1.0 models
- [x] Add graph optimizations for MMDiT model
  - [x] FastGelu fusion
  - [x] RMSNorm fusion
  - [x] MultiHeadAttention fusion
- [x] Add graph optimizations for Flux transformer models
  - [x] MultiHeadAttention fusion
- [x] Update graph optimizations for T5
- [x] Add tests
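The RMSNorm and FastGelu fusions replace chains of primitive ops with single contrib ops. The minimal pure-Python sketch below shows the math those fused nodes compute, not the ORT CUDA kernels: RMSNorm (fused into `SimplifiedLayerNormalization`) divides by the root mean square and applies a learned scale with no mean subtraction or bias, and FastGelu evaluates the tanh approximation of GELU. The helper names and the default epsilon of 1e-6 are assumptions for illustration; exported graphs carry their own epsilon.

```python
import math

DEFAULT_EPS = 1e-6  # assumed value; exported graphs store their own epsilon


def rms_norm(x, scale, eps=DEFAULT_EPS):
    """RMSNorm over one hidden vector: x / sqrt(mean(x^2) + eps) * scale.

    Unlike LayerNormalization there is no mean subtraction and no bias.
    """
    mean_sq = sum(v * v for v in x) / len(x)
    inv_rms = 1.0 / math.sqrt(mean_sq + eps)
    return [v * inv_rms * s for v, s in zip(x, scale)]


def fast_gelu(x):
    """Tanh approximation of GELU: 0.5 * x * (1 + tanh(sqrt(2/pi) * (x + 0.044715 * x^3)))."""
    c = math.sqrt(2.0 / math.pi)
    return 0.5 * x * (1.0 + math.tanh(c * (x + 0.044715 * x ** 3)))


def exact_gelu(x):
    """Erf-based GELU, shown only to compare against the approximation."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


if __name__ == "__main__":
    print(rms_norm([1.0, 2.0, 3.0, 4.0], [1.0, 1.0, 1.0, 1.0]))
    for v in (-3.0, -1.0, 0.0, 1.0, 3.0):
        print(f"x={v:+.1f}  fast={fast_gelu(v):.6f}  exact={exact_gelu(v):.6f}")
```

The printed pairs show the tanh approximation tracking erf-based GELU closely, which is why a fused FastGelu node can stand in for the unfused tanh subgraph.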
Example: optimize the Flux 1.0 Schnell ONNX pipeline and convert it to fp16:
```
python optimize_pipeline.py -i ./flux1_schnell_onnx/fp32 -o ./flux1_schnell_onnx/fp16 --float16
Optimize flux1_schnell_onnx/fp32/transformer/model.onnx ...
Fused LayerNormalization: 115
Fused SimplifiedLayerNormalization: 152
Fused FastGelu: 76
Fused MultiHeadAttention: 57
```
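The `Fused MultiHeadAttention` count refers to attention subgraphs collapsed into a single MultiHeadAttention node. The minimal pure-Python sketch below shows the computation such a node stands for: per-head scaled dot-product attention, softmax(QKᵀ/√d)·V, followed by concatenating the heads. The Q/K/V and output projections are omitted, the function names are hypothetical, and nested lists replace real tensors; it illustrates the math only, not the CUDA kernel.

```python
import math


def softmax(row):
    """Numerically stable softmax over one row of scores."""
    m = max(row)
    exps = [math.exp(s - m) for s in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention_one_head(q, k, v):
    """softmax(Q K^T / sqrt(d)) V for a single head.

    q: [seq_q][d], k: [seq_kv][d], v: [seq_kv][d_v] as nested lists.
    """
    scale = 1.0 / math.sqrt(len(q[0]))
    out = []
    for q_row in q:
        scores = [scale * sum(a * b for a, b in zip(q_row, k_row)) for k_row in k]
        probs = softmax(scores)
        out.append([sum(p * v_row[c] for p, v_row in zip(probs, v)) for c in range(len(v[0]))])
    return out


def split_heads(x, num_heads):
    """Slice the hidden dimension of every row into num_heads equal chunks."""
    head_dim = len(x[0]) // num_heads
    return [[row[h * head_dim:(h + 1) * head_dim] for row in x] for h in range(num_heads)]


def multi_head_attention(q, k, v, num_heads):
    """Attend independently per head, then concatenate head outputs per query row."""
    if len(q[0]) % num_heads != 0:
        raise ValueError("hidden size must be divisible by num_heads")
    heads = [
        attention_one_head(qh, kh, vh)
        for qh, kh, vh in zip(split_heads(q, num_heads), split_heads(k, num_heads), split_heads(v, num_heads))
    ]
    return [[x for head in heads for x in head[i]] for i in range(len(q))]


if __name__ == "__main__":
    q = [[0.1, 0.2, 0.3, 0.4], [0.4, 0.3, 0.2, 0.1]]
    k = [[1.0, 0.0, 0.0, 1.0], [0.0, 1.0, 1.0, 0.0], [0.5, 0.5, 0.5, 0.5]]
    v = [[1.0, 2.0, 3.0, 4.0], [4.0, 3.0, 2.0, 1.0], [0.0, 0.0, 0.0, 0.0]]
    print(multi_head_attention(q, k, v, num_heads=2))
```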
### H100 Benchmark Results
* GPU: NVIDIA H100 80GB HBM3
* Image Size: 1024x1024
* Batch Size: 1
Model | Steps | Precision | Engine | Latency (Seconds) | GPU Memory (MB)
-- | -- | -- | -- | -- | --
Flux 1.0 Dev | 50 | BF16 | Torch 2.5.1 (compile) | 8.198 | 37,603
Flux 1.0 Dev | 50 | FP16+BF16 | Optimum (ORT) | 10.762 | 41,469
Flux 1.0 Dev | 50 | FP16+FP32 | Optimum (ORT) | 10.891 | 43,545
Flux 1.0 Dev | 50 | BF16 | Torch 2.5.1 (eager) | 12.339 | 36,651
Flux 1.0 Schnell | 4 | BF16 | Torch 2.5.1 (compile) | 0.775 | 37,857
Flux 1.0 Schnell | 4 | FP16+BF16 | Optimum (ORT) | 0.931 | 41,433
Flux 1.0 Schnell | 4 | FP16+FP32 | Optimum (ORT) | 0.939 | 43,809
Flux 1.0 Schnell | 4 | BF16 | Torch 2.5.1 (eager) | 1.120 | 36,629
SD 3.5 Large | 50 | BF16 | Torch 2.5.1 (compile) | 7.466 | 32,217
SD 3.5 Large | 50 | FP16+BF16 | Optimum (ORT) | 10.275 | 36,609
SD 3.5 Large | 50 | FP16+FP32 | Optimum (ORT) | 10.283 | 36,729
SD 3.5 Large | 50 | BF16 | Torch 2.5.1 (eager) | 11.615 | 31,517
SD 3.5 Medium | 50 | BF16 | Torch 2.5.1 (compile) | 3.240 | 21,143
SD 3.5 Medium | 50 | FP16+BF16 | Optimum (ORT) | 4.799 | 25,097
SD 3.5 Medium | 50 | FP16+FP32 | Optimum (ORT) | 4.838 | 25,109
SD 3.5 Medium | 50 | BF16 | Torch 2.5.1 (eager) | 5.582 | 20,489
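
To put the H100 numbers in relative terms, the short script below divides latencies taken directly from the table: Torch eager over ORT FP16+BF16 (ORT speedup over eager) and ORT over Torch compile (ORT latency relative to compile). The dictionary and function names are illustrative only.

```python
# H100 latencies in seconds, copied from the table above:
# (Torch 2.5.1 compile, Optimum (ORT) FP16+BF16, Torch 2.5.1 eager)
H100_LATENCY = {
    "Flux 1.0 Dev": (8.198, 10.762, 12.339),
    "Flux 1.0 Schnell": (0.775, 0.931, 1.120),
    "SD 3.5 Large": (7.466, 10.275, 11.615),
    "SD 3.5 Medium": (3.240, 4.799, 5.582),
}


def relative_latency(compiled, ort, eager):
    """Return (speedup of ORT over eager, ORT latency divided by compile latency)."""
    return eager / ort, ort / compiled


if __name__ == "__main__":
    for model, latencies in H100_LATENCY.items():
        vs_eager, vs_compile = relative_latency(*latencies)
        print(f"{model}: {vs_eager:.2f}x faster than eager, {vs_compile:.2f}x the latency of compile")
```

On these numbers ORT lands between the two PyTorch modes for every model: roughly 1.13x to 1.20x faster than eager, and roughly 1.20x to 1.48x the latency of `torch.compile`.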
### A100 Benchmark Results
* GPU: A100-SXM4-80GB
* Image Size: 1024x1024
* Batch Size: 1
Model | Steps | Precision | Engine | Latency (Seconds) | GPU Memory (MB)
-- | -- | -- | -- | -- | --
Flux 1.0 Dev | 50 | BF16 | Torch 2.5.1 (compile) | 17.593 | 37,723
Flux 1.0 Dev | 50 | FP16+BF16 | Optimum (ORT) | 21.918 | 41,348
Flux 1.0 Dev | 50 | FP16+FP32 | Optimum (ORT) | 22.060 | 44,860
Flux 1.0 Dev | 50 | BF16 | Torch 2.5.1 (eager) | 24.267 | 36,847
Flux 1.0 Schnell | 4 | BF16 | Torch 2.5.1 (compile) | 1.627 | 37,881
Flux 1.0 Schnell | 4 | FP16+BF16 | Optimum (ORT) | 1.884 | 41,537
Flux 1.0 Schnell | 4 | FP16+FP32 | Optimum (ORT) | 1.902 | 44,858
Flux 1.0 Schnell | 4 | BF16 | Torch 2.5.1 (eager) | 2.162 | 36,831
SD 3.5 Large | 50 | BF16 | Torch 2.5.1 (compile) | 15.881 | 32,307
SD 3.5 Large | 50 | FP16+FP32 | Optimum (ORT) | 19.837 | 36,451
SD 3.5 Large | 50 | FP16+BF16 | Optimum (ORT) | 19.964 | 36,461
SD 3.5 Large | 50 | BF16 | Torch 2.5.1 (eager) | 22.477 | 31,513
SD 3.5 Medium | 50 | BF16 | Torch 2.5.1 (compile) | 6.476 | 21,341
SD 3.5 Medium | 50 | FP16+FP32 | Optimum (ORT) | 8.775 | 25,183
SD 3.5 Medium | 50 | BF16 | Torch 2.5.1 (eager) | 10.057 | 20,433
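
Both tables use the same image size, batch size, and step counts, so the ORT rows can be compared across GPUs. The sketch below uses the FP16+FP32 rows because the A100 table has no FP16+BF16 row for SD 3.5 Medium; the latencies are copied from the two tables and the names are illustrative only.

```python
# Optimum (ORT) FP16+FP32 latencies in seconds from the H100 and A100 tables.
# FP16+FP32 is used because the A100 table has no FP16+BF16 row for SD 3.5 Medium.
ORT_FP16_FP32 = {
    # model: (H100 latency, A100 latency)
    "Flux 1.0 Dev": (10.891, 22.060),
    "Flux 1.0 Schnell": (0.939, 1.902),
    "SD 3.5 Large": (10.283, 19.837),
    "SD 3.5 Medium": (4.838, 8.775),
}


def h100_speedup(h100_seconds, a100_seconds):
    """How many times faster the H100 run is than the A100 run."""
    return a100_seconds / h100_seconds


if __name__ == "__main__":
    for model, (h100, a100) in ORT_FP16_FP32.items():
        print(f"{model}: H100 is {h100_speedup(h100, a100):.2f}x faster than A100")
```

With these rows the H100 is roughly 1.8x to 2.0x faster than the A100 for the ORT pipeline.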
### Future Work
* Triton kernels for matrix multiplication, with auto-tuning
* FP8/INT8 quantization
### Motivation and Context
SD 3.5 Architecture:
https://huggingface.co/stabilityai/stable-diffusion-3.5-medium/resolve/main/mmdit-x.png

Parent commit: 8e4253d
19 files changed (+2089, -525 lines):
- onnxruntime
- contrib_ops/cuda/bert
- python/tools/transformers
- models/stable_diffusion
- test/python/transformers

Lines changed: 25 additions & 36 deletions
Lines changed: 10 additions & 2 deletions
Lines changed: 0 additions & 39 deletions
Lines changed: 122 additions & 0 deletions