Commit 5307dc5
Update GQA benchmark to support bfloat16 (#26898)
Update GQA benchmark to support bfloat16 and default to testing the
first configuration (fast mode).
Note that test_sparse_attention.py was removed in #23547. The benchmark
script still references it, so I added it back and disabled the test in
pipeline mode.
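The fast-mode default and the pipeline-mode skip described above could look roughly like the sketch below. It is an illustration only: `CONFIGS`, `select_configs`, `pipeline_mode`, and the configuration tuples are hypothetical names and values, not taken from the benchmark script or the restored test.

```python
import unittest

# Hypothetical configurations: (model name, num_heads, kv_num_heads, head_size).
# The real benchmark script's list is not shown in this commit message.
CONFIGS = [
    ("Llama3-8B", 32, 8, 128),
    ("Llama3-70B", 64, 8, 128),
]

# Hypothetical flag standing in for however the test detects pipeline mode.
pipeline_mode = True


def select_configs(configs, fast=True):
    """Return only the first configuration in fast mode, otherwise all of them."""
    return list(configs[:1]) if fast else list(configs)


# unittest.skipIf marks the whole class as skipped when the flag is set.
@unittest.skipIf(pipeline_mode, "disabled in pipeline mode")
class TestSparseAttentionSketch(unittest.TestCase):
    def test_placeholder(self):
        self.assertTrue(True)
```

With `fast=True` as the default, a plain call such as `select_configs(CONFIGS)` exercises only the first entry, which matches the "default to testing the first configuration" behavior described in the message.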
Example output from an H200 GPU:
```
prompt-sm90-Llama3-8B-b1-h32_8x128-float16:
sequence_length ORT-GQA-Dense ORT-GQA-Dense-PackedQKV
0 16.0 0.781751 0.571226
1 32.0 0.893813 0.684198
2 64.0 1.434056 1.589263
3 128.0 1.142192 1.681969
4 256.0 1.503483 2.225498
5 512.0 1.045732 1.878660
6 1024.0 2.334924 0.916745
7 2048.0 2.229924 3.001290
8 4096.0 4.309678 3.198855
9 8192.0 7.932211 7.910411
token-sm90-Llama3-8B-b1-h32_8_d128-float16:
past_sequence_length ORT-GQA-Dense ORT-GQA-Dense-PackedQKV
0 16.0 1.751966 0.780081
1 32.0 1.302806 0.043939
2 64.0 2.301024 2.207282
3 128.0 2.294556 3.010107
4 256.0 2.931330 1.781768
5 512.0 1.210220 2.799579
6 1024.0 2.767142 2.660434
7 2048.0 1.420229 0.091433
8 4096.0 0.860655 0.801022
9 8191.0 0.749525 0.820858
prompt-sm90-Llama3-8B-b1-h32_8x128-bfloat16:
sequence_length ORT-GQA-Dense ORT-GQA-Dense-PackedQKV
0 16.0 1.085427 0.666664
1 32.0 1.714795 0.931262
2 64.0 1.729093 1.438733
3 128.0 1.071263 2.486135
4 256.0 1.957349 1.342417
5 512.0 1.159680 1.591321
6 1024.0 0.743702 2.035150
7 2048.0 1.452736 1.788801
8 4096.0 4.029917 4.041565
9 8192.0 7.934485 7.931600
token-sm90-Llama3-8B-b1-h32_8_d128-bfloat16:
past_sequence_length ORT-GQA-Dense ORT-GQA-Dense-PackedQKV
0 16.0 0.044354 0.043983
1 32.0 0.040715 0.044061
2 64.0 0.045586 0.044071
3 128.0 0.062204 0.061418
4 256.0 0.074764 4.874854
5 512.0 2.472094 2.102259
6 1024.0 4.911269 1.396149
7 2048.0 4.898032 1.684034
8 4096.0 2.523432 2.192279
9 8191.0 1.651366 3.427370
```
1 parent db3eb22 commit 5307dc5
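The block labels in the sample output pack several settings into one string. Below is a minimal sketch of splitting such a label into fields. The pattern is inferred from the four labels shown above, and reading `b1` as batch size and `h32_8x128` (or `h32_8_d128`) as query heads, KV heads, and head size is an interpretation rather than something the commit states. `parse_label` is a hypothetical helper, not part of the benchmark script.

```python
import re

# Pattern inferred from the labels in the sample output; the field meanings
# (batch size, query heads, KV heads, head size) are an interpretation.
LABEL_RE = re.compile(
    r"^(?P<mode>prompt|token)-sm(?P<sm>\d+)-(?P<model>.+)-b(?P<batch>\d+)"
    r"-h(?P<heads>\d+)_(?P<kv_heads>\d+)(?:x|_d)(?P<head_size>\d+)-(?P<dtype>\w+)$"
)


def parse_label(label):
    """Split a benchmark label into its fields; raise ValueError if it does not match."""
    m = LABEL_RE.match(label.rstrip(":"))
    if m is None:
        raise ValueError(f"unrecognized label: {label!r}")
    fields = m.groupdict()
    # Convert the numeric fields from strings to integers.
    for key in ("sm", "batch", "heads", "kv_heads", "head_size"):
        fields[key] = int(fields[key])
    return fields
```

The alternation `(?:x|_d)` is there because the prompt labels use `h32_8x128` while the token labels use `h32_8_d128`. Grouping results by the parsed `dtype` field is one way to line up the float16 and bfloat16 runs for comparison.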
2 files changed (+1229, -39 lines) in onnxruntime/test/python/transformers