@@ -298,14 +298,14 @@ Plain `gemini-2.0-flash` can be run by setting `tokenBudget` to zero, skipping t

It is not surprising that plain `gemini-2.0-flash` has a 0% pass rate, as I intentionally filtered out the questions that LLMs can answer.

- | Metric | gemini-2.0-flash | #5e80ed4 | #3deee87 (latest) |
- | --------| ------------------| ------------------------------------------------- | -------- |
- | Pass Rate | 0% | 60% | 75% |
- | Average Steps | 1 | 5 | 5 |
- | Maximum Steps | 1 | 13 | 13 |
- | Minimum Steps | 1 | 2 | 1 |
- | Median Steps | 1 | 3 | 3 |
- | Average Tokens | 428 | 59,408 | 32,392 |
- | Median Tokens | 434 | 16,001 | 9,172 |
- | Maximum Tokens | 463 | 347,222 | 202,055 |
- | Minimum Tokens | 374 | 5,594 | 3,236 |
+ | Metric | gemini-2.0-flash | #18f0312 |
+ | --------| ------------------| -----------|
+ | Pass Rate | 0% | 75% |
+ | Average Steps | 1 | 4 |
+ | Maximum Steps | 1 | 14 |
+ | Minimum Steps | 1 | 0 |
+ | Median Steps | 1 | 3 |
+ | Average Tokens | 428 | 71,285 |
+ | Median Tokens | 434 | 22,771 |
+ | Maximum Tokens | 463 | 536,148 |
+ | Minimum Tokens | 374 | 0 |