@@ -5,30 +5,29 @@ tags: [SWE-bench, Code, Programming]
 summary: "AI model coding-ability ranking based on SWE-bench Bash-Only"
 data_source: "https://www.swebench.com/"
 benchmarks: [SWE-bench Bash-Only]
-last_updated: "2026-05-16"
+last_updated: "2026-05-23"
 auto_updated: true
-date: "2026-05-16"
+date: "2026-05-23"
 ---
 
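+> Data source: the official SWE-bench Bash-Only leaderboard (mini-SWE-agent v2.0.0, 500 instances, single attempt). Data retrieved in February 2026. Other leaderboards, such as LMSYS Chatbot Arena, could not be fetched for now because of network restrictions.
+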
 | Rank | Model | Developer | SWE-bench | Type |
 | --- | --- | --- | --- | --- |
 | 🥇 | Claude 4.5 Opus | Anthropic | 76.8% | Closed-source |
 | 🥈 | Gemini 3 Flash | Google DeepMind | 75.8% | Closed-source |
 | 🥉 | MiniMax M2.5 | MiniMax | 75.8% | Closed-source |
 | 4 | Claude Opus 4.6 | Anthropic | 75.6% | Closed-source |
-| 5 | Gemini 3 Pro Preview | Google DeepMind | 74.2% | Closed-source |
-| 6 | GPT-5.2 | OpenAI | 72.8% | Closed-source |
+| 5 | Claude 4.5 Opus (medium) | Anthropic | 74.4% | Closed-source |
+| 6 | Gemini 3 Pro Preview | Google DeepMind | 74.2% | Closed-source |
 | 7 | GLM-5 | Z-AI | 72.8% | Closed-source |
-| 8 | Claude 4.5 Sonnet | Anthropic | 71.4% | Closed-source |
-| 9 | Kimi K2.5 | Moonshot AI | 70.8% | Closed-source |
-| 10 | DeepSeek V3.2 | DeepSeek | 70.0% | Open-source |
-| 11 | Gemini 3 Pro | Google DeepMind | 69.6% | Closed-source |
-| 12 | Claude 4 Opus | Anthropic | 67.6% | Closed-source |
-| 13 | Claude 4.5 Haiku | Anthropic | 66.6% | Closed-source |
-| 14 | GPT-5.1 | OpenAI | 66.0% | Closed-source |
-| 15 | GPT-5 | OpenAI | 65.0% | Closed-source |
-| 16 | Claude 4 Sonnet | Anthropic | 64.9% | Closed-source |
-| 17 | Kimi K2 Thinking | Moonshot AI | 63.4% | Closed-source |
-| 18 | MiniMax M2 | MiniMax | 61.0% | Closed-source |
-| 19 | DeepSeek V3.2 Reasoner | DeepSeek | 60.0% | Open-source |
-| 20 | GPT-5 mini | OpenAI | 58.4% | Closed-source |
+| 8 | GPT-5.2 | OpenAI | 72.8% | Closed-source |
+| 9 | Claude 4.5 Sonnet | Anthropic | 71.4% | Closed-source |
+| 10 | Kimi K2.5 | Moonshot AI | 70.8% | Closed-source |
+| 11 | DeepSeek V3.2 | DeepSeek | 70.0% | Open-source |
+| 12 | Gemini 3 Pro | Google DeepMind | 69.6% | Closed-source |
+| 13 | Claude 4 Opus | Anthropic | 67.6% | Closed-source |
+| 14 | Claude 4.5 Haiku | Anthropic | 66.6% | Closed-source |
+| 15 | GPT-5.1 | OpenAI | 66.0% | Closed-source |
+| 16 | GPT-5 | OpenAI | 65.0% | Closed-source |
+| 17 | Claude 4 Sonnet | Anthropic | 64.9% | Closed-source |
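In the table, rows are ordered by SWE-bench score from highest to lowest. Tied scores still get distinct ranks: the two 75.8% entries share a score, as do GLM-5 and GPT-5.2 at 72.8% after this change. Only the top three ranks get medals. The page is flagged `auto_updated: true`, so here is a minimal sketch of how such rows could be regenerated. The `Entry` record and the `render_rows` and `resolved_rate` helpers are hypothetical and are not the site's actual updater. Breaking ties by keeping entries in input order is also an assumption.

```python
from dataclasses import dataclass

# Hypothetical medal labels for the top three ranks, mirroring the table
MEDALS = {1: "🥇", 2: "🥈", 3: "🥉"}


@dataclass
class Entry:
    """One leaderboard row (hypothetical record, not the site's schema)."""
    model: str
    vendor: str
    score: float  # percentage of the 500 instances resolved in a single attempt
    kind: str     # "Closed-source" or "Open-source"


def resolved_rate(resolved: int, total: int = 500) -> float:
    """Resolved instances as a percentage, rounded to one decimal (e.g. 384 of 500 -> 76.8)."""
    return round(100 * resolved / total, 1)


def render_rows(entries: list[Entry]) -> list[str]:
    """Render Markdown table rows ranked by score, highest first."""
    # sorted() stays stable with reverse=True, so tied scores keep their input order
    ranked = sorted(entries, key=lambda e: e.score, reverse=True)
    rows = []
    for rank, e in enumerate(ranked, start=1):
        label = MEDALS.get(rank, str(rank))  # medals for ranks 1-3, plain numbers after
        rows.append(f"| {label} | {e.model} | {e.vendor} | {e.score:.1f}% | {e.kind} |")
    return rows


if __name__ == "__main__":
    sample = [
        Entry("Gemini 3 Flash", "Google DeepMind", 75.8, "Closed-source"),
        Entry("MiniMax M2.5", "MiniMax", 75.8, "Closed-source"),
        Entry("Claude Opus 4.6", "Anthropic", 75.6, "Closed-source"),
        Entry("Claude 4.5 Opus", "Anthropic", 76.8, "Closed-source"),
    ]
    print("\n".join(render_rows(sample)))
```

Relying on stable sorting keeps the sketch short. A real updater would more likely apply an explicit tie-breaker, such as model name, so that rank order does not depend on the order in which scores were fetched.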