Commit 629d301

docs: update SWE-bench coding leaderboard with real data 2026-05-23
1 parent 991bce7 commit 629d301

1 file changed

Lines changed: 16 additions & 17 deletions

src/content/models/coding.md
@@ -5,30 +5,29 @@ tags: [SWE-bench, code, programming]
 summary: "AI model coding-ability ranking based on SWE-bench Bash-Only"
 data_source: "https://www.swebench.com/"
 benchmarks: [SWE-bench Bash-Only]
-last_updated: "2026-05-16"
+last_updated: "2026-05-23"
 auto_updated: true
-date: "2026-05-16"
+date: "2026-05-23"
 ---
 
+> Data source: the official SWE-bench Bash-Only leaderboard (mini-SWE-agent v2.0.0, 500 instances, single attempt). Data retrieved in February 2026. Leaderboards such as LMSYS Chatbot Arena are temporarily unavailable due to network restrictions.
+
 | Rank | Model | Vendor | SWE-bench | Type |
 | --- | --- | --- | --- | --- |
 | 🥇 | Claude 4.5 Opus | Anthropic | 76.8% | Closed source |
 | 🥈 | Gemini 3 Flash | Google DeepMind | 75.8% | Closed source |
 | 🥉 | MiniMax M2.5 | MiniMax | 75.8% | Closed source |
 | 4 | Claude Opus 4.6 | Anthropic | 75.6% | Closed source |
-| 5 | Gemini 3 Pro Preview | Google DeepMind | 74.2% | Closed source |
-| 6 | GPT-5.2 | OpenAI | 72.8% | Closed source |
+| 5 | Claude 4.5 Opus (medium) | Anthropic | 74.4% | Closed source |
+| 6 | Gemini 3 Pro Preview | Google DeepMind | 74.2% | Closed source |
 | 7 | GLM-5 | Z-AI | 72.8% | Closed source |
-| 8 | Claude 4.5 Sonnet | Anthropic | 71.4% | Closed source |
-| 9 | Kimi K2.5 | Moonshot AI | 70.8% | Closed source |
-| 10 | DeepSeek V3.2 | DeepSeek | 70.0% | Open source |
-| 11 | Gemini 3 Pro | Google DeepMind | 69.6% | Closed source |
-| 12 | Claude 4 Opus | Anthropic | 67.6% | Closed source |
-| 13 | Claude 4.5 Haiku | Anthropic | 66.6% | Closed source |
-| 14 | GPT-5.1 | OpenAI | 66.0% | Closed source |
-| 15 | GPT-5 | OpenAI | 65.0% | Closed source |
-| 16 | Claude 4 Sonnet | Anthropic | 64.9% | Closed source |
-| 17 | Kimi K2 Thinking | Moonshot AI | 63.4% | Closed source |
-| 18 | MiniMax M2 | MiniMax | 61.0% | Closed source |
-| 19 | DeepSeek V3.2 Reasoner | DeepSeek | 60.0% | Open source |
-| 20 | GPT-5 mini | OpenAI | 58.4% | Closed source |
+| 8 | GPT-5.2 | OpenAI | 72.8% | Closed source |
+| 9 | Claude 4.5 Sonnet | Anthropic | 71.4% | Closed source |
+| 10 | Kimi K2.5 | Moonshot AI | 70.8% | Closed source |
+| 11 | DeepSeek V3.2 | DeepSeek | 70.0% | Open source |
+| 12 | Gemini 3 Pro | Google DeepMind | 69.6% | Closed source |
+| 13 | Claude 4 Opus | Anthropic | 67.6% | Closed source |
+| 14 | Claude 4.5 Haiku | Anthropic | 66.6% | Closed source |
+| 15 | GPT-5.1 | OpenAI | 66.0% | Closed source |
+| 16 | GPT-5 | OpenAI | 65.0% | Closed source |
+| 17 | Claude 4 Sonnet | Anthropic | 64.9% | Closed source |
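
Because this commit inserts a row and renumbers everything below it, a quick consistency check on the resulting table may be useful. The following is a minimal sketch, not part of the commit: it assumes the five-column row layout shown in the diff, and `parse_table`, `find_problems`, and `SAMPLE` are hypothetical names introduced here for illustration. It verifies that ranks run 1, 2, 3, ... and that scores never increase from one row to the next.

```python
# Medal emoji stand in for ranks 1-3 in the leaderboard table.
MEDALS = {"🥇": 1, "🥈": 2, "🥉": 3}

# First rows of the post-commit table; only the first four columns are used.
SAMPLE = """\
| 🥇 | Claude 4.5 Opus | Anthropic | 76.8% | closed |
| 🥈 | Gemini 3 Flash | Google DeepMind | 75.8% | closed |
| 🥉 | MiniMax M2.5 | MiniMax | 75.8% | closed |
| 4 | Claude Opus 4.6 | Anthropic | 75.6% | closed |
| 5 | Claude 4.5 Opus (medium) | Anthropic | 74.4% | closed |
| 6 | Gemini 3 Pro Preview | Google DeepMind | 74.2% | closed |
"""


def parse_table(markdown):
    """Return (rank, model, score) tuples from a Markdown leaderboard table."""
    entries = []
    for line in markdown.splitlines():
        cells = [c.strip() for c in line.strip().strip("|").split("|")]
        if len(cells) != 5 or not cells[3].endswith("%"):
            continue  # skip the header, the separator, and non-table lines
        rank = MEDALS.get(cells[0]) or int(cells[0])
        entries.append((rank, cells[1], float(cells[3].rstrip("%"))))
    return entries


def find_problems(entries):
    """List rows whose rank is out of sequence or whose score rises."""
    problems = []
    for i, (rank, model, score) in enumerate(entries):
        if rank != i + 1:
            problems.append(f"{model}: expected rank {i + 1}, found {rank}")
        if i > 0 and score > entries[i - 1][2]:
            problems.append(f"{model}: score {score} exceeds the row above")
    return problems


if __name__ == "__main__":
    for problem in find_problems(parse_table(SAMPLE)):
        print(problem)
```

Tied scores (for example the two 75.8% rows) are accepted, since the check only flags a score that is strictly higher than the one above it.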
