Commit 5ac0cac

docs: update SWE-bench coding leaderboard EN with real data 2026-05-23
1 parent 629d301 commit 5ac0cac

1 file changed

Lines changed: 16 additions & 17 deletions

src/content/en/models/coding.md

@@ -5,30 +5,29 @@ tags: [SWE-bench, coding, programming]
 summary: "AI model coding capability rankings based on SWE-bench Bash-Only"
 data_source: "https://www.swebench.com/"
 benchmarks: [SWE-bench Bash-Only]
-last_updated: "2026-05-16"
+last_updated: "2026-05-23"
 auto_updated: true
-date: "2026-05-16"
+date: "2026-05-23"
 ---
 
+> Data source: SWE-bench official Bash-Only leaderboard (mini-SWE-agent v2.0.0, 500 instances, single attempt). Data retrieved February 2026. LMSYS Chatbot Arena and other leaderboards temporarily unavailable due to network restrictions.
+
 | Rank | Model | Vendor | SWE-bench | Type |
 |---|---|---|---|---|
 | 🥇 | Claude 4.5 Opus | Anthropic | 76.8% | Closed |
 | 🥈 | Gemini 3 Flash | Google DeepMind | 75.8% | Closed |
 | 🥉 | MiniMax M2.5 | MiniMax | 75.8% | Closed |
 | 4 | Claude Opus 4.6 | Anthropic | 75.6% | Closed |
-| 5 | Gemini 3 Pro Preview | Google DeepMind | 74.2% | Closed |
-| 6 | GPT-5.2 | OpenAI | 72.8% | Closed |
+| 5 | Claude 4.5 Opus (medium) | Anthropic | 74.4% | Closed |
+| 6 | Gemini 3 Pro Preview | Google DeepMind | 74.2% | Closed |
 | 7 | GLM-5 | Z-AI | 72.8% | Closed |
-| 8 | Claude 4.5 Sonnet | Anthropic | 71.4% | Closed |
-| 9 | Kimi K2.5 | Moonshot AI | 70.8% | Closed |
-| 10 | DeepSeek V3.2 | DeepSeek | 70.0% | Open |
-| 11 | Gemini 3 Pro | Google DeepMind | 69.6% | Closed |
-| 12 | Claude 4 Opus | Anthropic | 67.6% | Closed |
-| 13 | Claude 4.5 Haiku | Anthropic | 66.6% | Closed |
-| 14 | GPT-5.1 | OpenAI | 66.0% | Closed |
-| 15 | GPT-5 | OpenAI | 65.0% | Closed |
-| 16 | Claude 4 Sonnet | Anthropic | 64.9% | Closed |
-| 17 | Kimi K2 Thinking | Moonshot AI | 63.4% | Closed |
-| 18 | MiniMax M2 | MiniMax | 61.0% | Closed |
-| 19 | DeepSeek V3.2 Reasoner | DeepSeek | 60.0% | Open |
-| 20 | GPT-5 mini | OpenAI | 58.4% | Closed |
+| 8 | GPT-5.2 | OpenAI | 72.8% | Closed |
+| 9 | Claude 4.5 Sonnet | Anthropic | 71.4% | Closed |
+| 10 | Kimi K2.5 | Moonshot AI | 70.8% | Closed |
+| 11 | DeepSeek V3.2 | DeepSeek | 70.0% | Open |
+| 12 | Gemini 3 Pro | Google DeepMind | 69.6% | Closed |
+| 13 | Claude 4 Opus | Anthropic | 67.6% | Closed |
+| 14 | Claude 4.5 Haiku | Anthropic | 66.6% | Closed |
+| 15 | GPT-5.1 | OpenAI | 66.0% | Closed |
+| 16 | GPT-5 | OpenAI | 65.0% | Closed |
+| 17 | Claude 4 Sonnet | Anthropic | 64.9% | Closed |
