# Programming LLM Benchmark: Code Generation & Game Development

This benchmark evaluates six leading programming LLMs on real-world tasks (frontend development, game design, and more) via the 胜算云「AI群聊」 (Shengsuan Cloud "AI Group Chat") platform. Models tested: Qwen3-Coder-Plus, Kimi K2, GLM-4.5, Claude Sonnet 4, Gemini 2.5 Pro, and OpenAI o4-mini-high.

## Key Findings

### 🚀 Code Generation: Domestic Models Shine

- **High completion rates:** All models delivered functional code for basic tasks, with the sole exception of GLM-4.5's Snake game logic.
- **Design superiority:** The Chinese models (Qwen, GLM, Kimi) outperformed on UI/UX tasks such as weather cards and restaurant homepages, producing better visuals and richer dynamic interactions (e.g., countdowns, quote generators).

### 🎮 Game Development: Functional but Limited Strategy

- **Basic playability achieved:** All models implemented the core mechanics of Snake and Gomoku (Five-in-a-Row).
- **Weak AI strategy:** Computer-opponent logic (e.g., in Gomoku) was often simplistic.
- **Interaction highlights:**
  - Kimi and Claude added pause functionality.
  - Qwen offered undo moves for better usability.
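To illustrate what "simplistic" opponent logic typically looks like, here is a minimal greedy Gomoku heuristic (our own sketch for illustration, not code produced by any of the tested models): it picks the empty cell that maximizes its own longest line and never considers blocking the opponent's threats.

```python
# Illustrative sketch only: a greedy Gomoku "AI" of the kind the benchmark
# calls simplistic. Board: 15x15 grid, 0 = empty, 1 = AI, 2 = human.
SIZE = 15
DIRECTIONS = [(1, 0), (0, 1), (1, 1), (1, -1)]  # vertical, horizontal, two diagonals

def line_length(board, row, col, player):
    """Longest line through (row, col) if `player` moved there."""
    best = 1
    for dr, dc in DIRECTIONS:
        count = 1
        for sign in (1, -1):  # walk both ways along the direction
            r, c = row + sign * dr, col + sign * dc
            while 0 <= r < SIZE and 0 <= c < SIZE and board[r][c] == player:
                count += 1
                r, c = r + sign * dr, c + sign * dc
        best = max(best, count)
    return best

def greedy_move(board, player=1):
    """Pick the empty cell that maximizes the mover's own longest line."""
    empties = [(r, c) for r in range(SIZE) for c in range(SIZE) if board[r][c] == 0]
    return max(empties, key=lambda rc: line_length(board, rc[0], rc[1], player))

board = [[0] * SIZE for _ in range(SIZE)]
board[7][7] = board[7][8] = 1   # AI already has two in a row
board[8][7] = board[8][8] = 2   # human also has two in a row
print(greedy_move(board))       # extends its own line, ignoring the human's threat
```

A stronger opponent would at minimum also score each cell for the *opponent's* resulting line and block the larger threat, which is exactly the kind of defensive logic the benchmark found missing.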

## Model Recommendations

| Use Case | Top Picks |
| --- | --- |
| All-rounder | Gemini 2.5 Pro, Qwen3-Coder-Plus |
| UI/UX focus | Qwen3-Coder-Plus, GLM-4.5 |
| Game dev | Claude Sonnet 4, Kimi K2 |

## Key Features

- **Task diversity:** Covers frontend design, game logic, and interactive elements.
- **Actionable insights:** Direct model recommendations for different needs.
- **Transparent platform:** All tests were run via 胜算云「AI群聊」.

## About

This repository presents a comprehensive evaluation of mainstream LLMs for programming tasks.
