AI Model Comparison

Compare AI models with real benchmark data, not marketing claims. This repo tracks the latest model releases with pricing, speed, and quality metrics — updated monthly.

Quick Comparison Table (June 2026)

Model	Provider	Input $/M	Output $/M	Latency	Coding	Reasoning
DeepSeek V4 Flash	DeepSeek	$0.14	$0.28	420ms	9.2/10	8.8/10
GPT-4o	OpenAI	$2.50	$10.00	680ms	9.4/10	9.1/10
Claude 4 Sonnet	Anthropic	$3.00	$15.00	750ms	9.1/10	9.3/10
Qwen3-32B	Alibaba	$0.10	$0.35	510ms	8.9/10	8.5/10
Kimi K2.5	Moonshot	$0.50	$1.00	560ms	8.7/10	8.6/10
GLM-5	Zhipu	$0.40	$1.20	530ms	8.5/10	8.4/10

Why This Comparison Matters

Most AI model comparisons use synthetic benchmarks (MMLU, HumanEval) that don't reflect real-world usage. We test models on actual developer tasks: building APIs, debugging code, writing documentation, and solving business problems.

Testing Methodology

Coding: 200 tasks across Python, JavaScript, Go, Rust
Reasoning: Logic puzzles, math problems, business case analysis
Each model gets 3 attempts per task; we keep the best result
For a given task, every model receives the identical prompt for fair comparison
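
The best-of-3 rule above can be sketched as follows. This is a minimal illustration, not the harness used for the published numbers: `best_of_n` and `run_attempt` are hypothetical names, and the scores are made-up stand-ins, since the scoring rubric is not described here.

```python
from typing import Callable


def best_of_n(run_attempt: Callable[[int], float], attempts: int = 3) -> float:
    """Run one task `attempts` times and keep the highest score."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    return max(run_attempt(i) for i in range(attempts))


# Stand-in scores for three attempts at a single task (made-up numbers).
fake_scores = [6.5, 9.0, 7.5]
best = best_of_n(lambda i: fake_scores[i])
print(best)
```

Taking the maximum rewards a model's best case. Averaging the attempts instead would measure consistency, so the two choices can rank models differently.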

Pricing Data Source

Pricing data is collected from official provider websites and updated monthly. For the most current pricing, use the Global API pricing page.
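
To turn per-million-token prices into a per-request cost, the arithmetic is (input tokens × input price + output tokens × output price) / 1,000,000. The sketch below uses prices copied from the June 2026 table; the function name `request_cost` and the example token counts are assumptions for illustration.

```python
# USD per million tokens as (input, output), copied from the June 2026 table.
PRICES_PER_MILLION = {
    "DeepSeek V4 Flash": (0.14, 0.28),
    "GPT-4o": (2.50, 10.00),
    "Claude 4 Sonnet": (3.00, 15.00),
    "Qwen3-32B": (0.10, 0.35),
    "Kimi K2.5": (0.50, 1.00),
    "GLM-5": (0.40, 1.20),
}


def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of one request at the listed prices."""
    input_price, output_price = PRICES_PER_MILLION[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


# Illustrative workload: 2,000 input tokens and 500 output tokens per request.
for model in PRICES_PER_MILLION:
    print(f"{model}: ${request_cost(model, 2_000, 500):.6f}")
```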

How to Run Your Own Comparisons

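import os

from openai import OpenAI

# Model identifiers as exposed by the OpenAI-compatible endpoint below.
models_to_test = [
    "deepseek-ai/DeepSeek-V4-Flash",
    "qwen/qwen3-32b",
    "moonshot/kimi-k2.5",
]

# Read the key from the environment when available. GLOBAL_API_KEY is an
# assumed variable name; the placeholder string is the fallback.
client = OpenAI(
    base_url="https://global-apis.com/v1",
    api_key=os.environ.get("GLOBAL_API_KEY", "your-global-api-key"),
)

test_prompt = "Write a Python function that implements an LRU cache with O(1) operations."

for model in models_to_test:
    # Send the same prompt to every model so the outputs are comparable.
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": test_prompt}],
        max_tokens=500,
    )
    # message.content can be None, so fall back to an empty string.
    content = response.choices[0].message.content or ""
    print(f"{model}: {len(content)} chars")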
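
The loop above reports only response length. To also record wall-clock latency, a small timing wrapper is one option. This is a sketch under assumptions: `timed_call` and `fake_request` are hypothetical helpers, and the stand-in request sleeps instead of calling the API so the example runs offline.

```python
import time
from typing import Any, Callable, Tuple


def timed_call(fn: Callable[[], Any]) -> Tuple[Any, float]:
    """Call fn() and return (result, elapsed time in milliseconds)."""
    start = time.perf_counter()
    result = fn()
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    return result, elapsed_ms


def fake_request() -> str:
    """Stand-in for a real API call so the sketch runs without network access."""
    time.sleep(0.05)
    return "response text"


result, elapsed_ms = timed_call(fake_request)
print(f"{len(result)} chars in {elapsed_ms:.0f} ms")
```

In the comparison loop, the same wrapper could take `lambda: client.chat.completions.create(...)`. This measures the full round trip including generation time, which may not be how the Latency column in the table was measured.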

Chinese Model Comparison

Chinese AI models (DeepSeek, Qwen, Kimi, GLM) now score within about one point of GPT-4o and Claude 4 Sonnet on our coding and reasoning tests, at roughly 5x to 55x lower list prices (per the Quick Comparison Table):

Model	Chinese	English	Math	Code	Output $/M
DeepSeek V4 Flash	9.5	9.2	8.8	9.2	$0.28/M
Qwen3-32B	9.3	8.9	8.5	8.9	$0.35/M
Kimi K2.5	9.4	8.7	8.6	8.7	$1.00/M
GLM-5	9.2	8.5	8.4	8.5	$1.20/M
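
The cost gap can be checked directly from the output prices listed in the two tables. The sketch below divides each Western model's output price by each Chinese model's output price; `price_ratio` is a hypothetical helper name, and the comparison covers output tokens only.

```python
# Output price in USD per million tokens, copied from the tables in this README.
WESTERN_OUTPUT_PRICE = {"GPT-4o": 10.00, "Claude 4 Sonnet": 15.00}
CHINESE_OUTPUT_PRICE = {
    "DeepSeek V4 Flash": 0.28,
    "Qwen3-32B": 0.35,
    "Kimi K2.5": 1.00,
    "GLM-5": 1.20,
}


def price_ratio(expensive: float, cheap: float) -> float:
    """Return how many times more expensive the first price is than the second."""
    return expensive / cheap


for western, western_price in WESTERN_OUTPUT_PRICE.items():
    for chinese, chinese_price in CHINESE_OUTPUT_PRICE.items():
        ratio = price_ratio(western_price, chinese_price)
        print(f"{western} vs {chinese}: {ratio:.1f}x")
```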

Contributing

Have benchmark results to add? Open a PR with your test data and methodology. We accept results from any provider as long as the testing methodology is documented.

Links

Global API — one API key for 184+ models
Global API Pricing — real-time model pricing
Global API Docs — API reference and guides
