OpenClaw Free-Tier Model Evaluation
Comprehensive benchmark of OpenRouter free-tier LLMs for practical applications
Last Updated: February 24, 2026
We evaluated 8 free-tier models on OpenRouter across 11 practical activities (2 models were discontinued during testing: DeepSeek R1, OpenRouter Free). Here are the results for the 6 working models.
Rank
Model
Score
Pass Rate
Speed
Best For
1
Nemotron 30B
8.60
10/11
0.5s
Overall best - fast, consistent, excellent all-rounder
2
Step 3.5 Flash
8.57
11/11
2.9s
100% reliability, best Thai support
3
Trinity Mini
8.49
10/11
0.5s
Fastest, excellent coding & translation
4
Gemma 3 27B
8.44
9/11
1.1s
Great translation & Thai writing
5
Nemotron VL 12B
8.41
7/11
0.5s
Vision-capable, fast
6
Gemma 3 12B
8.40
10/11
4.4s
Solid all-rounder, slower
β οΈ Discontinued Models: DeepSeek R1 and OpenRouter Free were also tested but returned 404 errors (no longer available on free tier).
General Purpose: Nemotron 30B (best balance of speed and quality)
Thai Language: Step 3.5 Flash (best Thai support and reliability)
Coding Tasks: Trinity Mini or Nemotron 30B (both score 9.0)
Speed-Critical: Trinity Mini or Nemotron 30B (both ~0.5s response)
100% Reliability: Step 3.5 Flash (only model passing all 11 tests)
πΉπ ΰΈͺΰΈ£ΰΈΈΰΈΰΈ ΰΈ²ΰΈ©ΰΈ²ΰΉΰΈΰΈ’
OpenClaw ΰΈΰΈ·ΰΈΰΈΰΈ°ΰΉΰΈ£?
OpenClaw ΰΉΰΈΰΉΰΈΰΉΰΈΰΈ£ΰΉΰΈΰΈΰΈΰΉ AI Gateway ΰΈͺΰΉΰΈ§ΰΈΰΈΰΈΈΰΈΰΈΰΈ₯ΰΈΰΈ΅ΰΉΰΈΰΈ±ΰΈΰΈΰΈ²ΰΉΰΈΰΈ·ΰΉΰΈΰΈΰΈΰΈͺΰΈΰΈΰΉΰΈ₯ΰΈ°ΰΉΰΈΰΈ£ΰΈ΅ΰΈ’ΰΈΰΉΰΈΰΈ΅ΰΈ’ΰΈΰΈΰΈ£ΰΈ°ΰΈͺΰΈ΄ΰΈΰΈΰΈ΄ΰΈ ΰΈ²ΰΈΰΈΰΈΰΈ Large Language Models (LLMs) ΰΈΰΈΰΉΰΈΰΈ₯ΰΈΰΈΰΈΰΈ£ΰΉΰΈ‘ OpenRouter ΰΉΰΈΰΈ’ΰΉΰΈΰΈΰΈ²ΰΈ° Free-tier models ΰΈΰΈ΅ΰΉΰΉΰΈΰΉΰΈΰΈ²ΰΈΰΉΰΈΰΉΰΈΰΈ£ΰΈ΅
ΰΈΰΈ₯ΰΈΰΈ²ΰΈ£ΰΈΰΈΰΈͺΰΈΰΈΰΉΰΈΰΈ’ΰΈͺΰΈ£ΰΈΈΰΈ
ΰΉΰΈ£ΰΈ²ΰΈΰΈΰΈͺΰΈΰΈ 8 ΰΉΰΈ‘ΰΉΰΈΰΈ₯ ΰΈΰΈ OpenRouter ΰΈΰΈ£ΰΈ΅ ΰΈΰΉΰΈ²ΰΈ 11 ΰΈΰΈ΄ΰΈΰΈΰΈ£ΰΈ£ΰΈ‘ (2 ΰΉΰΈ‘ΰΉΰΈΰΈ₯ discontinued ΰΈ£ΰΈ°ΰΈ«ΰΈ§ΰΉΰΈ²ΰΈΰΈΰΈ²ΰΈ£ΰΈΰΈΰΈͺΰΈΰΈ) ΰΉΰΈΰΈ·ΰΉΰΈΰΈ«ΰΈ²ΰΉΰΈ‘ΰΉΰΈΰΈ₯ΰΈΰΈ΅ΰΉΰΉΰΈ«ΰΈ‘ΰΈ²ΰΈ°ΰΈΰΈ±ΰΈΰΉΰΈΰΈΰΈΰΈ₯ΰΈ΄ΰΉΰΈΰΈΰΈ±ΰΈΰΈΰΈΰΈΰΈΰΈΈΰΈ
ΰΈΰΈ±ΰΈΰΈΰΈ±ΰΈ
ΰΉΰΈ‘ΰΉΰΈΰΈ₯
ΰΈΰΈ°ΰΉΰΈΰΈ
ΰΈΰΈ§ΰΈ²ΰΈ‘ΰΉΰΈ£ΰΉΰΈ§
ΰΉΰΈ«ΰΈ‘ΰΈ²ΰΈ°ΰΈͺΰΈ³ΰΈ«ΰΈ£ΰΈ±ΰΈ
π₯
Nemotron 30B
8.60
0.5 ΰΈ§ΰΈ΄ΰΈΰΈ²ΰΈΰΈ΅
ΰΉΰΈΰΉΰΈΰΈ±ΰΉΰΈ§ΰΉΰΈ ΰΈΰΈ΅ΰΈΰΈ΅ΰΉΰΈͺΰΈΈΰΈΰΉΰΈΰΈΰΈΰΈ£ΰΈΰΈ§ΰΈΰΈΰΈ£
π₯
Step 3.5 Flash
8.57
2.9 ΰΈ§ΰΈ΄ΰΈΰΈ²ΰΈΰΈ΅
ΰΈ ΰΈ²ΰΈ©ΰΈ²ΰΉΰΈΰΈ’ ΰΉΰΈͺΰΈΰΈ΅ΰΈ’ΰΈ£ΰΈΰΈ΅ΰΉΰΈͺΰΈΈΰΈ 100%
π₯
Trinity Mini
8.49
0.5 ΰΈ§ΰΈ΄ΰΈΰΈ²ΰΈΰΈ΅
ΰΉΰΈ£ΰΉΰΈ§ΰΈΰΈ΅ΰΉΰΈͺΰΈΈΰΈ ΰΉΰΈΰΈ΅ΰΈ’ΰΈΰΉΰΈΰΉΰΈ/ΰΉΰΈΰΈ₯ΰΈΰΈ΅ΰΈ‘ΰΈ²ΰΈ
4
Gemma 3 27B
8.44
1.1 ΰΈ§ΰΈ΄ΰΈΰΈ²ΰΈΰΈ΅
ΰΉΰΈΰΈ₯ΰΈ ΰΈ²ΰΈ©ΰΈ² ΰΉΰΈΰΈ΅ΰΈ’ΰΈΰΉΰΈΰΈ’ΰΈΰΈ΅
5
Nemotron VL 12B
8.41
0.5 ΰΈ§ΰΈ΄ΰΈΰΈ²ΰΈΰΈ΅
ΰΈ£ΰΈΰΈΰΈ£ΰΈ±ΰΈ Vision (ΰΈ£ΰΈΉΰΈΰΈ ΰΈ²ΰΈ)
6
Gemma 3 12B
8.40
4.4 ΰΈ§ΰΈ΄ΰΈΰΈ²ΰΈΰΈ΅
ΰΉΰΈΰΉΰΉΰΈΰΉΰΈΰΈ΅ ΰΉΰΈΰΉΰΈΰΉΰΈ²ΰΈΰΈ΅ΰΉΰΈͺΰΈΈΰΈ
ΰΉΰΈΰΈ°ΰΈΰΈ³ΰΈΰΈ²ΰΈ‘ΰΈΰΈ²ΰΈ£ΰΉΰΈΰΉΰΈΰΈ²ΰΈ
ΰΈΰΈ²ΰΈ£ΰΉΰΈΰΉΰΈΰΈ²ΰΈ
ΰΉΰΈ‘ΰΉΰΈΰΈ₯ΰΈΰΈ΅ΰΉΰΉΰΈΰΈ°ΰΈΰΈ³
ΰΉΰΈ«ΰΈΰΈΈΰΈΰΈ₯
Chatbot ΰΈΰΈ±ΰΉΰΈ§ΰΉΰΈ
Nemotron 30B
ΰΈΰΈ΅ΰΈΰΈ΅ΰΉΰΈͺΰΈΈΰΈΰΉΰΈΰΈ’ΰΈ£ΰΈ§ΰΈ‘ ΰΈΰΈΰΈΰΉΰΈ£ΰΉΰΈ§
ΰΉΰΈΰΈΰΈ ΰΈ²ΰΈ©ΰΈ²ΰΉΰΈΰΈ’
Step 3.5 Flash
ΰΈ£ΰΈΰΈΰΈ£ΰΈ±ΰΈΰΈ ΰΈ²ΰΈ©ΰΈ²ΰΉΰΈΰΈ’ΰΈΰΈ΅ΰΈΰΈ΅ΰΉΰΈͺΰΈΈΰΈ ΰΉΰΈͺΰΈΰΈ΅ΰΈ’ΰΈ£ 100%
Coding Assistant
Trinity Mini
ΰΉΰΈ£ΰΉΰΈ§ΰΈΰΈ΅ΰΉΰΈͺΰΈΈΰΈ ΰΉΰΈΰΈ΅ΰΈ’ΰΈΰΉΰΈΰΉΰΈΰΉΰΈΰΉΰΈΰΈ΅ΰΈ‘ΰΈ²ΰΈ (9.0)
ΰΈΰΉΰΈΰΈΰΈΰΈ²ΰΈ£ΰΈΰΈ§ΰΈ²ΰΈ‘ΰΉΰΈ£ΰΉΰΈ§
Trinity Mini / Nemotron 30B
ΰΈΰΈΰΈΰΉΰΈ 0.5 ΰΈ§ΰΈ΄ΰΈΰΈ²ΰΈΰΈ΅
Production
Step 3.5 Flash
ΰΈΰΉΰΈ²ΰΈΰΈΰΈ²ΰΈ£ΰΈΰΈΰΈͺΰΈΰΈ 11/11 (100%)
ΰΉΰΈ‘ΰΉΰΈΰΈ₯
ΰΈͺΰΈ£ΰΈΈΰΈΰΈΰΉΰΈΰΈΰΈ§ΰΈ²ΰΈ‘
ΰΉΰΈΰΈ΅ΰΈ’ΰΈΰΈͺΰΈ£ΰΉΰΈ²ΰΈΰΈͺΰΈ£ΰΈ£ΰΈΰΉ
ΰΈΰΈ³ΰΈΰΈ²ΰΈ‘ΰΈΰΈ³ΰΈͺΰΈ±ΰΉΰΈ
ΰΉΰΈΰΈ₯ΰΈ΅ΰΉΰΈ’ΰΉΰΈΰΈ’
Step 3.5 Flash
7.60
8.50
8.80
8.30
Gemma 3 27B
7.60
8.50
8.80
8.30
Nemotron 30B
7.60
ERR
8.80
8.20
Nemotron VL 12B
6.90
8.30
8.80
8.00
Gemma 3 12B
7.10
8.10
8.60
7.93
Trinity Mini
6.35
7.90
8.80
7.68
π‘ ΰΈͺΰΈ£ΰΈΈΰΈ: ΰΈΰΉΰΈ²ΰΈΰΉΰΈΰΈΰΈΰΈ²ΰΈ£ΰΉΰΈΰΉΰΈΰΈ²ΰΈΰΈ ΰΈ²ΰΈ©ΰΈ²ΰΉΰΈΰΈ’ ΰΉΰΈΰΈ°ΰΈΰΈ³ Step 3.5 Flash ΰΈ«ΰΈ£ΰΈ·ΰΈ Gemma 3 27B
ΰΈ§ΰΈ΄ΰΈΰΈ΅ΰΈΰΈ²ΰΈ£ΰΈΰΈΰΈͺΰΈΰΈ
ΰΈΰΈΰΈͺΰΈΰΈΰΈΰΉΰΈ²ΰΈ: OpenRouter API (Free-tier)
ΰΈΰΈ΄ΰΈΰΈΰΈ£ΰΈ£ΰΈ‘: 11 ΰΈΰΈ΄ΰΈΰΈΰΈ£ΰΈ£ΰΈ‘ (ΰΈΰΉΰΈ²ΰΈΰΉΰΈΰΈΰΈͺΰΈ²ΰΈ£, ΰΉΰΈΰΈ΅ΰΈ’ΰΈΰΈ£ΰΈ²ΰΈ’ΰΈΰΈ²ΰΈ, ΰΈ§ΰΈ΄ΰΉΰΈΰΈ£ΰΈ²ΰΈ°ΰΈ«ΰΉΰΈΰΈ²ΰΈ£ΰΉΰΈΰΈ΄ΰΈ, ΰΉΰΈΰΈ΅ΰΈ’ΰΈΰΉΰΈΰΉΰΈ, ΰΉΰΈΰΈ₯ΰΈ ΰΈ²ΰΈ©ΰΈ², ΰΉΰΈΰΈ΅ΰΈ’ΰΈΰΈͺΰΈ£ΰΉΰΈ²ΰΈΰΈͺΰΈ£ΰΈ£ΰΈΰΉ, ΰΈ ΰΈ²ΰΈ©ΰΈ²ΰΉΰΈΰΈ’ 3 ΰΈΰΈ΄ΰΈΰΈΰΈ£ΰΈ£ΰΈ‘)
ΰΈΰΈ°ΰΉΰΈΰΈΰΉΰΈΰΉΰΈ‘: 10 (ΰΈΰΈ³ΰΈΰΈ§ΰΈΰΈΰΈ²ΰΈ Accuracy 30%, Completeness 25%, Coherence 20%, Relevance 15%, Speed 10%)
ΰΈΰΉΰΈΰΈ‘ΰΈΉΰΈ₯ΰΉΰΈΰΈ΄ΰΉΰΈ‘ΰΉΰΈΰΈ΄ΰΈ‘: ΰΈ£ΰΈ²ΰΈ’ΰΈΰΈ²ΰΈΰΈΰΈΰΈ±ΰΈΰΉΰΈΰΉΰΈ‘ | ΰΈΰΉΰΈΰΈ‘ΰΈΉΰΈ₯ΰΈΰΈ΄ΰΈ CSV
Score Distribution (out of 10)
ββββββββββββββββββββββββββββββββββββββββββ
Nemotron 30B ββββββββββββββββββββββ 8.60
Step 3.5 Flash ββββββββββββββββββββββ 8.57
Trinity Mini ββββββββββββββββββββββ 8.49
Gemma 3 27B ββββββββββββββββββββββ 8.44
Nemotron VL 12B ββββββββββββββββββββββ 8.41
Gemma 3 12B ββββββββββββββββββββββ 8.40
ββββββββββββββββββββββββββββββββββββββββββ
Activity-by-Activity Winners
Activity
Winner
Score
Runner-Up
Score
Document Reading
Trinity Mini
8.55
Nemotron VL 12B
8.15
Document Writing
Trinity Mini
9.00
Step 3.5 Flash
9.00
Financial Analysis
Trinity Mini
9.00
Step 3.5 Flash
9.00
Text Analysis
Step 3.5 Flash
8.75
Nemotron VL 12B
8.75
Code Generation
Trinity Mini
9.00
Step 3.5 Flash
9.00
Translation (ENβTH)
Trinity Mini
9.00
Gemma 3 27B
8.80
Creative Writing
Trinity Mini
8.75
Step 3.5 Flash
8.75
Instruction Following
Trinity Mini
8.55
Step 3.5 Flash
8.55
Thai Summarization
Step 3.5 Flash
7.60
Nemotron 30B
7.60
Thai Creative Writing
Step 3.5 Flash
8.50
Gemma 3 27B
8.50
Thai Instruction
Trinity Mini
8.80
Step 3.5 Flash
8.80
Our evaluation uses a weighted scoring system designed for practical applications:
Metric
Weight
Description
Accuracy
30%
Factual correctness, no hallucinations
Completeness
25%
All requirements addressed
Coherence
20%
Logical structure, readability
Relevance
15%
Staying on-topic, no tangents
Speed
10%
Response time (<5s = 10 points)
API: OpenRouter Chat Completions (text-only)
Temperature: 0.3 (deterministic outputs)
Max Tokens: 2048
Timeout: 120 seconds
Rate Limiting: 8-second delays between requests
We designed 11 activities covering common LLM use cases:
Category
Activities
Document Processing
Reading (summarization), Writing (business reports)
Analysis
Financial data, Text sentiment
Code
Algorithm implementation
Language
Translation (ENβTH)
Creative
Story writing
Compliance
Instruction following
Thai-Specific
Summarization, Creative writing, Instruction following
Document Reading (Summarization)
Model
Score
Time
Summary Quality
Trinity Mini
8.55
0.6s
Concise, accurate 2-sentence summary
Nemotron VL 12B
8.15
0.7s
Good coverage of key elements
Nemotron 30B
8.15
0.6s
Clear, captures main events
Gemma 3 27B
8.15
1.4s
Accurate, well-structured
Gemma 3 12B
8.15
3.6s
Complete but slower
Step 3.5 Flash
7.95
6.5s
Good but verbose
Document Writing (Business Report)
Model
Score
Time
Report Quality
Trinity Mini
9.00
0.5s
Professional, well-structured
Step 3.5 Flash
9.00
0.8s
Executive-ready format
Gemma 3 27B
9.00
1.2s
Clear metrics and recommendations
Nemotron 30B
9.00
0.6s
Polished business tone
Gemma 3 12B
9.00
3.2s
Complete but slower
Model
Score
Time
Analysis Quality
Trinity Mini
9.00
0.5s
Accurate calculations, clear insights
Step 3.5 Flash
9.00
1.3s
Structured analysis with recommendations
Gemma 3 27B
9.00
1.8s
Detailed breakdown
Nemotron 30B
9.00
0.5s
Fast, accurate profit margin
Nemotron VL 12B
9.00
1.8s
Complete quarterly analysis
Model
Score
Time
Code Quality
Trinity Mini
9.00
0.7s
Clean, typed, documented
Step 3.5 Flash
9.00
2.9s
Complete with docstring
Nemotron 30B
9.00
0.5s
Concise, well-typed
Gemma 3 12B
9.00
3.4s
Full implementation
Nemotron VL 12B
9.00
0.6s
Working solution
Translation (English β Thai)
Model
Score
Time
Translation Quality
Trinity Mini
9.00
0.4s
Natural, accurate both directions
Gemma 3 27B
8.80
1.1s
Good nuance preservation
Nemotron 30B
8.60
0.5s
Accurate, slight formality
Gemma 3 12B
8.60
3.5s
Correct but slower
Step 3.5 Flash
8.40
4.2s
Accurate, some awkward phrasing
Model
Summarization
Creative Writing
Instructions
Thai Avg
Step 3.5 Flash
7.60
8.50
8.80
8.30
Gemma 3 27B
7.60
8.50
8.80
8.30
Nemotron 30B
7.60
ERR
8.80
8.20
Nemotron VL 12B
6.90
8.30
8.80
8.00
Gemma 3 12B
7.10
8.10
8.60
7.93
Trinity Mini
6.35
7.90
8.80
7.68
Average Response Time
βββββββββββββββββββββββββββββββββββββββββββββββ
Trinity Mini β 0.5s ββββββββββββββββββββ FASTEST
Nemotron 30B β 0.5s ββββββββββββββββββββ
Nemotron VL 12B β 0.7s ββββββββββββββββββββ
Gemma 3 27B β 1.1s ββββββββββββββββββββ
Step 3.5 Flash β 2.9s ββββββββββββββββββββ
Gemma 3 12B β 4.4s ββββββββββββββββββββ SLOWEST
βββββββββββββββββββββββββββββββββββββββββββββββ
Model
Tests Passed
Failure Rate
Notable Failures
Step 3.5 Flash
11/11 (100%)
0%
None - most reliable
Nemotron 30B
10/11 (91%)
9%
Thai Creative Writing
Trinity Mini
10/11 (91%)
9%
Text Analysis
Gemma 3 12B
10/11 (91%)
9%
Financial Analysis
Gemma 3 27B
9/11 (82%)
18%
Text Analysis, Code Gen
Nemotron VL 12B
7/11 (64%)
36%
Writing, Translation, Creative, Instructions
Best Overall: Nemotron 30B
Highest average score (8.60)
Excellent speed (0.5s average)
Strong across all categories
Minor gap in Thai creative writing
Most Reliable: Step 3.5 Flash
Only model with 100% pass rate
Best Thai language support
Slightly slower (2.9s) but consistent
Ideal for production workloads
Best Value for Speed: Trinity Mini
Tied for fastest (0.5s)
Excellent coding and translation
Best for English-centric tasks
Thai summarization needs improvement
Thai Language: Step 3.5 Flash or Gemma 3 27B
Both excel at Thai creative writing (8.50)
Strong instruction following (8.80)
Better Thai summarization than others
openclaw-eval/
βββ README.md # This file
βββ reports/
β βββ 2026-02-24-evaluation.md # Full detailed report
βββ data/
β βββ scoresheet.csv # Raw scores data
βββ methodology/
βββ rubric.md # Evaluation criteria
If you use this evaluation in your work, please cite:
@misc {openclaw-eval2026 ,
title ={ OpenClaw Free-Tier Model Evaluation} ,
author ={ OpenClaw Project} ,
year ={ 2026} ,
url ={ https://github.com/bejranonda/openclaw-eval}
}
This evaluation data is released under CC BY 4.0 .
Generated by OpenClaw Auto-Evaluator v2