feat: add SimpleVQA benchmark task by Luodian · Pull Request #1184 · EvolvingLMMs-Lab/lmms-eval

Luodian · 2026-02-22T14:59:09Z

Summary

- add a new simplevqa task under lmms_eval/tasks/simplevqa with an auto-discovered YAML config and utils
- decode base64-encoded dataset images from m-a-p/SimpleVQA, format prompts, and score answers with normalized exact match
- configure public dataset loading with token: false so that an expired local HF token cannot break dataset downloads
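The summary items above can be sketched in Python. This is a minimal illustration, not the PR's utils.py. The field names image and question, the split argument, and the helper names are hypothetical. The post-prompt text comes from the sample logs, where ↵ appears to mark a newline. The YAML's token: false is shown here as the assumed equivalent of passing token=False to datasets.load_dataset.

```python
import base64
import io

# Assumptions (not confirmed by the PR): the base64 string lives in an "image"
# field and the question text in a "question" field; the post-prompt is appended
# after a newline (rendered as "↵" in the sample logs).
IMAGE_FIELD = "image"
QUESTION_FIELD = "question"
DEFAULT_POST_PROMPT = "Answer the question using a short phrase."


def load_simplevqa(split):
    """Load the public dataset without sending any locally cached HF token."""
    from datasets import load_dataset  # lazy import; requires the `datasets` package

    # token=False disables authentication, so an expired token on disk
    # cannot cause auth errors when fetching a public dataset.
    return load_dataset("m-a-p/SimpleVQA", split=split, token=False)


def decode_base64_bytes(b64_string):
    """Decode a base64 payload, tolerating an optional data-URI prefix."""
    if b64_string.startswith("data:") and "," in b64_string:
        b64_string = b64_string.split(",", 1)[1]
    return base64.b64decode(b64_string)


def simplevqa_doc_to_visual(doc):
    """Return the decoded image as a one-element list of RGB PIL images."""
    from PIL import Image  # lazy import; requires Pillow

    raw = decode_base64_bytes(doc[IMAGE_FIELD])
    return [Image.open(io.BytesIO(raw)).convert("RGB")]


def simplevqa_doc_to_text(doc, lmms_eval_specific_kwargs=None):
    """Question text, a newline, then the short-answer instruction."""
    kwargs = lmms_eval_specific_kwargs or {}
    post_prompt = kwargs.get("post_prompt", DEFAULT_POST_PROMPT)
    return f"{doc[QUESTION_FIELD].strip()}\n{post_prompt}"
```

The (doc, lmms_eval_specific_kwargs) signature mirrors the convention used by other lmms-eval tasks. Keeping the post-prompt overridable lets a model-specific YAML block change the instruction without touching the code.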

Validation

uv run pre-commit run --files lmms_eval/tasks/simplevqa/simplevqa.yaml lmms_eval/tasks/simplevqa/_default_template_simplevqa_yaml lmms_eval/tasks/simplevqa/utils.py
- passed (black, isort)
OPENAI_API_KEY="$OPENROUTER_API_KEY" OPENAI_API_BASE="https://openrouter.ai/api/v1" uv run python -m lmms_eval --model openai_compatible --model_args "model_version=bytedance-seed/seed-1.6-flash,max_retries=5,retry_backoff_s=2.0" --tasks simplevqa --batch_size 1 --limit 8 --log_samples --output_path ./logs/simplevqa_smoke_v2 --verbosity INFO
- score table: simplevqa exact_match = 0.625 ± 0.183 (limit=8 smoke)
- samples log: logs/simplevqa_smoke_v2/bytedance-seed__seed-1.6-flash/20260222_225744_samples_simplevqa.jsonl
- result json: logs/simplevqa_smoke_v2/bytedance-seed__seed-1.6-flash/20260222_225744_results.json
- verified JSONL entries have non-empty filtered_resps for all 8 samples
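The last validation check, that every logged sample has a non-empty filtered_resps, can be scripted. This sketch assumes each JSONL line is a JSON object with doc_id and filtered_resps keys, and that filtered_resps may be a nested list. The function names are hypothetical.

```python
import json


def _flatten(value):
    """Yield leaf values from arbitrarily nested lists/tuples."""
    if isinstance(value, (list, tuple)):
        for item in value:
            yield from _flatten(item)
    else:
        yield value


def find_empty_responses(lines):
    """Return (record_count, doc_ids_whose_filtered_resps_are_empty)."""
    count = 0
    empty_doc_ids = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        count += 1
        # Treat missing, None, or whitespace-only responses as empty
        texts = [str(v).strip() for v in _flatten(record.get("filtered_resps", [])) if v is not None]
        if not any(texts):
            empty_doc_ids.append(record.get("doc_id"))
    return count, empty_doc_ids


def check_samples_file(path):
    """Read a samples JSONL file and check every entry."""
    with open(path, encoding="utf-8") as f:
        return find_empty_responses(f)
```

Run against logs/simplevqa_smoke_v2/bytedance-seed__seed-1.6-flash/20260222_225744_samples_simplevqa.jsonl, a log matching the PR's description should yield a count of 8 and no empty doc_ids.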

Smoke Validation (limit=8)

Status: PASS (LMM-299 / simplevqa)

Output Table

Metric	Value
exact_match	0.625
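The ± 0.183 reported with exact_match in the validation run can be reproduced by hand. This assumes the harness reports the standard error of the mean using the Bessel-corrected (n - 1) sample standard deviation. Since 0.625 = 5/8, five of the eight samples scored 1.0. Which ones failed is not recorded here, but order does not affect the result.

```python
import math


def mean_and_stderr(scores):
    """Mean of per-sample scores and its standard error (sample std with n - 1)."""
    n = len(scores)
    mean = sum(scores) / n
    # Bessel-corrected sample variance of the 0/1 scores
    variance = sum((x - mean) ** 2 for x in scores) / (n - 1)
    return mean, math.sqrt(variance) / math.sqrt(n)


# 5 correct and 3 incorrect answers out of 8 smoke-test samples
scores = [1.0] * 5 + [0.0] * 3
mean, stderr = mean_and_stderr(scores)
print(f"exact_match = {mean:.3f} ± {stderr:.3f}")  # exact_match = 0.625 ± 0.183
```

With only 8 samples the standard error is large, so this run shows that the task works end to end rather than giving a stable accuracy estimate.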

Sample Output

Sample 1 (doc_id: 0)

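Input: 图中所示穴位所属的经脉是什么？ ↵ Answer the question using a short phrase.
(English: "Which meridian does the acupoint shown in the image belong to?")
Model Output: 足阳明胃经 (Stomach Meridian of Foot-Yangming)
Reference: 足阳明胃经 (Stomach Meridian of Foot-Yangming)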
Scores: exact_match = 1.0
Tokens: output=83, reasoning=79

Sample 2 (doc_id: 1)

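Input: 图中这种中药药材叫什么？ ↵ Answer the question using a short phrase.
(English: "What is this Chinese medicinal material in the image called?")
Model Output: 黄柏 (Huangbai, Phellodendron bark)
Reference: 黄柏 (Huangbai, Phellodendron bark)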
Scores: exact_match = 1.0
Tokens: output=60, reasoning=59
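The samples above score exact_match = 1.0 because the normalized output equals the normalized reference. The PR does not say which normalization utils.py applies. This sketch assumes a common recipe: NFKC folding of full-width characters, lowercasing, removing punctuation in any script, and collapsing whitespace. The answer field name and the function names are hypothetical.

```python
import re
import unicodedata


def normalize_answer(text):
    """Assumed normalization: NFKC, lowercase, drop punctuation, collapse spaces."""
    # NFKC folds full-width forms (e.g. "ＡＢＣ", "？") to their ASCII equivalents
    text = unicodedata.normalize("NFKC", text).lower()
    # Remove every character whose Unicode category is punctuation (P*),
    # which covers CJK marks such as "。" as well as ASCII ones
    text = "".join(ch for ch in text if not unicodedata.category(ch).startswith("P"))
    return re.sub(r"\s+", " ", text).strip()


def exact_match(prediction, references):
    """1.0 if the normalized prediction equals any normalized reference, else 0.0."""
    if isinstance(references, str):
        references = [references]
    pred = normalize_answer(prediction)
    return float(any(pred == normalize_answer(ref) for ref in references))


def simplevqa_process_results(doc, results):
    """Score the first model response against a hypothetical "answer" field."""
    return {"exact_match": exact_match(results[0], doc["answer"])}
```

Stripping punctuation lets a trailing "。" or "?" in a short-phrase answer still count as correct. Because the sketch does no segmentation, partial matches such as a longer phrase that contains the reference still score 0.0.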

Test Params

uv run python -m lmms_eval --model openai_compatible --model_args "model_version=bytedance-seed/seed-1.6-flash" --tasks simplevqa --batch_size 1 --limit 8 --log_samples

feat: add simplevqa benchmark task

01f8949

Luodian merged commit 0116ba7 into dev-v0d7 Feb 23, 2026
2 checks passed

Luodian deleted the feat/lmm-299-simplevqa branch February 23, 2026 08:25

Luodian added a commit that referenced this pull request Feb 28, 2026

feat: add simplevqa benchmark task (#1184)

84b7602

Luodian merged 1 commit into dev-v0d7 from feat/lmm-299-simplevqa
