
feat: add SimpleVQA benchmark task #1184

Merged
Luodian merged 1 commit into dev-v0d7 from feat/lmm-299-simplevqa on Feb 23, 2026


Luodian commented Feb 22, 2026

Summary

  • add a new simplevqa task under lmms_eval/tasks/simplevqa, with an auto-discovered YAML config and utils
  • decode base64-encoded dataset images from m-a-p/SimpleVQA, format prompts, and score with normalized exact match
  • load the public dataset with token: false so that an expired local HF token cannot break dataset loading
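The image-decoding and prompt-formatting steps above can be sketched with the standard library alone. This is a minimal sketch, not the code in lmms_eval/tasks/simplevqa/utils.py: the helper names and the "question" field name are hypothetical, tolerance for a data-URI prefix is an assumption about the dataset encoding, and the post-prompt string is copied from the sample inputs shown later in this PR.

```python
import base64
import binascii

# Post-prompt exactly as it appears in this PR's sample inputs.
POST_PROMPT = "Answer the question using a short phrase."


def decode_image_bytes(encoded: str) -> bytes:
    """Decode a base64 image string into raw bytes (hypothetical helper).

    Assumption: the field may carry a data-URI prefix such as
    "data:image/png;base64," and may contain line breaks.
    """
    if encoded.startswith("data:") and "," in encoded:
        encoded = encoded.split(",", 1)[1]
    # Drop any whitespace/newlines so strict validation only rejects real garbage.
    encoded = "".join(encoded.split())
    try:
        return base64.b64decode(encoded, validate=True)
    except binascii.Error as exc:
        raise ValueError("image field is not valid base64") from exc


def format_prompt(doc: dict, post_prompt: str = POST_PROMPT) -> str:
    """Build the text prompt; "question" is a hypothetical field name.

    The newline between question and post-prompt matches the ↵ in the samples.
    """
    return f"{doc['question'].strip()}\n{post_prompt}"
```

In an lmms_eval task the decoded bytes would typically be wrapped with Pillow, for example Image.open(io.BytesIO(raw)).convert("RGB"), before being returned as the visual input.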

Validation

  • uv run pre-commit run --files lmms_eval/tasks/simplevqa/simplevqa.yaml lmms_eval/tasks/simplevqa/_default_template_simplevqa_yaml lmms_eval/tasks/simplevqa/utils.py
    • passed (black, isort)
  • OPENAI_API_KEY="$OPENROUTER_API_KEY" OPENAI_API_BASE="https://openrouter.ai/api/v1" uv run python -m lmms_eval --model openai_compatible --model_args "model_version=bytedance-seed/seed-1.6-flash,max_retries=5,retry_backoff_s=2.0" --tasks simplevqa --batch_size 1 --limit 8 --log_samples --output_path ./logs/simplevqa_smoke_v2 --verbosity INFO
    • score table: simplevqa exact_match = 0.625 ± 0.183 (limit=8 smoke)
    • samples log: logs/simplevqa_smoke_v2/bytedance-seed__seed-1.6-flash/20260222_225744_samples_simplevqa.jsonl
    • result json: logs/simplevqa_smoke_v2/bytedance-seed__seed-1.6-flash/20260222_225744_results.json
    • verified JSONL entries have non-empty filtered_resps for all 8 samples
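The reported ± 0.183 is consistent with 5 of 8 samples matching and a standard error computed from the sample standard deviation (n − 1 denominator) divided by √n. The sketch below reproduces that figure and repeats the filtered_resps check against a samples JSONL file. It assumes each JSONL line is a JSON object with a flat filtered_resps list of strings and a per-sample exact_match value; the helper names are hypothetical, and lmms_eval's own aggregation code may differ in detail.

```python
import json
import math
import statistics
from pathlib import Path


def mean_and_stderr(scores: list[float]) -> tuple[float, float]:
    """Mean and standard error of the mean (sample stdev with n-1, divided by sqrt(n))."""
    n = len(scores)
    if n == 0:
        raise ValueError("no scores to aggregate")
    mean = statistics.fmean(scores)
    if n < 2:
        return mean, float("nan")
    return mean, statistics.stdev(scores) / math.sqrt(n)


def load_scores(path, metric: str = "exact_match") -> list[float]:
    """Read a samples JSONL file, require non-empty filtered_resps, and collect scores.

    Assumes each entry has a flat "filtered_resps" list and a numeric metric field.
    """
    scores = []
    with Path(path).open(encoding="utf-8") as f:
        for lineno, line in enumerate(f, start=1):
            if not line.strip():
                continue  # tolerate blank lines
            entry = json.loads(line)
            resps = entry.get("filtered_resps") or []
            if not any(str(r).strip() for r in resps):
                raise ValueError(f"line {lineno}: empty filtered_resps")
            scores.append(float(entry[metric]))
    return scores
```

For example, load_scores("logs/simplevqa_smoke_v2/bytedance-seed__seed-1.6-flash/20260222_225744_samples_simplevqa.jsonl") followed by mean_and_stderr(...) should agree with the score table if the assumed field layout holds.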

Smoke Validation (limit=8)

Status: PASS (LMM-299 / simplevqa)

Output Table

| Metric | Value |
|---|---|
| exact_match | 0.625 |

Sample Output

Sample 1 (doc_id: 0)

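  • Input: 图中所示穴位所属的经脉是什么? ↵ Answer the question using a short phrase. (English: "Which meridian does the acupoint shown in the image belong to?")
  • Model Output: 足阳明胃经 (Stomach Meridian of Foot-Yangming)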
  • Reference: 足阳明胃经
  • Scores: exact_match = 1.0
  • Tokens: output=83, reasoning=79

Sample 2 (doc_id: 1)

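  • Input: 图中这种中药药材叫什么? ↵ Answer the question using a short phrase. (English: "What is this Chinese medicinal herb shown in the image called?")
  • Model Output: 黄柏 (Huangbai, Phellodendron bark)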
  • Reference: 黄柏
  • Scores: exact_match = 1.0
  • Tokens: output=60, reasoning=59
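The PR scores answers with normalized exact match but does not spell out the normalization. One plausible scheme, shown here purely as an assumption, applies Unicode NFKC folding (so full-width characters compare equal to their ASCII forms), case-folds, drops punctuation (including CJK marks such as 。), and collapses whitespace before comparing. It reproduces the two 1.0 scores above, but it is a hypothetical sketch, not necessarily what utils.py does.

```python
import unicodedata


def normalize_answer(text: str) -> str:
    """Assumed normalization: NFKC, case-fold, strip punctuation, collapse whitespace."""
    text = unicodedata.normalize("NFKC", text).casefold()
    # Remove every character in a Unicode punctuation category (Pc, Pd, Ps, Pe, Pi, Pf, Po).
    text = "".join(ch for ch in text if not unicodedata.category(ch).startswith("P"))
    return " ".join(text.split())


def exact_match(prediction: str, reference: str) -> float:
    """Return 1.0 when the normalized strings are identical, else 0.0."""
    return 1.0 if normalize_answer(prediction) == normalize_answer(reference) else 0.0
```

Because all punctuation is stripped, an answer made only of punctuation normalizes to an empty string; a production scorer may want to treat empty predictions as automatic misses.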

Test Params

uv run python -m lmms_eval --model openai_compatible --model_args "model_version=bytedance-seed/seed-1.6-flash" --tasks simplevqa --batch_size 1 --limit 8 --log_samples

Luodian merged commit 0116ba7 into dev-v0d7 on Feb 23, 2026
2 checks passed
Luodian deleted the feat/lmm-299-simplevqa branch on February 23, 2026 08:25
Luodian added a commit that referenced this pull request Feb 28, 2026
