feat: integrate worldvqa benchmark task #1168

Merged
Luodian merged 1 commit into dev-v0d7 from feat/lmm-297-worldvqa on Feb 23, 2026
Luodian commented Feb 22, 2026

Summary

  • add a new worldvqa task backed by moonshotai/WorldVQA with base64 image decoding and exact-match scoring
  • add WorldQA compatibility aliases under lmms_eval/tasks/worldvqa/ (worldvqa_gen, worldvqa_mc, worldvqa_mc_ppl) to keep parity with existing worldqa task flows
  • update docs/current_tasks.md to document the new WorldVQA task and compatibility aliases
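The two mechanisms named in the first bullet, base64 image decoding and exact-match scoring, can be sketched as follows. This is a minimal illustration and not the code in lmms_eval/tasks/worldvqa/utils.py: the function names, the data-URI handling, and the normalization rule (trim whitespace, drop a trailing period, lowercase) are all assumptions made for the example.

```python
import base64


def decode_base64_image(b64_string: str) -> bytes:
    """Decode a base64-encoded image into raw bytes (hypothetical helper)."""
    # Assumption: the field may carry a "data:image/...;base64," prefix.
    if b64_string.startswith("data:") and "," in b64_string:
        b64_string = b64_string.split(",", 1)[1]
    return base64.b64decode(b64_string)


def normalize(text: str) -> str:
    """Assumed normalization: trim, drop one trailing period, lowercase."""
    text = text.strip()
    if text.endswith("."):
        text = text[:-1]
    return text.strip().lower()


def exact_match(prediction: str, reference: str) -> float:
    """Return 1.0 when the normalized strings are identical, else 0.0."""
    return 1.0 if normalize(prediction) == normalize(reference) else 0.0
```

The decoded bytes could then be handed to an image library (for example Pillow's `Image.open` on an `io.BytesIO` wrapper) before being passed to the model. Whether the real task normalizes answers at all before comparing them is not stated in this PR.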

Verification

  • uv run python -c "import subprocess, sys; output = subprocess.check_output([sys.executable, '-m', 'lmms_eval', '--tasks', 'list'], text=True); print('worldvqa_present', 'worldvqa' in output)"
  • HF_TOKEN='' HUGGING_FACE_HUB_TOKEN='' uv run python -m lmms_eval --model dummy_video_reader --model_args response=A,fail_on_missing=false --tasks worldvqa --limit 1 --batch_size 1 --output_path /tmp/worldvqa_smoke --log_samples
  • lsp_diagnostics clean for lmms_eval/tasks/worldvqa/utils.py (YAML files still emit expected !function unresolved-tag schema warnings)
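The first verification command packs a small script into a one-liner. A more readable equivalent might look like the sketch below. The helper names are hypothetical, and the whole-token check is an addition: the original one-liner uses a substring test, which would also succeed if only worldvqa_gen were registered. The layout of the task listing is not shown in this PR, so the tokenizer simply splits on anything that is not a letter, digit, underscore, or hyphen.

```python
import re
import subprocess
import sys


def list_tasks() -> str:
    """Run `python -m lmms_eval --tasks list` and return its stdout."""
    return subprocess.check_output(
        [sys.executable, "-m", "lmms_eval", "--tasks", "list"],
        text=True,
    )


def task_listed(listing: str, task_name: str) -> bool:
    """True when task_name appears as a whole token in the listing text."""
    # Assumption: task names consist of letters, digits, "_" and "-".
    tokens = re.findall(r"[A-Za-z0-9_\-]+", listing)
    return task_name in tokens
```

With lmms_eval installed, `print("worldvqa_present", task_listed(list_tasks(), "worldvqa"))` reproduces the check from the one-liner.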

Closes #1146

Smoke Validation (limit=8)

Status: PASS (LMM-297 / worldvqa)

Output Table

Metric      | Value
exact_match | 0.25
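If the reported exact_match is the mean of per-sample 0/1 scores (an assumption, though the per-sample values of 0.0 listed under Sample Output are consistent with it), then 0.25 at limit=8 implies two exact matches out of eight samples. The arithmetic, with a hypothetical helper name:

```python
def implied_correct(mean_score: float, n_samples: int) -> int:
    """Number of exact matches implied by a mean of 0/1 scores."""
    return round(mean_score * n_samples)


matches = implied_correct(0.25, 8)
print(f"{matches} of 8 samples matched exactly")
```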

Sample Output

Sample 1 (doc_id: 0)

  • Input: What breed of dog is in the picture? ↵ Answer briefly.
  • Model Output: Doberman Pinscher
  • Reference: Greek Hound
  • Scores: exact_match = 0.0
  • Tokens: output=95, reasoning=89

Sample 2 (doc_id: 1)

  • Input: What breed of dog is in the picture? ↵ Answer briefly.
  • Model Output: Finnish Lapphund.
  • Reference: European Russian Laika
  • Scores: exact_match = 0.0
  • Tokens: output=601, reasoning=594

Test Params

uv run python -m lmms_eval --model openai_compatible --model_args "model_version=bytedance-seed/seed-1.6-flash" --tasks worldvqa --batch_size 1 --limit 8.0 --log_samples

Luodian merged commit c3e3675 into dev-v0d7 on Feb 23, 2026
2 checks passed
Luodian deleted the feat/lmm-297-worldvqa branch on February 23, 2026 08:25
Luodian added a commit that referenced this pull request Feb 28, 2026