[Benchmark Backfill] Integrate CountBench into lmms-eval #1156

Merged
Luodian merged 2 commits into dev-v0d7 from feat/lmm-291-countbench on Feb 23, 2026

Contributor

Luodian commented on Feb 22, 2026

Summary

  • Add a new countbench task backed by vikhyatk/CountBenchQA (test split) with the CountBenchQA reference prompt style.
  • Implement task utilities for image/text conversion and source-aligned count normalization (number words -> numerals) with exact-match accuracy scoring.
  • Update docs/current_tasks.md to include CountBench in the image task catalog.
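
The count normalization and exact-match scoring described in the summary could look roughly like the sketch below. This is not the code from lmms_eval/tasks/countbench/utils.py: the names `normalize_count` and `exact_match`, the word table (zero through twenty), and the fallback behavior are assumptions made for illustration.

```python
import re

# Hypothetical word table; the real task utilities may cover a different range.
_WORDS = [
    "zero", "one", "two", "three", "four", "five", "six", "seven",
    "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
    "fifteen", "sixteen", "seventeen", "eighteen", "nineteen", "twenty",
]
_WORD_TO_NUM = {word: str(i) for i, word in enumerate(_WORDS)}


def normalize_count(text: str) -> str:
    """Map a free-form answer to a numeral string where possible."""
    cleaned = text.strip().lower()
    # Prefer an explicit digit run such as "3" in "There are 3."
    match = re.search(r"\d+", cleaned)
    if match:
        return str(int(match.group()))
    # Otherwise take the first known number word, e.g. "ten" -> "10".
    for token in re.findall(r"[a-z]+", cleaned):
        if token in _WORD_TO_NUM:
            return _WORD_TO_NUM[token]
    # Nothing recognizable: return the cleaned text so the comparison fails.
    return cleaned


def exact_match(prediction: str, reference) -> float:
    """Score 1.0 when the normalized prediction equals the normalized reference."""
    same = normalize_count(prediction) == normalize_count(str(reference))
    return 1.0 if same else 0.0
```

Under these assumptions, `exact_match("ten", 10)` and `exact_match("There are 3.", "3")` both score 1.0, while `exact_match("2", 10)` scores 0.0.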

Validation

  • uv run python -m lmms_eval --tasks list (verified countbench is listed)
  • uv run python -m lmms_eval --model dummy_video_reader --model_args response=2 --tasks countbench --limit 8 --batch_size 1
  • uv run pre-commit run --files docs/current_tasks.md lmms_eval/tasks/countbench/countbench.yaml lmms_eval/tasks/countbench/utils.py

Notes

  • YAML LSP reports unresolved !function tags for task YAMLs by default; this matches existing task files in the repo.

Smoke Validation (limit=8)

Status: PASS (LMM-291 / countbench)

Output Table

| Metric | Value |
| --- | --- |
| acc | 0.875 |
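
With exact-match scoring each sample contributes 0.0 or 1.0, so an acc of 0.875 at limit=8 corresponds to 7 of 8 samples matching. A minimal sketch of that aggregation follows; `mean_accuracy` is a hypothetical helper, and the position of the failing sample in the list is illustrative because the PR does not show which sample missed.

```python
def mean_accuracy(scores):
    """Average per-sample exact-match scores (each 0.0 or 1.0)."""
    if not scores:
        raise ValueError("no scores to aggregate")
    return sum(scores) / len(scores)


# Seven matches and one miss out of eight samples (order is illustrative).
smoke_scores = [1.0] * 7 + [0.0]
smoke_acc = mean_accuracy(smoke_scores)
print(smoke_acc)  # 0.875
```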

Sample Output

Sample 1 (doc_id: 0)

  • Input: Look at the image carefully and count the objects. Answer with just a number, without any additional text. How many headsets are there in the image?
  • Model Output: 10
  • Reference: 10
  • Scores: acc = 1.0
  • Tokens: output=153, reasoning=151

Sample 2 (doc_id: 1)

  • Input: Look at the image carefully and count the objects. Answer with just a number, without any additional text. How many light bulbs are there in the image?
  • Model Output: 3
  • Reference: 3
  • Scores: acc = 1.0
  • Tokens: output=57, reasoning=56
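
Both sample inputs share the same instruction prefix followed by the per-image question, so prompt assembly can be sketched as below. The split between prefix and question is inferred from the two samples above; `PROMPT_PREFIX` and `build_prompt` are hypothetical names, not the task's actual text-conversion function.

```python
# Instruction prefix as it appears in both sample inputs above.
PROMPT_PREFIX = (
    "Look at the image carefully and count the objects. "
    "Answer with just a number, without any additional text."
)


def build_prompt(question: str) -> str:
    """Prepend the shared counting instruction to a dataset question."""
    return f"{PROMPT_PREFIX} {question.strip()}"


print(build_prompt("How many light bulbs are there in the image?"))
```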

Test Params

uv run python -m lmms_eval --model openai_compatible --model_args "model_version=bytedance-seed/seed-1.6-flash" --tasks countbench --batch_size 1 --limit 8 --log_samples

@Luodian Luodian merged commit f927b68 into dev-v0d7 Feb 23, 2026
2 checks passed
@Luodian Luodian deleted the feat/lmm-291-countbench branch February 23, 2026 08:25
Luodian added a commit that referenced this pull request Feb 28, 2026
* feat: add CountBench task config and scoring

* docs: add CountBench to current task catalog