[Benchmark Backfill] Integrate CountBench into lmms-eval by Luodian · Pull Request #1156 · EvolvingLMMs-Lab/lmms-eval

Luodian · 2026-02-22T12:54:22Z

Summary

Add a new countbench task backed by vikhyatk/CountBenchQA (test split) with the CountBenchQA reference prompt style.
Implement task utilities for image/text conversion and source-aligned count normalization (number words -> numerals) with exact-match accuracy scoring.
Update docs/current_tasks.md to include CountBench in the image task catalog.
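
For reviewers who want a concrete picture of the normalization and scoring described above, here is a minimal sketch. It is not the code in lmms_eval/tasks/countbench/utils.py: the names `NUMBER_WORDS`, `normalize_count`, and `exact_match` are hypothetical, and the word range (zero to twenty) is an assumption, since the PR does not state which number words the task covers.

```python
import re

# Hypothetical mapping; the range actually handled by the task is not shown in this PR.
NUMBER_WORDS = {
    "zero": 0, "one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
    "six": 6, "seven": 7, "eight": 8, "nine": 9, "ten": 10,
    "eleven": 11, "twelve": 12, "thirteen": 13, "fourteen": 14,
    "fifteen": 15, "sixteen": 16, "seventeen": 17, "eighteen": 18,
    "nineteen": 19, "twenty": 20,
}


def normalize_count(text):
    """Return the first count in `text` as a numeral string, or None if absent."""
    # Scan digit runs and alphabetic words in order of appearance.
    for token in re.findall(r"\d+|[a-z]+", str(text).strip().lower()):
        if token.isdigit():
            return str(int(token))  # int() drops leading zeros, e.g. "007" -> "7"
        if token in NUMBER_WORDS:
            return str(NUMBER_WORDS[token])
    return None


def exact_match(prediction, reference):
    """Score 1.0 when both sides normalize to the same numeral, else 0.0."""
    pred = normalize_count(prediction)
    ref = normalize_count(reference)
    return 1.0 if pred is not None and pred == ref else 0.0
```

Under these assumptions, `exact_match("three", 3)` and `exact_match("There are 3 bulbs", "3")` both score 1.0, while a response with no recognizable count scores 0.0.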

Validation

uv run python -m lmms_eval --tasks list (verified countbench is listed)
uv run python -m lmms_eval --model dummy_video_reader --model_args response=2 --tasks countbench --limit 8 --batch_size 1
uv run pre-commit run --files docs/current_tasks.md lmms_eval/tasks/countbench/countbench.yaml lmms_eval/tasks/countbench/utils.py

Notes

YAML LSP reports unresolved !function tags for task YAMLs by default; this matches existing task files in the repo.

Smoke Validation (limit=8)

Status: PASS (LMM-291 / countbench)

Output Table

Metric	Value
acc	0.875
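
As a sanity check on the table, exact-match scores are binary, so with --limit 8 an acc of 0.875 corresponds to 7 of 8 samples scored correct. The sketch below assumes acc is the plain mean of per-sample scores; `mean_accuracy` is a hypothetical helper, not a function from lmms-eval.

```python
def mean_accuracy(scores):
    """Average per-sample exact-match scores; an empty run yields 0.0."""
    if not scores:
        return 0.0
    return sum(scores) / len(scores)


# Seven correct answers and one miss out of eight samples.
smoke_scores = [1.0] * 7 + [0.0]
print(mean_accuracy(smoke_scores))  # 0.875
```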

Sample Output

Sample 1 (doc_id: 0)

Input: Look at the image carefully and count the objects. Answer with just a number, without any additional text. How many headsets are there in the image?
Model Output: 10
Reference: 10
Scores: acc = 1.0
Tokens: output=153, reasoning=151

Sample 2 (doc_id: 1)

Input: Look at the image carefully and count the objects. Answer with just a number, without any additional text. How many light bulbs are there in the image?
Model Output: 3
Reference: 3
Scores: acc = 1.0
Tokens: output=57, reasoning=56
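
The two inputs above share a fixed instruction followed by the per-sample question. A minimal sketch of how such a prompt could be assembled is shown here; `build_prompt` is a hypothetical helper and the `question` field name is an assumption about the dataset schema, not something confirmed in this PR.

```python
# Instruction text copied from the sample inputs above.
PRE_PROMPT = (
    "Look at the image carefully and count the objects. "
    "Answer with just a number, without any additional text."
)


def build_prompt(doc, pre_prompt=PRE_PROMPT):
    """Join the fixed instruction and the sample's question with one space."""
    # "question" is an assumed field name for the per-sample question text.
    return f"{pre_prompt} {doc['question'].strip()}"


example = {"question": "How many headsets are there in the image?"}
print(build_prompt(example))
```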

Test Params

uv run python -m lmms_eval --model openai_compatible --model_args "model_version=bytedance-seed/seed-1.6-flash" --tasks countbench --batch_size 1 --limit 8 --log_samples

* feat: add CountBench task config and scoring
* docs: add CountBench to current task catalog

Luodian added 2 commits February 22, 2026 20:53

feat: add CountBench task config and scoring

39160cd

docs: add CountBench to current task catalog

c8dcab9

Luodian merged commit f927b68 into dev-v0d7 Feb 23, 2026
2 checks passed

Luodian deleted the feat/lmm-291-countbench branch February 23, 2026 08:25

Luodian added a commit that referenced this pull request Feb 28, 2026

[Benchmark Backfill] Integrate CountBench into lmms-eval (#1156)

0b71775

* feat: add CountBench task config and scoring
* docs: add CountBench to current task catalog

Luodian merged 2 commits into dev-v0d7 from feat/lmm-291-countbench