[Benchmark Backfill] Integrate FSC-147 into lmms-eval #1163

Merged: Luodian merged 1 commit into dev-v0d7 from feat/lmm-292-fsc-147 on Feb 23, 2026

Luodian commented on Feb 22, 2026

Summary

  • Add a new fsc147 benchmark task with YAML wiring and task utilities under lmms_eval/tasks/fsc147/.
  • Implement FSC-147 prompt construction and counting metrics (fsc147_exact_match, fsc147_mae) using the FSC147-derived Hugging Face dataset yifehuang97/CoCount-train-fsc147.
  • Register the benchmark in docs by adding FSC-147 to docs/current_tasks.md.
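
The counting metrics named in the summary can be illustrated with a minimal sketch. This is not the code from lmms_eval/tasks/fsc147/utils.py: the helper names `parse_count` and `score_count` are hypothetical, and the parsing rule (take the first integer in the response, and treat an unparseable response as a miss whose error equals the reference count) is an assumption made for illustration.

```python
import re
from typing import Optional

# Matches the first (optionally negative) integer in a string.
_INT_PATTERN = re.compile(r"-?\d+")


def parse_count(response: str) -> Optional[int]:
    """Return the first integer found in the response, or None if there is none."""
    match = _INT_PATTERN.search(response)
    return int(match.group()) if match else None


def score_count(response: str, reference: int) -> dict:
    """Per-sample scores: exact match (0/1) and absolute error."""
    predicted = parse_count(response)
    if predicted is None:
        # Assumption: an unparseable answer is a miss, with error equal to the reference.
        return {"fsc147_exact_match": 0.0, "fsc147_mae": float(abs(reference))}
    return {
        "fsc147_exact_match": 1.0 if predicted == reference else 0.0,
        "fsc147_mae": float(abs(predicted - reference)),
    }
```

Under these assumptions, a response of "12" against a reference of 13 scores an exact match of 0.0 and an absolute error of 1.0.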

Validation

  • uv run python -m lmms_eval --tasks list (confirmed fsc147 appears in available tasks)
  • uv run python -m lmms_eval --model dummy_video_reader --model_args response=0,fail_on_missing=false --tasks fsc147 --limit 8 --batch_size 1 --verbosity INFO (run succeeded; metrics emitted for fsc147_exact_match and fsc147_mae)
  • lsp_diagnostics reported no issues for lmms_eval/tasks/fsc147/utils.py

Linked Issue

Smoke Validation (limit=8)

Status: PASS (LMM-292 / fsc147)

Output Table

| Metric | Value |
| --- | --- |
| fsc147_exact_match | 0.25 |
| fsc147_mae | 1.875 |
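
As a quick sanity check on these numbers, the sketch below converts the reported means back into totals. It assumes each metric is an unweighted mean over the 8 samples selected by the limit; the aggregation actually used by the task is not shown in this PR.

```python
LIMIT = 8  # number of samples in the smoke run

reported = {"fsc147_exact_match": 0.25, "fsc147_mae": 1.875}

# Assumption: both metrics are plain means over the LIMIT samples.
exact_hits = reported["fsc147_exact_match"] * LIMIT
total_abs_error = reported["fsc147_mae"] * LIMIT

print(exact_hits, total_abs_error)  # 2.0 15.0
```

Under that assumption, the run had 2 exact matches out of 8 and a summed absolute counting error of 15.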

Sample Output

Sample 1 (doc_id: 0)

  • Input: How many peppers are there in the image? ↵ Answer with only an integer.
  • Model Output: 12
  • Reference: 13
  • Scores: fsc147_exact_match = 0.0 · fsc147_mae = 1.0
  • Tokens: output=1316, reasoning=1314

Sample 2 (doc_id: 1)

  • Input: How many bread rolls are there in the image? ↵ Answer with only an integer.
  • Model Output: 7
  • Reference: 8
  • Scores: fsc147_exact_match = 0.0 · fsc147_mae = 1.0
  • Tokens: output=678, reasoning=677
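
The two sample inputs share one prompt shape, which can be sketched as a template. The function name `build_prompt` and its `object_name` argument are hypothetical, and the sketch assumes the "↵" in the logged input stands for a newline; the task's real prompt construction lives in lmms_eval/tasks/fsc147/ and may differ.

```python
def build_prompt(object_name: str) -> str:
    """Build a counting question in the shape seen in the sample inputs."""
    return (
        f"How many {object_name} are there in the image?\n"
        "Answer with only an integer."
    )


print(build_prompt("peppers"))
```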

Test Params

uv run python -m lmms_eval --model openai_compatible --model_args "model_version=bytedance-seed/seed-1.6-flash" --tasks fsc147 --batch_size 1 --limit 8.0 --log_samples

Luodian merged commit e8cde29 into dev-v0d7 on Feb 23, 2026
2 checks passed
Luodian deleted the feat/lmm-292-fsc-147 branch on February 23, 2026 08:25
Luodian added a commit that referenced this pull request Feb 28, 2026