fix ocrbench v2 scoring regressions #1229

Merged
Luodian merged 1 commit into EvolvingLMMs-Lab:main from Luodian:codex/fix-ocrbench-v2-legacy
Mar 7, 2026

Conversation

Luodian (Contributor) commented Mar 7, 2026

Summary

  • Fix OCRBench v2 chart parsing so string ground truth is parsed from the answer instead of the prediction.
  • Make OCRBench v2 aggregation stateless across repeated runs in the same Python process.
  • Isolate text spotting temporary artifacts per call and remove fragile OCRBench v2 runtime debug leftovers.
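
The chart parsing fix in the first bullet can be illustrated with a minimal sketch. The helper names `parse_chart_payload` and `score_chart` and the key-matching score are hypothetical and do not reproduce the code in `utils.py`; the only point is that a string ground truth is decoded from the answer, never from the prediction.

```python
import json


def parse_chart_payload(value):
    """Hypothetical helper: return a dict from a dict or a JSON string, else None."""
    if isinstance(value, dict):
        return value
    try:
        parsed = json.loads(value)
    except (TypeError, ValueError):
        return None
    return parsed if isinstance(parsed, dict) else None


def score_chart(prediction, answer):
    """Toy score (an assumption): fraction of ground-truth keys the prediction matches."""
    # The ground truth is parsed from the answer. Parsing it from the
    # prediction would compare the prediction with itself and inflate scores.
    ground_truth = parse_chart_payload(answer)
    predicted = parse_chart_payload(prediction)
    if not ground_truth or predicted is None:
        return 0.0
    matched = sum(1 for key, value in ground_truth.items() if predicted.get(key) == value)
    return matched / len(ground_truth)
```

With this shape, a prediction that disagrees with the answer on one of two keys scores 0.5, whereas parsing the ground truth from the prediction would have reported a perfect match.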

In scope

  • Fix chart parsing and scoring in lmms_eval/tasks/ocrbench_v2/utils.py.
  • Replace module-level OCRBench v2 score accumulation with per-run aggregation buckets.
  • Move spotting metric scratch files to a temporary work directory.
  • Fix scalar-answer fallback paths in lmms_eval/tasks/ocrbench_v2/vqa_metric.py.
  • Add OCRBench v2 regression tests in test/eval/test_ocrbench_v2.py.
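
As a rough sketch of the temporary work directory item, assuming the spotting metric needs prediction and ground-truth files on disk: the function `run_spotting_metric`, the file names, and the toy line-overlap score are all hypothetical, and only `tempfile.TemporaryDirectory` is a standard library call.

```python
import tempfile
from pathlib import Path


def run_spotting_metric(pred_lines, gt_lines):
    """Hypothetical stand-in for a metric that writes scratch files before scoring."""
    # Each call gets its own directory, so concurrent or repeated calls
    # cannot read each other's leftovers; it is deleted when the block exits.
    with tempfile.TemporaryDirectory(prefix="ocrbench_v2_spotting_") as work_dir:
        work = Path(work_dir)
        pred_path = work / "pred.txt"
        gt_path = work / "gt.txt"
        pred_path.write_text("\n".join(pred_lines), encoding="utf-8")
        gt_path.write_text("\n".join(gt_lines), encoding="utf-8")

        # Toy comparison (an assumption): share of ground-truth lines predicted.
        predicted = set(pred_path.read_text(encoding="utf-8").splitlines())
        expected = set(gt_path.read_text(encoding="utf-8").splitlines())
        score = len(predicted & expected) / len(expected) if expected else 0.0
    return score, work
```

The path is returned only so a test can confirm the directory no longer exists after the call.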

Out of scope

  • Changes to non-OCRBench tasks or evaluator-wide scoring behavior.
  • Benchmark prompt changes or dataset updates.

Validation

  • uv run python -m pytest test/eval/test_ocrbench_v2.py -q | sample size: N=5 tests | key metrics: 5 passed | result: pass
  • uv run pre-commit run --all-files | sample size: N=all tracked files | key metrics: black, isort passed | result: pass

Risk / Compatibility

  • Low risk: changes are scoped to OCRBench v2 evaluation helpers and covered by task-specific regression tests.
  • Behavior change is intentional for previously incorrect OCRBench v2 scores and repeated-run aggregation.
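
The repeated-run behavior change can be sketched as follows. The function `aggregate_accuracy` and the `question_type` and `score` fields are assumptions for illustration, not the task's actual result schema. The buckets are created inside the call, so a second run in the same Python process starts from empty state instead of inheriting module-level totals.

```python
from collections import defaultdict


def aggregate_accuracy(results):
    """Hypothetical aggregation: mean score per question type, then their mean."""
    buckets = defaultdict(list)  # created per call, never at module level
    for item in results:
        buckets[item["question_type"]].append(float(item["score"]))

    per_type = {name: sum(scores) / len(scores) for name, scores in buckets.items()}
    overall = sum(per_type.values()) / len(per_type) if per_type else 0.0
    return {"per_type": per_type, "overall": overall}
```

Calling the function twice on the same results returns identical values, which is the property a repeated-run regression test would check.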

Type of Change

  • Bug fix (non-breaking change)
  • New feature
  • New benchmark/task
  • New model integration
  • Breaking change
  • Documentation update
  • Refactoring (no functional changes)

@Luodian Luodian force-pushed the codex/fix-ocrbench-v2-legacy branch from 151e5bb to 0ebf355 March 7, 2026 06:21
@Luodian Luodian changed the title from "[codex] fix ocrbench v2 scoring regressions" to "fix ocrbench v2 scoring regressions" Mar 7, 2026
@Luodian Luodian marked this pull request as ready for review March 7, 2026 06:31
@Luodian Luodian merged commit 15c32bf into EvolvingLMMs-Lab:main Mar 7, 2026
2 checks passed