feat: Add Spatial-DISE benchmark task #1327

Merged
kcz358 merged 5 commits into EvolvingLMMs-Lab:main from shinmohuang:codex/add-spatial-dise-benchmark on May 15, 2026
@shinmohuang
Contributor

Summary

  • Add Spatial-DISE (ICLR 2026) as a new lmms-eval task for 2D/3D visual-spatial reasoning.
  • Load the official 559-example benchmark split from TACPS-liv/Spatial-DISE.
  • Evaluate A/B/C/D multiple-choice predictions with exact answer-letter extraction.
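
For reviewers who want a feel for what answer-letter extraction involves, here is a minimal sketch. It is not the code in `utils.py`: the function names `extract_choice_letter` and `accuracy` and the three regex patterns are assumptions made for illustration, and the real task may apply different rules.

```python
import re
from typing import Optional, Sequence

# Hypothetical patterns, tried in order from most to least explicit.
_PATTERNS = (
    # "Answer: B", "The answer is (C)"; only the word "answer" is case-insensitive.
    re.compile(r"(?i:answer)\s*(?:is|:)?\s*\(?([ABCD])\)?(?![A-Za-z])"),
    # A parenthesised option such as "(D)".
    re.compile(r"\(([ABCD])\)"),
    # A standalone capital letter A-D that is not part of a word.
    re.compile(r"(?<![A-Za-z])([ABCD])(?![A-Za-z])"),
)


def extract_choice_letter(response: str) -> Optional[str]:
    """Return the first A/B/C/D letter found in a response, or None."""
    for pattern in _PATTERNS:
        match = pattern.search(response)
        if match:
            return match.group(1)
    return None


def accuracy(responses: Sequence[str], targets: Sequence[str]) -> float:
    """Fraction of responses whose extracted letter equals the target letter."""
    if not targets:
        return 0.0
    hits = sum(extract_choice_letter(r) == t for r, t in zip(responses, targets))
    return hits / len(targets)
```

A response with no recognizable letter yields `None` and is scored as incorrect. One known limitation of this sketch: a sentence that starts with the article "A" would be read as option A by the last pattern.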

In scope

  • Adds lmms_eval/tasks/spatial_dise/spatial_dise.yaml.
  • Adds lmms_eval/tasks/spatial_dise/utils.py.
  • Reads Spatial-DISE images from Hugging Face tar shards and maps CSV images/... paths to tar member paths.
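
The path mapping in the last bullet can be pictured with the sketch below, which uses only the standard library. The shard layout (a `shard_000/` prefix in front of `images/...`) and the helper names `build_member_index`, `read_image_bytes`, and `make_demo_tar` are assumptions for illustration; the actual member paths in the Spatial-DISE shards and the logic in `utils.py` may differ.

```python
import io
import posixpath
import tarfile
from typing import Dict


def build_member_index(tar: tarfile.TarFile) -> Dict[str, str]:
    """Map 'images/...' keys (as they appear in the CSV) to tar member names."""
    index: Dict[str, str] = {}
    for member in tar.getmembers():
        if not member.isfile():
            continue
        parts = posixpath.normpath(member.name).split("/")
        if "images" in parts:
            # Drop whatever prefix precedes the first 'images' component.
            key = "/".join(parts[parts.index("images"):])
            index[key] = member.name
    return index


def read_image_bytes(tar: tarfile.TarFile, index: Dict[str, str], csv_path: str) -> bytes:
    """Return the raw bytes of the image referenced by a CSV 'images/...' path."""
    key = posixpath.normpath(csv_path.replace("\\", "/"))
    extracted = tar.extractfile(index[key])
    if extracted is None:
        raise FileNotFoundError(f"{csv_path} is not a regular file in the archive")
    return extracted.read()


def make_demo_tar(files: Dict[str, bytes]) -> io.BytesIO:
    """Build an in-memory tar archive so the sketch runs without any download."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w") as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    buffer.seek(0)
    return buffer
```

Building the index once per shard avoids a linear scan of the archive for every example.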

Out of scope

  • Does not modify existing lmms-eval task behavior.
  • Does not add or change model integrations.
  • Does not change Spatial-DISE data, labels, splits, or evaluation semantics.

Validation

  • python3 -m py_compile lmms_eval/tasks/spatial_dise/utils.py && git diff --check && black --check lmms_eval/tasks/spatial_dise/utils.py | sample size: N/A | key metrics: syntax/style/whitespace checks | result: pass
  • TaskManager(verbosity='ERROR') discovery check | sample size: N/A | key metrics: spatial_dise registered=True | result: pass
  • python -m lmms_eval eval --model dummy --tasks spatial_dise --limit 1 --batch_size 1 | sample size: N=1/559 | key metrics: spatial_dise_acc=0.0 with dummy model | result: pass

Risk / Compatibility

  • Low risk: this PR only adds a new task under lmms_eval/tasks/spatial_dise.
  • Requires access to the Spatial-DISE Hugging Face dataset tar shards; users may set SPATIAL_DISE_ROOT to reuse a local dataset checkout.
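
One way an environment-variable override like `SPATIAL_DISE_ROOT` could be resolved is sketched below. Only the variable name comes from this PR; the function `resolve_local_root`, the choice to raise on a missing directory, and the `None` fallback (meaning "fetch from Hugging Face instead") are assumptions, not a description of the merged code.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def resolve_local_root(env: Mapping[str, str] = os.environ) -> Optional[Path]:
    """Return the local dataset checkout named by SPATIAL_DISE_ROOT, if any.

    Returns None when the variable is unset or empty, which a caller could
    treat as the signal to download the tar shards from the Hub.
    """
    raw = env.get("SPATIAL_DISE_ROOT", "").strip()
    if not raw:
        return None
    root = Path(raw).expanduser()
    if not root.is_dir():
        # Hypothetical policy: fail loudly rather than silently re-download.
        raise FileNotFoundError(f"SPATIAL_DISE_ROOT is not a directory: {root}")
    return root
```

Passing the environment as a mapping keeps the helper easy to test without mutating `os.environ`.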

Type of Change

  • Bug fix (non-breaking change)
  • New feature
  • New benchmark/task
  • New model integration
  • Breaking change
  • Documentation update
  • Refactoring (no functional changes)

@shinmohuang changed the title from "Add Spatial-DISE benchmark task" to "feat: Add Spatial-DISE benchmark task" on May 9, 2026
@kcz358 merged commit bd71e82 into EvolvingLMMs-Lab:main on May 15, 2026
1 of 3 checks passed