[Task] Report sub category score for 3DSRBench and Viewspatial by oscarqjh · Pull Request #1285 · EvolvingLMMs-Lab/lmms-eval

oscarqjh · 2026-04-06T09:41:37Z

Updated the 3DSRBench and Viewspatial benchmarks to report per-sub-category metric scores.
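For readers unfamiliar with the idea, here is a minimal sketch of what reporting sub-category scores could look like. It is not the code from this PR: the function name `aggregate_by_category` and the `category` / `correct` fields are hypothetical, and plain accuracy is assumed as the metric.

```python
from collections import defaultdict


def aggregate_by_category(results):
    """Return accuracy per sub-category plus an overall score.

    `results` is a list of per-sample dicts. The field names
    "category" and "correct" are assumptions made for this sketch.
    """
    buckets = defaultdict(list)
    for item in results:
        buckets[item["category"]].append(bool(item["correct"]))

    # One accuracy value per sub-category.
    scores = {cat: sum(hits) / len(hits) for cat, hits in buckets.items()}

    # Overall accuracy over every sample, regardless of category.
    all_hits = [hit for hits in buckets.values() for hit in hits]
    scores["overall"] = sum(all_hits) / len(all_hits) if all_hits else 0.0
    return scores
```

Called on a list of per-sample results, this yields one number per sub-category next to the overall score, so a regression in a single category stays visible even when the overall number barely moves.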

test run (3DSRBench):

test run (Viewspatial):

oscarqjh · 2026-04-06T09:50:11Z

@PeterWangyi @kcz358

oscarqjh · 2026-04-06T10:43:56Z

Added sub category metrics for Embspatial as well:

oscarqjh · 2026-04-07T03:14:51Z

Also added vsibench_debiased by frames:

oscarqjh · 2026-04-07T04:50:39Z

Added sub category metrics for Sparbench as well:

* feat(videomme_v2): add task config and default template
  - Dataset: MME-Benchmarks/Video-MME-v2 (800 videos, 3200 questions)
  - 8-option MCQ (A-H) with grouped non-linear scoring
  - Generation config: max_new_tokens=64, temperature=0

* feat(videomme_v2): add scoring, prompts, and evaluation logic
  - Grouped non-linear scoring: relevance (quadratic) + logic (chain-based)
  - 3 group structures: [1,2,3,4], [1,[2,3],4], [[1,2],3,4]
  - Answer extraction with 11 prefix patterns (A-H range)
  - Per-level, per-category, per-group-type breakdown reporting
  - Prompt aligned with official INSTRUCT_PROMPT
  - Verified against VLMEvalKit implementation

* feat(videomme_v2): add subtitle variant (concatenated mode)
  - Load word-level JSONL subtitles and prepend to prompt
  - Graceful fallback when subtitle file is missing
  - Task: videomme_v2_w_subtitle

* feat(videomme_v2): add reasoning mode variant
  - Chain-of-thought prompt requiring Final Answer: <letter> format
  - max_new_tokens=4096 for reasoning space
  - Task: videomme_v2_reasoning

* fix(videomme_v2): report sub-category scores as separate metrics
  Per review feedback (ref: PR #1285 pattern):
  - Report relevance/logic group-type scores separately
  - Report per-level (1/2/3) scores separately
  - Refactor aggregate logic into _compute_all_subscores helper
  - process_results returns same entry under all 6 metric keys
  - Detailed second_head/third_head breakdowns still logged

---------

Co-authored-by: mwxely <mwxely@users.noreply.github.com>
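The "same entry under all metric keys" pattern mentioned in that commit message can be sketched roughly as follows. This is an illustration under stated assumptions, not the merged implementation: the six key names, the `group_type` / `level` / `answer` document fields, and the helper `_mean_correct` are hypothetical, and plain accuracy stands in for the grouped non-linear scoring.

```python
# Hypothetical metric keys: one overall, two group types, three levels.
METRIC_KEYS = (
    "overall",
    "relevance",
    "logic",
    "level_1",
    "level_2",
    "level_3",
)


def process_results(doc, results):
    """Build one per-sample entry and return it under every metric key."""
    pred = results[0].strip().upper()[:1]  # naive letter extraction (assumption)
    entry = {
        "group_type": doc["group_type"],
        "level": doc["level"],
        "correct": pred == doc["answer"],
    }
    return {key: entry for key in METRIC_KEYS}


def _mean_correct(entries, keep):
    """Accuracy over the entries selected by the predicate `keep`."""
    selected = [e for e in entries if keep(e)]
    if not selected:
        return 0.0
    return sum(e["correct"] for e in selected) / len(selected)


def aggregate_overall(entries):
    return _mean_correct(entries, lambda e: True)


def aggregate_group_type(entries, group_type):
    return _mean_correct(entries, lambda e: e["group_type"] == group_type)


def aggregate_level(entries, level):
    return _mean_correct(entries, lambda e: e["level"] == level)
```

Because every key carries the full entry, each aggregation function can filter for its own slice, and each sub-score is reported as a separate metric instead of being buried in a log line.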

oscarqjh added 2 commits April 6, 2026 17:39

feat: updated 3dsrbench and viewspatial bench to report sub category metric scores

30c6a14

style: black reformat

b3e9261

oscarqjh and others added 3 commits April 6, 2026 18:01

style: pre commit

12a4e4e

fix: remove redundant imports of datasets across multiple files

3dd50de

feat: updated embspatial to report sub category metrics

782ab79

feat: added vsidebiased multi image variant

5f26efa

feat: updated sparbench to report sub category metrics

c1b2bbe

kcz358 approved these changes Apr 7, 2026

kcz358 merged commit 20e1c96 into EvolvingLMMs-Lab:main Apr 7, 2026
3 checks passed

kcz358 mentioned this pull request Apr 9, 2026

feat: add Video-MME-v2 benchmark task #1289

Merged

11 tasks

kcz358 merged 7 commits into EvolvingLMMs-Lab:main from oscarqjh:sub-metrics-update
3 participants