feat: backfill VisuLogic benchmark integration (LMM-288) #1159

Merged: Luodian merged 2 commits into dev-v0d7 from feat/lmm-288-visulogic on Feb 23, 2026

Luodian commented on Feb 22, 2026

Summary

  • Added a new visulogic task integration under lmms_eval/tasks/visulogic/ with task YAML + utils.
  • Implemented VisuLogic image loading from VisuLogic/VisuLogic (data.jsonl + images.zip) and multiple-choice answer extraction for A/B/C/D scoring.
  • Updated docs/current_tasks.md to include VisuLogic in the Image Tasks benchmark index.
  • Scope check: confirmed no existing VisuLogic task coverage in this worktree before implementation (glob '**/*visulogic*' and repo grep returned no matches).
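The loading step described above (records in data.jsonl, images in images.zip) can be sketched roughly as follows. This is a minimal illustration, not the code in this PR's utils: the helper names `load_records` and `read_image_bytes` and the record fields `image_path`, `question`, and `label` are hypothetical, and the archive is built in memory so the example runs on its own.

```python
import io
import json
import zipfile


def load_records(jsonl_text):
    """Parse JSON Lines text into a list of dicts, skipping blank lines."""
    return [json.loads(line) for line in jsonl_text.splitlines() if line.strip()]


def read_image_bytes(archive, image_path):
    """Return the raw bytes of one image stored in an open zip archive."""
    with archive.open(image_path) as handle:
        return handle.read()


# In-memory stand-ins for data.jsonl and images.zip.
# The field names below are hypothetical, chosen only for illustration.
jsonl_text = '{"image_path": "images/0.png", "question": "Q0", "label": "A"}\n'

buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("images/0.png", b"fake-png-bytes")
buffer.seek(0)

records = load_records(jsonl_text)
with zipfile.ZipFile(buffer) as archive:
    image_bytes = read_image_bytes(archive, records[0]["image_path"])
```

Reading members straight from the archive avoids unpacking the whole zip to disk; whether the actual integration does this or extracts the files first is not stated here.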

Validation

  • uv run python -m lmms_eval --tasks list
    • Result: task list includes visulogic.
  • uv run python -m lmms_eval --model dummy_video_reader --model_args response=A,fail_on_missing=false --tasks visulogic --limit 8 --batch_size 1
    • Result: completed successfully; output metric table includes visulogic_acc = 0.125 for the smoke slice.
  • uv run pre-commit run --all-files
    • Result: black passed, isort passed.
  • uv run python -m unittest discover -s test/eval -p "test_model_registry_v2.py"
    • Result: command completed successfully, but no tests were discovered (Ran 0 tests, OK).
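For reading the dummy-model result above: with --limit 8, a visulogic_acc of 0.125 corresponds to one correct answer out of eight, assuming the metric is the plain mean of per-sample 0/1 scores. The sketch below illustrates that arithmetic only. The `accuracy` helper is hypothetical, and apart from the first two references (A and D, as in the samples on this page) the reference letters are made up.

```python
def accuracy(predictions, references):
    """Fraction of predictions that exactly match their reference letter."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        return 0.0
    correct = sum(pred == ref for pred, ref in zip(predictions, references))
    return correct / len(references)


# A constant "A" responder, as with the dummy model's response=A argument.
preds = ["A"] * 8
# Hypothetical references: only the first two letters come from this page.
refs = ["A", "D", "B", "C", "D", "B", "C", "D"]

score = accuracy(preds, refs)  # 1 match out of 8
```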

Issue Link

Smoke Validation (limit=8)

Status: PASS (LMM-288 / visulogic)

Output Table

| Metric | Value |
| --- | --- |
| visulogic_acc | 0.125 |

Sample Output

Sample 1 (doc_id: 0)

  • Input: From the four given options, select the most suitable one to fill in the question mark, so that a certain regularity is presented: ↵ ↵ ↵ ↵ A: A ↵ B: B ↵ C: C ↵ D: D ↵ Answer with the option letter only.
  • Model Output: To determine the most suitable option, we analyze the pattern in the given figures: ↵ ↵ ↵ ### Step 1: Identify the pattern in the left set ↵ The left set consists of three figures. The first figure is a square with two intersecting diagonal lines (dividing the square into 4 regions). The second
  • Reference: A
  • Scores: visulogic_acc = 0.0
  • Tokens: output=5177, reasoning=5113

Sample 2 (doc_id: 1)

  • Input: From the four given options, choose the most suitable one to fill in the question mark, so that it presents a certain regularity: ↵ ↵ ↵ A: A ↵ B: B ↵ C: C ↵ D: D ↵ Answer with the option letter only.
  • Model Output: To determine the pattern, we analyze the left and right sets of figures: ↵ ↵ ↵ ### Left Set Analysis ↵ - First figure: Two triangles overlapping, forming a shape with 5 edges. ↵ - Second figure: Two squares intersecting, forming a shape with 8 edges. ↵ - **Third figure
  • Reference: D
  • Scores: visulogic_acc = 0.0
  • Tokens: output=4287, reasoning=4223
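The sample outputs above are long-form reasoning rather than a bare letter, so A/B/C/D scoring has to locate the chosen option inside free text. The sketch below shows one heuristic way to do that. It is not the extraction logic from this PR: `extract_choice` is a hypothetical helper, and the two patterns are assumptions about common answer phrasings.

```python
import re


def extract_choice(response):
    """Return the chosen option letter (A-D), or None if none is found."""
    text = response.strip()

    # Case 1: the whole response is a bare letter, e.g. "B", "(B)", or "B."
    bare = re.fullmatch(r"\(?([A-D])\)?[.:]?", text)
    if bare:
        return bare.group(1)

    # Case 2: an explicit phrase such as "Answer: C" or "the answer is (C)".
    # Only the keywords are case-insensitive; the letter must be uppercase
    # so that ordinary words like "a" are not mistaken for option A.
    phrases = re.findall(
        r"(?i:answer)\s*(?:(?i:is)|:)?\s*\(?([A-D])\)?\b", text
    )
    if phrases:
        return phrases[-1]  # prefer the last stated answer

    return None


final = extract_choice("### Step 1: ...\nTherefore, the answer is (D).")
unresolved = extract_choice("To determine the pattern, we analyze the figures")
```

Returning None for text with no recognizable answer lets the scorer count that sample as incorrect instead of guessing. A heuristic like this can still misfire, for example on "Answer: A square ...", so it is a starting point rather than a complete parser.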

Test Params

uv run python -m lmms_eval --model openai_compatible --model_args "model_version=bytedance-seed/seed-1.6-flash" --tasks visulogic --batch_size 1 --limit 8.0 --log_samples

Luodian merged commit ff912cc into dev-v0d7 on Feb 23, 2026
2 checks passed
Luodian deleted the feat/lmm-288-visulogic branch on February 23, 2026 08:25
Luodian added a commit that referenced this pull request on Feb 28, 2026
* feat: integrate VisuLogic benchmark task (#1137)

* docs: add VisuLogic to current task index