[Benchmark Backfill] Integrate Point-Bench into lmms-eval by Luodian · Pull Request #1157 · EvolvingLMMs-Lab/lmms-eval

Luodian · 2026-02-22T12:55:51Z

Summary

Integrate the new pointbench task (YAML config plus utils), following existing task conventions.
Load PointArena metadata from data.json, fetch per-sample images via the HF datasets-server rows API, and resolve masks from selected_masks.zip.
Add the pointbench_acc metric (point-in-mask scoring) and register the docs mapping in docs/current_tasks.md.
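
For readers unfamiliar with point-in-mask scoring, here is a minimal sketch of the idea, not the code merged in this PR. It assumes predictions are integer (x, y) pairs on a 0..999 grid (as in the prompt), that the mask is a 2D array of 0/1 values, and that a sample scores 1.0 only when every predicted point hits the mask. The names `point_in_mask`, `sample_score`, and `accuracy`, the coordinate rescaling, and the all-points rule are illustrative assumptions.

```python
from typing import List, Sequence, Tuple

Point = Tuple[int, int]


def point_in_mask(point: Point, mask: Sequence[Sequence[int]], coord_range: int = 1000) -> bool:
    """Return True if a point on a 0..coord_range-1 grid lands on a truthy mask pixel.

    Assumption: coordinates are rescaled linearly to the mask resolution.
    """
    height = len(mask)
    width = len(mask[0]) if height else 0
    if height == 0 or width == 0:
        return False
    x, y = point
    if not (0 <= x < coord_range and 0 <= y < coord_range):
        return False
    col = min(x * width // coord_range, width - 1)
    row = min(y * height // coord_range, height - 1)
    return bool(mask[row][col])


def sample_score(points: List[Point], mask: Sequence[Sequence[int]]) -> float:
    """Hypothetical rule: 1.0 if at least one point is given and all points hit the mask."""
    if not points:
        return 0.0
    return 1.0 if all(point_in_mask(p, mask) for p in points) else 0.0


def accuracy(scores: List[float]) -> float:
    """Mean of per-sample scores; 0.0 for an empty list."""
    return sum(scores) / len(scores) if scores else 0.0


# Toy 4x4 mask whose lower-right quadrant is foreground.
toy_mask = [
    [0, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
hit = sample_score([(500, 500)], toy_mask)
miss = sample_score([(100, 100)], toy_mask)
```

Under these assumptions, a constant prediction such as [(500,500)] scores 1.0 only on samples whose mask covers the image centre, which is why a fixed dummy response can still yield a non-zero accuracy.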

Validation Evidence

/Users/luodian/Github/lmms-eval/.venv/bin/python -m lmms_eval --tasks list -> includes pointbench in available tasks.
/Users/luodian/Github/lmms-eval/.venv/bin/python -m lmms_eval --model dummy_video_reader --model_args response="[(500,500)]",allow_remote=true,fail_on_missing=false --tasks pointbench --limit 8 --batch_size 1 --output_path ./logs/lmm-293-pointbench-smoke -> succeeds with pointbench_acc=0.125.
/Users/luodian/Github/lmms-eval/.venv/bin/python -m unittest discover -s test/eval -p "test_cli_parse_args.py" -> OK.

Tracking

Closes [Benchmark Backfill] Integrate Point-Bench into lmms-eval #1142
Linear: LMM-293

Smoke Validation (limit=8)

Status: PASS (LMM-293 / pointbench)

Output Table

Metric	Value
pointbench_acc	0.03125

Sample Output

Sample 1 (doc_id: 0)

Input: Point to the free space between the person in a black shirt and the car. Your answer should be formatted as a list of tuples, i.e. [(x1, y1), (x2, y2), ...], where each tuple contains the x and y coordinates of a point satisfying the conditions above. The coordinates should be integers between 0 and…
Model Output: [LMMS_EVAL_REQUEST_FAILED after 5 retries] Error code: 404 - {'error': {'message': 'No endpoints found for google/gemini-flash-1.5.', 'code': 404}, 'user_id': 'user_2sYkuU3dimruZBqpDO0almnSIBN'}
Reference:
Scores: N/A
Tokens: output=0, reasoning=0

Sample 2 (doc_id: 1)

Input: Point to the tool used for cutting wood. Your answer should be formatted as a list of tuples, i.e. [(x1, y1), (x2, y2), ...], where each tuple contains the x and y coordinates of a point satisfying the conditions above. The coordinates should be integers between 0 and 999, representing the pixel loc…
Model Output: [LMMS_EVAL_REQUEST_FAILED after 5 retries] Error code: 404 - {'error': {'message': 'No endpoints found for google/gemini-flash-1.5.', 'code': 404}, 'user_id': 'user_2sYkuU3dimruZBqpDO0almnSIBN'}
Reference:
Scores: N/A
Tokens: output=0, reasoning=0
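
Both samples above contain a request-failure string rather than a point list, so a scorer has to cope with responses that carry no coordinates. The sketch below shows one way to extract points from the "[(x1, y1), (x2, y2), ...]" format requested in the prompt. It is an illustration under stated assumptions, not the parser used by the task: `parse_points` is a hypothetical name, and discarding coordinates outside 0..999 is an assumed policy.

```python
import re
from typing import List, Tuple

# Matches "(x, y)" pairs of integers, tolerating whitespace.
_TUPLE_RE = re.compile(r"\(\s*(-?\d+)\s*,\s*(-?\d+)\s*\)")


def parse_points(text: str, max_coord: int = 999) -> List[Tuple[int, int]]:
    """Extract integer (x, y) pairs from a model response.

    Assumption: pairs with a coordinate outside 0..max_coord are dropped.
    Returns an empty list when the response contains no valid pair.
    """
    points = []
    for x_str, y_str in _TUPLE_RE.findall(text):
        x, y = int(x_str), int(y_str)
        if 0 <= x <= max_coord and 0 <= y <= max_coord:
            points.append((x, y))
    return points


ok = parse_points("[(500,500)]")
multi = parse_points("Answer: [(12, 34), (560, 78)]")
failed = parse_points("[LMMS_EVAL_REQUEST_FAILED after 5 retries] Error code: 404")
```

With this approach a failed request parses to an empty list, so it can be scored as a miss instead of raising an exception.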

Test Params

uv run python -m lmms_eval --model openai_compatible --model_args "model_version=bytedance-seed/seed-1.6-flash" --tasks pointbench --batch_size 1 --limit 8 --log_samples

feat: integrate Point-Bench benchmark task (#1142)

0580cd0

Luodian merged commit 17bf443 into dev-v0d7 Feb 23, 2026
2 checks passed

Luodian deleted the feat/lmm-293-point-bench branch February 23, 2026 08:25

Luodian added a commit that referenced this pull request Feb 28, 2026

feat: integrate Point-Bench benchmark task (#1142) (#1157)

cfc5a25

Luodian merged 1 commit into dev-v0d7 from feat/lmm-293-point-bench
