
feat: add VPCT benchmark task#1183

Merged
Luodian merged 1 commit into dev-v0d7 from feat/lmm-299-vpct (Feb 23, 2026)

Luodian commented Feb 22, 2026

Summary

  • Add a new auto-discovered vpct benchmark task under lmms_eval/tasks/vpct/.
  • Implement VPCT prompt + image loading + answer parsing with vpct_accuracy and vpct_answered_rate metrics.
  • Load the public VPCT assets from camelCase12/vpct-1 with token: false (config) / token=False (code), so that an expired local HF token cannot break dataset loading.
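The PR text does not include the parsing code itself. Below is a minimal sketch of what answer parsing and the two per-sample scores could look like, assuming the model is asked to reply with answer(X) for X in 1, 2, or 3 (as the prompt in the samples below requests) and that the last such match in a response wins. parse_vpct_answer and score_vpct_sample are hypothetical names, not the task's actual functions.

```python
import re
from typing import Optional

# Hypothetical pattern: matches "answer(1)", "answer( 2 )", "Answer(3)", etc.
_ANSWER_RE = re.compile(r"answer\(\s*([123])\s*\)", re.IGNORECASE)


def parse_vpct_answer(response: Optional[str]) -> Optional[int]:
    """Return the bucket number from the last answer(X) in the response, or None."""
    matches = _ANSWER_RE.findall(response or "")
    if not matches:
        return None
    return int(matches[-1])


def score_vpct_sample(response: str, reference) -> dict:
    """Per-sample 0/1 scores: 'answered' feeds vpct_answered_rate, 'correct' feeds vpct_accuracy."""
    predicted = parse_vpct_answer(response)
    return {
        "answered": float(predicted is not None),
        # Assumption: an unparseable response counts as wrong rather than being skipped.
        "correct": float(predicted is not None and predicted == int(reference)),
    }


print(score_vpct_sample("answer(2)", 2))  # {'answered': 1.0, 'correct': 1.0}
print(score_vpct_sample("The ball falls left.", 1))  # {'answered': 0.0, 'correct': 0.0}
```

Taking the last match is one reasonable choice when a model reasons aloud before committing to an answer; the merged task may resolve multiple matches differently.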

Validation

  • HF_TOKEN= uv run python -m lmms_eval --model dummy_video_reader --model_args "response=answer(2)" --tasks vpct --limit 8 --batch_size 1 --log_samples --output_path ./outputs/vpct_smoke
  • Scores from the smoke run:
    • vpct_accuracy: 0.375
    • vpct_answered_rate: 1.000
  • JSONL sample log: outputs/vpct_smoke/20260222_225543_samples_vpct.jsonl
    • Verified that model outputs are non-empty, e.g. filtered_resps: "answer(2)".
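These numbers follow from how the dummy model behaves: it always returns answer(2), so every response parses (answered rate 1.000) and accuracy is simply the share of the 8 references equal to 2, i.e. 0.375 × 8 = 3 samples. A minimal sketch of the aggregation step, assuming both metrics are plain means of per-sample 0/1 scores over all evaluated samples; aggregate_vpct is a hypothetical name.

```python
from statistics import mean


def aggregate_vpct(per_sample: list) -> dict:
    """Average per-sample 0/1 scores into the two task-level metrics (assumed plain means)."""
    return {
        "vpct_accuracy": mean(s["correct"] for s in per_sample),
        "vpct_answered_rate": mean(s["answered"] for s in per_sample),
    }


# Counts implied by the dummy smoke run: 8 samples, all answered, 3 correct.
implied = [{"answered": 1.0, "correct": 1.0}] * 3 + [{"answered": 1.0, "correct": 0.0}] * 5
print(aggregate_vpct(implied))  # {'vpct_accuracy': 0.375, 'vpct_answered_rate': 1.0}
```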

Smoke Validation (limit=8, openai model; command under Test Params)

Status: PASS (LMM-299 / vpct)

Output Table

Metric               Value
vpct_accuracy        0.125
vpct_answered_rate   0.750
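With --limit 8, these values translate back into counts: 0.750 × 8 = 6 responses contained a parseable answer and 0.125 × 8 = 1 was correct. Assuming exactly 8 samples were scored, the accuracy value also hints at the denominator: 0.125 is 1/8, while an accuracy computed over only the 6 answered samples would have to be a multiple of 1/6. That suggests unanswered responses are counted as incorrect rather than excluded. The check below does this arithmetic with exact fractions.

```python
from fractions import Fraction

n_samples = 8  # from --limit 8
accuracy = Fraction("0.125")
answered_rate = Fraction("0.750")

n_answered = answered_rate * n_samples  # 6
n_correct = accuracy * n_samples  # 1

# If accuracy were averaged over answered samples only, it would equal k / n_answered
# for some whole number k; check whether 0.125 can be written that way.
possible_over_answered = {Fraction(k, int(n_answered)) for k in range(int(n_answered) + 1)}
print(n_answered, n_correct)  # 6 1
print(accuracy in possible_over_answered)  # False: consistent with dividing by all 8 samples
```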

Sample Output

Sample 1 (doc_id: 0)

  • Input: You are an expert physics simulator. Looking at this image of a ball-and-bucket simulation, predict which bucket (numbered 1, 2, or 3 from left to right) the ball will eventually fall into. ↵ Respond with answer(X), where X is 1, 2, or 3.
  • Model Output: This is a classic physics simulation problem, often related to projectile motion or, in this simplified 2D representation, the trajectory of a falling object influenced by gravity
  • Reference: 3
  • Scores: N/A
  • Tokens: output=31, reasoning=0

Sample 2 (doc_id: 1)

  • Input: You are an expert physics simulator. Looking at this image of a ball-and-bucket simulation, predict which bucket (numbered 1, 2, or 3 from left to right) the ball will eventually fall into. ↵ Respond with answer(X), where X is 1, 2, or 3.
  • Model Output: This is a classic physics simulation problem, often related to projectile motion or, in this simplified 2D representation, the trajectory of a falling object under gravity.
  • Reference: 1
  • Scores: N/A
  • Tokens: output=31, reasoning=0
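Both samples use the same fixed prompt, with ↵ marking a line break inside it. Below is a minimal sketch of a prompt builder that reproduces the text shown above, assuming a single newline at the ↵ and no per-document variation. vpct_doc_to_text and VPCT_PROMPT are hypothetical names that loosely follow the doc_to_text hook style of lmms-eval tasks; the real signature and any pre/post prompt handling may differ.

```python
from typing import Optional

# Hypothetical reconstruction of the fixed VPCT prompt shown in the sample logs.
VPCT_PROMPT = (
    "You are an expert physics simulator. Looking at this image of a ball-and-bucket "
    "simulation, predict which bucket (numbered 1, 2, or 3 from left to right) the ball "
    "will eventually fall into.\n"
    "Respond with answer(X), where X is 1, 2, or 3."
)


def vpct_doc_to_text(doc: dict, lmms_eval_specific_kwargs: Optional[dict] = None) -> str:
    """Return the text prompt; the simulation image is passed separately as the visual input."""
    kwargs = lmms_eval_specific_kwargs or {}
    # Assumption: optional pre/post prompts, mirroring a common task-config pattern.
    return kwargs.get("pre_prompt", "") + VPCT_PROMPT + kwargs.get("post_prompt", "")


print(vpct_doc_to_text({}))
```

Keeping the answer format inside the prompt is what lets a simple pattern match score responses; neither model output above contains an answer(X) string.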

Test Params

uv run python -m lmms_eval --model openai --model_args "model_version=google/gemini-2.5-flash-lite-preview-09-2025" --tasks vpct --batch_size 1 --limit 8 --log_samples

Luodian merged commit b35c140 into dev-v0d7 on Feb 23, 2026
2 checks passed
Luodian deleted the feat/lmm-299-vpct branch on February 23, 2026 08:25
Luodian added a commit that referenced this pull request Feb 28, 2026

Labels

None yet

Projects

None yet

1 participant