feat: integrate mtvqa benchmark task#1167

Merged
Luodian merged 1 commit into dev-v0d7 from feat/lmm-296-mtvqa on Feb 23, 2026

Luodian commented Feb 22, 2026

Summary

  • Add a new mtvqa task that uses the official ByteDance/MTVQA dataset, with a prompt format aligned with the MTVQA evaluation spec.
  • Implement MTVQA-specific processing in utils.py: flatten qa_pairs, prepare image-text inputs, and score predictions with the official answer-in-prediction normalization.
  • Update docs/current_tasks.md to list MTVQA among the supported benchmarks.
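
For reviewers who have not opened utils.py, the flattening and scoring steps described above could look roughly like the sketch below. It is not the code from this PR. The record field names (`image`, `lang`, `question`, `answer`) and the exact normalization (strip, lowercase, collapse whitespace) are assumptions made for illustration, and the official MTVQA rule may differ in detail.

```python
from typing import Any, Dict, Iterable, Iterator


def flatten_qa_pairs(records: Iterable[Dict[str, Any]]) -> Iterator[Dict[str, Any]]:
    """Yield one document per question from records that carry a qa_pairs list.

    Field names other than qa_pairs are hypothetical.
    """
    for record in records:
        for pair in record["qa_pairs"]:
            yield {
                "image": record["image"],
                "lang": record.get("lang", ""),
                "question": pair["question"],
                "answer": pair["answer"],
            }


def normalize(text: str) -> str:
    # Assumed normalization: trim, lowercase, collapse runs of whitespace.
    return " ".join(text.strip().lower().split())


def answer_in_prediction(prediction: str, answer: str) -> float:
    """Return 1.0 if the normalized answer is a substring of the normalized prediction."""
    ans = normalize(answer)
    return 1.0 if ans and ans in normalize(prediction) else 0.0
```

Under this containment rule a prediction that paraphrases the reference scores 0.0, which is in line with the sample outputs reported in this PR.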

Validation

  • uv run pre-commit run --all-files
  • uv run python -m lmms_eval --tasks list (verified mtvqa appears)
  • uv run python -m lmms_eval --model dummy_video_reader --model_args response=placeholder --tasks mtvqa --limit 8 --batch_size 1

Smoke Validation (limit=8)

Status: PASS (LMM-296 / mtvqa)

Output Table

Metric | Value
mtvqa_score | 12.5
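
A score of 12.5 at limit=8 is what one correct answer out of eight would produce, assuming the metric is the mean of per-sample 0/1 scores scaled to a percentage. That aggregation rule is an assumption, not something this PR states. The helper name and the score list below are illustrative only. Only the first two values (0.0 for doc_id 0 and 1) come from the logged samples.

```python
from typing import Sequence


def aggregate_score(per_sample: Sequence[float]) -> float:
    """Mean of per-sample scores as a percentage (assumed aggregation)."""
    if not per_sample:
        return 0.0
    return 100.0 * sum(per_sample) / len(per_sample)


# Illustrative only: the position of the single 1.0 is hypothetical.
example_scores = [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0]
print(aggregate_score(example_scores))  # 12.5
```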

Sample Output

Sample 1 (doc_id: 0)

  • Input: ما هي أهمية قناة السويس؟ ↵ Answer the question using a word or phrase in the language of the question.
  • Model Output: اتصال بين المحيطات المتوسط والبحر الأحمر
  • Reference: قناة السويس هي واحدة من أهم الممرات المائية في العالم، فهي تربط بين البحر الأبيض المتوسط والبحر الأحمر، مما يسهل التجارة بين أوروبا وآسيا.
  • Scores: mtvqa_score = 0.0 (category: AR)
  • Tokens: output=1202, reasoning=1188

Sample 2 (doc_id: 1)

  • Input: ما هو موضوع النص؟ ↵ Answer the question using a word or phrase in the language of the question.
  • Model Output: اثنان
  • Reference: الحب و الكراهية
  • Scores: mtvqa_score = 0.0 (category: AR)
  • Tokens: output=146, reasoning=144

Test Params

uv run python -m lmms_eval --model openai_compatible --model_args "model_version=bytedance-seed/seed-1.6-flash" --tasks mtvqa --batch_size 1 --limit 8.0 --log_samples

Luodian merged commit 71dd188 into dev-v0d7 on Feb 23, 2026
2 checks passed
Luodian deleted the feat/lmm-296-mtvqa branch on February 23, 2026, 08:25
Luodian added a commit that referenced this pull request Feb 28, 2026