[Benchmark Backfill] Integrate EgoTempo into lmms-eval#1155

Merged
Luodian merged 1 commit into dev-v0d7 from feat/lmm-290-egotempo on Feb 23, 2026

Luodian commented on Feb 22, 2026

Summary

  • Add egotempo benchmark task integration with YAML config and utility functions.
  • Add optional local clip path resolution for EgoTempo videos via EGOTEMPO_VIDEO_DIR and EGOTEMPO_CACHE_DIR.
  • Update video benchmark documentation list in docs/current_tasks.md.
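The local clip path resolution mentioned above could look roughly like the sketch below. Only the two environment variable names, EGOTEMPO_VIDEO_DIR and EGOTEMPO_CACHE_DIR, come from this PR. The function name resolve_egotempo_clip, the lookup order, and the flat directory layout are assumptions made for illustration, not the code in lmms_eval/tasks/egotempo/utils.py.

```python
import os
from pathlib import Path
from typing import Optional


def resolve_egotempo_clip(clip_name: str) -> Optional[Path]:
    """Return the first existing local file for clip_name, or None.

    Hypothetical helper: the real utils.py may search in a different
    order or expect a different directory layout.
    """
    # Assumed priority: an explicit video directory wins over the cache.
    for env_var in ("EGOTEMPO_VIDEO_DIR", "EGOTEMPO_CACHE_DIR"):
        root = os.environ.get(env_var)
        if not root:
            continue  # variable unset or empty, try the next one
        candidate = Path(root).expanduser() / clip_name
        if candidate.is_file():
            return candidate
    # Nothing found locally; the caller decides how to fall back.
    return None
```

Returning None instead of raising keeps the lookup optional, which matches the "optional" wording in the summary: a caller can fall back to whatever default video source the task otherwise uses.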

Validation

  • uv run pre-commit run --files docs/current_tasks.md lmms_eval/tasks/egotempo/egotempo.yaml lmms_eval/tasks/egotempo/utils.py
  • uv run python -m lmms_eval --tasks list (includes egotempo)
  • uv run python -m lmms_eval --model dummy_video_reader --model_args response=A,fail_on_missing=false --tasks egotempo --batch_size 1 --limit 2
  • uv run python -m lmms_eval --model openai --model_args model_version=bytedance-seed/seed-1.6-flash,api_key=$OPENROUTER_API_KEY,base_url=https://openrouter.ai/api/v1 --tasks egotempo --batch_size 1 --limit 1

Smoke Validation (limit=8)

Status: PASS (LMM-290 / egotempo)

Output Table

Metric              Value
egotempo_anls       0.09375
egotempo_anls_pct   9.375

Sample Output

Sample 1 (doc_id: 0)

  • Input: Which object does the person pick up after taking the bowl from the cupboard? ↵ Answer with a short phrase.
  • Model Output: Spoon.
  • Reference: A spoon.
  • Scores: egotempo_anls = 0.75 (question_type: action-specific object)
  • Tokens: output=242, reasoning=239

Sample 2 (doc_id: 1)

  • Input: What does the person pick up before rubbing their hands together? ↵ Answer with a short phrase.
  • Model Output: Soap.
  • Reference: The oil remover spray.
  • Scores: egotempo_anls = 0.0 (question_type: action-specific object)
  • Tokens: output=477, reasoning=474
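The two sample scores are consistent with the usual ANLS (Average Normalized Levenshtein Similarity) definition. The sketch below assumes lowercasing, whitespace normalization, and a 0.5 distance threshold. These are common ANLS conventions, not details confirmed by this PR, and the function names are illustrative rather than taken from utils.py.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit-cost insert, delete, and substitute."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(
                min(
                    previous[j] + 1,         # deletion
                    current[j - 1] + 1,      # insertion
                    previous[j - 1] + cost,  # substitution or match
                )
            )
        previous = current
    return previous[-1]


def anls_score(prediction: str, reference: str, threshold: float = 0.5) -> float:
    """Per-sample ANLS: 1 - normalized distance, or 0 at/above the threshold.

    The normalization and the 0.5 threshold are assumptions (see lead-in).
    """
    pred = " ".join(prediction.lower().split())
    ref = " ".join(reference.lower().split())
    longest = max(len(pred), len(ref))
    if longest == 0:
        return 1.0  # two empty strings are treated as identical
    distance = levenshtein(pred, ref) / longest
    return 1.0 - distance if distance < threshold else 0.0


print(anls_score("Spoon.", "A spoon."))               # Sample 1
print(anls_score("Soap.", "The oil remover spray."))  # Sample 2
```

Under these assumptions, Sample 1 needs two insertions ("a" and a space) over a reference of length 8, giving 1 - 2/8 = 0.75, and Sample 2 exceeds the threshold and scores 0.0. If the reported egotempo_anls is a plain mean over the 8 samples, 0.75 / 8 = 0.09375 also matches the output table.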

Test Params

uv run python -m lmms_eval --model openai_compatible --model_args "model_version=bytedance-seed/seed-1.6-flash" --tasks egotempo --batch_size 1 --limit 8.0 --log_samples

Luodian merged commit 5d84268 into dev-v0d7 on Feb 23, 2026
2 checks passed
Luodian deleted the feat/lmm-290-egotempo branch on February 23, 2026 08:25
Luodian added a commit that referenced this pull request on Feb 28, 2026