feat: integrate Neptune long-video benchmark tasks by Luodian · Pull Request #1187 · EvolvingLMMs-Lab/lmms-eval

Luodian · 2026-02-22T16:59:55Z

Summary

- Add the Neptune long-video benchmark task family (neptune_full_*, neptune_mma_*, neptune_mmh_*) with YAML configs, task utilities, and unit tests.
- Update the task catalog (docs/current_tasks.md) and add a Neptune task README documenting hydration notes and the two full-split videos that are currently unavailable.
- Cap the number of video frames the OpenAI-compatible chat model ingests by honoring max_frames_num during message conversion, avoiding oversized video payloads.
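Capping frame ingestion amounts to keeping at most max_frames_num frames from a decoded video before they are encoded into the chat message. The sketch below assumes uniform temporal sampling; the helper name select_frame_indices is hypothetical and this is not necessarily how the lmms-eval OpenAI-compatible model selects frames.

```python
from typing import List


def select_frame_indices(total_frames: int, max_frames_num: int) -> List[int]:
    """Pick at most max_frames_num evenly spaced frame indices.

    Hypothetical helper: uniform sampling is an assumption, not the
    confirmed lmms-eval behavior.
    """
    if total_frames <= 0 or max_frames_num <= 0:
        return []
    if total_frames <= max_frames_num:
        # Short video: keep every frame, nothing to cap.
        return list(range(total_frames))
    if max_frames_num == 1:
        # Single frame: take the middle of the clip.
        return [total_frames // 2]
    # Spread max_frames_num indices from the first frame to the last one.
    step = (total_frames - 1) / (max_frames_num - 1)
    return [round(i * step) for i in range(max_frames_num)]


# With max_frames_num=2 (as in the smoke command), a 3,000-frame video
# contributes only its first and last frames to the request payload.
print(select_frame_indices(3000, 2))  # [0, 2999]
```

Because the payload grows with the number of encoded frames, bounding the index list bounds the request size regardless of video length.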

Validation

- uv run python -m unittest test/eval/test_neptune_task.py (pass)
- uv run pre-commit run --files <Neptune-related files> (black/isort pass)
- Smoke command:
uv run python -m lmms_eval --model openai_compatible --model_args "model_version=google/gemini-3-flash-preview,max_frames_num=2,num_concurrent=1,adaptive_concurrency=false,max_retries=1,retry_backoff_s=0.1" --tasks neptune_mma_v --batch_size 1 --limit 8 --log_samples --verbosity INFO
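The --model_args value above is a single comma-separated string of key=value pairs that is turned into keyword arguments for the model. A minimal sketch of one plausible parser with simple boolean/number coercion; parse_model_args is a hypothetical name, and lmms-eval's own parser may coerce values differently.

```python
from typing import Any, Dict


def _coerce(value: str) -> Any:
    """Convert 'true'/'false', integers and floats; leave other strings alone."""
    lowered = value.lower()
    if lowered in ("true", "false"):
        return lowered == "true"
    for cast in (int, float):
        try:
            return cast(value)
        except ValueError:
            pass
    return value


def parse_model_args(arg_string: str) -> Dict[str, Any]:
    """Split 'k1=v1,k2=v2' into a dict (hypothetical helper)."""
    kwargs: Dict[str, Any] = {}
    for pair in arg_string.split(","):
        pair = pair.strip()
        if not pair:
            continue
        # Split on the first '=' only, so values may themselves contain '='.
        key, _, value = pair.partition("=")
        kwargs[key.strip()] = _coerce(value.strip())
    return kwargs


args = parse_model_args(
    "model_version=google/gemini-3-flash-preview,max_frames_num=2,"
    "num_concurrent=1,adaptive_concurrency=false,max_retries=1,retry_backoff_s=0.1"
)
print(args["max_frames_num"], args["adaptive_concurrency"], args["retry_backoff_s"])
# 2 False 0.1
```

In this sketch, splitting on commas means a value cannot itself contain a comma unless an escaping rule is added.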

Output Table

| Tasks | Version | Filter | n-shot | Metric | Value | Stderr |
|---|---|---|---|---|---|---|
| neptune_mma_v | Yaml | none | 0 | neptune_acc ↑ | 0.75 | N/A |
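With --limit 8, a neptune_acc of 0.75 corresponds to 6 of 8 sampled questions answered correctly, assuming the metric is plain accuracy averaged over samples. The sketch below shows such a per-sample score plus mean aggregation, in the spirit of lmms-eval's process_results and aggregation hooks; the function names, the doc field "answer", and the rule of matching a standalone capital letter A-E are all assumptions, not the actual Neptune utilities.

```python
import re
from typing import Dict, List


def extract_choice(response: str) -> str:
    """Return the first standalone capital option letter A-E, or ''."""
    match = re.search(r"\b([A-E])\b", response)
    return match.group(1) if match else ""


def neptune_process_results(doc: Dict[str, str], response: str) -> Dict[str, float]:
    """Hypothetical per-sample scorer: 1.0 if the predicted letter matches."""
    correct = extract_choice(response) == doc["answer"].strip().upper()
    return {"neptune_acc": 1.0 if correct else 0.0}


def neptune_aggregate(scores: List[float]) -> float:
    """Mean accuracy over all scored samples (0.0 if there are none)."""
    return sum(scores) / len(scores) if scores else 0.0


# Eight samples (as with --limit 8); the last two are answered incorrectly.
docs = [{"answer": a} for a in "ABCDEABC"]
responses = ["A", "(B)", "C.", "Answer: D", "E", "A", "A", "B"]
scores = [neptune_process_results(d, r)["neptune_acc"] for d, r in zip(docs, responses)]
print(neptune_aggregate(scores))  # 0.75
```

The letter-matching rule is deliberately simple; a response that begins with the article "A" would be misread as option A, so a real parser likely needs stricter prompting or extraction.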

Throughput Summary

| Metric | Value | Unit |
|---|---|---|
| total_gen_tokens | 8.0000 | tokens |
| total_elapsed_time | 19.8799 | seconds |
| avg_speed | 0.4024 | tokens/s |
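The avg_speed row is consistent with total generated tokens divided by total elapsed time. A quick check, assuming that definition:

```python
total_gen_tokens = 8.0
total_elapsed_time = 19.8799  # seconds

# Assumed definition: average generation throughput over the whole run.
avg_speed = total_gen_tokens / total_elapsed_time
print(f"{avg_speed:.4f} tokens/s")  # 0.4024 tokens/s
```

Eight tokens over eight samples is one generated token per sample, so in this run the figure likely reflects per-request latency more than decoding speed.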


* LMM-271: [P0][Benchmark] Neptune long-video benchmark integration...
* fix(neptune): cap chat video frame loading and document missing full videos
* style: auto-fix lint (black + isort)

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>

Luodian added 2 commits February 23, 2026 00:57

- LMM-271: [P0][Benchmark] Neptune long-video benchmark integration... (2584a57)
- fix(neptune): cap chat video frame loading and document missing full videos (85be460)

Luodian mentioned this pull request on Feb 22, 2026 in [LMM-271] [P0][Benchmark] Neptune long-video benchmark integration tracking #1127 (Closed)

style: auto-fix lint (black + isort) (b02c816)

Luodian merged commit ecbed1c into dev-v0d7 Feb 23, 2026

Luodian deleted the feat/neptune-long-video-v0d7 branch February 23, 2026 08:25

Luodian merged 3 commits into dev-v0d7 from feat/neptune-long-video-v0d7
