feat: integrate MINERVA benchmark with Lance-backed video mode and pipeline latency guidance by Luodian · Pull Request #1121 · EvolvingLMMs-Lab/lmms-eval

Luodian · 2026-02-21T06:05:51Z

Summary

- add a new minerva task integration with task YAML/template, scoring utilities, and optional Lance-backed video blob resolution
- add `tools/minerva_to_lance.py` to convert local MINERVA metadata + downloaded videos into a Lance dataset (video_blob enabled)
- add benchmark tooling for both resolver-only latency (`tools/bench_minerva_video_resolution.py`) and fixed-pipeline storage comparison (`tools/bench_minerva_pipeline_latency.py`)
- document the practical interpretation: with the same decode pipeline, local raw files and Lance are often near parity when the videos are already downloaded to local disk; Lance's value is stronger for remote/object-storage access, reproducibility, and repeated subset workflows
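
Before anything can be written to Lance, the converter has to pair metadata records with the downloaded video files. A minimal, stdlib-only sketch of that pairing step follows. The helper name `collect_rows`, the `<video_dir>/<video_id>.mp4` layout, and the error messages are assumptions for illustration, not the code in `tools/minerva_to_lance.py`; the Lance write itself is left out.

```python
from pathlib import Path


def collect_rows(metadata, video_dir, sample_size=5):
    """Pair metadata records with local video files and account for gaps.

    Hypothetical helper: `metadata` is a list of dicts with a "video_id" key,
    and each video is assumed to live at <video_dir>/<video_id>.mp4.
    """
    if not metadata:
        raise ValueError("metadata is empty; nothing to convert")

    rows, missing_ids = [], []
    for record in metadata:
        video_id = str(record.get("video_id") or "").strip()
        path = Path(video_dir) / f"{video_id}.mp4"
        # An empty ID or an absent file both count as "missing".
        if not video_id or not path.is_file():
            missing_ids.append(video_id or "<empty>")
            continue
        rows.append({**record, "video_path": str(path)})

    if missing_ids:
        # Report a small sample of missing IDs rather than the full list.
        print(f"missing {len(missing_ids)} videos, e.g. {missing_ids[:sample_size]}")
    if not rows:
        # Fail hard instead of writing an empty dataset.
        raise RuntimeError("no writable rows: every video file is missing")
    return rows, missing_ids
```

Returning the missing IDs alongside the rows lets the caller decide whether a partial conversion is acceptable.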

Review Feedback Triage (Copilot + Maintainer)

- addressed robustness feedback in `lmms_eval/tasks/minerva/utils.py`: thread-safe resolver init, blob resource cleanup, safer answer-letter extraction, empty video_id handling, deprecated key warning, and loguru eval_logger consistency
- addressed converter feedback in `tools/minerva_to_lance.py`: explicit empty-metadata validation, missing-file accounting, sample missing ID reporting, and hard failure when no rows are writable
- clarified package naming in docs: pylance is the Python package that exposes the lance import module
- resolved the full resolver-index issue by using on-demand filtered scans (no eager full-table index build)
- the maintainer's suggestion of a unified Lance dataset reader/API is tracked as follow-up architecture work; this PR keeps a minimal task-scoped resolver to limit the merge scope
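
Two of the robustness items above, thread-safe resolver init and safer answer-letter extraction, can be illustrated with a short stdlib-only sketch. The names `get_resolver` and `extract_answer_letter`, the A-E option range, and the regular expressions are assumptions for illustration, not the code in `lmms_eval/tasks/minerva/utils.py`.

```python
import re
import threading

_resolver = None
_resolver_lock = threading.Lock()


def get_resolver(factory):
    """Build the shared resolver at most once, even under concurrent calls."""
    global _resolver
    if _resolver is None:              # fast path without taking the lock
        with _resolver_lock:
            if _resolver is None:      # re-check after acquiring the lock
                _resolver = factory()
    return _resolver


# Prefer an explicit "answer is X" / "Answer: (X)" statement, then fall back
# to a response that starts with a lone option letter such as "B." or "(D)".
_ANSWER_PATTERNS = (
    re.compile(r"(?i:answer)\s*(?:(?i:is)|:)?\s*\(?([A-E])\)?(?![A-Za-z])"),
    re.compile(r"^\(?([A-E])\)?(?:[.:)\s]|$)"),
)


def extract_answer_letter(response):
    """Return an uppercase option letter, or None when none can be found."""
    if not isinstance(response, str):
        return None
    text = response.strip()
    for pattern in _ANSWER_PATTERNS:
        match = pattern.search(text)
        if match:
            return match.group(1)
    return None
```

Returning `None` rather than guessing lets the scorer count an unparseable response as wrong instead of crediting a stray capital letter.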

Validation

- `uv run python tools/bench_minerva_pipeline_latency.py --local-video-dir data/minerva/videos --lance-uri data/minerva_hf_package/data/train.lance --lance-cache-dir /tmp/minerva_lance_cache_bench_20260221_v3 --limit 100 --batch-size 1 --decode-num-frames 8 --output-root /tmp/minerva_pipeline_latency_20260221_v3`
  - local: total_mean_ms=529.845, decode_mean_ms=529.535, videos_per_s=1.887
  - lance: total_mean_ms=533.630, decode_mean_ms=528.902, videos_per_s=1.874
  - ratio: lance/local total_mean=1.007 (about +0.7%)
- `uv run python tools/bench_minerva_pipeline_latency.py --local-video-dir data/minerva/videos --lance-uri data/minerva_hf_package/data/train.lance --lance-cache-dir /tmp/minerva_lance_cache_probe_1121 --limit 10 --batch-size 1 --decode-num-frames 4 --output-root /tmp/minerva_pipeline_latency_probe_1121`
- `uv run python -m py_compile lmms_eval/tasks/minerva/utils.py tools/minerva_to_lance.py lmms_eval/models/model_utils/load_video.py tools/bench_minerva_pipeline_latency.py tools/bench_minerva_video_resolution.py lmms_eval/models/simple/dummy_video_reader.py`
- lsp_diagnostics clean for touched Python files
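
The reported ratio can be checked by hand from the two totals. The sketch below recomputes it and also derives throughput and the non-decode remainder. Treating videos_per_s as 1000 / total_mean_ms (plausible at `--batch-size 1`) and reading total minus decode as resolution/I/O overhead are assumptions, not confirmed behavior of the benchmark script.

```python
# Figures copied from the 100-video benchmark run above.
local_total_ms, local_decode_ms = 529.845, 529.535
lance_total_ms, lance_decode_ms = 533.630, 528.902

# Relative cost of the Lance path against the local raw path.
ratio = lance_total_ms / local_total_ms
overhead_pct = (ratio - 1.0) * 100.0

# Assumed relation at batch size 1: one video per total_mean_ms.
local_vps = 1000.0 / local_total_ms
lance_vps = 1000.0 / lance_total_ms

# Assumed breakdown: whatever is not decode time is resolution/I/O time.
local_other_ms = local_total_ms - local_decode_ms
lance_other_ms = lance_total_ms - lance_decode_ms

print(f"ratio={ratio:.3f} overhead={overhead_pct:+.1f}%")
print(f"local={local_vps:.3f} videos/s, lance={lance_vps:.3f} videos/s")
print(f"non-decode: local={local_other_ms:.3f} ms, lance={lance_other_ms:.3f} ms")
```

Under these assumptions decode time dominates both paths, which is consistent with the near-parity reading: the storage layer accounts for only a few milliseconds per video.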

kcz358 · 2026-02-21T07:17:35Z

+Lance mode variables:
+
+```bash
+export MINERVA_LANCE_VIDEO_URI="hf://datasets/lmms-lab-eval/minerva/data/train.lance"
+export MINERVA_LANCE_VIDEO_ID_COLUMN="video_id"
+export MINERVA_LANCE_VIDEO_BLOB_COLUMN="video_blob"
+export MINERVA_LANCE_CACHE_DIR="~/.cache/lmms_eval/minerva_lance_videos"
+```


This looks quite hardcoded right now for anyone who wants to use a Lance DB. If we are using the Hub, we could possibly have a Lance DB reader via the HF API, and we might need a unified dataset format for LanceDB-type datasets.

good suggestion
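
The Lance mode variables quoted in this thread could be gathered in one place along these lines. This is a minimal sketch: the `LanceVideoConfig` dataclass and `load_lance_video_config` helper are hypothetical, and using the quoted values as defaults is an assumption. One practical detail: a `~` inside double quotes is not expanded by the shell, so the reader has to expand it.

```python
import os
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class LanceVideoConfig:
    uri: Optional[str]   # None means Lance mode is disabled
    id_column: str
    blob_column: str
    cache_dir: str


def load_lance_video_config(env=None):
    """Read the MINERVA_LANCE_* variables (defaults here are assumptions)."""
    env = os.environ if env is None else env
    uri = (env.get("MINERVA_LANCE_VIDEO_URI") or "").strip() or None
    cache_dir = env.get(
        "MINERVA_LANCE_CACHE_DIR", "~/.cache/lmms_eval/minerva_lance_videos"
    )
    return LanceVideoConfig(
        uri=uri,
        id_column=env.get("MINERVA_LANCE_VIDEO_ID_COLUMN", "video_id"),
        blob_column=env.get("MINERVA_LANCE_VIDEO_BLOB_COLUMN", "video_blob"),
        # A quoted "~" reaches Python unexpanded, so expand it here.
        cache_dir=os.path.expanduser(cache_dir),
    )
```

Passing a plain dict as `env` makes the reader easy to exercise in tests without touching the process environment.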

Luodian · 2026-02-21T16:25:54Z

Interesting: Copilot hallucinated the package name. It should be pylance instead of lance.

Luodian added 2 commits February 21, 2026 14:05

feat: add minerva task with Lance-backed video loading

b924e02

docs: document MINERVA and Lance mode in v0.7

71864c5

Luodian requested review from Copilot and kcz358 February 21, 2026 06:24

Copilot started reviewing on behalf of Luodian February 21, 2026 06:24

This comment was marked as off-topic.

kcz358 reviewed Feb 21, 2026

Luodian changed the title ~~feat: integrate MINERVA benchmark with Lance-backed video mode~~ feat: integrate MINERVA benchmark with Lance-backed video mode and pipeline latency guidance Feb 21, 2026

Luodian mentioned this pull request Feb 21, 2026

Design unified Lance dataset reader for Hub/local video tasks #1128

Closed

Luodian added 3 commits February 21, 2026 23:43

feat: harden minerva lance integration and standardize follow-up workflow

e8b60c3

fix: apply copilot package-name and typing follow-ups

e5a2fcf

fix: restore pylance package naming for lance runtime

4a31bee

Luodian merged commit 8d34fca into dev-v0d7 Feb 21, 2026
2 checks passed

Luodian deleted the feat/lmm-272-minerva-video-reasoning-benchmark-integration-tracking branch February 23, 2026 08:24

Luodian merged 5 commits into dev-v0d7 from feat/lmm-272-minerva-video-reasoning-benchmark-integration-tracking
