feat: integrate MINERVA benchmark with Lance-backed video mode and pipeline latency guidance #1121

Merged
Luodian merged 5 commits into dev-v0d7 from feat/lmm-272-minerva-video-reasoning-benchmark-integration-tracking on Feb 21, 2026

Luodian commented Feb 21, 2026

Summary

  • add a new minerva task integration with task YAML/template, scoring utilities, and optional Lance-backed video blob resolution
  • add tools/minerva_to_lance.py to convert local MINERVA metadata + downloaded videos into a Lance dataset (video_blob enabled)
  • add benchmark tooling for both resolver-only latency (tools/bench_minerva_video_resolution.py) and fixed-pipeline storage comparison (tools/bench_minerva_pipeline_latency.py)
  • document practical interpretation: with the same decode pipeline, local raw vs Lance is often near-parity on local pre-downloaded disks; Lance value is stronger for remote/object-storage access, reproducibility, and repeated subset workflows
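As a rough illustration of the converter's first step, the sketch below pairs metadata rows with downloaded video files and keeps track of what is missing. It is not the code in tools/minerva_to_lance.py: the function name `collect_rows`, the `{video_id}.mp4` file layout, and the `video_id` metadata key are all assumptions made for the example, and the actual Lance write is left out.

```python
from pathlib import Path


def collect_rows(metadata, video_dir, ext=".mp4"):
    """Pair metadata records with local video files (hypothetical helper).

    Assumes each record carries a "video_id" key and that videos are stored
    as <video_dir>/<video_id><ext>. Returns (rows, missing_ids).
    """
    if not metadata:
        # Fail early instead of silently writing an empty dataset.
        raise ValueError("metadata is empty; nothing to convert")

    video_dir = Path(video_dir)
    rows, missing = [], []
    for record in metadata:
        video_id = str(record.get("video_id", "")).strip()
        path = video_dir / f"{video_id}{ext}"
        if not video_id or not path.is_file():
            # Account for rows that cannot be written rather than dropping them quietly.
            missing.append(video_id or "<empty>")
            continue
        rows.append({"video_id": video_id, "video_path": path})

    if not rows:
        # No writable rows is treated as a hard failure.
        raise RuntimeError(
            f"no videos found in {video_dir}; sample missing IDs: {missing[:5]}"
        )
    return rows, missing
```

A caller would typically log `len(missing)` and a few sample IDs before handing `rows` to whatever writes the Lance dataset.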

Review Feedback Triage (Copilot + Maintainer)

  • addressed robustness feedback in lmms_eval/tasks/minerva/utils.py: thread-safe resolver init, blob resource cleanup, safer answer-letter extraction, empty video_id handling, deprecated key warning, and loguru eval_logger consistency
  • addressed converter feedback in tools/minerva_to_lance.py: explicit empty-metadata validation, missing-file accounting, sample missing ID reporting, and hard failure when no rows are writable
  • clarified package naming in docs: pylance is the Python package that exposes the lance import module
  • resolved the full resolver-index concern by using on-demand filtered scans (no eager full-table index build)
  • maintainer suggestion on unified Lance dataset reader/API is tracked as follow-up architecture work; current PR keeps minimal task-scoped resolver for safe merge scope
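Two of the robustness items above, thread-safe resolver initialization and safer answer-letter extraction, can be sketched as follows. This is an illustration of the general techniques, not the code in lmms_eval/tasks/minerva/utils.py: `get_resolver`, `extract_answer_letter`, the regex patterns, and the A to E option range are assumptions chosen for the example.

```python
import re
import threading

_resolver = None
_resolver_lock = threading.Lock()


def get_resolver(factory):
    """Build a shared resolver at most once, even under concurrent calls."""
    global _resolver
    if _resolver is None:  # fast path once initialized
        with _resolver_lock:
            if _resolver is None:  # re-check while holding the lock
                _resolver = factory()
    return _resolver


# Assumed option letters A-E; patterns are illustrative heuristics.
_ANSWER_PATTERNS = (
    # "The answer is B", "Answer: (C)"; only the word "answer"/"is" ignores case.
    re.compile(r"(?i:answer(?:\s+is)?)\s*[:\-]?\s*\(?([A-E])\)?(?![A-Za-z])"),
    # The whole response is a bare letter such as "B", "(D)." or "C)".
    re.compile(r"^\s*\(?([A-E])\)?\s*[.:)]?\s*$"),
)


def extract_answer_letter(text):
    """Return the option letter if one is clearly stated, else None."""
    if not text:
        return None
    for pattern in _ANSWER_PATTERNS:
        match = pattern.search(text)
        if match:
            return match.group(1)
    return None
```

Returning `None` instead of guessing keeps a response such as "A dog is running" from being scored as option A.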

Validation

  • uv run python tools/bench_minerva_pipeline_latency.py --local-video-dir data/minerva/videos --lance-uri data/minerva_hf_package/data/train.lance --lance-cache-dir /tmp/minerva_lance_cache_bench_20260221_v3 --limit 100 --batch-size 1 --decode-num-frames 8 --output-root /tmp/minerva_pipeline_latency_20260221_v3
    • local: total_mean_ms=529.845, decode_mean_ms=529.535, videos_per_s=1.887
    • lance: total_mean_ms=533.630, decode_mean_ms=528.902, videos_per_s=1.874
    • ratio: lance/local total_mean=1.007 (about +0.7%)
  • uv run python tools/bench_minerva_pipeline_latency.py --local-video-dir data/minerva/videos --lance-uri data/minerva_hf_package/data/train.lance --lance-cache-dir /tmp/minerva_lance_cache_probe_1121 --limit 10 --batch-size 1 --decode-num-frames 4 --output-root /tmp/minerva_pipeline_latency_probe_1121
  • uv run python -m py_compile lmms_eval/tasks/minerva/utils.py tools/minerva_to_lance.py lmms_eval/models/model_utils/load_video.py tools/bench_minerva_pipeline_latency.py tools/bench_minerva_video_resolution.py lmms_eval/models/simple/dummy_video_reader.py
  • lsp_diagnostics clean for touched Python files
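The figures from the first benchmark run can be cross-checked with a few lines of arithmetic. The sketch below only uses the reported numbers; the `summarize` helper is hypothetical, and treating throughput as 1000 / total_mean_ms is an assumption that happens to agree with the reported videos_per_s at --batch-size 1.

```python
def summarize(total_mean_ms, decode_mean_ms):
    """Derive throughput and non-decode time from reported per-video means."""
    return {
        # Assumes one video per step (batch size 1).
        "videos_per_s": 1000.0 / total_mean_ms,
        # Whatever is left after decode (resolution, I/O, bookkeeping).
        "non_decode_ms": total_mean_ms - decode_mean_ms,
    }


local = summarize(total_mean_ms=529.845, decode_mean_ms=529.535)
lance = summarize(total_mean_ms=533.630, decode_mean_ms=528.902)

ratio = 533.630 / 529.845
overhead_pct = (ratio - 1.0) * 100.0

print(f"local videos/s: {local['videos_per_s']:.3f}")
print(f"lance videos/s: {lance['videos_per_s']:.3f}")
print(f"lance/local total_mean ratio: {ratio:.3f} ({overhead_pct:+.1f}%)")
print(f"non-decode ms: local={local['non_decode_ms']:.3f}, lance={lance['non_decode_ms']:.3f}")
```

Decode time is nearly identical in both modes, so the small gap in total time comes from the non-decode remainder (about 0.3 ms locally versus about 4.7 ms with Lance in this run).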

Comment thread docs/lmms-eval-0.7.md
Comment on lines +548 to +555
Lance mode variables:

```bash
export MINERVA_LANCE_VIDEO_URI="hf://datasets/lmms-lab-eval/minerva/data/train.lance"
export MINERVA_LANCE_VIDEO_ID_COLUMN="video_id"
export MINERVA_LANCE_VIDEO_BLOB_COLUMN="video_blob"
export MINERVA_LANCE_CACHE_DIR="~/.cache/lmms_eval/minerva_lance_videos"
```
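For readers wondering how such variables might be consumed, here is a minimal sketch of reading them in Python. It is not the task's actual loader: `load_minerva_lance_config`, the fallback values, and the convention that an unset URI disables Lance mode are assumptions for illustration. Note that a `~` inside double quotes is not expanded by the shell, so the reader has to expand it itself.

```python
import os
from pathlib import Path


def load_minerva_lance_config(env=None):
    """Read Lance-mode settings from environment variables (hypothetical helper).

    Returns None when no URI is set, which this sketch treats as
    "Lance mode disabled".
    """
    env = os.environ if env is None else env
    uri = env.get("MINERVA_LANCE_VIDEO_URI", "").strip()
    if not uri:
        return None

    # Assumed defaults, mirroring the values shown in the snippet above.
    cache_dir = env.get(
        "MINERVA_LANCE_CACHE_DIR", "~/.cache/lmms_eval/minerva_lance_videos"
    )
    return {
        "uri": uri,
        "id_column": env.get("MINERVA_LANCE_VIDEO_ID_COLUMN", "video_id"),
        "blob_column": env.get("MINERVA_LANCE_VIDEO_BLOB_COLUMN", "video_blob"),
        # The quoted "~" reaches Python literally, so expand it here.
        "cache_dir": Path(cache_dir).expanduser(),
    }
```

Passing a plain dict as `env` makes the function easy to exercise without touching the real environment.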
Collaborator

This looks quite hardcoded right now for anyone who wants to use LanceDB. We could possibly have a LanceDB reader via the HF API when the dataset is on the Hub, and we might need a unified dataset format for LanceDB-type datasets.

Contributor Author

Good suggestion.

Luodian changed the title from "feat: integrate MINERVA benchmark with Lance-backed video mode" to "feat: integrate MINERVA benchmark with Lance-backed video mode and pipeline latency guidance" on Feb 21, 2026

Luodian commented Feb 21, 2026

Interesting: Copilot hallucinated the package name here. The package to install is pylance, not lance.
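Because the install name (pylance) and the import name (lance) differ, an import guard with an explicit hint can spare users some confusion. The sketch below is one way to do it; `require_module` is a hypothetical helper, not something this PR adds.

```python
import importlib


def require_module(import_name, pip_name):
    """Import a module, pointing at the right pip package on failure."""
    try:
        return importlib.import_module(import_name)
    except ImportError as exc:
        raise ImportError(
            f"Could not import '{import_name}'. "
            f"Install it with: pip install {pip_name}"
        ) from exc


# Usage for Lance mode: the module is `lance`, the package is `pylance`.
# lance = require_module("lance", "pylance")
```

The usage line is left commented out so the snippet runs even where pylance is not installed.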

Luodian merged commit 8d34fca into dev-v0d7 on Feb 21, 2026
2 checks passed
Luodian deleted the feat/lmm-272-minerva-video-reasoning-benchmark-integration-tracking branch on February 23, 2026 08:24