
Design unified Lance dataset reader for Hub/local video tasks #1128

@Luodian


Context

Problem

  • The current MINERVA integration wires its resolver per task through environment variables and assumes task-specific schema conventions.
  • This works for immediate integration but does not provide a reusable abstraction for future video tasks that want Lance-backed loading.

Goal

  • Define a reusable Lance video reader abstraction that supports both local and HF Hub Lance URIs with clear schema contracts and fallback behavior.

Proposed plan

  1. Introduce a generic Lance video reader utility under shared task/model utils.
  2. Define and document a unified Lance video schema contract (required columns, optional metadata, blob handling, row id/address strategy).
  3. Support pluggable source resolution (local path, hf:// URI) and cache strategy.
  4. Add adapter hooks so tasks can opt in without duplicating resolver logic.
  5. Add smoke tests covering local Lance + hf:// Lance read paths.
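
Steps 2 and 3 of the plan could be sketched roughly as follows. This is only an illustration under stated assumptions, not the proposed implementation: the names `LanceSource`, `resolve_source`, and `VideoSchemaContract`, and the column names `video_id`, `video_blob`, `duration_s`, and `fps`, are all hypothetical and do not come from the issue. The sketch uses only the standard library and does not open any dataset; it just classifies a URI and checks a column list against a contract.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Iterable, List, Optional

HF_SCHEME = "hf://"


@dataclass(frozen=True)
class LanceSource:
    """Resolved description of where a Lance dataset lives (hypothetical type)."""
    kind: str                        # "local" or "hub"
    location: str                    # absolute local path, or the hf:// URI unchanged
    cache_dir: Optional[str] = None  # only meaningful for Hub-backed sources


def resolve_source(uri: str, cache_dir: Optional[str] = None) -> LanceSource:
    """Classify a URI as a local path or an hf:// URI; reject anything else."""
    if not uri:
        raise ValueError("empty Lance URI")
    if uri.startswith(HF_SCHEME):
        if len(uri) == len(HF_SCHEME):
            raise ValueError("hf:// URI has no repository path")
        return LanceSource("hub", uri, cache_dir)
    if "://" in uri:
        raise ValueError(f"unsupported URI scheme: {uri!r}")
    # Plain paths are normalized so that callers see one canonical form.
    return LanceSource("local", str(Path(uri).expanduser().resolve()))


@dataclass(frozen=True)
class VideoSchemaContract:
    """Required and optional columns; the names here are placeholders."""
    required: frozenset = frozenset({"video_id", "video_blob"})
    optional: frozenset = frozenset({"duration_s", "fps"})

    def missing(self, columns: Iterable[str]) -> List[str]:
        """Return the required columns absent from `columns`, sorted."""
        return sorted(self.required - set(columns))

    def validate(self, columns: Iterable[str]) -> None:
        """Raise ValueError if any required column is absent."""
        missing = self.missing(columns)
        if missing:
            raise ValueError(f"Lance dataset is missing required columns: {missing}")
```

A task adapter could then call `resolve_source(...)` once, open the dataset with whatever Lance API the project settles on, and pass the resulting column names to `validate(...)` before reading any rows.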

Acceptance criteria

  • At least one existing task (MINERVA) migrated to the shared reader with no regression.
  • Shared docs covering the Lance dataset format and the runtime env/config knobs.
  • End-to-end smoke validation for local and Hub-backed Lance inputs.
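
One possible shape for the smoke validation is a small harness that takes the dataset opener as a parameter, so the same check runs against local and Hub-backed inputs. This is a sketch under assumptions: `smoke_check`, the `open_fn` callback, and the example URIs and column names are hypothetical. In a real run, `open_fn` might wrap a call such as `lance.dataset(uri)` and yield rows as dicts; here a fake opener stands in so the example runs without Lance installed.

```python
from typing import Callable, Dict, Iterable, Mapping


def smoke_check(
    uris: Iterable[str],
    open_fn: Callable[[str], Iterable[Mapping[str, object]]],
    required_columns: Iterable[str],
) -> Dict[str, str]:
    """Open each URI, read the first row, and report OK or a failure reason."""
    required = set(required_columns)
    results: Dict[str, str] = {}
    for uri in uris:
        try:
            first = next(iter(open_fn(uri)), None)
            if first is None:
                results[uri] = "FAIL: dataset is empty"
                continue
            missing = sorted(required - set(first))
            results[uri] = "OK" if not missing else f"FAIL: missing columns {missing}"
        except Exception as exc:  # a smoke test reports failures instead of aborting
            results[uri] = f"FAIL: {type(exc).__name__}: {exc}"
    return results


def fake_open(uri: str):
    """Stand-in opener for illustration only; returns canned rows per URI."""
    if uri.startswith("hf://"):
        return [{"video_id": "a", "video_blob": b"\x00"}]
    if uri.endswith("bad.lance"):
        return [{"video_id": "b"}]
    raise FileNotFoundError(uri)


report = smoke_check(
    ["hf://datasets/org/repo/data.lance", "local/bad.lance", "local/none.lance"],
    fake_open,
    ["video_id", "video_blob"],
)
```

Injecting the opener keeps the harness independent of how URIs are resolved, so the checks for local and hf:// read paths differ only in the URI list they are given.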

Labels: enhancement (New feature or request)