feat: Add new VideoFile FileType #5441

@everettVT

Description

Is your feature request related to a problem?

Given the discussion in #5054, there is a near-term opportunity to add valuable video processing capabilities by extending the daft.File DataType.

Alongside a Daft VideoFile, there would also be a Daft AudioFile.

As mentioned in the discussion, a video type would need to support methods for:

  1. Reading metadata (including width, height, fps, frame_count, and time_base)
  2. Extracting keyframes
  3. Reading image frames (image frames + seeking)
  4. Reading audio frames (fixed duration + seeking)
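
To make the metadata fields above concrete, here is a minimal sketch of how `fps`, `frame_count`, and `time_base` relate to duration and seeking. The `VideoMetadata` class and its method names are hypothetical and not part of Daft; the sketch only assumes that `time_base` is the number of seconds per presentation-timestamp (pts) tick, which is how FFmpeg-style containers usually define it.

```python
from dataclasses import dataclass
from fractions import Fraction


@dataclass(frozen=True)
class VideoMetadata:
    """Hypothetical holder for the metadata fields listed in the issue."""

    width: int
    height: int
    fps: Fraction
    frame_count: int
    time_base: Fraction  # assumed: seconds per pts tick

    @property
    def duration_seconds(self) -> Fraction:
        # Total duration implied by the frame count and frame rate.
        return Fraction(self.frame_count) / self.fps

    def seconds_to_pts(self, seconds: float) -> int:
        # Convert a time offset in seconds into pts ticks, e.g. as a seek target.
        return int(Fraction(seconds).limit_denominator(1_000_000) / self.time_base)

    def frame_index_to_pts(self, index: int) -> int:
        # Approximate pts of the frame at `index`, assuming a constant frame rate.
        return int(Fraction(index) / self.fps / self.time_base)


# Example values only: 1080p at 29.97 fps with a 90 kHz time base.
meta = VideoMetadata(1920, 1080, Fraction(30000, 1001), 900, Fraction(1, 90000))
```

Keeping `fps` and `time_base` as exact fractions avoids rounding drift when converting between frame indices, seconds, and pts values.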

This is intended to support several of the use cases outlined in the discussion and to streamline image, audio, and video AI preprocessing for inference/training.

A few extra notes. The standard representation of an Image DataType in Daft is materialized as a NumPy array in a UDF. This appears to be the standard input format for inference on open-source audio and video models as well, while closed-source/proprietary inference providers prefer HTTP URLs or base64 data URLs. File references or tempfile URI references are also sufficient for many transcription use cases, but from what I've seen most workloads would benefit from intelligent NumPy conversion.
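
For the base64 data URL case mentioned above, a minimal sketch using only the standard library could look like this. The helper name `to_data_url` is hypothetical, and the sketch assumes the frame or audio clip has already been encoded to bytes (for example PNG or WAV) by some other step.

```python
import base64


def to_data_url(payload: bytes, mime_type: str) -> str:
    """Wrap already-encoded media bytes in a base64 data URL."""
    encoded = base64.b64encode(payload).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


# Placeholder bytes stand in for real PNG data here.
example_url = to_data_url(b"hello", "image/png")
```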

Finally, the AudioFile and VideoFile types help set the stage for Daft to support audio and video AI workloads more natively. Working with files in this manner will also help feed development in the wide world of documents such as PDF, HTML, DOCX, PPT, and so on.

Describe the solution you'd like

Use PyAV to extract metadata, keyframes, image frames, and audio frames separately. The results should materialize to metadata-enriched NumPy arrays, or should at least be accompanied by metadata and keyframe info, for downstream inference/training.
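
As a rough sketch of what the PyAV side might look like, the functions below read stream metadata and yield decoded frames as NumPy arrays paired with their timing info. The function names and the returned dict layout are assumptions for illustration, not a proposed Daft API. PyAV is assumed to be installed and is imported lazily inside each function.

```python
from typing import Any, Dict, Iterator


def read_video_metadata(path: str) -> Dict[str, Any]:
    import av  # PyAV, assumed installed

    with av.open(path) as container:
        stream = container.streams.video[0]
        return {
            "width": stream.codec_context.width,
            "height": stream.codec_context.height,
            "fps": stream.average_rate,    # Fraction, or None if unknown
            "frame_count": stream.frames,  # 0 when the container does not record it
            "time_base": stream.time_base,
        }


def iter_frames(
    path: str, start_seconds: float = 0.0, keyframes_only: bool = False
) -> Iterator[Dict[str, Any]]:
    import av  # PyAV, assumed installed

    with av.open(path) as container:
        stream = container.streams.video[0]
        if keyframes_only:
            # Ask the decoder to skip everything except keyframes.
            stream.codec_context.skip_frame = "NONKEY"
        if start_seconds > 0:
            # With stream= given, the seek offset is in stream.time_base units.
            container.seek(int(start_seconds / stream.time_base), stream=stream)
        for frame in container.decode(stream):
            # Seeking lands on an earlier keyframe, so drop frames before the target.
            if frame.time is not None and frame.time < start_seconds:
                continue
            yield {
                "array": frame.to_ndarray(format="rgb24"),  # H x W x 3 uint8
                "pts": frame.pts,
                "time": frame.time,
                "key_frame": bool(frame.key_frame),
            }
```

Yielding frames from a generator keeps memory bounded, which matches the streaming nature of video workloads noted under the alternatives.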

Use soundfile to extract audio from audio files, with resampling and seek support.

One last note on audio: we should be able to write audio back to a new audio file.
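
The audio notes above could be sketched roughly as follows with the `soundfile` package: seek to an offset, read a fixed-duration window, and write samples back to a new file. The helper names are hypothetical. As far as I know, `soundfile` itself does not resample, so resampling would need a separate library and is left out of this sketch.

```python
def seconds_to_frames(seconds: float, samplerate: int) -> int:
    # Convert a time offset or duration in seconds into a sample-frame count.
    return int(round(seconds * samplerate))


def read_audio_window(path: str, start_seconds: float, duration_seconds: float):
    import soundfile as sf  # assumed installed; imported lazily

    with sf.SoundFile(path) as f:
        f.seek(seconds_to_frames(start_seconds, f.samplerate))
        data = f.read(
            seconds_to_frames(duration_seconds, f.samplerate),
            dtype="float32",
            always_2d=True,  # shape is (frames, channels) even for mono
        )
        return data, f.samplerate


def write_audio(path: str, data, samplerate: int) -> None:
    import soundfile as sf  # assumed installed; imported lazily

    # The output format is inferred from the file extension, e.g. ".wav" or ".flac".
    sf.write(path, data, samplerate)
```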

Describe alternatives you've considered

A native DataType implementation has been considered, but since working with audio and video files is a fundamentally different workload (data tends to be processed in a streaming fashion), a file-based approach makes more sense.

Additional Context

@universalmind303 @stayrascal @malcolmgreaves @jaychia

Would you like to implement a fix?

No
