# Video Search with V-JEPA 2 and LanceDB

V-JEPA 2 is a self-supervised video model designed to enhance AI's understanding, prediction, and planning capabilities in real-world environments. The model is pre-trained on over one million hours of internet video using a mask-denoising objective in representation space, and achieves state-of-the-art performance in video understanding and human action anticipation.


Subsequently, an action-conditioned variant, V-JEPA 2-AC, is fine-tuned with a limited amount of robot interaction data, enabling zero-shot robotic planning for tasks like object manipulation. The research also highlights V-JEPA 2's effectiveness when integrated with a large language model for video question-answering tasks, achieving strong results.
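The video-search side of this example boils down to embedding-based retrieval: each video clip is encoded into a vector with V-JEPA 2, the vectors are stored in LanceDB, and a query is answered by nearest-neighbor search over those vectors. The following is a minimal, self-contained sketch of that retrieval step using cosine similarity over mock embeddings; the file names and three-dimensional vectors are illustrative stand-ins for real pooled V-JEPA 2 features, and in the actual pipeline LanceDB would handle the storage and indexing.

```python
import math

# Mock video embeddings standing in for pooled V-JEPA 2 features.
# In the real pipeline these vectors would be produced by the model
# and stored in a LanceDB table; plain Python keeps the sketch
# self-contained. All names and values here are hypothetical.
video_index = {
    "cooking.mp4": [0.9, 0.1, 0.0],
    "soccer.mp4":  [0.1, 0.9, 0.2],
    "concert.mp4": [0.0, 0.2, 0.9],
}

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def search(query_vec, index, k=2):
    """Return the k video names whose embeddings are closest to the query."""
    ranked = sorted(
        index.items(),
        key=lambda item: cosine_similarity(query_vec, item[1]),
        reverse=True,
    )
    return [name for name, _ in ranked[:k]]

# A query embedding close to the "cooking" vector retrieves it first.
results = search([1.0, 0.0, 0.1], video_index)
```

In practice the query vector would itself come from embedding a query clip (or, for text queries, from a model aligned with the video embedding space), and LanceDB's approximate-nearest-neighbor index replaces the exhaustive sort shown here.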