Video Search with V-JEPA 2 and LanceDB

V-JEPA 2 is a self-supervised video model designed to improve an AI system's ability to understand, predict, and plan in real-world environments. It is first pre-trained on over one million hours of internet video using a mask-denoising objective in representation space, and it achieves state-of-the-art performance on video understanding and human action anticipation.
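To make "mask-denoising in representation space" more concrete, here is a toy NumPy sketch. It is not the V-JEPA 2 implementation: the function name `masked_latent_loss`, the linear "encoders", the mean-pooling predictor, and the choice of an L1 loss are all assumptions made for illustration. The point it tries to show is that the model is scored on how well it predicts the representations of hidden tokens, rather than their pixels.

```python
import numpy as np

def masked_latent_loss(tokens, mask, encode_context, encode_target, predict):
    """L1 loss between predicted and target representations at masked positions.

    tokens: (num_tokens, dim) array of patch features for one clip.
    mask:   (num_tokens,) boolean array, True where a token is hidden.
    All three callables are hypothetical stand-ins for real networks.
    """
    # Target representations are computed from the full, unmasked clip.
    targets = encode_target(tokens)
    # The context encoder sees only visible tokens; hidden ones are zeroed
    # here as a simple stand-in for dropping them from the input.
    visible = np.where(mask[:, None], 0.0, tokens)
    context = encode_context(visible)
    # The predictor guesses a representation for every position from context.
    predicted = predict(context)
    # Only masked positions contribute, and the comparison happens in
    # representation space, not pixel space.
    return float(np.abs(predicted[mask] - targets[mask]).mean())

rng = np.random.default_rng(0)
num_tokens, dim = 16, 8
tokens = rng.normal(size=(num_tokens, dim))

# Hide a contiguous block of tokens (illustrative masking pattern).
mask = np.zeros(num_tokens, dtype=bool)
mask[4:10] = True

# Toy linear maps standing in for the encoder and predictor networks.
w_enc = rng.normal(size=(dim, dim))
w_pred = rng.normal(size=(dim, dim))
encode = lambda x: x @ w_enc
predict = lambda z: np.broadcast_to(z.mean(axis=0) @ w_pred, z.shape)

loss = masked_latent_loss(tokens, mask, encode, encode, predict)
print(f"toy masked-latent loss: {loss:.4f}")
```

In this sketch the same linear map is reused as both context and target encoder for brevity; a real training setup would use learned networks and update them to drive this loss down.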

Subsequently, an action-conditioned variant, V-JEPA 2-AC, is fine-tuned with a limited amount of robot interaction data, enabling zero-shot robotic planning for tasks like object manipulation. The research also highlights V-JEPA 2's effectiveness when integrated with a large language model for video question-answering tasks, achieving strong results.