Find key moments in a video using natural language queries - powered by AI and running fully locally in your browser via transformers.js.
Go to the prebuilt GitHub Page: tobidi0410.github.io/ai-video-search/
Make sure you are running the latest Chrome Browser or any other browser with WebGPU support!
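If you are unsure whether your browser qualifies, a feature check along these lines can help. This is only a minimal sketch, not code from this app: `supportsWebGPU` is a hypothetical helper name, and it only tests for the presence of the standard `navigator.gpu` entry point. A browser can expose `navigator.gpu` and still fail to provide a usable adapter on some hardware.

```typescript
// Hypothetical helper (not part of this app): reports whether the given
// navigator-like object exposes the WebGPU entry point `gpu`.
function supportsWebGPU(nav: { gpu?: unknown } | undefined): boolean {
  return nav !== undefined && nav.gpu !== undefined && nav.gpu !== null;
}

// In a browser you would call it like this:
//   supportsWebGPU(navigator as { gpu?: unknown })
```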
This app uses the transformers.js library to run CLIP models directly in your browser.
Using the Xenova/clip-vit-large-patch14-336 model, we can generate both visual and textual embeddings.
We generate embeddings for each frame in the video as well as for the text query.
After that, we compare each frame embedding to the query embedding using cosine similarity (cosineSimilarity), which gives a match score per frame.
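The comparison step can be sketched roughly as follows. This is an illustration under assumptions, not the app's actual code: the `FrameEmbedding` shape and the `rankFrames` helper are hypothetical, and embeddings are treated as plain number arrays of equal length. Cosine similarity is the dot product of two vectors divided by the product of their lengths, so it ranges from -1 to 1, with higher values meaning a closer match.

```typescript
// Hypothetical shape: one embedding per sampled video frame.
interface FrameEmbedding {
  timeSec: number;     // timestamp of the frame within the video
  embedding: number[]; // image embedding, same length as the query embedding
}

// Cosine similarity: dot(a, b) / (|a| * |b|). Returns 0 for a zero vector.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error("Embeddings must have the same length");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  const denom = Math.sqrt(normA) * Math.sqrt(normB);
  return denom === 0 ? 0 : dot / denom;
}

// Scores every frame against the query and sorts best matches first.
function rankFrames(frames: FrameEmbedding[], query: number[]) {
  return frames
    .map((f) => ({
      timeSec: f.timeSec,
      score: cosineSimilarity(f.embedding, query),
    }))
    .sort((x, y) => y.score - x.score);
}
```

The first entries of the array returned by `rankFrames` are then the timestamps most likely to show what the query describes.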
- Xenova/clip-vit-large-patch14-336 - by Xenova
- transformers.js - by Hugging Face - Apache 2.0
- DaisyUI - by saadeghi - MIT
- Tailwind CSS - by Tailwind Labs - MIT
- Material Design Icons - by Google - Apache 2.0
- Parcel - by Parcel-Bundler - MIT