CLIP (Contrastive Language-Image Pre-Training) is a neural network trained on a variety of (image, text) pairs. It can be instructed in natural language to predict the most relevant text snippet, given an image, without directly optimizing for the task, similarly to the zero-shot capabilities of GPT-2 and 3. We found CLIP matches the performance of the original ResNet50 on ImageNet "zero-shot" without using any of the original 1.28M labeled examples, overcoming several major challenges in computer vision.
This project uses the ViT-B/32 variant of the CLIP model to search videos and collections of images. Given an input text query, it retrieves the image or video frame that is most similar to that text.
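As a point of reference, here is a minimal sketch of the text-to-image matching that the scripts below build on. It assumes the official openai/clip package and a placeholder image path; the repository's own code may be organized differently.

```python
import clip
import torch
from PIL import Image

# Load the ViT-B/32 CLIP model (weights are downloaded on first use).
device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

# Placeholder inputs, not files shipped with the repository.
image = preprocess(Image.open("images/example.jpg")).unsqueeze(0).to(device)
text = clip.tokenize(["a photo of a dog"]).to(device)

with torch.no_grad():
    image_features = model.encode_image(image)
    text_features = model.encode_text(text)

# Normalize the embeddings so their dot product equals the cosine similarity.
image_features /= image_features.norm(dim=-1, keepdim=True)
text_features /= text_features.norm(dim=-1, keepdim=True)
similarity = (text_features @ image_features.T).item()
print(f"cosine similarity: {similarity:.3f}")
```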
- Clone the repository:
  $ git clone https://github.com/paulsung97/CLIP_model.git
- Install the required dependencies:
  $ pip install -r requirement.txt
- For videos, rename the file to `video.mp4` and place it in the video directory.
- Run the script for video search:
  $ python video_search.py
  Enter a text query, and the script will extract the frame most similar to it from the video and save it to the output file (see the video-search sketch after this list).
- Run the script for image search:
  $ python image_search.py
  The script retrieves the image from the images directory that is most similar to the input text and saves it to the output file (see the image-search sketch after this list).
- Run the script for cosine similarity:
  $ python cos.py
  The script saves a visual representation of the cosine similarity between the images and the text to the output file (see the similarity-plot sketch after this list).
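For reference, here is a rough video-search sketch of how the behavior described above could work, assuming frames are sampled from video/video.mp4 with OpenCV, scored against the query with CLIP, and the best frame is written to an output directory. The sampling rate, paths, and filenames are assumptions, not the repository's actual code.

```python
import os

import clip
import cv2
import torch
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

query = input("Enter text: ")
cap = cv2.VideoCapture("video/video.mp4")  # assumed input path
best_score, best_frame, frame_idx = -1.0, None, 0

with torch.no_grad():
    text_features = model.encode_text(clip.tokenize([query]).to(device))
    text_features /= text_features.norm(dim=-1, keepdim=True)

    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if frame_idx % 30 == 0:  # score every 30th frame (assumed sampling rate)
            rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
            image = preprocess(Image.fromarray(rgb)).unsqueeze(0).to(device)
            image_features = model.encode_image(image)
            image_features /= image_features.norm(dim=-1, keepdim=True)
            score = (text_features @ image_features.T).item()
            if score > best_score:
                best_score, best_frame = score, frame
        frame_idx += 1

cap.release()
if best_frame is not None:
    os.makedirs("output", exist_ok=True)
    cv2.imwrite("output/most_similar_frame.jpg", best_frame)  # assumed output path
```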
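An image-search sketch along the same lines, assuming the candidate images sit directly in an images/ directory and the best match is copied into output/; again, the layout and names are placeholders.

```python
import glob
import os
import shutil

import clip
import torch
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

query = input("Enter text: ")
paths = sorted(glob.glob("images/*"))  # assumed image directory

with torch.no_grad():
    text_features = model.encode_text(clip.tokenize([query]).to(device))
    images = torch.stack([preprocess(Image.open(p).convert("RGB")) for p in paths]).to(device)
    image_features = model.encode_image(images)

    # Cosine similarity between the query and every candidate image.
    text_features /= text_features.norm(dim=-1, keepdim=True)
    image_features /= image_features.norm(dim=-1, keepdim=True)
    scores = (text_features @ image_features.T).squeeze(0)

best = scores.argmax().item()
os.makedirs("output", exist_ok=True)
shutil.copy(paths[best], "output/")  # assumed output directory
print(f"Best match: {paths[best]} ({scores[best].item():.3f})")
```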
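Finally, a similarity-plot sketch: a matrix of cosine similarities between a few text prompts and the images, rendered with matplotlib. The prompts, paths, and output filename are placeholders.

```python
import glob
import os

import clip
import matplotlib.pyplot as plt
import torch
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

prompts = ["a photo of a dog", "a photo of a cat", "a photo of a car"]  # placeholder prompts
paths = sorted(glob.glob("images/*"))  # assumed image directory

with torch.no_grad():
    text_features = model.encode_text(clip.tokenize(prompts).to(device))
    images = torch.stack([preprocess(Image.open(p).convert("RGB")) for p in paths]).to(device)
    image_features = model.encode_image(images)

    text_features /= text_features.norm(dim=-1, keepdim=True)
    image_features /= image_features.norm(dim=-1, keepdim=True)
    similarity = (text_features @ image_features.T).float().cpu().numpy()  # prompts x images

# Render the prompt-by-image similarity matrix as an annotated grid.
fig, ax = plt.subplots()
im = ax.imshow(similarity)
ax.set_xticks(range(len(paths)))
ax.set_xticklabels([os.path.basename(p) for p in paths], rotation=45, ha="right")
ax.set_yticks(range(len(prompts)))
ax.set_yticklabels(prompts)
fig.colorbar(im, ax=ax)
fig.tight_layout()
os.makedirs("output", exist_ok=True)
fig.savefig("output/cosine_similarity.png")  # assumed output path
```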