| ![]() |
|---|
| Image Credit: https://github.com/openai/CLIP |
This is the pre-training phase. The model is fed a vast dataset of paired images and text descriptions scraped from the internet. The goal is not to predict the text from the image, but to learn which image-text pairs actually belong together.
An Image Encoder processes an image and converts it into a numerical representation called an image embedding. A Text Encoder does the same for a text description (e.g., "a photo of a dog"), creating a text embedding.
The model's objective is to adjust both encoders so that the embeddings of a matching image-text pair have a high similarity score (CLIP uses cosine similarity), while the embeddings of mismatched pairs have a low similarity score. This is a contrastive learning approach.
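In code, this objective can be sketched as a symmetric cross-entropy loss over the pairwise cosine similarities within a batch, in the spirit of the pseudocode from the CLIP paper. This is a minimal sketch, not CLIP's actual implementation: the function name, the fixed temperature value, and the assumption that the two embedding tensors are already produced by the encoders are all illustrative choices.

```python
import torch
import torch.nn.functional as F

def clip_contrastive_loss(image_embeds, text_embeds, temperature=0.07):
    """Symmetric contrastive loss over a batch of matching image/text embeddings.

    image_embeds, text_embeds: tensors of shape [batch, dim], where row i of
    each tensor comes from the same image-text pair.
    """
    # L2-normalise so the dot product becomes cosine similarity
    image_embeds = F.normalize(image_embeds, dim=-1)
    text_embeds = F.normalize(text_embeds, dim=-1)

    # Pairwise similarity matrix, scaled by a temperature: shape [batch, batch]
    logits = image_embeds @ text_embeds.t() / temperature

    # The correct pairing lies on the diagonal: image i matches text i
    targets = torch.arange(logits.shape[0], device=logits.device)

    # Cross-entropy in both directions (image -> text and text -> image)
    loss_i2t = F.cross_entropy(logits, targets)
    loss_t2i = F.cross_entropy(logits.t(), targets)
    return (loss_i2t + loss_t2i) / 2
```

Maximising the diagonal entries while suppressing the off-diagonal ones is exactly the "pull matching pairs together, push mismatched pairs apart" behaviour described above.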
After pre-training, the model can perform a task it was never explicitly trained for—zero-shot classification—without any fine-tuning.
1. A list of potential labels (e.g., "plane", "car", "dog", "bird") is turned into sentences like "A photo of a {object}."
2. The Text Encoder creates an embedding for each of these sentences.
3. A new, unseen image is given to the Image Encoder to get its embedding.
4. The model calculates the similarity score between the new image embedding and each of the label-text embeddings.
5. The highest similarity score indicates the most likely correct label. For instance, in the example shown, the similarity score between the image of the dog and the text "A photo of a dog." is the highest, and the model correctly predicts "dog" (see the sketch after this list).
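These steps can be sketched with the open-source `clip` package from the repository credited above. Treat this as an illustrative example rather than a recipe: the model variant ("ViT-B/32"), the image path ("dog.jpg"), and the label list are placeholder choices.

```python
import torch
import clip  # pip install git+https://github.com/openai/CLIP.git
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

# Steps 1-2: turn candidate labels into sentences and tokenize them
labels = ["plane", "car", "dog", "bird"]  # placeholder label set
prompts = clip.tokenize([f"A photo of a {label}." for label in labels]).to(device)

# Step 3: embed a new, unseen image (placeholder path)
image = preprocess(Image.open("dog.jpg")).unsqueeze(0).to(device)

with torch.no_grad():
    image_features = model.encode_image(image)
    text_features = model.encode_text(prompts)

    # Step 4: cosine similarity between the image and every label sentence
    image_features = image_features / image_features.norm(dim=-1, keepdim=True)
    text_features = text_features / text_features.norm(dim=-1, keepdim=True)
    similarity = (image_features @ text_features.T).softmax(dim=-1)

# Step 5: the highest score gives the predicted label
predicted = labels[similarity.argmax().item()]
print(predicted)  # for a dog photo, this should print "dog"
```

Note that no fine-tuning happens here: the label set is defined entirely at inference time, which is what makes the classification "zero-shot".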
