MohammadrezaChv/Contrastive_Learning

Contrastive Learning with Vision Transformers and CLIP

Overview

This repository demonstrates contrastive learning for image-text representation learning using Vision Transformers (ViT) and OpenAI's CLIP. Contrastive learning is a self-supervised approach that learns representations by pulling matching pairs (here, an image and its text description) closer together in embedding space while pushing mismatched pairs apart.
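
To make the "pull together, push apart" idea concrete, here is a minimal sketch of a symmetric contrastive loss in the style CLIP is trained with. It is an illustration and is not taken from the notebooks: the function name `clip_style_contrastive_loss`, the fixed `temperature=0.07`, and the random tensors standing in for embeddings are all assumptions.

```python
import torch
import torch.nn.functional as F


def clip_style_contrastive_loss(image_emb, text_emb, temperature=0.07):
    """Symmetric cross-entropy over an image-text similarity matrix.

    Row i of image_emb and row i of text_emb are assumed to be a matching
    pair; every other row in the batch acts as a negative.
    """
    # L2-normalize so the dot product equals cosine similarity.
    image_emb = F.normalize(image_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)

    # (batch, batch) similarity matrix, sharpened by the temperature.
    logits = image_emb @ text_emb.T / temperature

    # The matching pair for row i sits on the diagonal, at column i.
    targets = torch.arange(logits.size(0), device=logits.device)

    loss_i2t = F.cross_entropy(logits, targets)    # image -> text
    loss_t2i = F.cross_entropy(logits.T, targets)  # text -> image
    return (loss_i2t + loss_t2i) / 2


# Random stand-ins for real embeddings (assumption: batch of 8, 64 dims).
torch.manual_seed(0)
text = torch.randn(8, 64)
aligned = clip_style_contrastive_loss(text + 0.01 * torch.randn(8, 64), text)
shuffled = clip_style_contrastive_loss(text.roll(1, dims=0), text)
```

When the image embeddings line up with their text partners (`aligned`), the loss is low; when the pairing is shifted by one row (`shuffled`), the loss is much higher, which is the signal the optimizer follows.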

The repository contains three main modules:

  1. ViT – Implementation and fine-tuning of Vision Transformer models for image representation.

  2. CLIP – Zero-shot image-text modeling using OpenAI's CLIP, along with preprocessing and embedding extraction.

  3. Interacting with CLIP – Utilities for text-to-image retrieval, similarity computation, and visualization.

    Note: This module is adapted from OpenAI’s CLIP GitHub repository.


Repository Structure

Contrastive_Learning/
│
├── ViT.ipynb                   # Vision Transformer model and fine-tuning examples
├── CLIP.ipynb                  # Using CLIP for encoding and zero-shot classification
├── Interacting_with_CLIP.ipynb # Image-text retrieval, similarity scoring, and visualization (adapted from OpenAI)
└── README.md                   # Project documentation


Features

  • Vision Transformer (ViT)

    • Load pre-trained ViT models (ViT-B/32) for image classification.
    • Fine-tune the last transformer blocks using contrastive loss.
    • Freeze and unfreeze layers selectively for efficient training.
  • CLIP

    • Encode images and text into a shared embedding space.
    • Perform zero-shot classification using text prompts.
    • Compute cosine similarity between images and text for retrieval tasks.
  • Interacting with CLIP

    • Text-to-image search with real or simulated datasets.
    • Visualization of images along with predicted labels.
    • Utilities for working in Google Colab or local environments.

    Adapted from OpenAI CLIP GitHub.


Installation

  1. Clone the repository:
git clone https://github.com/MohammadrezaChv/Contrastive_Learning.git
cd Contrastive_Learning
  2. Create and activate a Python environment (optional but recommended):
python -m venv venv
source venv/bin/activate  # Linux/Mac
venv\Scripts\activate     # Windows
  3. Install dependencies:
pip install -r requirements.txt

Recommended Packages in requirements.txt:

torch>=2.0
torchvision
timm
clip @ git+https://github.com/openai/CLIP.git
scikit-image
matplotlib
Pillow
numpy
requests

Usage

1. ViT.ipynb

  • Load a pre-trained ViT model.
  • Freeze early layers and fine-tune selected transformer blocks.
  • Compute image embeddings for contrastive learning.
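
The freeze-then-unfreeze step can be sketched as follows. This is not the notebook's code: `freeze_all_but_last_blocks` is a hypothetical helper, and `TinyViT` is a small stand-in model used only so the example runs without downloading weights. The sketch assumes the model keeps its transformer blocks in an indexable `blocks` attribute, which is how timm's Vision Transformer models are laid out.

```python
import torch.nn as nn


def freeze_all_but_last_blocks(model, num_trainable_blocks=2):
    """Freeze every parameter, then unfreeze the last few transformer blocks.

    Assumes `model.blocks` is an indexable container of transformer blocks.
    Returns the list of parameters that remain trainable.
    """
    for param in model.parameters():
        param.requires_grad = False

    blocks = list(model.blocks)
    start = max(0, len(blocks) - num_trainable_blocks)
    for block in blocks[start:]:
        for param in block.parameters():
            param.requires_grad = True

    return [p for p in model.parameters() if p.requires_grad]


class TinyViT(nn.Module):
    """Hypothetical stand-in with a `blocks` attribute (not a real ViT)."""

    def __init__(self, dim=32, depth=4):
        super().__init__()
        self.patch_embed = nn.Linear(48, dim)
        self.blocks = nn.Sequential(*[
            nn.TransformerEncoderLayer(
                d_model=dim, nhead=4, dim_feedforward=64, batch_first=True
            )
            for _ in range(depth)
        ])
        self.norm = nn.LayerNorm(dim)

    def forward(self, patches):
        # Mean-pool the token outputs into one embedding per image.
        return self.norm(self.blocks(self.patch_embed(patches))).mean(dim=1)


model = TinyViT()
trainable = freeze_all_but_last_blocks(model, num_trainable_blocks=2)
```

With timm installed, the same helper should apply to a model created with `timm.create_model("vit_base_patch32_224", pretrained=True, num_classes=0)`, assuming that model name is available in your timm version. Passing only `trainable` to the optimizer keeps the frozen layers out of the update step.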

2. CLIP.ipynb

  • Load CLIP model and preprocess images and text.
  • Perform zero-shot classification or feature extraction.
  • Normalize embeddings and compute cosine similarity.
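
The normalize-then-compare step can also produce class probabilities instead of raw similarities. The sketch below is an assumption-laden illustration, not the notebook's code: `zero_shot_probabilities` is a hypothetical helper, the random tensors stand in for the outputs of `model.encode_image` and `model.encode_text`, and the scale of 100 follows the value used in the usage example of OpenAI's CLIP README.

```python
import torch
import torch.nn.functional as F


def zero_shot_probabilities(image_features, text_features, logit_scale=100.0):
    """Turn image/text embeddings into per-prompt probabilities.

    image_features: (num_images, dim); text_features: (num_prompts, dim).
    Returns a (num_images, num_prompts) tensor whose rows sum to 1.
    """
    # Cast to float32 (CLIP may return half precision on GPU) and L2-normalize.
    image_features = F.normalize(image_features.float(), dim=-1)
    text_features = F.normalize(text_features.float(), dim=-1)

    # Scaled cosine similarities, then softmax over the prompts.
    logits = logit_scale * image_features @ text_features.T
    return logits.softmax(dim=-1)


# Random stand-ins: three prompt embeddings and one image close to prompt 1.
torch.manual_seed(0)
text_features = torch.randn(3, 512)
image_features = text_features[[1]] + 0.1 * torch.randn(1, 512)
probs = zero_shot_probabilities(image_features, text_features)
```

Scaling before the softmax matters: cosine similarities live in [-1, 1], so without the scale the resulting probabilities would be nearly uniform.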

3. Interacting_with_CLIP.ipynb

  • Text-to-image retrieval example with sample datasets.
  • Visualize results with matplotlib.
  • Test fine-tuned CLIP models with custom image-text pairs.

Adapted from OpenAI’s official CLIP repository.
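
Text-to-image retrieval reduces to ranking a gallery of image embeddings by cosine similarity to one text embedding. The sketch below shows that ranking step only. It is not code from the notebook or from OpenAI's repository: `retrieve_top_k` is a hypothetical helper, and the random tensors stand in for real CLIP embeddings.

```python
import torch
import torch.nn.functional as F


def retrieve_top_k(text_feature, image_features, k=3):
    """Rank gallery images against a single text query.

    text_feature: (dim,) embedding of the query text.
    image_features: (num_images, dim) embeddings of the gallery.
    Returns a list of (image_index, cosine_similarity), best match first.
    """
    text_feature = F.normalize(text_feature.float(), dim=-1)
    image_features = F.normalize(image_features.float(), dim=-1)

    # One cosine similarity per gallery image.
    scores = image_features @ text_feature

    # Never ask for more results than there are images.
    k = min(k, scores.numel())
    top = torch.topk(scores, k)
    return list(zip(top.indices.tolist(), top.values.tolist()))


# Random stand-ins: a gallery of 10 images and a query close to image 4.
torch.manual_seed(0)
gallery = torch.randn(10, 512)
query = gallery[4] + 0.1 * torch.randn(512)
results = retrieve_top_k(query, gallery, k=3)
```

In practice the gallery embeddings would be computed once and cached, so each new text query costs a single matrix-vector product.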


Example: Zero-Shot Image Classification

import clip
import torch
from PIL import Image

# Candidate labels, written as text prompts
labels = ["a cat", "a dog"]

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

# Preprocess the image and add a batch dimension; tokenize the prompts
image = preprocess(Image.open("cat.jpg")).unsqueeze(0).to(device)
text = clip.tokenize(labels).to(device)

with torch.no_grad():
    image_features = model.encode_image(image)
    text_features = model.encode_text(text)
    # L2-normalize so the dot product below is cosine similarity
    image_features /= image_features.norm(dim=-1, keepdim=True)
    text_features /= text_features.norm(dim=-1, keepdim=True)
    similarity = (image_features @ text_features.T).squeeze(0)
    predicted_class = similarity.argmax().item()

print("Predicted class:", labels[predicted_class])

Contribution

Contributions are welcome!

  • Fork the repository
  • Create a new branch for your feature or bugfix
  • Submit a pull request with a clear description

License

This project is licensed under the MIT License – see the LICENSE file for details.

