I-JEPA Image Reconstruction

This project implements an embedding inversion attack on I-JEPA (Image-based Joint-Embedding Predictive Architecture), a self-supervised vision architecture from Meta AI. A convolutional decoder network learns to reconstruct images from their I-JEPA feature representations, demonstrating that such embeddings can be vulnerable to inversion attacks.

Features

  • I-JEPA-based feature extraction using the pretrained facebook/ijepa_vith14_1k model (see the sketch after this list)
  • Convolutional decoder with residual blocks for high-quality reconstruction
  • Multi-component loss function (MSE, SSIM, LPIPS, TV, Cosine)
  • Comprehensive evaluation metrics
  • Visualization tools for reconstruction quality assessment
  • Support for CIFAR-10 and Tiny ImageNet datasets
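
As referenced above, here is a minimal sketch of the extraction step using the Hugging Face transformers API (assuming a release with I-JEPA support; the image path is illustrative and the repository's own loaders in src/data/ may differ):

import torch
from PIL import Image
from transformers import AutoModel, AutoProcessor

# Load the pretrained I-JEPA ViT-H/14 encoder; it stays frozen -- we only invert its outputs.
processor = AutoProcessor.from_pretrained("facebook/ijepa_vith14_1k")
encoder = AutoModel.from_pretrained("facebook/ijepa_vith14_1k").eval()

image = Image.open("example.jpg").convert("RGB")  # any RGB image
inputs = processor(images=image, return_tensors="pt")

with torch.no_grad():
    # last_hidden_state: (B, 256, 1280) -- one 1280-d vector per 14x14 patch
    patch_embeddings = encoder(**inputs).last_hidden_state

# Rearrange the 256 patch tokens into the (B, 1280, 16, 16) grid the decoder consumes.
b, n, c = patch_embeddings.shape
grid = patch_embeddings.transpose(1, 2).reshape(b, c, 16, 16)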

Project Structure

ijepa-reconstruction/
├── src/
│   ├── models/          # Model architectures
│   ├── data/            # Data loading and preprocessing
│   ├── utils/           # Utilities for metrics and visualization
│   └── training/        # Training logic
├── scripts/             # Executable scripts
├── configs/             # Configuration files
├── cache/               # Cached embeddings (gitignored)
├── checkpoints/         # Model checkpoints (gitignored)
└── results/             # Output results (gitignored)

Installation

# Clone the repository
git clone https://github.com/aymen-000/i-jeap-emdedding-inversion-attack
cd i-jeap-emdedding-inversion-attack

# Install dependencies
pip install -r requirements.txt

Usage

1. Precompute I-JEPA Embeddings

First, compute and cache the I-JEPA embeddings for your dataset (optional, but it speeds up training by avoiding a full encoder forward pass every epoch):

# For CIFAR-10
python scripts/compute_embeddings.py \
    --dataset cifar10 \
    --train_subset 6000 \
    --test_subset 2000 \
    --batch_size 8 \
    --cache_dir ./cache

# For Tiny ImageNet
python scripts/compute_embeddings.py \
    --dataset tiny_imagenet \
    --data_root /path/to/tiny-imagenet-200/train \
    --train_subset 6000 \
    --test_subset 2000 \
    --batch_size 8 \
    --cache_dir ./cache
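
Conceptually, caching amounts to running the frozen encoder once over the dataset and saving the resulting tensors. A rough, self-contained sketch of what such a script does (the normalization stats and output file name are assumptions, not the script's actual internals):

import torch
from torch.utils.data import DataLoader
from torchvision import datasets, transforms
from transformers import AutoModel

encoder = AutoModel.from_pretrained("facebook/ijepa_vith14_1k").eval()

# Upsample CIFAR-10 to the 224x224 input I-JEPA expects; ImageNet stats assumed here.
transform = transforms.Compose([
    transforms.Resize(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])
dataset = datasets.CIFAR10("./data", train=False, download=True, transform=transform)
loader = DataLoader(dataset, batch_size=8, shuffle=False)

chunks = []
with torch.no_grad():
    for images, _ in loader:
        out = encoder(pixel_values=images).last_hidden_state  # (B, 256, 1280)
        chunks.append(out.cpu())

# Cache once, reuse across every training run.
torch.save(torch.cat(chunks), "./cache/cifar10_test_embeddings.pt")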

2. Train the Reconstruction Model

Train the decoder to reconstruct images from embeddings:

python scripts/train.py \
    --dataset cifar10 \
    --train_subset 6000 \
    --test_subset 2000 \
    --batch_size 32 \
    --epochs 100 \
    --checkpoint_dir ./checkpoints \
    --output_dir ./results
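
In outline, training pairs each cached embedding with its source image and optimizes the decoder against the reconstruction loss. A toy sketch of the loop (the stand-in decoder, MSE-only loss, and cache file names are placeholders for the real code in src/):

import torch
import torch.nn.functional as F
from torch.utils.data import DataLoader, TensorDataset

# Toy stand-in decoder with one upsampling step; the repository's model lives in src/models/.
decoder = torch.nn.Sequential(
    torch.nn.ConvTranspose2d(1280, 3, kernel_size=4, stride=2, padding=1),  # 16x16 -> 32x32
    torch.nn.Sigmoid(),
)

# Hypothetical cache layout: embeddings alongside their 32x32 source images.
embeddings = torch.load("./cache/cifar10_train_embeddings.pt")  # (N, 256, 1280)
images = torch.load("./cache/cifar10_train_images.pt")          # (N, 3, 32, 32)
loader = DataLoader(TensorDataset(embeddings, images), batch_size=32, shuffle=True)

optimizer = torch.optim.Adam(decoder.parameters(), lr=1e-4)
for epoch in range(100):
    for emb, img in loader:
        grid = emb.transpose(1, 2).reshape(-1, 1280, 16, 16)
        recon = decoder(grid)
        loss = F.mse_loss(recon, img)  # the real loss adds the SSIM, LPIPS, TV, and cosine terms described below
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()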

3. Evaluate the Model

Compute reconstruction metrics on the test set:

python scripts/evaluate.py \
    --dataset cifar10 \
    --checkpoint ./checkpoints/cifar10_model_final.pth \
    --embeddings_file ./cache/cifar10_test_embeddings.pt \
    --output_csv ./results/metrics.csv \
    --test_subset 2000 \
    --train_size 6000 \
    --num_epochs 100
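
The metrics themselves are standard; a sketch of how one batch of them can be computed with the lpips and torchmetrics packages (the evaluation script's exact implementation may differ):

import torch
import torch.nn.functional as F
import lpips
from torchmetrics.functional import structural_similarity_index_measure as ssim

lpips_fn = lpips.LPIPS(net="alex")  # matches the AlexNet-based LPIPS used for training

def reconstruction_metrics(recon, target):
    # recon/target: (B, 3, H, W) tensors with values in [0, 1]
    with torch.no_grad():
        return {
            "mse": F.mse_loss(recon, target).item(),
            "ssim": ssim(recon, target).item(),
            "lpips": lpips_fn(recon * 2 - 1, target * 2 - 1).mean().item(),  # lpips wants [-1, 1]
            "cosine": F.cosine_similarity(recon.flatten(1), target.flatten(1)).mean().item(),
        }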

4. Visualize Results

Generate side-by-side comparisons of original and reconstructed images:

python scripts/visualize.py \
    --dataset cifar10 \
    --checkpoint ./checkpoints/cifar10_model_final.pth \
    --embeddings_file ./cache/cifar10_test_embeddings.pt \
    --num_samples 10 \
    --output_dir ./results
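
A minimal matplotlib helper along these lines produces the side-by-side grids (function name and output path are illustrative, not the script's API):

import torch
import matplotlib.pyplot as plt

def show_pairs(originals, recons, n=10, out_path="./results/reconstructions.png"):
    # originals/recons: (N, 3, H, W) tensors with values in [0, 1]
    fig, axes = plt.subplots(2, n, figsize=(2 * n, 4.5))
    for i in range(n):
        axes[0, i].imshow(originals[i].permute(1, 2, 0).cpu().numpy())
        axes[1, i].imshow(recons[i].permute(1, 2, 0).clamp(0, 1).cpu().numpy())
        axes[0, i].axis("off")
        axes[1, i].axis("off")
    axes[0, 0].set_title("original", loc="left")
    axes[1, 0].set_title("reconstruction", loc="left")
    fig.savefig(out_path, bbox_inches="tight")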

Model Architecture

Decoder Architecture

The decoder uses a flexible progressive upsampling architecture whose depth adapts to the target resolution (any power-of-two multiple of the 16×16 embedding grid):

Input Embeddings (B, 1280, 16, 16)
          |
          v
    ┌─────────────────────┐
    │  Upsampling Block 1 │
    │  ConvTranspose2d    │
    │  1280 → 640 ch      │
    │  16×16 → 32×32      │
    └──────────┬──────────┘
               |
               v
    ┌─────────────────────┐
    │  Residual Block     │
    │  Conv → BN → GELU   │
    │  Conv → BN          │
    │  + Skip Connection  │
    └──────────┬──────────┘
               |
               v
    ┌─────────────────────┐
    │  Upsampling Block 2 │
    │  ConvTranspose2d    │
    │  640 → 320 ch       │
    │  32×32 → 64×64      │
    └──────────┬──────────┘
               |
               v
    ┌─────────────────────┐
    │  Residual Block     │
    └──────────┬──────────┘
               |
               v
         (Continue for
      log₂(output_size/16)
          iterations)
               |
               v
    ┌─────────────────────┐
    │  Final Conv Layer   │
    │  C → 3 channels     │
    │  3×3 kernel         │
    └──────────┬──────────┘
               |
               v
    ┌─────────────────────┐
    │ Sigmoid Activation  │
    └──────────┬──────────┘
               |
               v
Reconstructed Image (B, 3, output_size, output_size)

Key Features:

  • Dynamic Resolution: Automatically derives the number of upsampling stages from the input and output sizes
  • Progressive Channel Reduction: Each stage halves channels (1280→640→320→160...)
  • Residual Connections: Improves gradient flow and reconstruction quality
  • Fast GELU Activation: x * σ(1.702x) for efficient non-linear transformations
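
Putting the diagram and the list above together, here is a sketch of such a decoder in PyTorch (consistent with the figures shown, but not necessarily identical to the code in src/models/):

import math
import torch
import torch.nn as nn

class FastGELU(nn.Module):
    # x * sigmoid(1.702 * x), the fast GELU approximation listed above
    def forward(self, x):
        return x * torch.sigmoid(1.702 * x)

class ResidualBlock(nn.Module):
    # Conv -> BN -> GELU -> Conv -> BN, plus identity skip connection
    def __init__(self, ch):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(ch, ch, 3, padding=1), nn.BatchNorm2d(ch), FastGELU(),
            nn.Conv2d(ch, ch, 3, padding=1), nn.BatchNorm2d(ch),
        )

    def forward(self, x):
        return x + self.body(x)

class Decoder(nn.Module):
    def __init__(self, in_ch=1280, output_size=64):
        super().__init__()
        layers, ch = [], in_ch
        # one upsampling stage per spatial doubling from 16x16 to output_size
        for _ in range(int(math.log2(output_size // 16))):
            layers += [
                nn.ConvTranspose2d(ch, ch // 2, 4, stride=2, padding=1),  # halve channels, double H/W
                ResidualBlock(ch // 2),
            ]
            ch //= 2
        layers += [nn.Conv2d(ch, 3, 3, padding=1), nn.Sigmoid()]  # final 3x3 conv to RGB
        self.net = nn.Sequential(*layers)

    def forward(self, x):   # x: (B, 1280, 16, 16)
        return self.net(x)  # (B, 3, output_size, output_size)

With output_size=64, a (2, 1280, 16, 16) input comes out as (2, 3, 64, 64), matching the diagram above.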

Loss Function

The model uses a weighted combination of five loss components:

  • MSE Loss (weight: 1.0): Pixel-level reconstruction accuracy
  • SSIM Loss (weight: 0.1): Structural similarity
  • LPIPS Loss (weight: 0.1): Perceptual similarity (AlexNet-based)
  • TV Loss (weight: 0.01): Total variation (encourages smoothness)
  • Cosine Loss (weight: 0.01): Feature space similarity
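
A sketch of the weighted combination using the lpips package and torchmetrics' SSIM. One caveat: the list above describes the cosine term as feature-space similarity; for brevity this sketch compares flattened pixels, so treat that line as a placeholder for a comparison of encoder features:

import torch
import torch.nn.functional as F
import lpips
from torchmetrics.functional import structural_similarity_index_measure as ssim

lpips_fn = lpips.LPIPS(net="alex")  # AlexNet-based perceptual distance

def total_variation(x):
    # mean absolute difference between neighboring pixels; encourages smoothness
    return ((x[..., :, 1:] - x[..., :, :-1]).abs().mean()
            + (x[..., 1:, :] - x[..., :-1, :]).abs().mean())

def combined_loss(recon, target):
    # recon/target in [0, 1]; similarity metrics become losses via (1 - metric)
    return (1.0 * F.mse_loss(recon, target)
            + 0.1 * (1 - ssim(recon, target))
            + 0.1 * lpips_fn(recon * 2 - 1, target * 2 - 1).mean()  # lpips wants [-1, 1]
            + 0.01 * total_variation(recon)
            + 0.01 * (1 - F.cosine_similarity(recon.flatten(1),
                                              target.flatten(1)).mean()))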

Results

Example results on Tiny ImageNet (10,000 training images, 2,000 test images, 50 epochs):

Metric              Value
MSE                 0.127
Cosine Similarity   0.7851
LPIPS               0.0805
SSIM                0.7026

[Image: reconstruction results on Tiny ImageNet]

Citation

@article{assran2023self,
  title={Self-Supervised Learning from Images with a Joint-Embedding Predictive Architecture},
  author={Assran, Mahmoud and Duval, Quentin and Misra, Ishan and Bojanowski, Piotr and Vincent, Pascal and Rabbat, Michael and LeCun, Yann and Ballas, Nicolas},
  journal={arXiv preprint arXiv:2301.08243},
  year={2023}
}

License

MIT License

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.
