This project implements an image reconstruction system using IJePA (Image-based Joint-Embedding Predictive Architecture) embeddings. The model learns to reconstruct images from their IJePA feature representations using a convolutional decoder network.
- IJePA-based feature extraction using the pretrained `facebook/ijepa_vith14_1k` model (see the embedding sketch after this list)
- Convolutional decoder with residual blocks for high-quality reconstruction
- Multi-component loss function (MSE, SSIM, LPIPS, TV, Cosine)
- Comprehensive evaluation metrics
- Visualization tools for reconstruction quality assessment
- Support for CIFAR-10 and Tiny ImageNet datasets
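As a rough illustration of the feature-extraction step, the snippet below pulls IJePA patch embeddings through Hugging Face transformers. It is a minimal sketch, not the project's `scripts/compute_embeddings.py`: the `AutoProcessor`/`AutoModel` usage, the example image path, and the reshape into a spatial grid (inferred from the decoder's expected input shape of (B, 1280, 16, 16)) are assumptions.

```python
# Minimal sketch: extract IJePA patch embeddings with Hugging Face transformers.
# The processor/model classes and the example image path are assumptions; the
# project's own preprocessing lives in scripts/compute_embeddings.py.
import torch
from PIL import Image
from transformers import AutoModel, AutoProcessor

model_id = "facebook/ijepa_vith14_1k"
processor = AutoProcessor.from_pretrained(model_id)
model = AutoModel.from_pretrained(model_id).eval()

image = Image.open("example.jpg").convert("RGB")   # hypothetical input image
inputs = processor(images=image, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

# One embedding per image patch: (batch, num_patches, hidden_dim), e.g. (1, 256, 1280)
tokens = outputs.last_hidden_state

# Arrange the 256 patch tokens into the (B, 1280, 16, 16) grid the decoder consumes
grid = tokens.transpose(1, 2).reshape(tokens.size(0), -1, 16, 16)
print(grid.shape)
```

For ViT-H/14 at 224×224 input there are 16×16 = 256 patches of dimension 1280, which matches the decoder's input grid.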
Project structure:

```
ijepa-reconstruction/
├── src/
│   ├── models/        # Model architectures
│   ├── data/          # Data loading and preprocessing
│   ├── utils/         # Utilities for metrics and visualization
│   └── training/      # Training logic
├── scripts/           # Executable scripts
├── configs/           # Configuration files
├── cache/             # Cached embeddings (gitignored)
├── checkpoints/       # Model checkpoints (gitignored)
└── results/           # Output results (gitignored)
```
```bash
# Clone the repository
git clone https://github.com/aymen-000/i-jeap-emdedding-inversion-attack
cd i-jeap-emdedding-inversion-attack
# Install dependencies
pip install -r requirements.txt
```

First, compute and cache the IJePA embeddings for your dataset (optional, but it speeds up training):

```bash
# For CIFAR-10
python scripts/compute_embeddings.py \
--dataset cifar10 \
--train_subset 6000 \
--test_subset 2000 \
--batch_size 8 \
--cache_dir ./cache
# For Tiny ImageNet
python scripts/compute_embeddings.py \
--dataset tiny_imagenet \
--data_root /path/to/tiny-imagenet-200/train \
--train_subset 6000 \
--test_subset 2000 \
--batch_size 8 \
--cache_dir ./cache
```

Train the decoder to reconstruct images from embeddings:

```bash
python scripts/train.py \
--dataset cifar10 \
--train_subset 6000 \
--test_subset 2000 \
--batch_size 32 \
--epochs 100 \
--checkpoint_dir ./checkpoints \
--output_dir ./results
```

Compute reconstruction metrics on the test set:

```bash
python scripts/evaluate.py \
--dataset cifar10 \
--checkpoint ./checkpoints/cifar10_model_final.pth \
--embeddings_file ./cache/cifar10_test_embeddings.pt \
--output_csv ./results/metrics.csv \
--test_subset 2000 \
--train_size 6000 \
--num_epochs 100
```

Generate side-by-side comparisons of original and reconstructed images:

```bash
python scripts/visualize.py \
--dataset cifar10 \
--checkpoint ./checkpoints/cifar10_model_final.pth \
--embeddings_file ./cache/cifar10_test_embeddings.pt \
--num_samples 10 \
--output_dir ./results
```

The decoder uses a flexible progressive upsampling architecture that supports arbitrary output resolutions (a PyTorch sketch of the decoder follows the feature list below):

```
Input Embeddings (B, 1280, 16, 16)
           |
           v
┌─────────────────────┐
│ Upsampling Block 1  │
│   ConvTranspose2d   │
│    1280 → 640 ch    │
│    16×16 → 32×32    │
└──────────┬──────────┘
           |
           v
┌─────────────────────┐
│   Residual Block    │
│  Conv → BN → GELU   │
│      Conv → BN      │
│  + Skip Connection  │
└──────────┬──────────┘
           |
           v
┌─────────────────────┐
│ Upsampling Block 2  │
│   ConvTranspose2d   │
│    640 → 320 ch     │
│    32×32 → 64×64    │
└──────────┬──────────┘
           |
           v
┌─────────────────────┐
│   Residual Block    │
└──────────┬──────────┘
           |
           v
 (continue for log₂(output_size/16) iterations)
           |
           v
┌─────────────────────┐
│  Final Conv Layer   │
│   C → 3 channels    │
│     3×3 kernel      │
└──────────┬──────────┘
           |
           v
┌─────────────────────┐
│ Sigmoid Activation  │
└──────────┬──────────┘
           |
           v
Reconstructed Image (B, 3, output_size, output_size)
```

Key Features:
- Dynamic Resolution: Automatically calculates upsampling layers based on input/output size
- Progressive Channel Reduction: Each stage halves channels (1280→640→320→160...)
- Residual Connections: Improves gradient flow and reconstruction quality
- Fast GELU Activation: x * σ(1.702x) for efficient non-linear transformations
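Below is a minimal PyTorch sketch of a decoder following this recipe. It is illustrative only: kernel sizes, padding, and the channel floor are assumptions, and the project's actual decoder lives in `src/models/`.

```python
# Illustrative sketch of the progressive-upsampling decoder described above.
# Kernel sizes, padding, and the minimum channel count are assumptions; the
# project's actual decoder lives in src/models/.
import math
import torch
import torch.nn as nn


class FastGELU(nn.Module):
    """Fast GELU approximation: x * sigmoid(1.702 * x)."""
    def forward(self, x):
        return x * torch.sigmoid(1.702 * x)


class ResidualBlock(nn.Module):
    """Conv -> BN -> GELU -> Conv -> BN, with a skip connection."""
    def __init__(self, channels: int):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.BatchNorm2d(channels),
            FastGELU(),
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.BatchNorm2d(channels),
        )

    def forward(self, x):
        return x + self.body(x)


class ProgressiveDecoder(nn.Module):
    """Upsamples a (B, 1280, 16, 16) embedding grid to (B, 3, output_size, output_size)."""
    def __init__(self, in_channels: int = 1280, in_size: int = 16, output_size: int = 64):
        super().__init__()
        num_stages = int(math.log2(output_size // in_size))  # e.g. 64 / 16 -> 2 stages
        stages, ch = [], in_channels
        for _ in range(num_stages):
            out_ch = max(ch // 2, 32)  # halve channels each stage (floor of 32 is an assumption)
            stages += [
                nn.ConvTranspose2d(ch, out_ch, kernel_size=4, stride=2, padding=1),  # 2x upsample
                ResidualBlock(out_ch),
            ]
            ch = out_ch
        self.stages = nn.Sequential(*stages)
        self.head = nn.Sequential(nn.Conv2d(ch, 3, kernel_size=3, padding=1), nn.Sigmoid())

    def forward(self, x):
        return self.head(self.stages(x))


if __name__ == "__main__":
    decoder = ProgressiveDecoder(output_size=64)
    dummy_embeddings = torch.randn(2, 1280, 16, 16)  # stand-in for cached IJePA features
    print(decoder(dummy_embeddings).shape)  # torch.Size([2, 3, 64, 64])
```

With a 16×16 embedding grid, CIFAR-10 (32×32) needs one upsampling stage and Tiny ImageNet (64×64) needs two.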
The model uses a weighted combination of five loss components (a sketch of the combined loss follows the list):
- MSE Loss (weight: 1.0): Pixel-level reconstruction accuracy
- SSIM Loss (weight: 0.1): Structural similarity
- LPIPS Loss (weight: 0.1): Perceptual similarity (AlexNet-based)
- TV Loss (weight: 0.01): Total variation (encourages smoothness)
- Cosine Loss (weight: 0.01): Feature space similarity
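A minimal sketch of how these terms might be combined is shown below. It assumes the third-party `lpips` and `pytorch_msssim` packages, and the tensors fed to the cosine term (generic feature vectors here) are an assumption; the project's actual loss lives in `src/training/`.

```python
# Sketch of the weighted multi-component reconstruction loss. Assumes the
# third-party `lpips` and `pytorch_msssim` packages; what exactly the cosine
# term is computed on is an assumption (see src/training/ for the real loss).
import torch
import torch.nn.functional as F
import lpips                      # pip install lpips
from pytorch_msssim import ssim   # pip install pytorch-msssim

lpips_fn = lpips.LPIPS(net="alex")  # AlexNet-based perceptual loss


def total_variation(x: torch.Tensor) -> torch.Tensor:
    """Mean absolute difference between neighbouring pixels (encourages smoothness)."""
    dh = (x[:, :, 1:, :] - x[:, :, :-1, :]).abs().mean()
    dw = (x[:, :, :, 1:] - x[:, :, :, :-1]).abs().mean()
    return dh + dw


def reconstruction_loss(pred, target, pred_feats, target_feats,
                        w_mse=1.0, w_ssim=0.1, w_lpips=0.1, w_tv=0.01, w_cos=0.01):
    """Weighted sum of MSE, SSIM, LPIPS, TV, and cosine losses.

    `pred` / `target` are images in [0, 1]; `pred_feats` / `target_feats` are the
    feature vectors used for the cosine term (assumed here, e.g. IJePA embeddings).
    """
    mse = F.mse_loss(pred, target)
    ssim_loss = 1.0 - ssim(pred, target, data_range=1.0)
    lpips_loss = lpips_fn(pred * 2 - 1, target * 2 - 1).mean()  # LPIPS expects inputs in [-1, 1]
    tv = total_variation(pred)
    cos = 1.0 - F.cosine_similarity(pred_feats.flatten(1), target_feats.flatten(1)).mean()
    return w_mse * mse + w_ssim * ssim_loss + w_lpips * lpips_loss + w_tv * tv + w_cos * cos
```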
Example results on Tiny ImageNet (10000 train, 2000 test, 50 epochs):
| Metric | Value |
|---|---|
| MSE | 0.127 |
| Cosine Similarity | 0.7851 |
| LPIPS | 0.0805 |
| SSIM | 0.7026 |
This project builds on I-JEPA (Assran et al., 2023):

```bibtex
@article{assran2023self,
title={Self-Supervised Learning from Images with a Joint-Embedding Predictive Architecture},
author={Assran, Mahmoud and Duval, Quentin and Misra, Ishan and Bojanowski, Piotr and Vincent, Pascal and Rabbat, Michael and LeCun, Yann and Ballas, Nicolas},
journal={arXiv preprint arXiv:2301.08243},
year={2023}
}
```

This project is released under the MIT License.
Contributions are welcome! Please feel free to submit a Pull Request.
