Master's Project at Albert-Ludwigs-Universität Freiburg
Computer Vision Group | Supervised by Arian Mousakhan
This research investigates how high-frequency image components affect generation quality in two-stage generative models such as VQGAN. By replacing learned tokenizers with deterministic JPEG compression, we show that aggressively removing high-frequency content preserves generation quality even though reconstruction quality degrades.
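The following minimal NumPy sketch makes "aggressive frequency removal" concrete. It shows JPEG's core lossy step on a single 8x8 luminance block: a 2-D DCT, followed by division by a quantization table scaled by the quality factor (QF), followed by rounding. It assumes the standard luminance table from ITU-T T.81 (Annex K) and libjpeg-style QF scaling. It is not the repository's `DCT_JPG.py` or `quant.py`, and the function names `dct_matrix`, `scaled_quant_table`, and `quantize_block` are hypothetical.

```python
import numpy as np

# Standard JPEG luminance quantization table (ITU-T T.81, Annex K, Table K.1), i.e. QF=50.
BASE_LUMA_TABLE = np.array([
    [16, 11, 10, 16, 24, 40, 51, 61],
    [12, 12, 14, 19, 26, 58, 60, 55],
    [14, 13, 16, 24, 40, 57, 69, 56],
    [14, 17, 22, 29, 51, 87, 80, 62],
    [18, 22, 37, 56, 68, 109, 103, 77],
    [24, 35, 55, 64, 81, 104, 113, 92],
    [49, 64, 78, 87, 103, 121, 120, 101],
    [72, 92, 95, 98, 112, 100, 103, 99],
], dtype=np.float64)


def dct_matrix(n: int = 8) -> np.ndarray:
    """Orthonormal DCT-II basis: row k holds the k-th cosine frequency."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    c[0, :] = np.sqrt(1.0 / n)
    return c


def scaled_quant_table(quality: int) -> np.ndarray:
    """libjpeg-style scaling of the base table for a quality factor in [1, 100]."""
    quality = min(max(int(quality), 1), 100)
    scale = 5000 // quality if quality < 50 else 200 - 2 * quality
    table = np.floor((BASE_LUMA_TABLE * scale + 50) / 100)
    return np.clip(table, 1, 255)  # baseline JPEG keeps step sizes in [1, 255]


def quantize_block(block: np.ndarray, quality: int):
    """DCT an 8x8 pixel block, quantize it, and return (levels, reconstruction)."""
    c = dct_matrix(8)
    coeffs = c @ (block.astype(np.float64) - 128.0) @ c.T  # level shift + 2-D DCT
    steps = scaled_quant_table(quality)
    levels = np.round(coeffs / steps)  # small, mostly high-frequency, terms become 0
    recon = c.T @ (levels * steps) @ c + 128.0  # dequantize + inverse 2-D DCT
    return levels, np.clip(recon, 0, 255)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Smooth ramp plus noise: the noise carries most of the high-frequency energy.
    ramp = 12.0 * np.add.outer(np.arange(8), np.arange(8)) + 40.0
    block = np.clip(ramp + rng.normal(0.0, 20.0, (8, 8)), 0, 255)
    for q in (90, 40, 20):
        levels, recon = quantize_block(block, q)
        mse = float(np.mean((recon - block) ** 2))
        print(f"QF={q}: {np.count_nonzero(levels)} nonzero coefficients, MSE={mse:.1f}")
```

Lowering QF enlarges every quantization step. As a result, more coefficients round to zero, and the ones that vanish first are predominantly the high-frequency entries with the largest base steps (bottom-right of the table).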
- FID Scores (lower is better): JPEG QF=20 reaches 42.78 and QF=40 reaches 43.96, versus 43.92 for the 16-dim VQGAN baseline
- Stable Generation: Losing high-frequency detail does not degrade the model's semantic understanding
- Training Stability: JPEG preprocessing eliminates codebook collapse
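Codebook collapse means that only a small fraction of the codebook entries are ever selected. The report's exact diagnostic is not reproduced here. A common way to quantify collapse, assumed for this sketch, is to measure the fraction of codes used and the perplexity of the code-usage distribution over a batch of token indices. The function name `codebook_usage` and the codebook size of 1024 in the usage example are hypothetical.

```python
import numpy as np


def codebook_usage(indices: np.ndarray, codebook_size: int) -> tuple[float, float]:
    """Return (fraction of codes used, perplexity of the code distribution).

    Perplexity equals codebook_size for perfectly uniform usage and 1.0 when
    every token maps to the same code (full collapse).
    """
    counts = np.bincount(np.asarray(indices).ravel(), minlength=codebook_size)
    probs = counts / counts.sum()
    nonzero = probs[probs > 0]  # 0 * log(0) is taken as 0
    perplexity = float(np.exp(-np.sum(nonzero * np.log(nonzero))))
    used_fraction = float(np.count_nonzero(counts) / codebook_size)
    return used_fraction, perplexity


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    healthy = rng.integers(0, 1024, size=(64, 16, 16))   # hypothetical token maps
    collapsed = rng.integers(0, 8, size=(64, 16, 16))    # only 8 of 1024 codes used
    print("healthy:", codebook_usage(healthy, 1024))
    print("collapsed:", codebook_usage(collapsed, 1024))
```

A collapsed codebook shows a used fraction far below 1 and a perplexity much smaller than the codebook size.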
The complete implementation is not publicly available due to:
- Proprietary research agreements
- Institutional privacy policies
This repository contains only preliminary JPEG compression utilities. Architecture details are available in the report linked below.
├── DCT_JPG.py # JPEG DCT compression
├── compression.py # Compression pipeline
├── huffman_parser.py # Huffman coding utilities
├── quant.py # Quantization analysis
└── output/ # Sample outputs
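`huffman_parser.py` is described only as a set of Huffman coding utilities. As background, the sketch below builds a plain Huffman code from symbol frequencies using `heapq`. It is a hypothetical illustration, not the repository's code. Real JPEG streams store length-limited (at most 16-bit) canonical codes defined by BITS/HUFFVAL tables (ITU-T T.81, Annex C), and this sketch does not reproduce that step.

```python
import heapq
from collections import Counter
from typing import Dict, Hashable, Iterable


def huffman_code(symbols: Iterable[Hashable]) -> Dict[Hashable, str]:
    """Build a (non-canonical, unbounded-length) Huffman code from symbol frequencies."""
    freq = Counter(symbols)
    if not freq:
        return {}
    if len(freq) == 1:  # a lone symbol still needs a 1-bit code
        return {next(iter(freq)): "0"}
    # Heap entries are (weight, tie_breaker, partial_code); the unique tie_breaker
    # keeps heapq from ever comparing the dicts.
    heap = [(w, i, {s: ""}) for i, (s, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        w1, _, first = heapq.heappop(heap)   # two lightest subtrees
        w2, _, second = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in first.items()}
        merged.update({s: "1" + c for s, c in second.items()})
        heapq.heappush(heap, (w1 + w2, next_id, merged))
        next_id += 1
    return heap[0][2]


if __name__ == "__main__":
    data = "aaaabbc"
    code = huffman_code(data)
    bits = "".join(code[s] for s in data)
    print(code, len(bits))
```

Frequent symbols receive shorter codes. In JPEG, the symbols coded this way are the run-length/size pairs produced from the quantized DCT coefficients rather than raw pixels.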
Report: https://drive.google.com/file/d/11o6W0CFiWkaqf2uq4OGQDJObHr4CbU_n/view?usp=sharing

The report includes:
- Complete methodology
- VQGAN+DINO architecture details
- Transformer training procedures
- Comprehensive experimental results
- Visual comparisons
pip install numpy opencv-python pillow

For the full implementation (per the report):
- PyTorch 2.0+
- timm (Vision Transformer)
- LPIPS, FID metrics
- BDD100K dataset
| Method | Embedding | FID ↓ | rFID ↓ |
|---|---|---|---|
| VQGAN | 32-dim | 40.66 | 20.53 |
| VQGAN | 16-dim | 43.92 | 19.62 |
| JPEG (QF=40) | 16-dim | 43.96 | 21.85 |
| JPEG (QF=20) | 16-dim | 42.78 | 21.92 |
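For reference, FID is the Fréchet distance between Gaussian fits to Inception-v3 features of real images ($\mu_r, \Sigma_r$) and generated images ($\mu_g, \Sigma_g$). rFID is assumed here to apply the same formula to reconstructions versus their originals. The symbol names below are illustrative.

```math
\mathrm{FID} = \lVert \mu_r - \mu_g \rVert_2^2 + \operatorname{Tr}\left(\Sigma_r + \Sigma_g - 2\left(\Sigma_r \Sigma_g\right)^{1/2}\right)
```

Lower values mean the two feature distributions are closer. This is why the JPEG rows can match or beat the 16-dim baseline on FID while scoring slightly worse on rFID.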
@mastersthesis{jadhav2025jpeg,
title={How High-Frequency Image Components Affect Generation Quality:
A JPEG-Based Canonical Representation Approach},
author={Jadhav, Sejal},
year={2025},
school={Albert-Ludwigs-Universit{\"a}t Freiburg},
type={Master's Project}
}

Author: Sejal Jadhav
Supervisor: Arian Mousakhan
Examiner: Prof. Dr. Thomas Brox
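- VQGAN: Esser et al., "Taming Transformers for High-Resolution Image Synthesis", CVPR 2021
- DINO: Caron et al., "Emerging Properties in Self-Supervised Vision Transformers", ICCV 2021
- BDD100K: Yu et al., "BDD100K: A Diverse Driving Dataset for Heterogeneous Multitask Learning", CVPR 2020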
Computer Vision Group | University of Freiburg