TahirKurtar/AFM-to-O2A-GAN-Comparison

🔬 A Comparative Study of GAN-Based Image-to-Image Translation Methods for AFM-to-O2A Microscopy Images

Tahir Kurtar | Izmir Democracy University, Electrical and Electronics Engineering
Supervisor: Asst. Prof. Başak Esin Köktürk Güzel
Year: 2025


📌 Project Overview

This project investigates deep learning-based image-to-image translation methods for converting Atomic Force Microscopy (AFM) images into second-harmonic optical amplitude (O₂A) maps of the kind acquired with scattering-type Scanning Near-Field Optical Microscopy (s-SNOM).

The s-SNOM systems required for O₂A imaging are expensive, complex, and not widely available. This study explores whether high-quality O₂A maps can be synthetically generated from standard AFM images using GAN-based models, reducing reliance on specialized optical instrumentation.

Five GAN architectures are implemented and systematically compared:

| Model | Type | Key Characteristic |
|---|---|---|
| Pix2Pix | Supervised baseline | Paired image translation with L₁ + adversarial loss |
| Pix2Pix + ESRGAN | Supervised + enhancement | Super-resolution refinement on Pix2Pix outputs |
| GauGAN | Spatially-aware synthesis | Semantic and spatial conditioning via SPADE |
| Vanilla GAN | Minimal baseline | Adversarial loss only, no reconstruction constraint |
| CycleGAN | Unpaired translation | Cycle-consistency constraints, no paired data needed |

🧪 Dataset

The dataset consists of paired AFM images and corresponding O₂A amplitude maps acquired from the same nanoscale regions.

  • AFM images: High-resolution surface topography of nanoscale material samples
  • O₂A maps: Optical amplitude contrast obtained via s-SNOM measurements
  • All image pairs are spatially aligned and normalized to the range [−1, 1]
  • The dataset is split into training, validation, and test sets
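
As a rough illustration of the [−1, 1] normalization mentioned above, the sketch below maps pixel values linearly into that range and back. It assumes 8-bit inputs in [0, 255]; the function names and the sample values are hypothetical, and the project's notebooks may normalize differently (for example per image).

```python
def to_signed_unit(pixels, max_value=255.0):
    """Map pixel values from [0, max_value] to [-1, 1] (assumed 8-bit input)."""
    return [[2.0 * p / max_value - 1.0 for p in row] for row in pixels]


def from_signed_unit(pixels, max_value=255.0):
    """Inverse mapping from [-1, 1] back to [0, max_value]."""
    return [[(p + 1.0) / 2.0 * max_value for p in row] for row in pixels]


# Hypothetical 2x3 image used only for illustration.
image = [[0, 128, 255], [64, 192, 32]]
normalized = to_signed_unit(image)
restored = from_signed_unit(normalized)
```

The same linear mapping is what makes a tanh output layer a natural fit for the generators, since tanh also produces values in [−1, 1].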

⚠️ The raw dataset is not included in this repository due to data sharing restrictions.


🧠 GAN Models: Architecture & Methodology


1. Pix2Pix (Supervised Baseline)

Architecture:

  • Generator: U-Net with encoder–decoder structure and skip connections
  • Discriminator: PatchGAN, which classifies local image patches as real or fake

How it works: Pix2Pix is a conditional GAN trained on paired data. The generator learns to map AFM inputs to O₂A outputs by minimizing a combination of adversarial loss (fooling the discriminator) and L₁ pixel-wise loss (minimizing pixel-level reconstruction error).
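
The combined generator objective described above can be sketched in plain Python as follows. This is a minimal, dependency-free illustration rather than the notebook code: images are flattened to lists, the discriminator's patch outputs are assumed to be probabilities, and the weight `lambda_l1=100.0` follows the original Pix2Pix paper and is only an assumption for this project.

```python
import math


def bce(probabilities, target):
    """Mean binary cross-entropy of probabilities against a constant target."""
    eps = 1e-12
    total = 0.0
    for p in probabilities:
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += -(target * math.log(p) + (1.0 - target) * math.log(1.0 - p))
    return total / len(probabilities)


def l1(generated, real):
    """Mean absolute error between two flattened images."""
    return sum(abs(g - r) for g, r in zip(generated, real)) / len(real)


def generator_loss(patch_scores_fake, generated, real, lambda_l1=100.0):
    """Adversarial term (fool the PatchGAN) plus weighted L1 reconstruction."""
    adversarial = bce(patch_scores_fake, target=1.0)
    reconstruction = l1(generated, real)
    return adversarial + lambda_l1 * reconstruction


# Hypothetical values: four PatchGAN patch scores and a 3-pixel image pair.
loss = generator_loss([0.5, 0.5, 0.5, 0.5], [0.1, -0.2, 0.4], [0.0, -0.2, 0.5])
```

With a large L₁ weight the reconstruction term dominates early training, which is one reason Pix2Pix outputs tend to stay close to the paired target.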

Why used in this project: Pix2Pix is the standard supervised baseline for paired image-to-image translation. It provides a strong reference point for evaluating all other models.

Difference from others: Unlike CycleGAN, it requires paired training data. Unlike GauGAN, it does not incorporate spatial conditioning at multiple generator layers.

Project-specific architecture diagram:

Pix2Pix Architecture

Reference architecture:

Pix2Pix Reference


2. Pix2Pix + ESRGAN (Enhancement Pipeline)

Architecture:

  • Stage 1: Pix2Pix generates an initial O₂A prediction
  • Stage 2: ESRGAN (Enhanced Super-Resolution GAN) refines the output using Residual-in-Residual Dense Blocks (RRDB)

How it works: ESRGAN is applied as a post-processing step to the Pix2Pix output. It enhances perceptual sharpness and suppresses high-frequency noise by leveraging its super-resolution capabilities. The base Pix2Pix model is not retrained.

Why used in this project: To investigate whether visual quality can be improved beyond standard Pix2Pix without retraining the base model.

Difference from others: This is the only two-stage pipeline in the study. The first stage handles domain translation, the second handles visual refinement.

Project-specific architecture diagram:

ESRGAN Pipeline

Reference architecture:

ESRGAN Reference


3. GauGAN (Spatially-Aware Synthesis)

Architecture:

  • Encoder: Encodes style information (Mu / LogVar) via VAE-style reparameterization
  • Generator: SPADE (Spatially-Adaptive Denormalization) ResBlocks, with the AFM map injected at every layer
  • Discriminator: Multi-scale discriminator (same as Pix2PixHD)

How it works: GauGAN incorporates spatial and semantic information at every generator layer through SPADE normalization. Instead of encoding the input only at the bottleneck, it continuously injects the AFM structural map throughout generation via learned γ and β parameters, enabling fine-grained structural control at multiple scales.
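
To make the γ/β modulation concrete, here is a small sketch of a SPADE-style step on a single flattened feature channel. In the SPADE paper the per-pixel γ and β come from small convolutional layers applied to the conditioning map; in this sketch they are replaced by hypothetical affine functions of the AFM values, purely for illustration, and the normalization is done over one channel rather than a batch.

```python
import math


def spade_modulate(features, gamma, beta, eps=1e-5):
    """Normalize one feature channel, then scale and shift it per pixel."""
    n = len(features)
    mean = sum(features) / n
    var = sum((f - mean) ** 2 for f in features) / n
    normalized = [(f - mean) / math.sqrt(var + eps) for f in features]
    # Unlike ordinary normalization layers, gamma and beta vary per pixel.
    return [g * h + b for g, h, b in zip(gamma, normalized, beta)]


# Hypothetical inputs: a 4-pixel feature channel and the matching AFM values.
features = [0.2, 0.4, 0.6, 0.8]
afm = [-1.0, -0.5, 0.5, 1.0]

# Stand-ins for the learned convolutions that map the AFM input to gamma/beta.
gamma = [1.0 + 0.5 * a for a in afm]
beta = [0.1 * a for a in afm]

modulated = spade_modulate(features, gamma, beta)
plain = spade_modulate(features, [1.0] * 4, [0.0] * 4)
```

Because γ and β differ from pixel to pixel, the conditioning signal is not washed out by normalization, which is the property the comparison relies on.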

Why used in this project: To evaluate whether spatially-conditioned synthesis can better preserve the nanoscale structural details found in O₂A maps.

Difference from others: GauGAN is the only model that uses spatially-adaptive normalization (SPADE). This allows it to maintain structural consistency across different spatial regions simultaneously, a critical property for nanoscale microscopy image synthesis.

Project-specific architecture diagram:

GauGAN Architecture

Reference architecture:

GauGAN Reference


4. Vanilla GAN (Minimal Baseline)

Architecture:

  • Generator: Simple convolutional network
  • Discriminator: Standard convolutional classifier

How it works: Trained solely with adversarial loss. The generator learns to produce images that fool the discriminator, without any explicit reconstruction constraint. There is no L₁ loss guiding pixel-level accuracy.
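
A purely adversarial objective of this kind can be sketched as below. The discriminator outputs are assumed to be probabilities, the generator term uses the common non-saturating form, and the sample scores are hypothetical; the notebook may use a logits-based loss instead.

```python
import math


def bce(probabilities, target):
    """Mean binary cross-entropy of probabilities against a constant target."""
    eps = 1e-12
    clipped = [min(max(p, eps), 1.0 - eps) for p in probabilities]
    return sum(
        -(target * math.log(p) + (1.0 - target) * math.log(1.0 - p))
        for p in clipped
    ) / len(clipped)


def discriminator_loss(scores_real, scores_fake):
    """Push scores on real images toward 1 and on generated images toward 0."""
    return bce(scores_real, 1.0) + bce(scores_fake, 0.0)


def generator_loss(scores_fake):
    """Non-saturating generator loss: no reconstruction term, only 'fool D'."""
    return bce(scores_fake, 1.0)


# Hypothetical discriminator outputs for two real and two generated images.
d_real = [0.9, 0.8]
d_fake = [0.2, 0.1]
d_loss = discriminator_loss(d_real, d_fake)
g_loss = generator_loss(d_fake)
```

Nothing in `generator_loss` refers to the ground-truth O₂A map, so the generator is rewarded only for realism, not for matching the paired target.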

Why used in this project: To serve as a minimal reference and highlight the benefits of more structured architectures such as Pix2Pix and GauGAN.

Difference from others: The simplest model in the comparison. The absence of pixel-wise loss means the model is not explicitly guided to match the ground truth; it only learns to produce realistic-looking images.

Project-specific architecture diagram:

Vanilla GAN Architecture

Reference architecture:

Vanilla GAN Reference


5. CycleGAN (Unpaired Translation)

Architecture:

  • Two generators: G (AFM → O₂A) and F (O₂A → AFM)
  • Two discriminators: D_A and D_B
  • Cycle-consistency loss: enforces F(G(x)) ≈ x and G(F(y)) ≈ y

How it works: CycleGAN learns bidirectional mappings between two domains without requiring paired training data. The cycle-consistency constraint ensures that translating an image to the other domain and back produces the original image, providing a form of implicit supervision without pixel-wise pairing.
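
The cycle-consistency term can be illustrated with toy mappings standing in for the two generators. The weight `lambda_cyc=10.0` follows the original CycleGAN paper and is an assumption here, as are the toy functions `g`, `f_exact`, and `f_lossy`; real generators are neural networks, and the adversarial terms are omitted.

```python
def l1(a, b):
    """Mean absolute error between two flattened images."""
    return sum(abs(p - q) for p, q in zip(a, b)) / len(a)


def cycle_consistency_loss(g, f, x, y, lambda_cyc=10.0):
    """Penalize F(G(x)) != x and G(F(y)) != y."""
    forward_cycle = l1(f(g(x)), x)   # AFM -> O2A -> AFM
    backward_cycle = l1(g(f(y)), y)  # O2A -> AFM -> O2A
    return lambda_cyc * (forward_cycle + backward_cycle)


# Toy stand-ins for the generators (hypothetical, for illustration only).
def g(image):
    return [2.0 * p for p in image]


def f_exact(image):
    return [p / 2.0 for p in image]        # exact inverse of g


def f_lossy(image):
    return [p / 2.0 + 0.1 for p in image]  # imperfect inverse of g


x = [0.5, -0.25, 0.0]  # unpaired sample from the AFM domain
y = [1.0, 0.5, -0.5]   # unpaired sample from the O2A domain

perfect = cycle_consistency_loss(g, f_exact, x, y)
imperfect = cycle_consistency_loss(g, f_lossy, x, y)
```

Note that `x` and `y` are never compared with each other: the loss only checks that each round trip returns to its starting point, which is why pixel-level correspondence between domains is not enforced.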

Why used in this project: To assess the impact of removing strict pixel-level supervision. Even though paired data is available, CycleGAN is evaluated to benchmark the unpaired setting against supervised models.

Difference from others: The only model that does not use paired data during training. This makes it more flexible but less precise, since it cannot enforce pixel-wise correspondence between AFM and O₂A domains.

Project-specific architecture diagram:

CycleGAN Architecture

Reference architecture:

CycleGAN Reference


📊 Results

Quantitative Evaluation

| Model | L₁ Loss ↓ | SSIM ↑ | PSNR ↑ |
|---|---|---|---|
| Pix2Pix | 0.2084 | 0.4422 | 19.48 dB |
| Pix2Pix + ESRGAN | 0.2062 | 0.4419 | 19.39 dB |
| GauGAN | 0.202 | 0.457 | 19.49 dB |
| Vanilla GAN | 0.2054 | 0.4255 | 19.22 dB |
| CycleGAN A→B | 1.8326 | 0.2421 | 12.86 dB |
| CycleGAN B→A | 1.8326 | 0.2216 | 11.04 dB |

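GauGAN achieves the best overall quantitative performance, with the lowest L₁ loss and the highest SSIM and PSNR, although its margin over the other supervised models is small (for example, 19.49 dB versus 19.48 dB PSNR for Pix2Pix).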

Qualitative Evaluation

  • GauGAN produces the sharpest and most spatially coherent O₂A images with improved edge definition
  • Pix2Pix and Vanilla GAN exhibit comparable visual quality; Vanilla GAN occasionally yields sharper results
  • Pix2Pix + ESRGAN improves visual smoothness but does not enhance structural fidelity
  • CycleGAN captures coarse structures but introduces smoothing artifacts and lacks pixel-level correspondence

Key Finding

Numerical metrics alone do not fully capture model quality in microscopy image synthesis. GauGAN's advantage in spatial coherence is more apparent in visual inspection than in quantitative scores, highlighting the importance of combining both evaluation approaches.


📈 Evaluation Metrics

L₁ Loss (Mean Absolute Error): Measures pixel-wise reconstruction accuracy between generated and real O₂A images.

SSIM (Structural Similarity Index): Measures perceptual similarity by comparing luminance, contrast, and structural information.

$$\mathrm{SSIM}(x, y) = \frac{(2\mu_x \mu_y + C_1)(2\sigma_{xy} + C_2)}{(\mu_x^2 + \mu_y^2 + C_1)(\sigma_x^2 + \sigma_y^2 + C_2)}$$
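
The SSIM expression can be evaluated directly on a pair of flattened images, as in the sketch below. It treats the whole image as a single window, whereas library implementations typically average SSIM over local sliding windows, so the numbers will not match the reported scores. The constants C1 = (0.01·L)² and C2 = (0.03·L)² are the commonly used defaults, with L assumed to be 2 for data in [−1, 1]; the sample values are hypothetical.

```python
def global_ssim(x, y, data_range=2.0):
    """Single-window SSIM over two flattened images of equal length."""
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2
    n = len(x)
    mu_x = sum(x) / n
    mu_y = sum(y) / n
    var_x = sum((a - mu_x) ** 2 for a in x) / n
    var_y = sum((b - mu_y) ** 2 for b in y) / n
    cov_xy = sum((a - mu_x) * (b - mu_y) for a, b in zip(x, y)) / n
    numerator = (2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)
    denominator = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return numerator / denominator


# Hypothetical 4-pixel images in [-1, 1].
real = [-0.8, -0.2, 0.3, 0.9]
generated = [-0.7, -0.1, 0.1, 0.6]

ssim_identical = global_ssim(real, real)
ssim_generated = global_ssim(generated, real)
```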

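PSNR (Peak Signal-to-Noise Ratio): Measures signal fidelity in decibels. Higher values indicate less distortion. Here 255 is the peak pixel value of an 8-bit image and MSE is the mean squared error between the generated and real images.

$$\mathrm{PSNR} = 10 \log_{10}\left(\frac{255^2}{\mathrm{MSE}}\right)$$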
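
A small sketch of the PSNR computation follows, using hypothetical 4-pixel images. It also shows that the value is unchanged when the same images are rescaled to [−1, 1], provided the peak value is replaced by the data range of 2; whether the project evaluates on 8-bit or normalized images is not stated, so both variants are shown as assumptions.

```python
import math


def psnr(generated, real, max_value=255.0):
    """Peak signal-to-noise ratio in dB for two flattened images."""
    mse = sum((g - r) ** 2 for g, r in zip(generated, real)) / len(real)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


def to_signed_unit(values):
    """Rescale 8-bit values to [-1, 1]."""
    return [2.0 * v / 255.0 - 1.0 for v in values]


# Hypothetical 8-bit pixel values; every pixel is off by 10 grey levels.
real_8bit = [0, 100, 200, 255]
generated_8bit = [10, 90, 210, 245]

psnr_8bit = psnr(generated_8bit, real_8bit, max_value=255.0)
psnr_signed = psnr(
    to_signed_unit(generated_8bit), to_signed_unit(real_8bit), max_value=2.0
)
```

Both calls give roughly 28.13 dB, because rescaling multiplies the peak value and the error by the same factor.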

🗂️ Repository Structure

📁 AFM-to-O2A-GAN-Comparison
├── 📁 notebooks
│   ├── pix2pix.ipynb
│   ├── pix2pix_esrgan.ipynb
│   ├── gaugan.ipynb
│   ├── vanilla_gan.ipynb
│   └── cyclegan.ipynb
├── 📁 cyclegan
│   ├── cyclegan_train.py
│   └── 📁 test_outputs
│       ├── example_1_A2B.png
│       ├── example_1_B2A.png
│       ├── example_2_A2B.png
│       └── ...
├── 📁 images
│   ├── 📁 diagrams
│   │   ├── pix2pix_architecture.svg
│   │   ├── cyclegan_architecture.svg
│   │   ├── gaugan_architecture.svg
│   │   ├── esrgan_pipeline.svg
│   │   └── vanilla_gan_architecture.svg
│   └── 📁 references
│       ├── pix2pix_reference.png
│       ├── cyclegan_reference.png
│       ├── gaugan_reference.webp
│       ├── esrgan_reference.png
│       └── vanilla_gan_reference.png
├── 📁 poster
│   └── poster.pdf
├── 📁 report
│   └── report.pdf
└── README.md

⚠️ CycleGAN model checkpoints (~340 MB per file, 20 epochs) are not included in this repository due to GitHub file size limits. All checkpoints are available on Hugging Face: Download Checkpoints


🛠️ Implementation Details

| Detail | Description |
|---|---|
| Framework | PyTorch |
| Optimizer | Adam (fixed learning rate and momentum) |
| Training | Fixed maximum number of epochs, with early stopping on validation loss |
| Hardware | GPU-enabled (Kaggle / local) |
| Pix2Pix, ESRGAN, GauGAN, Vanilla GAN | Implemented and trained on Kaggle |
| CycleGAN | Implemented and trained locally (VS Code) |
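
Early stopping on validation loss, as listed in the training details above, can be sketched as a small helper. The class name, the `patience` and `min_delta` parameters, and the loss values are all hypothetical; the README does not state the actual stopping criteria used.

```python
class EarlyStopping:
    """Signal a stop when validation loss has not improved for `patience` epochs."""

    def __init__(self, patience=5, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss):
        """Record one epoch's validation loss; return True if training should stop."""
        if val_loss < self.best - self.min_delta:
            self.best = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


# Hypothetical validation losses for a run capped at seven epochs.
val_losses = [0.30, 0.25, 0.24, 0.26, 0.27, 0.28, 0.20]

stopper = EarlyStopping(patience=3)
stopped_at = None
for epoch, loss in enumerate(val_losses, start=1):
    if stopper.step(loss):
        stopped_at = epoch
        break
```

In this example training halts after epoch 6, three epochs after the best validation loss of 0.24, so the later improvement at epoch 7 is never seen; that trade-off is what `patience` controls.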

📚 References & Visual Sources

  1. Zhang et al. Pix2PixHD++: Image-to-Image Translation via Enhanced Generator and Discriminator. arXiv:2504.02982, 2024.
  2. He et al. An Introduction to Image Synthesis with Generative Adversarial Nets. arXiv:1803.04469, 2018.
  3. Isola et al. Image-to-Image Translation with Conditional Adversarial Networks. arXiv:1611.07004, 2017.
  4. Bagherkhani et al. Antenna Near-Field Reconstruction from Far-Field Data Using CNNs. arXiv:2504.17065, 2025.
  5. Chen & Dal Negro. Physics-informed Neural Networks for Imaging and Parameter Retrieval. arXiv:2109.12754, 2021.
  6. Stanciu et al. Inferring s-SNOM Data from Atomic Force Microscopy Images. arXiv:2504.02982, 2025.
  7. Pix2Pix Schema: View Source
  8. ESRGAN Structure: View Source
  9. GauGAN Detail: View Source
  10. Vanilla GAN Flow: View Source
  11. CycleGAN Diagram: View Source

Note: Click the links above to view the original high-resolution reference diagrams.

About

A comparative study of GAN architectures (Pix2Pix, CycleGAN, GauGAN, VanillaGAN, ESRGAN) for synthesizing O2A microscopy maps from AFM surface topography.
