🔬 A Comparative Study of GAN-Based Image-to-Image Translation Methods for AFM-to-O2A Microscopy Images
Tahir Kurtar | Izmir Democracy University, Electrical and Electronics Engineering
Supervisor: Asst. Prof. Başak Esin Köktürk Güzel
Year: 2025
This project investigates deep learning-based image-to-image translation methods for converting Atomic Force Microscopy (AFM) images into second-harmonic optical amplitude (O₂A) maps of the kind measured by scattering-type Scanning Near-Field Optical Microscopy (s-SNOM).
The s-SNOM systems required for O₂A imaging are expensive, complex, and not widely available. This study explores whether high-quality O₂A maps can be generated synthetically from standard AFM images using GAN-based models, reducing reliance on specialized optical instrumentation.
Five GAN architectures are implemented and systematically compared:
| Model | Type | Key Characteristic |
|---|---|---|
| Pix2Pix | Supervised baseline | Paired image translation with L₁ + adversarial loss |
| Pix2Pix + ESRGAN | Supervised + enhancement | Super-resolution refinement on Pix2Pix outputs |
| GauGAN | Spatially-aware synthesis | Semantic and spatial conditioning via SPADE |
| Vanilla GAN | Minimal baseline | Adversarial loss only, no reconstruction constraint |
| CycleGAN | Unpaired translation | Cycle-consistency constraints, no paired data needed |
The dataset consists of paired AFM images and corresponding O₂A amplitude maps acquired from the same nanoscale regions.
- AFM images: High-resolution surface topography of nanoscale material samples
- O₂A maps: Optical amplitude contrast obtained via s-SNOM measurements
- All image pairs are spatially aligned and normalized to the range [-1, 1]
- The dataset is split into training, validation, and test sets
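The normalization and split steps listed above could look roughly like the sketch below. It assumes 8-bit grayscale inputs and an 80/10/10 split; neither is stated in this README, and the function names `to_minus_one_one` and `split_indices` are hypothetical.

```python
import numpy as np

def to_minus_one_one(img_u8):
    """Map 8-bit pixel values in [0, 255] to floats in [-1, 1]."""
    return img_u8.astype(np.float32) / 127.5 - 1.0

def split_indices(n_pairs, train=0.8, val=0.1, seed=0):
    """Shuffle pair indices and split them (ratios are assumed, not from the project)."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_pairs)
    n_train = int(n_pairs * train)
    n_val = int(n_pairs * val)
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]

# Hypothetical stand-in for one AFM image; the paired O2A map would be treated the same way
afm = np.array([[0, 128], [255, 64]], dtype=np.uint8)
afm_norm = to_minus_one_one(afm)

# Splitting by index keeps each AFM image with its O2A partner
train_idx, val_idx, test_idx = split_indices(100)
```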
⚠️ The raw dataset is not included in this repository due to data sharing restrictions.
Architecture:
- Generator: U-Net with encoder-decoder structure and skip connections
- Discriminator: PatchGAN, which classifies local image patches as real or fake
How it works: Pix2Pix is a conditional GAN trained on paired data. The generator learns to map AFM inputs to O₂A outputs by minimizing a combination of adversarial loss (fooling the discriminator) and L₁ pixel-wise loss (minimizing pixel-level reconstruction error).
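As a rough illustration of that combined objective, the NumPy sketch below computes the two generator terms and the discriminator loss from raw PatchGAN logits. NumPy is used only to keep the example self-contained (the project itself uses PyTorch). The weight `lambda_l1=100` is the default from Isola et al., not a value confirmed for this project, and all function names are hypothetical.

```python
import numpy as np

def bce_with_logits(logits, target):
    """Numerically stable binary cross-entropy on raw (pre-sigmoid) logits."""
    return float(np.mean(np.maximum(logits, 0) - logits * target
                         + np.log1p(np.exp(-np.abs(logits)))))

def generator_loss(d_fake_logits, fake_o2a, real_o2a, lambda_l1=100.0):
    """Adversarial term (label fakes as real) plus weighted L1 reconstruction term."""
    adv = bce_with_logits(d_fake_logits, np.ones_like(d_fake_logits))
    l1 = float(np.mean(np.abs(fake_o2a - real_o2a)))
    return adv + lambda_l1 * l1, adv, l1

def discriminator_loss(d_real_logits, d_fake_logits):
    """Real patches should score 1, generated patches 0."""
    real = bce_with_logits(d_real_logits, np.ones_like(d_real_logits))
    fake = bce_with_logits(d_fake_logits, np.zeros_like(d_fake_logits))
    return 0.5 * (real + fake)

# Toy inputs: an undecided discriminator (all logits 0) and a uniform 0.1 pixel error
d_fake = np.zeros((30, 30))          # one logit per image patch
real = np.zeros((64, 64))
fake = real + 0.1
total, adv, l1 = generator_loss(d_fake, fake, real)
```

With the L₁ term weighted this heavily, reconstruction error dominates the generator loss early in training, which is why Pix2Pix outputs tend to stay close to the paired target.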
Why used in this project: Pix2Pix is the standard supervised baseline for paired image-to-image translation. It provides a strong reference point for evaluating all other models.
Difference from others: Unlike CycleGAN, it requires paired training data. Unlike GauGAN, it does not incorporate spatial conditioning at multiple generator layers.
Project-specific architecture diagram:
Reference architecture:
Architecture:
- Stage 1: Pix2Pix generates an initial O₂A prediction
- Stage 2: ESRGAN (Enhanced Super-Resolution GAN) refines the output using Residual-in-Residual Dense Blocks (RRDB)
How it works: ESRGAN is applied as a post-processing step to the Pix2Pix output. It enhances perceptual sharpness and suppresses high-frequency noise by leveraging its super-resolution capabilities. The base Pix2Pix model is not retrained.
Why used in this project: To investigate whether visual quality can be improved beyond standard Pix2Pix without retraining the base model.
Difference from others: This is the only two-stage pipeline in the study. The first stage handles domain translation, the second handles visual refinement.
Project-specific architecture diagram:
Reference architecture:
Architecture:
- Encoder: Encodes style information (Mu / LogVar) via VAE-style reparameterization
- Generator: SPADE (Spatially-Adaptive Denormalization) ResBlocks, with the AFM map injected at every layer
- Discriminator: Multi-scale discriminator (same as Pix2PixHD)
How it works: GauGAN incorporates spatial and semantic information at every generator layer through SPADE normalization. Instead of encoding the input only at the bottleneck, it continuously injects the AFM structural map throughout generation via learned γ and β parameters, enabling fine-grained structural control at multiple scales.
Why used in this project: To evaluate whether spatially-conditioned synthesis can better preserve the nanoscale structural details found in O₂A maps.
Difference from others: GauGAN is the only model that uses spatially-adaptive normalization (SPADE). This allows it to maintain structural consistency across different spatial regions simultaneously, a critical property for nanoscale microscopy image synthesis.
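The modulation step can be sketched in a few lines of NumPy. This is a simplified illustration, not the project's layer: the scale and shift maps here come from a hypothetical per-channel 1x1 weighting of the AFM map, whereas the original SPADE design predicts them with small convolutional networks, and the AFM map is assumed to be already resized to the feature-map resolution.

```python
import numpy as np

def spade(x, afm_map, w_gamma, w_beta, eps=1e-5):
    """Spatially-adaptive denormalization (simplified).

    x:        (C, H, W) generator activations
    afm_map:  (1, H, W) conditioning map, already resized to H x W
    w_gamma, w_beta: (C,) weights of a hypothetical 1x1 conv producing the
                     per-pixel scale and shift
    """
    # Parameter-free normalization per channel
    mean = x.mean(axis=(1, 2), keepdims=True)
    std = x.std(axis=(1, 2), keepdims=True)
    x_hat = (x - mean) / (std + eps)
    # Scale and shift vary per pixel because they are functions of the AFM map
    gamma = w_gamma[:, None, None] * afm_map
    beta = w_beta[:, None, None] * afm_map
    return gamma * x_hat + beta

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8, 8))             # toy 4-channel feature map
afm_map = rng.uniform(-1, 1, (1, 8, 8))    # toy AFM map at the same resolution
out = spade(x, afm_map, w_gamma=np.ones(4), w_beta=np.zeros(4))
```

Because γ and β are spatial maps rather than per-channel scalars, the conditioning signal survives normalization at every layer where the block is applied.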
Project-specific architecture diagram:
Reference architecture:
Architecture:
- Generator: Simple convolutional network
- Discriminator: Standard convolutional classifier
How it works: Trained solely with adversarial loss. The generator learns to produce images that fool the discriminator, without any explicit reconstruction constraint. There is no L₁ loss guiding pixel-level accuracy.
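For reference, the adversarial-only objective has the standard min-max form below. The notation is an assumption: x denotes an AFM input, y a real O₂A map, and the discriminator is taken to see only the O₂A image. The README does not say whether the project's discriminator is also conditioned on the AFM input.

```latex
\min_G \max_D \;
  \mathbb{E}_{y}\bigl[\log D(y)\bigr]
  + \mathbb{E}_{x}\bigl[\log\bigl(1 - D(G(x))\bigr)\bigr]
```

Nothing in this expression compares G(x) with the paired target y, which is the missing reconstruction constraint described above.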
Why used in this project: To serve as a minimal reference and highlight the benefits of more structured architectures such as Pix2Pix and GauGAN.
Difference from others: The simplest model in the comparison. The absence of pixel-wise loss means the model is not explicitly guided to match the ground truth; it only learns to produce realistic-looking images.
Project-specific architecture diagram:
Reference architecture:
Architecture:
- Two generators: G (AFM → O₂A) and F (O₂A → AFM)
- Two discriminators: D_A and D_B
- Cycle-consistency loss: enforces F(G(x)) ≈ x and G(F(y)) ≈ y
How it works: CycleGAN learns bidirectional mappings between two domains without requiring paired training data. The cycle-consistency constraint ensures that translating an image to the other domain and back approximately recovers the original image, providing a form of implicit supervision without pixel-wise pairing.
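The cycle-consistency term can be illustrated with toy generators. In the sketch below, `G`, `F_exact`, and `F_poor` are hypothetical stand-ins (the real generators are convolutional networks), and `lambda_cyc=10` is the default from the original CycleGAN paper rather than a value confirmed for this project.

```python
import numpy as np

def cycle_consistency_loss(G, F, x_afm, y_o2a, lambda_cyc=10.0):
    """Weighted L1 penalty on F(G(x)) - x and G(F(y)) - y."""
    forward = np.mean(np.abs(F(G(x_afm)) - x_afm))    # AFM -> O2A -> AFM
    backward = np.mean(np.abs(G(F(y_o2a)) - y_o2a))   # O2A -> AFM -> O2A
    return float(lambda_cyc * (forward + backward))

def G(x):
    """Toy AFM -> O2A mapping."""
    return 0.5 * x + 0.1

def F_exact(y):
    """Exact inverse of G, so both cycles reconstruct their input."""
    return (y - 0.1) / 0.5

def F_poor(y):
    """Identity mapping, which does not undo G."""
    return y

rng = np.random.default_rng(0)
x_afm = rng.uniform(-1, 1, (16, 16))
y_o2a = rng.uniform(-1, 1, (16, 16))   # unpaired: unrelated to x_afm

loss_exact = cycle_consistency_loss(G, F_exact, x_afm, y_o2a)
loss_poor = cycle_consistency_loss(G, F_poor, x_afm, y_o2a)
```

Note that the loss compares each image only with its own round trip, never with a partner from the other domain, so a low cycle loss does not guarantee that G(x) matches the true O₂A map for x.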
Why used in this project: To assess the impact of removing strict pixel-level supervision. Even though paired data is available, CycleGAN is evaluated to benchmark the unpaired setting against supervised models.
Difference from others: The only model that does not use paired data during training. This makes it more flexible but less precise: it cannot enforce pixel-wise correspondence between AFM and O₂A domains.
Project-specific architecture diagram:
Reference architecture:
| Model | L₁ Loss ↓ | SSIM ↑ | PSNR ↑ |
|---|---|---|---|
| Pix2Pix | 0.2084 | 0.4422 | 19.48 dB |
| Pix2Pix + ESRGAN | 0.2062 | 0.4419 | 19.39 dB |
| GauGAN | 0.202 | 0.457 | 19.49 dB |
| Vanilla GAN | 0.2054 | 0.4255 | 19.22 dB |
| CycleGAN A→B | 1.8326 | 0.2421 | 12.86 dB |
| CycleGAN B→A | 1.8326 | 0.2216 | 11.04 dB |
GauGAN achieves the best overall quantitative performance, with the lowest L₁ loss and the highest SSIM and PSNR, although its margin over Pix2Pix is small (for example, 19.49 dB vs 19.48 dB PSNR).
- GauGAN produces the sharpest and most spatially coherent O₂A images with improved edge definition
- Pix2Pix and Vanilla GAN exhibit comparable visual quality; Vanilla GAN occasionally yields sharper results
- Pix2Pix + ESRGAN improves visual smoothness but does not enhance structural fidelity
- CycleGAN captures coarse structures but introduces smoothing artifacts and lacks pixel-level correspondence
Numerical metrics alone do not fully capture model quality in microscopy image synthesis. GauGAN's advantage in spatial coherence is more apparent in visual inspection than in quantitative scores, which highlights the importance of combining both evaluation approaches.
L₁ Loss (Mean Absolute Error): Measures pixel-wise reconstruction accuracy between generated and real O₂A images.
SSIM (Structural Similarity Index): Measures perceptual similarity by comparing luminance, contrast, and structural information.
$$\mathrm{SSIM}(x, y) = \frac{(2\mu_x\mu_y + C_1)(2\sigma_{xy} + C_2)}{(\mu_x^2 + \mu_y^2 + C_1)(\sigma_x^2 + \sigma_y^2 + C_2)}$$
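The SSIM expression above can be evaluated directly, as in the sketch below. Two simplifications are assumed: statistics are taken over the whole image as a single window (standard SSIM averages over local windows, so values will differ from library implementations), and the constants use the common convention C1 = (0.01·L)² and C2 = (0.03·L)² with L = 2 for images in [-1, 1]. The function name `ssim_global` is hypothetical.

```python
import numpy as np

def ssim_global(x, y, data_range=2.0):
    """Single-window SSIM: one mean, variance, and covariance per image."""
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()
    cov_xy = ((x - mu_x) * (y - mu_y)).mean()
    num = (2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)
    den = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return float(num / den)

rng = np.random.default_rng(0)
real = rng.uniform(-1, 1, (64, 64))               # toy "real" O2A map
noisy = real + rng.normal(0, 0.3, real.shape)     # toy "generated" map

ssim_same = ssim_global(real, real)    # identical images
ssim_noisy = ssim_global(real, noisy)  # degraded copy
```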
PSNR (Peak Signal-to-Noise Ratio): Measures signal fidelity in decibels. Higher values indicate less distortion.
$$\mathrm{PSNR} = 10 \log_{10}\!\left(\frac{255^2}{\mathrm{MSE}}\right)$$
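A small worked example of the L₁ and PSNR definitions follows. Because the PSNR formula uses a peak of 255 while the images are stored in [-1, 1], the sketch assumes values are rescaled to [0, 255] before the MSE is computed. That rescaling step is an assumption, and the helper names are hypothetical.

```python
import numpy as np

def to_8bit_scale(img):
    """Rescale from [-1, 1] to [0, 255], kept as float (no rounding)."""
    return (img + 1.0) * 127.5

def l1_loss(pred, target):
    """Mean absolute error, computed on the [-1, 1] scale."""
    return float(np.mean(np.abs(pred - target)))

def psnr(pred, target, peak=255.0):
    """Peak signal-to-noise ratio in dB for images on a [0, peak] scale."""
    mse = np.mean((pred - target) ** 2)
    if mse == 0:
        return float("inf")
    return float(10.0 * np.log10(peak ** 2 / mse))

# Toy case: every pixel is off by 0.2 on the [-1, 1] scale
target = np.zeros((4, 4))
pred = target + 0.2

mae = l1_loss(pred, target)                                 # 0.2
psnr_db = psnr(to_8bit_scale(pred), to_8bit_scale(target))  # about 20 dB
```

In this toy case an error of 0.2 on the [-1, 1] scale becomes 25.5 on the 8-bit scale, giving MSE = 650.25 and PSNR = 10·log10(65025 / 650.25) = 20 dB.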
📁 AFM-to-O2A-GAN-Comparison
├── 📁 notebooks
│   ├── pix2pix.ipynb
│   ├── pix2pix_esrgan.ipynb
│   ├── gaugan.ipynb
│   ├── vanilla_gan.ipynb
│   └── cyclegan.ipynb
├── 📁 cyclegan
│   ├── cyclegan_train.py
│   └── 📁 test_outputs
│       ├── example_1_A2B.png
│       ├── example_1_B2A.png
│       ├── example_2_A2B.png
│       └── ...
├── 📁 images
│   ├── 📁 diagrams
│   │   ├── pix2pix_architecture.svg
│   │   ├── cyclegan_architecture.svg
│   │   ├── gaugan_architecture.svg
│   │   ├── esrgan_pipeline.svg
│   │   └── vanilla_gan_architecture.svg
│   └── 📁 references
│       ├── pix2pix_reference.png
│       ├── cyclegan_reference.png
│       ├── gaugan_reference.webp
│       ├── esrgan_reference.png
│       └── vanilla_gan_reference.png
├── 📁 poster
│   └── poster.pdf
├── 📁 report
│   └── report.pdf
└── README.md
⚠️ CycleGAN model checkpoints (~340 MB per file, 20 epochs) are not included in this repository due to GitHub file size limits. All checkpoints are available on Hugging Face: Download Checkpoints
| Detail | Description |
|---|---|
| Framework | PyTorch |
| Optimizer | Adam (fixed learning rate and momentum) |
| Training | Fixed epochs with early stopping on validation loss |
| Hardware | GPU-enabled (Kaggle / local) |
| Pix2Pix, ESRGAN, GauGAN, Vanilla GAN | Implemented and trained on Kaggle |
| CycleGAN | Implemented and trained locally (VS Code) |
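The "fixed epochs with early stopping on validation loss" policy in the table can be sketched framework-independently as below. The class name, the `patience` and `min_delta` parameters, and the loss values are hypothetical, since the README does not give the actual settings.

```python
class EarlyStopping:
    """Stop when validation loss has not improved for `patience` consecutive epochs."""

    def __init__(self, patience=10, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss):
        """Record one epoch's validation loss; return True if training should stop."""
        if val_loss < self.best - self.min_delta:
            self.best = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


def epochs_run(val_losses, max_epochs, patience):
    """Return how many epochs run before early stopping or the epoch cap applies."""
    stopper = EarlyStopping(patience=patience)
    for epoch, loss in enumerate(val_losses[:max_epochs], start=1):
        if stopper.step(loss):
            return epoch
    return min(len(val_losses), max_epochs)


# Made-up validation losses for illustration only
losses = [1.0, 0.8, 0.9, 0.85, 0.81, 0.7]
stopped_at = epochs_run(losses, max_epochs=10, patience=3)   # stops after epoch 5
ran_full = epochs_run(losses, max_epochs=10, patience=4)     # runs all 6 epochs
```

For GANs the choice of monitored quantity matters: the adversarial loss is not a reliable indicator of image quality, so a validation L₁ or similar reconstruction metric is a more natural candidate for the paired models.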
- Zhang et al. Pix2PixHD++: Image-to-Image Translation via Enhanced Generator and Discriminator. arXiv:2504.02982, 2024.
- He et al. An Introduction to Image Synthesis with Generative Adversarial Nets. arXiv:1803.04469, 2018.
- Isola et al. Image-to-Image Translation with Conditional Adversarial Networks. arXiv:1611.07004, 2017.
- Bagherkhani et al. Antenna Near-Field Reconstruction from Far-Field Data Using CNNs. arXiv:2504.17065, 2025.
- Chen & Dal Negro. Physics-informed Neural Networks for Imaging and Parameter Retrieval. arXiv:2109.12754, 2021.
- Stanciu et al. Inferring s-SNOM Data from Atomic Force Microscopy Images. arXiv:2504.02982, 2025.
- Pix2Pix Schema: View Source
- ESRGAN Structure: View Source
- GauGAN Detail: View Source
- Vanilla GAN Flow: View Source
- CycleGAN Diagram: View Source
Note: Click the links above to view the original high-resolution reference diagrams.




