Showing 102 changed files with 3,657 additions and 16 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,25 +1,42 @@ | ||
This repository will contain the source code to reproduce the results of the paper | ||
This repository contains the source code to reproduce the results of the paper | ||
|
||
"Content representation for Neural Style Transfer Algorithms based on Structural Similarity" | ||
|
||
written by [Philip Meier](https://www.th-owl.de/init/en/das-init/team/c/meier-5.html) and [Volker Lohweg](https://www.th-owl.de/init/en/das-init/team/c/lohweg-1.html). It will be available after the paper is presented at the [29. Workshop "Computational Intelligence"](http://www.rst.e-technik.tu-dortmund.de/cms/de/Veranstaltungen/GMA-Fachausschuss/index.html) on the 28th and 29th of November 2019 in Dortmund, Germany. | ||
written by [Philip Meier](https://www.th-owl.de/init/en/das-init/team/c/meier-5.html) and [Volker Lohweg](https://www.th-owl.de/init/en/das-init/team/c/lohweg-1.html). It was presented at the [29. Workshop "Computational Intelligence"](http://www.rst.e-technik.tu-dortmund.de/cms/de/Veranstaltungen/GMA-Fachausschuss/index.html) on the 28th and 29th of November 2019 in Dortmund, Germany. | ||
|
||
# Accepted Abstract | ||
If you use this work within a scientific publication, please cite it as | ||
|
||
Within the field of non-photorealistic rendering (NPR) the term style transfer describes a process, which applies an abstract style to an image without changing the underlying content. The emergence of neural style transfer (NST) techniques, which were pioneered by Gatys, Ecker, and Bethge in 2016 [GEB16], marks a paradigm shift within this field. While traditional NPR methods operate within the pixel space [EF01], NST algorithms utilise the feature space of convolutional neural networks (CNNs) trained on object classification tasks. This enables a general style transfer from a single example image of the intended style. The quality of the resulting image is sometimes high enough to even fool art critics [San+18]. | ||
``` | ||
@InProceedings{ML2019, | ||
author = {Meier, Philip and Lohweg, Volker}, | ||
title = {Content Representation for Neural Style Transfer Algorithms based on Structural Similarity}, | ||
booktitle = {Proceedings of the 29\textsuperscript{th} Workshop Computational Intelligence}, | ||
year = {2019}, | ||
url = {https://github.com/pmeier/GMA_CI_2019_ssim_content_loss}, | ||
} | ||
``` | ||
|
||
NST techniques treat the style of an image as texture. Thus, its representation involves various forms of global [GEB16; RWB17] and local statistics [LW16]. Within the original formulation, the content of an image is directly represented by the encodings from a deep layer of the CNN [GEB16]. These encodings are subsequently compared with their mean squared error (MSE). To the best knowledge of the authors currently no publication deals with alternative representations of the content. This contribution will change this by introducing a content representation based on the structural similarity (SSIM) index. The SSIM index was introduced by Wang et al. as a measure for image quality [Wan+04]. It was developed in order to compare two images with an objective measure that is aligned with the human perception opposed to conventional methods such as the MSE or the peak signal-to-noise-ratio. The SSIM index is incorporated as content representation into NST algorithms by utilising it as comparison between encodings of a CNN. | ||
The paper is part of the conference proceedings, which are [openly accessible](https://dx.doi.org/10.5445/KSP/1000098736). | ||
|
||
The proposed approach will be evaluated in two stages. An objective comparison between different NST algorithms is not possible within the current state of the art, since the quality of the stylisation is highly subjective. Thus, this contribution will focus on content reconstruction in a first step. Images reconstructed by different algorithms can be objectively compared to the original, for example by the SSIM index or the number of matching descriptors of the speeded up robust features (SURF) algorithm [BTv06]. In the second step the proposed content representation is utilised within an NST algorithm and qualitatively compared to the original formulation. | ||
# Installation | ||
|
||
## References | ||
Clone this repository | ||
|
||
Symbol | Reference | ||
--- | --- | ||
BTv06 | Bay, Herbert; Tuytelaars, Tinne; van Gool, Luc: ‘SURF: Speeded Up Robust Features’. In: Proceedings of the 9th European Conference on Computer Vision (ECCV). 2006. | ||
EF01 | Efros, Alexei A.; Freeman, William T.: ‘Image Quilting for Texture Synthesis and Transfer’. In: Proceedings of the 28th Annual Conference on Computer Graphics and Interactive Techniques (SIGGRAPH). 2001. DOI: [10.1145/383259.383296](https://dl.acm.org/citation.cfm?doid=383259.383296). | ||
GEB16 | Gatys, Leon A.; Ecker, Alexander S.; Bethge, Matthias: ‘Image Style Transfer Using Convolutional Neural Networks’. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 2016. DOI: [10.1109/CVPR.2016.265](https://ieeexplore.ieee.org/document/7780634). | ||
LW16 | Li, Chuan; Wand, Michael: ‘Combining Markov Random Fields and Convolutional Neural Networks for Image Synthesis’. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 2016. DOI: [10.1109/CVPR.2016.272](https://ieeexplore.ieee.org/document/7780641). | ||
RWB17 | Risser, Eric; Wilmot, Pierre; Barnes, Connelly: [‘Stable and Controllable Neural Texture Synthesis and Style Transfer Using Histogram Losses’](https://arxiv.org/abs/1701.08893). In: Computing Research Repository (CoRR) 1701 (2017). | ||
San+18 | Sanakoyeu, Artsiom et al.: [‘A Style-Aware Content Loss for Real-time HD Style Transfer’](https://arxiv.org/abs/1807.10201). In: Computing Research Repository (CoRR) 1807 (2018). | ||
Wan+04 | Wang, Zhou et al.: ‘Image quality assessment: from error visibility to structural similarity’. In: IEEE Transactions on Image Processing 13.4 (2004). DOI: [10.1109/TIP.2003.819861](https://ieeexplore.ieee.org/document/1284395). | ||
`git clone https://github.com/pmeier/GMA_CI_2019_ssim_content_loss` | ||
|
||
and install the required packages | ||
|
||
``` | ||
cd GMA_CI_2019_ssim_content_loss | ||
pip install -r requirements | ||
``` | ||
|
||
If you experience problems while installing `torch` or `torchvision`, please follow the [official installation instructions](https://pytorch.org/get-started/locally/) for your setup. | ||
|
||
# Replication | ||
|
||
All results are contained in the `results` folder. If you want to replicate the results yourself, you need to | ||
|
||
1. download the source images by running `images.py`, | ||
2. perform the experiments by running `experiments.py`, and | ||
3. finally run `process.py` to process the raw experiment results. |
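The three replication steps above could be chained with a small driver. The following is only a sketch: the script names come from the list above, but invoking each one as `python <script>` from the repository root, as well as the helper names `build_commands` and `run_pipeline`, are assumptions and not part of this commit.

```python
import subprocess
import sys

# Script names are taken from the replication steps; running them in this
# order from the repository root is an assumption of this sketch.
STEPS = ("images.py", "experiments.py", "process.py")


def build_commands(python=sys.executable, steps=STEPS):
    """Return one command (argument list) per replication step, in order."""
    return [[python, step] for step in steps]


def run_pipeline(runner=subprocess.run):
    """Run every step in order and stop at the first failure."""
    for command in build_commands():
        # check=True makes subprocess.run raise on a non-zero exit code
        runner(command, check=True)
```

Calling `run_pipeline()` from the repository root would then download the images, run the experiments, and process the raw results in one go. The `runner` argument exists only so that the order of the steps can be checked without executing anything.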
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,226 @@ | ||
from os import path | ||
import itertools | ||
import numpy as np | ||
import pandas as pd | ||
import torch | ||
from torchimagefilter import GaussFilter, BoxFilter | ||
from torchssim import SimplifiedMSSIM | ||
from pystiche.image import read_image, write_image, extract_image_size | ||
from pystiche.image.transforms import Resize, RGBToGrayscale | ||
from utils import make_reproducible, intgeomspace, df_to_csv | ||
from images import ( | ||
    get_npr_general_files, | ||
    get_npr_general_proxy_file, | ||
    get_style_image_files, | ||
) | ||
from nst import MeierLohweg2019NCRPyramid, MeierLohweg2019NSTPyramid | ||
from recording import record_nst | ||
|
||
|
||
def get_eval_transform(image): | ||
    eval_transform = Resize(extract_image_size(image)) + RGBToGrayscale() | ||
    return eval_transform.to(image.device) | ||
|
||
|
||
def get_input_image(target_image, random=True): | ||
    if random: | ||
        return torch.rand_like(target_image) | ||
    else: | ||
        return target_image.clone() | ||
|
||
|
||
def perform_ncr( | ||
    target_image, seed=0, level_steps=None, quiet=True, print_steps=None, **kwargs | ||
): | ||
    device = target_image.device | ||
    make_reproducible(seed) | ||
    input_image = get_input_image(target_image, random=True) | ||
|
||
    ncr_pyramid = MeierLohweg2019NCRPyramid(**kwargs) | ||
    ncr_pyramid = ncr_pyramid.to(device) | ||
    ncr_pyramid.build_levels(level_steps) | ||
|
||
    ncr_pyramid.ncr.content_operator.set_target(target_image) | ||
|
||
    output_images = ncr_pyramid(input_image, quiet=quiet, print_steps=print_steps) | ||
|
||
    return output_images[-1] | ||
|
||
|
||
def perform_nst(content_image, style_image, quiet=True, print_steps=None, **kwargs): | ||
    device = content_image.device | ||
    make_reproducible() | ||
    input_image = get_input_image(content_image, random=False) | ||
|
||
    nst_pyramid = MeierLohweg2019NSTPyramid(**kwargs) | ||
    nst_pyramid = nst_pyramid.to(device) | ||
    nst_pyramid.build_levels() | ||
|
||
    nst_pyramid.nst.content_operator.set_target(content_image) | ||
    nst_pyramid.nst.style_operator.set_target(style_image) | ||
|
||
    output_images = nst_pyramid(input_image, quiet=quiet, print_steps=print_steps) | ||
|
||
    return output_images[-1] | ||
|
||
|
||
def benchmark_ncr(images_root, results_root, device): | ||
    target_files = get_npr_general_files() | ||
    ssim_component_weight_ratios = (0.0, 3.0, 9.0, np.inf) | ||
    num_seeds = 5 | ||
|
||
    loss_variations = [ | ||
        (True, ssim_component_weight_ratio) | ||
        for ssim_component_weight_ratio in ssim_component_weight_ratios | ||
    ] | ||
    loss_variations = [(False, None)] + loss_variations | ||
    seeds = np.arange(num_seeds) | ||
|
||
    calculate_ssim_score = SimplifiedMSSIM().to(device) | ||
    data = [] | ||
    for target_file in target_files: | ||
        target_name = path.splitext(path.basename(target_file))[0] | ||
        target_image = read_image(path.join(images_root, target_file)).to(device) | ||
|
||
        eval_transform = get_eval_transform(target_image) | ||
        target_image_eval = eval_transform(target_image) | ||
|
||
        for loss_variation, seed in itertools.product(loss_variations, seeds): | ||
            ssim_loss, ssim_component_weight_ratio = loss_variation | ||
|
||
            output_image = perform_ncr( | ||
                target_image, | ||
                seed=seed, | ||
                ssim_loss=ssim_loss, | ||
                ssim_component_weight_ratio=ssim_component_weight_ratio, | ||
            ) | ||
            output_image_eval = eval_transform(output_image) | ||
|
||
            mssim = calculate_ssim_score(output_image_eval, target_image_eval) | ||
            ssim_score = mssim.cpu().item() | ||
|
||
            data.append( | ||
                (target_name, ssim_loss, ssim_component_weight_ratio, seed, ssim_score) | ||
            ) | ||
|
||
    columns = ("name", "ssim_loss", "ssim_component_weight_ratio", "seed", "ssim_score") | ||
    df = pd.DataFrame.from_records(data, columns=columns) | ||
    file = path.join(results_root, "ncr_benchmark", "raw.csv") | ||
    df_to_csv(df, file) | ||
|
||
|
||
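To make the size of the `benchmark_ncr` sweep concrete, the grid of runs per target image can be enumerated on its own. This is a sketch that only mirrors the loop structure above: it uses `math.inf` and `range` in place of `np.inf` and `np.arange` so that it runs with the standard library alone, and the name `runs_per_image` is introduced purely for illustration.

```python
import itertools
import math

# Same values as in benchmark_ncr; math.inf stands in for np.inf here.
ssim_component_weight_ratios = (0.0, 3.0, 9.0, math.inf)
num_seeds = 5

# One squared-error baseline plus one SSIM variant per weight ratio.
loss_variations = [(False, None)] + [
    (True, ratio) for ratio in ssim_component_weight_ratios
]
seeds = range(num_seeds)

# itertools.product varies the seed fastest and the loss variation slowest.
runs_per_image = list(itertools.product(loss_variations, seeds))
print(len(runs_per_image))
```

With five loss variations and five seeds, each target image is reconstructed 25 times, so the total number of optimisations grows linearly with the number of files returned by `get_npr_general_files()`.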
def evaluate_steady_state(images_root, results_root, device): | ||
    target_file = path.join(images_root, get_npr_general_proxy_file()) | ||
    num_steps = 200_000 | ||
|
||
    target_image = read_image(target_file).to(device) | ||
    level_steps = (0, num_steps) | ||
    print_steps = intgeomspace(1, num_steps, num=1000) | ||
|
||
    for ssim_loss in (False, True): | ||
        with record_nst(quiet=True) as recorder: | ||
            perform_ncr( | ||
                target_image, | ||
                level_steps=level_steps, | ||
                quiet=False, | ||
                print_steps=print_steps, | ||
                ssim_loss=ssim_loss, | ||
                diagnose_ssim_score=True, | ||
            ) | ||
|
||
        df = recorder.extract() | ||
|
||
        loss_type = "SSIM" if ssim_loss else "SE" | ||
        df = df.rename( | ||
            columns={f"Content loss ({loss_type})": "loss", "SSIM score": "ssim_score"} | ||
        ) | ||
        df = df[["ssim_score", "loss"]] | ||
        df = df.dropna(axis="index", how="all") | ||
|
||
        file = f"{loss_type.lower()}.csv" | ||
        file = path.join(results_root, "steady_state", "raw", file) | ||
        df_to_csv(df, file, index=False) | ||
|
||
|
||
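`evaluate_steady_state` records the loss at steps produced by `intgeomspace(1, num_steps, num=1000)`, a helper imported from `utils` whose implementation is not visible in this part of the diff. Judging by its name, it probably returns integer steps that are spaced geometrically, that is, dense at the start of the optimisation and sparse towards the end. The function below is a hypothetical stand-in under that assumption, not the repository's code; the name `intgeomspace_sketch` and the de-duplication behaviour are guesses.

```python
import math


def intgeomspace_sketch(start, stop, num):
    """Integers spaced evenly on a log scale from start to stop (inclusive).

    Rounding to integers creates duplicates among the small values, which
    are removed, so fewer than `num` steps may be returned.
    """
    if start < 1 or stop < start or num < 2:
        raise ValueError("need 1 <= start <= stop and num >= 2")
    log_ratio = math.log(stop / start)
    values = {
        round(start * math.exp(log_ratio * i / (num - 1))) for i in range(num)
    }
    # Add the end points explicitly to guard against floating point drift.
    values.update((start, stop))
    return sorted(values)


steps = intgeomspace_sketch(1, 200_000, 1000)
```

Sampling the steps this way keeps the recorded CSV small while still resolving the fast changes of the loss during the first few hundred iterations.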
def evaluate_ssim_window(images_root, results_root, device): | ||
    target_file = path.join(images_root, get_npr_general_proxy_file()) | ||
    window_types = ("gauss", "box") | ||
    output_shapes = ("same", "valid") | ||
    radii = range(1, 10) | ||
    num_seeds = 5 | ||
|
||
    target_image = read_image(target_file).to(device) | ||
|
||
    eval_transform = get_eval_transform(target_image) | ||
    target_image_eval = eval_transform(target_image) | ||
|
||
    def get_image_filter(window_type, output_shape, radius): | ||
        kwargs = {"output_shape": output_shape, "padding_mode": "replicate"} | ||
        if window_type == "gauss": | ||
            return GaussFilter(radius=radius, std=radius / 3.0, **kwargs) | ||
        else:  # window_type == "box" | ||
            return BoxFilter(radius=radius, **kwargs) | ||
|
||
    seeds = range(num_seeds) | ||
|
||
    calculate_mssim = SimplifiedMSSIM().to(device) | ||
    data = [] | ||
|
||
    for image_filter_params in itertools.product(window_types, output_shapes, radii): | ||
        image_filter = get_image_filter(*image_filter_params) | ||
|
||
        for seed in seeds: | ||
|
||
            kwargs = {"seed": seed, "image_filter": image_filter} | ||
            output_image = perform_ncr(target_image, **kwargs) | ||
            output_image_eval = eval_transform(output_image) | ||
|
||
            mssim = calculate_mssim(output_image_eval, target_image_eval) | ||
            ssim_score = mssim.cpu().item() | ||
            data.append((*image_filter_params, seed, ssim_score)) | ||
|
||
    columns = ("window_type", "output_shape", "radius", "seed", "ssim_score") | ||
    df = pd.DataFrame.from_records(data, columns=columns) | ||
    file = path.join(results_root, "ssim_window", "raw.csv") | ||
    df_to_csv(df, file) | ||
|
||
|
||
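The Gaussian window in `evaluate_ssim_window` is created with `std=radius / 3.0`, which places the window edge three standard deviations from the centre. The sketch below illustrates the effect of that choice with a plain 1D Gaussian window. How `GaussFilter` from `torchimagefilter` actually builds its kernel is not shown in this diff, so the function `gauss_window_1d` is an assumption made for illustration only.

```python
import math


def gauss_window_1d(radius, std):
    """Normalised 1D Gaussian weights on the offsets -radius..radius."""
    offsets = range(-radius, radius + 1)
    weights = [math.exp(-0.5 * (x / std) ** 2) for x in offsets]
    total = sum(weights)
    # Normalise so that the weights sum to one.
    return [w / total for w in weights]


radius = 3
window = gauss_window_1d(radius, std=radius / 3.0)

# With std = radius / 3 the edge weight relative to the centre weight is
# exp(-0.5 * 3 ** 2) = exp(-4.5), independent of the radius.
edge_to_centre = window[0] / window[radius]
```

Under this assumption the outermost sample carries only about 1.1 % of the centre weight, so truncating the Gaussian at `radius` discards very little of its mass, whereas a box window of the same radius weights every sample equally.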
def benchmark_nst(images_root, results_root, device): | ||
    def process_image(file): | ||
        name = path.splitext(path.basename(file))[0] | ||
        image = read_image(path.join(images_root, file)).to(device) | ||
        return name, image | ||
|
||
    content_files = get_npr_general_files() | ||
    style_files = get_style_image_files() | ||
|
||
    for content_file in content_files: | ||
        content_name, content_image = process_image(content_file) | ||
        for style_file in style_files: | ||
            style_name, style_image = process_image(style_file) | ||
|
||
            for ssim_loss in (False, True): | ||
                output_image = perform_nst( | ||
                    content_image, style_image, ssim_loss=ssim_loss, quiet=False | ||
                ) | ||
|
||
                output_file = "__".join( | ||
                    (content_name, style_name, "ssim" if ssim_loss else "se") | ||
                ) | ||
                output_file = path.join( | ||
                    results_root, "nst_benchmark", f"{output_file}.jpg" | ||
                ) | ||
                write_image(output_image, output_file) | ||
|
||
|
||
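The naming scheme that `benchmark_nst` uses for its output images can be isolated into a small helper, which makes it easier to locate a particular result later. This is a sketch that reproduces the string handling above without touching any image data; the helper name `nst_output_file` and the example file names are hypothetical.

```python
from os import path


def nst_output_file(results_root, content_file, style_file, ssim_loss):
    """Build the output path for one content/style/loss combination."""

    def stem(file):
        # File name without its directory and extension.
        return path.splitext(path.basename(file))[0]

    loss_tag = "ssim" if ssim_loss else "se"
    name = "__".join((stem(content_file), stem(style_file), loss_tag))
    return path.join(results_root, "nst_benchmark", f"{name}.jpg")


# Hypothetical file names, used only to show the resulting pattern.
example = nst_output_file("results", "content/portrait.jpg", "style/sketch.png", True)
```

Because the three parts are joined with a double underscore, `path.basename(example)` is `portrait__sketch__ssim.jpg`, and splitting the stem on `"__"` recovers the content name, the style name, and the loss type, provided the names themselves contain no double underscore.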
if __name__ == "__main__": | ||
    root = path.dirname(__file__) | ||
    images_root = path.join(root, "images") | ||
    results_root = path.join(root, "results") | ||
|
||
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu") | ||
|
||
    benchmark_ncr(images_root, results_root, device) | ||
    evaluate_steady_state(images_root, results_root, device) | ||
    evaluate_ssim_window(images_root, results_root, device) | ||
    benchmark_nst(images_root, results_root, device) |