
Generative Adversarial Networks

Generative Adversarial Networks (GANs) are an adaptation of deep neural networks that allow users to create fake data that mimics or mirrors a training set. GANs have been used to generate works of art, realistic fake images, realistic fake video, and text.

A number of websites that generate examples of these so-called "deep fakes" have been collected on this website. Some websites, like This Marketing Blog Does Not Exist, train multiple GANs, some for the images and others for the text.

These websites typically use a well-known open-source architecture, such as StyleGAN, BigGAN, or Pix2Pix for images, and something like BERT or GPT-2 for text generation. Starting from an established architecture, and often an open-source implementation and training regimen as well, the creators of these websites then train the model on a dataset of their own choosing.

GANs are actually a combination of two separate neural networks (the Generator and the Discriminator) that compete against each other during the training process (hence "adversarial"). The generator's job is to create convincing fake images, where "convincing" means good enough to trick the discriminator. The discriminator's job, in turn, is to detect images produced by the generator and distinguish them from images in the actual training set.
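The two roles can be sketched in a few lines. Below is a minimal, hypothetical NumPy illustration, not a practical GAN: both networks are single linear layers, and the latent size, data size, and weight scales are arbitrary choices made for the example. The point is only the interface: the generator maps random noise to a fake sample, and the discriminator maps any sample to a probability of being real.

```python
import numpy as np

rng = np.random.default_rng(0)

LATENT_DIM = 8   # size of the noise input (illustrative choice)
DATA_DIM = 4     # size of one "data" sample (illustrative choice)

# Generator: random noise in, fake sample out (one linear layer here;
# real GANs stack many layers).
W_g = rng.normal(scale=0.1, size=(LATENT_DIM, DATA_DIM))

def generator(z):
    return z @ W_g

# Discriminator: sample in, probability-of-being-real out.
W_d = rng.normal(scale=0.1, size=(DATA_DIM, 1))

def discriminator(x):
    return 1.0 / (1.0 + np.exp(-(x @ W_d)))   # sigmoid squashes to (0, 1)

z = rng.normal(size=(1, LATENT_DIM))          # draw random noise
fake = generator(z)                           # the generator's fake sample
p_real = float(discriminator(fake)[0, 0])     # the discriminator's verdict

print(fake.shape, round(p_real, 3))
```

In a real GAN both functions would be deep networks (in PyTorch, TensorFlow, etc.), but the contract between them is exactly this: one produces candidates, the other scores them.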

One result of this setup is a complex training process. Because the two networks are competing, they must be trained "separately." But, because the generator's performance only makes sense in terms of the discriminator (i.e. the generator is "good" when it "fools" the discriminator, and bad otherwise), they must be trained together! Furthermore, there aren't any real "objective" metrics that tell us if the GAN is doing a good job — we have to actually look at the results of the generator to tell if it's generating fakes that can fool a human.
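The alternating scheme can be made concrete with a deliberately tiny example, an assumption for illustration rather than a practical GAN: the "data" is scalars drawn from a Gaussian, the generator only learns an offset, the discriminator is a single logistic unit, and the gradients of the usual binary cross-entropy losses are written out by hand. Each iteration first updates the discriminator (real samples labeled 1, fakes labeled 0), then updates the generator to push the discriminator's verdict on fakes toward 1.

```python
import numpy as np

rng = np.random.default_rng(0)

# "Real" data: scalars drawn from N(4, 1).
def sample_real(n):
    return rng.normal(4.0, 1.0, size=n)

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

# Generator g(z) = z + b (only the offset b is learned, to keep the
# hand-derived gradients short). Discriminator d(x) = sigmoid(w*x + c).
b = 0.0                   # generator starts producing fakes centered at 0
w, c = 0.1, 0.0
lr_d, lr_g = 0.1, 0.02    # D learns faster than G here

for step in range(3000):
    # --- Train the discriminator: real -> 1, fake -> 0 ---
    x_real = sample_real(64)
    x_fake = rng.normal(size=64) + b          # generator held fixed
    d_real = sigmoid(w * x_real + c)
    d_fake = sigmoid(w * x_fake + c)
    # Hand-derived binary cross-entropy gradients.
    w -= lr_d * np.mean(-(1 - d_real) * x_real + d_fake * x_fake)
    c -= lr_d * np.mean(-(1 - d_real) + d_fake)

    # --- Train the generator: make the discriminator say "real" on fakes ---
    x_fake = rng.normal(size=64) + b
    d_fake = sigmoid(w * x_fake + c)
    b -= lr_g * np.mean(-(1 - d_fake) * w)    # grad of -log d(g(z))

fake_mean = float(np.mean(rng.normal(size=1000) + b))
print(f"generated mean ~ {fake_mean:.2f} (real mean is 4.0)")
```

Even in this toy, the key structural facts show up: the two parameter sets get separate update steps, but the generator's gradient flows *through* the current discriminator, so neither network can be trained in isolation.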

As a result, GANs are notoriously hard to train. In addition to improving architectural patterns and expanding those patterns into new domains (image, text, video, audio...), many of the advancements in GAN research are related to creating more successful training regimes.

By the end of this section students should be able to:

  • Describe the components of a GAN, specifically the generator and discriminator networks.
  • Describe the "adversarial" relationship between the two networks.
  • Implement and train GANs that create passable images for both the CIFAR-10 and MNIST datasets.
  • Apply best practices to their GAN's training regimen.
  • Study state-of-the-art GANs on their own.

Part 1: GANs on MNIST

For our introduction to GANs, we're headed back to the trusty MNIST dataset. We're using MNIST for a variety of reasons, a major one being that GANs are complex and expensive to train, so a simple dataset has advantages for all of us using commodity hardware. The small images, single color channel, and comparatively simple task allow us to explore basic GAN architectures without getting lost in the complexities of state-of-the-art models.
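To fix the shapes in mind before building anything, here is a hypothetical NumPy sketch using random, untrained weights: a noise vector is mapped to a single-channel 28x28 array matching MNIST's format. The 100-dimensional latent and the single dense layer are illustrative conventions, not requirements.

```python
import numpy as np

rng = np.random.default_rng(0)

LATENT_DIM = 100          # a conventional noise-vector size for MNIST GANs
IMG_SHAPE = (28, 28, 1)   # MNIST: 28x28 pixels, one grayscale channel

# Untrained stand-in for a generator: one dense layer + tanh, just to
# show the shapes. A real model would stack several (de)conv layers.
W = rng.normal(scale=0.02, size=(LATENT_DIM, 28 * 28))

def generate(batch_size):
    z = rng.normal(size=(batch_size, LATENT_DIM))
    imgs = np.tanh(z @ W)                  # tanh keeps pixel values in (-1, 1)
    return imgs.reshape(batch_size, *IMG_SHAPE)

batch = generate(16)
print(batch.shape)        # (16, 28, 28, 1)
```

The small output tensor is part of why MNIST is a gentle starting point: a generator for 28x28x1 images has far fewer parameters to get wrong than one for high-resolution color photos.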

Pre-Reading

Resources for Further Exploration

Part 2: Adding Color, Improving Training Stability, and Other Best Practices

GANs have earned a reputation for being hard to train. In the early days of GAN research (2014) very little was known about how and why exactly GANs worked, and therefore how to train them. Like many areas in Deep Learning, GAN research was advanced more by empirical results than by theoretical breakthroughs. In many ways, the theory still lags behind the empirical research. Nevertheless, over the last few years some best practices have been established that greatly improve training stability and overall GAN performance.
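One concrete example of such a best practice is one-sided label smoothing: instead of training the discriminator with a hard label of 1.0 for real images, use something like 0.9, which discourages the discriminator from becoming overconfident. The 0.9 value below is the commonly cited choice, but it is a tunable assumption. A minimal NumPy illustration:

```python
import numpy as np

def bce(pred, label, eps=1e-7):
    """Binary cross-entropy for a single prediction."""
    pred = np.clip(pred, eps, 1 - eps)
    return -(label * np.log(pred) + (1 - label) * np.log(1 - pred))

pred = 0.99                      # discriminator is very sure this image is real

loss_hard = bce(pred, 1.0)       # hard label rewards pushing pred -> 1.0
loss_smooth = bce(pred, 0.9)     # smoothed label penalizes overconfidence

print(f"hard: {loss_hard:.4f}, smoothed: {loss_smooth:.4f}")
```

With the hard label, the loss keeps shrinking as the discriminator's confidence approaches 1.0; with the smoothed label, a prediction of 0.99 is actually penalized relative to 0.9, which leaves the generator a useful gradient for longer.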

Pre-Reading

Helpful Documentation

Resources for Further Exploration

Advanced GAN Resources: Style Transfer, Resolution Recovery, Photo In-Painting, Video, and More

Since their invention, GANs have quickly been adapted for use in tons of interesting ways. "DeepFake" videos are one of the most well-known applications of GANs, but the strategy is also proving useful in image editing, photo resolution recovery, photo reconstruction/inpainting, the creation of art, black-and-white photo colorization, and more.

Implementing examples of everything GANs can do is beyond the scope of this class (and my personal ability!) but there are numerous wonderful resources available elsewhere. Here is a collection of fascinating and detailed articles, papers, code repos, and tools that the curious student can use to continue their exploration of GANs.

Collections of GAN Applications

Controlling The Generated Images / AI Assisted Photo Realistic Painting

Style Transfer

Resolution Recovery / Super Resolution

Photo Inpainting

Face Swapping / Deep Video Portraits / Deep Fakes