- Title: Adversarially Learned Inference
- Authors: Vincent Dumoulin, Ishmael Belghazi, Ben Poole, Alex Lamb, Martin Arjovsky, Olivier Mastropietro, Aaron Courville
- Link: http://arxiv.org/abs/1606.00704
- Tags: Neural Network, GAN, variational
- Year: 2016
-
What
- They suggest a new architecture for GANs.
- Their architecture adds another Generator for a reverse branch (from images to noise vector `z`).
- Their architecture takes some ideas from VAEs/variational neural nets.
- Overall they can improve on the previous state of the art (DCGAN).
-
How
- Architecture
  - Usually, in GANs one feeds a noise vector `z` into a Generator (G), which then generates an image (`x`) from that noise.
  - They add a reverse branch (G2), in which another Generator takes a real image (`x`) and generates a noise vector `z` from it.
    - The noise vector can now be viewed as a latent space vector.
  - Instead of letting G2 generate discrete values for `z` (as is usually done), they take the approach commonly used in VAEs and use continuous variables.
    - That is, if `z` represents `N` latent variables, they let G2 generate `N` means and `N` variances of gaussian distributions, with each distribution representing one value of `z`.
    - So the model could e.g. represent something along the lines of "this face looks a lot like a female, but with very low probability could also be male".
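The two branches can be sketched with toy numpy stand-ins (the linear "networks", shapes, and names here are hypothetical, not the paper's convolutional architecture): G1 maps `z` to an image, while G2 maps an image to `N` means and `N` log-variances instead of a single point in latent space.

```python
import numpy as np

rng = np.random.default_rng(0)
dim_x, N = 6, 3  # hypothetical "image" size and number of latent variables

# Toy linear stand-ins for the two generators (not the paper's conv nets).
W_dec = rng.standard_normal((N, dim_x))
W_enc = rng.standard_normal((dim_x, 2 * N))  # G2 outputs 2N numbers per image

def G1(z):
    """Decoder branch: latent vector z -> image x."""
    return np.tanh(z @ W_dec)

def G2(x):
    """Encoder branch: image x -> (N means, N log-variances)."""
    out = x @ W_enc
    return out[:, :N], out[:, N:]

x = rng.standard_normal((4, dim_x))          # batch of 4 toy "images"
mu, log_var = G2(x)                          # each of shape (4, N)
x_gen = G1(rng.standard_normal((4, N)))      # generated images, shape (4, dim_x)
```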
- Training
  - The Discriminator (D) is now trained on pairs of either `(real image, generated latent space vector)` or `(generated image, randomly sampled latent space vector)` and has to tell them apart.
  - Both Generators are trained to maximally confuse D.
    - G1 (from `z` to `x`) confuses D maximally if it generates new images that (a) look real and (b) fit well to the latent variables in `z` (e.g. if `z` says "image contains a cat", then the image should contain a cat).
    - G2 (from `x` to `z`) confuses D maximally if it generates good latent variables `z` that fit to the image `x`.
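The two kinds of training pairs can be sketched as follows, again with hypothetical linear stand-ins for G1, G2, and D (and no actual optimization step); the point is only that D scores an `(x, z)` pair rather than an image alone.

```python
import numpy as np

rng = np.random.default_rng(1)
dim_x, dim_z = 4, 2

# Hypothetical linear stand-ins for the networks.
W_dec = rng.standard_normal((dim_z, dim_x))
W_enc = rng.standard_normal((dim_x, dim_z))
w_x = rng.standard_normal(dim_x)
w_z = rng.standard_normal(dim_z)

def G1(z):    # decoder: latent z -> "image" x
    return z @ W_dec

def G2(x):    # encoder: "image" x -> latent z
    return x @ W_enc

def D(x, z):  # joint discriminator scores an (x, z) PAIR
    return 1.0 / (1.0 + np.exp(-(x @ w_x + z @ w_z)))

x_real = rng.standard_normal((8, dim_x))   # batch of real "images"
z_prior = rng.standard_normal((8, dim_z))  # z ~ N(0, I)

# The two kinds of pairs D has to tell apart:
p_enc = D(x_real, G2(x_real))      # (real image, generated latent vector)
p_dec = D(G1(z_prior), z_prior)    # (generated image, sampled latent vector)

# D is trained to push p_enc -> 1 and p_dec -> 0 (GAN cross-entropy loss);
# G1 and G2 are trained on the opposite objective to confuse D.
d_loss = -np.mean(np.log(p_enc) + np.log(1.0 - p_dec))
```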
- Continuous variables
  - The variables in `z` follow gaussian distributions, which makes the training more complicated, as you can't trivially backpropagate through gaussians.
  - When training G1 (from `z` to `x`), the situation is easy: you draw a random `z`-vector following a gaussian distribution (`N(0, I)`). (This is basically the same as in "normal" GANs; they just often use uniform distributions instead.)
  - When training G2 (from `x` to `z`), the situation is a bit harder.
    - Here we need to use the reparameterization trick.
    - That roughly means that G2 predicts the means and variances of the gaussian variables in `z`, and then we draw a sample of `z` according to exactly these means and variances.
    - That sample gives us concrete values for our backpropagation.
    - If we do that sampling often enough, we get a good approximation of the true gradient (of the continuous variables). (Monte Carlo approximation.)
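The reparameterization step can be sketched in a few lines of numpy (the concrete means and log-variances below are made up for illustration): instead of sampling `z` directly from `N(mu, sigma^2)`, one samples `eps ~ N(0, I)` and computes `z = mu + sigma * eps`, so `z` becomes a deterministic function of the predicted `mu` and `sigma` and gradients can flow through them.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical G2 outputs for a batch of 2 images with N = 3 latent variables.
mu = np.array([[0.5, -1.0, 0.0], [0.0, 2.0, -0.5]])
log_var = np.array([[0.0, -2.0, 1.0], [1.0, 0.0, -1.0]])

# Reparameterization trick: z = mu + sigma * eps with eps ~ N(0, I).
eps = rng.standard_normal(mu.shape)
z = mu + np.exp(0.5 * log_var) * eps

# Averaging over many samples recovers the predicted mean (Monte Carlo).
many_eps = rng.standard_normal((100_000,) + mu.shape)
z_mean = (mu + np.exp(0.5 * log_var) * many_eps).mean(axis=0)
```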
-
Results
- Images generated based on the Celeb-A dataset.
- Reconstructions: left column per pair shows the real image, right column per pair the reconstruction (`x -> z` via G2, then `z -> x` via G1).
- Reconstructions of SVHN; notice how the digits often stay the same, while the font changes.
- CIFAR-10 samples; still lots of errors, but some quite correct.



