This post was originally published by Mohammed Alhamid at Towards Data Science
A walkthrough on how GAN models work with examples in Python.
The idealized vision of Machine Learning is a machine that can think and pass a test with some degree of intelligence. Although this is the ultimate goal, we are not there yet, and we still have a long way to go. In the past few years, many models have been developed to learn in an unsupervised mode by competing against another computer or a human to perform a certain task. This article sheds some light on Generative Adversarial Networks (GANs) and how they can be used in today’s world.
Machine Learning has shown some power to recognize patterns such as data distributions, images, and sequences of events to solve classification and regression problems. In 2014, Ian Goodfellow et al. published an article that used two separate neural networks to generate synthetic data with properties similar to those of real data. This work has made the research community more interested in generating realistic images, videos, and generic synthetic structured data.
Figure 1: Examples of progressively learning GAN model generating artificial human faces.
GANs are unsupervised deep learning techniques. Usually, they are implemented using two neural networks: a generator and a discriminator. These two models compete with each other in a game setting. The GAN is trained on real data and on data produced by the generator. The discriminator’s job is to tell fake data from real data. The generator is a learning model, so initially it is likely to produce low-quality or even completely noisy data that does not reflect the distribution or the properties of the real data.
The generator model’s primary goal is to generate artificial data that can pass the discriminator successfully. The model starts from some noise, usually Gaussian noise, and produces an image formatted as a vector of pixels. The generator must learn how to trick the discriminator and win a positive classification (the produced image is classified as real). The generator incurs a loss whenever one of its generated images is successfully detected as “fake”. The discriminator, in turn, has to learn progressively how to identify those fake images, and it is penalized whenever it fails to recognize a fake image. The key concept is that the generator and the discriminator are trained simultaneously.
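To make the two penalties concrete, here is a minimal sketch of how they could be computed with binary cross-entropy, the loss used by the code later in this article. The helper names and the discriminator scores are hypothetical and chosen only for illustration.

```python
import math

def bce(label, prob, eps=1e-7):
    """Binary cross-entropy for one prediction; prob is clipped to avoid log(0)."""
    prob = min(max(prob, eps), 1.0 - eps)
    return -(label * math.log(prob) + (1 - label) * math.log(1.0 - prob))

def discriminator_loss(d_real, d_fake):
    """Mean loss when real samples are labelled 1 and generated samples 0."""
    losses = [bce(1, p) for p in d_real] + [bce(0, p) for p in d_fake]
    return sum(losses) / len(losses)

def generator_loss(d_fake):
    """Mean loss when the generator wants its samples to be scored as real (1)."""
    return sum(bce(1, p) for p in d_fake) / len(d_fake)

# Hypothetical discriminator scores (estimated probability that a sample is real).
d_real = [0.9, 0.8]  # scores on real images
d_fake = [0.1, 0.3]  # scores on generated images

print("discriminator loss:", round(discriminator_loss(d_real, d_fake), 4))
print("generator loss:", round(generator_loss(d_fake), 4))
```

With these scores the discriminator is doing well, so its loss is small while the generator's loss is large. As the generator improves and its images receive higher scores, the balance shifts the other way.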
Example of Generating Handwritten Digits:
The research community has many interesting datasets for measuring the accuracy of a GAN model. In this article, we will use a few of those datasets in detail, starting with MNIST. MNIST is one of the most widely used datasets in image processing and a standard example for explaining the theory of generative models. A sample from the MNIST dataset is shown in Figure 2.
Figure 2: Sample of handwritten digit images from MNIST dataset.
To generate artificial handwritten images, we need to implement two models: one to generate fake images and another to distinguish fake images from real ones. The overall pipeline of training a GAN model is shown in Figure 3.
Figure 3: The GAN learning framework, which has the generator and the discriminator simultaneously trained.
There are many architectures to consider for building the discriminator and the generator. We could build a fully connected deep neural network, a Convolutional Neural Network (CNN), or one of several other options. We will go over the types of GAN models shortly, but first, let’s pick a CNN for now.
The source code of this example is available on my GitHub.
The discriminator model starts by receiving an image (28 x 28 x 1) and passes it through two convolutional layers with 64 filters each. Alternatively, we could use 128 filters per layer, which increases the number of feature maps each layer learns. We can also make the network deeper by using three layers with 64, 128, and 256 filters. To keep the focus on how GANs work, we will use a simple architecture in this tutorial, which still gives high accuracy. Figure 4 shows the overall architecture of the discriminator.
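The effect of the two convolutional layers on the image size can be checked with a little arithmetic. The sketch below assumes the 3 x 3 kernels, stride of 2, and unpadded ("valid") convolutions used in the discriminator code of this article; the helper name is made up for illustration.

```python
def conv_output_size(size, kernel, stride, padding=0):
    """Spatial size after a convolution without dilation."""
    return (size + 2 * padding - kernel) // stride + 1

size = 28  # MNIST images are 28 x 28
for layer in range(2):
    size = conv_output_size(size, kernel=3, stride=2)
    print(f"after conv layer {layer + 1}: {size} x {size} x 64")

# The flattened feature vector that feeds the final dense layer.
print("flattened features:", size * size * 64)
```

Each strided convolution roughly halves the width and height, so the 28 x 28 input shrinks to 13 x 13 and then to 6 x 6 before it is flattened.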
Figure 4: The architecture of the discriminator model showing the number of layers and parameters in each.
The generator model learns how to generate realistic images, but it needs to start from some random points in the latent space. If you compare the generator architecture in Figure 5 with the discriminator architecture in Figure 4, you will notice that they look very similar. However, it is not necessary to mirror the discriminator when building the generator network. The generator’s architecture can have a different number of layers and filters, and a higher overall complexity.
Figure 5: The architecture of the generator model showing each layer.
Another main difference between the discriminator and the generator lies in the output layer and in how each model is trained. The discriminator uses a sigmoid in the output layer. It is a binary classification problem, and the sigmoid ensures that the output is a probability between 0 and 1, which is then thresholded to decide between fake (0) and real (1). The generator, on the other hand, is not compiled with its own loss function or optimization algorithm. It uses transposed convolution layers to upsample the low-resolution output of the dense layer, which is fed from the latent space, into a higher-resolution image. The trick when building the generator model is that we don’t need to compile it. Instead, the GAN model combines the generator and the discriminator into one framework, and it is this combined model that we compile. We will discuss those aspects in detail in the following section.
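As a small illustration of what the sigmoid output layer does, the sketch below maps a raw score to a probability and then applies a 0.5 threshold. The function names and the threshold are assumptions made for this example, not part of the article's code.

```python
import math

def sigmoid(x):
    """Squash any real-valued score into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def classify(score, threshold=0.5):
    """Return the probability of 'real' and the resulting label."""
    prob = sigmoid(score)
    return prob, ("real" if prob >= threshold else "fake")

for score in (-2.0, 0.0, 2.0):
    prob, label = classify(score)
    print(f"score {score:+.1f} -> probability {prob:.3f} -> {label}")
```

The discriminator therefore never emits a hard 0 or 1 by itself; the hard decision only appears once the probability is compared with a threshold.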
    import tensorflow as tf
    from tensorflow.keras.models import Sequential

    def building_gan(generator, discriminator):
        GAN = Sequential()
        # Freeze the discriminator so that only the generator is updated through the GAN
        discriminator.trainable = False
        # Adding the generator and the discriminator
        GAN.add(generator)
        GAN.add(discriminator)
        # Optimization function (`learning_rate` replaces the deprecated `lr` argument)
        opt = tf.keras.optimizers.Adam(learning_rate=2e-4, beta_1=0.5)
        # Compile the model
        GAN.compile(loss='binary_crossentropy', optimizer=opt)
        return GAN

The next animation shows how the generator improves over successive sets of epochs during training:
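The article does not show the training loop itself, so the following is only a sketch of what one alternating update could look like, assuming Keras-style models that expose `predict` and `train_on_batch`. The function name `train_step` and its arguments are hypothetical, and NumPy is assumed to be installed.

```python
import numpy as np

def train_step(generator, discriminator, gan, real_images, noise_dim, rng):
    """Run one alternating update and return (d_real, d_fake, g_loss)."""
    batch_size = real_images.shape[0]

    # 1) Discriminator update: real images are labelled 1, generated images 0.
    noise = rng.normal(size=(batch_size, noise_dim))
    fake_images = generator.predict(noise, verbose=0)
    d_real = discriminator.train_on_batch(real_images, np.ones((batch_size, 1)))
    d_fake = discriminator.train_on_batch(fake_images, np.zeros((batch_size, 1)))

    # 2) Generator update through the combined model: the noise is labelled 1,
    #    so the generator is rewarded when the discriminator is fooled.
    noise = rng.normal(size=(batch_size, noise_dim))
    g_loss = gan.train_on_batch(noise, np.ones((batch_size, 1)))

    # With metrics=['accuracy'], Keras returns [loss, accuracy] for d_real and d_fake.
    return d_real, d_fake, g_loss
```

A full run would call this function once per mini-batch of real images, for example with `rng = np.random.default_rng(0)` and the models returned by `building_generator`, `building_discriminator`, and `building_gan`, and would repeat it over many epochs.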
Figure 5: An animated image showing the progressive quality of the generated digits using a GAN model.
One of the critical issues is assessing the quality of the generated data, whether it is an image, a text, or a song, as well as the diversity of the generated samples. The discriminator helps us check whether the generated data looks real or fake. However, samples that look realistic from the discriminator’s point of view might still contain flaws that are obvious to the human eye. Hence, we need evaluation metrics that correlate with subjective human evaluation. One way to approach this problem is to compare the distribution properties of the real and generated data.
Two evaluation metrics can statistically help measure the quality of the generated data: Inception Score and Fréchet Inception Distance (FID). Both of these objective metrics are widely adopted by the research community, especially for measuring the quality of produced images. Since this tutorial is an introduction, we will not detail how these metrics work.
(2) Loss Function
As we discussed earlier, the GAN model has the unique property of training the generator and the discriminator simultaneously. This requires loss functions that balance the training on one side (the discriminator) while also improving the training on the other side (the generator). When building the discriminator model, we explicitly define the loss function just like in any other neural network architecture.

    import tensorflow as tf
    from tensorflow.keras.models import Sequential
    from tensorflow.keras.layers import Conv2D, Dense, Dropout, Flatten, LeakyReLU

    # Defining the discriminator model
    def building_discriminator():
        # The image dimensions provided as inputs
        image_shape = (28, 28, 1)
        disModel = Sequential()
        disModel.add(Conv2D(64, 3, strides=2, input_shape=image_shape))
        disModel.add(LeakyReLU())
        disModel.add(Dropout(0.4))
        # Second layer
        disModel.add(Conv2D(64, 3, strides=2))
        disModel.add(LeakyReLU())
        disModel.add(Dropout(0.4))
        # Flatten the output
        disModel.add(Flatten())
        disModel.add(Dense(1, activation='sigmoid'))
        # Optimization function (`learning_rate` replaces the deprecated `lr` argument)
        opt = tf.keras.optimizers.Adam(learning_rate=2e-4, beta_1=0.5)
        # Compile the model
        disModel.compile(loss='binary_crossentropy', optimizer=opt, metrics=['accuracy'])
        return disModel

The generator model, on the other hand, does not have a loss function explicitly defined. It is trained through the combined GAN model: the discriminator scores the generated images, and the generator’s weights are updated according to the resulting loss.

    from tensorflow.keras.models import Sequential
    from tensorflow.keras.layers import Conv2D, Conv2DTranspose, Dense, LeakyReLU, Reshape

    # Defining the generator model
    def building_generator(noise_dim):
        genModel = Sequential()
        # Project the noise vector to a 6 x 6 x 128 tensor
        genModel.add(Dense(128 * 6 * 6, input_dim=noise_dim))
        genModel.add(LeakyReLU())
        genModel.add(Reshape((6, 6, 128)))
        # Second layer: upsample 6 x 6 -> 14 x 14
        genModel.add(Conv2DTranspose(128, (4, 4), strides=(2, 2)))
        genModel.add(LeakyReLU())
        # Third layer: upsample 14 x 14 -> 30 x 30
        genModel.add(Conv2DTranspose(128, (4, 4), strides=(2, 2)))
        genModel.add(LeakyReLU())
        # Output layer: 30 x 30 -> 28 x 28 x 1, pixel values in [0, 1]
        genModel.add(Conv2D(1, (3, 3), activation='sigmoid'))
        return genModel

There are a few options for choosing the loss functions, such as:
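To see why the generator above ends up with a 28 x 28 image, the layer sizes can be traced by hand. This is a sketch under the assumption of unpadded ("valid") layers with a kernel at least as large as the stride, which matches the code above; the helper names are invented for this example.

```python
def conv_transpose_output_size(size, kernel, stride):
    """Spatial size after an unpadded transposed convolution (kernel >= stride)."""
    return (size - 1) * stride + kernel

def conv_output_size(size, kernel, stride=1):
    """Spatial size after an unpadded convolution."""
    return (size - kernel) // stride + 1

sizes = [6]  # the dense layer output is reshaped to 6 x 6 x 128
sizes.append(conv_transpose_output_size(sizes[-1], kernel=4, stride=2))
sizes.append(conv_transpose_output_size(sizes[-1], kernel=4, stride=2))
sizes.append(conv_output_size(sizes[-1], kernel=3))

print(" -> ".join(f"{s} x {s}" for s in sizes))
```

The two transposed convolutions overshoot to 30 x 30, and the final 3 x 3 convolution trims the result back to the 28 x 28 size of an MNIST image.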
- Least squares.
- Wasserstein loss function.
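The two alternatives listed above can be sketched in a few lines. This is an illustrative version using the commonly cited forms of the least-squares and Wasserstein objectives, with target values of 1 for real and 0 for fake in the least-squares case; the function names and inputs are hypothetical.

```python
def mean(values):
    return sum(values) / len(values)

def lsgan_losses(d_real, d_fake):
    """Least-squares losses: squared distance of each score from its target."""
    d_loss = 0.5 * mean([(s - 1.0) ** 2 for s in d_real]) \
        + 0.5 * mean([s ** 2 for s in d_fake])
    g_loss = 0.5 * mean([(s - 1.0) ** 2 for s in d_fake])
    return d_loss, g_loss

def wgan_losses(c_real, c_fake):
    """Wasserstein losses: the critic outputs unbounded scores, not probabilities."""
    critic_loss = mean(c_fake) - mean(c_real)
    g_loss = -mean(c_fake)
    return critic_loss, g_loss

print(lsgan_losses(d_real=[0.9, 0.8], d_fake=[0.1, 0.3]))
print(wgan_losses(c_real=[2.0, 1.5], c_fake=[-1.0, -0.5]))
```

Note that the Wasserstein variant replaces the sigmoid output with a raw score and needs an additional constraint on the critic, such as weight clipping or a gradient penalty, which this sketch leaves out.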
(3) Determination of Convergence
One of the key issues associated with GAN models is determining when the model has converged. The competition between the discriminator and the generator makes it hard for the game to reach a final winner. Both models want to maximize their gain and minimize their loss. In our situation, we want both models to reach the point where the discriminator is almost purely guessing whether an image is fake or real, so that a generated image is as likely to pass the discriminator as not. This 50–50 chance is the ideal case inherited from game theory, where both models are good enough that neither can win over the other.
GAN models are known to have the problem of slow convergence. Similar to other unsupervised models, the absence of true labels increases the challenge of determining when the training can stop. We need to balance the training time against the quality of the output. Several factors can slow down or speed up the training process, such as normalization of inputs, batch normalization, gradient penalties, and training the discriminator well before training the GAN model.
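The 50–50 ideal can be made concrete with the value function from the original GAN paper by Goodfellow et al. The notation below (D for the discriminator, G for the generator, p_data, p_z, and p_g for the real, noise, and generated distributions) is not used elsewhere in this article and is introduced here only as a sketch.

```latex
\min_G \max_D V(D, G) =
  \mathbb{E}_{x \sim p_{\text{data}}}\left[\log D(x)\right]
  + \mathbb{E}_{z \sim p_z}\left[\log\left(1 - D(G(z))\right)\right]

% For a fixed generator, the optimal discriminator is
D^*(x) = \frac{p_{\text{data}}(x)}{p_{\text{data}}(x) + p_g(x)}

% When the generator matches the data (p_g = p_{\text{data}}), D^*(x) = 1/2 and
V(D^*, G) = \log\tfrac{1}{2} + \log\tfrac{1}{2} = -\log 4 \approx -1.386
```

In terms of binary cross-entropy, a discriminator that outputs 0.5 for every image has a per-sample loss of log 2, roughly 0.693, and an accuracy close to 50%. Losses hovering near that value are a hint of balance rather than a guarantee of good images.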
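As one small example of input normalization, the sketch below rescales raw pixel intensities so that real images share the [0, 1] range produced by the generator's sigmoid output in this article. It assumes 8-bit intensities in [0, 255]; the function name is made up, and in practice the same division would be applied to a whole image array at once.

```python
def scale_pixels(pixels):
    """Map 8-bit pixel intensities from [0, 255] to [0.0, 1.0]."""
    return [p / 255.0 for p in pixels]

row = [0, 51, 255]  # hypothetical pixel values from one image row
print(scale_pixels(row))
```

If real images stayed in [0, 255] while generated ones were in [0, 1], the discriminator could separate them by scale alone and the generator would receive little useful feedback.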
(4) Produced Image Sizes
GAN models are known to have limited capabilities when it comes to the size of the generated images. The image size that we have seen in the MNIST examples is only 28 x 28 pixels. These are pretty small images to use in a real application. If we want to generate bigger images, let us say 1024 x 1024, we will need a more scalable model. The research community has been interested in improving GAN capabilities. For instance, in 2017, T. Karras et al. proposed a novel model called Progressive Growing GANs to solve this problem.
Some of the challenges introduced in the previous sections led the research community to extend the GAN idea to tackle one or more of the issues mentioned above. This section covers some popular extensions and optimized GAN architectures that scale up the original GAN capabilities.
Figure 6: An overview of the types of GAN model architecture and extensions.
Deep Convolutional GAN (DCGAN): This is an extension proposed by A. Radford et al. that replaces the feed-forward neural network with a CNN architecture. Using a CNN architecture and learning through filters has improved the accuracy of GAN models.
Wasserstein GAN (WGAN): WGAN was designed by M. Arjovsky et al. WGAN focuses on defining the distance between the generated distribution and the real distribution, which determines the model’s convergence. They propose the use of the Earth Mover (EM) distance to approximate the differences between those distributions effectively.
Progressive GAN: ProgressiveGAN was designed by T. Karras et al. and presented at the ICLR conference. This work made significant contributions by letting the generator and the discriminator grow progressively from lower-resolution to higher-resolution layers. The technique requires reducing the size of the mini-batches while computing the mini-batch standard deviation. ProgressiveGAN also uses an equalized learning rate and pixel-wise feature normalization.
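For intuition about the Earth Mover distance, here is a minimal sketch for the special case of two one-dimensional samples of equal size, where the distance reduces to the average gap between sorted values. WGAN itself does not compute the distance this way; it trains a critic network to estimate it on high-dimensional data. The function name and sample values are hypothetical.

```python
def earth_mover_distance_1d(samples_a, samples_b):
    """EM distance between two equal-sized 1-D samples with uniform weights."""
    if len(samples_a) != len(samples_b):
        raise ValueError("this sketch only handles samples of equal size")
    a, b = sorted(samples_a), sorted(samples_b)
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)

real = [0.0, 1.0, 2.0]
generated = [1.0, 2.0, 3.0]
print(earth_mover_distance_1d(real, generated))
```

Unlike a real-versus-fake probability, this distance keeps shrinking smoothly as the generated samples move closer to the real ones, which is the property that makes it attractive as a training signal.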
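Two of these ideas are simple enough to sketch directly: the resolution schedule used for progressive growing, and pixel-wise feature normalization, which rescales the feature vector at each pixel to unit root-mean-square. The function names, the defaults of 4 and 1024, and the epsilon value are assumptions for this illustration.

```python
import math

def resolution_schedule(start=4, final=1024):
    """Resolutions visited when the image side doubles at every growth stage."""
    sizes = [start]
    while sizes[-1] < final:
        sizes.append(sizes[-1] * 2)
    return sizes

def pixel_norm(features, eps=1e-8):
    """Scale one pixel's feature vector so its root-mean-square is about 1."""
    mean_square = sum(f * f for f in features) / len(features)
    return [f / math.sqrt(mean_square + eps) for f in features]

print(resolution_schedule())
print(pixel_norm([3.0, 4.0]))
```

Starting small lets the networks learn the coarse structure of an image first, and new layers then add finer detail at each doubling of the resolution.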