Generative adversarial networks were proposed in a 2014 paper (Ian Goodfellow et al., “Generative Adversarial Nets,” Proceedings of the 27th International Conference on Neural Information Processing Systems 2 (2014): 2672–2680) by Ian Goodfellow et al., and although the idea got researchers excited almost instantly, it took a few years to overcome some of the difficulties of training GANs. Like many great ideas, it seems simple in hindsight: make neural networks compete against each other in the hope that this competition will push them to excel. As shown in Figure 17-15, a GAN is composed of two neural networks: a generator and a discriminator.
Figure 17-15. A generative adversarial network
During training, the generator and the discriminator have opposite goals: the discriminator tries to tell fake images from real images, while the generator tries to produce images that look real enough to trick the discriminator. Because the GAN is composed of two networks with different objectives, it cannot be trained like a regular neural network. Each training iteration is divided into two phases: first the discriminator is trained as a binary classifier on a batch of real and generated images, then the generator is trained through the full GAN while the discriminator's weights are frozen.
The generator never actually sees any real images, yet it gradually learns to produce convincing fake images! All it gets is the gradients flowing back through the discriminator. Fortunately, the better the discriminator gets, the more information about the real images is contained in these secondhand gradients, so the generator can make significant progress.
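To make the role of the labels in each phase concrete, here is a minimal NumPy sketch. It is not the training code: the discriminator outputs are made-up probabilities, and `binary_cross_entropy` is a helper defined here purely for illustration.

```python
import numpy as np

def binary_cross_entropy(y_true, y_pred, eps=1e-7):
    """Mean binary cross-entropy between labels and predicted probabilities."""
    y_pred = np.clip(y_pred, eps, 1 - eps)
    losses = -(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))
    return float(np.mean(losses))

# Hypothetical discriminator outputs (estimated probability that an image is real)
d_on_fake = np.array([0.2, 0.4, 0.1])  # for three generated images
d_on_real = np.array([0.9, 0.7, 0.8])  # for three real images

# Discriminator phase: generated images are labeled 0, real images 1
d_loss = binary_cross_entropy(
    np.concatenate([np.zeros(3), np.ones(3)]),
    np.concatenate([d_on_fake, d_on_real]))

# Generator phase: generated images are labeled 1 (labels flipped), so the
# loss is large whenever the discriminator confidently rejects the fakes
g_loss = binary_cross_entropy(np.ones(3), d_on_fake)

print(f"discriminator loss: {d_loss:.3f}, generator loss: {g_loss:.3f}")
```

In this toy situation the discriminator is winning, so the generator's loss is the larger of the two; its gradient is what pushes the generator toward images the discriminator accepts.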
Let’s go ahead and build a simple GAN for Fashion MNIST.
First, we need to build the generator and the discriminator. The generator is similar to an autoencoder’s decoder, and the discriminator is a regular binary classifier (it takes an image as input and ends with a Dense layer containing a single unit and using the sigmoid activation function). For the second phase of each training iteration, we also need the full GAN model containing the generator followed by the discriminator:
import numpy as np
import tensorflow as tf
from tensorflow import keras
codings_size = 30  # assumed value; it is not defined in the text above
generator = keras.models.Sequential([
    keras.layers.Dense(100, activation="selu", input_shape=[codings_size]),
    keras.layers.Dense(150, activation="selu"),
    keras.layers.Dense(28 * 28, activation="sigmoid"),
    keras.layers.Reshape([28, 28])  # match the 28x28 images the discriminator expects
])
discriminator = keras.models.Sequential([
    keras.layers.Flatten(input_shape=[28, 28]),
    keras.layers.Dense(150, activation="selu"),
    keras.layers.Dense(100, activation="selu"),
    keras.layers.Dense(1, activation="sigmoid")  # probability that the image is real
])
gan = keras.models.Sequential([generator, discriminator])
Next, we need to compile these models. As the discriminator is a binary classifier, we can naturally use the binary cross-entropy loss. The generator will only be trained through the gan model, so we do not need to compile it at all. The gan model is also a binary classifier, so it can use the binary cross-entropy loss. Importantly, the discriminator should not be trained during the second phase, so we make it non-trainable before compiling the gan model:
discriminator.compile( loss="binary_crossentropy", optimizer="rmsprop" )
discriminator.trainable = False
gan.compile( loss="binary_crossentropy", optimizer="rmsprop" )
The trainable attribute is taken into account by Keras only when compiling a model, so after running this code, the discriminator is trainable if we call its fit() method or its train_on_batch() method (which we will be using), while it is not trainable when we call these methods on the gan model.
Since the training loop is unusual, we cannot use the regular fit() method. Instead, we will write a custom training loop. For this, we first need to create a Dataset to iterate through the images:
from tensorflow import keras
(X_train_full, y_train_full), (X_test, y_test) = keras.datasets.fashion_mnist.load_data()
X_train_full = X_train_full.astype( np.float32 )/255
X_test = X_test.astype( np.float32 )/255
X_train, X_valid = X_train_full[:-5000], X_train_full[-5000:]
y_train, y_valid = y_train_full[:-5000], y_train_full[-5000:]
batch_size = 32
dataset = tf.data.Dataset.from_tensor_slices( X_train ).shuffle( 1000 )
dataset = dataset.batch( batch_size, drop_remainder=True ).prefetch(1)
import matplotlib.pyplot as plt
def plot_multiple_images(images, n_cols=None):
    # show the images in a grid with n_cols columns (one row by default)
    n_cols = n_cols or len(images)
    n_rows = (len(images) - 1) // n_cols + 1
    if images.shape[-1] == 1:
        images = np.squeeze(images, axis=-1)  # drop the channel axis for imshow
    plt.figure(figsize=(n_cols, n_rows))
    for index, image in enumerate(images):
        plt.subplot(n_rows, n_cols, index + 1)
        plt.imshow(image, cmap="binary")
        plt.axis("off")
As discussed earlier, you can see the two phases at each iteration:
def train_gan(gan, dataset, batch_size, codings_size, n_epochs=50):
    # gan = keras.models.Sequential([generator, discriminator])
    generator, discriminator = gan.layers
    for epoch in range(n_epochs):
        print("Epoch {}/{}".format(epoch + 1, n_epochs))
        for X_batch in dataset:
            # phase 1 - training the discriminator
            # generated (fake) images: x_hat = G(z), with z ~ N(0, 1)
            noise = tf.random.normal(shape=[batch_size, codings_size])
            generated_images = generator(noise)
            X_fake_and_real = tf.concat([generated_images, X_batch], axis=0)
            # label the generated images 0 and the real images 1
            y1 = tf.constant([[0.]] * batch_size + [[1.]] * batch_size)
            discriminator.trainable = True
            # runs a single gradient update on a single batch of data
            discriminator.train_on_batch(X_fake_and_real, y1)
            # phase 2 - training the generator
            noise = tf.random.normal(shape=[batch_size, codings_size])
            # the labels are flipped here: the generator's outputs get label 1
            y2 = tf.constant([[1.]] * batch_size)
            discriminator.trainable = False
            gan.train_on_batch(noise, y2)
        # show the last generated batch at the end of each epoch
        plot_multiple_images(generated_images, 8)
        plt.show()
train_gan(gan, dataset, batch_size, codings_size, n_epochs=1)
Figure 17-16. Images generated by the GAN after one epoch of training
That’s it! If you display the generated images (see Figure 17-16), you will see that at the end of the first epoch, they already start to look like (very noisy) Fashion MNIST images.
noise = tf.random.normal( shape=[batch_size, codings_size] )
generated_images = generator( noise )
plot_multiple_images( generated_images, 8 )
Unfortunately, the images never really get much better than that, and you may even find epochs where the GAN seems to be forgetting what it learned. Why is that? Well, it turns out that training a GAN can be challenging. Let’s see why.
During training, the generator and the discriminator constantly try to outsmart each other, in a zero-sum game. As training advances, the game may end up in a state that game theorists call a Nash equilibrium, named after the mathematician John Nash: this is when no player would be better off changing their own strategy, assuming the other players do not change theirs. For example, a Nash equilibrium is reached when everyone drives on the left side of the road: no driver would be better off being the only one to switch sides. Of course, there is a second possible Nash equilibrium: when everyone drives on the right side of the road. Different initial states and dynamics may lead to one equilibrium or the other. In this example, there is a single optimal strategy once an equilibrium is reached (i.e., driving on the same side as everyone else), but a Nash equilibrium can involve multiple competing strategies (e.g., a predator chases its prey, the prey tries to escape, and neither would be better off changing their strategy).
So how does this apply to GANs? Well, the authors of the paper demonstrated that a GAN can only reach a single Nash equilibrium: that’s when the generator produces perfectly realistic images, and the discriminator is forced to guess (50% real, 50% fake). This fact is very encouraging: it would seem that you just need to train the GAN for long enough, and it will eventually reach this equilibrium, giving you a perfect generator. Unfortunately, it’s not that simple: nothing guarantees that the equilibrium will ever be reached.
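For reference, the equilibrium can be written down with the minimax objective of the original GAN paper. The notation is chosen here and does not appear in the text above: $p_{\text{data}}$ is the distribution of real images, $p_g$ the distribution of generated images, and $p_z$ the noise distribution.

```latex
\min_G \max_D V(D, G) =
    \mathbb{E}_{x \sim p_{\text{data}}}\big[\log D(x)\big]
  + \mathbb{E}_{z \sim p_z}\big[\log\big(1 - D(G(z))\big)\big]

% For a fixed generator, the optimal discriminator is
D^{*}_{G}(x) = \frac{p_{\text{data}}(x)}{p_{\text{data}}(x) + p_g(x)}
```

When the generator matches the data distribution exactly ($p_g = p_{\text{data}}$), the optimal discriminator outputs $1/2$ everywhere, which is the "forced to guess" situation described above.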
The biggest difficulty is called mode collapse: this is when the generator’s outputs gradually become less diverse. Put differently, the generator gets stuck in a small subspace and learns to generate only similar samples. How can this happen? Suppose that the generator gets better at producing convincing shoes than any other class. It will fool the discriminator a bit more with shoes, and this will encourage it to produce even more images of shoes. Gradually, it will forget how to produce anything else. Meanwhile, the only fake images that the discriminator will see will be shoes, so it will also forget how to discriminate fake images of other classes. Eventually, when the discriminator manages to discriminate the fake shoes from the real ones, the generator will be forced to move to another class. It may then become good at shirts, forgetting about shoes, and the discriminator will follow. The GAN may gradually cycle across a few classes, never really becoming very good at any of them.
Moreover, because the generator and the discriminator are constantly pushing against each other, their parameters may end up oscillating and becoming unstable. Training may begin properly, then suddenly diverge for no apparent reason, due to these instabilities. And since many factors affect these complex dynamics, GANs are very sensitive to the hyperparameters: you may have to spend a lot of effort fine-tuning them.
These problems have kept researchers very busy since 2014: many papers were published on this topic, some proposing new cost functions [for a nice comparison of the main GAN losses, check out this great GitHub project by Hwalsuk Lee] (though a 2018 paper [Mario Lucic et al., “Are GANs Created Equal? A Large-Scale Study,” Proceedings of the 32nd International Conference on Neural Information Processing Systems (2018): 698–707] by Google researchers questions their efficiency) or techniques to stabilize training or to avoid the mode collapse issue. For example, a popular technique called experience replay consists in
Another common technique is called mini-batch discrimination: it measures how similar images are across the batch and provides this statistic to the discriminator, so it can easily reject a whole batch of fake images that lack diversity. This encourages the generator to produce a greater variety of images, reducing the chance of mode collapse. Other papers simply propose specific architectures that happen to perform well.
In short, this is still a very active field of research, and the dynamics of GANs are still not perfectly understood. But the good news is that great progress has been made, and some of the results are truly astounding! So let’s look at some of the most successful architectures, starting with deep convolutional GANs, which were the state of the art just a few years ago. Then we will look at two more recent (and more complex) architectures.
The original GAN paper in 2014 experimented with convolutional layers, but only tried to generate small images. Soon after, many researchers tried to build GANs based on deeper convolutional nets for larger images. This proved to be tricky, as training was very unstable, but Alec Radford et al. finally succeeded in late 2015, after experimenting with many different architectures and hyperparameters. They called their architecture deep convolutional GANs (DCGANs) (Alec Radford et al., “Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks,” arXiv preprint arXiv:1511.06434 (2015)). Here are the main guidelines they proposed for building stable convolutional GANs:
These guidelines will work in many cases, but not always, so you may still need to experiment with different hyperparameters (in fact, just changing the random seed and training the same model again will sometimes work). For example, here is a small DCGAN that works reasonably well with Fashion MNIST:
The generator
codings_size = 100
generator = keras.models.Sequential([
    keras.layers.Dense(7 * 7 * 128, input_shape=[codings_size]),  # 7*7*128 = 6272 units
    keras.layers.Reshape([7, 7, 128]),  # back to a 7x7 map with 128 channels
    keras.layers.BatchNormalization(),
    keras.layers.Conv2DTranspose(64, kernel_size=5, strides=2, padding="same",
                                 activation="selu"),  # -> (14, 14, 64)
    keras.layers.BatchNormalization(),
    keras.layers.Conv2DTranspose(1, kernel_size=5, strides=2, padding="same",
                                 activation="tanh"),  # -> (28, 28, 1), values in [-1, 1]
])
The discriminator looks much like a regular CNN for binary classification, except instead of using max pooling layers to downsample the image, we use strided convolutions (strides=2). Also note that we use the leaky ReLU activation function.
Overall, we respected the DCGAN guidelines, except we replaced the BatchNormalization layers in the discriminator with Dropout layers (otherwise training was unstable in this case) and we replaced ReLU with SELU in the generator. Feel free to tweak this architecture: you will see how sensitive it is to the hyperparameters (especially the relative learning rates of the two networks).
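discriminator = keras.models.Sequential([
    # the leaky ReLU slope (0.2) and the dropout rate (0.4) are assumed values
    keras.layers.Conv2D(64, kernel_size=5, strides=2, padding="same",
                        activation=keras.layers.LeakyReLU(0.2),
                        input_shape=[28, 28, 1]),  # -> (14, 14, 64)
    keras.layers.Dropout(0.4),
    keras.layers.Conv2D(128, kernel_size=5, strides=2, padding="same",
                        activation=keras.layers.LeakyReLU(0.2)),  # -> (7, 7, 128)
    keras.layers.Dropout(0.4),
    keras.layers.Flatten(),
    keras.layers.Dense(1, activation="sigmoid"),
])
gan = keras.models.Sequential([generator, discriminator])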
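The shape comments in the code (28 to 14 to 7 in the discriminator, 7 to 14 to 28 in the generator) follow from how "same" padding interacts with the stride. The short sketch below reproduces that arithmetic; the two helper functions are illustrative names, not Keras APIs.

```python
import math

def conv_same_output_size(size, stride):
    """Spatial output size of a strided convolution with "same" padding."""
    return math.ceil(size / stride)

def conv_transpose_same_output_size(size, stride):
    """Spatial output size of a transposed convolution with "same" padding."""
    return size * stride

# Discriminator: two stride-2 convolutions downsample the image
d_sizes = [28]
for _ in range(2):
    d_sizes.append(conv_same_output_size(d_sizes[-1], stride=2))

# Generator: two stride-2 transposed convolutions upsample the feature map
g_sizes = [7]
for _ in range(2):
    g_sizes.append(conv_transpose_same_output_size(g_sizes[-1], stride=2))

print("discriminator:", d_sizes)
print("generator:", g_sizes)
```

This is why the generator's first Dense layer has 7 × 7 × 128 units: two doublings of a 7 × 7 map give exactly the 28 × 28 resolution of Fashion MNIST.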
The generator's last layer (a transposed convolutional layer) uses the tanh activation function, so the outputs will range from –1 to 1. For this reason, before training the GAN, we need to rescale the training set to that same range. We also need to reshape it to add the channel dimension:
# scale them by a factor of 2 and shift them by –1 such that
# the pixel intensities will be rescaled to be in the range [–1, 1]
X_train_dcgan = X_train.reshape(-1, 28,28,1)*2. -1. # reshape and rescale
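As a quick sanity check of this rescaling, the following sketch applies the same transformation to a few sample intensities and then inverts it, which is what you would do before displaying generated images with a [0, 1] colormap. The helper names are made up for this example.

```python
import numpy as np

def to_tanh_range(x):
    """Map pixel intensities from [0, 1] to [-1, 1]."""
    return x * 2.0 - 1.0

def from_tanh_range(x):
    """Map values from [-1, 1] back to [0, 1], e.g. before plotting."""
    return (x + 1.0) / 2.0

pixels = np.array([0.0, 0.25, 0.5, 1.0], dtype=np.float32)
scaled = to_tanh_range(pixels)
restored = from_tanh_range(scaled)

print(scaled)
print(restored)
```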
Lastly, to build the dataset, then compile and train this model, we use the exact same code as earlier.
batch_size = 32
dataset = tf.data.Dataset.from_tensor_slices( X_train_dcgan )
dataset = dataset.shuffle(1000)
dataset = dataset.batch( batch_size, drop_remainder=True ).prefetch(1)
train_gan(gan, dataset, batch_size, codings_size)
After 50 epochs of training, the generator produces images like those shown in Figure 17-17. It’s still not perfect, but many of these images are pretty convincing.
Figure 17-17. Images generated by the DCGAN after 50 epochs of training
noise = tf.random.normal( shape=[batch_size, codings_size] )
generated_images = generator(noise)
plot_multiple_images( generated_images, 8)
Figure 17-18. Vector arithmetic for visual concepts (part of figure 7 from the DCGAN paper) (Reproduced with the kind authorization of the authors.)
If you scale up this architecture and train it on a large dataset of faces, you can get fairly realistic images. In fact, DCGANs can learn quite meaningful latent representations, as you can see in Figure 17-18: many images were generated, and nine of them were picked manually (top left), including
For each of these categories, the codings that were used to generate the images were averaged, and an image was generated based on the resulting mean codings (lower left). In short, each of the three lower-left images represents the mean of the three images located above it. But this is not a simple mean computed at the pixel level (this would result in three overlapping faces), it is a mean computed in the latent space, so the images still look like normal faces. Amazingly, if you compute men with glasses, minus men without glasses, plus women without glasses—where each term corresponds to one of the mean codings—and you generate the image that corresponds to this coding, you get the image at the center of the 3 × 3 grid of faces on the right: a woman with glasses! The eight other images around it were generated based on the same vector plus a bit of noise, to illustrate the semantic interpolation (https://blog.csdn.net/Linli522362242/article/details/117370337) capabilities of DCGANs. Being able to do arithmetic on faces feels like science fiction!
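The latent-space arithmetic described above can be sketched in a few lines of NumPy. Everything here is a stand-in: the codings are random rather than taken from real generated faces, the noise scale of 0.25 is an arbitrary choice, and no trained generator is involved.

```python
import numpy as np

rng = np.random.default_rng(42)
codings_size = 100

# Stand-ins for the codings behind three hand-picked images per category
men_with_glasses = rng.normal(size=(3, codings_size))
men_without_glasses = rng.normal(size=(3, codings_size))
women_without_glasses = rng.normal(size=(3, codings_size))

# Average in latent space (not in pixel space)
mean_mwg = men_with_glasses.mean(axis=0)
mean_m = men_without_glasses.mean(axis=0)
mean_w = women_without_glasses.mean(axis=0)

# "men with glasses" - "men without glasses" + "women without glasses"
target = mean_mwg - mean_m + mean_w

# Eight noisy neighbours around the target, arranged with the target at the
# center of a 3 x 3 grid (index 4 of 9)
neighbours = target + 0.25 * rng.normal(size=(8, codings_size))
grid_codings = np.vstack([neighbours[:4], target[None, :], neighbours[4:]])

print(grid_codings.shape)
```

Feeding `grid_codings` to a trained generator would produce the nine images of the grid, with the "arithmetic" result in the middle.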
If you add each image’s class as an extra input to both the generator and the discriminator, they will both learn what each class looks like, and thus you will be able to control the class of each image produced by the generator. This is called a conditional GAN (CGAN) (Mehdi Mirza and Simon Osindero, “Conditional Generative Adversarial Nets,” arXiv preprint arXiv:1411.1784 (2014)). Conditional Generative Adversarial Nets (https://arxiv.org/pdf/1411.1784.pdf) uses the class label information and learns to synthesize new images conditioned on the provided label, that is, $\tilde{x} = G(z \mid y)$. Furthermore, conditional GANs allow us to do image-to-image translation, which is to learn how to convert a given image from one domain to another. In this context, one interesting work is the Pix2Pix algorithm, published in the paper Image-to-Image Translation with Conditional Adversarial Networks by Phillip Isola et al. (https://arxiv.org/pdf/1611.07004.pdf). It is worth mentioning that in the Pix2Pix algorithm, the discriminator provides the real/fake predictions for multiple patches across the image as opposed to a single prediction for an entire image.
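One simple way to feed the class to both networks is to concatenate a one-hot label to their inputs. The sketch below shows only that input construction, under assumptions of my own: flattened 28 × 28 images, 10 classes, and plain concatenation (other conditioning schemes exist). The images are random placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)
n_classes, codings_size, batch_size = 10, 100, 4

def one_hot(labels, n_classes):
    """One-hot encode integer class labels."""
    return np.eye(n_classes, dtype=np.float32)[labels]

labels = np.array([0, 3, 3, 9])  # desired class of each generated image
noise = rng.normal(size=(batch_size, codings_size)).astype(np.float32)

# Generator input: the noise vector concatenated with the class label
generator_inputs = np.concatenate([noise, one_hot(labels, n_classes)], axis=1)

# Discriminator input: the (flattened) image concatenated with the same label
images = rng.uniform(size=(batch_size, 28 * 28)).astype(np.float32)  # placeholder images
discriminator_inputs = np.concatenate([images, one_hot(labels, n_classes)], axis=1)

print(generator_inputs.shape, discriminator_inputs.shape)
```

At generation time, you would pick the label yourself, which is what gives control over the class of each output.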
DCGANs aren’t perfect, though. For example, when you try to generate very large images using DCGANs, you often end up with locally convincing features but overall inconsistencies (such as shirts with one sleeve much longer than the other). How can you fix this?
An important technique was proposed in a 2018 paper (Tero Karras et al., “Progressive Growing of GANs for Improved Quality, Stability, and Variation,” Proceedings of the International Conference on Learning Representations (2018)) by Nvidia researchers Tero Karras et al.: they suggested
Figure 17-19. Progressively growing GAN: a GAN generator outputs 4 × 4 color images (left); we extend it to output 8 × 8 images (right)
For example, when growing the generator’s outputs from 4 × 4 to 8 × 8 (see Figure 17-19),
The paper also introduced several other techniques aimed at increasing the diversity of the outputs (to avoid mode collapse) and making training more stable:
tf.concat( [ inputs, tf.fill( [batch_size, height, width, 1], … ) ], … )
How does this help? Well, if the generator produces images with little variety, then there will be a small standard deviation across feature maps in the discriminator. Thanks to this layer, the discriminator will have easy access to this statistic, making it less likely to be fooled by a generator that produces too little diversity. This will encourage the generator to produce more diverse outputs, reducing the risk of mode collapse.
inputs / tf.sqrt( tf.reduce_mean( tf.square(inputs), axis=-1, … ) + 1e-8 )  # mean across all channels
(The smoothing term 1e-8 is needed to avoid division by zero.) This technique avoids explosions in the activations due to excessive competition between the generator and the discriminator.
Adding noise to the real samples helps balance the generator and the discriminator and mitigates mode collapse; this was already pointed out in the precursor to the WGAN (Wasserstein GAN) paper [5]. Although LSGAN is easier to train than the original GAN, its gradient vanishes when the discriminator's output approaches 1, so it can no longer guide the generator. To address this behavior as D approaches 1, the authors proposed the following way of adding noise
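Returning to the two layers discussed above (the extra feature map carrying a batch-wide standard deviation, and the per-pixel normalization across channels), here is a hedged NumPy sketch of both. The function names are mine. For the batch statistic I follow the description in the progressive-growing paper: the standard deviation of each feature at each location is computed across the batch, then averaged into a single value. The `keepdims=True` argument is my choice so that the division broadcasts per pixel.

```python
import numpy as np

def minibatch_stddev(inputs):
    """Append one constant feature map holding a batch diversity statistic.

    inputs has shape [batch_size, height, width, channels].
    """
    batch_size, height, width, _ = inputs.shape
    s = inputs.std(axis=0)   # std across the batch: [height, width, channels]
    v = s.mean()             # averaged into a single value
    extra = np.full((batch_size, height, width, 1), v, dtype=inputs.dtype)
    return np.concatenate([inputs, extra], axis=-1)

def pixelwise_norm(inputs, eps=1e-8):
    """Normalize each pixel's activation vector across channels."""
    mean_sq = np.mean(np.square(inputs), axis=-1, keepdims=True)
    return inputs / np.sqrt(mean_sq + eps)

rng = np.random.default_rng(0)
diverse = rng.normal(size=(8, 4, 4, 16))
collapsed = np.repeat(diverse[:1], 8, axis=0)  # eight copies of one sample

stat_diverse = minibatch_stddev(diverse)[0, 0, 0, -1]
stat_collapsed = minibatch_stddev(collapsed)[0, 0, 0, -1]
print(f"diverse batch: {stat_diverse:.3f}, collapsed batch: {stat_collapsed:.3f}")
```

A batch made of identical samples yields a statistic of (essentially) zero, which is exactly the signal that lets the discriminator reject low-diversity batches.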
The combination of all these techniques allowed the authors to generate extremely convincing high-definition images of faces. But what exactly do we call “convincing”? Evaluation is one of the big challenges when working with GANs: although it is possible to automatically evaluate the diversity of the generated images, judging their quality is a much trickier and subjective task. One technique is to use human raters, but this is costly and time-consuming. So the authors proposed to measure the similarity between the local image structure of the generated images and the training images, considering every scale. This idea led them to another groundbreaking innovation: StyleGANs.
The state of the art in high-resolution image generation was advanced once again by the same Nvidia team in a 2018 paper (Tero Karras et al., “A Style-Based Generator Architecture for Generative Adversarial Networks,” arXiv preprint arXiv:1812.04948 (2018)) that introduced the popular StyleGAN architecture. The authors used style transfer techniques in the generator to ensure that the generated images have the same local structure as the training images, at every scale, greatly improving the quality of the generated images. The discriminator and the loss function were not modified, only the generator. Let’s take a look at the StyleGAN. It is composed of two networks (see Figure 17-20):
Figure 17-20. StyleGAN’s generator architecture (part of figure 1 from the StyleGAN paper) (Reproduced with the kind authorization of the authors.)
The idea of adding noise independently from the codings is very important. Some parts of an image are quite random, such as the exact position of each freckle or hair. In earlier GANs, this randomness had to either come from the codings or be some pseudorandom noise produced by the generator itself.
By adding extra noise inputs, all these issues are avoided; the GAN is able to use the provided noise to add the right amount of stochasticity to each part of the image.
The added noise is different for each level. Each noise input consists of a single feature map full of Gaussian noise, which is broadcast to all feature maps (of the given level) and scaled using learned per-feature scaling factors (this is represented by the “B” boxes in Figure 17-20) before it is added.
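The broadcast-and-scale step can be illustrated with NumPy broadcasting. This is only a sketch of the mechanics for one level: the shapes are arbitrary, and the scaling factors, which are learned in the real model, are random stand-ins here.

```python
import numpy as np

rng = np.random.default_rng(0)
batch_size, height, width, n_maps = 2, 8, 8, 16

# Feature maps at one level of the synthesis network (random stand-ins)
features = rng.normal(size=(batch_size, height, width, n_maps))

# A single feature map of Gaussian noise per image for this level
noise = rng.normal(size=(batch_size, height, width, 1))

# Per-feature-map scaling factors (the "B" boxes); learned in practice
noise_scales = 0.1 * rng.normal(size=(n_maps,))

# Broadcasting copies the noise map to all feature maps, each with its own scale
noisy_features = features + noise * noise_scales

print(noisy_features.shape)
```

Because every feature map receives the same noise pattern with a different learned scale, the network can decide, map by map, how much stochastic detail to let through.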
Finally, StyleGAN uses a technique called mixing regularization (or style mixing), where a percentage of the generated images are produced using two different codings. Specifically, the codings c1 and c2 are sent through the mapping network, giving two style vectors w1 and w2. Then the synthesis network generates an image based on the styles w1 for the first levels and the styles w2 for the remaining levels.
The cutoff level is picked randomly. This prevents the network from assuming that styles at adjacent levels are correlated, which in turn encourages locality in the GAN, meaning that each style vector only affects a limited number of traits in the generated image.
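A minimal sketch of the style-mixing bookkeeping follows. The `mapping_network` function is a placeholder for the learned mapping network, and the number of levels and the vector size are illustrative values chosen for this example; only the selection of styles per level is the point.

```python
import numpy as np

rng = np.random.default_rng(1)
n_levels, style_size = 18, 512  # illustrative values

def mapping_network(c):
    """Placeholder for the learned mapping network (codings -> style vector)."""
    return np.tanh(c)

# Two different codings, mapped to two style vectors
c1, c2 = rng.normal(size=(2, style_size))
w1, w2 = mapping_network(c1), mapping_network(c2)

# Random cutoff level in [1, n_levels - 1], so both styles are always used
cutoff = rng.integers(1, n_levels)

# Levels before the cutoff use w1, the remaining levels use w2
per_level_styles = np.stack([w1 if level < cutoff else w2
                             for level in range(n_levels)])

print(cutoff, per_level_styles.shape)
```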
There is such a wide variety of GANs out there that it would require a whole book to cover them all. Hopefully this introduction has given you the main ideas, and most importantly the desire to learn more. If you’re struggling with a mathematical concept, there are probably blog posts out there that will help you understand it better. Then go ahead and implement your own GAN, and do not get discouraged if it has trouble learning at first: unfortunately, this is normal, and it will require quite a bit of patience before it works, but the result is worth it. If you’re struggling with an implementation detail, there are plenty of Keras or TensorFlow implementations that you can look at. In fact, if all you want is to get some amazing results quickly, then you can just use a pretrained model (e.g., there are pretrained StyleGAN models available for Keras).
In the next chapter we will move to an entirely different branch of Deep Learning: Deep Reinforcement Learning.
2. Suppose you want to train a classifier, and you have plenty of unlabeled training data but only a few thousand labeled instances. How can autoencoders help? How would you proceed?
If you want to train a classifier and you have plenty of unlabeled training data but only a few thousand labeled instances, then you could first train a deep autoencoder on the full dataset (labeled + unlabeled), then reuse its lower half for the classifier (i.e., reuse the layers up to the codings layer, included) and train the classifier using the labeled data. If you have little labeled data, you probably want to freeze the reused layers when training the classifier.
Figure 17-6. Unsupervised pretraining using autoencoders
3. If an autoencoder perfectly reconstructs the inputs, is it necessarily a good autoencoder? How can you evaluate the performance of an autoencoder?
The fact that an autoencoder perfectly reconstructs its inputs does not necessarily mean that it is a good autoencoder; perhaps it is simply an overcomplete autoencoder (one where the dimensionality p of the latent vector z is greater than the dimensionality d of the input examples, p > d) that learned to copy its inputs to the codings layer and then to the outputs. In fact, even if the codings layer contained a single neuron, it would be possible for a very deep autoencoder to learn to map each training instance to a different coding (e.g., the first instance could be mapped to 0.001, the second to 0.002, the third to 0.003, and so on), and it could learn “by heart” to reconstruct the right training instance for each coding. It would perfectly reconstruct its inputs without really learning any useful pattern or feature in the data. In practice such a mapping is unlikely to happen, but it illustrates the fact that perfect reconstructions are not a guarantee that the autoencoder learned anything useful. However, if it produces very bad reconstructions, then it is almost guaranteed to be a bad autoencoder. To evaluate the performance of an autoencoder, one option is to measure the reconstruction loss (e.g., compute the MSE, or the mean square of the outputs minus the inputs). Again, a high reconstruction loss is a good sign that the autoencoder is bad, but a low reconstruction loss is not a guarantee that it is good. You should also evaluate the autoencoder according to what it will be used for. For example, if you are using it for unsupervised pretraining of a classifier, then you should also evaluate the classifier’s performance.
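The reconstruction loss mentioned above is easy to compute by hand. The sketch below uses random arrays in place of real images and two hypothetical "autoencoders": one that copies its input exactly and one that outputs a constant gray image. It illustrates why a zero loss on its own says little: a plain copy achieves it without learning anything.

```python
import numpy as np

def reconstruction_mse(inputs, reconstructions):
    """Mean of the squared differences between outputs and inputs."""
    return float(np.mean(np.square(reconstructions - inputs)))

rng = np.random.default_rng(0)
X_sample = rng.uniform(size=(5, 28, 28))  # stand-in for a batch of images

identity_output = X_sample.copy()                     # a pure copy of the inputs
gray_output = np.full_like(X_sample, X_sample.mean())  # same gray image every time

print("copy:", reconstruction_mse(X_sample, identity_output))
print("gray:", reconstruction_mse(X_sample, gray_output))
```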
4. What are undercomplete and overcomplete autoencoders? What is the main risk of an excessively undercomplete autoencoder? What about the main risk of an overcomplete autoencoder?
An undercomplete autoencoder is one whose codings layer is smaller than the input and output layers. If it is larger, then it is an overcomplete autoencoder.
The main risk of an excessively undercomplete autoencoder is that it may fail to reconstruct the inputs.
The main risk of an overcomplete autoencoder is that it may just copy the inputs to the outputs, without learning any useful features.
To tie the weights of an encoder layer and its corresponding decoder layer, you simply make the decoder weights equal to the transpose of the encoder weights. This reduces the number of parameters in the model by half, often making training converge faster with less training data and reducing the risk of overfitting the training set.
Specifically, if the autoencoder has a total of N layers (not counting the input layer), and $W_L$ represents the connection weights of the $L$-th layer (e.g., layer 1 is the first hidden layer, layer N/2 is the coding layer, and layer N is the output layer), then the decoder layer weights can be defined simply as:
$W_{N-L+1} = W_L^{\top}$
(with L = 1, 2, …, N/2).
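Before turning to Keras, the idea can be checked with plain NumPy. This sketch assumes a four-layer autoencoder with the 784-100-30 layer sizes used elsewhere in this section, random weights, tanh activations, and no biases; all of these are simplifications for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n_inputs, n_hidden, n_codings = 784, 100, 30

# Encoder weights of the first two layers
W1 = rng.normal(scale=0.01, size=(n_inputs, n_hidden))
W2 = rng.normal(scale=0.01, size=(n_hidden, n_codings))

# Tied decoder weights are the transposes of the encoder weights (views, not copies)
W3, W4 = W2.T, W1.T

x = rng.uniform(size=(1, n_inputs))
codings = np.tanh(np.tanh(x @ W1) @ W2)     # encoder (biases omitted)
reconstruction = np.tanh(codings @ W3) @ W4  # decoder reuses the same weights

untied_params = sum(w.size for w in (W1, W2, W3, W4))
tied_params = W1.size + W2.size
print(reconstruction.shape, untied_params, tied_params)
```

Since the decoder matrices are views of the encoder matrices, the number of weight parameters to learn is halved (biases aside).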
To tie weights between layers using Keras, let’s define a custom layer:
class DenseTranspose(keras.layers.Layer):
    def __init__(self, dense, activation=None, **kwargs):
        self.dense = dense  # the Dense layer whose kernel is reused
        self.activation = keras.activations.get(activation)
        super().__init__(**kwargs)
    def build(self, batch_input_shape):
        # this layer has its own bias vector; its size is the number of inputs
        # of the tied Dense layer (100 for DenseTranspose(dense_2, ...))
        self.biases = self.add_weight(name="bias",
                                      shape=[self.dense.input_shape[-1]],
                                      initializer="zeros")
        super().build(batch_input_shape)  # batch_input_shape = (batch_size, input_dimensions)
    def call(self, inputs):
        # for DenseTranspose(dense_2, ...): inputs has shape (None, 30) and
        # self.dense.weights[0] is the kernel of shape (100, 30);
        # transpose_b=True transposes the kernel before the multiplication
        z = tf.matmul(inputs, self.dense.weights[0], transpose_b=True)
        return self.activation(z + self.biases)
dense_1 = keras.layers.Dense(100, activation="selu")
dense_2 = keras.layers.Dense(30, activation="selu")
tied_encoder = keras.models.Sequential([
    keras.layers.Flatten(input_shape=[28, 28]),  # 784 = 28*28
    dense_1,  # kernel shape: (784, 100)
    dense_2,  # kernel shape: (100, 30)
])  # output shape: (batch_size, 30)
tied_decoder = keras.models.Sequential([
    DenseTranspose(dense_2, activation="selu"),
    DenseTranspose(dense_1, activation="sigmoid"),
    keras.layers.Reshape([28, 28])  # back to the 28x28 shape of the targets
])
tied_ae = keras.models.Sequential([tied_encoder, tied_decoder])
tied_ae.compile(loss="binary_crossentropy",
                optimizer=keras.optimizers.SGD(learning_rate=1.5),
                metrics=[rounded_accuracy])
history = tied_ae.fit(X_train, X_train, epochs=10,
                      validation_data=(X_valid, X_valid))
A generative model is a model capable of randomly generating outputs that resemble the training instances. For example, once trained successfully on the MNIST dataset, a generative model can be used to randomly generate realistic images of digits. The output distribution is typically similar to the training data. For example, since MNIST contains many images of each digit, the generative model would output roughly the same number of images of each digit. Some generative models can be parametrized—for example, to generate only some kinds of outputs. An example of a generative autoencoder is the variational autoencoder. https://blog.csdn.net/Linli522362242/article/details/116576478
A generative adversarial network is a neural network architecture composed of two parts, the generator and the discriminator, which have opposing objectives. The generator’s goal is to generate instances similar to those in the training set, to fool the discriminator. The discriminator must distinguish the real instances from the generated ones.
At each training iteration, the discriminator is trained like a normal binary classifier, then the generator is trained to maximize the discriminator’s error.
GANs are used for advanced image processing tasks such as super resolution, colorization, image editing (replacing objects with realistic background), turning a simple sketch into a photorealistic image, or predicting the next frames in a video. They are also used to augment(https://blog.csdn.net/Linli522362242/article/details/108396485) a dataset (to train other models), to generate other types of data (such as text, audio, and time series), and to identify the weaknesses in other models and strengthen them.
Training GANs is notoriously difficult, because of the complex dynamics between the generator and the discriminator. The biggest difficulty is mode collapse, where the generator produces outputs with very little diversity. Moreover, training can be terribly unstable: it may start out fine and then suddenly start oscillating or diverging, without any apparent reason. GANs are also very sensitive to the choice of hyperparameters.
from tensorflow import keras
[X_train, y_train], [X_test, y_test] = keras.datasets.cifar10.load_data()
X_train = X_train /255.
X_test = X_test / 255.
import tensorflow as tf
import numpy as np
denoising_encoder = keras.models.Sequential([
    keras.layers.GaussianNoise(0.1, input_shape=[32, 32, 3]),  # only active during training
    keras.layers.Conv2D(32, kernel_size=3, padding="same", activation="relu"),
    keras.layers.Flatten(),  # flatten before the Dense layer
    keras.layers.Dense(512, activation="relu"),
])
denoising_decoder = keras.models.Sequential([
    keras.layers.Dense(16 * 16 * 32, activation="relu", input_shape=[512]),
    keras.layers.Reshape([16, 16, 32]),  # back to feature maps before upsampling
    keras.layers.Conv2DTranspose(filters=3, kernel_size=3,
                                 strides=2, padding="same",
                                 activation="sigmoid"),  # -> (32, 32, 3), values in [0, 1]
])
When compiling the stacked autoencoder, we use the binary cross-entropy loss instead of the mean squared error. We are treating the reconstruction task as a multilabel binary classification problem (https://blog.csdn.net/Linli522362242/article/details/103866244): each channel intensity of each pixel (each label) is treated as the probability that the corresponding R, G, or B channel should be fully on. Framing it this way (rather than as a regression problem) tends to make the model converge faster.
denoising_ae = keras.models.Sequential([denoising_encoder, denoising_decoder])
# the optimizer below is an assumption; the original argument is not shown
denoising_ae.compile(loss="binary_crossentropy",
                     optimizer="nadam")
history = denoising_ae.fit(X_train, X_train, epochs=10,
                           validation_data=(X_test, X_test))
import matplotlib.pyplot as plt
%matplotlib inline
n_images = 5
new_images = X_test[:n_images]
new_images_noisy = new_images + np.random.randn( n_images, 32, 32, 3 )*0.1
new_images_denoised = denoising_ae.predict( new_images_noisy )
plt.figure(figsize=(6, n_images * 2))
for index in range(n_images):
    # column 1: original image
    plt.subplot(n_images, 3, index * 3 + 1)
    plt.imshow(new_images[index])
    plt.axis("off")
    if index == 0:
        plt.title("Original")
    # column 2: noisy image, clipped to [0, 1] to avoid the imshow warning
    # about RGB floats outside the valid range
    plt.subplot(n_images, 3, index * 3 + 2)
    plt.imshow(np.clip(new_images_noisy[index], 0., 1.))
    plt.axis("off")
    if index == 0:
        plt.title("Noisy")
    # column 3: output of the denoising autoencoder
    plt.subplot(n_images, 3, index * 3 + 3)
    plt.imshow(new_images_denoised[index])
    plt.axis("off")
    if index == 0:
        plt.title("Denoised")
plt.show()