These are reading notes for Chapter 3 of *GANs in Action*.
This chapter covers
This chapter covers: how GANs work, how GANs differ from conventional neural networks, and how to implement a GAN in Keras that generates handwritten digits.
In this chapter, we explore the foundational theory behind GANs. We introduce the commonly used mathematical notation you may encounter if you choose to dive deeper into this field, perhaps by reading a more theoretically focused publication or even one of the many academic papers on this topic… This chapter also provides background knowledge for the more advanced chapters, particularly chapter 5.
This chapter introduces the foundational theory behind GANs, the commonly used mathematical notation, and background knowledge for the more advanced chapters later in the book.
From a strictly practical standpoint, however, you don’t have to worry about many of these formalisms—much as you don’t need to know how an internal combustion engine works to drive a car. Machine learning libraries such as Keras and TensorFlow abstract the underlying mathematics away from us and neatly package them into importable lines of code.
Strictly speaking, you don't need to know how the underlying math is implemented. Libraries such as Keras and TensorFlow have already packaged it up for us; a few lines of code are enough.
This will be a recurring theme throughout this book; it is also true for machine learning and deep learning in general. So, if you are someone who prefers to dive straight into practice, feel free to skim through the theory section and skip ahead to the coding tutorial.
If you prefer hands-on practice, feel free to skim the theory section and jump straight to the coding tutorial.
Formally, the Generator and the Discriminator are represented by differentiable functions, such as neural networks, each with its own cost function. The two networks are trained by backpropagation by using the Discriminator’s loss. The Discriminator strives to minimize the loss for both the real and the fake examples, while the Generator tries to maximize the Discriminator’s loss for the fake examples it produces.
The Generator and the Discriminator are each represented by a neural network, and both are trained by backpropagation using the Discriminator's loss. The Discriminator tries to minimize its own loss, while the Generator tries to maximize the Discriminator's loss on the fake examples it produces.
This dynamic is summarized in figure 3.1. It is a more general version of the diagram from chapter 1, where we first explained what GANs are and how they work. Instead of the concrete example of handwritten digits, in this diagram, we have a general training dataset which, in theory, could be anything.
Figure 3.1 below summarizes how a GAN works.
Figure 3.1. In this GAN architecture diagram, both the Generator and the Discriminator are trained using the Discriminator’s loss. The Discriminator strives to minimize the loss; the Generator seeks to maximize the loss for the fake examples it produces.
Importantly, the training dataset determines the kind of examples the Generator will learn to emulate. If, for instance, our goal is to produce realistic-looking images of cats, we would supply our GAN with a dataset of cat images.
Whatever training set you supply determines what the GAN produces; for example, a training set of cat images yields generated images of cats.
In more technical terms, the Generator’s goal is to produce examples that capture the data distribution of the training dataset.[1] Recall that to a computer, an image is just a matrix of values: two-dimensional for grayscale and three-dimensional for color (RGB) images. When rendered onscreen, the pixel values within these matrices manifest all the visual elements of an image—lines, edges, contours, and so forth. These values follow a complex distribution across each image in a dataset; after all, if no distribution is followed, an image will be no more than random noise. Object recognition models learn the patterns in images to discern an image’s content. The Generator can be thought of as the reverse of the process: rather than recognizing these patterns, it learns to synthesize them.
The Generator's goal is to produce examples that follow the data distribution of the training dataset, such as the lines, edges, and contours of an image, which follow a complex distribution. After all, if no distribution were followed, an image would be nothing more than random noise.
[1]: See “Generative Adversarial Networks,” by Ian J. Goodfellow et al., 2014, https://arxiv.org/abs/1406.2661.
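As a minimal illustration (not part of the book's tutorial, but using the same keras.datasets MNIST loader that appears in the listings later in this chapter), an image really is just a matrix of pixel values:

from keras.datasets import mnist

(X_train, _), (_, _) = mnist.load_data()
img = X_train[0]             # one grayscale training image
print(img.shape)             # (28, 28): a two-dimensional matrix
print(img.min(), img.max())  # raw pixel values lie in [0, 255]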
Following the standard notation, let $J^{(G)}$ denote the Generator's cost function and $J^{(D)}$ the Discriminator's cost function. The trainable parameters (weights and biases) of the two networks are represented by the Greek letter theta: $\theta^{(G)}$ for the Generator and $\theta^{(D)}$ for the Discriminator.
Generator cost function: $J^{(G)}$; trainable parameters: $\theta^{(G)}$.
Discriminator cost function: $J^{(D)}$; trainable parameters: $\theta^{(D)}$.
GANs differ from conventional neural networks in two key respects. First, the cost function, $J$, of a traditional neural network is defined exclusively in terms of its own trainable parameters, $\theta$. Mathematically, this is expressed as $J(\theta)$. In contrast, GANs consist of two networks whose cost functions are dependent on both of the networks' parameters. That is, the Generator's cost function is $J^{(G)}(\theta^{(G)}, \theta^{(D)})$, and the Discriminator's cost function is $J^{(D)}(\theta^{(G)}, \theta^{(D)})$.[2]
[2]: See “NIPS 2016 Tutorial: Generative Adversarial Networks,” by Ian Goodfellow, 2016, https://arxiv.org/abs/1701.00160.
The cost functions of both the Generator and the Discriminator depend on the parameters of both networks: $J^{(G)}(\theta^{(G)}, \theta^{(D)})$ and $J^{(D)}(\theta^{(G)}, \theta^{(D)})$, respectively.
The second (related) difference is that a traditional neural network can tune all its parameters, $\theta$, during the training process. In a GAN, each network can tune only its own weights and biases: the Generator can tune only $\theta^{(G)}$, and the Discriminator can tune only $\theta^{(D)}$ during training. Accordingly, each network has control over only a part of what determines its loss.
The Generator and the Discriminator can each tune only their own parameters.
To make this a little less abstract, consider the following analogy. Imagine we are choosing which route to drive home from work. If there is no traffic, the fastest option is the highway. During rush hour, however, we may be better off taking one of the side roads. Despite being longer and windier, they might get us home faster when the highway is all clogged up with traffic.
To make this less abstract, consider the analogy of driving home from work.
Let's phrase it as a math problem. Let $J$ be our cost function, defined as the amount of time it takes us to get home. Our goal is to minimize $J$. For simplicity, let's assume we have a set time to leave the office, so we cannot leave early to get ahead of rush hour or stay late to avoid it. The only parameter, $\theta$, we can change is our route.
In the commute analogy, the time it takes to get home is the cost function $J$. We cannot leave work early to avoid rush hour, but we can choose a less congested route to reduce $J$.
If ours were the only car on the road, our cost would be similar to a regular neural network's: it would depend only on the route, and it would be entirely within our power to optimize, $J(\theta)$. However, as soon as we introduce other drivers into the equation, the situation gets more complicated. Suddenly, the time it will take us to get home depends not only on our decisions but also on other drivers' course of action, $J(\theta^{(us)}, \theta^{(other\ drivers)})$. Much like the Generator and Discriminator networks, our "cost function" will depend on an interplay of factors, some of which are under our control and others of which are not.
If we were the only car on the road, the cost function would behave like a regular neural network's: it would depend only on our own choices, $J(\theta)$. Once other drivers are involved, the cost depends not only on our decisions but also on theirs, $J(\theta^{(us)}, \theta^{(other\ drivers)})$. Just like the Generator and the Discriminator, the loss depends on an interplay of parameters, some of which we control and some of which we do not.
The two differences we’ve described have far-reaching implications on the GAN training process. The training of a traditional neural network is an optimization problem. We seek to minimize the cost function by finding a set of parameters such that moving to any neighboring point in the parameter space would increase the cost. This could be either a local or a global minimum in the parameter space, as determined by the cost function we are seeking to minimize. Figure 3.2 illustrates the optimization process of minimizing a cost function.
Training a conventional neural network means minimizing its cost function to find an optimal set of parameters, as shown in figure 3.2.
Figure 3.2. The bowl-shaped mesh represents the loss $J$ in the parameter space of $\theta_1$ and $\theta_2$. The black dotted line illustrates the minimization of the loss in the parameter space through optimization.
(Source: “Adversarial Machine Learning” by Ian Goodfellow, ICLR Keynote, 2019, www.iangoodfellow.com/slides/2019-05-07.pdf.)
Because the Generator and Discriminator can tune only their own parameters and not each other’s, GAN training can be better described as a game, rather than optimization.[3] The players in this game are the two networks that the GAN comprises.
[3]:Ibid.
Because the Generator and the Discriminator can tune only their own parameters, GAN training is better described as a game between two players than as an optimization problem.
Recall from chapter 1 that GAN training ends when the two networks reach Nash equilibrium, a point in a game at which neither player can improve their situation by changing their strategy. Mathematically, this occurs when the Generator cost $J^{(G)}(\theta^{(G)}, \theta^{(D)})$ is minimized with respect to the Generator's trainable parameters $\theta^{(G)}$ and, simultaneously, the Discriminator cost $J^{(D)}(\theta^{(G)}, \theta^{(D)})$ is minimized with respect to the parameters under this network's control, $\theta^{(D)}$.[4] Figure 3.3 illustrates the setup of a two-player zero-sum game and the process of reaching Nash equilibrium.
[4]:Ibid.
Training ends when the Discriminator and the Generator reach Nash equilibrium: $J^{(G)}(\theta^{(G)}, \theta^{(D)})$ can no longer be reduced by changing $\theta^{(G)}$, and $J^{(D)}(\theta^{(G)}, \theta^{(D)})$ can no longer be reduced by changing $\theta^{(D)}$. Figure 3.3 depicts this two-player game.
Figure 3.3. Player 1 (left) seeks to minimize $V$ by tuning $\theta_1$. Player 2 (middle) seeks to minimize $-V$ (maximize $V$) by tuning $\theta_2$. The saddle-shaped mesh (right) shows the combined loss in the parameter space, $V(\theta_1, \theta_2)$. The dotted line shows the convergence to Nash equilibrium at the center of the saddle. (Source: Goodfellow, 2019, www.iangoodfellow.com/slides/2019-05-07.pdf.)
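For reference, the value function $V$ that the two players fight over is written in the original GAN paper (Goodfellow et al., 2014, cited above) as the following minimax objective; the chapter does not derive it, so treat it as supplementary notation rather than something used in the code below:

$$\min_{G} \max_{D} V(D, G) = \mathbb{E}_{x \sim p_{\text{data}}(x)}\big[\log D(x)\big] + \mathbb{E}_{z \sim p_z(z)}\big[\log\big(1 - D(G(z))\big)\big]$$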
Coming back to our analogy, Nash equilibrium would occur when every route home takes exactly the same amount of time—for us and all other drivers we may encounter on the way. Any faster route would be offset by a proportional increase in traffic, slowing everyone down just the right amount. As you may imagine, this state is virtually unattainable in real life. Even with tools like Google Maps that provide real-time traffic updates, it is often impossible to perfectly evaluate the optimal path home.
Returning to the commuting analogy, Nash equilibrium would be reached when every route home takes exactly the same amount of time, for us and for every other driver; in real life, this state is virtually unattainable.
The same is true in the high-dimensional, nonconvex world of training GANs. Even small 28 × 28-pixel grayscale images like the ones in the MNIST dataset have 28 × 28 = 784 dimensions. If they were colored (RGB), their dimensionality would increase threefold, to 2,352. Capturing this distribution across all images in the training dataset is extremely difficult, especially when the best approach to learn is from an adversary (the Discriminator).
Training GANs successfully requires trial and error, and although there are best practices, it remains as much an art as it is a science. Chapter 5 revisits the topic of GAN convergence in more detail. For now, you can rest assured that the situation is not as bad as it may sound. As we previewed in chapter 1, and as you will see throughout this book, neither the enormous complexities in approximating the generative distribution nor our lack of complete understanding of what conditions make GANs converge has impeded GANs’ practical usability and their ability to generate realistic data samples.
Let's recap what you've learned by introducing more notation. The Generator ($G$) takes in a random noise vector $z$ and produces a fake example $x^*$. Mathematically, $G(z) = x^*$. The Discriminator ($D$) is presented either with a real example $x$ or with a fake example $x^*$; for each input, it outputs a value between 0 and 1 indicating the probability that the input is real. Figure 3.4 depicts the GAN architecture by using the terminology and notation we just presented.
Introducing notation: $G(z) = x^*$; figure 3.4 shows the GAN architecture using these symbols.
Figure 3.4. The Generator network $G$ transforms the random vector $z$ into a fake example $x^*$: $G(z) = x^*$. The Discriminator network $D$ outputs a classification of whether the input example is real. For the real examples $x$, the Discriminator strives to output values as close to 1 as possible. For the fake examples $x^*$, the Discriminator strives to output values as close to 0 as possible. In contrast, the Generator wants $D(x^*)$ to be as close as possible to 1, indicating that the Discriminator was fooled into classifying a fake example as real.
The Discriminator's goal is to be as accurate as possible. For the real examples $x$, $D(x)$ seeks to be as close as possible to 1 (the label for the positive class). For the fake examples $x^*$, $D(x^*)$ strives to be as close as possible to 0 (the label for the negative class).
The Generator's goal is the opposite. It seeks to fool the Discriminator by producing fake examples $x^*$ that are indistinguishable from the real data in the training dataset. Mathematically, the Generator strives to produce fake examples $x^*$ such that $D(x^*)$ is as close to 1 as possible.
The Generator's and the Discriminator's objectives are opposite.
The Discriminator's classifications can be expressed in terms of a confusion matrix, a tabular representation of all the possible outcomes in binary classification. In the case of the Discriminator, these are as follows:
True positive: a real example correctly classified as real; $D(x)$ close to 1.
False negative: a real example incorrectly classified as fake; $D(x)$ close to 0.
True negative: a fake example correctly classified as fake; $D(x^*)$ close to 0.
False positive: a fake example incorrectly classified as real; $D(x^*)$ close to 1.
Table 3.1 presents these outcomes.
Table 3.1. Confusion matrix of Discriminator outcomes
Using the confusion matrix terminology, the Discriminator is trying to maximize true positive and true negative classifications or, equivalently, minimize false positive and false negative classifications. In contrast, the Generator’s goal is to maximize the Discriminator’s false positive classifications—these are the instances in which the Generator successfully fools the Discriminator into believing a fake example is real. The Generator is not concerned with how well the Discriminator classifies the real examples; it cares only about the Discriminator’s classifications of the fake data samples.
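As a quick illustration of this terminology (not part of the book's code), a minimal sketch that thresholds some hypothetical Discriminator outputs at 0.5 and counts the four outcomes:

import numpy as np

# Hypothetical Discriminator outputs for a few real and fake examples
d_real = np.array([0.9, 0.8, 0.3])   # D(x) for real inputs
d_fake = np.array([0.2, 0.6, 0.1])   # D(x*) for fake inputs

true_positives  = np.sum(d_real >= 0.5)   # real classified as real
false_negatives = np.sum(d_real <  0.5)   # real classified as fake
true_negatives  = np.sum(d_fake <  0.5)   # fake classified as fake
false_positives = np.sum(d_fake >= 0.5)   # fake classified as real (what the Generator wants)

print(true_positives, false_negatives, true_negatives, false_positives)  # 2 1 2 1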
Let’s revisit the GAN training algorithm from chapter 1 and formalize it by using the notation introduced in this chapter. Unlike the algorithm in chapter 1, this one uses mini-batches rather than one example at a time.
Here is the algorithm from chapter 1, restated with the notation introduced in this chapter.
GAN training algorithm
For each training iteration do
1. Train the Discriminator:
   a. Take a random mini-batch of real examples: $x$.
   b. Take a mini-batch of random noise vectors $z$ and generate a mini-batch of fake examples: $G(z) = x^*$.
   c. Compute the classification losses for $D(x)$ and $D(x^*)$, and backpropagate the total error to update $\theta^{(D)}$ to minimize the classification loss.
2. Train the Generator:
   a. Take a mini-batch of random noise vectors $z$ and generate a mini-batch of fake examples: $G(z) = x^*$.
   b. Compute the classification loss for $D(x^*)$, and backpropagate the loss to update $\theta^{(G)}$ to maximize the classification loss.
End for
In short: in each training iteration, first train the Discriminator on a mini-batch of real images $x$ and a mini-batch of generated fakes $G(z) = x^*$, updating $\theta^{(D)}$ to minimize its classification loss; then generate a fresh mini-batch of fakes and train the Generator, updating $\theta^{(G)}$ to maximize the Discriminator's classification loss.
Notice that in step 1, the Generator’s parameters are kept intact while we train the Discriminator. Similarly, in step 2, we keep the Discriminator’s parameters fixed while the Generator is trained. The reason we allow updates only to the weights and biases of the network being trained is to isolate all changes to only the parameters that are under the network’s control. This ensures that each network gets relevant signals about the updates to make, without interference from the other’s updates. You can almost think of it as two players taking turns.
Be sure to note that while one network is being trained, it must not touch the other's parameters.
Of course, you can imagine a scenario in which each player merely undoes the other’s progress, so not even a turn-based game is guaranteed to yield a useful outcome. (Have we said yet that GANs are notoriously tricky to train?) More on this in chapter 5, where we also discuss techniques to maximize our chances of success.
Chapter 5 discusses techniques to maximize the chances of success.
That’s it for theory, for the time being. Let’s now put what we learned into practice and implement our first GAN.
Now on to the hands-on tutorial.
In this tutorial, we will implement a GAN that learns to produce realistic-looking handwritten digits. We will use the Python neural network library Keras with a TensorFlow backend. Figure 3.5 shows a high-level architecture of the GAN we will implement.
We use Keras with a TensorFlow backend to build the networks and implement the GAN; figure 3.5 shows the architecture.
Figure 3.5. Over the course of the training iterations, the Generator learns to turn random noise input into images that look like members of the training data: the MNIST dataset of handwritten digits. Simultaneously, the Discriminator learns to distinguish the fake images produced by the Generator from the genuine ones coming from the training dataset.
Much of the code used in this tutorial—especially the boilerplate code used in the training loop—was adapted from the open source GitHub repository of GAN implementations in Keras, Keras-GAN, created by Erik Linder-Norén (https://github.com/eriklindernoren/Keras-GAN). The repository also includes several advanced GAN variants, some of which will be covered later in this book. We revised and simplified the implementation considerably, in terms of both code and network architecture, and we renamed variables so that they are consistent with the notation used in this book.
Most of the code comes from the GitHub project Keras-GAN, which also includes several advanced GAN variants covered later in this book. The implementation was considerably simplified and revised, and the variables renamed to match the notation used in the book.
A Jupyter notebook with the full implementation, including added visualizations of the training progress, is available on the book's website at www.manning.com/books/gans-in-action and in the GitHub repository for this book at https://github.com/GANs-in-Action/gans-in-action under the chapter-3 folder. The code was tested with Python 3.6.0, Keras 2.1.6, and TensorFlow 1.8.0.
The full Jupyter notebook (including training-progress visualizations) is available at https://github.com/GANs-in-Action/gans-in-action; it was tested with Python 3.6.0, Keras 2.1.6, and TensorFlow 1.8.0.
First, we import all the packages and libraries needed to run the model. Notice we also import the MNIST dataset of handwritten digits directly from keras.datasets.
Step 1: import the dependencies, including the MNIST handwritten-digit dataset:
Listing 3.1. Import statements
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
from keras.datasets import mnist
from keras.layers import Dense, Flatten, Reshape
from keras.layers.advanced_activations import LeakyReLU
from keras.models import Sequential
from keras.optimizers import Adam
Second, we specify the input dimensions of our model and dataset. Each image in MNIST is 28 × 28 pixels with a single channel (because the images are grayscale). The variable z_dim sets the size of the noise vector, z.
Step 2: specify the dataset dimensions and the size of the noise vector:
Listing 3.2. Model input dimensions
img_rows = 28
img_cols = 28
channels = 1
# Input image dimensions
img_shape = (img_rows, img_cols, channels)
# Size of the noise vector, used as input to the Generator
z_dim = 100
Next, we implement the Generator and the Discriminator networks.
For simplicity, the Generator is a neural network with only a single hidden layer. It takes in $z$ as input and produces a 28 × 28 × 1 image. In the hidden layer, we use the Leaky ReLU activation function. Unlike a regular ReLU function, which maps any negative input to 0, Leaky ReLU allows a small positive gradient for negative inputs. This prevents gradients from dying out during training, which tends to yield better training outcomes.
For simplicity, the Generator contains only a single hidden layer, which uses the Leaky ReLU activation function. Unlike a regular ReLU, it allows a small positive gradient for negative inputs, which prevents gradients from dying out and tends to give better results.
At the output layer, we employ the tanh activation function, which scales the output values to the range (–1, 1) (PS: the original text writes this as [–1, 1]). The reason for using tanh (as opposed to, say, sigmoid, which would output values in the more typical 0 to 1 range) is that tanh tends to produce crisper images.
The output layer uses the tanh activation function, whose output range is (–1, 1); compared with sigmoid's (0, 1) range, it tends to produce crisper images.
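As a quick illustration (not from the book's code), a minimal numpy sketch of how Leaky ReLU and tanh treat a few sample inputs; the alpha value matches the 0.01 used in the listings below:

import numpy as np

x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])

# Leaky ReLU: pass positive values through, scale negative values by alpha
alpha = 0.01
leaky_relu = np.where(x > 0, x, alpha * x)
print(leaky_relu)   # [-0.02  -0.005  0.     0.5    2.   ]

# tanh squashes any input into the open interval (-1, 1)
print(np.tanh(x))   # roughly [-0.96 -0.46  0.    0.46  0.96]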
The following listing implements the Generator.
Listing 3.3. Generator
def build_generator(img_shape, z_dim):
    model = Sequential()
    # Fully connected layer
    model.add(Dense(128, input_dim=z_dim))
    # Leaky ReLU activation
    model.add(LeakyReLU(alpha=0.01))
    # Output layer with tanh activation
    model.add(Dense(28 * 28 * 1, activation='tanh'))
    # Reshape the Generator output to image dimensions
    model.add(Reshape(img_shape))
    return model
The Discriminator takes in a 28 × 28 × 1 image and outputs a probability indicating whether the input is deemed real rather than fake. The Discriminator is represented by a two-layer neural network, with 128 hidden units and a Leaky ReLU activation function at the hidden layer.
The Discriminator is a two-layer network with 128 hidden units and a Leaky ReLU activation in the hidden layer.
For simplicity, our Discriminator network looks almost identical to the Generator. This does not have to be the case; indeed, in most GAN implementations, the Generator and Discriminator network architectures vary greatly in both size and complexity.
Notice that unlike for the Generator, in the following listing we apply the sigmoid activation function at the Discriminator's output layer. This ensures that our output value will be between 0 and 1, so it can be interpreted as the probability the Discriminator assigns to the input being real.
The output layer uses the sigmoid activation function, whose range is (0, 1), interpreted as the probability that the input image is real.
Listing 3.4. Discriminator
def build_discriminator(img_shape):
    model = Sequential()
    # Flatten the input image
    model.add(Flatten(input_shape=img_shape))
    # Fully connected layer
    model.add(Dense(128))
    # Leaky ReLU activation
    model.add(LeakyReLU(alpha=0.01))
    # Output layer with sigmoid activation
    model.add(Dense(1, activation='sigmoid'))
    return model
In listing 3.5, we build and compile the Generator and Discriminator models implemented previously. Notice that in the combined model used to train the Generator, we keep the Discriminator parameters fixed by setting discriminator.trainable to False. Also note that the combined model, in which the Discriminator is set to untrainable, is used to train the Generator only. The Discriminator is trained as an independently compiled model. (This will become apparent when we review the training loop.)
The code below combines the Generator and the Discriminator. Setting discriminator.trainable to False keeps the Discriminator's parameters frozen while the combined model trains the Generator; the Discriminator itself is trained as a separately compiled model.
We use binary cross-entropy as the loss function we are seeking to minimize during training. Binary cross-entropy is a measure of the difference between computed probabilities and actual probabilities for predictions with only two possible classes. The greater the cross-entropy loss, the further away our predictions are from the true labels.
Binary cross-entropy is used as the loss function.
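To make the definition concrete, here is a minimal sketch (not from the book) that computes binary cross-entropy by hand for a few predicted probabilities against their true 0/1 labels:

import numpy as np

y_true = np.array([1.0, 1.0, 0.0, 0.0])   # true labels: real = 1, fake = 0
y_pred = np.array([0.9, 0.6, 0.2, 0.8])   # predicted probabilities of being real

# Binary cross-entropy: -[y*log(p) + (1-y)*log(1-p)], averaged over the batch
bce = -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))
print(bce)   # roughly 0.61; the confident wrong prediction (0.8 for a fake) dominates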
To optimize each network, we use the Adam optimization algorithm. This algorithm, whose name is derived from adaptive moment estimation, is an advanced gradient-descent-based optimizer. The inner workings of this algorithm are beyond the scope of this book, but it suffices to say that Adam has become the go-to optimizer for most GAN implementations thanks to its often superior performance.
Adam is used as the gradient-based optimizer.
Listing 3.5. Building and compiling the GAN
def build_gan(generator, discriminator):
    model = Sequential()
    # Combined Generator + Discriminator model
    model.add(generator)
    model.add(discriminator)
    return model

# Builds and compiles the Discriminator
discriminator = build_discriminator(img_shape)
discriminator.compile(loss='binary_crossentropy',
                      optimizer=Adam(),
                      metrics=['accuracy'])

# Builds the Generator
generator = build_generator(img_shape, z_dim)

# Keeps the Discriminator's parameters constant for Generator training
discriminator.trainable = False

# Builds and compiles the GAN model with the fixed Discriminator to train the Generator
gan = build_gan(generator, discriminator)
gan.compile(loss='binary_crossentropy', optimizer=Adam())
The training code in listing 3.6 implements the GAN training algorithm. We get a random mini-batch of MNIST images as real examples and generate a mini-batch of fake images from random noise vectors z. We then use those to train the Discriminator network while keeping the Generator’s parameters constant. Next, we generate a mini-batch of fake images and use those to train the Generator network while keeping the Discriminator’s parameters fixed. We repeat this for each iteration.
1. Get a mini-batch of real images.
2. Generate a mini-batch of fake images.
3. Train the Discriminator on both, keeping the Generator's parameters fixed.
4. Generate another mini-batch of fake images and train the Generator, keeping the Discriminator's parameters fixed.
We use binary labels: 1 for real images and 0 for fake ones. To generate z, we sample from the standard normal distribution (a bell curve with 0 mean and a standard deviation of 1). The Discriminator is trained to assign fake labels to the fake images and real labels to real images. The Generator is trained such that the Discriminator assigns real labels to the fake examples it produces.
Labels: 1 for real images, 0 for fake images.
The noise vector z is sampled from a standard normal distribution (mean 0, standard deviation 1).
Notice that we are rescaling the real images in the training dataset from –1 to 1. As you saw in the preceding example, the Generator uses the tanh activation function at the output layer, so the fake images will be in the range (–1, 1). Accordingly, we have to rescale all the Discriminator’s inputs to the same range.
The real training images are rescaled to the range [–1, 1]; the generated fake images fall in (–1, 1) because of tanh, so all of the Discriminator's inputs lie in the same range.
Listing 3.6. GAN training loop
losses = []
accuracies = []
iteration_checkpoints = []

def train(iterations, batch_size, sample_interval):
    # Loads the MNIST dataset
    (X_train, _), (_, _) = mnist.load_data()

    # Rescales [0, 255] grayscale pixel values to [-1, 1]
    X_train = X_train / 127.5 - 1.0
    X_train = np.expand_dims(X_train, axis=3)

    # Labels for real images: all 1s
    real = np.ones((batch_size, 1))

    # Labels for fake images: all 0s
    fake = np.zeros((batch_size, 1))

    for iteration in range(iterations):
        # Gets a random batch of real images
        idx = np.random.randint(0, X_train.shape[0], batch_size)
        imgs = X_train[idx]

        # Generates a batch of fake images
        z = np.random.normal(0, 1, (batch_size, 100))
        gen_imgs = generator.predict(z)

        # Trains the Discriminator
        d_loss_real = discriminator.train_on_batch(imgs, real)
        d_loss_fake = discriminator.train_on_batch(gen_imgs, fake)
        d_loss, accuracy = 0.5 * np.add(d_loss_real, d_loss_fake)

        # Generates a batch of fake images
        z = np.random.normal(0, 1, (batch_size, 100))
        gen_imgs = generator.predict(z)

        # Trains the Generator
        g_loss = gan.train_on_batch(z, real)

        if (iteration + 1) % sample_interval == 0:
            # Saves losses and accuracies so they can be plotted after training
            losses.append((d_loss, g_loss))
            accuracies.append(100.0 * accuracy)
            iteration_checkpoints.append(iteration + 1)

            # Outputs training progress
            print("%d [D loss: %f, acc.: %.2f%%] [G loss: %f]" %
                  (iteration + 1, d_loss, 100.0 * accuracy, g_loss))

            # Outputs a sample of generated images
            sample_images(generator)
In the Generator training code, you may notice an invocation of the sample_images() function. This function gets called every sample_interval iterations and outputs a 4 × 4 grid of images synthesized by the Generator in the given iteration. After we run our model, we will use these images to inspect interim and final outputs.
The sample_images() function in the training code above outputs sample fake images produced by the Generator at the given iteration, so we can watch how the results change over the course of training.
Listing 3.7. Displaying generated images
def sample_images(generator, image_grid_rows=4, image_grid_columns=4):
    # Samples random noise
    z = np.random.normal(0, 1, (image_grid_rows * image_grid_columns, z_dim))

    # Generates images from the random noise
    gen_imgs = generator.predict(z)

    # Rescales image pixel values to [0, 1]
    gen_imgs = 0.5 * gen_imgs + 0.5

    # Sets up the image grid
    fig, axs = plt.subplots(image_grid_rows,
                            image_grid_columns,
                            figsize=(4, 4),
                            sharey=True,
                            sharex=True)

    cnt = 0
    for i in range(image_grid_rows):
        for j in range(image_grid_columns):
            # Outputs a grid of images
            axs[i, j].imshow(gen_imgs[cnt, :, :, 0], cmap='gray')
            axs[i, j].axis('off')
            cnt += 1
That brings us to the final step, shown in listing 3.8. We set the training hyperparameters—the number of iterations and the batch size—and train the model. There is no tried-and-true method to determine the right number of iterations or the right batch size; we determine them experimentally through trial and error as we observe the training progress.
Set the training hyperparameters (number of iterations and batch size) and train the model; the first two are determined empirically rather than by any fixed rule.
That said, there are important practical constraints to these numbers: each mini-batch must be small enough to fit inside the processing memory (typical batch sizes people use are powers of 2: 32, 64, 128, 256, and 512). The number of iterations also has a practical constraint: the more iterations we have, the longer the training process takes. With complex deep learning models like GANs, this can get out of hand quickly, even with significant computing power.
More iterations mean longer training; the batch size must be small enough to fit in memory, and typical values are powers of 2 (32, 64, 128, 256, 512).
To determine the right number of iterations, we monitor the training loss and set the iteration number around the point when the loss plateaus, indicating that we are getting little to no incremental improvement from further training. (Because this is a generative model, overfitting is as much a concern as it is for supervised learning algorithms.)
Like supervised learning algorithms, a generative model can overfit, so we monitor the training loss and set the number of iterations around the point where the loss plateaus.
Listing 3.8. Running the model
# Sets hyperparameters
iterations = 20000
batch_size = 128
sample_interval = 1000

# Trains the GAN for the specified number of iterations
train(iterations, batch_size, sample_interval)
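After training finishes, the losses and iteration_checkpoints lists filled in by the training loop can be plotted to see where the loss plateaus. The book's notebook includes fuller visualizations; a minimal sketch might look like this:

import matplotlib.pyplot as plt
import numpy as np

losses_arr = np.array(losses)   # shape: (number of checkpoints, 2)
plt.plot(iteration_checkpoints, losses_arr[:, 0], label='Discriminator loss')
plt.plot(iteration_checkpoints, losses_arr[:, 1], label='Generator loss')
plt.xlabel('Iteration')
plt.ylabel('Loss')
plt.legend()
plt.show()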
Figure 3.6 shows example images produced by the Generator over the course of training iterations, from earliest to latest.
Figure 3.6. Starting from what looks to be no more than random noise, the Generator gradually learns to emulate the features of the training dataset: in our case, images of handwritten digits.
As you can see, the Generator starts out by producing little more than random noise. Over the course of the training iterations, it gets better and better at emulating the features of the training data. Each time the Discriminator rejects a generated image as false or accepts one as real, the Generator improves a little. Figure 3.7 shows examples of images the Generator can synthesize after it is fully trained.
As the training iterations progress, the generated images get better and better.
Figure 3.7. Although far from perfect, our simple two-layer Generator learned to produce realistic-looking numerals, such as 9 and 1.
For comparison, figure 3.8 shows a randomly selected sample of real images from the MNIST dataset.
Figure 3.7 shows generated images and figure 3.8 shows real ones; comparing the two, a clear difference remains.
Figure 3.8. Example of real handwritten digits from the MNIST dataset used to train our GAN. Although the Generator made impressive progress toward emulating the training data, the difference between the numerals it produces and the real, human-written numerals remains clear.
Although the images our GAN generated are far from perfect, many of them are easily recognizable as real numerals—an impressive achievement, given that we used only a simple two-layer network architecture for both the Generator and the Discriminator. In the following chapter, you will learn how to improve the quality of the generated images by using a more complex and powerful neural network architecture for the Generator and Discriminator: convolutional neural networks.
Our GAN used only simple two-layer networks, so the generated images are still imperfect; in the following chapters we will use more powerful convolutional neural networks to improve the results.
To recap: a GAN consists of a Generator and a Discriminator, each with its own cost function. The Generator tries to produce fake images convincing enough for the Discriminator to mistake them for real, while the Discriminator tries to tell the fakes apart from the real images.