Image Generation with Generative Adversarial Networks (GANs)
by Thalles Silva
Let’s say there’s a very cool party going on in your neighborhood that you really want to go to. But, there is a problem. To get into the party you need a special ticket — that was long sold out.
Wait up! Isn’t this a Generative Adversarial Networks article? Yes it is. But bear with me for now, it is going to be worth it.
OK, since expectations are very high, the party organizers hired a qualified security agency. Their primary goal is to not allow anyone to crash the party. To do that, they placed a lot of guards at the venue’s entrance to check everyone’s tickets for authenticity.
Since you don’t have any martial arts skills, the only way to get through is by fooling them with a very convincing fake ticket.
There is a big problem with this plan though: you have never actually seen what the ticket looks like.
Even if you design a ticket based on your creativity, it’s almost impossible to fool the guards on your first try. Besides, you can’t show your face until you have a very decent replica of the party’s pass.
To help solve the problem, you decide to call your friend Bob to do the job for you.
Bob’s mission is very simple. He will try to get into the party with your fake pass. If he gets denied, he will come back to you with useful tips on how the ticket should look.
Based on that feedback, you make a new version of the ticket and hand it to Bob, who goes to try again. This process keeps repeating until you become able to design a perfect replica.
Putting aside the ‘small holes’ in this anecdote, this is pretty much how Generative Adversarial Networks (GANs) work.
Nowadays, most of the applications of GANs are in the field of computer vision. Some of the applications include training semi-supervised classifiers, and generating high resolution images from low resolution counterparts.
This piece provides an introduction to GANs with a hands-on approach to the problem of generating images. You can clone the notebook for this post here.
GANs are generative models devised by Goodfellow et al. in 2014. In a GAN setup, two differentiable functions, represented by neural networks, are locked in a game. The two players (the generator and the discriminator) have different roles in this framework.
The generator tries to produce data that come from some probability distribution. That would be you trying to reproduce the party’s tickets.
The discriminator acts like a judge. It gets to decide if the input comes from the generator or from the true training set. That would be the party’s security comparing your fake ticket with the true ticket to find flaws in your design.
In summary, the game follows with:
In the perfect equilibrium, the generator would capture the general training data distribution. As a result, the discriminator would be always unsure of whether its inputs are real or not.
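To make this equilibrium concrete, here is a minimal sketch in plain Python. It assumes the standard cross-entropy discriminator loss that this post uses later on; the function name `discriminator_loss` is ours and purely illustrative. When the discriminator is completely unsure, it outputs a probability of 0.5 for every input, and its loss settles at 2 ln 2.

```python
import math


def discriminator_loss(p_real, p_fake):
    """Cross-entropy loss for one real and one fake sample.

    p_real: probability the discriminator assigns to a real image being real.
    p_fake: probability the discriminator assigns to a fake image being real.
    """
    return -math.log(p_real) - math.log(1.0 - p_fake)


# At the ideal equilibrium the discriminator cannot tell the inputs apart.
equilibrium_loss = discriminator_loss(0.5, 0.5)

# A discriminator that still separates real from fake has a lower loss.
confident_loss = discriminator_loss(0.9, 0.1)
```

In practice training rarely reaches this point exactly, so treat the value as a reference rather than a target.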
In the DCGAN paper, the authors describe the combination of some deep learning techniques as key for training GANs. These techniques include: (i) the all convolutional net and (ii) Batch Normalization (BN).
The first emphasizes strided convolutions (instead of pooling layers) for both increasing and decreasing the spatial dimensions of the features. The second normalizes the feature vectors to have zero mean and unit variance in all layers. This helps to stabilize learning and to cope with poor weight initialization.
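As a rough illustration of the normalization step, the sketch below standardizes a batch of scalar activations in plain Python. It is a simplification under our own assumptions: real Batch Normalization layers also learn a scale and a shift and keep moving averages for inference, and the name `normalize_batch` is hypothetical.

```python
import math


def normalize_batch(values, eps=1e-5):
    """Standardize a batch of activations to zero mean and unit variance."""
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    # eps guards against division by zero when all activations are equal.
    return [(v - mean) / math.sqrt(variance + eps) for v in values]


activations = [1.0, 2.0, 3.0, 4.0]
normalized = normalize_batch(activations)
normalized_mean = sum(normalized) / len(normalized)
normalized_var = sum((v - normalized_mean) ** 2 for v in normalized) / len(normalized)
```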
Without further ado, let’s dive into the implementation details and talk more about GANs as we go. We present an implementation of a Deep Convolutional Generative Adversarial Network (DCGAN). Our implementation uses TensorFlow and follows some practices described in the DCGAN paper.
The generator network has 4 convolutional layers, all followed by BN and Rectified Linear Unit (ReLU) activations, except for the output layer.
It takes as an input a random vector z (drawn from a normal distribution). After reshaping z to have a 4D shape, we feed it to the generator that starts a series of upsampling layers.
Each upsampling layer represents a transpose convolution operation with a stride of 2. Transpose convolutions are similar to regular convolutions.
Typically, regular convolutions go from wide and shallow layers to narrower and deeper ones. Transpose convolutions go the other way. They go from deep and narrow layers to wider and shallower.
The stride of a transpose convolution operation defines the size of the output layer. With “same” padding and stride of 2, the output features will have double the size of the input layer.
That happens because, every time we move one pixel in the input layer, we move the convolution kernel by two pixels on the output layer. In other words, each pixel in the input image is used to draw a square in the output image.
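The doubling effect is easier to see in one dimension. The following is a naive sketch in plain Python, not the TensorFlow implementation: each input value "paints" a scaled copy of the kernel onto the output, and moving one step in the input moves the kernel two steps in the output. The function name `transpose_conv1d` and the symmetric cropping used to emulate "same" padding are our assumptions, and the sketch assumes the kernel is at least as long as the stride.

```python
def transpose_conv1d(x, kernel, stride=2):
    """Naive 1D transposed convolution, cropped to len(x) * stride values."""
    full = [0.0] * ((len(x) - 1) * stride + len(kernel))
    for i, value in enumerate(x):
        # One step in the input shifts the kernel by `stride` output positions.
        for j, weight in enumerate(kernel):
            full[i * stride + j] += value * weight
    target = len(x) * stride          # "same" padding: output is stride times larger
    left = (len(full) - target) // 2  # crop the border left over by the kernel
    return full[left:left + target]


signal = [1, 2, 3]
upsampled = transpose_conv1d(signal, kernel=[1, 1, 1, 1, 1], stride=2)
```

With three input values and a stride of 2, the result has six values, twice the input length.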
In short, the generator begins with this very deep but narrow input vector. After each transpose convolution, z becomes wider and shallower. All transpose convolutions use a 5x5 kernel, with depths reducing from 512 all the way down to 3, which represents an RGB color image.
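The shape bookkeeping can be traced with a few lines of plain Python. This sketch assumes stride-2 transpose convolutions with "same" padding and the depths used in this post's generator (512, 256, 128, 3); the helper name `generator_shapes` is hypothetical.

```python
def generator_shapes(start=(4, 4, 512), depths=(256, 128, 3), stride=2):
    """List the feature map shape after each stride-2 transpose convolution."""
    height, width, _ = start
    shapes = [start]
    for depth in depths:
        # "same" padding with stride 2 doubles the height and the width.
        height, width = height * stride, width * stride
        shapes.append((height, width, depth))
    return shapes


shapes = generator_shapes()
```

Each step trades depth for spatial resolution until the tensor has the shape of a 32x32 RGB image.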
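import tensorflow as tf  # TensorFlow 1.x API (tf.layers)

def transpose_conv2d(x, output_space):
    # 5x5 kernel, stride 2 and 'same' padding: doubles the height and width of x
    return tf.layers.conv2d_transpose(x, output_space, kernel_size=5, strides=2, padding='same',
                                      kernel_initializer=tf.random_normal_initializer(mean=0.0, stddev=0.02))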
The final layer outputs a 32x32x3 tensor — squashed between values of -1 and 1 through the Hyperbolic Tangent (tanh) function.
This final output shape is defined by the size of the training images. In this case, when training on SVHN, the generator produces 32x32x3 images. However, when training on MNIST, it would generate 28x28 greyscale images.
Finally, note that before feeding the real training images to the discriminator, we need to scale them to the interval of -1 to 1. This follows from the choice of the tanh function, so that real and generated images share the same range.
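Because the generator ends in tanh, its images live in the range of -1 to 1. A common companion step, sketched below under the assumption of 8-bit pixel values in [0, 255], is to map the real training images into that same range. The helper name `scale` is hypothetical and the code is plain Python rather than TensorFlow.

```python
def scale(pixels, low=-1.0, high=1.0):
    """Linearly map pixel intensities from [0, 255] to [low, high]."""
    return [p / 255.0 * (high - low) + low for p in pixels]


scaled = scale([0, 127.5, 255])
```

Black pixels land on -1, white pixels on 1, and mid-grey on 0, which matches the output range of tanh.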
def generator(z, output_dim, reuse=False, alpha=0.2, training=True):
    """
    Defines the generator network
    :param z: input random vector z
    :param output_dim: output dimension of the network
    :param reuse: Indicates whether or not the existing model variables should be used or recreated
    :param alpha: scalar for lrelu activation function
    :param training: Boolean for controlling the batch normalization statistics
    :return: model's output
    """
    # dense and batch_norm are helper functions that are not shown in this post
    with tf.variable_scope('generator', reuse=reuse):
        fc1 = dense(z, 4*4*512)

        # Reshape it to start the convolutional stack
        fc1 = tf.reshape(fc1, (-1, 4, 4, 512))
        fc1 = batch_norm(fc1, training=training)
        fc1 = tf.nn.relu(fc1)

        t_conv1 = transpose_conv2d(fc1, 256)
        t_conv1 = batch_norm(t_conv1, training=training)
        t_conv1 = tf.nn.relu(t_conv1)

        t_conv2 = transpose_conv2d(t_conv1, 128)
        t_conv2 = batch_norm(t_conv2, training=training)
        t_conv2 = tf.nn.relu(t_conv2)

        logits = transpose_conv2d(t_conv2, output_dim)

        out = tf.tanh(logits)
        return out
The discriminator is also a 4-layer CNN with BN (except for its input layer) and leaky ReLU activations. Many activation functions will work fine with this basic GAN architecture. However, leaky ReLUs are very popular because they help the gradients flow more easily through the architecture.
A regular ReLU function works by truncating negative values to 0. This has the effect of blocking the gradients from flowing through the network. Instead of outputting zero, leaky ReLUs allow a small negative value to pass through. That is, the function computes the maximum of the input and the input scaled by a small factor.
def lrelu(x, alpha=0.2):
    # non-linear activation function
    return tf.maximum(alpha * x, x)
Leaky ReLUs represent an attempt to solve the dying ReLU problem. This situation occurs when neurons get stuck in a state in which ReLU units always output 0 for all inputs. In these cases, the gradients are completely blocked from flowing back through the network.
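The difference shows up in the derivatives. The sketch below, in plain Python with hypothetical helper names, compares the local gradient of a ReLU and of a leaky ReLU for a negative pre-activation: the first is exactly zero, the second is the small factor alpha.

```python
def relu_grad(x):
    """Derivative of max(0, x) with respect to x."""
    return 1.0 if x > 0 else 0.0


def leaky_relu(x, alpha=0.2):
    """Scalar version of the leaky ReLU: max(alpha * x, x)."""
    return max(alpha * x, x)


def leaky_relu_grad(x, alpha=0.2):
    """Derivative of max(alpha * x, x) with respect to x."""
    return 1.0 if x > 0 else alpha


pre_activation = -3.0
blocked = relu_grad(pre_activation)          # no signal flows back
leaking = leaky_relu_grad(pre_activation)    # a fraction alpha still flows back
```

A unit whose pre-activation is negative for every input receives no update through a plain ReLU, while the leaky variant keeps a fraction of the gradient alive.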
This is especially important for GANs since the only way the generator has to learn is by receiving the gradients from the discriminator.
The discriminator starts by receiving a 32x32x3 image tensor. In contrast to the generator, the discriminator performs a series of stride-2 convolutions. Each one halves the spatial dimensions of the feature maps while doubling the number of learned filters.
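To check the arithmetic, the sketch below walks a 32x32x3 input through three stride-2 convolutions in plain Python. It assumes "same" padding and the filter counts used in this post's discriminator (64, 128, 256); the helper name `discriminator_shapes` is hypothetical.

```python
import math


def discriminator_shapes(input_shape=(32, 32, 3), filters=(64, 128, 256), stride=2):
    """List the feature map shape after each stride-2 convolution."""
    height, width, _ = input_shape
    shapes = [input_shape]
    for depth in filters:
        # With "same" padding, a strided convolution outputs ceil(size / stride).
        height, width = math.ceil(height / stride), math.ceil(width / stride)
        shapes.append((height, width, depth))
    return shapes


shapes = discriminator_shapes()
last_height, last_width, last_depth = shapes[-1]
flat_size = last_height * last_width * last_depth
```

The final 4x4x256 feature map flattens to 4096 values, which is the size fed to the last dense layer.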
Finally, the discriminator needs to output probabilities. For that, we use the Logistic Sigmoid activation function on the final logits.
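As a small illustration of how logits turn into probabilities, here is a numerically careful sigmoid in plain Python. It is only a sketch for scalars, not the TensorFlow op, and the branching on the sign of the input is our choice to avoid overflow for large negative logits.

```python
import math


def sigmoid(logit):
    """Map an unbounded logit to a probability in [0, 1]."""
    if logit >= 0:
        return 1.0 / (1.0 + math.exp(-logit))
    # For negative logits, exp(logit) stays small and cannot overflow.
    e = math.exp(logit)
    return e / (1.0 + e)


undecided = sigmoid(0.0)      # a logit of 0 means "no idea"
likely_real = sigmoid(4.0)    # large positive logits approach 1
likely_fake = sigmoid(-4.0)   # large negative logits approach 0
```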
def discriminator(x, reuse=False, alpha=0.2, training=True):
    """
    Defines the discriminator network
    :param x: input for network
    :param reuse: Indicates whether or not the existing model variables should be used or recreated
    :param alpha: scalar for lrelu activation function
    :param training: Boolean for controlling the batch normalization statistics
    :return: A tuple of (sigmoid probabilities, logits)
    """
    # conv2d, dense and batch_norm are helper functions that are not shown in this post
    with tf.variable_scope('discriminator', reuse=reuse):
        # Input layer is 32x32x?
        conv1 = conv2d(x, 64)
        conv1 = lrelu(conv1, alpha)

        conv2 = conv2d(conv1, 128)
        conv2 = batch_norm(conv2, training=training)
        conv2 = lrelu(conv2, alpha)

        conv3 = conv2d(conv2, 256)
        conv3 = batch_norm(conv3, training=training)
        conv3 = lrelu(conv3, alpha)

        # Flatten it
        flat = tf.reshape(conv3, (-1, 4*4*256))
        logits = dense(flat, 1)

        out = tf.sigmoid(logits)
        return out, logits
Note that in this framework, the discriminator acts as a regular binary classifier. Half of the time it receives images from the training set and the other half from the generator.
Back to our adventure: to reproduce the party’s ticket, the only source of information you had was the feedback from your friend Bob. In other words, the quality of the feedback Bob provided to you at each attempt was essential to get the job done.
In the same way, every time the discriminator notices a difference between the real and fake images, it sends a signal to the generator. This signal is the gradient that flows backwards from the discriminator to the generator. By receiving it, the generator is able to adjust its parameters to get closer to the true data distribution.
This is how important the discriminator is. In fact, the generator will only be as good at producing data as the discriminator is at telling real and fake data apart.
Now, let’s describe the trickiest part of this architecture — the losses. First, we know the discriminator receives images from both the training set and the generator.
We want the discriminator to be able to distinguish between real and fake images. Every time we run a mini-batch through the discriminator, we get logits. These are the unscaled values from the model.
However, we can divide the mini-batches that the discriminator receives into two types. The first is composed only of real images that come from the training set, and the second only of fake images, the ones created by the generator.
def model_loss(input_real, input_z, output_dim, alpha=0.2, smooth=0.1):
    """
    Get the loss for the discriminator and generator
    :param input_real: Images from the real dataset
    :param input_z: random vector z
    :param output_dim: The number of channels in the output image
    :param alpha: scalar for lrelu activation function
    :param smooth: label smoothing scalar
    :return: A tuple of (discriminator loss, generator loss)
    """
    g_model = generator(input_z, output_dim, alpha=alpha)
    d_model_real, d_logits_real = discriminator(input_real, alpha=alpha)
    d_model_fake, d_logits_fake = discriminator(g_model, reuse=True, alpha=alpha)

    # For the real images, we want them to be classified as positives,
    # so we want their labels to be all ones.
    # Notice that here we use label smoothing to help the discriminator generalize better.
    # Label smoothing works by keeping the classifier from making extreme predictions when extrapolating.
    d_loss_real = tf.reduce_mean(
        tf.nn.sigmoid_cross_entropy_with_logits(logits=d_logits_real,
                                                labels=tf.ones_like(d_logits_real) * (1 - smooth)))

    # For the fake images produced by the generator, we want the discriminator to classify them as false images,
    # so we set their labels to be all zeros.
    d_loss_fake = tf.reduce_mean(
        tf.nn.sigmoid_cross_entropy_with_logits(logits=d_logits_fake,
                                                labels=tf.zeros_like(d_model_fake)))

    # Since the generator wants the discriminator to output 1s for its images, it uses the discriminator logits
    # for the fake images and assigns labels of 1s to them.
    g_loss = tf.reduce_mean(
        tf.nn.sigmoid_cross_entropy_with_logits(logits=d_logits_fake,
                                                labels=tf.ones_like(d_model_fake)))

    d_loss = d_loss_real + d_loss_fake

    return d_loss, g_loss
Because both networks train at the same time, GANs also need two optimizers, one for minimizing the discriminator’s loss function and one for minimizing the generator’s.
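To illustrate why each player needs its own update rule, here is a deliberately tiny toy in plain Python. Everything in it is an assumption made for illustration: the "generator" is a single number theta, the "discriminator" is a logistic unit with parameters (w, b), the real data is the single point x_real, and plain gradient descent stands in for Adam. Each step changes only the parameters of the player it belongs to.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def discriminator_step(d_params, theta, x_real, lr=0.1):
    """Update (w, b) on -log D(x_real) - log(1 - D(theta)); theta is only read."""
    w, b = d_params
    p_real = sigmoid(w * x_real + b)
    p_fake = sigmoid(w * theta + b)
    grad_w = -(1.0 - p_real) * x_real + p_fake * theta
    grad_b = -(1.0 - p_real) + p_fake
    return (w - lr * grad_w, b - lr * grad_b)


def generator_step(d_params, theta, lr=0.1):
    """Update theta on -log D(theta); the discriminator parameters are only read."""
    w, b = d_params
    p_fake = sigmoid(w * theta + b)
    grad_theta = -(1.0 - p_fake) * w
    return theta - lr * grad_theta


d_params, theta, x_real = (0.0, 0.0), 0.0, 2.0
d_params = discriminator_step(d_params, theta, x_real)  # discriminator moves first
theta = generator_step(d_params, theta)                 # then the generator follows its gradient
```

After one round the generator's sample has moved slightly toward the real data point, driven only by the gradient it received through the discriminator. This toy is not guaranteed to converge if iterated, so it should be read as a picture of the alternating updates and nothing more.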
We want the discriminator to output probabilities close to 1 for real images and near 0 for fake images. To do that, the discriminator needs two losses. Therefore, the total loss for the discriminator is the sum of these two partial losses. One for maximizing the probabilities for the real images and another for minimizing the probability of fake images.
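The loss bookkeeping can be reproduced on a handful of numbers without TensorFlow. The sketch below is plain Python and uses the numerically stable formula documented for sigmoid cross-entropy with logits, max(x, 0) - x * z + log(1 + exp(-|x|)), where x is the logit and z the label. The function names and the example logits are ours, chosen for illustration.

```python
import math


def sigmoid_cross_entropy(logit, label):
    """Stable cross-entropy between sigmoid(logit) and a label in [0, 1]."""
    return max(logit, 0.0) - logit * label + math.log1p(math.exp(-abs(logit)))


def mean(values):
    values = list(values)
    return sum(values) / len(values)


def gan_losses(d_logits_real, d_logits_fake, smooth=0.1):
    """Return (discriminator loss, generator loss) for two batches of logits."""
    # Real images are labeled 1, softened to 1 - smooth by label smoothing.
    d_loss_real = mean(sigmoid_cross_entropy(x, 1.0 - smooth) for x in d_logits_real)
    # Fake images are labeled 0 for the discriminator.
    d_loss_fake = mean(sigmoid_cross_entropy(x, 0.0) for x in d_logits_fake)
    # The generator wants the same fake logits to be classified as 1.
    g_loss = mean(sigmoid_cross_entropy(x, 1.0) for x in d_logits_fake)
    return d_loss_real + d_loss_fake, g_loss


# A discriminator that confidently separates real (logit 5) from fake (logit -5).
d_loss, g_loss = gan_losses([5.0, 5.0], [-5.0, -5.0])
```

When the discriminator is confidently right, its own loss is small while the generator's loss is large, which is the signal that pushes the generator to improve.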
At the beginning of training, two interesting situations occur. First, the generator does not know how to create images that resemble the ones from the training set. Second, the discriminator does not know how to categorize the images it receives as real or fake.
As a result, the discriminator receives two very distinct types of batches: one composed of true images from the training set and another containing very noisy signals. As training progresses, the generator starts to output images that look closer to the images from the training set. That happens because the generator learns the data distribution that the training set images come from.
At the same time, the discriminator starts to get really good at classifying samples as real or fake. As a consequence, the two types of mini-batches begin to look similar in structure to one another. As a result, the discriminator becomes unable to identify images as real or fake.
For the losses, we use vanilla cross-entropy, with Adam as the optimizer.
GANs are one of the hottest subjects in machine learning right now. These models have the potential of unlocking unsupervised learning methods that would expand ML to new horizons.
Since their creation, researchers have been developing many techniques for training GANs. In Improved Techniques for Training GANs, the authors describe state-of-the-art techniques for both image generation and semi-supervised learning.
If you are curious to dig deeper in these subjects, I recommend reading Generative Models.
Also, take a look at:
Dive head first into advanced GANs: exploring self-attention and spectral norm (medium.freecodecamp.org)
Semi-supervised learning with Generative Adversarial Networks (GANs) (towardsdatascience.com)
And if you need more, take a look at my deep learning blog.
Enjoy, and thanks for reading!
Credits to Sam Williams for this awesome “clap” gif! Check it out in his post.
Translated from: https://www.freecodecamp.org/news/an-intuitive-introduction-to-generative-adversarial-networks-gans-7a2264a81394/