DCGAN Paper Walkthrough, Key Points, and Code Implementation Notes (1)

Paper Walkthrough

There are already plenty of introductions to the DCGAN paper online, so here I only pull out the points I think matter for understanding the paper and for tuning the training later. This is for reference only; corrections from more experienced readers are welcome.

0. Abstract

In this work we hope to help bridge the gap between the success of CNNs for supervised learning and unsupervised learning
Here the authors say they hope to bridge the gap between the success of CNNs in supervised learning and their use in unsupervised learning, i.e. CNNs currently work much better in supervised settings. The sense in which GAN training counts as unsupervised learning will come up again in the training section later.

1. Introduction

We propose that one way to build
good image representations is by training Generative Adversarial Networks (GANs) (Goodfellow
et al., 2014), and later reusing parts of the generator and discriminator networks as feature extractors
for supervised tasks
Here the authors propose that training a GAN is one way to build good image representations, and that parts of the trained generator and discriminator networks can later be reused as feature extractors for supervised tasks. In other words, a discriminator trained inside a GAN may later serve as a better image classifier. One intuition is that during GAN training the generator keeps feeding the discriminator something like extra negative samples and noise (early generator outputs are close to noise, later ones act like hard negatives), so from the discriminator's point of view the generator effectively performs data augmentation; a classifier built on such a discriminator can then outperform one trained without that augmentation. ("We use the trained discriminators for image classification tasks, showing competitive performance with other unsupervised algorithms.")

2. Related work (skipped)

3. Approach and model architecture

Core to our approach is adopting and modifying three recently demonstrated changes to CNN architectures. (Three main modifications.)
Modification 1: The first is the all convolutional net (Springenberg et al., 2014) which replaces deterministic spatial pooling functions (such as maxpooling) with strided convolutions, allowing the network to learn its own spatial downsampling. We use this approach in our generator, allowing it to learn its own spatial upsampling, and discriminator. (Replace pooling layers with strided convolutions in the discriminator and fractionally-strided, i.e. transposed, convolutions in the generator.)
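A minimal PyTorch sketch of this first point (my own illustration, not the paper's official code; the channel counts and kernel size 4 are just example choices): a stride-2 convolution replaces pooling for downsampling in the discriminator, and a stride-2 transposed convolution learns the upsampling in the generator.

import torch
import torch.nn as nn

x = torch.randn(1, 64, 32, 32)           # a feature map: (batch, channels, H, W)

# Discriminator side: stride-2 convolution instead of max-pooling, 32x32 -> 16x16
down = nn.Conv2d(64, 128, kernel_size=4, stride=2, padding=1)
print(down(x).shape)                      # torch.Size([1, 128, 16, 16])

# Generator side: stride-2 transposed convolution learns the upsampling, 32x32 -> 64x64
up = nn.ConvTranspose2d(64, 32, kernel_size=4, stride=2, padding=1)
print(up(x).shape)                        # torch.Size([1, 32, 64, 64])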
Modification 2: Second is the trend towards eliminating fully connected layers on top of convolutional features. The strongest example of this is global average pooling which has been utilized in state of the art image classification models (Mordvintsev et al.). We found global average pooling increased model stability but hurt convergence speed. A middle ground of directly connecting the highest convolutional features to the input and output respectively of the generator and discriminator worked well. The first layer of the GAN, which takes a uniform noise distribution Z as input, could be called fully connected as it is just a matrix multiplication, but the result is reshaped into a 4-dimensional tensor and used as the start of the convolution stack. For the discriminator, the last convolution layer is flattened and then fed into a single sigmoid output. See Fig. 1 for a visualization of an example model architecture.
(In short: drop the fully connected layers that usually sit on top of the conv features. Global average pooling improves stability but slows convergence, so the compromise is to connect the highest conv features directly to the generator's input and the discriminator's output: the generator's first layer maps the noise vector Z with what is effectively a fully connected layer whose output is reshaped into a 4-D tensor that starts the conv stack, and the discriminator's last conv layer is flattened and fed into a single sigmoid unit.)
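A small sketch of this "middle ground", again my own PyTorch illustration with made-up sizes (a 100-dim noise vector, a 1024x4x4 start for the generator, a 512x4x4 final conv map for the discriminator):

import torch
import torch.nn as nn

# Generator entry: the "fully connected" first layer is just a matrix multiplication
# whose result is reshaped into a 4-D tensor to start the convolution stack.
z = torch.rand(16, 100)                         # batch of 16 noise vectors
project = nn.Linear(100, 1024 * 4 * 4)
h = project(z).view(-1, 1024, 4, 4)             # shape (16, 1024, 4, 4)

# Discriminator exit: flatten the last conv feature map and feed a single sigmoid unit.
feat = torch.randn(16, 512, 4, 4)               # assumed shape of the last conv layer output
score = torch.sigmoid(nn.Linear(512 * 4 * 4, 1)(feat.flatten(1)))   # shape (16, 1)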
Modification 3: Third is Batch Normalization (Ioffe & Szegedy, 2015) which stabilizes learning by normalizing the input to each unit to have zero mean and unit variance. This helps deal with training problems that arise due to poor initialization and helps gradient flow in deeper models. This proved critical to get deep generators to begin learning, preventing the generator from collapsing all samples to a single point which is a common failure mode observed in GANs. Directly applying batchnorm to all layers however, resulted in sample oscillation and model instability. This was avoided by not applying batchnorm to the generator output layer and the discriminator input layer.

(Apply BN, but not everywhere: the authors found that naively putting BN on every layer caused sample oscillation and model instability, so BN is left out of the generator's output layer and the discriminator's input layer. One extra note on the generator's input: in most code the noise vector is already sampled from a fixed, well-scaled distribution (uniform or standard normal), so a BN layer on the noise input is unnecessary as well.)
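A sketch of the resulting BN placement, assuming a typical PyTorch-style DCGAN (layer sizes are illustrative, not the paper's exact configuration): BN after the intermediate layers, but none on the generator's output layer or the discriminator's first layer.

import torch.nn as nn

# Generator tail: BatchNorm after the intermediate transposed convolutions,
# but NOT on the output layer (the tanh output comes straight from the conv).
gen_tail = nn.Sequential(
    nn.ConvTranspose2d(128, 64, 4, 2, 1), nn.BatchNorm2d(64), nn.ReLU(True),
    nn.ConvTranspose2d(64, 3, 4, 2, 1),   nn.Tanh(),            # no BN here
)

# Discriminator head: no BatchNorm on the first (input) layer.
disc_head = nn.Sequential(
    nn.Conv2d(3, 64, 4, 2, 1),   nn.LeakyReLU(0.2, inplace=True),                      # no BN here
    nn.Conv2d(64, 128, 4, 2, 1), nn.BatchNorm2d(128), nn.LeakyReLU(0.2, inplace=True),
)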

4. Details of adversarial training

This section is mainly about hyperparameter settings and tricks:
4.1 All models were trained with mini-batch stochastic gradient descent (SGD) with a mini-batch size of 128
A note on the SGD mini-batch size: I suggest not going below 32, and I would not try to speed up training by shrinking the batch size together with the learning rate. The reason is that the discriminator in a GAN is evaluated the way a classifier is: each image gets a score and is assigned a class by thresholding it, so the per-batch accuracy can only change in steps of 1/batchsize. Taking MNIST training as an example, with the batch size set to 4 the output looks like this:
[Screenshot: MNIST training log with batch size 4, showing the discriminator accuracy jumping between a handful of coarse values]
With batch size 4, the accuracy only ever takes values such as 0%, 25%, 50% and 100%. That is far too coarse and random to steer the model weights in a sensible direction.
In object detection tasks I sometimes set the batch size to 1 and drop the learning rate by an order of magnitude, which often gives a usable model quickly, so when I tried GANs I also wanted a very small batch size. I started from 1 and found the printed accuracy was always either 0% or 100%, and for a while I was convinced I had broken the code... It took me embarrassingly long (until the shower, literally) to realize why. In my experiments anything with batch size >= 32 worked fine.
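A toy illustration of the granularity argument (nothing GAN-specific, just arithmetic): the per-batch accuracy can only move in steps of 1/batch_size, so tiny batches give an extremely noisy signal.

import numpy as np

# Reachable per-batch accuracy values for a few batch sizes.
for batch_size in (4, 32, 128):
    reachable = np.arange(batch_size + 1) / batch_size
    print(batch_size, "-> step =", 1 / batch_size, ", first values:", reachable[:5])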
4.2 Optimizer
All weights were initialized from a zero-centered Normal distribution with standard deviation 0.02. In the LeakyReLU, the slope of the leak was set to 0.2 in all models. While previous GAN work has used momentum to accelerate training, we used the Adam optimizer (Kingma & Ba, 2014) with tuned hyperparameters. We found the suggested learning rate of 0.001, to be too high, using 0.0002 instead. Additionally, we found leaving the momentum term β1 at the suggested value of 0.9 resulted in training oscillation and instability while reducing it to 0.5 helped stabilize training
Here the authors pick the Adam optimizer. The Adam paper's suggested default learning rate of 0.001 turned out to be too high for GAN training, so they use 0.0002 instead, and the momentum term β1 is reduced from the default 0.9 to 0.5 to stabilize training.
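In PyTorch these settings would look roughly like this (netG and netD below are tiny stand-ins just so the snippet runs; the real DCGAN modules go there):

import torch
import torch.nn as nn

netG = nn.Sequential(nn.ConvTranspose2d(100, 3, 4))   # stand-in for the real generator
netD = nn.Sequential(nn.Conv2d(3, 1, 4))              # stand-in for the real discriminator

# Paper settings: Adam with lr = 0.0002 and beta1 = 0.5 (beta2 left at the default 0.999)
opt_G = torch.optim.Adam(netG.parameters(), lr=2e-4, betas=(0.5, 0.999))
opt_D = torch.optim.Adam(netD.parameters(), lr=2e-4, betas=(0.5, 0.999))

# Weight init from the paper: zero-centered Normal with standard deviation 0.02
for net in (netG, netD):
    for m in net.modules():
        if isinstance(m, (nn.Conv2d, nn.ConvTranspose2d)):
            nn.init.normal_(m.weight, 0.0, 0.02)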
[Figure: generator network architecture from the DCGAN paper (Fig. 1)]
The figure above is the generator architecture: the output image size is grown by stacking upsampling (fractionally-strided convolution) layers. The main thing to watch is that the tensor shapes of consecutive layers line up; when we use a GAN ourselves, the main job is likewise making sure our data matches the network's input layer. A walkthrough and modification of the official code will follow in a later post.
4.3 The core GAN objective
min_G max_D V(D, G) = E_{x~p_data(x)}[log D(x)] + E_{z~p_z(z)}[log(1 - D(G(z)))]
Although the objective is written as a single expression, it is easier to understand in two parts that mirror the GAN training procedure, as follows:

# Pseudocode for the alternating training scheme
def train_D():                     # train the discriminator
    for i in range(d_num):
        freeze(G)                  # G is fixed; its weights are not updated in this phase
        update(D)                  # maximize log D(x) + log(1 - D(G(z))), i.e. D's classification accuracy

def train_G():                     # train the generator
    for j in range(g_num):
        freeze(D)                  # the discriminator trained so far is fixed
        update(G)                  # minimize log(1 - D(G(z))), i.e. push D(G(z)) toward 1,
                                   # so that G's samples fool D as often as possible

for step in range(total):          # alternate between train_D and train_G
    if step % 2 == 0:
        train_D()
    else:
        train_G()
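Below is a slightly more concrete, runnable PyTorch sketch of the same alternating loop, using the non-saturating generator loss that most implementations use in practice (maximize log D(G(z)) instead of minimizing log(1 - D(G(z)))). The tiny linear netG/netD and the random "real" batch are placeholders so the snippet runs on its own; swap in real DCGAN modules and a real DataLoader.

import torch
import torch.nn as nn

nz = 100
netG = nn.Sequential(nn.Linear(nz, 784), nn.Tanh())      # toy generator stand-in
netD = nn.Sequential(nn.Linear(784, 1), nn.Sigmoid())    # toy discriminator stand-in
opt_G = torch.optim.Adam(netG.parameters(), lr=2e-4, betas=(0.5, 0.999))
opt_D = torch.optim.Adam(netD.parameters(), lr=2e-4, betas=(0.5, 0.999))
bce = nn.BCELoss()

for step in range(10):                        # stands in for iterating over a real dataloader
    real = torch.rand(128, 784)               # fake "real" batch, only to keep the sketch runnable
    b = real.size(0)

    # --- train D: maximize log D(x) + log(1 - D(G(z))); G is frozen in effect because only opt_D steps ---
    opt_D.zero_grad()
    fake = netG(torch.randn(b, nz)).detach()  # detach so no gradients flow back into G
    loss_D = bce(netD(real), torch.ones(b, 1)) + bce(netD(fake), torch.zeros(b, 1))
    loss_D.backward()
    opt_D.step()

    # --- train G: maximize log D(G(z)); D's weights are not updated because only opt_G steps ---
    opt_G.zero_grad()
    loss_G = bce(netD(netG(torch.randn(b, nz))), torch.ones(b, 1))
    loss_G.backward()
    opt_G.step()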

5. Evaluations and experiments

The sentence I find most useful here is this one:
One common technique for evaluating the quality of unsupervised representation learning algorithms is to apply them as a feature extractor on supervised datasets and evaluate the performance of linear models fitted on top of these features.
This describes how GAN training is evaluated: instead of watching the generator or discriminator loss, we take the trained discriminator, freeze its weights, use it as a feature extractor, and fit a linear model on top of those features. The performance of that classifier is the measure of how well the GAN was trained.
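A rough sketch of that evaluation protocol (a "linear probe"): freeze a discriminator trunk, read out its flattened conv features, and fit a linear model on top. Everything below (the tiny trunk, the random data) is a stand-in of my own; the paper itself trains a regularized L2-SVM on max-pooled features taken from all discriminator layers.

import torch
import torch.nn as nn
from sklearn.svm import LinearSVC
from sklearn.metrics import accuracy_score

# Toy frozen "discriminator trunk", standing in for a trained D with its sigmoid head removed.
trunk = nn.Sequential(nn.Conv2d(1, 16, 4, 2, 1), nn.LeakyReLU(0.2),
                      nn.Conv2d(16, 32, 4, 2, 1), nn.LeakyReLU(0.2))
for p in trunk.parameters():
    p.requires_grad_(False)                   # freeze: we only read features, never update D

def features(x):
    with torch.no_grad():
        return trunk(x).flatten(1).numpy()    # flattened conv features as the representation

# Random stand-in data (replace with e.g. MNIST images and labels).
x_train, y_train = torch.rand(200, 1, 28, 28), torch.randint(0, 10, (200,)).numpy()
x_test,  y_test  = torch.rand(50, 1, 28, 28),  torch.randint(0, 10, (50,)).numpy()

# A linear model fitted on top of the frozen features; its accuracy is the proxy
# for how good the representation learned during GAN training is.
clf = LinearSVC().fit(features(x_train), y_train)
print("linear probe accuracy:", accuracy_score(y_test, clf.predict(features(x_test))))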
