Generative Adversarial Nets Translation (Part 1)
4.2 Convergence of Algorithm 1
Proposition 2. If G and D have enough capacity, and at each step of Algorithm 1, the discriminator is allowed to reach its optimum given G, and p_g is updated so as to improve the criterion

E_{x∼p_data}[log D*_G(x)] + E_{x∼p_g}[log(1 − D*_G(x))]

then p_g converges to p_data.
Proof. Consider V(G, D) = U(p_g, D) as a function of p_g as done in the above criterion. Note that U(p_g, D) is convex in p_g. The subderivatives of a supremum of convex functions include the derivative of the function at the point where the maximum is attained. In other words, if f(x) = sup_{α∈A} f_α(x) and f_α(x) is convex in x for every α, then ∂f_β(x) ∈ ∂f if β = arg sup_{α∈A} f_α(x). This is equivalent to computing a gradient descent update for p_g at the optimal D given the corresponding G. sup_D U(p_g, D) is convex in p_g with a unique global optimum as proven in Thm 1, therefore with sufficiently small updates of p_g, p_g converges to p_x, concluding the proof.

In practice, adversarial nets represent a limited family of p_g distributions via the function G(z; θ_g), and we optimize θ_g rather than p_g itself. Using a multilayer perceptron to define G introduces multiple critical points in parameter space. However, the excellent performance of multilayer perceptrons in practice suggests that they are a reasonable model to use despite their lack of theoretical guarantees.
5 Experiments
We trained adversarial nets on a range of datasets including MNIST [23], the Toronto Face Database (TFD) [28], and CIFAR-10 [21]. The generator nets used a mixture of rectifier linear activations [19, 9] and sigmoid activations, while the discriminator net used maxout [10] activations. Dropout [17] was applied in training the discriminator net. While our theoretical framework permits the use of dropout and other noise at intermediate layers of the generator, we used noise as the input to only the bottommost layer of the generator network.
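The paper specifies the activation functions but not the layer sizes or exact wiring. The following NumPy sketch shows forward passes consistent with that description; the shapes, helper names, and single hidden layer per network are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    # Rectifier linear activation used in the generator's hidden layer
    return np.maximum(x, 0.0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def generator(z, W1, b1, W2, b2):
    # Noise z enters only at the bottommost layer; the sigmoid output
    # keeps samples in [0, 1], e.g. pixel intensities.
    h = relu(z @ W1 + b1)
    return sigmoid(h @ W2 + b2)

def maxout(x, W, b):
    # Maxout activation: W has shape (d_in, k, d_out); each output unit
    # takes the max over its k linear pieces.
    return np.max(np.einsum("nd,dko->nko", x, W) + b, axis=1)

def discriminator(x, W1, b1, w2, b2, p_drop=0.5, train=True):
    # Maxout hidden layer with (inverted) dropout, sigmoid output D(x).
    h = maxout(x, W1, b1)
    if train:
        mask = rng.random(h.shape) >= p_drop
        h = h * mask / (1.0 - p_drop)
    return sigmoid(h @ w2 + b2)
```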
We estimate the probability of the test set data under p_g by fitting a Gaussian Parzen window to the samples generated with G and reporting the log-likelihood under this distribution. The σ parameter of the Gaussians was obtained by cross validation on the validation set. This procedure was introduced in Breuleux et al. [8] and used for various generative models for which the exact likelihood is not tractable [25, 3, 5]. Results are reported in Table 1. This method of estimating the likelihood has somewhat high variance and does not perform well in high dimensional spaces, but it is the best method available to our knowledge. Advances in generative models that can sample but not estimate likelihood directly motivate further research into how to evaluate such models.
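As a concrete illustration of this evaluation, here is a small NumPy sketch of a Gaussian Parzen window estimator and the σ selection step. The function names and the candidate bandwidth grid are hypothetical; this is not the evaluation code used in the paper (which, per the acknowledgments, was shared by Yann Dauphin).

```python
import numpy as np

def parzen_log_likelihood(samples, points, sigma):
    # Log-likelihood of `points` under an isotropic Gaussian Parzen
    # window centred on generated `samples`.
    # samples: (n, d) generator outputs; points: (m, d) data; sigma: bandwidth.
    n, d = samples.shape
    diffs = points[:, None, :] - samples[None, :, :]             # (m, n, d)
    exponents = -np.sum(diffs ** 2, axis=2) / (2.0 * sigma ** 2)  # (m, n)
    # Numerically stable log-mean-exp over the n kernel centres.
    m_ = exponents.max(axis=1, keepdims=True)
    log_mean = m_[:, 0] + np.log(np.mean(np.exp(exponents - m_), axis=1))
    # Normalising constant of a d-dimensional isotropic Gaussian.
    log_norm = -0.5 * d * np.log(2.0 * np.pi * sigma ** 2)
    return log_mean + log_norm                                    # (m,)

def cross_validate_sigma(samples, valid, candidates):
    # Pick the bandwidth maximizing mean validation log-likelihood,
    # as described above.
    scores = [parzen_log_likelihood(samples, valid, s).mean()
              for s in candidates]
    return candidates[int(np.argmax(scores))]

# Hypothetical usage:
# sigma = cross_validate_sigma(G_samples, X_valid, np.logspace(-1, 0, 10))
```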
Table 1: Parzen window-based log-likelihood estimates. The reported numbers on MNIST are the mean log-likelihood of samples on the test set, with the standard error of the mean computed across examples. On TFD, σ was cross validated on each fold using that fold's validation set, the mean log-likelihood was computed on each fold, and we report the standard error across folds. For MNIST we compare against other models of the real-valued (rather than binary) version of the dataset.
In Figures 2 and 3 we show samples drawn from the generator net after training. While we make no claim that these samples are better than samples generated by existing methods, we believe that these samples are at least competitive with the better generative models in the literature and highlight the potential of the adversarial framework.
Figure 2: Visualization of samples from the model. The rightmost column shows the nearest training example of the neighboring sample, in order to demonstrate that the model has not memorized the training set. Samples are fair random draws, not cherry-picked. Unlike most other visualizations of deep generative models, these images show actual samples from the model distributions, not conditional means given samples of hidden units. Moreover, these samples are uncorrelated because the sampling process does not depend on Markov chain mixing. a) MNIST b) TFD c) CIFAR-10 (fully connected model) d) CIFAR-10 (convolutional discriminator and "deconvolutional" generator)
Figure 3: Digits obtained by linearly interpolating between coordinates in z space of the full model.
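The interpolation behind Figure 3 is simple to reproduce. A minimal sketch, assuming a trained generator like the illustrative one above:

```python
import numpy as np

def interpolate_in_z(z_start, z_end, steps=10):
    # Linear interpolation between two noise vectors; decoding each
    # intermediate z with G yields the image sequence of Figure 3.
    alphas = np.linspace(0.0, 1.0, steps)[:, None]
    return (1.0 - alphas) * z_start[None, :] + alphas * z_end[None, :]

# Hypothetical usage, with the sketch generator defined earlier:
# images = generator(interpolate_in_z(z0, z1), W1, b1, W2, b2)
```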
Table 2: Challenges in generative modeling: a summary of the difficulties encountered by different approaches to deep generative modeling for each of the major operations involving a model.
6 Advantages and disadvantages
This new framework comes with advantages and disadvantages relative to previous modeling frameworks. The disadvantages are primarily that there is no explicit representation of p_g(x), and that D must be synchronized well with G during training (in particular, G must not be trained too much without updating D, in order to avoid "the Helvetica scenario" in which G collapses too many values of z to the same value of x to have enough diversity to model p_data), much as the negative chains of a Boltzmann machine must be kept up to date between learning steps. The advantages are that Markov chains are never needed, only backprop is used to obtain gradients, no inference is needed during learning, and a wide variety of functions can be incorporated into the model. Table 2 summarizes the comparison of generative adversarial nets with other generative modeling approaches.
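The synchronization constraint shows up directly in the shape of the training loop. The following schematic sketch of Algorithm 1's alternating schedule is illustrative: the update callables are placeholders supplied by the caller, and k (the number of discriminator steps per generator step; the paper's experiments used k = 1) controls how closely D tracks G.

```python
def adversarial_training(update_d, update_g, num_iterations, k=1):
    # Alternating schedule: k discriminator updates per generator update
    # keep D near its optimum for the current G. Training G for too long
    # without refreshing D risks the "Helvetica scenario" described above.
    for _ in range(num_iterations):
        for _ in range(k):
            update_d()  # ascend E_x[log D(x)] + E_z[log(1 - D(G(z)))]
        update_g()      # update G only via gradients flowing through D
```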
The aforementioned advantages are primarily computational. Adversarial models may also gain some statistical advantage from the generator network not being updated directly with data examples, but only with gradients flowing through the discriminator. This means that components of the input are not copied directly into the generator’s parameters. Another advantage of adversarial networks is that they can represent very sharp, even degenerate distributions, while methods based on Markov chains require that the distribution be somewhat blurry in order for the chains to be able to mix between modes.
7 Conclusions and future work
This framework admits many straightforward extensions:
1. A conditional generative model p(x | c) can be obtained by adding c as input to both G and D (see the sketch after this list).
2. Learned approximate inference can be performed by training an auxiliary network to predict z given x. This is similar to the inference net trained by the wake-sleep algorithm [15] but with the advantage that the inference net may be trained for a fixed generator net after the generator net has finished training.
3. One can approximately model all conditionals p(x_S | x_{\not S}), where S is a subset of the indices of x, by training a family of conditional models that share parameters. Essentially, one can use adversarial nets to implement a stochastic extension of the deterministic MP-DBM [11].
4. Semi-supervised learning: features from the discriminator or inference net could improve performance of classifiers when limited labeled data is available.
5. Efficiency improvements: training could be accelerated greatly by devising better methods for coordinating G and D or determining better distributions to sample z from during training.
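For extension 1 above, conditioning amounts to feeding the side information c to both networks. A minimal sketch, assuming plain rectifier layers for brevity rather than the maxout and dropout details used in the experiments:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def conditional_generator(z, c, Wz, Wc, b1, W2, b2):
    # G(z, c): project noise z and condition c into a shared hidden
    # layer, so samples are drawn from a model of p(x | c).
    h = np.maximum(z @ Wz + c @ Wc + b1, 0.0)
    return sigmoid(h @ W2 + b2)

def conditional_discriminator(x, c, Wx, Wc, b1, w2, b2):
    # D(x, c): the discriminator judges (x, c) pairs, pushing G toward
    # samples that are plausible for the given c.
    h = np.maximum(x @ Wx + c @ Wc + b1, 0.0)
    return sigmoid(h @ w2 + b2)
```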
This paper has demonstrated the viability of the adversarial modeling framework, suggesting that these research directions could prove useful.
Acknowledgments
We would like to acknowledge Patrice Marcotte, Olivier Delalleau, Kyunghyun Cho, Guillaume Alain and Jason Yosinski for helpful discussions. Yann Dauphin shared his Parzen window evaluation code with us. We would like to thank the developers of Pylearn2 [12] and Theano [7, 1], particularly Frédéric Bastien who rushed a Theano feature specifically to benefit this project. Arnaud Bergeron provided much-needed support with LaTeX typesetting. We would also like to thank CIFAR and Canada Research Chairs for funding, and Compute Canada and Calcul Québec for providing computational resources. Ian Goodfellow is supported by the 2013 Google Fellowship in Deep Learning. Finally, we would like to thank Les Trois Brasseurs for stimulating our creativity.
References
[1] Bastien, F., Lamblin, P., Pascanu, R., Bergstra, J., Goodfellow, I. J., Bergeron, A., Bouchard, N., and Bengio, Y. (2012). Theano: new features and speed improvements. Deep Learning and Unsupervised Feature Learning NIPS 2012 Workshop.
[2] Bengio, Y. (2009). Learning deep architectures for AI. Now Publishers.
[3] Bengio, Y., Mesnil, G., Dauphin, Y., and Rifai, S. (2013a). Better mixing via deep representations. In ICML'13.
[4] Bengio, Y., Yao, L., Alain, G., and Vincent, P. (2013b). Generalized denoising auto-encoders as generative models. In NIPS 26. NIPS Foundation.
[5] Bengio, Y., Thibodeau-Laufer, E., and Yosinski, J. (2014a). Deep generative stochastic networks trainable by backprop. In ICML'14.
[6] Bengio, Y., Thibodeau-Laufer, E., Alain, G., and Yosinski, J. (2014b). Deep generative stochastic networks trainable by backprop. In Proceedings of the 30th International Conference on Machine Learning (ICML'14).
[7] Bergstra, J., Breuleux, O., Bastien, F., Lamblin, P., Pascanu, R., Desjardins, G., Turian, J., Warde-Farley, D., and Bengio, Y. (2010). Theano: a CPU and GPU math expression compiler. In Proceedings of the Python for Scientific Computing Conference (SciPy). Oral presentation.
[8] Breuleux, O., Bengio, Y., and Vincent, P. (2011). Quickly generating representative samples from an RBM-derived process. Neural Computation, 23(8), 2053–2073.
[9] Glorot, X., Bordes, A., and Bengio, Y. (2011). Deep sparse rectifier neural networks. In AISTATS'2011.
[10] Goodfellow, I. J., Warde-Farley, D., Mirza, M., Courville, A., and Bengio, Y. (2013a). Maxout networks. In ICML'2013.
[11] Goodfellow, I. J., Mirza, M., Courville, A., and Bengio, Y. (2013b). Multi-prediction deep Boltzmann machines. In NIPS'2013.
[12] Goodfellow, I. J., Warde-Farley, D., Lamblin, P., Dumoulin, V., Mirza, M., Pascanu, R., Bergstra, J., Bastien, F., and Bengio, Y. (2013c). Pylearn2: a machine learning research library. arXiv preprint arXiv:1308.4214.
[13] Gutmann, M. and Hyvärinen, A. (2010). Noise-contrastive estimation: A new estimation principle for unnormalized statistical models. In AISTATS'2010.
[14] Hinton, G., Deng, L., Dahl, G. E., Mohamed, A., Jaitly, N., Senior, A., Vanhoucke, V., Nguyen, P., Sainath, T., and Kingsbury, B. (2012a). Deep neural networks for acoustic modeling in speech recognition. IEEE Signal Processing Magazine, 29(6), 82–97.
[15] Hinton, G. E., Dayan, P., Frey, B. J., and Neal, R. M. (1995). The wake-sleep algorithm for unsupervised neural networks. Science, 268, 1158–1161.
[16] Hinton, G. E., Osindero, S., and Teh, Y. (2006). A fast learning algorithm for deep belief nets. Neural Computation, 18, 1527–1554.
[17] Hinton, G. E., Srivastava, N., Krizhevsky, A., Sutskever, I., and Salakhutdinov, R. (2012b). Improving neural networks by preventing co-adaptation of feature detectors. Technical report, arXiv:1207.0580.
[18] Hyvärinen, A. (2005). Estimation of non-normalized statistical models by score matching. Journal of Machine Learning Research, 6.
[19] Jarrett, K., Kavukcuoglu, K., Ranzato, M., and LeCun, Y. (2009). What is the best multi-stage architecture for object recognition? In Proc. International Conference on Computer Vision (ICCV'09), pages 2146–2153. IEEE.
[20] Kingma, D. P. and Welling, M. (2014). Auto-encoding variational Bayes. In Proceedings of the International Conference on Learning Representations (ICLR).
[21] Krizhevsky, A. and Hinton, G. (2009). Learning multiple layers of features from tiny images. Technical report, University of Toronto.
[22] Krizhevsky, A., Sutskever, I., and Hinton, G. (2012). ImageNet classification with deep convolutional neural networks. In NIPS'2012.
[23] LeCun, Y., Bottou, L., Bengio, Y., and Haffner, P. (1998). Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11), 2278–2324.
[24] Rezende, D. J., Mohamed, S., and Wierstra, D. (2014). Stochastic backpropagation and approximate inference in deep generative models. Technical report, arXiv:1401.4082.
[25] Rifai, S., Bengio, Y., Dauphin, Y., and Vincent, P. (2012). A generative process for sampling contractive auto-encoders. In ICML'12.
[26] Salakhutdinov, R. and Hinton, G. E. (2009). Deep Boltzmann machines. In AISTATS'2009, pages 448–455.
[27] Smolensky, P. (1986). Information processing in dynamical systems: Foundations of harmony theory. In D. E. Rumelhart and J. L. McClelland, editors, Parallel Distributed Processing, volume 1, chapter 6, pages 194–281. MIT Press, Cambridge.
[28] Susskind, J., Anderson, A., and Hinton, G. E. (2010). The Toronto face dataset. Technical Report UTML TR 2010-001, University of Toronto.
[29] Tieleman, T. (2008). Training restricted Boltzmann machines using approximations to the likelihood gradient. In W. W. Cohen, A. McCallum, and S. T. Roweis, editors, ICML 2008, pages 1064–1071. ACM.
[30] Vincent, P., Larochelle, H., Bengio, Y., and Manzagol, P.-A. (2008). Extracting and composing robust features with denoising autoencoders. In ICML 2008.
[31] Younes, L. (1999). On the convergence of Markovian stochastic algorithms with rapidly decreasing ergodicity rates. Stochastics and Stochastic Reports, 65(3), 177–228.
Source: http://tongtianta.site/paper/14506
Edited by Lornatang
Proofread by Lornatang