2016.4.15 nature deep learning review[1]

Today I had originally wanted to pay tribute to a classic, so I dug up the ancient backpropagation paper published in Nature, but I couldn't get through it... So instead I pulled out the 2015 Nature paper "Deep Learning", which amounts to a review, to read through. Its citations feel especially important, so this highly cited hub of a reference is worth studying carefully.

Related material:

Original English article:

http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.436.894&rep=rep1&type=pdf

Chinese translation:

http://www.csdn.net/article/2015-06-01/2824811

http://www.csdn.net/article/2015-06-02/2824825 

Visualization resources:

http://colah.github.io/


The abstract says that deep learning has achieved excellent results across many domains in recent years.


The first paragraph really sets up the big picture. The traditional ML workflow is: to perform a task such as classification, you first hand-design features and then run the downstream task on them, but this approach demands a lot of domain expertise and is hard to get working as engineering. Hence the field of representation learning: given the input data, learn features that make the target easy to discriminate, or in other words, re-express the raw data in a form that makes the subsequent classification (and other processing) convenient. Deep learning is remarkably powerful here: even knowing nothing about the domain, it can still extract features at different levels of abstraction and learn from them. By now it has been applied widely across all sorts of fields.

Below are some of the recent references:


Image recognition

1. Krizhevsky, A., Sutskever, I. & Hinton, G. ImageNet classification with deep convolutional neural networks. In Proc. Advances in Neural Information Processing Systems 25 1090–1098 (2012).

This report was a breakthrough that used convolutional nets to almost halve the error rate for object recognition, and precipitated the rapid adoption of deep learning by the computer vision community.

2. Farabet, C., Couprie, C., Najman, L. & LeCun, Y. Learning hierarchical features for scene labeling. IEEE Trans. Pattern Anal. Mach. Intell. 35, 1915–1929 (2013).

3. Tompson, J., Jain, A., LeCun, Y. & Bregler, C. Joint training of a convolutional network and a graphical model for human pose estimation. In Proc. Advances in Neural Information Processing Systems 27 1799–1807 (2014).

4. Szegedy, C. et al. Going deeper with convolutions. Preprint at http://arxiv.org/abs/1409.4842 (2014).

Speech recognition

5. Mikolov, T., Deoras, A., Povey, D., Burget, L. & Cernocky, J. Strategies for training large scale neural network language models. In Proc. Automatic Speech Recognition and Understanding 196–201 (2011).

6. Hinton, G. et al. Deep neural networks for acoustic modeling in speech recognition. IEEE Signal Processing Magazine 29, 82–97 (2012).

This joint paper from the major speech recognition laboratories, summarizing the breakthrough achieved with deep learning on the task of phonetic classification for automatic speech recognition, was the first major industrial application of deep learning.

7. Sainath, T., Mohamed, A.-R., Kingsbury, B. & Ramabhadran, B. Deep convolutional neural networks for LVCSR. In Proc. Acoustics, Speech and Signal Processing 8614–8618 (2013).

Drug molecules

8. Ma, J., Sheridan, R. P., Liaw, A., Dahl, G. E. & Svetnik, V. Deep neural nets as a method for quantitative structure-activity relationships. J. Chem. Inf. Model. 55, 263–274 (2015).

Particle accelerator data

9. Ciodaro, T., Deva, D., de Seixas, J. & Damazio, D. Online particle detection with neural networks based on topological calorimetry information. J. Phys. Conf. Series 368, 012030 (2012).

10. Kaggle. Higgs boson machine learning challenge. Kaggle https://www.kaggle.com/c/higgs-boson (2014).

Reconstructing brain circuits

11. Helmstaedter, M. et al. Connectomic reconstruction of the inner plexiform layer in the mouse retina. Nature 500, 168–174 (2013).

Gene expression and genetic disease

12. Leung, M. K., Xiong, H. Y., Lee, L. J. & Frey, B. J. Deep learning of the tissue-regulated splicing code. Bioinformatics 30, i121–i129 (2014).

13. Xiong, H. Y. et al. The human splicing code reveals new insights into the genetic determinants of disease. Science 347, 6218 (2015).


Natural language understanding

14. Collobert, R. et al. Natural language processing (almost) from scratch. J. Mach. Learn. Res. 12, 2493–2537 (2011).

Question answering

15. Bordes, A., Chopra, S. & Weston, J. Question answering with subgraph embeddings. In Proc. Empirical Methods in Natural Language Processing http://arxiv.org/abs/1406.3676v3 (2014).

Machine translation

16. Jean, S., Cho, K., Memisevic, R. & Bengio, Y. On using very large target vocabulary for neural machine translation. In Proc. ACL-IJCNLP http://arxiv.org/abs/1412.2007 (2015).

17. Sutskever, I., Vinyals, O. & Le, Q. V. Sequence to sequence learning with neural networks. In Proc. Advances in Neural Information Processing Systems 27 3104–3112 (2014).

This paper showed state-of-the-art machine translation results with the architecture introduced in ref. 72, with a recurrent network trained to read a sentence in one language, produce a semantic representation of its meaning, and generate a translation in another language.


The supervised learning section argues that the old pipeline of hand-extracted features followed by a linear classifier (or a shallow nonlinear one) does not work very well, whereas deep stacks of nonlinearities can extract invariant features and separate the salient content from the background.
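To make the limitation of linear classifiers concrete, the textbook example is XOR: no single half-space split gets all four points right, while one hidden layer of nonlinear units solves it exactly. A minimal numpy sketch (my own illustration with hand-picked weights, not code from the review):

```python
import numpy as np

# XOR: the classic task no linear classifier can solve.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 1, 1, 0], dtype=float)

# Two hand-chosen ReLU units plus a linear readout compute XOR exactly:
# out = relu(x1 + x2) - 2 * relu(x1 + x2 - 1).
W1 = np.array([[1., 1.],
               [1., 1.]])          # both hidden units sum the inputs...
b1 = np.array([0., -1.])           # ...but with different thresholds
w2 = np.array([1., -2.])           # linear readout of the hidden features

h = np.maximum(0.0, X @ W1 + b1)   # nonlinear hidden features
out = h @ w2
print(out)                         # [0. 1. 1. 0.] -- matches y
```

The point is that the hidden nonlinearity folds the input space so that a problem unsolvable by any single hyperplane becomes linearly separable in feature space.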

The rather laborious business of training at scale

18. Bottou, L. & Bousquet, O. The tradeoffs of large scale learning. In Proc. Advances in Neural Information Processing Systems 20 161–168 (2007).

Linear classifiers carving the input space into half-spaces

19. Duda, R. O. & Hart, P. E. Pattern Classification and Scene Analysis (Wiley, 1973).

Kernel methods

20. Schölkopf, B. & Smola, A. Learning with Kernels (MIT Press, 2002).

Gaussian kernels

21. Bengio, Y., Delalleau, O. & Le Roux, N. The curse of highly variable functions for local kernel machines. In Proc. Advances in Neural Information Processing Systems 18 107–114 (2005).


The section on multilayer architectures and backpropagation explains that such networks can be trained with the backpropagation algorithm, but in the 1990s people believed that inferring useful features from very little prior knowledge was nonsense, and that training would get stuck in poor local minima, so neural networks gradually fell out of favor. With large amounts of data, however, poor local minima are rarely a problem: whatever the initialization, the solutions found differ only slightly in quality. Then, early in this century, deep networks were revived: CIFAR-backed groups used features learned by unsupervised methods to initialize the network and then fine-tuned it with backpropagation, with excellent results, particularly on handwritten digit recognition and pedestrian detection. The practical advice of the time: if you have lots of labelled data, just train; if labelled data is scarce, pre-train on unlabelled data first. Convolutional neural networks have also risen rapidly in recent years, especially in computer vision.
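Since this whole section revolves around backpropagation, here is a minimal numpy sketch of the algorithm itself (my own illustration with arbitrary hyperparameters, not code from the review): a two-layer network fitted to XOR, with the backward pass applying the chain rule one layer at a time.

```python
import numpy as np

rng = np.random.default_rng(0)

# XOR data again; the hyperparameters below are arbitrary choices.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

W1 = rng.normal(scale=0.5, size=(2, 8)); b1 = np.zeros(8)
W2 = rng.normal(scale=0.5, size=(8, 1)); b2 = np.zeros(1)
lr = 0.5

for step in range(2000):
    # Forward pass.
    h = np.maximum(0.0, X @ W1 + b1)              # ReLU hidden layer
    p = 1.0 / (1.0 + np.exp(-(h @ W2 + b2)))      # sigmoid output

    # Backward pass: the chain rule applied layer by layer.
    # For sigmoid + cross-entropy, dLoss/dlogits = p - y.
    dlogits = (p - y) / len(X)
    dW2 = h.T @ dlogits;  db2 = dlogits.sum(axis=0)
    dh = dlogits @ W2.T
    dh[h <= 0.0] = 0.0                            # gradient gated by ReLU
    dW1 = X.T @ dh;       db1 = dh.sum(axis=0)

    # Gradient step.
    W1 -= lr * dW1;  b1 -= lr * db1
    W2 -= lr * dW2;  b2 -= lr * db2

print(np.round(p.ravel(), 2))  # should approach [0, 1, 1, 0] for most seeds
```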


Early pattern recognition

22. Selfridge, O. G. Pandemonium: a paradigm for learning in mechanisation of thought processes. In Proc. Symposium on Mechanisation of Thought Processes 513–526 (1958).

23. Rosenblatt, F. The Perceptron — A Perceiving and Recognizing Automaton. Tech. Rep. 85-460-1 (Cornell Aeronautical Laboratory, 1957).

In the 1980s and 1990s, training neural networks with simple stochastic gradient descent

24. Werbos, P. Beyond Regression: New Tools for Prediction and Analysis in the Behavioral Sciences. PhD thesis, Harvard Univ. (1974).

25. Parker, D. B. Learning Logic Report TR–47 (MIT Press, 1985).

26. LeCun, Y. Une procédure d’apprentissage pour Réseau à seuil assymétrique in Cognitiva 85: a la Frontière de l’Intelligence Artificielle, des Sciences de la Connaissance et des Neurosciences [in French] 599–604 (1985).

27. Rumelhart, D. E., Hinton, G. E. & Williams, R. J. Learning representations by back-propagating errors. Nature 323, 533–536 (1986).

Using ReLUs, which avoids the need for unsupervised pre-training (a small sketch of why follows the annotation below)

28. Glorot, X., Bordes, A. & Bengio, Y. Deep sparse rectifier neural networks. In Proc. 14th International Conference on Artificial Intelligence and Statistics 315–323 (2011).

This paper showed that supervised training of very deep neural networks is much faster if the hidden layers are composed of ReLU.
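A quick numpy illustration of the usual argument for why ReLUs train faster (my own sketch, not an analysis from ref. 28): the sigmoid's derivative never exceeds 0.25, so gradients passed back through many saturating layers shrink geometrically, while a ReLU passes the gradient through unchanged wherever the unit is active.

```python
import numpy as np

z = np.linspace(-4.0, 4.0, 9)
sig = 1.0 / (1.0 + np.exp(-z))

print(np.max(sig * (1.0 - sig)))    # sigmoid'(z) <= 0.25 everywhere
print(0.25 ** 10)                   # ~1e-6: gradient after 10 saturating layers
print(np.where(z > 0.0, 1.0, 0.0))  # ReLU'(z) is exactly 1 for active units
```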

There are hardly any poor local minima to speak of; what you actually run into are saddle points (a small demo follows these two references)

29. Dauphin, Y. et al. Identifying and attacking the saddle point problem in high-dimensional non-convex optimization. In Proc. Advances in Neural Information Processing Systems 27 2933–2941 (2014).

30. Choromanska, A., Henaff, M., Mathieu, M., Arous, G. B. & LeCun, Y. The loss surface of multilayer networks. In Proc. Conference on AI and Statistics http://arxiv.org/abs/1412.0233 (2014).
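A toy numpy demo of the saddle-point picture (my own example, not from refs 29–30), using f(x, y) = x² − y²: the gradient vanishes at the origin, but the Hessian has eigenvalues of both signs, and gradient descent started nearby escapes along the descending direction.

```python
import numpy as np

# f(x, y) = x**2 - y**2: gradient (2x, -2y) vanishes at the origin,
# but the Hessian has eigenvalues of both signs -- a saddle, not a minimum.
H = np.array([[2.0, 0.0],
              [0.0, -2.0]])
print(np.linalg.eigvalsh(H))        # [-2.  2.]: one descending direction

# Gradient descent started near (0, 0) escapes along the y direction:
p = np.array([1e-3, 1e-3])
for _ in range(50):
    grad = np.array([2.0 * p[0], -2.0 * p[1]])
    p = p - 0.1 * grad
print(p)                            # x shrinks toward 0, y grows large
```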

The revival of deep networks

31. Hinton, G. E. What kind of graphical model is the brain? In Proc. 19th International Joint Conference on Artificial intelligence 1765–1775 (2005).

32. Hinton, G. E., Osindero, S. & Teh, Y.-W. A fast learning algorithm for deep belief nets. Neural Comp. 18, 1527–1554 (2006).

This paper introduced a novel and effective way of training very deep neural networks by pre-training one hidden layer at a time using the unsupervised learning procedure for restricted Boltzmann machines.

33. Bengio, Y., Lamblin, P., Popovici, D. & Larochelle, H. Greedy layer-wise training of deep networks. In Proc. Advances in Neural Information Processing Systems 19 153–160 (2006).

This report demonstrated that the unsupervised pre-training method introduced in ref. 32 significantly improves performance on test data and generalizes the method to other unsupervised representation-learning techniques, such as auto-encoders.

34. Ranzato, M., Poultney, C., Chopra, S. & LeCun, Y. Efficient learning of sparse representations with an energy-based model. In Proc. Advances in Neural Information Processing Systems 19 1137–1144 (2006).

Unsupervised initialization followed by backprop fine-tuning (refs 33–35)


35. Hinton, G. E. & Salakhutdinov, R. Reducing the dimensionality of data with neural networks. Science 313, 504–507 (2006).

Pre-training + fine-tuning on small datasets, applied to handwritten digit recognition and pedestrian detection

36. Sermanet, P., Kavukcuoglu, K., Chintala, S. & LeCun, Y. Pedestrian detection with unsupervised multi-stage feature learning. In Proc. International Conference on Computer Vision and Pattern Recognition http://arxiv.org/abs/1212.0142 (2013).

Training on GPUs

37. Raina, R., Madhavan, A. & Ng, A. Y. Large-scale deep unsupervised learning using graphics processors. In Proc. 26th Annual International Conference on Machine Learning 873–880 (2009).

Major breakthroughs in speech recognition: on small datasets (ref. 38) and on large datasets (ref. 39)

38. Mohamed, A.-R., Dahl, G. E. & Hinton, G. Acoustic modeling using deep belief networks. IEEE Trans. Audio Speech Lang. Process. 20, 14–22 (2012).

39. Dahl, G. E., Yu, D., Deng, L. & Acero, A. Context-dependent pre-trained deep neural networks for large vocabulary speech recognition. IEEE Trans. Audio Speech Lang. Process. 20, 33–42 (2012).

Pre-training to prevent overfitting on small datasets

40. Bengio, Y., Courville, A. & Vincent, P. Representation learning: a review and new perspectives. IEEE Trans. Pattern Anal. Machine Intell. 35, 1798–1828 (2013).

Convolutional neural networks

41. LeCun, Y. et al. Handwritten digit recognition with a back-propagation network. In Proc. Advances in Neural Information Processing Systems 396–404 (1990).

This is the first paper on convolutional networks trained by backpropagation for the task of classifying low-resolution images of handwritten digits.

42. LeCun, Y., Bottou, L., Bengio, Y. & Haffner, P. Gradient-based learning applied to document recognition. Proc. IEEE 86, 2278–2324 (1998).

This overview paper on the principles of end-to-end training of modular systems such as deep neural networks using gradient-based optimization showed how neural networks (and in particular convolutional nets) can be combined with search or inference mechanisms to model complex outputs that are interdependent, such as sequences of characters associated with the content of a document.


The convolutional neural network section covers the classic layer types, such as convolutional layers and pooling layers, and the classic features they compute.
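A minimal numpy sketch of these two layer types (my own illustration, not code from the review): the convolution slides one shared set of weights over the image, and max pooling then summarizes each local patch, giving some invariance to small shifts.

```python
import numpy as np

def conv2d(img, kernel):
    """Valid 2-D convolution with a single shared kernel."""
    kh, kw = kernel.shape
    H, W = img.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # The same weights are applied at every position (weight sharing).
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * kernel)
    return out

def max_pool(fmap, size=2):
    """Non-overlapping max pooling over size x size patches."""
    H, W = fmap.shape
    trimmed = fmap[:H - H % size, :W - W % size]
    return trimmed.reshape(H // size, size, W // size, size).max(axis=(1, 3))

img = np.random.default_rng(0).random((8, 8))
edge = np.array([[1., -1.]])                # a tiny horizontal edge detector
fmap = np.maximum(0.0, conv2d(img, edge))   # convolution + ReLU
print(max_pool(fmap).shape)                 # (4, 3) pooled feature map
```

Weight sharing is what makes the layer respond to a pattern wherever it occurs, and pooling makes the response tolerant of small displacements.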

People always justify convolution's local connectivity by saying that a local feature may also appear elsewhere in the image, but I feel this is really more about probability: what I am actually seeing is some probability distribution over occurrences of the pattern.

Back to the paper: the overall architecture is very similar to the visual hierarchy LGN–V1–V2–V4–IT. When a monkey and a convnet are shown the same picture, the convnet's high-level units behave much like randomly sampled neurons in a certain region of the monkey's brain? (roughly my translation). Convnets originated from the neocognitron: the architectures share some similarities, but the neocognitron had no end-to-end supervised learning algorithm like backpropagation. A one-dimensional convnet can be called a time-delay neural network, and can be used to recognize phonemes and simple words.

Looking back at the 1990s, there were many applications of time-delay neural networks (1-D convnets), for example in speech recognition and document reading. Document reading systems used a convnet trained jointly with a probabilistic model that implemented language constraints. By the late 1990s such a system was already reading over 10% of checks. Convnet-based optical character recognition and handwriting recognition were later developed at Microsoft. In the early 1990s, convnets were also used for detection in natural images, for example face and hand detection, as well as face recognition.

Visual neurons inspired the convolutional and pooling layers

43. Hubel, D. H. & Wiesel, T. N. Receptive fields, binocular interaction, and functional architecture in the cat’s visual cortex. J. Physiol. 160, 106–154 (1962).

44. Felleman, D. J. & Essen, D. C. V. Distributed hierarchical processing in the primate cerebral cortex. Cereb. Cortex 1, 1–47 (1991).

A study on how a convnet's high-level units compare with a monkey's neurons when viewing the same image

45. Cadieu, C. F. et al. Deep neural networks rival the representation of primate IT cortex for core visual object recognition. PLoS Comp. Biol. 10, e1003963 (2014).

The relationship between convnets and the neocognitron

46. Fukushima, K. & Miyake, S. Neocognitron: a new algorithm for pattern recognition tolerant of deformations and shifts in position. Pattern Recognition 15, 455–469 (1982).

1-D convnets (time-delay neural networks) for recognizing phonemes and simple words

47. Waibel, A., Hanazawa, T., Hinton, G. E., Shikano, K. & Lang, K. Phoneme recognition using time-delay neural networks. IEEE Trans. Acoustics Speech Signal Process. 37, 328–339 (1989).

48. Bottou, L., Fogelman-Soulié, F., Blanchet, P. & Lienard, J. Experiments with time delay networks and dynamic time warping for speaker independent isolated digit recognition. In Proc. EuroSpeech 89 537–540 (1989).

Microsoft's work on optical character recognition and handwritten digit recognition

49. Simard, D., Steinkraus, P. Y. & Platt, J. C. Best practices for convolutional neural networks. In Proc. Document Analysis and Recognition 958–963 (2003).

Object detection in natural images

50. Vaillant, R., Monrocq, C. & LeCun, Y. Original approach for the localisation of objects in images. In Proc. Vision, Image, and Signal Processing 141, 245–250 (1994).

51. Nowlan, S. & Platt, J. in Neural Information Processing Systems 901–908 (1995).

Face recognition

52. Lawrence, S., Giles, C. L., Tsoi, A. C. & Back, A. D. Face recognition: a convolutional neural-network approach. IEEE Trans. Neural Networks 8, 98–113 (1997).
