Deep Learning at NIPS2012

by Kevin Duh on Dec 17, 2012


I went to NIPS2012 to learn about recent advances in Deep Learning (see [1] for the conference pre-proceedings and [2] for a well-attended workshop program organized by Bengio, Bergstra, & Le). There were an amazing number of papers on this topic this year. To summarize, the area is developing extremely fast, and what we knew last year may no longer be "true" this year. For example, I had thought that an unsupervised training objective and auto-encoder/RBM building blocks were the key ingredients of Deep Learning. However, some have called this into question, demonstrating impressive empirical results with entirely different training architectures (e.g. stacked SVMs with random projections [3], sum-product networks [4]).
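
For readers who want a concrete picture of the "classic" ingredient being questioned here, below is a minimal numpy sketch of one auto-encoder layer trained with an unsupervised reconstruction objective; stacking such layers, each trained on the codes of the previous one, is the layer-wise pre-training recipe. It is purely illustrative and not the code of any system cited in this post.

```python
# Minimal sketch: one auto-encoder layer with tied weights, trained by
# gradient descent on squared reconstruction error. Illustrative only.
import numpy as np

rng = np.random.RandomState(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_autoencoder_layer(X, n_hidden, lr=0.1, epochs=50):
    """Train one auto-encoder layer on data X (n_samples x n_visible)."""
    n_visible = X.shape[1]
    W = rng.normal(scale=0.1, size=(n_visible, n_hidden))
    b_h = np.zeros(n_hidden)
    b_v = np.zeros(n_visible)
    for _ in range(epochs):
        H = sigmoid(X @ W + b_h)        # encode
        R = sigmoid(H @ W.T + b_v)      # decode (tied weights)
        err = R - X                     # reconstruction error
        # Backprop through decoder and encoder (squared-error loss).
        dR = err * R * (1 - R)
        dH = (dR @ W) * H * (1 - H)
        gW = X.T @ dH + dR.T @ H        # gradient w.r.t. the tied weights
        W -= lr * gW / len(X)
        b_h -= lr * dH.mean(axis=0)
        b_v -= lr * dR.mean(axis=0)
    return W, b_h

# Toy usage: stack two layers on random "data".
X = rng.rand(100, 20)
W1, b1 = train_autoencoder_layer(X, n_hidden=10)
H1 = sigmoid(X @ W1 + b1)               # codes from layer 1
W2, b2 = train_autoencoder_layer(H1, n_hidden=5)
```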


This raises the question: "What exactly is Deep Learning?" And to channel John Platt, "When is all this hype going to end?" Personally, I think all hype ends someday, but the ideas that repeatedly demonstrate robust empirical results will stay as part of our machine learning toolkit. This year, we are increasingly seeing Deep Learning being attempted on new problems beyond the usual CIFAR (image) and TIMIT (speech) datasets, such as 3-D object classification [5], text-image modeling [6], video [7], and computational biology [8]. I think this is a good development, since we need experience with more datasets in order to understand when Deep Learning works, and when it fails.


I especially enjoyed the talks that seek a deep understanding of Deep Learning. For example: Pascal Vincent described his contractive auto-encoders as a way to extract the manifold structure without relying on K nearest neighbors, which may be unreliable in high dimensions [9]. Montufar & Morton give proofs on the representational power of RBMs [10]. Coates et al. provide empirical evidence that, given sufficient scale, one can interpret k-means on images as learning selective features, and the clustering of these features as learning pooling/invariance [11]. Personally, I was most inspired by Stephane Mallat's invited talk [12], which viewed the problem through the lens of group theory. In classification, our goal is to reduce intra-class variability, which will then help inter-class separation. This can be achieved by characterizing a group G of symmetry operators that leaves the data S invariant (think: rotation, translation, scaling, and deformation in images). Although there are many causes of intra-class variation, if we can assume that this group can be factorized, then a deep architecture may be able to learn the stable parts/components. In the end, he conjectures that signal sparsity is an important enabler for learning these stable parts.
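
To make Vincent's idea a bit more concrete, here is a minimal numpy sketch of the contractive penalty as I understand it: the squared Frobenius norm of the Jacobian of the hidden code with respect to the input, which is added (weighted by a hyper-parameter) to the reconstruction loss so that the features stay flat along most input directions and only vary along the data manifold. This is a sketch of the penalty term only, not the authors' implementation [9].

```python
# Contractive penalty for a sigmoid encoder h = sigmoid(x W + b):
# mean squared Frobenius norm of dh/dx over a batch. Illustrative only.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def contractive_penalty(X, W, b):
    """Mean ||dh/dx||_F^2 over a batch X of shape (n_samples, n_visible)."""
    H = sigmoid(X @ W + b)              # hidden codes, shape (n, k)
    # For a sigmoid unit, dh_j/dx_i = h_j (1 - h_j) W_ij, so
    # ||J||_F^2 = sum_j (h_j (1 - h_j))^2 * sum_i W_ij^2.
    grad_factor = (H * (1.0 - H)) ** 2  # (n, k)
    col_norms = (W ** 2).sum(axis=0)    # (k,)
    return (grad_factor * col_norms).sum(axis=1).mean()

# Toy usage: penalty of a random encoder on random data.
rng = np.random.RandomState(0)
X = rng.rand(32, 20)
W = rng.normal(scale=0.1, size=(20, 10))
b = np.zeros(10)
print(contractive_penalty(X, W, b))
```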


On the practical side, I am happy to see that NIPS is accepting papers that focus on the implementation details of these algorithms. Jeff Dean's Google team gives specifics of how they implemented distributed SGD and L-BFGS optimization on tens of thousands of cores [13]. Krizhevsky et al. [14] describe some tricks for an efficient multi-GPU implementation, including the use of non-saturating non-linearities for fast gradient descent. There is no doubt that Deep Learning is enabled by large datasets and compute power. In the future we will likely see implementations in heterogeneous computing environments, in order to fully exploit the tradeoff between compute power and energy efficiency. Further, Bergstra argues that for Deep Learning to be widely adopted, the most urgent needs are tools for high-throughput hyper-parameter optimization [15], in addition to open-source packages such as Theano and Torch.
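
As an illustration of the random-search recipe Bergstra advocates [15], the sketch below draws each hyper-parameter independently from a prior and keeps the best trial. The evaluate() function is a made-up stand-in for "train the model and return a validation score", and the particular hyper-parameters and ranges are my own assumptions, not taken from the paper.

```python
# Random search over hyper-parameters: sample configurations independently
# from priors, evaluate each, keep the best. Trials are embarrassingly
# parallel. Illustrative sketch only.
import math
import random

random.seed(0)

def sample_config():
    """Draw one hyper-parameter configuration (ranges are hypothetical)."""
    return {
        "learning_rate": 10 ** random.uniform(-4, -1),   # log-uniform prior
        "n_hidden": random.choice([128, 256, 512, 1024]),
        "dropout": random.uniform(0.0, 0.5),
    }

def evaluate(cfg):
    """Stand-in for 'train the model and return validation accuracy'."""
    return -abs(math.log10(cfg["learning_rate"]) + 2.5) - cfg["dropout"]

best_cfg, best_score = None, float("-inf")
for trial in range(50):
    cfg = sample_config()
    score = evaluate(cfg)
    if score > best_score:
        best_cfg, best_score = cfg, score

print(best_cfg, best_score)
```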


So what is in store for the future? In the workshop panel, Richard Socher advocates moving beyond flat classification and incorporating structure [16]. Bruno Olshausen encourages researchers to consider a broader set of problems and architectures in order to gain more insight into the brain's neural representations [17]. I'd be very happy to see Deep Learning develop further along these two directions next year.

Last but not least, I should note that NIPS2012 isn't just about Deep Learning. There are many other exciting developments, including spectral learning, stochastic optimization, network modeling, and Bayesian methods. Here are some other blog posts I enjoyed that give good summaries [18-21]. Feel free to add anything or comment!


[1] NIPS2012 pre-proceedings: http://books.nips.cc/nips25.html
[2] Deep Learning Workshop: schedule page, deeplearningnips2012 workshop site
[3] Vinyals, Ji, Deng, & Darrell. Learning with Recursive Perceptual Representations. http://books.nips.cc/papers/files/nips25/NIPS2012_1290.pdf
[4] Gens & Domingos. Discriminative Learning of Sum-Product Networks. http://books.nips.cc/papers/files/nips25/NIPS2012_1484.pdf
[5] Socher, Huval, Bhat, Manning, Ng. Convolutional-Recursive Deep Learning for 3D Object Classification. http://books.nips.cc/papers/files/nips25/NIPS2012_0304.pdf
[6] Srivastava & Salakhutdinov. Multimodal learning with Deep Boltzmann Machines. http://books.nips.cc/papers/files/nips25/NIPS2012_1105.pdf
[7] Zou, Ng, Zhu, Yu. Deep Learning of Invariant Features via Simulated Fixations in Video. http://books.nips.cc/papers/files/nips25/NIPS2012_1467.pdf
[8] Di Lena, Baldi, Nagata. Deep Spatio-Temporal Architectures and Learning for Protein Structure Prediction. http://books.nips.cc/papers/files/nips25/NIPS2012_0263.pdf
[9] Vincent. Modeling the Data Manifold with Autoencoders and Unnormalized Probability Density Models. Talk at the deeplearningnips2012 workshop.
[10] Montufar and Morton. When Does a Mixture of Products Contain a Product of Mixtures? http://arxiv.org/abs/1206.0387
[11] Coates, Karpathy, Ng. Emergence of Object-Selective Features in Unsupervised Feature Learning. http://books.nips.cc/papers/files/nips25/NIPS2012_1248.pdf
[12] Mallat. Classification with Deep Invariant Scattering Networks. http://nips.cc/Conferences/2012/Program/event.php?ID=3127
[13] Dean et al. Large Scale Distributed Deep Networks. http://books.nips.cc/papers/files/nips25/NIPS2012_0598.pdf
[14] Krizhevsky, Sutskever, Hinton. ImageNet Classification with Deep Convolutional Neural Networks. http://books.nips.cc/papers/files/nips25/NIPS2012_0534.pdf
[15] Bergstra & Bengio. Random Search for Hyper-parameter Optimization. http://www.jmlr.org/papers/volume13/bergstra12a/bergstra12a.pdf
[16] Socher. New Directions in Deep Learning: Structured Models, Tasks, and Datasets. Talk at the deeplearningnips2012 workshop.
[17] Olshausen. Can "Deep Learning" offer deep insights about Visual Representation? Talk at the deeplearningnips2012 workshop.
[18] Daume III. NLPers blog post: http://nlpers.blogspot.jp/2012/12/nips-stuff.html
[19] Park. Memming blog post: http://memming.wordpress.com/2012/12/15/nips-2012/
[20] Yue. Random Ponderings blog post: http://yyue.blogspot.jp/2012/12/thoughts-on-nips-2012.html
[21] Venkatasubramanian. Geomblog post: http://geomblog.blogspot.jp/2012/12/nips-ruminations-i.html

From: https://plus.google.com/106477459404474214513/posts/GTvauSD1b9q
