2015年深度学习淘金热 The Deep Learning Gold Rush of 2015

The Deep Learning Gold Rush of 2015

In the last few decades, we have witnessed major technological innovations such as personal computers and the internet finally reach the mainstream. And with mobile devices and social networks on the rise, we're now more connected than ever. So what's next? When is it coming? And how will it change our lives? Today I'll tell you that the next big advance is well underway and it's being fueled by a recent technique in the field of Artificial Intelligence  known as  Deep Learning.


The California Gold Rush of 2015 is all about Deep Learning. 
It's everywhere, you just don't know how to look.


All of today's excitement in Artificial Intelligence and Machine Learning stems from ground-breaking results in speech and visual object recognition using Deep Learning[1]. These algorithms are being applied to all sorts of data, and the learned deep neural networks outperform traditional expert systems carefully designed by scientists and engineers. End-to-end learning of deep representations from raw data is now possible due to a handful of well-performing deep learning recipes ( ConvNets, Dropout,  ReLUs,  LSTM,  DQN,  ImageNet). But if there's one final takeaway that we can extract from decades of machine learning research, is that for many problems going deep isn't a choice, it's often a  requirement.

Most of the apps and services you're already using (AirBnB, Snapchat, Twitch.tv, Uber, Yelp, LinkedIn, etc) are quite data-hungry and before you know it, they're all going to go mega-deep. So whether you need to revitalize your data science team with deep learning or you're starting an AI-from-day-one operation, it's pretty clear that everybody is rushing to get some of this  Silicon Valley Gold.

From Titans to Gold Miners: Your atypical Gold Rush

Like all great gold rushes, this movement is led by new faces, which are pouring into Silicon Valley like droves. But these aren't your typical unskilled immigrants willing to pick up a hammer, nor your fresh computer science grads with some app-writing skills. The key deep learning players of today (known as the Titans of Deep Learning) are computer science professors and researchers (seldom born in the USA) leaving their academic posts and bringing their students and ideas straight into Silicon Valley.

" Turn on, Tune in, Dropout" -- Timothy Leary

Recently, Google and Facebook announced that their operations are now being powered by Deep Learning [2,3]. And with most Deep Learning Titans representing the tech giants ( Yann LeCun at Facebook Research,  Geoffrey Hinton at Google, Andrew Ng at Baidu),  Deep Learning is likely to become one of the most sought after tech skills. With  Toyota to invest in $1 Billion in Robotics and Artificial Intelligence Research (November 6, 2015), the announcement of  YC Research (October 7, 2015), and the new  Google Brain Residency Program "Pre-doc" AI jobs (October 26, 2015), Silicon Valley just got a whole lot more interesting.

Silicon Valley re-defines itself, yet again 

To understand why it took so long for Deep Learning to take-off, let's take a brief look at the key technologies which defined Silicon Valley over the last 50 years.  The following timeline gives an overview of where Silicon Valley has been and where it's going.

Figure Adapted from Steve Blank's Secret History of Silicon Valley


1970s: Semiconductors 
The story of the digital-era starts with semiconductors. "Silicon" in "Silicon Valley" originally referred to the silicon chip or integrated circuit innovations as well as the location (close to Stanford) of much tech-related activity. The dominant firm from that time period was Fairchild Semiconductor International and it eventually gave rise to more recognizable companies like Intel. For a more detailed discussion of this birthing era, take a look at Steve Blank's  Secret History of Silicon Valley[4].
Read more about Fairchild at TechCrunch's First Trillion-Dollar Startup 

1980s: Personal Computers
Initially computers were quite large and used solely by research labs, government, and big businesses. But it was the personal computer which turned computer programming from a hobby into a vital skill. You no longer needed to be an MIT student to program on one of these badboys. While both Microsoft and Apple were founded in 1975 and 1976, respectively, they persevered due to their pioneering work in graphical user interfaces. This was the birth of the modern user-friendly Operating System. IBM approached Microsoft in 1980, regarding its upcoming personal computer, and from then on Microsoft would be King for a very long time.

2015年深度学习淘金热 The Deep Learning Gold Rush of 2015_第1张图片
See Mac-history's article on Microsoft's relationship with Apple


1990s: Internet
While the nerds at Universities were posting ascii messages on newsgroups in the 90s, service providers in the 1990s like AOL helped make the internet accessible to everyone. Remember getting all those AOL disks in the mail? Buying a chunk of digital real state (your own domain name) became possible and anybody with a dial up connection and some primitive text/HTML skills could start posting online content. With a mission statement like "organize the world's information", it was eventually Google that got the most of out the late 90s dot-com bubble, and remains a very strong player in all things tech.

2000s: Mobile and Social
While the dot-com bubble was about creating an online presence for startups and established companies, the way we use the internet has dramatically changed since 2001. A ton of new social communities have emerged, and due to Facebook we're now stars in our own reality show. Social and advertising have essentially turned the modern internet into a mainstream TV-like experience. The internet is no longer only for the nerds. The kings of this era (Google and Facebook) are also the biggest players in the Deep Learning space, because they have the largest user bases and in-house apps which can benefit most from machine learning.

2010-2015: Deep Learning comes to the party
Spend more than a day in Silicon Valley and you'll hear the popular expression, "Software is eating the world."  Rampant spreading of software was only possible once the internet (1990s) AND mobile devices (2000s) became essential parts of our lives. No longer do we physically mail floppy disks, and social media fuels any app that goes viral.  What traditional software is missing (or has been missing up until now) is the ability to improve over time from everyday use.  If that same software is able to connect to a large Deep Learning system and start improving, then we have a game-changer on our hands. This is already happening with online advertising, digital assistants like Siri, and smart auto-responders like Google's new email auto-reply feature.


The hierarchical award-winning "AlexNet" Deep Learning architecture 
Visualized using MIT's Toolbox for Deep Learning Neuron Visualization


Massive hiring of deep learning experts by the leading tech companies has only begun, but we also should be on the lookout for new ventures built on top of Deep Learning, not just a revitalization of last decade's successes. On this front, keep a close look at the following  Deep Learning Cloud Service upstarts:  Richard Socher from  MetaMind,  Matthew Zeiler from  Clarifai, and  Carlos Guestrin from  Dato.

2015-2020: Deep Learning Revitalizes Robotics
Recently it has been shown that Deep Learning can be used to help robots learn tasks involving movement, object manipulation, and decision making[6,7,8,9]. Before Deep Learning, lots of different pieces of robotic software and hardware would have to be developed independently and then hacked together for demo day. Today, you can use one of a handful of "Deep Learning for Robotics recipes" and start watching your robot learn the task you care about.

2015年深度学习淘金热 The Deep Learning Gold Rush of 2015_第2张图片
Robots Learns to Grasp using Deep Learning at Carnegie Mellon University. 
Supersizing Self-supervision: Learning to Grasp from 50K Tries and 700 Robot Hours [6]

With their 2013 acquisition of Boston Dynamics (a hardware play), 2014 acquisition of DeepMind (a software play), and a serious autonomous car play, Google is definitely early to the Robotics party. But the noteworthy bits are happening at the intersection of deep learning and robotics.  I suggest taking a closer look at the Robotics research of  Pieter Abbeel of Berkeley,  Abhinav Gupta of Carnegie Mellon, and  Ashutosh Saxena of Stanford -- all likely stars in the next  Deep Learning for Robotics race. As long as  Rodney Brooks keeps creating innovative Robotics platforms like Baxter, my expectations for Robotics are off the charts.

Conclusion

Unlike in 1849, the Deep Learning Gold Rush of 2015 is  not going to bring some 300,000 gold-seekers in boats to California's mainland. This isn't a bring-your-own-hammer kind of game -- the Titans have already descended from their Ivory Towers and handed us ample mining tools. But it won't hurt to gain some experience with traditional "shallow" machine learning techniques so you can appreciate the power of Deep Learning.

I hope you enjoyed today's read and have a better sense of how Silicon Valley is undergoing a transformation. And remember, today's wave of Deep Learning upstart CEOs have PhDs, but once Deep Learning software becomes more user-friendly (TensorFlow?), maybe you won't have to wait so long to  dropout.


References

[1] Krizhevsky, A., Sutskever, I. and Hinton, G. E. ImageNet Classification with Deep Convolutional Neural Networks. In NIPS 2012.
[2] D'Onfro, J. Google is 're-thinking' all of its products to include machine learning. Business Insider. October 22, 2015.
[3] D'Onfro, J. How Facebook will use artificial intelligence to organize insane amounts of data into the perfect News Feed and a personal assistant with superpowers. Business Insider. November 3, 2015.
[4] Blank, S. Secret History of Silicon Valley. 2008.
[5] Donglai Wei, Bolei Zhou, Antonio Torralba William T. Freeman. mNeuron: A Matlab Plugin to Visualize Neurons from Deep Models. 2015.
[6] Lerrel Pinto, Abhinav Gupta. Supersizing Self-supervision: Learning to Graspfrom 50K Tries and 700 Robot Hours. arXiv. 2015.
[7] Sergey Levine, Chelsea Finn, Trevor Darrell, Pieter Abbeel. End-to-End Training of Deep Visuomotor Policies. In RSS 2015.
[8] Mnih, Volodymyr, et al. "Human-level control through deep reinforcement learning." Nature 518.7540 (2015): 529-533.
[9] Ian Lenz, Ross Knepper, and Ashutosh Saxena. DeepMPC: Learning Deep Latent Features for Model Predictive Control.  In Robotics Science and Systems (RSS), 2015









Friday, June 26, 2015

Deep down the rabbit hole: CVPR 2015 and beyond

CVPR is the premier Computer Vision conference, and it's fair to think of it as  the Olympics of Computer Vision Research.This year it was held in my own back yard -- less than a mile away from lovely Cambridge, MA!  Plenty of my MIT colleagues attended, but I wouldn't be surprised if  Google had the largest showing at CVPR 2015. I have been going to CVPR almost every year since 2004, so let's take a brief tour at what's new in the exciting world of computer vision research.

Down the rabbit hole art by frostyshadows


A lot has changed. Nothing has changed. Academics used to be on top, defending their Universities and the awesomeness happening inside their non-industrial research labs. Academics are still on top, but now defending their Google, Facebook, Amazon, and Company X affiliations. And with the hiring budget to acquire the best and a heavy publishing-oriented culture, don't be surprised if the massive academia exodus continues for years to come. It's only been two weeks since CVPR, and Google has since then been busy making  ConvNet art, showing the world that if you want to do the best Deep Learning research, they are King.

An army of PhD students and Postdocs simply cannot defeat an army of Software Engineers and Research Scientists. Back in the day, students used to typically depart after a Computer Vision PhD (there used to be few vision research jobs and Wall Street jobs were tempting). Now the former PhD students run research labs at big companies which have been feverishly getting into vision. It seems there aren't enough deep experts to fill the deep demand.

Datasets used to be the big thing -- please download my data!  Datasets are still the big thing -- but we regret to inform you that your university’s computational resources won’t make the cut (but at Company X we’re always hiring, so come join us, and help push the frontier of research together).

Related Article:  Under LeCun's Leadership, Facebook's AI Research Lab is beefing up their research presence

If you want to check out the individual papers, I recommend Andrej Karpathy's  online navigation tool for CVPR 2015 papers or take a look at the vanilla listing of  CVPR 2015 papers on the CV foundation website.  Zoya Bylinskii, an MIT PhD Candidate, also put together  a list of interesting CVPR 2015 papers.

The ConvNet Revolution: There's a pre-trained network for that

Machine Learning used to be the Queen. Machine Learning is now the King. Machine Learning used to be shallow, but today's learning approaches are so deep that the diagrams barely fit on a single slide. Grad students used to pass around jokes about Yann LeCun and his insistence that machine learning will one day do the work of the feature engineering stage. Now it seems that the entire vision community gets to ignore you when you insist that “manual feature engineering” is going to save the day. Yann LeCun gave a keynote presentation with the intriguing title "What's wrong with Deep Learning" and it seems that Convolutional Neural Networks (also called CNNs or ConvNets) are everywhere at CVPR.


Figure from Karpathy's Convolutional Neural Network tutorial


It used to be hard to publish ConvNet research papers at CVPR, it's now hard to get a CVPR paper if you didn't at least compare against a ConvNet baseline.  Got a new cool problem? Oooh, you didn’t try a ConvNet-based baseline? Well, that explains why nobody cares.

But it's not like the machines are taking over the job of the vision scientist. Today's vision scientist is much more of an applied machine learning hacker than anything else, and because of the strong CNN theme, it is much easier to understand and re-implement today's vision systems. What we're seeing at CVPR is essentially a revisiting of the classic problems like segmentation and motion, using this new machinery. As Samson Timoner phrased it at the local Boston Vision Meetup, when Mutual Information was popular, the community jumped on that bandwagon -- it's ConvNets this time around. But it's not just a trend, the non-CNN competition is getting crushed.


Figure from  Bharath Hariharan's  Hypercolumns CVPR 2015 paper on segmentation using CNNs


There's still plenty to be done by a vision scientist, and a solid formal education in mathematics is more important than ever. We used to train via gradient descent. We still train via gradient descent. We used to drink Coffee, now we all drink Caffe. But behind the scenes, it is still mathematics.

Related Page:  Caffe Model Zoo where you can download lots of pretrained ConvNets

Deep down the rabbit hole


CVPR 2015 reminds of the pre-Newtonian days of physics. A lot of smart scientists were able to predict the motions of objects using mathematics once the ingenious Descartes taught us how to embed our physical thinking into a coordinate system. And it's pretty clear that by casting your computer vision problem in the language of ConvNets, you are going to beat just about anybody doing computer vision by hand. I think of Yann LeCun (one of the fathers of Deep Learning) as a modern day Descartes, only because I think the ground-breaking work is right around the corner. His mental framework of ConvNets is like a much needed coordinate system -- we might not know what the destination looks like, but we now know how to build a map.

Deep Networks are performing better every month, but I’m still waiting for Isaac to come in and make our lives even easier. I want a simplification. But I'm not being pessimistic -- there is a flurry of activity in the ConvNet space for a very good reason (in case you didn't get to attend CVPR 2015), so I'll just be blunt: ConvNets fuckin' work! I just want the F=ma of deep learning.


Open Source Deep Learning for Computer Vision: Torch vs Caffe

CVPR 2015 started off with some excellent software tutorials on day one.  There is some great non-alpha deep learning software out there and it has been making everybody's life easier.  At CVPR, we had both a  Torch tutorial and a Caffe tutorial.  I attended the  DIY Deep Learning Caffe tutorial and it was a full house -- standing room only for slackers like me who join the party only 5 minutes before it starts. Caffe is much more popular that Torch, but when talking to some power users of Deep Learning (like  +Andrej Karpathy and other DeepMind scientists), a certain group of experts seems to be migrating from Caffe to Torch.



Caffe is developed at Berkeley, has a vibrant community, Python bindings, and seems to be quite popular among University students.  Prof. Trevor Darrell at Berkeley is even looking for a Postdoc to help the Caffe effort. If I was a couple of years younger and a fresh PhD, I would definitely apply.

Instead of following the Python trend, Torch is Lua-based. There is no need for an interpreter like Matlab or Python -- Lua gives you the magic console. Torch is heavily used by Facebook AI Research Labs and Google's DeepMind Lab in London.  For those afraid of new languages like Lua, don't worry -- Lua is going to feel "easy" if you've dabbled in Python, Javascript, or Matlab. And if you don't like editing protocol buffer files by hand, definitely check out Torch.

It's starting to become clear that the future power of deep learning is going to come with its own self-contained software package like Caffe or Torch, and not from a dying breed of all-around tool-belts like OpenCV or Matlab. When you share creations made in OpenCV, you end up sharing source files, but with the Deep Learning toolkits, you end up sharing your pre-trained networks.  No longer do you have to think about a combination of 20 "little" algorithms for your computer vision pipeline -- you just think about which popular network architecture you want, and then the dataset.  If you have the GPUs and ample data, you can do full end-to-end training.  And if your dataset is small/medium, you can fine-tune the last few layers. You can even train a linear classifier on top of the final layer, if you're afraid of getting your hands dirty -- just doing that will beat the SIFTs, the HOGs, the GISTs, and all that was celebrated in the past two decades of computer vision.

Related Article:  Torch vs Theano on fastml.com
Related Code:  Andrea Vedaldi's MatConvNet Deep Learning Library for MATLAB users

The way in which ConvNets are being used at CVPR 2015 makes me feel like we're close to something big.  But before we strike gold, ConvNets still feel like a Calculus of Shadows, merely "hoping" to get at something bigger, something deeper, and something more meaningful. I think the flurry of research which investigates visualization algorithms for ConvNets suggests that even the network architects aren't completely sure what is happening behind the scenes.

The Video Game Engine Inside Your Head: A different path towards Machine Intelligence


Josh Tenenbaum gave an invited talk titled The Video Game Engine Inside Your Head at the  Scene Understanding Workshopon the last day of the CVPR 2015 conference. You can read a summary of his ideas in a  short Scientific American article. While his talk might appear to be unconventional by CVPR standards, it is classic Tenenbaum. In his world, there is no benchmark to beat, no curves to fit to shadows, and if you allow my LeCun-Descartes analogy, then in some sense Prof. Tenenbaum might be the modern day Aristotle. As  Prof. Jianxiong Xiao introduced Josh with a grand intro, he was probably right -- this is one of the most intelligent speakers you can find.  He speaks 100 words a second, you can't help but feel your brain enlarge as you listen.

One of Josh's main research themes is going beyond the shadows of image-based recognition.  Josh's work is all about building mental models of the world, and his work can really be thought of as analysis-by-synthesis. Inside his models is something like a video game engine, and he showed lots of compelling examples of inferences that are easy for people, but nearly impossible for the data-driven ConvNets of today.  It's not surprising that his student is working at Google's DeepMind this summer.

A couple of years ago,  Probabilistic Graphical Models (the marriage of Graph Theory and Probabilistic Methods) used to be all the rage.  Josh gave us a taste of  Probabilistic Programming, and while we're not yet seeing these new methods dominate the world of computer vision research, keep your eyes open. He mentioned a recent Nature paper (citation below) from another well respected machine intelligence research, which should keep the trendsetters excited for quite some time. Just take a look at the bad-ass looking  Julia code below:

Probabilistic machine learning and artificial intelligence.  Zoubin Ghahramani. Nature 521, 452–459 (28 May 2015) doi:10.1038/nature14541




To see some of Prof. Tenenbaum's ideas in action, take a look at the following CVPR 2015 paper, titled  Picture: A Probabilistic Programming Language for Scene Perception. Congrats to  Tejas D. Kulkarni, the first author, an MIT student, who got the Best Paper Honorable Mention prize for this exciting new work. Google DeepMind, you're going to have one fun summer.


Picture: A Probabilistic Programming Language for Scene Perception


Object Detectors Emerge in Deep Scene CNNs

There were lots of great presentation as the Scene Understanding Workshop, and another talk that truly stood out was about a new large-scale dataset (MIT Places) and a thorough investigation of what happens when you train with scenes vs. objects.



Antonio Torralba from MIT gave the talk about the Places Database and an in-depth analysis of what is learned when you train on object-centric databases like ImageNet vs. Scene-scentric databases like  MIT Places. You can check out " Object Detectors Emerge" slides or their  ArXiv paper to learn more. Great work by an upcoming researcher,  Bolei Zhou!

Overheard at CVPR: ArXiv Publishing Frenzy & Baidu Fiasco 


In the long run, the recent trend of  rapidly pushing preprints to  ArXiv.org is great for academic and industry research alike. When you have a large collection of experts exploring ideas at very fast rates, waiting 6 months until the next conference deadline just doesn't make sense.  The only downside is that it makes new CVPR papers feel old. It seems like everybody has already perused the good stuff the day it went up on ArXiv. But you get your "idea claim" without worrying that a naughty reviewer will be influenced by your submission.  Double blind reviewing, get ready for a serious revamp.  We now know who's doing what, significantly before publication time.  Students, publish-or-perish just got a new name. Whether the ArXiv frenzy is a good or a bad thing, is up to you, and probably more a function of your seniority than anything else. But the CV buzz is definitely getting louder and will continue to do so.

The  Baidu cheating scandal might appear to be big news for outsiders just reading the Artificial Intelligence headlines, but overfitting to the testing set is nothing new in Computer Vision. Papers get retracted, grad students often evaluate their algorithms on test sets too many times, and the truth is that nobody's perfect.  When it's important to be #1, don't be surprised that your competition is being naughty. But it's important to realize the difference between ground-breaking research and petty percentage chasing. We all make mistakes, and under heavy pressure, we're all likely to show our weaknesses.  So let's laugh about it.   Let's hire the best of the best, encourage truly great research, and stop chasing percentages. The truth is that a lot of the top performing methods are more similar than different.


Conclusion
CVPR has been constantly growing in attendance. We now have Phd Students, startups, Professors, recruiters, big companies, and even undergraduates coming to the show. Will CVPR become the new SIGGRAPH?

CVPR attendance plot from  Changbo Hu. 


ConvNets are here to stay, but if we want ConvNets to be more than than a mere calculus of shadows, there's still ample work do be done. Geoff Hinton's capsules keep popping up during midnight discussions. " I want to replace unstructured layers with groups of neurons that I call 'capsules' that are a lot more like cortical columns" --  Geoff Hinton during his Reddit AMA. A lot of people (like  Prof. Abhinav Gupta from CMU) are also talking about unsupervised CNN training, and my prediction is that learning large ConvNets from videos without annotations is going to be big at next year's CVPR.

Most importantly, when the titans of Deep Learning get to mention what's wrong with their favorite methods, I only expect the best research to follow. Happy computing and remember, never stop learning.

from: http://www.computervisionblog.com/search/label/deep%20learning

你可能感兴趣的:(2015年深度学习淘金热 The Deep Learning Gold Rush of 2015)