IEEE Spectrum: We read about Deep Learning in the news a lot these days. What’s your least favorite definition of the term that you see in these stories?
- Yann LeCun: My least favorite description is, “It works just like the brain.” I don’t like people saying this because, while Deep Learning gets an inspiration from biology, it’s very, very far from what the brain actually does. And describing it like the brain gives a bit of the aura of magic to it, which is dangerous. It leads to hype; people claim things that are not true. AI has gone through a number of AI winters because people claimed things they couldn’t deliver.
Spectrum: So if you were a reporter covering a Deep Learning announcement, and had just eight words to describe it, which is usually all a newspaper reporter might get, what would you say?
- LeCun: I need to think about this. [Long pause.] I think it would be “machines that learn to represent the world.” That’s eight words. Perhaps another way to put it would be “end-to-end machine learning.” Wait, it’s only five words and I need to kind of unpack this. [Pause.] It’s the idea that every component, every stage in a learning machine can be trained.
Spectrum: Your editor is not going to like that.
- LeCun: Yeah, the public wouldn’t understand what I meant. Oh, okay. Here’s another way. You could think of Deep Learning as the building of learning machines, say pattern recognition systems or whatever, by assembling lots of modules or elements that all train the same way. So there is a single principle to train everything. But again, that’s a lot more than eight words.
Spectrum: What can a Deep Learning system do that other machine learning systems can’t do?
- LeCun: That may be a better question. Previous systems, which I guess we could call “shallow learning systems,” were limited in the complexity of the functions they could compute. So if you want a shallow learning algorithm like a “linear classifier” to recognize images, you will need to feed it with a suitable “vector of features” extracted from the image. But designing a feature extractor “by hand” is very difficult and time consuming.
An alternative is to use a more flexible classifier, such as a “support vector machine” or a two-layer neural network fed directly with the pixels of the image. The problem is that it’s not going to be able to recognize objects to any degree of accuracy, unless you make it so gigantically big that it becomes impractical.
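To make the contrast concrete, here is a minimal sketch of the "shallow" workflow LeCun describes, assuming scikit-learn and its small bundled digits dataset (my choices, not his): the same linear classifier is fed either raw pixels or a crude hand-designed feature vector. On a toy dataset the raw pixels can still work reasonably well; his point is that for complex natural images, designing good features by hand is the hard and time-consuming part.

```python
# A sketch of "shallow learning": a linear classifier that only works as well
# as the features it is fed. Assumes scikit-learn is installed; the digits
# dataset and the toy feature extractor are illustrative choices only.
import numpy as np
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

digits = load_digits()                       # 8x8 grayscale images of digits
images, labels = digits.images, digits.target

def hand_designed_features(imgs):
    """A crude, hand-crafted feature extractor: row and column intensity sums."""
    rows = imgs.sum(axis=2)                  # 8 row sums per image
    cols = imgs.sum(axis=1)                  # 8 column sums per image
    return np.concatenate([rows, cols], axis=1)

raw_pixels = images.reshape(len(images), -1)         # 64 raw pixel values
crafted = hand_designed_features(images)             # 16 hand-designed features

for name, X in [("raw pixels", raw_pixels), ("hand-designed features", crafted)]:
    X_tr, X_te, y_tr, y_te = train_test_split(X, labels, random_state=0)
    clf = LogisticRegression(max_iter=5000).fit(X_tr, y_tr)
    print(f"linear classifier on {name}: {clf.score(X_te, y_te):.2f} accuracy")
```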
Spectrum: It doesn’t sound like a very easy explanation. And that’s why reporters trying to describe Deep Learning end up saying…
LeCun: …that it’s like the brain.
- LeCun argues that deep learning gets better results than an ordinary linear classifier, using image recognition as his example. That said, deep learning's interpretability is not great; a shallow learning algorithm can sometimes be easier to interpret. Shallow learning algorithms are also the classic approach, and plenty of practical tools for them already exist that cover basic needs.
- Deep learning is related to neural networks, but it is still very far from how the human brain works.
- This gives a brief sense of what machine learning is; it is expanded on further below.
Spectrum: Part of the problem is that machine learning is a surprisingly inaccessible area to people not working in the field. Plenty of educated lay people understand semi-technical computing topics, like, say, the PageRank algorithm that Google uses. But I’d bet that only professionals know anything detailed about linear classifiers or support vector machines. Is that because the field is inherently complicated?
LeCun: Actually, I think the basics of machine learning are quite simple to understand. I’ve explained this to high-school students and school teachers without putting too many of them to sleep.
A pattern recognition system is like a black box with a camera at one end, a green light and a red light on top, and a whole bunch of knobs on the front. The learning algorithm tries to adjust the knobs so that when, say, a dog is in front of the camera, the red light turns on, and when a car is put in front of the camera, the green light turns on. You show a dog to the machine. If the red light is bright, don’t do anything. If it’s dim, tweak the knobs so that the light gets brighter. If the green light turns on, tweak the knobs so that it gets dimmer. Then show a car, and tweak the knobs so that the red light gets dimmer and the green light gets brighter. If you show many examples of cars and dogs, and you keep adjusting the knobs just a little bit each time, eventually the machine will get the right answer every time.
The interesting thing is that it may also correctly classify cars and dogs it has never seen before. The trick is to figure out in which direction to tweak each knob and by how much without actually fiddling with them. This involves computing a “gradient,” which for each knob indicates how the light changes when the knob is tweaked.
Now, imagine a box with 500 million knobs, 1,000 light bulbs, and 10 million images to train it with. That’s what a typical Deep Learning system is.
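The knobs-and-lights story maps directly onto gradient descent: each knob is a weight, the brightness of a light is a class score, and the gradient says which way to turn each knob. Here is a minimal numpy sketch of that loop (the two-class toy data, learning rate, and step count are all invented for illustration):

```python
# The knobs-and-lights analogy as plain gradient descent, in numpy.
# Each "knob" is one weight; the gradient tells us which way to tweak it
# so that the correct "light" (class score) gets brighter.
import numpy as np

rng = np.random.default_rng(0)

# Toy data: 200 two-dimensional points, label 1 ("dog") vs 0 ("car").
X = rng.normal(size=(200, 2))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(float)

w = np.zeros(2)        # the knobs
b = 0.0
lr = 0.1               # how much to tweak each knob at every step

def light(x, w, b):
    """Brightness of the 'dog' light: a sigmoid of the weighted sum."""
    return 1.0 / (1.0 + np.exp(-(x @ w + b)))

for step in range(500):
    p = light(X, w, b)
    # Gradient of the logistic loss: for each knob, which direction makes
    # the correct light brighter and the wrong light dimmer.
    grad_w = X.T @ (p - y) / len(y)
    grad_b = np.mean(p - y)
    w -= lr * grad_w   # tweak every knob a little bit, every time
    b -= lr * grad_b

accuracy = np.mean((light(X, w, b) > 0.5) == y)
print(f"after training, the machine gets it right {accuracy:.0%} of the time")
```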
Spectrum: I assume that you use the term “shallow learning” somewhat tongue-in-cheek; I doubt people who work with linear classifiers consider their work “shallow.” Doesn’t the expression “Deep Learning” have an element of PR to it, since it implies that what is “deep” is what is being learned, when in fact the “deep” part is just the number of steps in the system?
- LeCun: Yes, it is a bit facetious, but it reflects something real: shallow learning systems have one or two layers, while deep learning systems typically have five to 20 layers. It is not the learning that is shallow or deep, but the architecture that is being trained.
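To see that distinction in code, here is a hedged sketch (the library choice, layer widths, and depths are mine, not anything from the interview): the shallow and deep models below are trained by the same principle and differ only in how many trainable stages sit between input and output.

```python
# Shallow vs. deep is a property of the architecture, not of the learning rule.
# Sketch only; assumes PyTorch is installed, and the widths/depths are arbitrary.
import torch.nn as nn

# A "shallow" learner: one trainable layer on top of the input features.
shallow = nn.Sequential(
    nn.Linear(64, 10),          # a linear classifier: one layer of knobs
)

# A "deep" learner: many trainable stages, each transforming the previous
# layer's representation before the final layer produces the answer.
deep = nn.Sequential(
    nn.Linear(64, 128), nn.ReLU(),
    nn.Linear(128, 128), nn.ReLU(),
    nn.Linear(128, 128), nn.ReLU(),
    nn.Linear(128, 128), nn.ReLU(),
    nn.Linear(128, 10),
)

# Both are trained with the same principle (gradient descent on a loss);
# only the depth of the trained architecture differs.
print(sum(p.numel() for p in shallow.parameters()), "parameters in the shallow model")
print(sum(p.numel() for p in deep.parameters()), "parameters in the deep model")
```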
- This exchange explains the concept of deep learning. The reporter has a point: many people really don't know what machine learning or deep learning is, and don't know how even an algorithm like PageRank works.
- As LeCun puts it, it is not the learning that is shallow or deep; it is the architecture being trained.
Spectrum: The standard Yann LeCun biography says that you were exploring new approaches to neural networks at a time when they had fallen out of favor. What made you ignore the conventional wisdom and keep at it?
- LeCun: I have always been enamored of the idea of being able to train an entire system from end to end. You hit the system with essentially raw input, and because the system has multiple layers, each layer will eventually figure out how to transform the representations produced by the previous layer so that the last layer produces the answer. This idea—that you should integrate learning from end to end so that the machine learns good representations of the data—is what I have been obsessed with for over 30 years.
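A minimal sketch of that end-to-end principle, again with invented data and dimensions: the loss is measured only at the output, but backpropagation delivers a gradient to every stage, so every layer learns its own representation of the data.

```python
# End-to-end learning: one loss at the output, and backpropagation updates
# every stage in the stack, so each layer learns its own representation.
# Sketch with made-up data; assumes PyTorch is installed.
import torch
import torch.nn as nn

torch.manual_seed(0)
X = torch.randn(256, 64)                     # "essentially raw input"
y = torch.randint(0, 10, (256,))             # target labels

model = nn.Sequential(                       # every stage below is trainable
    nn.Linear(64, 128), nn.ReLU(),
    nn.Linear(128, 128), nn.ReLU(),
    nn.Linear(128, 10),
)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()

for step in range(100):
    optimizer.zero_grad()
    loss = loss_fn(model(X), y)              # error measured only at the output
    loss.backward()                          # ...but gradients reach every layer
    optimizer.step()                         # ...so every stage is trained

print(f"final training loss: {loss.item():.3f}")
```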
Spectrum: Is the work you do “hacking,” or is it science? Do you just try things until they work, or do you start with a theoretical insight?
LeCun: It’s very much an interplay between intuitive insights, theoretical modeling, practical implementations, empirical studies, and scientific analyses. The insight is creative thinking, the modeling is mathematics, the implementation is engineering and sheer hacking, the empirical study and the analysis are actual science. What I am most fond of are beautiful and simple theoretical ideas that can be translated into something that works.
I have very little patience for people who do theory about a particular thing simply because it’s easy, particularly if they dismiss other methods that actually work empirically, just because the theory is too difficult. There is a bit of that in the machine learning community. In fact, to some extent, the “Neural Net Winter” during the late 1990s and early 2000s was a consequence of that philosophy: that you had to have ironclad theory, and the empirical results didn’t count. It’s a very bad way to approach an engineering problem.
- But there are dangers in the purely empirical approach too. For example, the speech recognition community has traditionally been very empirical, in the sense that the only thing people pay attention to is how well you do on certain benchmarks. And that stifles creativity, because to get to the level where you can beat other teams that have been at it for years, you need to go underground for four or five years, building your own infrastructure. That’s very difficult and very risky, and so nobody does it. So, to some extent, progress in the speech recognition community has been continuous but very incremental, at least until the emergence of Deep Learning in the last few years.
Spectrum: You seem to take pains to distance your work from neuroscience and biology. For example, you talk about “convolutional nets,” and not “convolutional neural nets.” And you talk about “units” in your algorithms, and not “neurons.”
- LeCun: That’s true. Some aspects of our models are inspired by neuroscience, but many components are not at all inspired by neuroscience, and instead come from theory, intuition, or empirical exploration. Our models do not aspire to be models of the brain, and we don’t make claims of neural relevance. But at the same time, I’m not afraid to say that the architecture of convolutional nets is inspired by some basic knowledge of the visual cortex. There are people who indirectly get inspiration from neuroscience, but who will not admit it. I admit it. It’s very helpful. But I’m very careful not to use words that could lead to hype. Because there is a huge amount of hype in this area. Which is very dangerous.
- On the first question I can't resist a rant: mainstream academia dropped neural networks back then and is now piling back in, which says a lot about following whatever is rewarded. Better to reason from first principles. “I think it’s important to reason from first principles rather than by analogy. The normal way we conduct our lives is we reason by analogy. [With analogy] we are doing this because it’s like something else that was done, or it is like what other people are doing. [With first principles] you boil things down to the most fundamental truths…and then reason up from there.”
To be continued in a later post; taking a break for now.