This series only adds personal study notes and supplementary derivations on top of the original course; if there are mistakes, corrections and feedback are welcome. Having worked through Andrew Ng's course, I organized it into text to make review and lookup easier. Since I have been studying English, the series is primarily in English, and readers are encouraged to rely on the English first, with Chinese as a supplement, as preparation for later reading academic papers in related fields. - ZJ
Coursera course | deeplearning.ai | NetEase Cloud Classroom (网易云课堂)
Please credit the author and source when reposting: WeChat official account 「SelfImprovementLab」
Zhihu: https://zhuanlan.zhihu.com/c_147249273
CSDN:http://blog.csdn.net/JUNJUN_ZHAO/article/details/78799587
Jianshu: http://www.jianshu.com/p/a66a9eec487f
Why is deep learning taking off?
If the basic technical ideas behind deep learning and neural networks have been around for decades, why are they only just now taking off? In this video, let's go over some of the main drivers behind the rise of deep learning, because I think this will help you spot the best opportunities within your own organization to apply these techniques.
Over the last few years a lot of people have asked me, "Andrew, why is deep learning suddenly working so well?" When I answer that question, this is usually the picture I draw for them. Say we plot a figure where the horizontal axis is the amount of data we have for a task, and the vertical axis is the performance of a learning algorithm: the accuracy of a spam classifier or an ad-click predictor, say, or the accuracy of the neural network a self-driving car uses to figure out the positions of other cars.
It turns out that if you plot the performance of a traditional learning algorithm, such as a support vector machine or logistic regression, as a function of the amount of data you have, you get a curve that looks like this: performance improves for a while as you add more data, but after a while it pretty much plateaus, no matter how far you stretch the horizontal axis, because these models don't know what to do with huge amounts of data.
What happened in our society over the last 20 years is that, for a lot of problems, we went from having a relatively small amount of data to having a fairly large amount of data. All of this is thanks to the digitization of society: so much human activity now takes place in the digital realm. We spend a lot of time on computers, on websites, and in mobile apps, and activity on digital devices creates data. Thanks to inexpensive cameras built into our cell phones, accelerometers, and all sorts of sensors in the Internet of Things, we have been collecting more and more data.
So over the last 20 years, for many applications we accumulated far more data than traditional learning algorithms could effectively take advantage of. With neural networks, it turns out that if you train a small neural network, performance may look like this; if you train a somewhat larger, medium-sized neural network, performance is a little better; and if you train a very large neural network, performance often just keeps getting better and better. A couple of observations follow. One is that if you want to reach this very high level of performance, you need two things.
First, you often need to be able to train a big enough neural network to take advantage of the huge amount of data, and second, you need to be out here on the x-axis: you do need a lot of data. So we often say that scale has been driving deep learning progress, and by "scale" I mean both the size of the neural network (a network with many hidden units, many parameters, and many connections) and the scale of the data.
In fact, today one of the most reliable ways to get better performance from a neural network is often either to train a bigger network or to throw more data at it, and that only works up to a point, because eventually you run out of data, or the network becomes so big that it takes too long to train. But just improving scale has taken us a long way in the world of deep learning. To make this diagram a bit more technically precise, let me add a few details. The x-axis shows the amount of data; technically, this is the amount of labeled data, where by labeled data I mean training examples for which we have both the input x and the label y.
Let me introduce a little notation that we'll use later in this course: we use the lowercase letter m to denote the size of the training set, that is, the number of training examples. That is what the horizontal axis measures.
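As a small aside, writing each labeled example as a pair (x, y) and using the superscript-(i) indexing for individual examples that the course adopts in later videos, a training set of m labeled examples can be written as:

$$\{(x^{(1)}, y^{(1)}),\ (x^{(2)}, y^{(2)}),\ \ldots,\ (x^{(m)}, y^{(m)})\}$$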
A couple of other details about this figure: in the regime of smaller training sets, the relative ordering of the algorithms is actually not very well defined. If you don't have a lot of training data, performance is often determined by your skill at hand-engineering features. So it's quite possible that someone training an SVM who is more motivated to hand-engineer features will do better in this small-training-set regime than someone training a somewhat larger neural network. In this region on the left of the figure, the relative ordering between algorithms is not well defined, and performance depends much more on your skill at hand-engineering features and other details of the algorithms. It is only in the big-data regime, with very large training sets (very large m, towards the right), that we consistently see large neural networks dominating the other approaches. So if any of your friends ask you why neural networks are taking off, I would encourage you to draw this picture for them as well.
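The hand-drawn figure itself isn't reproduced here, but a purely schematic version of it could be sketched with a few lines of Python; the curve shapes and numbers below are invented solely for illustration, not measured results:

```python
import numpy as np
import matplotlib.pyplot as plt

# Schematic only: performance rises with the amount of labeled data;
# the traditional algorithm plateaus early, larger networks keep improving.
m = np.linspace(0, 10, 200)  # amount of labeled data (arbitrary units)
curves = {
    "traditional algorithm (e.g. SVM)": 1.0 * (1 - np.exp(-m / 1.0)),
    "small neural network":             1.2 * (1 - np.exp(-m / 1.5)),
    "medium neural network":            1.5 * (1 - np.exp(-m / 2.5)),
    "large neural network":             2.0 * (1 - np.exp(-m / 4.0)),
}
for label, y in curves.items():
    plt.plot(m, y, label=label)
plt.xlabel("amount of labeled data (m)")
plt.ylabel("performance")
plt.legend()
plt.show()
```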
I will say that in the early days of the modern rise of deep learning, it was scale of data and scale of computation, simply our ability to train very large neural networks on a CPU or a GPU, that enabled us to make a lot of progress. But increasingly, especially in the last several years, we have seen tremendous algorithmic innovation as well, so I don't want to understate that. Interestingly, many of the algorithmic innovations have been about making neural networks run much faster. As a concrete example, one of the huge breakthroughs in neural networks has been switching from the sigmoid function, which looks like this, to the ReLU function, which we talked about briefly in an earlier video.
The ReLU looks like this. If you don't follow the details of what I'm about to say, don't worry about it. It turns out that one of the problems with using the sigmoid function in machine learning is that in the regions where the slope of the function, the gradient, is nearly zero, learning becomes really slow: when you run gradient descent and the gradient is nearly zero, the parameters change very slowly. By changing what's called the activation function so that the network uses the rectified linear unit (ReLU) instead, the gradient is equal to one for all positive values of the input, so it is much less likely to gradually shrink to zero; the slope of the ReLU is zero only on the left, for negative inputs.
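As a minimal numerical sketch of this point (not from the lecture; the helper names are just for illustration), evaluating both derivatives at a few inputs shows the sigmoid's gradient becoming vanishingly small for large |z|, while the ReLU's gradient stays at 1 for any positive input:

```python
import numpy as np

def sigmoid(z):
    """Sigmoid activation: squashes z into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_grad(z):
    """Sigmoid derivative: s(z) * (1 - s(z)), which is nearly 0 for large |z|."""
    s = sigmoid(z)
    return s * (1.0 - s)

def relu_grad(z):
    """ReLU derivative: 1 for positive inputs, 0 for negative inputs."""
    return (z > 0).astype(float)

z = np.array([-10.0, -2.0, 0.5, 2.0, 10.0])
print("sigmoid gradient:", sigmoid_grad(z))  # ~4.5e-05, 0.105, 0.235, 0.105, ~4.5e-05
print("ReLU gradient:   ", relu_grad(z))     # 0, 0, 1, 1, 1
```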
It turns out that just switching from the sigmoid function to the ReLU function has made the algorithm called gradient descent work much faster. This is an example of a relatively simple algorithmic innovation, but ultimately its impact was to really help computation. There are quite a lot of examples like this, where changing the algorithm lets the code run much faster, which in turn lets us train bigger neural networks, or train them within a reasonable amount of time even when we have a large network and a lot of data. The other reason fast computation is important is that the process of training a neural network turns out to be very iterative: you have an idea for a neural network architecture, and you implement that idea in code.
Implementing your idea lets you run an experiment that tells you how well your neural network does; by looking at the result, you go back and change the details of your neural network, and you go around this cycle over and over. When your neural network takes a long time to train, it takes a long time to go around this cycle, and there is a huge difference in how productively you can build effective neural networks between being able to have an idea, try it, and see whether it works in ten minutes, or maybe a day, and having to train your neural network for a month, which sometimes happens. When you get a result back in ten minutes or within a day, you can try many more ideas, and you are much more likely to discover a neural network that works well for your application.
So faster computation has really helped in terms of speeding up the rate at which you can get an experimental result back, and this has helped both practitioners of neural networks and researchers working in deep learning iterate much faster and improve their ideas much faster. All of this has also been a huge boon to the entire deep learning research community, which has been incredible at inventing new algorithms and making nonstop progress on that front. These are some of the forces powering the rise of deep learning, and the good news is that these forces are still working powerfully to make deep learning even better.
Take data: society is still producing more and more digital data. Take computation: with the rise of specialized hardware such as GPUs, faster networking, and many other kinds of hardware, I'm quite confident that our ability to build very large neural networks, from a computational point of view, will keep getting better. And take algorithms: the deep learning research community continues to be phenomenal at innovating on the algorithmic front. Because of this, I think we can be optimistic that deep learning will keep getting better for many years to come. With that, let's go on to the last video of this section, where we'll talk a little more about what you will learn in this course.
PS: You are welcome to scan the QR code and follow the WeChat official account 「SelfImprovementLab」! It focuses on deep learning, machine learning, and artificial intelligence, with occasional group check-in activities for early rising, reading, exercise, English, and more.