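Coursera | Andrew Ng (03-week2-2.10): Whether to Use End-to-End Deep Learning

This series adds personal study notes and supplementary derivations to selected topics from the original course. If you find any errors, corrections are welcome. Having studied Andrew Ng’s course, I have organized it into text so that it is easier to look up and review. Because I am continually studying English, the series is written mainly in English, and I recommend that readers likewise rely mainly on the English, with the Chinese as support, to prepare for reading academic papers in the field later on. - ZJ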

Coursera course | deeplearning.ai | NetEase Cloud Classroom


When reposting, please credit the author and source: ZJ, WeChat official account 「SelfImprovementLab」

Zhihu: https://zhuanlan.zhihu.com/c_147249273

CSDN:http://blog.csdn.net/junjun_zhao/article/details/79184944


2.10 Whether to use end-to-end deep learning

(Subtitle source: NetEase Cloud Classroom)


Let’s say in building a machine learning system you’re trying to decide whether or not to use an end-to-end approach. Let’s take a look at some of the pros and cons of end-to-end deep learning so that you can come away with some guidelines on whether or not an end-to-end approach seems promising for your application. Here are some of the benefits of applying end-to-end learning. First is that end-to-end learning really just lets the data speak. So if you have enough x, y data then whatever is the most appropriate function mapping from x to y, if you train a big enough neural network, hopefully the neural network will figure it out. And by having a pure machine learning approach, your neural network learning the mapping from x to y may be more able to capture whatever statistics are in the data, rather than being forced to reflect human preconceptions. So for example, in the case of speech recognition, earlier speech systems had this notion of a phoneme, which was a basic unit of sound like C, A, and T for the word cat. And I think that phonemes are an artifact created by human linguists. I actually think that phonemes are a fantasy of linguists that are a reasonable description of language, but it’s not obvious that you want to force your learning algorithm to think in phonemes. And if you let your learning algorithm learn whatever representation it wants to learn rather than forcing your learning algorithm to use phonemes as a representation, then its overall performance might end up being better. The second benefit to end-to-end deep learning is that there’s less hand designing of components needed. And so this could also simplify your design workflow, in that you just don’t need to spend a lot of time hand designing features, hand designing these intermediate representations.


假设你正在搭建一个机器学习系统,你要决定是否使用端对端方法,我们来看看端到端深度学习的一些优缺点,这样你就可以根据一些准则,判断你的应用程序是否有希望使用端到端方法,这里是应用端到端学习的一些好处,首先端到端学习真的只是让数据说话,所以如果你有足够多的 x,y 数据,那么不管从 x 到 y 最适合的函数映射是什么,如果你训练一个足够大的神经网络,希望这个神经网络能自己搞清楚,而使用纯机器学习方法,直接从 x 到 y 输入去训练的神经网络,可能更能够捕获数据中的任何统计信息,而不是被迫引入人类的成见。例如 在语音识别领域,早期的识别系统有这个音位概念,就是基本的声音单元 如 cat 单词的 C A T,我觉得这个音位是人类语言学家生造出来的,我实际上认为音位其实是语音学家的幻想,用音位描述语言也还算合理,但是不要强迫你的学习算法以音位为单位思考 这点有时没那么明显,如果你让你的学习算法,学习它想学习的任意表示方式,而不是强迫你的学习算法使用音位作为表示方式,那么其整体表现可能会更好,端到端深度学习的第二个好处就是这样所需手工设计的组件更少。所以这也许能够简化你的设计工作流程,你不需要花太多时间去手工设计功能,手工设计这些中间表示方式。

How about the disadvantages? Here are some of the cons. First, it may need a large amount of data. So to learn this x to y mapping directly, you might need a lot of x, y data, and we saw in a previous video some examples of where you could obtain a lot of data for subtasks. Such as for face recognition, we could find a lot of data for finding a face in the image, as well as identifying the face once you found a face, but there was just less data available for the entire end-to-end task. So x, this is the input end of the end-to-end learning and y is the output end. And so you need a lot of x, y data with both the input end and the output end in order to train these systems, and this is why we call it end-to-end learning as well, because you’re learning a direct mapping from one end of the system all the way to the other end of the system. The other disadvantage is that it excludes potentially useful hand-designed components. So machine learning researchers tend to speak disparagingly of hand designing things. But if you don’t have a lot of data, then your learning algorithm doesn’t have that much insight it can gain from your data if your training set is small. And so hand designing a component can really be a way for you to inject manual knowledge into the algorithm, and that’s not always a bad thing.


那么缺点呢? 这里有一些缺点,首先 它可能需要大量的数据,要直接学到这个 x 到 y 的映射,你可能需要大量 x,y 数据,我们在以前的视频里看过一个例子,其中你可以收集大量子任务数据,比如人脸识别,我们可以收集很多数据 用来分辨图像中的人脸,当你找到一张脸后 也可以找得到很多人脸识别数据,但是对于整个端到端任务 可能只有更少的数据可用,所以 x 这是端到端学习的输入端 y 是输出端。所以你需要很多这样的 x,y 数据,在输入端和输出端都有数据 这样可以训练这些系统,这就是为什么我们称之为端到端学习,因为你直接学习出从系统的一端,到系统的另一端的映射。另一个缺点是 它排除了可能有用的手工设计组件,机器学习研究人员一般都很鄙视手工设计的东西,但如果你没有很多数据,你的学习算法,就没办法从很小的训练集数据中获得洞察力,所以手工设计组件在这种情况,可能是把人类知识直接注入算法的途径,这并不总是一件坏事。

I think of a learning algorithm as having two main sources of knowledge. One is the data and the other is whatever you hand design, be it components, or features, or other things. And so when you have a ton of data it’s less important to hand design things, but when you don’t have much data, then having a carefully hand-designed system can actually allow humans to inject a lot of knowledge about the problem into an algorithm, and that should be very helpful. So one of the downsides of end-to-end deep learning is that it excludes potentially useful hand-designed components. And hand-designed components could be very helpful if well designed. They could also be harmful if they really limit your performance, such as if you force an algorithm to think in phonemes when maybe it could have discovered a better representation by itself. So it’s kind of a double-edged sword that could hurt or help, but it does tend to help more; hand-designed components tend to help more when you’re training on a small training set.


我觉得学习算法有两个主要的知识来源,一个是数据 另一个是你手工设计的任何东西,可能是组件 功能 或者其他东西。所以当你有成吨数据时,手工设计的东西就不太重要了 但是当你没有太多的数据时,构造一个精心设计的系统,实际上可以将人类对这个问题的很多认识直接注入到问题里,进入算法里 应该挺有帮助的,所以端到端深度学习的弊端之一是,它把可能有用的人工设计的组件排除在外了,精心设计的人工组件可能非常有用,但它们也有可能真的伤害到你的算法表现,例如 强制你的算法以音位为单位思考,也许让算法自己找到更好的表示方法更好,所以这是一把双刃剑,可能有坏处 可能有好处 但往往好处更多,手工设计的组件往往在,训练集更小的时候帮助更大。
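The trade-off between data and hand-designed knowledge can be illustrated with a small synthetic experiment. This is only a sketch under stated assumptions and is not from the lecture: the task (is a 3-D point inside the unit sphere?), the sample sizes, and all function names are hypothetical, and a 1-nearest-neighbour classifier on raw coordinates stands in for a flexible learner that must discover the mapping from data alone, where a real end-to-end system would use a neural network. The hand-designed feature is the squared distance from the origin, which encodes human knowledge about the problem.

```python
import numpy as np


def make_data(n, rng):
    """Sample n points in the cube [-1, 1]^3; label 1 if inside the unit sphere."""
    x = rng.uniform(-1.0, 1.0, size=(n, 3))
    y = (np.sum(x ** 2, axis=1) < 1.0).astype(int)
    return x, y


def one_nn_predict(train_x, train_y, test_x):
    """1-nearest-neighbour prediction with squared Euclidean distance."""
    # ||a - b||^2 = ||a||^2 + ||b||^2 - 2 a.b, evaluated as one 2-D matrix.
    d2 = (
        np.sum(test_x ** 2, axis=1)[:, None]
        + np.sum(train_x ** 2, axis=1)[None, :]
        - 2.0 * test_x @ train_x.T
    )
    return train_y[np.argmin(d2, axis=1)]


def hand_feature(x):
    """Hand-designed feature (assumed human insight): squared radius."""
    return np.sum(x ** 2, axis=1, keepdims=True)


def accuracy(train_x, train_y, test_x, test_y):
    return float(np.mean(one_nn_predict(train_x, train_y, test_x) == test_y))


def run_experiment(seed=0, n_small=20, n_large=4000, n_test=1000):
    rng = np.random.default_rng(seed)
    test_x, test_y = make_data(n_test, rng)
    small_x, small_y = make_data(n_small, rng)
    large_x, large_y = make_data(n_large, rng)
    return {
        # Raw inputs, little data: the learner has to find the structure itself.
        "raw_small": accuracy(small_x, small_y, test_x, test_y),
        # Same little data, but with the hand-designed feature injected.
        "hand_small": accuracy(
            hand_feature(small_x), small_y, hand_feature(test_x), test_y
        ),
        # Raw inputs, lots of data: the data can now "speak" for itself.
        "raw_large": accuracy(large_x, large_y, test_x, test_y),
    }


results = run_experiment()
for name, acc in results.items():
    print(f"{name}: {acc:.3f}")
```

With the small training set, the hand-designed feature should score clearly higher than the raw inputs, and the raw-input learner should improve markedly once it is given the large training set. This matches the lecture’s point that hand-designed components tend to help most when the training set is small.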

So if you’re building a new machine learning system and you’re trying to decide whether or not to use end-to-end deep learning, I think the key question is, do you have sufficient data to learn the function of the complexity needed to map from x to y? I don’t have a formal definition of this phrase, complexity needed, but intuitively, if you’re trying to learn a function from x to y, that is looking at an image like this and recognizing the position of the bones in this image, then maybe this seems like a relatively simple problem to identify the bones of the image and maybe they won’t need that much data for that task. Or given a picture of a person, maybe finding the face of that person in the image doesn’t seem like that hard a problem, so maybe you don’t need too much data to find the face of a person. Or at least maybe you can find enough data to solve that task, whereas in contrast, the function needed to look at the hand and map that directly to the age of the child, that seems like a much more complex problem that intuitively maybe you need more data to learn if you were to apply a pure end-to-end deep learning approach.


如果你在构建一个新的机器学习系统 而你在尝试,决定是否使用端到端深度学习。我认为关键的问题是,你有足够的数据,能够直接学到从 x 映射到 y 足够复杂的函数吗?我还没有正式定义过这个词 必要复杂度,但直觉上 如果你想从 x 到 y 的数据学习出一个函数,就是看着这样的图像,识别出图像中所有骨头的位置,那么也许这像是,识别图中骨头这样相对简单的问题,也许系统不需要那么多数据来学会处理这个任务,或给出一张人物照片,也许在图中把人脸找出来不是什么难事,所以你也许不需要太多数据去找到人脸,或者至少你可以找到足够数据去解决这个问题 相对来说,把手的 x 射线照片直接映射到孩子的年龄 直接去找这种函数,直觉上似乎是更为复杂的问题,如果你用纯端到端方法 需要很多数据去学习。

So let me finish this video with a more complex example. You may know that I’ve been spending time helping out an autonomous driving company, Drive.ai. So I’m actually very excited about autonomous driving. So how do you build a car that drives itself? Well, here’s one thing you could do, and this is not an end-to-end deep learning approach. You can take as input an image of what’s in front of your car, maybe radar, LIDAR, or other sensor readings as well, but to simplify the description, let’s just say you take a picture of what’s in front of or what’s around your car. And then to drive your car safely you need to detect other cars and you also need to detect pedestrians. You need to detect other things, of course, but we’ll just present a simplified example here. Having figured out where the other cars and pedestrians are, you then need to plan your own route. So in other words, if you see where the other cars are and where the pedestrians are, you need to decide how to steer your own car, what path to steer your own car along for the next several seconds. And having decided that you’re going to drive a certain path, maybe this is a top-down view of a road and that’s your car. Maybe you’ve decided to drive that path, that’s what a route is, then you need to execute this by generating the appropriate steering, as well as acceleration and braking commands.


视频最后我讲一个更复杂的例子,你可能知道我一直在花时间帮忙,主攻无人驾驶技术的公司 Drive.ai,无人驾驶技术的发展其实让我相当激动,你怎么造出一辆自己能行驶的车呢? 好 这里你可以做一件事,这不是端到端的深度学习方法,你可以把车前方的图像作为输入,也许还有雷达 激光雷达,或者其他传感器的读数,但是为了说明起来简单,我们就说拍一张车前方或者周围的照片,然后驾驶要安全的话 你必须能检测到附近的车,你也需要检测到行人,你需要检测其他的东西 当然,我们这里提供的是高度简化的例子,弄清楚其他车和行人的位置之后,你就需要计划你自己的路线。所以换句话说,当你看到其他车子在哪,行人在哪里 你需要决定如何摆方向盘,在接下来的几秒钟内 引导车子的路径,如果你决定了要走特定的路径,也许这是道路的俯视图 这是你的车,也许你决定了要走那条路线,这是一条路线,那么你就需要摆动你的方向盘到合适的角度,还要发出合适的加速和制动指令。

So in going from your image or your sensory inputs to detecting cars and pedestrians, that can be done pretty well using deep learning, but then having figured out where the other cars and pedestrians are going, to select this route to exactly how you want to move your car, usually that’s not done with deep learning. Instead that’s done with a piece of software called Motion Planning. And if you ever take a course in robotics you’ll learn about motion planning. And then having decided what’s the path you want to steer your car through, there’ll be some other algorithm, we’re going to say it’s a control algorithm, that then generates the exact decision, that then decides exactly how much to turn the steering wheel and how much to step on the accelerator or step on the brake. So I think what this example illustrates is that you want to use machine learning or use deep learning to learn some individual components, and when applying supervised learning you should carefully choose what types of x to y mappings you want to learn depending on what task you can get data for. And in contrast, it is exciting to talk about a pure end-to-end deep learning approach where you input the image and directly output a steering. But given data availability and the types of things we can learn with neural networks today, this is actually not the most promising approach, or this is not an approach that I think teams have gotten to work best. And I think this pure end-to-end deep learning approach is actually less promising than more sophisticated approaches like this, given the availability of data and our ability to train neural networks today.


所以从传感器或图像输入 到检测行人和车辆,深度学习可以做得很好,但一旦知道其他车辆和行人的位置 或者动向,选择一条车要走的路,这通常用的不是深度学习,而是用所谓的运动规划软件完成的,如果你学过机器人课程 你一定知道运动规划,然后决定了你的车子要走的路径之后,还会有一些其他算法,我们说这是一个控制算法 可以产生精确的决策,确定方向盘应该精确地转多少度,油门或刹车上应该用多少力。所以这个例子就表明了,如果你想使用机器学习或者深度学习,来学习某些单独的组件,那么当你应用监督学习时 你应该,仔细选择要学习的 x 到 y 映射类型,这取决于那些任务你可以收集数据,相比之下 空谈纯端到端深度学习方法,是很激动人心的,你输入图像 直接得出方向盘转角,但是就目前能收集到的数据而言,还有我们今天能够用神经网络学习的数据类型而言,这实际上不是最有希望的方法,或者说这个方法并不是团队想出的最好用的方法,而我认为这种纯粹的端到端深度学习方法,其实前景不如这样更复杂的多步方法,因为目前能收集到的数据 还有我们现在训练神经网络的能力 是有局限的。
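The multi-step driving system described above can be sketched as three composed stages: perception, motion planning, and control. This is a toy illustration, not Drive.ai’s system or any real driving stack: every class, function, parameter, and number here is hypothetical, and the perception stage is a placeholder that filters a ready-made object list, whereas in practice it would be a learned detector working on images and sensor readings.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Obstacle:
    kind: str         # "car" or "pedestrian"
    lateral_m: float  # sideways offset from our lane centre (right is positive)
    ahead_m: float    # distance ahead of our car


@dataclass
class Command:
    steering: float   # positive steers right, negative steers left
    throttle: float
    brake: float


def detect_obstacles(frame: List[Obstacle]) -> List[Obstacle]:
    # Perception stage: the part the lecture says deep learning does well.
    # Placeholder only: the "frame" already lists objects, so we just filter.
    return [o for o in frame if o.kind in ("car", "pedestrian")]


def plan_route(obstacles: List[Obstacle], half_width_m: float = 1.5,
               horizon_m: float = 30.0) -> float:
    # Motion-planning stage (not learned): pick a target lateral offset.
    blocking = [o for o in obstacles
                if o.ahead_m <= horizon_m and abs(o.lateral_m) < half_width_m]
    if not blocking:
        return 0.0  # path is clear, stay in the lane centre
    nearest = min(blocking, key=lambda o: o.ahead_m)
    # Shift to the side opposite the nearest blocking obstacle.
    return -half_width_m if nearest.lateral_m >= 0 else half_width_m


def control(target_lateral_m: float, gain: float = 0.2) -> Command:
    # Control stage (not learned): proportional steering toward the target,
    # cutting the throttle and braking lightly whenever we must swerve.
    steering = gain * target_lateral_m
    if target_lateral_m != 0.0:
        return Command(steering=steering, throttle=0.0, brake=0.5)
    return Command(steering=steering, throttle=0.3, brake=0.0)


def drive_step(frame: List[Obstacle]) -> Command:
    # Multi-step pipeline: perception -> motion planning -> control.
    return control(plan_route(detect_obstacles(frame)))


clear_road = drive_step([])
pedestrian_ahead = drive_step([Obstacle("pedestrian", lateral_m=0.5, ahead_m=10.0)])
print(clear_road)
print(pedestrian_ahead)
```

A pure end-to-end alternative would replace `drive_step` with a single learned function from the raw image to a steering command. In this sketch, only the first stage would be learned, which means labelled data is needed only for the detection subtask.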

So that’s it for end-to-end deep learning. It can sometimes work really well, but you also have to be mindful of where you apply end-to-end deep learning. Finally, thank you and congrats on making it this far with me. If you finish last week’s videos and this week’s videos then I think you will already be much smarter and much more strategic and much more able to make good prioritization decisions in terms of how to move forward on your machine learning project, even compared to a lot of machine learning engineers and researchers that I see here in Silicon Valley. So congrats on all that you’ve learned so far, and I hope you now also take a look at this week’s homework problems, which should give you another opportunity to practice these ideas and make sure that you’re mastering them.

这就是端到端的深度学习。有时候效果拔群,但你也要注意应该在什么时候使用端到端深度学习,最后 谢谢你 恭喜你坚持到现在,如果你学完了上周的视频和本周的视频,那么我认为你已经变得更聪明 更具战略性,并能够做出更好的优先分配任务的决策,更好地推动你的机器学习项目,也许比很多机器学习工程师,还和我在硅谷看到的研究人员都强,所以恭喜你学到这里,我希望你能看看本周的作业,应该能再给你一个机会去实践这些理念,并确保你掌握它们。


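Key points:

End-to-end deep learning

Definition:

Traditional data-processing or learning systems consist of multiple processing stages; end-to-end deep learning bypasses these stages and replaces them with a single neural network.

Speech recognition example:

With a small dataset, a traditional feature-extraction pipeline may give good results; with a sufficiently large dataset, end-to-end deep learning delivers great value.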


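Pros and cons:

Pros:

  • End-to-end learning lets the data “speak” directly;
  • Less hand designing of components is needed.

Cons:

  • It needs a large amount of data;
  • It excludes potentially useful hand-designed components.

Key question when applying end-to-end learning: do you have enough data to directly learn a function of the complexity needed to map from x to y?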

References:

[1]. 大树先生. Andrew Ng Coursera Deep Learning course, DeepLearning.ai distilled notes (3-2): Machine Learning Strategy (2)

