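This series adds personal study notes, and occasionally supplementary derivations, to selected points of the original course. If you find any errors, corrections are welcome. Having studied Andrew Ng's course, I have organized it into text for easier reference and review. Because I have been studying English all along, this series is mainly in English, and I also suggest that readers rely mainly on the English, with Chinese as an aid, to prepare for reading academic papers in the field later on. - ZJ
Coursera course | deeplearning.ai | NetEase Cloud Classroom
When reposting, please credit the author and source: ZJ, WeChat Official Account 「SelfImprovementLab」
Zhihu: https://zhuanlan.zhihu.com/c_147249273
CSDN: http://blog.csdn.net/JUNJUN_ZHAO/article/details/79061376
1.3 Basic “recipe” for machine learning
(Subtitle source: NetEase Cloud Classroom)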
In the previous video, you saw how looking at training error and dev error can help you diagnose whether your algorithm has a bias problem, a variance problem, or maybe both. It turns out that this information lets you use what I call a basic recipe for machine learning, which lets you go about improving your algorithm's performance much more systematically. Let's take a look.
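The recipe starts from the two numbers the previous video used for diagnosis. Below is a minimal Python sketch. It assumes error rates are given as fractions and uses a hypothetical tolerance `tol` for what counts as "high", since the lecture gives no fixed threshold. `bayes_err` is an optional estimate of the best achievable error, such as human-level error.

```python
def diagnose(train_err, dev_err, bayes_err=0.0, tol=0.02):
    """Flag high bias / high variance from error rates given as fractions in [0, 1].

    Assumptions (not from the lecture): the `tol` threshold, and using an
    estimate of Bayes (e.g. human-level) error as the baseline for bias.
    """
    avoidable_bias = train_err - bayes_err  # gap between training error and the best achievable
    variance = dev_err - train_err          # how much worse the model does on unseen dev data
    return {"high_bias": avoidable_bias > tol, "high_variance": variance > tol}


# Illustrative error rates, assuming Bayes error is about 0%.
print(diagnose(0.01, 0.11))   # high variance only
print(diagnose(0.15, 0.16))   # high bias only
print(diagnose(0.15, 0.30))   # both
print(diagnose(0.005, 0.01))  # neither
```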
When training a neural network, here's a basic recipe I will use. After having trained an initial model, I will first ask: does the algorithm have high bias? To evaluate whether there is high bias, look at the training set (training data) performance. If it does have high bias, so that it does not even fit the training set that well, some things you could try are picking a bigger network, such as one with more hidden layers or more hidden units, or you could **train it longer**: run training for longer, or try some more advanced optimization algorithms, which we'll talk about later in this course. You can also try something that may or may not work: as we'll see later, there are a lot of different **neural network architectures**, and maybe you can find a new architecture that is better suited to this problem. I put this in parentheses because it's one of those things you just have to try. Maybe you can make it work, maybe not, whereas getting a bigger network almost always helps. Training longer doesn't always help, but it certainly never hurts. So when training a learning algorithm, I would try these things until I can at least get rid of the bias problem, going back after each attempt and repeating until I can at least fit the training set pretty well.
Usually, if you have a big enough network, you should be able to fit the training data well, so long as it's a problem that is possible for someone to do. If the images are very blurry, it may be impossible to fit them. But if at least a human can do well on the task, so that you think Bayes error is not too high, then by training a bigger network you should hopefully be able to do well, at least on the training set: to at least fit, or overfit, the training set.
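The sketch below makes "a bigger network can at least fit or overfit the training set" concrete without training a neural network. It swaps in polynomial least-squares fitting, with polynomial degree standing in for network size. The data (noisy samples of sin(πx)) are made up for illustration. Because the labels are noisy, a degree-9 polynomial through 10 points drives training error to essentially zero. That is overfitting, not evidence of a good model.

```python
import numpy as np

# Hypothetical toy data: 10 noisy samples of sin(pi * x).
rng = np.random.default_rng(0)
x = np.linspace(-1.0, 1.0, 10)
y = np.sin(np.pi * x) + 0.2 * rng.standard_normal(x.size)


def train_mse(degree):
    """Fit a least-squares polynomial of the given degree; return its training MSE."""
    coeffs = np.polyfit(x, y, degree)
    return float(np.mean((np.polyval(coeffs, x) - y) ** 2))


for degree in [1, 3, 5, 9]:
    print(degree, train_mse(degree))  # capacity up, training error down
```

Training MSE never increases as the degree grows, because each lower-degree model is a special case of the higher-degree one. This mirrors the claim that a bigger network almost always helps with bias.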
Once you reduce bias to an acceptable amount, then ask: do you have a variance problem? To evaluate that, I would look at dev set performance. Are you able to generalize from pretty good training set performance to pretty good dev set performance? If you have high variance, the best way to solve a high variance problem is to **get more data**. If you can get it, this can only help. But sometimes you can't get more data. Or you could try regularization, which we'll talk about in the next video, to try to reduce overfitting. And again, sometimes you just have to try it: if you can find a more appropriate neural network architecture, sometimes that can reduce your variance problem as well as your bias problem. How do you do that? It's harder to be totally systematic about it. So I try these things and keep going back until, hopefully, I find something with both low bias and low variance, whereupon I'm done.
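The same polynomial toy setting can check the claim that more data reduces variance. It is again a stand-in for a neural network, and the true function, noise level, and model degree are all assumed. The sketch measures variance in the statistical sense: how much the model's prediction at one point changes when it is retrained on freshly sampled training sets of the same size.

```python
import numpy as np


def prediction_spread(n_train, degree=6, trials=200, x0=0.5, noise=0.3, seed=0):
    """Variance, across resampled training sets, of a fixed-degree polynomial's prediction at x0.

    Illustrative assumptions: true function sin(pi * x), Gaussian label noise,
    and polynomial degree as a stand-in for model capacity.
    """
    rng = np.random.default_rng(seed)
    preds = []
    for _ in range(trials):
        x = rng.uniform(-1.0, 1.0, n_train)
        y = np.sin(np.pi * x) + noise * rng.standard_normal(n_train)
        coeffs = np.polyfit(x, y, degree)
        preds.append(np.polyval(coeffs, x0))
    return float(np.var(preds))


print(prediction_spread(12))   # few examples: predictions swing between training sets
print(prediction_spread(400))  # same model, more data: much smaller spread
```

The model is unchanged between the two calls, so the drop comes from the extra data alone. This is the sense in which more data reduces variance without hurting bias much.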
So, a couple of points to notice. First, depending on whether you have high bias or high variance, the set of things you should try could be quite different. So I'll usually use the training and dev sets to diagnose whether you have a bias or variance problem, and then use that to select the appropriate subset of things to try. For example, if you actually have a high bias problem, getting more training data is not going to help, or at least it's not the most efficient thing to do. So being clear on how much of a bias problem, a variance problem, or both you have can help you focus on selecting the most useful things to try.
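The point that the diagnosis decides which subset of remedies to try can be written as a small lookup. This sketch only restates the remedies named in this lecture. The function name and the strings are illustrative. Bias is handled first, as in the recipe, so "more data" is never suggested while the model still underfits the training set.

```python
def select_remedies(high_bias, high_variance):
    """Return the lecture's remedies for the current diagnosis, fixing bias before variance."""
    if high_bias:
        # Check training set performance first; more data is not the efficient fix here.
        return [
            "bigger network (more hidden layers or hidden units)",
            "train longer / more advanced optimization algorithm",
            "(try a new NN architecture)",
        ]
    if high_variance:
        # Only once the training set is fit well: check dev set performance.
        return [
            "more data",
            "regularization",
            "(try a new NN architecture)",
        ]
    return []  # low bias and low variance: done


print(select_remedies(high_bias=True, high_variance=True))
print(select_remedies(high_bias=False, high_variance=True))
```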
Second, in the earlier era of machine learning, there used to be a lot of discussion of what is called the bias-variance tradeoff. The reason was that, for many of the things you could try, you could increase bias and reduce variance, or reduce bias and increase variance. Back in the pre-deep learning era, we didn't have as many tools that just reduce bias, or just reduce variance, without hurting the other one. But in the modern deep learning, big data era, so long as you can keep training a bigger network, and so long as you can keep getting more data (which isn't always the case for either of these), getting a bigger network almost always just reduces your bias without necessarily hurting your variance, so long as you regularize appropriately. Getting more data pretty much always reduces your variance and doesn't hurt your bias much. So what has really happened is that, with these two steps, training a bigger network or getting more data, we now have tools to drive down bias and just bias, or drive down variance and just variance, without really hurting the other that much. We'll talk about regularization starting from the next video.
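For reference, the tradeoff discussed here has a standard textbook form for squared error. This formal statement is not part of the lecture, which works with training and dev error instead. It assumes y = f(x) + ε with E[ε] = 0 and Var(ε) = σ², noise independent of the fitted model f̂, and expectations taken over training sets and noise.

```latex
\mathbb{E}\big[(y - \hat{f}(x))^2\big]
  = \underbrace{\big(\mathbb{E}[\hat{f}(x)] - f(x)\big)^2}_{\text{bias}^2}
  + \underbrace{\mathbb{E}\big[(\hat{f}(x) - \mathbb{E}[\hat{f}(x)])^2\big]}_{\text{variance}}
  + \underbrace{\sigma^2}_{\text{irreducible error}}
```

Older techniques often traded the first two terms against each other. In this framing, a bigger network mainly targets the bias term, and more data mainly targets the variance term. σ² plays roughly the role of Bayes error.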
I think this has been one of the big reasons deep learning has been so useful for supervised learning. There's much less of this tradeoff where you have to carefully balance bias and variance, because you often have more options for reducing bias or reducing variance without necessarily increasing the other. In fact, so long as you have a well-regularized network, training a bigger network almost never hurts, and the main cost of training a neural network that's too big is just computational time. So I hope this gives you a sense of the basic structure of how to organize your machine learning problem to diagnose bias and variance, and then select the right operation to make progress on your problem. One of the things I mentioned several times in this video is regularization, a very useful technique for reducing variance. There is a little bit of a bias-variance tradeoff when you use regularization: it might increase the bias a little, although often not too much if you have a big enough network. Let's dive into more details in the next video, so you can better understand how to apply regularization to your neural network.
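The remark that regularization "might increase the bias a little" can be seen in a small sketch. It swaps in L2-regularized (ridge) least squares on polynomial features as a stand-in for an L2-regularized neural network, with made-up data. As the regularization strength λ grows, training error, used here as a proxy for bias, can only go up.

```python
import numpy as np

# Hypothetical toy data: 30 noisy samples of sin(pi * x).
rng = np.random.default_rng(1)
x = np.linspace(-1.0, 1.0, 30)
y = np.sin(np.pi * x) + 0.2 * rng.standard_normal(x.size)
X = np.vander(x, 10)  # degree-9 polynomial features, a stand-in for a large network


def ridge_train_mse(lam):
    """Solve w = (X^T X + lam * I)^{-1} X^T y and return the training MSE.

    For simplicity this penalizes every coefficient, including the constant term.
    """
    w = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)
    return float(np.mean((X @ w - y) ** 2))


for lam in [0.0, 1e-3, 1e-1, 10.0]:
    print(lam, ridge_train_mse(lam))  # training error rises as lam grows
```

The rise is small for modest λ and large for heavy regularization. The next video covers how to apply regularization to a neural network itself.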
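Basic recipe for machine learning
When training a machine learning model, the process for addressing high bias and high variance is:
1. High bias? (Check training set performance.) If yes: bigger network, train longer or use a more advanced optimization algorithm, (try a new NN architecture). Repeat until the training set is fit well.
2. High variance? (Check dev set performance.) If yes: more data, regularization, (try a new NN architecture). Repeat, going back to step 1 if needed, until both bias and variance are low.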
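The two steps above can be sketched as a loop. This is a minimal sketch under stated assumptions. `train_and_eval` is a hypothetical callback that trains with a config and returns `(train_err, dev_err)`. The config keys, the doubling and ×10 tweaks, and the shared `target_err` threshold are illustrative choices, not part of the lecture.

```python
from typing import Callable, Dict, Tuple

Config = Dict[str, float]


def basic_recipe(train_and_eval: Callable[[Config], Tuple[float, float]],
                 config: Config, target_err: float = 0.02, max_rounds: int = 10) -> Config:
    """Fix high bias first, then high variance, until both are acceptable (or rounds run out)."""
    for _ in range(max_rounds):
        train_err, dev_err = train_and_eval(config)
        if train_err > target_err:                  # 1. High bias? (training set performance)
            config = {**config,
                      "hidden_units": config["hidden_units"] * 2,  # bigger network
                      "epochs": config["epochs"] * 2}              # train longer
        elif dev_err - train_err > target_err:      # 2. High variance? (dev set performance)
            config = {**config,
                      "n_examples": config["n_examples"] * 2,      # more data, if available
                      "l2_lambda": max(config["l2_lambda"] * 10, 1e-4)}  # regularization
        else:
            break                                   # low bias and low variance: done
    return config


def fake_train_and_eval(cfg: Config) -> Tuple[float, float]:
    """Hypothetical stand-in for real training: capacity lowers training error,
    and more data plus L2 regularization shrink the train/dev gap."""
    train_err = 0.2 / (cfg["hidden_units"] / 16)
    gap = 0.1 / (cfg["n_examples"] / 1000) / (1 + 1000 * cfg["l2_lambda"])
    return train_err, train_err + gap


final = basic_recipe(fake_train_and_eval,
                     {"hidden_units": 16, "epochs": 10, "n_examples": 1000, "l2_lambda": 0.0})
print(final)
```

With this toy evaluator, the loop first grows the network and training time until the training error target is met. Only then does it add data and regularization, following the order of the recipe.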