Machine Learning - VII. Regularization (Week 3)

http://blog.csdn.net/pipisorry/article/details/43966361

Study notes on Andrew Ng's Machine Learning course

Regularization

The Problem of Overfitting

Linear regression example (housing prices)

[Slide 1]

Underfit (or the hypothesis has high bias; figure 1): the algorithm has a very strong preconception, or a very strong bias, that housing prices are going to vary linearly with their size, despite the data to the contrary.

Overfit (figure 3): the term "high variance" is a historical or technical one, but the intuition is that if we fit such a high-order polynomial, the hypothesis can fit almost any function; the space of possible hypotheses is just too large, too variable.

We don't have enough data to constrain it to give us a good hypothesis, and that is called overfitting: the curve tries too hard to fit the training set, so hard that it fails to generalize to new examples and fails to predict prices on new examples.


Logistic regression example with two features x1 and x2 (breast tumor classification)
[Slide 2]

Just right (figure 2): if you add quadratic terms to your features, you can get a decision boundary that looks more like figure 2, and that is a pretty good fit to the data.
Overfit (figure 3): the classifier tries really hard to find a decision boundary that fits the training data, going to great lengths to contort itself to fit every single training example well.
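As a concrete illustration of how such high-order hypotheses arise, here is a minimal Octave sketch (the function name is illustrative, similar in spirit to the mapFeature helper used in the course exercise) that maps two features x1 and x2 into all polynomial terms up to a given degree; with a high degree, logistic regression has enough flexibility to contort its decision boundary around every training example:

% Map two features x1, x2 into all polynomial terms up to 'degree':
% 1, x1, x2, x1^2, x1*x2, x2^2, ..., x2^degree.
function out = map_poly_features(x1, x2, degree)
  out = ones(size(x1, 1), 1);                    % bias (intercept) term
  for i = 1:degree
    for j = 0:i
      out(:, end + 1) = (x1 .^ (i - j)) .* (x2 .^ j);
    end
  end
end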
Addressing overfitting

In the previous examples, we had one- or two-dimensional data, so we could just plot the hypothesis, see what was going on, and select the appropriate degree polynomial.

But that doesn't always work. In fact, more often we have learning problems with a lot of features, where it is not just a matter of selecting the degree of a polynomial: it also becomes much harder to plot the data and visualize it in order to decide which features to keep or throw out.

[Slide 3]

All of these features seem kind of useful, but if we have a lot of features and very little training data, overfitting can become a problem.

Solutions

[Slide 4]

Disadvantage (of reducing the number of features): throwing away some of the features also throws away some of the information you have about the problem.
[Slide 5]

How to detect underfitting & overfitting?

For debugging and diagnosing the things that can go wrong with learning algorithms, and for telling these two types of problems apart, see [X. Advice for Applying Machine Learning].


Cost Function

Intuition

[Slide 6]

Benefits of regularization

[Slide 7]

Note: it is possible to show that having smaller values of the parameters usually corresponds to smoother, simpler functions as well. This is hard to explain fully unless you implement it and see it for yourself.

But the example where making theta_3 and theta_4 small gave us a simpler hypothesis helps explain why, or at least gives some intuition as to why, this is true.

Regularized cost function
[Slide 8]
Regularization term:

1. A term is added at the end of the cost function to shrink every single parameter theta_1 through theta_100 (i.e., every parameter except theta_0).
2. By convention the regularization sum runs from j = 1 through n rather than from j = 0 through n; whether or not you include theta_0, in practice, makes very little difference to the results.
[Slide 9]

Regularization parameter lambda: controls the trade-off between the goal of fitting the training set well and the goal of keeping the parameters small, and therefore keeping the hypothesis relatively simple to avoid overfitting.
With it you can in fact get out a curve that isn't quite a quadratic function, but is much smoother and much simpler, maybe a curve like the magenta line, which gives a much better hypothesis for this data.
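Putting the pieces together, here is a minimal Octave sketch of the regularized linear regression cost (the function name is illustrative, and X is assumed to already contain the column of ones for theta_0):

% Regularized linear regression cost:
% J = (1/(2m)) * sum((X*theta - y).^2) + (lambda/(2m)) * sum(theta_j^2), j >= 1
function J = linear_cost_reg(theta, X, y, lambda)
  m = length(y);
  h = X * theta;                                     % hypothesis / predictions
  J = (1 / (2 * m)) * sum((h - y) .^ 2) ...          % fit-the-data term
      + (lambda / (2 * m)) * sum(theta(2:end) .^ 2); % penalty, excludes theta_0
end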

Problems with setting the regularization parameter lambda too large

[Slide 10]

[Slide 11]

If the regularization parameter lambda is set to a very large value, we end up penalizing the parameters very heavily, so all of the parameters theta_1, ..., theta_n come out close to zero. That is akin to fitting a flat horizontal straight line to the data, which is an example of underfitting.

[Slide 12]

An example of the effect of different lambda settings appears in mlclass-ex2, section 2.5 Optional (ungraded) exercises:

[Slide 13]

[Slide 14]

How can the parameter lambda be chosen automatically?

See: [X. Advice for Applying Machine Learning - Regularization and Bias/Variance]



Regularized Linear Regression

For linear regression, we previously worked out two learning algorithms: gradient descent and the normal equation.
Here we take those two algorithms and generalize them to the case of regularized linear regression.

Gradient descent for regularized linear regression

[Slide 15]

[Slide 16]

Note:

Why theta_0 is kept separate: for regularized linear regression we don't penalize theta_0, so theta_0 is treated slightly differently from the other parameters.
Effect of the (1 - alpha*lambda/m) factor: multiplying theta_j by a number slightly less than 1, say 0.99, has the effect of shrinking theta_j a little bit towards 0, making theta_j a bit smaller.
Interpretation of the regularized update: on every iteration of regularized linear regression we multiply theta_j by a number that is a little bit less than one, shrinking the parameter a little, and then perform a similar gradient update as before.
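A minimal Octave sketch of one such iteration (alpha, lambda, m, X, y and theta are assumed to be defined, with X containing the column of ones):

% One gradient descent step for regularized linear regression.
% theta(1) (i.e. theta_0) is not penalized, so its shrink factor is 1.
grad = (1 / m) * (X' * (X * theta - y));    % unregularized gradient
shrink = ones(size(theta));
shrink(2:end) = 1 - alpha * lambda / m;     % e.g. 0.99: pulls theta_j towards 0
theta = shrink .* theta - alpha * grad;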
Normal equation for regularized linear regression

[Slide 17]
Derivation: using the new definition of J(theta) with the regularization objective, setting the partial derivatives of J(theta) to zero yields theta = (X'X + lambda*L)^(-1) X'y, where L is the (n+1)x(n+1) identity matrix with its top-left entry set to 0. This new formula for theta is the one that gives the global minimum of J(theta).

[Slide 18]

Note: so long as the regularization parameter lambda is strictly greater than zero, it is actually possible to prove that the matrix X'X + lambda*L is not singular, i.e., it is invertible.
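A minimal Octave sketch of this regularized normal equation (X is assumed to already contain the column of ones):

% Regularized normal equation. L is the identity matrix with the
% top-left entry zeroed so that theta_0 is not penalized.
n = size(X, 2);                    % number of parameters (including theta_0)
L = eye(n);
L(1, 1) = 0;
theta = (X' * X + lambda * L) \ (X' * y);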


Regularized Logistic Regression

Where overfitting can come from: more generally, if you run logistic regression with a lot of features, not necessarily polynomial ones but just a lot of features, you can end up with overfitting.

Regularized gradient descent

[Slide 19]

[Slide 20]

[Slide 21] (Note: the term inside the square brackets should be + (lambda/m) * theta_j.)

Regularization with advanced optimization methods

[Slide 22]
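A minimal sketch of how this looks in Octave, assuming a costFunctionReg routine (as in the course exercise) that returns both J and grad, which can then be handed to fminunc:

% Sketch: minimize the regularized logistic regression cost with fminunc.
initial_theta = zeros(size(X, 2), 1);
options = optimset('GradObj', 'on', 'MaxIter', 400);
[theta, J] = fminunc(@(t) costFunctionReg(t, X, y, lambda), initial_theta, options);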

Code for regularized logistic regression

1. Code to compute the regularized cost function:

J = -1/m * (y' * log(sigmoid(X*theta)) + (1-y)' * log(1 - sigmoid(X*theta))) + lambda/(2*m) * (theta'*theta - theta(1)*theta(1));

2. Code to compute the gradient of the regularized cost:

1>        # vectorized (recommended)

grad = 1/m*(X'*(sigmoid(X*theta)-y));
temp = theta;temp(1)=0;
grad = grad+lambda/m*temp;

2>       #vectorized

tmp = X' * (sigmoid(X * theta) - y);
grad = (tmp + lambda * theta) / m;
grad(1) = tmp(1) / m; 

3>        #non-vectorized

grad(1) = 1/m*(sigmoid(X*theta)-y)'*X(:,1);      % theta(1), i.e. theta_0, is not regularized
for i = 2:numel(theta)
    grad(i) = 1/m*(sigmoid(X*theta)-y)'*X(:,i) + lambda/m*theta(i);
end

Note:

end keyword in indexing: one special keyword you can use in indexing is end, which allows you to select columns (or rows) up to the end of the matrix.
For example, A(:, 2:end) returns all elements from the 2nd through the last column of A. You can use this together with the sum and .^ operations to compute the sum of only the elements you are interested in (e.g., sum(z(2:end).^2)).
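A quick, self-contained illustration (the vectors here are just made-up example data):

A = magic(4);
B = A(:, 2:end);             % all rows, columns 2 through the last
z = [3; 1; 4; 1; 5];         % example parameter vector
s = sum(z(2:end) .^ 2);      % sum of squares, skipping the first element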




Review

[Slide 23]

[Slide 24]

[Slide 25]



More:

With linear regression and logistic regression you can form polynomial terms, but it turns out that there are much more powerful nonlinear classifiers than this sort of polynomial regression.

More powerful nonlinear classifiers, such as Neural Networks and Support Vector Machines (SVMs), are covered in later lectures, rather than being built from high-dimensional polynomial features on top of linear/logistic regression.

from:http://blog.csdn.net/pipisorry/article/details/43966361

