M.L.-Classification and Representation

1.Logistic Regression(classification regression)


Linear regression is not well suited to some classification problems, such as deciding whether an email is spam or judging whether a tumor is malignant based on its size.

So there is another algorithm, logistic regression, which takes several features xi and produces an output y with only two possible values: 0 or 1.

Hypothesis Representation


In linear regression, the hypothesis is θ'x, which can be larger than 1 or smaller than 0, so we pass it through the sigmoid function to keep the hypothesis between 0 and 1.

h(x) = g(θ'x), where g(z) = 1 / (1 + e^(-z))
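As a rough illustration, the sigmoid hypothesis can be sketched in a few lines of Python. This assumes the standard definition g(z) = 1 / (1 + e^(-z)) and that x already includes the intercept term x0 = 1. The function names are chosen here for illustration only.

```python
import math

def sigmoid(z):
    """Squash any real z into the range between 0 and 1."""
    # Branch on the sign so math.exp never receives a large positive argument.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def hypothesis(theta, x):
    """h(x) = g(θ'x); x is assumed to start with x0 = 1."""
    z = sum(t * xi for t, xi in zip(theta, x))
    return sigmoid(z)

print(sigmoid(0))                       # the midpoint of the curve
print(hypothesis([0.0, 0.0], [1, 5]))   # all-zero θ gives the same midpoint
```

The output of hypothesis can be read as the estimated probability that y = 1 for the given x.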

Decision Boundary



The decision boundary is the line that separates the area where y = 0 and where y = 1. It is created by our hypothesis function.

The decision boundary can be linear or nonlinear, and sometimes even a complicated curve.

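From the sigmoid function above, g(z) >= 0.5 exactly when z >= 0. So if we define:

h(x) >= 0.5  ->  y = 1 ;

h(x) < 0.5  ->  y = 0 ;

then we predict y = 1 when z >= 0 and y = 0 when z < 0, which means z = 0 is the boundary.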

So if z = θ'x, then θ'x = 0 is the boundary that divides the area into two parts: y = 0 (where θ'x < 0) and y = 1 (where θ'x >= 0). For example, θ'x = θ0*x0 + θ1*x1 + θ2*x2 gives a linear boundary.
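A small Python sketch of this rule follows. The parameter values are hypothetical, picked so that the boundary is the line x1 + x2 = 3.

```python
def predict(theta, x):
    """Return 1 when θ'x >= 0 (equivalently h(x) >= 0.5), otherwise 0."""
    z = sum(t * xi for t, xi in zip(theta, x))
    return 1 if z >= 0 else 0

# Hypothetical parameters: θ'x = -3 + x1 + x2, so the boundary is x1 + x2 = 3.
theta = [-3.0, 1.0, 1.0]

print(predict(theta, [1.0, 1.0, 1.0]))  # x1 + x2 = 2, on the y = 0 side
print(predict(theta, [1.0, 2.0, 2.0]))  # x1 + x2 = 4, on the y = 1 side
```

Note that the sigmoid never has to be evaluated to make the prediction; the sign of θ'x is enough.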

Cost Function


We cannot use the same cost function that we use for linear regression because the Logistic Function will cause the output to be wavy, causing many local optima. In other words, it will not be a convex function.

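So we define the cost function of logistic regression as:

J(θ) = (1/m) * Σ cost(h(x(i)), y(i)), summed over the m training examples

cost(h(x),y) = -log(h(x))      if y = 1

cost(h(x),y) = -log(1-h(x))    if y = 0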

Because y is always 0 or 1, the cost for the two cases y = 1 and y = 0 can be written as a single expression:

cost(h(x),y) = -y*log(h(x)) - (1-y)*log(1-h(x))
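To see how this cost behaves, here is a minimal Python sketch of the single-expression form. The helper names are made up for illustration, and h is assumed to lie strictly between 0 and 1.

```python
import math

def cost(h, y):
    """Cost of predicting probability h (0 < h < 1) when the true label y is 0 or 1."""
    return -y * math.log(h) - (1 - y) * math.log(1 - h)

def total_cost(hs, ys):
    """J = (1/m) * sum of the per-example costs."""
    return sum(cost(h, y) for h, y in zip(hs, ys)) / len(ys)

print(cost(0.9, 1))   # confident and correct: small cost
print(cost(0.1, 1))   # confident and wrong: large cost
```

The cost grows without bound as a prediction approaches the wrong extreme, so confident mistakes are penalized heavily.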

Gradient Descent


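The update rule has the same form as gradient descent for linear regression; the only difference is that h(x) is now the sigmoid of θ'x:

Repeat {
    θj := θj - (α/m) * Σ (h(x(i)) - y(i)) * xj(i)     (update all θj simultaneously)
}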

A vectorized implementation is:

θ := θ - (α/m) * X'(g(Xθ) - y)
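The update can be sketched in plain Python as below. This is only an illustration: the toy data, the learning rate, and the iteration count are all made up, and list comprehensions stand in for the matrix product X'(g(Xθ) - y).

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def gradient_step(theta, X, y, alpha):
    """One simultaneous update of every θj."""
    m = len(X)
    # errors[i] = h(x(i)) - y(i)
    errors = [sigmoid(sum(t * xi for t, xi in zip(theta, row))) - yi
              for row, yi in zip(X, y)]
    # grad[j] = (1/m) * Σ errors[i] * X[i][j]
    grad = [sum(e * row[j] for e, row in zip(errors, X)) / m
            for j in range(len(theta))]
    return [t - alpha * g for t, g in zip(theta, grad)]

# Made-up toy data: x0 = 1 is the intercept term, x1 is the only feature.
X = [[1.0, 0.5], [1.0, 1.0], [1.0, 3.0], [1.0, 3.5]]
y = [0, 0, 1, 1]

theta = [0.0, 0.0]
for _ in range(5000):
    theta = gradient_step(theta, X, y, alpha=0.1)

predictions = [1 if sigmoid(sum(t * xi for t, xi in zip(theta, row))) >= 0.5 else 0
               for row in X]
print(predictions)
```

All entries of the new θ are computed from the old θ before any of them is replaced, which is what "simultaneously" requires.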

Advanced Optimization


"Conjugate gradient", "BFGS", and "L-BFGS" are more sophisticated, faster ways to optimize θ that can be used instead of gradient descent. We suggest that you should not write these more sophisticated algorithms yourself (unless you are an expert in numerical computing) but use the libraries instead, as they're already tested and highly optimized. Octave provides them.

2.Multi-class Classification: One-vs-others


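If we have more than two categories, then instead of y = {0,1} we expand our definition so that y = {0,1,...,n}. We divide the problem into n+1 (+1 because the index starts at 0) binary classification problems; in each one, we predict the probability that y belongs to one particular class rather than any of the others.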

[Figure: one vs all]

To summarize:

Train a logistic regression classifier hθ(x) for each class i to predict the probability that y = i.

To make a prediction on a new x, pick the class that maximizes hθ(x).
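The prediction step can be sketched in Python as follows. The three parameter vectors are hypothetical, hand-picked values rather than trained ones, and the function name is an assumption made for this example.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict_one_vs_all(thetas, x):
    """thetas[i] holds the parameters of the classifier for class i."""
    scores = [sigmoid(sum(t * xi for t, xi in zip(theta, x))) for theta in thetas]
    # Pick the class whose classifier reports the highest probability.
    return max(range(len(scores)), key=lambda i: scores[i])

# Hypothetical parameters for three classes (not the result of training).
thetas = [
    [2.0, -1.0, -1.0],   # class 0: scores high when x1 and x2 are both small
    [-2.0, 2.0, -1.0],   # class 1: scores high when x1 is large
    [-2.0, -1.0, 2.0],   # class 2: scores high when x2 is large
]

print(predict_one_vs_all(thetas, [1.0, 0.0, 0.0]))
print(predict_one_vs_all(thetas, [1.0, 3.0, 0.0]))
print(predict_one_vs_all(thetas, [1.0, 0.0, 3.0]))
```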

3.PROBLEM : Over-fitting


With over-fitting, the hypothesis function predicts the examples in the training set very well but fails to predict unseen data well.

[Figure: three fits of the same data with different numbers of features]

As the figure above shows, the first curve uses too few features, so it does not fit the data well; this is called "under-fitting" or "high bias". The second curve fits the data well. The last curve passes through every example in the training set, but its shape is unreasonably complicated and it may not predict unseen data well; this is called "over-fitting" or "high variance".

What causes over-fitting?

1). Too many features

2). An overly complicated hypothesis function

How can we address it?

1). Reduce the number of features

2). Regularization

- Keep all the features, but reduce the magnitude of the parameters θj.

- Regularization works well when we have a lot of slightly useful features.

Cost Function


[Figure: modified cost function]

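The general regularized form is:

J(θ) = (1/(2m)) * [ Σ (h(x(i)) - y(i))^2 + λ * Σ θj^2 ],  where the second sum runs over j = 1..n

Here λ is the regularization parameter; it controls how strongly large parameter values are penalized.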
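A minimal Python sketch of a regularized squared-error cost is shown below. It assumes the usual convention that the penalty skips θ0, and the toy data is made up.

```python
def regularized_cost(theta, X, y, lam):
    """Squared-error cost plus an L2 penalty on θ1..θn (θ0 is not penalized)."""
    m = len(X)
    squared_errors = sum(
        (sum(t * xi for t, xi in zip(theta, row)) - yi) ** 2
        for row, yi in zip(X, y)
    )
    penalty = lam * sum(t * t for t in theta[1:])
    return (squared_errors + penalty) / (2 * m)

# Made-up data that the line y = x fits exactly.
X = [[1.0, 1.0], [1.0, 2.0]]
y = [1.0, 2.0]

print(regularized_cost([0.0, 1.0], X, y, lam=0.0))  # perfect fit, no penalty
print(regularized_cost([0.0, 1.0], X, y, lam=4.0))  # same fit, now penalized for θ1
```

With a large λ, even a perfect fit carries a cost, so the minimizer is pushed toward smaller parameter values.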

Regularized Linear Regression


Regularization changes the form of both gradient descent and the normal equation.

Gradient Descent

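Repeat {
    θ0 := θ0 - (α/m) * Σ (h(x(i)) - y(i)) * x0(i)
    θj := θj - α * [ (1/m) * Σ (h(x(i)) - y(i)) * xj(i) + (λ/m) * θj ]     for j = 1..n
}

θ0 is updated separately because it is not regularized.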
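One step of regularized gradient descent for linear regression might look like this in Python. It is a sketch under the assumption that θ0 is left unpenalized; the data and hyperparameters are invented for the example.

```python
def regularized_step(theta, X, y, alpha, lam):
    """One simultaneous update; the (λ/m)*θj term is added for j >= 1 only."""
    m = len(X)
    errors = [sum(t * xi for t, xi in zip(theta, row)) - yi
              for row, yi in zip(X, y)]
    updated = []
    for j, t in enumerate(theta):
        grad = sum(e * row[j] for e, row in zip(errors, X)) / m
        if j > 0:
            grad += (lam / m) * t
        updated.append(t - alpha * grad)
    return updated

# Made-up data that θ = [0, 1] already fits exactly, so all errors are zero.
X = [[1.0, 1.0], [1.0, 2.0]]
y = [1.0, 2.0]

print(regularized_step([0.0, 1.0], X, y, alpha=0.1, lam=2.0))
```

Because the errors are zero here, the only change comes from the penalty: θ1 is multiplied by (1 - α*λ/m), which shows the shrinking effect of regularization in isolation.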

Normal Equation

θ = (X'X + λ⋅L)^(-1) X'y

where L is the (n+1)×(n+1) identity matrix with its top-left entry set to 0, so that θ0 is not regularized.

Recall that if m < n, then X'X is non-invertible. However, when we add the term λ⋅L with λ > 0, X'X + λ⋅L becomes invertible.
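The effect on invertibility can be checked on a tiny hand-sized case. The Python sketch below uses a single made-up training example with one feature plus the intercept, so the matrices are 2×2 and can be handled with an explicit determinant. The helper names are assumptions for this example.

```python
def det2(a):
    """Determinant of a 2x2 matrix."""
    return a[0][0] * a[1][1] - a[0][1] * a[1][0]

def solve2(a, b):
    """Solve the 2x2 system a * θ = b with Cramer's rule (requires det2(a) != 0)."""
    d = det2(a)
    return [(b[0] * a[1][1] - a[0][1] * b[1]) / d,
            (a[0][0] * b[1] - a[1][0] * b[0]) / d]

# One made-up example (m = 1): x0 = 1, x1 = 2, y = 3.
X = [[1.0, 2.0]]
y = [3.0]

xtx = [[sum(r[i] * r[j] for r in X) for j in range(2)] for i in range(2)]
xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(2)]

lam = 1.0
L = [[0.0, 0.0], [0.0, 1.0]]   # identity with the top-left entry set to 0
regularized = [[xtx[i][j] + lam * L[i][j] for j in range(2)] for i in range(2)]

print(det2(xtx))           # zero: X'X alone cannot be inverted
print(det2(regularized))   # nonzero once λ⋅L is added
theta = solve2(regularized, xty)
print(theta)
```

In this example the solution fits the single point using θ0 alone and leaves θ1 at zero, since only θ1 is penalized.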

Regularized Logistic Regression


We can regularize logistic regression in much the same way as we regularize linear regression.

J(θ) = -(1/m) * Σ [ y(i)*log(h(x(i))) + (1-y(i))*log(1-h(x(i))) ] + (λ/(2m)) * Σ θj^2,  where the last sum runs over j = 1..n

So the gradient descent update changes as follows, with h(x) = g(θ'x):

Repeat {
    θ0 := θ0 - (α/m) * Σ (h(x(i)) - y(i)) * x0(i)
    θj := θj - α * [ (1/m) * Σ (h(x(i)) - y(i)) * xj(i) + (λ/m) * θj ]     for j = 1..n
}
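To illustrate the effect of λ, the Python sketch below trains the same toy classifier with and without regularization. The data, learning rate, step count, and λ value are all made up for the example.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def regularized_logistic_step(theta, X, y, alpha, lam):
    """One simultaneous update with the sigmoid hypothesis; θ0 is not regularized."""
    m = len(X)
    errors = [sigmoid(sum(t * xi for t, xi in zip(theta, row))) - yi
              for row, yi in zip(X, y)]
    updated = []
    for j, t in enumerate(theta):
        grad = sum(e * row[j] for e, row in zip(errors, X)) / m
        if j > 0:
            grad += (lam / m) * t
        updated.append(t - alpha * grad)
    return updated

def train(X, y, lam, alpha=0.1, steps=2000):
    theta = [0.0] * len(X[0])
    for _ in range(steps):
        theta = regularized_logistic_step(theta, X, y, alpha, lam)
    return theta

# Made-up toy data: x0 = 1 is the intercept term, x1 is the only feature.
X = [[1.0, 0.5], [1.0, 1.0], [1.0, 3.0], [1.0, 3.5]]
y = [0, 0, 1, 1]

theta_plain = train(X, y, lam=0.0)
theta_reg = train(X, y, lam=10.0)
print(theta_plain, theta_reg)
```

The regularized run ends with a smaller θ1, which corresponds to a gentler slope of the sigmoid around the decision boundary.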
