Machine Learning Series: Coursera Week 3, Logistic Regression


1. Classification and Representation

1.1 Classification


Email: Spam / not spam

Online transaction: fraudulent (yes / no)

Tumor: malignant / benign

0: negative class

1: positive class


[Figure 1]

(from Coursera week 3, Classification)

Threshold the classifier output h(x) at 0.5:

If h(x) >= 0.5, predict y = 1

If h(x) < 0.5, predict y = 0

However, after one more sample point is added, the fitted line changes to the blue line in Figure 2:

[Figure 2]

(from Coursera week 3, Classification)

With linear regression, h(x) can be > 1 or < 0, even though y is always 0 or 1.

Logistic regression: 0 <= h(x) <= 1

1.2 Hypothesis representation of Logistic Regression

Logistic Regression model:


h(x) = g(θ^T x)

g(z) = 1 / (1 + e^(-z))   (the sigmoid, or logistic, function)

[Figure 3]

(from Coursera week 3, Hypothesis Representation)

Interpretation of hypothesis output:

h(x) = estimated probability that y = 1 on input x

e.g. if x = [x0; x1] = [1; tumor size]

h(x) = 0.7: tell the patient there is a 70% chance of the tumor being malignant

h(x) = P(y = 1 | x; θ)

P(y = 0 | x; θ) + P(y = 1 | x; θ) = 1
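The hypothesis can be evaluated in a few lines. This is a minimal sketch in Python (not the Octave used in the course), assuming h(x) = g(θ^T x) with g the sigmoid function; the parameter values and tumor size below are hypothetical and chosen only for illustration.

```python
import math

def sigmoid(z):
    """Logistic function g(z) = 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + math.exp(-z))

def hypothesis(theta, x):
    """h(x) = g(theta^T x); x must already include x0 = 1."""
    z = sum(t * xi for t, xi in zip(theta, x))
    return sigmoid(z)

# Hypothetical parameters and input, not taken from the course.
theta = [-4.0, 1.0]
x = [1.0, 5.0]                       # [x0; tumor size]
p_malignant = hypothesis(theta, x)   # P(y = 1 | x; theta)
p_benign = 1.0 - p_malignant         # P(y = 0 | x; theta)
print(p_malignant, p_benign)
```

Whatever θ is, the output stays strictly between 0 and 1, which is what allows it to be read as a probability.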

1.3 Decision boundary

Logistic Regression:

h(x) = g(θ^T x), g(z) = 1 / (1 + e^(-z))

Suppose: predict y = 1 if h(x) >= 0.5, i.e. θ^T x >= 0

 predict y = 0 if h(x) < 0.5, i.e. θ^T x < 0

Decision boundary:

Suppose h(x) = g(θ0 + θ1 x1 + θ2 x2) with θ0 = -3, θ1 = 1, θ2 = 1

Predict y = 1 if -3 + x1 + x2 >= 0, i.e. x1 + x2 >= 3. The decision boundary is the line x1 + x2 = 3, shown in Figure 4:

[Figure 4]

(from Coursera week 3, Decision Boundary)

Non-linear decision boundary:

h(x) = g(θ0 + θ1 x1 + θ2 x2 + θ3 x1^2 + θ4 x2^2)

θ0 = -1, θ1 = 0, θ2 = 0, θ3 = 1, θ4 = 1

Predict y = 1 if -1 + x1^2 + x2^2 >= 0, i.e. x1^2 + x2^2 >= 1. The decision boundary is the unit circle, shown in Figure 5:

[Figure 5]

(from Coursera week 3, Decision Boundary)
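Both decision boundaries can be checked numerically. The sketch below is in Python and uses the θ values from the two examples above; it relies on the fact that h(x) >= 0.5 exactly when θ^T x >= 0, so the sigmoid itself is not needed. The function names and test points are illustrative only.

```python
def predict_linear(x1, x2, theta=(-3.0, 1.0, 1.0)):
    """Predict y = 1 when theta0 + theta1*x1 + theta2*x2 >= 0."""
    z = theta[0] + theta[1] * x1 + theta[2] * x2
    return 1 if z >= 0 else 0

def predict_circle(x1, x2, theta=(-1.0, 0.0, 0.0, 1.0, 1.0)):
    """Features are [1, x1, x2, x1^2, x2^2]; the boundary is x1^2 + x2^2 = 1."""
    features = (1.0, x1, x2, x1 ** 2, x2 ** 2)
    z = sum(t * f for t, f in zip(theta, features))
    return 1 if z >= 0 else 0

print(predict_linear(1, 1))      # x1 + x2 = 2, below the line x1 + x2 = 3
print(predict_linear(2, 2))      # x1 + x2 = 4, above the line
print(predict_circle(0.5, 0.5))  # inside the unit circle
print(predict_circle(1, 1))      # outside the unit circle
```

The boundary is a property of θ and the chosen features, not of the training set; the training set is only used to fit θ.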


2. Logistic Regression Model

2.1 Cost function

Training set: {(x(1), y(1)), (x(2), y(2)), ..., (x(m), y(m))}, m examples

x = [x0; x1; ...; xn], x0 = 1, y ∈ {0, 1}

h(x) = 1 / (1 + e^(-θ^T x))

How do we choose the parameters θ?

Cost function:

If we reuse the squared-error cost from linear regression,

J(θ) = (1/m) Σ_{i=1..m} (1/2) (h(x(i)) - y(i))^2

then, because h is the sigmoid of θ^T x, J(θ) is non-convex, and gradient descent is not guaranteed to find the global minimum.

[Figure 6]

(from Coursera week 3, Cost Function)

Logistic regression cost function:

Cost(h(x), y) = -log(h(x))      if y = 1

Cost(h(x), y) = -log(1 - h(x))  if y = 0

[Figure 7] [Figure 8]

(from Coursera week 3, Cost Function)
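The behaviour of the per-example cost is easy to see numerically. This is a minimal Python sketch, assuming the standard logistic regression cost (-log(h) when y = 1, -log(1 - h) when y = 0); the probabilities passed in are arbitrary illustration values.

```python
import math

def cost(h, y):
    """Cost(h(x), y): -log(h) if y == 1, -log(1 - h) if y == 0."""
    return -math.log(h) if y == 1 else -math.log(1.0 - h)

# A confident correct prediction costs almost nothing,
# a confident wrong prediction is penalized heavily.
print(cost(0.99, 1))   # small
print(cost(0.01, 1))   # large
print(cost(0.01, 0))   # small
```

As h(x) approaches the wrong label the cost grows without bound, which is what pushes the fit away from confident mistakes.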

2.2 Simplified cost function and gradient descent

LR cost function:

J(θ) = -(1/m) Σ_{i=1..m} [ y(i) log h(x(i)) + (1 - y(i)) log(1 - h(x(i))) ]


To fit parameters θ:

Gradient descent:

Repeat {

    θj := θj - α ∂J(θ)/∂θj

}

(simultaneously update all θj)

where ∂J(θ)/∂θj = (1/m) Σ_{i=1..m} (h(x(i)) - y(i)) xj(i)

The update has the same form as for linear regression, but h(x) is now the sigmoid of θ^T x.
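The gradient descent loop can be sketched in plain Python. This is a minimal illustration, assuming the standard update θj := θj - α (1/m) Σ (h(x(i)) - y(i)) xj(i); the toy data set, the learning rate α = 0.1, and the iteration count are made up for the example.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def cost(theta, X, y):
    """J(theta) = -(1/m) * sum(y*log(h) + (1 - y)*log(1 - h))."""
    m = len(y)
    total = 0.0
    for xi, yi in zip(X, y):
        h = sigmoid(sum(t * v for t, v in zip(theta, xi)))
        total += yi * math.log(h) + (1 - yi) * math.log(1 - h)
    return -total / m

def gradient_step(theta, X, y, alpha):
    """One simultaneous update of every theta_j."""
    m = len(y)
    errors = [sigmoid(sum(t * v for t, v in zip(theta, xi))) - yi
              for xi, yi in zip(X, y)]
    grad = [sum(e * xi[j] for e, xi in zip(errors, X)) / m
            for j in range(len(theta))]
    return [t - alpha * g for t, g in zip(theta, grad)]

# Toy data (made up): each row is [x0, x1] with x0 = 1.
X = [[1.0, 0.5], [1.0, 1.5], [1.0, 2.5], [1.0, 3.5]]
y = [0, 0, 1, 1]
theta = [0.0, 0.0]
initial_cost = cost(theta, X, y)
for _ in range(1000):
    theta = gradient_step(theta, X, y, alpha=0.1)
final_cost = cost(theta, X, y)
print(initial_cost, final_cost, theta)
```

The gradient is computed from the old θ before any component is changed, which is what "simultaneously update" means in practice.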

2.3 Advanced optimization

Cost function J(θ); we want min over θ of J(θ).

Given θ, we have code that can compute:

-J(θ)

-the partial derivatives of J(θ) with respect to each θj

Advanced optimization algorithms:

-Conjugate gradient

-BFGS

-L-BFGS

Advantages:

-No need to manually pick α

-Often faster than gradient descent

Disadvantage:

-More complex

Example: θ = [θ1; θ2], J(θ) = (θ1 - 5)^2 + (θ2 - 5)^2, so ∂J/∂θ1 = 2(θ1 - 5) and ∂J/∂θ2 = 2(θ2 - 5).


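% costFunction.m: returns the cost J(theta) and its gradient at theta
function [jVal, grad] = costFunction(theta)
	jVal = (theta(1) - 5)^2 + (theta(2) - 5)^2;   % J(theta)
	grad = zeros(2, 1);
	grad(1) = 2 * (theta(1) - 5);                 % dJ/dtheta1
	grad(2) = 2 * (theta(2) - 5);                 % dJ/dtheta2
end

% 'GradObj', 'on': costFunction supplies the gradient; run at most 100 iterations
options = optimset('GradObj', 'on', 'MaxIter', 100);
initialTheta = zeros(2, 1);
[optTheta, functionVal, exitFlag] = fminunc(@costFunction, initialTheta, options);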
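As a cross-check on what the optimizer should return for this example, the same cost function can be minimized with plain gradient descent. The sketch below is in Python and does not reproduce fminunc; the learning rate 0.1 and the helper name gradient_descent are assumptions made for illustration. The minimum of J(θ) = (θ1 - 5)^2 + (θ2 - 5)^2 is at θ = [5; 5] with J = 0.

```python
def cost_function(theta):
    """Return J(theta) = (theta1 - 5)^2 + (theta2 - 5)^2 and its gradient."""
    j_val = (theta[0] - 5) ** 2 + (theta[1] - 5) ** 2
    grad = [2 * (theta[0] - 5), 2 * (theta[1] - 5)]
    return j_val, grad

def gradient_descent(theta, alpha=0.1, max_iter=100):
    """Plain gradient descent with a hand-picked learning rate alpha."""
    for _ in range(max_iter):
        _, grad = cost_function(theta)
        theta = [t - alpha * g for t, g in zip(theta, grad)]
    return theta

opt_theta = gradient_descent([0.0, 0.0])
final_cost, _ = cost_function(opt_theta)
print(opt_theta, final_cost)
```

Here α had to be chosen by hand, which is exactly the step the advanced optimizers remove.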


3. Multiclass classification: one-vs-all


[Figure 9]

(from Coursera week 3, Multiclass Classification)

Train a logistic regression classifier h^(i)(x) for each class i to predict the probability that y = i.

On a new input x, to make a prediction, pick the class i that maximizes h^(i)(x).
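The prediction step of one-vs-all can be sketched as follows. This Python example assumes one already-trained parameter vector per class, with h^(i)(x) = g(θ(i)^T x); the three parameter vectors and the inputs are hypothetical values, not fitted to any data.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict_one_vs_all(thetas, x):
    """Return (best class index, list of h^(i)(x)) for input x with x0 = 1."""
    scores = [sigmoid(sum(t * v for t, v in zip(theta, x))) for theta in thetas]
    best = max(range(len(scores)), key=lambda i: scores[i])
    return best, scores

# Hypothetical parameters for three classes; x = [x0, x1, x2].
thetas = [
    [-1.0, 2.0, 0.0],    # classifier for class 0
    [-1.0, 0.0, 2.0],    # classifier for class 1
    [1.0, -2.0, -2.0],   # classifier for class 2
]
label_a, scores_a = predict_one_vs_all(thetas, [1.0, 0.2, 1.5])
label_b, scores_b = predict_one_vs_all(thetas, [1.0, 0.0, 0.0])
print(label_a, label_b)
```

The individual scores need not sum to 1, because each classifier is trained independently against "all other classes".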


4. Solving the problem of overfitting

4.1 The problem of overfitting

E.g. Linear regression (housing prices)

[Figure 10]

(from Coursera week 3, The Problem of Overfitting)

In Figure 10, the first plot shows underfitting (high bias) and the third plot shows overfitting (high variance).

addressing overfitting:


(1) Reduce number of features

-Manually select which features to keep

-Model selection algorithm

(2) Regularization

-Keep all the features, but reduce the magnitude of the parameters θ. This works well when we have many features, each of which contributes a bit to predicting y.

4.2 Cost function


[Figure 11]

(from Coursera week 3, Cost Function)

Suppose we penalize θ3 and θ4 and make them really small.

This drives θ3 and θ4 as close to zero as possible: θ3 ≈ 0, θ4 ≈ 0.


small values for parameters θ0, θ1, ... ,θn

-"simpler" hypothesis

-Less prone to overfitting

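For linear regression:

-features: x1, x2, ..., x100

-parameters: θ0, θ1, ..., θ100

J(θ) = (1/(2m)) [ Σ_{i=1..m} (h(x(i)) - y(i))^2 + λ Σ_{j=1..n} θj^2 ]

λ Σ_{j=1..n} θj^2 is called the regularization term, and λ is the regularization parameter. By convention θ0 is not penalized.

Penalizing the parameters too heavily (a very large λ) drives θj ≈ 0 for j = 1, ..., n, so h(x) ≈ θ0, which underfits.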
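The effect of the regularization term can be seen by computing the cost directly. This is a minimal Python sketch, assuming the regularized linear regression cost J(θ) = (1/(2m)) [ Σ (h(x(i)) - y(i))^2 + λ Σ_{j>=1} θj^2 ], with θ0 left out of the penalty; the data and λ values are made up.

```python
def regularized_cost(theta, X, y, lam):
    """Squared-error cost plus lam * sum(theta_j^2 for j >= 1), over 2m."""
    m = len(y)
    squared_error = sum(
        (sum(t * v for t, v in zip(theta, xi)) - yi) ** 2
        for xi, yi in zip(X, y)
    )
    penalty = lam * sum(t ** 2 for t in theta[1:])   # theta0 is not penalized
    return (squared_error + penalty) / (2 * m)

# Toy data (made up): y = x1 exactly, rows are [x0, x1].
X = [[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
y = [1.0, 2.0, 3.0]

print(regularized_cost([0.0, 1.0], X, y, lam=0.0))   # perfect fit, no penalty
print(regularized_cost([0.0, 1.0], X, y, lam=6.0))   # same fit, penalized
print(regularized_cost([2.0, 0.0], X, y, lam=6.0))   # flat line h(x) = theta0
```

With a large enough λ the flat hypothesis becomes cheaper than the perfect fit, which is the underfitting described above.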

4.3 Regularized Linear Regression

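J(θ) = (1/(2m)) [ Σ_{i=1..m} (h(x(i)) - y(i))^2 + λ Σ_{j=1..n} θj^2 ]

Gradient descent:

Repeat {

    θ0 := θ0 - α (1/m) Σ_{i=1..m} (h(x(i)) - y(i)) x0(i)

    θj := θj - α [ (1/m) Σ_{i=1..m} (h(x(i)) - y(i)) xj(i) + (λ/m) θj ]    (j = 1, ..., n)

}

Equivalently, θj := θj (1 - α λ/m) - α (1/m) Σ_{i=1..m} (h(x(i)) - y(i)) xj(i), so each step first shrinks θj slightly.

Normal equation:

θ = (X^T X + λ L)^(-1) X^T y

where L is the (n+1) x (n+1) identity matrix with its top-left entry set to 0.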
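One regularized gradient descent step for linear regression can be sketched as below. This Python example assumes the standard update, in which θ0 is updated without the penalty and every other θj gets an extra (λ/m) θj term in its gradient; the data, α, and λ are illustrative values only.

```python
def regularized_step(theta, X, y, alpha, lam):
    """One simultaneous gradient descent update for regularized linear regression."""
    m = len(y)
    errors = [sum(t * v for t, v in zip(theta, xi)) - yi
              for xi, yi in zip(X, y)]
    new_theta = []
    for j, t in enumerate(theta):
        grad = sum(e * xi[j] for e, xi in zip(errors, X)) / m
        if j > 0:                     # theta0 is not regularized
            grad += (lam / m) * t
        new_theta.append(t - alpha * grad)
    return new_theta

# Toy data (made up): theta = [0, 1] fits it exactly, so only the penalty acts.
X = [[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
y = [1.0, 2.0, 3.0]
print(regularized_step([0.0, 1.0], X, y, alpha=0.1, lam=0.0))
print(regularized_step([0.0, 1.0], X, y, alpha=0.1, lam=3.0))
```

With λ = 0 the perfectly fitting θ does not move, while with λ = 3 the slope is multiplied by (1 - αλ/m) = 0.9, showing the shrinkage that regularization adds to every step.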


4.4 Regularized Logistic Regression

cost function:

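J(θ) = -(1/m) Σ_{i=1..m} [ y(i) log h(x(i)) + (1 - y(i)) log(1 - h(x(i))) ] + (λ/(2m)) Σ_{j=1..n} θj^2

Gradient descent:

Repeat {

    θ0 := θ0 - α (1/m) Σ_{i=1..m} (h(x(i)) - y(i)) x0(i)

    θj := θj - α [ (1/m) Σ_{i=1..m} (h(x(i)) - y(i)) xj(i) + (λ/m) θj ]    (j = 1, ..., n)

}

The update has the same form as for regularized linear regression, but here h(x) = 1 / (1 + e^(-θ^T x)).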
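The regularized logistic regression cost and its gradient can be computed together, which is also the shape an optimizer such as fminunc expects from a cost function. This is a minimal Python sketch, assuming the standard regularized cost with θ0 excluded from the penalty; the function name, data, and λ are made up for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def cost_and_gradient(theta, X, y, lam):
    """Return (J(theta), gradient) for regularized logistic regression."""
    m = len(y)
    hs = [sigmoid(sum(t * v for t, v in zip(theta, xi))) for xi in X]
    data_cost = -sum(yi * math.log(h) + (1 - yi) * math.log(1 - h)
                     for h, yi in zip(hs, y)) / m
    penalty = (lam / (2 * m)) * sum(t ** 2 for t in theta[1:])
    grad = []
    for j, t in enumerate(theta):
        g = sum((h - yi) * xi[j] for h, yi, xi in zip(hs, y, X)) / m
        if j > 0:                     # theta0 is not regularized
            g += (lam / m) * t
        grad.append(g)
    return data_cost + penalty, grad

# Toy data (made up): rows are [x0, x1] with x0 = 1.
X = [[1.0, 0.5], [1.0, 1.5], [1.0, 2.5], [1.0, 3.5]]
y = [0, 0, 1, 1]
plain_cost, plain_grad = cost_and_gradient([0.0, 1.0], X, y, lam=0.0)
reg_cost, reg_grad = cost_and_gradient([0.0, 1.0], X, y, lam=4.0)
print(plain_cost, reg_cost)
print(plain_grad, reg_grad)
```

Comparing the two calls isolates the contribution of λ: the cost rises by (λ/(2m)) θ1^2 and only the θ1 component of the gradient changes.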


