Suppose that you are the administrator of a university department and you want to determine each applicant's chance of admission based on their results on two exams. You have historical data from previous applicants that you can use as a training set for logistic regression. For each training example, you have the applicant's scores on two exams and the admissions decision.
Your task is to build a classification model that estimates an applicant's probability of admission based on the scores from those two exams. This outline and the framework code in ex2.m will guide you through the exercise.
1. The main step is computing the cost function and the gradient.
function [J, grad] = costFunction(theta, X, y)
% Initialize some useful values
m = length(y); % number of training examples
% You need to return the following variables correctly
J = 0;
grad = zeros(size(theta));
Jtmp = 0;
% Step 1: compute the linear term X*theta for every example
hx = X*theta;
% Step 2: apply the sigmoid to get the hypothesis h (m x 1)
h = sigmoid(hx);
% Step 3: accumulate the summation part of the cost function
for i = 1:m
    Jtmp = Jtmp + (-y(i)*log(h(i)) - (1-y(i))*log(1-h(i)));
end
J = (1/m)*Jtmp;
% Step 4: accumulate the summation part of the gradient
sum1 = zeros(size(X,2), 1); % one row per feature
for i = 1:m
    sum1 = sum1 + (h(i)-y(i)).*X(i,:)';
end
grad = (1/m)*sum1;
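The two loops above can also be written as matrix operations. The following is a minimal sketch, assuming X holds one training example per row and y is a column vector of 0/1 labels. The toy data and the name sigmoidVec are illustrative assumptions and are not part of the exercise files.

```octave
% Hypothetical toy data: 3 examples, an intercept column plus one feature.
X = [1 0.5; 1 -1.0; 1 2.0];
y = [1; 0; 1];
theta = zeros(2, 1);

% Element-wise sigmoid (illustrative name, not the exercise's sigmoid.m).
sigmoidVec = @(z) 1 ./ (1 + exp(-z));

m = length(y);
h = sigmoidVec(X * theta);   % m x 1 vector of predicted probabilities

% Cost: the inner products replace the loop over training examples.
J = (1 / m) * (-y' * log(h) - (1 - y)' * log(1 - h));

% Gradient: X' * (h - y) equals the sum over i of (h(i) - y(i)) * X(i,:)'.
grad = (1 / m) * (X' * (h - y));
```

Both versions compute the same quantities; the vectorized form is usually shorter and faster in Octave.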
function g = sigmoid(z)
%SIGMOID Compute sigmoid function
% g = SIGMOID(z) computes the sigmoid of z.
% You need to return the following variables correctly
g = zeros(size(z));
% Apply the sigmoid element by element; exp() works in both Octave and MATLAB
for i = 1:size(z,1)
    for j = 1:size(z,2)
        g(i,j) = 1/(1+exp(-z(i,j)));
    end
end
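Because exp() and ./ operate element-wise, the double loop is not strictly needed. A minimal sketch, using the illustrative name sigmoidVec (an assumption, not a function from the exercise):

```octave
% Element-wise sigmoid for scalars, vectors, and matrices alike.
sigmoidVec = @(z) 1 ./ (1 + exp(-z));

a = sigmoidVec(0);          % scalar input
b = sigmoidVec([1 2 3]);    % row vector input, same shape comes back
c = sigmoidVec(-[1 2 3]');  % column vector input
```

The result always has the same size as the input, which is what the unit tests for sigmoid() rely on.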
3. Call Octave's built-in function fminunc() to obtain the optimal theta and the minimum cost.
[cost, grad] = costFunction(initial_theta, X, y); % cost and gradient at initial_theta
% Set options for fminunc
options = optimset('GradObj', 'on', 'MaxIter', 400); % 'GradObj','on': the objective also returns its gradient
% Run fminunc to obtain the optimal theta
% This function will return theta and the cost
[theta, cost] = ...
fminunc(@(t)(costFunction(t, X, y)), initial_theta, options); % cost is overwritten with the minimum cost at the final theta
prob = sigmoid([1 45 85] * theta);
fprintf(['For a student with scores 45 and 85, we predict an admission ' ...
'probability of %f\n\n'], prob);
function p = predict(theta, X)
m = size(X, 1); % Number of training examples
% You need to return the following variables correctly
p = zeros(m, 1);
% Step 1: compute the linear term X*theta for every example
hx = X*theta;
% Step 2: predict 1 when the sigmoid output is at least 0.5, otherwise 0
for i = 1:m
    if sigmoid(hx(i)) >= 0.5
        p(i) = 1;
    else
        p(i) = 0;
    end
end
% Compute accuracy on our training set
p = predict(theta, X);
fprintf('Train Accuracy: %f\n', mean(double(p == y)) * 100);
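The thresholding in predict can be done without a loop, since a comparison on a vector returns one logical value per element. A minimal sketch, where sigmoidVec and predictVec are illustrative names (assumptions, not functions from the exercise):

```octave
sigmoidVec = @(z) 1 ./ (1 + exp(-z));

% Compare every probability against 0.5 at once; double() turns the
% logical result into 0/1 values like the loop version produces.
predictVec = @(theta, X) double(sigmoidVec(X * theta) >= 0.5);

p1 = predictVec([0 1 0]', magic(3));
p2 = predictVec([2 1 -9]', magic(3));
```

Since the sigmoid is at least 0.5 exactly when its argument is non-negative, the same predictions could also be obtained from X * theta >= 0.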
The problem is described as follows: In this part of the exercise, you will implement regularized logistic regression to predict whether microchips from a fabrication plant pass quality assurance (QA). During QA, each microchip goes through various tests to ensure it is functioning correctly. Suppose you are the product manager of the factory and you have the test results for some microchips on two different tests. From these two tests, you would like to determine whether the microchips should be accepted or rejected. To help you make the decision, you have a dataset of test results on past microchips, from which you can build a logistic regression model. You will use another script, ex2_reg.m, to complete this portion of the exercise.
Plotting the training set gives the following figure:
A linear decision boundary would therefore be certain to underfit, so we need to add features by generating more of them from x1 and x2, going from 3 terms to 28.
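One way to arrive at 28 terms is to take every product of powers of x1 and x2 up to total degree 6. The sketch below assumes that degree; the degree, the toy inputs, and the variable names are assumptions for illustration and are not stated in the text above.

```octave
% Hypothetical inputs: two feature columns with two examples each.
x1 = [0.5; -0.2];
x2 = [1.0; 0.3];
degree = 6;   % assumption: degree 6 yields 28 columns in total

out = ones(size(x1));   % first column is the constant term
for i = 1:degree
  for j = 0:i
    % Append the term x1^(i-j) * x2^j as a new column.
    out(:, end + 1) = (x1 .^ (i - j)) .* (x2 .^ j);
  end
end
nTerms = size(out, 2);
```

The column count is 1 + (2 + 3 + 4 + 5 + 6 + 7) = 28, matching the number of terms mentioned above.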
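However, too many feature terms easily lead to overfitting, so regularization is needed to suppress it. Note that theta0 is not regularized: it is left out of the penalty term in the cost function, and grad(1) gets no regularization term either.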
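Written out, the regularized cost and gradient take the following form. This is a sketch in standard notation, where m is the number of training examples, n is the number of features excluding the constant term, and theta_0 corresponds to theta(1) in Octave's 1-based indexing.

```latex
J(\theta) = \frac{1}{m} \sum_{i=1}^{m} \Big[ -y^{(i)} \log h_\theta(x^{(i)}) - (1 - y^{(i)}) \log\big(1 - h_\theta(x^{(i)})\big) \Big] + \frac{\lambda}{2m} \sum_{j=1}^{n} \theta_j^2

\frac{\partial J}{\partial \theta_0} = \frac{1}{m} \sum_{i=1}^{m} \big( h_\theta(x^{(i)}) - y^{(i)} \big) x_0^{(i)}

\frac{\partial J}{\partial \theta_j} = \frac{1}{m} \sum_{i=1}^{m} \big( h_\theta(x^{(i)}) - y^{(i)} \big) x_j^{(i)} + \frac{\lambda}{m} \theta_j \qquad (j \ge 1)
```

The penalty sum starts at j = 1, which is why theta_0 appears in neither the extra cost term nor the extra gradient term.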
The costFunctionReg function is as follows:
function [J, grad] = costFunctionReg(theta, X, y, lambda)
% Initialize some useful values
m = length(y); % number of training examples
% You need to return the following variables correctly
J = 0;
grad = zeros(size(theta));
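Jtmp = 0;
% Step 1: compute the linear term X*theta for every example
hx = X*theta;
% Step 2: apply the sigmoid to get the hypothesis h (m x 1)
h = sigmoid(hx);
% Step 3: accumulate the summation part of the cost function
for i = 1:m
    Jtmp = Jtmp + (-y(i)*log(h(i)) - (1-y(i))*log(1-h(i)));
end
% Add the penalty term; theta(1) is excluded from regularization
J = (1/m)*Jtmp + (lambda/(2*m))*sum(theta(2:size(X,2)).^2);
% Step 4: accumulate the summation part of the gradient
sum1 = zeros(size(X,2), 1); % one row per feature
for i = 1:m
    sum1 = sum1 + (h(i)-y(i)).*X(i,:)';
end
% grad(1) has no regularization term; the others add (lambda/m)*theta(j)
grad(1) = (1/m)*sum1(1);
grad(2:size(X,2)) = (1/m)*sum1(2:size(X,2)) + (lambda/m).*theta(2:size(X,2));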
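The special handling of the first parameter can be expressed compactly by zeroing it in a copy of theta before applying the penalty. A minimal vectorized sketch, with hypothetical toy data and the illustrative names sigmoidVec and reg (assumptions, not part of the exercise files):

```octave
% Hypothetical toy data: 3 examples, an intercept column plus one feature.
X = [1 0.5; 1 -1.0; 1 2.0];
y = [1; 0; 1];
theta = [0.2; 1.0];
lambda = 3;

sigmoidVec = @(z) 1 ./ (1 + exp(-z));
m = length(y);
h = sigmoidVec(X * theta);

% Copy of theta with the first entry zeroed, so it is never penalized.
reg = theta;
reg(1) = 0;

% Unregularized parts, kept separate so the two contributions are visible.
Junreg = (1 / m) * (-y' * log(h) - (1 - y)' * log(1 - h));
gradUnreg = (1 / m) * (X' * (h - y));

J = Junreg + (lambda / (2 * m)) * sum(reg .^ 2);
grad = gradUnreg + (lambda / m) * reg;
```

Keeping the unregularized and regularized parts separate makes it easy to compare against unit test output that reports the two terms individually.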
Below are unit test results, which can be used to check whether each function is correct.
Note: Unit tests are not required to have the X matrix properly formatted as a set of training examples. Values in the first column may not be exclusively 1's. That is totally OK. The X data is arbitrary, and is provided merely to exercise your cost and gradient functions.
Unit Tests for sigmoid()
% sigmoid() Test Case #1
>> sigmoid([1 2 3])
ans = 0.73106 0.88080 0.95257
% sigmoid() Test Case #2 (updated)
>> sigmoid(-[1 2 3]')
ans =
0.268941
0.119203
0.047426
sigmoid() Test Case #3:
Unit test for costFunction()
Unit tests for predict().
>> predict([0 1 0]',magic(3))
ans =
1
1
1
>> predict([2 1 -9]',magic(3))
ans =
0
0
0
Unit test for costFunctionReg, with X being non-square:
(the first instance is unregularized, the second instance is regularized)
Here are some additional results for the second costFunctionReg() unit test. This splits-out the results for the unregularized and regularized terms for each of Cost and Gradient:
J unregularized term = 4.6832
J regularized term = 3.000
grad unregularized vector:
0.31722
0.87232
1.64812
2.23787
grad regularized vector:
-1.000 ; corresponds to grad(2)
1.000 ; corresponds to grad(3)
2.000 ; corresponds to grad(4)
Programming files link: http://pan.baidu.com/s/1i3FuBD7
PPT link: http://pan.baidu.com/s/1nt1Fps1