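Main contents:
Logistic regression:
A classification problem: the labels (y values) in the training set belong to a finite set, such as {0,1} or {0,...,10}. Examples include deciding whether a patient has cancer (2 classes), handwritten digit recognition (10 classes), and predicting whether a student will fail a course.
Hypothesis function: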
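Take the θ*X term from the linear regression hypothesis and wrap it in an activation function. Activation functions come from neural networks and are usually nonlinear. For this machine learning problem the activation is the sigmoid $g(z) = \frac{1}{1 + e^{-z}}$, where $z = \theta^T x$. (Here θ is a column vector; how z is written depends on the situation and on your own definitions, so it need not take exactly this form.)
The value g that the hypothesis outputs is the predicted probability that y=1. From its graph: when z>=0, g>=0.5 and we predict y=1; when z<0, g<0.5 and we predict y=0.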
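As a quick check of the threshold rule, here is a minimal sketch in plain Python. The names `sigmoid` and `predict_label` and the sample θ and x values are illustrative assumptions, not part of the assignment code.

```python
import math

def sigmoid(z):
    """Logistic function g(z) = 1 / (1 + e^(-z)) for a scalar z."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_label(theta, x):
    """Predict 1 when g(theta^T x) >= 0.5, which is the same as theta^T x >= 0."""
    z = sum(t * xi for t, xi in zip(theta, x))
    return 1 if sigmoid(z) >= 0.5 else 0

# Hypothetical parameters and inputs; x[0] = 1 is the intercept feature.
theta = [-3.0, 1.0, 1.0]
print(sigmoid(0.0))                            # 0.5 at z = 0
print(predict_label(theta, [1.0, 2.0, 2.0]))   # z = 1, so the prediction is 1
print(predict_label(theta, [1.0, 1.0, 1.0]))   # z = -1, so the prediction is 0
```

Because the sigmoid is monotonic, comparing g(z) with 0.5 and comparing z with 0 give the same decision.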
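Cost function:
Its derivation is really a maximum likelihood estimation (this is mentioned in Andrew Ng's other course, Neural Networks and Deep Learning):
$P(y \mid x) = h_\theta(x)^{y} \, (1 - h_\theta(x))^{1-y}$
Take the logarithm of both sides to get the log-likelihood, then maximize it over the training set. This is equivalent to minimizing $J(\theta) = -\frac{1}{m} \sum_{i=1}^{m} \left[ y^{(i)} \log h_\theta(x^{(i)}) + (1 - y^{(i)}) \log\left(1 - h_\theta(x^{(i)})\right) \right]$.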
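Written out step by step, the path from the likelihood to the cost looks roughly like this. The sketch assumes m independent training examples and writes $h_\theta(x)$ for the hypothesis output, following the usual course notation.

```latex
\begin{aligned}
L(\theta) &= \prod_{i=1}^{m} h_\theta(x^{(i)})^{y^{(i)}} \bigl(1 - h_\theta(x^{(i)})\bigr)^{1 - y^{(i)}} \\
\log L(\theta) &= \sum_{i=1}^{m} \Bigl[ y^{(i)} \log h_\theta(x^{(i)}) + (1 - y^{(i)}) \log\bigl(1 - h_\theta(x^{(i)})\bigr) \Bigr] \\
J(\theta) &= -\frac{1}{m} \log L(\theta)
\end{aligned}
```

Maximizing $\log L(\theta)$ is therefore the same as minimizing $J(\theta)$, which is the quantity the costFunction code in the programming assignment computes.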
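Gradient descent:
The update rule has the same form as for linear regression; the only difference is that the hypothesis h(x) is now the sigmoid of θ^T x.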
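To make the update rule concrete, the following is a small sketch of batch gradient descent for logistic regression in plain Python. The function names, the learning rate `alpha`, the iteration count, and the four-point data set are all made up for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def cost(theta, X, y):
    """Average cross-entropy cost J(theta)."""
    m = len(y)
    total = 0.0
    for xi, yi in zip(X, y):
        h = sigmoid(sum(t * v for t, v in zip(theta, xi)))
        total += yi * math.log(h) + (1 - yi) * math.log(1 - h)
    return -total / m

def gradient_step(theta, X, y, alpha):
    """One simultaneous update: theta_j -= alpha * (1/m) * sum((h - y) * x_j)."""
    m = len(y)
    grad = [0.0] * len(theta)
    for xi, yi in zip(X, y):
        err = sigmoid(sum(t * v for t, v in zip(theta, xi))) - yi
        for j, v in enumerate(xi):
            grad[j] += err * v / m
    return [t - alpha * g for t, g in zip(theta, grad)]

# Made-up data; the first column is the intercept feature.
X = [[1.0, 0.5], [1.0, 1.5], [1.0, 2.5], [1.0, 3.5]]
y = [0, 0, 1, 1]
theta = [0.0, 0.0]
before = cost(theta, X, y)
for _ in range(200):
    theta = gradient_step(theta, X, y, alpha=0.1)
after = cost(theta, X, y)
print(before, after)  # the cost should go down
```

The line that computes `err` is the only place where this differs from the linear regression version: the prediction passes through `sigmoid` first.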
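Multi-class classification:
With k classes (k>=3), build k classifiers h(x), one per class. Given a test input, evaluate all k classifiers on it and take the largest output; the class belonging to that classifier is the prediction.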
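The prediction step described above can be sketched as follows. The parameter vectors in `all_theta` are hypothetical hand-picked values rather than trained ones, and `predict_one_vs_all` is an illustrative name.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict_one_vs_all(all_theta, x):
    """Return the index of the classifier with the largest h(x)."""
    scores = [sigmoid(sum(t * v for t, v in zip(theta, x))) for theta in all_theta]
    return max(range(len(scores)), key=scores.__getitem__)

# Hypothetical parameters for k = 3 classifiers; x[0] = 1 is the intercept feature.
all_theta = [
    [2.0, -1.0, -1.0],   # class 0: high score when both features are small
    [-2.0, 2.0, -1.0],   # class 1: high score when the first feature is large
    [-2.0, -1.0, 2.0],   # class 2: high score when the second feature is large
]
print(predict_one_vs_all(all_theta, [1.0, 0.0, 0.0]))
print(predict_one_vs_all(all_theta, [1.0, 3.0, 0.0]))
print(predict_one_vs_all(all_theta, [1.0, 0.0, 3.0]))
```

Training the k classifiers is a separate step: each one is an ordinary binary logistic regression with its own class relabeled as 1 and all other classes as 0.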
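Overfitting:
High variance. It mainly comes from constructing many features to force a fit to the training samples; when new data arrives, a good fit is no longer guaranteed.
Underfitting:
High bias. There are too few features to fit the samples in the training set, so the model naturally cannot fit new data either.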
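Regularization:
One way to deal with overfitting. A "penalty term" is added to the cost function, so that the cost also depends on the size of the θ values attached to the features. As a result, the θ of a feature that contributes little but takes large values is pushed toward 0. (In other words, the θ values of the high-order features have to shrink for the cost J to decrease.)
Regularized cost function and gradient:
Append the regularization term to the cost and its derivative to the gradient; everything else is unchanged. θ_0 is not penalized, which is why the code further down treats the first entry of theta separately.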
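The following plain-Python sketch shows where the two extra terms enter. It assumes the usual convention that the penalty is λ/(2m) times the sum of squared θ_j for j >= 1; the name `cost_and_grad_reg`, the parameter `lam`, and the sample data are illustrative.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def cost_and_grad_reg(theta, X, y, lam):
    """Regularized cost and gradient; theta[0] (the intercept) is not penalized."""
    m = len(y)
    J = 0.0
    grad = [0.0] * len(theta)
    for xi, yi in zip(X, y):
        h = sigmoid(sum(t * v for t, v in zip(theta, xi)))
        J -= (yi * math.log(h) + (1 - yi) * math.log(1 - h)) / m
        for j, v in enumerate(xi):
            grad[j] += (h - yi) * v / m
    # Penalty term and its derivative, skipping j = 0.
    J += lam / (2 * m) * sum(t * t for t in theta[1:])
    for j in range(1, len(theta)):
        grad[j] += lam / m * theta[j]
    return J, grad

# Made-up data and parameters; the first column is the intercept feature.
X = [[1.0, 0.5], [1.0, 1.5], [1.0, 2.5], [1.0, 3.5]]
y = [0, 0, 1, 1]
theta = [0.3, -0.2]
J0, g0 = cost_and_grad_reg(theta, X, y, lam=0.0)
J1, g1 = cost_and_grad_reg(theta, X, y, lam=1.0)
print(J1 - J0)        # the penalty, lam / (2m) * theta[1]^2
print(g1[0] - g0[0])  # the intercept gradient does not change
print(g1[1] - g0[1])  # lam / m * theta[1]
```

With `lam=0.0` the function reduces to the unregularized cost and gradient, which is a convenient sanity check.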
Normal equation with regularization:
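For reference, the regularized normal equation is usually stated in the following form. This is a sketch of the standard formulation, which applies to linear regression (where a closed-form solution exists); the symbols X, y, λ and n (the number of features) are assumed to follow the course notation.

```latex
\theta = \left( X^{T} X + \lambda
\begin{bmatrix}
0 &   &        &   \\
  & 1 &        &   \\
  &   & \ddots &   \\
  &   &        & 1
\end{bmatrix}
\right)^{-1} X^{T} y
```

The matrix multiplying λ is (n+1)×(n+1); the zero in its top-left corner leaves θ_0 unpenalized.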
Programming assignment (MATLAB):
costFunction.m, the cost function:
function [J, grad] = costFunction(theta, X, y)
%COSTFUNCTION Compute cost and gradient for logistic regression
% J = COSTFUNCTION(theta, X, y) computes the cost of using theta as the
% parameter for logistic regression and the gradient of the cost
% w.r.t. to the parameters.
% Initialize some useful values
m = length(y); % number of training examples
% You need to return the following variables correctly
J = 0;
grad = zeros(size(theta));
% ====================== YOUR CODE HERE ======================
% Instructions: Compute the cost of a particular choice of theta.
% You should set J to the cost.
% Compute the partial derivatives and set grad to the partial
% derivatives of the cost w.r.t. each parameter in theta
%
% Note: grad should have the same dimensions as theta
%
newy = [y;(-1*y)+1];
newLoghx = [log(sigmoid(X*theta));log(1-sigmoid(X*theta))];
J = (-1/m)*newy'*newLoghx;
grad = (1/m)*X'*(sigmoid(X*theta)-y);
% =============================================================
end
costFunctionReg.m, the regularized cost function:
function [J, grad] = costFunctionReg(theta, X, y, lambda)
%COSTFUNCTIONREG Compute cost and gradient for logistic regression with regularization
% J = COSTFUNCTIONREG(theta, X, y, lambda) computes the cost of using
% theta as the parameter for regularized logistic regression and the
% gradient of the cost w.r.t. to the parameters.
% Initialize some useful values
m = length(y); % number of training examples
% You need to return the following variables correctly
J = 0;
grad = zeros(size(theta));
% ====================== YOUR CODE HERE ======================
% Instructions: Compute the cost of a particular choice of theta.
% You should set J to the cost.
% Compute the partial derivatives and set grad to the partial
% derivatives of the cost w.r.t. each parameter in theta
theta0_square = theta(1) * theta(1);
punish = (lambda/(2*m)) * (theta' * theta - theta0_square);
newY = [y;(1-y)];
newLoghx = [log(sigmoid(X*theta));log(1-sigmoid(X*theta))];
J = (-1/m) * newY' * newLoghx + punish;
tempM = (1/m) * X' * (sigmoid(X*theta) - y) ;
grad = tempM + (lambda / m) * theta;
grad(1) = tempM(1);
% =============================================================
end
predict.m, prediction:
function p = predict(theta, X)
%PREDICT Predict whether the label is 0 or 1 using learned logistic
%regression parameters theta
% p = PREDICT(theta, X) computes the predictions for X using a
% threshold at 0.5 (i.e., if sigmoid(theta'*x) >= 0.5, predict 1)
m = size(X, 1); % Number of training examples
% You need to return the following variables correctly
p = zeros(m, 1);
% ====================== YOUR CODE HERE ======================
% Instructions: Complete the following code to make predictions using
% your learned logistic regression parameters.
% You should set p to a vector of 0's and 1's
%
p = sigmoid(X * theta);
p(p>=0.5) = 1;
p(p<0.5) = 0;
% =========================================================================
end
sigmoid.m, the activation function:
function g = sigmoid(z)
%SIGMOID Compute sigmoid function
% g = SIGMOID(z) computes the sigmoid of z.
% You need to return the following variables correctly
g = zeros(size(z));
% ====================== YOUR CODE HERE ======================
% Instructions: Compute the sigmoid of each value of z (z can be a matrix,
% vector or scalar).
g = 1 ./ ( 1 + exp(z*(-1)) );
% =============================================================
end
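Python version (with plotting):
Some notes:
1. In Python, matrix operations are done mainly with numpy, but some operations produce a "rank-1" array, i.e. one with shape = (m,). Reshape it to shape = (m, 1), otherwise many errors appear. For example, subtracting an (m, 1) array from an (m,) array does not raise an error; it silently broadcasts to an (m, m) matrix.
2. Plotting in Python uses matplotlib, which works much like plotting in MATLAB. When drawing a contour plot, though, convert the coordinates with np.meshgrid first, e.g. (u,v) = np.meshgrid(u,v); the column-vector coordinates used in the code here will otherwise cause an error.
3. The scipy library has many gradient-based optimization routines whose role is similar to MATLAB's fminunc. The one used here is fmin_tnc, with the cost function and the gradient function written separately and passed as func and fprime. (If fprime is omitted, func has to return both the cost and the gradient.)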
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
import scipy.optimize as op
def loadDataSet(path):
    """Read a comma-separated file whose rows are x1,x2,label."""
    dataMat = []
    labelMat = []
    with open(path) as fr:
        for line in fr:
            fields = line.strip().split(',')
            if len(fields) < 3:
                continue  # skip blank or malformed lines
            dataMat.append([float(fields[0]), float(fields[1])])
            labelMat.append([float(fields[2])])
    return dataMat, labelMat
def sigmoid(z):
    """Element-wise logistic function; z may be a scalar, vector or matrix."""
    z = np.asarray(z, dtype=float)
    return 1 / (1 + np.exp(-z))
def costFunction(theta, X, y):
    """Return the cost J (a float) and the gradient as an (n, 1) column."""
    m = X.shape[0]
    y = np.asarray(y).reshape(m, 1)
    # Force h to (m, 1): theta may be (n, 1) or the flat (n,) array returned by
    # fmin_tnc, and (m,) - (m, 1) would silently broadcast to (m, m).
    h = sigmoid(np.dot(X, theta)).reshape(m, 1)
    J = -1 / m * (np.dot(y.T, np.log(h)) + np.dot((1 - y).T, np.log(1 - h))).item()
    grad = 1 / m * np.dot(X.T, h - y)
    return J, grad
def costFun(theta, X, y):
    """Cost only, for use as func in fmin_tnc."""
    m = X.shape[0]
    y = np.asarray(y).reshape(m, 1)
    h = sigmoid(np.dot(X, theta)).reshape(m, 1)
    return -1 / m * (np.dot(y.T, np.log(h)) + np.dot((1 - y).T, np.log(1 - h))).item()
def gradFun(theta, X, y):
    """Gradient only, for use as fprime in fmin_tnc; returns a flat (n,) array."""
    m = X.shape[0]
    h = sigmoid(np.dot(X, theta)).reshape(m, 1)
    grad = 1 / m * np.dot(X.T, h - np.asarray(y).reshape(m, 1))
    return grad.ravel()
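def predict(theta, X):
    """Predict 1 where sigmoid(X*theta) >= 0.5, else 0; returns an (m, 1) array."""
    m = X.shape[0]
    h = sigmoid(np.dot(X, theta)).reshape(m, 1)
    # Use >= so that a probability of exactly 0.5 maps to 1, as in predict.m.
    return (h >= 0.5).astype(float)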
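def mapFeature(x1, x2):
    """Map two features to all polynomial terms up to degree 6 (28 columns)."""
    x1 = np.asarray(x1, dtype=float).ravel()
    x2 = np.asarray(x2, dtype=float).ravel()
    degree = 6
    # Keep the column of ones first so that theta[0] is the intercept,
    # the entry that the regularized functions leave unpenalized.
    cols = [np.ones(x1.shape[0])]
    for i in range(1, degree + 1):
        for j in range(i + 1):
            cols.append(np.power(x1, i - j) * np.power(x2, j))
    return np.column_stack(cols)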
def costFunctionReg(theta, X, y, lambda2):
    """Regularized cost J (a float) and gradient as an (n, 1) column."""
    m = X.shape[0]
    y = np.asarray(y).reshape(m, 1)
    # Work on a copy: reshape returns a view, so zeroing t[0] in place
    # would also overwrite the caller's theta[0].
    t = np.array(theta, dtype=float).reshape(-1, 1)
    t[0] = 0  # theta[0] is not penalized
    h = sigmoid(np.dot(X, theta)).reshape(m, 1)
    J = -1 / m * (np.dot(y.T, np.log(h)) + np.dot((1 - y).T, np.log(1 - h))).item()
    J += lambda2 / (2 * m) * np.sum(t * t)
    grad = 1 / m * np.dot(X.T, h - y) + lambda2 / m * t
    return J, grad
def cost2Fun(theta, X, y, lambda2):
    """Regularized cost only, for use as func in fmin_tnc."""
    m = X.shape[0]
    y = np.asarray(y).reshape(m, 1)
    t = np.array(theta, dtype=float).reshape(-1, 1)  # copy, so theta itself is untouched
    t[0] = 0  # theta[0] is not penalized
    h = sigmoid(np.dot(X, theta)).reshape(m, 1)
    J = -1 / m * (np.dot(y.T, np.log(h)) + np.dot((1 - y).T, np.log(1 - h))).item()
    return J + lambda2 / (2 * m) * np.sum(t * t)
def grad2Fun(theta, X, y, lambda2):
    """Regularized gradient only, for use as fprime in fmin_tnc; returns a flat (n,) array."""
    m = X.shape[0]
    t = np.array(theta, dtype=float).reshape(-1, 1)  # copy, so theta itself is untouched
    t[0] = 0  # theta[0] is not penalized
    h = sigmoid(np.dot(X, theta)).reshape(m, 1)
    grad = 1 / m * np.dot(X.T, h - np.asarray(y).reshape(m, 1)) + lambda2 / m * t
    return grad.ravel()
if __name__ == "__main__":
    # part 1
    # load data
    dataMat, labelMat = loadDataSet('C:/Users/apple/Desktop/ex2data1.txt')
    dataMat = np.asarray(dataMat)
    labelMat = np.asarray(labelMat)
    # part 1.1 Visualizing the data
    pos = np.where(labelMat == 1)[0]  # row indices of the positive examples
    neg = np.where(labelMat == 0)[0]  # row indices of the negative examples
    plt.scatter(dataMat[pos, 0], dataMat[pos, 1], marker='+', color='black', linewidths=20, edgecolors='none')
    plt.scatter(dataMat[neg, 0], dataMat[neg, 1], marker='o', color='yellow', s=20, edgecolors='gray')
    # part 1.2 Implementation
    # part 1.2.1 sigmoid function
    z = np.zeros([3, 3])
    print(sigmoid(z))
    # part 1.2.2 cost function and gradient
    (m, n) = dataMat.shape
    dataMat = np.column_stack((np.ones(m), dataMat))  # prepend the intercept column
    theta = np.zeros((n + 1, 1))
    cost, grad = costFunction(theta, dataMat, labelMat)
    print("cost at zeros :%f" % cost)
    print("gradient at zeros :")
    print(grad)
    # part 1.2.3 Learning parameters with fmin_tnc (in place of MATLAB's fminunc)
    result = op.fmin_tnc(func=costFun, x0=theta, fprime=gradFun, args=(dataMat, labelMat))
    theta = result[0]  # flat array of shape (n+1,)
    cost, grad = costFunction(theta, dataMat, labelMat)
    print("cost at theta found by fminunc :%f" % cost)
    print("theta found :")
    print(theta)
    # Decision boundary: theta0 + theta1*x1 + theta2*x2 = 0, drawn between the
    # smallest and largest value of the first feature (column 1 in both cases).
    plot_x = np.asarray([dataMat[:, 1].min() - 2, dataMat[:, 1].max() + 2])
    plot_y = (-1 / theta[2]) * (theta[1] * plot_x + theta[0])
    plt.plot(plot_x, plot_y, '-')
    plt.show()
    # part 1.2.4 Evaluating logistic regression
    testScore = np.asarray([1, 45, 85])
    prob = sigmoid(np.dot(testScore, theta))
    print("For a student with scores 45 and 85, we predict an admission probability of %f" % prob)
    p = predict(theta, dataMat)
    print(p.shape)
    print("Train Accuracy: %f" % (np.mean(p == labelMat) * 100))
    # part 2 Regularized logistic regression
    # load data
    data2Mat, label2Mat = loadDataSet('C:/Users/apple/Desktop/ex2data2.txt')
    data2Mat = np.asarray(data2Mat)
    label2Mat = np.asarray(label2Mat)
    # part 2.1 Visualizing the data
    pos = np.where(label2Mat == 1)[0]
    neg = np.where(label2Mat == 0)[0]
    plt.figure()
    plt.scatter(data2Mat[pos, 0], data2Mat[pos, 1], marker='+', color='black', linewidths=20, edgecolors='none')
    plt.scatter(data2Mat[neg, 0], data2Mat[neg, 1], marker='o', color='yellow', s=20, edgecolors='gray')
    plt.show()
    # part 2.2 Feature mapping
    data2Mat = mapFeature(data2Mat[:, 0], data2Mat[:, 1])
    # part 2.3 Cost function and gradient
    theta2 = np.zeros((data2Mat.shape[1], 1))
    lambda2 = 1
    cost2, grad2 = costFunctionReg(theta2, data2Mat, label2Mat, lambda2)
    print("Cost at initial theta (zeros): %f" % cost2)
    # part 2.3.1 Learning parameters with fmin_tnc (in place of MATLAB's fminunc)
    result2 = op.fmin_tnc(func=cost2Fun, x0=theta2, fprime=grad2Fun, args=(data2Mat, label2Mat, lambda2))
    theta2 = result2[0]  # flat array with one entry per mapped feature
    cost2, grad2 = costFunctionReg(theta2, data2Mat, label2Mat, lambda2)
    # part 2.4, 2.5 Plot the decision boundary as the z = 0 contour
    u = np.arange(-1, 1.5, 0.05)
    v = np.arange(-1, 1.5, 0.05)
    u = u.reshape(u.size, 1)
    v = v.reshape(v.size, 1)
    z = np.zeros((u.shape[0], v.shape[0]))
    for i in range(u.size):
        for j in range(v.size):
            # .item() turns the one-element result into a plain float
            z[i, j] = np.dot(mapFeature(u[i], v[j]).reshape(1, theta2.shape[0]), theta2).item()
    z = z.T  # contour expects rows to follow v (the y axis)
    (u, v) = np.meshgrid(u, v)
    plt.contour(u, v, z, [0])
    plt.show()