Andrew Ng's Machine Learning, Week 3 (with the programming assignment and a Python implementation)

Main topics:

Logistic regression:

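Logistic regression addresses classification problems, where the labels (y values) in the training set belong to a finite set such as {0, 1} or {0, ..., 9}. Examples: deciding whether a patient has cancer (2 classes); handwritten digit recognition (10 classes); predicting whether a student will fail a course; and so on.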

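Hypothesis function:

Take the θ*X term from linear regression and wrap it in an activation function. Activation functions come from neural networks and are usually nonlinear. In this setting the activation is the sigmoid function g(z) = \frac{1}{1+e^{-z}}, where z = \Theta^{T}x. (Here θ is a column vector. The exact expression for z depends on the situation and on how you define your data and parameters, so it need not take this form.)

The value g output by the hypothesis is the predicted probability that y = 1. From the graph of g: when z >= 0, g >= 0.5 and we predict y = 1; when z < 0, g < 0.5 and we predict y = 0.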
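As a small illustration of this threshold rule, the sketch below evaluates the sigmoid at a few values of z and applies the 0.5 cutoff. The sample values of z are made up for illustration and are not taken from the course data.

```python
import numpy as np

def sigmoid(z):
    """Logistic function g(z) = 1 / (1 + e^(-z)), applied element-wise."""
    return 1.0 / (1.0 + np.exp(-np.asarray(z, dtype=float)))

# Illustrative inputs (assumed values, not from the assignment)
z = np.array([-2.0, 0.0, 3.0])
g = sigmoid(z)                  # estimated P(y = 1) for each z
pred = (g >= 0.5).astype(int)   # predict 1 exactly when z >= 0

print(g)
print(pred)
```

Because g is increasing and g(0) = 0.5, thresholding g at 0.5 is the same as checking the sign of z.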

Cost function:

J(\Theta) = -\frac{1}{m}\sum_{i=1}^{m}\left[ y^{(i)}\log\left(h_{\Theta}(x^{(i)})\right) + \left(1-y^{(i)}\right)\log\left(1-h_{\Theta}(x^{(i)})\right) \right]

Its derivation is really maximum likelihood estimation (this is mentioned in Andrew Ng's other course, "Neural Networks and Deep Learning"):

P(y|x) = \left(h(x)\right)^{y}\left(1-h(x)\right)^{1-y}

Taking the logarithm of both sides gives the log-likelihood. Maximizing it over the training set, which is the same as minimizing its negative averaged over the m examples, yields the expression for J(θ).
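Written out, and assuming the m training examples are independent, the steps look like this (the symbol \mathcal{L} for the likelihood is introduced here for illustration):

```latex
\mathcal{L}(\Theta) = \prod_{i=1}^{m} \left(h_{\Theta}(x^{(i)})\right)^{y^{(i)}} \left(1-h_{\Theta}(x^{(i)})\right)^{1-y^{(i)}}

\log \mathcal{L}(\Theta) = \sum_{i=1}^{m} \left[ y^{(i)}\log\left(h_{\Theta}(x^{(i)})\right) + \left(1-y^{(i)}\right)\log\left(1-h_{\Theta}(x^{(i)})\right) \right]

J(\Theta) = -\frac{1}{m}\log \mathcal{L}(\Theta)
```

Since J is the log-likelihood multiplied by the negative constant -1/m, maximizing the likelihood and minimizing J select the same Θ.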

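Gradient descent:

The update rule has the same form as the gradient descent update for linear regression. The only difference is that h(x) is now the sigmoid of θ^T x rather than θ^T x itself.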
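A minimal sketch of batch gradient descent for logistic regression follows. The toy data, the learning rate `alpha`, and the iteration count `num_iters` are assumptions chosen for illustration; the assignment itself uses an optimizer rather than a hand-written loop.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost(theta, X, y):
    """Logistic regression cost J(theta) without regularization."""
    h = sigmoid(X @ theta)
    return float(-np.mean(y * np.log(h) + (1 - y) * np.log(1 - h)))

def gradient_descent(X, y, alpha=0.1, num_iters=500):
    """Repeat theta := theta - alpha * (1/m) * X^T (h - y), updating all components at once."""
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(num_iters):
        grad = X.T @ (sigmoid(X @ theta) - y) / m
        theta -= alpha * grad
    return theta

# Toy data (made up): an intercept column of ones plus one feature
X = np.array([[1.0, 0.5], [1.0, 1.5], [1.0, 2.5], [1.0, 3.5]])
y = np.array([0.0, 0.0, 1.0, 1.0])

theta = gradient_descent(X, y)
print(cost(np.zeros(2), X, y), cost(theta, X, y))
```

The gradient line is identical in form to the linear regression one; only the definition of h changes.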

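Multi-class classification:

When there are k classes (k >= 3), build k classifiers h(x), one per class (one-vs-all). Given a test input, evaluate all k classifiers and take the largest output; the class belonging to that classifier is the prediction.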
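The prediction step can be sketched as follows. The parameter matrix `all_theta` (one row per classifier) and its values are hypothetical, chosen only so the result is easy to check by hand; training the k classifiers is not shown.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def predict_one_vs_all(all_theta, X):
    """For each row of X, return the index of the classifier with the largest h(x)."""
    probs = sigmoid(X @ all_theta.T)   # shape (m, k): one column per classifier
    return np.argmax(probs, axis=1)

# Hypothetical parameters for k = 3 classifiers over inputs [1, x1, x2]
all_theta = np.array([[ 1.0, -2.0, -2.0],
                      [-1.0,  3.0, -1.0],
                      [-1.0, -1.0,  3.0]])
X = np.array([[1.0, 0.0, 0.0],
              [1.0, 2.0, 0.0],
              [1.0, 0.0, 2.0]])

print(predict_one_vs_all(all_theta, X))
```

Class indices here start at 0; if the labels are numbered from 1, add 1 to the result.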

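Overfitting:

Overfitting means high variance. It typically comes from constructing many features in order to force a fit to the training samples; when new data arrives, there is no guarantee the model still fits.

Underfitting:

Underfitting means high bias. It typically comes from having too few features: the model cannot fit the samples in the training set, so naturally it cannot fit new data either.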

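Regularization:

Regularization is one way to address overfitting. A "penalty term" \frac{\lambda}{2m}\sum_{j=1}^{n}\Theta_{j}^{2} is added to the cost function, so that the size of the θ attached to each feature contributes to the cost. As a result, the θ values of features that matter little but carry large values are pushed toward 0. (In other words, the θ values of the high-order features have to shrink for the cost J to decrease.) Note that the sum starts at j = 1, so θ_0 is not penalized.

Cost function and gradient with regularization:

Add the regularization term as the last term of the cost function and add its derivative to the gradient; everything else is unchanged.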
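Written out with the penalty term given above, the regularized cost and its partial derivatives are as follows (x_0 = 1 is the intercept feature, and θ_0 is not regularized):

```latex
J(\Theta) = -\frac{1}{m}\sum_{i=1}^{m}\left[ y^{(i)}\log\left(h_{\Theta}(x^{(i)})\right) + \left(1-y^{(i)}\right)\log\left(1-h_{\Theta}(x^{(i)})\right) \right] + \frac{\lambda}{2m}\sum_{j=1}^{n}\Theta_{j}^{2}

\frac{\partial J}{\partial \Theta_{0}} = \frac{1}{m}\sum_{i=1}^{m}\left(h_{\Theta}(x^{(i)}) - y^{(i)}\right)x_{0}^{(i)}

\frac{\partial J}{\partial \Theta_{j}} = \frac{1}{m}\sum_{i=1}^{m}\left(h_{\Theta}(x^{(i)}) - y^{(i)}\right)x_{j}^{(i)} + \frac{\lambda}{m}\Theta_{j}, \quad j \geq 1
```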

Normal equation with regularization:

\Theta = \left(X^{T}X + \lambda L\right)^{-1}X^{T}y

where L is the (n+1)×(n+1) identity matrix with its top-left 1 set to 0, that is, L = \mathrm{diag}(0, 1, 1, \ldots, 1).
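The normal equation applies to linear regression, so the sketch below uses a tiny linear regression example. The function name `normal_equation_reg` and the toy data are assumptions for illustration. It solves the linear system directly rather than forming the inverse, which gives the same Θ.

```python
import numpy as np

def normal_equation_reg(X, y, lam):
    """Solve (X^T X + lam * L) theta = X^T y, where L is the identity with L[0, 0] = 0."""
    L = np.eye(X.shape[1])
    L[0, 0] = 0.0   # the intercept term is not regularized
    return np.linalg.solve(X.T @ X + lam * L, X.T @ y)

# Toy data (made up): y = 1 + 2x exactly, with an intercept column of ones
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y = np.array([1.0, 3.0, 5.0, 7.0])

theta_plain = normal_equation_reg(X, y, 0.0)    # no regularization
theta_reg = normal_equation_reg(X, y, 10.0)     # slope is shrunk toward 0

print(theta_plain)
print(theta_reg)
```

With λ = 0 the exact line is recovered; with λ = 10 the slope shrinks while the intercept is free to compensate.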

 

Programming assignment (MATLAB):

costFunction.m, the cost function:

function [J, grad] = costFunction(theta, X, y)
%COSTFUNCTION Compute cost and gradient for logistic regression
%   J = COSTFUNCTION(theta, X, y) computes the cost of using theta as the
%   parameter for logistic regression and the gradient of the cost
%   w.r.t. to the parameters.

% Initialize some useful values
m = length(y); % number of training examples

% You need to return the following variables correctly 
J = 0;
grad = zeros(size(theta));

% ====================== YOUR CODE HERE ======================
% Instructions: Compute the cost of a particular choice of theta.
%               You should set J to the cost.
%               Compute the partial derivatives and set grad to the partial
%               derivatives of the cost w.r.t. each parameter in theta
%
% Note: grad should have the same dimensions as theta
%

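% Stack y and (1-y) together with the matching log terms, so that a single
% inner product gives sum(y.*log(h) + (1-y).*log(1-h)).
h = sigmoid(X*theta);
newy = [y; 1-y];
newLoghx = [log(h); log(1-h)];

J = (-1/m)*newy'*newLoghx;

grad = (1/m)*X'*(h-y);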



% =============================================================

end

 

costFunctionReg.m, the cost function with regularization:

function [J, grad] = costFunctionReg(theta, X, y, lambda)
%COSTFUNCTIONREG Compute cost and gradient for logistic regression with regularization
%   J = COSTFUNCTIONREG(theta, X, y, lambda) computes the cost of using
%   theta as the parameter for regularized logistic regression and the
%   gradient of the cost w.r.t. to the parameters. 

% Initialize some useful values
m = length(y); % number of training examples

% You need to return the following variables correctly 
J = 0;
grad = zeros(size(theta));

% ====================== YOUR CODE HERE ======================
% Instructions: Compute the cost of a particular choice of theta.
%               You should set J to the cost.
%               Compute the partial derivatives and set grad to the partial
%               derivatives of the cost w.r.t. each parameter in theta

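% Penalty term: theta(1), the intercept, is excluded from regularization.
theta0_square = theta(1) * theta(1);
punish = (lambda/(2*m)) * (theta' * theta - theta0_square);

% Stack y and (1-y) with the matching log terms so one inner product gives the sum.
h = sigmoid(X*theta);
newY = [y;(1-y)];
newLoghx = [log(h);log(1-h)];

J = (-1/m) * newY' * newLoghx + punish;

% Gradient: add (lambda/m)*theta to every component except the first.
tempM =  (1/m) * X' * (h - y) ;
grad = tempM + (lambda / m) * theta;
grad(1) = tempM(1);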


    



% =============================================================

end

 

predict.m, prediction:

function p = predict(theta, X)
%PREDICT Predict whether the label is 0 or 1 using learned logistic 
%regression parameters theta
%   p = PREDICT(theta, X) computes the predictions for X using a 
%   threshold at 0.5 (i.e., if sigmoid(theta'*x) >= 0.5, predict 1)

m = size(X, 1); % Number of training examples

% You need to return the following variables correctly
p = zeros(m, 1);

% ====================== YOUR CODE HERE ======================
% Instructions: Complete the following code to make predictions using
%               your learned logistic regression parameters. 
%               You should set p to a vector of 0's and 1's
%

p = sigmoid(X * theta);
p(p>=0.5) = 1;
p(p<0.5) = 0;







% =========================================================================


end

 

sigmoid.m, the activation function:

function g = sigmoid(z)
%SIGMOID Compute sigmoid function
%   g = SIGMOID(z) computes the sigmoid of z.

% You need to return the following variables correctly 
g = zeros(size(z));

% ====================== YOUR CODE HERE ======================
% Instructions: Compute the sigmoid of each value of z (z can be a matrix,
%               vector or scalar).

g = 1 ./ ( 1 + exp(z*(-1)) );



% =============================================================

end

 

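Python version (with plots):

Some notes:

1. In Python, matrix operations are done mainly with numpy, but some operations produce a one-dimensional ("rank-1") array, that is, one with shape = (m,). Reshape it to shape = (m, 1) wherever a column vector is expected; otherwise broadcasting can quietly produce the wrong shape and many errors follow.

2. Plotting in Python uses matplotlib, which is much like plotting in MATLAB. When drawing a contour plot with contour, make sure the coordinate arrays match the shape of z; the simple way is to convert the coordinates with np.meshgrid first. For example:

(u,v) = np.meshgrid(u,v). Otherwise the shapes used in the code below do not match and contour raises an error.

3. Python's scipy library has many optimization routines whose role is similar to MATLAB's fminunc. The code here uses fmin_tnc and passes the cost function and the gradient function separately (as func and fprime).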
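Notes 1 and 2 can be checked with a few lines of numpy. The arrays below are made-up values chosen only to show the shapes involved.

```python
import numpy as np

# Note 1: a shape (m,) array minus a shape (m, 1) array broadcasts to (m, m)
h = np.array([0.2, 0.7, 0.9])          # shape (3,), e.g. np.dot(X, theta) with a 1-D theta
y = np.array([[0.0], [1.0], [1.0]])    # shape (3, 1), a column of labels

wrong = h - y                    # silently becomes a 3x3 matrix
right = h.reshape(-1, 1) - y     # shape (3, 1), the intended residual
print(wrong.shape, right.shape)

# Note 2: meshgrid turns two 1-D axes into 2-D coordinate grids
u = np.linspace(-1, 1, 4)
v = np.linspace(-1, 1, 5)
U, V = np.meshgrid(u, v)         # both have shape (len(v), len(u))
print(U.shape, V.shape)
```

The first case raises no error, which is why it is easy to miss: the subtraction succeeds but the result has m*m entries instead of m.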

import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
import scipy.optimize as op

def loadDataSet(path):
    # Read a comma-separated file: two feature columns followed by one label column.
    dataMat = []
    labelMat = []
    with open(path) as fr:
        for line in fr:
            fields = line.strip().split(',')
            if len(fields) < 3:
                continue  # skip blank or malformed lines
            dataMat.append([float(fields[0]), float(fields[1])])
            labelMat.append([float(fields[2])])
    return dataMat, labelMat

def sigmoid(z):
    # Element-wise logistic function; z may be a scalar, vector, or matrix.
    z = np.asarray(z, dtype=float)
    return 1 / (1 + np.exp(-z))

def costFunction(theta, X, y):
    # Cost and gradient of logistic regression (no regularization).
    m = X.shape[0]
    theta = np.asarray(theta).reshape(-1, 1)   # accept shape (n,) or (n, 1)
    y = np.asarray(y).reshape(-1, 1)
    h = sigmoid(np.dot(X, theta))              # (m, 1) predicted probabilities
    J = -1 / m * (np.dot(y.T, np.log(h)) + np.dot((1 - y).T, np.log(1 - h)))
    grad = 1 / m * np.dot(X.T, h - y)
    return J.item(), grad

def costFun(theta, X, y):
    # Cost only; passed as func to op.fmin_tnc, which supplies theta as a 1-D array.
    m = X.shape[0]
    theta = np.asarray(theta).reshape(-1, 1)
    y = np.asarray(y).reshape(-1, 1)
    h = sigmoid(np.dot(X, theta))
    J = -1 / m * (np.dot(y.T, np.log(h)) + np.dot((1 - y).T, np.log(1 - h)))
    return J.item()

def gradFun(theta, X, y):
    # Gradient only; passed as fprime to op.fmin_tnc. Returns a 1-D array.
    m = X.shape[0]
    theta = np.asarray(theta).reshape(-1, 1)
    y = np.asarray(y).reshape(-1, 1)
    h = sigmoid(np.dot(X, theta))
    return (1 / m * np.dot(X.T, h - y)).ravel()

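def predict(theta, X):
    # Predict 1 when the estimated probability is at least 0.5, otherwise 0.
    m = X.shape[0]
    p = np.asarray(sigmoid(np.dot(X, theta))).reshape(m, 1)
    p = (p >= 0.5).astype(float)   # >= so that p == 0.5 maps to 1, as in predict.m
    return p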


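def mapFeature(x1, x2):
    # Map two features to all polynomial terms up to degree 6.
    # The column of ones (intercept) comes first, so theta[0] is the bias term.
    x1 = np.asarray(x1, dtype=float).reshape(-1)
    x2 = np.asarray(x2, dtype=float).reshape(-1)
    degree = 6
    out = np.ones((x1.shape[0], 1))
    for i in range(1, degree + 1):
        for j in range(i + 1):
            out = np.column_stack((out, np.power(x1, i - j) * np.power(x2, j)))

    return out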

def costFunctionReg(theta, X, y, lambda2):
    # Cost and gradient of regularized logistic regression.
    m = X.shape[0]
    theta = np.asarray(theta, dtype=float).reshape(-1, 1)
    y = np.asarray(y).reshape(-1, 1)
    t = theta.copy()   # copy, so zeroing t[0] does not modify theta itself
    t[0] = 0           # theta[0] is not regularized
    h = sigmoid(np.dot(X, theta))
    J = -1 / m * (np.dot(y.T, np.log(h)) + np.dot((1 - y).T, np.log(1 - h))) + lambda2 / (2 * m) * np.sum(t * t)
    grad = 1 / m * np.dot(X.T, h - y) + lambda2 / m * t
    return J.item(), grad

def cost2Fun(theta, X, y, lambda2):
    # Regularized cost only; passed as func to op.fmin_tnc.
    m = X.shape[0]
    theta = np.asarray(theta, dtype=float).reshape(-1, 1)
    y = np.asarray(y).reshape(-1, 1)
    t = theta.copy()   # copy, so zeroing t[0] does not modify the optimizer's theta
    t[0] = 0           # theta[0] is not regularized
    h = sigmoid(np.dot(X, theta))
    J = -1 / m * (np.dot(y.T, np.log(h)) + np.dot((1 - y).T, np.log(1 - h))) + lambda2 / (2 * m) * np.sum(t * t)
    return J.item()


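def grad2Fun(theta, X, y, lambda2):
    # Regularized gradient only; passed as fprime to op.fmin_tnc. Returns a 1-D array.
    m = X.shape[0]
    theta = np.asarray(theta, dtype=float).reshape(-1, 1)
    y = np.asarray(y).reshape(-1, 1)
    t = theta.copy()   # copy, so zeroing t[0] does not modify the optimizer's theta
    t[0] = 0           # theta[0] is not regularized
    h = sigmoid(np.dot(X, theta))
    return (1 / m * np.dot(X.T, h - y) + lambda2 / m * t).ravel()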


if __name__ == "__main__":

    #part 1
    #load data
    [dataMat , labelMat] = loadDataSet('C:/Users/apple/Desktop/ex2data1.txt')
    dataMat = np.asarray(dataMat)
    #print(dataMat.shape)
    labelMat = np.asarray(labelMat)
    #print(labelMat.shape)

    #part 1.1 Visualizing the data
    pos = np.asarray(np.where(labelMat==1)[0]).reshape(np.sum(labelMat==1),1)
    neg = np.asarray(np.where(labelMat==0)[0]).reshape(np.sum(labelMat==0),1)
    plt.scatter(dataMat[pos,0],dataMat[pos,1],marker='+',color='black',s=20)
    plt.scatter(dataMat[neg,0], dataMat[neg,1], marker='o', color='yellow',  s=20 , edgecolors='gray')
    #plt.show()

    #part1.2 Implementation
    #part1.2.1 sigmoid function
    z = np.zeros([3,3])
    print(sigmoid(z))

    #part1.2.2 cost function and gradient
    (m , n) = dataMat.shape
    dataMat = np.column_stack((np.ones(m),dataMat))
    theta = np.zeros((n+1,1))
    [cost,grad] = costFunction(theta , dataMat , labelMat)
    print("cost at zeros :%f" % cost)
    print("gradient at zeros :" )
    print(grad)

    #part1.2.3 Learning parameters using fminunc
    result = op.fmin_tnc(func=costFun , x0=theta , fprime=gradFun , args=(dataMat,labelMat))
    theta = result[0]
    [cost, grad] = costFunction(theta, dataMat, labelMat)
    print("cost at theta found by fminunc :%f" % cost)
    print("theta found :")
    print(theta)

    # Both endpoints of the boundary line come from the first feature (column 1)
    plot_x = [[dataMat[:,1].min()-2],[dataMat[:,1].max()+2]]
    plot_x = np.asarray(plot_x)
    plot_y = (-1 / theta[2]) * (theta[1] * plot_x + theta[0])
    plot_y = np.asarray(plot_y)
    plt.plot(plot_x , plot_y , '-')
    #plt.xlim((30,100))
    #plt.ylim((30,100))
    plt.show()

    #part 1.2.4 Evaluating logistic regression
    testScore = [1,45,85]
    testScore = np.asarray(testScore)
    prob = sigmoid(np.dot(testScore,theta))
    print("For a student with scores 45 and 85, we predict an admission probability of %f" % prob)

    p = predict(theta,dataMat)
    print(p.shape)
    print("Train Accuracy: %f" % (np.mean((p==labelMat)) * 100) )

    #part 2 Regularized logistic regression
    #load data
    [data2Mat, label2Mat] = loadDataSet('C:/Users/apple/Desktop/ex2data2.txt')
    data2Mat = np.asarray(data2Mat)
    # print(data2Mat.shape)
    label2Mat = np.asarray(label2Mat)
    # print(label2Mat.shape)

    #part2.1 Visualizing the data
    pos = np.asarray(np.where(label2Mat == 1)[0]).reshape(np.sum(label2Mat == 1), 1)
    neg = np.asarray(np.where(label2Mat==0)[0]).reshape(np.sum(label2Mat==0),1)
    plt.figure()
    plt.scatter(data2Mat[pos,0],data2Mat[pos,1],marker='+',color='black',s=20)
    plt.scatter(data2Mat[neg,0], data2Mat[neg,1], marker='o', color='yellow',  s=20 , edgecolors='gray')
    plt.show()

    #part2.2
    data2Mat = mapFeature(data2Mat[:,0],data2Mat[:,1])

    #part2.3
    theta2 = np.zeros((data2Mat.shape[1],1))
    lambda2 = 1
    [cost2,grad2] = costFunctionReg(theta2,data2Mat,label2Mat,lambda2)
    print("Cost at initial theta (zeros): %f" % cost2)

    #part2.3.1 Learning parameters using fminunc
    result2 = op.fmin_tnc(func=cost2Fun, x0=theta2, fprime=grad2Fun, args=(data2Mat, label2Mat,lambda2))
    theta2 = result2[0]
    [cost2, grad2] = costFunctionReg(theta2, data2Mat, label2Mat,lambda2)
    # print("cost2 at theta found by fminunc :%f" % cost2)
    # print("theta2 found :")
    # print(theta2)

    #part2.4,2.5 plot
    u = np.arange(-1 , 1.5 , 0.05)
    v = np.arange(-1 , 1.5 , 0.05)
    u = u.reshape(u.size,1)
    v = v.reshape(v.size,1)
    z = np.zeros((u.shape[0] , v.shape[0]))

    for i in range(u.size):
        for j in range(v.size):
            # .item() converts the one-element result to a plain float before storing it
            z[i,j] = np.dot( mapFeature(u[i],v[j]).reshape(1,theta2.shape[0]) , theta2).item()
    z = z.T
    (u,v) = np.meshgrid(u,v)
    plt.contour(u,v,z,[0])
    plt.show()








 
