Linear Regression

I am writing this post in memory of days gone by.

Formulas

For a given set of data, the linear model is:

$$Y = \theta^{T} X$$

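The loss function is given below. It is half the sum of the squared differences between the fitted values and the actual values over the m training samples. For why this particular function is used as the loss, see the Stanford University machine learning course.

$$J(\theta) = \frac{1}{2} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2$$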
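
To make the loss concrete, here is a minimal sketch that evaluates J(θ) with NumPy. The function name `squared_error_cost` and the tiny data set are made up for illustration; they are not part of the original program.

```python
import numpy as np

def squared_error_cost(theta, X, y):
    """Return J(theta) = 1/2 * sum((X @ theta - y)^2)."""
    residuals = X @ theta - y      # h_theta(x^(i)) - y^(i) for every sample
    return 0.5 * float(residuals @ residuals)

# Made-up data: the first column is the constant 1 for the intercept term.
X = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y = np.array([2.0, 4.0, 6.0])

perfect = squared_error_cost(np.array([0.0, 2.0]), X, y)  # the line y = 2x fits exactly
worse = squared_error_cost(np.array([0.0, 1.0]), X, y)    # the line y = x underestimates
print(perfect, worse)
```

A better fit gives a smaller J, which is why the fitting problem becomes a minimization problem.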

For the fit, the smaller the loss the better, that is:

$$\min_{\theta} J(\theta)$$

There are two algorithms for solving this minimization:
1. Least squares (the normal equation)

$$\theta = (X^{T} X)^{-1} X^{T} Y$$
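
As a rough check of the closed form, the sketch below solves the same equation on the five data points used later in this post. It assumes plain NumPy arrays rather than the matrix type used in the program, and it solves the linear system instead of forming the inverse, which is usually the numerically safer choice.

```python
import numpy as np

# Data from this post: column 0 is the intercept term, column 1 is the single feature.
X = np.array([[1, 2104], [1, 1600], [1, 2400], [1, 1416], [1, 3000]], dtype=float)
y = np.array([400, 330, 369, 232, 540], dtype=float)

# Solve (X^T X) theta = X^T y without computing the inverse explicitly.
theta = np.linalg.solve(X.T @ X, X.T @ y)

# np.linalg.lstsq minimizes the same squared error, so it should agree.
theta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)
print(theta, theta_lstsq)
```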

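2. Gradient descent

For a single training sample (x, y), the partial derivative of the loss with respect to θ_j is:

$$\frac{\partial J(\theta)}{\partial \theta_j} = \left( h_\theta(x) - y \right) x_j$$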
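
For readers who want the missing step, here is a short derivation of that derivative. It assumes a single training sample (x, y) and the linear hypothesis h_θ(x) = Σ_k θ_k x_k, so the sum over samples drops out:

$$
\begin{aligned}
\frac{\partial}{\partial \theta_j} J(\theta)
  &= \frac{\partial}{\partial \theta_j} \frac{1}{2} \left( h_\theta(x) - y \right)^2 \\
  &= \left( h_\theta(x) - y \right) \cdot \frac{\partial}{\partial \theta_j} \left( \sum_{k} \theta_k x_k - y \right) \\
  &= \left( h_\theta(x) - y \right) x_j
\end{aligned}
$$

Gradient descent moves against this derivative, θ_j := θ_j − α ∂J(θ)/∂θ_j. Flipping the sign inside the parentheses gives the equivalent form θ_j := θ_j + α (y − h_θ(x)) x_j.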

$$\theta_j := \theta_j + \alpha \left( y^{(i)} - h_\theta(x^{(i)}) \right) x_j^{(i)}$$

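Here alpha is the learning rate, and its value is critical. If it is too large, each step overshoots and gradient descent fails to converge. If it is too small, convergence is slow and takes a long time.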

Python Implementation

from numpy import *
import matplotlib.pyplot as plt
import time

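# closed-form least squares (normal equation): theta = (X^T X)^-1 X^T Y
def leastSquares(train_x,train_y):
    train_xt = train_x.T
    dotTrainX = dot(train_xt,train_x)
    matX = asmatrix(dotTrainX)  # asmatrix replaces the mat alias, which newer NumPy no longer provides
    dotTrainXInversion = matX.I
    theta = dotTrainXInversion*train_xt
    theta = theta*train_y
    return theta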

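# batch gradient descent (LMS rule): weights := weights + alpha * X^T (y - X * weights)
def LMS(train_x,train_y,opts):
    numSamples, numfeatures = shape(train_x)
    alpha = opts['alpha']        # learning rate
    maxIter = opts['maxIter']    # number of gradient descent iterations
    weights = ones((numfeatures, 1))
    if (opts['optimizeType'] == 'gradDescent'):
        for k in range(maxIter):
            output = train_x*weights      # predictions for all samples
            error = train_y - output      # residuals y - h(x)
            weights = weights + alpha * train_x.transpose()*error
    return  weights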

def showLogRegres(weights, train_x, train_y):
    # notice: train_x and train_y must be of the numpy matrix datatype
    numSamples, numFeatures = shape(train_x)
    if numFeatures != 2:
        print("Sorry! I can not draw because the dimension of your data is not 2!")
        return 1

    # draw all samples
    for i in range(numSamples):
        plt.plot(train_x[i, 1], train_y[i, 0], 'or')
    # draw the fitted line
    min_x = train_x[:, 1].min()
    max_x = train_x[:, 1].max()
    weights = weights.getA()  # convert matrix to array
    y_min_x = float(weights[0, 0] + weights[1, 0] * min_x)
    y_max_x = float(weights[0, 0] + weights[1, 0] * max_x)
    plt.plot([min_x, max_x], [y_min_x, y_max_x], '-g')
    plt.xlabel('X1'); plt.ylabel('Y')
    plt.show()

if __name__ == '__main__':
    # first column is the constant 1 (intercept term), second column is the feature
    train_x = matrix([(1,2104),(1,1600),(1,2400),(1,1416),(1,3000)])
    train_y = matrix([400,330,369,232,540]).transpose()
    opts = {'alpha': 0.00000001,
            'maxIter': 10000,
            'optimizeType': 'gradDescent'}

    weights = LMS(train_x,train_y,opts)      # gradient descent result
    theta = leastSquares(train_x, train_y)   # closed-form result
    print(weights)
    print(theta)
    showLogRegres(theta, train_x, train_y)

Program Notes

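This program fits a line to a small set of data. Note that the alpha used for gradient descent is very small. That value was found by trial and error: with alpha set to 0.1, the algorithm does not converge.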
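
The sketch below reproduces that observation on the same five data points. The helper `cost_after` is made up for this illustration and uses plain NumPy arrays. For batch gradient descent on a squared-error loss, the iteration converges only when alpha is below 2 divided by the largest eigenvalue of X^T X, which is why raw features in the thousands force such a tiny alpha.

```python
import numpy as np

X = np.array([[1, 2104], [1, 1600], [1, 2400], [1, 1416], [1, 3000]], dtype=float)
y = np.array([400, 330, 369, 232, 540], dtype=float)

def cost_after(alpha, steps):
    """Run batch gradient descent from all-ones weights and return the final J."""
    theta = np.ones(X.shape[1])
    for _ in range(steps):
        error = y - X @ theta            # same sign convention as LMS in the program
        theta = theta + alpha * (X.T @ error)
    return 0.5 * float(np.sum((y - X @ theta) ** 2))

start = cost_after(alpha=0.0, steps=0)      # loss at the initial weights
small = cost_after(alpha=1e-8, steps=100)   # loss goes down
large = cost_after(alpha=0.1, steps=5)      # loss blows up

# Largest step size for which the iteration can still converge.
alpha_limit = 2.0 / np.linalg.eigvalsh(X.T @ X).max()
print(start, small, large, alpha_limit)
```

Scaling the feature column to a smaller range before fitting would raise this limit, at the cost of having to map the learned weights back to the original units.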

Thoughts

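In gradient descent for this model, alpha matters a great deal. Is there a good way to adjust alpha automatically, so that it is decreased when set too large and increased when set too small?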
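
One simple heuristic along these lines is sometimes called the "bold driver" rule: try a step, keep it and grow alpha slightly if the loss went down, otherwise discard the step and shrink alpha. The sketch below is only one possible answer, not a tuned method. The function name and the `grow` and `shrink` factors are assumptions chosen for illustration.

```python
import numpy as np

def cost(theta, X, y):
    return 0.5 * float(np.sum((X @ theta - y) ** 2))

def adaptive_gradient_descent(X, y, alpha=0.1, max_iter=200, grow=1.05, shrink=0.5):
    """Batch gradient descent that adjusts alpha after every trial step."""
    theta = np.ones(X.shape[1])
    current = cost(theta, X, y)
    for _ in range(max_iter):
        candidate = theta + alpha * (X.T @ (y - X @ theta))
        candidate_cost = cost(candidate, X, y)
        if candidate_cost < current:
            # The step helped: accept it and try a slightly larger alpha next time.
            theta, current = candidate, candidate_cost
            alpha *= grow
        else:
            # The step overshot: reject it and retry with a smaller alpha.
            alpha *= shrink
    return theta, alpha, current

X = np.array([[1, 2104], [1, 1600], [1, 2400], [1, 1416], [1, 3000]], dtype=float)
y = np.array([400, 330, 369, 232, 540], dtype=float)

initial_cost = cost(np.ones(2), X, y)
theta, final_alpha, final_cost = adaptive_gradient_descent(X, y, alpha=0.1)
print(theta, final_alpha, final_cost)
```

Starting from the alpha of 0.1 that failed above, the rule first halves alpha repeatedly until a step lowers the loss, then keeps probing upward. Each rejected step costs one extra loss evaluation.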
