Two Implementations of Linear Regression

Linear Regression

1. Gradient Descent

Hypothesis:

$$h_\theta(x) = \theta^T x = \sum_{i=0}^{n} \theta_i x_i$$

Cost function:

$$J(\theta)=\frac{1}{2m}\sum_{j=1}^{m}\left(h_{\theta}(x)^{(j)}-y^{(j)}\right)^2$$

Gradient descent update:

$$\theta_i = \theta_i - \alpha\frac{1}{m}\sum_{j=1}^{m}\left(h_\theta(x)^{(j)}-y^{(j)}\right)x_i^{(j)}$$
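Applied literally, this update is computed for every $\theta_i$ simultaneously within one step. As a point of reference before vectorizing, here is a minimal non-vectorized sketch (hypothetical helper `gd_step_loop`; `X` is an $(n{+}1)\times m$ array whose first row is $x_0=1$, `Y` has length $m$):

```python
import numpy as np

def gd_step_loop(theta, X, Y, alpha):
    """One gradient-descent step with explicit loops over
    parameters i and samples j (matches the formula above)."""
    n_params, m = X.shape
    new_theta = theta.copy()  # update all theta_i simultaneously
    for i in range(n_params):
        grad_i = 0.0
        for j in range(m):
            # h_theta(x^(j)) = sum_k theta_k * x_k^(j)
            h_j = sum(theta[k] * X[k, j] for k in range(n_params))
            grad_i += (h_j - Y[j]) * X[i, j]
        new_theta[i] = theta[i] - alpha * grad_i / m
    return new_theta
```

The vectorization below removes both inner loops.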

Vectorization

Take two parameters ($i = 0, 1$) and $m = 3$ samples, and write $b^{(j)}=h_\theta(x)^{(j)}-y^{(j)}$:

$$\begin{bmatrix}\theta_0\\\theta_1\end{bmatrix} = \begin{bmatrix}\theta_0\\\theta_1\end{bmatrix}-\alpha\frac{1}{m}\sum_{j=1}^{m} b^{(j)} \begin{bmatrix}x_0\\x_1\end{bmatrix}^{(j)}$$

$$\begin{bmatrix}\theta_0\\\theta_1\end{bmatrix} = \begin{bmatrix}\theta_0\\\theta_1\end{bmatrix}-\alpha\frac{1}{m}\left[b^{(1)}\begin{bmatrix}x_0\\x_1\end{bmatrix}^{(1)}+b^{(2)}\begin{bmatrix}x_0\\x_1\end{bmatrix}^{(2)}+b^{(3)}\begin{bmatrix}x_0\\x_1\end{bmatrix}^{(3)}\right]$$

$$\begin{bmatrix}\theta_0\\\theta_1\end{bmatrix} = \begin{bmatrix}\theta_0\\\theta_1\end{bmatrix}-\alpha\frac{1}{m}\begin{bmatrix}\begin{bmatrix}x_0\\x_1\end{bmatrix}^{(1)}&\begin{bmatrix}x_0\\x_1\end{bmatrix}^{(2)}&\begin{bmatrix}x_0\\x_1\end{bmatrix}^{(3)}\end{bmatrix}\begin{bmatrix}b^{(1)}\\b^{(2)}\\b^{(3)}\end{bmatrix}$$

where:

$$\begin{bmatrix}b^{(1)}\\b^{(2)}\\b^{(3)}\end{bmatrix}=\begin{bmatrix}h_\theta(x)^{(1)}\\h_\theta(x)^{(2)}\\h_\theta(x)^{(3)}\end{bmatrix}-\begin{bmatrix}y^{(1)}\\y^{(2)}\\y^{(3)}\end{bmatrix}=\begin{bmatrix}\begin{bmatrix}x_0&x_1\end{bmatrix}^{(1)}\begin{bmatrix}\theta_0\\\theta_1\end{bmatrix}\\\begin{bmatrix}x_0&x_1\end{bmatrix}^{(2)}\begin{bmatrix}\theta_0\\\theta_1\end{bmatrix}\\\begin{bmatrix}x_0&x_1\end{bmatrix}^{(3)}\begin{bmatrix}\theta_0\\\theta_1\end{bmatrix}\end{bmatrix}-\begin{bmatrix}y^{(1)}\\y^{(2)}\\y^{(3)}\end{bmatrix}=\begin{bmatrix}\begin{bmatrix}x_0&x_1\end{bmatrix}^{(1)}\\\begin{bmatrix}x_0&x_1\end{bmatrix}^{(2)}\\\begin{bmatrix}x_0&x_1\end{bmatrix}^{(3)}\end{bmatrix}\begin{bmatrix}\theta_0\\\theta_1\end{bmatrix}-\begin{bmatrix}y^{(1)}\\y^{(2)}\\y^{(3)}\end{bmatrix}$$

so:

$$\begin{bmatrix}\theta_0\\\theta_1\end{bmatrix} = \begin{bmatrix}\theta_0\\\theta_1\end{bmatrix}-\alpha\frac{1}{m}\begin{bmatrix}\begin{bmatrix}x_0\\x_1\end{bmatrix}^{(1)}&\begin{bmatrix}x_0\\x_1\end{bmatrix}^{(2)}&\begin{bmatrix}x_0\\x_1\end{bmatrix}^{(3)}\end{bmatrix}\left[\begin{bmatrix}\begin{bmatrix}x_0&x_1\end{bmatrix}^{(1)}\\\begin{bmatrix}x_0&x_1\end{bmatrix}^{(2)}\\\begin{bmatrix}x_0&x_1\end{bmatrix}^{(3)}\end{bmatrix}\begin{bmatrix}\theta_0\\\theta_1\end{bmatrix}-\begin{bmatrix}y^{(1)}\\y^{(2)}\\y^{(3)}\end{bmatrix}\right]$$

Define:

$$\pmb{X}=\begin{bmatrix}\begin{bmatrix}x_0\\x_1\end{bmatrix}^{(1)}&\begin{bmatrix}x_0\\x_1\end{bmatrix}^{(2)}&\begin{bmatrix}x_0\\x_1\end{bmatrix}^{(3)}\end{bmatrix},\qquad \pmb{\theta}=\begin{bmatrix}\theta_0\\\theta_1\end{bmatrix},\qquad \pmb{y}=\begin{bmatrix}y^{(1)}\\y^{(2)}\\y^{(3)}\end{bmatrix}$$

Final update formula:

$$\pmb{\theta}=\pmb{\theta}-\alpha\frac{1}{m}\pmb{X}(\pmb{X}^T\pmb{\theta}-\pmb{y})$$
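As a quick sanity check of the algebra, the vectorized step can be compared numerically with the explicit-loop step. This is a toy sketch with $m = 3$ hypothetical samples, reusing `gd_step_loop` from above:

```python
rng = np.random.default_rng(0)
X_toy = np.vstack([np.ones(3), rng.normal(size=3)])  # 2 x 3, first row x0 = 1
Y_toy = rng.normal(size=3)
theta_toy = rng.normal(size=2)
alpha = 0.1

# Vectorized step: theta - alpha/m * X (X^T theta - y)
vec = theta_toy - alpha / 3 * X_toy @ (X_toy.T @ theta_toy - Y_toy)

# Must match the explicit-loop step
assert np.allclose(vec, gd_step_loop(theta_toy, X_toy, Y_toy, alpha))
```

The full demo follows.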

```python
import numpy as np
import matplotlib.pyplot as plt

# Generate correlated 2-D Gaussian samples
mean = (1.5, 1.5)
cov = [[1, 0.95], [0.95, 1]]
XY = np.random.multivariate_normal(mean, cov, 30).T
m = XY.shape[1]  # number of samples

# Plot the raw data
plt.scatter(XY[0, :], XY[1, :], c='b', s=10, edgecolor='none')
plt.xlabel("X", fontsize=14)
plt.ylabel("Y", fontsize=14)
plt.tick_params(axis='both', labelsize=14)
plt.axis([-1, 4, -1, 4])
plt.show()

# Hypothesis: h = theta^T x = theta0*x0 + theta1*x1
# To fit y = kx + b, set x0 = 1 (bias term)
X = np.concatenate((np.ones((1, m)), XY[0, :][np.newaxis, :]), axis=0)
Y = XY[1, :][np.newaxis, :].T
theta = np.random.random((2, 1)) - 0.5
alpha = 0.1    # learning rate
epochs = 200   # number of iterations
for i in range(epochs):
    theta = theta - alpha * 1/m * X.dot(np.dot(X.T, theta) - Y)

# Regression line
p_x = np.array([-1, 4])
p_y = theta[0, 0] + theta[1, 0] * p_x

# Plot data and fitted line
plt.plot(XY[0, :], XY[1, :], 'x', c='b')
plt.plot(p_x, p_y, c='r')
plt.xlabel("X", fontsize=14)
plt.ylabel("Y", fontsize=14)
plt.tick_params(axis='both', labelsize=14)
plt.axis([-1, 4, -1, 4])
plt.show()
```
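Rather than trusting a fixed epoch count, it can help to track the cost $J(\theta)$ per epoch. A minimal sketch reusing the variables defined above (`X`, `Y`, `alpha`, `epochs` unchanged; `theta_m` is a fresh parameter vector introduced here):

```python
costs = []
theta_m = np.random.random((2, 1)) - 0.5
for i in range(epochs):
    residual = X.T.dot(theta_m) - Y                 # (m, 1) vector of h - y
    costs.append(np.sum(residual ** 2) / (2 * m))   # J(theta)
    theta_m = theta_m - alpha * 1/m * X.dot(residual)

# The curve should decrease monotonically and flatten out
plt.plot(costs)
plt.xlabel("epoch", fontsize=14)
plt.ylabel(r"$J(\theta)$", fontsize=14)
plt.show()
```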

2. Closed-Form Solution

Linear model:

$$\pmb{X}^T=\begin{bmatrix}x_{11}&x_{12}&\cdots&x_{1d}\\x_{21}&x_{22}&\cdots&x_{2d}\\\vdots&\vdots&\ddots&\vdots\\x_{n1}&x_{n2}&\cdots&x_{nd}\end{bmatrix},\qquad \pmb{y}=\begin{bmatrix}y_1\\y_2\\\vdots\\y_n\end{bmatrix}$$

In general $R(\pmb{X}^T)\not=R(\pmb{X}^T|\pmb{y})$, so $\pmb{X}^T\pmb{\theta}=\pmb{y}$ has no exact solution. The problem becomes finding $\pmb{\theta}$ such that $\pmb{X}^T\pmb{\theta}\rightarrow\pmb{y}$.

Introduce the cost function $J=\lVert\pmb{X}^T\pmb{\theta}-\pmb{y}\rVert_2^2$ and require $\frac{\partial{J}}{\partial\pmb{\theta}}=0$ rather than $J=0$:

$$\begin{aligned} J&=\lVert\pmb{X}^T\pmb{\theta}-\pmb{y}\rVert_2^2=(\pmb{X}^T\pmb{\theta}-\pmb{y})^T(\pmb{X}^T\pmb{\theta}-\pmb{y})=(\pmb{\theta}^T\pmb{X}-\pmb{y}^T)(\pmb{X}^T\pmb{\theta}-\pmb{y})\\ &=\pmb{\theta}^T\pmb{X}\pmb{X}^T\pmb{\theta}-\pmb{\theta}^T\pmb{X}\pmb{y}-\pmb{y}^T\pmb{X}^T\pmb{\theta}+\pmb{y}^T\pmb{y} \end{aligned}$$

For the term $\pmb{\theta}^T\pmb{X}\pmb{y}$, suppose the feature dimension is 2 and there are 3 samples:

$$\begin{aligned} \pmb{\theta}^T\pmb{X}\pmb{y}&=\begin{bmatrix}\theta_1,\theta_2\end{bmatrix}\begin{bmatrix}x_{11}&x_{21}&x_{31}\\x_{12}&x_{22}&x_{32}\end{bmatrix}\begin{bmatrix}y_1\\y_2\\y_3\end{bmatrix}\\ &=\begin{bmatrix}\theta_1x_{11}+\theta_2x_{12},\ \theta_1x_{21}+\theta_2x_{22},\ \theta_1x_{31}+\theta_2x_{32}\end{bmatrix}\begin{bmatrix}y_1\\y_2\\y_3\end{bmatrix}\\ &=(\theta_1x_{11}+\theta_2x_{12})y_1+(\theta_1x_{21}+\theta_2x_{22})y_2+(\theta_1x_{31}+\theta_2x_{32})y_3 \end{aligned}$$

$$\frac{\partial{\pmb{\theta}^T\pmb{X}\pmb{y}}}{\partial\pmb{\theta}}=\begin{bmatrix}\frac{\partial{\theta^TXy}}{\partial\theta_1}\\\frac{\partial{\theta^TXy}}{\partial\theta_2}\end{bmatrix}=\begin{bmatrix}x_{11}y_1+x_{21}y_2+x_{31}y_3\\x_{12}y_1+x_{22}y_2+x_{32}y_3\end{bmatrix}=\begin{bmatrix}x_{11}&x_{21}&x_{31}\\x_{12}&x_{22}&x_{32}\end{bmatrix}\begin{bmatrix}y_1\\y_2\\y_3\end{bmatrix}=\pmb{X}\pmb{y}$$

Taking the derivative and setting it to zero:

$$\frac{\partial{J}}{\partial\pmb{\theta}}=\frac{\partial{(\pmb{\theta}^T\pmb{X}\pmb{X}^T\pmb{\theta}-\pmb{\theta}^T\pmb{X}\pmb{y}-\pmb{y}^T\pmb{X}^T\pmb{\theta}+\pmb{y}^T\pmb{y})}}{\partial\pmb{\theta}}=2\pmb{X}\pmb{X}^T\pmb{\theta}-2\pmb{X}\pmb{y}=0$$

$$\pmb{X}\pmb{X}^T\pmb{\theta}-\pmb{X}\pmb{y}=0\Rightarrow\pmb{\theta}=(\pmb{X}\pmb{X}^T)^{-1}\pmb{X}\pmb{y}\quad(\text{if }\pmb{X}\pmb{X}^T\text{ is invertible})$$

For $\pmb{X}\pmb{X}^T$ to be invertible: $\pmb{S}=\pmb{X}\pmb{X}^T$ is a $d\times d$ matrix, so $\pmb{S}$ must have full rank, i.e. $R(\pmb{X}\pmb{X}^T)=d$. The sample matrix $\pmb{X}$ must therefore contain at least $d$ linearly independent samples, which requires $n\ge d$.
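The derivative $\frac{\partial J}{\partial\pmb{\theta}}=2\pmb{X}\pmb{X}^T\pmb{\theta}-2\pmb{X}\pmb{y}$ can be spot-checked with central finite differences. A small sketch on hypothetical random data with $d=2$, $n=5$:

```python
import numpy as np

rng = np.random.default_rng(1)
d, n = 2, 5
Xc = rng.normal(size=(d, n))   # X as defined above: d x n
yc = rng.normal(size=(n, 1))
th = rng.normal(size=(d, 1))

def J(th):
    r = Xc.T @ th - yc         # X^T theta - y
    return (r.T @ r).item()    # squared 2-norm

analytic = 2 * Xc @ Xc.T @ th - 2 * Xc @ yc  # 2XX^T theta - 2Xy

eps = 1e-6
numeric = np.zeros_like(th)
for i in range(d):
    e = np.zeros_like(th)
    e[i] = eps
    numeric[i] = (J(th + e) - J(th - e)) / (2 * eps)  # central difference

assert np.allclose(analytic, numeric, atol=1e-5)
```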

```python
# Closed-form solution: theta = (X X^T)^{-1} X y
theta_2 = np.linalg.inv(X.dot(X.T)).dot(X).dot(Y)

# Regression line
p_x2 = np.array([-1, 4])
p_y2 = theta_2[0, 0] + theta_2[1, 0] * p_x2

# Plot data and fitted line
plt.plot(XY[0, :], XY[1, :], 'x', c='b')
plt.plot(p_x2, p_y2, c='r')
plt.xlabel("X", fontsize=14)
plt.ylabel("Y", fontsize=14)
plt.tick_params(axis='both', labelsize=14)
plt.axis([-1, 4, -1, 4])
plt.show()
```
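The two implementations should agree closely once gradient descent has converged; a quick comparison, reusing `theta` and `theta_2` from above. If $\pmb{X}\pmb{X}^T$ is singular (too few samples or collinear features), NumPy's `np.linalg.lstsq` or `np.linalg.pinv` is a more robust choice than `inv`:

```python
# Gradient descent should land near the closed-form optimum
print(np.hstack([theta, theta_2]))              # side-by-side comparison
print(np.allclose(theta, theta_2, atol=1e-2))   # True once GD has converged

# More robust alternative when X X^T may be singular:
theta_3, *_ = np.linalg.lstsq(X.T, Y, rcond=None)  # least-squares X^T theta ~ y
```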
