A Supplement to Linear Regression: The Normal Equation Method

Contents

  • 1. Introduction
  • 2. Analytical Solution for Univariate Linear Regression
  • 3. Analytical Solution for Multivariate Linear Regression
  • References

1. Introduction

  In both univariate and multivariate linear regression, the parameters were updated iteratively with gradient descent. For linear regression, however, the optimal parameters can also be obtained directly as a closed-form analytical solution.

2. Analytical Solution for Univariate Linear Regression

  Model: $f(x) = wx + b$
  Optimization objective: $(w^*, b^*) = \argmin\limits_{(w,\, b)} \displaystyle\sum_{i=1}^m [y_i - f(x_i)]^2 = \argmin\limits_{(w,\, b)} \displaystyle\sum_{i=1}^m [y_i - wx_i - b]^2$
  Since $E(w, b) = \displaystyle\sum_{i=1}^m [y_i - wx_i - b]^2$ is a convex function, the optimal $w$ and $b$ are obtained where its partial derivatives with respect to $w$ and $b$ are zero.
  The optimal $w$ and $b$ are derived as follows:
$$
\left\{
\begin{aligned}
\frac{\partial E(w, b)}{\partial w} &= 2\sum_{i=1}^m -x_i(y_i - wx_i - b) = 2\sum_{i=1}^m wx_i^2 - 2\sum_{i=1}^m x_i(y_i - b) = 0 \quad (1)\\
\frac{\partial E(w, b)}{\partial b} &= 2\sum_{i=1}^m -(y_i - wx_i - b) = 2mb - 2\sum_{i=1}^m (y_i - wx_i) = 0 \quad (2)
\end{aligned}
\right.
$$
  Substituting $b = \dfrac{1}{m}\displaystyle\sum_{i=1}^m (y_i - wx_i)$, obtained from (2), into (1):
$$
\begin{aligned}
w\sum_{i=1}^m x_i^2 - \sum_{i=1}^m \Big[x_i y_i - \frac{x_i}{m}\sum_{i=1}^m (y_i - wx_i)\Big] &= 0\\
w\sum_{i=1}^m x_i^2 - \sum_{i=1}^m x_i y_i + \sum_{i=1}^m y_i \sum_{i=1}^m \frac{x_i}{m} - \frac{w}{m}\Big(\sum_{i=1}^m x_i\Big)^2 &= 0\\
w\Big(\sum_{i=1}^m x_i^2 - \frac{1}{m}\Big(\sum_{i=1}^m x_i\Big)^2\Big) &= \sum_{i=1}^m y_i\Big(x_i - \sum_{i=1}^m \frac{x_i}{m}\Big)\\
w &= \frac{\displaystyle\sum_{i=1}^m y_i(x_i - \overline{x})}{\displaystyle\sum_{i=1}^m x_i^2 - \frac{1}{m}\Big(\sum_{i=1}^m x_i\Big)^2}
\end{aligned}
$$
  Finally, the optimal $w$ and $b$ are
$$
\left\{
\begin{aligned}
w &= \frac{\displaystyle\sum_{i=1}^m y_i(x_i - \overline{x})}{\displaystyle\sum_{i=1}^m x_i^2 - \frac{1}{m}\Big(\sum_{i=1}^m x_i\Big)^2}\\
b &= \frac{1}{m}\sum_{i=1}^m (y_i - wx_i)
\end{aligned}
\right.
$$
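
  To make the result concrete, here is a minimal NumPy sketch (the data and the helper name `fit_univariate` are illustrative, not from the original text) that evaluates the two closed-form expressions above and cross-checks them against `np.polyfit`:

```python
import numpy as np

def fit_univariate(x, y):
    """Closed-form least-squares fit of y ≈ w*x + b for 1-D arrays x and y."""
    m = len(x)
    x_bar = x.mean()
    # w = sum_i y_i (x_i - x_bar) / (sum_i x_i^2 - (sum_i x_i)^2 / m)
    w = np.sum(y * (x - x_bar)) / (np.sum(x ** 2) - np.sum(x) ** 2 / m)
    # b = (1/m) sum_i (y_i - w x_i)
    b = np.mean(y - w * x)
    return w, b

# Illustrative data: y = 2x + 1 plus a little noise.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 10.0, 50)
y = 2.0 * x + 1.0 + rng.normal(scale=0.1, size=x.shape)

w, b = fit_univariate(x, y)
print(w, b)                  # roughly 2 and 1
print(np.polyfit(x, y, 1))   # cross-check: NumPy's own least-squares fit
```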

3. Analytical Solution for Multivariate Linear Regression

  The multivariate linear regression model is $f(\vec{x}) = \vec{w}^T\vec{x} + b$. Let $\hat{w} = (\vec{w}; b)$ and $\hat{x} = (\vec{x}; 1)$, and write the $m \times d$ dataset $D$ as the $m \times (d+1)$ matrix
$$
X=
\begin{bmatrix}
x_{11} & x_{12} & \cdots & x_{1d} & 1\\
x_{21} & x_{22} & \cdots & x_{2d} & 1\\
\vdots & \vdots & \ddots & \vdots & \vdots\\
x_{m1} & x_{m2} & \cdots & x_{md} & 1
\end{bmatrix}.
$$
  Model: $f(\vec{x}) = \hat{w}^T\hat{x}$
  Optimization objective: $\hat{w}^* = \argmin\limits_{\hat{w}} (\vec{y} - X\hat{w})^T(\vec{y} - X\hat{w})$
  The analytical solution for the optimal $\hat{w}$ is derived as follows:
$$
\begin{aligned}
&\frac{\partial (\vec{y}-X\hat{w})^T(\vec{y}-X\hat{w})}{\partial \hat{w}} = 0\\
&\frac{\partial (\vec{y}^T-\hat{w}^TX^T)(\vec{y}-X\hat{w})}{\partial \hat{w}} = 0\\
&\frac{\partial (\vec{y}^T\vec{y}-\hat{w}^TX^T\vec{y}-\vec{y}^TX\hat{w}+\hat{w}^TX^TX\hat{w})}{\partial \hat{w}} = 0\\
&\frac{\partial \vec{y}^T\vec{y}}{\partial \hat{w}}-\frac{\partial \hat{w}^TX^T\vec{y}}{\partial\hat{w}}-\frac{\partial \vec{y}^TX\hat{w}}{\partial \hat{w}}+\frac{\partial \hat{w}^TX^TX\hat{w}}{\partial\hat{w}} = 0\\
&0-X^T\vec{y}-X^T\vec{y}+\big(X^TX+(X^TX)^T\big)\hat{w} = 0\\
&X^TX\hat{w} = X^T\vec{y}
\end{aligned}
$$
  If $X^TX$ is invertible, then $\hat{w} = (X^TX)^{-1}X^T\vec{y}$.
  If $X^TX$ is not invertible, then $\hat{w}$ has multiple solutions.
  The derivations of $\dfrac{\partial \hat{w}^TX^T\vec{y}}{\partial \hat{w}} = \dfrac{\partial \vec{y}^TX\hat{w}}{\partial \hat{w}} = X^T\vec{y}$ and $\dfrac{\partial \hat{w}^TX^TX\hat{w}}{\partial \hat{w}} = 2X^TX\hat{w}$ are given in 4.2.1 and 4.2.2.
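
  As a minimal sketch of the matrix-form result (the variable names and data below are illustrative, not from the original text): when $X^TX$ is invertible, the normal equation $X^TX\hat{w} = X^T\vec{y}$ can be solved directly; when it is singular, `np.linalg.lstsq` returns one of the many solutions (the minimum-norm one):

```python
import numpy as np

def fit_normal_equation(X_raw, y):
    """Solve the normal equation X^T X w_hat = X^T y for w_hat = (w; b)."""
    m = X_raw.shape[0]
    # Append a column of ones so the bias b is absorbed into w_hat.
    X = np.hstack([X_raw, np.ones((m, 1))])
    XtX = X.T @ X
    Xty = X.T @ y
    if np.linalg.matrix_rank(XtX) == XtX.shape[0]:
        # X^T X invertible: the unique solution (X^T X)^{-1} X^T y.
        return np.linalg.solve(XtX, Xty)
    # X^T X singular: infinitely many solutions; lstsq picks the minimum-norm one.
    return np.linalg.lstsq(X, y, rcond=None)[0]

# Illustrative data: y = 1*x1 + 2*x2 + 3 (no noise).
rng = np.random.default_rng(1)
X_raw = rng.normal(size=(100, 2))
y = X_raw @ np.array([1.0, 2.0]) + 3.0

print(fit_normal_equation(X_raw, y))  # roughly [1, 2, 3]
```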

References

周志华, 《机器学习》 (Zhou Zhihua, *Machine Learning*)
