Machine Learning Regression: Mathematical Derivation of Ordinary Least Squares Regression (Matrix Form)

The squared-error objective function:
$$f(w) = \sum_{i=1}^{m} (y_i - x_i^T w)^2$$
Take the partial derivative of $f(w)$ with respect to each of $w_1, w_2, \dots, w_d$ (here $x_i^j$ denotes the $j$-th feature of the $i$-th sample):
$$\frac{\partial f(w)}{\partial w_1} = -2(y_1 - x_1^T w)\,x_1^1 - 2(y_2 - x_2^T w)\,x_2^1 - \cdots - 2(y_m - x_m^T w)\,x_m^1$$
$$\frac{\partial f(w)}{\partial w_2} = -2(y_1 - x_1^T w)\,x_1^2 - 2(y_2 - x_2^T w)\,x_2^2 - \cdots - 2(y_m - x_m^T w)\,x_m^2$$
$$\vdots$$
$$\frac{\partial f(w)}{\partial w_d} = -2(y_1 - x_1^T w)\,x_1^d - 2(y_2 - x_2^T w)\,x_2^d - \cdots - 2(y_m - x_m^T w)\,x_m^d$$
Note the form of our sample matrix $X$ (one row per sample, one column per feature):
$$X = \left[\begin{matrix} x_1^1 & x_1^2 & \cdots & x_1^d \\ x_2^1 & x_2^2 & \cdots & x_2^d \\ \vdots & \vdots & \ddots & \vdots \\ x_m^1 & x_m^2 & \cdots & x_m^d \end{matrix}\right]$$
and the vector $Y$ of observed target values:
$$Y = \left[\begin{matrix} y_1 \\ y_2 \\ \vdots \\ y_m \end{matrix}\right]$$
Stacking the partial derivatives in matrix form gives the gradient vector:

$$\left[\begin{matrix} \frac{\partial f(w)}{\partial w_1} \\ \frac{\partial f(w)}{\partial w_2} \\ \vdots \\ \frac{\partial f(w)}{\partial w_d} \end{matrix}\right] = -2\left[\begin{matrix} x_1^1 & x_2^1 & \cdots & x_m^1 \\ x_1^2 & x_2^2 & \cdots & x_m^2 \\ \vdots & \vdots & \ddots & \vdots \\ x_1^d & x_2^d & \cdots & x_m^d \end{matrix}\right]\cdot\left(\left[\begin{matrix} y_1 \\ y_2 \\ \vdots \\ y_m \end{matrix}\right] - \left[\begin{matrix} x_1^1 & x_1^2 & \cdots & x_1^d \\ x_2^1 & x_2^2 & \cdots & x_2^d \\ \vdots & \vdots & \ddots & \vdots \\ x_m^1 & x_m^2 & \cdots & x_m^d \end{matrix}\right]\cdot\left[\begin{matrix} w_1 \\ w_2 \\ \vdots \\ w_d \end{matrix}\right]\right) = -2X^T(Y - Xw)$$
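As a sanity check (not part of the original derivation), the matrix form of the gradient can be verified numerically against the coordinate-wise sums written out above. The following minimal NumPy sketch uses small random data purely for illustration:

```python
import numpy as np

# Sanity check: the matrix form -2 * X^T (Y - X w) equals the
# coordinate-wise partial derivatives written out above.
rng = np.random.default_rng(0)
m, d = 5, 3                       # small illustrative sizes
X = rng.normal(size=(m, d))       # sample matrix, one row per sample
Y = rng.normal(size=m)            # target values
w = rng.normal(size=d)            # arbitrary weight vector

grad_matrix = -2.0 * X.T @ (Y - X @ w)

grad_loop = np.zeros(d)
for j in range(d):                # partial derivative w.r.t. w_j
    for i in range(m):            # sum over samples
        grad_loop[j] += -2.0 * (Y[i] - X[i] @ w) * X[i, j]

print(np.allclose(grad_matrix, grad_loop))  # True
```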
We can evaluate this gradient vector from the available data, so gradient descent can be used to optimize $w$ and minimize $f(w)$. However, just as when finding the extremum of an ordinary function, we can instead set all partial derivatives to zero, which gives:
$$-2X^T(Y - Xw) = 2X^TXw - 2X^TY = 0$$
$$X^TXw = X^TY$$
$$\hat{w} = (X^TX)^{-1}X^TY$$
The $w$ obtained here is not the true underlying $w$; it is only an estimate computed from the observed data, so we write the best estimate as $\hat{w}$.
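As a quick illustration, here is a minimal NumPy sketch of the closed-form (normal equation) estimate above; the function name and the synthetic data are illustrative assumptions, not part of the original derivation:

```python
import numpy as np

def ols_normal_equation(X, Y):
    """Return w_hat = (X^T X)^{-1} X^T Y via the normal equation."""
    # Solving the linear system is preferred over forming the explicit inverse.
    return np.linalg.solve(X.T @ X, X.T @ Y)

# Illustrative usage: recover known weights from noise-free data.
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 3))
w_true = np.array([2.0, -1.0, 0.5])
Y = X @ w_true
print(ols_normal_equation(X, Y))  # approximately [ 2.  -1.   0.5]
```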
Note that $X^TX$ is not always invertible, so a robust solver should check invertibility and fall back to an alternative when the check fails; a singular value decomposition (equivalently, the pseudoinverse) also handles this case, as sketched below.
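A minimal sketch of one way to add that robustness, assuming NumPy; the rank check and the fallback to the SVD-based pseudoinverse (`np.linalg.pinv`) are illustrative choices, not the only ones:

```python
import numpy as np

def ols_robust(X, Y):
    """Least squares that falls back to the SVD-based pseudoinverse
    when X^T X is singular (e.g. collinear features)."""
    XtX = X.T @ X
    if np.linalg.matrix_rank(XtX) == XtX.shape[0]:
        return np.linalg.solve(XtX, X.T @ Y)
    # Pseudoinverse computed via SVD; returns the minimum-norm solution.
    return np.linalg.pinv(X) @ Y

# Second column is exactly twice the first, so X^T X is singular.
X = np.array([[1.0, 2.0],
              [2.0, 4.0],
              [3.0, 6.0]])
Y = np.array([1.0, 2.0, 3.0])
print(ols_robust(X, Y))  # minimum-norm solution, roughly [0.2, 0.4]
```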
