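This article uses the denominator layout for matrix derivatives: the numerator is laid out horizontally and the denominator vertically, so the derivative of a scalar with respect to a column vector is itself a column vector.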
Let $f(x) = A^{T}X$, where $A^{T} = \begin{pmatrix} a_{1}, & a_{2}, & \dots, & a_{n} \end{pmatrix}$ and $X^{T} = \begin{pmatrix} x_{1}, & x_{2}, & \dots, & x_{n} \end{pmatrix}$.
Solution:

$$
\begin{aligned}
f(x) &= A^{T}X = \sum_{i=1}^{n} a_{i}x_{i} \\
\frac{df(x)}{dx} &= \begin{pmatrix} \frac{df(x)}{dx_{1}} \\ \frac{df(x)}{dx_{2}} \\ \vdots \\ \frac{df(x)}{dx_{n}} \end{pmatrix} = \begin{pmatrix} a_{1} \\ a_{2} \\ \vdots \\ a_{n} \end{pmatrix} = A
\end{aligned}
$$
Since $A^{T}X$ is a scalar, it equals its own transpose $X^{T}A$, and therefore:

$$
\frac{dA^{T}X}{dx} = \frac{dX^{T}A}{dx} = A
$$
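As a quick sanity check of this result, the sketch below compares the analytic gradient $A$ with a central-difference approximation. The helper `numerical_gradient`, the random test vectors, and the tolerance are illustrative assumptions, not part of the original derivation.

```python
import numpy as np

def numerical_gradient(f, x, eps=1e-6):
    """Central-difference approximation of the gradient of a scalar function f at x."""
    grad = np.zeros_like(x, dtype=float)
    for i in range(x.size):
        step = np.zeros_like(x, dtype=float)
        step[i] = eps
        grad[i] = (f(x + step) - f(x - step)) / (2 * eps)
    return grad

rng = np.random.default_rng(0)
a = rng.normal(size=5)  # plays the role of A (a column vector)
x = rng.normal(size=5)  # plays the role of X

# f(x) = A^T X; in denominator layout its gradient should be A itself.
grad_linear = numerical_gradient(lambda v: a @ v, x)
linear_matches = np.allclose(grad_linear, a, atol=1e-6)
print(linear_matches)
```

The gradient comes back with the same shape as `x`, which is what the denominator layout prescribes.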
Let $f(x) = X^{T}AX$, where $X^{T} = \begin{pmatrix} x_{1}, & x_{2}, & \dots, & x_{n} \end{pmatrix}$ and

$$
A = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{pmatrix}
$$
Solution:

$$
f(x) = X^{T}AX = \begin{pmatrix} x_{1}, & x_{2}, & \dots, & x_{n} \end{pmatrix} \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{pmatrix} \begin{pmatrix} x_{1} \\ x_{2} \\ \vdots \\ x_{n} \end{pmatrix} = \sum_{i=1}^{n}\sum_{j=1}^{n} a_{ij}x_{i}x_{j}
$$
Differentiating the double sum with respect to each component $x_{k}$ and simplifying gives:
$$
\begin{aligned}
\frac{df(x)}{dx} &= \begin{pmatrix} \frac{df(x)}{dx_{1}} \\ \frac{df(x)}{dx_{2}} \\ \vdots \\ \frac{df(x)}{dx_{n}} \end{pmatrix} = \begin{pmatrix} \sum_{j=1}^{n}a_{1j}x_{j}+\sum_{i=1}^{n}a_{i1}x_{i} \\ \sum_{j=1}^{n}a_{2j}x_{j}+\sum_{i=1}^{n}a_{i2}x_{i} \\ \vdots \\ \sum_{j=1}^{n}a_{nj}x_{j}+\sum_{i=1}^{n}a_{in}x_{i} \end{pmatrix} \\
&= \begin{pmatrix} \sum_{j=1}^{n}a_{1j}x_{j} \\ \sum_{j=1}^{n}a_{2j}x_{j} \\ \vdots \\ \sum_{j=1}^{n}a_{nj}x_{j} \end{pmatrix} + \begin{pmatrix} \sum_{i=1}^{n}a_{i1}x_{i} \\ \sum_{i=1}^{n}a_{i2}x_{i} \\ \vdots \\ \sum_{i=1}^{n}a_{in}x_{i} \end{pmatrix} = AX+A^{T}X
\end{aligned}
$$
Therefore:

$$
\frac{dX^{T}AX}{dx} = AX+A^{T}X
$$
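The quadratic-form result can be checked numerically in the same spirit. The sketch below is only an illustration: `numerical_gradient` is a hypothetical helper, and the matrix is drawn at random and deliberately left non-symmetric so that both terms $AX$ and $A^{T}X$ matter.

```python
import numpy as np

def numerical_gradient(f, x, eps=1e-6):
    """Central-difference approximation of the gradient of a scalar function f at x."""
    grad = np.zeros_like(x, dtype=float)
    for i in range(x.size):
        step = np.zeros_like(x, dtype=float)
        step[i] = eps
        grad[i] = (f(x + step) - f(x - step)) / (2 * eps)
    return grad

rng = np.random.default_rng(1)
A = rng.normal(size=(4, 4))  # not symmetric in general
x = rng.normal(size=4)

# f(x) = X^T A X; the derivation above predicts the gradient A X + A^T X.
grad_quadratic = numerical_gradient(lambda v: v @ A @ v, x)
analytic = A @ x + A.T @ x
quadratic_matches = np.allclose(grad_quadratic, analytic, atol=1e-5)
print(quadratic_matches)
```

When $A$ is symmetric the two terms coincide and the gradient reduces to $2AX$, which is the form used later for $X^{T}X$.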
The two examples above give two results:
- $\frac{dA^{T}X}{dx} = \frac{dX^{T}A}{dx} = A$
- $\frac{dX^{T}AX}{dx} = AX+A^{T}X$

Both results are used in the derivations that follow.
The quantities in the least-squares problem have the following forms, with $n$ samples and $p$ features:

$$
Y = \begin{pmatrix} y_{1} \\ y_{2} \\ \vdots \\ y_{n} \end{pmatrix}_{n\times 1} \qquad X = \begin{pmatrix} x_{1}^{T} \\ x_{2}^{T} \\ \vdots \\ x_{n}^{T} \end{pmatrix}_{n\times p} \qquad w = \begin{pmatrix} w_{1} \\ w_{2} \\ \vdots \\ w_{p} \end{pmatrix}_{p\times 1}
$$
Write the least-squares loss as a matrix product:
$$
\begin{aligned}
L(w) & = \sum_{i=1}^{n}(y_{i}-x_{i}^{T}w)^{2} \\
& = \|Y-Xw\|^{2} \\
& = (Y-Xw)^{T}(Y-Xw) \\
& = (Y^{T}-w^{T}X^{T})(Y-Xw) \\
& = Y^{T}Y-Y^{T}Xw-w^{T}X^{T}Y+w^{T}X^{T}Xw
\end{aligned}
$$
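To make the chain of equalities concrete, the following sketch evaluates the loss three ways on random data: as a sum over samples, as a squared norm, and as the four-term expansion. The data, sizes, and variable names are assumptions chosen purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 20, 3
X = rng.normal(size=(n, p))  # one row x_i^T per sample
Y = rng.normal(size=n)
w = rng.normal(size=p)

# Sum over samples of the squared residuals (y_i - x_i^T w)^2.
loss_sum = sum((Y[i] - X[i] @ w) ** 2 for i in range(n))

# Squared Euclidean norm of the residual vector Y - Xw.
residual = Y - X @ w
loss_norm = residual @ residual

# Fully expanded form: Y^T Y - Y^T X w - w^T X^T Y + w^T X^T X w.
loss_expanded = Y @ Y - Y @ X @ w - w @ X.T @ Y + w @ X.T @ X @ w

print(np.isclose(loss_sum, loss_norm), np.isclose(loss_norm, loss_expanded))
```

The two middle terms are scalars that are transposes of each other, which is why they contribute the same gradient in the next step.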
Differentiate this expression with respect to $w$ and set the derivative to zero. The two linear terms use the first result; the quadratic term uses the second result with $A = X^{T}X$, which is symmetric, so $AX + A^{T}X$ becomes $2X^{T}Xw$:

$$
\begin{aligned}
\frac{dL(w)}{dw} & = \frac{d(Y^{T}Y)}{dw} - \frac{d(Y^{T}Xw)}{dw} - \frac{d(w^{T}X^{T}Y)}{dw} + \frac{d(w^{T}X^{T}Xw)}{dw} \\
& = 0 - X^{T}Y - X^{T}Y + 2X^{T}Xw \\
& = 0
\end{aligned}
$$
Rearranging $-X^{T}Y - X^{T}Y + 2X^{T}Xw = 0$ gives $w^{*} = (X^{T}X)^{-1}X^{T}Y$, provided $X^{T}X$ is invertible. Substituting $w^{*}$ back into $Xw$ gives the fitted values:
$$
\begin{aligned}
Xw^{*} &= X(X^{T}X)^{-1}X^{T}Y = \hat Y = \hat H Y \\
\hat H &= X(X^{T}X)^{-1}X^{T}
\end{aligned}
$$
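A minimal NumPy sketch of this closed-form solution follows, assuming synthetic data and a full-column-rank $X$. It solves the normal equations with `np.linalg.solve` rather than forming $(X^{T}X)^{-1}$ explicitly, and cross-checks against `np.linalg.lstsq`; the names `w_true`, `w_star`, and `H` are illustrative.

```python
import numpy as np

rng = np.random.default_rng(3)
n, p = 50, 3
X = rng.normal(size=(n, p))
w_true = np.array([2.0, -1.0, 0.5])          # assumed ground truth for the demo
Y = X @ w_true + 0.1 * rng.normal(size=n)    # noisy observations

# Normal equations: X^T X w = X^T Y.
w_star = np.linalg.solve(X.T @ X, X.T @ Y)

# Reference solution from NumPy's least-squares routine.
w_lstsq, *_ = np.linalg.lstsq(X, Y, rcond=None)

# Hat matrix H = X (X^T X)^{-1} X^T maps Y to the fitted values Y_hat.
H = X @ np.linalg.solve(X.T @ X, X.T)
Y_hat = H @ Y

print(np.allclose(w_star, w_lstsq), np.allclose(Y_hat, X @ w_star))
```

The hat matrix is a projection onto the column space of $X$, so it is symmetric and applying it twice changes nothing. For ill-conditioned $X$, a QR- or SVD-based solver such as `lstsq` is usually the safer choice over the normal equations.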
The quantities have the same forms as in the unweighted regression. In addition, each sample $i$ has a weight $r_{i}$, and in the matrix expressions $r = \mathrm{diag}(r_{1}, r_{2}, \dots, r_{n})$ denotes the $n \times n$ diagonal matrix of these weights.

Write the weighted least-squares loss as a matrix product:

$$
\begin{aligned}
L(w) & = \sum_{i=1}^{n}r_{i}(y_{i}-x_{i}^{T}w)^{2} \\
& = (Y-Xw)^{T}r(Y-Xw) \\
& = (Y^{T}-w^{T}X^{T})r(Y-Xw) \\
& = Y^{T}rY-Y^{T}rXw-w^{T}X^{T}rY+w^{T}X^{T}rXw
\end{aligned}
$$
Differentiate this expression with respect to $w$ and set the derivative to zero. Because $r$ is diagonal, $r^{T} = r$ and $X^{T}rX$ is symmetric, so the quadratic term again contributes a factor of 2:

$$
\begin{aligned}
\frac{dL(w)}{dw} & = \frac{d(Y^{T}rY)}{dw} - \frac{d(Y^{T}rXw)}{dw} - \frac{d(w^{T}X^{T}rY)}{dw} + \frac{d(w^{T}X^{T}rXw)}{dw} \\
& = 0 - X^{T}rY - X^{T}rY + 2X^{T}rXw \\
& = 0
\end{aligned}
$$
Rearranging $-X^{T}rY - X^{T}rY + 2X^{T}rXw = 0$ gives $w^{*} = (X^{T}rX)^{-1}X^{T}rY$, provided $X^{T}rX$ is invertible. Substituting $w^{*}$ back into $Xw$ gives the fitted values:
$$
\begin{aligned}
Xw^{*} &= X(X^{T}rX)^{-1}X^{T}rY = \hat Y = \hat H Y \\
\hat H &= X(X^{T}rX)^{-1}X^{T}r
\end{aligned}
$$
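The weighted solution can be sketched the same way. The example below assumes strictly positive weights stored in a vector `weights` and builds the diagonal matrix explicitly for clarity; it then cross-checks the result against ordinary least squares on rows scaled by $\sqrt{r_{i}}$, which is an equivalent formulation. All names and data are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(4)
n, p = 50, 3
X = rng.normal(size=(n, p))
Y = X @ np.array([1.0, 2.0, -3.0]) + 0.1 * rng.normal(size=n)
weights = rng.uniform(0.5, 2.0, size=n)  # assumed positive per-sample weights r_i
R = np.diag(weights)                     # the diagonal matrix written as r in the text

# Weighted normal equations: X^T r X w = X^T r Y.
w_star = np.linalg.solve(X.T @ R @ X, X.T @ R @ Y)

# Cross-check: scaling each row by sqrt(r_i) turns the problem into ordinary least squares.
sqrt_r = np.sqrt(weights)
w_check, *_ = np.linalg.lstsq(X * sqrt_r[:, None], Y * sqrt_r, rcond=None)

# Weighted hat matrix H = X (X^T r X)^{-1} X^T r.
H = X @ np.linalg.solve(X.T @ R @ X, X.T @ R)

print(np.allclose(w_star, w_check), np.allclose(H @ Y, X @ w_star))
```

Unlike the unweighted case, this hat matrix is still idempotent but generally not symmetric. For large $n$, multiplying rows by the weights directly (`X * weights[:, None]`) avoids materializing the $n \times n$ diagonal matrix.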