Matrix Derivatives for Least Squares

Contents

    • I. Differentiation rules
    • II. Two common examples
      • Example 1
      • Example 2
    • III. Least squares
      • 1. Unweighted regression
      • 2. Weighted regression

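I. Differentiation rules

This article uses the denominator layout for matrix derivatives: the numerator is laid out horizontally and the denominator vertically. In particular, the derivative of a scalar with respect to a column vector $x$ is a column vector with the same shape as $x$.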

  • Product rule: $\frac{d(v^{T}u)}{dx} = \frac{du}{dx}v + \frac{dv}{dx}u$
  • Sum rule: $\frac{d(u+v)}{dx} = \frac{du}{dx} + \frac{dv}{dx}$
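As a quick sanity check of the product rule, here is a small worked case that is not part of the original derivation: take $u = v = x$. In the denominator layout $\frac{dx}{dx} = I$, so the rule gives

$$
\frac{d(x^{T}x)}{dx} = \frac{dx}{dx}x + \frac{dx}{dx}x = Ix + Ix = 2x
$$

which agrees with differentiating $x^{T}x = \sum_{i=1}^{n}x_{i}^{2}$ one component at a time.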

II. Two common examples

Example 1

Let $f(x) = A^{T}X$, where $A^{T} = \begin{pmatrix} a_{1} & a_{2} & \cdots & a_{n} \end{pmatrix}$ and $X^{T} = \begin{pmatrix} x_{1} & x_{2} & \cdots & x_{n} \end{pmatrix}$. Here $X$ is the column vector of variables, and $\frac{d}{dx}$ denotes differentiation with respect to it.

Solution:

$$
\begin{aligned}
f(x) &= A^{T}X = \sum_{i=1}^{n}a_{i}x_{i} \\
\frac{df(x)}{dx} &= \begin{pmatrix} \frac{df(x)}{dx_{1}} \\ \frac{df(x)}{dx_{2}} \\ \vdots \\ \frac{df(x)}{dx_{n}} \end{pmatrix} = \begin{pmatrix} a_{1} \\ a_{2} \\ \vdots \\ a_{n} \end{pmatrix} = A
\end{aligned}
$$

Therefore, since $A^{T}X = X^{T}A$ is a scalar: $\frac{dA^{T}X}{dx} = \frac{dX^{T}A}{dx} = A$
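The result of Example 1 can be checked numerically. The sketch below is only an illustration: the helper `numerical_gradient` and the sample values of `a` and `x` are made up for this check and do not come from the original text. It compares a central-difference gradient of $A^{T}X$ against $A$.

```python
def f(a, x):
    """Linear form f(x) = a^T x."""
    return sum(ai * xi for ai, xi in zip(a, x))


def numerical_gradient(func, x, h=1e-6):
    """Central-difference gradient of a scalar function at the point x."""
    grad = []
    for k in range(len(x)):
        up = x[:k] + [x[k] + h] + x[k + 1:]
        down = x[:k] + [x[k] - h] + x[k + 1:]
        grad.append((func(up) - func(down)) / (2 * h))
    return grad


# Hypothetical sample data, chosen only for illustration.
a = [2.0, -1.0, 0.5]
x = [1.0, 3.0, -2.0]

grad = numerical_gradient(lambda z: f(a, z), x)

# The claim d(a^T x)/dx = a means grad should match a entry by entry.
max_error = max(abs(g - ai) for g, ai in zip(grad, a))
print(max_error < 1e-6)
```

The gradient does not depend on `x` at all, which is what one expects from a linear function.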

Example 2

Let $f(x) = X^{T}AX$, where $X^{T} = \begin{pmatrix} x_{1} & x_{2} & \cdots & x_{n} \end{pmatrix}$ and

$$
A = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{pmatrix}
$$

Solution:

$$
f(x) = X^{T}AX = \begin{pmatrix} x_{1} & x_{2} & \cdots & x_{n} \end{pmatrix} \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{pmatrix} \begin{pmatrix} x_{1} \\ x_{2} \\ \vdots \\ x_{n} \end{pmatrix} = \sum_{i=1}^{n}\sum_{j=1}^{n}a_{ij}x_{i}x_{j}
$$

Differentiating with respect to each component and collecting terms gives

$$
\begin{aligned}
\frac{df(x)}{dx} &= \begin{pmatrix} \frac{df(x)}{dx_{1}} \\ \frac{df(x)}{dx_{2}} \\ \vdots \\ \frac{df(x)}{dx_{n}} \end{pmatrix} = \begin{pmatrix} \sum_{j=1}^{n}a_{1j}x_{j}+\sum_{i=1}^{n}a_{i1}x_{i} \\ \sum_{j=1}^{n}a_{2j}x_{j}+\sum_{i=1}^{n}a_{i2}x_{i} \\ \vdots \\ \sum_{j=1}^{n}a_{nj}x_{j}+\sum_{i=1}^{n}a_{in}x_{i} \end{pmatrix} \\
&= \begin{pmatrix} \sum_{j=1}^{n}a_{1j}x_{j} \\ \sum_{j=1}^{n}a_{2j}x_{j} \\ \vdots \\ \sum_{j=1}^{n}a_{nj}x_{j} \end{pmatrix} + \begin{pmatrix} \sum_{i=1}^{n}a_{i1}x_{i} \\ \sum_{i=1}^{n}a_{i2}x_{i} \\ \vdots \\ \sum_{i=1}^{n}a_{in}x_{i} \end{pmatrix} = AX+A^{T}X
\end{aligned}
$$

Therefore: $\frac{dX^{T}AX}{dx} = AX+A^{T}X$
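The same kind of numerical check works for the quadratic form. This is a minimal sketch under assumptions: the function names and the deliberately non-symmetric 2×2 matrix are invented for the illustration. A non-symmetric $A$ is used so that the $A^{T}X$ term actually matters.

```python
def quadratic_form(A, x):
    """f(x) = x^T A x, written as the double sum over a_ij * x_i * x_j."""
    n = len(x)
    return sum(A[i][j] * x[i] * x[j] for i in range(n) for j in range(n))


def analytic_gradient(A, x):
    """A x + A^T x: entry k is sum_j a_kj x_j + sum_i a_ik x_i."""
    n = len(x)
    return [
        sum(A[k][j] * x[j] for j in range(n))
        + sum(A[i][k] * x[i] for i in range(n))
        for k in range(n)
    ]


def numerical_gradient(func, x, h=1e-6):
    """Central-difference gradient of a scalar function at the point x."""
    grad = []
    for k in range(len(x)):
        up = x[:k] + [x[k] + h] + x[k + 1:]
        down = x[:k] + [x[k] - h] + x[k + 1:]
        grad.append((func(up) - func(down)) / (2 * h))
    return grad


# Hypothetical non-symmetric matrix and point, chosen only for illustration.
A = [[1.0, 2.0],
     [0.0, 3.0]]
x = [1.0, -1.0]

analytic = analytic_gradient(A, x)
numeric = numerical_gradient(lambda z: quadratic_form(A, z), x)
max_error = max(abs(p - q) for p, q in zip(analytic, numeric))
print(analytic, max_error < 1e-6)
```

When $A$ is symmetric the two terms coincide and the gradient reduces to $2AX$, which is the form used for $X^{T}X$ in the least-squares derivation.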

The two examples above give two results:

  • $\frac{dA^{T}X}{dx} = \frac{dX^{T}A}{dx} = A$
  • $\frac{dX^{T}AX}{dx} = AX+A^{T}X$

Both results are used in the derivations that follow.

III. Least squares

1. Unweighted regression

The quantities involved have the following shapes, where each sample $x_{i}$ is a $p \times 1$ vector:

$$
Y = \begin{pmatrix} y_{1} \\ y_{2} \\ \vdots \\ y_{n} \end{pmatrix}_{n\times 1} \qquad X = \begin{pmatrix} x_{1}^{T} \\ x_{2}^{T} \\ \vdots \\ x_{n}^{T} \end{pmatrix}_{n\times p} \qquad w = \begin{pmatrix} w_{1} \\ w_{2} \\ \vdots \\ w_{p} \end{pmatrix}_{p\times 1}
$$

Write the least-squares loss as a matrix product:

$$
\begin{aligned}
L(w) & = \sum_{i=1}^{n}(y_{i}-x_{i}^{T}w)^{2} \\
& = \|Y-Xw\|^{2} \\
& = (Y-Xw)^{T}(Y-Xw) \\
& = (Y^{T}-w^{T}X^{T})(Y-Xw) \\
& = Y^{T}Y-Y^{T}Xw-w^{T}X^{T}Y+w^{T}X^{T}Xw
\end{aligned}
$$

Differentiate this expression term by term using the two results above, and set the derivative to zero:

$$
\begin{aligned}
\frac{dL(w)}{dw} & = \frac{d(Y^{T}Y)}{dw} - \frac{d(Y^{T}Xw)}{dw} - \frac{d(w^{T}X^{T}Y)}{dw} + \frac{d(w^{T}X^{T}Xw)}{dw} \\
& = 0 - X^{T}Y - X^{T}Y + 2X^{T}Xw \\
& = 0
\end{aligned}
$$

Rearranging $-X^{T}Y - X^{T}Y + 2X^{T}Xw = 0$ gives $w^{*} = (X^{T}X)^{-1}X^{T}Y$, assuming $X^{T}X$ is invertible. Substituting $w^{*}$ back into the model:

$$
\begin{aligned}
Xw^{*} &= X(X^{T}X)^{-1}X^{T}Y = \hat Y = \hat H Y \\
\hat H &= X(X^{T}X)^{-1}X^{T}
\end{aligned}
$$
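The closed-form solution can be tried on a tiny data set. The following is a dependency-free sketch, not a production routine: the helpers `transpose`, `matmul` and `solve` and the four sample points are assumptions introduced here. It solves the normal equations $X^{T}Xw = X^{T}Y$ by elimination instead of forming the inverse explicitly, then builds the hat matrix.

```python
def transpose(M):
    return [list(row) for row in zip(*M)]


def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def solve(A, B):
    """Solve A Z = B by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    aug = [list(A[i]) + list(B[i]) for i in range(n)]
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[pivot] = aug[pivot], aug[c]
        p = aug[c][c]
        aug[c] = [v / p for v in aug[c]]
        for r in range(n):
            if r != c:
                factor = aug[r][c]
                aug[r] = [vr - factor * vc for vr, vc in zip(aug[r], aug[c])]
    return [row[n:] for row in aug]


# Hypothetical data lying exactly on y = 1 + 2 t; the first column of X is the intercept.
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
Y = [[1.0], [3.0], [5.0], [7.0]]

Xt = transpose(X)
XtX = matmul(Xt, X)

w_star = solve(XtX, matmul(Xt, Y))   # w* = (X^T X)^{-1} X^T Y
H = matmul(X, solve(XtX, Xt))        # hat matrix X (X^T X)^{-1} X^T
Y_hat = matmul(H, Y)                 # fitted values H Y
trace_H = sum(H[i][i] for i in range(len(H)))

print(w_star, trace_H)
```

Because the sample points lie exactly on a line, the fitted values reproduce `Y`, and the trace of the hat matrix equals the number of parameters $p = 2$ up to rounding.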

2. Weighted regression

The quantities have the same shapes as in the unweighted case. In the matrix form, $r = \mathrm{diag}(r_{1}, \dots, r_{n})$ denotes the diagonal matrix of non-negative sample weights, so $r^{T} = r$.
Write the weighted least-squares loss as a matrix product:

$$
\begin{aligned}
L(w) & = \sum_{i=1}^{n}r_{i}(y_{i}-x_{i}^{T}w)^{2} \\
& = \|r^{1/2}(Y-Xw)\|^{2} \\
& = (Y-Xw)^{T}r(Y-Xw) \\
& = (Y^{T}-w^{T}X^{T})r(Y-Xw) \\
& = Y^{T}rY-Y^{T}rXw-w^{T}X^{T}rY+w^{T}X^{T}rXw
\end{aligned}
$$

Differentiate this expression term by term and set the derivative to zero:

$$
\begin{aligned}
\frac{dL(w)}{dw} & = \frac{d(Y^{T}rY)}{dw} - \frac{d(Y^{T}rXw)}{dw} - \frac{d(w^{T}X^{T}rY)}{dw} + \frac{d(w^{T}X^{T}rXw)}{dw} \\
& = 0 - X^{T}rY - X^{T}rY + 2X^{T}rXw \\
& = 0
\end{aligned}
$$

Rearranging $-X^{T}rY - X^{T}rY + 2X^{T}rXw = 0$ gives $w^{*} = (X^{T}rX)^{-1}X^{T}rY$, assuming $X^{T}rX$ is invertible. Substituting $w^{*}$ back into the model:

$$
\begin{aligned}
Xw^{*} &= X(X^{T}rX)^{-1}X^{T}rY = \hat Y = \hat H Y \\
\hat H &= X(X^{T}rX)^{-1}X^{T}r
\end{aligned}
$$
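The weighted solution can be checked the same way. In this sketch the weights, the data and the helper functions are all hypothetical, and the weight matrix is taken to be diagonal, so multiplying by it just scales row $i$ by $r_{i}$. The code solves $X^{T}rXw = X^{T}rY$ and then verifies the stationarity condition $X^{T}r(Y - Xw^{*}) = 0$.

```python
def transpose(M):
    return [list(row) for row in zip(*M)]


def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def solve(A, B):
    """Solve A Z = B by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    aug = [list(A[i]) + list(B[i]) for i in range(n)]
    for c in range(n):
        pivot = max(range(c, n), key=lambda k: abs(aug[k][c]))
        aug[c], aug[pivot] = aug[pivot], aug[c]
        p = aug[c][c]
        aug[c] = [v / p for v in aug[c]]
        for k in range(n):
            if k != c:
                factor = aug[k][c]
                aug[k] = [vk - factor * vc for vk, vc in zip(aug[k], aug[c])]
    return [row[n:] for row in aug]


# Hypothetical data and per-sample weights (the diagonal of the weight matrix).
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
Y = [[1.0], [2.5], [5.5], [6.0]]
r = [1.0, 2.0, 0.5, 4.0]

# Left-multiplying by a diagonal matrix scales each row by its weight.
rX = [[ri * v for v in row] for ri, row in zip(r, X)]
rY = [[ri * row[0]] for ri, row in zip(r, Y)]

Xt = transpose(X)
w_star = solve(matmul(Xt, rX), matmul(Xt, rY))   # (X^T r X)^{-1} X^T r Y

# Stationarity check: X^T r (Y - X w*) should vanish at the minimizer.
fitted = matmul(X, w_star)
weighted_residual = [[ri * (y[0] - f[0])] for ri, y, f in zip(r, Y, fitted)]
stationarity = matmul(Xt, weighted_residual)
max_abs = max(abs(g[0]) for g in stationarity)

print(w_star, max_abs < 1e-9)
```

Setting every weight to 1 makes `rX` equal to `X` and `rY` equal to `Y`, so the computation falls back to the unweighted normal equations.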
