Matrix Differentiation, Part 2

Previous post: https://blog.csdn.net/m0_37567738/article/details/133444201?spm=1001.2014.3001.5502

Reference: https://zhuanlan.zhihu.com/p/262751195

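In machine-learning derivations, the following layout conventions are usually adopted:

  1. When differentiating a vector or matrix with respect to a scalar, use numerator layout.
  2. When differentiating a scalar with respect to a vector or matrix, use denominator layout.
  3. When differentiating a vector with respect to a vector, the numerator-layout Jacobian matrix is generally used, so the result is a matrix.
  4. The numerator-layout and denominator-layout results differ by a transpose.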
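
As a small numerical illustration of conventions 3 and 4, the sketch below builds the numerator-layout Jacobian of the linear map f(x) = Ax by central differences. The helper `numerical_jacobian` and the test data are illustrative assumptions, not part of the original post, and NumPy is assumed to be installed.

```python
import numpy as np

def numerical_jacobian(f, x, eps=1e-6):
    """Numerator-layout Jacobian J[i, j] ~ d f_i / d x_j (central differences)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(f(x), dtype=float)
    J = np.zeros((y.size, x.size))
    for j in range(x.size):
        step = np.zeros_like(x)
        step[j] = eps
        J[:, j] = (f(x + step) - f(x - step)) / (2 * eps)
    return J

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 2))   # hypothetical linear map from R^2 to R^3
x = rng.standard_normal(2)

# Numerator layout: one row per output component, shape (3, 2).
J_numerator = numerical_jacobian(lambda v: A @ v, x)
# Denominator layout is its transpose, shape (2, 3).
J_denominator = J_numerator.T

print(J_numerator.shape, J_denominator.shape)
print(np.allclose(J_numerator, A, atol=1e-6))
```

For this map the numerator-layout Jacobian should match A itself, while the denominator-layout result matches the transpose of A.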

For $A$ of size $m \times n$ and $B$ of size $n \times m$:

$$\operatorname{tr}(AB) = \operatorname{tr}(BA)$$

For $A$ and $B$ of the same size:

$$\operatorname{tr}(A^T B) = \sum_{i,j} A_{ij} B_{ij} \tag{2.1}$$

That is, $\operatorname{tr}(A^T B)$ is the inner product of the matrices $A$ and $B$ (the sum of the products of their corresponding entries).
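
Both trace identities are easy to check numerically. The following is a minimal sketch on random matrices, assuming NumPy is available; the matrix shapes and variable names are arbitrary choices made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 4))
B = rng.standard_normal((4, 3))
C = rng.standard_normal((3, 4))   # same shape as A, for the inner product

# Cyclic property: tr(AB) = tr(BA), even though AB is 3x3 and BA is 4x4.
trace_AB = np.trace(A @ B)
trace_BA = np.trace(B @ A)

# Inner product: tr(A^T C) equals the sum of elementwise products.
inner_via_trace = np.trace(A.T @ C)
inner_via_sum = np.sum(A * C)

print(np.isclose(trace_AB, trace_BA))
print(np.isclose(inner_via_trace, inner_via_sum))
```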

For a scalar function $f$ of a vector $x \in \mathbb{R}^n$, the total differential is

$$df = \sum_{i=1}^{n} \frac{\partial f}{\partial x_i}\,dx_i = \left(\frac{\partial f}{\partial x}\right)^T dx \tag{2.2}$$

Extending (2.2) from vectors to matrices and using (2.1) gives:

$$df = \sum_{i=1}^{m}\sum_{j=1}^{n} \frac{\partial f}{\partial X_{ij}}\,dX_{ij} = \operatorname{tr}\left(\left(\frac{\partial f}{\partial X}\right)^T dX\right)$$

This differential method is the basic idea behind differentiating with respect to a matrix: write $df$ in the form $\operatorname{tr}(G^T dX)$, then read off $\frac{\partial f}{\partial X} = G$.
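
The relation between the differential and the trace can be sketched numerically. The example below assumes a test function f(X) = sum of sin(X_ij), chosen only because its gradient cos(X) is known; the helper `numerical_gradient` is likewise an illustrative assumption, and NumPy is assumed to be installed.

```python
import numpy as np

def f(X):
    """Example scalar function of a matrix: sum of sin over all entries."""
    return np.sum(np.sin(X))

def numerical_gradient(func, X, eps=1e-6):
    """Entry-by-entry central differences: G[i, j] ~ d func / d X[i, j]."""
    G = np.zeros_like(X)
    for idx in np.ndindex(X.shape):
        step = np.zeros_like(X)
        step[idx] = eps
        G[idx] = (func(X + step) - func(X - step)) / (2 * eps)
    return G

rng = np.random.default_rng(0)
X = rng.standard_normal((3, 4))
dX = 1e-6 * rng.standard_normal((3, 4))   # a small perturbation of X

G = numerical_gradient(f, X)          # should be close to cos(X)
df_predicted = np.trace(G.T @ dX)     # tr((df/dX)^T dX)
df_actual = f(X + dX) - f(X)          # actual first-order change in f

print(np.allclose(G, np.cos(X), atol=1e-6))
print(abs(df_predicted - df_actual) < 1e-9)
```

The two values agree only to first order in dX, so the comparison uses a small perturbation and a tolerance rather than exact equality.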

Example 1:
Let $y = a^T X b$, where $y$ is a scalar, $a$ is an $m$-dimensional vector, $b$ is an $n$-dimensional vector, and $X$ is an $m \times n$ matrix. Find $\frac{\partial y}{\partial X}$.

Solution 1:
Since $y = \sum_{i,j} a_i X_{ij} b_j$, each entry satisfies $\frac{\partial y}{\partial X_{ij}} = a_i b_j$. In denominator layout this gives:
$$\frac{\partial y}{\partial X} = ab^T$$

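Solution 2:
Since $a$ and $b$ are constant, $da = 0$ and $db = 0$, so
$$dy = da^T X b + a^T\,dX\,b + a^T X\,db = a^T\,dX\,b$$

A scalar equals its own trace, and the trace is invariant under cyclic permutation, so
$$dy = \operatorname{tr}(a^T\,dX\,b) = \operatorname{tr}(b a^T\,dX) = \operatorname{tr}\left((ab^T)^T dX\right)$$

Comparing with $dy = \operatorname{tr}\left(\left(\frac{\partial y}{\partial X}\right)^T dX\right)$ gives
$$\frac{\partial y}{\partial X} = ab^T$$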
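
The result of Example 1 can be checked with a finite-difference gradient. This is a minimal sketch assuming NumPy is available; the sizes m = 3 and n = 4 and the random data are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 3, 4
a = rng.standard_normal(m)
b = rng.standard_normal(n)
X = rng.standard_normal((m, n))

def y(X):
    """The scalar a^T X b."""
    return a @ X @ b

# Central-difference estimate of dy/dX, one entry at a time.
eps = 1e-6
grad_numeric = np.zeros((m, n))
for i in range(m):
    for j in range(n):
        E = np.zeros((m, n))
        E[i, j] = eps
        grad_numeric[i, j] = (y(X + E) - y(X - E)) / (2 * eps)

grad_closed_form = np.outer(a, b)   # a b^T, same shape as X

print(np.allclose(grad_numeric, grad_closed_form, atol=1e-6))
```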

Example 2:
Find $d(A^{-1})$.

Solution:
$$\begin{aligned}
A^{-1}A = I \;&\Rightarrow\; d(A^{-1}A) = d(A^{-1})\,A + A^{-1}\,dA = dI = 0 \\
&\Rightarrow\; d(A^{-1})\,A = -A^{-1}\,dA \\
&\Rightarrow\; d(A^{-1}) = -A^{-1}\,dA\,A^{-1}
\end{aligned}$$

Alternatively:

$$\begin{aligned}
AA^{-1} = I \;&\Rightarrow\; d(AA^{-1}) = dA\,A^{-1} + A\,d(A^{-1}) = dI = 0 \\
&\Rightarrow\; A\,d(A^{-1}) = -dA\,A^{-1} \\
&\Rightarrow\; d(A^{-1}) = -A^{-1}\,dA\,A^{-1}
\end{aligned}$$
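
A quick first-order check of the formula for the differential of the inverse, sketched with NumPy (assumed to be installed). The test matrix is built to be strictly diagonally dominant, which is an assumption made here only to guarantee invertibility.

```python
import numpy as np

rng = np.random.default_rng(0)
# Diagonal entries lie in [4, 6] and off-diagonal entries in [-1, 1),
# so the matrix is strictly diagonally dominant and therefore invertible.
A = 5.0 * np.eye(3) + rng.uniform(-1.0, 1.0, size=(3, 3))
dA = 1e-6 * rng.standard_normal((3, 3))   # small perturbation

A_inv = np.linalg.inv(A)
d_inv_actual = np.linalg.inv(A + dA) - A_inv   # true change in the inverse
d_inv_formula = -A_inv @ dA @ A_inv            # -A^{-1} dA A^{-1}

max_error = np.max(np.abs(d_inv_actual - d_inv_formula))
print(max_error < 1e-9)
```

The remaining error is second order in dA, which is why the two matrices agree closely but not exactly.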

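Example 3:
Find $d(\det X)$.

Solution:

$$d|X| = |X|\operatorname{tr}(X^{-1}\,dX) = \operatorname{tr}(X^{*}\,dX)$$

Here $|X| = \det X$ and $X^{*} = |X|\,X^{-1}$ is the adjugate matrix of $X$; the first form requires $X$ to be invertible.

Derivation of the determinant derivative formula: https://www.cnblogs.com/analysis101/p/14677671.html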

https://jingyan.baidu.com/article/a501d80cb6ef00ac620f5e21.html
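
Both forms of the determinant differential can be compared against an actual small change in the determinant. In the sketch below, the `adjugate` helper (built from cofactors) and the diagonally dominant test matrix are illustrative assumptions; NumPy is assumed to be installed.

```python
import numpy as np

def adjugate(X):
    """Adjugate of a square matrix: the transpose of its cofactor matrix."""
    n = X.shape[0]
    cof = np.zeros_like(X)
    for i in range(n):
        for j in range(n):
            minor = np.delete(np.delete(X, i, axis=0), j, axis=1)
            cof[i, j] = (-1) ** (i + j) * np.linalg.det(minor)
    return cof.T

rng = np.random.default_rng(0)
# Strictly diagonally dominant, hence invertible.
X = 5.0 * np.eye(3) + rng.uniform(-1.0, 1.0, size=(3, 3))
dX = 1e-6 * rng.standard_normal((3, 3))   # small perturbation

d_det_actual = np.linalg.det(X + dX) - np.linalg.det(X)
d_det_inverse_form = np.linalg.det(X) * np.trace(np.linalg.inv(X) @ dX)
d_det_adjugate_form = np.trace(adjugate(X) @ dX)

print(abs(d_det_actual - d_det_inverse_form) < 1e-8)
print(abs(d_det_actual - d_det_adjugate_form) < 1e-8)
```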

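Example 4:
$$f(x) = \begin{vmatrix} x+a & a & a \\ a & x+a & a \\ a & a & x+a \end{vmatrix}$$
Find $f'(x)$.

Solution:
Differentiating a determinant row by row gives a sum of three determinants, each with one row replaced by its derivative. Here the derivative of row $i$ is the $i$-th unit row vector, so each of the three determinants reduces to the same $2 \times 2$ minor:
$$f'(x) = 3\begin{vmatrix} x+a & a \\ a & x+a \end{vmatrix} = 3\left((x+a)^2 - a^2\right) = 3x^2 + 6ax$$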
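
The closed form in Example 4 can be compared with a central-difference derivative of the determinant. This is a sketch assuming NumPy is available; the sample point x = 1.5, a = 0.7 is an arbitrary choice.

```python
import numpy as np

def f(x, a):
    """Determinant of the 3x3 matrix with x + a on the diagonal and a elsewhere."""
    M = x * np.eye(3) + a * np.ones((3, 3))
    return np.linalg.det(M)

x, a = 1.5, 0.7   # arbitrary sample point (assumed values)
eps = 1e-6

derivative_numeric = (f(x + eps, a) - f(x - eps, a)) / (2 * eps)
derivative_closed_form = 3 * x**2 + 6 * a * x

print(abs(derivative_numeric - derivative_closed_form) < 1e-6)
```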

Example 5:

Verify that $d(AB) = dA\,B + A\,dB$.

Solution:

Let $A = \begin{bmatrix} f_1 & f_2 \\ f_3 & f_4 \end{bmatrix}$ and $B = \begin{bmatrix} g_1 & g_2 \\ g_3 & g_4 \end{bmatrix}$, where every entry is a differentiable function of a scalar variable $t$.
Then $A' = \begin{bmatrix} f'_1 & f'_2 \\ f'_3 & f'_4 \end{bmatrix}$ and $B' = \begin{bmatrix} g'_1 & g'_2 \\ g'_3 & g'_4 \end{bmatrix}$, where the prime denotes $\frac{d}{dt}$.

Applying the scalar product rule to each entry of $AB$:

$$\begin{aligned}
(AB)' &= \begin{bmatrix} f_1g_1+f_2g_3 & f_1g_2+f_2g_4 \\ f_3g_1+f_4g_3 & f_3g_2+f_4g_4 \end{bmatrix}' \\
&= \begin{bmatrix} f'_1g_1+f'_2g_3 & f'_1g_2+f'_2g_4 \\ f'_3g_1+f'_4g_3 & f'_3g_2+f'_4g_4 \end{bmatrix} + \begin{bmatrix} f_1g'_1+f_2g'_3 & f_1g'_2+f_2g'_4 \\ f_3g'_1+f_4g'_3 & f_3g'_2+f_4g'_4 \end{bmatrix} \\
&= A'B + AB'
\end{aligned}$$

Multiplying both sides by $dt$ gives $d(AB) = dA\,B + A\,dB$.
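
The product rule can also be checked numerically for concrete matrix-valued functions. The 2x2 functions `A_of_t` and `B_of_t` below, with their hand-written derivatives, are hypothetical examples chosen for illustration; NumPy is assumed to be installed.

```python
import numpy as np

def A_of_t(t):
    return np.array([[np.sin(t), t**2], [np.exp(t), 1.0]])

def dA_dt(t):
    """Entrywise derivative of A_of_t."""
    return np.array([[np.cos(t), 2 * t], [np.exp(t), 0.0]])

def B_of_t(t):
    return np.array([[np.cos(t), t], [1.0, t**3]])

def dB_dt(t):
    """Entrywise derivative of B_of_t."""
    return np.array([[-np.sin(t), 1.0], [0.0, 3 * t**2]])

t, eps = 0.8, 1e-6

# Left side: central-difference derivative of the product A(t) B(t).
product_derivative = (A_of_t(t + eps) @ B_of_t(t + eps)
                      - A_of_t(t - eps) @ B_of_t(t - eps)) / (2 * eps)
# Right side: A'B + AB'. The order of the factors matters for matrices.
product_rule = dA_dt(t) @ B_of_t(t) + A_of_t(t) @ dB_dt(t)

print(np.allclose(product_derivative, product_rule, atol=1e-6))
```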

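Example 6:

Find the derivative of $\operatorname{tr}(AB)$ with respect to $A$.

Solution:

$$\operatorname{tr}(AB) = \sum_{i,j} A_{ij}B_{ji}$$

Each entry satisfies $\frac{\partial \operatorname{tr}(AB)}{\partial A_{ij}} = B_{ji}$, so $\frac{\partial \operatorname{tr}(AB)}{\partial A} = B^T$ and

$$d\operatorname{tr}(AB) = \operatorname{tr}\left(\left(\frac{\partial \operatorname{tr}(AB)}{\partial A}\right)^T dA\right) = \operatorname{tr}\left((B^T)^T dA\right) = \operatorname{tr}(B\,dA)$$

That is, the derivative of $\operatorname{tr}(AB)$ with respect to $A$ is $B^T$.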
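
A finite-difference check of Example 6, sketched with NumPy (assumed to be installed). The shapes of A (3x4) and B (4x3) are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 3, 4
A = rng.standard_normal((m, n))
B = rng.standard_normal((n, m))

# Central-difference estimate of d tr(AB) / dA, one entry of A at a time.
eps = 1e-6
grad_numeric = np.zeros((m, n))
for i in range(m):
    for j in range(n):
        E = np.zeros((m, n))
        E[i, j] = eps
        grad_numeric[i, j] = (np.trace((A + E) @ B)
                              - np.trace((A - E) @ B)) / (2 * eps)

# The gradient has the shape of A and should match B^T.
print(grad_numeric.shape == A.shape)
print(np.allclose(grad_numeric, B.T, atol=1e-6))
```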
