Previous post: https://blog.csdn.net/m0_37567738/article/details/133444201?spm=1001.2014.3001.5502
Reference: https://zhuanlan.zhihu.com/p/262751195
Derivations of machine-learning algorithms commonly rely on the following conventions and trace identities:
$$\operatorname{tr}(AB) = \operatorname{tr}(BA)$$
$$\operatorname{tr}(A^TB) = \sum_{i,j}A_{ij}B_{ij} \tag{2.1}$$
That is, $\operatorname{tr}(A^TB)$ is the inner product of the matrices $A$ and $B$: the sum of their elementwise products (sometimes loosely called a convolution sum).
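Both trace identities are easy to check numerically. The snippet below is a minimal sketch using NumPy; the matrix shapes, the random seed, and the name `C` (a second matrix with the same shape as `A`) are illustrative assumptions rather than anything fixed by the text.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 4))
B = rng.standard_normal((4, 3))
C = rng.standard_normal((3, 4))  # same shape as A, used for identity (2.1)

# tr(AB) = tr(BA): AB is 3x3 and BA is 4x4, yet the traces agree.
trace_ab = np.trace(A @ B)
trace_ba = np.trace(B @ A)

# (2.1): tr(A^T C) equals the sum of the elementwise products of A and C.
trace_atc = np.trace(A.T @ C)
elementwise_sum = np.sum(A * C)

print(np.isclose(trace_ab, trace_ba))
print(np.isclose(trace_atc, elementwise_sum))
```

In practice `np.sum(A * C)` is the cheaper way to evaluate the inner product, since it avoids forming the full matrix product.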
For a scalar function $f$ of a vector $x \in \mathbb{R}^n$, the total differential is
$$df = \sum_{i=1}^{n}\frac{\partial f}{\partial x_{i}}dx_{i} = \left(\frac{\partial f}{\partial x}\right)^T dx \tag{2.2}$$
Generalizing the vector in (2.2) to an $m \times n$ matrix $X$, (2.1) and (2.2) give:
$$df = \sum_{i=1}^{m}\sum_{j=1}^{n}\frac{\partial f}{\partial X_{ij}}dX_{ij} = \operatorname{tr}\left(\left(\frac{\partial f}{\partial X}\right)^T dX\right)$$
This differential method is the basic idea behind differentiating with respect to a matrix: write $df$ in the form $\operatorname{tr}(G^T dX)$, then read off $\frac{\partial f}{\partial X} = G$.
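To see the trace form of the differential at work, the sketch below compares the actual change in a scalar function with the trace of the gradient transposed times a small perturbation. The test function $f(X) = \operatorname{tr}(X^TX)$, whose gradient is $2X$, is an assumption chosen only for illustration; it does not appear in the original text.

```python
import numpy as np

def f(X):
    # Illustrative scalar function (an assumption): f(X) = tr(X^T X) = sum of squares.
    return np.trace(X.T @ X)

rng = np.random.default_rng(1)
X = rng.standard_normal((3, 2))
dX = 1e-6 * rng.standard_normal((3, 2))  # small perturbation of X

G = 2 * X  # gradient of tr(X^T X) with respect to X

df_actual = f(X + dX) - f(X)    # true change in f
df_trace = np.trace(G.T @ dX)   # first-order prediction tr(G^T dX)

# The two differ only by the second-order term tr(dX^T dX).
print(abs(df_actual - df_trace))
```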
Example 1:
Let $y = a^TXb$, where $y$ is a scalar, $a$ is an $m$-dimensional vector, $b$ is an $n$-dimensional vector, and $X$ is an $m \times n$ matrix. Find $\frac{\partial y}{\partial X}$.
Solution 1:
In the denominator layout, $\frac{\partial y}{\partial X_{ij}} = a_i b_j$, so
$$\frac{\partial y}{\partial X} = ab^T$$
Solution 2:
Since $a$ and $b$ are constant, $da = 0$ and $db = 0$, so
$$dy = da^T Xb + a^T dX\, b + a^T X\, db = a^T dX\, b$$
A scalar equals its own trace, and the trace is invariant under cyclic permutation, so
$$dy = \operatorname{tr}(a^T dX\, b) = \operatorname{tr}(b a^T dX) = \operatorname{tr}\left((ab^T)^T dX\right)$$
Comparing with $dy = \operatorname{tr}\left(\left(\frac{\partial y}{\partial X}\right)^T dX\right)$ gives
$$\frac{\partial y}{\partial X} = ab^T$$
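The result of Example 1 can be checked with central finite differences, perturbing one entry of $X$ at a time. This is a minimal sketch; the dimensions, seed, and step size `eps` are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
m, n = 3, 4
a = rng.standard_normal(m)
b = rng.standard_normal(n)
X = rng.standard_normal((m, n))

def y(X):
    # Scalar function y = a^T X b.
    return a @ X @ b

# Central finite differences, one entry of X at a time.
eps = 1e-6
grad_fd = np.zeros_like(X)
for i in range(m):
    for j in range(n):
        E = np.zeros_like(X)
        E[i, j] = eps
        grad_fd[i, j] = (y(X + E) - y(X - E)) / (2 * eps)

grad_formula = np.outer(a, b)  # a b^T, same shape as X

print(np.allclose(grad_fd, grad_formula, atol=1e-6))
```

Because $y$ is linear in $X$, the finite-difference estimate matches $ab^T$ up to floating-point rounding.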
Example 2:
Find $d(A^{-1})$.
Solution:
Differentiate $A^{-1}A = I$ and use $dI = 0$:
$$\begin{aligned}
A^{-1}A = I &\Rightarrow d(A^{-1}A) = d(A^{-1})\,A + A^{-1}\,dA = dI = 0 \\
&\Rightarrow d(A^{-1})\,A = -A^{-1}\,dA \\
&\Rightarrow d(A^{-1}) = -A^{-1}\,dA\,A^{-1}
\end{aligned}$$
Alternatively, start from $AA^{-1} = I$:
$$\begin{aligned}
AA^{-1} = I &\Rightarrow d(AA^{-1}) = dA\,A^{-1} + A\,d(A^{-1}) = dI = 0 \\
&\Rightarrow A\,d(A^{-1}) = -dA\,A^{-1} \\
&\Rightarrow d(A^{-1}) = -A^{-1}\,dA\,A^{-1}
\end{aligned}$$
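A quick numerical check of the inverse differential: perturb an invertible matrix slightly and compare the actual change in its inverse with $-A^{-1}\,dA\,A^{-1}$. The specific matrix below (diagonally dominant, so safely invertible) and the perturbation scale are assumptions made for the sketch.

```python
import numpy as np

# A fixed, diagonally dominant matrix (an illustrative choice) with det = 18.
A = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])

rng = np.random.default_rng(3)
dA = 1e-6 * rng.standard_normal((3, 3))  # small perturbation of A

A_inv = np.linalg.inv(A)
actual = np.linalg.inv(A + dA) - A_inv  # true change in the inverse
predicted = -A_inv @ dA @ A_inv         # first-order formula for d(A^{-1})

# The mismatch is second order in dA.
print(np.max(np.abs(actual - predicted)))
```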
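Example 3: Find $d(\det X)$.
Solution:
$$d|X| = |X|\operatorname{tr}(X^{-1}dX) = \operatorname{tr}(X^*dX)$$
Here $|X| = \det X$, the middle expression assumes $X$ is invertible, and $X^*$ is the adjugate of $X$, which satisfies $X^* = |X|X^{-1}$.
Derivation of the determinant derivative formula: https://www.cnblogs.com/analysis101/p/14677671.html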
https://jingyan.baidu.com/article/a501d80cb6ef00ac620f5e21.html
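The determinant differential $d|X| = |X|\operatorname{tr}(X^{-1}dX)$ can also be checked numerically. The sketch below assumes an invertible test matrix and builds the adjugate from $|X|X^{-1}$, which is valid only in the invertible case; the matrix and the perturbation scale are illustrative choices.

```python
import numpy as np

# Illustrative invertible matrix with det = 18.
X = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])

rng = np.random.default_rng(4)
dX = 1e-6 * rng.standard_normal((3, 3))  # small perturbation of X

det_X = np.linalg.det(X)
X_inv = np.linalg.inv(X)
adj_X = det_X * X_inv  # adjugate X^* = |X| X^{-1} (invertible X only)

actual = np.linalg.det(X + dX) - det_X      # true change in the determinant
via_inverse = det_X * np.trace(X_inv @ dX)  # |X| tr(X^{-1} dX)
via_adjugate = np.trace(adj_X @ dX)         # tr(X^* dX)

print(abs(actual - via_inverse), abs(via_inverse - via_adjugate))
```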
Example 4:
$$f(x) = \begin{vmatrix} x+a & a & a \\ a & x+a & a \\ a & a & x+a \end{vmatrix}, \quad \text{find } f'(x).$$
Solution:
The derivative of a determinant is the sum of the determinants obtained by differentiating one row at a time. Here each differentiated row is a unit row vector, and each of the three resulting determinants reduces to the same $2 \times 2$ minor:
$$f'(x) = 3\begin{vmatrix} x+a & a \\ a & x+a \end{vmatrix} = 3\left((x+a)^2 - a^2\right) = 3x^2+6ax$$
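The closed form in Example 4 can be compared against a central finite difference of the $3 \times 3$ determinant. The sample values of `x`, `a`, and the step `h` below are arbitrary assumptions.

```python
import numpy as np

def f(x, a):
    # 3x3 matrix with x + a on the diagonal and a everywhere else.
    M = np.full((3, 3), a, dtype=float) + x * np.eye(3)
    return np.linalg.det(M)

def f_prime_formula(x, a):
    # Closed form from Example 4: f'(x) = 3x^2 + 6ax.
    return 3 * x**2 + 6 * a * x

x, a, h = 1.5, 0.7, 1e-5
f_prime_fd = (f(x + h, a) - f(x - h, a)) / (2 * h)  # central difference

print(f_prime_fd, f_prime_formula(x, a))
```

As a cross-check, expanding the determinant gives $f(x) = x^3 + 3ax^2$, whose derivative is again $3x^2 + 6ax$.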
Example 5:
Verify that $d(AB) = dA\,B + A\,dB$.
Solution:
Let the entries of $A$ and $B$ be differentiable functions of a scalar variable $t$:
$$A = \begin{bmatrix} f_1 & f_2 \\ f_3 & f_4 \end{bmatrix}, \quad B = \begin{bmatrix} g_1 & g_2 \\ g_3 & g_4 \end{bmatrix}$$
Differentiating entrywise,
$$A' = \begin{bmatrix} f'_1 & f'_2 \\ f'_3 & f'_4 \end{bmatrix}, \quad B' = \begin{bmatrix} g'_1 & g'_2 \\ g'_3 & g'_4 \end{bmatrix}$$
Applying the scalar product rule to each entry of $AB$:
$$\begin{aligned}
(AB)' &= \begin{bmatrix} f_1g_1+f_2g_3 & f_1g_2+f_2g_4 \\ f_3g_1+f_4g_3 & f_3g_2+f_4g_4 \end{bmatrix}' \\
&= \begin{bmatrix} f'_1g_1+f'_2g_3 & f'_1g_2+f'_2g_4 \\ f'_3g_1+f'_4g_3 & f'_3g_2+f'_4g_4 \end{bmatrix} + \begin{bmatrix} f_1g'_1+f_2g'_3 & f_1g'_2+f_2g'_4 \\ f_3g'_1+f_4g'_3 & f_3g'_2+f_4g'_4 \end{bmatrix} \\
&= A'B + AB'
\end{aligned}$$
Multiplying both sides by $dt$ gives $d(AB) = dA\,B + A\,dB$.
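The product rule can be verified numerically for concrete matrix-valued functions of a scalar. The particular entries chosen for `A(t)` and `B(t)` below are hypothetical examples, as are the evaluation point `t` and the step `h`.

```python
import numpy as np

# Hypothetical 2x2 matrix functions of a scalar t, with hand-written derivatives.
def A(t):
    return np.array([[t, t**2], [np.sin(t), 1.0]])

def A_prime(t):
    return np.array([[1.0, 2 * t], [np.cos(t), 0.0]])

def B(t):
    return np.array([[np.cos(t), t], [1.0, t**3]])

def B_prime(t):
    return np.array([[-np.sin(t), 1.0], [0.0, 3 * t**2]])

t, h = 0.8, 1e-6

# Left side: central finite difference of the product A(t) B(t).
lhs = (A(t + h) @ B(t + h) - A(t - h) @ B(t - h)) / (2 * h)

# Right side: product rule A'B + AB'. The order of the factors matters.
rhs = A_prime(t) @ B(t) + A(t) @ B_prime(t)

print(np.allclose(lhs, rhs, atol=1e-6))
```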
Example 6:
Find the derivative of $\operatorname{tr}(AB)$ with respect to $A$.
Solution:
$$\operatorname{tr}(AB) = \sum_{i,j}A_{ij}B_{ji}$$
Treating $B$ as constant,
$$d\operatorname{tr}(AB) = \sum_{i,j}B_{ji}\,dA_{ij} = \operatorname{tr}(B\,dA) = \operatorname{tr}\left((B^T)^T dA\right)$$
That is, the derivative of $\operatorname{tr}(AB)$ with respect to $A$ is $B^T$.
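Since $\operatorname{tr}(AB)$ is linear in $A$, the change produced by any perturbation $dA$ should equal the inner product of $B^T$ with $dA$, up to rounding. The sketch below checks this; the shapes and seed are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(5)
A = rng.standard_normal((2, 3))
B = rng.standard_normal((3, 2))
dA = rng.standard_normal((2, 3))  # need not be small: tr(AB) is linear in A

grad = B.T  # claimed derivative of tr(AB) with respect to A

change = np.trace((A + dA) @ B) - np.trace(A @ B)  # actual change
predicted = np.sum(grad * dA)                      # tr(grad^T dA), using (2.1)

print(abs(change - predicted))
```

Note that `grad` has the same shape as `A`, as the denominator layout requires.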