Main formulas:
Trace:
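$$\begin{align}
\operatorname{tr} A &= \sum_{i=1}^{n} a_{ii} \tag{26}\\
\operatorname{tr} A &= \operatorname{tr} A^T \tag{27}\\
\operatorname{tr} aA &= a \operatorname{tr} A \tag{28}\\
\operatorname{tr} (A + B) &= \operatorname{tr} A + \operatorname{tr} B \tag{29}\\
\operatorname{tr} AB &= \operatorname{tr} BA \tag{30}\\
\operatorname{tr} ABC &= \operatorname{tr} BCA = \operatorname{tr} CAB \tag{31}\\
\operatorname{tr} ABCD &= \operatorname{tr} DABC = \operatorname{tr} CDAB = \operatorname{tr} BCDA \tag{32}
\end{align}$$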
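The cyclic identities (30) and (31) are easy to sanity-check numerically. The following is a minimal sketch in plain Python; the helper names `matmul`, `trace` and `random_matrix`, the matrix size and the random seed are all illustrative assumptions, not part of the original notes.

```python
import random

def matmul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def trace(X):
    """Sum of the diagonal entries of a square matrix."""
    return sum(X[i][i] for i in range(len(X)))

def random_matrix(n, rng):
    """n x n matrix with entries drawn uniformly from [-1, 1]."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(n)] for _ in range(n)]

rng = random.Random(0)  # arbitrary fixed seed so the run is repeatable
A, B, C = (random_matrix(3, rng) for _ in range(3))

# Identity (30): tr AB = tr BA
tr_ab = trace(matmul(A, B))
tr_ba = trace(matmul(B, A))

# Identity (31): tr ABC = tr BCA = tr CAB
tr_abc = trace(matmul(matmul(A, B), C))
tr_bca = trace(matmul(matmul(B, C), A))
tr_cab = trace(matmul(matmul(C, A), B))

# Each difference should vanish up to floating-point rounding.
print(abs(tr_ab - tr_ba), abs(tr_abc - tr_bca), abs(tr_abc - tr_cab))
```

Only cyclic shifts are covered by (31) and (32); an arbitrary reordering such as $\operatorname{tr} ACB$ is in general a different number.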
Derivative:
$$\begin{align}
\frac{dX^T}{dX} &= \frac{dX}{dX^T} = E \tag{33}\\
\frac{d(AX)^T}{dX} &= A^T \tag{34}\\
\frac{d(AX)}{dX^T} &= A \tag{35}\\
\frac{d(X^TX)}{dX} &= 2X \tag{36}\\
\frac{d(X^TAX)}{dX} &= (A + A^T)X \tag{37}
\end{align}$$
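Formula (37) can be checked against a finite-difference gradient. This is a rough sketch that assumes $X$ is a column vector (stored here as a plain list `x`) and $A$ is square; the function names, step size `h` and seed are hypothetical choices made for illustration.

```python
import random

def quad_form(A, x):
    """Scalar x^T A x for a square matrix A (list of rows) and vector x."""
    n = len(x)
    return sum(x[i] * A[i][j] * x[j] for i in range(n) for j in range(n))

def numeric_gradient(f, x, h=1e-6):
    """Central-difference approximation of the gradient of f at x."""
    grad = []
    for k in range(len(x)):
        x_plus = list(x)
        x_plus[k] += h
        x_minus = list(x)
        x_minus[k] -= h
        grad.append((f(x_plus) - f(x_minus)) / (2 * h))
    return grad

rng = random.Random(1)  # arbitrary fixed seed
n = 4
A = [[rng.uniform(-1, 1) for _ in range(n)] for _ in range(n)]
x = [rng.uniform(-1, 1) for _ in range(n)]

# Closed form from (37): (A + A^T) x
closed_form = [sum((A[i][j] + A[j][i]) * x[j] for j in range(n))
               for i in range(n)]
numeric = numeric_gradient(lambda v: quad_form(A, v), x)

max_error = max(abs(c - m) for c, m in zip(closed_form, numeric))
print(max_error)  # expected to be tiny; only rounding error remains
```

Setting $A = E$ in the same check reproduces (36), since $(E + E^T)X = 2X$.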
Gradient:
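Suppose $f : \mathbb{R}^{m \times n} \to \mathbb{R}$ is a function that maps an $m \times n$ matrix to a real number. We define the derivative (gradient) of $f$ with respect to the matrix $A$ as: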
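$$\nabla_A f(A) = \begin{bmatrix}
\dfrac{\partial f}{\partial a_{11}} & \cdots & \dfrac{\partial f}{\partial a_{1n}} \\
\vdots & \ddots & \vdots \\
\dfrac{\partial f}{\partial a_{m1}} & \cdots & \dfrac{\partial f}{\partial a_{mn}}
\end{bmatrix}$$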
$$\begin{align}
\nabla_A \operatorname{tr} AB &= B^T \tag{38}\\
\nabla_A |A| &= |A| (A^{-1})^T \tag{39}\\
\nabla_{A^T} f(A) &= (\nabla_A f(A))^T \tag{40}\\
\nabla_A \operatorname{tr} ABA^TC &= CAB + C^TAB^T \tag{41}
\end{align}$$
Proofs of the formulas
(1) $\operatorname{tr} AB = \operatorname{tr} BA$
Suppose
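$$A = \begin{bmatrix} a_{11} & \dots & a_{1n} \\ \vdots & \ddots & \vdots \\ a_{n1} & \dots & a_{nn} \end{bmatrix}, \qquad
B = \begin{bmatrix} b_{11} & \dots & b_{1n} \\ \vdots & \ddots & \vdots \\ b_{n1} & \dots & b_{nn} \end{bmatrix}$$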
Then
$$\begin{align}
\operatorname{tr} AB &= \sum_{i=1}^{n} \sum_{j=1}^{n} a_{ij} b_{ji} \tag{96}\\
\operatorname{tr} BA &= \sum_{j=1}^{n} \sum_{i=1}^{n} a_{ij} b_{ji} \tag{97}
\end{align}$$
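The two double sums contain exactly the same terms in a different order, so $\operatorname{tr} AB = \operatorname{tr} BA$. The same argument gives the cyclic identities (31) and (32) for products of three and four matrices.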
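Instead of repeating the index argument, the longer identities can also be obtained by applying (30) repeatedly. A sketch, treating a product such as $AB$ or $ABC$ as a single matrix:

$$\begin{align}
\operatorname{tr} ABC &= \operatorname{tr} (AB)C = \operatorname{tr} C(AB) = \operatorname{tr} CAB \\
\operatorname{tr} CAB &= \operatorname{tr} (CA)B = \operatorname{tr} B(CA) = \operatorname{tr} BCA \\
\operatorname{tr} ABCD &= \operatorname{tr} (ABC)D = \operatorname{tr} D(ABC) = \operatorname{tr} DABC
\end{align}$$

Applying the last step again to $DABC$ and then to $CDAB$ yields the remaining two equalities in (32).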
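(2) $\nabla_A \operatorname{tr} AB = B^T$
Since $\operatorname{tr} AB = \sum_{i=1}^{n} \sum_{j=1}^{n} a_{ij} b_{ji}$, for every entry $a_{ij}$ of $A$ we have:
$$\frac{\partial \operatorname{tr} AB}{\partial a_{ij}} = b_{ji} \tag{98}$$
Here $B$ is the matrix defined in part (1), so $b_{ji}$ is the entry at position $(i, j)$ of $B^T$. Therefore
$$\nabla_A \operatorname{tr} AB = B^T$$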
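As a small worked example of the step above, take the $2 \times 2$ case (an illustration only, with the entries written out by hand):

$$\begin{align}
\operatorname{tr} AB &= a_{11} b_{11} + a_{12} b_{21} + a_{21} b_{12} + a_{22} b_{22} \\
\nabla_A \operatorname{tr} AB &= \begin{bmatrix} \dfrac{\partial \operatorname{tr} AB}{\partial a_{11}} & \dfrac{\partial \operatorname{tr} AB}{\partial a_{12}} \\ \dfrac{\partial \operatorname{tr} AB}{\partial a_{21}} & \dfrac{\partial \operatorname{tr} AB}{\partial a_{22}} \end{bmatrix}
= \begin{bmatrix} b_{11} & b_{21} \\ b_{12} & b_{22} \end{bmatrix} = B^T
\end{align}$$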
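(3) $\nabla_A |A| = |A| (A^{-1})^T$
By the cofactor expansion of the determinant along row $i$, $|A| = \sum_j a_{ij} A_{ij}$, where $A_{ij}$ is the cofactor of the entry at position $(i, j)$ of $A$. None of the cofactors $A_{i1}, \dots, A_{in}$ depends on $a_{ij}$, so:
$$\begin{align}
\frac{\partial |A|}{\partial a_{ij}} &= A_{ij} \tag{99}\\
\implies \nabla_A |A| &= (A_{ij}) = (A^*)^T = (|A| A^{-1})^T = |A| (A^{-1})^T \tag{100}
\end{align}$$
Here $(A_{ij})$ denotes the matrix of cofactors and $A^*$ is the adjugate matrix, which satisfies $A^* = |A| A^{-1}$ when $A$ is invertible.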
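Step (99) can be verified numerically for a small case. The sketch below assumes a $3 \times 3$ matrix and compares each cofactor with a central-difference estimate of the corresponding partial derivative; `det3`, `cofactor3` and `numeric_partial` are hypothetical helper names introduced only for this illustration.

```python
import random

def det3(M):
    """Determinant of a 3x3 matrix by expansion along the first row."""
    (a, b, c), (d, e, f), (g, h, i) = M
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

def cofactor3(M, r, s):
    """Cofactor A_rs: signed determinant of M with row r and column s removed."""
    rows = [k for k in range(3) if k != r]
    cols = [k for k in range(3) if k != s]
    minor = (M[rows[0]][cols[0]] * M[rows[1]][cols[1]]
             - M[rows[0]][cols[1]] * M[rows[1]][cols[0]])
    return (-1) ** (r + s) * minor

def numeric_partial(M, r, s, h=1e-6):
    """Central-difference estimate of the partial derivative of |M| w.r.t. m_rs."""
    plus = [row[:] for row in M]
    plus[r][s] += h
    minus = [row[:] for row in M]
    minus[r][s] -= h
    return (det3(plus) - det3(minus)) / (2 * h)

rng = random.Random(2)  # arbitrary fixed seed
A = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(3)]

# Largest gap between the cofactor and the numerical derivative over all entries.
max_error = max(abs(cofactor3(A, r, s) - numeric_partial(A, r, s))
                for r in range(3) for s in range(3))
print(max_error)  # expected to be tiny; only rounding error remains
```

Because the determinant is linear in each single entry, the central difference is exact up to rounding, which makes this a fairly sharp check.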
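(4) $\nabla_{A^T} f(A) = (\nabla_A f(A))^T$
On the left-hand side, the entry at position $(i, j)$ is the derivative with respect to the $(i, j)$ entry of $A^T$, which is $a_{ji}$: $\dfrac{\partial f(A)}{\partial a_{ji}}$
On the right-hand side, before transposing, the entry of $\nabla_A f(A)$ at position $(i, j)$ is: $\dfrac{\partial f(A)}{\partial a_{ij}}$
The two matrices are therefore exactly transposes of each other.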
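(5) $\nabla_A \operatorname{tr} ABA^TC = CAB + C^TAB^T$
The matrix $A$ appears twice, so the product rule gives two terms; in each term the factor in parentheses is held fixed. The second term uses $\operatorname{tr} ABA^TC = \operatorname{tr} CABA^T$ from (32), and the remaining steps use (27) and (38).
$$\begin{align}
\nabla_A \operatorname{tr} ABA^TC &= \nabla_A \operatorname{tr} A(BA^TC) + \nabla_A \operatorname{tr} (CAB)A^T \tag{101}\\
&= (BA^TC)^T + \nabla_A \operatorname{tr} [(CAB)A^T]^T \tag{102}\\
&= C^TAB^T + \nabla_A \operatorname{tr} A(CAB)^T \tag{103}\\
&= C^TAB^T + CAB \tag{104}
\end{align}$$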
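The result of (5) can be cross-checked with finite differences. The following sketch assumes square matrices of equal size stored as lists of rows; the helper names, the step size `h` and the seed are illustrative assumptions rather than anything from the original derivation.

```python
import random

def matmul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def transpose(X):
    return [list(col) for col in zip(*X)]

def trace(X):
    return sum(X[i][i] for i in range(len(X)))

def f(A, B, C):
    """The scalar function tr(A B A^T C)."""
    return trace(matmul(matmul(matmul(A, B), transpose(A)), C))

def numeric_gradient(A, B, C, h=1e-6):
    """Central-difference estimate of the gradient of f with respect to A."""
    n = len(A)
    G = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            plus = [row[:] for row in A]
            plus[i][j] += h
            minus = [row[:] for row in A]
            minus[i][j] -= h
            G[i][j] = (f(plus, B, C) - f(minus, B, C)) / (2 * h)
    return G

rng = random.Random(3)  # arbitrary fixed seed
n = 3
A, B, C = ([[rng.uniform(-1, 1) for _ in range(n)] for _ in range(n)]
           for _ in range(3))

# Closed form from (41): C A B + C^T A B^T
term1 = matmul(matmul(C, A), B)
term2 = matmul(matmul(transpose(C), A), transpose(B))
closed_form = [[term1[i][j] + term2[i][j] for j in range(n)] for i in range(n)]

numeric = numeric_gradient(A, B, C)
max_error = max(abs(closed_form[i][j] - numeric[i][j])
                for i in range(n) for j in range(n))
print(max_error)  # expected to be tiny; only rounding error remains
```

Since $f$ is quadratic in the entries of $A$, the central difference carries no truncation error, so any sizeable mismatch would point to a sign or transpose mistake in the closed form.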