Tensor Differentiation

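For many of the tensor-derivative and operator problems I ran into before, I either kept trying things until the shapes matched or expanded everything into components. Over the past few days I came across the Kronecker delta ($\delta$) and Levi-Civita ($\epsilon$) symbols, which finally give a unified framework for tensor derivative calculations. These are my reading notes.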

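Conventions

Unless stated otherwise, $a, b, c$ denote scalars, $x, y, z$ denote vectors, and capital letters denote matrices. Subscripted symbols such as $a_i$ denote tensors, whose order is determined by their subscripts. Anything involved in a cross product is a three-dimensional vector or operator. As shorthand, $\partial_i=\frac{\partial}{\partial x_i}$.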

Layout

There are two layouts: numerator layout and denominator layout.
For vectors or scalars (a scalar is treated as a one-dimensional vector) $x, y$, consider $\frac{\partial y}{\partial x}$.
If the numerator is expanded first, this is numerator layout: $\frac{\partial y}{\partial x}=[\frac{\partial y_1}{\partial x}, \frac{\partial y_2}{\partial x}, ..., \frac{\partial y_n}{\partial x}]^\text{T}$, and $dy=(\frac{\partial y}{\partial x})dx$.
If the denominator is expanded first, this is denominator layout: $\frac{\partial y}{\partial x}=[\frac{\partial y}{\partial x_1}, \frac{\partial y}{\partial x_2}, ..., \frac{\partial y}{\partial x_n}]^\text{T}$, and $dy=(\frac{\partial y}{\partial x})^\text{T}dx$.

All derivations below use numerator layout.

Kronecker delta and Levi-Civita symbols, tensor multiplication

The discrete Kronecker delta symbol $\delta_{ij}$ is defined as the indicator $I(i=j)$, that is, $1$ when $i=j$ and $0$ otherwise.

The Levi-Civita symbol $\epsilon$ is usually defined through the number of inversions of its indices and is antisymmetric. For simplicity, we define $\epsilon_{i_1i_2..i_n}$ directly as the coefficient of the term $a_{1i_1}a_{2i_2}...a_{ni_n}$ in the determinant $|A_{ij}|$ (it can be $0$, $-1$ or $1$). For example, $\epsilon_{21}=-1,\epsilon_{12}=1,\epsilon_{11}=\epsilon_{22}=0$.
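To make the inversion-based definition concrete, here is a minimal sketch in plain Python. The function name `levi_civita` and the 1-based index convention are my own choices for illustration, not something taken from a library.

```python
from itertools import combinations


def levi_civita(*idx):
    """Levi-Civita symbol for 1-based indices, computed by counting inversions."""
    n = len(idx)
    # A repeated or out-of-range index means idx is not a permutation of 1..n.
    if sorted(idx) != list(range(1, n + 1)):
        return 0
    # An inversion is a pair of positions whose entries are out of order.
    inversions = sum(1 for p, q in combinations(range(n), 2) if idx[p] > idx[q])
    return -1 if inversions % 2 else 1


print(levi_civita(1, 2), levi_civita(2, 1), levi_civita(1, 1))  # 1 -1 0
print(levi_civita(1, 2, 3), levi_civita(1, 3, 2), levi_civita(2, 3, 1))  # 1 -1 1
```

The first printed line reproduces the two-index examples given above.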

We adopt the convention that $(a_{ij}b_{jk})_{ik}$ denotes the tensor product corresponding to matrix multiplication, i.e. $(a_{ij}b_{jk})_{ik}=\sum \limits_{j}a_{ij}b_{jk}$: the index $j$, which does not appear in the outer subscript, is summed over. Similarly, $a_ib_i$ is the vector dot product, $\delta_{ij}a_{ij}$ is the trace of a matrix, and $(a_{ij}b_{ij})_{ij}$ is the element-wise matrix product.
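The following sketch spells out these contractions as explicit loops in plain Python, so that it is visible which indices are summed and which stay free. The matrices and vectors are arbitrary values chosen for illustration.

```python
n = 2
a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]
u = [1, 2]
v = [3, 4]


def delta(i, j):
    """Kronecker delta."""
    return 1 if i == j else 0


# (a_ij b_jk)_ik: j is summed, i and k stay free -> matrix product
matmul = [[sum(a[i][j] * b[j][k] for j in range(n)) for k in range(n)]
          for i in range(n)]

# a_i b_i: the only index is summed -> dot product
dot = sum(u[i] * v[i] for i in range(n))

# delta_ij a_ij: both indices are summed -> trace
trace = sum(delta(i, j) * a[i][j] for i in range(n) for j in range(n))

# (a_ij b_ij)_ij: nothing is summed -> element-wise product
hadamard = [[a[i][j] * b[i][j] for j in range(n)] for i in range(n)]

print(matmul)    # [[19, 22], [43, 50]]
print(dot)       # 11
print(trace)     # 5
print(hadamard)  # [[5, 12], [21, 32]]
```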

By this definition, $|A_{ij}|=\epsilon_{i_1i_2..i_n}a_{1i_1}a_{2i_2}...a_{ni_n}$
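As a sanity check on the determinant formula, here is a small sketch that sums over permutations only, since every other index combination has a zero Levi-Civita coefficient. The helper names `sign` and `det` are mine, and the approach costs $n!$ terms, so it is meant for illustration rather than real use.

```python
from itertools import permutations


def sign(perm):
    """Parity of a permutation: +1 for an even number of inversions, -1 for odd."""
    n = len(perm)
    inversions = sum(1 for p in range(n) for q in range(p + 1, n)
                     if perm[p] > perm[q])
    return -1 if inversions % 2 else 1


def det(a):
    """Determinant as the sum of epsilon_{i1..in} * a[0][i1] * ... * a[n-1][in]."""
    total = 0
    for perm in permutations(range(len(a))):
        term = sign(perm)
        for row, col in enumerate(perm):
            term *= a[row][col]
        total += term
    return total


print(det([[1, 2], [3, 4]]))                   # -2
print(det([[2, 0, 1], [1, 3, 2], [1, 1, 4]]))  # 18
```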

Using tensors, some operators can be written as $(\nabla a)_i=\partial_i a$ and $\nabla \cdot x=\delta_{ij}\partial_i x_j=\partial_i x_i$. The curl $\nabla \times x$ can be written as a determinant whose three rows have $e_i$, $\partial_i$ and $x_i$ as their $i$-th entries respectively, i.e. $(\nabla \times x)_i=\epsilon_{ijk}\partial_j x_k$
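Expanding the curl formula component by component shows that it reproduces the familiar expression. For each $i$, only the two terms with a non-zero $\epsilon_{ijk}$ survive:

```latex
\begin{aligned}
(\nabla\times x)_1 &= \epsilon_{123}\,\partial_2 x_3 + \epsilon_{132}\,\partial_3 x_2 = \partial_2 x_3 - \partial_3 x_2 \\
(\nabla\times x)_2 &= \epsilon_{231}\,\partial_3 x_1 + \epsilon_{213}\,\partial_1 x_3 = \partial_3 x_1 - \partial_1 x_3 \\
(\nabla\times x)_3 &= \epsilon_{312}\,\partial_1 x_2 + \epsilon_{321}\,\partial_2 x_1 = \partial_1 x_2 - \partial_2 x_1
\end{aligned}
```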

Vector derivatives

Consider $\frac{\partial (Wx)}{\partial x}$. Writing the summed index as $k$ so that it does not clash with the free index $j$:
$(\frac{\partial (Wx)}{\partial x})_{ij} = \partial_j (W_{ik}x_k)=W_{ik}\delta_{kj}=W_{ij}$
$\frac{\partial (Wx)}{\partial x}=W$
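A quick numerical check of this result, as a sketch: the helper `jacobian` below is a hypothetical function that builds the numerator-layout Jacobian by central differences. Because $Wx$ is linear in $x$ and the arithmetic uses `Fraction`, the differences are exact and can be compared with `==`.

```python
from fractions import Fraction


def matvec(W, x):
    """(W x)_i = W_ik x_k"""
    return [sum(W[i][k] * x[k] for k in range(len(x))) for i in range(len(W))]


def jacobian(f, x, h=Fraction(1, 2)):
    """Numerator-layout Jacobian J[i][j] = d f_i / d x_j by central differences."""
    columns = []
    for j in range(len(x)):
        x_plus = list(x)
        x_plus[j] += h
        x_minus = list(x)
        x_minus[j] -= h
        f_plus, f_minus = f(x_plus), f(x_minus)
        columns.append([(p - m) / (2 * h) for p, m in zip(f_plus, f_minus)])
    # columns[j][i] holds d f_i / d x_j, so transpose into J[i][j].
    return [[columns[j][i] for j in range(len(x))]
            for i in range(len(columns[0]))]


W = [[1, 2, 3], [4, 5, 6]]
x = [Fraction(1), Fraction(-2), Fraction(3)]
J = jacobian(lambda v: matvec(W, v), x)
print(J == W)  # True
```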

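Now consider the chain rule. Let $l$ be a scalar loss function and $y=Wx$.
$dl=\frac{\partial l}{\partial x}dx=\frac{\partial l}{\partial y}dy$
Substituting $dy=W\,dx$ and comparing the coefficients of $dx$:
$\frac{\partial l}{\partial x}=\frac{\partial l}{\partial y}\frac{\partial y}{\partial x}=\frac{\partial l}{\partial y}W$
Note that the chain rule in numerator layout differs from the one in denominator layout, where the factors appear transposed and in reverse order, $W^\text{T}\frac{\partial l}{\partial y}$. Numerator layout is used here.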
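The chain rule can be checked on a concrete case. The sketch below assumes a hypothetical loss $l=\frac{1}{2}y_iy_i$ (my choice, not part of the notes), for which $\frac{\partial l}{\partial y}$ is the row vector $y^\text{T}$. The loss is quadratic in $x$, so central differences in exact `Fraction` arithmetic reproduce the derivative exactly.

```python
from fractions import Fraction

W = [[1, 2, 3], [4, 5, 6]]


def y_of(x):
    """y = W x"""
    return [sum(W[i][k] * x[k] for k in range(3)) for i in range(2)]


def loss(x):
    """Hypothetical scalar loss l = (1/2) * y . y, chosen for illustration."""
    return sum(y_i * y_i for y_i in y_of(x)) / 2


x = [Fraction(1), Fraction(-2), Fraction(3)]
dl_dy = y_of(x)  # row vector: dl/dy_i = y_i for this particular loss

# Numerator-layout chain rule: (dl/dx)_j = (dl/dy)_i W_ij
dl_dx = [sum(dl_dy[i] * W[i][j] for i in range(2)) for j in range(3)]

# Independent check by central differences.
h = Fraction(1, 2)
numeric = []
for j in range(3):
    x_plus = list(x)
    x_plus[j] += h
    x_minus = list(x)
    x_minus[j] -= h
    numeric.append((loss(x_plus) - loss(x_minus)) / (2 * h))

print(dl_dx == [54, 72, 90])  # True
print(dl_dx == numeric)       # True
```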

Matrix derivatives

Consider $\frac{\partial\,\text{tr}(AB)}{\partial A}$, where $\partial_{ik}$ is short for $\frac{\partial}{\partial A_{ik}}$.
$\text{tr}(AB)=\delta_{ij}(AB)_{ij}=\delta_{ij}A_{ik}B_{kj}=A_{ik}B_{ki}$
Renaming the summed indices to $p, q$ so that they do not clash with the free indices $i, k$:
$(\frac{\partial\,\text{tr}(AB)}{\partial A})_{ik}=\partial_{ik}(A_{pq}B_{qp})=\delta_{pi}\delta_{qk}B_{qp}=B_{ki}$
The result is indexed in the same way as $A$, so it has the same shape as $A$.

Using the properties of the matrix trace:
$\frac{\partial\,\text{tr}(AB)}{\partial A}=B^\text{T}$
$\frac{\partial\,\text{tr}(AB)}{\partial B}=\frac{\partial\,\text{tr}(BA)}{\partial B}=A^\text{T}$
$\frac{\partial\,\text{tr}(A^\text{T}B)}{\partial A}=\frac{\partial\,\text{tr}(B^\text{T}A)}{\partial A}=B$
$\frac{\partial\,\text{tr}(AXB)}{\partial X}=\frac{\partial\,\text{tr}(BAX)}{\partial X}=A^\text{T}B^\text{T}$
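Two of these trace identities can be verified entry by entry. In the sketch below, `grad` is a hypothetical helper that bumps one entry of the matrix by 1 and records the change in the function value; since the trace expressions are linear in the matrix being differentiated, that change is exactly the partial derivative. The matrices are arbitrary integer examples.

```python
def matmul(P, Q):
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q)))
             for j in range(len(Q[0]))] for i in range(len(P))]


def trace(M):
    return sum(M[i][i] for i in range(len(M)))


def transpose(M):
    return [list(row) for row in zip(*M)]


def grad(f, X):
    """G[i][k] = d f / d X_ik, same shape as X; exact when f is linear in X."""
    G = []
    for i in range(len(X)):
        row = []
        for k in range(len(X[0])):
            X_bumped = [list(r) for r in X]
            X_bumped[i][k] += 1
            row.append(f(X_bumped) - f(X))
        G.append(row)
    return G


A = [[1, 2, 3], [4, 5, 6]]             # 2 x 3
B = [[1, 0], [2, -1], [0, 3]]          # 3 x 2
X = [[2, 1, 0], [0, 1, 1], [1, 0, 2]]  # 3 x 3

# d tr(AB) / dA = B^T
check_ab = grad(lambda M: trace(matmul(M, B)), A) == transpose(B)
# d tr(AXB) / dX = A^T B^T
check_axb = (grad(lambda M: trace(matmul(matmul(A, M), B)), X)
             == matmul(transpose(A), transpose(B)))
print(check_ab, check_axb)  # True True
```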

Using the chain rule for partial derivatives, treat the two occurrences of $X$ as independent variables $X_1$ and $X_2$, differentiate with respect to each, and then set $X_1=X_2=X$:
$\frac{\partial\,\text{tr}(X^\text{T}X)}{\partial X}=\frac{\partial\,\text{tr}(X_1^\text{T}X_2)}{\partial X_1}+\frac{\partial\,\text{tr}(X_1^\text{T}X_2)}{\partial X_2}=X_2+X_1=2X$
$\frac{\partial\,\text{tr}(X^\text{T}AX)}{\partial X}=\frac{\partial\,\text{tr}(X_1^\text{T}AX_2)}{\partial X_1}+\frac{\partial\,\text{tr}(X_1^\text{T}AX_2)}{\partial X_2}=AX_2+A^\text{T}X_1=(A+A^\text{T})X$
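The quadratic case can be checked the same way. In this sketch, `numeric_grad` is a hypothetical helper using central differences with step 1: because $\text{tr}(X^\text{T}AX)$ is quadratic in each entry of $X$, the difference $f(X+E)-f(X-E)$ is exactly twice the partial derivative, so integer division by 2 loses nothing. $A$ and $X$ are arbitrary integer examples.

```python
def matmul(P, Q):
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q)))
             for j in range(len(Q[0]))] for i in range(len(P))]


def trace(M):
    return sum(M[i][i] for i in range(len(M)))


def transpose(M):
    return [list(row) for row in zip(*M)]


def f(X, A):
    """f(X) = tr(X^T A X)"""
    return trace(matmul(matmul(transpose(X), A), X))


def numeric_grad(X, A):
    """G[i][k] = d f / d X_ik by central differences with step 1."""
    G = []
    for i in range(len(X)):
        row = []
        for k in range(len(X[0])):
            X_plus = [list(r) for r in X]
            X_plus[i][k] += 1
            X_minus = [list(r) for r in X]
            X_minus[i][k] -= 1
            row.append((f(X_plus, A) - f(X_minus, A)) // 2)
        G.append(row)
    return G


A = [[1, 2], [3, 4]]
X = [[1, 0, 2], [-1, 3, 1]]

A_plus_At = [[A[i][j] + A[j][i] for j in range(2)] for i in range(2)]
closed_form = matmul(A_plus_At, X)  # (A + A^T) X

print(closed_form)                        # [[-3, 15, 9], [-3, 24, 18]]
print(numeric_grad(X, A) == closed_form)  # True
```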

Operator identities

Consider $A \times (B \times C)$, where $A$, $B$, $C$ may be vectors or operators.
$(A \times (B \times C))_i=\epsilon_{ijk}A_j\epsilon_{klm}B_lC_m=\epsilon_{ijk}\epsilon_{klm}A_jB_lC_m$
Note that $\epsilon_{ijk}\epsilon_{klm}=\epsilon_{ijk}\epsilon_{lmk}=I(i=l, j=m)-I(i=m, j=l)=\delta_{il}\delta_{jm}-\delta_{im}\delta_{jl}$
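Since every index only takes three values, the contraction identity for $\epsilon_{ijk}\epsilon_{klm}$ can be checked exhaustively over all $3^4$ combinations of the free indices. The sketch below uses 0-based indices and a closed-form expression for the three-index symbol; both are my own conventions for illustration.

```python
from itertools import product


def eps(i, j, k):
    """Three-index Levi-Civita symbol for indices in {0, 1, 2}."""
    # The product is 0 when two indices coincide and +2 or -2 otherwise.
    return (i - j) * (j - k) * (k - i) // 2


def delta(i, j):
    """Kronecker delta."""
    return 1 if i == j else 0


identity_holds = all(
    sum(eps(i, j, k) * eps(k, l, m) for k in range(3))
    == delta(i, l) * delta(j, m) - delta(i, m) * delta(j, l)
    for i, j, l, m in product(range(3), repeat=4)
)
print(identity_holds)  # True
```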
$\epsilon_{ijk}\epsilon_{klm}A_jB_lC_m=\delta_{il}\delta_{jm}A_jB_lC_m-\delta_{im}\delta_{jl}A_jB_lC_m=A_jB_iC_j-A_jB_jC_i$
$A \times (B \times C)=B(A\cdot C)-(A\cdot B)C$ (provided $A_i$ and $B_j$ commute)
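For ordinary vectors the components always commute, so the identity can be checked directly. This sketch builds the cross product from $\epsilon_{ijk}$ (0-based indices, same closed form as an illustration device) and compares both sides for three arbitrary integer vectors.

```python
def eps(i, j, k):
    """Three-index Levi-Civita symbol for indices in {0, 1, 2}."""
    return (i - j) * (j - k) * (k - i) // 2


def cross(u, v):
    """(u x v)_i = eps_ijk u_j v_k"""
    return [sum(eps(i, j, k) * u[j] * v[k] for j in range(3) for k in range(3))
            for i in range(3)]


def dot(u, v):
    """u . v = u_i v_i"""
    return sum(u[i] * v[i] for i in range(3))


A = [1, 2, 3]
B = [-1, 0, 4]
C = [2, 5, -2]

lhs = cross(A, cross(B, C))
rhs = [B[i] * dot(A, C) - dot(A, B) * C[i] for i in range(3)]
print(lhs == rhs)  # True
```

When one of the factors is an operator such as $\nabla$, the components no longer commute with what they act on, so the ordering in each term has to be tracked by hand and this numeric check does not apply.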

To show that a curl field is source-free (divergence-free), i.e. $\nabla\cdot(\nabla\times C)=0$,
consider $A \cdot (B \times C)=A_i\epsilon_{ijk}B_jC_k=|[A, B, C]|$
When $A=B=\nabla$, two rows of the determinant are identical, so it equals $0$, which completes the proof.
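The same conclusion can be reached purely in index notation. The sketch below assumes that the second partial derivatives of $C$ commute (for example, $C$ is twice continuously differentiable), an assumption these notes do not state explicitly:

```latex
\begin{aligned}
\nabla\cdot(\nabla\times C)
  &= \epsilon_{ijk}\,\partial_i\partial_j C_k \\
  &= \epsilon_{jik}\,\partial_j\partial_i C_k
     && \text{(rename the summed indices } i \leftrightarrow j\text{)} \\
  &= -\epsilon_{ijk}\,\partial_i\partial_j C_k
     && \text{(}\epsilon_{jik}=-\epsilon_{ijk},\ \partial_j\partial_i=\partial_i\partial_j\text{)} \\
  &\Longrightarrow\ \epsilon_{ijk}\,\partial_i\partial_j C_k = 0 .
\end{aligned}
```

In words: contracting an antisymmetric symbol with a symmetric pair of indices always gives zero.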
