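Left for rows, right for columns: left-multiplying $A$ by an elementary matrix $P$ gives $PA$, which performs on $A$ the same elementary row operation that produced $P$ from the identity matrix. Right-multiplying $A$ by an elementary matrix $Q$ gives $AQ$, which performs on $A$ the same elementary column operation that produced $Q$ from the identity matrix.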
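A small NumPy sketch of the left-row / right-column rule. The test matrix and the three elementary operations are arbitrary choices for illustration: each elementary matrix is built by applying one operation to the identity, and multiplying on the left or on the right reproduces that operation on the rows or the columns of $A$.

```python
import numpy as np

A = np.arange(1.0, 10.0).reshape(3, 3)   # arbitrary 3x3 test matrix

# Elementary matrices built from the identity (illustrative choices):
P = np.eye(3)[[2, 1, 0]]   # rows 1 and 3 of I swapped
R = np.eye(3)
R[1, 0] = -4.0             # row 2 of I plus (-4) x row 1
Q = np.eye(3)
Q[:, 1] *= 5.0             # column 2 of I scaled by 5

# Left multiplication acts on rows.
swapped_rows = P @ A
combined_rows = R @ A
# Right multiplication acts on columns.
scaled_cols = A @ Q

print(np.allclose(swapped_rows, A[[2, 1, 0], :]))        # True
print(np.allclose(combined_rows[1], A[1] - 4.0 * A[0]))  # True
print(np.allclose(scaled_cols[:, 1], 5.0 * A[:, 1]))     # True
```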
Definition: let $A \in C_r^{m\times n}$ (an $m\times n$ complex matrix of rank $r$) and let $\lambda_i$ be the nonzero eigenvalues of $AA^H$ (equivalently of $A^HA$, which has the same nonzero eigenvalues). Then $\sigma_i=\sqrt{\lambda_i}$, $i=1,2,\cdots,r$, are called the singular values of $A$.
Theorem: let $A \in C_r^{m\times n}$. Then there exist unitary matrices $U \in \mathcal{U}^{m\times m}$ and $V \in \mathcal{U}^{n\times n}$ (where $\mathcal{U}^{k\times k}$ denotes the set of $k\times k$ unitary matrices) such that
$$A=U\begin{bmatrix}\Delta & 0\\ 0 & 0\end{bmatrix}V^H,$$
where $\Delta=\operatorname{diag}[\sigma_1,\sigma_2,\cdots,\sigma_r]$ and $\sigma_1\geq\sigma_2\geq\cdots\geq\sigma_r>0$ are the singular values of $A$.
Proof: $AA^H$ is Hermitian (hence normal) and positive semidefinite, so there exists $U\in\mathcal{U}^{m\times m}$ such that
$$U^HAA^HU=\operatorname{diag}[\sigma_1^2,\sigma_2^2,\cdots,\sigma_r^2,0,\cdots,0]=\begin{bmatrix}\Delta\Delta^H & 0\\ 0 & 0\end{bmatrix},$$
with $\sigma_1^2\geq\sigma_2^2\geq\cdots\geq\sigma_r^2>0$ and $\Delta=\operatorname{diag}[\sigma_1,\sigma_2,\cdots,\sigma_r]$. Partition $U=\begin{bmatrix}U_1 & U_2\end{bmatrix}$, where $U_1$ consists of the first $r$ columns. Then
$$\begin{aligned}U^HAA^HU&=\begin{bmatrix}U_1^H\\ U_2^H\end{bmatrix}AA^H\begin{bmatrix}U_1 & U_2\end{bmatrix}\\&=\begin{bmatrix}U_1^H\\ U_2^H\end{bmatrix}\begin{bmatrix}AA^HU_1 & AA^HU_2\end{bmatrix}\\&=\begin{bmatrix}U_1^HAA^HU_1 & U_1^HAA^HU_2\\ U_2^HAA^HU_1 & U_2^HAA^HU_2\end{bmatrix}\\&=\begin{bmatrix}\Delta\Delta^H & 0\\ 0 & 0\end{bmatrix},\end{aligned}$$
so that
$$U_1^HAA^HU_1=\Delta\Delta^H,\qquad U_1^HAA^HU_2=0,\qquad U_2^HAA^HU_1=0,\qquad U_2^HAA^HU_2=0.$$
Let $V_1=A^HU_1\Delta^{-H}$. Then
$$V_1^HV_1=\Delta^{-1}U_1^HAA^HU_1\Delta^{-H}=\Delta^{-1}\Delta\Delta^H\Delta^{-H}=E_r,$$
where $E_r$ is the $r\times r$ identity. Hence $V_1$ is sub-unitary (its $r$ columns are orthonormal), i.e. $V_1\in\mathcal{U}_r^{n\times r}$, so there exists $V_2\in\mathcal{U}_{n-r}^{n\times(n-r)}$ such that $V=\begin{bmatrix}V_1 & V_2\end{bmatrix}\in\mathcal{U}^{n\times n}$. Therefore
$$U^HAV=\begin{bmatrix}U_1^H\\ U_2^H\end{bmatrix}A\begin{bmatrix}V_1 & V_2\end{bmatrix}=\begin{bmatrix}U_1^HAV_1 & U_1^HAV_2\\ U_2^HAV_1 & U_2^HAV_2\end{bmatrix}.$$
From $U_1^HAA^HU_1=\Delta\Delta^H$ we get $U_1^HAV_1=U_1^HAA^HU_1\Delta^{-H}=\Delta\Delta^H\Delta^{-H}=\Delta$. From $U_2^HAA^HU_2=(A^HU_2)^H(A^HU_2)=0$ we get $A^HU_2=0$ (a matrix $B$ with $B^HB=0$ must be zero), i.e. $U_2^HA=0$, so $U_2^HAV_1=0$ and $U_2^HAV_2=0$. Moreover, $V_1=A^HU_1\Delta^{-H}\Rightarrow V_1\Delta^H=A^HU_1\Rightarrow U_1^HA=\Delta V_1^H$, so $U_1^HAV_2=\Delta V_1^HV_2=0$, because the columns of $V_2$ are orthogonal to those of $V_1$. Hence
$$U^HAV=\begin{bmatrix}\Delta & 0\\ 0 & 0\end{bmatrix},\qquad\text{i.e.}\qquad A=U\begin{bmatrix}\Delta & 0\\ 0 & 0\end{bmatrix}V^H.$$
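A quick numerical check of the definition and the theorem, written as a sketch with NumPy's `np.linalg.eigvalsh` and `np.linalg.svd`. The random complex rank-2 matrix and the seed are arbitrary assumptions: the square roots of the nonzero eigenvalues of $AA^H$ should match the singular values, and $U\begin{bmatrix}\Delta&0\\0&0\end{bmatrix}V^H$ should reproduce $A$.

```python
import numpy as np

rng = np.random.default_rng(0)
# Random complex 4 x 3 matrix of rank 2 (illustrative data).
A = (rng.standard_normal((4, 2)) + 1j * rng.standard_normal((4, 2))) @ (
    rng.standard_normal((2, 3)) + 1j * rng.standard_normal((2, 3)))
m, n = A.shape
r = np.linalg.matrix_rank(A)

# Definition: sigma_i = sqrt(lambda_i), lambda_i the nonzero eigenvalues of A A^H.
lam = np.sort(np.linalg.eigvalsh(A @ A.conj().T))[::-1]
sigma_def = np.sqrt(lam[:r])

# Theorem: A = U [[Delta, 0], [0, 0]] V^H; NumPy returns Vh = V^H directly.
U, s, Vh = np.linalg.svd(A)
S = np.zeros((m, n))
S[:r, :r] = np.diag(s[:r])

print(r)                                   # 2
print(np.allclose(sigma_def, s[:r]))       # True
print(np.allclose(A, U @ S @ Vh))          # True
```

Because NumPy already returns $V^H$ as `Vh`, the reconstruction multiplies by `Vh` without a further conjugate transpose.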
- Compute all nonzero eigenvalues $\lambda_i$ of $AA^H$ (or $A^HA$), set $\sigma_i=\sqrt{\lambda_i}$ and form $\Delta=\operatorname{diag}[\sigma_1,\sigma_2,\cdots,\sigma_r]$, where $\sigma_1\geq\sigma_2\geq\cdots\geq\sigma_r$ are the positive singular values of $A$.
- Find a unitary $U\in\mathcal{U}^{m\times m}$ (or $V\in\mathcal{U}^{n\times n}$) such that $U^HAA^HU=\operatorname{diag}[\sigma_1^2,\sigma_2^2,\cdots,\sigma_r^2,0,\cdots,0]$ (or $V^HA^HAV=\operatorname{diag}[\sigma_1^2,\sigma_2^2,\cdots,\sigma_r^2,0,\cdots,0]$).
- Write $U=\begin{bmatrix}U_1 & U_2\end{bmatrix}$ (or $V=\begin{bmatrix}V_1 & V_2\end{bmatrix}$), where $U_1$ ($V_1$) consists of the first $r$ columns of $U$ ($V$). Let $V_1=A^HU_1\Delta^{-H}$ (or $U_1=AV_1\Delta^{-1}$); then $V_1$ ($U_1$) is sub-unitary. Find $V_2\in\mathcal{U}_{n-r}^{n\times(n-r)}$ (or $U_2\in\mathcal{U}_{m-r}^{m\times(m-r)}$), i.e. an orthonormal basis of the orthogonal complement of the column space of $V_1$ ($U_1$), so that $V=\begin{bmatrix}V_1 & V_2\end{bmatrix}\in\mathcal{U}^{n\times n}$ (or $U=\begin{bmatrix}U_1 & U_2\end{bmatrix}\in\mathcal{U}^{m\times m}$).
- Then $A=U\begin{bmatrix}\Delta & 0\\ 0 & 0\end{bmatrix}V^H$.
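The four steps above can be sketched in NumPy as follows. The function name `svd_via_aah` and the rank tolerance `tol` are hypothetical choices for this illustration, and completing $V_1$ to a unitary $V$ via an auxiliary SVD of $V_1^H$ is only one of several ways to do that step. Forming $AA^H$ squares the condition number, so this mirrors the hand calculation rather than a numerically robust SVD routine.

```python
import numpy as np

def svd_via_aah(A, tol=1e-10):
    """Hypothetical helper mirroring the hand-calculation steps (not a robust SVD)."""
    A = np.asarray(A, dtype=complex)
    m, n = A.shape
    # Steps 1-2: unitary U with U^H A A^H U diagonal, eigenvalues in descending order.
    lam, U = np.linalg.eigh(A @ A.conj().T)
    order = np.argsort(lam)[::-1]
    lam, U = lam[order], U[:, order]
    r = int(np.sum(lam > tol * max(lam[0], 1.0)))   # numerical rank
    sigma = np.sqrt(lam[:r])                          # positive singular values
    # Step 3: V1 = A^H U1 Delta^{-H}; Delta is real diagonal, so this divides column k by sigma_k.
    U1 = U[:, :r]
    V1 = A.conj().T @ U1 / sigma
    # Complete V1 to a unitary V: the last n - r right singular vectors of V1^H span its null space.
    _, _, Wh = np.linalg.svd(V1.conj().T)
    V = np.hstack([V1, Wh[r:].conj().T])
    # Step 4: middle factor [[Delta, 0], [0, 0]] of size m x n.
    S = np.zeros((m, n))
    S[:r, :r] = np.diag(sigma)
    return U, S, V


A = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0]])
U, S, V = svd_via_aah(A)
print(np.round(np.diag(S), 6))                 # singular values sqrt(3) and 1
print(np.allclose(A, U @ S @ V.conj().T))      # True
```

For this $A$, $AA^H=\begin{bmatrix}2&1\\1&2\end{bmatrix}$ has eigenvalues $3$ and $1$, so the singular values are $\sqrt3$ and $1$.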
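$\Delta$ can be viewed as a scaling matrix, and $U$ and $V$ as rotation matrices (for real matrices they are orthogonal, so a reflection may also be involved).
Hence, for $M=AN$ with $A=U\Delta V^H$ (here $\Delta$ stands for the full $m\times n$ middle factor $\begin{bmatrix}\Delta&0\\0&0\end{bmatrix}$), the transformation of an image $N$ into another image $M$ by $A$ can be read intuitively as: first a rotation by $V^H$, then a scaling by $\Delta$, and finally a rotation by $U$, which yields the final image.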
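A sketch of this geometric reading, assuming an arbitrary real $2\times 2$ matrix $A$ and using a few points on the unit circle as the "image" $N$. For real matrices NumPy returns orthogonal $U$ and $V^T$, so each factor is a rotation, possibly combined with a reflection.

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [0.0, 1.0]])                      # illustrative real transform
U, s, Vh = np.linalg.svd(A)                     # A = U diag(s) Vh, Vh = V^T

theta = np.linspace(0.0, 2.0 * np.pi, 8, endpoint=False)
N = np.vstack([np.cos(theta), np.sin(theta)])   # "image" N: points on the unit circle

step1 = Vh @ N                                  # rotate (or reflect) by V^T: lengths unchanged
step2 = s[:, None] * step1                      # scale coordinate k by sigma_k
M = U @ step2                                   # rotate (or reflect) by U

print(np.allclose(M, A @ N))                               # True
print(np.allclose(np.linalg.norm(step1, axis=0), 1.0))     # True
print(np.allclose(A @ Vh[0], s[0] * U[:, 0]))              # True: A v_1 = sigma_1 u_1
```

The last check shows the direction $v_1$ that $A$ stretches the most: it is mapped onto $\sigma_1 u_1$, the long axis of the ellipse that the unit circle becomes.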
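Variable: in elementary mathematics, a variable is a symbol used to represent a value; the value may be arbitrary, or it may be unspecified or undetermined.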
Unless otherwise stated, all vectors are column vectors.
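$\operatorname{vec}(\cdot)$ turns a matrix into a (column) vector, and $\operatorname{rvec}(\cdot)$ turns a matrix into a row vector.
$\operatorname{vec}(\cdot)$ is called the vectorization operator; it comes in two variants, row-wise expansion and column-wise expansion.
$\operatorname{unvec}(\cdot)$ turns a vector back into a matrix, and $\operatorname{unrvec}(\cdot)$ turns a row vector back into a matrix.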
Let $A=(a_{ij})\in R^{m\times n}$. Arranging the elements of $A$ in row order gives the vector
$$\operatorname{vec}A=(a_{11},a_{12},\cdots,a_{1n},a_{21},a_{22},\cdots,a_{2n},\cdots,a_{m1},a_{m2},\cdots,a_{mn})^T,$$
which is called the row-wise expansion of $A$.
Let $A=(a_{ij})\in R^{m\times n}$. Arranging the elements of $A$ in column order gives the vector
$$\operatorname{vec}A=(a_{11},a_{21},\cdots,a_{m1},a_{12},a_{22},\cdots,a_{m2},\cdots,a_{1n},a_{2n},\cdots,a_{mn})^T,$$
which is called the column-wise expansion of $A$. In what follows, $\operatorname{vec}$ denotes this column-wise expansion unless stated otherwise.
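In NumPy, which stores arrays row-major by default, the two expansions correspond to `reshape` with `order="C"` (row-wise, the default) and `order="F"` (column-wise); $\operatorname{unvec}$ is the inverse reshape and needs the original shape. A minimal sketch, with illustrative variable names:

```python
import numpy as np

A = np.array([[1, 2, 3],
              [4, 5, 6]])                # m = 2, n = 3

vec_rows = A.reshape(-1)                 # row-wise:    a11, a12, a13, a21, a22, a23
vec_cols = A.reshape(-1, order="F")      # column-wise: a11, a21, a12, a22, a13, a23
rvec_A = A.reshape(1, -1)                # row-wise stacking kept as a 1 x mn row vector

# unvec: undo the column-wise stacking; the target shape must be supplied.
A_back = vec_cols.reshape(2, 3, order="F")

print(vec_rows.tolist())                 # [1, 2, 3, 4, 5, 6]
print(vec_cols.tolist())                 # [1, 4, 2, 5, 3, 6]
print(np.array_equal(A_back, A))         # True
```

The row vector `rvec_A` equals the column-wise expansion of $A^T$, transposed, which matches the identity $\operatorname{rvec}(A)=(\operatorname{vec}(A^T))^T$.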
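Numerator layout and denominator layout have to be distinguished when differentiating a vector with respect to a vector; for a scalar function the same choice only decides whether the derivative is written as a row or as a column.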
Numerator layout (also called the Jacobian formulation). For example, with $y\in R^{m\times 1}$ and $x\in R^{n\times 1}$, the Jacobian form is $\dfrac{\partial y}{\partial x^T}$: multiplying the dimensions of $y$ and $x^T$ gives $m\times 1\times 1\times n=m\times n$. Since the numerator is not transposed, this is called numerator layout (a personal mnemonic).
Denominator layout (also called the Hessian formulation). With the same $y\in R^{m\times 1}$ and $x\in R^{n\times 1}$, the Hessian form is $\dfrac{\partial y^T}{\partial x}$: multiplying the dimensions of $x$ and $y^T$ gives $n\times 1\times 1\times m=n\times m$. Since the denominator is not transposed, this is called denominator layout (a personal mnemonic). Some authors also call this layout the gradient, to distinguish it from the Jacobian form.
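A numerical sketch of the two layouts for the linear map $y=Bx$. The matrix $B$, the point $x$ and the helper `numeric_jacobian` (central differences) are illustrative assumptions: the numerator-layout derivative $\partial y/\partial x^T$ should be $B$ itself ($m\times n$), and the denominator-layout derivative $\partial y^T/\partial x$ its transpose $B^T$ ($n\times m$).

```python
import numpy as np

def numeric_jacobian(func, x, h=1e-6):
    """Numerator layout: entry (i, j) approximates d y_i / d x_j."""
    return np.column_stack([(func(x + h * e) - func(x - h * e)) / (2.0 * h)
                            for e in np.eye(x.size)])

B = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])          # y = B x with m = 3, n = 2

def y(x):
    return B @ x

x = np.array([0.5, -1.0])
dy_dxT = numeric_jacobian(y, x)     # numerator layout, m x n
dyT_dx = dy_dxT.T                   # denominator layout, n x m
print(dy_dxT.shape, dyT_dx.shape)   # (3, 2) (2, 3)
print(np.allclose(dy_dxT, B), np.allclose(dyT_dx, B.T))   # True True
```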
Numerator layout: the numerator stays as is and the denominator is transposed.
$\dfrac{\partial\,\text{Scalar}}{\partial\,\text{Vector/Matrix}}$
The $1\times m$ row-vector partial derivative operator:
$$D_x\overset{\text{def}}{=}\frac{\partial}{\partial x^T}=\left[\frac{\partial}{\partial x_1},\cdots,\frac{\partial}{\partial x_m}\right]$$
Quantity | Definition |
---|---|
Row-vector partial derivative of a scalar function $f(x)$ | $D_xf(x)=\dfrac{\partial f(x)}{\partial x^T}=\left[\dfrac{\partial f(x)}{\partial x_1},\cdots,\dfrac{\partial f(x)}{\partial x_m}\right]$ |
Row-vector partial derivative of a real scalar function $f(X)$ of a matrix variable $X\in R^{m\times n}$ | $D_{\operatorname{vec}^T(X)}f(X)=\dfrac{\partial f(X)}{\partial\operatorname{vec}^T(X)}=\left[\dfrac{\partial f(X)}{\partial X_{11}},\cdots,\dfrac{\partial f(X)}{\partial X_{m1}},\cdots,\dfrac{\partial f(X)}{\partial X_{1n}},\cdots,\dfrac{\partial f(X)}{\partial X_{mn}}\right]$ |
Partial derivative (Jacobian matrix) of a real scalar function $f(X)$ at $X$ | $D_Xf(X)=\dfrac{\partial f(X)}{\partial X^T}=\begin{bmatrix}\frac{\partial f(X)}{\partial X_{11}}&\cdots&\frac{\partial f(X)}{\partial X_{m1}}\\ \vdots&\ddots&\vdots\\ \frac{\partial f(X)}{\partial X_{1n}}&\cdots&\frac{\partial f(X)}{\partial X_{mn}}\end{bmatrix}\in R^{n\times m}$, i.e. $[D_Xf(X)]_{ij}=\dfrac{\partial f(X)}{\partial X_{ji}}$, $i=1,\cdots,n$, $j=1,\cdots,m$ |
Remarks | $\operatorname{vec}^T(X)=[\operatorname{vec}(X)]^T$; $D_{\operatorname{vec}^T(X)}f(X)$ and $D_Xf(X)$ are, respectively, the row-vector partial derivative and the Jacobian matrix of the real scalar function $f(X)$ at $X$ |
Denominator layout: the denominator stays as is and the numerator is transposed.
$\dfrac{\partial\,\text{Scalar}}{\partial\,\text{Vector/Matrix}}$
The $m\times 1$ column-vector partial derivative operator, commonly called the gradient operator:
$$\nabla_x\overset{\text{def}}{=}\frac{\partial}{\partial x}=\left[\frac{\partial}{\partial x_1},\cdots,\frac{\partial}{\partial x_m}\right]^T$$
Quantity | Definition |
---|---|
Column-vector partial derivative of a scalar function $f(x)$, commonly called the gradient vector of $f(x)$ | $\nabla_xf(x)\overset{\text{def}}{=}\dfrac{\partial f(x)}{\partial x}=\left[\dfrac{\partial f(x)}{\partial x_1},\cdots,\dfrac{\partial f(x)}{\partial x_m}\right]^T$ |
Column-vector partial derivative of a real scalar function $f(X)$ of a matrix variable $X\in R^{m\times n}$, commonly called the column-vector form of the gradient matrix of $f(X)$ | $\operatorname{vec}(\nabla_Xf(X))=\dfrac{\partial f(X)}{\partial\operatorname{vec}(X)}=\left[\dfrac{\partial f(X)}{\partial X_{11}},\cdots,\dfrac{\partial f(X)}{\partial X_{m1}},\cdots,\dfrac{\partial f(X)}{\partial X_{1n}},\cdots,\dfrac{\partial f(X)}{\partial X_{mn}}\right]^T$ |
Partial derivative of a real scalar function $f(X)$ at $X$, commonly called the gradient matrix of $f(X)$ at $X$ | $\nabla_Xf(X)=\dfrac{\partial f(X)}{\partial X}=\begin{bmatrix}\frac{\partial f(X)}{\partial X_{11}}&\cdots&\frac{\partial f(X)}{\partial X_{1n}}\\ \vdots&\ddots&\vdots\\ \frac{\partial f(X)}{\partial X_{m1}}&\cdots&\frac{\partial f(X)}{\partial X_{mn}}\end{bmatrix}\in R^{m\times n}$, i.e. $[\nabla_Xf(X)]_{ij}=\dfrac{\partial f(X)}{\partial X_{ij}}$, $i=1,\cdots,m$, $j=1,\cdots,n$ |
Gradient operator with respect to the matrix variable $X$ stacked into a column vector | $\nabla_{\operatorname{vec}(X)}=\dfrac{\partial}{\partial\operatorname{vec}(X)}=\left[\dfrac{\partial}{\partial X_{11}},\cdots,\dfrac{\partial}{\partial X_{m1}},\cdots,\dfrac{\partial}{\partial X_{1n}},\cdots,\dfrac{\partial}{\partial X_{mn}}\right]^T$ |
Clearly $\nabla_Xf(X)=D_X^Tf(X)$: the gradient matrix and the Jacobian matrix of a scalar function $f(X)$ are transposes of each other. Because of this relationship, the row-vector partial derivative is called the covariant form of the gradient vector, or cogradient vector for short. Likewise, the Jacobian matrix is sometimes called the covariant form of the gradient matrix, or the cogradient matrix. The cogradient is a covariant operator: it is not itself the gradient, but the transpose of the gradient.
For this reason, the Jacobian operators $\dfrac{\partial}{\partial x^T}$ and $\dfrac{\partial}{\partial X^T}$ are also called (row) partial derivative operators, covariant forms of the gradient operator, or cogradient operators.
$$\nabla_Xf(X)=\operatorname{unvec}\left(\left[D_{\operatorname{vec}^T(X)}f(X)\right]^T\right)$$
This says that the gradient matrix of a scalar function $f(X)$ is obtained by matricizing the transpose (the column-vector form) of its row-vector partial derivative.
If $D_{\operatorname{vec}^T(X)}f(X)=[d_1,\cdots,d_{mn}]$, then, since $X_{ij}$ is entry number $i+(j-1)m$ of the column-wise $\operatorname{vec}(X)$, the $(i,j)$ element of the gradient matrix is
$$[\nabla_Xf(X)]_{ij}=d_{i+(j-1)m},\qquad i=1,\cdots,m,\quad j=1,\cdots,n.$$
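A sketch checking the index formula numerically for the illustrative function $f(X)=a^TXb$, whose gradient matrix is $ab^T$. The vectors, the sizes $m=2$, $n=3$ and the central-difference step are assumptions; the row-vector derivative is computed in column-wise $\operatorname{vec}$ order, and `reshape(..., order="F")` plays the role of $\operatorname{unvec}$.

```python
import numpy as np

m, n = 2, 3
rng = np.random.default_rng(0)
a, b = rng.standard_normal(m), rng.standard_normal(n)
X = rng.standard_normal((m, n))

def f(X):
    return a @ X @ b                       # f(X) = a^T X b, gradient matrix a b^T

def unvec(v):
    return v.reshape(m, n, order="F")      # inverse of column-wise vec

# d = D_{vec^T(X)} f(X), one central difference per entry of vec(X).
h = 1e-6
x = X.reshape(-1, order="F")
d = np.empty(m * n)
for k in range(m * n):
    e = np.zeros(m * n)
    e[k] = h
    d[k] = (f(unvec(x + e)) - f(unvec(x - e))) / (2 * h)

grad = unvec(d)                            # gradient matrix = unvec(d^T)
# 1-based index check: [grad]_{ij} = d_{i+(j-1)m}
ok = all(np.isclose(grad[i - 1, j - 1], d[i + (j - 1) * m - 1])
         for i in range(1, m + 1) for j in range(1, n + 1))
print(ok, np.allclose(grad, np.outer(a, b)))   # True True
```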
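The direction opposite to the gradient is called the gradient flow of the variable $x$, written $\dot x=-\nabla_xf(x)$.
From the definition of the gradient vector we see that:
(1) the gradient of a real scalar function of a vector variable is a column vector;
(2) each component of the gradient vector gives the rate of change of the scalar function along the direction of that component.
Important property: the gradient vector points in the direction in which the real scalar function $f(x)$ increases fastest, and its norm is that maximal rate of increase. Conversely, the negative gradient points in the direction in which $f(x)$ decreases fastest. This is the basis of the gradient descent method.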
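A minimal gradient-descent sketch built on this property, for the illustrative quadratic $f(x)=\frac12x^TQx-c^Tx$ with a symmetric positive definite $Q$, so that $\nabla_xf(x)=Qx-c$. The fixed step size is an assumption chosen small enough for this particular $Q$, not a general rule.

```python
import numpy as np

# Illustrative quadratic f(x) = 0.5 x^T Q x - c^T x, Q symmetric positive definite.
Q = np.array([[3.0, 1.0],
              [1.0, 2.0]])
c = np.array([1.0, 1.0])

def grad_f(x):
    return Q @ x - c                # gradient (column vector) of f

x = np.zeros(2)
step = 0.2                          # assumed fixed step size, small enough for this Q
for _ in range(200):
    x = x - step * grad_f(x)        # move along the negative gradient (discretised gradient flow)

print(np.allclose(x, np.linalg.solve(Q, c)))   # True: the minimiser satisfies Q x = c
```

Each iteration is a forward-Euler step of the gradient flow $\dot x=-\nabla_xf(x)$ with step size `step`.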
A brief illustration of the relationship between the cogradient matrix and the gradient matrix.
For a real scalar function $f(X)$ of an $m\times n$ matrix variable $X$:
The steps for $D_Xf(X)$ are as follows (the row vector is cut into consecutive blocks of length $m$, which become the rows):
$$\begin{aligned}&\operatorname{vec}(X):\ [X_{11},\cdots,X_{m1},\cdots,X_{1n},\cdots,X_{mn}]^T\\ \to\ &\operatorname{vec}^T(X):\ [X_{11},\cdots,X_{m1},\cdots,X_{1n},\cdots,X_{mn}]\\ \to\ &\operatorname{unvec}(\operatorname{vec}^T(X)):\ \begin{bmatrix}X_{11}&\cdots&X_{m1}\\ \vdots&\ddots&\vdots\\ X_{1n}&\cdots&X_{mn}\end{bmatrix}\end{aligned}$$
The steps for $\nabla_Xf(X)$ are as follows (the column vector is cut into consecutive blocks of length $m$, which become the columns):
$$\begin{aligned}&\operatorname{vec}(X):\ [X_{11},\cdots,X_{m1},\cdots,X_{1n},\cdots,X_{mn}]^T\\ \to\ &\operatorname{unvec}(\operatorname{vec}(X)):\ \begin{bmatrix}X_{11}&\cdots&X_{1n}\\ \vdots&\ddots&\vdots\\ X_{m1}&\cdots&X_{mn}\end{bmatrix}\end{aligned}$$
Therefore $D^T=\nabla$.
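The two arrangements can be mimicked with NumPy reshapes. In this sketch the numbers in `d` are placeholders that encode the position of $X_{ij}$ (for example `21` stands for $\partial f/\partial X_{21}$), with $m=2$, $n=3$: reading the blocks of length $m$ as rows gives $D_Xf(X)$, reading them as columns gives $\nabla_Xf(X)$.

```python
import numpy as np

m, n = 2, 3
# Placeholder row derivative d = D_{vec^T(X)} f(X): the entry "ij" stands for
# d f / d X_ij, listed in column-wise vec order X11, X21, X12, X22, X13, X23.
d = np.array([11, 21, 12, 22, 13, 23])

D_X = d.reshape(n, m)                  # blocks of length m become rows    -> n x m
grad_X = d.reshape(m, n, order="F")    # blocks of length m become columns -> m x n

print(D_X.tolist())                    # [[11, 21], [12, 22], [13, 23]]
print(grad_X.tolist())                 # [[11, 12, 13], [21, 22, 23]]
print(np.array_equal(D_X.T, grad_X))   # True
```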
$\dfrac{\partial\,\text{Vector/Matrix}}{\partial\,\text{Vector/Matrix}}$
Applying the row-vector partial derivative formula for real scalar functions to each element $f_i(x)$, $i=1,\cdots,p$, of a $p\times 1$ real vector function $f(x)=[f_1(x),\cdots,f_p(x)]^T$ of $x\in R^{m\times 1}$, the partial derivative of the vector function can be defined directly as
$$D_xf(x)=\frac{\partial f(x)}{\partial x^T}=\begin{bmatrix}\frac{\partial f_1(x)}{\partial x^T}\\ \vdots\\ \frac{\partial f_p(x)}{\partial x^T}\end{bmatrix}=\begin{bmatrix}\frac{\partial f_1(x)}{\partial x_1}&\cdots&\frac{\partial f_1(x)}{\partial x_m}\\ \vdots&\ddots&\vdots\\ \frac{\partial f_p(x)}{\partial x_1}&\cdots&\frac{\partial f_p(x)}{\partial x_m}\end{bmatrix}\in R^{p\times m},$$
which is called the Jacobian matrix (or cogradient matrix) of the vector function $f(x)$ at $x$. Its $(i,j)$ element is the partial derivative of the $i$-th component $f_i(x)$ with respect to the $j$-th element of the vector variable $x$: $[D_xf(x)]_{ij}=\dfrac{\partial f_i(x)}{\partial x_j}$.
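A sketch comparing the definition with central finite differences for the illustrative function $f:R^3\to R^2$, $f(x)=[x_1^2x_2,\ \sin x_1+x_2x_3]^T$ (so $p=2$, $m=3$). The function, the evaluation point and the step `h` are assumptions made for this example.

```python
import numpy as np

def f(x):
    # Illustrative vector function R^3 -> R^2 (p = 2, m = 3).
    return np.array([x[0] ** 2 * x[1], np.sin(x[0]) + x[1] * x[2]])

def jacobian_exact(x):
    # [D_x f]_{ij} = d f_i / d x_j, worked out by hand for this f.
    return np.array([[2.0 * x[0] * x[1], x[0] ** 2, 0.0],
                     [np.cos(x[0]), x[2], x[1]]])

def jacobian_fd(func, x, h=1e-6):
    # Column j approximates d f / d x_j by a central difference.
    return np.column_stack([(func(x + h * e) - func(x - h * e)) / (2.0 * h)
                            for e in np.eye(x.size)])

x = np.array([0.3, -1.2, 2.0])
J = jacobian_fd(f, x)
print(J.shape)                                # (2, 3), i.e. p x m
print(np.allclose(J, jacobian_exact(x)))      # True
```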
First vectorize the matrix column-wise, then take the partial derivative.
Consider a real matrix function $F(X)=[F_{kl}]_{k=1,l=1}^{p,q}\in R^{p\times q}$ of a matrix variable $X\in R^{m\times n}$.
To reuse the definitions of the row-vector partial derivative and the Jacobian matrix of a vector function, first convert the $p\times q$ matrix function into a $pq\times 1$ column vector by column-wise vectorization:
$$f(\operatorname{vec}X)\overset{\Delta}{=}\operatorname{vec}(F(X))=[F_{11}(X),\cdots,F_{p1}(X),\cdots,F_{1q}(X),\cdots,F_{pq}(X)]^T\in R^{pq\times 1}$$
The row-vector partial derivative of the matrix function $F(X)$ is then defined as
$$D_{\operatorname{vec}^T(X)}F(X)\overset{\Delta}{=}\frac{\partial f(\operatorname{vec}X)}{\partial\operatorname{vec}^T(X)}=\frac{\partial\operatorname{vec}(F(X))}{\partial\operatorname{vec}^T(X)}\in R^{pq\times mn}.$$
Here $pq\times 1$ is the dimension of the numerator and $1\times mn$ that of the denominator, so in numerator layout the overall dimension is $pq\times 1\times 1\times mn=pq\times mn$.
Its explicit form is
$$\begin{aligned}D_{\operatorname{vec}^T(X)}F(X)&=\begin{bmatrix}\frac{\partial F_{11}}{\partial\operatorname{vec}^T(X)}\\ \vdots\\ \frac{\partial F_{p1}}{\partial\operatorname{vec}^T(X)}\\ \vdots\\ \frac{\partial F_{1q}}{\partial\operatorname{vec}^T(X)}\\ \vdots\\ \frac{\partial F_{pq}}{\partial\operatorname{vec}^T(X)}\end{bmatrix}\\&=\begin{bmatrix}\frac{\partial F_{11}}{\partial X_{11}}&\cdots&\frac{\partial F_{11}}{\partial X_{m1}}&\cdots&\frac{\partial F_{11}}{\partial X_{1n}}&\cdots&\frac{\partial F_{11}}{\partial X_{mn}}\\ \vdots&&\vdots&&\vdots&&\vdots\\ \frac{\partial F_{p1}}{\partial X_{11}}&\cdots&\frac{\partial F_{p1}}{\partial X_{m1}}&\cdots&\frac{\partial F_{p1}}{\partial X_{1n}}&\cdots&\frac{\partial F_{p1}}{\partial X_{mn}}\\ \vdots&&\vdots&&\vdots&&\vdots\\ \frac{\partial F_{1q}}{\partial X_{11}}&\cdots&\frac{\partial F_{1q}}{\partial X_{m1}}&\cdots&\frac{\partial F_{1q}}{\partial X_{1n}}&\cdots&\frac{\partial F_{1q}}{\partial X_{mn}}\\ \vdots&&\vdots&&\vdots&&\vdots\\ \frac{\partial F_{pq}}{\partial X_{11}}&\cdots&\frac{\partial F_{pq}}{\partial X_{m1}}&\cdots&\frac{\partial F_{pq}}{\partial X_{1n}}&\cdots&\frac{\partial F_{pq}}{\partial X_{mn}}\end{bmatrix}\end{aligned}$$
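As a sketch, the definition can be checked on the illustrative matrix function $F(X)=AXB$ with $A\in R^{p\times m}$ and $B\in R^{n\times q}$. The identity $\operatorname{vec}(AXB)=(B^T\otimes A)\operatorname{vec}(X)$ gives $D_{\operatorname{vec}^T(X)}F(X)=B^T\otimes A\in R^{pq\times mn}$, which the finite-difference Jacobian in the code should reproduce. Sizes, seed and step size are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
p, m, n, q = 2, 3, 4, 2
A = rng.standard_normal((p, m))
B = rng.standard_normal((n, q))
X = rng.standard_normal((m, n))

def vec(M):
    return M.reshape(-1, order="F")        # column-wise vectorisation

def F(X):
    return A @ X @ B                       # illustrative p x q matrix function

# D_{vec^T(X)} F(X): column k is d vec(F) / d (vec X)_k, by central differences.
h = 1e-6
x = vec(X)
J = np.empty((p * q, m * n))
for k in range(m * n):
    e = np.zeros(m * n)
    e[k] = h
    J[:, k] = (vec(F((x + e).reshape(m, n, order="F")))
               - vec(F((x - e).reshape(m, n, order="F")))) / (2.0 * h)

print(J.shape)                             # (4, 12), i.e. pq x mn
print(np.allclose(J, np.kron(B.T, A)))     # True
```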
$\dfrac{\partial\,\text{Scalar}}{\partial\,\text{Vector}}$
The second-order partial derivative of a real function $f(x)$ with respect to an $m\times 1$ real vector $x$ is a matrix of $m^2$ second-order partial derivatives, called the Hessian matrix, defined as
$$\frac{\partial^2f(x)}{\partial x\,\partial x^T}=\frac{\partial}{\partial x^T}\left[\frac{\partial f(x)}{\partial x}\right],$$
or written $\nabla_x^2f(x)=D_x(\nabla_xf(x))=D_xg(x)$. That is, the Hessian matrix of a real scalar function $f(x)$ is the cogradient matrix (Jacobian matrix) of the gradient vector function $g(x)=\nabla_xf(x)$.
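A sketch of "Hessian = Jacobian of the gradient" for the illustrative function $f(x)=x^TQx+\sin x_1$ with a deliberately non-symmetric $Q$: its gradient is $g(x)=(Q+Q^T)x+[\cos x_1,0,0]^T$ and its Hessian is $Q+Q^T-\operatorname{diag}(\sin x_1,0,0)$. Approximating the Jacobian of $g$ by central differences is an assumption of this example, not part of the definition.

```python
import numpy as np

Q = np.array([[1.0, 2.0, 0.0],
              [0.0, 3.0, 1.0],
              [4.0, 0.0, 2.0]])             # deliberately not symmetric

def grad_f(x):
    # g(x) = gradient of f(x) = x^T Q x + sin(x_1)
    g = (Q + Q.T) @ x
    g[0] += np.cos(x[0])
    return g

def jacobian_fd(func, x, h=1e-6):
    return np.column_stack([(func(x + h * e) - func(x - h * e)) / (2.0 * h)
                            for e in np.eye(x.size)])

x = np.array([0.4, -0.7, 1.1])
H = jacobian_fd(grad_f, x)                   # Hessian = D_x g(x)
H_exact = Q + Q.T
H_exact[0, 0] -= np.sin(x[0])
print(np.allclose(H, H_exact), np.allclose(H, H.T))   # True True
```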
$\dfrac{\partial\,\text{Scalar}}{\partial\,\text{Matrix}}$
The second-order partial derivative of a real function $f(X)$ with respect to an $m\times n$ real matrix $X$ is a matrix of $(mn)^2$ second-order partial derivatives, also called the Hessian matrix, defined as
$$\frac{\partial^2f(X)}{\partial X\,\partial X^T}=\frac{\partial}{\partial X^T}\left[\frac{\partial f(X)}{\partial X}\right],$$
or written $\nabla_X^2f(X)=\nabla_{X^T}(\nabla_Xf(X))=D_XG(X)$. That is, the Hessian matrix of a real scalar function $f(X)$ is the cogradient matrix (Jacobian matrix) of the gradient matrix function $G(X)=\nabla_Xf(X)$. With the column-wise vectorization convention for matrix functions introduced above, this is the $mn\times mn$ matrix $\dfrac{\partial\operatorname{vec}(G(X))}{\partial\operatorname{vec}^T(X)}=\dfrac{\partial^2f(X)}{\partial\operatorname{vec}(X)\,\partial\operatorname{vec}^T(X)}$.
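A sketch for the illustrative function $f(X)=\operatorname{tr}(X^TAX)$ with $X\in R^{m\times n}$ and a square, not necessarily symmetric, $A$. Its gradient matrix is $G(X)=(A+A^T)X$, and since $\operatorname{vec}(MX)=(I_n\otimes M)\operatorname{vec}(X)$, the $mn\times mn$ Hessian should be $I_n\otimes(A+A^T)$. The finite-difference loop differentiates $\operatorname{vec}(G)$ with respect to $\operatorname{vec}^T(X)$ column by column; sizes, seed and step are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 3, 2
A = rng.standard_normal((m, m))             # square, not necessarily symmetric
X = rng.standard_normal((m, n))

def grad_f(X):
    # Gradient matrix G(X) of f(X) = tr(X^T A X).
    return (A + A.T) @ X

def vec(M):
    return M.reshape(-1, order="F")

# Hessian = d vec(G) / d vec^T(X), built column by column with central differences.
h = 1e-6
x = vec(X)
H = np.empty((m * n, m * n))
for k in range(m * n):
    e = np.zeros(m * n)
    e[k] = h
    H[:, k] = (vec(grad_f((x + e).reshape(m, n, order="F")))
               - vec(grad_f((x - e).reshape(m, n, order="F")))) / (2.0 * h)

print(H.shape)                                          # (6, 6), i.e. mn x mn
print(np.allclose(H, np.kron(np.eye(n), A + A.T)))      # True
```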
Total differential of a multivariate function: if $f(x_1,\cdots,x_m)$ is differentiable at the point $(x_1,\cdots,x_m)$, its total differential is
$$df(x_1,\cdots,x_m)=\frac{\partial f}{\partial x_1}dx_1+\cdots+\frac{\partial f}{\partial x_m}dx_m.$$
Real matrix differentials:
For a real scalar function $f(x)$ of the variable $x=[x_1,\cdots,x_m]^T\in R^m$,
$$df(x_1,\cdots,x_m)=\frac{\partial f(x)}{\partial x_1}dx_1+\cdots+\frac{\partial f(x)}{\partial x_m}dx_m=\left[\frac{\partial f(x)}{\partial x_1},\cdots,\frac{\partial f(x)}{\partial x_m}\right]\begin{bmatrix}dx_1\\ \vdots\\ dx_m\end{bmatrix},$$
or simply $df(x)=\dfrac{\partial f(x)}{\partial x^T}dx$, where $\dfrac{\partial f(x)}{\partial x^T}=\left[\dfrac{\partial f(x)}{\partial x_1},\cdots,\dfrac{\partial f(x)}{\partial x_m}\right]$
and $dx=[dx_1,\cdots,dx_m]^T$.
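A small sketch of $df(x)=\frac{\partial f(x)}{\partial x^T}dx$ read as a first-order approximation, for the illustrative function $f(x)=e^{x_1}x_2+x_3^2$. The function, the point and the step direction are assumptions; the remainder $|f(x+dx)-f(x)-\frac{\partial f}{\partial x^T}dx|$ should shrink roughly like $\|dx\|^2$.

```python
import numpy as np

def f(x):
    return np.exp(x[0]) * x[1] + x[2] ** 2          # illustrative scalar function

def row_derivative(x):
    # d f / d x^T as a 1 x m row (stored as a 1-D array)
    return np.array([np.exp(x[0]) * x[1], np.exp(x[0]), 2.0 * x[2]])

x = np.array([0.1, 2.0, -1.0])
direction = np.array([1.0, -2.0, 0.5])
errors = []
for scale in (1e-2, 1e-3, 1e-4):
    dx = scale * direction
    linear_part = row_derivative(x) @ dx             # df = (d f / d x^T) dx
    errors.append(abs(f(x + dx) - f(x) - linear_part))

# Each 10x reduction of dx cuts the error by roughly 100x (second-order remainder).
print([errors[k] / errors[k + 1] for k in range(2)])
```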
For a real scalar function $f(X)$ whose variable is an $m\times n$ real matrix $X=[x_1,\cdots,x_n]\in R^{m\times n}$, write its columns as $x_j=[X_{1j},\cdots,X_{mj}]^T$, $j=1,\cdots,n$. Then
$$\begin{aligned}df(X)&=\frac{\partial f(X)}{\partial x_1^T}dx_1+\cdots+\frac{\partial f(X)}{\partial x_n^T}dx_n\\&=\left[\frac{\partial f(X)}{\partial X_{11}},\cdots,\frac{\partial f(X)}{\partial X_{m1}}\right]\begin{bmatrix}dX_{11}\\ \vdots\\ dX_{m1}\end{bmatrix}+\cdots+\left[\frac{\partial f(X)}{\partial X_{1n}},\cdots,\frac{\partial f(X)}{\partial X_{mn}}\right]\begin{bmatrix}dX_{1n}\\ \vdots\\ dX_{mn}\end{bmatrix}\\&=\left[\frac{\partial f(X)}{\partial X_{11}},\cdots,\frac{\partial f(X)}{\partial X_{m1}},\cdots,\frac{\partial f(X)}{\partial X_{1n}},\cdots,\frac{\partial f(X)}{\partial X_{mn}}\right]\begin{bmatrix}dX_{11}\\ \vdots\\ dX_{m1}\\ \vdots\\ dX_{1n}\\ \vdots\\ dX_{mn}\end{bmatrix},\end{aligned}$$
or simply $df(X)=\operatorname{rvec}(A)\cdot\operatorname{vec}(dX)$, where $\operatorname{rvec}(A)$ is the row vectorization of the Jacobian matrix $A$, and
$$dX=\begin{bmatrix}dX_{11}&\cdots&dX_{1n}\\ \vdots&\ddots&\vdots\\ dX_{m1}&\cdots&dX_{mn}\end{bmatrix},\qquad A=D_Xf(X)=\frac{\partial f(X)}{\partial X^T}=\begin{bmatrix}\frac{\partial f(X)}{\partial X_{11}}&\cdots&\frac{\partial f(X)}{\partial X_{m1}}\\ \vdots&\ddots&\vdots\\ \frac{\partial f(X)}{\partial X_{1n}}&\cdots&\frac{\partial f(X)}{\partial X_{mn}}\end{bmatrix}.$$
Using $\operatorname{rvec}(A)=(\operatorname{vec}(A^T))^T$ and $\operatorname{tr}(B^TC)=(\operatorname{vec}(B))^T\operatorname{vec}(C)$, we obtain
$$df(X)=\operatorname{rvec}(A)\operatorname{vec}(dX)=(\operatorname{vec}(A^T))^T\operatorname{vec}(dX)=\operatorname{tr}(A\,dX).$$
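A sketch checking $df(X)=\operatorname{rvec}(A)\operatorname{vec}(dX)=\operatorname{tr}(A\,dX)$ for the illustrative linear function $f(X)=a^TXb$. Since $df=a^T\,dX\,b=\operatorname{tr}(ba^T\,dX)$, its Jacobian is $A=ba^T\in R^{n\times m}$; because $f$ is linear, the first-order formula is exact and can be compared with $f(X+dX)-f(X)$ directly. The vectors, sizes and seed are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 3, 2
a, b = rng.standard_normal(m), rng.standard_normal(n)
X, dX = rng.standard_normal((m, n)), rng.standard_normal((m, n))

def f(X):
    return a @ X @ b                     # f(X) = a^T X b, linear in X

A = np.outer(b, a)                       # Jacobian D_X f(X) = b a^T, an n x m matrix
rvec_A = A.reshape(-1)                   # row-wise vectorisation of A
vec_dX = dX.reshape(-1, order="F")       # column-wise vectorisation of dX

df = f(X + dX) - f(X)                    # exact change, because f is linear
print(np.isclose(rvec_A @ vec_dX, df))   # True
print(np.isclose(np.trace(A @ dX), df))  # True
print(np.allclose(A.T, np.outer(a, b)))  # True: gradient matrix = A^T = a b^T
```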
The key point is that
$$\nabla_Xf(X)=\frac{\partial f(X)}{\partial X}=\begin{bmatrix}\frac{\partial f(X)}{\partial X_{11}}&\cdots&\frac{\partial f(X)}{\partial X_{1n}}\\ \vdots&\ddots&\vdots\\ \frac{\partial f(X)}{\partial X_{m1}}&\cdots&\frac{\partial f(X)}{\partial X_{mn}}\end{bmatrix}=A^T$$