Matrices, Probability, and More

Related Mathematics

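        • Determinant multiplication
        • Singular value decomposition (SVD)
          • Decomposition steps (SVD)
          • Geometric meaning (SVD)
        • QR decomposition
        • Cholesky decomposition
        • Cogradient matrix
        • Gradient matrix
        • Cogradient matrix of a real-valued vector function
        • Cogradient matrix of a real-valued matrix function
        • Hessian matrix of a real-valued scalar function (symmetric matrix)
        • Differential formulas for real matrices
        • Probability density function
        • Bayes' formula and inference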

Determinant multiplication

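Rule of thumb, "rows on the left, columns on the right": left-multiplying $A$ by an elementary matrix $P$, giving $PA$, applies to $A$ the elementary row operation encoded by the rows of $P$; right-multiplying $A$ by an elementary matrix $Q$, giving $AQ$, applies to $A$ the elementary column operation encoded by the columns of $Q$.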
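The rule can be checked numerically. The following is a minimal NumPy sketch; the matrix `A` and the elementary matrices `P` and `E` are illustrative choices, not taken from the original text.

```python
import numpy as np

A = np.arange(1, 10).reshape(3, 3)    # [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

# Elementary matrix P: the identity with rows 0 and 1 swapped.
P = np.eye(3, dtype=int)[[1, 0, 2]]

row_swapped = P @ A    # left multiplication swaps rows 0 and 1 of A
col_swapped = A @ P    # right multiplication swaps columns 0 and 1 of A

# Elementary matrix E: the identity with an extra entry -7 at position (2, 0).
E = np.eye(3, dtype=int)
E[2, 0] = -7

row_added = E @ A      # row 2 <- row 2 - 7 * row 0
col_added = A @ E      # column 0 <- column 0 - 7 * column 2
```

The same matrix `E` produces a row operation on the left and a different column operation on the right, which is exactly what the mnemonic says.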


Singular value decomposition (SVD)


Definition: Let $A \in C_r^{m\times n}$ (an $m\times n$ complex matrix of rank $r$), and let $\lambda_i$ be the nonzero eigenvalues of $AA^H$ (equivalently, of $A^HA$). Then $\sigma_i=\sqrt{\lambda_i}$, $i=1,2,\cdots,r$, are called the singular values of $A$.
Theorem: Let $A \in C_r^{m\times n}$. Then there exist unitary matrices $U \in U^{m\times m}$ and $V \in U^{n\times n}$ such that
$$A=U \left[\begin{matrix} \Delta &0\\ 0&0\end{matrix}\right]V^H,$$
where $\Delta=\mathrm{diag}[\sigma_1,\sigma_2,\cdots,\sigma_r]$ and $\sigma_1 \geq \sigma_2 \geq \cdots \geq \sigma_r$ are the singular values of $A$.
Proof: $AA^H$ is Hermitian (hence normal) and positive semidefinite, so there exists $U\in U^{m\times m}$ such that
$$U^HAA^HU=\mathrm{diag}[\sigma_1^2,\sigma_2^2,\cdots,\sigma_r^2,0,\cdots,0]=\left[\begin{matrix}\Delta\Delta^H&0\\0&0\end{matrix}\right],\qquad \sigma_1^2 \geq \sigma_2^2 \geq \cdots \geq \sigma_r^2,$$
where $\Delta=\mathrm{diag}[\sigma_1,\sigma_2,\cdots,\sigma_r]$. Partition $U=[\begin{matrix}U_1&U_2\end{matrix}]$, where $U_1$ consists of the first $r$ columns of $U$. Then
$$\begin{aligned}U^HAA^HU&=\left[\begin{matrix}U_1^H\\U_2^H\end{matrix}\right]AA^H\left[\begin{matrix}U_1&U_2\end{matrix}\right]\\&=\left[\begin{matrix}U_1^H\\U_2^H\end{matrix}\right]\left[\begin{matrix}AA^HU_1&AA^HU_2\end{matrix}\right]\\&=\left[\begin{matrix}U_1^HAA^HU_1&U_1^HAA^HU_2\\U_2^HAA^HU_1&U_2^HAA^HU_2\end{matrix}\right]\\&=\left[\begin{matrix}\Delta\Delta^H&0\\0&0\end{matrix}\right]\end{aligned}$$
and therefore
$$\begin{aligned}U_1^HAA^HU_1&=\Delta\Delta^H\\U_1^HAA^HU_2&=0\\U_2^HAA^HU_1&=0\\U_2^HAA^HU_2&=0\end{aligned}$$
Let $V_1=A^HU_1\Delta^{-H}$. Then $V_1^HV_1=\Delta^{-1}U_1^HAA^HU_1\Delta^{-H}$, and $U_1^HAA^HU_1=\Delta\Delta^H$ gives
$$V_1^HV_1=E_r.$$
Hence $V_1$ is a sub-unitary matrix (its columns are orthonormal), i.e. $V_1 \in U_r^{n\times r}$, so there exists $V_2 \in U_{n-r}^{n\times (n-r)}$ such that $V=[\begin{matrix}V_1&V_2\end{matrix}]\in U^{n\times n}$. Therefore
$$U^HAV=\left[\begin{matrix}U_1^H\\U_2^H\end{matrix}\right]A\,[\begin{matrix}V_1&V_2\end{matrix}]=\left[\begin{matrix}U_1^HAV_1&U_1^HAV_2\\U_2^HAV_1&U_2^HAV_2\end{matrix}\right].$$
From $U_1^HAA^HU_1=\Delta\Delta^H$ we get
$$U_1^HAV_1=U_1^HAA^HU_1\Delta^{-H}=\Delta.$$
From $U_2^HAA^HU_2=(A^HU_2)^H(A^HU_2)=0$ we get
$$A^HU_2=0,\qquad U_2^HA=0,$$
so $U_2^HAV_1=0$ and $U_2^HAV_2=0$. Moreover,
$$V_1=A^HU_1\Delta^{-H}\Rightarrow V_1\Delta^H=A^HU_1 \Rightarrow U_1^HA=\Delta V_1^H,$$
so $U_1^HAV_2=\Delta V_1^HV_2=0$. Hence $U^HAV=\left[\begin{matrix}\Delta&0\\0&0\end{matrix}\right]$, that is, $A=U\left[\begin{matrix}\Delta&0\\0&0\end{matrix}\right]V^H$.


Decomposition steps (SVD)

  • Compute all nonzero eigenvalues $\lambda_i$ of $AA^H$ (or of $A^HA$), set $\sigma_i=\sqrt{\lambda_i}$, and let $\Delta=\mathrm{diag}[\sigma_1,\sigma_2,\cdots,\sigma_r]$, where $\sigma_1 \geq \sigma_2 \geq \cdots \geq \sigma_r$ are the positive singular values of $A$.
  • Find a unitary matrix $U \in U^{m\times m}$ (or $V \in U^{n\times n}$) such that $U^HAA^HU=\mathrm{diag}[\sigma_1^2,\sigma_2^2,\cdots,\sigma_r^2,0,\cdots,0]$ (or $V^HA^HAV=\mathrm{diag}[\sigma_1^2,\sigma_2^2,\cdots,\sigma_r^2,0,\cdots,0]$).
  • Partition $U=[\begin{matrix}U_1&U_2\end{matrix}]$ (or $V=[\begin{matrix}V_1&V_2\end{matrix}]$), where $U_1$ ($V_1$) consists of the first $r$ columns of $U$ ($V$). Let $V_1=A^HU_1\Delta^{-H}$ (or $U_1=AV_1\Delta^{-1}$); then $V_1$ ($U_1$) is sub-unitary. Find $V_2\in U_{n-r}^{n\times (n-r)}$ (or $U_2\in U_{m-r}^{m\times (m-r)}$) such that $V=[\begin{matrix}V_1&V_2\end{matrix}]\in U^{n\times n}$ (or $U=[\begin{matrix}U_1&U_2\end{matrix}]\in U^{m\times m}$).
  • Then $A=U\left[\begin{matrix}\Delta&0\\0&0\end{matrix}\right]V^H$.
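The steps above can be sketched in NumPy as follows. This is only an illustration of the $AA^H$ route under stated assumptions: the function name `svd_via_eig`, the rank tolerance `tol`, and the use of a full QR factorization to complete $V_1$ to a unitary matrix are choices made here, not part of the original text. In practice `np.linalg.svd` is the numerically preferable tool.

```python
import numpy as np

def svd_via_eig(A, tol=1e-10):
    """Sketch of the SVD steps using the eigendecomposition of A A^H (hypothetical helper)."""
    A = np.asarray(A, dtype=complex)
    m, n = A.shape

    # Steps 1-2: eigenvalues/eigenvectors of the Hermitian matrix A A^H,
    # reordered so that the eigenvalues are decreasing.
    lam, U = np.linalg.eigh(A @ A.conj().T)
    order = np.argsort(lam)[::-1]
    lam, U = lam[order], U[:, order]

    r = int(np.sum(lam > tol))       # numerical rank (assumed tolerance)
    sigma = np.sqrt(lam[:r])         # positive singular values
    U1 = U[:, :r]

    # Step 3: V1 = A^H U1 Delta^{-H}; Delta is real diagonal, so this
    # amounts to dividing column i by sigma_i.
    V1 = A.conj().T @ U1 / sigma

    # Complete V1 to a unitary V: the last n - r columns of the full Q
    # factor span the orthogonal complement of the columns of V1.
    Q, _ = np.linalg.qr(V1, mode="complete")
    V = np.hstack([V1, Q[:, r:]])

    # Step 4: assemble the m x n block matrix [[Delta, 0], [0, 0]].
    S = np.zeros((m, n))
    S[:r, :r] = np.diag(sigma)
    return U, S, V

A = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
U, S, V = svd_via_eig(A)
reconstruction_error = np.linalg.norm(U @ S @ V.conj().T - A)
```

For a well-conditioned full-rank example like this one, `reconstruction_error` should be at the level of floating-point rounding.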

Geometric meaning (SVD)

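View $\Delta$ as a scaling matrix and $U$ and $V$ as rotation matrices (more precisely, unitary matrices, which may also include a reflection).
Thus, for $M=AN$ with $A=U\Delta V^H$, the product can be read as the process of transforming an image $N$ into another image $M$: first a rotation by $V^H$, then a scaling by $\Delta$, and finally a rotation by $U$ to obtain the final image.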
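As a small illustration of this three-step reading, the sketch below applies $V^H$, then the scaling, then $U$ to points on the unit circle and compares the result with $AN$ directly. The matrix `A` and the sampled "image" `N` are assumptions chosen for the example.

```python
import numpy as np

A = np.array([[3.0, 1.0],
              [1.0, 2.0]])
U, s, Vh = np.linalg.svd(A)          # A = U @ diag(s) @ Vh

# "Image" N: 100 points on the unit circle, stored as columns (2 x 100).
theta = np.linspace(0.0, 2.0 * np.pi, 100)
N = np.vstack([np.cos(theta), np.sin(theta)])

step1 = Vh @ N                # rotation/reflection: the circle stays a circle
step2 = np.diag(s) @ step1    # scaling along the axes: the circle becomes an ellipse
M = U @ step2                 # final rotation/reflection of the ellipse
```

The columns of `M` lie on an ellipse whose semi-axes have lengths `s[0]` and `s[1]`.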


QR decomposition


Cholesky decomposition


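Variable (argument): in elementary mathematics, a variable is a symbol used to represent a value; the value may be arbitrary, or it may be unspecified or not yet determined.

Unless marked otherwise, vectors are column vectors.
$\mathrm{vec}(\cdot)$ turns a matrix into a (column) vector; $\mathrm{rvec}(\cdot)$ turns a matrix into a row vector.
$\mathrm{vec}(\cdot)$ is called the vectorization operator; it can expand a matrix either row by row or column by column.
$\mathrm{unvec}(\cdot)$ turns a vector back into a matrix; $\mathrm{unrvec}(\cdot)$ turns a row vector back into a matrix.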

Let $A=(a_{ij})\in R^{m\times n}$. Arranging the entries of $A$ in row order into a vector
$$\mathrm{vec}A=(a_{11},a_{12},\cdots,a_{1n},a_{21},a_{22},\cdots,a_{2n},\cdots,a_{m1},a_{m2},\cdots,a_{mn})^T$$
gives the row-wise expansion of $A$.

Let $A=(a_{ij})\in R^{m\times n}$. Arranging the entries of $A$ in column order into a vector
$$\mathrm{vec}A=(a_{11},a_{21},\cdots,a_{m1},a_{12},a_{22},\cdots,a_{m2},\cdots,a_{1n},a_{2n},\cdots,a_{mn})^T$$
gives the column-wise expansion of $A$.
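In NumPy terms, the two expansions correspond to the two memory orders of `flatten`, and `unvec` corresponds to `reshape` with the matching order. The snippet below is a minimal sketch with an arbitrary example matrix; note that NumPy 1-D arrays do not distinguish row vectors from column vectors.

```python
import numpy as np

A = np.array([[1, 2, 3],
              [4, 5, 6]])                 # m = 2, n = 3

vec_rows = A.flatten(order="C")           # row-wise expansion
vec_cols = A.flatten(order="F")           # column-wise expansion

# unvec: rebuild the matrix from each vector, using the matching order.
A_from_rows = vec_rows.reshape(A.shape, order="C")
A_from_cols = vec_cols.reshape(A.shape, order="F")
```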

Numerator and denominator layouts only come into play when a vector is differentiated with respect to a vector.
Numerator layout (also called the Jacobian form): for $y_{m\times 1}$ and $x_{n\times 1}$, the Jacobian form is $\frac{\partial y}{\partial x^T}$; multiplying the dimensions of $y$ and $x^T$ gives $m \times 1 \times 1 \times n = m\times n$. The numerator is left as is (not transposed), hence the name numerator layout (a personal mnemonic).
Denominator layout (also called the Hessian form): for $y_{m\times 1}$ and $x_{n\times 1}$, the Hessian form is $\frac{\partial y^T}{\partial x}$; multiplying the dimensions of $x$ and $y^T$ gives $n \times 1 \times 1 \times m = n\times m$. The denominator is left as is (not transposed), hence the name denominator layout (a personal mnemonic). Some authors call this layout the gradient, to distinguish it from the Jacobian form.


Cogradient matrix

The numerator is kept as is; the denominator is transposed.

$\frac{\partial\,\mathrm{Scalar}}{\partial\,\mathrm{Vector/Matrix}}$
The $1\times m$ row-vector partial derivative operator:
$$D_x \overset{def}{=}\frac{\partial}{\partial x^T}\overset{\Delta}{=}\left[\frac{\partial}{\partial x_1},\cdots,\frac{\partial}{\partial x_m}\right]$$

Row-vector partial derivative of a scalar function $f(x)$:
$$D_x f(x)=\frac{\partial}{\partial x^T} f(x)=\left[\frac{\partial f(x)}{\partial x_1},\cdots,\frac{\partial f(x)}{\partial x_m}\right]$$
Row-vector partial derivative of a real-valued scalar function $f(X)$ of a matrix variable:
$$D_{\mathrm{vec}^T(X)}f(X)=\frac{\partial}{\partial\,\mathrm{vec}^T(X)} f(X)=\left[\frac{\partial f(X)}{\partial X_{11}},\cdots,\frac{\partial f(X)}{\partial X_{m1}},\cdots,\frac{\partial f(X)}{\partial X_{1n}},\cdots,\frac{\partial f(X)}{\partial X_{mn}}\right]$$
Partial derivative of a real-valued scalar function $f(X)$ at $X$, in matrix form:
$$D_Xf(X)=\left[\begin{matrix}\frac{\partial f(X)}{\partial X_{11}} & \cdots &\frac{\partial f(X)}{\partial X_{m1}}\\ \vdots & \ddots & \vdots \\ \frac{\partial f(X)}{\partial X_{1n}} & \cdots &\frac{\partial f(X)}{\partial X_{mn}}\end{matrix}\right] \in R^{n\times m}$$
$$D_Xf(X)=\frac{\partial}{\partial X^T} f(X)=\left[\frac{\partial f(X)}{\partial X_{ji}}\right]_{j=1,i=1}^{m,n}$$
Remark: $\mathrm{vec}^T(X)=\left[\mathrm{vec}(X)\right]^T$. $D_{\mathrm{vec}^T(X)}f(X)$ and $D_Xf(X)$ are, respectively, the row-vector partial derivative and the Jacobian matrix of the real-valued scalar function $f(X)$ at $X$.

Gradient matrix

The denominator is kept as is; the numerator is transposed.

$\frac{\partial\,\mathrm{Scalar}}{\partial\,\mathrm{Vector/Matrix}}$

The $m \times 1$ column-vector partial derivative operator, customarily called the gradient operator:
$$\nabla_x \overset{def}{=}\frac{\partial}{\partial x}=\left[\frac{\partial}{\partial x_1},\cdots,\frac{\partial}{\partial x_m}\right]^T$$

Column-vector partial derivative of a scalar function $f(x)$, customarily called the gradient vector of $f(x)$:
$$\nabla_xf(x) \overset{def}{=} \frac{\partial}{\partial x} f(x)=\left[\frac{\partial f(x)}{\partial x_1},\cdots,\frac{\partial f(x)}{\partial x_m}\right]^T$$
Column-vector partial derivative of a real-valued scalar function $f(X)$ of a matrix variable, customarily called the column-vector form of the gradient matrix of $f(X)$:
$$\mathrm{vec}(\nabla_Xf(X))=\frac{\partial f(X)}{\partial\,\mathrm{vec}(X)}=\left[\frac{\partial f(X)}{\partial X_{11}},\cdots,\frac{\partial f(X)}{\partial X_{m1}},\cdots,\frac{\partial f(X)}{\partial X_{1n}},\cdots,\frac{\partial f(X)}{\partial X_{mn}}\right]^T$$
Partial derivative of a real-valued scalar function $f(X)$ at $X$, customarily called the gradient matrix of $f(X)$ at $X$:
$$\nabla_Xf(X)=\left[\begin{matrix}\frac{\partial f(X)}{\partial X_{11}} & \cdots & \frac{\partial f(X)}{\partial X_{1n}} \\ \vdots & \ddots &\vdots \\ \frac{\partial f(X)}{\partial X_{m1}} & \cdots & \frac{\partial f(X)}{\partial X_{mn}}\end{matrix}\right] \in R^{m \times n}$$
$$\nabla_Xf(X)=\frac{\partial}{\partial X}f(X)=\left[\frac{\partial f(X)}{\partial X_{ij}}\right]_{i=1,j=1}^{m,n}$$
Turning the matrix variable $X$ into a column vector gives the gradient operator with respect to the matrix variable $X$:
$$\nabla_{\mathrm{vec}(X)}=\frac{\partial}{\partial\,\mathrm{vec}(X)}=\left[\frac{\partial}{\partial X_{11}},\cdots,\frac{\partial}{\partial X_{m1}},\cdots,\frac{\partial}{\partial X_{1n}},\cdots,\frac{\partial}{\partial X_{mn}}\right]^T$$

Clearly, $\nabla_Xf(X)=D_X^Tf(X)$: the gradient matrix and the Jacobian matrix of a scalar function $f(X)$ are transposes of each other. Because of this relationship, the row-vector partial derivative is the covariant form of the column gradient vector (covariant form of the gradient vector), or cogradient vector for short. Likewise, the Jacobian matrix is sometimes called the covariant form of the gradient matrix, or simply the cogradient matrix. The cogradient is a covariant operator: it is not itself the gradient, but it is the transpose of the gradient.
For this reason, the Jacobian operators $\frac{\partial}{\partial x^T}$ and $\frac{\partial}{\partial X^T}$ are also called (row) partial derivative operators, covariant forms of the gradient operator, or cogradient operators.
$$\nabla_Xf(X)=\mathrm{unvec}\left(D_{\mathrm{vec}^T(X)}^Tf(X)\right)$$
This identity says that the gradient matrix of a scalar function $f(X)$ is obtained by reshaping the transpose of the row-vector partial derivative (a column vector) back into a matrix.

Let $D_{\mathrm{vec}^T(X)}f(X)=[d_1,\cdots,d_{mn}]$. Since $\mathrm{vec}(X)$ stacks the $n$ columns of $X$, each of length $m$, the $(i,j)$ entry of the gradient matrix is
$$[\nabla_Xf(X)]_{i,j}=d_{i+(j-1)m},\qquad \begin{cases}i=1,\cdots,m \\ j=1,\cdots,n\end{cases}$$
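The relation between the row-vector partial derivative, the gradient matrix and the Jacobian matrix can be checked numerically. The sketch below is an illustration under assumptions made here: the test function $f(X)=a^TXb$ (whose gradient matrix is $ab^T$), the helper name `row_partial`, and the use of central differences are all choices for the example. Indices in the code are 0-based, so the 1-based position $i+(j-1)m$ becomes `i + j * m`.

```python
import numpy as np

def row_partial(f, X, h=1e-6):
    """Central-difference estimate of D_{vec^T(X)} f(X), entries in column-wise order."""
    m, n = X.shape
    d = np.zeros(m * n)
    for k in range(m * n):
        i, j = k % m, k // m          # 0-based (i, j) of the k-th entry of vec(X)
        E = np.zeros_like(X)
        E[i, j] = h
        d[k] = (f(X + E) - f(X - E)) / (2 * h)
    return d

rng = np.random.default_rng(0)
m, n = 3, 2
a, b = rng.normal(size=m), rng.normal(size=n)
X = rng.normal(size=(m, n))

def f(X):
    return a @ X @ b                  # f(X) = a^T X b, gradient matrix a b^T

d = row_partial(f, X)                 # row vector [d_1, ..., d_mn]
grad = d.reshape((m, n), order="F")   # unvec of the transposed row vector: m x n
jac = grad.T                          # Jacobian matrix D_X f(X): n x m
```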
The negative of the gradient direction is called the gradient flow of the variable $x$, written $\dot x=-\nabla_xf(x)$.

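From the definition of the gradient vector it follows that:
(1) the gradient of a real-valued scalar function of a vector variable is a column vector;
(2) each component of the gradient vector gives the rate of change of the scalar function along that component's direction.

Important property: the gradient vector gives the direction and rate of the fastest increase of the real-valued scalar function $f(x)$ as the variable changes. Conversely, the negative of the gradient (the negative gradient) gives the direction and rate of the fastest decrease of $f(x)$. This is the basis of the gradient descent method.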
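To make the last remark concrete, here is a minimal gradient descent sketch. The quadratic objective $f(x)=\tfrac12 x^TQx-b^Tx$, the fixed step size and the iteration count are assumptions chosen for illustration; its gradient is $\nabla_xf(x)=Qx-b$ and its minimizer solves $Qx=b$.

```python
import numpy as np

Q = np.array([[3.0, 1.0],
              [1.0, 2.0]])            # symmetric positive definite (assumed example)
b = np.array([1.0, -1.0])

def grad_f(x):
    """Gradient of f(x) = 0.5 * x^T Q x - b^T x."""
    return Q @ x - b

x = np.zeros(2)
step = 0.1                            # fixed step size, small enough for this Q
for _ in range(500):
    x = x - step * grad_f(x)          # move along the negative gradient

x_star = np.linalg.solve(Q, b)        # exact minimizer, for comparison
```

Each update is a discrete step along the gradient flow $\dot x=-\nabla_xf(x)$.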

A brief illustration of the relationship between the cogradient matrix and the gradient matrix.
For a real-valued scalar function $f(X)$ with an $m \times n$ matrix variable $X$:

The layout of $D_Xf(X)$ follows the steps
$$\mathrm{vec}(X):\left[X_{11},\cdots,X_{m1},\cdots,X_{1n},\cdots,X_{mn}\right]^T \to \mathrm{vec}^T(X):\left[X_{11},\cdots,X_{m1},\cdots,X_{1n},\cdots,X_{mn}\right] \to \mathrm{unvec}(\mathrm{vec}^T(X)):\left[\begin{matrix}X_{11} &\cdots &X_{m1}\\ \vdots & \ddots & \vdots\\ X_{1n} & \cdots &X_{mn}\end{matrix}\right]$$

The layout of $\nabla_Xf(X)$ follows the steps
$$\mathrm{vec}(X):\left[X_{11},\cdots,X_{m1},\cdots,X_{1n},\cdots,X_{mn}\right]^T \to \mathrm{unvec}(\mathrm{vec}(X)):\left[\begin{matrix}X_{11} &\cdots &X_{1n}\\ \vdots & \ddots & \vdots\\ X_{m1} & \cdots &X_{mn}\end{matrix}\right]$$

Therefore, $D^T=\nabla$.


$\frac{\partial\,\mathrm{Vector/Matrix}}{\partial\,\mathrm{Vector/Matrix}}$


Cogradient matrix of a real-valued vector function


For a $p \times 1$ real-valued vector function $f(x)=[f_1(x),\cdots,f_p(x)]^T$, applying the row-vector partial derivative formula for real-valued scalar functions to each component $f_i(x)$, $i=1,\cdots,p$, directly defines the partial derivative of the vector function:

$$D_xf(x)=\frac{\partial f(x)}{\partial x^T}=\left[\begin{array}{c}\frac{\partial f_1(x)}{\partial x^T}\\ \vdots \\ \frac{\partial f_p(x)}{\partial x^T}\end{array}\right]=\left[\begin{matrix}\frac{\partial f_1(x)}{\partial x_1} & \cdots &\frac{\partial f_1(x)}{\partial x_m}\\ \vdots & \ddots & \vdots \\ \frac{\partial f_p(x)}{\partial x_1} & \cdots &\frac{\partial f_p(x)}{\partial x_m}\end{matrix}\right] \in R^{p \times m}$$

This is called the Jacobian matrix, or cogradient matrix, of the vector function $f(x)$ at $x$. Its $(i,j)$ entry is the partial derivative of the $i$-th component $f_i(x)$ of $f(x)$ with respect to the $j$-th component of the vector variable $x$, that is, $[D_xf(x)]_{ij}=\frac{\partial f_i(x)}{\partial x_j}$.
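A finite-difference sketch of this definition follows. The helper `jacobian` and the example function `f` (with $p=2$, $m=3$) are hypothetical, introduced only to show that the result has shape $p\times m$ and that entry $(i,j)$ is $\partial f_i/\partial x_j$.

```python
import numpy as np

def jacobian(f, x, h=1e-6):
    """Central-difference estimate of D_x f(x); column j holds the partials w.r.t. x_j."""
    x = np.asarray(x, dtype=float)
    p = np.asarray(f(x)).size
    J = np.zeros((p, x.size))
    for j in range(x.size):
        e = np.zeros_like(x)
        e[j] = h
        J[:, j] = (f(x + e) - f(x - e)) / (2 * h)
    return J

def f(x):
    # Illustrative vector function R^3 -> R^2.
    return np.array([x[0] * x[1], np.sin(x[2]) + x[0] ** 2])

x0 = np.array([1.0, 2.0, 0.5])
J = jacobian(f, x0)

# Hand-derived Jacobian of the example function, for comparison.
J_exact = np.array([[x0[1],      x0[0], 0.0],
                    [2 * x0[0],  0.0,   np.cos(x0[2])]])
```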


Cogradient matrix of a real-valued matrix function


Vectorize the matrix column-wise, then take partial derivatives.

Consider a real-valued matrix function $F(X)=[F_{kl}]_{k=1,l=1}^{p,q}\in R^{p \times q}$ with matrix variable $X\in R^{m \times n}$.

To use the definitions of the row-vector partial derivative and the Jacobian matrix of a vector function, first convert the $p \times q$ matrix function into a $pq \times 1$ column vector by column-wise vectorization:
$$f(\mathrm{vec}X) \overset{\Delta}{=} \mathrm{vec}(F(X))=[F_{11}(X),\cdots,F_{p1}(X),\cdots,F_{1q}(X),\cdots,F_{pq}(X)]^T\in R^{pq \times 1}$$

The row-vector partial derivative of the matrix function $F(X)$ is then defined as
$$D_{\mathrm{vec}^T(X)}F(X) \overset{\Delta}{=}\frac{\partial f(\mathrm{vec}X)}{\partial\,\mathrm{vec}^T(X)}=\frac{\partial\,\mathrm{vec}(F(X))}{\partial\,\mathrm{vec}^T(X)}\in R^{pq \times mn}$$
Here $pq \times 1$ is the dimension of the numerator and $1 \times mn$ the dimension of the denominator; in numerator layout the overall dimension is $pq \times 1 \times 1 \times mn=pq\times mn$.

Written out in full,
$$D_{\mathrm{vec}^T(X)}F(X)=\left[\begin{array}{c}\frac{\partial F_{11}}{\partial\,\mathrm{vec}^T(X)}\\ \vdots \\ \frac{\partial F_{p1}}{\partial\,\mathrm{vec}^T(X)}\\ \vdots \\ \frac{\partial F_{1q}}{\partial\,\mathrm{vec}^T(X)}\\ \vdots \\ \frac{\partial F_{pq}}{\partial\,\mathrm{vec}^T(X)}\end{array}\right]=\left[\begin{matrix}\frac{\partial F_{11}}{\partial X_{11}} & \cdots &\frac{\partial F_{11}}{\partial X_{m1}} & \cdots&\frac{\partial F_{11}}{\partial X_{1n}} & \cdots&\frac{\partial F_{11}}{\partial X_{mn}} \\ \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots \\ \frac{\partial F_{p1}}{\partial X_{11}} & \cdots &\frac{\partial F_{p1}}{\partial X_{m1}} & \cdots&\frac{\partial F_{p1}}{\partial X_{1n}} & \cdots&\frac{\partial F_{p1}}{\partial X_{mn}} \\ \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots \\ \frac{\partial F_{1q}}{\partial X_{11}} & \cdots &\frac{\partial F_{1q}}{\partial X_{m1}} & \cdots&\frac{\partial F_{1q}}{\partial X_{1n}} & \cdots&\frac{\partial F_{1q}}{\partial X_{mn}} \\ \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots \\ \frac{\partial F_{pq}}{\partial X_{11}} & \cdots &\frac{\partial F_{pq}}{\partial X_{m1}} & \cdots&\frac{\partial F_{pq}}{\partial X_{1n}} & \cdots&\frac{\partial F_{pq}}{\partial X_{mn}}\end{matrix}\right]$$
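As a sanity check of the $pq\times mn$ layout, consider the linear matrix function $F(X)=AXB$, an example chosen here and not taken from the original text. By the standard identity $\mathrm{vec}(AXB)=(B^T\otimes A)\,\mathrm{vec}(X)$ for column-wise vectorization, its Jacobian is $B^T\otimes A$. The sketch below compares this with a central-difference estimate; all sizes and matrices are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
p, m, n, q = 2, 3, 4, 2
A = rng.normal(size=(p, m))
B = rng.normal(size=(n, q))
X = rng.normal(size=(m, n))

def vec(M):
    return M.flatten(order="F")           # column-wise vectorization

def F(X):
    return A @ X @ B                      # matrix function with values in R^{p x q}

# Numerical Jacobian: column k holds d vec(F) / d vec(X)_k.
h = 1e-6
J = np.zeros((p * q, m * n))
for k in range(m * n):
    e = np.zeros(m * n)
    e[k] = h
    dX = e.reshape((m, n), order="F")     # unvec of the k-th unit perturbation
    J[:, k] = (vec(F(X + dX)) - vec(F(X - dX))) / (2 * h)

J_exact = np.kron(B.T, A)                 # closed form from vec(AXB) = (B^T kron A) vec(X)
```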


Hessian matrix of a real-valued scalar function (symmetric matrix)


$\frac{\partial\,\mathrm{Scalar}}{\partial\,\mathrm{Vector}}$


The second-order partial derivative of a real-valued function $f(x)$ with respect to an $m \times 1$ real vector $x$ is a matrix made up of $m^2$ second-order partial derivatives (called the Hessian matrix), defined as
$$\frac{\partial^2f(x)}{\partial x\,\partial x^T}=\frac{\partial}{\partial x^T}\left[\frac{\partial f(x)}{\partial x}\right]$$
or written as
$$\nabla_x^2f(x)=D_x(\nabla_xf(x))=D_xg(x).$$
That is, the Hessian matrix of a real-valued scalar function $f(x)$ is the cogradient matrix (Jacobian matrix) of the gradient vector function $g(x)=\nabla_xf(x)$.
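The statement "Hessian = Jacobian of the gradient" can be illustrated numerically. In the sketch below, the example function $f(x)=x_1^2x_2+x_2^3$, its hand-derived gradient `grad_f`, and the helper name `hessian` are assumptions made for the illustration.

```python
import numpy as np

def grad_f(x):
    """Gradient of the example function f(x) = x1^2 * x2 + x2^3."""
    return np.array([2 * x[0] * x[1], x[0] ** 2 + 3 * x[1] ** 2])

def hessian(grad, x, h=1e-6):
    """Central-difference Jacobian of the gradient: H[i, j] = d g_i / d x_j."""
    m = x.size
    H = np.zeros((m, m))
    for j in range(m):
        e = np.zeros(m)
        e[j] = h
        H[:, j] = (grad(x + e) - grad(x - e)) / (2 * h)
    return H

x0 = np.array([1.0, 2.0])
H = hessian(grad_f, x0)

# Hand-derived Hessian of the example function, for comparison.
H_exact = np.array([[2 * x0[1], 2 * x0[0]],
                    [2 * x0[0], 6 * x0[1]]])
```

Up to finite-difference error, `H` is symmetric, as expected for a function with continuous second-order partial derivatives.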


$\frac{\partial\,\mathrm{Scalar}}{\partial\,\mathrm{Matrix}}$


The second-order partial derivative of a real-valued function $f(X)$ with respect to an $m \times n$ real matrix $X$ is a matrix made up of $(mn)^2$ second-order partial derivatives (called the Hessian matrix), defined as

$$\frac{\partial^2f(X)}{\partial X\,\partial X^T}=\frac{\partial}{\partial X^T}\left[\frac{\partial f(X)}{\partial X}\right]$$
or written as
$$\nabla_X^2f(X)=\nabla_{X^T}(\nabla_Xf(X))=D_XG(X).$$
That is, the Hessian matrix of a real-valued scalar function $f(X)$ is the cogradient matrix (Jacobian matrix) of the gradient function $G(X)=\nabla_Xf(X)$.


Total differential of a multivariable function: if $f(x_1,\cdots,x_m)$ is differentiable at the point $(x_1,\cdots,x_m)$, its total differential is
$$df(x_1,\cdots,x_m)=\frac{\partial f}{\partial x_1}dx_1+\cdots+\frac{\partial f}{\partial x_m}dx_m$$

Real matrix differential:

For a real-valued scalar function $f(x)$ with variable $x=[x_1,\cdots,x_m]^T \in R^m$:

$$df(x_1,\cdots,x_m)=\frac{\partial f(x)}{\partial x_1}dx_1+\cdots+\frac{\partial f(x)}{\partial x_m}dx_m=\left[\frac{\partial f(x)}{\partial x_1},\cdots,\frac{\partial f(x)}{\partial x_m}\right]\left[\begin{array}{c}dx_1\\ \vdots \\ dx_m\end{array}\right]$$

or, in short, $df(x)=\frac{\partial f(x)}{\partial x^T}dx$, where
$$\frac{\partial f(x)}{\partial x^T}=\left[\frac{\partial f(x)}{\partial x_1},\cdots,\frac{\partial f(x)}{\partial x_m}\right],\qquad dx=[dx_1,\cdots,dx_m]^T.$$
For a real-valued scalar function $f(X)$ whose variable is the $m \times n$ real matrix $X=[x_1,\cdots,x_n] \in R^{m \times n}$, write $x_j=[x_{1j},\cdots,x_{mj}]^T$, $j=1,\cdots,n$. Then

$$\begin{aligned}df(X)&=\frac{\partial f(X)}{\partial x_1^T}dx_1+\cdots+\frac{\partial f(X)}{\partial x_n^T}dx_n\\&=\left[\frac{\partial f(X)}{\partial X_{11}},\cdots,\frac{\partial f(X)}{\partial X_{m1}}\right]\left[\begin{array}{c}dX_{11}\\ \vdots \\ dX_{m1}\end{array}\right]+\cdots+\left[\frac{\partial f(X)}{\partial X_{1n}},\cdots,\frac{\partial f(X)}{\partial X_{mn}}\right]\left[\begin{array}{c}dX_{1n}\\ \vdots \\ dX_{mn}\end{array}\right]\\&=\left[\frac{\partial f(X)}{\partial X_{11}},\cdots,\frac{\partial f(X)}{\partial X_{m1}},\cdots,\frac{\partial f(X)}{\partial X_{1n}},\cdots,\frac{\partial f(X)}{\partial X_{mn}}\right]\left[\begin{array}{c}dX_{11}\\ \vdots \\dX_{m1}\\ \vdots \\dX_{1n}\\ \vdots \\dX_{mn}\end{array}\right]\end{aligned}$$

or, in short,
$$df(X)=\mathrm{rvec}(A)\cdot \mathrm{vec}(dX),$$
where $\mathrm{rvec}(A)$ is the row vectorization of the Jacobian matrix, with
$$dX=\left[\begin{matrix}dX_{11} &\cdots &dX_{1n}\\ \vdots &\ddots & \vdots\\ dX_{m1}& \cdots &dX_{mn}\end{matrix}\right]$$
and
$$A=D_Xf(X)=\frac{\partial f(X)}{\partial X^T}=\left[\begin{matrix}\frac{\partial f(X)}{\partial X_{11}} & \cdots &\frac{\partial f(X)}{\partial X_{m1}}\\ \vdots & \ddots &\vdots \\ \frac{\partial f(X)}{\partial X_{1n}} & \cdots &\frac{\partial f(X)}{\partial X_{mn}}\end{matrix}\right].$$

Since $\mathrm{rvec}(A)=(\mathrm{vec}(A^T))^T$ and $\mathrm{tr}(B^TC)=(\mathrm{vec}(B))^T\mathrm{vec}(C)$, it follows that
$$df(X)=\mathrm{rvec}(A)\,\mathrm{vec}(dX)=(\mathrm{vec}(A^T))^T\mathrm{vec}(dX)=\mathrm{tr}(A\,dX).$$

The important point is that
$$\nabla_Xf(X)=\frac{\partial f(X)}{\partial X}=\left[\begin{matrix}\frac{\partial f(X)}{\partial X_{11}} & \cdots & \frac{\partial f(X)}{\partial X_{1n}}\\ \vdots & \ddots &\vdots\\ \frac{\partial f(X)}{\partial X_{m1}} & \cdots & \frac{\partial f(X)}{\partial X_{mn}}\end{matrix}\right]=A^T.$$
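The identity $df(X)=\mathrm{tr}(A\,dX)$ with $A=(\nabla_Xf(X))^T$ can be checked numerically. The sketch below uses the example $f(X)=\mathrm{tr}(X^TX)=\sum_{ij}X_{ij}^2$, whose gradient matrix is $2X$; the function, the sizes and the small perturbation `dX` are assumptions made for the illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 3, 2
X = rng.normal(size=(m, n))
dX = 1e-6 * rng.normal(size=(m, n))       # small perturbation of X

def f(X):
    return np.sum(X ** 2)                 # f(X) = tr(X^T X)

def vec(M):
    return M.flatten(order="F")           # column-wise vectorization

grad = 2 * X                              # gradient matrix, m x n
A = grad.T                                # Jacobian matrix D_X f(X), n x m

df_actual = f(X + dX) - f(X)              # actual change of f
df_trace = np.trace(A @ dX)               # tr(A dX)
df_vec = vec(A.T) @ vec(dX)               # rvec(A) vec(dX) = vec(A^T)^T vec(dX)

# The trace identity tr(B^T C) = vec(B)^T vec(C) on arbitrary matrices.
B = rng.normal(size=(m, n))
C = rng.normal(size=(m, n))
trace_lhs = np.trace(B.T @ C)
trace_rhs = vec(B) @ vec(C)
```

`df_trace` and `df_vec` agree up to rounding, and both differ from `df_actual` only by the second-order term $\sum_{ij}dX_{ij}^2$.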
