Kullback-Leibler Divergence (KL Divergence) Between Two Multivariate Gaussian Distributions

The two Gaussian distributions over $x \in \mathbb{R}^n$ are:

$$
p(x)=N(x;\mu,\Sigma)=\frac{1}{(2\pi)^{\frac{n}{2}}|\Sigma|^{\frac{1}{2}}}\exp\bigg\{-\frac{1}{2}(x-\mu)^T\Sigma^{-1}(x-\mu)\bigg\}
$$

$$
q(x)=N(x;m,L)=\frac{1}{(2\pi)^{\frac{n}{2}}|L|^{\frac{1}{2}}}\exp\bigg\{-\frac{1}{2}(x-m)^TL^{-1}(x-m)\bigg\}
$$


Properties of the matrix trace (tr):

$$
\begin{aligned}
\mathrm{tr}(\alpha A+\beta B) &= \alpha\,\mathrm{tr}(A)+\beta\,\mathrm{tr}(B) &&\cdots\cdots ① \\
\mathrm{tr}(A) &= \mathrm{tr}(A^T) &&\cdots\cdots ② \\
\mathrm{tr}(AB) &= \mathrm{tr}(BA) &&\cdots\cdots ③ \\
\mathrm{tr}(ABC) &= \mathrm{tr}(BCA)=\mathrm{tr}(CAB) &&\cdots\cdots ④ \ (\text{follows from } ③)
\end{aligned}
$$

An important identity: for a vector $\lambda$, the quadratic form $\lambda^TA\lambda$ is a scalar, so it equals its own trace, and the cyclic property ④ then gives

$$
\lambda^TA\lambda=\mathrm{tr}(\lambda^TA\lambda)=\mathrm{tr}(A\lambda\lambda^T) \qquad\cdots\cdots ⑤
$$
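The trace identities ①, ③, ④ and ⑤ are easy to sanity-check numerically. The following is a minimal sketch, assuming NumPy is available; the random matrices `A`, `B`, `C`, the vector `lam`, and the scalars `alpha`, `beta` are arbitrary illustrative choices, not part of the derivation.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4
# Arbitrary square matrices and a vector, chosen only for illustration.
A, B, C = (rng.standard_normal((n, n)) for _ in range(3))
lam = rng.standard_normal(n)
alpha, beta = 2.0, -3.0

checks = {
    # (1) linearity of the trace
    "linearity": np.isclose(
        np.trace(alpha * A + beta * B),
        alpha * np.trace(A) + beta * np.trace(B),
    ),
    # (3) tr(AB) = tr(BA)
    "swap": np.isclose(np.trace(A @ B), np.trace(B @ A)),
    # (4) cyclic permutations of a triple product
    "cyclic": np.isclose(np.trace(A @ B @ C), np.trace(B @ C @ A))
    and np.isclose(np.trace(A @ B @ C), np.trace(C @ A @ B)),
    # (5) quadratic form written as a trace
    "quadratic form": np.isclose(
        lam @ A @ lam, np.trace(A @ np.outer(lam, lam))
    ),
}

for name, ok in checks.items():
    print(f"{name}: {bool(ok)}")
```

Note that ④ only allows cyclic shifts: in general $\mathrm{tr}(ABC)\neq \mathrm{tr}(ACB)$.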
Properties of the expectation $E$ and the covariance $\Sigma$ of a multivariate distribution with mean $\mu=E(x)$:

$$
E(xx^T)=\Sigma+\mu\mu^T \qquad\cdots\cdots ⑥
$$

Proof:

$$
\begin{aligned}
\Sigma &= E\big[(x-\mu)(x-\mu)^T\big] \\
&= E\big(xx^T-x\mu^T-\mu x^T+\mu\mu^T\big) \\
&= E\big(xx^T\big)-\mu\mu^T-\mu\mu^T+\mu\mu^T \\
&= E\big(xx^T\big)-\mu\mu^T
\end{aligned}
$$

For any fixed $n\times n$ matrix $A$:

$$
E\big(x^TAx\big)=\mathrm{tr}\big(A\Sigma\big)+\mu^TA\mu \qquad\cdots\cdots ⑦
$$

Proof:

$$
\begin{aligned}
E\big(x^TAx\big) &= E\big[\mathrm{tr}(x^TAx)\big] \\
&= E\big[\mathrm{tr}(Axx^T)\big] \\
&= \mathrm{tr}\big[E(Axx^T)\big] \\
&= \mathrm{tr}\big[A\,E(xx^T)\big] \\
&= \mathrm{tr}\big[A(\Sigma+\mu\mu^T)\big] \\
&= \mathrm{tr}(A\Sigma)+\mathrm{tr}(A\mu\mu^T) \\
&= \mathrm{tr}(A\Sigma)+\mathrm{tr}(\mu^TA\mu) \\
&= \mathrm{tr}(A\Sigma)+\mu^TA\mu
\end{aligned}
$$
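Identities ⑥ and ⑦ only use the first two moments, so they hold for any distribution with finite second moments, including the empirical distribution of a finite sample. That allows an exact check (up to floating-point error) rather than a statistical one. The sketch below assumes NumPy; the data matrix `X`, the matrix `A`, and the sample size are hypothetical illustration values, and the covariance is the biased one (divided by `N`) so that it matches the empirical expectation.

```python
import numpy as np

rng = np.random.default_rng(1)
n, N = 3, 1000
# Arbitrary correlated sample with a nonzero mean (illustrative only).
X = rng.standard_normal((N, n)) @ rng.standard_normal((n, n))
X = X + np.array([1.0, -2.0, 0.5])
A = rng.standard_normal((n, n))

# Moments of the empirical distribution (expectation = average over rows).
mu = X.mean(axis=0)
Sigma = (X - mu).T @ (X - mu) / N      # biased covariance, divides by N
second_moment = X.T @ X / N            # E(x x^T)

# Identity (6): E(x x^T) = Sigma + mu mu^T
holds_6 = np.allclose(second_moment, Sigma + np.outer(mu, mu))

# Identity (7): E(x^T A x) = tr(A Sigma) + mu^T A mu
lhs = np.einsum("ni,ij,nj->", X, A, X) / N
rhs = np.trace(A @ Sigma) + mu @ A @ mu
holds_7 = np.isclose(lhs, rhs)

print(bool(holds_6), bool(holds_7))
```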


Definition of the KL divergence:

$$
KL(p\,\|\,q)=E_p\bigg[\log\frac{p(x)}{q(x)}\bigg]
$$
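Because the definition is an expectation under $p$, it can be estimated directly by sampling from $p$ and averaging the log density ratio. The sketch below assumes NumPy; the helper names `log_gauss` and `kl_monte_carlo`, the sample size, and the example parameters are hypothetical choices made for illustration. For this particular pair the exact value is $\frac{1}{2}\big(\log 4-2+1+\frac{1}{2}\big)\approx 0.443$, so the estimate should land close to that.

```python
import numpy as np

def log_gauss(X, mean, cov):
    """Row-wise log N(x; mean, cov) for samples X of shape (N, n)."""
    n = mean.shape[0]
    diff = X - mean
    _, logdet = np.linalg.slogdet(cov)
    # Mahalanobis term (x - mean)^T cov^{-1} (x - mean) for every row.
    maha = np.einsum("ni,ni->n", diff, np.linalg.solve(cov, diff.T).T)
    return -0.5 * (n * np.log(2 * np.pi) + logdet + maha)

def kl_monte_carlo(mu, Sigma, m, L, num_samples=100_000, seed=0):
    """Estimate KL(p||q) = E_p[log p(x) - log q(x)] by sampling from p."""
    rng = np.random.default_rng(seed)
    X = rng.multivariate_normal(mu, Sigma, size=num_samples)
    return np.mean(log_gauss(X, mu, Sigma) - log_gauss(X, m, L))

# Example parameters (illustrative): p = N(0, I), q = N((1, 0), 2I).
mu, Sigma = np.zeros(2), np.eye(2)
m, L = np.array([1.0, 0.0]), 2.0 * np.eye(2)

estimate = kl_monte_carlo(mu, Sigma, m, L)
print(estimate)
```

The estimate is noisy and costs many density evaluations, which is the practical motivation for a closed-form expression.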

The normalizing factors $(2\pi)^{\frac{n}{2}}$ cancel in the density ratio:

$$
\begin{aligned}
\frac{p(x)}{q(x)} &= \frac{\frac{1}{(2\pi)^{\frac{n}{2}}|\Sigma|^{\frac{1}{2}}}\exp\Big\{-\frac{1}{2}(x-\mu)^T\Sigma^{-1}(x-\mu)\Big\}}{\frac{1}{(2\pi)^{\frac{n}{2}}|L|^{\frac{1}{2}}}\exp\Big\{-\frac{1}{2}(x-m)^TL^{-1}(x-m)\Big\}} \\
&= \bigg(\frac{|L|}{|\Sigma|}\bigg)^{\frac{1}{2}}\exp\bigg\{-\frac{1}{2}(x-\mu)^T\Sigma^{-1}(x-\mu)-\Big[-\frac{1}{2}(x-m)^TL^{-1}(x-m)\Big]\bigg\} \\
&= \bigg(\frac{|L|}{|\Sigma|}\bigg)^{\frac{1}{2}}\exp\bigg\{\frac{1}{2}\Big[(x-m)^TL^{-1}(x-m)-(x-\mu)^T\Sigma^{-1}(x-\mu)\Big]\bigg\}
\end{aligned}
$$

Taking the (natural) logarithm:

$$
\begin{aligned}
\log\frac{p(x)}{q(x)} &= \log\Bigg(\bigg(\frac{|L|}{|\Sigma|}\bigg)^{\frac{1}{2}}\exp\bigg\{\frac{1}{2}\Big[(x-m)^TL^{-1}(x-m)-(x-\mu)^T\Sigma^{-1}(x-\mu)\Big]\bigg\}\Bigg) \\
&= \frac{1}{2}\log\frac{|L|}{|\Sigma|}+\frac{1}{2}\Big[(x-m)^TL^{-1}(x-m)-(x-\mu)^T\Sigma^{-1}(x-\mu)\Big]
\end{aligned}
$$

Taking the expectation under $p$, and using $E_p(x)=\mu$ so that $x$ no longer appears once the expectation has been evaluated:

$$
\begin{aligned}
E_p\bigg[\log\frac{p(x)}{q(x)}\bigg]
&= E_p\bigg(\frac{1}{2}\log\frac{|L|}{|\Sigma|}+\frac{1}{2}\Big[(x-m)^TL^{-1}(x-m)-(x-\mu)^T\Sigma^{-1}(x-\mu)\Big]\bigg) \\
&= \frac{1}{2}\log\frac{|L|}{|\Sigma|}+\frac{1}{2}E_p\Big((x-m)^TL^{-1}(x-m)-(x-\mu)^T\Sigma^{-1}(x-\mu)\Big) \\
&= \frac{1}{2}\log\frac{|L|}{|\Sigma|}+\frac{1}{2}E_p\Big(\mathrm{tr}\big[L^{-1}(x-m)(x-m)^T\big]-\mathrm{tr}\big[\Sigma^{-1}(x-\mu)(x-\mu)^T\big]\Big) &&\cdots\cdots (\text{property } ⑤) \\
&= \frac{1}{2}\log\frac{|L|}{|\Sigma|}+\frac{1}{2}\mathrm{tr}\Big(E_p\big[L^{-1}(x-m)(x-m)^T\big]\Big)-\frac{1}{2}\mathrm{tr}\Big(E_p\big[\Sigma^{-1}(x-\mu)(x-\mu)^T\big]\Big) &&\cdots\cdots (\text{property } ①) \\
&= \frac{1}{2}\log\frac{|L|}{|\Sigma|}+\frac{1}{2}\mathrm{tr}\Big(L^{-1}E_p\big[xx^T-mx^T-xm^T+mm^T\big]\Big)-\frac{1}{2}\mathrm{tr}\Big(\Sigma^{-1}E_p\big[(x-\mu)(x-\mu)^T\big]\Big) \\
&= \frac{1}{2}\log\frac{|L|}{|\Sigma|}+\frac{1}{2}\mathrm{tr}\Big(L^{-1}\big[\underbrace{\Sigma+\mu\mu^T}_{\text{property } ⑥}-m\mu^T-\mu m^T+mm^T\big]\Big)-\frac{1}{2}\mathrm{tr}\big(\Sigma^{-1}\Sigma\big) \\
&= \frac{1}{2}\log\frac{|L|}{|\Sigma|}+\frac{1}{2}\mathrm{tr}\Big(L^{-1}\big[\Sigma+\mu\mu^T-m\mu^T-\mu m^T+mm^T\big]\Big)-\frac{n}{2} \\
&= \frac{1}{2}\bigg\{\log\frac{|L|}{|\Sigma|}-n+\mathrm{tr}\big(L^{-1}\Sigma\big)+\mathrm{tr}\big(L^{-1}\mu\mu^T-L^{-1}m\mu^T-L^{-1}\mu m^T+L^{-1}mm^T\big)\bigg\} \\
&= \frac{1}{2}\bigg\{\log\frac{|L|}{|\Sigma|}-n+\mathrm{tr}\big(L^{-1}\Sigma\big)+\mu^TL^{-1}\mu-2\mu^TL^{-1}m+m^TL^{-1}m\bigg\} \\
&= \frac{1}{2}\bigg\{\log\frac{|L|}{|\Sigma|}-n+\mathrm{tr}\big(L^{-1}\Sigma\big)+(\mu-m)^TL^{-1}(\mu-m)\bigg\}
\end{aligned}
$$

The second-to-last step applies properties ①, ② and ④ term by term and uses the symmetry of $L^{-1}$, which gives $\mu^TL^{-1}m=m^TL^{-1}\mu$. The result is the closed form

$$
KL(p\,\|\,q)=\frac{1}{2}\bigg\{\log\frac{|L|}{|\Sigma|}-n+\mathrm{tr}\big(L^{-1}\Sigma\big)+(\mu-m)^TL^{-1}(\mu-m)\bigg\}
$$
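The closed form $KL(p\,\|\,q)=\frac{1}{2}\big\{\log\frac{|L|}{|\Sigma|}-n+\mathrm{tr}(L^{-1}\Sigma)+(\mu-m)^TL^{-1}(\mu-m)\big\}$ translates almost line by line into code. The following is a minimal sketch, assuming NumPy and symmetric positive-definite covariances; the function name `kl_gaussians` and the example parameters are hypothetical. It uses `np.linalg.slogdet` and `np.linalg.solve` instead of explicit determinants and inverses, which is a numerical-stability choice rather than something the derivation requires.

```python
import numpy as np

def kl_gaussians(mu, Sigma, m, L):
    """KL( N(mu, Sigma) || N(m, L) ) from the closed-form expression."""
    n = mu.shape[0]
    diff = mu - m
    # log|L| - log|Sigma| via log-determinants (avoids overflow/underflow).
    _, logdet_Sigma = np.linalg.slogdet(Sigma)
    _, logdet_L = np.linalg.slogdet(L)
    # tr(L^{-1} Sigma) and (mu - m)^T L^{-1} (mu - m) via linear solves.
    trace_term = np.trace(np.linalg.solve(L, Sigma))
    maha_term = diff @ np.linalg.solve(L, diff)
    return 0.5 * (logdet_L - logdet_Sigma - n + trace_term + maha_term)

# Example parameters (illustrative): p = N(0, I), q = N((1, 0), 2I).
mu, Sigma = np.zeros(2), np.eye(2)
m, L = np.array([1.0, 0.0]), 2.0 * np.eye(2)

kl_pq = kl_gaussians(mu, Sigma, m, L)        # KL(p || q)
kl_qp = kl_gaussians(m, L, mu, Sigma)        # KL(q || p), differs in general
kl_pp = kl_gaussians(mu, Sigma, mu, Sigma)   # zero for identical distributions

print(kl_pq, kl_qp, kl_pp)
```

For $n=1$ with $\Sigma=\sigma_p^2$ and $L=\sigma_q^2$, the same expression reduces to the familiar univariate form $\log\frac{\sigma_q}{\sigma_p}+\frac{\sigma_p^2+(\mu-m)^2}{2\sigma_q^2}-\frac{1}{2}$.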
