The two Gaussian distributions are:

$$
p(x)=\mathcal{N}(x;\mu,\Sigma)=\frac{1}{(2\pi)^{\frac{n}{2}}|\Sigma|^{\frac{1}{2}}}\exp\left\{-\frac{1}{2}(x-\mu)^T\Sigma^{-1}(x-\mu)\right\}
$$

$$
q(x)=\mathcal{N}(x;m,L)=\frac{1}{(2\pi)^{\frac{n}{2}}|L|^{\frac{1}{2}}}\exp\left\{-\frac{1}{2}(x-m)^TL^{-1}(x-m)\right\}
$$

Here $x\in\mathbb{R}^n$, $p$ has mean $\mu$ and covariance $\Sigma$, and $q$ has mean $m$ and covariance $L$.
Properties of the matrix trace (tr):

$$
\begin{aligned}
\operatorname{tr}(\alpha A+\beta B)&=\alpha\operatorname{tr}(A)+\beta\operatorname{tr}(B) &&\text{①}\\
\operatorname{tr}(A)&=\operatorname{tr}(A^T) &&\text{②}\\
\operatorname{tr}(AB)&=\operatorname{tr}(BA) &&\text{③}\\
\operatorname{tr}(ABC)&=\operatorname{tr}(BCA)=\operatorname{tr}(CAB) &&\text{④ (follows from ③)}
\end{aligned}
$$

An important formula: for a column vector $\lambda$, the quadratic form $\lambda^TA\lambda$ is a scalar, so it equals its own trace, and the cyclic property ④ gives

$$
\lambda^TA\lambda=\operatorname{tr}(\lambda^TA\lambda)=\operatorname{tr}(A\lambda\lambda^T) \qquad\text{⑤}
$$
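The trace identities ① to ⑤ are easy to sanity-check numerically. The following is a minimal sketch using NumPy; the matrix size, the random seed, and the function name `trace_identities_hold` are illustrative assumptions, not part of the derivation.

```python
import numpy as np

# Arbitrary test data (assumed sizes and seed, chosen only for illustration).
rng = np.random.default_rng(0)
n = 4
A, B, C = (rng.standard_normal((n, n)) for _ in range(3))
lam = rng.standard_normal(n)
alpha, beta = 2.0, -0.5


def trace_identities_hold():
    """Return True if identities 1-5 hold up to floating-point tolerance."""
    checks = [
        # (1) linearity of the trace
        np.isclose(np.trace(alpha * A + beta * B),
                   alpha * np.trace(A) + beta * np.trace(B)),
        # (2) invariance under transpose
        np.isclose(np.trace(A), np.trace(A.T)),
        # (3) tr(AB) = tr(BA)
        np.isclose(np.trace(A @ B), np.trace(B @ A)),
        # (4) cyclic permutations of three factors
        np.isclose(np.trace(A @ B @ C), np.trace(B @ C @ A)),
        np.isclose(np.trace(A @ B @ C), np.trace(C @ A @ B)),
        # (5) quadratic form as a trace
        np.isclose(lam @ A @ lam, np.trace(A @ np.outer(lam, lam))),
    ]
    return all(checks)


print(trace_identities_hold())
```

Note that ④ only permits cyclic shifts; in general `np.trace(A @ B @ C)` differs from `np.trace(A @ C @ B)`.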
Properties of the expectation $E$ and the covariance $\Sigma$ of a multivariate distribution:

$$
E(xx^T)=\Sigma+\mu\mu^T \qquad\text{⑥}
$$

Proof: since $\mu$ is constant, $E(x\mu^T)=E(x)\mu^T=\mu\mu^T$ and likewise $E(\mu x^T)=\mu\mu^T$, so

$$
\begin{aligned}
\Sigma&=E\big[(x-\mu)(x-\mu)^T\big]\\
&=E\big(xx^T-x\mu^T-\mu x^T+\mu\mu^T\big)\\
&=E\big(xx^T\big)-\mu\mu^T-\mu\mu^T+\mu\mu^T\\
&=E\big(xx^T\big)-\mu\mu^T
\end{aligned}
$$
$$
E\big(x^TAx\big)=\operatorname{tr}(A\Sigma)+\mu^TA\mu \qquad\text{⑦}
$$

Proof:

$$
\begin{aligned}
E\big(x^TAx\big)&=E\big[\operatorname{tr}(x^TAx)\big]\\
&=E\big[\operatorname{tr}(Axx^T)\big] &&\text{(property ⑤)}\\
&=\operatorname{tr}\big[E(Axx^T)\big]\\
&=\operatorname{tr}\big[A\,E(xx^T)\big]\\
&=\operatorname{tr}\big[A(\Sigma+\mu\mu^T)\big] &&\text{(property ⑥)}\\
&=\operatorname{tr}(A\Sigma)+\operatorname{tr}(A\mu\mu^T)\\
&=\operatorname{tr}(A\Sigma)+\operatorname{tr}(\mu^TA\mu) &&\text{(property ④)}\\
&=\operatorname{tr}(A\Sigma)+\mu^TA\mu
\end{aligned}
$$
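Identities ⑥ and ⑦ also hold exactly for the empirical distribution of any finite sample, provided the covariance is computed with the biased normaliser $1/N$. That gives a deterministic check that needs no sampling tolerance. The sketch below assumes NumPy; the helper names `empirical_moments` and `mean_quadratic_form` are hypothetical and introduced only for this illustration.

```python
import numpy as np


def empirical_moments(X):
    """Mean and biased (1/N) covariance of the rows of X."""
    mu = X.mean(axis=0)
    centered = X - mu
    sigma = centered.T @ centered / X.shape[0]
    return mu, sigma


def mean_quadratic_form(X, A):
    """Average of x^T A x over the rows x of X."""
    return np.einsum("ij,jk,ik->i", X, A, X).mean()


# Arbitrary sample and matrix (assumed sizes and seed).
rng = np.random.default_rng(1)
X = rng.standard_normal((500, 3)) @ rng.standard_normal((3, 3)) + 2.0
A = rng.standard_normal((3, 3))

mu, sigma = empirical_moments(X)
second_moment = X.T @ X / X.shape[0]          # empirical E(x x^T)

# Identity 6: E(x x^T) = Sigma + mu mu^T
print(np.allclose(second_moment, sigma + np.outer(mu, mu)))
# Identity 7: E(x^T A x) = tr(A Sigma) + mu^T A mu
print(np.isclose(mean_quadratic_form(X, A),
                 np.trace(A @ sigma) + mu @ A @ mu))
```

Neither identity requires $A$ to be symmetric or the distribution to be Gaussian; only the first two moments enter.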
Definition of the KL divergence:

$$
KL(p\,\|\,q)=E_p\left[\log\frac{p(x)}{q(x)}\right]
$$
$$
\begin{aligned}
\frac{p(x)}{q(x)}&=\frac{\dfrac{1}{(2\pi)^{\frac{n}{2}}|\Sigma|^{\frac{1}{2}}}\exp\left\{-\dfrac{1}{2}(x-\mu)^T\Sigma^{-1}(x-\mu)\right\}}{\dfrac{1}{(2\pi)^{\frac{n}{2}}|L|^{\frac{1}{2}}}\exp\left\{-\dfrac{1}{2}(x-m)^TL^{-1}(x-m)\right\}}\\
&=\left(\frac{|L|}{|\Sigma|}\right)^{\frac{1}{2}}\exp\left\{-\frac{1}{2}(x-\mu)^T\Sigma^{-1}(x-\mu)-\left[-\frac{1}{2}(x-m)^TL^{-1}(x-m)\right]\right\}\\
&=\left(\frac{|L|}{|\Sigma|}\right)^{\frac{1}{2}}\exp\left\{\frac{1}{2}\left[(x-m)^TL^{-1}(x-m)-(x-\mu)^T\Sigma^{-1}(x-\mu)\right]\right\}
\end{aligned}
$$
$$
\begin{aligned}
\log\frac{p(x)}{q(x)}&=\log\left(\left(\frac{|L|}{|\Sigma|}\right)^{\frac{1}{2}}\exp\left\{\frac{1}{2}\left[(x-m)^TL^{-1}(x-m)-(x-\mu)^T\Sigma^{-1}(x-\mu)\right]\right\}\right)\\
&=\frac{1}{2}\log\frac{|L|}{|\Sigma|}+\frac{1}{2}\left[(x-m)^TL^{-1}(x-m)-(x-\mu)^T\Sigma^{-1}(x-\mu)\right]
\end{aligned}
$$
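The simplified log-ratio can be compared against a direct evaluation of $\log p(x)-\log q(x)$ from the two density formulas. This is a minimal sketch assuming NumPy; `gaussian_logpdf` and `log_ratio` are hypothetical helper names, and the test matrices are arbitrary symmetric positive definite examples.

```python
import numpy as np


def gaussian_logpdf(x, mean, cov):
    """Log density of N(mean, cov) at x, written out from the formula."""
    n = mean.shape[0]
    diff = x - mean
    _, logdet = np.linalg.slogdet(cov)
    maha = diff @ np.linalg.solve(cov, diff)   # diff^T cov^{-1} diff
    return -0.5 * (n * np.log(2 * np.pi) + logdet + maha)


def log_ratio(x, mu, sigma, m, L):
    """Simplified log p(x)/q(x): the (2*pi)^(n/2) factors cancel."""
    _, logdet_sigma = np.linalg.slogdet(sigma)
    _, logdet_L = np.linalg.slogdet(L)
    dq, dp = x - m, x - mu
    quad_q = dq @ np.linalg.solve(L, dq)
    quad_p = dp @ np.linalg.solve(sigma, dp)
    return 0.5 * (logdet_L - logdet_sigma) + 0.5 * (quad_q - quad_p)


# Arbitrary parameters (assumed); G @ G.T + I is symmetric positive definite.
rng = np.random.default_rng(2)
n = 3
G1, G2 = rng.standard_normal((n, n)), rng.standard_normal((n, n))
sigma, L = G1 @ G1.T + np.eye(n), G2 @ G2.T + np.eye(n)
mu, m, x = (rng.standard_normal(n) for _ in range(3))

direct = gaussian_logpdf(x, mu, sigma) - gaussian_logpdf(x, m, L)
print(np.isclose(log_ratio(x, mu, sigma, m, L), direct))
```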
Taking the expectation under $p$, so that $E_p(x)=\mu$ and $E_p\big[(x-\mu)(x-\mu)^T\big]=\Sigma$:

$$
\begin{aligned}
E_p\left[\log\frac{p(x)}{q(x)}\right]
&=E_p\left(\frac{1}{2}\log\frac{|L|}{|\Sigma|}+\frac{1}{2}\left[(x-m)^TL^{-1}(x-m)-(x-\mu)^T\Sigma^{-1}(x-\mu)\right]\right)\\
&=\frac{1}{2}E_p\left(\log\frac{|L|}{|\Sigma|}\right)+\frac{1}{2}E_p\Big((x-m)^TL^{-1}(x-m)-(x-\mu)^T\Sigma^{-1}(x-\mu)\Big)\\
&=\frac{1}{2}\log\frac{|L|}{|\Sigma|}+\frac{1}{2}E_p\Big(\operatorname{tr}\big[L^{-1}(x-m)(x-m)^T\big]-\operatorname{tr}\big[\Sigma^{-1}(x-\mu)(x-\mu)^T\big]\Big) &&\text{(property ⑤)}\\
&=\frac{1}{2}\log\frac{|L|}{|\Sigma|}+\frac{1}{2}\operatorname{tr}\Big(E_p\big[L^{-1}(x-m)(x-m)^T\big]\Big)-\frac{1}{2}\operatorname{tr}\Big(E_p\big[\Sigma^{-1}(x-\mu)(x-\mu)^T\big]\Big) &&\text{(property ①)}\\
&=\frac{1}{2}\log\frac{|L|}{|\Sigma|}+\frac{1}{2}\operatorname{tr}\Big(L^{-1}E_p\big[xx^T-mx^T-xm^T+mm^T\big]\Big)-\frac{1}{2}\operatorname{tr}\Big(\Sigma^{-1}E_p\big[(x-\mu)(x-\mu)^T\big]\Big)\\
&=\frac{1}{2}\log\frac{|L|}{|\Sigma|}+\frac{1}{2}\operatorname{tr}\Big(L^{-1}E_p\big[xx^T-mx^T-xm^T+mm^T\big]\Big)-\frac{1}{2}\operatorname{tr}\big(\Sigma^{-1}\Sigma\big)\\
&=\frac{1}{2}\log\frac{|L|}{|\Sigma|}+\frac{1}{2}\operatorname{tr}\Big(L^{-1}\big[\underbrace{\Sigma+\mu\mu^T}_{\text{property ⑥}}-m\mu^T-\mu m^T+mm^T\big]\Big)-\frac{n}{2}\\
&=\frac{1}{2}\left\{\log\frac{|L|}{|\Sigma|}-n+\operatorname{tr}\big(L^{-1}\Sigma\big)+\operatorname{tr}\big(L^{-1}\big[\mu\mu^T-m\mu^T-\mu m^T+mm^T\big]\big)\right\}\\
&=\frac{1}{2}\left\{\log\frac{|L|}{|\Sigma|}-n+\operatorname{tr}\big(L^{-1}\Sigma\big)+\operatorname{tr}\big(L^{-1}(\mu-m)(\mu-m)^T\big)\right\}\\
&=\frac{1}{2}\left\{\log\frac{|L|}{|\Sigma|}-n+\operatorname{tr}\big(L^{-1}\Sigma\big)+(\mu-m)^TL^{-1}(\mu-m)\right\} &&\text{(property ⑤)}
\end{aligned}
$$

The result depends only on the parameters $\mu,\Sigma,m,L$: once the expectation has been taken, no $x$ remains, because $E_p(mx^T)=m\mu^T$ and $E_p(xm^T)=\mu m^T$.
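With $E_p(x)=\mu$ substituted, the closed form is $KL(p\,\|\,q)=\frac{1}{2}\big\{\log\frac{|L|}{|\Sigma|}-n+\operatorname{tr}(L^{-1}\Sigma)+(\mu-m)^TL^{-1}(\mu-m)\big\}$. A minimal NumPy sketch of this expression follows; the function name `kl_gaussians` is a hypothetical choice, and the use of `slogdet` and `solve` in place of explicit determinants and inverses is a numerical-stability assumption rather than something the derivation prescribes.

```python
import numpy as np


def kl_gaussians(mu, sigma, m, L):
    """KL( N(mu, sigma) || N(m, L) ) from the closed form.

    Assumes sigma and L are symmetric positive definite n x n matrices.
    """
    mu, m = np.asarray(mu, dtype=float), np.asarray(m, dtype=float)
    sigma, L = np.asarray(sigma, dtype=float), np.asarray(L, dtype=float)
    n = mu.shape[0]
    diff = mu - m

    # log|L| - log|sigma|, computed without forming the determinants.
    _, logdet_sigma = np.linalg.slogdet(sigma)
    _, logdet_L = np.linalg.slogdet(L)

    trace_term = np.trace(np.linalg.solve(L, sigma))   # tr(L^{-1} sigma)
    maha = diff @ np.linalg.solve(L, diff)             # (mu-m)^T L^{-1} (mu-m)

    return 0.5 * (logdet_L - logdet_sigma - n + trace_term + maha)


# One-dimensional example: p = N(0, 1), q = N(1, 4).
kl_1d = kl_gaussians([0.0], [[1.0]], [1.0], [[4.0]])
print(kl_1d)
```

For $n=1$ the expression reduces to the familiar scalar formula $\log\frac{\sigma_q}{\sigma_p}+\frac{\sigma_p^2+(\mu_p-\mu_q)^2}{2\sigma_q^2}-\frac{1}{2}$, and it returns zero when the two distributions coincide.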