Proof sketch: KL divergence between univariate normal distributions
Let $q(x) = \mathcal{N}(x; \mu_1, \sigma_1^2)$ and $p(x) = \mathcal{N}(x; \mu_2, \sigma_2^2)$. Both log-densities are quadratic in $x$, so their difference is too: $\phi(x) = \log \frac{q(x)}{p(x)} = c_2 x^2 + c_1 x + c_0$ for some constants $c_2, c_1, c_0$.
Then $\mathrm{KL}(q \| p) = \int q(x) \phi(x) \, \mathrm{d}x = c_2 \left( \int q(x) x^2 \, \mathrm{d}x \right) + c_1 \left( \int q(x) x \, \mathrm{d}x \right) + c_0 \left( \int q(x) \, \mathrm{d}x \right)$,
where $\int q(x) x^2 \, \mathrm{d}x = \mathbb{E}_{q(x)}[x^2] = \mathrm{Var}_{q(x)}[x] + \mathbb{E}_{q(x)}[x]^2 = \sigma_1^2 + \mu_1^2$, $\int q(x) x \, \mathrm{d}x = \mathbb{E}_{q(x)}[x] = \mu_1$, and $\int q(x) \, \mathrm{d}x = 1$.
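The sketch leaves the coefficients implicit. Expanding the two Gaussian log-densities (the $-\frac{1}{2}\log 2\pi$ terms cancel) gives them explicitly, and substituting the three moments recovers the univariate closed form:

```latex
\[
\begin{aligned}
\phi(x) &= \log\frac{\sigma_2}{\sigma_1} - \frac{(x-\mu_1)^2}{2\sigma_1^2} + \frac{(x-\mu_2)^2}{2\sigma_2^2},\\
c_2 &= \frac{1}{2\sigma_2^2} - \frac{1}{2\sigma_1^2},\qquad
c_1 = \frac{\mu_1}{\sigma_1^2} - \frac{\mu_2}{\sigma_2^2},\qquad
c_0 = \log\frac{\sigma_2}{\sigma_1} - \frac{\mu_1^2}{2\sigma_1^2} + \frac{\mu_2^2}{2\sigma_2^2},\\
\mathrm{KL}(q\|p) &= c_2(\sigma_1^2+\mu_1^2) + c_1\mu_1 + c_0\\
&= \log\frac{\sigma_2}{\sigma_1} - \frac{1}{2}
   + \frac{\sigma_1^2 + \mu_1^2 - 2\mu_1\mu_2 + \mu_2^2}{2\sigma_2^2}
   + \mu_1^2\left(-\frac{1}{2\sigma_1^2} + \frac{1}{\sigma_1^2} - \frac{1}{2\sigma_1^2}\right)\\
&= \log\frac{\sigma_2}{\sigma_1} + \frac{\sigma_1^2 + (\mu_1-\mu_2)^2}{2\sigma_2^2} - \frac{1}{2}.
\end{aligned}
\]
```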
Proof hints: KL divergence between multivariate normal distributions
Lemma 1 (definition of the covariance matrix). If $x \sim q(x) = \mathcal{N}(x; \mu, \Sigma)$, then $\mathbb{E}_{q(x)}\left[ (x-\mu)(x-\mu)^T \right] = \Sigma$.
Lemma 2 (properties of the trace). $\mathrm{tr}(AB) = \mathrm{tr}(BA)$ and $\mathrm{tr}(ABC) = \mathrm{tr}(BCA) = \mathrm{tr}(CAB)$; in particular, $\mathrm{tr}(A x x^T) = \mathrm{tr}(x^T A x) = x^T A x$, since $x^T A x$ is a scalar.
Combining the two lemmas (trace and expectation are both linear, so they can be swapped): for $x \sim q(x) = \mathcal{N}(x; \mu, \Sigma)$ with $x \in \mathbb{R}^d$, $\mathbb{E}_{q(x)}\left[ (x-\mu)^T \Sigma^{-1} (x-\mu) \right] = \mathbb{E}_{q(x)}\left[ \mathrm{tr}\left( \Sigma^{-1} (x-\mu)(x-\mu)^T \right) \right] = \mathrm{tr}\left( \Sigma^{-1} \mathbb{E}_{q(x)}\left[ (x-\mu)(x-\mu)^T \right] \right) = \mathrm{tr}(\Sigma^{-1}\Sigma) = \mathrm{tr}(I_d) = d$.
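A sketch of how the two lemmas give the multivariate formula, taking $q = \mathcal{N}(\mu_1, \Sigma_1)$, $p = \mathcal{N}(\mu_2, \Sigma_2)$ and $x \sim q$. The $(2\pi)^{-d/2}$ normalizers cancel in the log-ratio; writing $x - \mu_2 = (x - \mu_1) + (\mu_1 - \mu_2)$ and using the symmetry of $\Sigma_2^{-1}$, the two cross terms combine and then vanish because $\mathbb{E}_q[x - \mu_1] = 0$:

```latex
\[
\begin{aligned}
\log\frac{q(x)}{p(x)} &= \frac{1}{2}\log\frac{|\Sigma_2|}{|\Sigma_1|}
  - \frac{1}{2}(x-\mu_1)^T\Sigma_1^{-1}(x-\mu_1)
  + \frac{1}{2}(x-\mu_2)^T\Sigma_2^{-1}(x-\mu_2),\\
\mathbb{E}_{q}\!\left[(x-\mu_1)^T\Sigma_1^{-1}(x-\mu_1)\right] &= d,\\
\mathbb{E}_{q}\!\left[(x-\mu_2)^T\Sigma_2^{-1}(x-\mu_2)\right]
  &= \mathbb{E}_{q}\!\left[(x-\mu_1)^T\Sigma_2^{-1}(x-\mu_1)\right]
   + 2(\mu_1-\mu_2)^T\Sigma_2^{-1}\,\mathbb{E}_{q}[x-\mu_1]
   + (\mu_1-\mu_2)^T\Sigma_2^{-1}(\mu_1-\mu_2)\\
  &= \mathrm{tr}(\Sigma_2^{-1}\Sigma_1) + (\mu_1-\mu_2)^T\Sigma_2^{-1}(\mu_1-\mu_2),\\
\mathrm{KL}(q\|p) &= \frac{1}{2}\log\frac{|\Sigma_2|}{|\Sigma_1|} - \frac{d}{2}
  + \frac{1}{2}\mathrm{tr}(\Sigma_1\Sigma_2^{-1})
  + \frac{1}{2}(\mu_1-\mu_2)^T\Sigma_2^{-1}(\mu_1-\mu_2).
\end{aligned}
\]
```

The last step uses $\mathrm{tr}(\Sigma_2^{-1}\Sigma_1) = \mathrm{tr}(\Sigma_1\Sigma_2^{-1})$ from Lemma 2.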
KL divergence between univariate normal distributions
$$\mathrm{KL}\left( \mathcal{N}(\mu_1, \sigma_1^2) \,\|\, \mathcal{N}(\mu_2, \sigma_2^2) \right) = \log \frac{\sigma_2}{\sigma_1} + \frac{\sigma_1^2 + (\mu_1 - \mu_2)^2}{2 \sigma_2^2} - \frac{1}{2}$$
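A minimal standard-library Python sketch that evaluates this closed form and cross-checks it against a Monte Carlo estimate of $\mathbb{E}_{q}[\log q(x) - \log p(x)]$. The function names, passing standard deviations rather than variances, and the default sample size and seed are illustrative assumptions, not from the source.

```python
import math
import random


def kl_normal_1d(mu1: float, sigma1: float, mu2: float, sigma2: float) -> float:
    """Closed-form KL(N(mu1, sigma1^2) || N(mu2, sigma2^2)).

    Assumption: sigma1 and sigma2 are standard deviations (not variances), both > 0.
    """
    return (math.log(sigma2 / sigma1)
            + (sigma1 ** 2 + (mu1 - mu2) ** 2) / (2 * sigma2 ** 2)
            - 0.5)


def log_normal_pdf(x: float, mu: float, sigma: float) -> float:
    """Log-density of N(mu, sigma^2) evaluated at x."""
    return (-math.log(sigma) - 0.5 * math.log(2 * math.pi)
            - (x - mu) ** 2 / (2 * sigma ** 2))


def kl_normal_1d_mc(mu1: float, sigma1: float, mu2: float, sigma2: float,
                    n: int = 100_000, seed: int = 0) -> float:
    """Monte Carlo estimate of KL(q || p) = E_q[log q(x) - log p(x)], drawing x ~ q."""
    rng = random.Random(seed)  # fixed seed so the estimate is reproducible
    total = 0.0
    for _ in range(n):
        x = rng.gauss(mu1, sigma1)
        total += log_normal_pdf(x, mu1, sigma1) - log_normal_pdf(x, mu2, sigma2)
    return total / n


if __name__ == "__main__":
    exact = kl_normal_1d(0.0, 1.0, 1.0, 2.0)
    approx = kl_normal_1d_mc(0.0, 1.0, 1.0, 2.0)
    print(f"closed form: {exact:.4f}, Monte Carlo: {approx:.4f}")
```

The Monte Carlo estimate only agrees with the closed form up to sampling error (roughly $1/\sqrt{n}$), so it is a sanity check rather than a replacement.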
KL divergence between multivariate normal distributions
$$\mathrm{KL}\left( \mathcal{N}(\mu_1, \Sigma_1) \,\|\, \mathcal{N}(\mu_2, \Sigma_2) \right) = \frac{1}{2} \log \frac{|\Sigma_2|}{|\Sigma_1|} + \frac{1}{2} \mathrm{tr}(\Sigma_1 \Sigma_2^{-1}) + \frac{1}{2} (\mu_1 - \mu_2)^T \Sigma_2^{-1} (\mu_1 - \mu_2) - \frac{d}{2}$$
where $d$ is the dimension of $x$.
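A dependency-free Python sketch of the multivariate formula, assuming both covariances are symmetric positive definite (so $\log|\det\Sigma|$ is just $\log\det\Sigma$). Rather than forming $\Sigma_2^{-1}$, it solves $\Sigma_2 X = \Sigma_1$ and $\Sigma_2 y = \mu_1 - \mu_2$ by Gaussian elimination, which also yields $\log\det\Sigma_2$. The helper names are hypothetical; with NumPy available, `numpy.linalg.slogdet` and `numpy.linalg.solve` would do the same job.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]
Vector = List[float]


def _solve_with_logdet(a: Matrix, b: Matrix) -> Tuple[Matrix, float]:
    """Solve A X = B by Gaussian elimination with partial pivoting.

    Returns (X, log|det A|). Hypothetical helper; assumes A is nonsingular.
    """
    n, m = len(a), len(b[0])
    A = [row[:] for row in a]  # work on copies so the inputs stay unchanged
    B = [row[:] for row in b]
    logdet = 0.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        if A[pivot][col] == 0.0:
            raise ValueError("matrix is singular")
        A[col], A[pivot] = A[pivot], A[col]  # row swaps only flip the sign of det
        B[col], B[pivot] = B[pivot], B[col]
        logdet += math.log(abs(A[col][col]))
        for r in range(col + 1, n):
            f = A[r][col] / A[col][col]
            for c in range(col, n):
                A[r][c] -= f * A[col][c]
            for c in range(m):
                B[r][c] -= f * B[col][c]
    # Back substitution on the upper-triangular system, one column of B at a time.
    X = [[0.0] * m for _ in range(n)]
    for c in range(m):
        for r in range(n - 1, -1, -1):
            s = B[r][c] - sum(A[r][k] * X[k][c] for k in range(r + 1, n))
            X[r][c] = s / A[r][r]
    return X, logdet


def kl_mvn(mu1: Vector, cov1: Matrix, mu2: Vector, cov2: Matrix) -> float:
    """KL(N(mu1, cov1) || N(mu2, cov2)) for symmetric positive definite covariances."""
    d = len(mu1)
    diff = [a - b for a, b in zip(mu1, mu2)]
    # Augmented right-hand side [cov1 | diff]: columns 0..d-1 give cov2^{-1} cov1,
    # column d gives cov2^{-1} (mu1 - mu2).
    rhs = [cov1[i][:] + [diff[i]] for i in range(d)]
    sol, logdet2 = _solve_with_logdet(cov2, rhs)
    # Empty right-hand side: only log det(cov1) is needed here.
    _, logdet1 = _solve_with_logdet(cov1, [[] for _ in range(d)])
    trace_term = sum(sol[i][i] for i in range(d))  # tr(cov2^{-1} cov1) = tr(cov1 cov2^{-1})
    quad_term = sum(diff[i] * sol[i][d] for i in range(d))
    return 0.5 * (logdet2 - logdet1 + trace_term + quad_term - d)
```

For example, with $\Sigma_1 = I_2$, $\Sigma_2 = \begin{pmatrix} 2 & 1 \\ 1 & 2 \end{pmatrix}$ and equal means, the result should equal $\frac{1}{2}\left(\log 3 + \frac{4}{3} - 2\right)$, since $|\Sigma_2| = 3$ and $\mathrm{tr}(\Sigma_2^{-1}) = \frac{4}{3}$.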
KL divergence between multivariate normal distributions with independent components (diagonal covariance)
$$\mathrm{KL}\left( \mathcal{N}\left(\mu_1, \mathrm{diag}(\sigma_{11}^2, \dots, \sigma_{1d}^2)\right) \,\|\, \mathcal{N}\left(\mu_2, \mathrm{diag}(\sigma_{21}^2, \dots, \sigma_{2d}^2)\right) \right) = \sum_{k=1}^{d} \left( \log \frac{\sigma_{2k}}{\sigma_{1k}} + \frac{\sigma_{1k}^2 + (\mu_{1k} - \mu_{2k})^2}{2 \sigma_{2k}^2} - \frac{1}{2} \right)$$
where $\mu_{ik}$ and $\sigma_{ik}$ are the mean and standard deviation of the $k$-th component of the $i$-th distribution; this is the multivariate formula with diagonal $\Sigma_1, \Sigma_2$, i.e. a sum of $d$ univariate KL terms.
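With diagonal covariances the multivariate formula reduces to a per-dimension sum, which needs no matrix algebra. A minimal Python sketch, assuming the inputs are per-dimension means and standard deviations (the function name is illustrative):

```python
import math
from typing import Sequence


def kl_diag_normal(mu1: Sequence[float], sigma1: Sequence[float],
                   mu2: Sequence[float], sigma2: Sequence[float]) -> float:
    """KL(N(mu1, diag(sigma1^2)) || N(mu2, diag(sigma2^2))).

    Assumption: sigma1 and sigma2 hold per-dimension standard deviations, all > 0.
    Independent components make the KL a sum of univariate KL terms.
    """
    if not (len(mu1) == len(sigma1) == len(mu2) == len(sigma2)):
        raise ValueError("all inputs must have the same dimension d")
    return sum(
        math.log(s2 / s1) + (s1 ** 2 + (m1 - m2) ** 2) / (2 * s2 ** 2) - 0.5
        for m1, s1, m2, s2 in zip(mu1, sigma1, mu2, sigma2)
    )
```

For instance, `kl_diag_normal([0.0, 0.0], [1.0, 1.0], [1.0, 0.0], [1.0, 2.0])` gives $0.5 + (\log 2 - 0.375) = \log 2 + 0.125$, the same value the full-covariance formula gives for $\Sigma_1 = I_2$, $\Sigma_2 = \mathrm{diag}(1, 4)$.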
https://blog.csdn.net/hegsns/article/details/104857277
(very detailed, on the expectation of $x^T \Sigma^{-1} x$) https://math.stackexchange.com/questions/4013102/proof-that-expectation-of-xt-sigma-1-x-dim-x