KL Divergence Between Gaussian Distributions

The KL divergence between two Gaussian distributions is straightforward to derive once you find the right approach.

I. Univariate Gaussian Distribution
By the definition of KL divergence:
$$
\begin{aligned}
KL(\mathcal{N}(\mu_1, \sigma_1^2) || \mathcal{N}(\mu_2, \sigma_2^2)) &= \int_x \frac{1}{\sqrt{2\pi}\sigma_1} e^{-\frac{(x-\mu_1)^2}{2\sigma_1^2}} \log \frac{\frac{1}{\sqrt{2\pi}\sigma_1} e^{-\frac{(x-\mu_1)^2}{2\sigma_1^2}}}{\frac{1}{\sqrt{2\pi}\sigma_2} e^{-\frac{(x-\mu_2)^2}{2\sigma_2^2}}} dx \\
&= \int_x \frac{1}{\sqrt{2\pi}\sigma_1} e^{-\frac{(x-\mu_1)^2}{2\sigma_1^2}} \Bigg[ \log \frac{\sigma_2}{\sigma_1} - \frac{(x-\mu_1)^2}{2\sigma_1^2} + \frac{(x-\mu_2)^2}{2\sigma_2^2} \Bigg] dx
\end{aligned}
$$

The first term is simple: the density integrates to 1 over the whole real line, so
$$
\log \frac{\sigma_2}{\sigma_1} \int_x \frac{1}{\sqrt{2\pi}\sigma_1} e^{-\frac{(x-\mu_1)^2}{2\sigma_1^2}} dx = \log \frac{\sigma_2}{\sigma_1}
$$

For the second term, recognize that the integral is the variance of the first distribution:
$$
-\frac{1}{2\sigma_1^2} \int_x (x-\mu_1)^2 \frac{1}{\sqrt{2\pi}\sigma_1} e^{-\frac{(x-\mu_1)^2}{2\sigma_1^2}} dx = -\frac{1}{2\sigma_1^2} \sigma_1^2 = -\frac{1}{2}
$$

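For the third term, expanding the square gives integrals of the second moment $E[x^2] = \sigma_1^2 + \mu_1^2$, the mean $E[x] = \mu_1$, and a constant, so: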
$$
\begin{aligned}
\frac{1}{2\sigma_2^2} \int_x (x-\mu_2)^2 \frac{1}{\sqrt{2\pi}\sigma_1} e^{-\frac{(x-\mu_1)^2}{2\sigma_1^2}} dx &= \frac{1}{2\sigma_2^2} \int_x ( x^2 - 2\mu_2 x + \mu_2^2 ) \frac{1}{\sqrt{2\pi}\sigma_1} e^{-\frac{(x-\mu_1)^2}{2\sigma_1^2}} dx \\
&= \frac{\sigma_1^2 + \mu_1^2 - 2 \mu_1 \mu_2+ \mu_2^2}{2\sigma_2^2} = \frac{\sigma_1^2 + (\mu_1 - \mu_2)^2}{2\sigma_2^2}
\end{aligned}
$$
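Alternatively, a small trick simplifies the computation: write $x - \mu_2 = (x - \mu_1) + (\mu_1 - \mu_2)$ and expand. The first term then integrates to the variance, the second is an odd function about $\mu_1$ and integrates to 0, and the third is a constant that factors out of the integral: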
$$
\begin{aligned}
\frac{1}{2\sigma_2^2} \int_x (x-\mu_2)^2 \frac{1}{\sqrt{2\pi}\sigma_1} e^{-\frac{(x-\mu_1)^2}{2\sigma_1^2}} dx &= \frac{1}{2\sigma_2^2} \int_x \big[ (x-\mu_1)^2 + 2(\mu_1 - \mu_2)(x - \mu_1) + (\mu_1 - \mu_2)^2 \big] \frac{1}{\sqrt{2\pi}\sigma_1} e^{-\frac{(x-\mu_1)^2}{2\sigma_1^2}} dx \\
&= \frac{\sigma_1^2 + (\mu_1 - \mu_2)^2}{2\sigma_2^2}
\end{aligned}
$$
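The three pieces of this decomposition can be checked numerically. The sketch below is only an illustration: the helper `gaussian_expectation` and the parameter values are hypothetical, and the expectation is approximated with a plain midpoint rule over a window of 12 standard deviations around the mean.

```python
import math

def gaussian_expectation(f, mu, sigma, half_width=12.0, n=200_000):
    """Midpoint-rule approximation of E[f(x)] for x ~ N(mu, sigma^2)."""
    lo = mu - half_width * sigma
    step = 2.0 * half_width * sigma / n
    norm = 1.0 / (math.sqrt(2.0 * math.pi) * sigma)
    total = 0.0
    for i in range(n):
        x = lo + (i + 0.5) * step
        total += f(x) * norm * math.exp(-(x - mu) ** 2 / (2.0 * sigma ** 2))
    return total * step

# Hypothetical example parameters (sigma1 is a standard deviation).
mu1, sigma1, mu2 = 0.5, 1.5, -1.0

# First piece: the variance of the first distribution.
variance = gaussian_expectation(lambda x: (x - mu1) ** 2, mu1, sigma1)
# Second piece: an odd function about mu1, which should integrate to 0.
odd_part = gaussian_expectation(lambda x: x - mu1, mu1, sigma1)
# Full integrand of the third term, without the 1 / (2 sigma2^2) factor.
third = gaussian_expectation(lambda x: (x - mu2) ** 2, mu1, sigma1)

expected = sigma1 ** 2 + (mu1 - mu2) ** 2
print(variance, odd_part, third, expected)
```

The value of `third` should agree with `sigma1 ** 2 + (mu1 - mu2) ** 2` up to quadrature error.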

Collecting the three terms, the KL divergence between two univariate Gaussian distributions is:
$$
KL(\mathcal{N}(\mu_1, \sigma_1^2) || \mathcal{N}(\mu_2, \sigma_2^2)) = \log \frac{\sigma_2}{\sigma_1} -\frac{1}{2} + \frac{\sigma_1^2 + (\mu_1 - \mu_2)^2}{2\sigma_2^2}
$$
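As a sanity check, the closed form can be compared against direct numerical integration of the definition. This is a minimal sketch under stated assumptions: the function names `kl_gauss_1d`, `log_pdf`, and `kl_numeric_1d` are hypothetical, the arguments are standard deviations rather than variances, and the integral is approximated with a midpoint rule on a finite window.

```python
import math

def kl_gauss_1d(mu1, sigma1, mu2, sigma2):
    """Closed-form KL(N(mu1, sigma1^2) || N(mu2, sigma2^2))."""
    return (math.log(sigma2 / sigma1) - 0.5
            + (sigma1 ** 2 + (mu1 - mu2) ** 2) / (2.0 * sigma2 ** 2))

def log_pdf(x, mu, sigma):
    """Log density of N(mu, sigma^2) at x."""
    return (-0.5 * math.log(2.0 * math.pi) - math.log(sigma)
            - (x - mu) ** 2 / (2.0 * sigma ** 2))

def kl_numeric_1d(mu1, sigma1, mu2, sigma2, half_width=12.0, n=200_000):
    """Midpoint-rule approximation of the integral of p(x) * log(p(x) / q(x))."""
    lo = mu1 - half_width * sigma1
    step = 2.0 * half_width * sigma1 / n
    total = 0.0
    for i in range(n):
        x = lo + (i + 0.5) * step
        lp = log_pdf(x, mu1, sigma1)
        total += math.exp(lp) * (lp - log_pdf(x, mu2, sigma2))
    return total * step

closed = kl_gauss_1d(0.0, 1.0, 1.0, 2.0)
approx = kl_numeric_1d(0.0, 1.0, 1.0, 2.0)
same = kl_gauss_1d(0.3, 1.7, 0.3, 1.7)  # identical distributions
print(closed, approx, same)
```

The two values should agree closely, and the divergence of a distribution from itself is 0, as the formula predicts when $\mu_1 = \mu_2$ and $\sigma_1 = \sigma_2$.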

II. Multivariate Gaussian Distribution

$$
\mathcal{N}(x | \mu, \Sigma) = \frac{1}{(2\pi)^\frac{K}{2} |\Sigma|^{\frac{1}{2}}} e^{-\frac{1}{2}(x - \mu)^T \Sigma^{-1} (x - \mu)}
$$

$$
\begin{aligned}
KL(\mathcal{N}(x | \mu_1, \Sigma_1) || \mathcal{N}(x | \mu_2, \Sigma_2)) &= \int_{x_1} \cdots \int_{x_K} \frac{1}{(2\pi)^\frac{K}{2} |\Sigma_1|^{\frac{1}{2}}} e^{-\frac{1}{2}(x - \mu_1)^T \Sigma_1^{-1} (x - \mu_1)} \log \frac{\frac{1}{(2\pi)^\frac{K}{2} |\Sigma_1|^{\frac{1}{2}}} e^{-\frac{1}{2}(x - \mu_1)^T \Sigma_1^{-1} (x - \mu_1)}}{\frac{1}{(2\pi)^\frac{K}{2} |\Sigma_2|^{\frac{1}{2}}} e^{-\frac{1}{2}(x - \mu_2)^T \Sigma_2^{-1} (x - \mu_2)}} dx_1 \cdots dx_K \\
&= \int_{x_1} \cdots \int_{x_K} \frac{1}{(2\pi)^\frac{K}{2} |\Sigma_1|^{\frac{1}{2}}} e^{-\frac{1}{2}(x - \mu_1)^T \Sigma_1^{-1} (x - \mu_1)} \Bigg[ \frac{1}{2} \log \frac{|\Sigma_2|}{|\Sigma_1|} - \frac{1}{2}(x - \mu_1)^T \Sigma_1^{-1} (x - \mu_1) + \frac{1}{2}(x - \mu_2)^T \Sigma_2^{-1} (x - \mu_2) \Bigg] dx_1 \cdots dx_K
\end{aligned}
$$
As before, evaluate the three terms separately. The first term:
$$
\frac{1}{2} \log \frac{|\Sigma_2|}{|\Sigma_1|} \int_{x_1} \cdots \int_{x_K} \frac{1}{(2\pi)^\frac{K}{2} |\Sigma_1|^{\frac{1}{2}}} e^{-\frac{1}{2}(x - \mu_1)^T \Sigma_1^{-1} (x - \mu_1)} dx_1 \cdots dx_K = \frac{1}{2} \log \frac{|\Sigma_2|}{|\Sigma_1|}
$$
The second term:
$$
-\frac{1}{2} \int_{x_1} \cdots \int_{x_K} \frac{1}{(2\pi)^\frac{K}{2} |\Sigma_1|^{\frac{1}{2}}} e^{-\frac{1}{2}(x - \mu_1)^T \Sigma_1^{-1} (x - \mu_1)} (x - \mu_1)^T \Sigma_1^{-1} (x - \mu_1) dx_1 \cdots dx_K
$$
$\Sigma_1$ is a symmetric positive definite matrix (it must be invertible for the density to exist), so we can write $\Sigma_1^{-1} = U^T U$ and substitute $y = U(x - \mu_1)$. For a linear transformation the Jacobian matrix is the transformation matrix itself, so
$$
dy_1 \cdots dy_K = |U| dx_1 \cdots dx_K
$$
where $|U|$ denotes the absolute value of the determinant. Since $|\Sigma_1^{-1}| = |U|^2$, we have $|\Sigma_1^{-\frac{1}{2}}| = |\Sigma_1|^{-\frac{1}{2}} = |U|$, and therefore
$$
\begin{aligned}
&-\frac{1}{2 |\Sigma_1|^{\frac{1}{2}}} \int_{y_1} \cdots \int_{y_K} \frac{1}{(2\pi)^\frac{K}{2} } e^{-\frac{1}{2} y^Ty} y^Ty |U|^{-1} dy_1 \cdots dy_K \\
&= -\frac{1}{2 |\Sigma_1|^{\frac{1}{2}}} |\Sigma_1|^{\frac{1}{2}} \cdot K = -\frac{K}{2}
\end{aligned}
$$
The last step uses $|U|^{-1} = |\Sigma_1|^{\frac{1}{2}}$ and the fact that $y$ is a standard normal vector, so the remaining integral is $E[y^T y] = \sum_i E[y_i^2] = K$.
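The change of variables can be illustrated with a Cholesky factorization. The following is a sketch, assuming NumPy is available and using a hypothetical example covariance `sigma1`; the Cholesky factor is only one possible choice of $U$ with $\Sigma_1^{-1} = U^T U$.

```python
import numpy as np

# Hypothetical example covariance; any symmetric positive definite matrix works.
sigma1 = np.array([[2.0, 0.5],
                   [0.5, 1.0]])
K = sigma1.shape[0]

# np.linalg.cholesky returns a lower-triangular L with inv(sigma1) = L @ L.T,
# so U = L.T satisfies inv(sigma1) = U.T @ U.
L = np.linalg.cholesky(np.linalg.inv(sigma1))
U = L.T

# Covariance of y = U (x - mu1) when x has covariance sigma1.
cov_y = U @ sigma1 @ U.T

# E[y^T y] equals the trace of the covariance of y, which should be K.
expected_quadratic = np.trace(cov_y)

# The determinant of U should equal |sigma1|^(-1/2).
det_U = np.linalg.det(U)
det_from_sigma = np.linalg.det(sigma1) ** -0.5

print(np.allclose(cov_y, np.eye(K)), expected_quadratic, det_U, det_from_sigma)
```

Because `cov_y` is the identity, the transformed variable is standard normal, which is what makes the second term collapse to $-K/2$.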

The third term relies on a small trick, the trace identity:
$$
x^T A x = tr(A xx^T)
$$
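The identity holds because a scalar equals its own trace and the trace is invariant under cyclic permutation. A quick numerical illustration, assuming NumPy and using arbitrary random values for `A` and `x` (both hypothetical):

```python
import numpy as np

rng = np.random.default_rng(0)
K = 4
A = rng.normal(size=(K, K))  # arbitrary square matrix, not necessarily symmetric
x = rng.normal(size=K)

quad = x @ A @ x                         # x^T A x
via_trace = np.trace(A @ np.outer(x, x))  # tr(A x x^T)

print(np.isclose(quad, via_trace))
```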

Applying the identity $x^T A x = tr(A xx^T)$ to the third term and moving the integral inside the trace:
$$
\begin{aligned}
&\frac{1}{2} \int_{x_1} \cdots \int_{x_K} \frac{1}{(2\pi)^\frac{K}{2} |\Sigma_1|^{\frac{1}{2}}} e^{-\frac{1}{2}(x - \mu_1)^T \Sigma_1^{-1} (x - \mu_1)} (x - \mu_2)^T \Sigma_2^{-1} (x - \mu_2) dx_1 \cdots dx_K \\
&= \frac{1}{2} \int_{x_1} \cdots \int_{x_K} \frac{1}{(2\pi)^\frac{K}{2} |\Sigma_1|^{\frac{1}{2}}} e^{-\frac{1}{2}(x - \mu_1)^T \Sigma_1^{-1} (x - \mu_1)} tr[ \Sigma_2^{-1} (x - \mu_2) (x - \mu_2)^T ] dx_1 \cdots dx_K \\
&= \frac{1}{2} tr \Bigg[ \Sigma_2^{-1} \int_{x_1} \cdots \int_{x_K} \frac{1}{(2\pi)^\frac{K}{2} |\Sigma_1|^{\frac{1}{2}}} e^{-\frac{1}{2}(x - \mu_1)^T \Sigma_1^{-1} (x - \mu_1)} (x - \mu_2) (x - \mu_2)^T dx_1 \cdots dx_K \Bigg] \\
&= \frac{1}{2} tr \Bigg[ \Sigma_2^{-1} \int_{x_1} \cdots \int_{x_K} \frac{1}{(2\pi)^\frac{K}{2} |\Sigma_1|^{\frac{1}{2}}} e^{-\frac{1}{2}(x - \mu_1)^T \Sigma_1^{-1} (x - \mu_1)} (x x^T - \mu_2 x^T - x \mu_2^T + \mu_2 \mu_2^T ) dx_1 \cdots dx_K \Bigg]
\end{aligned}
$$
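After integration, the first term gives the second moment $E[xx^T] = \Sigma_1 + \mu_1 \mu_1^T$, the second and third terms give the mean $\mu_1$, and the fourth term is a constant: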
$$
\begin{aligned}
&\frac{1}{2} tr \Bigg[ \Sigma_2^{-1} \int_{x_1} \cdots \int_{x_K} \frac{1}{(2\pi)^\frac{K}{2} |\Sigma_1|^{\frac{1}{2}}} e^{-\frac{1}{2}(x - \mu_1)^T \Sigma_1^{-1} (x - \mu_1)} (x x^T - \mu_2 x^T - x \mu_2^T + \mu_2 \mu_2^T ) dx_1 \cdots dx_K \Bigg] \\
&= \frac{1}{2} tr [ \Sigma_2^{-1} (\Sigma_1 + \mu_1 \mu_1^T - \mu_2 \mu_1^T - \mu_1 \mu_2^T + \mu_2 \mu_2^T)] \\
&= \frac{1}{2} \big[ tr ( \Sigma_2^{-1} \Sigma_1 ) + tr( \Sigma_2^{-1} (\mu_1 - \mu_2) (\mu_1 - \mu_2)^T ) \big] \\
&= \frac{1}{2} \big[ tr ( \Sigma_2^{-1} \Sigma_1 ) + (\mu_1 - \mu_2)^T \Sigma_2^{-1} (\mu_1 - \mu_2) \big]
\end{aligned}
$$

Collecting the three terms, the KL divergence between two multivariate Gaussian distributions is:
$$
KL(\mathcal{N}(x | \mu_1, \Sigma_1) || \mathcal{N}(x | \mu_2, \Sigma_2)) = \frac{1}{2} \Bigg[ \log \frac{|\Sigma_2|}{|\Sigma_1|} - K + tr ( \Sigma_2^{-1} \Sigma_1 ) + (\mu_1 - \mu_2)^T \Sigma_2^{-1} (\mu_1 - \mu_2) \Bigg]
$$
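The final formula translates almost term by term into code. The sketch below assumes NumPy; the function name `kl_gauss_mv` and the example inputs are hypothetical. It uses `np.linalg.slogdet` for the log-determinants and `np.linalg.solve` in place of an explicit inverse, which is a numerical-stability choice rather than part of the derivation.

```python
import numpy as np

def kl_gauss_mv(mu1, cov1, mu2, cov2):
    """Closed-form KL(N(mu1, cov1) || N(mu2, cov2)) for K-dimensional Gaussians."""
    mu1 = np.asarray(mu1, dtype=float)
    mu2 = np.asarray(mu2, dtype=float)
    cov1 = np.asarray(cov1, dtype=float)
    cov2 = np.asarray(cov2, dtype=float)
    K = mu1.shape[0]
    diff = mu1 - mu2

    # log(|cov2| / |cov1|) computed from log-determinants.
    _, logdet1 = np.linalg.slogdet(cov1)
    _, logdet2 = np.linalg.slogdet(cov2)

    # tr(cov2^{-1} cov1) and (mu1 - mu2)^T cov2^{-1} (mu1 - mu2).
    trace_term = np.trace(np.linalg.solve(cov2, cov1))
    quad_term = diff @ np.linalg.solve(cov2, diff)

    return 0.5 * (logdet2 - logdet1 - K + trace_term + quad_term)

# K = 1 should reproduce the univariate result with sigma1 = 1, sigma2 = 2.
kl_1d = kl_gauss_mv([0.0], [[1.0]], [1.0], [[4.0]])

# A two-dimensional example with isotropic covariances.
kl_2d = kl_gauss_mv([0.0, 0.0], np.eye(2), [1.0, 1.0], 2.0 * np.eye(2))

print(kl_1d, kl_2d)
```

With $K = 1$, $\Sigma_1 = \sigma_1^2$ and $\Sigma_2 = \sigma_2^2$, the expression reduces to the univariate formula $\log \frac{\sigma_2}{\sigma_1} - \frac{1}{2} + \frac{\sigma_1^2 + (\mu_1 - \mu_2)^2}{2\sigma_2^2}$, which is what the first call checks.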
