KL Divergence Between Two Gaussian Distributions

The probability density function of a univariate Gaussian distribution is

$$f\left( x \right) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{\left( x - \mu \right)^2}{2\sigma^2}}. \tag{1.1}$$

Given two probability distributions $p\left( x \right)$ and $q\left( x \right)$, the KL divergence between them is defined as

$$KL\left( p \parallel q \right) = \int p\left( x \right)\log \frac{p\left( x \right)}{q\left( x \right)}\,dx. \tag{1.2}$$
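The definition can be evaluated directly by numerical integration, which is useful later as a sanity check on any closed form. The following is a minimal sketch using only the Python standard library; the function names `gaussian_pdf` and `kl_numeric`, the integration range, and the grid size are illustrative assumptions, and the natural logarithm is assumed throughout.

```python
import math

def gaussian_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) at x, following Eq. (1.1)."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (math.sqrt(2.0 * math.pi) * sigma)

def kl_numeric(p, q, lo, hi, n=20_000):
    """Midpoint-rule approximation of the integral in Eq. (1.2) over [lo, hi].

    Assumes q(x) > 0 wherever p(x) > 0 on the grid; points where p(x) == 0
    contribute nothing, following the convention 0 * log 0 = 0.
    """
    h = (hi - lo) / n
    total = 0.0
    for i in range(n):
        x = lo + (i + 0.5) * h          # midpoint of the i-th cell
        px = p(x)
        if px > 0.0:
            total += px * math.log(px / q(x))
    return total * h

# Example: KL( N(0, 1) || N(1, 1) ), integrated over a range wide enough
# that the truncated tails are negligible.
p = lambda x: gaussian_pdf(x, 0.0, 1.0)
q = lambda x: gaussian_pdf(x, 1.0, 1.0)
print(kl_numeric(p, q, -12.0, 12.0))
```

The range `[-12, 12]` is chosen by hand for these particular parameters; for other means and variances it would need to be widened accordingly.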

Then, for two Gaussian distributions $X_1 \sim p\left( x \right) = \mathcal{N}\left( \mu_1, \sigma_1^2 \right)$ and $X_2 \sim q\left( x \right) = \mathcal{N}\left( \mu_2, \sigma_2^2 \right)$, the KL divergence between them can be computed as

$$\begin{aligned} KL\left( p \parallel q \right) &= \int p\left( x \right)\log \frac{p\left( x \right)}{q\left( x \right)}\,dx \\ &= \int p\left( x \right)\log \left[ \frac{\sigma_2}{\sigma_1}\, e^{-\frac{\left( x - \mu_1 \right)^2}{2\sigma_1^2} + \frac{\left( x - \mu_2 \right)^2}{2\sigma_2^2}} \right]dx \\ &= \int p\left( x \right)\log \frac{\sigma_2}{\sigma_1}\,dx + \int p\left( x \right)\left[ -\frac{\left( x - \mu_1 \right)^2}{2\sigma_1^2} + \frac{\left( x - \mu_2 \right)^2}{2\sigma_2^2} \right]dx \\ &= \frac{\sigma_1^2 - \sigma_2^2}{2\sigma_1^2\sigma_2^2}\int p\left( x \right)x^2\,dx + \frac{\mu_1\sigma_2^2 - \mu_2\sigma_1^2}{\sigma_1^2\sigma_2^2}\int p\left( x \right)x\,dx \\ &\quad + \left( \frac{\mu_2^2\sigma_1^2 - \mu_1^2\sigma_2^2}{2\sigma_1^2\sigma_2^2} + \log \frac{\sigma_2}{\sigma_1} \right)\int p\left( x \right)dx. \end{aligned} \tag{1.3}$$

The expression above splits into three terms. In the second term, the integral is $E\left( X_1 \right)$, so

$$\frac{\mu_1\sigma_2^2 - \mu_2\sigma_1^2}{\sigma_1^2\sigma_2^2}\int p\left( x \right)x\,dx = \frac{\mu_1\sigma_2^2 - \mu_2\sigma_1^2}{\sigma_1^2\sigma_2^2}E\left( X_1 \right) = \frac{\mu_1^2\sigma_2^2 - \mu_1\mu_2\sigma_1^2}{\sigma_1^2\sigma_2^2}. \tag{1.4}$$

In the third term, the integral is $E\left( 1 \right)$, which is always 1, so

$$\left( \frac{\mu_2^2\sigma_1^2 - \mu_1^2\sigma_2^2}{2\sigma_1^2\sigma_2^2} + \log \frac{\sigma_2}{\sigma_1} \right)\int p\left( x \right)dx = \frac{\mu_2^2\sigma_1^2 - \mu_1^2\sigma_2^2}{2\sigma_1^2\sigma_2^2} + \log \frac{\sigma_2}{\sigma_1}. \tag{1.5}$$

In the first term, the integral is $E\left( X_1^2 \right)$. The variance and the mean are related by

$$D\left( X \right) = E\left( X^2 \right) - E^2\left( X \right). \tag{1.6}$$

Therefore

$$\frac{\sigma_1^2 - \sigma_2^2}{2\sigma_1^2\sigma_2^2}\int p\left( x \right)x^2\,dx = \frac{\sigma_1^2 - \sigma_2^2}{2\sigma_1^2\sigma_2^2}\left[ D\left( X_1 \right) + E^2\left( X_1 \right) \right] = \frac{\sigma_1^2 - \sigma_2^2}{2\sigma_1^2\sigma_2^2}\left( \sigma_1^2 + \mu_1^2 \right). \tag{1.7}$$

Combining the three terms gives

$$\begin{aligned} KL\left( p \parallel q \right) &= \frac{\sigma_1^2 - \sigma_2^2}{2\sigma_1^2\sigma_2^2}\left( \sigma_1^2 + \mu_1^2 \right) + \frac{\mu_1^2\sigma_2^2 - \mu_1\mu_2\sigma_1^2}{\sigma_1^2\sigma_2^2} + \frac{\mu_2^2\sigma_1^2 - \mu_1^2\sigma_2^2}{2\sigma_1^2\sigma_2^2} + \log \frac{\sigma_2}{\sigma_1} \\ &= \frac{\sigma_1^4 - \sigma_1^2\sigma_2^2 + \sigma_1^2\left( \mu_1 - \mu_2 \right)^2}{2\sigma_1^2\sigma_2^2} + \log \frac{\sigma_2}{\sigma_1} \\ &= \frac{\left( \mu_1 - \mu_2 \right)^2 + \left( \sigma_1^2 - \sigma_2^2 \right)}{2\sigma_2^2} + \log \frac{\sigma_2}{\sigma_1}. \end{aligned} \tag{1.8}$$
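As a quick check of the closed form (1.8), it can be compared against a direct numerical evaluation of the definition (1.2). The sketch below uses only the standard library; the names `kl_gauss` and `kl_by_integration`, the choice to parameterize by standard deviation, and the 12-standard-deviation integration window are assumptions made for illustration.

```python
import math

def kl_gauss(mu1, sigma1, mu2, sigma2):
    """Closed form of Eq. (1.8): KL( N(mu1, sigma1^2) || N(mu2, sigma2^2) )."""
    if sigma1 <= 0 or sigma2 <= 0:
        raise ValueError("standard deviations must be positive")
    quad = ((mu1 - mu2) ** 2 + sigma1 ** 2 - sigma2 ** 2) / (2.0 * sigma2 ** 2)
    return quad + math.log(sigma2 / sigma1)

def kl_by_integration(mu1, sigma1, mu2, sigma2, n=20_000, width=12.0):
    """Midpoint-rule evaluation of Eq. (1.2) over mu1 +/- width * sigma1."""
    lo = mu1 - width * sigma1
    hi = mu1 + width * sigma1
    h = (hi - lo) / n
    norm = math.sqrt(2.0 * math.pi) * sigma1
    total = 0.0
    for i in range(n):
        x = lo + (i + 0.5) * h
        e1 = (x - mu1) ** 2 / (2.0 * sigma1 ** 2)
        e2 = (x - mu2) ** 2 / (2.0 * sigma2 ** 2)
        # log p(x) - log q(x), written out as in the second line of Eq. (1.3)
        log_ratio = math.log(sigma2 / sigma1) - e1 + e2
        total += math.exp(-e1) / norm * log_ratio
    return total * h

# The two values should agree closely for any valid parameters.
print(kl_gauss(1.0, 2.0, -0.5, 1.5))
print(kl_by_integration(1.0, 2.0, -0.5, 1.5))
# KL divergence is not symmetric: swapping p and q generally changes the value.
print(kl_gauss(-0.5, 1.5, 1.0, 2.0))
```

Taking standard deviations rather than variances as arguments is a design choice here; either works as long as it is applied consistently.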

When $X_2 \sim q\left( x \right) = \mathcal{N}\left( 0, 1 \right)$ is the standard normal distribution, this simplifies to

$$\begin{aligned} KL\left( p \parallel q \right) &= -\log \sigma_1 + \frac{\mu_1^2 + \sigma_1^2 - 1}{2} \\ &= -\frac{1}{2}\left( 1 + \log \sigma_1^2 - \mu_1^2 - \sigma_1^2 \right). \end{aligned} \tag{1.9}$$
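The second form of (1.9) depends on $\sigma_1$ only through $\log \sigma_1^2$ and $\sigma_1^2$, so it can be written in terms of the log-variance. A minimal sketch, assuming a hypothetical function name `kl_to_standard_normal` and the log-variance parameterization `log_var` $= \log \sigma_1^2$ (an assumption of this example, not something the derivation requires):

```python
import math

def kl_to_standard_normal(mu, log_var):
    """Eq. (1.9) with log_var = log(sigma_1^2).

    Returns KL( N(mu, exp(log_var)) || N(0, 1) ).
    """
    return -0.5 * (1.0 + log_var - mu * mu - math.exp(log_var))

# Same quantity via the first form of Eq. (1.9), for sigma_1 = 2 and mu_1 = 0.5.
sigma1, mu1 = 2.0, 0.5
first_form = -math.log(sigma1) + (mu1 ** 2 + sigma1 ** 2 - 1.0) / 2.0
second_form = kl_to_standard_normal(mu1, math.log(sigma1 ** 2))
print(first_form, second_form)
```

Working with the log-variance keeps the variance positive for any real input, which is convenient when the value comes from an unconstrained optimizer.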

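The KL divergence between two Gaussian distributions therefore has a compact closed form that requires no actual integration, which is one reason Gaussians are so convenient in probabilistic analysis. For multivariate Gaussians, the components are often assumed to be independent, i.e. both covariance matrices are diagonal; in that case the KL divergence is simply the sum of the per-component KL divergences.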
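For the diagonal-covariance case, the summation can be sketched as follows. The function name `kl_diag_gauss` and the list-based inputs (per-component means and standard deviations) are illustrative assumptions; the per-component term is Eq. (1.8).

```python
import math

def kl_diag_gauss(mu1, sigma1, mu2, sigma2):
    """KL divergence between two Gaussians with diagonal covariance.

    Each argument is a sequence with one entry per component; sigma1 and
    sigma2 hold standard deviations. The result is the sum over components
    of the univariate closed form in Eq. (1.8).
    """
    if not (len(mu1) == len(sigma1) == len(mu2) == len(sigma2)):
        raise ValueError("all inputs must have the same length")
    total = 0.0
    for m1, s1, m2, s2 in zip(mu1, sigma1, mu2, sigma2):
        if s1 <= 0 or s2 <= 0:
            raise ValueError("standard deviations must be positive")
        total += ((m1 - m2) ** 2 + s1 ** 2 - s2 ** 2) / (2.0 * s2 ** 2)
        total += math.log(s2 / s1)
    return total

# Two-dimensional example against a standard normal target.
print(kl_diag_gauss([0.5, -1.0], [2.0, 0.5], [0.0, 0.0], [1.0, 1.0]))
```

This only applies when both covariance matrices are diagonal; with full covariance matrices the components are coupled and a per-component sum no longer gives the KL divergence.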
