A Detailed Explanation of the KL Divergence Formula

Contents

    • Jensen's inequality
    • KL divergence (a.k.a. relative entropy)
    • mutual information

Jensen’s inequality

  • $f\left(\int x\,p(x)\,\mathrm{d}x\right) \leqslant \int f(x)\,p(x)\,\mathrm{d}x$ for convex $f$, which follows from Jensen's inequality in the form $f(\mathbb{E}[x]) \leqslant \mathbb{E}[f(x)]$ (a quick numerical check follows this list).
  • $\mathrm{KL}(p \,\|\, q) = -\int p(\mathbf{x}) \ln \left\{ \frac{q(\mathbf{x})}{p(\mathbf{x})} \right\} \mathrm{d}\mathbf{x} \geqslant -\ln \int q(\mathbf{x})\,\mathrm{d}\mathbf{x} = 0$, with equality if and only if $p(\mathbf{x}) = q(\mathbf{x})$.
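
A quick numerical check of Jensen's inequality $f(\mathbb{E}[x]) \leqslant \mathbb{E}[f(x)]$; the choice of $f(x) = -\ln x$ and a Gamma-distributed $x$ is an illustrative assumption, not part of the original derivation:

```python
# Minimal numerical check of Jensen's inequality f(E[x]) <= E[f(x)]
# for the convex function f(x) = -ln(x). The Gamma distribution is an
# illustrative assumption; any positive random variable would do.
import numpy as np

rng = np.random.default_rng(0)
x = rng.gamma(shape=2.0, scale=1.0, size=100_000)  # x > 0, so ln x is defined

f = lambda t: -np.log(t)  # strictly convex on (0, inf)

print(f"f(E[x]) = {f(x.mean()):.4f}")  # left-hand side
print(f"E[f(x)] = {f(x).mean():.4f}")  # right-hand side, never smaller
```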

KL divergence (a.k.a. relative entropy)

  • Definition: $\mathrm{KL}(p \,\|\, q) = -\int p(\mathbf{x}) \ln \left\{ \frac{q(\mathbf{x})}{p(\mathbf{x})} \right\} \mathrm{d}\mathbf{x}$
  • $-\ln x$ is strictly convex, so Jensen's inequality gives $\mathrm{KL}(p \,\|\, q) = -\int p(\mathbf{x}) \ln \left\{ \frac{q(\mathbf{x})}{p(\mathbf{x})} \right\} \mathrm{d}\mathbf{x} \geqslant -\ln \int q(\mathbf{x})\,\mathrm{d}\mathbf{x} = 0$.
  • In practice, given $N$ samples $\mathbf{x}_n$ drawn from $p(\mathbf{x})$, $\mathrm{KL}(p \,\|\, q) \simeq \frac{1}{N} \sum_{n=1}^{N} \left\{ -\ln q(\mathbf{x}_n \mid \boldsymbol{\theta}) + \ln p(\mathbf{x}_n) \right\}$ (a Monte Carlo sketch of this estimate follows the list).
    • Note: the KL definition above is an expectation under $p(\mathbf{x})$, so with sample points drawn from $p(\mathbf{x})$ the integral can be replaced by the finite sum above; in general, $\mathbb{E}[f] = \int f(x)\,p(x)\,\mathrm{d}x \simeq \frac{1}{N} \sum_n f(x_n)$. Importance sampling and related methods rely on the same idea.
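
A minimal sketch of this sample-based estimate, assuming two illustrative Gaussians for $p$ and $q$ (so the Monte Carlo value can be compared against the known closed form) and SciPy for the log-densities:

```python
# Monte Carlo sketch of the sample-based KL estimate: draw x_n from p,
# then average ln p(x_n) - ln q(x_n). The Gaussian choices for p and q
# are illustrative assumptions that admit a closed-form KL for comparison.
import numpy as np
from scipy.stats import norm

mu_p, sd_p = 0.0, 1.0  # p(x) = N(0, 1)
mu_q, sd_q = 1.0, 2.0  # q(x) = N(1, 2^2), playing the role of q(x | theta)

rng = np.random.default_rng(0)
x = rng.normal(mu_p, sd_p, size=200_000)  # samples x_n ~ p(x)

# KL(p || q) ~= (1/N) sum_n { -ln q(x_n) + ln p(x_n) }
kl_mc = np.mean(norm.logpdf(x, mu_p, sd_p) - norm.logpdf(x, mu_q, sd_q))

# Closed form for two univariate Gaussians, for comparison
kl_exact = np.log(sd_q / sd_p) + (sd_p**2 + (mu_p - mu_q)**2) / (2 * sd_q**2) - 0.5

print(f"Monte Carlo: {kl_mc:.4f}  exact: {kl_exact:.4f}")
```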

mutual information

1. If two variables $\mathbf{x}$ and $\mathbf{y}$ in a dataset are not independent, we can ask how far the joint distribution is from the factorized approximation $p(\mathbf{x})p(\mathbf{y})$; measuring this gap with a KL divergence gives the mutual information:

$$
\begin{aligned}
\mathrm{I}[\mathbf{x}, \mathbf{y}] &\equiv \mathrm{KL}(p(\mathbf{x}, \mathbf{y}) \,\|\, p(\mathbf{x})\,p(\mathbf{y})) \\
&= -\iint p(\mathbf{x}, \mathbf{y}) \ln \left( \frac{p(\mathbf{x})\, p(\mathbf{y})}{p(\mathbf{x}, \mathbf{y})} \right) \mathrm{d}\mathbf{x}\, \mathrm{d}\mathbf{y}
\end{aligned}
$$

2. Using the sum and product rules of probability, the mutual information can be expressed in terms of conditional entropies (a small discrete check follows the equation):

$$
\mathrm{I}[\mathbf{x}, \mathbf{y}] = \mathrm{H}[\mathbf{x}] - \mathrm{H}[\mathbf{x} \mid \mathbf{y}] = \mathrm{H}[\mathbf{y}] - \mathrm{H}[\mathbf{y} \mid \mathbf{x}]
$$
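
A small discrete sanity check of both identities, assuming a hypothetical $2 \times 2$ joint table; it computes the mutual information once as a KL divergence and once via entropies:

```python
# Discrete check that the KL form of mutual information agrees with the
# entropy form I[x, y] = H[x] - H[x|y]. The 2x2 joint table is an
# illustrative assumption.
import numpy as np

p_xy = np.array([[0.30, 0.10],  # joint distribution p(x, y)
                 [0.20, 0.40]])
p_x = p_xy.sum(axis=1)          # marginal p(x)
p_y = p_xy.sum(axis=0)          # marginal p(y)

# KL form: I[x, y] = sum_{x,y} p(x, y) ln( p(x, y) / (p(x) p(y)) )
mi_kl = np.sum(p_xy * np.log(p_xy / np.outer(p_x, p_y)))

# Entropy form, using H[x | y] = H[x, y] - H[y]
H = lambda p: -np.sum(p * np.log(p))
mi_entropy = H(p_x) - (H(p_xy) - H(p_y))

print(f"KL form: {mi_kl:.6f}  entropy form: {mi_entropy:.6f}")  # identical
```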
