Li Hang's Statistical Learning Methods: deriving the EM algorithm, with a detailed derivation of Eq. (9.13)

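I find the book's derivation of Eq. (9.13) not fully rigorous, so here is the complete derivation of Eq. (9.13).
For the observed data $Y$ (the incomplete data), the log-likelihood function with respect to the parameter $\theta$ is: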
$$
\begin{aligned}
L(\theta) &= \log P(Y\mid\theta) = \log \sum_{Z} P(Y, Z\mid\theta) \\
&= \log\left(\sum_{Z} P(Y\mid Z,\theta)\,P(Z\mid\theta)\right)
\end{aligned}
$$
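The Python sketch below makes the marginalization over $Z$ concrete by evaluating $L(\theta)$ for a small discrete model. The model is an illustrative assumption: a three-coin setup with $\theta = (\pi, p, q)$. A hidden coin with heads probability `pi` decides whether the observed toss comes from a coin with heads probability `p` ($Z=1$) or one with heads probability `q` ($Z=0$). Observations are assumed i.i.d., so $L(\theta)$ is a sum over observations of the log of a sum over $Z$. The helper names `joint` and `log_likelihood` and the data are hypothetical.

```python
import math

def joint(y, z, theta):
    """P(Y=y, Z=z | theta) = P(Y=y | Z=z, theta) * P(Z=z | theta).

    Hypothetical three-coin model: theta = (pi, p, q); z=1 selects the coin
    with heads probability p, z=0 the coin with heads probability q.
    """
    pi, p, q = theta
    if z == 1:
        return pi * (p if y == 1 else 1 - p)
    return (1 - pi) * (q if y == 1 else 1 - q)

def log_likelihood(ys, theta):
    """L(theta) = sum_j log sum_Z P(y_j | Z, theta) P(Z | theta), assuming i.i.d. y_j."""
    return sum(math.log(sum(joint(y, z, theta) for z in (0, 1))) for y in ys)

ys = [1, 1, 0, 1, 0, 0, 1, 0, 1, 1]  # made-up 0/1 observations
print(log_likelihood(ys, (0.5, 0.5, 0.5)))  # about -6.9315, i.e. 10 * log(0.5)
```

The hidden $Z$ never appears in the data. It is summed out inside the logarithm, and that log-of-a-sum is what makes direct maximization awkward.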
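To maximize this likelihood iteratively, we want the new estimate $\theta$ to increase $L(\theta)$ relative to the current estimate $\theta^{(i)}$, i.e.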
$$
L(\theta) > L\left(\theta^{(i)}\right)
$$
To this end, consider the difference between the two:
$$
\begin{aligned}
L(\theta)-L\left(\theta^{(i)}\right) &= \log\left(\sum_{Z} P(Y\mid Z,\theta)P(Z\mid\theta)\right) - \log P\left(Y\mid\theta^{(i)}\right) \\
&= \log\left(\frac{\sum_{Z} P(Y\mid Z,\theta)P(Z\mid\theta)}{P\left(Y\mid\theta^{(i)}\right)}\right) \\
&= \log\left(\sum_{Z} \frac{P(Y\mid Z,\theta)P(Z\mid\theta)}{P\left(Y\mid\theta^{(i)}\right)}\right) \\
&= \log\left(\sum_{Z} P\left(Z\mid Y,\theta^{(i)}\right)\frac{P(Y\mid Z,\theta)P(Z\mid\theta)}{P\left(Y\mid\theta^{(i)}\right)P\left(Z\mid Y,\theta^{(i)}\right)}\right)
\end{aligned}
$$
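The last step multiplies and divides each term by $P(Z\mid Y,\theta^{(i)})$, which is assumed positive for every $Z$. These weights are nonnegative and sum to 1 over $Z$, and $\log$ is concave, so Jensen's inequality gives: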
$$
\begin{aligned}
L(\theta)-L\left(\theta^{(i)}\right) &= \log\left(\sum_{Z} P\left(Z\mid Y,\theta^{(i)}\right)\frac{P(Y\mid Z,\theta)P(Z\mid\theta)}{P\left(Y\mid\theta^{(i)}\right)P\left(Z\mid Y,\theta^{(i)}\right)}\right) \\
&\geq \sum_{Z} P\left(Z\mid Y,\theta^{(i)}\right)\log\left(\frac{P(Y\mid Z,\theta)P(Z\mid\theta)}{P\left(Y\mid\theta^{(i)}\right)P\left(Z\mid Y,\theta^{(i)}\right)}\right)
\end{aligned}
$$
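The inequality can also be sanity-checked numerically. The sketch below assumes a hypothetical three-coin model with $\theta = (\pi, p, q)$ and i.i.d. observations. Under that assumption the posterior over the full latent vector factorizes, so the right-hand side becomes a sum of per-observation terms. The script compares both sides for random parameter pairs. All function names and data are illustrative, not taken from the book.

```python
import math
import random

def joint(y, z, theta):
    """Hypothetical three-coin model: P(Y=y, Z=z | theta), theta = (pi, p, q)."""
    pi, p, q = theta
    if z == 1:
        return pi * (p if y == 1 else 1 - p)
    return (1 - pi) * (q if y == 1 else 1 - q)

def marginal(y, theta):
    """P(Y=y | theta) = sum_Z P(Y=y, Z | theta)."""
    return sum(joint(y, z, theta) for z in (0, 1))

def log_likelihood(ys, theta):
    """L(theta) for i.i.d. observations."""
    return sum(math.log(marginal(y, theta)) for y in ys)

def jensen_lower_bound(ys, theta, theta_i):
    """Right-hand side of the inequality, summed over i.i.d. observations:
    sum_Z P(Z|Y,theta_i) * log(P(Y|Z,theta) P(Z|theta) / (P(Y|theta_i) P(Z|Y,theta_i)))."""
    total = 0.0
    for y in ys:
        p_y_i = marginal(y, theta_i)
        for z in (0, 1):
            w = joint(y, z, theta_i) / p_y_i  # posterior P(Z=z | Y=y, theta_i)
            total += w * math.log(joint(y, z, theta) / (p_y_i * w))
    return total

def rand_theta(rng):
    # Stay away from 0 and 1 so every posterior weight is strictly positive.
    return tuple(rng.uniform(0.05, 0.95) for _ in range(3))

rng = random.Random(0)
ys = [1, 1, 0, 1, 0, 0, 1, 0, 1, 1]  # made-up observations
for _ in range(1000):
    theta, theta_i = rand_theta(rng), rand_theta(rng)
    diff = log_likelihood(ys, theta) - log_likelihood(ys, theta_i)
    assert diff >= jensen_lower_bound(ys, theta, theta_i) - 1e-12
print("lower bound held on all random trials")
```

Parameters are drawn from $[0.05, 0.95]$ so that every $P(Z\mid Y,\theta^{(i)})$ is strictly positive. The multiply-and-divide step in the derivation relies on this.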
The remaining steps are covered in detail in the book.
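One fact the remaining steps depend on is easy to verify directly: the lower bound is tight at $\theta = \theta^{(i)}$. The right-hand side vanishes there because $P(Y\mid\theta^{(i)})P(Z\mid Y,\theta^{(i)}) = P(Y,Z\mid\theta^{(i)})$:

$$
\begin{aligned}
\sum_{Z} P\left(Z\mid Y,\theta^{(i)}\right)\log\frac{P\left(Y\mid Z,\theta^{(i)}\right)P\left(Z\mid\theta^{(i)}\right)}{P\left(Y\mid\theta^{(i)}\right)P\left(Z\mid Y,\theta^{(i)}\right)}
&= \sum_{Z} P\left(Z\mid Y,\theta^{(i)}\right)\log\frac{P\left(Y,Z\mid\theta^{(i)}\right)}{P\left(Y,Z\mid\theta^{(i)}\right)} \\
&= 0
\end{aligned}
$$

So any $\theta$ that makes the right-hand side nonnegative satisfies $L(\theta) \ge L(\theta^{(i)})$. Choosing $\theta^{(i+1)}$ to maximize the bound therefore never decreases the likelihood.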

Appendix: the finite form of Jensen's inequality

If $\Omega$ is a finite set $\{x_1, x_2, \ldots, x_n\}$ and $\mu$ is a probability measure on $\Omega$ with $\mu(\{x_i\}) = \lambda_i$, then for a convex function $\varphi$ the general form of the inequality reduces to a simple sum:
$$
\varphi\left(\sum_{i=1}^{n} g(x_i)\,\lambda_i\right) \le \sum_{i=1}^{n} \varphi\big(g(x_i)\big)\,\lambda_i
$$
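where $\lambda_1 + \lambda_2 + \cdots + \lambda_n = 1$ and $\lambda_i \ge 0$.
If $\varphi$ is concave, simply reverse the inequality sign. This is the case used in the derivation above: $\varphi = \log$, the weights are $\lambda_Z = P(Z\mid Y,\theta^{(i)})$, and $g(Z)$ is the ratio inside the logarithm.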
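The following is a minimal numerical sanity check of the finite form. The helper `jensen_gap` is hypothetical and computes $\sum_i \varphi(g(x_i))\lambda_i - \varphi\left(\sum_i g(x_i)\lambda_i\right)$. The check uses $x^2$ as an example of a convex $\varphi$ (gap $\ge 0$) and $\log$ as the concave $\varphi$ from the EM derivation (gap $\le 0$).

```python
import math
import random

def jensen_gap(phi, g_values, weights):
    """Return sum_i phi(g_i) * lambda_i - phi(sum_i g_i * lambda_i).

    Nonnegative for convex phi, nonpositive for concave phi.
    """
    mean = sum(g * w for g, w in zip(g_values, weights))
    return sum(phi(g) * w for g, w in zip(g_values, weights)) - phi(mean)

rng = random.Random(1)
for _ in range(1000):
    n = rng.randint(1, 6)
    raw = [rng.random() + 1e-9 for _ in range(n)]
    total = sum(raw)
    weights = [r / total for r in raw]                      # lambda_i >= 0, sum to 1
    g_values = [rng.uniform(0.1, 10.0) for _ in range(n)]   # positive, so log is defined
    assert jensen_gap(lambda x: x * x, g_values, weights) >= -1e-12  # convex phi
    assert jensen_gap(math.log, g_values, weights) <= 1e-12          # concave phi: sign reversed
print("finite Jensen inequality held on all random trials")
```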

References

  • https://zh.wikipedia.org/wiki/延森不等式
  • https://zhuanlan.zhihu.com/p/39315786 (proof)
