The EM Algorithm

Course notes for CMU 10-601. The EM algorithm estimates the parameters of probabilistic models that contain latent (hidden) variables, and can be used in some unsupervised clustering methods. Methods embodying the same idea predate the general formulation of EM, e.g. the Baum-Welch algorithm used for HMMs.

General steps of maximum likelihood estimation (MLE):

  1. Write down the likelihood function;
  2. Take the log of the likelihood and simplify;
  3. Take derivatives, set them to 0, and obtain the likelihood equations;
  4. Solve the likelihood equations; the solutions are the desired parameter estimates.

However, sometimes we get stuck because the likelihood equations cannot be solved in closed form, e.g. for mixtures of Gaussians.

One Gaussian

Given data: $x_1,\dots,x_n$
Modeling assumption: $X_i \sim$ Gaussian (independently), with $\sigma^2 = 1/2$
Likelihood:
$$P(x_1,\dots,x_n\mid\mu)=\prod_{i=1}^n \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{(x_i-\mu)^2}{2\sigma^2}}$$

MLE: $\hat\mu_{ML}= \arg\max_\mu L(x_1,\dots,x_n\mid\mu)$
$$\hat\mu_{ML} = \frac{\sum_{i=1}^n x_i}{n} = \overline{x}$$
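
Carrying out steps 2-4 above for this model (a standard derivation, spelled out here for completeness):
$$
\begin{aligned}
\log L(x_1,\dots,x_n\mid\mu) & = \sum_{i=1}^n \Big[-\frac{(x_i-\mu)^2}{2\sigma^2} - \frac{1}{2}\log(2\pi\sigma^2)\Big] \\
\frac{\partial}{\partial\mu}\log L & = \sum_{i=1}^n \frac{x_i-\mu}{\sigma^2} = 0
\quad\Longrightarrow\quad \hat\mu_{ML} = \frac{1}{n}\sum_{i=1}^n x_i
\end{aligned}
$$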

K Gaussians

Given data: $x_1,\dots,x_n$
Modeling assumption: $X_i \sim$ mixture of K Gaussians (independently), with $\sigma^2 = 1/2$ and means $\mu_1,\dots,\mu_k$
Prior probability over Gaussians: $\lambda_1,\dots,\lambda_k$, with $0 \le \lambda_j \le 1$ and $\sum_j \lambda_j=1$. The distribution of $j$ is unknown, i.e. $\lambda_j$ and $\mu_j$ are unknown.
Likelihood:
$$
\begin{aligned}
P(x_1,\dots,x_n\mid\lambda,\mu,\sigma^2=0.5) & = \prod_{i=1}^n\Big[\sum_{j=1}^k P(\text{Gaussian } j \text{ was chosen})\,P(x_i\mid \text{Gaussian } j \text{ was chosen})\Big] \\
& = \prod_{i=1}^n\Big[\sum_{j=1}^k \lambda_j \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{(x_i-\mu_j)^2}{2\sigma^2}}\Big]
\end{aligned}
$$

MLE: $(\hat\mu, \hat\lambda)_{ML}= \arg\max_{\mu,\lambda} L(x_1,\dots,x_n\mid\lambda,\mu,\sigma^2=0.5)$
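
Here, taking the log no longer separates the parameters, because the sum over components sits inside the logarithm:
$$\log L = \sum_{i=1}^n \log\Big[\sum_{j=1}^k \lambda_j \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{(x_i-\mu_j)^2}{2\sigma^2}}\Big]$$
Setting the derivatives with respect to $\mu_j$ and $\lambda_j$ to zero couples all the parameters together, so there is no closed-form solution; this is why we turn to the iterative procedure below.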

How to find MLE?

“SOFT” K-MEANS

soft: our guesses are probabilities, taking values in $[0,1]$
hard: each guess is a single best assignment (taking values in $\{0,1\}$ or $\{1,\dots,k\}$)

This is a special case of the EM algorithm, with $\sigma^2 = 1/2$.

Initialization
Initialize the distribution parameters $\theta$: $\mu^{(0)}$, $\lambda^{(0)}$.

Repeat the E and M steps until convergence.
1. E-step
Fill in the missing variables with their expected values.
Using the initial parameter values (or the parameters from the previous iteration), compute the posterior probability of the latent variables, which is just their expectation, and use it as the current estimate of the hidden variables:
$$w_j^{(i)}=P(z^{(i)}=j\mid x^{(i)};\mu,\lambda,\sigma^2=0.5)$$
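
Concretely, spelling out the Bayes-rule computation that the notes leave implicit (the $(l)$ superscripts denote the current parameter estimates):
$$w_j^{(i)}=\frac{\lambda_j^{(l)}\, e^{-(x^{(i)}-\mu_j^{(l)})^2/(2\sigma^2)}}{\sum_{j'=1}^k \lambda_{j'}^{(l)}\, e^{-(x^{(i)}-\mu_{j'}^{(l)})^2/(2\sigma^2)}}$$
The $\frac{1}{\sqrt{2\pi\sigma^2}}$ factors cancel because all components share the same fixed variance.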

2. M-step
Regular maximum likelihood estimation (MLE), using the values computed in the E-step and the values of the other variables.
Maximize the likelihood to obtain new parameter values $\mu^{(l+1)}$, $\lambda^{(l+1)}$ (a runnable sketch of the full E/M loop follows the update formulas below):

  • l: index over EM iterations
  • j: index of Gaussians
  • i: index of datapoints
    $$\hat\lambda_j=\frac{\#\{j \text{ was chosen}\}}{n}=\frac{\sum_{i=1}^n w_j^{(i)}}{n}$$
    $$\hat\mu_j=\frac{\sum_{i=1}^n w_j^{(i)} x^{(i)}}{\sum_{i=1}^n w_j^{(i)}}$$
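
Below is a minimal runnable sketch of this E/M loop in Python with NumPy. The function name, the synthetic usage data, and the iteration count are illustrative placeholders, not from the notes; the shared variance is fixed at $\sigma^2 = 0.5$ as above, and convergence checking is omitted for brevity.

```python
import numpy as np

def soft_kmeans_em(x, k, n_iters=100, sigma2=0.5, seed=0):
    """EM for a 1-D mixture of k Gaussians with a fixed, shared variance (soft k-means)."""
    rng = np.random.default_rng(seed)
    n = len(x)
    mu = rng.choice(x, size=k, replace=False)   # mu^(0): k distinct datapoints
    lam = np.full(k, 1.0 / k)                   # lambda^(0): uniform mixing weights

    for _ in range(n_iters):
        # E-step: responsibilities w[i, j] = P(z_i = j | x_i; mu, lambda).
        # The 1/sqrt(2*pi*sigma2) factor cancels in the normalization below.
        dens = np.exp(-(x[:, None] - mu[None, :]) ** 2 / (2 * sigma2))
        w = lam[None, :] * dens
        w /= w.sum(axis=1, keepdims=True)

        # M-step: weighted MLE using the responsibilities.
        lam = w.sum(axis=0) / n                              # hat lambda_j
        mu = (w * x[:, None]).sum(axis=0) / w.sum(axis=0)    # hat mu_j

    return mu, lam

# Example usage on synthetic 1-D data with two clusters around 0 and 5.
rng = np.random.default_rng(1)
x = np.concatenate([rng.normal(0.0, 0.7, 50), rng.normal(5.0, 0.7, 50)])
mu_hat, lam_hat = soft_kmeans_em(x, k=2)
```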

EM Algorithm guarantees:
$$L(D\mid\theta^{[0]}) \le L(D\mid\theta^{[1]}) \le L(D\mid\theta^{[2]}) \le \dots$$
This is a non-decreasing sequence, and if the likelihood function is bounded, the sequence converges. The example here is bounded because $\sigma$ is fixed.
It might end up at a local optimum.

Worked example

[Figure 1: example problem]

Table representation of the first iteration:
[Figure 2: the table after the first iteration]

General EM algorithm

$X$: observed data
$Z$: unobserved "data"
Complete data: $Y=(X,Z)$
Likelihood function $L(X,Z\mid\theta)$ is known

Problem: find MLE
$$
\begin{aligned}
\hat\theta_{ML} & = \arg\max_\theta L(X\mid\theta)= \arg\max_\theta \sum_Z P(X,Z\mid\theta) \\
& = \arg\max_\theta \sum_Z P(Z\mid\theta)P(X\mid Z,\theta) \\
& = \arg\max_\theta \sum_Z P(X\mid\theta)P(Z\mid X,\theta)
\end{aligned}
$$

Chicken-and-egg problem with respect to $P(Z)$ and $\theta$
Once we know which datapoints belong to the same Gaussian, we can make a reliable estimate of that Gaussian's parameters. But the datapoints are all mixed together, we don't know which belongs to which, and so we cannot estimate the parameters. Conversely, only once we have accurate estimates of the parameters of the K distributions can we tell which datapoints belong to the same Gaussian.
So we are stuck in an impasse. What to do? Just pick some value to start with, see how the other quantity changes in response, then adjust again based on that change, and keep iterating back and forth; eventually this converges to a solution. That is the basic idea of the EM algorithm. -> chicken stays, egg changes

EM solution

  1. Initialize $\theta$ to some value
  2. Iterate the EM update
    $$\theta^{[l+1]}=\arg\max_\theta E_{Z\mid X,\theta^{[l]}}[\log P(Z,X\mid\theta)]$$
    E is the E in Expectation: write the expectation as a function of $\theta$; this expression is the auxiliary function $Q(\theta\mid\theta^{[l]})$ (expanded below for the Gaussian-mixture case).
    M is the M in argmax: find the maximum over $\theta$.
    =>
    Simple EM solution:
    • E-step: express the MLE as a function of the X's and Z's
      (if we knew the values of Z, what would the ML solution be?)
    • M-step: replace each Z with $E[Z\mid X,\theta]$
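
For the Gaussian-mixture example above, the auxiliary function $Q(\theta\mid\theta^{[l]})$ expands to the following (with $w_j^{(i)}$ the E-step responsibilities defined earlier):
$$
Q(\theta\mid\theta^{[l]})
= E_{Z\mid X,\theta^{[l]}}\big[\log P(Z,X\mid\theta)\big]
= \sum_{i=1}^n\sum_{j=1}^k w_j^{(i)}\Big[\log\lambda_j-\frac{(x^{(i)}-\mu_j)^2}{2\sigma^2}-\frac{1}{2}\log(2\pi\sigma^2)\Big]
$$
Maximizing over $\mu_j$, and over $\lambda_j$ subject to $\sum_j \lambda_j = 1$, recovers exactly the soft k-means updates $\hat\lambda_j$ and $\hat\mu_j$ given earlier.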

EM guarantees
$$L(X\mid\theta^{[l]}) \le L(X\mid\theta^{[l+1]})$$

If the likelihood is bounded, the sequence will converge.

A special but very commonly used case:
$L(X,Z\mid\theta)$ is a member of the exponential family.
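
This matters because of a standard property of exponential families (not derived in these notes): if $P(X,Z\mid\theta)=h(X,Z)\exp\big(\theta^\top T(X,Z)-A(\theta)\big)$, then the E-step only needs the expected sufficient statistics and the M-step reduces to moment matching:
$$\text{E-step: } \bar T = E_{Z\mid X,\theta^{[l]}}\big[T(X,Z)\big], \qquad \text{M-step: } \nabla A(\theta^{[l+1]}) = \bar T$$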

Applicable scenarios

When to use:

  • Data is only partially observable
  • Unsupervised clustering (target value unobservable)
  • Supervised learning (some instance attributes unobservable)

Some uses:

  • Train Bayesian Belief Networks
  • Unsupervised clustering (AUTOCLASS)
  • Learning Hidden Markov Models

