Andrew Ng Machine Learning Course Notes (15): Unsupervised Learning - Mixture of Naive Bayes Model and the EM Algorithm

Table of Contents

  • Preface
  • Mixture of Naive Bayes Model
  • EM Algorithm for NBMM
    • EM Algorithm for NBMM
Preface

Mixture of Naive Bayes Model (NBMM)
EM Algorithm for NBMM
Bernoulli distribution

Mixture of Naive Bayes Model

The NBMM is a generalization of Naive Bayes (NB, covered in an earlier post).
Recall the text-classification example discussed in that post: for a dataset of size m, $\{x^{(1)},x^{(2)},\dots,x^{(m)}\}$ with $x^{(i)}\in\{0,1\}^n$, each $x_j^{(i)}$ indicates whether word j appears in document i, and we model the label $z^{(i)}$ to decide whether the current email is spam.
For the mixture of naive Bayes model, we instead have to work with an unlabeled dataset of size m, $\{x^{(1)},x^{(2)},\dots,x^{(m)}\}$ with $x^{(i)}\in\{0,1\}^n$. We denote the hidden class label by $z^{(i)}$ and assume it follows a Bernoulli distribution with parameter $\phi$, i.e. $z^{(i)} \sim \mathrm{Bernoulli}(\phi)$ with $z \in \{0,1\}$.
At the same time, $p(x^{(i)} \mid z^{(i)}) = \prod_{j=1}^{n} p(x_j^{(i)} \mid z^{(i)})$ (the naive Bayes conditional-independence assumption); concretely, for example, $p(x_j^{(i)}=1 \mid z^{(i)}=0) = \phi_{j|z^{(i)}=0}$.
The whole model can be summarized as follows: for each example $x^{(i)}$, we first draw a class $z^{(i)}$ from the k classes according to a multinomial distribution (here k = 2, so it reduces to the Bernoulli above), and then generate $x^{(i)}$ from the multivariate Bernoulli distribution corresponding to that $z^{(i)}$. This process is called a Bernoulli mixture model.
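To make the generative story concrete, here is a minimal NumPy sketch of sampling one document from a two-class NBMM. The dimensions and the parameter names (`phi`, `phi_j_z1`, `phi_j_z0`) are illustrative assumptions, not values from the course.

```python
import numpy as np

rng = np.random.default_rng(0)

n = 5                             # vocabulary size (illustrative)
phi = 0.4                         # p(z = 1)
phi_j_z1 = rng.uniform(size=n)    # p(x_j = 1 | z = 1), illustrative values
phi_j_z0 = rng.uniform(size=n)    # p(x_j = 1 | z = 0), illustrative values

def sample_document():
    # Draw the hidden class label z ~ Bernoulli(phi).
    z = rng.binomial(1, phi)
    # Given z, each word indicator x_j is an independent Bernoulli draw.
    probs = phi_j_z1 if z == 1 else phi_j_z0
    x = rng.binomial(1, probs)
    return z, x

z, x = sample_document()
print(z, x)  # hidden label and the binary word-occurrence vector it generated
```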
Its joint (complete-data) log-likelihood is:

$\ell(\phi_{z},\phi_{j|z^{(i)}=1},\phi_{j|z^{(i)}=0}) = \sum_{i=1}^{m} \log p(x^{(i)}, z^{(i)})$

Since $z^{(i)}$ is unknown, we cannot maximize this directly by taking partial derivatives with respect to $\phi_{z},\phi_{j|z^{(i)}=1},\phi_{j|z^{(i)}=0}$ and setting them to zero. As with the GMM, we use the EM algorithm to solve the problem.
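For intuition about why direct maximization is awkward, note that the observed-data log-likelihood marginalizes $z^{(i)}$ out, which puts a sum inside the logarithm and couples all the parameters. A minimal sketch of evaluating it, reusing the assumed parameter names from the snippet above:

```python
import numpy as np

def log_likelihood(X, phi, phi_j_z1, phi_j_z0):
    """Observed-data log-likelihood: sum_i log sum_z p(x^(i) | z) p(z).

    X is an (m, n) binary matrix of word indicators.
    """
    # log p(x | z) under the naive Bayes factorization, for both values of z
    log_px_z1 = X @ np.log(phi_j_z1) + (1 - X) @ np.log(1 - phi_j_z1)
    log_px_z0 = X @ np.log(phi_j_z0) + (1 - X) @ np.log(1 - phi_j_z0)
    # marginalize the hidden label: p(x) = p(x|z=1) p(z=1) + p(x|z=0) p(z=0)
    px = np.exp(log_px_z1) * phi + np.exp(log_px_z0) * (1 - phi)
    return float(np.sum(np.log(px)))
```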

EM Algorithm for NBMM

Recall the general EM algorithm from the post before last:

Repeat until convergence {

  1. (E-step) For each i, set
     $w_j^{(i)} := Q_i(z^{(i)}=j) := p(z^{(i)}=j \mid x^{(i)}; \theta)$
  2. (M-step) Set
     $\theta := \arg\max_{\theta} \sum_i \sum_{z^{(i)}} Q_i(z^{(i)}) \log \frac{p(x^{(i)}, z^{(i)}; \theta)}{Q_i(z^{(i)})}$

}

To adapt it, we replace the generic parameter $\theta$ of the EM algorithm with the NBMM parameters $\phi_{z},\phi_{j|z^{(i)}=1},\phi_{j|z^{(i)}=0}$, and then work out the updates for the posteriors $w^{(i)}$ and for these parameters in turn.

Since the previous post, Andrew Ng Machine Learning Course Notes (14): Unsupervised Learning - Gaussian Mixture Models and the EM Algorithm, already went through the full derivation of EM for the GMM (the derivation for the NBMM is analogous, so it is not repeated here), we give the EM algorithm for the NBMM directly.

EM Algorithm for NBMM

Applying the general algorithm to the NBMM, the E-step and M-step become:

Repeat until convergence {

  1. (E-step) For each i, set
     $w^{(i)} := P(z^{(i)}=1 \mid x^{(i)}; \phi_{z},\phi_{j|z^{(i)}=1},\phi_{j|z^{(i)}=0})$
     This is our current guess of which class document i belongs to. By Bayes' rule it is computed as
     $w^{(i)} = \frac{p(x^{(i)} \mid z^{(i)}=1)\,\phi_{z}}{p(x^{(i)} \mid z^{(i)}=1)\,\phi_{z} + p(x^{(i)} \mid z^{(i)}=0)\,(1-\phi_{z})}$
     where each conditional probability factorizes over the words as in the naive Bayes assumption.

  2. (M-step) Set

$\phi_{j|z^{(i)}=1} := \dfrac{\sum_{i=1}^{m} w^{(i)}\, 1\{x_j^{(i)}=1\}}{\sum_{i=1}^{m} w^{(i)}}$

$\phi_{j|z^{(i)}=0} := \dfrac{\sum_{i=1}^{m} (1-w^{(i)})\, 1\{x_j^{(i)}=1\}}{\sum_{i=1}^{m} (1-w^{(i)})}$

$\phi_{z} := \dfrac{\sum_{i=1}^{m} w^{(i)}}{m}$

}
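The updates above translate almost line by line into code. Below is a minimal NumPy sketch of the EM loop for a two-class NBMM; the function name, initialization, probability clipping, and fixed iteration count are illustrative assumptions on my part, not part of the lecture notes.

```python
import numpy as np

def nbmm_em(X, n_iter=50, seed=0):
    """EM for a two-class mixture of naive Bayes (Bernoulli) model.

    X: (m, n) binary matrix, X[i, j] = 1 iff word j occurs in document i.
    Returns (phi, phi_j_z1, phi_j_z0, w) with w[i] = P(z=1 | x^(i)).
    """
    rng = np.random.default_rng(seed)
    m, n = X.shape
    phi = 0.5
    phi_j_z1 = rng.uniform(0.25, 0.75, size=n)
    phi_j_z0 = rng.uniform(0.25, 0.75, size=n)

    for _ in range(n_iter):
        # E-step: w[i] = P(z=1 | x^(i)) by Bayes' rule under the current parameters.
        log_p1 = np.log(phi) + X @ np.log(phi_j_z1) + (1 - X) @ np.log(1 - phi_j_z1)
        log_p0 = np.log(1 - phi) + X @ np.log(phi_j_z0) + (1 - X) @ np.log(1 - phi_j_z0)
        w = 1.0 / (1.0 + np.exp(log_p0 - log_p1))

        # M-step: the three weighted-frequency updates given above.
        phi_j_z1 = (w @ X) / np.sum(w)
        phi_j_z0 = ((1 - w) @ X) / np.sum(1 - w)
        phi = np.mean(w)

        # Practical tweak (not in the notes): keep probabilities away from 0/1
        # so the logs in the next E-step stay finite.
        eps = 1e-6
        phi_j_z1 = np.clip(phi_j_z1, eps, 1 - eps)
        phi_j_z0 = np.clip(phi_j_z0, eps, 1 - eps)
        phi = float(np.clip(phi, eps, 1 - eps))

    return phi, phi_j_z1, phi_j_z0, w
```

In practice one would track the observed-data log-likelihood and stop when it no longer improves, rather than running a fixed number of iterations; after convergence, thresholding $w^{(i)}$ at 0.5 assigns each document to one of the two clusters.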
