Data Learning (3) · Generative Learning Algorithms

Compiled from the author's lecture notes, [email protected]

Preview

  • Discriminative vs. generative models
  • Gaussian discriminant analysis
  • Naive Bayes

Two Learning Approaches

Classify the input data $x$ into one of two classes, $y\in\{0,1\}$.

Discriminative learning algorithms

These algorithms learn the conditional probability $p(y|x)$, or learn a mapping from inputs directly to labels.
Examples: linear regression, logistic regression, k-nearest neighbors, ...

Generative learning algorithms

These algorithms model the joint probability $p(x,y)$.

  • A generative algorithm learns $p(x|y)$ and $p(y)$.
  • $p(y)$ is called the class prior.
  • Bayes' rule then converts these into $p(y|x)$.

Bayes' Rule

$$p(y|x)=\frac{p(x|y)p(y)}{p(x)}$$
$$\arg\max_y p(y|x)=\arg\max_y\frac{p(x|y)p(y)}{p(x)}=\arg\max_y p(x|y)p(y)$$
There is no need to compute $p(x)$ when making predictions.
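As a minimal numeric illustration of this shortcut (the likelihood and prior values below are made up), a sketch in Python:

```python
import numpy as np

# Hypothetical values of p(x|y) and p(y) for one input x
p_x_given_y = np.array([0.02, 0.05])   # p(x|y=0), p(x|y=1)
p_y = np.array([0.7, 0.3])             # p(y=0),   p(y=1)

# argmax_y p(y|x) = argmax_y p(x|y) p(y): the p(x) term cancels
joint = p_x_given_y * p_y
y_hat = np.argmax(joint)               # predicted class

# If the actual posterior is wanted, normalize by p(x) = sum_y p(x|y) p(y)
posterior = joint / joint.sum()
print(y_hat, posterior)                # 1, [0.483 0.517]
```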

1 Generative Models

Generative classification algorithms:

  • Continuous inputs: Gaussian discriminant analysis
  • Discrete inputs: Naive Bayes

1.1 The Multivariate Gaussian Distribution $N(\mu,\Sigma)$

  • $\mu\in\mathbb{R}^n$ is the mean vector.
  • $\Sigma\in\mathbb{R}^{n\times n}$ is the covariance matrix; $\Sigma$ is symmetric and positive definite (SPD).

$$p(x;\mu,\Sigma)=\frac{1}{(2\pi)^{n/2}|\Sigma|^{1/2}}e^{-\frac{1}{2}(x-\mu)^T\Sigma^{-1}(x-\mu)}$$
$$E[X]=\mu,\qquad \mathrm{Cov}(X)=E[(X-E[X])(X-E[X])^T]=\Sigma$$
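A quick sketch of evaluating this density with SciPy; the particular $\mu$ and $\Sigma$ below are arbitrary illustrative values:

```python
import numpy as np
from scipy.stats import multivariate_normal

mu = np.array([0.0, 1.0])          # mean vector, mu in R^2
Sigma = np.array([[1.0, 0.3],
                  [0.3, 2.0]])     # symmetric positive-definite covariance

x = np.array([0.5, 0.5])
# SciPy handles the (2 pi)^{n/2} |Sigma|^{1/2} normalizer and the quadratic form
print(multivariate_normal.pdf(x, mean=mu, cov=Sigma))
```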

1.2 Gaussian Discriminant Analysis (GDA)

Given parameters $\phi,\mu_0,\mu_1,\Sigma$:
$$y\sim \mathrm{Bernoulli}(\phi),\qquad x|y=0\sim N(\mu_0,\Sigma),\qquad x|y=1\sim N(\mu_1,\Sigma)$$
$$p(y)=\phi^y(1-\phi)^{1-y}$$
$$p(x|y=0)=\frac{1}{(2\pi)^{n/2}|\Sigma|^{1/2}}e^{-\frac{1}{2}(x-\mu_0)^T\Sigma^{-1}(x-\mu_0)}$$
$$p(x|y=1)=\frac{1}{(2\pi)^{n/2}|\Sigma|^{1/2}}e^{-\frac{1}{2}(x-\mu_1)^T\Sigma^{-1}(x-\mu_1)}$$
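Sampling from this generative story is direct; a sketch with illustrative parameter values (not fitted to any dataset):

```python
import numpy as np

rng = np.random.default_rng(0)
phi = 0.4                                   # p(y=1)
mu0 = np.array([0.0, 0.0])
mu1 = np.array([2.0, 2.0])
Sigma = np.array([[1.0, 0.2],
                  [0.2, 1.0]])              # shared covariance for both classes

m = 500
y = rng.binomial(1, phi, size=m)            # y ~ Bernoulli(phi)
mu_y = np.where(y[:, None] == 1, mu1, mu0)  # mu_{y^{(i)}} for each example
x = np.array([rng.multivariate_normal(mu, Sigma) for mu in mu_y])
```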
The log-likelihood of the data:
$$\ell(\phi,\mu_0,\mu_1,\Sigma)=\log\prod_{i=1}^{m}p(x^{(i)},y^{(i)};\phi,\mu_0,\mu_1,\Sigma)=\log\prod_{i=1}^{m}p(x^{(i)}|y^{(i)};\mu_0,\mu_1,\Sigma)\,p(y^{(i)};\phi)$$
Maximum likelihood estimation:

$$\begin{aligned} \ell(\phi,\mu_0,\mu_1,\Sigma) &=\log\prod_{i=1}^m p(x^{(i)},y^{(i)})=\log\prod_{i=1}^m p(x^{(i)}|y^{(i)})\,p(y^{(i)}) \\ &=\sum_{i=1}^m \log p(x^{(i)}|y^{(i)})+\sum_{i=1}^m \log p(y^{(i)}) \\ &=\sum_{i=1}^m \log\left(p(x^{(i)}|y^{(i)}=0)^{1-y^{(i)}}\, p(x^{(i)}|y^{(i)}=1)^{y^{(i)}}\right)+\sum_{i=1}^m \log p(y^{(i)}) \\ &=\sum_{i=1}^m (1-y^{(i)})\log p(x^{(i)}|y^{(i)}=0)+\sum_{i=1}^m y^{(i)}\log p(x^{(i)}|y^{(i)}=1)+\sum_{i=1}^m \log p(y^{(i)}) \end{aligned}$$
Differentiating with respect to $\phi$:
$$\begin{aligned} \frac{\partial\,\ell(\phi,\mu_0,\mu_1,\Sigma)}{\partial\phi}&=\frac{\partial\sum_{i=1}^m \log p(y^{(i)})}{\partial\phi} \\ &=\frac{\partial\sum_{i=1}^m \log\left(\phi^{y^{(i)}}(1-\phi)^{1-y^{(i)}}\right)}{\partial\phi} \\ &=\frac{\partial\sum_{i=1}^m \left(y^{(i)}\log\phi+(1-y^{(i)})\log(1-\phi)\right)}{\partial\phi} \\ &=\sum_{i=1}^m\left(y^{(i)}\frac{1}{\phi}-(1-y^{(i)})\frac{1}{1-\phi}\right) \\ &=\sum_{i=1}^m\left(I(y^{(i)}=1)\frac{1}{\phi}-I(y^{(i)}=0)\frac{1}{1-\phi}\right) \end{aligned}$$
Setting this to zero gives:
$$\phi=\frac{\sum_{i=1}^m I(y^{(i)}=1)}{\sum_{i=1}^m I(y^{(i)}=0)+\sum_{i=1}^m I(y^{(i)}=1)}=\frac{\sum_{i=1}^m I(y^{(i)}=1)}{m}$$
Differentiating with respect to $\mu_0$:
$$\begin{aligned} \frac{\partial\,\ell(\phi,\mu_0,\mu_1,\Sigma)}{\partial\mu_0}&=\frac{\partial\sum_{i=1}^m (1-y^{(i)})\log p(x^{(i)}|y^{(i)}=0)}{\partial\mu_0} \\ &=\frac{\partial\sum_{i=1}^m (1-y^{(i)})\left(\log\frac{1}{\sqrt{(2\pi)^n|\Sigma|}}-\frac{1}{2}(x^{(i)}-\mu_0)^T\Sigma^{-1}(x^{(i)}-\mu_0)\right)}{\partial\mu_0} \\ &=\sum_{i=1}^m (1-y^{(i)})\,\Sigma^{-1}(x^{(i)}-\mu_0) \\ &=\sum_{i=1}^m I(y^{(i)}=0)\,\Sigma^{-1}(x^{(i)}-\mu_0) \end{aligned}$$
Setting this to zero:
$$\mu_0=\frac{\sum_{i=1}^m I(y^{(i)}=0)\,x^{(i)}}{\sum_{i=1}^m I(y^{(i)}=0)}$$
and, by the same argument applied to the $y=1$ terms:
$$\mu_1=\frac{\sum_{i=1}^m I(y^{(i)}=1)\,x^{(i)}}{\sum_{i=1}^m I(y^{(i)}=1)}$$
For $\Sigma$, collect the terms of the log-likelihood that involve it:
$$\begin{aligned} &\sum_{i=1}^m (1-y^{(i)})\log p(x^{(i)}|y^{(i)}=0)+\sum_{i=1}^m y^{(i)}\log p(x^{(i)}|y^{(i)}=1)\\ &=\sum_{i=1}^m \log\frac{1}{\sqrt{(2\pi)^n|\Sigma|}}-\frac{1}{2}\sum_{i=1}^m (x^{(i)}-\mu_{y^{(i)}})^T\Sigma^{-1}(x^{(i)}-\mu_{y^{(i)}})\\ &=\sum_{i=1}^m\left(-\frac{n}{2}\log(2\pi)-\frac{1}{2}\log|\Sigma|\right)-\frac{1}{2}\sum_{i=1}^m (x^{(i)}-\mu_{y^{(i)}})^T\Sigma^{-1}(x^{(i)}-\mu_{y^{(i)}})\end{aligned}$$
Then differentiate with respect to $\Sigma$:
$$\frac{\partial\,\ell(\phi,\mu_0,\mu_1,\Sigma)}{\partial\Sigma}=-\frac{m}{2}\Sigma^{-1}+\frac{1}{2}\sum_{i=1}^m \Sigma^{-1}(x^{(i)}-\mu_{y^{(i)}})(x^{(i)}-\mu_{y^{(i)}})^T\Sigma^{-1}$$
using the matrix identities
$$\frac{\partial \log|\Sigma|}{\partial\Sigma}=\Sigma^{-1},\qquad \frac{\partial\, z^T\Sigma^{-1}z}{\partial\Sigma}=-\Sigma^{-1}zz^T\Sigma^{-1}$$
Setting this to zero yields:
$$\Sigma=\frac{1}{m}\sum_{i=1}^m(x^{(i)}-\mu_{y^{(i)}})(x^{(i)}-\mu_{y^{(i)}})^T$$
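These four closed-form estimates translate directly into NumPy. A sketch, assuming x is an m×n data matrix and y a 0/1 label vector (e.g. the samples drawn in the snippet above); fit_gda is a hypothetical helper name:

```python
import numpy as np

def fit_gda(x, y):
    """Closed-form MLEs of phi, mu0, mu1 and the shared Sigma."""
    m = len(y)
    phi = y.mean()                              # fraction of examples with y=1
    mu0 = x[y == 0].mean(axis=0)                # mean of the y=0 examples
    mu1 = x[y == 1].mean(axis=0)                # mean of the y=1 examples
    mu_y = np.where(y[:, None] == 1, mu1, mu0)  # mu_{y^{(i)}} for each row
    d = x - mu_y
    Sigma = d.T @ d / m                         # (1/m) sum (x - mu_y)(x - mu_y)^T
    return phi, mu0, mu1, Sigma
```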


1.3 GDA and Logistic Regression

$p(y=1|x;\phi,\mu_0,\mu_1,\Sigma)$ can be written as a logistic function (with the intercept convention $x_0=1$):
$$p(y=1|x;\phi,\Sigma,\mu_0,\mu_1)=\frac{1}{1+e^{-\theta^Tx}}$$
$$\theta=\begin{bmatrix}\log\frac{\phi}{1-\phi}-\frac{1}{2}\left(\mu_1^T\Sigma^{-1}\mu_1-\mu_0^T\Sigma^{-1}\mu_0\right)\\\Sigma^{-1}(\mu_1-\mu_0)\end{bmatrix}$$
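A sketch that builds this $\theta$ from fitted GDA parameters and evaluates the resulting posterior; gda_to_logistic and posterior_y1 are hypothetical helper names:

```python
import numpy as np

def gda_to_logistic(phi, mu0, mu1, Sigma):
    """theta0 (intercept) and theta such that p(y=1|x) = sigmoid(theta0 + theta^T x)."""
    P = np.linalg.inv(Sigma)
    theta0 = np.log(phi / (1 - phi)) - 0.5 * (mu1 @ P @ mu1 - mu0 @ P @ mu0)
    theta = P @ (mu1 - mu0)
    return theta0, theta

def posterior_y1(x, theta0, theta):
    # Sigmoid of the linear score; works for a single x or an m x n batch
    return 1.0 / (1.0 + np.exp(-(theta0 + x @ theta)))
```

So a GDA fit implies a decision boundary of exactly the logistic form, even though the parameters are estimated differently.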


GDA:

  • Maximizes the joint likelihood $\prod_{i=1}^mp(x^{(i)},y^{(i)})$.
  • Model assumptions: $x|y=b\sim N(\mu_b,\Sigma)$, $y\sim \mathrm{Bernoulli}(\phi)$.
  • When the assumptions are correct, GDA is asymptotically efficient and makes better use of limited data.

Logistic regression:

  • Maximizes the conditional likelihood $\prod_{i=1}^mp(y^{(i)}|x^{(i)})$.
  • Model assumption: $p(y|x)$ is a logistic function.
  • More robust and less sensitive to incorrect modeling assumptions.

1.4 Naive Bayes

Naive Bayes is a simple generative learning algorithm for discrete inputs.
Worked example: spam classification.

Given an email $x$, decide whether it is spam ($y=1$) or not ($y=0$).
Each email is represented by a binary vector of dictionary size: $x_i=1$ means the $i$-th dictionary word appears in the email, and $x_i=0$ means it does not.
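A tiny sketch of this representation; the vocabulary and email text are made-up examples:

```python
vocab = ["buy", "cheap", "hello", "meeting", "now"]  # toy dictionary
email = "buy cheap pills now"

# x_i = 1 iff the i-th dictionary word occurs in the email
x = [1 if word in email.split() else 0 for word in vocab]
print(x)  # [1, 1, 0, 0, 1]
```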

The Naive Bayes Model

The Naive Bayes assumption: the features are conditionally independent given the class,
$$p(x_1,x_2,...,x_n|y)=\prod_{i=1}^np(x_i|y)$$
Parameter learning

The multivariate Bernoulli event model:
$$p(x,y)=p(y)\prod_{i=1}^np(x_i|y)$$

  • Model the spam/non-spam label as a Bernoulli variable: $p(y=1)=\phi_y$.
  • Given $y$, the presence of each word is Bernoulli: $p(x_i=1|y)=\phi_{i|y}$.

Maximum likelihood:
$$L=\prod_{i=1}^mp(x^{(i)},y^{(i)})$$
Maximizing this yields the parameter estimates:
$$\phi_y=\frac{\sum_{i=1}^m I(y^{(i)}=1)}{m},\qquad \phi_{j|y=1}=\frac{\sum_{i=1}^m I(x_j^{(i)}=1\wedge y^{(i)}=1)}{\sum_{i=1}^m I(y^{(i)}=1)},\qquad \phi_{j|y=0}=\frac{\sum_{i=1}^m I(x_j^{(i)}=1\wedge y^{(i)}=0)}{\sum_{i=1}^m I(y^{(i)}=0)}$$
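A sketch of these estimates in NumPy, assuming X is a binary m×n matrix and y a 0/1 NumPy vector; fit_naive_bayes is a hypothetical helper name:

```python
import numpy as np

def fit_naive_bayes(X, y):
    """MLEs of phi_y and phi_{j|y} for the multivariate Bernoulli model."""
    phi_y = y.mean()                   # fraction of spam emails
    phi_j_y1 = X[y == 1].mean(axis=0)  # fraction of spam emails containing word j
    phi_j_y0 = X[y == 0].mean(axis=0)  # same, for non-spam emails
    return phi_y, phi_j_y0, phi_j_y1
```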
Making predictions
Given a new example, compute $p(y=1|x)$:
$$p(y=1|x)=\frac{p(x|y=1)p(y=1)}{p(x)}=\frac{p(x|y=1)p(y=1)}{p(x|y=1)p(y=1)+p(x|y=0)p(y=0)}$$
$$=\frac{\prod_{i=1}^np(x_i|y=1)\,p(y=1)}{\prod_{i=1}^np(x_i|y=1)\,p(y=1)+\prod_{i=1}^np(x_i|y=0)\,p(y=0)}$$
Predict $y=1$ if $p(y=1|x)>0.5$.
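A prediction sketch; working in log space avoids numerical underflow when multiplying many small probabilities. It assumes no $\phi$ estimate is exactly 0 or 1, which the Laplace smoothing discussed next guarantees (predict_y1 is a hypothetical helper name):

```python
import numpy as np

def predict_y1(x, phi_y, phi_j_y0, phi_j_y1):
    """p(y=1|x) for a binary feature vector x, computed in log space."""
    def log_joint(phis, prior):
        # log p(x|y) + log p(y) = sum_j log p(x_j|y) + log p(y)
        return np.sum(x * np.log(phis) + (1 - x) * np.log(1 - phis)) + np.log(prior)

    l1 = log_joint(phi_j_y1, phi_y)
    l0 = log_joint(phi_j_y0, 1 - phi_y)
    return 1.0 / (1.0 + np.exp(l0 - l1))  # = e^{l1} / (e^{l1} + e^{l0})
```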


Laplace Smoothing

  • If a word in a new email never appeared in the training set, its estimates are $\phi_{i|y=1}=\phi_{i|y=0}=0$, so both products $\prod_i p(x_i|y)$ vanish and $p(y=1|x)=\frac{0}{0}$ cannot be computed.
    Notation:
    $n_1$: the number of times word $x_1$ appears across all spam emails; if $x_1$ never appears, $n_1=0$.
    $n$: the total number of word occurrences in all documents of class $c_1$.
    This gives the estimate $p(x_1|c_1)=n_1/n$.
    Laplace smoothing modifies it to:
    $p(x_1|c_1)=(n_1+1)/(n+N)$
    $p(x_2|c_1)=(n_2+1)/(n+N)$
    where $N$ is the number of distinct words (the vocabulary size). The corrected denominator keeps the probabilities summing to 1; see the sketch below.
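A small sketch of the smoothed estimate $(n_j+1)/(n+N)$ for one class; the word counts are made-up values:

```python
import numpy as np

def smoothed_word_probs(word_counts, vocab_size):
    """Laplace-smoothed p(word_j | class) = (n_j + 1) / (n + N)."""
    n = word_counts.sum()  # total word occurrences seen in this class
    return (word_counts + 1) / (n + vocab_size)

counts = np.array([3, 0, 7, 1])  # n_j for each of N = 4 vocabulary words
probs = smoothed_word_probs(counts, vocab_size=len(counts))
print(probs, probs.sum())  # the unseen word gets 1/15 instead of 0; sum is 1.0
```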

2 Exercise: Quadratic Discriminant Analysis

Source code: https://github.com/Miraclemin/Quadratic-Discriminant-Analysis
