Naive Bayes


Notes based on CS229, lecture notes from Zhejiang University's machine learning course, and Baidu Baike, with additional material from blogs.


We will use an example to illustrate this algorithm; the example is still used in practice today.

Spam email classifier
$X$ is a 0/1 vector corresponding to a dictionary, where each dimension represents a word; if a word appears in an email, its corresponding entry equals 1.
We assume that the $x_i$ are conditionally independent given $y$ (the Naive Bayes assumption), although this is clearly not true given the meaning of email text.
We choose the top 10,000 most common words as the dictionary.
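As a concrete illustration, here is a minimal sketch of this 0/1 encoding; the tokenization and the tiny dictionary are hypothetical stand-ins for the real top-10,000 word list.

```python
import numpy as np

def email_to_vector(email_text, dictionary):
    """Map an email to a 0/1 vector over the dictionary words."""
    words = set(email_text.lower().split())  # crude tokenization, for illustration
    return np.array([1 if w in words else 0 for w in dictionary])

# Hypothetical 4-word dictionary standing in for the top 10,000 words.
dictionary = ["buy", "cheap", "hello", "meeting"]
x = email_to_vector("Buy cheap watches now", dictionary)
print(x)  # [1 1 0 0]
```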
By the chain rule of probability,
$$P(x_1, \ldots, x_{10000} \mid y) = P(x_1 \mid y)\,P(x_2 \mid x_1, y) \cdots P(x_{10000} \mid x_{9999}, x_{9998}, \ldots, x_1, y)$$
and the conditional independence assumption collapses this to
$$P(x_1, \ldots, x_{10000} \mid y) = \prod_{i=1}^{10000} P(x_i \mid y)$$
Parameters:
$$\phi_{j|y=1} = P(x_j = 1 \mid y = 1)$$
$$\phi_{j|y=0} = P(x_j = 1 \mid y = 0)$$
$$\phi_y = P(y = 1)$$
Here $y = 1$ means the email is spam.
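Given these parameters, a new email $x$ is classified by computing the posterior via Bayes' rule, predicting spam when it exceeds $1/2$:
$$P(y=1 \mid x) = \frac{\left(\prod_{j=1}^{10000} P(x_j \mid y=1)\right)\phi_y}{\left(\prod_{j=1}^{10000} P(x_j \mid y=1)\right)\phi_y + \left(\prod_{j=1}^{10000} P(x_j \mid y=0)\right)(1-\phi_y)}$$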

Joint likelihood:
$$L(\phi_y, \phi_{j|y}) = \prod_{i=1}^m p\big(x^{(i)}, y^{(i)}; \phi_y, \phi_{j|y}\big)$$
MLE:
$$\phi_y = \frac{\sum_{i=1}^m 1\{y^{(i)} = 1\}}{m}$$
$$\phi_{j|y=1} = \frac{\sum_{i=1}^m 1\{x_j^{(i)} = 1,\ y^{(i)} = 1\}}{\sum_{i=1}^m 1\{y^{(i)} = 1\}}$$
$$\phi_{j|y=0} = \frac{\sum_{i=1}^m 1\{x_j^{(i)} = 1,\ y^{(i)} = 0\}}{\sum_{i=1}^m 1\{y^{(i)} = 0\}}$$
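A minimal numpy sketch of these estimates (the function and variable names are illustrative; `X` is assumed to be an $m \times n$ 0/1 matrix of training emails and `y` the 0/1 label vector):

```python
import numpy as np

def fit_naive_bayes(X, y):
    """Maximum-likelihood estimates for Bernoulli Naive Bayes.

    X: (m, n) 0/1 matrix, one row per training email
    y: (m,) 0/1 labels, 1 = spam
    """
    phi_y = np.mean(y == 1)            # P(y = 1)
    phi_j_y1 = X[y == 1].mean(axis=0)  # P(x_j = 1 | y = 1), one entry per word
    phi_j_y0 = X[y == 0].mean(axis=0)  # P(x_j = 1 | y = 0)
    return phi_y, phi_j_y1, phi_j_y0

X = np.array([[1, 1, 0, 0], [0, 0, 1, 1], [1, 0, 0, 0]])
y = np.array([1, 0, 1])
phi_y, phi_j_y1, phi_j_y0 = fit_naive_bayes(X, y)
# phi_y = 2/3, phi_j_y1 = [1.0, 0.5, 0.0, 0.0], phi_j_y0 = [0.0, 0.0, 1.0, 1.0]
```

Note the zero entries in `phi_j_y1` and `phi_j_y0`: any word never seen in one class gets probability exactly 0, which is the problem Laplace smoothing addresses below.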

Laplace smoothing
If a word $j$ in the dictionary never appears in any training email (e.g. a rare word that only shows up at test time), then $\sum_{i=1}^m 1\{x_j^{(i)} = 1,\ y^{(i)} = 1\} = 0$ and hence $\phi_{j|y=1} = 0$, and likewise $\phi_{j|y=0} = 0$. The posterior for an email containing that word then takes the indeterminate form $\frac{0}{0}$. This is not robust, so we use Laplace smoothing to correct the estimates.
Add 1 to each count in the numerator and, since $x_j$ takes two possible values, add 2 to the denominator:
$$\phi_{j|y=1} = \frac{\sum_{i=1}^m 1\{x_j^{(i)} = 1,\ y^{(i)} = 1\} + 1}{\sum_{i=1}^m 1\{y^{(i)} = 1\} + 2}$$
$$\phi_{j|y=0} = \frac{\sum_{i=1}^m 1\{x_j^{(i)} = 1,\ y^{(i)} = 0\} + 1}{\sum_{i=1}^m 1\{y^{(i)} = 0\} + 2}$$
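A sketch of the smoothed estimates together with prediction; computing the posterior in log space is a standard practical trick to avoid numerical underflow when multiplying 10,000 small probabilities (names are again illustrative):

```python
import numpy as np

def fit_naive_bayes_smoothed(X, y):
    """Laplace-smoothed estimates: +1 per count, +2 in the denominator."""
    m1, m0 = np.sum(y == 1), np.sum(y == 0)
    phi_y = m1 / (m1 + m0)
    phi_j_y1 = (X[y == 1].sum(axis=0) + 1) / (m1 + 2)
    phi_j_y0 = (X[y == 0].sum(axis=0) + 1) / (m0 + 2)
    return phi_y, phi_j_y1, phi_j_y0

def predict(x, phi_y, phi_j_y1, phi_j_y0):
    """Return 1 (spam) iff P(y=1|x) > P(y=0|x), comparing log posteriors."""
    log_p1 = np.log(phi_y) + np.sum(
        x * np.log(phi_j_y1) + (1 - x) * np.log(1 - phi_j_y1))
    log_p0 = np.log(1 - phi_y) + np.sum(
        x * np.log(phi_j_y0) + (1 - x) * np.log(1 - phi_j_y0))
    return int(log_p1 > log_p0)
```

Because smoothing keeps every $\phi_{j|y}$ strictly between 0 and 1, the logarithms above are always well defined, even for words never seen during training.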
