The sample set is as follows:
| Day | Outlook | Temperature | Humidity | Wind | PlayTennis |
| --- | --- | --- | --- | --- | --- |
| D1 | Sunny | Hot | High | Weak | No |
| D2 | Sunny | Hot | High | Strong | No |
| D3 | Overcast | Hot | High | Weak | Yes |
| D4 | Rain | Mild | High | Weak | Yes |
| D5 | Rain | Cool | Normal | Weak | Yes |
| D6 | Rain | Cool | Normal | Strong | No |
| D7 | Overcast | Cool | Normal | Strong | Yes |
| D8 | Sunny | Mild | High | Weak | No |
| D9 | Sunny | Cool | Normal | Weak | Yes |
| D10 | Rain | Mild | Normal | Weak | Yes |
| D11 | Sunny | Mild | Normal | Strong | Yes |
| D12 | Overcast | Mild | High | Strong | Yes |
| D13 | Overcast | Hot | Normal | Weak | Yes |
| D14 | Rain | Mild | High | Strong | No |
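The data set provides 14 training samples. We will use the data in this table with a naive Bayes classifier to classify the following new instance:
x = (Outlook=Sunny, Temperature=Cool, Humidity=High, Wind=Strong)
In this example the attribute vector is X = (Outlook, Temperature, Humidity, Wind) and the class set is Y = {Yes, No}. We need to use the training data to compute the posterior probabilities P(Yes|x) and P(No|x). If P(Yes|x) > P(No|x), the new instance is classified as Yes; otherwise it is classified as No.
To compute the posterior probabilities, we need the prior probabilities P(Yes) and P(No) and the class-conditional probabilities P(xi|Y).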
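For reference, the rule this example applies can be written out as below. This is a sketch that assumes the standard naive Bayes conditional-independence assumption (the attributes are treated as independent given the class); the index i over the four attributes and the symbol \hat{y} for the predicted class are notation introduced here, not taken from the original text.

```latex
P(Y \mid x) = \frac{P(Y)\prod_{i=1}^{4} P(x_i \mid Y)}{P(x)}
\;\propto\; P(Y)\prod_{i=1}^{4} P(x_i \mid Y),
\qquad
\hat{y} = \arg\max_{Y \in \{\mathrm{Yes},\,\mathrm{No}\}} P(Y)\prod_{i=1}^{4} P(x_i \mid Y)
```

Because P(x) is the same for both classes, it can be dropped when the two classes are only being compared.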
The prior probabilities are computed as follows:
Since 9 samples belong to Yes and 5 samples belong to No, P(Yes) = 9/14 and P(No) = 5/14.
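As a quick check, the priors can be recomputed by counting labels. This is a minimal sketch; the names `labels`, `counts`, and `priors` are illustrative, and the label list is copied from the PlayTennis column of the table.

```python
from collections import Counter
from fractions import Fraction

# PlayTennis labels for D1..D14, in table order.
labels = ["No", "No", "Yes", "Yes", "Yes", "No", "Yes",
          "No", "Yes", "Yes", "Yes", "Yes", "Yes", "No"]

# The prior of each class is its relative frequency in the training set.
counts = Counter(labels)
priors = {label: Fraction(n, len(labels)) for label, n in counts.items()}

print(priors["Yes"], priors["No"])  # 9/14 5/14
```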
The class-conditional probabilities are computed as follows:
P(Outlook=Sunny|Yes)=2/9; P(Outlook=Sunny|No)=3/5;
P(Temperature=Cool|Yes)=3/9; P(Temperature=Cool|No)=1/5;
P(Humidity=High|Yes)=3/9; P(Humidity=High|No)=4/5;
P(Wind=Strong|Yes)=3/9; P(Wind=Strong|No)=3/5;
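The class-conditional probabilities can be reproduced by counting within each class. The sketch below assumes plain relative-frequency estimates with no smoothing, as in the text; `DATA`, `ATTRIBUTES`, and `conditional` are hypothetical names chosen for illustration.

```python
from fractions import Fraction

# Rows are (Outlook, Temperature, Humidity, Wind, PlayTennis) for D1..D14.
DATA = [
    ("Sunny", "Hot", "High", "Weak", "No"),
    ("Sunny", "Hot", "High", "Strong", "No"),
    ("Overcast", "Hot", "High", "Weak", "Yes"),
    ("Rain", "Mild", "High", "Weak", "Yes"),
    ("Rain", "Cool", "Normal", "Weak", "Yes"),
    ("Rain", "Cool", "Normal", "Strong", "No"),
    ("Overcast", "Cool", "Normal", "Strong", "Yes"),
    ("Sunny", "Mild", "High", "Weak", "No"),
    ("Sunny", "Cool", "Normal", "Weak", "Yes"),
    ("Rain", "Mild", "Normal", "Weak", "Yes"),
    ("Sunny", "Mild", "Normal", "Strong", "Yes"),
    ("Overcast", "Mild", "High", "Strong", "Yes"),
    ("Overcast", "Hot", "Normal", "Weak", "Yes"),
    ("Rain", "Mild", "High", "Strong", "No"),
]
ATTRIBUTES = ("Outlook", "Temperature", "Humidity", "Wind")


def conditional(attribute, value, label):
    """Estimate P(attribute=value | label) by relative frequency."""
    col = ATTRIBUTES.index(attribute)
    in_class = [row for row in DATA if row[-1] == label]
    matches = sum(1 for row in in_class if row[col] == value)
    return Fraction(matches, len(in_class))


# The new instance x from the text.
x = {"Outlook": "Sunny", "Temperature": "Cool",
     "Humidity": "High", "Wind": "Strong"}
for attribute, value in x.items():
    print(attribute, value,
          conditional(attribute, value, "Yes"),
          conditional(attribute, value, "No"))
```

Note that `Fraction` reduces its results, so 3/9 is printed as 1/3.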
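The posterior probabilities are computed as follows. The denominator P(x) is the same for both classes, so it is omitted and the values below are unnormalized:
P(Yes|x) ∝ P(Outlook=Sunny|Yes)×P(Temperature=Cool|Yes)×P(Humidity=High|Yes)
×P(Wind=Strong|Yes)×P(Yes) = 2/9×3/9×3/9×3/9×9/14 = 2/243×9/14 = 1/189 ≈ 0.00529
P(No|x) ∝ P(Outlook=Sunny|No)×P(Temperature=Cool|No)×P(Humidity=High|No)
×P(Wind=Strong|No)×P(No) = 3/5×1/5×4/5×3/5×5/14 = 18/875 ≈ 0.02057
Since P(No|x) > P(Yes|x), the sample is classified as No.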
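The arithmetic for the final step can be verified with exact fractions. This sketch plugs in the priors and class-conditional probabilities stated above rather than recounting them; `score`, `scores`, and `posteriors` are illustrative names. The normalization at the end is an extra step not in the original calculation, included only to show the values as proper probabilities.

```python
from fractions import Fraction as F

# Priors and class-conditional probabilities for
# x = (Sunny, Cool, High, Strong), as given in the text.
priors = {"Yes": F(9, 14), "No": F(5, 14)}
likelihoods = {
    "Yes": [F(2, 9), F(3, 9), F(3, 9), F(3, 9)],
    "No": [F(3, 5), F(1, 5), F(4, 5), F(3, 5)],
}


def score(label):
    """Unnormalized posterior: P(label) times the product of P(x_i | label)."""
    result = priors[label]
    for p in likelihoods[label]:
        result *= p
    return result


scores = {label: score(label) for label in priors}

# Dividing by the sum of the scores gives posteriors that add up to 1.
total = sum(scores.values())
posteriors = {label: s / total for label, s in scores.items()}

# The predicted class is the one with the larger score.
prediction = max(scores, key=scores.get)

print(scores["Yes"], scores["No"])  # 1/189 18/875
print(prediction)                   # No
```

After normalization, P(No|x) = 486/611 ≈ 0.795 and P(Yes|x) = 125/611 ≈ 0.205, which leads to the same decision.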