Bayes Classifier Example: Study Notes

The sample set is as follows:
Day  Outlook   Temperature  Humidity  Wind    PlayTennis
D1   Sunny     Hot          High      Weak    No
D2   Sunny     Hot          High      Strong  No
D3   Overcast  Hot          High      Weak    Yes
D4   Rain      Mild         High      Weak    Yes
D5   Rain      Cool         Normal    Weak    Yes
D6   Rain      Cool         Normal    Strong  No
D7   Overcast  Cool         Normal    Strong  Yes
D8   Sunny     Mild         High      Weak    No
D9   Sunny     Cool         Normal    Weak    Yes
D10  Rain      Mild         Normal    Weak    Yes
D11  Sunny     Mild         Normal    Strong  Yes
D12  Overcast  Mild         High      Strong  Yes
D13  Overcast  Hot          Normal    Weak    Yes
D14  Rain      Mild         High      Strong  No


The data set provides 14 training samples. We will use the data in this table with a naive Bayes classifier to classify the following new instance:
x = (Outlook=Sunny,Temperature=Cool,Humidity=High,Wind=Strong)

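In this example, the attribute vector is X = (Outlook, Temperature, Humidity, Wind) and the class set is Y = {Yes, No}. We use the training data to compute the posterior probabilities P(Yes|x) and P(No|x): if P(Yes|x) > P(No|x), the new instance is classified as Yes; otherwise it is classified as No.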

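To compute the posteriors, we need the prior probabilities P(Yes) and P(No) and the class-conditional probabilities P(xi|Y). The naive Bayes assumption is that the attributes are conditionally independent given the class, so P(x|Y) = P(x1|Y)×P(x2|Y)×P(x3|Y)×P(x4|Y), and each factor can be estimated separately by counting.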
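Written out, Bayes' rule combined with the conditional-independence assumption gives the decision rule used in the rest of these notes. This is the standard textbook formulation; since P(x) is the same for both classes, it can be dropped when only the comparison matters:

```latex
P(y \mid x) = \frac{P(y)\,P(x \mid y)}{P(x)}
            = \frac{P(y)\prod_{i=1}^{4} P(x_i \mid y)}{P(x)}
            \propto P(y)\prod_{i=1}^{4} P(x_i \mid y),
\qquad
\hat{y} = \arg\max_{y \in \{\mathrm{Yes},\,\mathrm{No}\}} P(y)\prod_{i=1}^{4} P(x_i \mid y)
```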

The prior probabilities are computed as follows:
9 samples belong to Yes and 5 belong to No, so P(Yes) = 9/14 and P(No) = 5/14.
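As a minimal sketch, assuming Python 3 and exact arithmetic with the standard fractions module, the priors can be obtained by counting the PlayTennis column. The label list is transcribed from the table, and the helper name priors is illustrative only:

```python
from collections import Counter
from fractions import Fraction

# PlayTennis labels for D1..D14, transcribed from the table
labels = ["No", "No", "Yes", "Yes", "Yes", "No", "Yes",
          "No", "Yes", "Yes", "Yes", "Yes", "Yes", "No"]

def priors(labels):
    """Maximum-likelihood class priors count(y) / N, kept as exact fractions."""
    counts = Counter(labels)
    n = len(labels)
    return {y: Fraction(c, n) for y, c in counts.items()}

p = priors(labels)
print(p["Yes"], p["No"])
```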

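The class-conditional probabilities are computed as follows (each is the fraction of samples in the given class that have the given attribute value):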
P(Outlook=Sunny|Yes)=2/9; P(Outlook=Sunny|No)=3/5;
P(Temperature=Cool|Yes)=3/9; P(Temperature=Cool|No)=1/5;
P(Humidity=High|Yes)=3/9; P(Humidity=High|No)=4/5;
P(Wind=Strong|Yes)=3/9; P(Wind=Strong|No)=3/5;
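The same counting can be sketched for the class-conditional probabilities. The row tuples below are transcribed from the table; DATA, ATTRS and cond_prob are hypothetical names chosen for this example:

```python
from fractions import Fraction

# Rows D1..D14: (Outlook, Temperature, Humidity, Wind, PlayTennis)
DATA = [
    ("Sunny", "Hot", "High", "Weak", "No"),
    ("Sunny", "Hot", "High", "Strong", "No"),
    ("Overcast", "Hot", "High", "Weak", "Yes"),
    ("Rain", "Mild", "High", "Weak", "Yes"),
    ("Rain", "Cool", "Normal", "Weak", "Yes"),
    ("Rain", "Cool", "Normal", "Strong", "No"),
    ("Overcast", "Cool", "Normal", "Strong", "Yes"),
    ("Sunny", "Mild", "High", "Weak", "No"),
    ("Sunny", "Cool", "Normal", "Weak", "Yes"),
    ("Rain", "Mild", "Normal", "Weak", "Yes"),
    ("Sunny", "Mild", "Normal", "Strong", "Yes"),
    ("Overcast", "Mild", "High", "Strong", "Yes"),
    ("Overcast", "Hot", "Normal", "Weak", "Yes"),
    ("Rain", "Mild", "High", "Strong", "No"),
]
ATTRS = ["Outlook", "Temperature", "Humidity", "Wind"]

def cond_prob(attr, value, label):
    """P(attr=value | PlayTennis=label), estimated as a relative frequency."""
    i = ATTRS.index(attr)
    rows = [r for r in DATA if r[-1] == label]
    return Fraction(sum(r[i] == value for r in rows), len(rows))

x = {"Outlook": "Sunny", "Temperature": "Cool", "Humidity": "High", "Wind": "Strong"}
for attr, value in x.items():
    print(attr, value, cond_prob(attr, value, "Yes"), cond_prob(attr, value, "No"))
```

Because Fraction always reduces to lowest terms, a value such as 3/9 is printed as 1/3.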


The posterior probabilities, up to the common factor 1/P(x), are computed as follows:
P(Yes|x) ∝ P(Outlook=Sunny|Yes)×P(Temperature=Cool|Yes)×P(Humidity=High|Yes)
×P(Wind=Strong|Yes)×P(Yes) = 2/9×3/9×3/9×3/9×9/14 = 2/243×9/14 = 1/189 ≈ 0.00529

P(No|x) ∝ P(Outlook=Sunny|No)×P(Temperature=Cool|No)×P(Humidity=High|No)
×P(Wind=Strong|No)×P(No) = 3/5×1/5×4/5×3/5×5/14 = 36/625×5/14 = 18/875 ≈ 0.02057
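To double-check the arithmetic, the sketch below multiplies the fractions listed in these notes using exact rational arithmetic, then divides each score by their sum, which plays the role of P(x). It assumes Python 3.8+ (for math.prod); the variable names are illustrative:

```python
from fractions import Fraction as F
from math import prod

# Class-conditional probabilities for (Sunny, Cool, High, Strong), from the notes
likelihoods = {
    "Yes": [F(2, 9), F(3, 9), F(3, 9), F(3, 9)],
    "No":  [F(3, 5), F(1, 5), F(4, 5), F(3, 5)],
}
priors = {"Yes": F(9, 14), "No": F(5, 14)}

# Unnormalized scores P(y) * prod_i P(x_i | y)
scores = {y: priors[y] * prod(likelihoods[y]) for y in priors}

# The sum of the scores equals P(x); dividing by it gives true posteriors
total = sum(scores.values())
posteriors = {y: s / total for y, s in scores.items()}

for y in ("Yes", "No"):
    print(y, scores[y], float(scores[y]), posteriors[y], float(posteriors[y]))
```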

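Since P(No|x) > P(Yes|x), the instance is classified as No. Both values above omit the same denominator P(x), so comparing them is sufficient; normalizing them so that they sum to 1 gives P(Yes|x) = 125/611 ≈ 0.205 and P(No|x) = 486/611 ≈ 0.795.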
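Putting the steps together, a small end-to-end classifier might look like the following. This is an illustrative sketch, not a library API (fit and predict are hypothetical helper names). Like the notes, it uses plain relative frequencies with no smoothing, so an attribute value never seen with a class drives that class's score to zero:

```python
from collections import Counter, defaultdict
from fractions import Fraction

# Rows D1..D14: (Outlook, Temperature, Humidity, Wind, PlayTennis)
DATA = [
    ("Sunny", "Hot", "High", "Weak", "No"),
    ("Sunny", "Hot", "High", "Strong", "No"),
    ("Overcast", "Hot", "High", "Weak", "Yes"),
    ("Rain", "Mild", "High", "Weak", "Yes"),
    ("Rain", "Cool", "Normal", "Weak", "Yes"),
    ("Rain", "Cool", "Normal", "Strong", "No"),
    ("Overcast", "Cool", "Normal", "Strong", "Yes"),
    ("Sunny", "Mild", "High", "Weak", "No"),
    ("Sunny", "Cool", "Normal", "Weak", "Yes"),
    ("Rain", "Mild", "Normal", "Weak", "Yes"),
    ("Sunny", "Mild", "Normal", "Strong", "Yes"),
    ("Overcast", "Mild", "High", "Strong", "Yes"),
    ("Overcast", "Hot", "Normal", "Weak", "Yes"),
    ("Rain", "Mild", "High", "Strong", "No"),
]

def fit(rows):
    """Count class frequencies and (attribute index, value, class) co-occurrences."""
    class_counts = Counter(r[-1] for r in rows)
    value_counts = defaultdict(int)
    for r in rows:
        for i, v in enumerate(r[:-1]):
            value_counts[(i, v, r[-1])] += 1
    return class_counts, value_counts

def predict(model, x):
    """Return the class maximizing P(y) * prod_i P(x_i | y), plus all scores."""
    class_counts, value_counts = model
    n = sum(class_counts.values())
    scores = {}
    for y, cy in class_counts.items():
        score = Fraction(cy, n)  # prior P(y)
        for i, v in enumerate(x):
            # P(x_i = v | y) as a relative frequency, no smoothing
            score *= Fraction(value_counts[(i, v, y)], cy)
        scores[y] = score
    return max(scores, key=scores.get), scores

model = fit(DATA)
label, scores = predict(model, ("Sunny", "Cool", "High", "Strong"))
print(label, scores)
```

For the new instance x, predict returns the label No, with the same exact scores 1/189 and 18/875 as the hand calculation.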
