Building a Naive Bayes Classifier in Python

Given a training set, training-data.txt, build a naive Bayes classifier and use it to classify the following test cases:

X1 = (age <=30, Income = medium, Student = yes, Credit_rating = Fair)
X2 = (age = 31…40, Income = high, Student = no, Credit_rating = Fair)
X3 = (age > 40, Income = medium, Student = no, Credit_rating = Fair)

training-data.txt:

age     income  student  credit_rating  buys_computer
<=30    high    no       fair           no
<=30    high    no       excellent      no
31…40   high    no       fair           yes
>40     medium  no       fair           yes
>40     low     yes      fair           yes
>40     low     yes      excellent      no
31…40   low     yes      excellent      yes
<=30    medium  no       fair           no
<=30    low     yes      fair           yes
>40     medium  yes      fair           yes
<=30    medium  yes      excellent      yes
31…40   medium  no       excellent      yes
31…40   high    yes      fair           yes
>40     medium  no       excellent      no

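Approach: read the training data, then use naive Bayes classification to predict the class label of an unseen tuple. For each class, a list stores how many training tuples match each attribute value of the query. From these counts we obtain P(x|c)P(c) for each class and predict the class with the larger value.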
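To make the quantity being compared concrete, here is a short sketch of the decision rule together with the arithmetic for X1, using counts read off the training table above (9 "yes" tuples and 5 "no" tuples out of 14). The symbols \hat{c} and x_j are notation introduced here for illustration, not taken from the original post, and the probabilities are plain relative frequencies with no smoothing.

```latex
% Decision rule: pick the class with the largest P(c) * prod_j P(x_j | c)
\[
\hat{c} = \arg\max_{c \in \{\text{yes},\,\text{no}\}} P(c)\prod_{j=1}^{4} P(x_j \mid c),
\qquad
P(x_j \mid c) = \frac{\#\{\text{tuples in class } c \text{ whose attribute } j \text{ equals } x_j\}}{\#\{\text{tuples in class } c\}}
\]

% Worked example for X1 = (age <= 30, income = medium, student = yes, credit_rating = fair)
\[
P(X_1 \mid \text{yes})\,P(\text{yes})
  = \frac{2}{9}\cdot\frac{4}{9}\cdot\frac{6}{9}\cdot\frac{6}{9}\cdot\frac{9}{14} \approx 0.0282
\]
\[
P(X_1 \mid \text{no})\,P(\text{no})
  = \frac{3}{5}\cdot\frac{2}{5}\cdot\frac{1}{5}\cdot\frac{2}{5}\cdot\frac{5}{14} \approx 0.0069
\]
```

Since 0.0282 > 0.0069, X1 is assigned buys_computer = yes.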
The Python implementation follows:

# Read the training tuples. The original file stores the "…" in "31…40" as the
# GBK bytes \xa1\xad, so it is opened as GBK here; adjust if your copy differs.
with open('training-data.txt', 'r', encoding='gbk') as train_file:
    lines = train_file.readlines()

train_list = []
for line in lines:
    curdata = line.strip().split()  # split on any whitespace (tabs or spaces)
    train_list.append(curdata)
del train_list[0]  # drop the header row

# Normalise the age band "31…40" to a plain ASCII token.
for row in train_list:
    if row[0].startswith('31'):
        row[0] = '31_40'
print(train_list)

# Count the tuples in each class to get the priors P(c).
yes_count = 0
no_count = 0
for row in train_list:
    if row[-1] == 'yes':
        yes_count += 1
    else:
        no_count += 1
p_yes = yes_count / len(train_list)
p_no = no_count / len(train_list)


def bayes(age_data, income_data, isstudent, cr_data):
    x = [age_data, income_data, isstudent, cr_data]
    # x_yes[j] / x_no[j]: number of yes / no tuples whose j-th attribute equals x[j]
    x_yes = [0, 0, 0, 0]
    x_no = [0, 0, 0, 0]
    for row in train_list:
        for j in range(len(x)):
            if row[j] == x[j]:
                if row[-1] == 'yes':
                    x_yes[j] += 1
                else:
                    x_no[j] += 1

    # P(X|c) is the product of the per-attribute frequencies: count / class size.
    a = 1
    for count in x_yes:
        a *= count
    b = 1
    for count in x_no:
        b *= count
    p_x_yes = a / yes_count ** 4
    p_x_no = b / no_count ** 4

    # Compare the unrounded scores; round only for display.
    fina_p_yes = p_x_yes * p_yes
    fina_p_no = p_x_no * p_no
    print('P(X|buys_computer=yes)P(buys_computer=yes) =', round(fina_p_yes, 4))
    print('P(X|buys_computer=no)P(buys_computer=no) =', round(fina_p_no, 4))
    if fina_p_yes > fina_p_no:
        class_result = "it belongs to this class:yes"
    else:
        class_result = "it belongs to this class:no"
    return class_result


print(bayes('<=30', 'medium', 'yes', 'fair'))
print(bayes('31_40', 'high', 'no', 'fair'))
print(bayes('>40', 'medium', 'no', 'fair'))

The screenshot below shows the results of the run:
[Figure 1: screenshot of the program output]
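For readers who want to reproduce the result without the data file, the following is a minimal self-contained sketch of the same counting approach. It is not the code from the post: the training tuples are typed in from the table above, the token '31_40' is used for the 31…40 age band, and the names ROWS, train, score and predict are hypothetical helpers chosen for this illustration. Like the approach described above, it uses raw relative frequencies with no smoothing.

```python
from collections import Counter

# Training tuples copied from the table; the last field is buys_computer.
ROWS = [
    ("<=30", "high", "no", "fair", "no"),
    ("<=30", "high", "no", "excellent", "no"),
    ("31_40", "high", "no", "fair", "yes"),
    (">40", "medium", "no", "fair", "yes"),
    (">40", "low", "yes", "fair", "yes"),
    (">40", "low", "yes", "excellent", "no"),
    ("31_40", "low", "yes", "excellent", "yes"),
    ("<=30", "medium", "no", "fair", "no"),
    ("<=30", "low", "yes", "fair", "yes"),
    (">40", "medium", "yes", "fair", "yes"),
    ("<=30", "medium", "yes", "excellent", "yes"),
    ("31_40", "medium", "no", "excellent", "yes"),
    ("31_40", "high", "yes", "fair", "yes"),
    (">40", "medium", "no", "excellent", "no"),
]


def train(rows):
    """Count class labels and (attribute index, value, label) co-occurrences."""
    class_counts = Counter(row[-1] for row in rows)
    value_counts = Counter()
    for row in rows:
        label = row[-1]
        for j, value in enumerate(row[:-1]):
            value_counts[(j, value, label)] += 1
    return class_counts, value_counts


def score(x, class_counts, value_counts):
    """Return {label: P(x | label) * P(label)} from raw relative frequencies."""
    total = sum(class_counts.values())
    scores = {}
    for label, n in class_counts.items():
        p = n / total  # prior P(label)
        for j, value in enumerate(x):
            p *= value_counts[(j, value, label)] / n  # P(x_j | label)
        scores[label] = p
    return scores


def predict(x, class_counts, value_counts):
    """Return the label with the largest score."""
    scores = score(x, class_counts, value_counts)
    return max(scores, key=scores.get)


if __name__ == "__main__":
    class_counts, value_counts = train(ROWS)
    tests = [
        ("<=30", "medium", "yes", "fair"),   # X1
        ("31_40", "high", "no", "fair"),     # X2
        (">40", "medium", "no", "fair"),     # X3
    ]
    for x in tests:
        print(x, score(x, class_counts, value_counts),
              predict(x, class_counts, value_counts))
```

With this training set all three test tuples come out as buys_computer = yes. For X2 the "no" score is exactly zero, because no training tuple in the 31…40 age band has buys_computer = no.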
