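# The data is the Iris dataset from the UCI Machine Learning Repository.
# Nearest-neighbor algorithm: each test sample is classified by computing the distance between its features and those of every training sample, then taking the label of the closest one. "Data Analysis with Open Source Tools" does not include the program that produced the figure above, so I also tried to reproduce that figure, although for now without the smoothing effect.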
import numpy as np

# Columns 0-3 hold the four measurements; column 4 holds the species label.
train = np.loadtxt("D:\\iris.trn", delimiter=',', usecols=(0, 1, 2, 3))
trainlabel = np.loadtxt("D:\\iris.trn", delimiter=',', usecols=(4,), dtype=str)
test = np.loadtxt("D:\\iris.tst", delimiter=',', usecols=(0, 1, 2, 3))
testlabel = np.loadtxt("D:\\iris.tst", delimiter=',', usecols=(4,), dtype=str)

hit, miss = 0, 0
for i in range(test.shape[0]):
    # Euclidean distance from test sample i to every training sample
    dist = np.sqrt(np.sum((test[i] - train) ** 2, axis=1))
    k = np.argmin(dist)  # index of the nearest training sample
    if trainlabel[k] == testlabel[i]:
        flag = '+'
        hit += 1
    else:
        flag = '-'
        miss += 1
    print(flag, "\t Predicted:", trainlabel[k], "\t True:", testlabel[i])

print()
print(hit, "out of", hit + miss, "correct-Accuracy:", hit / (hit + miss))
Output:
+ Predicted: Iris-setosa True: Iris-setosa
+ Predicted: Iris-setosa True: Iris-setosa
+ Predicted: Iris-versicolor True: Iris-versicolor
+ Predicted: Iris-versicolor True: Iris-versicolor
+ Predicted: Iris-virginica True: Iris-virginica
5 out of 5 correct-Accuracy: 1.0
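
The nearest-neighbor step can also be tried without the data files. The following is only a minimal sketch under stated assumptions: the helper name `nearest_neighbor_predict` is hypothetical, and the small arrays are made-up values for illustration, not rows taken from the Iris files. It computes all test-to-train distances at once with NumPy broadcasting instead of looping over the test samples.

```python
import numpy as np

def nearest_neighbor_predict(train, train_labels, samples):
    """Return, for each row of `samples`, the label of its closest training row.

    Hypothetical helper for illustration only.
    """
    # Broadcasting yields an array of shape (n_samples, n_train, n_features).
    diffs = samples[:, np.newaxis, :] - train[np.newaxis, :, :]
    # Euclidean distance between every sample and every training row.
    dists = np.sqrt((diffs ** 2).sum(axis=2))
    # Pick the label of the nearest training row for each sample.
    return train_labels[dists.argmin(axis=1)]

# Made-up toy measurements (assumption: NOT taken from iris.trn / iris.tst).
train = np.array([[5.0, 3.4, 1.5, 0.2],
                  [6.0, 2.8, 4.5, 1.4],
                  [6.8, 3.0, 5.8, 2.2]])
train_labels = np.array(["Iris-setosa", "Iris-versicolor", "Iris-virginica"])
test = np.array([[5.1, 3.5, 1.4, 0.3],
                 [6.5, 3.1, 5.6, 2.1]])
test_labels = np.array(["Iris-setosa", "Iris-virginica"])

predicted = nearest_neighbor_predict(train, train_labels, test)
# Fraction of test samples whose predicted label matches the true label.
accuracy = np.mean(predicted == test_labels)
print(predicted, accuracy)
```

The broadcast version builds the full distance matrix in memory, so for large training sets the per-sample loop may still be the more practical choice.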