k-Nearest Neighbors (kNN)

kNN for short. It works as follows: we start with a training sample set in which every data point carries a class label. When a new data point arrives, each of its features is compared against the corresponding features of the samples in the set, and kNN takes the class labels of the most similar samples as the classification of the new point.
Put another way: compute the distance between the new point and every sample, sort the distances, keep the k nearest samples, and whichever class appears most often among those k samples is the class of the new point.
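The steps above (distance, sort, top-k, majority vote) can be sketched with plain Python before looking at the numpy version; `knn_predict` is a hypothetical helper name, and the tiny dataset mirrors the one used below:

```python
from collections import Counter
import math

def knn_predict(test_point, samples, labels, k):
    # 1. Euclidean distance from the test point to every sample
    dists = [math.dist(test_point, s) for s in samples]
    # 2. sort sample indices by distance and keep the k nearest
    nearest = sorted(range(len(samples)), key=lambda i: dists[i])[:k]
    # 3. majority vote among the labels of the k nearest samples
    return Counter(labels[i] for i in nearest).most_common(1)[0][0]

samples = [(1.0, 1.1), (1.0, 1.0), (0.0, 0.0), (0.0, 0.1)]
labels = ['A', 'A', 'B', 'B']
print(knn_predict((1.2, 1.0), samples, labels, 3))  # 'A'
```

The numpy implementation below does exactly this, but vectorizes step 1 so all distances are computed in one expression.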

# coding=utf-8
from numpy import *
import operator

def createDataSet():
    group = array([[1.0, 1.1], [1.0, 1.0], [0, 0], [0, 0.1]])
    labels = ['A', 'A', 'B', 'B']
    return group, labels

print(createDataSet())

def KNNClassify(inX, dataSet, labels, k):
    dataSetSize = dataSet.shape[0]  # number of samples in dataSet
    diffMat = tile(inX, (dataSetSize, 1)) - dataSet
    sqDiffMat = diffMat ** 2
    # sum along axis=1 adds up each row vector (axis=0 would sum
    # down the columns), giving one squared distance per sample
    sqDistances = sqDiffMat.sum(axis=1)
    distances = sqDistances ** 0.5
    sortedDistIndicies = distances.argsort()
    classCount = {}
    for i in range(k):
        votelabel = labels[sortedDistIndicies[i]]
        classCount[votelabel] = classCount.get(votelabel, 0) + 1
    # e.g. sortedClassCount: [('A', 2), ('B', 1)]
    sortedClassCount = sorted(classCount.items(), key=operator.itemgetter(1), reverse=True)
    return sortedClassCount[0][0]

dataSet, labels = createDataSet()

testX = array([1.2, 1.0])
k = 3
outputLabel = KNNClassify(testX, dataSet, labels, k)

print("Your input is:", testX, "and classified to class:", outputLabel)


testX = array([0.1, 0.3])
outputLabel = KNNClassify(testX, dataSet, labels, k)

print("Your input is:", testX, "and classified to class:", outputLabel)
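The `tile`/`axis=1` step inside `KNNClassify` is the only non-obvious part, so here it is in isolation (a minimal numpy demo using the same four samples):

```python
import numpy as np

inX = np.array([1.2, 1.0])
dataSet = np.array([[1.0, 1.1], [1.0, 1.0], [0.0, 0.0], [0.0, 0.1]])

# tile repeats inX once per sample so it can be subtracted row-by-row
diff = np.tile(inX, (dataSet.shape[0], 1)) - dataSet
print(diff.shape)  # (4, 2)

# axis=1 sums across each row, collapsing (4, 2) to one value per sample
sq_dist = (diff ** 2).sum(axis=1)
print(np.sqrt(sq_dist))  # distance from inX to each of the 4 samples
```

Note that numpy broadcasting makes the `tile` call optional: `inX - dataSet` produces the same result, but `tile` makes the row-by-row intent explicit.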

Output:

(array([[1. , 1.1],
       [1. , 1. ],
       [0. , 0. ],
       [0. , 0.1]]), ['A', 'A', 'B', 'B'])
Your input is: [1.2 1. ] and classified to class: A
Your input is: [0.1 0.3] and classified to class: B
