




$$d=\sqrt{(x_1-x_0)^2+(y_1-y_0)^2}$$
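
As a quick check of the formula, the following minimal sketch computes the distance between two points in plain Python. The helper name `euclidean_distance` and the sample points are illustrative assumptions, not part of the original code.

```python
import math

def euclidean_distance(p0, p1):
    """Distance between two points given as equal-length sequences (hypothetical helper)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p1, p0)))

# Illustrative points: (x0, y0) = (0, 0) and (x1, y1) = (3, 4)
d = euclidean_distance((0, 0), (3, 4))
print(d)  # 5.0
```

Because the helper zips over the coordinates, the same function also works for points with more than two features.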




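  • Collect the data: any method can be used
  • Prepare the data: numeric values are needed for the distance calculation, preferably in a structured format
  • Analyze the data: any method can be used
  • Test the algorithm: compute the error rate
  • Use the algorithm: first supply sample data and structured output results, then run the k-nearest neighbors algorithm to decide which class each input belongs to, and finally apply any follow-up processing to the computed classes.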



  • People she does not like
  • People of average charm
  • Highly charming people


  • Frequent flyer miles earned per year
  • Percentage of time spent playing video games
  • Kilograms of ice cream consumed per week



def file2matrix(filename):
    with open(filename) as fr:
        lines = fr.readlines()                      # read the file once
    numberOfLines = len(lines)                      # number of lines in the file
    returnMat = zeros((numberOfLines, 3))           # feature matrix to return
    classLabelVector = []                           # class labels to return
    for index, line in enumerate(lines):
        listFromLine = line.strip().split('\t')
        returnMat[index, :] = listFromLine[0:3]     # first three columns are the features
        classLabelVector.append(int(listFromLine[-1]))  # last column is the class label
    return returnMat, classLabelVector
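
To make the expected input format concrete, here is a small plain-Python sketch of the same parsing step. It assumes each row holds three tab-separated features followed by an integer class label, which is consistent with the labels 1, 2 and 3 printed in the test output. The helper `parse_rows` and the sample rows are made up for illustration.

```python
def parse_rows(lines):
    """Split tab-separated rows into features and labels (hypothetical helper)."""
    features, labels = [], []
    for line in lines:
        fields = line.strip().split('\t')
        if len(fields) < 4:              # skip blank or incomplete rows
            continue
        features.append([float(v) for v in fields[0:3]])
        labels.append(int(fields[-1]))   # assumed: last column is an integer label
    return features, labels

# Made-up rows, for illustration only: three features, then the label
sample = ["1000\t2.5\t0.5\t1\n", "30000\t8.0\t1.2\t3\n", "\n"]
features, labels = parse_rows(sample)
print(features)  # [[1000.0, 2.5, 0.5], [30000.0, 8.0, 1.2]]
print(labels)    # [1, 3]
```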


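import matplotlib.pyplot as plot
from numpy import *
import numpy as np
import operator
# datingDateSet and datingLabels are the two values returned by file2matrix
fig = plot.figure()
ax = fig.add_subplot(111)
ax.scatter(datingDateSet[:, 1], datingDateSet[:, 2], 15.0*array(datingLabels), 15.0*array(datingLabels))  # marker size and color follow the class label
plot.show()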





$$NewValue=\frac{oldValue-min}{max-min}$$

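def autoNorm(dataSet):
    minVals = dataSet.min(0)        # min(0) takes the minimum along axis 0, i.e. one minimum per column
    maxVals = dataSet.max(0)        # one maximum per column
    ranges = maxVals - minVals
    normalDataSet = (dataSet - minVals) / ranges    # broadcasting applies the formula to every row
    return normalDataSet, ranges, minVals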
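
The normalization formula can be traced by hand on a tiny example. The sketch below is written in plain Python rather than NumPy so that each step is visible. The helper `min_max_normalize`, the guard for constant columns, and the sample values are assumptions added for illustration.

```python
def min_max_normalize(rows):
    """Scale every column of `rows` into [0, 1] (hypothetical helper)."""
    columns = list(zip(*rows))                      # transpose: one tuple per feature
    mins = [min(col) for col in columns]
    ranges = [max(col) - min(col) for col in columns]
    normalized = [
        [(value - lo) / rng if rng else 0.0         # guard against a constant column
         for value, lo, rng in zip(row, mins, ranges)]
        for row in rows
    ]
    return normalized, ranges, mins

rows = [[0.0, 10.0], [25.0, 20.0], [100.0, 30.0]]   # made-up values
normalized, ranges, mins = min_max_normalize(rows)
print(normalized[1])  # [0.25, 0.5]
print(ranges, mins)   # [100.0, 20.0] [0.0, 10.0]
```

Returning the ranges and minimums along with the scaled data makes it possible to apply the same scaling to a new sample before classifying it.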


The kNN algorithm


# A simple kNN algorithm
def knn_classifier(inX, dataSet, lables, k):      # inX: new sample to classify; dataSet: training data; lables: training labels
    dataSetSize = dataSet.shape[0]                # number of training samples (rows)
    diffMat = np.tile(inX, (dataSetSize, 1)) - dataSet
    sqDiffMat = diffMat ** 2
    sqDistances = sqDiffMat.sum(axis=1)           # sum of squared differences for each row
    distances = sqDistances ** 0.5
    sortedDistIndicies = distances.argsort()      # indices ordered from nearest to farthest
    classCount = {}
    for i in range(k):
        voteILabel = lables[sortedDistIndicies[i]]
        classCount[voteILabel] = classCount.get(voteILabel, 0) + 1
    # sort once after all k votes are counted, most frequent class first
    sortClassCount = sorted(classCount.items(), key=operator.itemgetter(1), reverse=True)
    return sortClassCount[0][0]

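How the algorithm works: we first get the number of training samples (the number of rows, `dataSet.shape[0]`). `np.tile` then repeats the input vector `inX` that many times so that it has the same shape as the training data, and the two are subtracted. The differences are squared, summed across each row (`axis=1`), and the square root is taken, which gives the Euclidean distance from `inX` to every training sample. `argsort` returns the indices that would sort those distances from smallest to largest. Using these indices we look up the classes of the k nearest samples, and the class that appears most often is the result.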
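
The same steps can be sketched in plain Python, which may make the order of operations easier to follow than the vectorized version. This is an illustrative rewrite under stated assumptions, not the original implementation: `knn_vote` is a hypothetical helper, `collections.Counter` replaces the hand-built vote dictionary, and the sample points are made up.

```python
import math
from collections import Counter

def knn_vote(query, samples, labels, k):
    """Return the majority label among the k samples closest to `query` (hypothetical helper)."""
    # Step 1: Euclidean distance from the query to every training sample
    distances = [math.sqrt(sum((q - s) ** 2 for q, s in zip(query, row)))
                 for row in samples]
    # Step 2: indices ordered from nearest to farthest (the role argsort plays)
    order = sorted(range(len(samples)), key=lambda i: distances[i])
    # Step 3: count the labels of the k nearest samples and take the most common
    votes = Counter(labels[i] for i in order[:k])
    return votes.most_common(1)[0][0]

samples = [[1.0, 1.1], [1.0, 1.0], [0.0, 0.0], [0.0, 0.1]]  # made-up points
labels = ['A', 'A', 'B', 'B']
print(knn_vote([0.1, 0.2], samples, labels, k=3))  # B
print(knn_vote([0.9, 1.0], samples, labels, k=3))  # A
```

With k=3, each query picks up two neighbors from its own cluster and one from the other, so the majority vote decides the class.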


$$p=\frac{Count_T}{Count_A}$$


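def datingClassTest():
    # Setup omitted in this excerpt: load and normalize the data, set numTestVecs,
    # and obtain classifierResult from knn_classifier for each test sample i.
    errorcount = 0.0
    for i in range(numTestVecs):
        print("the classifier came back with:%d,the real answer is %d" %(classifierResult,datingLabels[i]))
        if classifierResult!=datingLabels[i]:
            errorcount += 1.0                   # count each misclassification
    print("this total error rate is: %f" % (errorcount/float(numTestVecs)))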

the classifier came back with:3,the real answer is 3
the classifier came back with:2,the real answer is 2
the classifier came back with:1,the real answer is 1
the classifier came back with:1,the real answer is 1
the classifier came back with:1,the real answer is 1
the classifier came back with:1,the real answer is 1
the classifier came back with:3,the real answer is 3
the classifier came back with:3,the real answer is 3
the classifier came back with:1,the real answer is 1
the classifier came back with:3,the real answer is 3
the classifier came back with:1,the real answer is 1
the classifier came back with:1,the real answer is 1
the classifier came back with:2,the real answer is 2
the classifier came back with:1,the real answer is 1
the classifier came back with:1,the real answer is 1
the classifier came back with:1,the real answer is 1
the classifier came back with:1,the real answer is 1
the classifier came back with:1,the real answer is 1
the classifier came back with:2,the real answer is 2
the classifier came back with:3,the real answer is 3
the classifier came back with:2,the real answer is 2
the classifier came back with:1,the real answer is 1
the classifier came back with:3,the real answer is 2
the classifier came back with:3,the real answer is 3
the classifier came back with:2,the real answer is 2
the classifier came back with:3,the real answer is 3
the classifier came back with:2,the real answer is 2
the classifier came back with:3,the real answer is 3
the classifier came back with:2,the real answer is 2
the classifier came back with:1,the real answer is 1
the classifier came back with:3,the real answer is 3
the classifier came back with:1,the real answer is 1
the classifier came back with:3,the real answer is 3
the classifier came back with:1,the real answer is 1
the classifier came back with:3,the real answer is 2
the classifier came back with:1,the real answer is 1
the classifier came back with:1,the real answer is 1
the classifier came back with:2,the real answer is 2
the classifier came back with:3,the real answer is 3
the classifier came back with:3,the real answer is 3
the classifier came back with:1,the real answer is 1
the classifier came back with:2,the real answer is 2
the classifier came back with:3,the real answer is 3
the classifier came back with:3,the real answer is 3
the classifier came back with:3,the real answer is 3
the classifier came back with:1,the real answer is 1
the classifier came back with:1,the real answer is 1
the classifier came back with:1,the real answer is 1
the classifier came back with:3,the real answer is 1
the classifier came back with:2,the real answer is 2
the classifier came back with:2,the real answer is 2
the classifier came back with:1,the real answer is 1
the classifier came back with:3,the real answer is 3
the classifier came back with:2,the real answer is 2
the classifier came back with:2,the real answer is 2
the classifier came back with:2,the real answer is 2
the classifier came back with:2,the real answer is 2
the classifier came back with:3,the real answer is 3
the classifier came back with:1,the real answer is 1
the classifier came back with:2,the real answer is 2
the classifier came back with:1,the real answer is 1
the classifier came back with:2,the real answer is 2
the classifier came back with:2,the real answer is 2
the classifier came back with:3,the real answer is 2
the classifier came back with:2,the real answer is 2
the classifier came back with:2,the real answer is 2
the classifier came back with:3,the real answer is 3
the classifier came back with:2,the real answer is 2
the classifier came back with:3,the real answer is 3
the classifier came back with:1,the real answer is 1
the classifier came back with:2,the real answer is 2
the classifier came back with:3,the real answer is 3
the classifier came back with:2,the real answer is 2
the classifier came back with:2,the real answer is 2
the classifier came back with:3,the real answer is 1
the classifier came back with:3,the real answer is 3
the classifier came back with:1,the real answer is 1
the classifier came back with:1,the real answer is 1
the classifier came back with:3,the real answer is 3
the classifier came back with:3,the real answer is 3
the classifier came back with:1,the real answer is 1
the classifier came back with:2,the real answer is 2
the classifier came back with:3,the real answer is 3
the classifier came back with:3,the real answer is 1
the classifier came back with:3,the real answer is 3
the classifier came back with:1,the real answer is 1
the classifier came back with:2,the real answer is 2
the classifier came back with:2,the real answer is 2
the classifier came back with:1,the real answer is 1
the classifier came back with:1,the real answer is 1
the classifier came back with:3,the real answer is 3
the classifier came back with:2,the real answer is 3
the classifier came back with:1,the real answer is 1
the classifier came back with:2,the real answer is 2
the classifier came back with:1,the real answer is 1
the classifier came back with:3,the real answer is 3
the classifier came back with:3,the real answer is 3
the classifier came back with:2,the real answer is 2
the classifier came back with:2,the real answer is 1
the classifier came back with:1,the real answer is 1
this total error rate is: 0.080000
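
The reported rate can be checked against the log: it contains 100 predictions, 8 of which differ from the real answer, and 8/100 = 0.08. The sketch below shows that calculation as a standalone function. The helper `error_rate` and the short label lists are illustrative assumptions.

```python
def error_rate(predicted, actual):
    """Fraction of predictions that differ from the true labels (hypothetical helper)."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    errors = sum(1 for p, a in zip(predicted, actual) if p != a)
    return errors / len(actual)

# Made-up labels: 2 mismatches out of 5
predicted = [3, 2, 1, 3, 2]
actual    = [3, 2, 1, 2, 1]
print(error_rate(predicted, actual))  # 0.4

# The log above has 8 mismatches in 100 lines, matching the reported 0.08
print(8 / 100)  # 0.08
```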
