The k-nearest neighbor method (k-NearestNeighbor, kNN) is a very basic machine learning method that can handle both classification and regression tasks.
(1) Distance metric
(2) Choice of the value of k (see the cross-validation sketch after this list)
(3) Decision rule
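The choice of k matters: a small k is sensitive to noise, while a large k blurs class boundaries. As a minimal sketch of picking k by cross-validation (this assumes scikit-learn and its bundled iris data set, neither of which appears in the examples below):

# Hypothetical sketch: choosing k by 5-fold cross-validation (assumes scikit-learn).
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
for k in [1, 3, 5, 7, 9]:
    # Mean accuracy over 5 folds for this candidate k
    score = cross_val_score(KNeighborsClassifier(n_neighbors=k), X, y, cv=5).mean()
    print(k, round(score, 3))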
(1) Brute force (brute-force)
(2) KD tree (KDTree)
(3) Ball tree (BallTree) (see the query sketch after this list)
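Both tree structures speed up the neighbor search over brute force. As a sketch (assuming scikit-learn; the data reuses the sample points from the classification example below), KDTree and BallTree expose the same build-and-query interface:

# Sketch: KDTree and BallTree share the same interface in scikit-learn.
import numpy as np
from sklearn.neighbors import KDTree, BallTree

X = np.array([[3, 104], [2, 100], [1, 81], [101, 10], [99, 5], [98, 2]])
for Tree in (KDTree, BallTree):
    tree = Tree(X)
    # Distances and indices of the 3 nearest neighbors of the query point
    dist, ind = tree.query([[18, 90]], k=3)
    print(Tree.__name__, ind, dist)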
(1) Advantages
(2) Disadvantages
1. Suppose there are two sample points $x_1, x_2 \in \mathbb{R}^n$. The Minkowski distance $L_p$ between them is defined as

$$L_p(x_1, x_2) = \left( \sum_{l=1}^{n} \left| x_1^{(l)} - x_2^{(l)} \right|^p \right)^{\frac{1}{p}}$$

With $p = 2$ this is the Euclidean distance (used in all the examples below), and with $p = 1$ the Manhattan distance.
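A direct translation of the definition into code (a sketch; the helper name minkowski is ours):

# Minkowski distance L_p between two n-dimensional points, per the definition above.
def minkowski(x1, x2, p=2):
    return sum(abs(a - b) ** p for a, b in zip(x1, x2)) ** (1 / p)

print(minkowski([3, 104], [18, 90], p=1))  # p = 1: Manhattan distance
print(minkowski([3, 104], [18, 90], p=2))  # p = 2: Euclidean distance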
(1) Majority voting with labels [0, 1]
# Classification by majority vote
# [x1, x2, y]
T = [[3, 104, 0],
     [2, 100, 0],
     [1, 81, 0],
     [101, 10, 1],
     [99, 5, 1],
     [98, 2, 1]]
# The point to classify
x = [18, 90]
# Use the K nearest points for the prediction
K = 5
# Record the distance from every sample point to the query point
dis = []
import math
for i in T:
    d = math.sqrt((x[0] - i[0])**2 + (x[1] - i[1])**2)
    # Append the distance together with the label
    dis.append([d, i[2]])
# Sort by distance
dis.sort(key=lambda x: x[0])
print(dis)
# Majority vote over the labels of the K nearest points
print(1 if sum([i[1] for i in dis[:K]]) > K / 2 else 0)
[[18.867962264113206, 0], [19.235384061671343, 0], [20.518284528683193, 0], [115.27792503337315, 1], [117.41379816699569, 1], [118.92854997854805, 1]]
0
(2) Weighted voting with labels [-1, 1]
# Classification
# Training sample points
T = [[3, 104, -1],
     [2, 100, -1],
     [1, 81, -1],
     [101, 10, 1],
     [99, 5, 1],
     [98, 2, 1]]
# Test sample
x = [18, 90]
K = 5
dis = []
import numpy as np
for i in T:
    d = np.sqrt((x[0] - i[0])**2 + (x[1] - i[1])**2)
    dis.append([d, i[2]])
dis.sort(key=lambda x: x[0])
print(dis)
# Sum the labels of the K nearest points, each weighted by 1/distance;
# a positive sum means the prediction is +1
print(-1 if sum([1/i[0]*i[1] for i in dis[:K]]) < 0 else 1)
[[18.867962264113206, -1], [19.235384061671343, -1], [20.518284528683193, -1], [115.27792503337315, 1], [117.41379816699569, 1], [118.92854997854805, 1]]
-1
(3) Regression [taking the mean]
# Regression
T = [[3, 104, 98],
     [2, 100, 93],
     [1, 81, 95],
     [101, 10, 16],
     [99, 5, 8],
     [98, 2, 7]]
x = [18, 90]
K = 5
dis = []
from math import sqrt
for i in T:
    d = sqrt((x[0] - i[0])**2 + (x[1] - i[1])**2)
    dis.append([d, i[2]])
dis.sort(key=lambda x: x[0])
from numpy import mean
# Average the target values of the K nearest points
print(mean([i[1] for i in dis[0:K]]))
62.0
(4) Distance-weighted regression
# Distance-weighted regression
T = [
    [3, 104, 98],
    [2, 100, 93],
    [1, 81, 95],
    [101, 10, 16],
    [99, 5, 8],
    [98, 2, 7]
]
x = [18, 90]
K = 4
dis = []
for i in T:
    d = (x[0] - i[0])**2 + (x[1] - i[1])**2
    # The built-in pow() raises to a power: pow(d, 1/2) is the square root
    dis.append([pow(d, 1/2), i[2]])
dis.sort(key=lambda x: x[0])
# Keep the K nearest points, weighting each by 1/distance
dis = [[1/i[0], i[1]] for i in dis][0:K]
# Normalizing constant so the weights sum to 1
a = 1 / sum([i[0] for i in dis])
print(dis)
res = sum([i[0]*i[1] for i in dis])
print(res * a)
[[0.052999894000318, 93], [0.05198752449100364, 95], [0.04873701788285793, 98], [0.008674687714152543, 16]]
91.02775527644329
Prediction with scikit-learn
from sklearn.neighbors import KNeighborsClassifier
# Verify the kNN classifier on a tiny hand-made data set.
# 1. Four sample points: [1.0, 0.9], [1.0, 1.0], [0.1, 0.2], [0.0, 0.1]
x = [1.0, 0.9], [1.0, 1.0], [0.1, 0.2], [0.0, 0.1]
# 2. The class of each sample point: ['A', 'A', 'B', 'B']
y = ['A', 'A', 'B', 'B']
print(x, y)
model = KNeighborsClassifier(n_neighbors=2)
model.fit(x, y)
# 3. Classify the two points [1.2, 1.0] and [0.1, 0.3]
s = [1.2, 1.0], [0.1, 0.3]
print(model.predict(s))
([1.0, 0.9], [1.0, 1.0], [0.1, 0.2], [0.0, 0.1]) ['A', 'A', 'B', 'B']
['A' 'B']
sklearn.neighbors.KNeighborsClassifier(n_neighbors=5, weights='uniform', algorithm='auto', leaf_size=30, p=2, metric='minkowski', metric_params=None, n_jobs=None, **kwargs)
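scikit-learn also ships the regression counterpart KNeighborsRegressor with the same interface. As a sketch, the hand-rolled distance-weighted regression above should be reproducible with weights='distance', which weights each neighbor by 1/distance just as the manual version does:

# Sketch: distance-weighted kNN regression via scikit-learn (same data as above).
from sklearn.neighbors import KNeighborsRegressor

T = [[3, 104, 98], [2, 100, 93], [1, 81, 95],
     [101, 10, 16], [99, 5, 8], [98, 2, 7]]
X = [t[:2] for t in T]   # features
y = [t[2] for t in T]    # targets
model = KNeighborsRegressor(n_neighbors=4, weights='distance')
model.fit(X, y)
print(model.predict([[18, 90]]))  # should agree with the hand-computed 91.03 above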