class sklearn.neighbors.KNeighborsClassifier(n_neighbors=5, weights='uniform', algorithm='auto', leaf_size=30, p=2, metric='minkowski', metric_params=None, n_jobs=None, **kwargs)
class sklearn.neighbors.KNeighborsRegressor(n_neighbors=5, weights='uniform', algorithm='auto', leaf_size=30, p=2, metric='minkowski', metric_params=None, n_jobs=None, **kwargs)
Parameters

| Parameter | Description |
| --- | --- |
| n_neighbors | int, optional. The value of k. |
| weights | str or callable, optional. 'uniform': all neighbors count equally. 'distance': each neighbor is weighted by the inverse of its distance, so closer neighbors have more influence on the decision. [callable]: a user-defined function that takes an array of distances and returns an array of weights of the same shape (see the sketch after this table). |
| algorithm | optional. 'ball_tree': use a ball tree. 'kd_tree': use a k-d tree. 'brute': brute-force search. 'auto': pick the most appropriate algorithm based on the values passed to fit. Note: if the input data is sparse, this setting is ignored and brute force is used. |
| leaf_size | int, optional, default 30. Leaf size for the k-d tree and ball tree; affects the speed of tree construction and queries, as well as the memory needed to store the tree. |
| p | int, optional, default 2. Power parameter of the Minkowski distance metric: p=2 is the Euclidean distance, p=1 the Manhattan distance. |
| metric | str or callable, default 'minkowski'. The distance metric used by the tree. |
| metric_params | Additional keyword arguments for the metric function, default None. |
| n_jobs | Degree of parallelism for the neighbor search, default None (meaning 1); -1 uses all CPUs. |
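A minimal sketch of the callable form of weights (the Gaussian kernel below is a hypothetical choice, not part of the API): any function that maps an array of distances to an array of weights of the same shape can be passed.

import numpy as np
from sklearn.neighbors import KNeighborsClassifier

def gaussian_weights(dist):
    # hypothetical kernel: closer neighbors get exponentially larger weights
    return np.exp(-dist ** 2)

clf = KNeighborsClassifier(n_neighbors=3, weights=gaussian_weights)
clf.fit([[0], [1], [2], [3]], [0, 0, 1, 1])
print(clf.predict([[1.1]]))  # the closest neighbor (1, label 0) dominates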
Attributes

| Attribute | Description |
| --- | --- |
| classes_ | array of shape (n_classes,), the class labels; not present on the regressor. |
| effective_metric_ | str or callable, the distance metric actually in use, consistent with metric. |
| effective_metric_params_ | dict, keyword arguments for the metric function. |
| outputs_2d_ | bool, False if y had shape (n_samples,) or (n_samples, 1) during fit, True otherwise; not present on the regressor. |
For guidance on choosing algorithm and leaf_size, see the Nearest Neighbors section of the scikit-learn user guide.
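As a rough sketch of how the algorithm choice affects query speed (the synthetic data below is an illustrative assumption; timings vary by dataset and machine):

import time
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.RandomState(0)
X_demo = rng.rand(20000, 3)       # low-dimensional data tends to favor trees
y_demo = rng.randint(0, 2, 20000)

for algo in ['brute', 'kd_tree', 'ball_tree']:
    clf = KNeighborsClassifier(n_neighbors=5, algorithm=algo).fit(X_demo, y_demo)
    t0 = time.perf_counter()
    clf.predict(X_demo[:1000])
    print(algo, round(time.perf_counter() - t0, 4))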
Note: in the k-nearest-neighbors algorithm, if the k-th and (k+1)-th neighbors are at the same distance from the target x but have different labels, the result depends on the ordering of the training data, as the sketch below demonstrates.
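A minimal sketch of this tie (the two-point dataset is made up for illustration): with k=1, the query 1.0 is equidistant from 0 and 2, which carry different labels, so the prediction follows the training order.

from sklearn.neighbors import KNeighborsClassifier

clf_a = KNeighborsClassifier(n_neighbors=1).fit([[0], [2]], [0, 1])
clf_b = KNeighborsClassifier(n_neighbors=1).fit([[2], [0]], [1, 0])
print(clf_a.predict([[1.0]]), clf_b.predict([[1.0]]))  # typically [0] [1]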
Creating a model
from sklearn.neighbors import KNeighborsClassifier
X = [[0], [1], [2], [3]]
y = [0, 0, 1, 1]
neigh = KNeighborsClassifier(n_neighbors=3)  # k = 3
neigh.fit(X, y)
print(neigh.predict([[1.5]]))
# Output:
[0]
print(neigh.classes_)
print(neigh.effective_metric_)
print(neigh.outputs_2d_)
Output:
[0 1]
euclidean
False
from sklearn.neighbors import KNeighborsRegressor
X = [[0], [1], [2], [3]]
y = [0, 0, 1, 1]
neigh = KNeighborsRegressor(n_neighbors=2)  # k = 2
neigh.fit(X, y)
print(neigh.predict([[1.5]]))
# Output:
[0.5]
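The 0.5 above is simply the unweighted mean of the two nearest targets (0 and 1). A minimal sketch with a hypothetical query point 1.25 shows how weights='distance' pulls the prediction toward the closer neighbor:

from sklearn.neighbors import KNeighborsRegressor

reg = KNeighborsRegressor(n_neighbors=2, weights='distance')
reg.fit([[0], [1], [2], [3]], [0, 0, 1, 1])
print(reg.predict([[1.25]]))  # [0.25]: weights 1/0.25 vs 1/0.75 favor y=0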
Model methods

| Method | Description |
| --- | --- |
| fit(self, X, y) | Fit the model using X as training data and y as target values. |
| get_params(self[, deep]) | Get the parameters of this estimator. |
| kneighbors(self[, X, n_neighbors, …]) | Find the k nearest neighbors of a point. |
| kneighbors_graph(self[, X, n_neighbors, mode]) | Compute the (weighted) graph of k-neighbors for the points in X. |
| predict(self, X) | Predict the class labels for X. |
| predict_proba(self, X) | Return probability estimates for X; not available on the regressor. |
| score(self, X, y[, sample_weight]) | Return the mean accuracy on the given test data and labels. |
| set_params(self, \*\*params) | Set the parameters of this estimator. |
The calls below reuse the classifier neigh (k = 3) fitted earlier.
neigh.get_params()
Output:
{'algorithm': 'auto',
'leaf_size': 30,
'metric': 'minkowski',
'metric_params': None,
'n_jobs': None,
'n_neighbors': 3,
'p': 2,
'weights': 'uniform'}
print(neigh.kneighbors([[1.1]]))  # the 3 nearest neighbors of 1.1 and their distances
Output:
(array([[0.1, 0.9, 1.1]]), array([[1, 2, 0]]))
# the first array holds the distances, the second the corresponding indices
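The … in the method table stands for further per-call parameters; both n_neighbors and return_distance can be overridden when calling kneighbors:

print(neigh.kneighbors([[1.1]], n_neighbors=2, return_distance=False))
# indices only, e.g. [[1 2]]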
A = neigh.kneighbors_graph(X, mode='connectivity')
A.toarray()
Output:
array([[1., 1., 1., 0.],
[1., 1., 1., 0.],
[0., 1., 1., 1.],
[0., 1., 1., 1.]])
# Each row encodes a sample's connectivity to its k nearest neighbors: the first row [1., 1., 1., 0.] means the first three samples (the point itself included) are the k neighbors of sample 0.
A = neigh.kneighbors_graph(X, mode='distance')
A.toarray()
Output:
array([[0., 1., 2., 0.],
[1., 0., 1., 0.],
[0., 1., 0., 1.],
[0., 2., 1., 0.]])
# each entry is the distance from a sample to one of its k nearest neighbors (zero for non-neighbors and for the point itself)
neigh.predict([[1.1]])
# Output:
array([0])
neigh.predict_proba([[1.1]])
# Output:
array([[0.66666667, 0.33333333]])
Because the 3 nearest neighbors of 1.1 are samples 0, 1, and 2, two of which have label 0 and one label 1, the probability of class 0 is 0.66666667.
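For comparison, a minimal sketch of the same query with weights='distance', where each neighbor votes with weight 1/d and the closest neighbor dominates:

from sklearn.neighbors import KNeighborsClassifier

clf = KNeighborsClassifier(n_neighbors=3, weights='distance')
clf.fit([[0], [1], [2], [3]], [0, 0, 1, 1])
print(clf.predict_proba([[1.1]]))  # roughly [[0.908, 0.092]]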
print(neigh.score([[1.1]], [1]))
print(neigh.score([[1.1]], [0]))
# Output:
0.0
1.0
score computes the model's mean accuracy on the given test set.
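Using the classifier neigh fitted above on a small made-up test set, score matches the mean accuracy computed by hand:

import numpy as np

X_test, y_test = [[0.4], [1.1], [2.6]], [0, 0, 1]
print(neigh.score(X_test, y_test))                         # 1.0
print(np.mean(neigh.predict(X_test) == np.array(y_test)))  # 1.0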
Classification example
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.colors import ListedColormap
from sklearn import neighbors, datasets
n_neighbors = 15
# import some data to play with
iris = datasets.load_iris()
# we only take the first two features. We could avoid this ugly
# slicing by using a two-dim dataset
X = iris.data[:, :2]
y = iris.target
h = .02 # step size in the mesh
# Create color maps
cmap_light = ListedColormap(['orange', 'cyan', 'cornflowerblue'])
cmap_bold = ListedColormap(['darkorange', 'c', 'darkblue'])
for weights in ['uniform', 'distance']:
    # we create an instance of Neighbours Classifier and fit the data.
    clf = neighbors.KNeighborsClassifier(n_neighbors, weights=weights)
    clf.fit(X, y)

    # Plot the decision boundary. For that, we will assign a color to each
    # point in the mesh [x_min, x_max]x[y_min, y_max].
    x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
    y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
    xx, yy = np.meshgrid(np.arange(x_min, x_max, h),
                         np.arange(y_min, y_max, h))
    Z = clf.predict(np.c_[xx.ravel(), yy.ravel()])

    # Put the result into a color plot
    Z = Z.reshape(xx.shape)
    plt.figure()
    plt.pcolormesh(xx, yy, Z, cmap=cmap_light)

    # Plot also the training points
    plt.scatter(X[:, 0], X[:, 1], c=y, cmap=cmap_bold,
                edgecolor='k', s=20)
    plt.xlim(xx.min(), xx.max())
    plt.ylim(yy.min(), yy.max())
    plt.title("3-Class classification (k = %i, weights = '%s')"
              % (n_neighbors, weights))

plt.show()
Regression example
# Author: Alexandre Gramfort
# Fabian Pedregosa
#
# License: BSD 3 clause (C) INRIA
# #############################################################################
# Generate sample data
import numpy as np
import matplotlib.pyplot as plt
from sklearn import neighbors
np.random.seed(0)
X = np.sort(5 * np.random.rand(40, 1), axis=0)
T = np.linspace(0, 5, 500)[:, np.newaxis]
y = np.sin(X).ravel()
# Add noise to targets
y[::5] += 1 * (0.5 - np.random.rand(8))
# #############################################################################
# Fit regression model
n_neighbors = 5
for i, weights in enumerate(['uniform', 'distance']):
    knn = neighbors.KNeighborsRegressor(n_neighbors, weights=weights)
    y_ = knn.fit(X, y).predict(T)

    plt.subplot(2, 1, i + 1)
    plt.scatter(X, y, color='darkorange', label='data')
    plt.plot(T, y_, color='navy', label='prediction')
    plt.axis('tight')
    plt.legend()
    plt.title("KNeighborsRegressor (k = %i, weights = '%s')"
              % (n_neighbors, weights))

plt.tight_layout()
plt.show()
After noise is added to y, a regression with weights='distance' behaves much like choosing a small k: nearby points dominate the prediction, so the second plot is noticeably more sensitive to the noise.
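One rough way to check this impression (reusing X, y, T, and n_neighbors from the example above, and taking the noise-free sin curve as ground truth) is to compare the mean squared error of the two fits on the dense grid:

for weights in ['uniform', 'distance']:
    knn = neighbors.KNeighborsRegressor(n_neighbors, weights=weights)
    y_ = knn.fit(X, y).predict(T)
    print(weights, np.mean((y_ - np.sin(T).ravel()) ** 2))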
References:
sklearn.neighbors.KNeighborsClassifier (scikit-learn API reference)
Learning scikit-learn: the K-nearest-neighbors algorithm (KNN)
Machine Learning in Action: kNN classification
Implementing KNN (K-nearest neighbors) with scikit-learn: a complete example
A summary of using scikit-learn's K-nearest-neighbors classes (Liu Jianping's blog)