class sklearn.neighbors.KNeighborsClassifier(n_neighbors=5, weights='uniform', algorithm='auto', leaf_size=30, p=2, metric='minkowski', metric_params=None, n_jobs=1, **kwargs)
Parameters:
n_neighbors : int, optional (default = 5)
weights : str or callable, optional (default = 'uniform')
algorithm : {'auto', 'ball_tree', 'kd_tree', 'brute'}, optional
leaf_size : int, optional (default = 30)
p : integer, optional (default = 2)
metric : string or callable, default 'minkowski'
metric_params : dict, optional (default = None)
n_jobs : int, optional (default = 1)
Important parameter:
n_jobs : number of parallel jobs to run for the neighbor search
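For illustration, a minimal sketch of constructing the classifier with a few of the parameters above set to non-default values (the specific values are arbitrary choices, not recommendations):
from sklearn.neighbors import KNeighborsClassifier
# weights='distance' weights each neighbor by the inverse of its distance,
# p=1 turns the Minkowski metric into the Manhattan distance,
# n_jobs=-1 uses all available CPU cores for the neighbor search.
clf = KNeighborsClassifier(n_neighbors=5, weights='distance', p=1, n_jobs=-1)
print(clf)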
Methods
fit(X, y) : Fit the model using X as training data and y as target values.
get_params([deep]) : Get parameters for this estimator.
kneighbors([X, n_neighbors, return_distance]) : Find the K-neighbors of a point.
kneighbors_graph([X, n_neighbors, mode]) : Compute the (weighted) graph of k-neighbors for points in X.
predict(X) : Predict the class labels for the provided data.
predict_proba(X) : Return probability estimates for the test data X.
score(X, y[, sample_weight]) : Return the mean accuracy on the given test data and labels.
set_params(**params) : Set the parameters of this estimator.
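A quick sketch of the get_params / set_params pair listed above (the parameter values here are arbitrary):
from sklearn.neighbors import KNeighborsClassifier
clf = KNeighborsClassifier()
print(clf.get_params())  # dict of all constructor parameters
clf.set_params(n_neighbors=7, weights='distance')  # update parameters in place
print(clf.get_params()['n_neighbors'])  # 7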
Several important methods:
predict(X)
Predict the class labels for the provided data.
Parameters:
X : array-like, shape (n_query, n_features), or (n_query, n_indexed) if metric == 'precomputed'
Returns:
y : array of shape [n_samples] or [n_samples, n_outputs]
predict_proba(X)
Return probability estimates for the test data X.
Parameters:
X : array-like, shape (n_query, n_features), or (n_query, n_indexed) if metric == 'precomputed'
Returns:
p : array of shape = [n_samples, n_classes], or a list of n_outputs such arrays if n_outputs > 1
score(X, y, sample_weight=None)
Return the mean accuracy on the given test data and labels.
In multi-label classification, this is the subset accuracy, which is a harsh metric since you require for each sample that each label set be correctly predicted.
Parameters:
X : array-like, shape = (n_samples, n_features)
y : array-like, shape = (n_samples) or (n_samples, n_outputs)
sample_weight : array-like, shape = [n_samples], optional
Returns:
score : float
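score is simply the mean accuracy of predict on the given data; a minimal sketch checking this with sklearn.metrics.accuracy_score (same toy data as Example3 below):
from sklearn.metrics import accuracy_score
from sklearn.neighbors import KNeighborsClassifier
X_train, y_train = [[0], [1], [2], [3]], [0, 0, 1, 1]
X_test, y_test = [[0.8], [1.5]], [1, 0]
clf = KNeighborsClassifier(n_neighbors=3).fit(X_train, y_train)
# the two printed values are the same mean accuracy
print(clf.score(X_test, y_test))
print(accuracy_score(y_test, clf.predict(X_test)))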
Example1
from sklearn.neighbors import KNeighborsClassifier
X = [[0], [1], [2], [3]]
y = [0, 0, 1, 1]
neigh = KNeighborsClassifier(n_neighbors=3)
neigh.fit(X, y)
print(neigh.predict([[1.1]]))        # predict the class label for the query point
print(neigh.predict_proba([[0.9]]))  # predict class probabilities for the query point
'''
[0]
[[0.66666667 0.33333333]]  # probabilities for label 0 and label 1, respectively
'''
Example2
from sklearn.neighbors import NearestNeighbors
samples = [[0., 0., 0.], [0., .5, 0.], [1., 1., .5]]
neigh = NearestNeighbors(n_neighbors=1)
neigh.fit(samples)
print(neigh.kneighbors([[1., 1., 1.]]))
'''
result:
(array([[0.5]]), array([[2]], dtype=int64))
'''
# The first array holds the nearest distance(s), the second the index of the nearest point.
# When return_distance=False is passed, only the indices are returned, not the distances.
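A minimal sketch of the return_distance=False variant mentioned above, using the same samples as Example2:
from sklearn.neighbors import NearestNeighbors
samples = [[0., 0., 0.], [0., .5, 0.], [1., 1., .5]]
neigh = NearestNeighbors(n_neighbors=1)
neigh.fit(samples)
idx = neigh.kneighbors([[1., 1., 1.]], return_distance=False)
print(idx)  # array([[2]]) -- only the index of the closest sample is returned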
Example3
from sklearn.neighbors import KNeighborsClassifier
X = [[0], [1], [2], [3]]
y = [0, 0, 1, 1]
neigh = KNeighborsClassifier(n_neighbors=3)
neigh.fit(X, y)
print("Distances from each training sample to its neighbors, and their indices:")
print(neigh.kneighbors())  # with no argument, finds the neighbors of every training sample and returns the distances
print('Accuracy:', neigh.score([[0.8], [1.5]], [1, 0]))
'''
Distances from each training sample to its neighbors, and their indices:
(array([[1., 2., 3.],      distances from the 1st sample to its neighbors (samples 2, 3, 4)
       [1., 1., 2.],
       [1., 1., 2.],
       [1., 2., 3.]]), array([[1, 2, 3],      indices of the 1st sample's neighbors
       [0, 2, 3],
       [3, 1, 0],
       [2, 1, 0]], dtype=int64))
Accuracy: 0.5
'''
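kneighbors_graph, listed in the methods table but not shown above, returns a sparse connectivity (or distance) matrix between each query point and its k nearest neighbors; a minimal sketch on the same toy data:
from sklearn.neighbors import KNeighborsClassifier
X = [[0], [1], [2], [3]]
y = [0, 0, 1, 1]
neigh = KNeighborsClassifier(n_neighbors=2)
neigh.fit(X, y)
# mode='connectivity' gives 0/1 entries, mode='distance' gives the actual distances
A = neigh.kneighbors_graph(X, mode='connectivity')
print(A.toarray())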
A small classification exercise:
from sklearn import datasets, neighbors, linear_model
digits = datasets.load_digits()
X_digits = digits.data
y_digits = digits.target
n_samples = len(X_digits)
# use the first 90% of the samples for training and the last 10% for testing
X_train = X_digits[:int(.9 * n_samples)]
y_train = y_digits[:int(.9 * n_samples)]
X_test = X_digits[int(.9 * n_samples):]
y_test = y_digits[int(.9 * n_samples):]
knn = neighbors.KNeighborsClassifier()
logistic = linear_model.LogisticRegression()
print('KNN score: %f' % knn.fit(X_train, y_train).score(X_test, y_test))
print('LogisticRegression score: %f'
% logistic.fit(X_train, y_train).score(X_test, y_test))
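As an optional extension of the exercise, a sketch of comparing a few values of n_neighbors with 5-fold cross-validation (assuming sklearn.model_selection.cross_val_score is available; the candidate values are arbitrary):
from sklearn import datasets, neighbors
from sklearn.model_selection import cross_val_score
digits = datasets.load_digits()
for k in (1, 3, 5, 10):
    knn = neighbors.KNeighborsClassifier(n_neighbors=k)
    scores = cross_val_score(knn, digits.data, digits.target, cv=5)
    print('k=%d  mean accuracy: %.3f' % (k, scores.mean()))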
Comparison of several classification algorithms:
from itertools import product
import numpy as np
import matplotlib.pyplot as plt
from sklearn import datasets
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.ensemble import VotingClassifier
# Loading some example data
iris = datasets.load_iris()
X = iris.data[:, [0, 2]]
y = iris.target
# Training classifiers
clf1 = DecisionTreeClassifier(max_depth=4)
clf2 = KNeighborsClassifier(n_neighbors=7)
clf3 = SVC(kernel='rbf', probability=True)
eclf = VotingClassifier(estimators=[('dt', clf1), ('knn', clf2),
                                    ('svc', clf3)],
                        voting='soft', weights=[2, 1, 2])
clf1.fit(X, y)
clf2.fit(X, y)
clf3.fit(X, y)
eclf.fit(X, y)
# Plotting decision regions
x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
xx, yy = np.meshgrid(np.arange(x_min, x_max, 0.1),
                     np.arange(y_min, y_max, 0.1))
f, axarr = plt.subplots(2, 2, sharex='col', sharey='row', figsize=(10, 8))
for idx, clf, tt in zip(product([0, 1], [0, 1]),
                        [clf1, clf2, clf3, eclf],
                        ['Decision Tree (depth=4)', 'KNN (k=7)',
                         'Kernel SVM', 'Soft Voting']):
    Z = clf.predict(np.c_[xx.ravel(), yy.ravel()])
    print(type(Z))
    Z = Z.reshape(xx.shape)
    axarr[idx[0], idx[1]].contourf(xx, yy, Z, alpha=0.4)
    axarr[idx[0], idx[1]].scatter(X[:, 0], X[:, 1], c=y,
                                  s=20, edgecolor='k')
    axarr[idx[0], idx[1]].set_title(tt)
plt.show()