Machine Learning Algorithms: KNN

1. Basic Idea

Birds of a feather flock together: an instance is likely to belong to the same class as the instances around it.

2. Algorithm

Given a training dataset and a new input instance, find the k instances in the training set nearest to it. The input instance is assigned to the class held by the majority of those k neighbors.
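As a toy illustration of this decision rule (the 1-D points and labels below are made up for the example), with k = 3:

```python
from collections import Counter

# Hypothetical 1-D training set: (point, class label) pairs
train = [(1.0, 'A'), (1.5, 'A'), (3.0, 'B'), (5.0, 'B'), (5.5, 'B')]
query = 2.0

# Sort training points by distance to the query and take the 3 nearest
nearest = sorted(train, key=lambda p: abs(p[0] - query))[:3]

# Majority vote among their labels decides the class
label = Counter(lbl for _, lbl in nearest).most_common(1)[0][0]
print(label)  # nearest are 1.5, 1.0, 3.0 -> labels A, A, B -> 'A'
```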

3. Implementation

Here, Euclidean distance is used, k defaults to 3, and the digits dataset provided by sklearn is used for testing.

'''
Input:  X_train: (M, N) matrix
        y_train: (M, ) vector
        X_test: (K, N) matrix
        y_test: (K, ) vector
'''
import numpy as np
import numpy.linalg as la
from sklearn.model_selection import train_test_split
from sklearn.datasets import load_digits

class KNN:
    def __init__(self, k=3):
        self.k = k

    def fit(self, X_train, y_train):
        # k-NN is a lazy learner: fitting just stores the training data
        self.X_train = X_train
        self.y_train = y_train
        
    def _predict_one(self, one_data):
        # Euclidean distance from the query point to every training point
        dist = la.norm(self.X_train - one_data, ord=2, axis=1)
        index = dist.argsort()

        # Majority vote among the k nearest neighbors
        class_count = {}
        for i in range(self.k):
            vote_class = self.y_train[index[i]]
            class_count[vote_class] = class_count.get(vote_class, 0) + 1

        sorted_class_count = sorted(class_count.items(), key=lambda d: d[1], reverse=True)
        return sorted_class_count[0][0]

    def predict(self, X_test):
        return np.array([self._predict_one(row) for row in X_test])

    def score(self, X_test, y_test):
        # Fraction of correctly classified test samples
        return np.sum(self.predict(X_test) == y_test) / len(y_test)
    
    
digits = load_digits()
X = digits.data 
y = digits.target

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)

knn = KNN()
knn.fit(X_train, y_train)
res = knn.predict(X_test)

print('Real--->Predicted')

for i, val in enumerate(y_test):
    print('  %d ---> %d' % (val, res[i]))

print('Prediction accuracy:')
print(knn.score(X_test, y_test))
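As a sanity check, sklearn's own `KNeighborsClassifier` can be run on the same split with the same `n_neighbors=3`; its accuracy should be very close to the hand-written version above (both should land in the high nineties on this dataset):

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

digits = load_digits()
X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target, test_size=0.33, random_state=42)

# Same k and same split as the implementation above
clf = KNeighborsClassifier(n_neighbors=3)
clf.fit(X_train, y_train)
print('sklearn accuracy:', clf.score(X_test, y_test))
```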
