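KNN - An Implementation with scikit-learn

This article covers the relevant APIs and a complete implementation of the program. For the underlying algorithm, see the earlier introduction to the KNN algorithm.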

I. Prerequisites

1. Python basics

2. NumPy basics

np.mean
Computes the arithmetic mean.

print(np.mean([1,2,3,4]))
# >> 2.5
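
np.mean also accepts boolean arrays, which is how the complete program later computes accuracy: comparing predictions with labels gives True/False values, and their mean is the fraction that match. The following is a minimal sketch; the arrays are made up for illustration and are not taken from the dating data set.

```python
import numpy as np

# Hypothetical predictions and true labels, for illustration only
predicted = np.array(['a', 'b', 'a', 'c'])
actual = np.array(['a', 'b', 'c', 'c'])

matches = predicted == actual   # element-wise comparison, 3 of 4 match
accuracy = np.mean(matches)     # True counts as 1, False as 0
print(accuracy)
# >> 0.75

# axis=0 averages each column, which is handy for per-feature statistics
data = np.array([[1.0, 10.0], [3.0, 30.0]])
column_means = np.mean(data, axis=0)
print(column_means.tolist())
# >> [2.0, 20.0]
```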

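3. scikit-learn basics

Methods of KNeighborsClassifier:

fit(X, y)
Fit the model using X as training data and y as target values.

get_params([deep])
Get the parameters of this estimator.

kneighbors([X, n_neighbors, return_distance])
Find the K nearest neighbors of a point.

kneighbors_graph([X, n_neighbors, mode])
Compute the (weighted) graph of k-neighbors for the points in X.

predict(X)
Predict the class labels for the provided data.

predict_proba(X)
Return probability estimates for the test data X.

score(X, y[, sample_weight])
Return the mean accuracy on the given test data and labels.

set_params(**params)
Set the parameters of this estimator.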
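
To make the method list concrete, here is a small sketch that calls each method once. The toy data (two well-separated one-dimensional clusters with the labels 'low' and 'high') is an assumption made for illustration and is unrelated to the dating data set used in the complete program.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Toy data, made up for illustration: two well-separated 1-D clusters
X = np.array([[0.0], [0.1], [0.2], [1.0], [1.1], [1.2]])
y = np.array(['low', 'low', 'low', 'high', 'high', 'high'])

clf = KNeighborsClassifier(n_neighbors=3)
clf.fit(X, y)                                # train on (X, y)

# Inputs must be 2-D: (n_samples, n_features)
query = np.array([[0.05], [1.15]])

labels = clf.predict(query)                  # predicted class label per row
proba = clf.predict_proba(query)             # columns follow clf.classes_
distances, indices = clf.kneighbors(query)   # 3 nearest training points per row
graph = clf.kneighbors_graph(query, mode='connectivity')  # sparse (2, 6) matrix
accuracy = clf.score(X, y)                   # mean accuracy on the given data

params = clf.get_params()                    # dict of constructor parameters
clf.set_params(n_neighbors=1)                # change a parameter in place

print(labels)
print(clf.classes_, proba)
print(accuracy)
```

Note that predict_proba orders its columns by clf.classes_, which is sorted, so here the first column belongs to 'high' and the second to 'low'.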

II. Complete program

# -*- coding: utf-8 -*-
import numpy as np
from sklearn import neighbors, preprocessing
from sklearn.metrics import precision_recall_curve
from sklearn.metrics import classification_report
# sklearn.cross_validation was removed in scikit-learn 0.20; model_selection replaces it
from sklearn.model_selection import train_test_split

def file2Mat(testFileName, parameterNumber):
    # Read a tab-separated file: the first parameterNumber columns are the
    # features and the last column is the class label.
    with open(testFileName) as fr:
        lines = fr.readlines()
    lineNums = len(lines)
    resultMat = np.zeros((lineNums, parameterNumber))
    classLabelVector = []
    for i in range(lineNums):
        line = lines[i].strip()
        itemMat = line.split('\t')
        resultMat[i, :] = itemMat[0:parameterNumber]
        classLabelVector.append(itemMat[-1])
    return resultMat, classLabelVector


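# Min-max normalization keeps one attribute from dominating the distance.
# For example, with the values 10000, 4.5, 6.8 the 10000 would almost entirely decide the result.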
def autoNorm(dataSet):
    minVals = dataSet.min(0)
    maxVals = dataSet.max(0)
    ranges = maxVals - minVals
    normMat = np.zeros(np.shape(dataSet))
    size = normMat.shape[0]
    normMat = dataSet - np.tile(minVals, (size, 1))
    normMat = normMat / np.tile(ranges, (size, 1))
    return normMat, minVals, ranges

if __name__=='__main__':
    trainingSetFileName = 'data\\datingTrainingSet.txt'
    testFileName = 'data\\datingTestSet.txt'

    # Read the training data
    trainingMat, classLabel = file2Mat(trainingSetFileName, 3)
    # Normalize the training data
    autoNormTrainingMat, minVals, ranges = autoNorm(trainingMat)
    # Read the test data and normalize it with the training set's minVals and ranges
    testMat, testLabel = file2Mat(testFileName, 3)
    testLabel = np.array(testLabel)
    autoNormTestMat = (testMat - minVals) / ranges
    # testMat = preprocessing.normalize(testMat)
    print(autoNormTestMat)
    # Train the KNN classifier
    clf = neighbors.KNeighborsClassifier(n_neighbors=5, algorithm='kd_tree')
    clf.fit(autoNormTrainingMat, classLabel)

    answer = clf.predict(autoNormTestMat)

    # Number of misclassified test samples
    print(np.sum(answer != testLabel))

    # Compute the accuracy (both lines print the same value)
    print(clf.score(autoNormTestMat, testLabel))
    print(np.mean(answer == testLabel))
    # predict and predict_proba expect a 2-D array of shape (n_samples, n_features)
    sample = [[0.44832535, 0.39805139, 0.56233353]]
    print(clf.predict(sample))
    print(clf.predict_proba(sample))
    # Precision and recall
    # precision, recall, thresholds = precision_recall_curve(testLabel, clf.predict(testMat))
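
The program imports preprocessing, classification_report and train_test_split without using them. As one possible variation, the sketch below shows how they might fit together: MinMaxScaler performs the same min-max scaling as autoNorm, train_test_split creates the train/test partition, and classification_report prints per-class precision and recall. The data here is synthetic (an assumption standing in for the dating files, which are not reproduced in this article), so the numbers it prints say nothing about the real data set.

```python
import numpy as np
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in data: 3 numeric features on very different scales,
# two classes with string labels (all values are made up)
rng = np.random.RandomState(0)
X = np.vstack([
    rng.normal(loc=[1000.0, 1.0, 0.2], scale=[100.0, 0.3, 0.05], size=(50, 3)),
    rng.normal(loc=[5000.0, 5.0, 0.8], scale=[100.0, 0.3, 0.05], size=(50, 3)),
])
y = np.array(['1'] * 50 + ['2'] * 50)

# Hold out 20% of the samples for testing, keeping the class ratio
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y)

# Fit the scaler on the training data only, then reuse it for the test data
scaler = MinMaxScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Same formula as autoNorm: (x - min) / (max - min), column by column
manual = (X_train - X_train.min(0)) / (X_train.max(0) - X_train.min(0))
print(np.allclose(manual, X_train_scaled))

clf = KNeighborsClassifier(n_neighbors=5, algorithm='kd_tree')
clf.fit(X_train_scaled, y_train)
predicted = clf.predict(X_test_scaled)

# Per-class precision, recall and F1 score
print(classification_report(y_test, predicted))
```

Fitting the scaler on the training split and only transforming the test split mirrors what the program does when it reuses minVals and ranges from the training set.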
