Handwritten Digit Recognition with the k-Nearest Neighbors Algorithm: A Python Implementation

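Dataset: (see the Python hands-on tutorial)

Training data: trainingDigits, a little over 2,000 .txt files

Test data: testDigits, about 900 .txt files

Each file encodes one 32*32 image, stored as 32 lines of 32 digit characters.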
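To make the file layout concrete, here is a minimal sketch of how one text image is flattened into a row vector and how its label is read from the file name. It mirrors the character-by-character parsing and the `split('_')` labeling used in the script, but the helper name `text_image_to_vector`, the 4*4 sample, and the file name `3_12.txt` are made up for illustration. The 0/1 pixel values are also an assumption about the data.

```python
import numpy as np

def text_image_to_vector(lines, d=32):
    """Flatten d rows of d digit characters into a 1 x (d*d) row vector."""
    vec = np.zeros((1, d * d))
    for i in range(d):
        for j in range(d):
            # Row i, column j lands at position i*d + j of the flat vector.
            vec[0, i * d + j] = int(lines[i][j])
    return vec

# Hypothetical 4*4 "image" standing in for a real 32*32 file.
sample = ["0110",
          "1001",
          "1001",
          "0110"]
vec = text_image_to_vector(sample, d=4)
print(vec.shape)  # (1, 16)

# The class label is the part of the file name before the first underscore.
label = "3_12.txt".split("_")[0]
print(label)  # 3
```

Note that the label stays a string ('3', not 3), so it has to be compared against other string labels.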

test_handWritting.py:

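from numpy import zeros
import os
import knnOperator  # kNN classifier module, see the link after this listing

def img2vector(filename, d):  # d = 32
    """Read a d x d text image and flatten it into a 1 x (d*d) row vector."""
    returnVector = zeros((1, d * d))
    with open(filename) as fr:  # the file is closed even if parsing fails
        for i in range(d):
            linstr = fr.readline()
            for j in range(d):
                returnVector[0, i * d + j] = int(linstr[j])
    return returnVector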
    
def handwritingClassTest(filepath, d):
    """Classify every test digit with kNN and print the overall accuracy."""
    trainFilePath = os.path.join(filepath, 'trainingDigits')
    trainFileList = os.listdir(trainFilePath)
    nTrain = len(trainFileList)
    trainData = zeros((nTrain, d * d))
    trainlabels = []
    for i in range(nTrain):
        trainFilei = trainFileList[i]
        trainData[i, :] = img2vector(os.path.join(trainFilePath, trainFilei), d)
        # The class label is the part of the file name before the first '_'.
        trainlabels.append(trainFilei.split('_')[0])

    testFilePath = os.path.join(filepath, 'testDigits')
    testFileList = os.listdir(testFilePath)
    nTest = len(testFileList)
    k = 4      # number of neighbors that vote
    count = 0  # correctly classified test samples
    for j in range(nTest):
        testFilej = testFileList[j]
        testSample = img2vector(os.path.join(testFilePath, testFilej), d)
        test_label = knnOperator.knnOperator(testSample, trainData, trainlabels, k)
        truth_label = testFilej.split('_')[0]
        if truth_label == test_label:
            count += 1
    rate = float(count) / float(nTest)
    print(rate)

For the knnOperator function, see: http://blog.csdn.net/u013593585/article/details/51284537
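The knnOperator function itself is not reproduced in this post. As a rough idea of what a classifier with the same argument order (sample, training matrix, training labels, k) might do, here is a minimal sketch that assumes Euclidean distance and a simple majority vote. The name `knn_classify_sketch` and the toy data are hypothetical, and the linked implementation may differ in its details.

```python
from collections import Counter
import numpy as np

def knn_classify_sketch(sample, train_data, train_labels, k):
    """Return the most common label among the k training rows nearest to sample.

    Assumptions: Euclidean distance, unweighted majority vote.
    sample has shape (1, m); train_data has shape (n, m).
    """
    diffs = train_data - sample                # broadcasts (n, m) - (1, m)
    dists = np.sqrt((diffs ** 2).sum(axis=1))  # one distance per training row
    nearest = np.argsort(dists)[:k]            # indices of the k smallest distances
    votes = Counter(train_labels[i] for i in nearest)
    return votes.most_common(1)[0][0]

# Toy data: two clusters in the plane, with string labels as in the script.
toy_train = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [6.0, 5.0]])
toy_labels = ['0', '0', '1', '1']
toy_sample = np.array([[5.0, 6.0]])
print(knn_classify_sketch(toy_sample, toy_train, toy_labels, 3))  # 1
```

With an even k such as the k = 4 used in the script, votes can tie. This sketch does not handle ties specially; it returns whichever tied label `Counter` lists first.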

Main script:

from test_handWritting import handwritingClassTest
filepath = 'E:\\ZForWorks\\MLPython\\knn\\digits\\'
d = 32
handwritingClassTest(filepath, d)
Accuracy: 98.3%
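The reported figure is the fraction of test files whose predicted label equals the label taken from the file name. A minimal sketch of that calculation follows. The helper name `accuracy` and the four example labels are invented for illustration and are not results from the dataset.

```python
def accuracy(truth_labels, predicted_labels):
    """Fraction of positions where the predicted label equals the true label."""
    if len(truth_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    if not truth_labels:
        raise ValueError("at least one sample is required")
    correct = sum(1 for t, p in zip(truth_labels, predicted_labels) if t == p)
    return float(correct) / len(truth_labels)

# Made-up labels: three of the four predictions match.
truth = ['0', '1', '1', '9']
predicted = ['0', '1', '7', '9']
print(accuracy(truth, predicted))  # 0.75
```

Both lists hold strings here, as in the script. Mixing types would silently count as a miss, because `'3' == 3` is False in Python.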


