Translated from: Simple Digit Recognition OCR in OpenCV-Python
I am trying to implement a "Digit Recognition OCR" in OpenCV-Python (cv2). It is just for learning purposes. I would like to learn both the KNearest and SVM features in OpenCV.
I have 100 samples (i.e. images) of each digit. I would like to train with them.
There is a sample, letter_recog.py, that comes with the OpenCV samples. But I still couldn't figure out how to use it. I don't understand what the samples, responses, etc. are. Also, it loads a txt file at first, which I didn't understand at first.
Later, after searching a little, I found a letter_recognition.data file in the cpp samples. I used it and wrote some code for cv2.KNearest, modeled on letter_recog.py (just for testing):
import numpy as np
import cv2

fn = 'letter-recognition.data'
# Column 0 holds the letter; map 'A'..'Z' to 0..25 so the whole file loads as floats
a = np.loadtxt(fn, np.float32, delimiter=',', converters={ 0 : lambda ch : ord(ch)-ord('A') })
samples, responses = a[:,1:], a[:,0]

model = cv2.KNearest()  # OpenCV 2.4 API
retval = model.train(samples,responses)
retval, results, neigh_resp, dists = model.find_nearest(samples, k = 10)
print(results.ravel())
It gave me an array of size 20000, and I don't understand what it is.
Questions:
1) What is the letter_recognition.data file? How do I build that file from my own data set?
2) What does results.ravel() denote?
3) How can we write a simple digit recognition tool using the letter_recognition.data file (either KNearest or SVM)?
Reference: https://stackoom.com/question/dUo4/OpenCV-Python中的简单数字识别OCR
Those who are interested in C++ code can refer to the code below. Thanks to Abid Rahman for the nice explanation.
The procedure is the same as in Abid Rahman's answer, except that the contour finding uses only first hierarchy level contours, so the algorithm uses only the outer contour of each digit.
// Create sample and label data (OpenCV 2.4 C++ API)
#include <opencv2/opencv.hpp>
#include <iostream>
using namespace cv;
using namespace std;

int main()
{
    // Process image to extract contour
    Mat thr, gray, con;
    Mat src = imread("digit.png", 1);
    cvtColor(src, gray, CV_BGR2GRAY);
    threshold(gray, thr, 200, 255, THRESH_BINARY_INV); // Threshold to find contour
    thr.copyTo(con);

    // Create sample and label data
    vector< vector<Point> > contours; // Vector for storing contour
    vector< Vec4i > hierarchy;
    Mat sample;
    Mat response_array;
    findContours(con, contours, hierarchy, CV_RETR_CCOMP, CV_CHAIN_APPROX_SIMPLE); // Find contour

    // Iterate through first hierarchy level contours; hierarchy[i][0] is -1 after the last one
    for (int i = 0; i >= 0 && i < (int)contours.size(); i = hierarchy[i][0])
    {
        Rect r = boundingRect(contours[i]); // Find bounding rect for each contour
        rectangle(src, Point(r.x, r.y), Point(r.x + r.width, r.y + r.height), Scalar(0, 0, 255), 2, 8, 0);
        Mat ROI = thr(r); // Crop the image
        Mat tmp1, tmp2;
        resize(ROI, tmp1, Size(10, 10), 0, 0, INTER_LINEAR); // Resize to 10x10
        tmp1.convertTo(tmp2, CV_32FC1); // Convert to float
        sample.push_back(tmp2.reshape(1, 1)); // Store sample data
        imshow("src", src);
        int c = waitKey(0); // Read corresponding label for contour from keyboard
        c -= 0x30; // Convert ASCII to integer value
        response_array.push_back(c); // Store label to a Mat
        rectangle(src, Point(r.x, r.y), Point(r.x + r.width, r.y + r.height), Scalar(0, 255, 0), 2, 8, 0);
    }

    // Store the data to file
    Mat response, tmp;
    tmp = response_array.reshape(1, 1); // Make continuous
    tmp.convertTo(response, CV_32FC1); // Convert to float
    FileStorage Data("TrainingData.yml", FileStorage::WRITE); // Store the sample data in a file
    Data << "data" << sample;
    Data.release();
    FileStorage Label("LabelData.yml", FileStorage::WRITE); // Store the label data in a file
    Label << "label" << response;
    Label.release();
    cout << "Training and Label data created successfully....!! " << endl;
    return 0;
}
// Train KNearest on the stored data, then label the digits in a new image (OpenCV 2.4 C++ API)
#include <opencv2/opencv.hpp>
#include <cstdio>
#include <iostream>
using namespace cv;
using namespace std;

int main()
{
    Mat thr, gray, con;
    Mat src = imread("dig.png", 1);
    cvtColor(src, gray, CV_BGR2GRAY);
    threshold(gray, thr, 200, 255, THRESH_BINARY_INV); // Threshold to create input
    thr.copyTo(con);

    // Read stored sample and label for training
    Mat sample;
    Mat response, tmp;
    FileStorage Data("TrainingData.yml", FileStorage::READ); // Read training data to a Mat
    Data["data"] >> sample;
    Data.release();
    FileStorage Label("LabelData.yml", FileStorage::READ); // Read label data to a Mat
    Label["label"] >> response;
    Label.release();

    KNearest knn;
    knn.train(sample, response); // Train with sample and responses
    cout << "Training completed.....!!" << endl;

    vector< vector<Point> > contours; // Vector for storing contour
    vector< Vec4i > hierarchy;

    // Create input sample by contour finding and cropping
    findContours(con, contours, hierarchy, CV_RETR_CCOMP, CV_CHAIN_APPROX_SIMPLE);
    Mat dst(src.rows, src.cols, CV_8UC3, Scalar::all(0));

    // Iterate through each contour of the first hierarchy level; hierarchy[i][0] is -1 after the last one
    for (int i = 0; i >= 0 && i < (int)contours.size(); i = hierarchy[i][0])
    {
        Rect r = boundingRect(contours[i]);
        Mat ROI = thr(r);
        Mat tmp1, tmp2;
        resize(ROI, tmp1, Size(10, 10), 0, 0, INTER_LINEAR);
        tmp1.convertTo(tmp2, CV_32FC1);
        float p = knn.find_nearest(tmp2.reshape(1, 1), 1); // Predicted digit for this crop
        char name[4];
        sprintf(name, "%d", (int)p);
        putText(dst, name, Point(r.x, r.y + r.height), 0, 1, Scalar(0, 255, 0), 2, 8);
    }

    imshow("src", src);
    imshow("dst", dst);
    imwrite("dest.jpg", dst);
    waitKey();
    return 0;
}
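The loop over "first hierarchy level" contours relies on the layout of the hierarchy array, where each entry is [next, previous, first_child, parent] and -1 means "none". The following is a minimal Python sketch of that walk on a made-up hierarchy; the data and the helper name top_level_contours are assumptions for illustration, and it assumes (as the C++ loop does) that contour 0 is on the top level.

```python
# Hypothetical hierarchy for 4 contours, one row per contour:
# [next, previous, first_child, parent]; -1 means "none".
hierarchy = [
    [2, -1, 1, -1],   # contour 0: outer contour with a hole (contour 1)
    [-1, -1, -1, 0],  # contour 1: hole inside contour 0
    [3, 0, -1, -1],   # contour 2: outer contour
    [-1, 2, -1, -1],  # contour 3: outer contour
]


def top_level_contours(hierarchy):
    """Return the indices reached by following the 'next' links from contour 0."""
    indices = []
    i = 0 if hierarchy else -1
    while i >= 0:
        indices.append(i)
        i = hierarchy[i][0]  # jump to the next contour on the same level
    return indices


outer = top_level_contours(hierarchy)
print(outer)
```

The hole (contour 1) is never visited, which is why a digit such as 8 or 0 yields a single sample instead of one per inner loop.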
In the result, the dot in the first line is detected as 8, since we haven't trained for the dot. Also, I am treating every contour in the first hierarchy level as a sample input; the user can avoid this by computing the contour area and skipping small contours.
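One way to skip stray marks such as the dot might look like the sketch below. It is only an illustration: it filters bounding boxes by box area and height rather than by true contour area, the helper name keep_digit_boxes and the sample boxes are made up, and the thresholds 50 and 28 are borrowed from the Python answer in this thread, so they would need tuning for other images.

```python
def keep_digit_boxes(boxes, min_area=50, min_height=28):
    """Keep (x, y, w, h) boxes that are large enough to plausibly be a digit."""
    return [(x, y, w, h) for (x, y, w, h) in boxes
            if w * h > min_area and h > min_height]


# Hypothetical bounding boxes; the middle one stands in for a small dot.
boxes = [(10, 5, 20, 32), (35, 30, 4, 4), (45, 5, 18, 31)]
digits = keep_digit_boxes(boxes)
print(digits)
```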
If you are interested in the state of the art in Machine Learning, you should look into Deep Learning. You should have a CUDA-capable GPU, or alternatively use a GPU on Amazon Web Services.
Google Udacity has a nice tutorial on this using TensorFlow. This tutorial will teach you how to train your own classifier on handwritten digits. I got an accuracy of over 97% on the test set using Convolutional Networks.
Well, I decided to work on my question myself to solve the above problem. What I wanted was to implement a simple OCR using the KNearest or SVM features in OpenCV. Below is what I did and how. (It is just for learning how to use KNearest for simple OCR purposes.)
1) My first question was about the letter_recognition.data file that comes with the OpenCV samples. I wanted to know what is inside that file.
It contains a letter, along with 16 features of that letter.
And this SOF (Stack Overflow) post helped me find it. These 16 features are explained in the paper Letter Recognition Using Holland-Style Adaptive Classifiers. (Although I didn't understand some of the features at the end.)
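As a rough sketch of that layout, one row can be split into a label and 16 features like this. The row values below are made up for illustration, and mapping 'A'..'Z' to 0..25 simply mirrors the converter used in the question's code.

```python
import numpy as np

# Illustrative row in the "letter, then 16 integer features" layout (values are made up).
line = "T,2,8,3,5,1,8,13,0,6,6,10,8,0,8,0,8"

fields = line.split(",")
label = ord(fields[0]) - ord("A")  # 'A' -> 0, 'B' -> 1, ..., 'T' -> 19
features = np.array([float(v) for v in fields[1:]], dtype=np.float32)

print(label, features.shape)
```

A file for your own data set would follow the same pattern: one line per sample, the class first, then a fixed number of numeric features.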
2) I knew that without understanding all those features, it would be difficult to use that method. I tried some other papers, but all were a little difficult for a beginner.
So I just decided to take all the pixel values as my features.
(I was not worried about accuracy or performance; I just wanted it to work, even with the lowest accuracy.)
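Using raw pixels as features amounts to flattening each small image into one row. A minimal sketch, assuming a 10x10 binary crop (the synthetic image below is a stand-in, not real data):

```python
import numpy as np

# Hypothetical 10x10 binary image standing in for a resized digit crop.
digit = np.zeros((10, 10), dtype=np.uint8)
digit[2:8, 4:6] = 255  # a crude vertical stroke, roughly a "1"

# One row with 100 features, as float32 for the classifier.
sample = digit.reshape((1, 100)).astype(np.float32)
print(sample.shape)
```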
I took the image below as my training data:
(I know the amount of training data is small. But since all the letters are of the same font and size, I decided to try this.)
To prepare the data for training, I wrote a small program in OpenCV. It does the following things: it loads the image, selects the digits by finding contours and filtering them by area and height, then draws a bounding rectangle around one letter and waits for a manual key press. This time we press the digit key ourselves corresponding to the letter in the box. Once a digit key is pressed, the box is resized to 10x10, its 100 pixel values are appended to one array (samples) and the entered digit to another (responses), and both arrays are finally saved to separate text files.
At the end of this manual classification, all the digits in the training data (pitrain.png) have been labeled by hand, and the image will look like below:
Below is the code I used for the above purpose (of course, not so clean):
import sys
import numpy as np
import cv2

im = cv2.imread('pitrain.png')
im3 = im.copy()
gray = cv2.cvtColor(im,cv2.COLOR_BGR2GRAY)
blur = cv2.GaussianBlur(gray,(5,5),0)
# adaptiveMethod=1 is ADAPTIVE_THRESH_GAUSSIAN_C, thresholdType=1 is THRESH_BINARY_INV
thresh = cv2.adaptiveThreshold(blur,255,1,1,11,2)

################# Now finding Contours ###################
# Two return values: OpenCV 2.x (and 4.x) signature
contours,hierarchy = cv2.findContours(thresh,cv2.RETR_LIST,cv2.CHAIN_APPROX_SIMPLE)

samples = np.empty((0,100))
responses = []
keys = [i for i in range(48,58)]  # ASCII codes of '0'..'9'

for cnt in contours:
    if cv2.contourArea(cnt)>50:
        [x,y,w,h] = cv2.boundingRect(cnt)
        if h>28:
            cv2.rectangle(im,(x,y),(x+w,y+h),(0,0,255),2)
            roi = thresh[y:y+h,x:x+w]
            roismall = cv2.resize(roi,(10,10))
            cv2.imshow('norm',im)
            key = cv2.waitKey(0)
            if key == 27: # (escape to quit)
                sys.exit()
            elif key in keys:
                responses.append(int(chr(key)))
                sample = roismall.reshape((1,100))
                samples = np.append(samples,sample,0)

responses = np.array(responses,np.float32)
responses = responses.reshape((responses.size,1))
print("training complete")

np.savetxt('generalsamples.data',samples)
np.savetxt('generalresponses.data',responses)
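One detail worth noting about this save step: a single-column array written with np.savetxt comes back from np.loadtxt as a flat 1-D array, which is presumably why the responses are reshaped again after loading in the next script. A small sketch with made-up labels and a throwaway temporary file (both are assumptions for the demo):

```python
import os
import tempfile

import numpy as np

# Hypothetical labels, one per row, shaped like the responses array above.
responses = np.array([3, 1, 4], np.float32).reshape((3, 1))

path = os.path.join(tempfile.mkdtemp(), "responses.data")  # throwaway file for the demo
np.savetxt(path, responses)

loaded = np.loadtxt(path, np.float32)        # the column shape is not preserved
restored = loaded.reshape((loaded.size, 1))  # back to one label per row
print(loaded.shape, restored.shape)
```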
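Now we move on to the training and testing part.
For the testing part I used the image below, which has the same type of letters I used for training.
For training, we load the text files saved earlier, create an instance of the classifier (KNearest in this case), and train it on the data with the train function.
For testing, we load the test image, process it as before and extract each digit using contours, resize each digit's bounding box to 10x10 and flatten its pixel values as before, and then call find_nearest to find the training sample nearest to the one we gave.
I included the last two steps (training and testing) in the single piece of code below: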
import cv2
import numpy as np

####### training part ###############
samples = np.loadtxt('generalsamples.data',np.float32)
responses = np.loadtxt('generalresponses.data',np.float32)
responses = responses.reshape((responses.size,1))

model = cv2.KNearest()  # OpenCV 2.4 API
model.train(samples,responses)

############################# testing part #########################
im = cv2.imread('pi.png')
out = np.zeros(im.shape,np.uint8)
gray = cv2.cvtColor(im,cv2.COLOR_BGR2GRAY)
# adaptiveMethod=1 is ADAPTIVE_THRESH_GAUSSIAN_C, thresholdType=1 is THRESH_BINARY_INV
thresh = cv2.adaptiveThreshold(gray,255,1,1,11,2)

contours,hierarchy = cv2.findContours(thresh,cv2.RETR_LIST,cv2.CHAIN_APPROX_SIMPLE)

for cnt in contours:
    if cv2.contourArea(cnt)>50:
        [x,y,w,h] = cv2.boundingRect(cnt)
        if h>28:
            cv2.rectangle(im,(x,y),(x+w,y+h),(0,255,0),2)
            roi = thresh[y:y+h,x:x+w]
            roismall = cv2.resize(roi,(10,10))
            roismall = roismall.reshape((1,100))
            roismall = np.float32(roismall)
            # results holds the predicted label for the single query row
            retval, results, neigh_resp, dists = model.find_nearest(roismall, k = 1)
            string = str(int((results[0][0])))
            cv2.putText(out,string,(x,y+h),0,1,(0,255,0))

cv2.imshow('im',im)
cv2.imshow('out',out)
cv2.waitKey(0)
And it worked. Below is the result I got:
Here it worked with 100% accuracy. I assume this is because all the digits are of the same kind and the same size.
But anyway, this is a good start for beginners (I hope so).
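To make clearer what train and find_nearest are doing conceptually, here is a brute-force nearest-neighbour sketch in plain NumPy. It illustrates the idea only and is not OpenCV's implementation; the function name find_nearest, the tie-breaking rule, and the toy 4-pixel "images" are assumptions for the example.

```python
import numpy as np


def find_nearest(train_samples, train_labels, queries, k=1):
    """Brute-force k-nearest-neighbour vote; returns one label per query row."""
    # Squared Euclidean distance between every query and every training row.
    diffs = queries[:, None, :] - train_samples[None, :, :]
    dists = (diffs ** 2).sum(axis=2)
    nearest = np.argsort(dists, axis=1)[:, :k]   # indices of the k closest rows
    votes = train_labels[nearest].astype(int)    # labels of those neighbours
    # Majority vote per query (ties go to the smallest label in this sketch).
    return np.array([np.bincount(row).argmax() for row in votes], dtype=np.float32)


# Hypothetical toy data: two 4-pixel "images" per class.
train_samples = np.array([[0, 0, 0, 0], [0, 0, 0, 255],
                          [255, 255, 255, 255], [255, 255, 255, 0]], np.float32)
train_labels = np.array([0, 0, 1, 1], np.float32)
queries = np.array([[0, 0, 10, 0], [250, 255, 255, 255]], np.float32)

results = find_nearest(train_samples, train_labels, queries, k=1)
print(results)
```

The result has one entry per query row, which also answers the question about the 20000-element array: querying with all 20000 training rows returns 20000 predicted labels.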