ANPR (Automatic Number Plate Recognition) is divided into two main steps: plate detection and plate recognition. Plate detection locates the plate within the whole camera frame. When a plate is detected in an image, the plate segment is passed to the second step, plate recognition, which uses an OCR algorithm to determine the alphanumeric characters on the plate.
I. Plate detection
1. Segmentation: the process of dividing an image into multiple segments.
A key feature for plate segmentation is the high number of vertical edges in a license plate, assuming that the image was taken frontally and that the plate is neither rotated nor distorted by perspective.
Before finding vertical edges, we convert the color image to grayscale (color does not help in this task) and remove noise generated by the camera or the environment. We apply a 5 x 5 Gaussian blur for this. Without noise removal, we can get many spurious vertical edges that cause detection to fail.
To find the vertical edges, we use a Sobel filter and compute the first horizontal derivative. This derivative responds strongly wherever intensity changes from left to right, which is where the vertical edges of an image lie.
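The horizontal derivative can be sketched without any library as a 3 x 3 Sobel kernel applied to a grayscale image. The `Image` alias and the `sobelX` name are assumptions made for this illustration; in OpenCV the same idea is expressed by calling Sobel with an x-order of 1 and a y-order of 0. The sketch returns the absolute, unclamped response and leaves border pixels at 0.

```cpp
#include <cassert>
#include <cstdlib>
#include <vector>

// Hypothetical grayscale image type for this sketch: rows of values in 0..255.
using Image = std::vector<std::vector<int>>;

// Absolute response of the 3x3 Sobel kernel for the first derivative in x.
// Large values mark vertical edges; border pixels are left at 0.
Image sobelX(const Image& gray) {
    assert(!gray.empty() && !gray[0].empty());
    const int rows = static_cast<int>(gray.size());
    const int cols = static_cast<int>(gray[0].size());
    Image out(rows, std::vector<int>(cols, 0));
    for (int y = 1; y + 1 < rows; ++y) {
        for (int x = 1; x + 1 < cols; ++x) {
            // Right column minus left column, middle row weighted twice.
            const int gx =
                (gray[y - 1][x + 1] + 2 * gray[y][x + 1] + gray[y + 1][x + 1]) -
                (gray[y - 1][x - 1] + 2 * gray[y][x - 1] + gray[y + 1][x - 1]);
            out[y][x] = std::abs(gx);
        }
    }
    return out;
}
```

A vertical step in brightness produces a strong response along the step, while a horizontal step produces none, which is why this filter suits plates with many upright character strokes.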
After the Sobel filter, we apply a threshold filter to obtain a binary image, with the threshold value chosen by Otsu's method.
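As a rough illustration of what Otsu's method computes, the sketch below picks the gray level that maximizes the between-class variance of the two groups it separates, then binarizes with it. The function names `otsuThreshold` and `binarize` are hypothetical, and pixels are passed as a flat list rather than an OpenCV matrix.

```cpp
#include <array>
#include <cassert>
#include <vector>

// Otsu's method: choose the threshold t that maximizes the between-class
// variance of the groups "value <= t" and "value > t".
int otsuThreshold(const std::vector<int>& pixels) {
    assert(!pixels.empty());
    std::array<long long, 256> hist{};
    for (int p : pixels) {
        assert(p >= 0 && p <= 255);
        ++hist[p];
    }
    const double total = static_cast<double>(pixels.size());
    double sumAll = 0.0;
    for (int v = 0; v < 256; ++v) sumAll += static_cast<double>(v) * hist[v];

    double weightLow = 0.0, sumLow = 0.0, bestVariance = -1.0;
    int bestThreshold = 0;
    for (int t = 0; t < 256; ++t) {
        weightLow += static_cast<double>(hist[t]);
        sumLow += static_cast<double>(t) * hist[t];
        const double weightHigh = total - weightLow;
        if (weightLow == 0.0) continue;  // no pixels at or below t yet
        if (weightHigh == 0.0) break;    // every pixel is at or below t
        const double diff = sumLow / weightLow - (sumAll - sumLow) / weightHigh;
        const double variance = weightLow * weightHigh * diff * diff;
        if (variance > bestVariance) {
            bestVariance = variance;
            bestThreshold = t;
        }
    }
    return bestThreshold;
}

// Binary image: 255 where the pixel is above the threshold, 0 elsewhere.
std::vector<int> binarize(const std::vector<int>& pixels, int threshold) {
    std::vector<int> out;
    out.reserve(pixels.size());
    for (int p : pixels) out.push_back(p > threshold ? 255 : 0);
    return out;
}
```

The advantage over a fixed threshold is that the cut adapts to each frame: a dim image and a bright image each get a threshold that falls between their own dark and light pixel groups.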
By applying a morphological close operation, we remove the blank spaces between vertical edge lines and connect all regions that have a high number of edges. After this step we have the candidate regions that may contain plates.
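A close operation is a dilation followed by an erosion with the same structuring element. The following is a minimal sketch with a rectangular kernel; the `Image` alias, the function names, and the choice to ignore pixels outside the image are assumptions of this example.

```cpp
#include <cassert>
#include <vector>

// Hypothetical binary image type for this sketch: 0 or 255 per pixel.
using Image = std::vector<std::vector<int>>;

// Rectangular-kernel dilation (wantAll == false) or erosion (wantAll == true).
// Pixels that fall outside the image are ignored.
Image morph(const Image& src, int kernelW, int kernelH, bool wantAll) {
    assert(!src.empty() && kernelW % 2 == 1 && kernelH % 2 == 1);
    const int rows = static_cast<int>(src.size());
    const int cols = static_cast<int>(src[0].size());
    Image dst(rows, std::vector<int>(cols, 0));
    for (int y = 0; y < rows; ++y) {
        for (int x = 0; x < cols; ++x) {
            bool any = false, all = true;
            for (int dy = -kernelH / 2; dy <= kernelH / 2; ++dy) {
                for (int dx = -kernelW / 2; dx <= kernelW / 2; ++dx) {
                    const int yy = y + dy, xx = x + dx;
                    if (yy < 0 || yy >= rows || xx < 0 || xx >= cols) continue;
                    if (src[yy][xx] != 0) any = true; else all = false;
                }
            }
            dst[y][x] = (wantAll ? all : any) ? 255 : 0;
        }
    }
    return dst;
}

// Close = dilate, then erode: gaps narrower than the kernel are filled.
Image closeRegions(const Image& binary, int kernelW, int kernelH) {
    return morph(morph(binary, kernelW, kernelH, false), kernelW, kernelH, true);
}
```

A kernel that is much wider than it is tall merges neighboring vertical strokes into one horizontal blob while keeping separate text lines apart, which matches the shape of a plate.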
After applying these functions, we have regions in the image that could contain a plate; however, most of them will not contain license plates. These regions can be separated with a connected-component analysis or with the findContours function. The latter retrieves the contours of a binary image with different methods and results. We only need the external contours, without any hierarchical relationship and without polygonal approximation. For each detected contour, we extract the bounding rectangle of minimal area. We then make basic validations of the detected regions based on their area and aspect ratio.
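The area and aspect-ratio validation could look roughly like the sketch below. The text does not give concrete limits, so every threshold is a caller-supplied assumption, and `verifySize` is a hypothetical name.

```cpp
#include <cassert>

// Returns true when a candidate region has a plausible plate size.
// All limits are caller-supplied assumptions, not values from the text.
bool verifySize(double width, double height,
                double targetAspect,  // expected long side / short side
                double tolerance,     // allowed relative aspect error, e.g. 0.4
                double minArea, double maxArea) {
    assert(targetAspect >= 1.0 && tolerance >= 0.0 && minArea <= maxArea);
    if (width <= 0.0 || height <= 0.0) return false;
    const double area = width * height;
    // A rotated rectangle may report its sides swapped, so compare long/short.
    double aspect = width / height;
    if (aspect < 1.0) aspect = 1.0 / aspect;
    const double minAspect = targetAspect * (1.0 - tolerance);
    const double maxAspect = targetAspect * (1.0 + tolerance);
    return area >= minArea && area <= maxArea &&
           aspect >= minAspect && aspect <= maxAspect;
}
```

For example, with a target aspect of 144.0 / 33.0 and a tolerance of 0.4, a 144 x 33 region passes in either orientation, while a square region of the same area is rejected.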
We can make further improvements using the license plate's white background. All plates have the same background color, so we can use a flood fill algorithm to retrieve the rotated rectangle for precise cropping. The first step in cropping the license plate is to get several seeds near the center of the last rotated rectangle. We take the smaller of the plate's width and height and use it to generate random seeds near the patch center. We want to select the white region, so we need several seeds to make sure that at least one lands on a white pixel. Then, for each seed, we use the floodFill function to draw into a new mask image that stores the new, closest cropping region. Once we have a crop mask, we get a minimal-area rectangle from the mask points and check for a valid size again.
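To make the mask-building step concrete, here is a simplified flood fill that marks every pixel connected to a seed whose gray value stays within a tolerance of the seed's value. It is only a sketch: the `floodFillMask` name, the 4-connectivity, and the fixed-range comparison against the seed are assumptions, and seeds are passed in explicitly instead of being drawn at random.

```cpp
#include <cassert>
#include <cstdlib>
#include <queue>
#include <utility>
#include <vector>

// Hypothetical image type for this sketch; the mask uses 0 and 255.
using Image = std::vector<std::vector<int>>;

// Marks in `mask` every pixel 4-connected to the seed whose gray value
// differs from the seed's value by at most `tolerance`.
void floodFillMask(const Image& gray, int seedY, int seedX, int tolerance,
                   Image& mask) {
    assert(!gray.empty() && mask.size() == gray.size() &&
           mask[0].size() == gray[0].size());
    const int rows = static_cast<int>(gray.size());
    const int cols = static_cast<int>(gray[0].size());
    assert(seedY >= 0 && seedY < rows && seedX >= 0 && seedX < cols);
    const int seedValue = gray[seedY][seedX];
    const int dy[4] = {-1, 1, 0, 0};
    const int dx[4] = {0, 0, -1, 1};
    std::queue<std::pair<int, int>> pending;
    mask[seedY][seedX] = 255;
    pending.push(std::make_pair(seedY, seedX));
    while (!pending.empty()) {
        const std::pair<int, int> current = pending.front();
        pending.pop();
        for (int k = 0; k < 4; ++k) {
            const int y = current.first + dy[k];
            const int x = current.second + dx[k];
            if (y < 0 || y >= rows || x < 0 || x >= cols) continue;
            if (mask[y][x] != 0) continue;  // already part of the region
            if (std::abs(gray[y][x] - seedValue) > tolerance) continue;
            mask[y][x] = 255;
            pending.push(std::make_pair(y, x));
        }
    }
}
```

Calling it once per seed with the same mask accumulates the union of all filled regions, so a seed that lands on a dark character stroke only adds a small patch while a seed on the background fills the plate interior.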
2. Now that the segmentation process is finished and we have valid regions, we can crop each detected region, remove any possible rotation, resize the image, and equalize the light of the cropped image regions.
First, we need to generate the transform matrix with getRotationMatrix2D to remove possible rotations in the detected region.
After we rotate the image, we crop it with getRectSubPix, which crops and copies an image portion of a given width and height centered on a point. If the image was rotated, we need to swap the width and height with the C++ swap function.
Cropped images are not suitable for training and classification as they are, since they do not all have the same size. Each image also has different lighting conditions, which increases the differences between them. To resolve this, we resize all images to the same width and height and apply histogram equalization.
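Histogram equalization can be illustrated with the textbook formula that remaps each gray level through the cumulative histogram. This is a sketch on a flat list of pixels; `equalizeHistogram` is a hypothetical name and the rounding choice is an assumption of the example.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Spreads the gray levels of an image over the full 0..255 range by
// remapping each value through the cumulative histogram (CDF).
std::vector<int> equalizeHistogram(const std::vector<int>& pixels) {
    assert(!pixels.empty());
    std::array<long long, 256> hist{};
    for (int p : pixels) {
        assert(p >= 0 && p <= 255);
        ++hist[p];
    }
    std::array<long long, 256> cdf{};
    long long running = 0, cdfMin = 0;
    for (int v = 0; v < 256; ++v) {
        running += hist[v];
        cdf[v] = running;
        if (cdfMin == 0 && running > 0) cdfMin = running;  // first used level
    }
    const long long total = static_cast<long long>(pixels.size());
    if (total == cdfMin) return pixels;  // single gray level: nothing to spread
    std::vector<int> out(pixels.size());
    for (std::size_t i = 0; i < pixels.size(); ++i) {
        const double scaled = 255.0 * static_cast<double>(cdf[pixels[i]] - cdfMin) /
                              static_cast<double>(total - cdfMin);
        out[i] = static_cast<int>(std::lround(scaled));
    }
    return out;
}
```

A low-contrast patch with levels 100, 110, 120, and 130 is stretched to 0, 85, 170, and 255, so two plates photographed under different lighting end up with comparable value ranges.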
3. Classification: After we preprocess and segment all possible parts of an image, we need to decide whether each segment is or is not a license plate. To do this, we use a Support Vector Machine (SVM) algorithm.
We need to train the algorithm with labeled data; each sample needs to have a class. We trained our system with 75 license-plate images and 35 images without license plates, all 144 x 33 pixels. In a real application, we would need to train with more data.
We need to set the basic parameters of the SVM algorithm; we use the CvSVMParams structure to define them. Among them is the kernel type, which defines a mapping applied to the training data to improve its resemblance to a linearly separable set of data. This mapping consists of increasing the dimensionality of the data and is done efficiently using a kernel function. Here we choose the CvSVM::LINEAR type, which means that no mapping is done. We then create and train our classifier. OpenCV defines the CvSVM class for the Support Vector Machine algorithm, and we initialize it with the training data, classes, and parameter data.
Our classifier is then ready to classify a cropped image using the predict function of our SVM class; this function returns the class identifier. In our case, we label the plate class with 1 and the no-plate class with 0. For each detected region that could be a plate, we use the SVM to classify it as plate or no plate, and keep only the regions classified as plates.
II. Plate recognition
The second step in license plate recognition aims to retrieve the characters of the license plate with optical character recognition. For each detected plate, we segment the plate into individual characters and use an Artificial Neural Network (ANN) machine-learning algorithm to recognize each character.
1. OCR segmentation
First, we take a plate image patch with an equalized histogram as the input to the OCR segmentation function. We then apply a threshold filter and use the thresholded image as the input of a find-contours algorithm.
We use the CV_THRESH_BINARY_INV parameter to invert the threshold output by turning the white input values black and black input values white. This is needed to get the contours of each character, because the contours algorithm looks for white pixels.
For each detected contour, we verify its size and remove all regions that are too small or whose aspect ratio is not correct. In our case, the characters have a 45/77 aspect ratio, and we accept a 35 percent aspect error to allow for rotated or distorted characters. If more than 80 percent of a region's area is filled, we consider that region to be a black block and not a character. To measure the filled area, we use the countNonZero function, which counts the number of pixels with a value higher than 0.
If a segmented character is verified, we preprocess it so that all characters have the same size and position, and save it in a vector using the auxiliary CharSegment class. This class stores the segmented character image and its position, which we need in order to sort the characters, because the find-contours algorithm does not return the contours in the required order.
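The character checks described above can be sketched as follows. The 45/77 aspect, the 35 percent error, and the 80 percent fill limit come from the text; the height limits are not stated there, so they are left as caller-supplied assumptions, and applying the aspect error symmetrically is also an assumption of this example.

```cpp
#include <cassert>
#include <vector>

// Hypothetical binary image type for this sketch: nonzero means character ink.
using Image = std::vector<std::vector<int>>;

// Counts pixels with a value other than 0.
int countNonZero(const Image& img) {
    int count = 0;
    for (const auto& row : img)
        for (int v : row)
            if (v != 0) ++count;
    return count;
}

// Accepts a candidate character region if its aspect, fill ratio, and height
// are plausible. minHeight and maxHeight are assumptions chosen by the caller.
bool verifyCharSize(const Image& region, int minHeight, int maxHeight) {
    assert(!region.empty() && !region[0].empty());
    const double targetAspect = 45.0 / 77.0;  // width / height of a character
    const double aspectError = 0.35;          // accepted relative deviation
    const double maxFill = 0.8;               // above this it is a solid block
    const int rows = static_cast<int>(region.size());
    const int cols = static_cast<int>(region[0].size());
    const double aspect = static_cast<double>(cols) / rows;
    const double fill = static_cast<double>(countNonZero(region)) / (rows * cols);
    return fill < maxFill &&
           aspect >= targetAspect * (1.0 - aspectError) &&
           aspect <= targetAspect * (1.0 + aspectError) &&
           rows >= minHeight && rows <= maxHeight;
}
```

A symmetric lower bound rejects very narrow glyphs such as the digit 1, so a real system would probably need a looser minimum aspect than this sketch uses.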
2. Feature extraction
The next step for each segmented character is to extract the features used to train the Artificial Neural Network and to classify with it.
Unlike the feature extraction used for the SVM in plate detection, we don't use all of the image pixels; we apply features more commonly used in optical character recognition: horizontal and vertical accumulation histograms and a low-resolution image sample. Each character is described by a low-resolution 5 x 5 image and the histogram accumulations.
For each character, we count the number of pixels with a nonzero value in each row or column using the countNonZero function and store the counts in a new data matrix called mhist. We normalize it by looking for the maximum value in the data matrix with the minMaxLoc function and dividing all elements of mhist by that maximum with the convertTo function. We create the ProjectedHistogram function to build the accumulation histograms; it takes as input a binary image and the type of histogram we need, horizontal or vertical.
The other features come from a low-resolution sample image. Instead of using the whole character image, we create a low-resolution character, for example 5 x 5. We train the system with 5 x 5, 10 x 10, 15 x 15, and 20 x 20 characters, and then evaluate which one returns the best result so that we can use it in our system. Once we have all the features, we create a matrix of one row by M columns, where the columns are the features.
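The accumulation histogram can be sketched in plain C++ as below. The mapping of "horizontal" to one bin per row and "vertical" to one bin per column is an assumption of this example, as are the `Projection` enum and the lowercase `projectedHistogram` name.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical binary image type for this sketch: nonzero means character ink.
using Image = std::vector<std::vector<int>>;

enum class Projection { Horizontal, Vertical };

// Counts nonzero pixels per row (Horizontal) or per column (Vertical) and
// divides every bin by the largest count, so values fall in 0..1.
std::vector<double> projectedHistogram(const Image& binary, Projection type) {
    assert(!binary.empty() && !binary[0].empty());
    const std::size_t rows = binary.size();
    const std::size_t cols = binary[0].size();
    const bool perRow = (type == Projection::Horizontal);
    std::vector<double> hist(perRow ? rows : cols, 0.0);
    for (std::size_t y = 0; y < rows; ++y)
        for (std::size_t x = 0; x < cols; ++x)
            if (binary[y][x] != 0) hist[perRow ? y : x] += 1.0;
    const double maxCount = *std::max_element(hist.begin(), hist.end());
    if (maxCount > 0.0)
        for (double& bin : hist) bin /= maxCount;  // skip for an empty image
    return hist;
}
```

Normalizing by the maximum keeps the feature independent of stroke thickness and overall character size, which helps when the same classifier sees characters from different plates.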
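Assembling the feature row might look like the sketch below. Block averaging stands in for an image-resize call and assumes the character size is a multiple of the target size; the function names and the ordering of the three feature groups are assumptions of this illustration.

```cpp
#include <cassert>
#include <vector>

// Hypothetical grayscale image type for this sketch: values in 0..255.
using Image = std::vector<std::vector<int>>;

// Shrinks `src` to size x size by averaging equally sized blocks.
// Assumes both image dimensions are multiples of `size`.
std::vector<double> lowResolutionSample(const Image& src, int size) {
    assert(size > 0 && !src.empty() && !src[0].empty());
    const int rows = static_cast<int>(src.size());
    const int cols = static_cast<int>(src[0].size());
    assert(rows % size == 0 && cols % size == 0);
    const int blockH = rows / size, blockW = cols / size;
    std::vector<double> sample;
    sample.reserve(static_cast<std::size_t>(size) * size);
    for (int by = 0; by < size; ++by) {
        for (int bx = 0; bx < size; ++bx) {
            long long sum = 0;
            for (int y = by * blockH; y < (by + 1) * blockH; ++y)
                for (int x = bx * blockW; x < (bx + 1) * blockW; ++x)
                    sum += src[y][x];
            sample.push_back(static_cast<double>(sum) / (blockH * blockW));
        }
    }
    return sample;
}

// Concatenates both histograms and the low-resolution pixels into one row.
std::vector<double> buildFeatureRow(const std::vector<double>& verticalHist,
                                    const std::vector<double>& horizontalHist,
                                    const std::vector<double>& lowRes) {
    std::vector<double> row;
    row.reserve(verticalHist.size() + horizontalHist.size() + lowRes.size());
    row.insert(row.end(), verticalHist.begin(), verticalHist.end());
    row.insert(row.end(), horizontalHist.begin(), horizontalHist.end());
    row.insert(row.end(), lowRes.begin(), lowRes.end());
    return row;
}
```

With a 20 x 20 character and a 5 x 5 sample, this yields 20 + 20 + 25 = 65 features per character, which is the M in the one-row matrix described above.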
3. OCR classification
In the classification step, we use an Artificial Neural Network machine-learning algorithm, more specifically a Multi-Layer Perceptron (MLP), which is the most commonly used ANN algorithm. An MLP consists of a network of neurons with an input layer, an output layer, and one or more hidden layers. Each layer has one or more neurons connected to the previous and next layers.
A trained ANN takes a vector of features as input. It passes the values to the hidden layer and computes the results with the weights and the activation function. It passes the outputs further downstream until it reaches the output layer, which has one neuron per class.
The weights of each layer, synapse, and neuron are computed and learned by training the ANN algorithm. To train our classifier, we create two matrices of data as we did for the SVM training, but the training labels are a bit different. Instead of an N x 1 matrix, where N is the number of training data rows and the single column holds the label number identifier, we have to create an N x M matrix, where N is the number of training samples and M is the number of classes (10 digits + 20 letters in our case), and set 1 at position (i, j) if data row i is classified with class j.
We create an OCR::train function to build all the needed matrices and train our system with the training data matrix, the classes matrix, and the number of hidden neurons in the hidden layers. The training data is loaded from an XML file, just as we did for the SVM training. We have to define the number of neurons in each layer to initialize the ANN class. For our sample, we use only one hidden layer, so we define a matrix of 1 row and 3 columns. The first column is the number of features, the second column is the number of hidden neurons in the hidden layer, and the third column is the number of classes. OpenCV defines a CvANN_MLP class for ANN. With the create function, we can initialize the class by defining the number of layers and neurons, the activation function, and the alpha and beta parameters.
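The forward pass through one hidden layer can be sketched as follows. The symmetric sigmoid written here, with its alpha and beta parameters, is an assumption about the activation function, and the sketch leaves out any input or output scaling a library may apply; all names are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Vector = std::vector<double>;
using Matrix = std::vector<Vector>;  // one row of input weights per neuron

// Symmetric sigmoid: beta * (1 - e^(-alpha x)) / (1 + e^(-alpha x)).
double symmetricSigmoid(double x, double alpha, double beta) {
    const double e = std::exp(-alpha * x);
    return beta * (1.0 - e) / (1.0 + e);
}

// Output of one layer: activation of (weights * input + bias) per neuron.
Vector forwardLayer(const Matrix& weights, const Vector& biases,
                    const Vector& input, double alpha, double beta) {
    assert(weights.size() == biases.size());
    Vector out;
    out.reserve(weights.size());
    for (std::size_t n = 0; n < weights.size(); ++n) {
        assert(weights[n].size() == input.size());
        double sum = biases[n];
        for (std::size_t i = 0; i < input.size(); ++i)
            sum += weights[n][i] * input[i];
        out.push_back(symmetricSigmoid(sum, alpha, beta));
    }
    return out;
}

// Features -> hidden layer -> output layer (one value per class).
Vector forwardPass(const Matrix& hiddenW, const Vector& hiddenB,
                   const Matrix& outputW, const Vector& outputB,
                   const Vector& features, double alpha, double beta) {
    const Vector hidden = forwardLayer(hiddenW, hiddenB, features, alpha, beta);
    return forwardLayer(outputW, outputB, hidden, alpha, beta);
}
```

The number of rows in the output weight matrix fixes how many values come out, so a network for 30 character classes would have 30 rows there.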
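The N x M label matrix described above is a one-hot encoding of the class identifiers. A minimal sketch, with the hypothetical name `oneHotLabels` and nested vectors standing in for an OpenCV matrix:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Builds an N x M matrix with a single 1 per row, at the column of the
// sample's class. classIds[i] is the class of training row i.
std::vector<std::vector<float>> oneHotLabels(const std::vector<int>& classIds,
                                             int numClasses) {
    assert(numClasses > 0);
    std::vector<std::vector<float>> labels(
        classIds.size(), std::vector<float>(numClasses, 0.0f));
    for (std::size_t i = 0; i < classIds.size(); ++i) {
        assert(classIds[i] >= 0 && classIds[i] < numClasses);
        labels[i][classIds[i]] = 1.0f;
    }
    return labels;
}
```

With the 30 classes mentioned in the text, `oneHotLabels({0, 29, 3}, 30)` gives three rows of 30 values, each containing exactly one 1.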
After training, we can classify any segmented plate feature using the OCR::classify function.
The CvANN_MLP class uses the predict function to classify a feature vector into a class. Unlike the SVM's predict function, the ANN's predict function returns a row whose size equals the number of classes, holding for each class the probability that the input feature belongs to it.
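Turning that output row into a single class is then a matter of picking the column with the largest value. A minimal sketch, assuming the row has already been copied into a vector; `mostLikelyClass` is a hypothetical helper name:

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <vector>

// Returns the index of the largest output value, that is, the class the
// network considers most likely. Ties resolve to the lowest index.
int mostLikelyClass(const std::vector<float>& outputs) {
    assert(!outputs.empty());
    return static_cast<int>(std::distance(
        outputs.begin(), std::max_element(outputs.begin(), outputs.end())));
}
```

The returned index can then be mapped to a character through whatever class-to-character table was used when the training labels were built.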