本文转自http://www.ricoh.com/about/company/technology/voice/column/020.html
Animals, human beings included, behave based on environmental perceptions, which are made quickly and often unconsciously. If the decision was wrong, the embedded learning mechanism corrects the decision criterion instantly. Using this principle based on an animal's flexible recognition adaptability, pattern recognition techniques follow the same strategy. In the traditional approach, we tried to obtain more and more detailed information at the beginning and then took decisions based on statistical data processing. However, we already know a more efficient way. When we see an unknown living object on the road, we may guess it is a dog or cat based on its silhouette size. Drawing closer we may find it is actually a fox by the detailed shape of its silhouette and color. We add another possible animal, fox, to the list in the similar size category. This is the typical strategy an animal uses in which repeated simple decisions lead to quick, flexible and accurate recognition.
Here is another simple example of pattern recognition. In Figure 1 is a circle. With the traditional approach, we tried hard to create powerful (strong) classifiers to identify the circle correctly. If we already know that the circle has a radius 5 at (I,J), we can simply test the following equation (1) at each sampling point(x,y).
(x-I) 2+(y-J) 2=5 (1)
We quickly see that all the colored sample points in figure 1 do not satisfy equation (1). For example, point (5,0) satisfies equation (1) perfectly, but point (4,2) does not. In general, we need to expect sampling errors in the digital space. Including these errors, the figure is satisfactorily regarded as a "circle." What about the modified circle in Figure 2? Using equation (1), shifting five pixels from the original positions increases the error greatly, but it still looks like a circle to the human eye, doesn't it? Therefore, it may be reasonable to train the classifier that Figure 2 is still a circle. What kind of classifier other than equation (1) should we apply? This is not an easy task even though the image has changed a little.
Now we try another approach to define the circle using many weak classifiers; each classifier's classification ability is weak based on simple decision rules, but as a whole, a high precision ratio is achieved. Defining the circle by many line segments with cross points, we can define the circle much simpler. With a small number of lines, it is difficult to define the circle precisely, but using many lines, it becomes precise. Here, the green lines show the error margins.
Replacing the original circle with the modified circle of figure 2, the resulting image is shown in figure 4. The only difference between the original and modified circle is cross point A, between the circle and line . As other relations remain unchanged, this method can be adapted easily by changing only a small part of the condition. It is plausible that even a complicated pattern could be identified by many weak classifiers (detection lines). This methodology using simple and fast decisions is called Support Vector Machine or Boosting.
Now we will test face detection based on this methodology. What is the most effective classifier for a face? The decision should be simple, quick and effective. Here is an example,
P. Viola and M. Jones, "Rapid Object Detection using a Boosted Cascade of Simple Features," in Conf. On Computer Vision and Pattern Recognition, Hawaii, 2001 (http://research.microsoft.com/~viola/Pubs/Detect/violaJones_CVPR2001.pdf)
The referenced paper suggests Haar transform as a kernel function. Haar transform is expressed as a matrix with the components of (1, -1), whose graphical representation is the black and white filter set shown in Figure 5, (1) and (2). This function is represented with pairs of darker and lighter areas, like the eye and forehead. If we prepare thousands of similar filters, the human face may be a good Haar filtered model.
High Speed Recognition
Placing the two filters on top of the three human face locations, dark and light pair/triple areas match well with human features. However, if hair masks the forehead, misrecognition may occur. Figure 6 is an example where two Haar filters have three different matched areas. These three pattern matched areas are considered unique to the human face. Combining two filters at three locations, each weak filter (classifier) combines to become a powerful (strong) classifier. Using this strong classifier at the first step of the flow chart in Figure 7, this classifier will exclude most of the image that does not include this face feature. This strategy works well to achieve the fast recognition system. Photo-1 is an extracted face example using the Figure 7 flowchart. As this photo's face differs a bit from modern face features, the system requires 20 times more computations to extract the face area. It requires many weak classifiers to achieve a score larger than the defined threshold.
In our society, there are many occasions in which decisions must be made with ambiguous evidence. My next article will show how these weak classifiers work well under those circumstances.
(Ej, 2006.12)