Face recognition systems generally involve four main steps: (1) face detection; (2) face preprocessing; (3) collecting and learning faces; (4) face recognition.
I. Face detection (Haar-based and LBP-based)
LBP-based detectors are potentially several times faster than Haar-based detectors.
The basic idea of the Haar-based face detector is that if you look at most frontal faces, the region with the eyes should be darker than the forehead and cheeks, the region with the mouth should be darker than the cheeks, and so on. It typically performs about 20 stages of comparisons like this to decide whether a region is a face, but it must do this at each possible position in the image and for each possible size of the face, so in practice it often performs thousands of checks per image. The basic idea of the LBP-based face detector is similar to the Haar-based one, but it uses histograms of pixel intensity comparisons, such as edges, corners, and flat regions.
These cascade classifier detectors are typically trained using at least 1,000 unique face images and 10,000 non-face images (for example, photos of trees, cars, and text), and the training process can take a long time even on a multi-core desktop (typically a few hours for LBP, but one week for Haar!).
| Type of cascade classifier | XML filename |
| --- | --- |
| Face detector (default) | haarcascade_frontalface_default.xml |
| Face detector (fast Haar) | haarcascade_frontalface_alt2.xml |
| Face detector (fast LBP) | lbpcascade_frontalface.xml |
| Profile (side-looking) face detector | haarcascade_profileface.xml |
| Eye detector (separate for left and right) | haarcascade_lefteye_2splits.xml |
| Mouth detector | haarcascade_mcs_mouth.xml |
| Nose detector | haarcascade_mcs_nose.xml |
| Whole person detector | haarcascade_fullbody.xml |
The pretrained LBP face detector that comes with OpenCV v2.x is not tuned as well as the pretrained Haar face detectors, so if you want more reliable face detection, you may want to train your own LBP face detector or use a Haar face detector.
Face detection proceeds in five steps:
1. Loading a Haar or LBP detector for object or face detection.
2. Grayscale color conversion: face detection only works on grayscale images, so we should convert the color camera frame to grayscale.
3. Shrinking the camera image: the speed of face detection depends on the size of the input image (it is very slow for large images but fast for small ones), yet detection is still fairly reliable even at low resolutions. So we should shrink the camera image to a more reasonable size. Face detection usually works quite well for any image size greater than 240 x 240 pixels (unless you need to detect faces that are far away from the camera), because it will look for any faces larger than the minFeatureSize (typically 20 x 20 pixels).
4. Histogram equalization: face detection is not as reliable in low-light conditions, so we should perform histogram equalization to improve the contrast and brightness.
5. Detecting the face: we are now ready to detect faces using the CascadeClassifier::detectMultiScale() function. If we gave a shrunken image to the face detector, the results will also be shrunken, so we need to enlarge them if we want to know the face regions for the original image. We also need to make sure faces on the border of the image stay completely within the image. A sketch of all five steps follows.
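A minimal sketch of these steps in OpenCV 2.x C++ (the detection width of 320 pixels and the detectMultiScale parameters are illustrative choices, not fixed requirements; later sketches in this post assume the same headers and `using namespace cv;`):

```cpp
#include <vector>
#include "opencv2/objdetect/objdetect.hpp"
#include "opencv2/imgproc/imgproc.hpp"
using namespace cv;

// Detect the largest face in a color camera frame. Step 1 is done by the
// caller once, e.g. faceDetector.load("lbpcascade_frontalface.xml").
Rect detectLargestFace(const Mat &cameraFrame, CascadeClassifier &faceDetector)
{
    // 2. Face detection only works on grayscale images.
    Mat gray;
    cvtColor(cameraFrame, gray, CV_BGR2GRAY);

    // 3. Shrink the image to roughly 320 pixels wide for speed.
    const int DETECTION_WIDTH = 320;            // illustrative choice
    float scale = gray.cols / (float)DETECTION_WIDTH;
    Mat smallImg;
    resize(gray, smallImg, Size(DETECTION_WIDTH, cvRound(gray.rows / scale)));

    // 4. Standardize the brightness and contrast.
    equalizeHist(smallImg, smallImg);

    // 5. Detect the biggest face that is at least 20 x 20 pixels.
    std::vector<Rect> faces;
    faceDetector.detectMultiScale(smallImg, faces, 1.1, 4,
                                  CASCADE_FIND_BIGGEST_OBJECT, Size(20, 20));
    if (faces.empty())
        return Rect();

    // Enlarge the result back to original-image coordinates, then clamp
    // it so a face on the border stays completely within the image.
    Rect face = faces[0];
    face.x = cvRound(face.x * scale);
    face.y = cvRound(face.y * scale);
    face.width  = cvRound(face.width  * scale);
    face.height = cvRound(face.height * scale);
    face &= Rect(0, 0, cameraFrame.cols, cameraFrame.rows);
    return face;
}
```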
II. Face preprocessing
1. Eye detection:
Eye detectors that detect open or closed eyes are as follows:
(1) haarcascade_mcs_lefteye.xml (and haarcascade_mcs_righteye.xml)
(2) haarcascade_lefteye_2splits.xml (and haarcascade_righteye_2splits.xml)
Eye detectors that detect open eyes only are as follows:
(1) haarcascade_eye.xml
(2) haarcascade_eye_tree_eyeglasses.xml (can detect the eyes if the person is wearing glasses, but is not reliable if they are not wearing glasses)
The list of four eye detectors mentioned is ranked in approximate order from most reliable to least reliable, so if you know you don't need to find people with glasses, the first detector is probably the best choice. Different eye detectors are better suited to different regions of the face; for example, the haarcascade_eye.xml detector works best if it only searches in a very tight region around the actual eye, whereas the haarcascade_mcs_lefteye.xml and haarcascade_lefteye_2splits.xml detectors work best when there is a large region around the eye.
The following table lists some good search regions of the face for different eye detectors (when using the LBP face detector), using relative coordinates within the detected face rectangle:
| Cascade classifier | EYE_SX | EYE_SY | EYE_SW | EYE_SH |
| --- | --- | --- | --- | --- |
| haarcascade_eye.xml | 0.16 | 0.26 | 0.30 | 0.28 |
| haarcascade_mcs_lefteye.xml | 0.10 | 0.19 | 0.40 | 0.36 |
| haarcascade_lefteye_2splits.xml | 0.12 | 0.17 | 0.37 | 0.36 |
Eye detection is typically much faster when eyes are found than when they are not found, because when no eye is found the detector must scan the entire search region; even so, haarcascade_mcs_lefteye.xml is still much slower than the other eye detectors.
While it is recommended to shrink the camera image before detecting faces, you should detect eyes at the full camera resolution, because eyes will obviously be much smaller than faces, so you need as much resolution as you can get. For many tasks, it is useful to detect eyes whether they are opened or closed, so if speed is not crucial, it is best to search with the mcs_* eye detector first, and if it fails then search with the eye_2splits detector. But for face recognition, a person will appear quite different if their eyes are closed, so it is best to search with the plain haarcascade_eye detector first, and if it fails then search with the haarcascade_eye_tree_eyeglasses detector. The sketch below shows how the relative coordinates from the table map to actual search regions.
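A sketch of searching both eye regions, assuming `face` is the grayscale face image already cropped by the face detector and `eyeDetector` is a loaded eye cascade; the constants are the haarcascade_eye.xml row of the table above:

```cpp
// Find the left and right eyes inside a detected face image, searching
// only the relative regions given in the table for haarcascade_eye.xml.
void detectBothEyes(const Mat &face, CascadeClassifier &eyeDetector,
                    Rect &leftEyeRect, Rect &rightEyeRect)
{
    const float EYE_SX = 0.16f, EYE_SY = 0.26f;   // values from the table
    const float EYE_SW = 0.30f, EYE_SH = 0.28f;

    int leftX   = cvRound(face.cols * EYE_SX);
    int topY    = cvRound(face.rows * EYE_SY);
    int widthX  = cvRound(face.cols * EYE_SW);
    int heightY = cvRound(face.rows * EYE_SH);
    // The right-eye region is the mirror image of the left-eye region.
    int rightX  = cvRound(face.cols * (1.0f - EYE_SX - EYE_SW));

    Mat topLeftOfFace  = face(Rect(leftX,  topY, widthX, heightY));
    Mat topRightOfFace = face(Rect(rightX, topY, widthX, heightY));

    // Search each small region for the single biggest eye.
    std::vector<Rect> leftEyes, rightEyes;
    eyeDetector.detectMultiScale(topLeftOfFace, leftEyes, 1.1, 3,
                                 CASCADE_FIND_BIGGEST_OBJECT);
    eyeDetector.detectMultiScale(topRightOfFace, rightEyes, 1.1, 3,
                                 CASCADE_FIND_BIGGEST_OBJECT);

    // Convert results back to coordinates within the whole face image.
    leftEyeRect = rightEyeRect = Rect();
    if (!leftEyes.empty()) {
        leftEyeRect = leftEyes[0];
        leftEyeRect.x += leftX;
        leftEyeRect.y += topY;
    }
    if (!rightEyes.empty()) {
        rightEyeRect = rightEyes[0];
        rightEyeRect.x += rightX;
        rightEyeRect.y += topY;
    }
}
```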
2. Geometrical transformation and cropping: this process includes scaling, rotating, and translating the image. Rotate the face so that the two eyes are horizontal. Scale the face so that the distance between the two eyes is always the same. Translate the face so that the eyes are always centered horizontally and at a desired height. Crop the outer parts of the face, since we want to crop away the image background, hair, forehead, ears, and chin. All of this can be done with a single affine warp, as sketched below.
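A sketch of this step, assuming the eye centers `leftEye` and `rightEye` were already found in the grayscale image `gray`; the output size and the desired eye position constants are illustrative values, not fixed requirements:

```cpp
// Warp the face so the eyes end up horizontal, a fixed distance apart,
// and at a fixed position within a cropped output image.
Mat alignFace(const Mat &gray, Point2f leftEye, Point2f rightEye)
{
    const int DESIRED_FACE_WIDTH = 70, DESIRED_FACE_HEIGHT = 70; // illustrative
    const double DESIRED_LEFT_EYE_X = 0.16;  // fraction of output width
    const double DESIRED_LEFT_EYE_Y = 0.14;  // fraction of output height

    // Center point between the two eyes, and the angle between them.
    Point2f eyesCenter((leftEye.x + rightEye.x) * 0.5f,
                       (leftEye.y + rightEye.y) * 0.5f);
    double dy = rightEye.y - leftEye.y;
    double dx = rightEye.x - leftEye.x;
    double len = std::sqrt(dx * dx + dy * dy);
    double angle = std::atan2(dy, dx) * 180.0 / CV_PI;

    // Scale so the eye distance matches the desired eye distance.
    double desiredLen = (1.0 - 2.0 * DESIRED_LEFT_EYE_X) * DESIRED_FACE_WIDTH;
    double scale = desiredLen / len;

    // Rotate and scale around the eye center, then shift the eye center
    // to the desired position in the output image.
    Mat rot = getRotationMatrix2D(eyesCenter, angle, scale);
    rot.at<double>(0, 2) += DESIRED_FACE_WIDTH * 0.5 - eyesCenter.x;
    rot.at<double>(1, 2) += DESIRED_FACE_HEIGHT * DESIRED_LEFT_EYE_Y - eyesCenter.y;

    // Warp; regions not covered by the input are filled with gray (128).
    Mat warped(DESIRED_FACE_HEIGHT, DESIRED_FACE_WIDTH, CV_8U, Scalar(128));
    warpAffine(gray, warped, rot, warped.size(),
               INTER_LINEAR, BORDER_CONSTANT, Scalar(128));
    return warped;
}
```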
3. Separate histogram equalization for the left and right sides: this process standardizes the brightness and contrast of the left- and right-hand sides of the face independently, so that strong lighting on one side of the face does not skew the histogram of the whole face. A sketch follows.
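A minimal sketch, assuming `faceImg` is the aligned grayscale face; a fuller version could also blend each half with a whole-face equalization to hide the seam down the middle, which is omitted here for brevity:

```cpp
// Equalize the left and right halves of the face independently, so a
// sidelight on one half does not distort the other half's contrast.
void equalizeLeftAndRightHalves(Mat &faceImg)
{
    int w = faceImg.cols;
    int h = faceImg.rows;
    Mat leftSide  = faceImg(Rect(0,     0, w / 2,     h));
    Mat rightSide = faceImg(Rect(w / 2, 0, w - w / 2, h));
    equalizeHist(leftSide, leftSide);      // in place on each half (ROI)
    equalizeHist(rightSide, rightSide);
}
```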
4. Smoothing: this process reduces the image noise using a bilateral filter. Histogram equalization can significantly increase the pixel noise, so to reduce its effect we will apply a bilateral filter to the face, as a bilateral filter is very good at smoothing most of an image while keeping edges sharp. For example:
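A one-call sketch, continuing from the equalized face `faceImg` (bilateralFilter cannot work in place, so a separate output image is needed; the sigma values below are illustrative, modest smoothing strengths):

```cpp
// Reduce the pixel noise amplified by histogram equalization while
// keeping the facial edges sharp.
Mat filtered = Mat(faceImg.size(), CV_8U);
bilateralFilter(faceImg, filtered,
                0,      // d = 0: neighborhood size derived from sigmaSpace
                20.0,   // sigmaColor: only smooth similar intensities
                2.0);   // sigmaSpace: only smooth within ~2 pixels
```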
5. Elliptical mask: the elliptical mask removes some remaining hair and background from the face image. Although we already removed most of the image background, forehead, and hair during the geometrical transformation, we can apply an elliptical mask to remove some of the corner regions, such as the neck, which might be in shadow from the face, particularly if the face is not looking perfectly straight towards the camera. To create the mask, we draw a black-filled ellipse onto a white image. An ellipse that performs this has a horizontal radius of 0.5 (that is, it covers the face width perfectly) and a vertical radius of 0.8 (as faces are usually taller than they are wide), centered at the relative coordinates (0.5, 0.4). The elliptical mask can remove some unwanted corners from the face, as sketched below.
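A sketch, continuing from the smoothed face `filtered` of the previous step:

```cpp
// Draw a black-filled ellipse on a white mask, then set every pixel
// outside the ellipse (where the mask stayed white) to gray.
Mat mask(filtered.size(), CV_8U, Scalar(255));
int dw = filtered.cols, dh = filtered.rows;
Point faceCenter(cvRound(dw * 0.5), cvRound(dh * 0.4)); // center (0.5, 0.4)
Size axes(cvRound(dw * 0.5), cvRound(dh * 0.8));        // radii 0.5 and 0.8
ellipse(mask, faceCenter, axes, 0, 0, 360, Scalar(0), CV_FILLED);
filtered.setTo(Scalar(128), mask);   // gray out the unwanted corners
```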
III. Collecting faces and learning from them
It is important that you provide a good training set that covers the types of variations you expect to occur in your testing set.
1. Collecting preprocessed faces for training: make sure there is at least a one-second gap between collecting new faces. To compare the similarity of two images pixel by pixel, you can compute the relative L2 error, which just involves subtracting one image from the other, summing the squared values, and then taking the square root. As the result is summed over all pixels, the value would depend on the image resolution, so to get the mean error we divide it by the total number of pixels in the image. This similarity will often be less than 0.2 if the image did not move much, and higher than 0.4 if it did move, so let's use 0.3 as our threshold for collecting a new face.
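In code, the relative L2 error described above is a two-liner (a sketch in the same OpenCV 2.x C++ setting as the earlier examples):

```cpp
// Relative L2 error between two images of identical size and type:
// the L2 norm of their difference, divided by the pixel count so the
// result does not depend on the image resolution.
double getSimilarity(const Mat &A, const Mat &B)
{
    // norm(A, B, CV_L2) = sqrt(sum of squared pixel differences)
    double errorL2 = norm(A, B, CV_L2);
    return errorL2 / (double)(A.rows * A.cols);
}

// Usage: collect a new face only if it differs enough from the last one.
// if (getSimilarity(newFace, lastCollectedFace) > 0.3) { /* store it */ }
```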
There are many tricks we can play to obtain more training data, such as using mirrored faces, adding random noise, shifting the face by a few pixels, scaling the face by a percentage, or rotating the face by a few degrees (even though we specifically tried to remove these effects when preprocessing the face!). Let's add mirrored faces to the training set, so that we get both a larger training set and a reduction in the problems caused by asymmetrical faces, or by a user who is always oriented slightly to the left or right during training but not testing. For example:
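A sketch, assuming `preprocessedFaces` is a `std::vector<Mat>` of collected faces, `faceLabels` a `std::vector<int>` of person labels, and `identity` the current person's label:

```cpp
// Add both the preprocessed face and its mirror image to the training set.
Mat mirroredFace;
flip(preprocessedFace, mirroredFace, 1);   // flipCode 1 = flip around y-axis
preprocessedFaces.push_back(preprocessedFace);
preprocessedFaces.push_back(mirroredFace);
faceLabels.push_back(identity);            // same label for both copies
faceLabels.push_back(identity);
```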
2. Training the face recognition system from the collected faces (Eigenfaces (PCA), Fisherfaces (LDA), or LBPH)
In simple terms, the basic principle of Eigenfaces is that it calculates a set of special images (eigenfaces) and blending ratios (eigenvalues). If the training set had 5 people with 20 faces each, there would be 100 eigenfaces and eigenvalues to differentiate the 100 total faces in the training set, and in fact these would be sorted so that the first few eigenfaces and eigenvalues are the most critical differentiators, while the last few are just random pixel noise that doesn't actually help to differentiate the data. So it is common practice to discard some of the last eigenfaces and just keep the first 50 or so. In comparison, the basic principle of Fisherfaces is that instead of calculating a special eigenvector and eigenvalue for each image in the training set, it calculates just one special eigenvector and eigenvalue for each person. So in the preceding example of 5 people with 20 faces each, the Eigenfaces algorithm would use 100 eigenfaces and eigenvalues, whereas the Fisherfaces algorithm would use just 5 fisherfaces and eigenvalues.
Both the Eigenfaces and Fisherfaces algorithms first calculate the average face, which is the mathematical average of all the training images, so that they can subtract the average image from each facial image for better face recognition results.
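Once the faces and labels are collected, training with OpenCV v2.x's contrib module is essentially a one-liner (a sketch; Eigenfaces is shown, but the Fisherfaces and LBPH factory functions are interchangeable):

```cpp
#include "opencv2/contrib/contrib.hpp"  // FaceRecognizer lives here in v2.x
using namespace cv;

// preprocessedFaces: vector<Mat> of equally sized grayscale faces;
// faceLabels: vector<int> with one person label per face.
Ptr<FaceRecognizer> model = createEigenFaceRecognizer();
// Alternatives: createFisherFaceRecognizer() or createLBPHFaceRecognizer().
model->train(preprocessedFaces, faceLabels);
```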
IV. Face recognition
We can identify the person in a photo simply by calling the FaceRecognizer::predict() function on a facial image. The problem with this identification is that it will always predict one of the given people, even if the input photo is of an unknown person or of a car; it would still tell you which person is the most likely one in that photo. To confirm whether the result of the prediction is reliable or should be taken as an unknown person, we perform face verification. The method we will use is to reconstruct the facial image using the eigenvectors and eigenvalues, and compare this reconstructed image with the input image. If the person had many of their faces included in the training set, the reconstruction should work quite well from the learnt eigenvectors and eigenvalues; but if the person did not have any faces in the training set (or none with similar lighting and facial expressions to the test image), the reconstructed face will look very different from the input face, signaling that it is probably an unknown face. OpenCV's FaceRecognizer class makes it quite easy to generate a reconstructed face from any input image, by using the subspaceProject() function to project onto the eigenspace and the subspaceReconstruct() function to go back from eigenspace to image space.
We can now calculate how similar this reconstructed face is to the input face by using the same getSimilarity() function we created previously for comparing two images, where a value less than 0.3 implies that the two images are very similar. For Eigenfaces, there is one eigenvector for each face, so reconstruction tends to work well and therefore we can typically use a threshold of 0.5, but Fisherfaces has just one eigenvector for each person, so reconstruction will not work as well and therefore it needs a higher threshold, say 0.7.
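Putting identification and verification together (a sketch assuming the Eigenfaces model trained above and the getSimilarity() helper from earlier; the threshold is the approximate Eigenfaces value mentioned above):

```cpp
// Identify: predict() always returns the most similar trained person.
int identity = model->predict(preprocessedFace);

// Verify: reconstruct the face from the learnt subspace and compare it
// with the input; a poor reconstruction suggests an unknown person.
Mat eigenvectors = model->get<Mat>("eigenvectors");
Mat averageFace  = model->get<Mat>("mean");

// Project the input face onto the eigenspace, then back to image space.
Mat projection = subspaceProject(eigenvectors, averageFace,
                                 preprocessedFace.reshape(1, 1));
Mat reconstructionRow = subspaceReconstruct(eigenvectors, averageFace,
                                            projection);

// Convert the float row vector back into an 8-bit face image.
Mat reconstructionMat = reconstructionRow.reshape(1, preprocessedFace.rows);
Mat reconstructedFace;
reconstructionMat.convertTo(reconstructedFace, CV_8U, 1, 0);

// Small reconstruction error => reliable identity; large => unknown.
double similarity = getSimilarity(preprocessedFace, reconstructedFace);
const float UNKNOWN_PERSON_THRESHOLD = 0.5f;  // ~0.5 Eigenfaces, ~0.7 Fisherfaces
bool unknown = (similarity > UNKNOWN_PERSON_THRESHOLD);
```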