CNN-Based Face Detection and Recognition with OpenCV 3 & Keras

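Face recognition is an important area of computer vision. This post implements a face recognition program based on a convolutional neural network that can recognize a designated face in a camera feed.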

Reference:
How I implemented iPhone X's FaceID using Deep Learning in Python

GitHub: https://github.com/xiaochus/FaceRecognition

Environment

Python 3.6
Tensorflow-gpu 1.5.0
Keras 2.1.3
OpenCV 3.4
Scikit-learn 0.19

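Model

Feature extraction

The training model consists of two parts, as shown in the figure below. The main part is the feature extraction network (model_1), which takes a (64, 64, 3) tensor and outputs a (128,) tensor. It is implemented as a simplified MobileNetV2, and its job is to extract the features of a single face.
The second part is the Siamese network. On top of the feature extraction network, we feed in paired data (input_1 and input_2), compute the features of each input, and then take the Euclidean distance between the two feature vectors (lambda_1). Its job is to make similar inputs produce similar features.
Note that although there are two inputs, they are not connected to each other, and neither input updates the network parameters on its own. You can think of it as each input passing through the same network once to get its features, with the loss then computed from the distance between the two feature vectors.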
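The weight sharing described above can be sketched without Keras. This is only an illustration in plain Python: `embed`, `siamese_distance` and the 2x3 weight matrix are hypothetical stand-ins for the real feature network, not code from the repository.

```python
import math


def embed(x, weights):
    """Toy stand-in for the feature extractor: one linear layer."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def siamese_distance(x1, x2, weights):
    """Run both inputs through the SAME weights, then return the
    Euclidean distance between the two feature vectors."""
    f1 = embed(x1, weights)
    f2 = embed(x2, weights)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(f1, f2)))


# Hypothetical weight matrix shared by both branches (keeps the first
# two components of the input and ignores the third).
shared_weights = [[1.0, 0.0, 0.0],
                  [0.0, 1.0, 0.0]]

d_same = siamese_distance([1.0, 2.0, 3.0], [1.0, 2.0, 9.0], shared_weights)
d_diff = siamese_distance([1.0, 2.0, 3.0], [4.0, 6.0, 3.0], shared_weights)
print(d_same, d_diff)
```

Both branches use one set of weights, so a training step that changes `shared_weights` changes the features of both inputs at once.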

face_net.png

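Keras implements this with the concept of shared layers, which comes down to layer nodes. Whenever you call a layer on an input, you create a new tensor (the layer's output) and also add a "(computation) node" to that layer. The node maps the input tensor to the output tensor. When you call the layer several times, it has several nodes, indexed 0, 1, 2, ...

The later feature extraction task does not need the comparison or the distance, only the feature extraction model in the middle, so we can pull that model out on its own.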

def get_feature_model():
    """Get face features extraction model.

    # Returns
        feat_model: Model, face features extraction model.
    """
    model = get_model((64, 64, 3))
    model.load_weights('model/weight.h5')

    feat_model = Model(inputs=model.get_layer('model_1').get_input_at(0),
                       outputs=model.get_layer('model_1').get_output_at(0))

    return feat_model

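Contrastive loss

To make the model extract useful features, the loss function is the contrastive loss, which handles the relationship between paired data well. Its expression is as follows (y indicates whether the pair is similar, d is the Euclidean distance output by the network):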

loss-func.png

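This loss function comes from the paper Dimensionality Reduction by Learning an Invariant Mapping by Hadsell, Chopra and LeCun, where it is mainly used for dimensionality reduction: samples that are similar should remain similar in the feature space after reduction, and samples that are dissimilar should remain dissimilar.

When y=1 (the samples are similar), only the left-hand term remains, i.e. the mean of the squared Euclidean distances of similar pairs. A large loss means the features of similar samples are far apart. When y=0 (the samples are dissimilar), only the right-hand term remains, which penalizes dissimilar pairs whose distance falls below a margin, so it grows as their distance shrinks. A large loss then means the features of dissimilar samples are too close together. This combination fits our task well.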
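A minimal sketch of this loss in plain Python, following the text's convention (y=1 similar, y=0 dissimilar). The margin of 1.0 and the 1/2 factor are assumptions taken from the usual form in the cited paper, since the formula image is not reproduced here. Note also that the `get_paris()` docstring further down labels same-person pairs as 0 and different-person pairs as 1, so the loss actually used for training would need the two terms swapped to match those labels.

```python
def contrastive_loss(y, d, margin=1.0):
    """Mean contrastive loss over a batch of pairs.

    y: list of labels, 1 = similar pair, 0 = dissimilar pair.
    d: list of Euclidean distances between the two features of a pair.
    margin: assumed value; dissimilar pairs that are farther apart
            than this contribute no loss.
    """
    total = 0.0
    for yi, di in zip(y, d):
        similar_term = yi * di ** 2
        dissimilar_term = (1 - yi) * max(margin - di, 0.0) ** 2
        total += similar_term + dissimilar_term
    return total / (2 * len(y))


# A similar pair is penalised for being apart, a dissimilar pair for
# being closer than the margin.
loss_similar_far = contrastive_loss([1], [0.5])
loss_dissimilar_close = contrastive_loss([0], [0.5])
loss_dissimilar_far = contrastive_loss([0], [2.0])
print(loss_similar_far, loss_dissimilar_close, loss_dissimilar_far)
```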

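Data processing

We use the Face Recognition Data - grimace (University of Essex, UK) database as training and test data.

read_img() reads the image data for each person.
get_paris() pairs up the loaded faces into same-person pairs and different-person pairs.
create_generator() wraps the input data in a generator for training.
get_train_test() shuffles the data and splits it into a training set and a test set at roughly 2:1 (test_size=0.33).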
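The pairing logic of get_paris() can be tried on toy data. The sketch below mirrors the loops in the listing that follows but uses strings instead of images; `make_pairs` and the `toy` data are hypothetical names for illustration only.

```python
def make_pairs(people):
    """people: list of per-person image lists (strings stand in for images)."""
    same, diff = [], []
    for i in range(len(people)):
        for j in range(i, len(people)):
            if i == j:
                # Every ordered pair within one person, self-pairs included.
                same.extend((a, b) for a in people[i] for b in people[j])
            else:
                # One pair per two different people: their first images.
                diff.append((people[i][0], people[j][0]))
    diff = diff[:len(same)]  # never more different pairs than same pairs
    labels = [0] * len(same) + [1] * len(diff)  # same: 0, different: 1
    return same + diff, labels


toy = [['a1', 'a2'], ['b1', 'b2'], ['c1', 'c2']]
pairs, labels = make_pairs(toy)
print(len(pairs), labels.count(0), labels.count(1))
```

With n images per person this produces n * n same-person pairs per person (including each image paired with itself) but only one pair per two different people, so the two classes are not necessarily balanced.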

"""Data process.
Data process and generation.
"""

import os
import cv2
import numpy as np
from sklearn.model_selection import train_test_split


def read_img(path):
    """Read image
    This function read images from folders for different person.

    # Arguments
        path: String, path of database.
    # Returns
        res: List, images for different person.
    """
    res = []

    for (root, dirs, files) in os.walk(path):
        if files:
            tmp = []
            files = np.random.choice(files, 4)
            for f in files:
                img = os.path.join(root, f)
                image = cv2.imread(img)
                image = cv2.resize(image, (64, 64),
                                   interpolation=cv2.INTER_CUBIC)
                image = np.array(image, dtype='float32')
                image /= 255.
                tmp.append(image)

            res.append(tmp)

    return res


def get_paris(path):
    """Make pairs.
    This function make pairs for same person and different person.

    # Arguments
        path: String, path of database.
    # Returns
        sm1: List, first object in pairs.
        sm2: List, second object in pairs.
        y1: List, pairs mark (same: 0, different: 1).
    """
    sm1, sm2, df1, df2 = [], [], [], []
    res = read_img(path)

    persons = len(res)

    for i in range(persons):
        for j in range(i, persons):
            p1 = res[i]
            p2 = res[j]

            if i == j:
                for pi in p1:
                    for pj in p2:
                        sm1.append(pi)
                        sm2.append(pj)
            else:
                df1.append(p1[0])
                df2.append(p2[0])

    df1 = df1[:len(sm1)]
    df2 = df2[:len(sm2)]
    y1 = list(np.zeros(len(sm1)))
    y2 = list(np.ones(len(df1)))

    sm1.extend(df1)
    sm2.extend(df2)
    y1.extend(y2)

    return sm1, sm2, y1


def create_generator(x, y, batch):
    """Create data generator.
    This function is a data generator.

    # Arguments
        x: List, Input data.
        y: List, Data label.
        batch: Integer, batch size for data generator.
    # Returns
        [x1, x2]: List, pairs data with batch size.
        yb: List, Data label.
    """
    while True:
        index = np.random.choice(len(y), batch)
        x1, x2, yb = [], [], []
        for i in index:
            x1.append(x[i][0])
            x2.append(x[i][1])
            yb.append(y[i])
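        x1 = np.array(x1)
        x2 = np.array(x2)
        # Pass labels as an array; a plain list can be misread by Keras
        # as one target per model output.
        yb = np.array(yb)

        yield [x1, x2], yb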


def get_train_test(path):
    """Get train and test data
    This function split train and test data and shuffle it.

    # Arguments
        path: String, path of database.
    # Returns
        X_train: List, Input data for train.
        X_test: List, Input data for test.
        y_train: List, Data label for train.
        y_test: List, Data label for test.
    """
    im1, im2, y = get_paris(path)
    im = list(zip(im1, im2))

    X_train, X_test, y_train, y_test = train_test_split(
        im, y, test_size=0.33)

    return X_train, X_test, y_train, y_test

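Experiments

Run the following command to train the model.
python train.py

Run the following command to visualize the experiment.
python vis.py

Because the dataset is small and fairly uniform in pose, the training loss and evaluation loss have largely levelled off after 50 epochs.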

loss.png

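We randomly pick a few people from the dataset, extract features from each person's 20 photos, and map them to 2-D space with t-SNE. The result is shown below. Each color represents one person. Photos of the same person clearly cluster together, which shows that the model pulls the face features of the same person close to each other.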

tsne.png

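To evaluate the model we use data that is not in the training set. Image 0 is the baseline, image 1 is another photo of the baseline person, and the rest are different people.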

image.png

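The Euclidean distances between them are listed below. The feature distances between different people are larger than the feature distance between two photos of the same person.

Feature distances: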
[0.05845242, 0.44077098, 0.1820661, 0.6669458, 0.090522714]

distance.png
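As a quick check of how such distances turn into a decision, the sketch below applies the threshold 0.3 that the main program passes to `Face(0.3)` to the five values reported above. The source does not say which entry belongs to which image, so this only illustrates the thresholding step and is not an evaluation of the model.

```python
# Distances copied from the result above.
distances = [0.05845242, 0.44077098, 0.1820661, 0.6669458, 0.090522714]
threshold = 0.3  # the value passed to Face(0.3) in the main program

# A face counts as a match when its distance is below the threshold.
matches = [d < threshold for d in distances]
print(matches)
```

Three of the five values fall below 0.3, which suggests the threshold is worth tuning for your own camera and lighting.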

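Recognizing a designated face from the camera

The program has two important parts: face detection and recognition of the designated face.

Face detection

We use two models that ship with OpenCV for face detection: the Haar cascade classifier and SSD 300. The type argument passed when constructing the detector class selects which one to use. In our tests SSD was the more effective of the two.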

"""Face detection model.
"""

import cv2
import numpy as np


class FaceDetector:
    def __init__(self, type, threshold=0.5):
        """Init.
        """
        self.type = type
        self.t = threshold

        if type == 'harr':  # note: the type key is spelled 'harr'
            self.detector = self._create_haar_detector()
        elif type == 'ssd':
            self.detector = self._create_ssd_detector()
        else:
            raise ValueError('You must select a FaceDetector type!')

    def _create_haar_detector(self):
        """Create haar cascade classifier.

        # Returns
            face_cascade: haar cascade classifier.
        """
        path = 'data/haarcascades/haarcascade_frontalface_default.xml'
        face_cascade = cv2.CascadeClassifier(path)

        return face_cascade

    def _create_ssd_detector(self):
        """Create ssd face classifier.

        # Returns
            ssd: ssd 300 * 300 face classifier.
        """
        prototxt = 'data/ssd/deploy.prototxt.txt'
        model = 'data/ssd/ssd300.caffemodel'
        ssd = cv2.dnn.readNetFromCaffe(prototxt, model)

        return ssd

    def _ssd_box(self, detections, h, w):
        """Resize the detection boxes of ssd.

        # Arguments
            detections: ndarray, raw SSD output of shape (1, 1, N, 7).
            h: Integer, original height of frame.
            w: Integer, original width of frame.

        # Returns
            rects: detection boxes.
        """
        rects = []

        for i in range(0, detections.shape[2]):
            confidence = detections[0, 0, i, 2]

            if confidence < self.t:
                continue

            box = detections[0, 0, i, 3:7] * np.array([w, h, w, h])
            (x1, y1, x2, y2) = box.astype("int")
            rects.append((x1, y1, x2 - x1, y2 - y1))

        return rects

    def detect(self, frame):
        """Detect faces with the configured detector (haar cascade or ssd).

        # Arguments
            frame: ndarray(n, n, 3), video frame.

        # Returns
            faces: List, faces rectangles in the frame.
        """
        pic = frame.copy()

        if self.type == 'harr':
            gray = cv2.cvtColor(pic, cv2.COLOR_BGR2GRAY)
            faces = self.detector.detectMultiScale(gray, 1.3, 5)
        if self.type == 'ssd':
            h, w = pic.shape[:2]
            blob = cv2.dnn.blobFromImage(
                cv2.resize(pic, (300, 300)), 1.0,
                (300, 300), (104.0, 177.0, 123.0))
            self.detector.setInput(blob)
            detections = self.detector.forward()
            faces = self._ssd_box(detections, h, w)

        return faces
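The box conversion in `_ssd_box` can be sketched in plain Python. The function name and the simplified `(confidence, x1, y1, x2, y2)` input are assumptions for illustration; the real network returns a (1, 1, N, 7) array. The clamping step is an addition that the listing above does not have. It guards against boxes that extend past the frame edge, which can otherwise produce negative slice indices when the face is cropped.

```python
def ssd_boxes_to_rects(detections, frame_h, frame_w, threshold=0.5):
    """Convert normalised SSD boxes to (x, y, w, h) pixel rectangles.

    detections: list of (confidence, x1, y1, x2, y2) with coordinates
                normalised to the 0..1 range (simplified stand-in).
    """
    rects = []
    for conf, x1, y1, x2, y2 in detections:
        if conf < threshold:
            continue  # drop weak detections
        # Scale to pixels, then clamp to the frame (clamping is an addition).
        px1 = min(max(int(x1 * frame_w), 0), frame_w)
        py1 = min(max(int(y1 * frame_h), 0), frame_h)
        px2 = min(max(int(x2 * frame_w), 0), frame_w)
        py2 = min(max(int(y2 * frame_h), 0), frame_h)
        rects.append((px1, py1, px2 - px1, py2 - py1))
    return rects


toy_detections = [
    (0.9, 0.25, 0.5, 0.75, 1.0),   # confident, inside the frame
    (0.2, 0.0, 0.0, 1.0, 1.0),     # below the threshold, dropped
    (0.8, -0.1, 0.0, 0.5, 0.5),    # sticks out on the left, clamped
]
rects = ssd_boxes_to_rects(toy_detections, frame_h=480, frame_w=640)
print(rects)
```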

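Face recognition

Below is the main face recognition program.

First, run face detection on every frame.
If key features have been loaded, extract features from the detected faces; otherwise just display the detection result.
Compute the Euclidean distance between the extracted features and each saved feature, and take the smallest value.
If it is below the threshold, the face is the person we are looking for; otherwise it is not.
Display the detection result, marking detected faces in different colors.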
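The matching step in the list above (nearest saved feature, then threshold) can be sketched as follows. `euclidean`, `match_face` and the 2-D toy vectors are hypothetical; the real features are 128-dimensional.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def match_face(feat, keys, threshold):
    """Return (is_match, best_distance) for one face feature.

    The face matches if its nearest saved key is closer than threshold.
    """
    best = min(euclidean(feat, k) for k in keys)
    return best < threshold, best


keys = [[0.0, 0.0], [1.0, 0.0]]           # toy enrolled features
hit = match_face([0.9, 0.1], keys, 0.3)    # close to the second key
miss = match_face([3.0, 4.0], keys, 0.3)   # far from both keys
print(hit, miss)
```

Taking the minimum over several saved poses means one good match is enough, which is why enrolling different poses of the same face helps.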

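Press the space key repeatedly to enroll a face ID, capturing a different pose of the same face each time. In the code below, the first five presses each record one pose, and the next press saves the recorded features to data/key.npy.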
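The enrollment flow is a small state machine, sketched here in isolation. `Enrollment` and `on_space` are hypothetical names, and the sketch keeps the key in memory instead of writing data/key.npy.

```python
class Enrollment:
    """Collects `needed` features, then commits them on the next press."""

    def __init__(self, needed=5):
        self.needed = needed
        self.cache = []   # features recorded so far
        self.key = None   # committed features, None until saved

    def on_space(self, feat):
        if len(self.cache) < self.needed:
            self.cache.append(feat)
            return 'recorded {0}'.format(len(self.cache))
        # The cache is full: this press saves instead of recording,
        # so the feature passed in is not used.
        self.key = self.cache
        self.cache = []
        return 'saved'


e = Enrollment()
events = [e.on_space([i]) for i in range(6)]
print(events)
```

Six presses are needed in total: five to record and one to save.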

"""Face recognition of PC camera.
"""

import os
import cv2
import numpy as np
import utils.utils as u
from utils.window_manager import WindowManager
from utils.face_detector import FaceDetector


class Face:
    def __init__(self, threshold):
        """Init.

        # Arguments
            threshold: Float, threshold for specific face.
        """
        self._t = threshold
        self._key = self._load_key()
        self._key_cache = []
        self._model = u.get_feature_model()
        self._windowManager = WindowManager('Face', self.on_keypress)
        self._faceDetector = FaceDetector('ssd', 0.5)

    def run(self):
        """Run the main loop.
        """
        capture = cv2.VideoCapture(0)

        self._windowManager.create_window()
        while self._windowManager.is_window_created:
            # Reset per frame so process_events never receives an
            # undefined or stale value when the grab fails.
            faces = []

            success = capture.grab()
            _, frame = capture.retrieve()

            if frame is not None and success:
                faces = self._faceDetector.detect(frame)

                if self._key is not None and faces is not None:
                    label = self._compare_distance(frame, faces)
                    f = self._draw(frame, faces, label)
                else:
                    f = self._draw(frame, faces)

                self._windowManager.show(f)
            self._windowManager.process_events(frame, faces)

    def _load_key(self):
        """Load the key feature.
        """

        kpath = 'data/key.npy'

        if os.path.exists(kpath):
            key = np.load(kpath)
        else:
            key = None

        return key

    def _get_feat(self, frame, face):
        """Get face feature from frame.

        # Arguments
            frame: ndarray, video frame.
            face: tuple, coordinates of face in the frame.

        # Returns
            feat: ndarray (128, ), face feature.
        """
        x, y, w, h = face
        img = frame[y: y + h, x: x + w, :]
        image = u.process_image(img)
        feat = self._model.predict(image)[0]

        return feat

    def _compare_distance(self, frame, faces):
        """Compare faces feature in the frame with key.

        # Arguments
            frame: ndarray, video frame.
            faces: List, coordinates of faces in the frame.

        # Returns
            label: list, if match the key.
        """
        label = []

        for (x, y, w, h) in faces:
            feat = self._get_feat(frame, (x, y, w, h))

            dist = []
            for k in self._key:
                dist.append(np.linalg.norm(k - feat))
            dist = min(dist)
            print(dist)
            if dist < self._t:
                label.append(1)
            else:
                label.append(0)
        print(label)
        return label

    def _draw(self, frame, faces, label=None):
        """Draw the rectangles in the frame.

        # Arguments
            frame: ndarray, video frame.
            faces: List, coordinates of faces in the frame.
            label: List, if match the key.

        # Returns
            f: ndarray, frame with rectangles.
        """
        f = frame.copy()
        color = [(0, 0, 255), (255, 0, 0)]
        if label is None:
            label = [0 for _ in range(len(faces))]

        for rect, i in zip(faces, label):
            (x, y, w, h) = rect
            f = cv2.rectangle(f, (x, y),
                              (x + w, y + h),
                              color[i], 2)

        return f

    def on_keypress(self, keycode, frame, faces):
        """Handle a keypress event.
        Press esc to quit the window.
        Press space 5 times to record different poses of the face,
        then once more to save them.

        # Arguments
            keycode: Integer, keypress event.
            frame: ndarray, video frame.
            faces: List, coordinates of faces in the frame.
        """
        if keycode == 32:  # space -> save face id.
            nums = len(self._key_cache)

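            if nums < 5:
                # Nothing to record if no frame or no face is available.
                if frame is None or faces is None or len(faces) == 0:
                    print('No face detected, nothing recorded.')
                    return
                feat = self._get_feat(frame, faces[0])
                self._key_cache.append(feat)
                print('Face id {0} recorded!'.format(nums + 1))
            else:
                np.save('data/key.npy', np.array(self._key_cache))
                print('All face ID recorded!')
                self._key = self._key_cache
                self._key_cache = []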
        elif keycode == 27:  # escape -> quit
            self._windowManager.destroy_window()


if __name__ == '__main__':
    face = Face(0.3)
    face.run()

No demo screenshots here, since I'd rather not show my face~