Deep Learning in Practice (5): Video-Stream Face Recognition with MTCNN and FaceNet

Preface:


This article builds a complete video-stream face recognition system. The main pipeline is as follows:

First, capture the camera's video stream with OpenCV.

Second, run MTCNN face detection and alignment on each frame; for efficiency, detection can be limited to every n-th frame.

Third, extract a 512-dimensional embedding for each face from step 2 using the pretrained FaceNet model.

Fourth, collect a dataset of target faces and train our own classification model.

Fifth, feed the 512-dimensional embeddings from step 3 into the classifier from step 4; its output is the predicted class. A minimal sketch of this loop follows.
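Putting the five steps together, the skeleton of the main loop looks roughly like this (a minimal sketch; detect_faces, embed and classify are placeholder callables standing in for the MTCNN, FaceNet and classifier pieces built later in this article):

import cv2

def run_pipeline(detect_faces, embed, classify, interval=5):
    # sketch of the five steps above; detect_faces/embed/classify are
    # placeholders for the MTCNN, FaceNet and classifier pieces built below
    cap = cv2.VideoCapture(0)                 # step 1: camera stream via OpenCV
    frame_id = 0
    while True:
        ret, frame = cap.read()
        if not ret:
            break
        if frame_id % interval == 0:          # detect only every n-th frame
            for face in detect_faces(frame):  # step 2: MTCNN detection + alignment
                emb = embed(face)             # step 3: 512-dim embedding
                print(classify(emb))          # steps 4-5: trained classifier
        frame_id += 1
    cap.release()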

Preparation:


Install OpenCV:

pip3 install opencv-python
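To confirm the install worked, a quick sanity check:

import cv2
print(cv2.__version__)  # the module should import cleanly and report its version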

Download facenet; the TensorFlow implementation of MTCNN and its pretrained model live under src/align:

git clone --recursive https://github.com/davidsandberg/facenet.git

LFW dataset download: http://vis-www.cs.umass.edu/lfw/

Run the following to detect and align the faces in LFW; here we produce 160×160 crops for later use (skip this if you have your own dataset). First set the environment variable:

export PYTHONPATH=/Users/admin/facenet/src
for N in {1..4}; do python3 /Users/admin/facenet/src/align/align_dataset_mtcnn.py /Users/admin/lfw /Users/admin/lfw_160 --image_size 160 --margin 32 --random_order --gpu_memory_fraction 0.25 & done

Stage 1


Since we want to recognize faces, the first task is to train a binary classification model: given a face, it decides whether that face belongs to our target person. This requires a labeled dataset containing the target faces plus other faces. For the target faces, I used the program below to capture close to 300 faces each for a pair of twins from the camera; the captures are already face-aligned, so they can be fed straight into training. For the "other" class you can randomly pick faces from the LFW dataset, take some classes from MS-Celeb-1M, or use any other dataset.

#  Capture the target person's faces from the video stream to train the classifier;
#  the main program then uses that classifier to recognize the target in the live camera feed.
#  One face is captured per frame, 100 samples in total; MTCNN handles detection and alignment.

import cv2
import sys
import os
import tensorflow as tf
import numpy as np
import align.detect_face
import facenet

video_capture = cv2.VideoCapture(0)
capture_interval = 1
capture_num = 100
capture_count = 0
frame_count = 0
detect_multiple_faces = False  # we are collecting the target person's faces, so there is only one face per frame

# face detection/alignment adapted from facenet/src/align/align_dataset_mtcnn.py
minsize = 20  # minimum face size
threshold = [0.6, 0.7, 0.7]  # thresholds for the three MTCNN stages
factor = 0.709  # scale factor for the image pyramid
        
with tf.Graph().as_default():
    gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction=0.5)
    sess = tf.Session(config=tf.ConfigProto(gpu_options=gpu_options, log_device_placement=False))
    with sess.as_default():
        pnet, rnet, onet = align.detect_face.create_mtcnn(sess, None)

while True:

    ret, frame = video_capture.read()
    if not ret:  # skip frames the camera failed to deliver
        continue

    # capture one face per frame; no grayscale conversion here, the color image is saved as-is
    if capture_count % capture_interval == 0:

        bounding_boxes, _ = align.detect_face.detect_face(frame, minsize, pnet, rnet, onet, threshold, factor)
        nrof_faces = bounding_boxes.shape[0]

        for face_position in bounding_boxes:  # only one face is collected, so this runs at most once per frame
            face_position = face_position.astype(int)
            cropped = frame[face_position[1]:face_position[3], face_position[0]:face_position[2], :]
            scaled = cv2.resize(cropped, (160, 160), interpolation=cv2.INTER_CUBIC)  # same size as the negative samples
            cv2.imwrite('/Users/admin/Desktop/abby' + str(frame_count) + '.jpg', scaled)

        frame_count += 1

    capture_count += 1

    if frame_count >= capture_num:
        break

video_capture.release()
cv2.destroyAllWindows()
print('Capture complete')

Note that we apply no extra processing when capturing from the camera, so these images stay consistent with the other datasets; any grayscale processing is applied uniformly at training time.
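The capture script writes files like abby0.jpg to the desktop, while the training script below expects the train_data/<class_name>/ layout. A small sketch to move them into place (the paths are assumptions matching the scripts in this article):

import glob
import os
import shutil

# assumed paths: the source pattern matches the capture script above,
# the destination matches the train_data layout used by the training script
src_pattern = '/Users/admin/Desktop/abby*.jpg'
dst_dir = './train_data/Abby'

os.makedirs(dst_dir, exist_ok=True)
for path in glob.glob(src_pattern):
    shutil.move(path, os.path.join(dst_dir, os.path.basename(path)))
print('moved %d images into %s' % (len(os.listdir(dst_dir)), dst_dir))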

Next we train the classification model from step 4. Note that the inputs used to train the model must match the inputs it sees when we use it later: either both grayscale, or both RGB, or otherwise identically processed.
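Both the training script below and the stage-2 script apply the same conversion, BGR frame to grayscale and back to three identical channels. A sketch of that shared step, factored into one helper (the scripts below inline the equivalent logic via facenet.to_rgb):

import cv2
import numpy as np

def to_model_input(bgr_image, image_size=160):
    # grayscale the BGR frame, then stack it back into 3 identical channels,
    # mirroring the facenet.to_rgb step inlined in both scripts
    gray = cv2.cvtColor(bgr_image, cv2.COLOR_BGR2GRAY)
    rgb = np.stack([gray, gray, gray], axis=-1)
    return cv2.resize(rgb, (image_size, image_size), interpolation=cv2.INTER_CUBIC)

With that convention fixed, here is the full training cell: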

import tensorflow as tf
import numpy as np
import cv2
import facenet
import os
from os.path import join 
import sys
import pickle
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn import metrics 
import matplotlib.pyplot as plt
from sklearn.neighbors import KNeighborsClassifier 

%matplotlib inline
        
def load_data(image_paths, image_size):
    nrof_samples = len(image_paths)
    images = []
    for i in range(nrof_samples):
        img = cv2.imread(image_paths[i])
        # grayscale, then stack back into 3 identical channels so the
        # shape matches the model's RGB input
        gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
        if gray.ndim == 2:
            img = facenet.to_rgb(gray)
        images.append(img)
    return images

# Training data lives under train_data with the following layout:
#-->train_data:
#     --->Abby:
#            Abby1.jpg
#            Abby2.jpg
#            ...
#     --->John:
#           John1.jpg
#           John2.jpg
#           ...
data_dir = './train_data'
image_size = 160

with tf.Graph().as_default():
      
    with tf.Session() as sess:
            
        np.random.seed(seed = 42)
        dataset = facenet.get_dataset(data_dir)
        
        paths, labels = facenet.get_image_paths_and_labels(dataset)
        print('Number of classes: %d' % len(dataset))
        print('Number of images: %d' % len(paths))
           
        # load the feature-extraction model from the models directory
        print('Loading feature extraction model')
        facenet.load_model('models')
            
        # get references to the input and output tensors
        images_placeholder = tf.get_default_graph().get_tensor_by_name("input:0")
        embeddings = tf.get_default_graph().get_tensor_by_name("embeddings:0")
        phase_train_placeholder = tf.get_default_graph().get_tensor_by_name("phase_train:0")
        embedding_size = embeddings.get_shape()[1]
        
        images = load_data(paths, image_size)
        #plt.imshow(images[10])
        
        feed_dict = {images_placeholder:images, phase_train_placeholder:False }
        emb_array = sess.run(embeddings, feed_dict=feed_dict)
        print('emb_array.shape:')
        print(emb_array.shape)
        
        X_train, X_test, y_train, y_test = train_test_split(emb_array, labels, test_size=.3, random_state=42)
                      
        classifier_filename_exp = os.path.expanduser('new_models.pkl')

        # Train classifier
        print('Training classifier')
        #model = KNeighborsClassifier() # accuracy: 77.70%
        #model = SVC(kernel='linear', probability=True)
        #model = SVC(kernel='poly',degree=2,gamma=1,coef0=0,probability=True) # accuracy: 77.03%
        model = SVC(kernel='poly',degree=10,gamma=1,coef0=0,probability=True) #accuracy: 87.16%
        
        model.fit(X_train, y_train)
            
        # Create a list of class names
        class_names = [ cls.name.replace('_', ' ') for cls in dataset]
        print(class_names)
        
        # Saving classifier model
        with open(classifier_filename_exp, 'wb') as outfile:
            pickle.dump((model, class_names), outfile)
        print('Saved classifier model to file "%s"' % classifier_filename_exp)
        
        # validate with the saved classifier on the held-out split
        with open(classifier_filename_exp, 'rb') as infile:
            (model, class_names) = pickle.load(infile)
        predict = model.predict(X_test)
        accuracy = metrics.accuracy_score(y_test, predict)
        print('accuracy: %.2f%%' % (100 * accuracy))

Output:

Number of classes: 2
Number of images: 493
Loading feature extraction model
Model directory: models
Metagraph file: model-20180402-114759.meta
Checkpoint file: model-20180402-114759.ckpt-275
INFO:tensorflow:Restoring parameters from models/model-20180402-114759.ckpt-275
emb_array.shape:
(493, 512)
Training classifier
['lijun', 'wenjun']
Saved classifier model to file "new_models.pkl"
accuracy: 87.16%

The two classes here are twins; with non-twin faces the accuracy reaches 99%.
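The commented-out lines in the training cell record those hand-tuned classifier experiments. A grid search over the same SVC family automates the tuning (a sketch, reusing X_train and y_train from the cell above):

from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# search the same SVC family that was tuned by hand above;
# X_train and y_train come from the training cell
param_grid = {
    'kernel': ['linear', 'poly', 'rbf'],
    'degree': [2, 3, 5, 10],  # only consulted by the poly kernel
    'C': [0.1, 1, 10],
}
search = GridSearchCV(SVC(probability=True), param_grid, cv=5)
search.fit(X_train, y_train)
print(search.best_params_, search.best_score_)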

Continuing in the notebook:

# The pretrained model outputs a tensor with 493 rows and 512 columns:
# 493 is the number of images in our dataset, and 512 is the number of
# features extracted per image. These embeddings are the input to the
# binary classifier we train, here an SVC.
emb_array.shape

Output:

(493, 512)
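The classifier works because FaceNet embeddings of the same person sit close together in the 512-dimensional space, while different people sit far apart. A quick way to see this on our data (a sketch, reusing emb_array and labels from the training cell):

import numpy as np
from scipy.spatial.distance import cdist

# mean pairwise Euclidean distance between embeddings of the same person
# versus different people; emb_array and labels come from the training cell
y = np.asarray(labels)
dists = cdist(emb_array, emb_array)
same = (y[:, None] == y[None, :]) & ~np.eye(len(y), dtype=bool)
diff = y[:, None] != y[None, :]
print('intra-class mean distance: %.3f' % dists[same].mean())
print('inter-class mean distance: %.3f' % dists[diff].mean())

The wider the gap between the two means, the easier the classification; for twins you would expect them to sit closer together than for unrelated people, which is consistent with the 87% vs. 99% accuracy gap noted above.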

Stage 2


Now we assemble the complete real-time video-stream classification system: in each processed frame, every face is detected, its five facial landmarks are drawn, and its predicted class is displayed. The code, with detailed comments, follows:

import cv2
import sys
import os
import facenet
import tensorflow as tf
import numpy as np
import align.detect_face
import pickle
from sklearn.svm import SVC
import matplotlib.pyplot as plt
%matplotlib inline

minsize = 20  # minimum face size
threshold = [0.6, 0.7, 0.7]  # thresholds for the three MTCNN stages
factor = 0.709  # scale factor for the image pyramid
image_size = 160
        
with tf.Graph().as_default():
    gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction=0.9)
    sess = tf.Session(config=tf.ConfigProto(gpu_options=gpu_options, log_device_placement=False))
    with sess.as_default():
        # first pretrained model ---> MTCNN ---> face detection
        pnet, rnet, onet = align.detect_face.create_mtcnn(sess, None)  # second argument: directory containing the model files
        
        # second pretrained model ---> FaceNet ---> outputs the 512-dim embedding that feeds the third model
        # load the model from the models directory
        print('Loading feature extraction model')
        facenet.load_model('models')
            
        # get references to the input and output tensors
        images_placeholder = tf.get_default_graph().get_tensor_by_name("input:0")
        embeddings = tf.get_default_graph().get_tensor_by_name("embeddings:0")
        phase_train_placeholder = tf.get_default_graph().get_tensor_by_name("phase_train:0")
        embedding_size = embeddings.get_shape()[1]

        # third model ---> the SVC classifier trained in stage 1 ---> assigns the face to a class
        classifier_filename_exp = os.path.expanduser('new_models.pkl')
        with open(classifier_filename_exp, 'rb') as infile:
            (model, class_names) = pickle.load(infile)
            print(class_names)
            
        print('Loaded classifier model from file "%s"' % classifier_filename_exp)

        video_capture = cv2.VideoCapture(0)
        capture_interval = 5
        capture_count = 0
        frame_count = 0 

        while True:

            ret, frame = video_capture.read()
            if not ret:  # skip frames the camera failed to deliver
                continue
    
            # process one frame in every capture_interval frames (here every 5th)
            if(capture_count%capture_interval == 0): 
                
                # grayscale, then stack back into 3 channels, matching the training preprocessing
                gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
                if gray.ndim == 2:
                    gray = facenet.to_rgb(gray)
                
                # detect face boxes and the five facial landmarks
                bounding_boxes, points = align.detect_face.detect_face(gray, minsize, pnet, rnet, onet, threshold, factor)
                nrof_faces = bounding_boxes.shape[0]
                
                for face_position in bounding_boxes: 
                    face_position = face_position.astype(int)

                    # crop the face region as input to the second model
                    cropped = gray[face_position[1]:face_position[3], face_position[0]:face_position[2], :]
                    # discard degenerate crops
                    if cropped.shape[0] == 0 or cropped.shape[1] == 0:
                        continue
                    
                    scaled = cv2.resize(cropped, (image_size, image_size), interpolation=cv2.INTER_CUBIC )
                    plt.imshow(scaled)
                    scaled = scaled.reshape(-1,image_size,image_size,3)
                    
                    feed_dict = {images_placeholder:scaled, phase_train_placeholder:False }
                    emb_array = sess.run(embeddings, feed_dict=feed_dict)
        
                    predictions = model.predict_proba(emb_array)
                    print(predictions) 
                    predict = model.predict(emb_array) 
                    print(predict) 
                    
                    # draw the face box and label it with the predicted class
                    cv2.rectangle(frame, (face_position[0], 
                                  face_position[1]), 
                                  (face_position[2], face_position[3]), 
                                  (255, 255, 0), 2)
                    cv2.putText(frame,class_names[predict[0]], (face_position[0],face_position[1]), 
                                cv2.FONT_HERSHEY_COMPLEX_SMALL, 2, (255, 0 ,0), 
                                thickness = 2, lineType = 2)       
                          
                frame_count += 1
                
                # draw the five landmarks: points holds the x coordinates in
                # rows 0-4 and the y coordinates in rows 5-9, one column per face
                if points.shape[0] != 0:
                    count = int(points.shape[0] / 2)
                    for i in range(points.shape[1]):
                        for j in range(count):
                            cv2.circle(frame, (int(points[j][i]), int(points[j + count][i])), 3, (255, 255, 0), -1)
                
            capture_count += 1
            cv2.imshow('Video', frame)
    
            if cv2.waitKey(1) & 0xFF == ord('q'):
                break

    video_capture.release()
    cv2.destroyAllWindows()

Output:

Loading feature extraction model
Model directory: models
Metagraph file: model-20180402-114759.meta
Checkpoint file: model-20180402-114759.ckpt-275
INFO:tensorflow:Restoring parameters from models/model-20180402-114759.ckpt-275
['lijun', 'other']
Loaded classifier model from file "new_models.pkl"
[[0.99591485 0.00408515]]
[0]
[[9.99134481e-01 8.65518748e-04]]
[0]
[[0.99891941 0.00108059]]
[0]
[[0.99896484 0.00103516]]
.....
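Each predict_proba row holds one probability per class, and the stage-2 loop labels the face with predict regardless of how confident those probabilities are. If you want to ignore uncertain detections, one option is to gate the label on a threshold (a sketch; the 0.9 cutoff is an arbitrary example value, not from the original code):

# trust the label only when the classifier is confident;
# predictions comes from model.predict_proba(emb_array) as in the stage-2 loop
best = predictions[0].argmax()
label = class_names[best] if predictions[0][best] >= 0.9 else 'unknown'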

Over!!!

GitHub repo: https://github.com/junjun870325/Video-stream-face-recognition
