Reposted from: https://zhuanlan.zhihu.com/p/31761796
MTCNN (Multi-Task Cascaded CNN) is a deep-learning face detection model that jointly performs face bounding-box regression and facial landmark detection. The cascade consists of three CNNs: PNet, RNet, and ONet. This article covers the data processing methods commonly used in face detection, including drawing bounding boxes, computing IoU, generating sliding windows, computing box regression offsets, and facial landmarks together with their regression targets.
PNet (Proposal Network) produces candidate face windows together with their bounding-box regression vectors; the candidates are then merged by non-maximum suppression (NMS). The PNet structure is shown below:
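As a point of reference, here is a minimal sketch of the PNet topology described in the MTCNN paper, written in tf.keras purely for illustration (the framework choice and layer naming are assumptions of this sketch, not code from the original post):

import tensorflow as tf
from tensorflow.keras import layers

inp = layers.Input(shape=(12, 12, 3))
x = layers.Conv2D(10, 3)(inp)                       # 12x12 -> 10x10
x = layers.PReLU(shared_axes=[1, 2])(x)
x = layers.MaxPool2D(pool_size=2, strides=2)(x)     # 10x10 -> 5x5
x = layers.Conv2D(16, 3)(x)                         # 5x5 -> 3x3
x = layers.PReLU(shared_axes=[1, 2])(x)
x = layers.Conv2D(32, 3)(x)                         # 3x3 -> 1x1
x = layers.PReLU(shared_axes=[1, 2])(x)
cls = layers.Conv2D(2, 1, activation='softmax')(x)  # face / non-face score
bbox = layers.Conv2D(4, 1)(x)                       # bounding-box offsets
landmark = layers.Conv2D(10, 1)(x)                  # five (x, y) landmark points
pnet = tf.keras.Model(inp, [cls, bbox, landmark])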
RNet (Refine Network) takes the face candidates found by PNet, trains on them to further correct the bounding-box regression vectors, and again applies NMS to the candidates. The RNet structure is shown below:
ONet (Output Network) produces the final face bounding boxes and the facial landmark positions. The ONet structure is shown below:
This article focuses on the data preprocessing for bounding-box regression in PNet/RNet/ONet. Since PNet takes 12×12 images as input, the training images must be preprocessed accordingly. The dataset used here is WIDER FACE, which can be downloaded from:
http://mmlab.ie.cuhk.edu.hk/projects/WIDERFace/
The dataset contains 32,203 images in which 93,703 faces are labeled, as shown below:
First, read the labeled face data, which looks like this:
0--Parade/0_Parade_marchingband_1_849 448.51 329.63 570.09 478.23
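Such a line can be split into an image path and the (x1, y1, x2, y2) box; a small sketch, with the per-line layout inferred from the example above:

line = "0--Parade/0_Parade_marchingband_1_849 448.51 329.63 570.09 478.23"
fields = line.split()
img_path = fields[0] + ".jpg"               # extension assumed; WIDER FACE stores .jpg images
annotation = list(map(float, fields[1:5]))  # x1, y1, x2, y2
print(img_path, annotation)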
The face bounding box can be drawn with PIL; sample code:
from PIL import Image, ImageDraw
from pylab import *

# annotation line:
# 0--Parade/0_Parade_marchingband_1_849 448.51 329.63 570.09 478.23
annotation = [448.51, 329.63, 570.09, 478.23]  # x1, y1, x2, y2
im = Image.open("0_Parade_marchingband_1_849.jpg")
draw = ImageDraw.Draw(im)
line = 5  # border thickness in pixels
x, y = annotation[0], annotation[1]
width = annotation[2] - annotation[0]
height = annotation[3] - annotation[1]
# draw `line` nested 1-pixel rectangles to get a thick border
for i in range(1, line + 1):
    draw.rectangle((x + (line - i), y + (line - i), x + width + i, y + height + i), outline='red')
# imshow(im)
# show()
im.save("out.jpeg")
Training data can be generated with sliding windows or by random sampling. The samples fall into three types: positive samples, whose IoU with the ground-truth box is greater than 0.65; negative samples, whose IoU is below 0.3; and part samples, whose IoU lies between 0.4 and 0.65.
IoU (Intersection over Union) is computed as follows:
The blue box is a generated sliding window and the red box is the ground-truth box, where (x, y) denotes a corner of a box. IoU is the intersection area of the two boxes divided by the area of their union; the larger the IoU, the closer the sliding window is to the ground truth. With the intersection clamped at zero for non-overlapping boxes, the computation is:
inter = max(0, min(x2, gx2) - max(x1, gx1)) * max(0, min(y2, gy2) - max(y1, gy1))
IoU = inter / ((x2 - x1) * (y2 - y1) + (gx2 - gx1) * (gy2 - gy1) - inter)
The IoU computation in code:
import numpy as np

def IoU(box, boxes):
    """IoU between one box and an array of boxes, all in (x1, y1, x2, y2) format."""
    box_area = (box[2] - box[0] + 1) * (box[3] - box[1] + 1)
    area = (boxes[:, 2] - boxes[:, 0] + 1) * (boxes[:, 3] - boxes[:, 1] + 1)
    # corners of the intersection rectangle
    xx1 = np.maximum(box[0], boxes[:, 0])
    yy1 = np.maximum(box[1], boxes[:, 1])
    xx2 = np.minimum(box[2], boxes[:, 2])
    yy2 = np.minimum(box[3], boxes[:, 3])
    # width and height of the intersection, clamped at 0 for non-overlapping boxes
    w = np.maximum(0, xx2 - xx1 + 1)
    h = np.maximum(0, yy2 - yy1 + 1)
    inter = w * h
    ovr = inter / (box_area + area - inter)
    return ovr
Here box is the sliding window and boxes holds the ground-truth boxes (an image may contain multiple faces, hence multiple ground-truth boxes).
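A quick usage sketch (the window coordinates here are made up for illustration):

import numpy as np

window = np.array([430, 300, 580, 460])
gt_boxes = np.array([[448.51, 329.63, 570.09, 478.23]])
print(IoU(window, gt_boxes))  # prints the IoU with each ground-truth box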
How are sliding-window crops generated? A simple approach is random sampling:
import cv2
import numpy as np
import numpy.random as npr
from pylab import *

img = cv2.imread("0_Parade_marchingband_1_849.jpg")
height, width, channel = img.shape
# random square size between the 12-pixel network input and half the short side
size = npr.randint(12, min(width, height) // 2)
# top-left corner of the crop
nx = npr.randint(0, width - size)
ny = npr.randint(0, height - size)
# random crop, then resize to the 12x12 PNet input
crop_box = np.array([nx, ny, nx + size, ny + size])
cropped_im = img[ny : ny + size, nx : nx + size, :]
resized_im = cv2.resize(cropped_im, (12, 12), interpolation=cv2.INTER_LINEAR)
cv2.imwrite("resize.jpeg", resized_im)
imshow(resized_im[:, :, ::-1])  # cv2 loads BGR; flip to RGB for display
show()
An example of a generated 12×12 sliding-window crop is shown below; this one is a negative sample:
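Combining the random crop with the IoU function above, a hedged sketch of how negative samples could be collected (the loop count and the file naming are assumptions, not from the original post):

import cv2
import numpy as np
import numpy.random as npr

img = cv2.imread("0_Parade_marchingband_1_849.jpg")
height, width, _ = img.shape
gt_boxes = np.array([[448.51, 329.63, 570.09, 478.23]])

neg_idx = 0
while neg_idx < 50:  # number of negatives per image is an arbitrary choice here
    size = npr.randint(12, min(width, height) // 2)
    nx = npr.randint(0, width - size)
    ny = npr.randint(0, height - size)
    crop_box = np.array([nx, ny, nx + size, ny + size])
    # keep the crop only if it barely overlaps any face
    if np.max(IoU(crop_box, gt_boxes)) < 0.3:
        crop = img[ny : ny + size, nx : nx + size, :]
        resized = cv2.resize(crop, (12, 12), interpolation=cv2.INTER_LINEAR)
        cv2.imwrite("negative/%d.jpg" % neg_idx, resized)
        neg_idx += 1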
Predicting the box coordinates directly makes the network converge slowly, so box regression usually predicts coordinate offsets instead, which acts as a form of normalization. The box offsets are illustrated below:
Once a sliding window has been generated, its offsets relative to the ground-truth box can be computed as follows:
offset_x1 = (gx1 - x1) / float(x2 - x1)
offset_y1 = (gy1 - y1) / float(y2 - y1)
offset_x2 = (gx2 - x2) / float(x2 - x1)
offset_y2 = (gy2 - y2) / float(y2 - y1)
So when sliding windows are generated, the corresponding offset values are saved along with each positive and part sample, e.g.:
positive/0.jpg 1 0.02 -0.01 -0.20 -0.06
positive/1.jpg 1 0.08 0.04 -0.18 -0.06
positive/2.jpg 1 0.16 0.10 -0.03 0.09
positive/3.jpg 1 0.00 -0.04 0.08 0.28
positive/4.jpg 1 0.08 0.03 -0.12 0.01
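Going the other way, at inference time the predicted offsets are applied back to the window to recover absolute coordinates; a minimal sketch based on the four formulas above:

def decode_box(window, offsets):
    """Apply predicted offsets back to a sliding window (inverse of the formulas above)."""
    x1, y1, x2, y2 = window
    w, h = float(x2 - x1), float(y2 - y1)
    ox1, oy1, ox2, oy2 = offsets
    # from offset_x1 = (gx1 - x1) / (x2 - x1) it follows that gx1 = x1 + ox1 * w, etc.
    return (x1 + ox1 * w, y1 + oy1 * h, x2 + ox2 * w, y2 + oy2 * h)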
Sample code for generating the offsets; the crop is obtained by applying a random (x, y) shift around the ground-truth box:
import cv2
import numpy as np
import numpy.random as npr
from pylab import *

img = cv2.imread("0_Parade_marchingband_1_849.jpg")
height, width, channel = img.shape
annotation = [448.51, 329.63, 570.09, 478.23]
x1, y1, x2, y2 = annotation
w = x2 - x1 + 1
h = y2 - y1 + 1
# square crop size between 0.8x the short side and 1.25x the long side of the face
size = npr.randint(int(min(w, h) * 0.8), int(np.ceil(1.25 * max(w, h))))
# random shift of the crop center relative to the face center
delta_x = npr.randint(int(-w * 0.2), int(w * 0.2))
delta_y = npr.randint(int(-h * 0.2), int(h * 0.2))
nx1 = int(max(x1 + w / 2 + delta_x - size / 2, 0))
ny1 = int(max(y1 + h / 2 + delta_y - size / 2, 0))
nx2 = nx1 + size
ny2 = ny1 + size
crop_box = np.array([nx1, ny1, nx2, ny2])
# offsets of the ground truth relative to the crop, normalized by the crop size
offset_x1 = (x1 - nx1) / float(size)
offset_y1 = (y1 - ny1) / float(size)
offset_x2 = (x2 - nx2) / float(size)
offset_y2 = (y2 - ny2) / float(size)
cropped_im = img[ny1 : ny2, nx1 : nx2, :]
resized_im = cv2.resize(cropped_im, (12, 12), interpolation=cv2.INTER_LINEAR)
imshow(resized_im[:, :, ::-1])  # convert BGR to RGB for display
show()
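Tying this crop back to the sample types: a hedged sketch that labels the crop with the IoU function from earlier and prints an annotation line in the format shown above (the part-sample label -1 is a common convention in MTCNN implementations, not stated in the original post; the index 0 is a placeholder for a running counter):

iou = IoU(crop_box, np.array([annotation]))  # IoU against the single ground-truth box
if iou.max() >= 0.65:
    print("positive/%d.jpg 1 %.2f %.2f %.2f %.2f" % (0, offset_x1, offset_y1, offset_x2, offset_y2))
elif iou.max() >= 0.4:
    print("part/%d.jpg -1 %.2f %.2f %.2f %.2f" % (0, offset_x1, offset_y1, offset_x2, offset_y2))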
Besides the bounding-box information, face detection also uses facial landmark information, namely the coordinates of the eyes, nose, and mouth. This training data can be downloaded from:
http://mmlab.ie.cuhk.edu.hk/archive/CNN_FacePoint.htm
This dataset contains 5,590 images from LFW and 7,876 images downloaded from the web. The landmark coordinates cover the left eye, right eye, nose, left mouth corner, and right mouth corner. Note that the bounding box in each annotation line is stored in (x1, x2, y1, y2) order, i.e., left, right, top, bottom. For example:
lfw_5590/Aaron_Eckhart_0001.jpg 84 161 92 169 106.250000 107.750000 146.750000 112.250000 125.250000 142.750000 105.250000 157.750000 139.750000 161.750000
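Such a line can be parsed into the bounding box and the five landmark points as follows (a small sketch; the field layout follows the example line above):

line = "lfw_5590/Aaron_Eckhart_0001.jpg 84 161 92 169 106.250000 107.750000 146.750000 112.250000 125.250000 142.750000 105.250000 157.750000 139.750000 161.750000"
fields = line.split()
path = fields[0]
# bbox is stored as (x1, x2, y1, y2); reorder to (left, top, right, bottom)
x1, x2, y1, y2 = map(float, fields[1:5])
bbox = [x1, y1, x2, y2]
# the remaining 10 numbers are (x, y) pairs:
# left eye, right eye, nose, left mouth corner, right mouth corner
points = list(map(float, fields[5:15]))
landmarks = [(points[2 * i], points[2 * i + 1]) for i in range(5)]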
The code to draw the bounding box and the facial landmarks:
from PIL import Image, ImageDraw
from pylab import *

# lfw_5590/Aaron_Eckhart_0001.jpg 84 161 92 169 106.250000 107.750000 146.750000 112.250000 125.250000 142.750000 105.250000 157.750000 139.750000 161.750000
annotation = [84, 92, 161, 169]  # left, top, right, bottom
landmarkX = [106.250000, 146.750000, 125.250000, 105.250000, 139.750000]
landmarkY = [107.750000, 112.250000, 142.750000, 157.750000, 161.750000]
im = Image.open("Aaron_Eckhart_0001.jpg")
draw = ImageDraw.Draw(im)
line = 5  # border thickness in pixels
x, y = annotation[0], annotation[1]
width = annotation[2] - annotation[0]
height = annotation[3] - annotation[1]
for i in range(1, line + 1):
    draw.rectangle((x + (line - i), y + (line - i), x + width + i, y + height + i), outline='red')
imshow(im)
# plot the landmark points
plot(landmarkX, landmarkY, "g*")
show()
The facial landmarks are likewise not regressed as absolute coordinates; the regression values are coordinates relative to the bounding box, as shown below:
The corresponding formulas are:
offsetX = (lx - x) / bbox_width
offsetY = (ly - y) / bbox_height
Computing the offsets for all landmark points produces data like the following:
train_PNet_landmark/0.jpg -2 0.288961038961 0.204545454545 0.814935064935 0.262987012987 0.535714285714 0.659090909091 0.275974025974 0.853896103896 0.724025974026 0.905844155844
train_PNet_landmark/1.jpg -2 0.42816091954 0.215517241379 0.89367816092 0.26724137931 0.646551724138 0.617816091954 0.416666666667 0.790229885057 0.813218390805 0.836206896552
train_PNet_landmark/2.jpg -2 0.153125 0.271875 0.659375 0.328125 0.390625 0.709375 0.140625 0.896875 0.571875 0.946875
train_PNet_landmark/3.jpg -2 0.174327367914 0.242510936232 0.673748423293 0.342669482766 0.372792971258 0.69904560555 0.10740259497 0.864043175755 0.532653771385 0.95143882472
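As a sanity check, the first line above matches the Aaron_Eckhart_0001.jpg annotation when the crop coincides with the ground-truth box (width = 161 - 84 = 77, height = 169 - 92 = 77): for the left eye, offsetX = (106.25 - 84) / 77 ≈ 0.288961 and offsetY = (107.75 - 92) / 77 ≈ 0.204545, exactly the first two values of train_PNet_landmark/0.jpg.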
How are the landmark training samples generated? The method mirrors the bounding-box case: apply a random shift to a crop box around the ground-truth box, then compute the landmark offsets relative to that crop. Sample code:
import cv2
import numpy as np
import numpy.random as npr
from pylab import *

img = cv2.imread("Aaron_Eckhart_0001.jpg")
img_h, img_w, img_c = img.shape
annotation = [84, 92, 161, 169]  # left, top, right, bottom
landmarkX = [106.250000, 146.750000, 125.250000, 105.250000, 139.750000]
landmarkY = [107.750000, 112.250000, 142.750000, 157.750000, 161.750000]
landmarkGt = np.zeros((5, 2))
for i in range(5):
    landmarkGt[i] = (landmarkX[i], landmarkY[i])
x1, y1, x2, y2 = annotation
gt_w = x2 - x1 + 1
gt_h = y2 - y1 + 1
# random square crop around the face, as in the bounding-box case
bbox_size = npr.randint(int(min(gt_w, gt_h) * 0.8), int(np.ceil(1.25 * max(gt_w, gt_h))))
delta_x = npr.randint(int(-gt_w * 0.2), int(gt_w * 0.2))
delta_y = npr.randint(int(-gt_h * 0.2), int(gt_h * 0.2))
nx1 = int(max(x1 + gt_w / 2 - bbox_size / 2 + delta_x, 0))
ny1 = int(max(y1 + gt_h / 2 - bbox_size / 2 + delta_y, 0))
nx2 = int(nx1 + bbox_size)
ny2 = int(ny1 + bbox_size)
crop_box = np.array([nx1, ny1, nx2, ny2])
cropped_im = img[ny1 : ny2 + 1, nx1 : nx2 + 1, :]
resized_im = cv2.resize(cropped_im, (12, 12))
# landmark offsets relative to the crop, normalized by the crop size
landmark = np.zeros((5, 2))
for index, one in enumerate(landmarkGt):
    rv = ((one[0] - nx1) / bbox_size, (one[1] - ny1) / bbox_size)
    landmark[index] = rv
print(landmark)
imshow(resized_im[:, :, ::-1])  # convert BGR to RGB for display
show()
If the landmark training data is limited, augmentation (for example, mirroring or rotating the cropped face boxes) can be applied to enlarge the landmark training set.
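As an illustration of the mirroring idea, a minimal sketch (the left/right swap order assumes the five-point layout used above; this is not code from the original post):

import numpy as np

def mirror_landmark_sample(face_img, landmark):
    """Horizontally flip a cropped face and its normalized 5-point landmarks.

    landmark is a (5, 2) array in crop-relative coordinates, ordered:
    left eye, right eye, nose, left mouth corner, right mouth corner.
    """
    flipped = np.fliplr(face_img).copy()  # mirror the image left-right
    lm = landmark.copy()
    lm[:, 0] = 1.0 - lm[:, 0]             # mirror the normalized x offsets
    lm[[0, 1]] = lm[[1, 0]]               # swap left and right eye
    lm[[3, 4]] = lm[[4, 3]]               # swap left and right mouth corner
    return flipped, lm

Rotation can be handled in the same spirit by rotating the crop about its center and applying the same rotation matrix to the landmark points.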