Paper: https://arxiv.org/abs/1511.02683
Code:
https://github.com/AlfredXiangWu/face_verification_experiment
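As the earlier post "Face Verification: DeepID" explained, face verification comes down to two problems: extracting face features, and deciding whether two faces belong to the same person. Features can be extracted with traditional methods such as LBP or with deep learning methods such as DeepID. The same-person decision can be as simple as cosine similarity or as involved as Joint Bayesian. This post concentrates on feature extraction, specifically on extracting features with the Lightened CNN.
To push accuracy higher, deep learning methods tend toward deeper networks and ensembles of several models, which makes the models large and slow. The paper proposes a lightweight CNN that reaches good accuracy with a simpler network structure and lower time and memory cost, so it can run on embedded and mobile devices.
Its strengths are a very small model and a very respectable recognition rate. The main reasons are: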
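(1) The author uses maxout as the activation function, which filters out noise while keeping the useful signal and so produces better feature maps. This is the MFM (Max-Feature-Map). The idea works well: I applied it to center_loss and saw roughly a 0.5% improvement. In Caffe, this maxout is simply Slice + Eltwise. These two layers add no trainable parameters and cost almost no time, so the accuracy gain is close to free.
(2) The author uses NIN (Network in Network) to reduce parameters and improve accuracy. The released model A has no NIN layers and model B does. Both were trained on CASIA, yet B is about 0.5% better, at the cost of some extra parameters. Compared with other architectures, the NIN variant is still considerably smaller. The network described in the paper corresponds to models B and C.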
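There are three ways to use a CNN for face verification. The first trains the CNN on a face classification task to extract features, then uses a classifier to decide whether two faces match. The second optimizes a verification loss directly. The third trains on identification and verification jointly. The framework here belongs to the first kind.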
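MFM: compare two feature maps position by position and keep the larger value at each position. In Caffe this is done with the Eltwise layer. The main advantage of MFM over ReLU is that it learns compact features rather than the sparse, high-dimensional features ReLU produces.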
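The Slice + Eltwise view of MFM can be sketched in a few lines of plain Python. This is only an illustration of the operation, not the Caffe implementation: the function name `mfm` and the nested-list layout (channels, rows, columns) are assumptions made for the example.

```python
def mfm(feature_maps):
    """Max-Feature-Map over a list of 2n channels (each channel is a 2-D list)."""
    if len(feature_maps) % 2 != 0:
        raise ValueError("MFM needs an even number of channels")
    n = len(feature_maps) // 2
    # "Slice": split the channel axis into two halves of n channels each
    first, second = feature_maps[:n], feature_maps[n:]
    # "Eltwise MAX": compare channel k with channel k + n at every position
    return [
        [[max(a, b) for a, b in zip(row_a, row_b)] for row_a, row_b in zip(ch_a, ch_b)]
        for ch_a, ch_b in zip(first, second)
    ]


channels = [
    [[1.0, -2.0], [0.5, 3.0]],   # channel 0
    [[0.0, 0.0], [4.0, -1.0]],   # channel 1
    [[-1.0, 5.0], [0.5, 2.0]],   # channel 2, paired with channel 0
    [[2.0, -3.0], [1.0, 1.0]],   # channel 3, paired with channel 1
]
out = mfm(channels)   # 4 input channels become 2 output channels
```

No weights appear anywhere in the function, which is why the operation adds nothing to the model size.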
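The network structure is shown in the figure above. As with DeepID, the network is trained on a face classification task and finally yields a 256-dimensional face feature. The paper proposes two architectures that share the same overall structure, and it concentrates mostly on the first.
The last layer of the network is a Softmax layer, which performs the classification; the output of fc1 is the face feature.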
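The paper uses an activation function called MFM, whose structure is simple: from the input convolutional feature maps, take two maps and keep the larger value at each position.
As a formula, with $C_{ij}^{k}$ the value of input map $k$ at position $(i, j)$:
$$f_{ij}^{k} = \max\left(C_{ij}^{k},\, C_{ij}^{k+n}\right), \quad 1 \le k \le n$$
The input has 2n feature maps. The larger of map k and map k+n becomes the output, so the MFM output has n maps. The gradient of the activation is
$$\frac{\partial f_{ij}^{k}}{\partial C_{ij}^{k}} = \begin{cases} 1, & C_{ij}^{k} \ge C_{ij}^{k+n} \\ 0, & \text{otherwise} \end{cases}, \qquad \frac{\partial f_{ij}^{k}}{\partial C_{ij}^{k+n}} = 1 - \frac{\partial f_{ij}^{k}}{\partial C_{ij}^{k}}$$
So half of the gradients in the activation layer are 0, and MFM yields sparse gradients. Where ReLU produces sparse, high-dimensional features, MFM produces compact features and also acts as feature selection and dimensionality reduction.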
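To make the "half of the gradients are 0" statement concrete, here is a minimal sketch of the MFM forward and backward pass at a single spatial position. The function names are hypothetical, and sending ties to map k rather than map k+n is an assumption of this example.

```python
def mfm_forward(x):
    """x: flat list of 2n activations at one position; returns n outputs."""
    n = len(x) // 2
    return [max(x[k], x[k + n]) for k in range(n)]


def mfm_backward(x, grad_out):
    """Route each upstream gradient to whichever input won the max."""
    n = len(x) // 2
    grad_in = [0.0] * (2 * n)
    for k in range(n):
        # Ties go to map k (an assumption of this sketch)
        winner = k if x[k] >= x[k + n] else k + n
        grad_in[winner] = grad_out[k]
    return grad_in


x = [0.3, -1.2, 2.0, 0.7, 0.1, 1.5]   # 2n = 6 maps at one pixel
y = mfm_forward(x)                     # n = 3 outputs
g = mfm_backward(x, [1.0, 1.0, 1.0])   # exactly n of the 2n entries are non-zero
```

Each output passes its gradient to exactly one of its two inputs, so with non-zero upstream gradients exactly n of the 2n input gradients are non-zero.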
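The dataset is CASIA-WebFace, with 493,456 photos of 10,575 people. Training used Caffe. The inputs are 144×144 grayscale images, randomly cropped to 128×128. Dropout on the fully connected layer is set to 0.7. The SGD parameters differ by layer: every layer except fc2 uses momentum 0.9 and weight decay 5e-4, while fc2 uses weight decay 5e-3 to prevent overfitting. The learning rate is decreased from 1e-3 to 5e-5. Training took two weeks on a GTX980.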
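The 144×144 to 128×128 augmentation can be sketched as follows. This is a plain-Python illustration of random cropping, optional horizontal mirroring, and 1/256 scaling (the value of `scale: 0.00390625` in the prototxt). The function name, the nested-list image format, and the `rng` parameter are assumptions for the example; it is not Caffe's own data transformer.

```python
import random


def random_crop_mirror(image, crop=128, mirror=True, rng=random):
    """image: 2-D list (H x W) of grayscale pixels; returns a crop x crop patch."""
    h, w = len(image), len(image[0])
    if h < crop or w < crop:
        raise ValueError("image is smaller than the crop size")
    top = rng.randint(0, h - crop)     # randint is inclusive on both ends
    left = rng.randint(0, w - crop)
    patch = [row[left:left + crop] for row in image[top:top + crop]]
    if mirror and rng.random() < 0.5:
        patch = [row[::-1] for row in patch]   # horizontal flip
    # Scale pixel values by 1/256, as transform_param does in the prototxt
    return [[p * 0.00390625 for p in row] for row in patch]


# Synthetic 144 x 144 image with pixel values in 0..255
image = [[(r + c) % 256 for c in range(144)] for r in range(144)]
patch = random_crop_mirror(image, rng=random.Random(0))
```

At test time Caffe takes a center crop rather than a random one when `crop_size` is set, so an evaluation pipeline would fix `top` and `left` to `(144 - 128) // 2` instead of sampling them.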
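With the features extracted, the author simply uses cosine similarity for verification. On LFW, model A reaches 97.77% accuracy and model B reaches 98.13%, which is an acceptable result. Model A is 26M in size, and processing one image takes 71 ms on an i7-4790; in my own test on a Snapdragon 808 it took 0.8 s.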
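The verification step can be sketched as a cosine similarity between two feature vectors followed by a threshold. The helper names and the threshold value of 0.5 are hypothetical; in practice the threshold would be chosen on held-out pairs, and the inputs would be the 256-dimensional features rather than the short toy vectors used here.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0   # treat an all-zero feature as matching nothing
    return dot / (norm_a * norm_b)


def same_person(feat_a, feat_b, threshold=0.5):
    # threshold=0.5 is a placeholder, not a value from the paper
    return cosine_similarity(feat_a, feat_b) >= threshold


sim_same = cosine_similarity([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])   # parallel vectors
sim_diff = cosine_similarity([1.0, 0.0], [0.0, 1.0])             # orthogonal vectors
```

Because cosine similarity ignores vector length, the features do not need to be normalized beforehand.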
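The network is lightweight: the model is relatively small and the forward pass is fast, so it can be used on embedded devices. Its accuracy is not the best available, but it is within an acceptable range.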
name: "DeepFace_set003_net"
layer {
name: "data"
type:"ImageData"
top: "data"
top: "label"
image_data_param{
source: "/home/himon/code/caffe-master/lightCNNFace/train.txt"
batch_size: 20
shuffle: true
}
transform_param {
scale: 0.00390625
crop_size: 128
mirror: true
}
include: { phase: TRAIN }
}
layer {
name: "data"
type: "ImageData"
top: "data"
top: "label"
image_data_param{
source: "/home/himon/code/caffe-master/lightCNNFace/val.txt"
batch_size: 20
}
transform_param {
scale: 0.00390625
crop_size: 128
mirror: false
}
include: { phase: TEST }
}
layer{
name: "conv1"
type: "Convolution"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 96
kernel_size: 5
stride: 1
pad: 2
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
value: 0.1
}
}
bottom: "data"
top: "conv1"
}
layer{
name: "slice1"
type:"Slice"
slice_param {
slice_dim: 1
}
bottom: "conv1"
top: "slice1_1"
top: "slice1_2"
}
layer{
name: "eltwise1"
type: "Eltwise"
bottom: "slice1_1"
bottom: "slice1_2"
top: "eltwise1"
eltwise_param {
operation: MAX
}
}
layer{
name: "pool1"
type: "Pooling"
pooling_param {
pool: MAX
kernel_size: 2
stride: 2
}
bottom: "eltwise1"
top: "pool1"
}
layer{
name: "conv2a"
type: "Convolution"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 96
kernel_size: 1
stride: 1
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
value: 0.1
}
}
bottom: "pool1"
top: "conv2a"
}
layer{
name: "slice2a"
type:"Slice"
slice_param {
slice_dim: 1
}
bottom: "conv2a"
top: "slice2a_1"
top: "slice2a_2"
}
layer{
name: "eltwise2a"
type: "Eltwise"
bottom: "slice2a_1"
bottom: "slice2a_2"
top: "eltwise2a"
eltwise_param {
operation: MAX
}
}
layer{
name: "conv2"
type: "Convolution"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 192
kernel_size: 3
stride: 1
pad: 1
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
value: 0.1
}
}
bottom: "eltwise2a"
top: "conv2"
}
layer{
name: "slice2"
type:"Slice"
slice_param {
slice_dim: 1
}
bottom: "conv2"
top: "slice2_1"
top: "slice2_2"
}
layer{
name: "eltwise2"
type: "Eltwise"
bottom: "slice2_1"
bottom: "slice2_2"
top: "eltwise2"
eltwise_param {
operation: MAX
}
}
layer{
name: "pool2"
type: "Pooling"
pooling_param {
pool: MAX
kernel_size: 2
stride: 2
}
bottom: "eltwise2"
top: "pool2"
}
layer{
name: "conv3a"
type: "Convolution"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 192
kernel_size: 1
stride: 1
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
value: 0.1
}
}
bottom: "pool2"
top: "conv3a"
}
layer{
name: "slice3a"
type:"Slice"
slice_param {
slice_dim: 1
}
bottom: "conv3a"
top: "slice3a_1"
top: "slice3a_2"
}
layer{
name: "eltwise3a"
type: "Eltwise"
bottom: "slice3a_1"
bottom: "slice3a_2"
top: "eltwise3a"
eltwise_param {
operation: MAX
}
}
layer{
name: "conv3"
type: "Convolution"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 384
kernel_size: 3
stride: 1
pad: 1
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
value: 0.1
}
}
bottom: "eltwise3a"
top: "conv3"
}
layer{
name: "slice3"
type:"Slice"
slice_param {
slice_dim: 1
}
bottom: "conv3"
top: "slice3_1"
top: "slice3_2"
}
layer{
name: "eltwise3"
type: "Eltwise"
bottom: "slice3_1"
bottom: "slice3_2"
top: "eltwise3"
eltwise_param {
operation: MAX
}
}
layer{
name: "pool3"
type: "Pooling"
pooling_param {
pool: MAX
kernel_size: 2
stride: 2
}
bottom: "eltwise3"
top: "pool3"
}
layer{
name: "conv4a"
type: "Convolution"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param{
num_output: 384
kernel_size: 1
stride: 1
weight_filler{
type:"xavier"
}
bias_filler{
type: "constant"
value: 0.1
}
}
bottom: "pool3"
top: "conv4a"
}
layer{
name: "slice4a"
type:"Slice"
slice_param {
slice_dim: 1
}
bottom: "conv4a"
top: "slice4a_1"
top: "slice4a_2"
}
layer{
name: "eltwise4a"
type: "Eltwise"
bottom: "slice4a_1"
bottom: "slice4a_2"
top: "eltwise4a"
eltwise_param {
operation: MAX
}
}
layer{
name: "conv4"
type: "Convolution"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param{
num_output: 256
kernel_size: 3
stride: 1
pad: 1
weight_filler{
type:"xavier"
}
bias_filler{
type: "constant"
value: 0.1
}
}
bottom: "eltwise4a"
top: "conv4"
}
layer{
name: "slice4"
type:"Slice"
slice_param {
slice_dim: 1
}
bottom: "conv4"
top: "slice4_1"
top: "slice4_2"
}
layer{
name: "eltwise4"
type: "Eltwise"
bottom: "slice4_1"
bottom: "slice4_2"
top: "eltwise4"
eltwise_param {
operation: MAX
}
}
layer{
name: "conv5a"
type: "Convolution"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param{
num_output: 256
kernel_size: 1
stride: 1
weight_filler{
type:"xavier"
}
bias_filler{
type: "constant"
value: 0.1
}
}
bottom: "eltwise4"
top: "conv5a"
}
layer{
name: "slice5a"
type:"Slice"
slice_param {
slice_dim: 1
}
bottom: "conv5a"
top: "slice5a_1"
top: "slice5a_2"
}
layer{
name: "eltwise5a"
type: "Eltwise"
bottom: "slice5a_1"
bottom: "slice5a_2"
top: "eltwise5a"
eltwise_param {
operation: MAX
}
}
layer{
name: "conv5"
type: "Convolution"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param{
num_output: 256
kernel_size: 3
stride: 1
pad: 1
weight_filler{
type:"xavier"
}
bias_filler{
type: "constant"
value: 0.1
}
}
bottom: "eltwise5a"
top: "conv5"
}
layer{
name: "slice5"
type:"Slice"
slice_param {
slice_dim: 1
}
bottom: "conv5"
top: "slice5_1"
top: "slice5_2"
}
layer{
name: "eltwise5"
type: "Eltwise"
bottom: "slice5_1"
bottom: "slice5_2"
top: "eltwise5"
eltwise_param {
operation: MAX
}
}
layer{
name: "pool4"
type: "Pooling"
pooling_param {
pool: MAX
kernel_size: 2
stride: 2
}
bottom: "eltwise5"
top: "pool4"
}
layer{
name: "fc1"
type: "InnerProduct"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
inner_product_param {
num_output: 512
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
value: 0.1
}
}
bottom: "pool4"
top: "fc1"
}
layer{
name: "slice_fc1"
type:"Slice"
slice_param {
slice_dim: 1
}
bottom: "fc1"
top: "slice_fc1_1"
top: "slice_fc1_2"
}
layer{
name: "eltwise_fc1"
type: "Eltwise"
bottom: "slice_fc1_1"
bottom: "slice_fc1_2"
top: "eltwise_fc1"
eltwise_param {
operation: MAX
}
}
layer{
name: "drop1"
type: "Dropout"
dropout_param{
dropout_ratio: 0.7
}
bottom: "eltwise_fc1"
top: "eltwise_fc1"
}
layer{
name: "fnc2"
type: "InnerProduct"
inner_product_param{
num_output: 50
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
value: 0.1
}
}
bottom: "eltwise_fc1"
top: "fnc2"
}
layer {
name: "accuracy"
type: "Accuracy"
bottom: "fnc2"
bottom: "label"
top: "accuracy"
include: { phase: TEST }
}
layer {
name: "softmaxloss"
type: "SoftmaxWithLoss"
bottom: "fnc2"
bottom: "label"
top: "loss"
}
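As a sanity check on the definition above, the following sketch traces the feature-map size and channel count through the network and adds up the learnable parameters. The layer table is transcribed by hand from the prototxt. The input is taken to be a single-channel 128×128 crop, matching the grayscale images described in the text; that is an assumption, so pass `channels=3` if the ImageData layer loads color images. The helper names are hypothetical.

```python
def conv_out(size, kernel, stride=1, pad=0):
    """Spatial output size of a convolution."""
    return (size + 2 * pad - kernel) // stride + 1


def pool_out(size, kernel=2, stride=2):
    """Spatial output size of a pooling layer (rounded up, no padding)."""
    return -(-(size - kernel) // stride) + 1


# (name, num_output, kernel_size, pad), transcribed from the prototxt
LAYERS = [
    ("conv1", 96, 5, 2), "pool",
    ("conv2a", 96, 1, 0), ("conv2", 192, 3, 1), "pool",
    ("conv3a", 192, 1, 0), ("conv3", 384, 3, 1), "pool",
    ("conv4a", 384, 1, 0), ("conv4", 256, 3, 1),
    ("conv5a", 256, 1, 0), ("conv5", 256, 3, 1), "pool",
]


def trace(size=128, channels=1, fc1_out=512, num_classes=50):
    params = 0
    for layer in LAYERS:
        if layer == "pool":
            size = pool_out(size)
            continue
        _, num_output, kernel, pad = layer
        params += channels * num_output * kernel * kernel + num_output  # weights + biases
        size = conv_out(size, kernel, pad=pad)
        channels = num_output // 2        # Slice + Eltwise MAX halves the channels
    fc1_in = channels * size * size
    params += fc1_in * fc1_out + fc1_out  # fc1
    feat_dim = fc1_out // 2               # MFM after fc1 gives the face feature
    params += feat_dim * num_classes + num_classes  # fnc2 classifier
    return size, channels, feat_dim, params


size, channels, feat_dim, params = trace()
```

Under these assumptions the last pooling layer outputs 128 maps of 8×8, the face feature after the fc1 MFM is 256-dimensional, and fc1 accounts for most of the parameters.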