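MTCNN is one of the best-performing open-source face detection algorithms currently available. The authors released only the trained models and the MATLAB deployment code, not the training or optimization code, which has prompted many enthusiasts to reimplement it.
If you only need deployment, you can use MTCNN, which provides implementations for all major platforms, including C++, Python, ncnn and TensorFlow, as well as an accelerated version and a version that OpenCV can load directly, making it the most complete of the available versions.
If you want to understand how the algorithm works, see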
MTCNN_Step_by_Step
The training procedure in this post follows
MTCNN_Training_Step_by_Step
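It provides a mainstream PyTorch implementation.
MTCNN is an essential reference for newcomers to deep learning, and industry has adapted it as a front-end module for face recognition and facial expression analysis. The main purpose of this post is to show how to train the face detection task.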
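MTCNN combines three tasks: face/non-face classification, bounding box regression, and facial landmark localization. WIDER FACE (left figure) is used for face classification and bounding box regression. CNN_FacePoint (right figure) is used for landmark localization.
Because each network has a different task, we need to prepare data for each of them separately. The strategy here is to prepare image crops (cut either at random or with an already trained network). The following data is needed:
Face and non-face samples are used for face classification, face and part-face samples for bounding box regression, and landmark samples for landmark localization. The overall ratio is roughly 3 (non-face) : 1 (face) : 1 (part face) : 2 (landmark samples).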
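To make the 3:1:1:2 ratio concrete, here is a minimal sketch of how a mini-batch could be split across the four sample types. The function name `batch_composition` and the choice to give any rounding remainder to the non-face samples are assumptions made for illustration; the post does not say how the ratio is enforced.

```python
# Ratio stated in the text: non-face : face : part face : landmark
RATIO = {"negative": 3, "positive": 1, "part": 1, "landmark": 2}


def batch_composition(batch_size, ratio=RATIO):
    """Split batch_size across sample types according to ratio.

    Integer division can leave a few slots unassigned; this sketch
    gives them to the negatives (an arbitrary, hypothetical choice).
    """
    total = sum(ratio.values())
    counts = {name: batch_size * weight // total for name, weight in ratio.items()}
    counts["negative"] += batch_size - sum(counts.values())
    return counts


if __name__ == "__main__":
    for size in (7, 64, 128):
        print(size, batch_composition(size))
```

With a batch size that is a multiple of 7 the split is exact; otherwise the counts only approximate the ratio.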
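The overall processing pipeline is shown in the figure below. The annotation files list the image path, the label, the bounding box offsets, and the landmark offsets.
PNet: randomly crop patches from WIDER FACE to collect face, non-face, and part-face samples
RNet: use PNet to generate candidate boxes on WIDER FACE
ONet: much like RNet, but with landmark data added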
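Whether a crop comes from random sampling (PNet) or from the previous network's candidate boxes (RNet, ONet), it has to be sorted into face, part face, or non-face. A common way to do this is by intersection-over-union (IoU) with the ground-truth boxes. The sketch below uses the thresholds from the original MTCNN paper (non-face below 0.3, part face from 0.4 to 0.65, face above 0.65); this post does not state which thresholds it uses, so treat them, and the function names, as assumptions.

```python
def iou(box, gt):
    """IoU of two boxes given as (x1, y1, x2, y2) corner coordinates."""
    ix1, iy1 = max(box[0], gt[0]), max(box[1], gt[1])
    ix2, iy2 = min(box[2], gt[2]), min(box[3], gt[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_box = (box[2] - box[0]) * (box[3] - box[1])
    area_gt = (gt[2] - gt[0]) * (gt[3] - gt[1])
    union = area_box + area_gt - inter
    return inter / union if union > 0 else 0.0


def label_crop(crop, gt_boxes, neg_thr=0.3, part_thr=0.4, pos_thr=0.65):
    """Return 'positive', 'part', 'negative', or None for a crop.

    Thresholds default to the values in the MTCNN paper (an assumption
    here). Crops whose best IoU falls between neg_thr and part_thr are
    ambiguous and discarded (None).
    """
    best = max((iou(crop, gt) for gt in gt_boxes), default=0.0)
    if best >= pos_thr:
        return "positive"
    if best >= part_thr:
        return "part"
    if best < neg_thr:
        return "negative"
    return None


if __name__ == "__main__":
    faces = [(0, 0, 10, 10)]
    for crop in [(0, 0, 10, 10), (4, 0, 14, 10), (5, 0, 15, 10), (20, 20, 30, 30)]:
        print(crop, label_crop(crop, faces))
```

The gap between the non-face and part-face thresholds keeps borderline crops out of the training set, so the classifier is not trained on samples whose label is debatable.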
Loss function
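The Caffe logs later in this post show how the three task losses are combined: each one is multiplied by a weight and the results are summed. The weights below (classification 1, bounding box 0.5, landmark 0.5 for pnet and 1 for rnet and onet) are read from those log lines. The function name `total_loss` is hypothetical, and the form of each individual loss is not covered here.

```python
def total_loss(cls_loss, bbox_loss, landmark_loss,
               w_cls=1.0, w_bbox=0.5, w_landmark=0.5):
    """Weighted sum of the three MTCNN task losses."""
    return w_cls * cls_loss + w_bbox * bbox_loss + w_landmark * landmark_loss


# Iteration-0 values copied from the training logs in this post.
pnet_loss = total_loss(0.366912, 0.0405373, 0.412462, w_landmark=0.5)
rnet_loss = total_loss(0.390344, 0.0379053, 0.578617, w_landmark=1.0)
onet_loss = total_loss(0.342851, 0.0412246, 0.45546, w_landmark=1.0)

if __name__ == "__main__":
    # Logged totals: 0.593411 (pnet), 0.987914 (rnet), 0.818923 (onet)
    print(pnet_loss, rnet_loss, onet_loss)
```

The recomputed sums agree with the logged `loss` values up to the rounding of the printed log numbers.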
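Implementations in other frameworks:
https://github.com/luoyetx/Joint-Face-Detection-and-Alignment: clean code with thorough detail
Detailed MTCNN configuration and training steps
https://github.com/Aliang-SEU/MTCNN-on-caffe-window: Caffe implementation for Windows, built on a custom data layer
https://github.com/BobLiu20/mtcnn_tf: TensorFlow training implementation; a cleaned-up version of https://github.com/AITTSMD/MTCNN-Tensorflow with the redundant files removed
https://github.com/wangbm/MTCNN-Tensorflow:
https://github.com/blaueck/tf-mtcnn: merges the three networks into a single pb file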
Datasets:
wider
[2016-11-30 21:20:48,380][INFO] total images, train: 12797, val: 3196
[2016-11-30 21:20:48,382][INFO] total faces, train: 97311, val: 24101
[2016-11-30 21:22:56,825][INFO] writes 216096 positives, 405760 negatives, 238143 part
celeba
[2016-11-30 21:23:08,400][INFO] total images, train: 162079, val: 40520
[2016-11-30 21:23:08,400][INFO] writing train data, 162079 images
[2016-11-30 21:31:17,145][INFO] writes 405000 landmark faces
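As a quick sanity check, the sample counts in the logs above can be compared with the 3:1:1:2 ratio mentioned earlier. The snippet only normalizes the logged counts by the number of positives. Any conclusion about where the ratio is enforced (for example at batch sampling time rather than at crop generation) would be an inference, since the post does not say.

```python
# Counts copied from the data generation logs above.
counts = {
    "negative": 405760,
    "positive": 216096,
    "part": 238143,
    "landmark": 405000,
}


def relative_to(counts, key="positive"):
    """Express every count as a multiple of counts[key]."""
    base = counts[key]
    return {name: n / base for name, n in counts.items()}


ratios = relative_to(counts)

if __name__ == "__main__":
    for name, r in ratios.items():
        print(f"{name}: {r:.2f}")
```

The generated data comes out at roughly 1.9 : 1 : 1.1 : 1.9, so there are fewer negatives per positive on disk than the nominal ratio suggests.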
Networks:
pnet
Training took 20 minutes
test_iter: 1557
test_interval: 6278
base_lr: 0.05
display: 500
max_iter: 125560
lr_policy: "step"
gamma: 0.1
momentum: 0.9
weight_decay: 0.0005
stepsize: 31390
snapshot: 6278
Namespace(epoch=20, gpu=0, lr=0.05, lrp=5, lrw=0.1, net='p', size=128, snapshot=None)
I1130 21:31:22.283428 11185 solver.cpp:228] Iteration 0, loss = 0.593411
I1130 21:31:22.283479 11185 solver.cpp:244] Train net output #0: bbox_reg_loss = 0.0405373 (* 0.5 = 0.0202686 loss)
I1130 21:31:22.283486 11185 solver.cpp:244] Train net output #1: face_cls_loss = 0.366912 (* 1 = 0.366912 loss)
I1130 21:31:22.283490 11185 solver.cpp:244] Train net output #2: face_cls_neg_acc = 0.921875
I1130 21:31:22.283501 11185 solver.cpp:244] Train net output #3: face_cls_pos_acc = 0.0390625
I1130 21:31:22.283506 11185 solver.cpp:244] Train net output #4: landmark_reg_loss = 0.412462 (* 0.5 = 0.206231 loss)
I1130 21:42:35.476652 11185 solver.cpp:337] Iteration 69058, Testing net (#0)
I1130 21:42:47.591168 11185 solver.cpp:404] Test net output #0: bbox_reg_loss = 0.0133936 (* 0.5 = 0.00669681 loss)
I1130 21:42:47.591210 11185 solver.cpp:404] Test net output #1: face_cls_loss = 0.106217 (* 1 = 0.106217 loss)
I1130 21:42:47.591215 11185 solver.cpp:404] Test net output #2: face_cls_neg_acc = 0.966099
I1130 21:42:47.591219 11185 solver.cpp:404] Test net output #3: face_cls_pos_acc = 0.816118
I1130 21:42:47.591225 11185 solver.cpp:404] Test net output #4: landmark_reg_loss = 0.00465923 (* 0.5 = 0.00232961 loss)
I1130 21:51:50.719760 11185 solver.cpp:337] Iteration 125560, Testing net (#0)
I1130 21:52:02.954237 11185 solver.cpp:404] Test net output #0: bbox_reg_loss = 0.0132568 (* 0.5 = 0.00662839 loss)
I1130 21:52:02.954282 11185 solver.cpp:404] Test net output #1: face_cls_loss = 0.105307 (* 1 = 0.105307 loss)
I1130 21:52:02.954288 11185 solver.cpp:404] Test net output #2: face_cls_neg_acc = 0.969522
I1130 21:52:02.954290 11185 solver.cpp:404] Test net output #3: face_cls_pos_acc = 0.807543
I1130 21:52:02.954295 11185 solver.cpp:404] Test net output #4: landmark_reg_loss = 0.0046395 (* 0.5 = 0.00231975 loss)
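The solver settings above use Caffe's `step` learning rate policy, which multiplies `base_lr` by `gamma` once every `stepsize` iterations. The numbers also suggest how the `Namespace` arguments map to the solver: `max_iter` equals `epoch` times `test_interval`, and `stepsize` equals `lrp` times `test_interval`. That mapping is inferred from the values in this post, not taken from the training script, and the helper names below are hypothetical.

```python
def step_lr(iteration, base_lr, gamma, stepsize):
    """Caffe 'step' policy: base_lr * gamma ^ floor(iteration / stepsize)."""
    return base_lr * gamma ** (iteration // stepsize)


def solver_from_args(epoch, lrp, iters_per_epoch):
    """Derive solver fields from command-line style arguments.

    Assumes one test and one snapshot per epoch, as the pnet, rnet and
    onet settings in this post appear to do.
    """
    return {
        "max_iter": epoch * iters_per_epoch,
        "stepsize": lrp * iters_per_epoch,
        "test_interval": iters_per_epoch,
        "snapshot": iters_per_epoch,
    }


if __name__ == "__main__":
    pnet = solver_from_args(epoch=20, lrp=5, iters_per_epoch=6278)
    print(pnet)
    for it in (0, 31390, 62780, 94170, 125559):
        print(it, step_lr(it, base_lr=0.05, gamma=0.1, stepsize=pnet["stepsize"]))
```

For pnet this gives four learning rate stages over 20 epochs, each one tenth of the previous, starting at 0.05.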
rnet
Training took 40 minutes
test_iter: 1679
test_interval: 6810
base_lr: 0.01
display: 500
max_iter: 272400
lr_policy: "step"
gamma: 0.1
momentum: 0.9
weight_decay: 0.0005
stepsize: 68100
snapshot: 6810
snapshot_prefix: "tmp/rnet"
solver_mode: GPU
net: "proto/r_train_val.prototxt"
test_initialization: false
average_loss: 500
Namespace(epoch=40, gpu=0, lr=0.01, lrp=10, lrw=0.1, net='r', size=64, snapshot=None)
I1130 22:09:35.275794 11845 solver.cpp:228] Iteration 0, loss = 0.987914
I1130 22:09:35.275873 11845 solver.cpp:244] Train net output #0: bbox_reg_loss = 0.0379053 (* 0.5 = 0.0189526 loss)
I1130 22:09:35.275892 11845 solver.cpp:244] Train net output #1: face_cls_loss = 0.390344 (* 1 = 0.390344 loss)
I1130 22:09:35.275903 11845 solver.cpp:244] Train net output #2: face_cls_neg_acc = 0.473958
I1130 22:09:35.275913 11845 solver.cpp:244] Train net output #3: face_cls_pos_acc = 0.6875
I1130 22:09:35.275925 11845 solver.cpp:244] Train net output #4: landmark_reg_loss = 0.578617 (* 1 = 0.578617 loss)
I1130 22:12:33.694891 11845 solver.cpp:337] Iteration 20430, Testing net (#0)
I1130 22:12:44.435087 11845 solver.cpp:404] Test net output #0: bbox_reg_loss = 0.00908341 (* 0.5 = 0.00454171 loss)
I1130 22:12:44.435127 11845 solver.cpp:404] Test net output #1: face_cls_loss = 0.0609873 (* 1 = 0.0609873 loss)
I1130 22:12:44.435133 11845 solver.cpp:404] Test net output #2: face_cls_neg_acc = 0.99416
I1130 22:12:44.435137 11845 solver.cpp:404] Test net output #3: face_cls_pos_acc = 0.869091
I1130 22:12:44.435142 11845 solver.cpp:404] Test net output #4: landmark_reg_loss = 0.00228393 (* 1 = 0.00228393 loss)
I1130 22:50:48.909255 11845 solver.cpp:337] Iteration 272400, Testing net (#0)
I1130 22:50:59.960093 11845 solver.cpp:404] Test net output #0: bbox_reg_loss = 0.0081554 (* 0.5 = 0.0040777 loss)
I1130 22:50:59.960135 11845 solver.cpp:404] Test net output #1: face_cls_loss = 0.0447745 (* 1 = 0.0447745 loss)
I1130 22:50:59.960140 11845 solver.cpp:404] Test net output #2: face_cls_neg_acc = 0.993844
I1130 22:50:59.960144 11845 solver.cpp:404] Test net output #3: face_cls_pos_acc = 0.911061
I1130 22:50:59.960149 11845 solver.cpp:404] Test net output #4: landmark_reg_loss = 0.00189443 (* 1 = 0.00189443 loss)
onet
Training took 100 minutes
test_iter: 1042
test_interval: 4299
base_lr: 0.01
display: 500
max_iter: 171960
lr_policy: "step"
gamma: 0.1
momentum: 0.9
weight_decay: 0.0005
stepsize: 42990
snapshot: 4299
snapshot_prefix: "tmp/onet"
solver_mode: GPU
net: "proto/o_train_val.prototxt"
test_initialization: false
Namespace(epoch=40, gpu=0, lr=0.01, lrp=10, lrw=0.1, net='o', size=64, snapshot=None)
I1130 23:18:21.946321 12698 solver.cpp:228] Iteration 0, loss = 0.818923
I1130 23:18:21.946379 12698 solver.cpp:244] Train net output #0: bbox_reg_loss = 0.0412246 (* 0.5 = 0.0206123 loss)
I1130 23:18:21.946389 12698 solver.cpp:244] Train net output #1: face_cls_loss = 0.342851 (* 1 = 0.342851 loss)
I1130 23:18:21.946395 12698 solver.cpp:244] Train net output #2: face_cls_neg_acc = 0.664062
I1130 23:18:21.946401 12698 solver.cpp:244] Train net output #3: face_cls_pos_acc = 0.421875
I1130 23:18:21.946408 12698 solver.cpp:244] Train net output #4: landmark_reg_loss = 0.45546 (* 1 = 0.45546 loss)
I1130 23:37:50.175259 12698 solver.cpp:337] Iteration 34392, Testing net (#0)
I1130 23:38:15.145500 12698 solver.cpp:404] Test net output #0: bbox_reg_loss = 0.00609372 (* 0.5 = 0.00304686 loss)
I1130 23:38:15.145556 12698 solver.cpp:404] Test net output #1: face_cls_loss = 0.0667225 (* 1 = 0.0667225 loss)
I1130 23:38:15.145562 12698 solver.cpp:404] Test net output #2: face_cls_neg_acc = 0.967708
I1130 23:38:15.145567 12698 solver.cpp:404] Test net output #3: face_cls_pos_acc = 0.931667
I1130 23:38:15.145572 12698 solver.cpp:404] Test net output #4: landmark_reg_loss = 0.00156244 (* 1 = 0.00156244 loss)
I1201 00:57:01.713709 12698 solver.cpp:337] Iteration 171960, Testing net (#0)
I1201 00:57:27.383432 12698 solver.cpp:404] Test net output #0: bbox_reg_loss = 0.00565168 (* 0.5 = 0.00282584 loss)
I1201 00:57:27.383491 12698 solver.cpp:404] Test net output #1: face_cls_loss = 0.0569045 (* 1 = 0.0569045 loss)
I1201 00:57:27.383497 12698 solver.cpp:404] Test net output #2: face_cls_neg_acc = 0.976772
I1201 00:57:27.383502 12698 solver.cpp:404] Test net output #3: face_cls_pos_acc = 0.937635
I1201 00:57:27.383507 12698 solver.cpp:404] Test net output #4: landmark_reg_loss = 0.00137772 (* 1 = 0.00137772 loss)
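To compare the three networks without reading the raw logs line by line, the test metrics can be pulled out with a small parser. This is a minimal sketch that relies only on the line shapes visible in the logs above (`Iteration N, Testing net` followed by `Test net output #k: name = value`). The function name and the sample lines chosen are for illustration.

```python
import re

ITER_RE = re.compile(r"Iteration (\d+), Testing net")
OUTPUT_RE = re.compile(r"Test net output #\d+: (\w+) = ([0-9.eE+-]+)")

# A few lines copied from the onet log above.
SAMPLE = [
    "I1201 00:57:01.713709 12698 solver.cpp:337] Iteration 171960, Testing net (#0)",
    "I1201 00:57:27.383432 12698 solver.cpp:404] Test net output #0: bbox_reg_loss = 0.00565168 (* 0.5 = 0.00282584 loss)",
    "I1201 00:57:27.383497 12698 solver.cpp:404] Test net output #2: face_cls_neg_acc = 0.976772",
    "I1201 00:57:27.383502 12698 solver.cpp:404] Test net output #3: face_cls_pos_acc = 0.937635",
]


def parse_test_metrics(lines):
    """Return {iteration: {metric_name: value}} for every test pass."""
    results = {}
    current = None
    for line in lines:
        match = ITER_RE.search(line)
        if match:
            current = int(match.group(1))
            results[current] = {}
            continue
        match = OUTPUT_RE.search(line)
        if match and current is not None:
            results[current][match.group(1)] = float(match.group(2))
    return results


if __name__ == "__main__":
    for iteration, metrics in parse_test_metrics(SAMPLE).items():
        print(iteration, metrics)
```

Only the raw metric value is kept; the weighted value in parentheses can be recomputed from the loss weights if needed.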