【CASIA-SURF】《A Dataset and Benchmark for Large-scale Multi-modal Face Anti-spoofing》



This dataset was used in the ChaLearn Face Anti-spoofing Attack Detection Challenge@CVPR2019.


1st: 【FAS-FRN】《Recognizing Multi-modal Face Spoofing with Face Recognition Networks》 (face-recognition pre-training + an ensemble of 20+ models + a multi-level feature aggregation module)

2nd: 【FaceBagNet】《FaceBagNet: Bag-of-local-features Model for Multi-modal Face Anti-spoofing》 (patch input + erase fusion)

3rd: 【FeatherNets】《FeatherNets: Convolutional Neural Networks as Light as Feather for Face Anti-spoofing》 (Streaming Module, plus an ensemble + cascade style of fusion)

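For details on each team, see the article "CVPR2019 | Face Anti-spoofing Detection Challenge: a Russian startup takes first place, Chinese and US companies come second and third (with papers, code, and an analysis of the competing models)".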


  • 1 Background and Motivation
  • 2 Related Work
  • 3 Advantages / Contributions
  • 4 Datasets
    • 4.1 Acquisition details
    • 4.2 Data preprocessing
    • 4.3 Statistics
    • 4.4 Evaluation protocol
  • 5 Method
    • 5.1 Naive halfway fusion
    • 5.2 Squeeze and excitation fusion
  • 6 Experiments
    • 6.1 Model analysis
    • 6.2 Dataset analysis
    • 6.3 Generalization capability
  • 7 Conclusion(own)

1 Background and Motivation

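With the progress of CNNs, face recognition has made it into real products, e.g. phone unlock, access control, and face payment. However, face recognition systems are easily fooled by various attacks, e.g. print attacks, video replay attacks, and 2D / 3D mask attacks. Face presentation attack detection (PAD) is therefore an important step in keeping a face recognition system secure.

Recently PAD algorithms have reached good performance, and part of that success is owed to the creation of face anti-spoofing datasets. Still, the existing face anti-spoofing datasets are appetizers; they cannot compare with the full banquet offered by image classification and face recognition datasets.

So the authors built the largest face anti-spoofing dataset to date: the most subjects, the most videos, and three modalities (RGB / Depth / IR), with the goal of pushing face anti-spoofing forward. Table 1 of the paper compares it with the other public datasets.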

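On top of this, the authors take a practical view (what matters most is the false positive rate, FPR, i.e. accepting a fake face as real, which is the most damaging error) and introduce the receiver operating characteristic (ROC) curve, with TPR on the vertical axis and FPR on the horizontal axis, as an evaluation metric.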



  • APCER: attack presentation classification error rate (error rate on fake samples), $\frac{FP}{FP+TN}$
  • BPCER: bona fide presentation classification error rate (error rate on real samples), $\frac{FN}{TP+FN}$
  • ACER: average classification error rate (the mean of APCER and BPCER), $\frac{\frac{FP}{FP+TN} + \frac{FN}{TP+FN}}{2}$
  • HTER: half total error rate (half the sum of the error rates on real and on fake faces; same form as ACER)
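
As a quick check on the definitions above, here is a minimal Python sketch that computes the three rates from confusion-matrix counts. It assumes the convention the formulas imply: real (live) faces are the positive class, so FP counts attacks accepted as real and FN counts real faces that were rejected. The function name and the example counts are made up for illustration and are not results from the paper.

```python
def pad_error_rates(tp, fp, tn, fn):
    """Return (APCER, BPCER, ACER) from confusion-matrix counts.

    Assumed convention: positive = real (live) face, negative = attack.
    """
    apcer = fp / (fp + tn)      # fraction of attacks accepted as real
    bpcer = fn / (tp + fn)      # fraction of real faces rejected
    acer = (apcer + bpcer) / 2  # mean of the two error rates
    return apcer, bpcer, acer


# Hypothetical counts, chosen only to exercise the function.
apcer, bpcer, acer = pad_error_rates(tp=90, fp=5, tn=95, fn=10)
print(apcer, bpcer, acer)
```

Since HTER is written with the same expression in these notes, the third return value can also be read as HTER under this convention.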

2 Related Work

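  • Datasets
    Table 1 already lists them in detail. The existing datasets share two common limitations
    • the number of subjects and samples is limited, so PAD algorithms easily overfit on them
    • most contain only the RGB modality, so methods trained on them tend to fail on new types of PAs (3D and custom-made silicone masks)
  • Methods
    • Traditional methods, which rely on eye-blinking, context information, motion information, the HSV and YCbCr color spaces, the Fourier spectrum, and score-level or feature-level fusion
    • CNN-based methods (treated as a binary classification problem)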

3 Advantages / Contributions

  • Built and released a large-scale multi-modal (RGB / IR / Depth) dataset for face anti-spoofing: the CASIA-SURF dataset
  • Designed a multi-modal face liveness detection network for CASIA-SURF and conducted extensive experiments

4 Datasets



There are 6 types of photo attacks, which vary e.g. the cropping, the bending of the printed paper, and the stand-off distance.

The table in 【FAS-FRN】《Recognizing Multi-modal Face Spoofing with Face Recognition Networks》 summarizes them well: Surface describes how the surface of the paper is bent, Eyes means the eye region of the print is cut out so that the real person's eyes behind it show through, and Nose and Mouth work the same way.

4.1 Acquisition details


Camera: Intel RealSense SR300
Captured image size: 1280×720 for RGB, 640×480 for Depth, IR and aligned images

4.2 Data preprocessing



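Figure 4, column 1 to column 2: detect the face (a rectangular region) with Dlib (《Dlib-ml: A machine learning toolkit》)

Figure 4, column 2 to column 3: use PRNet (《Joint 3d face reconstruction and dense alignment with position map regression network》) to obtain the accurate face area (the face reconstruction area)

Figure 4, column 3 to column 4: generate a mask

Figure 4, column 4 to column 5: combine the face rectangle from Dlib with the mask and crop out the region that contains only the face outline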
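
The last step (rectangle + mask) can be sketched with plain NumPy. This is only an illustration under assumptions: the face rectangle and the binary face mask are taken as given inputs (in the paper they come from Dlib and PRNet, which are not called here), and the function name, argument layout, and toy arrays are all hypothetical.

```python
import numpy as np


def crop_face_with_mask(image, box, face_mask):
    """Crop `box` from `image` and zero out pixels outside `face_mask`.

    image:     H x W x C array
    box:       (top, bottom, left, right) face rectangle, e.g. from a detector
    face_mask: H x W array, nonzero where the reconstructed face area is
    """
    top, bottom, left, right = box
    crop = image[top:bottom, left:right]
    mask = face_mask[top:bottom, left:right] > 0
    return crop * mask[..., None]  # broadcast the 2D mask over the channels


# Toy 6x6 "image" with a hypothetical rectangle and mask.
image = np.full((6, 6, 3), 200, dtype=np.uint8)
face_mask = np.zeros((6, 6), dtype=np.uint8)
face_mask[2:4, 2:4] = 1
face = crop_face_with_mask(image, box=(1, 5, 1, 5), face_mask=face_mask)
print(face.shape)
```

Pixels inside the rectangle but outside the mask become zero, so only the face area survives the crop.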

4.3 Statistics

The authors sample one frame out of every 10 from the original videos. The gender and age distributions are shown in Figure 5.


All subjects are Chinese.

4.4 Evaluation protocol


Live faces and Attacks 4, 5, 6 are used for training

Live faces and Attacks 1, 2, 3 are used for validation and testing

All of the data in this setting comes from CASIA-SURF itself
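
The attack-type part of this protocol is easy to write down as a filter. The sketch below assumes a hypothetical labelling in which attack type 0 means a live face and 1 to 6 are the six attacks. How subjects are divided between the splits is outside this sketch, so the same toy id can show up in two splits here.

```python
# Attack types allowed in each split; live faces are allowed everywhere.
ALLOWED_ATTACKS = {
    "train": {4, 5, 6},
    "val": {1, 2, 3},
    "test": {1, 2, 3},
}
LIVE = 0  # hypothetical label for live faces


def in_split(split, attack_type):
    """True if a sample with this attack type may appear in `split`."""
    return attack_type == LIVE or attack_type in ALLOWED_ATTACKS[split]


# (hypothetical sample id, attack type)
samples = [("a", 0), ("b", 2), ("c", 5)]
train_ids = [sid for sid, atk in samples if in_split("train", atk)]
test_ids = [sid for sid, atk in samples if in_split("test", atk)]
print(train_ids, test_ids)
```

The point of the protocol is that the attack types seen at test time never appear during training.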


Pre-train on CASIA-SURF, then fine-tune and test on other datasets

5 Method

5.1 Naive halfway fusion

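The three modalities first go solo, each through its own branch; after a certain network stage they team up and their features are fused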


However, directly concatenating these features cannot make full use of the characteristics of the different modalities
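
For reference, "directly concatenating" here amounts to stacking the per-modality feature maps along the channel axis. A minimal NumPy sketch, with made-up feature shapes (8 channels, 4×4 spatial) that are assumptions and not the paper's sizes:

```python
import numpy as np


def halfway_fusion(feat_rgb, feat_depth, feat_ir):
    """Concatenate per-modality feature maps (C x H x W) along channels."""
    return np.concatenate([feat_rgb, feat_depth, feat_ir], axis=0)


rng = np.random.default_rng(0)
# Random stand-ins for the features produced by the three branches.
feats = [rng.standard_normal((8, 4, 4)) for _ in range(3)]
fused = halfway_fusion(*feats)
print(fused.shape)  # (24, 4, 4)
```

Every channel enters the fused tensor with the same weight, which is exactly the limitation the sentence above points at.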

5.2 Squeeze and excitation fusion

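Faced with different types of PAs, the three modalities complement each other

  • RGB: rich appearance details
  • Depth: sensitive to the distance between the image plane and the corresponding face
  • IR: measures the amount of heat radiated from a face

The authors borrow from 【SENet】《Squeeze-and-Excitation Networks》 to fuse the information extracted from the three modalities, instead of simply concatenating the features extracted from them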


The squeeze and excitation fusion module performs modal-dependent feature re-weighting to select the more informative channel features while suppressing less useful features from each modality
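
To make the re-weighting concrete, here is a NumPy sketch of the generic SENet recipe applied per modality before concatenation. It is an illustration only: the channel count, the reduction ratio `r`, the random (untrained) weights, and the function names are assumptions, and these notes do not give the exact layer sizes or placement used in the paper's network.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def se_reweight(feat, w1, w2):
    """Squeeze-and-excitation re-weighting of one C x H x W feature map."""
    squeezed = feat.mean(axis=(1, 2))        # squeeze: global average pool -> (C,)
    hidden = np.maximum(w1 @ squeezed, 0.0)  # excitation: FC + ReLU -> (C // r,)
    scale = sigmoid(w2 @ hidden)             # FC + sigmoid -> (C,), each in (0, 1)
    return feat * scale[:, None, None]       # rescale every channel


def se_fusion(feats, weights):
    """Re-weight each modality with its own SE weights, then concatenate."""
    reweighted = [se_reweight(f, w1, w2) for f, (w1, w2) in zip(feats, weights)]
    return np.concatenate(reweighted, axis=0)


rng = np.random.default_rng(0)
C, r = 8, 4  # assumed channel count and reduction ratio
feats = [rng.standard_normal((C, 4, 4)) for _ in range(3)]  # RGB, Depth, IR stand-ins
weights = [
    (rng.standard_normal((C // r, C)), rng.standard_normal((C, C // r)))
    for _ in range(3)
]  # random, untrained weights, one pair per modality
fused = se_fusion(feats, weights)
print(fused.shape)  # (24, 4, 4)
```

Each modality gets its own pair of weights, so a channel that is informative in Depth can be kept while the corresponding RGB channel is damped.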

6 Experiments

  • face region: 112 × 112

  • data augmentation: random flipping, rotation, resizing, cropping and color distortion

6.1 Model analysis

Halfway fusion is simply Figure 6 with the SE fusion module removed, and SE fusion really does bring a sizable gain

6.2 Dataset analysis

1) Effect of the number of modalities


2) Effect of the number of subjects

As described in《Revisiting unreasonable effectiveness of data in deep learning era》(ICCV-2017), there is a logarithmic relation between the amount of training data and the performance of deep neural network methods


Figure 7 shows the ROC curves for different numbers of training subjects (50, 100, 200, 300), and Figure 8 shows the corresponding ACER (lower is better)

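You can still see that while the network has not eaten its fill, feeding it more (data) helps quite a bit. By the way, isn't the log relation easier to see when plotted the way I did it? Haha

[Figure: the blog author's own replot of the results above]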
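
One way to check for a logarithmic relation is to fit y = a + b·ln(n), which becomes a straight line once the x-axis is put on a log scale. The sketch below does a plain least-squares fit with the standard library. The numbers are synthetic, generated from the formula itself so the fit can be verified; they are NOT the paper's ACER values, and the function name is made up.

```python
import math


def fit_log_relation(ns, ys):
    """Least-squares fit of y = a + b * ln(n); returns (a, b)."""
    xs = [math.log(n) for n in ns]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    b = cov / var
    a = mean_y - b * mean_x
    return a, b


# Synthetic values generated from y = 10 - 1.5 * ln(n); not real results.
subjects = [50, 100, 200, 300]
errors = [10 - 1.5 * math.log(n) for n in subjects]
a, b = fit_log_relation(subjects, errors)
print(round(a, 6), round(b, 6))
```

With real measurements, how far the points stray from the fitted line shows how well the log relation holds.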

6.3 Generalization capability

Pre-train on CASIA-SURF, then fine-tune on other datasets and measure the results there, to verify generalization

1) SiW dataset

Pre-train the FAS-TD-SF model (which makes use of depth information) on the RGB and Depth data of CASIA-SURF, then fine-tune and test on the SiW dataset (RGB only)

2) CASIA-MFSD dataset

HTER is the half total error rate: half the sum of the error rates on real and on fake faces

$HTER = \frac{\frac{FP}{FP+TN} + \frac{FN}{TP+FN}}{2}$

Reference: "Interpreting HTER (half total error rate), an evaluation metric for liveness detection"

7 Conclusion(own)

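  • CASIA: Institute of Automation, Chinese Academy of Sciences
  • Not much to add; papers written by Chinese authors do read rather smoothly
  • Multi-modal input, fused with SE attention plus concatenation (halfway fusion); overall it looks pretty decent
  • Brushed up on ROC once more; after a while I always forget it! Note that each threshold corresponds to a different confusion matrix, which in turn gives one point on the ROC curve (see "ROC and AUC: how they are computed and why")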
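
The threshold-to-point mapping in the last bullet can be spelled out in a few lines of plain Python. The scores, labels, and thresholds below are toy values invented for the example. The convention (higher score = more likely real, label 1 = real) is an assumption that matches the FP / FN usage earlier in these notes.

```python
def roc_points(scores, labels, thresholds):
    """Return one (FPR, TPR) point per threshold.

    scores: higher means "more likely real"; labels: 1 = real, 0 = attack.
    """
    points = []
    for t in thresholds:
        # Each threshold induces its own confusion matrix.
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        fn = sum(1 for s, y in zip(scores, labels) if s < t and y == 1)
        tn = sum(1 for s, y in zip(scores, labels) if s < t and y == 0)
        points.append((fp / (fp + tn), tp / (tp + fn)))
    return points


# Toy scores and labels, invented for illustration.
scores = [0.9, 0.8, 0.6, 0.4, 0.3, 0.1]
labels = [1, 1, 0, 1, 0, 0]
thresholds = [0.2, 0.5, 0.7]
points = roc_points(scores, labels, thresholds)
for t, (fpr, tpr) in zip(thresholds, points):
    print(t, fpr, tpr)
```

Sweeping the threshold from low to high walks the curve from the top-right corner toward the bottom-left one.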
