Study Notes: Coarse-to-Fine Auto-Encoder Networks (CFAN) for Real-Time Face Alignment











In general, existing face alignment methods can be divided into holistic-feature-based methods [7,21,14,34,19,6] and local-feature-based methods [8,10,15,23,9,25,35,32,31,2,28,11].


Fig. 1. Overview of our Coarse-to-Fine Auto-encoder Networks (CFAN) for real-time face alignment. $H_1$, $H_2$ are hidden layers. Through function $F_\Phi$, the joint local features $\Phi(S_i)$ are extracted around facial landmarks of current shape $S_i$.
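
As a rough illustration of what extracting joint local features around the current landmarks could look like, the sketch below crops a square patch around each landmark and concatenates the patches into one vector. Raw pixel patches, the `patch_size` parameter, and the function name `joint_local_features` are assumptions made for illustration; these notes do not say which descriptor $F_\Phi$ actually uses.

```python
import numpy as np

def joint_local_features(image, shape, patch_size=8):
    """Concatenate square pixel patches cropped around each landmark.

    image: 2-D grayscale array of shape (H, W).
    shape: array of shape (n_landmarks, 2) holding (x, y) positions.
    Raw pixels are a hypothetical stand-in for the real local descriptor.
    """
    half = patch_size // 2
    # Pad with edge values so patches near the border keep their full size.
    padded = np.pad(image, half, mode="edge")
    feats = []
    for x, y in np.round(shape).astype(int):
        # Clamp to the image so a stray landmark cannot index out of range.
        x = int(np.clip(x, 0, image.shape[1] - 1))
        y = int(np.clip(y, 0, image.shape[0] - 1))
        # Pixel (y, x) of the original sits at (y + half, x + half) in `padded`,
        # so this slice covers original rows y - half .. y - half + patch_size - 1.
        patch = padded[y:y + patch_size, x:x + patch_size]
        feats.append(patch.ravel())
    return np.concatenate(feats)
```

The resulting vector has `n_landmarks * patch_size**2` entries and depends on all landmarks at once, which is one plausible reading of "joint" local features.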


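On a desktop with an Intel i7-3770 (3.4 GHz CPU), the authors' method (implemented in Matlab) takes about 23 ms per image to predict 68 facial landmarks, not counting face detection time.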

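Suppose we have a face image $x \in \mathbb{R}^d$ with $d$ pixels, and let $S_g(x) \in \mathbb{R}^p$ denote the ground-truth positions of the $p$ landmarks. Facial landmark detection then amounts to learning a mapping function $F$ from the image to the face shape:

   $F : S \leftarrow x.$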





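Here $F = \{f_1, f_2, \ldots, f_k\}$, where $f_i$ is the mapping function of the $i$-th layer of the deep network, $\sigma$ is the sigmoid function, and $a_i$ is the feature representation of the $i$-th layer.
However, the output of the sigmoid function lies in $[0, 1]$, which does not match the range of the landmark positions, so the last layer $f_k$ uses linear regression to obtain an accurate shape estimate $S_0$.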
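
The layer structure described above can be sketched in a few lines of NumPy: sigmoid hidden layers followed by a linear last layer that outputs the shape estimate $S_0$. The parameter names (`W`, `b`), the layer sizes, and the random initialization are assumptions made for illustration, not values taken from the paper.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def init_params(layer_sizes, seed=0):
    """Create one (W, b) pair per layer; small random weights are an assumption."""
    rng = np.random.default_rng(seed)
    return [(rng.normal(0.0, 0.01, size=(n_out, n_in)), np.zeros(n_out))
            for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]

def forward(x, params):
    """Map an image vector x to a shape estimate S_0.

    Layers f_1 .. f_{k-1} use a sigmoid; the last layer f_k is linear so the
    output is not squashed into [0, 1].
    """
    a = x  # a_0 = x
    activations = []
    for W, b in params[:-1]:
        a = sigmoid(W @ a + b)  # a_i = f_i(a_{i-1})
        activations.append(a)
    W_k, b_k = params[-1]
    s0 = W_k @ a + b_k  # linear regression layer
    return s0, activations

# Hypothetical sizes: 64 input pixels, two hidden layers, 10 output coordinates.
params = init_params([64, 32, 16, 10])
s0, hidden = forward(np.ones(64), params)
```

Keeping the last layer linear is what lets the output take arbitrary coordinate values, while every hidden activation stays strictly between 0 and 1.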




For the $i$-th layer, pre-training is carried out by optimizing the following objective:


The output of each hidden layer serves as the input to the next layer. For the first layer, $a_0 = x$.
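
The chaining of layer inputs during layer-wise pre-training can be sketched as follows. Because the per-layer objective is not reproduced in these notes, the sketch takes the per-layer training step as a caller-supplied function; `greedy_pretrain`, `pretrain_layer`, and `random_init_layer` are hypothetical names, and `random_init_layer` performs no optimization at all (it only stands in for a real training step).

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def greedy_pretrain(X, hidden_sizes, pretrain_layer):
    """Pre-train hidden layers one at a time.

    X: array of shape (n_samples, d), one image vector per row.
    pretrain_layer: callable (a_prev, n_hidden) -> (W, b); the actual
    objective is left to the caller because the notes do not give it.
    Returns the per-layer parameters and the input each layer was given.
    """
    a = X  # a_0 = x
    params, layer_inputs = [], []
    for n_hidden in hidden_sizes:
        layer_inputs.append(a)
        W, b = pretrain_layer(a, n_hidden)
        params.append((W, b))
        # The output of this hidden layer becomes the input of the next one.
        a = sigmoid(a @ W.T + b)
    return params, layer_inputs

def random_init_layer(a_prev, n_hidden, seed=0):
    """Placeholder step: returns small random weights, no optimization."""
    rng = np.random.default_rng(seed)
    W = rng.normal(0.0, 0.01, size=(n_hidden, a_prev.shape[1]))
    return W, np.zeros(n_hidden)

X = np.ones((5, 20))
params, layer_inputs = greedy_pretrain(X, [8, 4], random_init_layer)
```

Passing the training step in as a function keeps the data flow ($a_0 = x$, then each layer's output feeding the next) separate from whatever objective is used to fit each layer.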






In both the global SAN and the local SANs, $\alpha = 0.001$.
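
These notes do not state what $\alpha$ multiplies. A coefficient of this kind commonly weights an L2 (weight-decay) penalty on the layer weights; assuming that is its role here, and writing $W_i$ for the weight matrix of layer $i$ (an assumed symbol), the training objective would have roughly this form:

```latex
F^{*} = \arg\min_{F} \; \bigl\| S_g(x) - f_k\bigl(f_{k-1}(\cdots f_1(x))\bigr) \bigr\|_2^2
        \; + \; \alpha \sum_{i=1}^{k} \| W_i \|_F^2
```

Under this reading, a value as small as $\alpha = 0.001$ means the fit to the ground-truth shape $S_g(x)$ dominates and the penalty only mildly discourages large weights.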


1. 300 faces in-the-wild challenge, http://ibug.doc.ic.ac.uk/resources/300-W/
2. Asthana, A., Zafeiriou, S., Cheng, S., Pantic, M.: Robust discriminative response
map fitting with constrained local models. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 3444–3451 (2013)
3. Belhumeur, P.N., Jacobs, D.W., Kriegman, D.J., Kumar, N.: Localizing parts of
faces using a consensus of exemplars. In: IEEE Conference on Computer Vision
and Pattern Recognition (CVPR), pp. 545–552 (2011)
4. Bengio, Y.: Learning deep architectures for AI. Foundations and Trends in Machine Learning 2(1), 1–127 (2009)
5. Burgos-Artizzu, X.P., Perona, P., Dollár, P.: Robust face landmark estimation
under occlusion. In: IEEE International Conference on Computer Vision, ICCV
6. Cao, X., Wei, Y., Wen, F., Sun, J.: Face alignment by explicit shape regression.
In: IEEE Conference on Computer Vision and Pattern Recognition, CVPR, pp.
2887–2894 (2012)
7. Cootes, T.F., Edwards, G.J., Taylor, C.J.: Active appearance models. IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI) 23(6), 681–685
8. Cootes, T.F., Taylor, C.J., Cooper, D.H., Graham, J.: Active shape models - their training and application. Computer Vision and Image Understanding
(CVIU) 61(1), 38–59 (1995)
9. Cristinacce, D., Cootes, T.F.: Feature detection and tracking with constrained
local models. In: British Machine Vision Conference (BMVC), vol. 17, pp. 929–938
10. Cristinacce, D., Cootes, T.F.: Boosted regression active shape models. In: British
Machine Vision Conference (BMVC), pp. 1–10 (2007)
11. Dantone, M., Gall, J., Fanelli, G., Van Gool, L.: Real-time facial feature detection
using conditional regression forests. In: IEEE Conference on Computer Vision and
Pattern Recognition (CVPR), pp. 2578–2585 (2012)
12. Dollár, P., Welinder, P., Perona, P.: Cascaded pose regression. In: IEEE Conference
on Computer Vision and Pattern Recognition (CVPR), pp. 1078–1085 (2010)
13. Grangier, D., Bottou, L., Collobert, R.: Deep convolutional networks for scene parsing. In: International Conference on Machine Learning Workshops, vol. 3 (2009)
14. Gross, R., Matthews, I., Baker, S.: Generic vs. person specific active appearance
models. Image and Vision Computing (IVC) 23(12), 1080–1093 (2005)
15. Gu, L., Kanade, T.: A generative shape regularization model for robust face alignment. In: Forsyth, D., Torr, P., Zisserman, A. (eds.) ECCV 2008, Part I. LNCS,
vol. 5302, pp. 413–426. Springer, Heidelberg (2008)
16. Jesorsky, O., Kirchberg, K.J., Frischholz, R.W.: Robust face detection using
the Hausdorff distance. In: International Conference on Audio- and Video-based
Biometric Person Authentication (AVBPA), pp. 90–95 (2001)
17. Krizhevsky, A., Sutskever, I., Hinton, G.: ImageNet classification with deep convolutional neural networks. In: Advances in Neural Information Processing Systems
(NIPS), pp. 1106–1114 (2012)
18. Le, V., Brandt, J., Lin, Z., Bourdev, L., Huang, T.S.: Interactive facial feature
localization. In: Fitzgibbon, A., Lazebnik, S., Perona, P., Sato, Y., Schmid, C.
(eds.) ECCV 2012, Part III. LNCS, vol. 7574, pp. 679–692. Springer, Heidelberg

19. Liu, X.: Discriminative face alignment. IEEE Transactions on Pattern Analysis
and Machine Intelligence (TPAMI) 31(11), 1941–1954 (2009)
20. Luo, P., Wang, X., Tang, X.: Hierarchical face parsing via deep learning. In: IEEE
Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2480–2487
21. Matthews, I., Baker, S.: Active appearance models revisited. International Journal
of Computer Vision (IJCV) 60(2), 135–164 (2004)
22. Messer, K., Matas, J., Kittler, J., Luettin, J., Maitre, G.: XM2VTSDB: The extended
M2VTS database. In: International Conference on Audio and Video-based Biometric
Person Authentication (AVBPA), vol. 964, pp. 965–966 (1999)
23. Milborrow, S., Nicolls, F.: Locating facial features with an extended active shape
model. In: Forsyth, D., Torr, P., Zisserman, A. (eds.) ECCV 2008, Part IV. LNCS,
vol. 5305, pp. 504–513. Springer, Heidelberg (2008)
24. Sagonas, C., Tzimiropoulos, G., Zafeiriou, S., Pantic, M.: A semi-automatic
methodology for facial landmark annotation. In: IEEE Conference on Computer
Vision and Pattern Recognition Workshops (CVPRW), pp. 896–903 (2013)
25. Saragih, J.M., Lucey, S., Cohn, J.F.: Face alignment through subspace constrained
mean-shifts. In: IEEE International Conference on Computer Vision (ICCV), pp.
1034–1041 (2009)
26. Sun, Y., Wang, X., Tang, X.: Deep convolutional network cascade for facial point
detection. In: IEEE Conference on Computer Vision and Pattern Recognition
(CVPR), pp. 3476–3483 (2013)
27. Toshev, A., Szegedy, C.: Deeppose: Human pose estimation via deep neural
networks. In: IEEE Conference on Computer Vision and Pattern Recognition,
CVPR (2014)
28. Valstar, M., Martinez, B., Binefa, X., Pantic, M.: Facial point detection using
boosted regression and graph models. In: IEEE Conference on Computer Vision
and Pattern Recognition (CVPR), pp. 2729–2736 (2010)
29. Viola, P., Jones, M.: Rapid object detection using a boosted cascade of simple features.
In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR),
vol. 1, p. I–511 (2001)
30. Wu, Y., Wang, Z., Ji, Q.: Facial feature tracking under varying facial expressions
and face poses based on restricted Boltzmann machines. In: IEEE Conference on
Computer Vision and Pattern Recognition (CVPR), pp. 3452–3459 (2013)
31. Xiong, X., De la Torre, F.: Supervised descent method and its applications to face
alignment. In: IEEE Conference on Computer Vision and Pattern Recognition,
CVPR (2013)
32. Yu, X., Huang, J., Zhang, S., Yan, W., Metaxas, D.N.: Pose-free facial landmark
fitting via optimized part mixtures and cascaded deformable shape model. In: IEEE
International Conference on Computer Vision, ICCV (2013)
33. Zhao, X., Kim, T.K., Luo, W.: Unified face analysis by iterative multi-output
random forests. In: IEEE Conference on Computer Vision and Pattern Recognition,
CVPR (2014)
34. Zhao, X., Shan, S., Chai, X., Chen, X.: Locality-constrained active appearance
model. In: Asian Conference on Computer Vision (ACCV), pp. 636–647 (2013)
35. Zhu, X., Ramanan, D.: Face detection, pose estimation, and landmark localization
in the wild. In: IEEE Conference on Computer Vision and Pattern Recognition
(CVPR), pp. 2879–2886 (2012)
