Face Alignment by Explicit Shape Regression
CVPR2012
https://github.com/soundsilence/FaceAlignment
Three highlights of this paper: a two-level boosted regression, effective shape-indexed features, and a fast correlation-based feature selection method.
1 Introduction
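A face shape $S = [x_1, y_1, \dots, x_N, y_N]$ consists of $N$ facial landmarks. The goal of face alignment is to estimate a shape $S$ that is as close as possible to the true shape $\hat{S}$. In other words, the estimated landmark positions should be as close as possible to the ground-truth positions. Mathematically, this means minimizing
$\|S - \hat{S}\|_2$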
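A minimal sketch of this objective, assuming shapes are stored as flat lists $[x_1, y_1, \dots, x_N, y_N]$. The layout and the function name `alignment_error` are illustrative choices, not taken from the paper.

```python
import math
from typing import Sequence


def alignment_error(shape: Sequence[float], true_shape: Sequence[float]) -> float:
    """L2 distance ||S - S_hat||_2 between two flat [x1, y1, ..., xN, yN] shapes."""
    if len(shape) != len(true_shape) or len(shape) % 2 != 0:
        raise ValueError("shapes must have the same, even length")
    return math.sqrt(sum((s - t) ** 2 for s, t in zip(shape, true_shape)))


# Two landmarks; only the first estimate is off, by (3, 4) pixels.
S = [13.0, 24.0, 50.0, 60.0]
S_true = [10.0, 20.0, 50.0, 60.0]
print(alignment_error(S, S_true))  # 5.0
```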
Depending on how the shape $S$ is estimated, face alignment methods can be divided into optimization-based and regression-based methods.
Optimization-based methods minimize a different error function that is correlated with the alignment error above. Their performance depends mainly on the goodness of the error function and whether it can be optimized well.
For example, AAM is sensitive to initialization because it relies on gradient descent optimization.
Regression-based methods learn a regression function that directly maps image appearance to the target output.
Some regression-based methods use a parametric model and regress its parameters. This is indirect and sub-optimal, because smaller parameter errors are not necessarily equivalent to smaller alignment errors. Other methods learn a separate regressor for each individual landmark. However, because only local image patches are used in training and the appearance correlation between landmarks is not exploited, such regressors are usually weak and cannot handle large pose variation and partial occlusion.
We note that the shape constraint is essential for all methods. Only a few salient landmarks (e.g., eye centers, mouth corners) can be reliably characterized by their image appearance. Many other non-salient landmarks (e.g., points along the face contour) rely on the shape constraint, i.e., the correlation between landmarks.
Using a fixed shape model in an iterative alignment process (as most methods do) may also be suboptimal.
It may be better to perform face alignment in two steps: first, initialization with rough localization; second, fine adjustment of the landmarks.
1) In early stages (when the shape is far from the true target), we may want a restricted model for fast convergence and better regularization.
2) In later stages (when the shape has been roughly aligned), we may want a more flexible shape model with more subtle variations for refinement.
The early regressors handle large shape variations and guarantee robustness, while the later regressors handle small shape variations and ensure accuracy. Thus, the shape constraint is adaptively enforced from coarse to fine, in an automatic manner.
To our knowledge, adapting such shape model flexibility is rarely exploited in the literature.
This paper proposes a novel regression-based approach that does not use any parametric shape model.
All facial landmarks are regressed jointly as a single vectorial output.
Our regressor realizes the shape constraint in a non-parametric manner: the regressed shape is always a linear combination of all training shapes.
Using features from across the whole image for all landmarks is more discriminative than using only local patches for individual landmarks.
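The linear-combination claim can be seen by induction. A sketch, assuming every initial shape is one of the training shapes $\hat S_1, \dots, \hat S_n$ and every primitive regressor outputs a scalar multiple $c_b$ of the average training residual of the samples $\Omega_b$ falling into its bin $b$ (true for a regressor that stores per-bin mean residuals):

```latex
\begin{aligned}
&\text{Base: } S_i^{0} \in \mathcal{V} := \operatorname{span}\{\hat S_1,\dots,\hat S_n\}
  \quad\text{for every sample } i,\\
&\text{Step: } S_i \in \mathcal{V}\ \forall i
  \;\Longrightarrow\;
  \delta S_b = \frac{c_b}{|\Omega_b|}\sum_{i\in\Omega_b}\bigl(\hat S_i - S_i\bigr) \in \mathcal{V},\\
&\phantom{\text{Step: }} S \in \mathcal{V}
  \;\Longrightarrow\;
  S + \delta S_b \in \mathcal{V}.
\end{aligned}
```

Since the starting shape lies in $\mathcal{V}$ and every update adds a vector that also lies in $\mathcal{V}$, the final shape is a linear combination of the training shapes, which is how the shape constraint can be enforced without a parametric model.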
2 Face Alignment by Shape Regression
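We use boosted regression [9, 8] to combine $T$ weak regressors $(R^1, \dots, R^t, \dots, R^T)$ in an additive manner. Starting from an initial shape $S^0$, each regressor computes a shape increment that is added to the current estimate:
$S^t = S^{t-1} + R^t(I, S^{t-1}), \quad t = 1, \dots, T$
Note that the regressor $R^t$ depends on both the image $I$ and the previously estimated shape $S^{t-1}$.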
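A minimal sketch of the additive cascade at test time, where each stage adds an increment $R^t(I, S^{t-1})$ to the current shape. The names `cascade_predict` and `half_step` are hypothetical, and the toy stage ignores the image; real stages would be learned from training data.

```python
from typing import Callable, List, Sequence

Shape = List[float]
StageRegressor = Callable[[object, Shape], Shape]


def cascade_predict(image, initial_shape: Shape,
                    stages: Sequence[StageRegressor]) -> Shape:
    """Apply S^t = S^{t-1} + R^t(I, S^{t-1}) for t = 1..T."""
    shape = list(initial_shape)
    for regressor in stages:
        delta = regressor(image, shape)  # R^t sees both I and S^{t-1}
        shape = [s + d for s, d in zip(shape, delta)]
    return shape


# Hypothetical toy stage: move halfway toward a fixed target, ignoring the image.
TARGET = [10.0, 20.0, 30.0, 40.0]


def half_step(image, shape: Shape) -> Shape:
    return [(t - s) / 2 for t, s in zip(TARGET, shape)]


print(cascade_predict(None, [0.0] * 4, [half_step] * 3))
# [8.75, 17.5, 26.25, 35.0]
```

Each stage only has to correct what the previous stages left over, which is why later stages deal with progressively smaller shape variations.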
2.1. Two-level cascaded regression
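Each weak regressor $R^t$ is itself learned by a second level of boosted regression, $R^t = (r^1, \dots, r^k, \dots, r^K)$, built from primitive regressors $r$. The key difference between the two levels is that the shape-indexed image features are fixed in the second level, i.e., they are indexed only relative to $S^{t-1}$ and do not change while the $r$'s are learned.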
This is important, as each r is rather weak and allowing feature indexing to change frequently is unstable.
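A sketch of one first-level stage under this rule: features are extracted once from $S^{t-1}$, and all $K$ primitive regressors read that same fixed feature vector. The names `stage_predict`, `extract_x`, and `push_right` are hypothetical, and the toy "feature" is just the current x coordinate.

```python
from typing import Callable, List, Sequence

Shape = List[float]
Features = List[float]


def stage_predict(image,
                  shape: Shape,
                  extract_features: Callable[[object, Shape], Features],
                  primitives: Sequence[Callable[[Features], Shape]]) -> Shape:
    """One first-level stage R^t made of K second-level regressors r_1..r_K.

    The shape-indexed features are extracted ONCE, relative to S^{t-1}, and
    stay fixed while the increments of all r_k are accumulated.
    """
    features = extract_features(image, shape)  # indexed relative to S^{t-1} only
    delta = [0.0] * len(shape)
    for r in primitives:
        step = r(features)  # every r_k reads the same, fixed features
        delta = [d + s for d, s in zip(delta, step)]
    return delta  # R^t(I, S^{t-1}), to be added to S^{t-1}


# Toy example (hypothetical): the "feature" is just the current x coordinate.
def extract_x(image, shape: Shape) -> Features:
    return [shape[0]]


def push_right(features: Features) -> Shape:
    return [1.0, 0.0] if features[0] < 2.0 else [0.0, 0.0]


print(stage_predict(None, [0.0, 0.0], extract_x, [push_right] * 3))  # [3.0, 0.0]
```

Because the feature stays at 0.0, all three primitives fire. If the features were re-indexed after every `r_k`, the third one would see x = 2.0 and output zero, giving [2.0, 0.0]; the text argues that such frequent re-indexing with weak `r`'s is unstable.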
2.2. Primitive regressor
We use a fern as our primitive regressor $r$.
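A fern, as we understand it from the paper, compares $F$ features against $F$ thresholds, which splits the training samples into $2^F$ bins; each bin outputs the mean shape residual $\hat S_i - S_i$ of its samples, shrunk by a parameter $\beta$ when the bin holds few samples. The sketch below assumes the features are already selected and the thresholds are given; the function names are hypothetical.

```python
from typing import List, Sequence

Shape = List[float]


def fern_bin(features: Sequence[float], thresholds: Sequence[float]) -> int:
    """Map F feature values to one of 2^F bins: bit f is set if feature f > threshold f."""
    index = 0
    for f, (value, thr) in enumerate(zip(features, thresholds)):
        if value > thr:
            index |= 1 << f
    return index


def train_fern(feature_rows: Sequence[Sequence[float]],
               residuals: Sequence[Shape],
               thresholds: Sequence[float],
               beta: float = 0.0) -> List[Shape]:
    """Return one shape increment per bin: the (shrunk) mean residual of its samples."""
    num_bins = 1 << len(thresholds)
    dim = len(residuals[0])
    sums = [[0.0] * dim for _ in range(num_bins)]
    counts = [0] * num_bins
    for row, res in zip(feature_rows, residuals):
        b = fern_bin(row, thresholds)
        counts[b] += 1
        sums[b] = [acc + r for acc, r in zip(sums[b], res)]
    # sum / (n + beta) == mean / (1 + beta / n); an empty bin outputs a zero increment.
    return [[acc / (n + beta) if n else 0.0 for acc in s] for s, n in zip(sums, counts)]


def fern_predict(outputs: List[Shape], features: Sequence[float],
                 thresholds: Sequence[float]) -> Shape:
    """Look up the increment stored in the bin that the features fall into."""
    return outputs[fern_bin(features, thresholds)]


# One feature (F = 1, so 2 bins), two training samples on each side of threshold 0.
rows = [[-1.0], [-2.0], [3.0], [5.0]]
res = [[1.0, 0.0], [3.0, 0.0], [0.0, 2.0], [0.0, 4.0]]
outputs = train_fern(rows, res, thresholds=[0.0])
print(outputs)                               # [[2.0, 0.0], [0.0, 3.0]]
print(fern_predict(outputs, [4.0], [0.0]))   # [0.0, 3.0]
```

Writing the shrinkage as `sum / (n + beta)` keeps it to one division and makes its effect visible: bins with many samples are barely changed, sparse bins are pulled toward zero.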
2.3. Shape-indexed (image) features
For efficient regression, we use simple pixel-difference features, i.e., the intensity difference between two pixels in the image.
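A minimal sketch of a single pixel-difference feature on a grayscale image stored as a list of rows. The `image[y][x]` layout and the name `pixel_difference` are assumptions. The usage also shows a convenient property of differences: a uniform brightness offset cancels out.

```python
from typing import List, Tuple

Image = List[List[int]]  # grayscale, indexed as image[y][x]


def pixel_difference(image: Image, p1: Tuple[int, int], p2: Tuple[int, int]) -> int:
    """Intensity difference I(p1) - I(p2) for pixel positions given as (x, y)."""
    (x1, y1), (x2, y2) = p1, p2
    return image[y1][x1] - image[y2][x2]


img = [[10, 20, 30],
       [40, 50, 60]]
print(pixel_difference(img, (2, 1), (0, 0)))  # 60 - 10 = 50

# A uniform brightness offset cancels out in the difference.
brighter = [[v + 25 for v in row] for row in img]
print(pixel_difference(brighter, (2, 1), (0, 0)))  # still 50
```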
A pixel is indexed relative to the currently estimated shape rather than to the original image coordinates.
In this work, we propose indexing a pixel by its local coordinates $(\delta x, \delta y)$ with respect to its nearest landmark. As Figure 2 shows, such indexing is invariant to geometric variations such as face pose, which makes the algorithm robust.
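A minimal sketch of nearest-landmark indexing, with shapes as lists of $(x, y)$ landmarks. The offset is recorded once in an assumed reference shape (here called `mean_shape`) and re-applied to the corresponding landmark of whatever shape is currently estimated. The `scale` and `angle` arguments are a hypothetical stand-in for normalizing the face's scale and in-plane rotation; how that transform would be estimated is not shown.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def nearest_landmark(reference_shape: List[Point], point: Point) -> int:
    """Index of the reference-shape landmark closest to point."""
    px, py = point
    return min(range(len(reference_shape)),
               key=lambda i: (reference_shape[i][0] - px) ** 2
                             + (reference_shape[i][1] - py) ** 2)


def make_local_index(reference_shape: List[Point], point: Point) -> Tuple[int, float, float]:
    """Encode a pixel as (landmark index, dx, dy) relative to its nearest landmark."""
    k = nearest_landmark(reference_shape, point)
    return k, point[0] - reference_shape[k][0], point[1] - reference_shape[k][1]


def locate(current_shape: List[Point], local_index: Tuple[int, float, float],
           scale: float = 1.0, angle: float = 0.0) -> Point:
    """Image position of an indexed pixel; scale/angle apply a similarity transform to the offset."""
    k, dx, dy = local_index
    c, s = math.cos(angle), math.sin(angle)
    x = current_shape[k][0] + scale * (c * dx - s * dy)
    y = current_shape[k][1] + scale * (s * dx + c * dy)
    return x, y


mean_shape = [(30.0, 40.0), (70.0, 40.0)]          # e.g. two eye centers
idx = make_local_index(mean_shape, (33.0, 44.0))   # (0, 3.0, 4.0)
current = [(130.0, 140.0), (170.0, 140.0)]         # same face, shifted by (100, 100)
print(locate(current, idx))                        # (133.0, 144.0)
```

The same `(landmark, dx, dy)` triple lands on the same facial location after the face moves, whereas a fixed global coordinate like (33, 44) would now point at background.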
2.4. Correlation-based feature selection
From $P$ sampled shape-indexed pixels, a total of $P^2$ pixel-difference features can be generated. How do we select $F$ good features for each fern regressor?
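A sketch of one correlation-based answer, under stated assumptions: project the regression targets (shape residuals) onto a random direction to get a scalar $y$ per sample, then pick the pixel pair whose difference correlates best with $y$, repeating $F$ times. Because $\operatorname{corr}(y, \rho_m - \rho_n) = \frac{\operatorname{cov}(y,\rho_m) - \operatorname{cov}(y,\rho_n)}{\sqrt{\operatorname{var}(y)\,\operatorname{var}(\rho_m - \rho_n)}}$ with $\operatorname{var}(\rho_m - \rho_n) = \operatorname{cov}(\rho_m,\rho_m) + \operatorname{cov}(\rho_n,\rho_n) - 2\operatorname{cov}(\rho_m,\rho_n)$, only $P$ covariances with $y$ and one $P \times P$ pixel covariance matrix are needed, rather than all $P^2$ feature columns. The Gaussian random directions and all function names are assumptions.

```python
import math
import random
from typing import List, Optional, Sequence, Tuple


def _cov(a: Sequence[float], b: Sequence[float]) -> float:
    """Population covariance of two equally long sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / len(a)


def best_pair(pixels: Sequence[Sequence[float]],
              y: Sequence[float]) -> Optional[Tuple[int, int]]:
    """Return (m, n) maximizing corr(y, rho_m - rho_n) from covariances only.

    pixels[i][m] is the intensity of shape-indexed pixel m in training sample i.
    """
    num_pixels = len(pixels[0])
    cols = [[row[m] for row in pixels] for m in range(num_pixels)]
    var_y = _cov(y, y)
    if var_y <= 0:
        return None
    cov_y = [_cov(y, c) for c in cols]                       # cov(y, rho_m)
    cov_pp = [[_cov(cm, cn) for cn in cols] for cm in cols]  # cov(rho_m, rho_n)
    best, best_corr = None, -math.inf
    for m in range(num_pixels):
        for n in range(num_pixels):
            # var(rho_m - rho_n) expanded in terms of pixel covariances
            var_diff = cov_pp[m][m] + cov_pp[n][n] - 2 * cov_pp[m][n]
            if m == n or var_diff <= 0:
                continue
            corr = (cov_y[m] - cov_y[n]) / math.sqrt(var_y * var_diff)
            if corr > best_corr:
                best, best_corr = (m, n), corr
    return best


def select_features(pixels, targets, num_features, rng=random):
    """Pick one pixel pair per random projection of the regression targets."""
    dim = len(targets[0])
    pairs: List[Optional[Tuple[int, int]]] = []
    for _ in range(num_features):
        direction = [rng.gauss(0.0, 1.0) for _ in range(dim)]  # assumed Gaussian
        y = [sum(t * w for t, w in zip(row, direction)) for row in targets]
        pairs.append(best_pair(pixels, y))
    return pairs


pixels = [[5.0, 1.0, 7.0], [2.0, 4.0, 3.0], [9.0, 3.0, 1.0], [4.0, 6.0, 8.0]]
y = [row[0] - row[1] for row in pixels]  # target built from pixels 0 and 1
print(best_pair(pixels, y))              # (0, 1)
```

The pair loop is still $O(P^2)$, but each candidate costs $O(1)$ once the covariances are computed, instead of a full pass over all training samples.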