Title: Action unit classification using active appearance models and conditional random fields
Five main behavioral cues:
Behavioral cues are primarily contained in facial expressions (cues 1, 2, 3, 4 and 5), gestures (cues 2, 3 and 4), body pose (cues 1, 2, 4 and 5) and interactions (cues 4 and 5).
Action units can be given intensity scores: the simplest score is present or not present. Two alternative intensity scores are (1) neutral, onset, apex, and offset; and (2) trace, slight, pronounced, extreme, and maximum.
(1) A component that extracts features from the face images that are indicative of the presence of action units
(2) A component that learns to recognize action units based on these input features.
(1) A shape model that models the location of facial feature points
(2) A texture model that models the shape-normalized facial texture
Shape module
Texture module
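As can be seen, not everyone looks alike, so the shape module and the texture module must be able to tell different people apart. To handle this, we can introduce the notion of an offset (deviation).
For the shape module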
PCA learns: (1) a base shape v that is formed by the mean of the normalized feature point coordinates averaged over the entire data set. (2) A linear basis S that contains the directions in which the facial feature points vary most.
Using a small number of shape parameters p, the facial feature point configuration can be computed as $p^T S + v$.
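To make the formula concrete, here is a minimal sketch of evaluating p^T S + v with plain Python lists. The toy coordinates and the single basis direction are made up for illustration; in the paper S would come from PCA, whereas here it is picked by hand, and the helper names `mean_shape` and `reconstruct_shape` are my own.

```python
def mean_shape(shapes):
    """Base shape v: per-coordinate mean over all training shapes."""
    n = len(shapes)
    return [sum(column) / n for column in zip(*shapes)]


def reconstruct_shape(p, S, v):
    """Evaluate p^T S + v, where each row of S is one basis direction."""
    return [
        v_j + sum(p_i * S_i[j] for p_i, S_i in zip(p, S))
        for j, v_j in enumerate(v)
    ]


# Toy data (made up): two feature points stored as (x1, y1, x2, y2).
training_shapes = [
    [0.0, 0.0, 2.0, 0.0],
    [0.0, 0.0, 4.0, 0.0],
]
v = mean_shape(training_shapes)

# Hypothetical basis with one direction: only x2 varies.
# A real model would obtain S from PCA rather than by hand.
S = [[0.0, 0.0, 1.0, 0.0]]

# One shape parameter moves the second point one unit to the right.
shape = reconstruct_shape([1.0], S, v)
```

Setting p to all zeros gives back the base shape v, which is a quick sanity check on any implementation.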
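As can be seen, a mean plus a deviation along each direction is enough to represent all the different shapes, as shown in the figure below:
The texture module is handled in a similar way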
(1) A mean texture image μ that is computed by averaging all shape-normalized face images
(2) A linear basis A that captures the main deviations from the mean texture image
Given a texture parameter vector λ with only a small number of entries, a facial texture image can be constructed by evaluating $\lambda^T A + \mu$.
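The texture formula can also be run in reverse. The sketch below assumes the rows of A are orthonormal (which holds for a PCA basis), so the parameters of a shape-normalized texture t are simply A(t - μ). This is only an illustration of the linear model, not the fitting procedure used in the paper; the 4-pixel "image", the basis, and the function names are all invented for the example.

```python
def project_texture(t, A, mu):
    """Texture parameters: lambda_i = A_i . (t - mu), assuming orthonormal rows in A."""
    centered = [t_j - mu_j for t_j, mu_j in zip(t, mu)]
    return [sum(a * c for a, c in zip(A_i, centered)) for A_i in A]


def reconstruct_texture(lam, A, mu):
    """Evaluate lambda^T A + mu."""
    return [
        mu_j + sum(l_i * A_i[j] for l_i, A_i in zip(lam, A))
        for j, mu_j in enumerate(mu)
    ]


# Made-up 4-pixel mean texture and a hand-built orthonormal basis.
mu = [0.5, 0.5, 0.5, 0.5]
A = [
    [0.5, 0.5, 0.5, 0.5],    # overall brightness
    [0.5, 0.5, -0.5, -0.5],  # left half versus right half
]

t = [0.9, 0.9, 0.3, 0.3]                  # texture to encode
lam = project_texture(t, A, mu)           # two numbers instead of four pixels
approx = reconstruct_texture(lam, A, mu)  # equals t up to rounding, since t - mu lies in the span of A
```

With a real face texture the reconstruction is only approximate, because a small basis cannot capture every pixel-level variation.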
With all of the above in place, our task becomes finding the shape parameters p and the texture parameters λ. The way an image is composed from these parts is shown below:
Scale-invariant feature transform
The construction of the SIFT feature consists of three main steps: (1) the gradient magnitude and orientation at each pixel in the image patch are computed, (2) the gradient magnitudes are weighted using a Gaussian window that is centered onto the image patch, and (3) the weighted gradient magnitudes are accumulated into orientation histograms measured over subregions of size 4 × 4 pixels.
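The three steps can be sketched roughly as follows. This is a simplified illustration, not a full SIFT implementation: it skips orientation normalization, interpolation between bins, and descriptor normalization. The choice of 8 orientation bins, the Gaussian width, and the function name `sift_like_descriptor` are my assumptions.

```python
import math


def sift_like_descriptor(patch, cell=4, bins=8):
    """Simplified SIFT-style descriptor for a square grayscale patch (list of rows)."""
    h, w = len(patch), len(patch[0])
    if h % cell or w % cell:
        raise ValueError("patch size must be a multiple of the cell size")
    center_y, center_x = (h - 1) / 2.0, (w - 1) / 2.0
    sigma = 0.5 * w  # assumed width of the Gaussian window
    hist = [[[0.0] * bins for _ in range(w // cell)] for _ in range(h // cell)]

    for y in range(1, h - 1):
        for x in range(1, w - 1):
            # Step 1: gradient magnitude and orientation (central differences).
            dx = patch[y][x + 1] - patch[y][x - 1]
            dy = patch[y + 1][x] - patch[y - 1][x]
            magnitude = math.hypot(dx, dy)
            orientation = math.atan2(dy, dx) % (2 * math.pi)

            # Step 2: weight by a Gaussian window centered on the patch.
            dist2 = (x - center_x) ** 2 + (y - center_y) ** 2
            weight = math.exp(-dist2 / (2 * sigma ** 2))

            # Step 3: accumulate into the orientation histogram of the
            # 4 x 4 pixel subregion that contains this pixel.
            b = int(orientation / (2 * math.pi) * bins) % bins
            hist[y // cell][x // cell][b] += weight * magnitude

    # Concatenate all histograms into one feature vector.
    return [value for row in hist for cell_hist in row for value in cell_hist]


# A 16 x 16 patch whose brightness increases from left to right.
ramp = [[float(x) for x in range(16)] for _ in range(16)]
descriptor = sift_like_descriptor(ramp)
```

For the ramp patch every gradient points in the same direction, so in each subregion only one orientation bin receives any weight.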
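With the features extracted and the fitting method known, the next step is the process of finding the shape parameters p and the texture parameters λ. The paper uses a conditional random field model here, and unfortunately this is the point where I could no longer follow it.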
Logistic function
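The paper notes that there are only two states, present/non-present, so K=2. x is the feature vector extracted from an image. The regression weights are learned from N labeled training data points $\{(y_1, x_1), \dots, (y_N, x_N)\}$ (based on images and their corresponding action unit labels). Learning is done by maximum likelihood estimation.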
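For the K=2 case this is ordinary binary logistic regression, which can be sketched as below. The sketch covers only the logistic part, not the conditional random field, and it maximizes the log-likelihood with plain gradient ascent, which is my choice of optimizer rather than anything stated in the paper. The one-dimensional toy features and all function names are made up.

```python
import math


def sigmoid(z):
    """Logistic function: maps a real-valued score to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_present(x, w, b):
    """Probability that the action unit is present given features x."""
    return sigmoid(b + sum(w_j * x_j for w_j, x_j in zip(w, x)))


def fit_logistic(xs, ys, lr=0.5, steps=2000):
    """Maximum likelihood fit of weights w and bias b by gradient ascent."""
    dim, n = len(xs[0]), len(xs)
    w, b = [0.0] * dim, 0.0
    for _ in range(steps):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, y in zip(xs, ys):
            # Gradient of the log-likelihood for one sample: (y - p) * x.
            err = y - predict_present(x, w, b)
            for j in range(dim):
                grad_w[j] += err * x[j]
            grad_b += err
        w = [w_j + lr * g / n for w_j, g in zip(w, grad_w)]
        b += lr * grad_b / n
    return w, b


# Made-up training set: label 1 means "present", 0 means "not present".
xs = [[0.0], [1.0], [3.0], [4.0]]
ys = [0, 0, 1, 1]
w, b = fit_logistic(xs, ys)
```

After training, `predict_present([4.0], w, b)` is above 0.5 and `predict_present([0.0], w, b)` is below it, so thresholding at 0.5 gives the present/non-present decision.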