Facial Action Unit Intensity Estimation via Semantic Correspondence Learning with Dynamic Graph Conv

Proposed Methodology

Heatmap Regression

The problem of predicting an AU intensity vector is recast as predicting multiple AU heatmaps.

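Fig. 2 gives the central location of each AU.
Q: Each of these points is presumably computed from the 68 facial landmarks by some rule, but the paper does not spell this out.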
Let the number of AUs be $N$. For the $i$-th AU, the coordinates of its location are $\hat{x}_i$. Applying a Gaussian function centered at $\hat{x}_i$ generates the heatmap $g_i(x)$:
$$g_i(x)=\frac{I}{2\pi\sigma^2}\exp\left(-\frac{\left\|x-\hat{x}_i\right\|_2^2}{2\sigma^2}\right) \qquad (1)$$
where $I$ denotes the AU intensity and $\sigma$ is the standard deviation of the Gaussian function.
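
To make Eq. (1) concrete, here is a minimal NumPy sketch that evaluates it on a pixel grid. The function name `au_heatmap`, the 64x64 grid, the (row, col) convention, and the values of the center, intensity, and $\sigma$ are all illustrative assumptions, not taken from the paper.

```python
import numpy as np

def au_heatmap(center, intensity, sigma, height, width):
    """Evaluate Eq. (1) at every pixel of a height x width grid.

    center is a (row, col) pair; all sizes here are illustrative assumptions.
    """
    rows, cols = np.mgrid[0:height, 0:width]
    # Squared Euclidean distance from each pixel to the AU center.
    sq_dist = (rows - center[0]) ** 2 + (cols - center[1]) ** 2
    scale = intensity / (2.0 * np.pi * sigma ** 2)
    return scale * np.exp(-sq_dist / (2.0 * sigma ** 2))

# Hypothetical AU at pixel (20, 30) with intensity 3.
g = au_heatmap(center=(20, 30), intensity=3.0, sigma=2.0, height=64, width=64)
```

With this form the peak sits at the AU center, and an AU with intensity 0 yields an all-zero map.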

The network outputs a predicted heatmap $h_i(x;w,b)$, where $x$ denotes the input facial image and $w$, $b$ are the network parameters. The MSE against the generated heatmap $g_i(x)$ serves as the supervision loss:
$$L_{MSE}=\underset{w,b}{\min}\sum_{i=1}^{N}\sum_{x}\left\|h_i(x;w,b)-g_i(x)\right\|_2^2 \qquad (2)$$
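
As a rough sketch, the quantity inside the minimization of Eq. (2) can be computed as below. The minimization over $w$ and $b$ is left to whatever optimizer trains the network. The name `heatmap_mse` and the `(N, H, W)` array layout are assumptions made for illustration.

```python
import numpy as np

def heatmap_mse(pred, target):
    """Sum of squared differences over all AUs and all pixels, as in Eq. (2).

    pred and target are arrays of shape (N, H, W): one heatmap per AU.
    """
    return float(np.sum((pred - target) ** 2))

# Toy example with N = 2 AUs on a 4x4 grid.
target = np.zeros((2, 4, 4))
target[0, 1, 1] = 1.0
pred = np.zeros((2, 4, 4))
pred[0, 1, 1] = 0.5   # half the target peak: contributes 0.25
pred[1, 2, 2] = 0.5   # spurious response on an inactive AU: contributes 0.25
loss = heatmap_mse(pred, target)  # 0.5
```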
To convert a predicted heatmap into a predicted label, simply take its maximum value (if an AU's intensity is 0, the predicted heatmap is necessarily all black).
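
A minimal decoding sketch follows. One caveat: under Eq. (1) as written, the peak of the heatmap is $I/(2\pi\sigma^2)$ rather than $I$, so the sketch offers an optional rescaling by $2\pi\sigma^2$. Whether the paper rescales or uses an unnormalized Gaussian is not stated in these notes, so the `sigma` argument is an assumption, as is the name `decode_intensity`.

```python
import numpy as np

def decode_intensity(heatmap, sigma=None):
    """Read an AU intensity off a heatmap by taking its maximum.

    If sigma is given, undo the 1 / (2 * pi * sigma^2) factor of Eq. (1);
    this rescaling is an assumption, not something the notes confirm.
    """
    peak = float(heatmap.max())
    if sigma is not None:
        peak *= 2.0 * np.pi * sigma ** 2
    return peak

# Hypothetical heatmap whose peak encodes intensity 3 with sigma = 1.
hm = np.zeros((5, 5))
hm[2, 3] = 3.0 / (2.0 * np.pi)
intensity = decode_intensity(hm, sigma=1.0)
```

An all-zero heatmap decodes to intensity 0 either way.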

SCC: Semantic Correspondence Convolution

Given the co-occurrences of different AU intensities, the semantic representations of feature maps are highly correlated in spatial distributions.

The paper proposes SCC to model the correlation among feature channels; the idea is inspired by the dynamic graph convolutions used in geometry modeling.
Note: Wang Y, Sun Y, Liu Z, et al. Dynamic Graph CNN for Learning on Point Clouds[J]. ACM Transactions on Graphics (TOG), 2019, 38(5): 1-12.

The rest of SCC is somewhat involved and follows the TOG paper.
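
For intuition only, here is a sketch of an EdgeConv-style step from the cited TOG paper, applied to feature channels instead of points: each channel is a graph node, its k nearest channels in feature space are its neighbours, and edge features of the form $(x_i, x_j - x_i)$ are passed through a shared linear map and max-pooled. This is not the paper's SCC layer. The function names, the flattening of each channel into a vector, the single linear layer with ReLU, and all shapes are assumptions.

```python
import numpy as np

def channel_knn_edges(feat, k):
    """Build a k-NN graph over channels and return EdgeConv-style edge features.

    feat has shape (C, H, W); each channel, flattened, is one graph node.
    """
    num_channels = feat.shape[0]
    nodes = feat.reshape(num_channels, -1).astype(float)   # (C, D)
    sq_norm = np.sum(nodes ** 2, axis=1)
    # Pairwise squared Euclidean distances between channels.
    dist = sq_norm[:, None] + sq_norm[None, :] - 2.0 * nodes @ nodes.T
    np.fill_diagonal(dist, np.inf)                         # a node is not its own neighbour
    nbr = np.argsort(dist, axis=1)[:, :k]                  # (C, k) neighbour indices
    center = np.repeat(nodes[:, None, :], k, axis=1)       # (C, k, D)
    edges = np.concatenate([center, nodes[nbr] - center], axis=2)  # (C, k, 2D)
    return nbr, edges

def edge_conv(feat, weight, k):
    """Shared linear map + ReLU on each edge, then max over the k neighbours."""
    _, edges = channel_knn_edges(feat, k)
    activated = np.maximum(edges @ weight, 0.0)            # (C, k, D_out)
    return activated.max(axis=1)                           # (C, D_out)

rng = np.random.default_rng(0)
feat = rng.standard_normal((6, 4, 4))       # 6 hypothetical feature channels
weight = rng.standard_normal((32, 8))       # 2 * (4 * 4) inputs -> 8 outputs
out = edge_conv(feat, weight, k=3)
nbr, _ = channel_knn_edges(feat, k=3)
```

The graph is "dynamic" in the sense that the neighbours are recomputed from the current features each time the layer is applied, rather than fixed in advance.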
