PSGAN: Pose and Expression Robust Spatial-Aware GAN for Customizable Makeup Transfer (CVPR20)

3. PSGAN

3.1. Formulation

Source image domain $X$; reference image domain $Y$.

Domain $X$ has $N$ samples, $\{x^n\}_{n=1,\dots,N}$ with $x^n \in X$; domain $Y$ has $M$ samples, $\{y^m\}_{m=1,\dots,M}$ with $y^m \in Y$.

The distributions over domains $X$ and $Y$ are $\mathcal{P}_X$ and $\mathcal{P}_Y$, respectively.

The goal is to learn a transfer function $G: \{x, y\} \rightarrow \tilde{x}$ such that $\tilde{x}$ carries the makeup style of $y$ while preserving the identity of $x$.

3.2. Framework


Overall

The PSGAN framework is shown in Fig. 2. It has three components:

  1. Makeup distill network (MDNet): extracts the makeup style from the reference image $y$ as two components $\gamma, \beta$, called makeup matrices.
  2. Attentive makeup morphing module (AMM module): because the expression and pose of the source image $x$ and the reference image $y$ can differ greatly, the AMM module is introduced to morph the two makeup matrices $\gamma, \beta$ into two new matrices $\gamma', \beta'$, which are adaptive to the source image by considering the similarities between pixels of the source and reference.
  3. Makeup apply network (MANet): applies $\gamma', \beta'$ to the bottleneck feature map of MANet.
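These notes do not say exactly how MANet "applies" the matrices. A minimal sketch, assuming a pixel-wise scale-and-shift in which the $1\times H\times W$ matrices $\gamma', \beta'$ are broadcast across all $C$ channels of the bottleneck feature map (the broadcast rule and the name `apply_makeup` are assumptions for illustration):

```python
from typing import List

Matrix = List[List[float]]        # H x W
Tensor = List[List[List[float]]]  # C x H x W


def apply_makeup(v_x: Tensor, gamma: Matrix, beta: Matrix) -> Tensor:
    """Pixel-wise scale and shift of a C x H x W bottleneck feature map.

    Assumption (not stated in these notes): gamma' and beta' (each 1 x H x W,
    passed here as H x W) are broadcast across all C channels, i.e.
    out[c][h][w] = gamma[h][w] * v_x[c][h][w] + beta[h][w].
    """
    return [
        [
            [gamma[h][w] * ch[h][w] + beta[h][w] for w in range(len(ch[h]))]
            for h in range(len(ch))
        ]
        for ch in v_x
    ]


# Toy bottleneck feature map with C=2, H=1, W=2
v_x = [[[1.0, 2.0]], [[3.0, 4.0]]]
gamma_p = [[2.0, 0.5]]
beta_p = [[0.0, 1.0]]
print(apply_makeup(v_x, gamma_p, beta_p))  # [[[2.0, 2.0]], [[6.0, 3.0]]]
```

Because the same scale and shift hit every channel at a given pixel, the makeup matrices only need one value per spatial location.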

Makeup distill network (MDNet)

MDNet uses the encoder-bottleneck part of StarGAN (here "bottleneck" refers to the residual blocks). It extracts the makeup-related features (e.g., lipstick, eye shadow), which are represented as the two makeup matrices $\gamma, \beta$.

As shown in Fig. 2(B), MDNet outputs a feature map $\mathbf{V}_\mathbf{y} \in \mathbb{R}^{C\times H\times W}$, followed by two parallel 1x1 conv layers that produce $\gamma \in \mathbb{R}^{1\times H\times W}$ and $\beta \in \mathbb{R}^{1\times H\times W}$.
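A 1x1 conv with one output channel is just a per-pixel weighted sum over the $C$ input channels plus a bias, so it keeps $H\times W$ and collapses $C$ to 1. A minimal sketch of the two parallel heads, with hypothetical weights standing in for learned ones:

```python
from typing import List

Tensor = List[List[List[float]]]  # C x H x W


def conv1x1_single_out(v: Tensor, weights: List[float], bias: float) -> List[List[float]]:
    """1x1 convolution with C input channels and 1 output channel.

    At every pixel (h, w): sum_c weights[c] * v[c][h][w] + bias.
    Returns the squeezed H x W map (the 1 x H x W output without its channel axis).
    """
    C, H, W = len(v), len(v[0]), len(v[0][0])
    assert len(weights) == C, "one weight per input channel"
    return [
        [sum(weights[c] * v[c][h][w] for c in range(C)) + bias for w in range(W)]
        for h in range(H)
    ]


# Hypothetical V_y with C=2, H=2, W=2; the two heads have independent (made-up) weights
v_y = [
    [[1.0, 2.0], [3.0, 4.0]],
    [[0.5, 0.5], [1.0, 1.0]],
]
gamma = conv1x1_single_out(v_y, weights=[1.0, 2.0], bias=0.0)
beta = conv1x1_single_out(v_y, weights=[0.0, 1.0], bias=-1.0)
print(gamma)  # [[2.0, 3.0], [5.0, 6.0]]
print(beta)   # [[-0.5, -0.5], [0.0, 0.0]]
```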

Attentive makeup morphing module (AMM module)

Because the expression and pose of the source image $x$ and the reference image $y$ can differ greatly, $\gamma, \beta$ cannot be applied directly to the source image $x$.
Q: Can we regard $\gamma, \beta$ as still containing the expression and pose information of the reference image $y$?

The AMM module computes an attentive matrix $A \in \mathbb{R}^{HW\times HW}$ to specify how a pixel in the source image $x$ is morphed from the pixels in the reference image $y$, where $A_{i,j}$ indicates the attentive value between the $i$-th pixel $x_i$ in image $x$ and the $j$-th pixel $y_j$ in image $y$.
Intuition: suppose position $i$ in $x$ is the corner of an eye, and position $j$ in $y$ is also the corner of an eye. Then $A_{i,j}$ should be large, meaning the pixel at position $i$ in $\tilde{x}$ should take its makeup from the pixel at position $j$ in $y$; this is what makes good eye-shadow transfer possible.
(One drawback: since $H$ and $W$ are flattened into a single dimension of size $HW$, some spatial information is lost.)
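To make the role of $A$ concrete, a minimal sketch of how it could morph a flattened makeup matrix, $\gamma'_i = \sum_j A_{i,j}\,\gamma_j$ (and likewise for $\beta$). The assumption that each row of $A$ sums to 1, and the toy values, are for illustration only:

```python
from typing import List


def morph(A: List[List[float]], gamma_flat: List[float]) -> List[float]:
    """gamma'_i = sum_j A[i][j] * gamma_flat[j].

    A is HW x HW (source pixels x reference pixels); gamma_flat is the
    reference makeup matrix flattened to length HW. If each row of A sums
    to 1, gamma'_i is a weighted average of the reference values.
    """
    return [sum(a_ij * g_j for a_ij, g_j in zip(row, gamma_flat)) for row in A]


# Toy case with HW = 3: source pixel 0 matches reference pixel 2, source pixel 2
# matches reference pixel 0, and source pixel 1 splits evenly between 0 and 1.
A = [
    [0.0, 0.0, 1.0],
    [0.5, 0.5, 0.0],
    [1.0, 0.0, 0.0],
]
gamma_ref = [0.25, 0.75, 1.0]
print(morph(A, gamma_ref))  # [1.0, 0.5, 0.25]
```

The morphed values follow the correspondences encoded in $A$, not the pixel order of $y$.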

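68 facial landmarks are introduced as anchor points.
Take the nose-tip landmark as an example: for every position $i$ in $x$, compute the signed offsets (positive or negative) of position $i$ relative to the nose tip along the x and y axes, giving a 2-dimensional vector. Doing this for all 68 landmarks gives a 136-dimensional vector $\mathbf{p}_i \in \mathbb{R}^{136},\ i = 1, \dots, H\times W$, called the relative position features.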
$$
\begin{aligned}
\mathbf{p}_i = [\, & f(x_i)-f(l_1),\ f(x_i)-f(l_2),\ \cdots,\ f(x_i)-f(l_{68}), \\
& g(x_i)-g(l_1),\ g(x_i)-g(l_2),\ \cdots,\ g(x_i)-g(l_{68}) \,]
\end{aligned}
\tag{1}
$$
where $f(\cdot)$ and $g(\cdot)$ give the coordinates on the $x$ and $y$ axes, and $l_k$ denotes the $k$-th facial landmark.
Note: stacking $\mathbf{p}_i$ over all positions, the full $\mathbf{p}$ has dimension $H\times W\times 136$.
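A minimal sketch of Eq. (1) on a tiny grid, assuming pixel $i$ sits at integer coordinates (column, row) of the feature map and that the landmarks are already expressed at that resolution; with the 68 real landmarks each vector would have 136 entries, here 2 hypothetical landmarks give 4:

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (x, y) = (column, row)


def relative_position_features(H: int, W: int, landmarks: List[Point]) -> List[List[float]]:
    """One feature vector per pixel, in row-major order (H*W vectors).

    Following Eq. (1): the first K entries are f(x_i) - f(l_k) (x offsets),
    the next K entries are g(x_i) - g(l_k) (y offsets), with K = len(landmarks).
    """
    feats = []
    for row in range(H):
        for col in range(W):
            dx = [col - lx for lx, _ in landmarks]
            dy = [row - ly for _, ly in landmarks]
            feats.append(dx + dy)
    return feats


# Toy example: a 2 x 3 grid and 2 hypothetical landmarks (real use: 68 from a detector)
lms = [(1.0, 0.0), (2.0, 1.0)]
p = relative_position_features(2, 3, lms)
print(len(p), len(p[0]))  # 6 4
print(p[4])               # [0.0, -1.0, 1.0, 0.0]
```

`p[4]` is the pixel at row 1, column 1: its x offsets to the two landmarks are $1-1$ and $1-2$, its y offsets are $1-0$ and $1-1$.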

Because these are landmark coordinates, face size inevitably varies from image to image, so each $\mathbf{p}_i$ is normalized to unit length, i.e., $\frac{\mathbf{p}_i}{\|\mathbf{p}_i\|}$. (Why not rescale the coordinates to $[0, 1]$ instead?)
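The effect of unit-length normalization on face size can be checked directly: if a face is twice as large, every offset in $\mathbf{p}_i$ doubles, and dividing by $\|\mathbf{p}_i\|$ cancels that factor. A minimal sketch (the `eps` guard for an all-zero vector is an added assumption):

```python
import math
from typing import List


def l2_normalize(p: List[float], eps: float = 1e-12) -> List[float]:
    """Return p / ||p||; eps avoids division by zero for an all-zero vector."""
    norm = math.sqrt(sum(v * v for v in p))
    return [v / max(norm, eps) for v in p]


# The same relative position on a small face and on a face twice as large
p_small = [3.0, -4.0]
p_large = [6.0, -8.0]
print(l2_normalize(p_small))  # [0.6, -0.8]
print(l2_normalize(p_large))  # [0.6, -0.8]
```

This removes only a uniform scale factor; it does not compensate for rotation of the face.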

Moreover, to avoid unreasonably sampling pixels that have similar relative positions but different semantics, we also consider the visual similarities between pixels.

Fig. 2(c) gives an example.
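One way to combine relative position and visual similarity into $A$ is sketched below. It assumes, for illustration only, that each pixel's feature is the concatenation $[w\,\mathbf{v}_i,\ \mathbf{p}_i/\|\mathbf{p}_i\|]$ of a visual feature $\mathbf{v}_i$ scaled by a hypothetical weight $w$ and the normalized relative position, and that $A$ is a row-wise softmax of dot products. `concat_features` and `attention_matrix` are hypothetical names:

```python
import math
from typing import List


def concat_features(visual: List[float], pos_unit: List[float], w: float) -> List[float]:
    """Hypothetical helper: build [w * v_i, p_i / ||p_i||] for one pixel."""
    return [w * v for v in visual] + pos_unit


def attention_matrix(src: List[List[float]], ref: List[List[float]]) -> List[List[float]]:
    """Row-wise softmax of dot-product similarities between per-pixel features.

    A[i][j] = exp(s_ij) / sum_k exp(s_ik), with s_ij = <src[i], ref[j]>.
    """
    A = []
    for f_i in src:
        scores = [sum(a * b for a, b in zip(f_i, f_j)) for f_j in ref]
        m = max(scores)  # subtract the max before exp for numerical stability
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        A.append([e / total for e in exps])
    return A


# Two source and two reference pixels with identical (toy) visual features,
# but the reference pixels are listed in the opposite positional order.
w = 0.1
src = [concat_features([0.5], [1.0, 0.0], w), concat_features([0.5], [0.0, 1.0], w)]
ref = [concat_features([0.5], [0.0, 1.0], w), concat_features([0.5], [1.0, 0.0], w)]
A = attention_matrix(src, ref)
print([row.index(max(row)) for row in A])  # [1, 0]
```

Each source pixel puts most of its weight on the reference pixel with the matching relative position, regardless of where that pixel appears in the flattened order.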

[Source code]
Labels provided by the face parser tool:
0: background, 1: face, 2: left-eyebrow, 3: right-eyebrow,
4: left-eye, 5: right-eye, 6: nose, 7: upper-lip, 8: teeth,
9: under-lip, 10: hair, 11: left-ear, 12: right-ear, 13: neck
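These parsing labels can be used to keep attention inside the same facial region, so an eye pixel in $x$ never samples makeup from a lip pixel in $y$. A minimal sketch, assuming (not stated in these notes) that the softmax is restricted to reference pixels whose label equals the source pixel's label; `masked_attention` is a hypothetical name:

```python
import math
from typing import List


def masked_attention(scores: List[List[float]], src_labels: List[int],
                     ref_labels: List[int]) -> List[List[float]]:
    """Softmax over reference pixels j that share source pixel i's parsing label.

    scores[i][j] is a similarity score; entries whose labels differ get weight 0.
    If no reference pixel shares the label, the row is left all zeros.
    """
    A = []
    for i, row in enumerate(scores):
        allowed = [j for j, lab in enumerate(ref_labels) if lab == src_labels[i]]
        out = [0.0] * len(row)
        if allowed:
            m = max(row[j] for j in allowed)  # stabilize exp
            exps = {j: math.exp(row[j] - m) for j in allowed}
            total = sum(exps.values())
            for j, e in exps.items():
                out[j] = e / total
        A.append(out)
    return A


# Labels from the list above: 4 = left-eye, 7 = upper-lip
src_labels = [4, 7]
ref_labels = [7, 4, 4]
scores = [[5.0, 1.0, 1.0],   # high raw score to a lip pixel, but the label differs
          [0.0, 3.0, 3.0]]
A = masked_attention(scores, src_labels, ref_labels)
print(A)  # [[0.0, 0.5, 0.5], [1.0, 0.0, 0.0]]
```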
