Abstract
Recent reference-based face restoration methods have received considerable attention due to their great capability in recovering high-frequency details from real low-quality images. However, most of these methods require a high-quality reference image of the same identity, making them applicable only in limited scenes. To address this issue, this paper suggests a deep face dictionary network (termed DFDNet) to guide the restoration of degraded observations. To begin with, we use K-means to generate deep dictionaries for perceptually significant face components (i.e., left/right eyes, nose and mouth) from high-quality images. Next, given the degraded input, we match and select the most similar component features from their corresponding dictionaries and transfer the high-quality details to the input via the proposed dictionary feature transfer (DFT) block. In particular, component AdaIN is leveraged to eliminate the style diversity between the input and dictionary features (e.g., illumination), and a confidence score is proposed to adaptively fuse the dictionary feature into the input. Finally, multi-scale dictionaries are adopted in a progressive manner to enable coarse-to-fine restoration. Experiments show that our proposed method achieves plausible performance in both quantitative and qualitative evaluation and, more importantly, generates realistic and promising results on real degraded images without requiring an identity-belonging reference. The source code and models are available at https://github.com/csxmli2016/DFDNet.
1 Introduction
Blind face restoration (or face hallucination) aims at recovering realistic details from a real low-quality (LQ) image to produce its high-quality (HQ) counterpart, without knowing the degradation types or parameters. Compared with single image restoration tasks, e.g., image super-resolution, denoising, and deblurring, blind image restoration poses more challenges, yet it is of great practical value in restoring real LQ images.
Recently, benefiting from carefully designed architectures and the incorporation of related priors in deep convolutional neural networks, restoration results tend to be more plausible and acceptable. Though great achievements have been made, real LQ images usually contain complex and diverse distributions that are impractical to synthesize, making the blind restoration problem intractable. To solve this issue, reference-based methods have been suggested, which use a reference prior in the image restoration task to improve network learning and alleviate the network's dependency on the degraded input. Among these methods, GFRNet and GWAINet adopt a frontal HQ image as reference to guide the restoration of the degraded observation. However, these two methods suffer from two drawbacks. 1) They have to obtain a frontal HQ reference of the same identity as the LQ image. 2) Differences in pose and expression between the reference and the degraded input affect the reconstruction performance. These two requirements limit their applicability to some specific scenarios (e.g., old film restoration or phone albums that support identity grouping).
In this paper, we present DFDNet, which builds deep face dictionaries to address the aforementioned difficulties. We note that the four face components (i.e., left/right eyes, nose and mouth) are similar among different people. Thus, in this work, we build face component dictionaries offline by applying K-means to a large number of HQ face images. This yields more accurate component references without requiring the corresponding identity-belonging HQ images, which makes the proposed model applicable in most face restoration scenes. To be specific, we first use a pre-trained VggFace to extract multi-scale features of HQ face images at different feature scales (e.g., the outputs of different convolutional layers). Second, we adopt RoIAlign to crop their component features based on the facial landmarks. K-means is then applied to these features to generate K clusters for each component at each feature level. After that, component adaptive instance normalization (CAdaIN) is proposed to normalize the corresponding dictionary feature, which helps to eliminate the effect of style diversity (e.g., illumination or skin color). Finally, given the degraded input, we match and select the dictionary component clusters that have the smallest feature distance to guide the subsequent restoration process in an adaptive and progressive manner. A confidence score is predicted to balance the input component feature and the selected dictionary feature. In addition, we use multi-scale dictionaries to guide the restoration progressively, which further improves the performance. Compared with the former reference-based methods (i.e., GFRNet and GWAINet), which have only one HQ reference, our DFDNet has more component candidates to select as a reference, which helps our model achieve superior performance.
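The dictionary pipeline described above (cluster HQ component features, re-normalize the dictionary entries to the input's style, pick the closest entry) can be sketched on toy arrays. This is a minimal illustration under stated assumptions, not the authors' implementation: random arrays stand in for VggFace/RoIAlign component features, the function names `kmeans`, `cadain`, and `match` are hypothetical, Euclidean distance is assumed as the "feature distance", and the confidence-based fusion step is omitted.

```python
import numpy as np


def kmeans(features, k, iters=20, seed=0):
    """Plain Lloyd's K-means on an (N, D) array; returns (k, D) centroids."""
    rng = np.random.default_rng(seed)
    # Initialize centroids from k distinct samples (astype returns a copy).
    centroids = features[rng.choice(len(features), size=k, replace=False)].astype(float)
    for _ in range(iters):
        # Squared Euclidean distance from every sample to every centroid.
        dists = ((features[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = features[labels == j]
            if len(members) > 0:  # keep the old centroid if a cluster empties
                centroids[j] = members.mean(axis=0)
    return centroids


def cadain(dict_feat, input_feat, eps=1e-5):
    """Re-normalize a (C, H, W) dictionary feature to the input's
    per-channel mean and standard deviation (AdaIN-style)."""
    mu_d = dict_feat.mean(axis=(1, 2), keepdims=True)
    sd_d = dict_feat.std(axis=(1, 2), keepdims=True)
    mu_x = input_feat.mean(axis=(1, 2), keepdims=True)
    sd_x = input_feat.std(axis=(1, 2), keepdims=True)
    return sd_x * (dict_feat - mu_d) / (sd_d + eps) + mu_x


def match(dictionary, input_feat):
    """dictionary: (K, C, H, W). Returns (index, re-normalized entry) of the
    entry with the smallest Euclidean distance to the input (an assumption)."""
    renormed = np.stack([cadain(d, input_feat) for d in dictionary])
    dists = ((renormed - input_feat[None]) ** 2).sum(axis=(1, 2, 3))
    best = int(dists.argmin())
    return best, renormed[best]


# Toy usage: 200 random stand-ins for HQ component features, shape (C, H, W) = (4, 5, 5).
rng = np.random.default_rng(0)
hq_feats = rng.normal(size=(200, 4, 5, 5))
dictionary = kmeans(hq_feats.reshape(200, -1), k=8).reshape(8, 4, 5, 5)
lq_feat = rng.normal(size=(4, 5, 5))
idx, guide = match(dictionary, lq_feat)
```

Because every dictionary entry is re-normalized to the input's channel statistics before the distance is computed, an entry that differs from the input only by a per-channel scale and shift (a crude stand-in for illumination change) still matches.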
Extensive experiments are conducted to evaluate the performance of our proposed DFDNet. Quantitative and qualitative results show the benefits of the deep multi-scale face dictionaries used in our method. Moreover, DFDNet can also generate plausible and promising results on real LQ images. Without requiring an identity-belonging HQ reference, our method is flexible and practical in most face restoration applications.
To sum up, the main contributions of this work are:
2 Related Work
2.1 Facial Age Estimation
In recent years, with the rapid development of convolutional neural networks (CNNs) in computer vision tasks such as facial landmark detection, face recognition, pedestrian attribute recognition, and semantic segmentation, deep learning methods have also improved the performance of age estimation. Here we briefly review some representative works in the facial age estimation field. One approach regarded facial age estimation as a classification problem and predicted the age as the expectation of ages weighted by classification probability. Another proposed an age group classification method called the age group-n-encoding method. However, these classification methods ignored the adjacent relationship between classes or groups. To overcome this, Niu et al. proposed a multiple-output CNN learning algorithm that takes account of the ordinal information of ages for estimation. Shen et al. proposed Deep Regression Forests by extending differentiable decision trees to deal with regression. Furthermore, Li et al. proposed BridgeNet, which consists of local regressors and gating networks, to effectively explore the continuous relationship between age labels. Tan et al. proposed a complex Deep Hybrid Aligned Architecture (DHAA) that consists of global, local, and global-local branches and jointly optimized the architecture with complementary information. Besides, Xie et al. proposed two ensemble learning methods, both of which utilize ordinal regression modeling for age estimation.
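The idea of predicting age as "the expectation of ages weighted by classification probability" can be illustrated with a few lines of Python. This is a minimal sketch, assuming one class per integer age and a softmax over the classifier's logits; the function names are illustrative and not taken from any cited work.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def expected_age(logits, ages):
    """Age estimate as the probability-weighted mean of the class ages."""
    probs = softmax(logits)
    return sum(p * a for p, a in zip(probs, ages))


# Toy usage: one class per integer age from 0 to 100.
ages = list(range(0, 101))
logits = [0.0] * 101
logits[30] = 4.0  # the classifier favors age 30 ...
logits[32] = 4.0  # ... and age 32 equally
estimate = expected_age(logits, ages)
```

Unlike taking the arg-max class, the expectation yields a continuous value that lies between competing classes, although the classification loss itself still treats neighboring ages as unrelated.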
2.2 Distribution Learning
Distribution learning is a learning method proposed to solve the problem of label ambiguity, and it has been utilized in a number of recognition tasks, such as head pose estimation and age estimation. Geng et al. proposed two adaptive label distribution learning (ALDL) algorithms, i.e., IIS-ALDL and BFGS-ALDL, to iteratively learn the estimation function parameters and the variance of the label distributions. Though ALDL used adaptive variance learning, our proposed method differs in three ways. First, ALDL utilized traditional optimization methods such as BFGS, while ours uses deep learning and a CNN. Second, ALDL chose better samples in the current training iteration to estimate the new variance, while our method uses meta-learning to obtain an adaptive variance. Third, ALDL updated the variance only by estimating on the training samples, which may cause overfitting, whereas our adaptive variance is supervised by a validation set and is therefore more general. Label distribution learning has also been used to remedy the shortage of training data with exact ages. Hou et al. proposed a semi-supervised adaptive label distribution learning method. It used unlabeled data to enhance the label distribution adaptation and find a proper variance for each age. However, aging tendencies vary, and the variances of people at the same age could differ. Gao et al. jointly used LDL and expectation regression to alleviate the inconsistency between training and testing. Moreover, Pan et al. proposed a mean-variance loss for robust age estimation. Li et al. proposed label distribution refinery to adaptively estimate the age distributions without assumptions about the form of the label distribution, but it barely took into account the correlation of adjacent ages. In contrast, our method uses a Gaussian label distribution with adaptively meta-learned variance, which pays more attention to neighboring ages and ordinal information.
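A Gaussian label distribution of the kind mentioned above can be written down directly. The sketch below is illustrative only: it assumes integer age classes, a discretized Gaussian renormalized to sum to one, and a KL-divergence training loss, which is a common choice in label distribution learning but is not stated in the text. The variance is passed in by hand here; how it is learned is outside this sketch.

```python
import math


def gaussian_label_distribution(true_age, sigma, ages):
    """Discretized Gaussian centered on true_age, normalized to sum to 1."""
    weights = [math.exp(-((a - true_age) ** 2) / (2.0 * sigma ** 2)) for a in ages]
    total = sum(weights)
    return [w / total for w in weights]


def kl_divergence(target, predicted, eps=1e-12):
    """KL(target || predicted); zero-probability target bins contribute nothing."""
    return sum(
        t * math.log(t / max(q, eps))
        for t, q in zip(target, predicted)
        if t > 0
    )


# Toy usage: the same true age with a narrow and a wide variance.
ages = list(range(0, 101))
narrow = gaussian_label_distribution(30, 2.0, ages)  # mass concentrated near 30
wide = gaussian_label_distribution(30, 5.0, ages)    # more mass on neighboring ages
loss = kl_divergence(narrow, wide)
```

A larger variance spreads probability onto neighboring ages, which is how such a target encodes the ordinal relationship between ages; choosing that variance per sample is the part an adaptive method would learn.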
2.3 Meta-learning
Our proposed AVDL is an instantiation of meta-learning, i.e., learning to learn. According to the type of meta-data leveraged, this concept can be classified into several types, including transferring knowledge from empirically similar tasks, transferring trained model parameters between tasks, building meta-models to learn data characteristics, and learning purely from model evaluations. Model-Agnostic Meta-Learning (MAML) learns a model parameter initialization that performs better on target tasks. With the guidance of meta-information, MAML takes one gradient descent step on the meta-objective to update model parameters. The idea of using validation loss as the meta-objective was applied in few-shot learning. With reference to few-shot learning, Ren et al. proposed a reweighting method (L2RW) guided by a validation set. This method tries to solve the problem that data imbalance and label noise are both present in the training set. The crucial requirement of L2RW is a small, unbiased, clean validation set, which serves as the supervisor for learning sample weights. Since validation set performance measures the quality of hyper-parameters, taking it as the meta-objective can be applied not only to sample reweighting but also to any other online hyper-parameter adaptation task. Inspired by this, we propose AVDL, which incorporates validation-set-based meta-learning and label distribution learning to adaptively learn the label variance.
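The validation-guided reweighting idea can be made concrete on a one-parameter model. The sketch below is a simplified illustration in the spirit of L2RW, not a reproduction of it or of AVDL: it assumes a scalar linear model y = w * x with squared loss, and it uses the fact that, after one virtual gradient step, the derivative of the validation loss with respect to a sample's weight is proportional to the negative product of the validation gradient and that sample's training gradient. All function names are hypothetical.

```python
def grad(w, x, y):
    """Gradient of 0.5 * (w * x - y) ** 2 with respect to w."""
    return (w * x - y) * x


def l2rw_weights(w, train, val):
    """Weight each training sample by how well its gradient agrees with the
    mean validation gradient; negative agreement is clipped to zero.
    The (positive) learning-rate factor cancels in the normalization."""
    g_val = sum(grad(w, x, y) for x, y in val) / len(val)
    raw = [max(0.0, g_val * grad(w, x, y)) for x, y in train]
    total = sum(raw)
    return [r / total for r in raw] if total > 0 else raw


def weighted_step(w, train, weights, lr=0.1):
    """One gradient step on the weighted training loss."""
    return w - lr * sum(wt * grad(w, x, y) for wt, (x, y) in zip(weights, train))


# Toy usage: true relation y = 2x; the third training label is corrupted.
train = [(1.0, 2.0), (2.0, 4.0), (1.0, -2.0)]
val = [(1.0, 2.0), (3.0, 6.0)]  # small clean validation set
w = 0.0
weights = l2rw_weights(w, train, val)
w = weighted_step(w, train, weights)
```

In this toy case the corrupted sample pulls the parameter away from what the clean validation set prefers, so it receives zero weight. Sample weights are only one possible hyper-parameter; the same validation-as-meta-objective pattern could in principle supervise others, such as a label variance.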
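Abstract vocabulary:
component: noun: part, element, constituent; adjective: constituent
reference: noun: a source consulted for guidance; verb: to refer to
have received considerable attention: have attracted a great deal of interest
attention: noun: notice, focus, concern
capability: noun: ability, capacity, skill
applicable: adjective: suitable, able to be applied
degraded: reduced in quality
observations: things observed (here, the input images)
perceptually: in terms of perception
significant: adjective: important, major, meaningful
match: noun: a pairing, a contest, an equal; verb: to pair with, to correspond to, to rival
corresponding: adjective: matching, respective
transfer: verb: to pass on, to hand over, to move; noun: a move, a change of vehicle
is leveraged to: is used to
diversity: noun: variety, difference
illumination: noun: lighting
adaptively: in a way that adjusts to the situation
fuse: to merge, to blend
a progressive manner: a gradual, step-by-step way
coarse-to-fine: from rough to detailed
achieve: verb: to accomplish, to attain, to reach
plausible performance: reasonable performance
quantitative and qualitative evaluation: assessment by numbers and by quality
promising: adjective: showing potential, hopeful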
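Introduction vocabulary:
blind: noun: a blind person, a window blind; adjective: sightless, unseeing; verb: to deprive of sight, to dazzle
great practical value: highly useful in practice
benefited from: gained from, helped by
carefully designed architecture: a structure engineered with care
tend to: be inclined to
plausible and acceptable: reasonable and satisfactory
synthesize: verb: to combine, to produce artificially
intractable: hard to deal with
reference: noun: a source consulted for guidance; verb: to refer to
alleviate the dependency: reduce the reliance
frontal: adjective: of the forehead, at the front; noun: an altar-front cloth
suffer from two drawbacks: have two disadvantages
poses and expressions: postures and facial expressions
applicability: suitability for use
specific scenarios: particular situations
the aforementioned difficulties: the problems mentioned above
diversity: noun: variety, difference
match: noun: a pairing, a contest, an equal; verb: to pair with, to correspond to, to rival
achieve superior performance: attain better results
contributions: what a work adds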
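Related work vocabulary:
pedestrian attribute: a characteristic of a person on foot
semantic segmentation: dividing an image into regions labeled by meaning
representative: noun: delegate, member of a legislature; adjective: typical
estimation: noun: estimate, assessment, opinion
review: noun: critique, inspection; verb: to look back on, to revise, to re-examine
the adjacent relationship: the relation between neighbors
ordinal: adjective: relating to order in a series; noun: an ordinal number
differentiable: having a derivative
Furthermore: adverb: in addition, moreover, besides
local regressors and gating networks: regression models for local ranges and networks that act as gates
Hybrid Aligned: mixed and aligned
complementary: adjective: supplementary, additional
label ambiguity: uncertainty about the correct label
pose estimation: estimating posture or orientation
semi-supervised adaptive: adaptive and trained with partly labeled data
tendencies: inclinations, trends
expectation regression: regression on the expected value
alleviate the inconsistency: reduce the mismatch
assumptions: things taken as true without proof
the form of: the shape or type of
leveraged: put to use, exploited
empirically: based on experience or observation
characteristics: features, traits
guidance: noun: direction, leadership
Agnostic: not tied to a particular choice (in "model-agnostic": independent of the model)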