Let me start with a picture. Although I hadn't looked into the inpainting task before, producing results this good feels genuinely impressive; at the very least, the human eye cannot easily pick out the repaired regions.
In this paper, we propose a Pyramid-context ENcoder Network (PEN-Net) for image inpainting by deep generative models.
We propose a pyramid-context encoder, which progressively learns region affinity by attention from a high-level semantic feature map and transfers the learned attention to the previous low-level feature map.
We also propose a multi-scale decoder with deeply-supervised pyramid losses and an adversarial loss. Such a design not only results in fast convergence in training, but also in more realistic results in testing.
Contributions:
1、First, we adopt a U-Net structure as our backbone, which can encode the context from low-level pixels to high-level semantic features and decode the features back into an image. Specifically, we propose a Pyramid-context ENcoder Network (PEN-Net) with three tailored key components, i.e., a pyramid-context encoder, a multi-scale decoder, and an adversarial training loss, to boost the capacity of U-Net in image inpainting.
2、Second, once the compact latent features have been encoded from images, the pyramid-context encoder fills regions from high-level semantic features to low-level features (with richer details) in a pyramid pathway before decoding. To this end, we propose an Attention Transfer Network (ATN) to learn region affinity between patches inside/outside missing regions in a high-level feature map, and then transfer (i.e., weighted copy by affinity) relevant features from outside into inside regions of the previous feature map with higher resolution.
3、Third, the proposed multi-scale decoder takes as input the reconstructed features from ATNs through skip connections and the latent features for final decoding. The PEN-Net is optimized by minimizing deeply-supervised pyramid L1 losses and an adversarial loss (it seems like several of the papers I've read recently all use an adversarial loss~~~ I've had little exposure to losses so far, so I'll write a separate blog post introducing them; a rough sketch of these losses also follows right after this list).
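Since training minimizes deeply-supervised pyramid L1 losses plus an adversarial loss, here is a minimal PyTorch sketch of what such a combination could look like. The equal per-scale weighting, the plain BCE-based GAN terms, and the `discriminator` interface are my assumptions for illustration, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def pyramid_l1_loss(pyramid_outputs, target):
    """Deeply-supervised L1: every decoder scale is compared against a
    correspondingly resized ground-truth image (equal weights assumed)."""
    loss = 0.0
    for out in pyramid_outputs:  # coarse-to-fine reconstructions
        gt = F.interpolate(target, size=out.shape[-2:], mode='bilinear',
                           align_corners=False)
        loss = loss + F.l1_loss(out, gt)
    return loss

def adversarial_losses(discriminator, fake, real):
    """Plain non-saturating GAN terms; the paper's exact adversarial
    formulation may differ from this sketch."""
    d_real = discriminator(real)
    d_fake = discriminator(fake.detach())  # detach: don't update G here
    d_loss = (F.binary_cross_entropy_with_logits(d_real, torch.ones_like(d_real))
              + F.binary_cross_entropy_with_logits(d_fake, torch.zeros_like(d_fake)))
    g_out = discriminator(fake)
    g_loss = F.binary_cross_entropy_with_logits(g_out, torch.ones_like(g_out))
    return d_loss, g_loss
```

In a training step one would alternate: update the discriminator with `d_loss`, then update the generator with `pyramid_l1_loss(...) + g_loss` (the relative weights would need tuning).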
PEN-Net is built on a U-Net backbone. The observation is that low-level features carry richer texture details while high-level features carry more abstract semantics, so high-level features can guide the completion of low-level features layer by layer. The core of PEN-Net is to compute, via attention on a high-level feature map, the region affinity between damaged and undamaged regions, and to apply it to complete the features on the next lower-level feature map; the completed feature map then guides the completion of the missing regions one layer further down, all the way to the shallowest pixel layer. In this process the network performs feature completion several times at different levels. Finally, the decoder combines the completed features with the high-level semantic features to generate the final image, so that the result is not only semantically plausible but also has clearer, richer texture details. A toy sketch of this data flow is given below.
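To make this encode → cascaded-fill → decode pipeline concrete, here is a toy PyTorch sketch. Everything in it (the fixed channel width `C`, the layer shapes, the `NaiveFill` module) is a simplification I made up for illustration; in particular `NaiveFill` only blends in upsampled high-level features, whereas the real ATN copies context patches weighted by learned affinity (see the sketch further down).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

C = 64  # a single channel width everywhere, to keep the sketch small

def conv_down(cin, cout):
    # 4x4 stride-2 conv: one pyramid level per encoder block
    return nn.Sequential(nn.Conv2d(cin, cout, 4, 2, 1), nn.ReLU(inplace=True))

class NaiveFill(nn.Module):
    """Stand-in for an ATN: holes in the lower-level map are filled from
    the upsampled higher-level map."""
    def forward(self, high, low, mask):
        up = F.interpolate(high, size=low.shape[-2:], mode='nearest')
        m = F.interpolate(mask, size=low.shape[-2:], mode='nearest')
        return low * (1 - m) + up * m  # keep context, fill holes

class PENNetSketch(nn.Module):
    def __init__(self, depth=4):
        super().__init__()
        self.encs = nn.ModuleList(
            [conv_down(3, C)] + [conv_down(C, C) for _ in range(depth - 1)])
        self.fills = nn.ModuleList(NaiveFill() for _ in range(depth - 1))
        self.decs = nn.ModuleList(
            nn.Conv2d(2 * C, C, 3, 1, 1) for _ in range(depth - 1))
        self.to_img = nn.Conv2d(C, 3, 3, 1, 1)

    def forward(self, x, mask):
        feats, h = [], x
        for enc in self.encs:                     # shallow -> deep pyramid
            h = enc(h)
            feats.append(h)
        # Cascade fills from the deepest (most semantic) level downwards,
        # each filled map guiding the completion of the next shallower one.
        filled, skips = feats[-1], []
        for lvl in range(len(feats) - 2, -1, -1):
            filled = self.fills[lvl](high=filled, low=feats[lvl], mask=mask)
            skips.append(filled)
        # Decode from the latent feature, fusing a filled skip at each scale.
        h = feats[-1]
        for dec, skip in zip(self.decs, skips):
            h = F.interpolate(h, size=skip.shape[-2:], mode='nearest')
            h = F.relu(dec(torch.cat([h, skip], dim=1)))
        return torch.tanh(self.to_img(F.interpolate(h, scale_factor=2.0)))

# Smoke test: x is the corrupted image, mask is 1 inside the holes.
out = PENNetSketch()(torch.randn(1, 3, 256, 256), torch.zeros(1, 1, 256, 256))
print(out.shape)  # torch.Size([1, 3, 256, 256])
```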
As the compact latent features encode the semantics of the context, the pyramid-context encoder can further improve the encoding effectiveness by filling missing regions from the compact latent feature down to low-level features (with higher resolution and richer details). It fills holes by applying the proposed Attention Transfer Network (ATN) multiple times (according to the depth of the encoder) before decoding.
Specifically, an ATN learns region affinity between patches inside/outside missing regions from high-level semantic features, and the learned attention is transferred to fill regions (i.e., weighted copy from the context by affinity) in its previous feature map with higher resolution.
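To spell out that transfer step, the sketch below computes patch affinity on the high-level map and reuses the softmaxed weights to copy 2×-resolution context patches of the lower-level map into its holes. The cosine similarity, the 3×3 / 6×6 patch sizes, and the masking details are my assumptions; the paper describes the mechanism, not this exact code.

```python
import torch
import torch.nn.functional as F

def attention_transfer(high, low, mask, patch=3):
    """Illustrative attention transfer. `high`: (B,C,H,W) semantic map;
    `low`: (B,C,2H,2W) previous map; `mask`: (B,1,H,W), 1 inside holes
    (assumes at least one context location, else softmax yields NaN)."""
    # p x p patches of the high-level map, one per location: (B, N, C*p*p)
    hp = F.unfold(high, patch, padding=patch // 2).transpose(1, 2)
    hp = hp / (hp.norm(dim=-1, keepdim=True) + 1e-8)
    sim = hp @ hp.transpose(1, 2)                 # cosine affinity, (B, N, N)
    ctx = (1 - mask).flatten(1)                   # (B, N), 1 = context
    sim = sim.masked_fill(ctx.unsqueeze(1) < 0.5, float('-inf'))  # never copy from holes
    attn = sim.softmax(dim=-1)                    # rows sum to 1 over the context
    # Rebuild the aligned 2p x 2p patches of the lower-level map with the
    # same weights, then fold the overlapping patches back into a map.
    lp = F.unfold(low, 2 * patch, stride=2, padding=patch - 1).transpose(1, 2)
    out = F.fold((attn @ lp).transpose(1, 2), low.shape[-2:], 2 * patch,
                 stride=2, padding=patch - 1)
    ones = torch.ones_like(low)                   # average overlapping contributions
    norm = F.fold(F.unfold(ones, 2 * patch, stride=2, padding=patch - 1),
                  low.shape[-2:], 2 * patch, stride=2, padding=patch - 1)
    filled = out / norm.clamp(min=1.0)
    m_low = F.interpolate(mask, size=low.shape[-2:], mode='nearest')
    return low * (1 - m_low) + filled * m_low     # holes get transferred content

# Smoke test with random maps and a random hole mask.
f = attention_transfer(torch.randn(1, 64, 32, 32), torch.randn(1, 64, 64, 64),
                       (torch.rand(1, 1, 32, 32) > 0.7).float())
```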
Multi-scale information is further aggregated to refine the filled features by four groups of dilated convolutions with different rates in an ATN.
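One possible shape of this aggregation step, with assumed dilation rates (the paper specifies four groups with different rates; 1, 2, 4, 8 here is my guess):

```python
import torch
import torch.nn as nn

class MultiScaleAggregation(nn.Module):
    """Four parallel dilated-conv branches refine the filled features; each
    branch sees a different receptive field, so concatenating them mixes
    local detail with wider context before a final fusion conv.
    `channels` must be divisible by 4."""
    def __init__(self, channels, rates=(1, 2, 4, 8)):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Conv2d(channels, channels // 4, 3, padding=r, dilation=r)
            for r in rates)
        self.fuse = nn.Conv2d(channels, channels, 3, padding=1)

    def forward(self, x):
        y = torch.cat([b(x) for b in self.branches], dim=1)
        return self.fuse(torch.relu(y))

y = MultiScaleAggregation(64)(torch.randn(1, 64, 32, 32))  # shape preserved
```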
Finally, the multi-scale decoder takes as input the reconstructed features from ATNs through skip connections and the latent features for decoding.
In short, to improve the encoding effectiveness, the pyramid-context encoder fills missing regions before decoding: once a compact latent feature is learned, it fills regions from high-level semantic features down to low-level features (with higher resolution) by repeatedly applying the proposed ATNs in a pyramid fashion.