[paper] Image Restoration via Residual Non-local Attention Networks
[PyTorch] https://github.com/yulunzhang/RNAN
The writing in this paper is very good and worth reading closely.
The overall idea is to learn local and non-local attention at the same time. This is a very important idea.
[Non-Local Attention series]
Non-local neural networks
GCNet: Non-local Networks Meet Squeeze-Excitation Networks and Beyond [my CSDN]
Asymmetric Non-local Neural Networks for Semantic Segmentation [my CSDN]
Efficient Attention: Attention with Linear Complexities [my CSDN]
CCNet: Criss-Cross Attention for Semantic Segmentation [my CSDN]
Non-locally Enhanced Encoder-Decoder Network for Single Image De-raining [my CSDN]
Image Restoration via Residual Non-local Attention Networks [my CSDN]
Contents
[2019ICLR] Image Restoration via Residual Non-local Attention Networks
Abstract
Residual Non-Local Attention Network For Image Restoration
Framework
Residual (Non-)Local Attention Block
Abstract
In this paper, we propose a residual non-local attention network for high-quality image restoration.
What this paper is about.
Without considering the uneven distribution of information in the corrupted images, previous methods are restricted by local convolutional operation and equal treatment of spatial- and channel-wise features.
States the problem: previous methods ignore the uneven distribution of information in corrupted images, and are limited by local convolutional operations and by treating spatial-wise and channel-wise features equally.
To address this issue, we design local and non-local attention blocks to extract features that capture the long-range dependencies between pixels and pay more attention to the challenging parts.
A one-sentence summary of what is done and what it achieves.
Specifically, we design trunk branch and (non-)local mask branch in each (non-)local attention block. The trunk branch is used to extract hierarchical features. Local and non-local mask branches aim to adaptively rescale these hierarchical features with mixed attentions. The local mask branch concentrates on more local structures with convolutional operations, while non-local attention considers more about long-range dependencies in the whole feature map. Furthermore, we propose residual local and non-local attention learning to train the very deep network, which further enhances the representation ability of the network.
Details:
1) Trunk branch: extracts hierarchical features.
2) Local and non-local mask branches: adaptively rescale those hierarchical features with mixed attention.
Local mask branch: focuses on local structures via convolutional operations.
Non-local mask branch: focuses on long-range dependencies across the whole feature map.
3) Residual local and non-local attention learning: makes it possible to train a very deep network.
Our proposed method can be generalized for various image restoration applications, such as image denoising, demosaicing, compression artifacts reduction, and super-resolution.
Points out the range of applications of the network. (A plus.)
Experiments demonstrate that our method obtains comparable or better results compared with recently leading methods quantitatively and visually.
Conclusion.
Introduction
Related Work
(To be updated...)
Residual Non-Local Attention Network For Image Restoration
Framework
The first and last convolutional layers are shallow feature extractor and reconstruction layer respectively. We propose residual local and non-local attention blocks to extract hierarchical attention-aware features. In addition to making the main network learn degradation components, we further concentrate on more challenging areas by using local and non-local attention. We only incorporate residual non-local attention block in low-level and high-level feature space. This is mainly because a few non-local modules can well offer non-local ability to the network for image restoration.
In the overall framework, the first and last convolutional layers are the shallow feature extractor and the reconstruction layer, respectively.
Between them sits a series of local and non-local attention blocks.
Non-local attention blocks are placed only in the low-level and high-level feature space. The stated reason: a few non-local modules are enough to give the network non-local modeling ability for image restoration. (In practice, another reason is that non-local blocks are computationally expensive. The authors surely know this, but it is better left unsaid: first, it is a drawback; second, it is not the core problem this paper tackles, and bringing it up would clutter the paper's logic.)
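To make the framework concrete, here is a minimal PyTorch sketch as I read it. The class names (`RNANSketch`, `PlaceholderBlock`) and all hyper-parameters are my own placeholders, not the official RNAN code; the attention blocks are stubbed with plain residual blocks, and their real structure is detailed in the next section. The global skip that adds the input back is my reading of "making the main network learn degradation components".

```python
import torch
import torch.nn as nn

class PlaceholderBlock(nn.Module):
    """Stand-in for the residual (non-)local attention blocks detailed below."""
    def __init__(self, ch):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(ch, ch, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(ch, ch, 3, padding=1),
        )

    def forward(self, x):
        return x + self.body(x)

class RNANSketch(nn.Module):
    """Shallow feature extractor -> stack of attention blocks -> reconstruction layer."""
    def __init__(self, in_ch=3, feats=64, n_blocks=10):
        super().__init__()
        self.head = nn.Conv2d(in_ch, feats, 3, padding=1)  # shallow feature extractor
        # In the paper, non-local attention blocks sit only at the low-level (first)
        # and high-level (last) positions of this body; all blocks are stubbed here.
        self.body = nn.Sequential(*[PlaceholderBlock(feats) for _ in range(n_blocks)])
        self.tail = nn.Conv2d(feats, in_ch, 3, padding=1)  # reconstruction layer

    def forward(self, x):
        # Global skip: the trunk predicts the degradation component, which is added
        # back to the corrupted input (my reading of "learn degradation components").
        return x + self.tail(self.body(self.head(x)))
```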
To show the effectiveness of our RNAN, we choose to optimize the same loss function (e.g., L2 loss function) as previous works.
You can see how confident the authors are in their model: they use only an L2 loss.
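As a quick illustration of that choice, a minimal training-step sketch: `nn.MSELoss` is PyTorch's L2 loss, `RNANSketch` is the placeholder model from the sketch above, and the tensors are random stand-ins for a real restoration dataset.

```python
model = RNANSketch()
criterion = nn.MSELoss()             # PyTorch's L2 loss
noisy = torch.randn(4, 3, 48, 48)    # dummy corrupted batch
clean = torch.randn(4, 3, 48, 48)    # dummy ground truth
loss = criterion(model(noisy), clean)
loss.backward()
```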
The overall picture is that simple. Next, each component is described in detail.
Residual (Non-)Local Attention Block
Our residual non-local attention network is constructed by stacking several residual local and non-local attention blocks shown in Figure 2. Each attention block is divided into two parts: q residual blocks (RBs) in the beginning and end of the attention block, and two branches in the middle part: trunk branch and mask branch. For the non-local attention block, we incorporate a non-local block (NLB) in the mask branch, resulting in non-local attention. Then we give more details on those components.
The structure of a Residual (Non-)Local Attention Block is: several RBs + two branches + several RBs.
The two branches are the trunk branch and the mask branch.
Whether a block is local or non-local depends on whether the mask branch contains a non-local block (NLB).
Trunk branch:
As shown in Figure 2, the trunk branch includes t residual blocks (RBs). Different from the original residual block in ResNet (He et al., 2016), we adopt the simplified RB from (Lim et al., 2017). The simplified RB (labelled with blue dashed) only consists of two convolutional layers and one ReLU (Nair & Hinton, 2010), omitting unnecessary components, such as maxpooling and batch normalization (Ioffe & Szegedy, 2015) layers. We find that such simplified RB not only contributes to image super-resolution (Lim et al., 2017), but also helps to construct very deep network for other image restoration tasks.
The trunk branch is simply several simplified residual blocks.
"Simplified" means batch normalization is removed, leaving only convolutions and a ReLU. Such blocks not only help image super-resolution but also help build very deep networks for other image restoration tasks.
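A minimal sketch of such a simplified RB, assuming 3x3 convolutions and a default of 64 channels (both my own choices, not taken from the official code):

```python
import torch.nn as nn

class SimplifiedRB(nn.Module):
    """Simplified residual block: conv -> ReLU -> conv plus the identity skip.
    No batch normalization and no maxpooling layers."""
    def __init__(self, ch=64):
        super().__init__()
        self.conv1 = nn.Conv2d(ch, ch, 3, padding=1)
        self.relu = nn.ReLU(inplace=True)
        self.conv2 = nn.Conv2d(ch, ch, 3, padding=1)

    def forward(self, x):
        return x + self.conv2(self.relu(self.conv1(x)))
```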
Figure 2: Residual (non-)local attention block. It mainly consists of trunk branch (labelled with gray dashed) and mask branch (labelled with red dashed). The trunk branch consists of t RBs. The mask branch is used to learn mixed attention maps in channel- and spatial-wise manner simultaneously.
Mask branch:
The key point in the mask branch is how to grasp information of larger scope, namely larger receptive field size, so that it is possible to obtain a more sophisticated attention map. One possible solution is to perform maxpooling several times, as used in (Wang et al., 2017) for image classification. However, more pixel-level accurate results are desired in image restoration. Maxpooling would lose lots of details of the image, resulting in bad performance. To alleviate such drawbacks, we choose to use large-stride convolution and deconvolution to enlarge the receptive field size. Another way is considering non-local information across the whole inputs, which will be discussed in the next subsection.
The key issue for the mask branch is how to capture information over a larger scope, i.e., a larger receptive field, so as to obtain a more sophisticated attention map.
Image restoration needs pixel-level accuracy, so maxpooling would lose many image details and hurt performance.
Mask branch structure: (NLB) + several RBs + large-stride convolution + several RBs + large-stride deconvolution + several RBs + 1x1 convolution + sigmoid.
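Below is a hedged sketch of that mask branch. The non-local block follows the embedded-Gaussian form of the non-local neural networks paper, which I assume is what the NLB here refers to; the kernel sizes, strides, and block counts are illustrative guesses rather than the official configuration, and `SimplifiedRB` is reused from the sketch above.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class NonLocalBlock2D(nn.Module):
    """Embedded-Gaussian non-local block: every spatial position attends
    to every other position in the feature map."""
    def __init__(self, ch):
        super().__init__()
        inter = ch // 2
        self.theta = nn.Conv2d(ch, inter, 1)
        self.phi = nn.Conv2d(ch, inter, 1)
        self.g = nn.Conv2d(ch, inter, 1)
        self.out = nn.Conv2d(inter, ch, 1)

    def forward(self, x):
        b, c, h, w = x.shape
        q = self.theta(x).flatten(2).transpose(1, 2)   # B x HW x C'
        k = self.phi(x).flatten(2)                     # B x C' x HW
        v = self.g(x).flatten(2).transpose(1, 2)       # B x HW x C'
        attn = F.softmax(q @ k, dim=-1)                # B x HW x HW pairwise weights
        y = (attn @ v).transpose(1, 2).reshape(b, -1, h, w)
        return x + self.out(y)                         # residual connection

class MaskBranch(nn.Module):
    """(NLB) + RBs + large-stride conv + RBs + deconv + RBs + 1x1 conv + sigmoid."""
    def __init__(self, ch=64, non_local=True):
        super().__init__()
        layers = [NonLocalBlock2D(ch)] if non_local else []
        layers += [
            SimplifiedRB(ch),
            nn.Conv2d(ch, ch, 3, stride=2, padding=1),           # large-stride conv
            SimplifiedRB(ch),
            nn.ConvTranspose2d(ch, ch, 4, stride=2, padding=1),  # deconvolution
            SimplifiedRB(ch),
            nn.Conv2d(ch, ch, 1),                                # 1x1 conv
            nn.Sigmoid(),                                        # attention map in (0, 1)
        ]
        self.body = nn.Sequential(*layers)

    def forward(self, x):
        return self.body(x)  # assumes even spatial sizes so the deconv restores H x W
```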
Non-Local Mixed Attention:
With non-local and local attention computation, feature maps in the mask branch are finally mapped by sigmoid function.
What this subsection really says is that the mask branch can be read as: non-local (NLB) + local (large-stride convolution and deconvolution) + sigmoid. This is how non-local and local attention get mixed together.
Personally, I think this is the most important idea in the paper. I used a similar structure in my image inpainting experiments and found that it really works!
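Putting the pieces together, a sketch of the whole residual (non-)local attention block, continuing the classes defined above. The residual form `x + trunk(x) * mask(x)` and the block counts q and t are my reading of the paper's residual attention learning, not the official implementation.

```python
class ResidualAttentionBlock(nn.Module):
    """q RBs -> {trunk branch, mask branch} -> q RBs. The mask rescales the
    trunk output and the branch input is added back (my reading of the
    paper's residual (non-)local attention learning)."""
    def __init__(self, ch=64, q=2, t=2, non_local=False):
        super().__init__()
        self.pre = nn.Sequential(*[SimplifiedRB(ch) for _ in range(q)])
        self.trunk = nn.Sequential(*[SimplifiedRB(ch) for _ in range(t)])
        self.mask = MaskBranch(ch, non_local=non_local)
        self.post = nn.Sequential(*[SimplifiedRB(ch) for _ in range(q)])

    def forward(self, x):
        x = self.pre(x)
        x = x + self.trunk(x) * self.mask(x)  # mixed attention rescales trunk features
        return self.post(x)

# non_local=True gives the residual non-local attention block,
# non_local=False the purely local variant.
block = ResidualAttentionBlock(ch=64, non_local=True)
out = block(torch.randn(1, 64, 48, 48))
```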
The remaining content is not the core idea of the paper, so I will update it when I have time...
That's it for now.
(To be continued)