[paper] : https://arxiv.org/pdf/2002.12213.pdf
[github] : https://github.com/JWSoh/MZSR
Convolutional neural networks (CNNs) have shown dramatic improvements in single image super-resolution (SISR) by using large-scale external samples. Despite their remarkable performance based on the external dataset, they cannot exploit internal information within a specific image. Another problem is that they are applicable only to the specific condition of data on which they are supervised. For instance, the low-resolution (LR) image should be a "bicubic" downsampled noise-free image from a high-resolution (HR) one.
Problem statement: conventional supervised SISR methods
1. cannot exploit the internal information within a specific image;
2. only apply to the specific data condition under which they were supervised. For example, the low-resolution (LR) image is assumed to be a "bicubic" downsampled, noise-free version of the high-resolution (HR) image.
To address both issues, zero-shot super-resolution (ZSSR) has been proposed for flexible internal learning. However, they require thousands of gradient updates, i.e., long inference time.
Existing remedy and its problem: zero-shot super-resolution, which requires thousands of gradient updates, i.e., a long inference time.
In this paper, we present Meta-Transfer Learning for Zero-Shot Super-Resolution (MZSR), which leverages ZSSR. Precisely, it is based on finding a generic initial parameter that is suitable for internal learning. Thus, we can exploit both external and internal information, where one single gradient update can yield quite considerable results (see Figure 1). With our method, the network can quickly adapt to a given image condition. In this respect, our method can be applied to a large spectrum of image conditions within a fast adaptation process.
This paper's solution: Meta-Transfer Learning for ZSSR --> MZSR.
Precisely, it is based on finding a generic initial parameter that is suitable for internal learning.
Thus, both external and internal information are exploited, and a single gradient update can already yield considerable results (see Figure 1).
The network can quickly adapt to a given image condition, so the method applies to a wide range of image conditions with a fast adaptation process.
Figure 1: Super-resolved results (×2) of “img050” in Urban100 [14]. The blur kernel of the LR image is an isotropic Gaussian kernel with width 2.0. Result of (c) is fine-tuned from a pre-trained model. Our MZSR outperforms other methods within just one single gradient descent update.
From Figure 1 we can see the paper's main contribution: it greatly speeds up the adaptation of zero-shot SISR.
[14] CVPR 2015: Single image super-resolution from transformed self-exemplars
As usual, let us walk through the story via the introduction.
SISR, which is to find a plausible HR image from its counterpart LR image, is a long-standing problem in the low-level vision area. Recently, the remarkable success of CNNs brought attention to the research community, and hence numerous CNN-based SISR methods have exhibited a large performance leap [15, 17, 21, 47, 2, 45, 36, 20, 12, 13]. Most of the recent state-of-the-art (SotA) CNN-based methods are based on a large number of external training samples and self-supervised settings with a known degradation model, e.g., "bicubic" downsampling. Impressively, the recent SotA CNNs show significant PSNR gains compared to conventional large models for the noise-free "bicubic" downsampling condition. However, in real-world situations, when the LR image has distant statistics in downsampling kernels and noises, the recent methods produce undesirable artifacts and show inferior results due to the domain gap. Moreover, their number of parameters and memory overheads are usually too large to be used in real applications.
Paragraph 1 points out the shortcomings of conventional supervised deep-learning SISR: it is not suited to super-resolving real LR images, and the models are too heavy.
We can already infer that the proposed method targets real LR images while keeping parameter count and memory modest.
Besides, non-local self-similarity in scale and across multi-scale, which is the internal recurrence of information within a single image, is one of the strong natural image priors. Therefore, it has long been used in image restoration tasks, including image denoising [5, 6] and super-resolution [24, 14]. Additionally, the powerful image prior of the non-local property is embedded into network architectures [19, 22, 46] by implicitly learning such priors to further boost the performance of the networks. Also, some works to learn the internal distribution have been proposed [34, 32, 33]. Moreover, there have been many studies to combine the advantages of external and internal information for image restoration [26, 43, 42, 41].
Paragraph 2 introduces a strong natural image prior: non-local self-similarity within and across scales, i.e., information recurs inside a single image.
Prior work is cited to show the effectiveness of this prior for image restoration tasks:
1. denoising [5, 6] and super-resolution [24, 14];
2. network architectures that embed the non-local prior [19, 22, 46];
3. learning the internal distribution [34, 32, 33];
4. combining external and internal information [26, 43, 42, 41].
This paragraph matters because ZSSR is built on this prior.
Recently, ZSSR [34] has been proposed for zero-shot super-resolution, which is based on the zero-shot setting to exploit the power of CNNs but can be easily adapted to the test image condition. Interestingly, ZSSR learns the internal non-local structure of the test image, i.e., deep internal learning. Thus it outperforms external-based CNNs in some regions where the recurrences are salient. Also, ZSSR is highly flexible in that it can address any blur kernels, and is thus easily adapted to the conditions of test images.
Paragraph 3: this work is built on ZSSR, so its advantages are introduced first.
ZSSR is a zero-shot SISR method that lifts the restriction of training only under a single image condition.
The method can be applied flexibly to any blur kernel.
It is most effective in regions where recurring patterns are salient.
[34] CVPR 2018: "Zero-Shot" Super-Resolution using Deep Internal Learning
However, ZSSR has a few limitations. First, it requires thousands of backpropagation gradient updates at test time, which requires considerable time to get the result. Also, it cannot fully exploit the large-scale external dataset, and rather it depends only on internal structure and patterns, which lacks in the number of total examples. Eventually, this leads to inferior results in most of the regions with general patterns compared to the external-based methods.
Paragraph 4: since the paper improves on ZSSR, ZSSR must have weaknesses:
1. it requires thousands of backpropagation gradient updates at test time, which takes considerable time to produce a result;
2. it cannot fully exploit the large-scale external dataset and relies only on internal structure and patterns, which are limited in number of examples;
3. consequently, it is inferior to external-based methods in most regions with general patterns.
On the other hand, meta-learning, or learning to learn fast, has recently attracted many researchers. Meta-learning aims to address the problem that, unlike human intelligence, artificial intelligence struggles to learn new concepts quickly from a few examples. In this respect, meta-learning is closely tied to few-shot learning, and many methods with this approach have been proposed [35, 39, 38, 28, 25, 8, 10, 18, 37]. Among them, Model-Agnostic Meta-Learning (MAML) [8] has shown great impact, achieving SotA performance by learning an optimal initial state of the model such that the base-learner can quickly adapt to a new task within a few gradient steps. MAML employs the gradient update as the meta-learner, and the same authors showed that gradient descent can approximate any learning algorithm [9]. Moreover, Sun et al. [37] jointly utilized MAML with transfer learning to exploit large-scale data for few-shot learning.
Paragraph 5 introduces meta-learning. This paper combines meta-learning with ZSSR, so this background is needed.
Meta-learning aims to address the problem that artificial intelligence struggles to learn new concepts quickly from a few examples. It is therefore closely tied to few-shot learning; see the many works in [35, 39, 38, 28, 25, 8, 10, 18, 37].
Among them, MAML stands out, and it is the method this paper builds on.
I am not yet familiar with this topic, so if needed I should read [8] or the other works listed above.
[8] ICML 2017: Model-agnostic meta-learning for fast adaptation of deep networks
1. Inspired by the above-stated works and ZSSR, we present Meta-Transfer Learning for Zero-Shot Super-Resolution (MZSR), which is kernel-agnostic. We found that simply employing transfer learning or fine-tuning from a pre-trained network does not yield plausible results.
2. As ZSSR only has a meta-test step, we additionally adopt a meta-training step to make the model adapt fast to new blur kernel scenarios. Additionally, we adopt transfer learning in advance to fully utilize external samples, further leveraging the performance.
3. In particular, transfer learning with the help of a large-scale synthetic dataset (“bicubic” degradation setting) is first performed for the external learning of natural image priors.
Then, meta-learning plays a role in learning task-level knowledge with different downsampling kernels as different tasks.
At the meta-test step, simple self-supervised learning is conducted to learn image-specific information within a few gradient steps.
4. As a result, we can exploit both external and internal information.
Also, by leveraging the advantages of ZSSR, we may use a lightweight network, which is flexible to different degradation conditions of LR images.
Furthermore, our method is much faster than ZSSR, i.e., it quickly adapts to new tasks within a few gradient steps, while ZSSR requires thousands of updates.
Paragraph 6 finally describes this paper's work (a lot of background preceded it). It is long, so I break it into sub-parts to keep the ideas clear:
1. Statement of the idea: simply using transfer learning or fine-tuning from a pre-trained network does not yield plausible results.
2. Core strategy:
1) ZSSR only has a meta-test step; this paper adds a meta-training step so the model can quickly adapt to new blur-kernel scenarios;
2) transfer learning is adopted beforehand to fully exploit external samples, further improving performance.
3. Concrete procedure:
1) transfer learning is first performed on a large-scale synthetic dataset ("bicubic" degradation setting) to learn natural image priors externally;
2) then meta-learning learns task-level knowledge, treating different downsampling kernels as different tasks;
3) at the meta-test step, simple self-supervised learning is conducted to learn image-specific information within a few gradient steps; hence the method exploits both external and internal information.
4. Advantages of the method:
1) both external and internal information are exploited;
2) by leveraging ZSSR's advantages, the network can be lightweight;
3) it flexibly adapts to different degradation conditions of LR images;
4) it is much faster than ZSSR: it adapts to a new task within a few gradient steps, whereas ZSSR requires thousands of updates.
In summary, our overall contribution is three-fold:
• We present a novel training scheme based on meta-transfer learning, which learns an effective initial weight for fast adaptation to new tasks with the zero-shot unsupervised setting.
• By using external and internal samples, it is possible to leverage the advantages of both internal and external learning.
• Our method is fast, flexible, lightweight and unsupervised at meta-test time, hence, eventually can be applied to real-world scenarios.
The last paragraph summarizes the contributions.
SISR is based on the image degradation model

$$\mathbf{I}_{LR} = (\mathbf{I}_{HR} \otimes \mathbf{k})\downarrow_s + \mathbf{n}, \tag{1}$$

where $\mathbf{I}_{HR}$, $\mathbf{I}_{LR}$, $\mathbf{k}$, $\otimes$, $\downarrow_s$, and $\mathbf{n}$ denote the HR image, LR image, blur kernel, convolution, decimation with scaling factor $s$, and white Gaussian noise, respectively. It is notable that diverse degraded conditions can be found in real-world scenes, with various unknown $\mathbf{k}$, $s$, and $\mathbf{n}$.
This part simply establishes the paper's mathematical notation; a small code sketch of the degradation model follows.
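To make the notation concrete, below is a minimal sketch of the degradation model in Eq. (1), assuming PyTorch and a single-channel image tensor of shape (1, 1, H, W). The kernel handling, scale, and noise level are illustrative assumptions, not details taken from the paper.

```python
import torch
import torch.nn.functional as F

def degrade(hr, kernel, scale=2, noise_sigma=0.01):
    """I_LR = (I_HR convolved with k), decimated by s, plus white Gaussian noise n (Eq. 1)."""
    pad = kernel.shape[-1] // 2
    blurred = F.conv2d(hr, kernel, padding=pad)      # I_HR convolved with the blur kernel k
    lr = blurred[..., ::scale, ::scale]              # decimation with scaling factor s
    return lr + noise_sigma * torch.randn_like(lr)   # additive white Gaussian noise n
```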
In recent years, diverse meta-learning algorithms have been proposed. They can be categorized into three groups. The first group is metric-based methods [35, 38, 39], which learn a metric space in which learning is efficient with a few samples. The second group is memory network-based methods [31, 28, 25], where the network learns across-task knowledge and generalizes well to unseen tasks. The last group is optimization-based methods, where gradient descent plays the role of the meta-learner optimizer [10, 18, 9, 8]. Among them, MAML [8] has shown a great impact on the research community, and several variants have been proposed [27, 37, 3, 30]. MAML inherently requires second-order derivative terms, and a first-order algorithm has also been proposed in [27]. Also, to cope with the instability of MAML training, MAML++ [3] has been proposed. Moreover, MAML within an embedded space has been proposed [30]. In this paper, we employ the MAML scheme for fast adaptation of zero-shot super-resolution.
This section surveys meta-learning. It is new to me as well, so it is worth studying as background knowledge.
We introduce self-supervised zero-shot super-resolution and meta-learning schemes with notations, following related works [34, 8].
The paper introduces self-supervised zero-shot super-resolution and the meta-learning scheme, together with the notation.
- Zero-Shot Super-Resolution
ZSSR [34] is totally unsupervised or self-supervised. The two phases of training and test are both held at runtime. In the training phase, the test image $\mathbf{I}_{LR}$ is down-sampled with the desired kernel to generate the "LR son", denoted as $\mathbf{I}_{son}$, and $\mathbf{I}_{LR}$ becomes the HR supervision, the "HR father." Then, the CNN is trained with the LR-HR pairs generated from this single image. The training depends solely on the test image, so the network learns internal information specific to the given image statistics. In the test phase, the trained CNN works as a feed-forward network, and the test input image $\mathbf{I}_{LR}$ is fed to the CNN to get the super-resolved image $\mathbf{I}^{SR}$.
This chapter details the two foundations of the proposed algorithm (the paper's work is essentially their combination with small modifications): ZSSR and meta-learning.
ZSSR:
The core idea of ZSSR is to downsample the given LR image $\mathbf{I}_{LR}$ to obtain its "LR son" $\mathbf{I}_{son}$; $\mathbf{I}_{LR}$ then acts as the father, i.e., the ground truth of $\mathbf{I}_{son}$. The model trained on this single pair is applied directly to the original LR image $\mathbf{I}_{LR}$ to obtain $\mathbf{I}^{SR}$ (a code sketch follows).
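Here is a rough sketch of this ZSSR-style internal learning, assuming the `degrade` helper from the earlier snippet, a single-channel test image `I_LR` of shape (1, 1, H, W), and a small fully convolutional model `net` that takes a bicubically upscaled input. The step count and learning rate are illustrative, not ZSSR's actual schedule (which also uses augmentation and a learning-rate policy).

```python
import torch
import torch.nn.functional as F

def zssr(net, I_LR, kernel, scale=2, steps=3000, lr=1e-4):
    """Self-supervised training on a single image, then feed-forward inference."""
    opt = torch.optim.Adam(net.parameters(), lr=lr)
    for _ in range(steps):                        # ZSSR needs thousands of updates
        I_son = degrade(I_LR, kernel, scale)      # "LR son" generated from the test image itself
        pred = net(F.interpolate(I_son, scale_factor=scale, mode="bicubic"))
        loss = F.l1_loss(pred, I_LR)              # I_LR acts as the "HR father" supervision
        opt.zero_grad(); loss.backward(); opt.step()
    with torch.no_grad():                         # test phase: super-resolve the given LR image
        return net(F.interpolate(I_LR, scale_factor=scale, mode="bicubic"))
```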
- Meta-Learning
Meta-learning has two phases: meta-training and meta-test. We consider a model $f_\theta$, parameterized by $\theta$, that maps inputs $\mathbf{x}$ to outputs $\mathbf{y}$. The goal of meta-training is to enable the model to adapt to a large number of different tasks. A task $\mathcal{T}_i$ is sampled from a task distribution $p(\mathcal{T})$ for meta-training. Within a task, training samples are used to optimize the base-learner with a task-specific loss, and test samples are used to optimize the meta-learner.
In the meta-test phase, the model quickly adapts to a new task with the help of the meta-learner. MAML [8] employs a simple gradient descent algorithm as the meta-learner and seeks to find an initial transferable point from which a few gradient updates lead to fast adaptation of the model to a new task.
In our case, the input and the output are $\mathbf{I}_{LR}$ and $\mathbf{I}_{HR}$. Also, diverse blur kernels constitute the task distribution, where each task corresponds to the super-resolution of an image degraded by a specific blur kernel.
Meta-Learning:
Meta-learning has two phases: meta-training and meta-test.
Consider a model $f_\theta$ parameterized by $\theta$ that maps inputs $\mathbf{x}$ to outputs $\mathbf{y}$. The goal of meta-training is to make the model able to adapt to a large number of different tasks. A task $\mathcal{T}_i$ is sampled from the task distribution $p(\mathcal{T})$ for meta-training. Within a task, the training samples optimize the base-learner with the task-specific loss $\mathcal{L}_{\mathcal{T}_i}$, and the test samples optimize the meta-learner.
In the meta-test phase, with the help of the meta-learner, the model $f_\theta$ quickly adapts to a new task. MAML [8] uses simple gradient descent as the meta-learner and seeks an initial transferable point from which a few gradient updates yield fast adaptation to a new task (a generic sketch follows).
In this paper's case, the input and output are $\mathbf{I}_{LR}$ and $\mathbf{I}_{HR}$. Different blur kernels constitute the task distribution, and each task corresponds to super-resolving an image degraded by a specific blur kernel.
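As a rough illustration of the bi-level MAML structure described above (not the paper's implementation), one generic meta-update might look like the sketch below. It assumes PyTorch 2.x (`torch.func.functional_call`), a toy regression task format `(x_tr, y_tr, x_te, y_te)`, and illustrative learning rates.

```python
import torch
import torch.nn.functional as F
from torch.func import functional_call

def maml_step(model, tasks, alpha=0.01, beta=1e-3):
    """tasks: iterable of (x_tr, y_tr, x_te, y_te) tensors, one tuple per task T_i."""
    meta_opt = torch.optim.SGD(model.parameters(), lr=beta)
    params = dict(model.named_parameters())
    meta_loss = 0.0
    for x_tr, y_tr, x_te, y_te in tasks:
        # base-learner: one inner gradient step on the task-level training loss
        tr_loss = F.mse_loss(functional_call(model, params, (x_tr,)), y_tr)
        grads = torch.autograd.grad(tr_loss, params.values(), create_graph=True)
        adapted = {n: p - alpha * g for (n, p), g in zip(params.items(), grads)}
        # meta-learner objective: task-level test loss under the adapted parameters
        meta_loss = meta_loss + F.mse_loss(functional_call(model, adapted, (x_te,)), y_te)
    meta_opt.zero_grad(); meta_loss.backward(); meta_opt.step()  # update the shared initialization
```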
Figure 2: The overall scheme of our proposed MZSR. During meta-transfer learning, the external dataset is used, whereas internal learning is done during meta-test time. From a random initial point $\theta_0$, the large-scale dataset DIV2K [1] with "bicubic" degradation is exploited to obtain $\theta_T$. Then, meta-transfer learning learns a good representation $\theta_M$ for super-resolution tasks with diverse blur kernel scenarios. The figure shows N tasks for simplicity. In the meta-test phase, self-supervision within a test image is exploited to train the model with the corresponding blur kernel.
The overall scheme of our proposed MZSR is shown in Figure 2. As shown, our method consists of three steps: large-scale training, meta-transfer learning, and meta-test.
The method consists of three steps: large-scale training, meta-transfer learning, and meta-test.
This step is similar to the large-scale ImageNet [7] pre-training for object recognition. In our case, we adopt DIV2K [1], a high-quality dataset $\mathcal{D}_{HR}$. Using the known "bicubic" degradation, we first synthesize a large number of paired samples $(\mathbf{I}_{HR}, \mathbf{I}_{LR})$, denoted as $\mathcal{D}$. Then, we train the network to learn super-resolution under the "bicubic" degradation model by minimizing the loss

$$\mathcal{L}^{\mathcal{D}}(\theta) = \mathbb{E}_{\mathcal{D}}\big[\,\lVert \mathbf{I}_{HR} - f_\theta(\mathbf{I}_{LR}) \rVert_1\,\big], \tag{2}$$

which is the pixel-wise L1 loss [21, 34] between the prediction and the ground truth.
The large-scale training contributes in two respects. First, as super-resolution tasks share similar properties, it is possible to learn efficient representations that implicitly capture the natural image priors of high-resolution images, which makes the network easier to train. Second, as MAML [8] is known to exhibit somewhat unstable training, we ease the meta-learning phase with the help of well pre-trained feature representations.
Large-scale training is simply pre-training on an existing paired dataset: the dataset is DIV2K [1], and the loss is the L1 loss.
Large-scale training helps in two respects:
first, since super-resolution tasks share similar properties, the network can learn efficient representations that implicitly encode natural image priors of HR images, making it easier to train;
second, since MAML (Model-Agnostic Meta-Learning [8]) is known to be somewhat unstable to train, well pre-trained feature representations ease the meta-learning stage (a pre-training sketch follows).
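A minimal sketch of this large-scale pre-training stage might look as follows. The data loader `bicubic_loader` (yielding bicubically degraded pairs synthesized from DIV2K), the convention of feeding a bicubically upscaled input to the network, and all hyperparameters are assumptions for illustration.

```python
import torch
import torch.nn.functional as F

def large_scale_training(net, bicubic_loader, epochs=10, lr=1e-4, scale=2):
    """Pre-train on "bicubic" pairs (I_LR, I_HR) with the pixel-wise L1 loss of Eq. (2)."""
    opt = torch.optim.Adam(net.parameters(), lr=lr)
    for _ in range(epochs):
        for I_LR, I_HR in bicubic_loader:
            pred = net(F.interpolate(I_LR, scale_factor=scale, mode="bicubic"))
            loss = F.l1_loss(pred, I_HR)          # pixel-wise L1 loss between prediction and GT
            opt.zero_grad(); loss.backward(); opt.step()
    return net   # theta_T: the initialization handed to meta-transfer learning
```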
Since ZSSR is trained with the gradient descent algorithm, it is possible to introduce an optimization-based meta-training step with the help of the gradient descent algorithm, which is proven to be a universal learning algorithm [9].
In this step, we seek to find a sensitive and transferable initial point of the parameter space where a few gradient updates lead to large performance improvements. Inspired by MAML, our algorithm mostly follows MAML but with several modifications.
Unlike MAML, we adopt different settings for meta-training and meta-test. In particular, we use the external dataset for meta-training, whereas internal learning is adopted for meta-test. This is because we intend our meta-learner to focus more on the kernel-agnostic property with the help of a large-scale external dataset.
These three paragraphs explain the underlying motivation of the proposed meta-training.
Since ZSSR is trained with gradient descent, an optimization-based meta-training step can be introduced on top of it; gradient descent has been shown to be a universal learning algorithm [9].
In this step, the goal is to find a sensitive and transferable initial point in parameter space where a few gradient updates lead to large performance improvements. Inspired by MAML, the algorithm mostly follows MAML with several modifications.
Unlike MAML, different settings are used for meta-training and meta-test: the external dataset is used for meta-training, while internal learning is used for meta-test. The underlying goal is to let the meta-learner focus more on the kernel-agnostic property with the help of a large-scale external dataset.
We synthesize a dataset for meta-transfer learning, denoted as $\mathcal{D}_{meta}$. $\mathcal{D}_{meta}$ consists of pairs $(\mathbf{I}_{HR}, \mathbf{I}_{LR})$ with diverse kernel settings. Specifically, we use isotropic and anisotropic Gaussian kernels as the blur kernels. We consider a kernel distribution $p(\mathbf{k})$, where each kernel is determined by a covariance matrix $\Sigma$. It is chosen to have a random angle $\Theta$ and two random eigenvalues $\lambda_1, \lambda_2$ that depend on the scaling factor $s$. Precisely, the covariance matrix is expressed as

$$\Sigma = \begin{bmatrix} \cos\Theta & -\sin\Theta \\ \sin\Theta & \cos\Theta \end{bmatrix} \begin{bmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{bmatrix} \begin{bmatrix} \cos\Theta & \sin\Theta \\ -\sin\Theta & \cos\Theta \end{bmatrix}. \tag{3}$$
Eventually, we train our meta-learner based on $\mathcal{D}_{meta}$. We may divide $\mathcal{D}_{meta}$ into two groups: $\mathcal{D}_{tr}$ for task-level training, and $\mathcal{D}_{te}$ for task-level test.
This part explains how the dataset $\mathcal{D}_{meta}$ for meta-transfer learning is generated:
1. the blur kernels are isotropic and anisotropic Gaussian kernels;
2. for the kernel distribution, each kernel is determined by a covariance matrix as in Eq. (3), built from a random angle $\Theta$ and two random eigenvalues $\lambda_1, \lambda_2$ that depend on the scaling factor $s$;
3. $\mathcal{D}_{meta}$ is split into a task-level training set $\mathcal{D}_{tr}$ and a task-level test set $\mathcal{D}_{te}$ (a kernel-sampling sketch follows).
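For illustration, an anisotropic Gaussian kernel could be sampled from the covariance matrix of Eq. (3) roughly as below, assuming NumPy. The kernel size and the eigenvalue ranges are placeholder assumptions, not the paper's exact settings.

```python
import numpy as np

def random_gaussian_kernel(size=15, scale=2):
    """Sample a normalized (an)isotropic Gaussian blur kernel from Eq. (3)."""
    theta = np.random.uniform(0, np.pi)              # random rotation angle
    lam1 = np.random.uniform(1.0, 2.5 * scale)       # eigenvalues (assumed ranges)
    lam2 = np.random.uniform(1.0, lam1)
    R = np.array([[np.cos(theta), -np.sin(theta)],
                  [np.sin(theta),  np.cos(theta)]])
    Sigma = R @ np.diag([lam1, lam2]) @ R.T          # covariance matrix of Eq. (3)
    # evaluate the 2-D Gaussian density on a (size x size) grid centered at 0
    r = (size - 1) / 2
    y, x = np.mgrid[-r:r + 1, -r:r + 1]
    pos = np.stack([x, y], axis=-1)                  # (size, size, 2)
    inv = np.linalg.inv(Sigma)
    k = np.exp(-0.5 * np.einsum('...i,ij,...j->...', pos, inv, pos))
    return k / k.sum()                               # normalize to sum to 1
```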
In our method, adaptation to a new task $\mathcal{T}_i$ with respect to the parameters $\theta$ is one or more gradient descent updates. For one gradient update, the new adapted parameters $\theta_i$ are

$$\theta_i = \theta - \alpha \nabla_\theta \mathcal{L}^{tr}_{\mathcal{T}_i}(\theta), \tag{4}$$

where $\alpha$ is the task-level learning rate. The model parameters $\theta$ are optimized to achieve minimal test error of $\mathcal{D}_{meta}$ with respect to $\theta_i$. Concretely, the meta-objective is

$$\min_\theta \sum_{\mathcal{T}_i \sim p(\mathcal{T})} \mathcal{L}^{te}_{\mathcal{T}_i}(\theta_i) = \min_\theta \sum_{\mathcal{T}_i \sim p(\mathcal{T})} \mathcal{L}^{te}_{\mathcal{T}_i}\big(\theta - \alpha \nabla_\theta \mathcal{L}^{tr}_{\mathcal{T}_i}(\theta)\big). \tag{6}$$

Meta-transfer optimization is performed using Eq. (6), which learns knowledge across tasks. Any gradient-based optimization can be used for meta-transfer training. For stochastic gradient descent, the parameter update rule is expressed as

$$\theta \leftarrow \theta - \beta \nabla_\theta \sum_{\mathcal{T}_i \sim p(\mathcal{T})} \mathcal{L}^{te}_{\mathcal{T}_i}(\theta_i), \tag{7}$$

where $\beta$ is the meta-learning rate.
In this method, adapting to a new task $\mathcal{T}_i$ with respect to the parameters $\theta$ means one or more gradient descent updates. For one gradient update, the adapted parameters $\theta_i$ are given by Eq. (4).
The model parameters $\theta$ are optimized to minimize the test error with respect to $\theta_i$; concretely, the meta-objective is Eq. (6).
Meta-transfer optimization with Eq. (6) learns knowledge across tasks. Any gradient-based optimizer can be used; for stochastic gradient descent, the parameter update rule is Eq. (7) (a meta-transfer training sketch follows).
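Putting Eqs. (4), (6), and (7) together, one meta-transfer update over a batch of kernel tasks might be sketched as follows. It assumes PyTorch 2.x (`torch.func.functional_call`), the `degrade` and `random_gaussian_kernel` sketches above, single-channel HR patches per task, and illustrative hyperparameters; it is not the paper's implementation.

```python
import torch
import torch.nn.functional as F
from torch.func import functional_call

def upscale_and_forward(net, params, lr_img, scale):
    """Bicubically upscale the LR input, then run the network with the given parameters."""
    up = F.interpolate(lr_img, scale_factor=scale, mode="bicubic")
    return functional_call(net, params, (up,))

def meta_transfer_step(net, hr_tasks, scale=2, alpha=1e-2, beta=1e-4):
    """hr_tasks: iterable of (hr_train_patch, hr_test_patch); each task uses its own blur kernel."""
    params = dict(net.named_parameters())
    meta_opt = torch.optim.Adam(net.parameters(), lr=beta)
    meta_loss = 0.0
    for hr_tr, hr_te in hr_tasks:
        k = torch.as_tensor(random_gaussian_kernel(scale=scale), dtype=torch.float32)[None, None]
        lr_tr, lr_te = degrade(hr_tr, k, scale), degrade(hr_te, k, scale)
        # Eq. (4): one inner gradient step on the task-level training loss
        tr_loss = F.l1_loss(upscale_and_forward(net, params, lr_tr, scale), hr_tr)
        grads = torch.autograd.grad(tr_loss, params.values(), create_graph=True)
        adapted = {n: p - alpha * g for (n, p), g in zip(params.items(), grads)}
        # Eq. (6): accumulate the task-level test loss under the adapted parameters
        meta_loss = meta_loss + F.l1_loss(upscale_and_forward(net, adapted, lr_te, scale), hr_te)
    # Eq. (7): meta-update of the shared initialization theta
    meta_opt.zero_grad(); meta_loss.backward(); meta_opt.step()
```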
The meta-test step is exactly the zero-shot super-resolution. As evidenced in [34], this step enables our model to learn internal information within a single image. With a given LR image, we downsample it with the corresponding downsampling kernel (kernel estimation algorithms [24, 29] can be adopted for the blind scenario) to generate $\mathbf{I}_{son}$, and perform a few gradient updates with respect to the model parameters using the single pair of "LR son" and the given image. Then, we feed the given LR image to the model to get the super-resolved image.
The meta-test step is exactly zero-shot super-resolution. It lets the model learn internal information within a single image. For a given LR image, it is downsampled with the corresponding downsampling kernel (a kernel estimation algorithm [24, 29] can be used in the blind scenario) to generate $\mathbf{I}_{son}$, and a few gradient updates are applied to the model parameters using this "LR son" and the given image as a pair. The given LR image is then fed to the model to obtain the super-resolved image (a sketch follows).
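A minimal sketch of this meta-test adaptation, reusing the `degrade` helper assumed earlier and a network `net` initialized with the meta-transferred parameters; the number of steps and learning rate are illustrative. Unlike the ZSSR sketch above, only a few gradient updates are taken before inference.

```python
import torch
import torch.nn.functional as F

def meta_test(net, I_LR, kernel, scale=2, n_steps=1, lr=1e-2):
    """Few-step self-supervised adaptation on the test image, then feed-forward inference."""
    opt = torch.optim.SGD(net.parameters(), lr=lr)
    for _ in range(n_steps):                              # only a few gradient updates
        I_son = degrade(I_LR, kernel, scale)              # self-supervised pair from the test image
        pred = net(F.interpolate(I_son, scale_factor=scale, mode="bicubic"))
        loss = F.l1_loss(pred, I_LR)                      # I_LR supervises its own "LR son"
        opt.zero_grad(); loss.backward(); opt.step()
    with torch.no_grad():                                 # final super-resolved output
        return net(F.interpolate(I_LR, scale_factor=scale, mode="bicubic"))
```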
Algorithm 1 demonstrates the meta-transfer training procedure of Sections 4.1 and 4.2. Lines 3-7 are the large-scale training stage. Lines 11-14 are the inner loop of meta-transfer learning, where the base-learner is updated with the task-specific loss. Lines 15-16 present the meta-learner optimization.
Algorithm 2 presents the meta-test step, which is the zero-shot super-resolution. A few gradient updates (n) are performed during meta-test, and the super-resolved image is obtained with the final updated parameters.
Algorithm 1: meta-transfer training
Lines 3-7 are the large-scale training stage.
Lines 11-14 are the inner loop of meta-transfer learning.
Lines 15-16 show the meta-learner optimization.
Algorithm 2: meta-test, i.e., zero-shot super-resolution
A few gradient updates (n) are performed during meta-test, and the super-resolved image is obtained with the final updated parameters.
(The algorithm steps need further detailed study and analysis; the experiments section will be added later...)