Xiao, Lei & Heide, Felix & Heidrich, Wolfgang & Schölkopf, Bernhard & Hirsch, Michael. (2017). Discriminative Transfer Learning for General Image Restoration. IEEE Transactions on Image Processing. PP. 10.1109/TIP.2018.2831925.
This post is long; you can skip straight to the related work section.
Recently, several discriminative learning approaches have been proposed for effective image restoration, achieving a convincing trade-off between image quality and computational efficiency. However, these methods require separate training for each restoration task (e.g., denoising, deblurring, demosaicing) and problem condition (e.g., noise level of input images). This makes it time-consuming and difficult to encompass all tasks and conditions during training. In this paper, we propose a discriminative transfer learning method that incorporates formal proximal optimization and discriminative learning for general image restoration. The method requires only a single training pass and allows for reuse across various problems and conditions while achieving an efficiency comparable to previous discriminative approaches. Furthermore, after being trained, our model can be easily transferred to new likelihood terms to solve untrained tasks, or be combined with existing priors to further improve image restoration quality.
Low-level vision problems, such as denoising, deconvolution and demosaicing, have to be addressed as part of most imaging and vision systems. Although a large body of work covers these classical problems, low-level vision is still a very active area. The reason is that, from a Bayesian perspective, solving them as statistical estimation problems does not only rely on models for the likelihood (i.e. the reconstruction task), but also on natural image priors as a key component.
A variety of models for natural image statistics have been explored in the past. Traditionally, models for gradient statistics [27, 17], including total variation, have been a popular choice. Another line of work explores patch-based image statistics, either as per-patch sparse models [11, 35] or by modeling non-local similarity between patches [9, 10, 13]. These prior models are general in the sense that they can be applied to various likelihoods, with the image formation and noise setting as parameters. However, the resulting optimization problems are prohibitively expensive, rendering them impractical for many real-time tasks, especially on mobile platforms.
Recently, a number of works [29, 8] have addressed this issue by truncating the iterative optimization and learning discriminative image priors, tailored to a specific reconstruction task (likelihood) and optimization approach.
While these methods allow trading off quality against the computational budget for a given application, the learned models are highly specialized to the image formation model and noise parameters, in contrast to optimization-based approaches. Since each individual problem instantiation requires costly learning and storing of the model coefficients, current proposals for learned models are impractical for vision applications with dynamically changing (often continuous) parameters. This is a common scenario in most real-world vision settings, as well as in engineering and scientific imaging applications that rely on the ability to rapidly prototype methods.
In this paper, we combine discriminative learning techniques with formal proximal optimization methods to learn generic models that can be truly transferred across problem domains while achieving efficiency comparable to previous discriminative approaches. Using proximal optimization methods [12, 23, 3] allows us to decouple the likelihood and the prior, which is key to learning such shared models. It also means that we can rely on well-researched, physically motivated models for the likelihood, while learning priors from example data. We verify our technique using the same model for a variety of diverse low-level image reconstruction tasks and problem conditions, demonstrating the effectiveness and versatility of our approach. After training, our approach benefits from the proximal splitting techniques, and can be naturally transferred to new likelihood terms for untrained restoration tasks, or it can be combined with existing state-of-the-art priors to further improve the reconstruction quality. This is impossible with previous discriminative methods. In particular, we make the following contributions:
• We propose a discriminative transfer learning technique for general image restoration. It requires a single-pass training and transfers across different restoration tasks and problem conditions.
• We show that our approach is general by demonstrating its robustness for diverse low-level problems, such as denoising, deconvolution, inpainting, and for varying noise settings.
• We show that, while being general, our method achieves computational efficiency comparable to that of previous discriminative approaches, making it suitable for processing high-resolution images on mobile imaging systems.
• We show that our method can naturally be combined with existing likelihood terms and priors after being trained. This allows our method to process untrained restoration tasks and take advantage of previous successful work on image priors (e.g., color and non-local similarity priors).
Image restoration aims at computationally enhancing the quality of images by undoing the adverse effects of image degradation such as noise and blur. As a key area of image and signal processing it is an extremely well studied problem and a plethora of methods exists, see for example [22] for a recent survey. Through the successful application of machine learning and data-driven approaches, image restoration has seen revived interest and much progress in recent years. Broadly speaking, recently proposed methods can be grouped into three classes: classical approaches that make no explicit use of machine learning, generative approaches that aim at probabilistic models of undegraded natural images and discriminative approaches that try to learn a direct mapping from degraded to clean images. Unlike classical methods, methods belonging to the latter two classes depend on the availability of training data.
Classical models focus on local image statistics and aim at maintaining edges. Examples include total variation [27], bilateral filtering [32] and anisotropic diffusion models [34]. More recent methods exploit the non-local statistics of images [1, 9, 21, 10, 13, 31]. In particular the highly successful BM3D method [9] searches for similar patches within the same image and combines them through a collaborative filtering step.
Generative learning models seek to learn probabilistic models of undegraded natural images. A simple yet powerful subclass includes models that approximate the sparse gradient distribution of natural images [19, 17, 18]. More expressive generative models include the fields of experts (FoE) model [26], KSVD [11] and the EPLL model [35]. While both FoE and KSVD learn a set of filters whose responses are assumed to be sparse, EPLL models natural images through Gaussian Mixture Models. All of these models have in common that they are agnostic to the image restoration task, i.e. they are transferable to any image degradation and can be combined in a modular fashion with any likelihood and additional priors at test time.
Discriminative learning models have recently become increasingly popular for image restoration due to their attractive trade-off between high image restoration quality and efficiency at test time. Methods include trainable random field models such as cascaded shrinkage fields (CSF) [29], regression tree fields (RTF) [16], trainable nonlinear reaction diffusion (TRD) models [8], as well as deep convolutional networks [15] and other multi-layer perceptrons [4]. Discriminative approaches owe their computational efficiency at run-time to a particular feed-forward structure whose trainable parameters are optimized for a particular task during training. Those learned parameters are then kept fixed at test time, resulting in a fixed computational cost. On the downside, discriminative models do not generalize across tasks and typically necessitate separate feed-forward architectures and separate training for each restoration task (denoising, demosaicing, deblurring, etc.) as well as every possible image degradation (noise level, Bayer pattern, blur kernel, etc.).
In this work, we propose the discriminative transfer learning technique that is able to combine the strengths of both generative and discriminative models: it maintains the flexibility of generative models, but at the same time enjoys the computational efficiency of discriminative models.
While in spirit our approach is akin to the recently proposed method of Rosenbaum and Weiss [25], who equipped the successful EPLL model with a discriminative prediction step, the key idea in our approach is to use proximal optimization techniques that allow the decoupling of likelihood and prior and therewith share the full advantages of a Bayesian generative modeling approach. Table 1 summarizes the properties of the most prominent state-of-the-art methods and puts our own proposed approach into perspective.
The seminal work of fields-of-experts (FoE) [26] generalizes the form of filter response based regularizers in the objective function given in Eq. 1. The vectors $b$ and $x$ represent the observed and latent (desired) image respectively, the matrix $A$ is the sensing operator, $F_i$ represents 2D convolution with filter $f_i$, and $\phi_i$ represents the penalty function on the corresponding filter responses $F_i x$. The positive scalar $\lambda$ controls the relative weight between the data fidelity (likelihood) and the regularization term.
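Eq. 1 itself did not survive in these notes. Reconstructed from the symbol definitions above, the FoE-style objective presumably has the form below. The factor $1/2$ and the filter count $N$ are assumptions, so check them against the paper.

```latex
\begin{equation}
\min_{x}\; \frac{\lambda}{2}\,\lVert b - A x\rVert_2^2 \;+\; \sum_{i=1}^{N} \phi_i(F_i x)
\tag{1}
\end{equation}
```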
While there are various types of restoration tasks (e.g., denoising, deblurring, demosaicing) and problem parameters (e.g., noise level of input images), each problem has its own sensing matrix A and optimal fidelity weight λ. For example, A is an identity matrix for denoising, a convolution operator for deblurring, a binary diagonal matrix for demosaicing, and a random matrix for compressive sensing [5]. λ depends on both the task and its parameters in order to produce the best quality results.
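To make the role of A concrete, here is a minimal sketch that builds a small dense sensing matrix for each of the four tasks on a 1-D toy signal. The function name `make_sensing_matrix`, the 3-tap blur kernel, the every-other-sample mask, and the Gaussian random matrix are all illustrative assumptions, not choices taken from the paper.

```python
import numpy as np

def make_sensing_matrix(task, n, rng=None):
    """Return a dense sensing matrix A for a 1-D toy signal of length n."""
    if task == "denoise":
        # Denoising: A is the identity.
        return np.eye(n)
    if task == "deblur":
        # Deblurring: A is a (circular) convolution with a blur kernel.
        kernel = (0.25, 0.5, 0.25)
        A = np.zeros((n, n))
        for row in range(n):
            for offset, weight in zip((-1, 0, 1), kernel):
                A[row, (row + offset) % n] = weight
        return A
    if task == "demosaic":
        # Demosaicing: A is a binary diagonal matrix (here: keep even samples).
        mask = (np.arange(n) % 2 == 0).astype(float)
        return np.diag(mask)
    if task == "compressive":
        # Compressive sensing: A is a random matrix with fewer rows than columns.
        rng = np.random.default_rng(0) if rng is None else rng
        return rng.standard_normal((n // 2, n))
    raise ValueError(f"unknown task: {task}")
```

In practice these operators are never stored as dense matrices for real images; the dense form is only meant to show that the four tasks differ in A alone.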
The state-of-the-art discriminative learning methods (CSF [29], TRD [8]) derive an end-to-end feed-forward model from Eq. 1 for each specific restoration task, and train this model to map the degraded input images directly to the output. These methods have demonstrated a good trade-off between quality and time efficiency. However, as an inherent problem of the discriminative learning procedure, they require separate training for each restoration task and problem condition. Given the diversity of data likelihoods in image restoration, this fundamental drawback of discriminative models makes it time-consuming and difficult to encompass all tasks and conditions during training.
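This section sets up the model and applies the HQS algorithm. The key point is the new splitting strategy, which gives rise to two important concepts: the prior proximal operator and the data proximal operator.
The paper's new splitting strategy: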
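The equations for this step are missing from these notes. The block below is a hedged reconstruction in standard half-quadratic splitting (HQS) form, chosen to be consistent with the operator descriptions that follow: a slack variable $z \approx x$ is introduced, and the $z$-update and $x$-update are alternated. The constant factors and the equation numbering are assumptions and should be checked against the paper.

```latex
\begin{align}
\min_{x,z}\;& \frac{\lambda}{2}\lVert b - A x\rVert_2^2
  + \frac{\rho}{2}\lVert z - x\rVert_2^2
  + \sum_{i=1}^{N}\phi_i(F_i z) \tag{2}\\
z^{t} &= \operatorname*{arg\,min}_{z}\;
  \frac{\rho^{t}}{2}\lVert z - x^{t-1}\rVert_2^2
  + \sum_{i=1}^{N}\phi_i(F_i z) \tag{3}\\
x^{t} &= \operatorname*{arg\,min}_{x}\;
  \frac{\lambda}{2}\lVert b - A x\rVert_2^2
  + \frac{\rho^{t}}{2}\lVert z^{t} - x\rVert_2^2 \tag{4}
\end{align}
```

In this form only the $x$-update contains $A$, $b$ and $\lambda$, while the $z$-update contains only the prior terms.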
We observed that, while the data proximal operator in Eq. 4 is task-dependent because both the sensing matrix A and fidelity weight λ are problem-specific as explained in Sec. 3.1, the prior proximal operator (i.e. the $z^t$-update step in Eq. 3) is independent of the original restoration tasks and problem conditions.
This leads to our main insight: Discriminative learned models can be made transferable by using them in place of the prior proximal operator, embedded in a proximal optimization algorithm. This allows us to generalize a single discriminative learned model to a very large class of problems, i.e. any linear inverse imaging problem, while simultaneously overcoming the need for problem-specific retraining. Moreover, it enables learning the task-dependent parameter λ in the data proximal operator for each problem in a single training pass, eliminating tedious hand-tuning at test time.
We also observed that, benefiting from our new splitting strategy, the prior proximal operator in Eq. 3 can be interpreted as a Gaussian denoiser on the intermediate image $x^{t-1}$, since the least-squares consensus term is equivalent to a Gaussian denoising term. This inspires us to utilize existing discriminative models that have been successfully used for denoising (e.g. CSF, TRD).
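To make the denoising interpretation concrete, here is a short worked comparison. It assumes the prior step has the usual HQS form with weight $\rho^t/2$ on the consensus term (the constant is an assumption). The symbols $y$, $\sigma$ and $p(z)$ are illustrative and do not come from the paper excerpt.

```latex
\begin{align}
\text{MAP Gaussian denoising of } y:\quad
  & \hat z = \operatorname*{arg\,min}_{z}\;
    \frac{1}{2\sigma^{2}}\lVert z - y\rVert_2^2 - \log p(z) \\
\text{prior proximal step:}\quad
  & z^{t} = \operatorname*{arg\,min}_{z}\;
    \frac{\rho^{t}}{2}\lVert z - x^{t-1}\rVert_2^2 + \sum_{i}\phi_i(F_i z)
\end{align}
```

The two problems coincide for $y = x^{t-1}$, $\sigma^{2} = 1/\rho^{t}$ and $-\log p(z) = \sum_i \phi_i(F_i z)$ up to an additive constant, so a larger $\rho^t$ corresponds to denoising at a lower assumed noise level.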
In fact, updating the prior proximal operator (i.e. updating $z^t$) requires only $x^{t-1}$ and $\rho^t$.
Analysis: the same prior proximal operator is always used to update $z$ from $x$, while a different data proximal operator updates $x$ from $z$ for each task.
Recurrent network. Note that in Fig. 1 each HQS iteration uses exactly the same model parameters, forming a recurrent network. This is in contrast to previous discriminative learning methods including CSF and TRD, which form feed-forward networks. Our recurrent network architecture maintains the convergence property of the proximal optimization algorithm (HQS), and is critical for our method to transfer between various tasks and problem conditions.
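The structure described above can be sketched as a loop in which one shared prior step is reused in every iteration and only the data step is swapped per task. Everything in this sketch is an assumption made for illustration: the function names, the 1-D toy signal, the `rho` schedule, and above all `smoothing_prior_prox`, which is a hand-written stand-in and not the learned CSF/TRD-style model from the paper.

```python
import numpy as np

def smoothing_prior_prox(x, rho):
    """Stand-in prior step (NOT the learned model): blend x with a 3-tap
    circular average. A larger rho keeps the result closer to x."""
    smoothed = (np.roll(x, 1) + x + np.roll(x, -1)) / 3.0
    weight = rho / (1.0 + rho)
    return weight * x + (1.0 - weight) * smoothed

def masked_data_prox(lam, mask):
    """Data step for A = diag(mask): inpainting, or denoising if mask is all ones.
    Solves (lam * A^T A + rho * I) x = lam * A^T b + rho * z element-wise."""
    def prox(z, b, rho):
        return (lam * mask * b + rho * z) / (lam * mask + rho)
    return prox

def hqs_restore(b, data_prox, prior_prox, rhos):
    """Run one HQS iteration per entry of rhos, reusing the same prior_prox."""
    x = b.copy()
    for rho in rhos:
        z = prior_prox(x, rho)    # shared, task-independent step
        x = data_prox(z, b, rho)  # task-specific step
    return x

# Toy usage: inpaint a sine wave with roughly 30% of its samples missing.
rng = np.random.default_rng(0)
clean = np.sin(np.linspace(0.0, 2.0 * np.pi, 64, endpoint=False))
mask = (rng.random(64) > 0.3).astype(float)
observed = mask * clean
rhos = [1.0, 2.0, 4.0, 8.0]  # arbitrary placeholder schedule
restored = hqs_restore(observed, masked_data_prox(5.0, mask),
                       smoothing_prior_prox, rhos)
```

Switching tasks means passing a different `data_prox`; the `prior_prox` argument and its parameters stay untouched, which is the property the recurrent formulation relies on.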
Next, we look in detail at how the shared prior proximal operator is updated (i.e. how $z$ is updated). Note that within each iteration $t$, $z$ is updated over $k$ stages:
The prior proximal operator needs the model parameters $F$ and $\psi$ to update $z$; the data proximal operator needs the trained $\lambda$; and $\rho$ has to be chosen:
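Walking through the algorithm flowchart: in each iteration $t$, the shared prior proximal operator is updated first. The previous iterate $x^{t-1}$ is passed in as $z^t_0$, and Eq. 6 is then applied repeatedly to update $z$, yielding $z^t_k$. This requires training separate model parameters $F$ and $\psi$ for each stage $k$.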
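The stage-wise update of $z$ can be sketched as follows. Since Eq. 6 is not reproduced in these notes, the sketch assumes a TRD/CSF-style stage of the form $z_k = z_{k-1} - \sum_i F_i^{\top}\psi_i(F_i z_{k-1})$ with $z_0 = x^{t-1}$. Whether and how $\rho^t$ enters each stage is not shown here, so it is left out. The helper names, the 1-D circular filtering, and the example filter and $\psi$ are hypothetical placeholders, not trained parameters.

```python
import numpy as np

def apply_filter(x, f):
    """Circular 1-D correlation of x with a short filter f (role of F_i x)."""
    return sum(w * np.roll(x, -j) for j, w in enumerate(f))

def apply_filter_T(y, f):
    """Adjoint of apply_filter (role of F_i^T y)."""
    return sum(w * np.roll(y, j) for j, w in enumerate(f))

def prior_prox_stages(x_prev, stage_filters, stage_psis):
    """Run one pass of K stages; stage k has its own filters and psi functions."""
    z = x_prev.copy()  # z_0 = x^{t-1}
    for filters, psis in zip(stage_filters, stage_psis):
        update = np.zeros_like(z)
        for f, psi in zip(filters, psis):
            update += apply_filter_T(psi(apply_filter(z, f)), f)
        z = z - update
    return z

# Hypothetical parameters: two stages, one finite-difference filter per stage.
example_filters = [[np.array([-1.0, 1.0])], [np.array([-1.0, 1.0])]]
example_psis = [[lambda r: 0.1 * np.tanh(r)], [lambda r: 0.05 * np.tanh(r)]]
z_out = prior_prox_stages(np.linspace(0.0, 1.0, 16), example_filters, example_psis)
```

In the trained model the filters and the $\psi$ functions of every stage would be learned; the loop structure is the only part this sketch is meant to convey.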
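The newly obtained $z^t_k$ is then used in the data proximal operator update (Section 4 gives a different algorithm for each task) to produce the new $x^t$. This step uses $\lambda$ and $\rho$: $\lambda$ is learned, while $\rho$ is chosen by hand.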
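This section describes how the data proximal operator is updated for different tasks (i.e. step 10 of the algorithm flowchart, which updates $x^t$):
(1) Denoising: update of $x^t$
(2) Deconvolution: update of $x^t$ (this uses the FFT, presumably because $A$ becomes very large for large images, so direct matrix multiplication would be slow and memory-hungry.)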
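The update formulas for these two cases are not reproduced in these notes, so the sketch below derives them from the generic quadratic data step, whose minimizer satisfies $(\lambda A^{\top}A + \rho I)\,x = \lambda A^{\top}b + \rho z$. For denoising ($A = I$) this is an element-wise weighted average. For deconvolution, assuming circular boundary conditions (an assumption made here so that $A$ is diagonalized by the FFT), the system is solved per frequency. The function names are illustrative.

```python
import numpy as np

def psf_to_otf(kernel, shape):
    """Zero-pad a blur kernel to `shape`, shift its centre to index (0, 0),
    and return its 2-D FFT (the transfer function of circular convolution)."""
    padded = np.zeros(shape)
    kh, kw = kernel.shape
    padded[:kh, :kw] = kernel
    padded = np.roll(padded, (-(kh // 2), -(kw // 2)), axis=(0, 1))
    return np.fft.fft2(padded)

def blur_circular(x, otf):
    """Apply A x as a circular convolution via the FFT."""
    return np.real(np.fft.ifft2(np.fft.fft2(x) * otf))

def data_prox_denoise(z, b, lam, rho):
    """A = I: x = (lam * b + rho * z) / (lam + rho)."""
    return (lam * b + rho * z) / (lam + rho)

def data_prox_deconv(z, b, otf, lam, rho):
    """A = circular convolution: solve the normal equations per frequency."""
    numerator = lam * np.conj(otf) * np.fft.fft2(b) + rho * np.fft.fft2(z)
    denominator = lam * np.abs(otf) ** 2 + rho
    return np.real(np.fft.ifft2(numerator / denominator))
```

The FFT version costs a few transforms per update and never forms $A$ explicitly, which matches the motivation given in item (2). Real images do not have circular boundaries, so a practical implementation would need padding or another form of boundary handling.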