image compression 图像压缩
quantizer 量化器
rate–distortion performance 率失真性能
a variant of 什么什么的一个变体
construct 构造
entropy 熵
discrete value 离散值
We describe an image compression method, consisting of a nonlinear analysis transformation, a uniform quantizer, and a nonlinear synthesis transformation. The transforms are constructed in three successive stages of convolutional linear filters and nonlinear activation functions. Unlike most convolutional neural networks, the joint nonlinearity is chosen to implement a form of local gain control, inspired by those used to model biological neurons. Using a variant of stochastic gradient descent, we jointly optimize the entire model for rate–distortion performance over a database of training images, introducing a continuous proxy for the discontinuous loss function arising from the quantizer. Under certain conditions, the relaxed loss function may be interpreted as the log likelihood of a generative model, as implemented by a variational autoencoder. Unlike these models, however, the compression model must operate at any given point along the rate–distortion curve, as specified by a trade-off parameter. Across an independent set of test images, we find that the optimized method generally exhibits better rate–distortion performance than the standard JPEG and JPEG 2000 compression methods. More importantly, we observe a dramatic improvement in visual quality for all images at all bit rates, which is supported by objective quality estimates using MS-SSIM.
我们描述了一种图像压缩方法,包括非线性分析变换、均匀量化器和非线性合成变换。这些变换是在卷积线性滤波器和非线性激活函数的三个连续阶段中构造的。与大多数卷积神经网络不同,受用于模拟生物神经元的网络的启发,选择联合非线性来实现一种局部增益控制形式。使用随机梯度下降的变体,我们在训练图像数据库上联合优化整个模型的率失真性能,引入量化器产生的不连续损失函数的连续代理。在某些条件下,松弛损失函数可以解释为由变分自动编码器实现的生成模型的对数似然。然而,与这些模型不同的是,压缩模型必须在速率失真曲线上的任何给定点上运行,如权衡参数所指定。在一组独立的测试图像中,我们发现优化方法通常比标准 JPEG 和 JPEG 2000 压缩方法表现出更好的率失真性能。更重要的是,我们观察到所有比特率下所有图像的视觉质量都有显着改善,这得到了使用 MS-SSIM 的客观质量估计的支持
Data compression is a fundamental and well-studied problem in engineering, and is commonly formulated with the goal of designing codes for a given discrete data ensemble with minimal entropy (Shannon, 1948). The solution relies heavily on knowledge of the probabilistic structure of the data, and thus the problem is closely related to probabilistic source modeling. However, since all practical codes must have finite entropy, continuous-valued data (such as vectors of image pixel intensities) must be quantized to a finite set of discrete values, which introduces error. In this context, known as the lossy compression problem, one must trade off two competing costs: the entropy of the discretized representation (rate) and the error arising from the quantization (distortion). Different compression applications, such as data storage or transmission over limited-capacity channels, demand different rate–distortion trade-offs.
数据压缩是工程中一个基本且经过充分研究的问题,通常以为给定离散数据集合设计具有最小熵的代码为目标而制定(Shannon,1948)。该解决方案在很大程度上依赖于数据概率结构的知识,因此该问题与概率源建模密切相关。**然而,由于所有实际代码都必须具有有限的熵,因此连续值数据(例如图像像素强度的向量)必须量化为一组有限的离散值,这会引入误差。**在这种情况下,称为有损压缩问题,必须权衡两个相互竞争的成本:离散表示的熵(速率)和量化产生的误差(失真)。不同的压缩应用,例如数据存储或通过有限容量通道传输,需要不同的速率-失真权衡
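To make the rate–distortion trade-off concrete, here is a minimal sketch that quantizes a toy continuous source with several step sizes and measures the empirical entropy (rate) and mean squared error (distortion). The Gaussian source, the step sizes, and all function names are assumptions chosen for illustration; they do not come from the paper.

```python
import math
import random
from collections import Counter

def quantize(values, step):
    """Uniform scalar quantizer: map each value to an integer bin index."""
    return [round(v / step) for v in values]

def entropy_bits(symbols):
    """Empirical entropy (bits per symbol) of a discrete sequence."""
    counts = Counter(symbols)
    n = len(symbols)
    return -sum(c / n * math.log2(c / n) for c in counts.values())

def mse(a, b):
    """Mean squared error between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

random.seed(0)
data = [random.gauss(0.0, 1.0) for _ in range(10_000)]  # toy continuous source

results = []
for step in (0.25, 0.5, 1.0, 2.0):
    idx = quantize(data, step)          # discrete representation
    recon = [i * step for i in idx]     # dequantized values
    results.append((step, entropy_bits(idx), mse(data, recon)))

for step, rate, dist in results:
    print(f"step={step:<4} rate={rate:.2f} bits  distortion={dist:.4f}")
```

A coarser step lowers the entropy of the bin indices but raises the reconstruction error, which is the trade-off described above.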
Joint optimization of rate and distortion is difficult. Without further constraints, the general problem of optimal quantization in high-dimensional spaces is intractable (Gersho and Gray, 1992). For this reason, most existing image compression methods operate by linearly transforming the data vector into a suitable continuous-valued representation, quantizing its elements independently, and then encoding the resulting discrete representation using a lossless entropy code (Wintz, 1972; Netravali and Limb, 1980).
速率和失真的联合优化很困难。如果没有进一步的约束,高维空间中最优量化的一般问题是棘手的(Gersho 和 Gray,1992)。因此,大多数现有的图像压缩方法通过将数据向量线性变换为合适的连续值表示,独立量化其元素,然后使用无损熵代码对所得离散表示进行编码(Wintz,1972;Netravali 和 Limb, 1980)。
This scheme is called transform coding due to the central role of the transformation. For example, JPEG uses a discrete cosine transform on blocks of pixels, and JPEG 2000 uses a multi-scale orthogonal wavelet decomposition. Typically, the three components of transform coding methods – transform, quantizer, and entropy code – are separately optimized (often through manual parameter adjustment).
由于变换的核心作用,该方案被称为变换编码。例如,JPEG 对像素块使用离散余弦变换,而 JPEG 2000 使用多尺度正交小波分解。通常,变换编码方法的三个组成部分——变换、量化器和熵代码——是分别优化的(通常通过手动参数调整)。
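The transform coding scheme can be sketched in a few lines. This is a simplified 1-D illustration, not JPEG itself: it applies an orthonormal DCT to one block of 8 samples, quantizes each coefficient independently with a single step size, and inverts the process. The sample values and the step size are hypothetical, and the lossless entropy coding stage is omitted.

```python
import math

N = 8  # block length (JPEG works on 8x8 blocks; this sketch is 1-D)

def _scale(k):
    """Normalization that makes the DCT basis orthonormal."""
    return math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)

def dct(block):
    """Orthonormal DCT-II of a length-N block."""
    return [_scale(k) * sum(x * math.cos(math.pi * (n + 0.5) * k / N)
                            for n, x in enumerate(block))
            for k in range(N)]

def idct(coeffs):
    """Inverse of dct() (orthonormal DCT-III)."""
    return [sum(_scale(k) * c * math.cos(math.pi * (n + 0.5) * k / N)
                for k, c in enumerate(coeffs))
            for n in range(N)]

def encode(block, step):
    """Transform, then quantize each coefficient independently."""
    return [round(c / step) for c in dct(block)]

def decode(symbols, step):
    """Dequantize, then apply the inverse transform."""
    return idct([s * step for s in symbols])

block = [52, 55, 61, 66, 70, 61, 64, 73]  # hypothetical row of pixel values
symbols = encode(block, step=10.0)        # integers an entropy coder would store
recon = decode(symbols, step=10.0)        # lossy reconstruction of the block
```

Because the transform is orthonormal, the reconstruction error in pixel space has the same Euclidean norm as the quantization error in coefficient space.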
We have developed a framework for end-to-end optimization of an image compression model based on nonlinear transforms (figure 1). Previously, we demonstrated that a model consisting of linear–nonlinear block transformations, optimized for a measure of perceptual distortion, exhibited visually superior performance compared to a model optimized for mean squared error (MSE) (Ballé, Laparra, and Simoncelli, 2016). Here, we optimize for MSE, but use more flexible transforms built from cascades of linear convolutions and nonlinearities. Specifically, we use a generalized divisive normalization (GDN) joint nonlinearity that is inspired by models of neurons in biological visual systems, and has proven effective in Gaussianizing image densities (Ballé, Laparra, and Simoncelli, 2015). This cascaded transformation is followed by uniform scalar quantization (i.e., each element is rounded to the nearest integer), which effectively implements a parametric form of vector quantization on the original image space. The compressed image is reconstructed from these quantized values using an approximate parametric nonlinear inverse transform.
我们开发了一个基于非线性变换的图像压缩模型端到端优化框架(图 1)。之前,我们证明了由线性-非线性块变换组成的模型,针对感知失真的测量进行了优化,与针对均方误差(MSE)优化的模型(Ball ́e、Laparra 和 Simoncelli, 2016)。在这里,我们针对 MSE 进行优化,但使用由线性卷积和非线性级联构建的更灵活的变换。具体来说,我们使用广义除法归一化(GDN)联合非线性,其灵感来自生物视觉系统中的神经元模型,并已被证明在高斯化图像密度方面有效(Ball ́e、Laparra 和 Simoncelli,2015)。这种级联变换之后是均匀标量量化(即,每个元素都舍入到最接近的整数),这有效地在原始图像空间上实现了矢量量化的参数形式。使用近似参数非线性逆变换从这些量化值重建压缩图像。
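As an illustration of the local gain control that GDN provides, the sketch below applies the normalization across channels at a single spatial location. The form used, u_i = w_i / sqrt(beta_i + sum_j gamma_ij * w_j^2), is the GDN definition as I recall it from the paper and is not stated in this excerpt; the parameter values are hypothetical (in the paper they are learned).

```python
import math

def gdn(w, beta, gamma):
    """Apply GDN across channels at one spatial location.

    w     : list of channel responses w_i
    beta  : list of positive offsets beta_i
    gamma : matrix of non-negative weights gamma[i][j]
    """
    out = []
    for i, w_i in enumerate(w):
        # Pool the squared responses of all channels, weighted by gamma[i].
        pool = beta[i] + sum(gamma[i][j] * w_j ** 2 for j, w_j in enumerate(w))
        out.append(w_i / math.sqrt(pool))
    return out

# Hypothetical parameters for three channels.
beta = [1.0, 1.0, 1.0]
gamma = [[0.10, 0.05, 0.00],
         [0.05, 0.10, 0.05],
         [0.00, 0.05, 0.10]]

weak = gdn([1.0, -2.0, 0.5], beta, gamma)
strong = gdn([10.0, -20.0, 5.0], beta, gamma)  # same pattern, 10x the amplitude
```

Scaling the input by 10 scales the output by much less than 10, because each response is divided by a measure of the joint activity of its neighbors. The sign of each response is preserved.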
For any desired point along the rate–distortion curve, the parameters of both analysis and synthesis transforms are jointly optimized using stochastic gradient descent. To achieve this in the presence of quantization (which produces zero gradients almost everywhere), we use a proxy loss function based on a continuous relaxation of the probability model, replacing the quantization step with additive uniform noise. The relaxed rate–distortion optimization problem bears some resemblance to those used to fit generative image models, and in particular variational autoencoders (Kingma and Welling, 2014; Rezende, Mohamed, and Wierstra, 2014), but differs in the constraints we impose to ensure that it approximates the discrete problem all along the rate–distortion curve. Finally, rather than reporting differential or discrete entropy estimates, we implement an entropy code and report performance using actual bit rates, thus demonstrating the feasibility of our solution as a complete lossy compression method.
对于速率失真曲线上的任何所需点,使用随机梯度下降联合优化分析和综合变换的参数。为了在存在量化(几乎在任何地方产生零梯度)的情况下实现这一目标,我们使用基于概率模型的连续松弛的代理损失函数,用加性均匀噪声代替量化步骤。松弛率失真优化问题与用于拟合生成图像模型的问题有一些相似之处,特别是变分自动编码器(Kingma 和 Welling,2014 年;Rezende、Mohamed 和 Wierstra,2014 年),但不同之处在于我们为确保它近似于沿着率失真曲线的离散问题。最后,我们不是报告差分或离散熵估计,而是使用实际比特率实现熵代码并报告性能,从而证明了我们的解决方案作为完整有损压缩方法的可行性。
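The reason for replacing quantization with additive uniform noise can be seen in a toy scalar example. The sketch below compares a finite-difference gradient of a squared-error loss under true rounding with the same loss under additive noise drawn from U(-0.5, 0.5). It covers only the distortion term; the paper's relaxation also applies to the rate term, which is not modeled here. All names and values are assumptions for illustration.

```python
import random

def hard_loss(y, target):
    """Distortion after true quantization (piecewise constant in y)."""
    return (round(y) - target) ** 2

def relaxed_loss(y, target, noise):
    """Proxy: rounding replaced by additive noise, averaged over fixed samples."""
    return sum((y + u - target) ** 2 for u in noise) / len(noise)

def finite_diff(f, y, eps=1e-3):
    """Central finite-difference estimate of df/dy."""
    return (f(y + eps) - f(y - eps)) / (2 * eps)

random.seed(0)
noise = [random.uniform(-0.5, 0.5) for _ in range(1000)]  # u ~ U(-0.5, 0.5)

y, target = 2.2, 3.0
g_hard = finite_diff(lambda v: hard_loss(v, target), y)
g_relaxed = finite_diff(lambda v: relaxed_loss(v, target, noise), y)

print("gradient with rounding:", g_hard)
print("gradient with noise   :", g_relaxed)
```

With rounding, a small change in `y` does not change the loss, so gradient descent receives no signal. With noise, the gradient points toward the target.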
I can't keep reading... I really don't follow this at all.
Study reference: "End-to-end image compression: notes on 《End-to-end optimized image compression》", 叶笙箫's blog on CSDN
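The overall algorithm has three parts: a nonlinear analysis transform (the encoder), a uniform quantizer, and a nonlinear synthesis transform (the decoder).
$x$ and $\hat{x}$ denote the original input image and the image reconstructed after passing through the encoder and decoder, respectively.
$g_a$ denotes the nonlinear analysis transform provided by the encoder: passing the input image through the encoder network yields the latent features, which the quantizer $q$ then maps to the quantized result $\hat{y}$.
The decoder $g_S$ then reconstructs the image from $\hat{y}$.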
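The data flow described above (input, encoder, quantizer, decoder, reconstruction) can be sketched as follows. The transforms here are placeholders: a fixed pointwise nonlinearity and its inverse stand in for the learned convolutional networks, purely to show how the three parts connect. The function `g_s` corresponds to the decoder written as $g_S$ in these notes.

```python
import math

def g_a(x):
    """Placeholder analysis transform (encoder): a fixed pointwise nonlinearity."""
    return [math.copysign(math.sqrt(abs(v)), v) * 4.0 for v in x]

def q(y):
    """Uniform quantizer: round each latent value to the nearest integer."""
    return [round(v) for v in y]

def g_s(y_hat):
    """Placeholder synthesis transform (decoder): the inverse of g_a."""
    return [math.copysign((v / 4.0) ** 2, v) for v in y_hat]

def compress_and_reconstruct(x):
    y = g_a(x)          # latent features
    y_hat = q(y)        # quantized latents (what would be entropy coded)
    x_hat = g_s(y_hat)  # reconstructed signal
    return y_hat, x_hat

x = [0.0, 0.25, 1.0, -2.0]  # hypothetical input values
y_hat, x_hat = compress_and_reconstruct(x)
print(y_hat)  # [0, 2, 4, -6]
print(x_hat)  # [0.0, 0.25, 1.0, -2.25]
```

The last element shows where the loss comes from: its latent value falls between two integers, so rounding changes it and the reconstruction differs from the input.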