by Colt McAnlis

How JPG Works

The JPG file format, which arrived on the scene in 1992, was one of the most technologically impressive advancements in image compression. Since then, it’s been a dominant force in representing photo-quality images on the internet. And for good reason. Much of the technology behind how JPG works is exceptionally complex, and requires a firm understanding of how the human eye perceives colors and edges.

And since I’m into that kinda stuff (and you are too, if you’re reading this), I wanted to break down how JPG encoding works, so we can better understand how to make smaller JPG files.

THE GIST

The JPG compression scheme is broken down into several phases. The image below describes them at a high level, and we’ll walk through each phase in turn.

Colorspace Conversion

One of the key principles of lossy data compression is that human senses are not as accurate as computing systems. Scientifically, the human eye can only physically distinguish about 10 million different colors. However, lots of things can influence how the human eye perceives a color, as color illusions (or the fact that this dress broke the internet) perfectly highlight. The gist is that the human eye can be nicely manipulated with respect to the colors it perceives.

Quantization is one way lossy image compression exploits this effect; however, JPG takes a different approach here: color models. A color space is a specific organization of colors, and its color model is the mathematical formula describing how those colors are represented (e.g. triples in RGB, or quadruples in CMYK).

What’s powerful about this process is that you can convert from one color model to another, meaning you can represent a given color with a completely different set of numerical values.

For example, below is a specific color and its representation in the RGB and CMYK color models. They are the same color to the human eye, but they are represented with different sets of numerical values.
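
To make "same color, different numbers" concrete, here is a minimal sketch. It uses the simple textbook RGB-to-CMYK formula, with 8-bit RGB inputs and CMYK components in [0, 1]. This is an assumption for illustration only: real print workflows convert through ICC color profiles, so this is not a color-managed conversion.

```python
def rgb_to_cmyk(r, g, b):
    """Naive RGB (0-255) to CMYK (0.0-1.0) conversion; no color management."""
    # Normalize each channel to [0, 1].
    rp, gp, bp = r / 255, g / 255, b / 255
    # Black (K) covers whatever the brightest channel leaves dark.
    k = 1 - max(rp, gp, bp)
    if k == 1:
        # Pure black: avoid dividing by zero below.
        return 0.0, 0.0, 0.0, 1.0
    c = (1 - rp - k) / (1 - k)
    m = (1 - gp - k) / (1 - k)
    y = (1 - bp - k) / (1 - k)
    return c, m, y, k


# The same orange, once as an RGB triple and once as a CMYK quadruple.
print((255, 128, 0), rgb_to_cmyk(255, 128, 0))
```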

JPG converts from RGB to the Y,Cb,Cr color model, which comprises Luminance (Y), Chroma Blue (Cb), and Chroma Red (Cr). The reason for this is that psycho-visual experiments (aka how the brain works with the info the eye sees) demonstrate that the human eye is more sensitive to luminance than to chrominance, which means we can neglect larger changes in the chrominance without affecting our perception of the image. As such, we can make aggressive changes to the CbCr channels before the human eye notices.
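
Here is a sketch of the per-pixel conversion. It uses the full-range coefficients from the JFIF specification (ITU-R BT.601). Other YCbCr variants, such as the "studio range" used in video, use different constants. Note how any shade of gray ends up with Cb = Cr = 128: all of its information lives in Y.

```python
def rgb_to_ycbcr(r, g, b):
    """Full-range RGB -> YCbCr with the JFIF (ITU-R BT.601) coefficients."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b

    def to_byte(v):
        # Round to the nearest integer and clamp into the 8-bit range.
        return max(0, min(255, round(v)))

    return to_byte(y), to_byte(cb), to_byte(cr)


def ycbcr_to_rgb(y, cb, cr):
    """Inverse JFIF transform, as a decoder would apply it."""
    r = y + 1.402 * (cr - 128)
    g = y - 0.344136 * (cb - 128) - 0.714136 * (cr - 128)
    b = y + 1.772 * (cb - 128)
    return tuple(max(0, min(255, round(v))) for v in (r, g, b))


print(rgb_to_ycbcr(100, 100, 100))  # a gray: only Y carries information
print(rgb_to_ycbcr(255, 0, 0))      # pure red: strong Cr, weak Cb
```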

Downsampling

One of the interesting results of the YCbCr color space is that the resulting Cb/Cr channels have less fine-grained detail; they contain less information than the Y channel does.

As a result, the JPG algorithm resizes the Cb and Cr channels to about ¼ of their original size (note: there’s some nuance in how this is done that I’m not covering here…). This is called downsampling.

What’s important to note here is that downsampling is a lossy process (you won’t be able to recover the exact source colors, only a close approximation), but its overall impact on human visual perception is minimal. Luma (Y) is where the interesting stuff is, and since we’re only downsampling the CbCr channels, the impact on the visual system is low.
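
A minimal sketch of downsampling and of the decoder's approximate reconstruction. It assumes the common 4:2:0 layout (halve both width and height, keeping a quarter of the samples), simple 2x2 averaging, and nearest-neighbour upsampling. Real encoders and decoders may use other filters and sample positions.

```python
def downsample_2x2(channel):
    """Average each 2x2 block of a chroma channel (4:2:0-style subsampling).

    `channel` is a list of rows; width and height are assumed to be even.
    """
    h, w = len(channel), len(channel[0])
    return [
        [
            (channel[y][x] + channel[y][x + 1]
             + channel[y + 1][x] + channel[y + 1][x + 1]) / 4
            for x in range(0, w, 2)
        ]
        for y in range(0, h, 2)
    ]


def upsample_2x2(small):
    """Nearest-neighbour upsampling: repeat each sample into a 2x2 block."""
    out = []
    for row in small:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


cb = [[100, 104, 50, 50],
      [102, 106, 50, 50]]
small = downsample_2x2(cb)      # a quarter of the samples
restored = upsample_2x2(small)  # close to `cb`, but the 100..106 detail is gone
print(small)
print(restored)
```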

Image divided into 8x8 blocks of pixels

From here on out, JPG does all operations on 8x8 blocks of pixels. This is done because we generally expect there isn’t a lot of variance within an 8x8 block: even in very complex photos, there tends to be some self-similarity in local areas, and this similarity is what we’ll take advantage of during compression later.
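
Here is one way to do the blocking, as a sketch. When the image dimensions aren't multiples of 8, it pads the edges by repeating the last row and column. That padding choice is an assumption; encoders are free to fill partial blocks differently.

```python
def split_into_blocks(channel, n=8):
    """Split a 2D channel (list of rows) into n x n blocks, row by row.

    Edges are padded by repeating the last row/column so that every block
    is full-sized (an illustrative choice, not mandated by JPEG).
    """
    h, w = len(channel), len(channel[0])
    ph = -(-h // n) * n  # height rounded up to a multiple of n
    pw = -(-w // n) * n  # width rounded up to a multiple of n
    padded = [
        [channel[min(y, h - 1)][min(x, w - 1)] for x in range(pw)]
        for y in range(ph)
    ]
    blocks = []
    for by in range(0, ph, n):
        for bx in range(0, pw, n):
            blocks.append([row[bx:bx + n] for row in padded[by:by + n]])
    return blocks


image = [[x + 10 * y for x in range(10)] for y in range(10)]  # a 10x10 channel
blocks = split_into_blocks(image)
print(len(blocks))  # 4, because 10 is padded up to 16 in each direction
```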

It’s worth noting that at this point, we’re introducing one of the first common “artifacts” of JPG encoding. “Color bleeding” is where colors along sharp edges can “bleed” onto the other side. This is because the chrominance channels, which express the color of pixels, have had each block of 4 pixels averaged into a single color, and some of these blocks cross the sharp edge.

Discrete Cosine Transform

Up to this point, things have been pretty tame. Colorspaces, downsampling, and blocking are simple stuff in the world of image compression. But now… now the real math shows up.

The key idea behind the DCT is that any numeric signal can be recreated using a weighted combination of cosine functions.

For example, if we have this graph below:

You can see that it’s actually the sum cos(x) + cos(2x) + cos(4x).
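
We can check this numerically. The sketch below samples the signal over one period. It then recovers the weight of each cos(kx) by projecting the samples onto it, which is a discrete, Fourier-style inner product that relies on cosines of different integer frequencies being orthogonal over a full period. This illustrates the "sum of cosines" idea; it is not the JPEG DCT itself. The sample count of 64 is arbitrary.

```python
import math

N = 64  # number of samples over one period (any N > 2 * max frequency works)
xs = [2 * math.pi * n / N for n in range(N)]

# The example signal: a sum of three cosines.
signal = [math.cos(x) + math.cos(2 * x) + math.cos(4 * x) for x in xs]


def cosine_weight(samples, k):
    """Estimate how much cos(k*x) contributes, by projecting onto it."""
    return 2 / len(samples) * sum(
        s * math.cos(k * x) for s, x in zip(samples, xs)
    )


# Weights for frequencies 1..7; only k = 1, 2 and 4 should come out as 1.
weights = {k: round(cosine_weight(signal, k), 6) for k in range(1, 8)}
print(weights)
```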

Perhaps a better demonstration of this is the actual decoding of an image from a series of cosine functions over a 2D space. To show this off, I present one of the most amazing GIFs on the internet: the encoding of an 8x8 block of pixels using cosines in a 2D space:

What you’re watching here is the reconstruction of an image (leftmost panel). In each frame, we take a new basis function (right panel) and multiply it by a weight (right panel text) to produce its contribution to the image (center panel).

As you can see, by summing various cosine values against a weight, we can reconstruct our original image (pretty well...)

This is the fundamental background for how the Discrete Cosine Transform works. The idea is that any 8x8 block can be represented as a sum of weighted cosine functions at various frequencies. The trick with this whole thing is figuring out which cosine inputs to use, and how they should be weighted together.

It turns out the “what cosines to use” problem is pretty easy: after a lot of testing, a set of cosine functions was chosen to produce the best results. These are our basis functions, visualized in the image below.
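
For the curious, each of the 64 basis images is the product of two 1D cosines, one horizontal and one vertical, at frequencies 0 to 7. The sketch below generates them from the standard 8-point DCT-II cosine terms. The indexing convention (u = horizontal frequency, v = vertical frequency, images indexed as [y][x]) is my own choice for illustration.

```python
import math

N = 8


def basis_image(u, v):
    """The (u, v) 8x8 DCT-II basis image, indexed as [y][x]."""
    return [
        [
            math.cos((2 * x + 1) * u * math.pi / (2 * N))
            * math.cos((2 * y + 1) * v * math.pi / (2 * N))
            for x in range(N)
        ]
        for y in range(N)
    ]


# All 64 basis images, keyed by (u, v). (0, 0) is completely flat; higher
# frequencies oscillate faster across the block.
basis = {(u, v): basis_image(u, v) for u in range(N) for v in range(N)}
```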

As for the “how they should be weighted together” problem, simply (HA!) apply this formula.
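
For reference, the formula in question is the 2D type-II DCT as used by JPEG, in the form given on Wikipedia's JPEG page. Here g(x, y) is the pixel value at column x, row y after subtracting 128 so the range is centered on zero, and G(u, v) is the weight of the basis function with horizontal frequency u and vertical frequency v:

```latex
G_{u,v} = \frac{1}{4}\,\alpha(u)\,\alpha(v)
  \sum_{x=0}^{7}\sum_{y=0}^{7} g_{x,y}
  \cos\!\left[\frac{(2x+1)u\pi}{16}\right]
  \cos\!\left[\frac{(2y+1)v\pi}{16}\right],
\qquad
\alpha(u) =
\begin{cases}
  \frac{1}{\sqrt{2}} & \text{if } u = 0 \\
  1 & \text{otherwise}
\end{cases}
```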

I’ll spare you what all those values mean; you can look them up on the Wikipedia page.

The basic result is that for an 8x8 block of pixels in each color channel, applying the above formula and basis functions will generate a new 8x8 matrix, which represents the weights to be used during reconstruction. Here’s a graphic of the process:

This matrix, G, represents the basis weights to use to reconstruct the image (the small decimal value in the lower right side of the animation above). Basically, for each basis, we multiply it by the weight in this matrix, sum the whole thing together, and get our resulting image.
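
A direct (deliberately slow) sketch of both directions follows. The forward transform computes G from the pixel block, and the inverse reconstructs the block by weighting each basis image by G and summing. It assumes 8-bit samples level-shifted by 128, as in baseline JPEG. Real codecs use fast factored DCTs rather than this quadruple loop.

```python
import math

N = 8


def alpha(u):
    """Normalization factor: 1/sqrt(2) for the zero frequency, else 1."""
    return 1 / math.sqrt(2) if u == 0 else 1.0


def c(i, u):
    """Cosine term shared by the forward and inverse transforms."""
    return math.cos((2 * i + 1) * u * math.pi / (2 * N))


def dct_2d(block):
    """Forward 8x8 DCT of `block[y][x]` (pixel values 0-255).

    Returns G where G[v][u] is the weight of the basis image with
    horizontal frequency u and vertical frequency v.
    """
    g = [[p - 128 for p in row] for row in block]  # level shift to center on 0
    return [
        [
            0.25 * alpha(u) * alpha(v) * sum(
                g[y][x] * c(x, u) * c(y, v) for y in range(N) for x in range(N)
            )
            for u in range(N)
        ]
        for v in range(N)
    ]


def idct_2d(G):
    """Inverse DCT: weight each basis image by G, sum, undo the level shift."""
    return [
        [
            128 + 0.25 * sum(
                alpha(u) * alpha(v) * G[v][u] * c(x, u) * c(y, v)
                for v in range(N) for u in range(N)
            )
            for x in range(N)
        ]
        for y in range(N)
    ]


block = [[(7 * x + 3 * y) % 256 for x in range(N)] for y in range(N)]
G = dct_2d(block)
restored = idct_2d(G)  # equal to `block` up to floating-point error
```

Feeding `dct_2d` a flat block produces a single nonzero weight, G[0][0], which matches the intuition that a flat block needs only the flat basis image.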

At this point, we’re no longer working in color spaces, but rather directly with the G matrix (basis weights); all further compression is done on this matrix directly.

The problem here, though, is that we’ve now converted byte-aligned integer values into real numbers, which effectively bloats our information (moving from 1 byte to a 4-byte float). To solve this, and to start producing more significant compression, we move on to the quantization phase.

Quantization

So, we don’t want to compress the floating-point data; that would bloat our stream and not be effective. To that end, we’d like to find a way to convert the weights matrix back to values in the range [0,255]. The direct way to do this is to find the min/max values of the matrix (-415.38 and 77.13, respectively), map each number in that range to a value in [0,1], and then multiply by 255 to get our final value.

For example: (34.12 - (-415.38)) / (77.13 - (-415.38)) * 255 = 449.50 / 492.51 * 255 ≈ 232.7, which truncates to 232.
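
Here is the naive min/max approach as a sketch (this is the approach JPG does *not* use). Truncating with `int()` is an assumption chosen to match the 232 above; rounding would give 233. Note also that a decoder would need the min and max of every block to undo this mapping.

```python
def minmax_scale(value, lo, hi):
    """Map a value from [lo, hi] onto [0, 255], truncating to an integer."""
    return int((value - lo) / (hi - lo) * 255)


print(minmax_scale(34.12, -415.38, 77.13))  # 232
```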

This works, but the tradeoff is a significant precision reduction. This scaling will produce an uneven distribution of values, the result of which is significant visual loss to the image.

Instead, JPG takes a different route. Rather than using the range of values in each matrix as its scaling value, it uses a precalculated matrix of quantization factors. These QFs don’t depend on the contents of any particular block, so they don’t need to be recomputed or stored per block; a JPG file carries its tables just once, in its header.

The example below shows a commonly used matrix of quantization factors, one for each basis image:

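We now use the Q and G matrices to compute our quantized DCT coefficient matrix B, dividing each weight by its quantization factor and rounding to the nearest integer: B[j,k] = round(G[j,k] / Q[j,k]).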

For example, using G[0,0] = −415.37 and Q[0,0] = 16: B[0,0] = round(−415.37 / 16) = round(−25.96) = −26.
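
Here is a sketch of quantization and of what a decoder does to undo it. It uses the example luminance table from Annex K of the JPEG specification, which is the commonly used matrix referred to above. One caveat: Python's `round()` rounds exact halves to the nearest even number, a detail real encoders may handle differently.

```python
# Example luminance quantization table from Annex K of the JPEG spec.
Q_LUMA = [
    [16, 11, 10, 16, 24, 40, 51, 61],
    [12, 12, 14, 19, 26, 58, 60, 55],
    [14, 13, 16, 24, 40, 57, 69, 56],
    [14, 17, 22, 29, 51, 87, 80, 62],
    [18, 22, 37, 56, 68, 109, 103, 77],
    [24, 35, 55, 64, 81, 104, 113, 92],
    [49, 64, 78, 87, 103, 121, 120, 101],
    [72, 92, 95, 98, 112, 100, 103, 99],
]


def quantize(G, Q):
    """B[j][k] = round(G[j][k] / Q[j][k]): the lossy step."""
    return [[round(g / q) for g, q in zip(grow, qrow)] for grow, qrow in zip(G, Q)]


def dequantize(B, Q):
    """What a decoder does: multiply back. The rounding error is lost for good."""
    return [[b * q for b, q in zip(brow, qrow)] for brow, qrow in zip(B, Q)]


G = [[0.0] * 8 for _ in range(8)]
G[0][0] = -415.37
B = quantize(G, Q_LUMA)
print(B[0][0])                      # -26
print(dequantize(B, Q_LUMA)[0][0])  # -416, not -415.37
```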

This results in the following final matrix:

Observe how much simpler the matrix becomes: it now contains a large number of entries that are small or zero, making it much easier to compress.

As a quick aside, we apply this process to the Y, Cb, and Cr channels independently, and as such we need two different matrices: one for Y, and another for the C channels:
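
A sketch of per-channel table selection, using the example luminance and chrominance tables from Annex K of the JPEG specification (ITU-T T.81). Encoders are free to use other tables; these are simply widely used defaults. The `table_for` helper is hypothetical and exists only for illustration. The chroma table's larger steps reflect the eye's lower sensitivity to chrominance.

```python
# Example tables from Annex K of the JPEG specification.
Q_LUMA = [
    [16, 11, 10, 16, 24, 40, 51, 61],
    [12, 12, 14, 19, 26, 58, 60, 55],
    [14, 13, 16, 24, 40, 57, 69, 56],
    [14, 17, 22, 29, 51, 87, 80, 62],
    [18, 22, 37, 56, 68, 109, 103, 77],
    [24, 35, 55, 64, 81, 104, 113, 92],
    [49, 64, 78, 87, 103, 121, 120, 101],
    [72, 92, 95, 98, 112, 100, 103, 99],
]
Q_CHROMA = [
    [17, 18, 24, 47, 99, 99, 99, 99],
    [18, 21, 26, 66, 99, 99, 99, 99],
    [24, 26, 56, 99, 99, 99, 99, 99],
    [47, 66, 99, 99, 99, 99, 99, 99],
    [99, 99, 99, 99, 99, 99, 99, 99],
    [99, 99, 99, 99, 99, 99, 99, 99],
    [99, 99, 99, 99, 99, 99, 99, 99],
    [99, 99, 99, 99, 99, 99, 99, 99],
]


def table_for(channel):
    """Hypothetical helper: pick the table for 'Y', 'Cb' or 'Cr'."""
    if channel == "Y":
        return Q_LUMA
    if channel in ("Cb", "Cr"):
        return Q_CHROMA
    raise ValueError(f"unknown channel: {channel!r}")


for name in ("Y", "Cb", "Cr"):
    mean_step = sum(map(sum, table_for(name))) / 64
    print(name, round(mean_step, 1))  # chroma steps are coarser on average
```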

Quantization compresses the image in two important ways: one, it limits the effective range of the weights, decreasing the number of bits required to represent them. Two, many of the weights become identical or zero, improving compression in the third step, entropy coding.

As such, quantization is the primary source of JPEG artifacts. Because the basis images in the lower-right tend to have the largest quantization divisors, JPEG artifacts tend to resemble combinations of these images. The matrix of quantization factors can be directly controlled by altering the JPEG’s “quality level”, which scales its values up or down (we’ll cover that in a minute).

Compression

By now, we’re back in the world of integer values, and can move forward with applying a lossless compression stage to our blocks. When looking at our transformed data, though, you should notice something interesting:

As you move from the upper left to the bottom right, the frequency of zeros increases. This looks like a prime candidate for Run Length Encoding. But row-major and column-major orders are not ideal here, since they would interleave these runs of zeros with nonzero values rather than packing them all together.

Instead, we start with the top-left corner and zig-zag in a diagonal pattern across the matrix, going back and forth until we reach the lower-right corner.
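
The zig-zag path can be generated by walking the anti-diagonals (cells where row + column is constant) and alternating direction on each one. This sketch reproduces the standard JPEG order for an 8x8 block; the function names are my own.

```python
def zigzag_indices(n=8):
    """(row, col) pairs in JPEG zig-zag order for an n x n block."""
    order = []
    for s in range(2 * n - 1):  # s = row + col identifies one anti-diagonal
        rows = range(max(0, s - n + 1), min(s, n - 1) + 1)
        # Even diagonals are walked upward (row decreasing), odd ones downward.
        if s % 2 == 0:
            rows = reversed(rows)
        order.extend((r, s - r) for r in rows)
    return order


def zigzag(block):
    """Flatten an n x n block into a list following the zig-zag path."""
    return [block[r][c] for r, c in zigzag_indices(len(block))]


# Label each cell with its row-major position to see the visiting order.
labels = [[r * 8 + c for c in range(8)] for r in range(8)]
print(zigzag(labels)[:10])  # [0, 1, 8, 16, 9, 2, 3, 10, 17, 24]
```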

Read in this order, our luma matrix becomes:

−26,−3,0,−3,−2,−6,2,−4,1,−3,1,1,5,1,2,−1,1,−1,2,0,0,0,0,0,-1,-1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0

Once the data is in this format, the next steps are straightforward: execute RLE on the sequence, and then apply a statistical encoder to the results (the JPEG standard specifies Huffman or arithmetic coding).
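
Here is a simplified sketch of the RLE step. It is only loosely modeled on baseline JPEG, which additionally codes the first (DC) value as a difference from the previous block's, splits runs longer than 15 zeros, and stores each value as a size category plus extra bits before Huffman coding. The (0, 0) end-of-block marker is borrowed from JPEG's EOB symbol. The example sequence is the one above, written out explicitly with its trailing zeros padded to 64 entries.

```python
def run_length_encode(coeffs):
    """Encode a zig-zag sequence as (zero_run, value) pairs.

    Each nonzero value is stored with the number of zeros preceding it; a
    trailing run of zeros collapses into one end-of-block marker, (0, 0).
    """
    pairs = []
    run = 0
    for value in coeffs:
        if value == 0:
            run += 1
        else:
            pairs.append((run, value))
            run = 0
    if run:
        pairs.append((0, 0))  # end of block: "the rest are all zeros"
    return pairs


head = [-26, -3, 0, -3, -2, -6, 2, -4, 1, -3, 1, 1, 5, 1, 2,
        -1, 1, -1, 2, 0, 0, 0, 0, 0, -1, -1]
seq = head + [0] * (64 - len(head))
pairs = run_length_encode(seq)
print(len(seq), "->", len(pairs))  # 64 -> 21
print(pairs)
```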

And boom. Your block is now JPG encoded.

Understanding the quality parameter

Now that you understand how JPG files are actually created, it’s worth revisiting the concept of the quality parameter that you normally see when exporting JPG images from Photoshop (or whatnot).

This parameter, which we’ll call q, is an integer from 1 to 100. You should think of q as being a measure of the quality of the image: higher values of q correspond to higher quality images and larger file sizes.

This quality value is used during the quantization phase to scale the quantization factors appropriately, so that, for each basis weight, the quantization step now resembles round(G[j,k] / (α × Q[j,k])), where α is a scaling factor derived from the quality parameter.
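
The JPEG standard itself doesn't define how q maps to α; that choice is left to the encoder. As one concrete example, here is a sketch modeled on the scaling in the Independent JPEG Group's libjpeg (its `jpeg_quality_scaling` logic), where q = 50 reproduces the base table and the percentage scale factor divided by 100 plays the role of α. Treat the details as an approximation of that library, not a specification.

```python
def quality_scale(q):
    """libjpeg-style mapping from quality (1-100) to a percentage scale."""
    q = max(1, min(100, q))
    return 5000 // q if q < 50 else 200 - 2 * q


def scale_table(base, q):
    """Scale a base quantization table; (scale / 100) acts as alpha."""
    s = quality_scale(q)
    return [
        # Round to nearest, then clamp into the 8-bit baseline range [1, 255].
        [max(1, min(255, (v * s + 50) // 100)) for v in row]
        for row in base
    ]


print(scale_table([[16, 11, 10]], 10))   # coarser steps: smaller file
print(scale_table([[16, 11, 10]], 90))   # finer steps: better quality
```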

When either alpha or Q[x,y] is increased (remember that large values of alpha correspond to smaller values of the quality parameter q), more information is lost, and the file size decreases.

As such, if you want a smaller file, at the cost of more visual artifacts, you can set a lower quality value during the export phase.

Notice, in the lowest-quality image above, the clear signs of both the blocking stage and the quantization stage.

Probably most important is that the right quality parameter varies from image to image. Since each image is unique and presents different types of visual artifacts, the best q value will be unique as well.

Conclusion

Once you understand how the JPG algorithm works, a few things become apparent:

  1. Getting the quality value right, per image, is important for finding the right tradeoff between visual quality and file size.
  2. Since this process is block-based, artifacts tend to show up as blockiness or “ringing”.
  3. Since processed blocks don’t intermingle with each other, JPG generally ignores the opportunity to compress large swaths of similar blocks together. Addressing that concern is something the WebP format is good at doing.

And if you want to play around with all this by yourself, all this madness can be boiled down to a ~1000 line file.

HEY!

Want to know how to make your JPG files smaller?

Want to know how PNG files work, or how to make them smaller?

Want more data compression goodness? Buy my book!

Originally published at: https://www.freecodecamp.org/news/how-jpg-works-a4dbd2316f35/