Note: This text supersedes all previous material produced by the HyperMedia Unit, including Jennifer's book. A number of corrections have been made in this text.
The JPEG still image compression standard was specified by a committee called the "Joint Photographic Experts Group". (I would have thought that they would have the name "Joint Photographic Experts Committee" but maybe they thought JPEC didn't sound as good as JPEG.)
This group was set up way back in the eighties to develop a standard for encoding continuous-tone greyscale and colour images. The JPEG ISO standard was more-or-less settled in 1991 with the aims of achieving 'state of the art' compression rates (circa 1991), being useful for practically any kind of continuous-tone image, and being implementable on many CPU and hardware configurations.
To achieve this, the JPEG standard allows a number of modes of operation.
To get the simpler encoding scheme out of the way first, I'll cover lossless encoding now. This coding scheme is rarely used in practice. In fact I wouldn't be at all surprised if many JPEG decoders can't actually handle this method, and I have yet to find a program which claims to encode using the lossless method.
Lossless encoding makes use of a relatively simple prediction and differencing method. This can give around 2:1 compression on source images between 2 and 16 bits per pixel, i.e. 4 to 65536 levels of grey. If the image has multiple colour components (like RGB), then each component is encoded separately.
The encoding goes like this: source images are scanned sequentially left to right, top to bottom. The value of the current pixel is predicted from the values of the previous pixels. The difference between the predicted value and the actual value is what is encoded using Huffman or arithmetic methods.
Figure 1. Lossless mode prediction schemes.
There are 8 prediction schemes available to the encoder, although scheme 0 is reserved for the hierarchical progressive mode. Also, scheme 1 is always used for the very first scan line, and scheme 2 is used to find the first pixel on a new row. That's basically all there is to it.
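The prediction and differencing described above can be sketched in a few lines of Python. This is only an illustration, not a real lossless JPEG encoder: the function names are my own, the predictor formulas for schemes 1 to 7 are the usual ones quoted for the standard (a = pixel to the left, b = pixel above, c = pixel above-left) and should be checked against Figure 1, and the mid-grey starting value for the very first pixel is an assumption based on the sample precision.

```python
def predict(selector, a, b, c):
    """Predict a pixel from its neighbours.

    a = pixel to the left, b = pixel above, c = pixel above-left.
    The selector numbers follow the usual lossless JPEG predictor table
    (assumed here; check against the standard).
    """
    if selector == 1:
        return a
    if selector == 2:
        return b
    if selector == 3:
        return c
    if selector == 4:
        return a + b - c
    if selector == 5:
        return a + ((b - c) >> 1)
    if selector == 6:
        return b + ((a - c) >> 1)
    if selector == 7:
        return (a + b) >> 1
    raise ValueError("selector must be between 1 and 7")


def lossless_differences(image, selector, precision=8):
    """Return the prediction differences for a 2-D list of integer samples."""
    diffs = []
    for y, row in enumerate(image):
        out = []
        for x, value in enumerate(row):
            if y == 0 and x == 0:
                pred = 1 << (precision - 1)   # assumed mid-grey start value
            elif y == 0:
                pred = row[x - 1]             # scheme 1 on the first scan line
            elif x == 0:
                pred = image[y - 1][0]        # scheme 2 for the first pixel of a row
            else:
                pred = predict(selector, row[x - 1],
                               image[y - 1][x], image[y - 1][x - 1])
            out.append(value - pred)
        diffs.append(out)
    return diffs


image = [[100, 102, 104],
         [101, 103, 105]]
diffs = lossless_differences(image, selector=4)
```

On a smooth image most of the differences are small numbers clustered around zero, which is what makes the following Huffman or arithmetic coding stage worthwhile.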
The lossy compression scheme is where all the action happens for JPEGs. This scheme uses the discrete cosine transform (DCT) to convert image blocks into frequency coefficients, and then compresses those coefficients (recall the module on Spatial and Spectral encoding). The compression ratio can be varied, giving control over the final image quality (and, roughly, the file size). The amount of compression JPEG can achieve depends almost entirely on the source image. Best results come from images with little high frequency detail.
The DCT is applied to 8x8 pixel blocks, this size being selected as a trade-off between computational complexity, compression speed and quality. Compression then happens in two stages: the coefficients are quantised, and are then Huffman or arithmetic coded. Quantising the coefficients is the lossy part of the sequence, where high frequency information is discarded.
The steps in DCT encoding an image can be loosely broken up into 10 steps.
Figure 2. JPEG lossy compression steps 1 and 2.
Each component of colour images is compressed independently, as with the lossless compression scheme. RGB colour space is not the most efficient way to JPEG compress images, as it is particularly susceptible to colour changes due to quantisation.
Colour images are converted from RGB components into YCbCr colour space, which consists of the luminance (greyscale) and two chroma (colour) components. Note that although some literature (including some previously produced by the HyperMedia Unit) states that images are converted into YUV components - or is just very hazy about what format the image is converted to - it is not YUV exactly. However, the YCbCr colour space is simply a scaled and offset version of YUV colour space. These equations are (to 1 decimal place):
The strange numbers in these equations are due to arcane television display considerations. Both YUV and YCbCr colour space are derived from the colour coding method used when broadcasting colour television signals. When colour was added to TV transmissions, the signal had to remain compatible with existing black and white TV sets. The solution was to keep the existing brightness (luminance) signal and send only the extra chroma information needed to add colour to it.
Figure 3. RGB colour cube.
The Y component contains the luminance information of the image and is in effect a greyscale version of the image. On an RGB colour cube, this would correspond to a line running from (0,0,0) to (1,1,1). Cb and Cr are the two colour difference axes, approximately blue minus yellow (B - Y) and red minus cyan (R - Y). Note that the weightings in the Y equation above give most weight to the green component and least to the blue. This reflects the fact that the human eye is more sensitive to variations in green and less sensitive to variations in blue.
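To make the conversion concrete, here is a minimal sketch of RGB to YCbCr for 8-bit samples. It assumes the full-precision constants commonly used in JFIF files (0.299, 0.587 and 0.114 for the luminance weights, with 1.772 and 1.402 as the chroma scale divisors) rather than the rounded 1-decimal-place figures mentioned above, and the function name is my own.

```python
def rgb_to_ycbcr(r, g, b):
    """Convert 8-bit R, G, B values (0..255) to 8-bit Y, Cb, Cr values.

    Constants are the commonly used JFIF ones (an assumption, not taken
    from this text). Cb and Cr are offset by 128 so they fit in 0..255.
    """
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128 + (b - y) / 1.772   # scaled blue difference
    cr = 128 + (r - y) / 1.402   # scaled red difference

    def clamp(value):
        return max(0, min(255, int(round(value))))

    return clamp(y), clamp(cb), clamp(cr)


white = rgb_to_ycbcr(255, 255, 255)
grey = rgb_to_ycbcr(128, 128, 128)
red = rgb_to_ycbcr(255, 0, 0)
green = rgb_to_ycbcr(0, 255, 0)
```

Any neutral grey lands on Cb = Cr = 128, which is the line from (0,0,0) to (1,1,1) on the colour cube, while pure green ends up with a much larger Y value than pure red or blue would.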
Once the image has been converted into YCbCr colour space (if required), the Cb and Cr components are downsampled by a factor of 2. We can do this because the human eye gets more detail from the luminance information than from the chrominance. This is readily apparent from step 2 of figure 2. Note how the Cr and Cb components have much less contrast (and thus less information) than the luminance component.
This downsampling gives an immediate 50% reduction in the amount of image data when both chroma components are halved horizontally and vertically: three full-size planes become one full-size plane plus two quarter-size planes, or 1.5 planes in total. This helps explain why colour images compress proportionally better than greyscale images with JPEG. Downsampling ratios are expressed in the usual video notation, e.g. 4:1:1 means that there is one Cb and one Cr sample for every four Y samples along a scan line (the chroma is subsampled by 4 horizontally), while the 2x2 case is usually written 4:2:0.
Figure 4. Downsampling U & V components (of YUV) vs. R & B components (of RGB).
The human eye is more sensitive to changes in luminance (grey levels) than to changes in chrominance (colours). As most of the detail the human eye picks up is stored in the Y component, more error can be tolerated in the Cb and Cr colour components.
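A simple way to picture the downsampling step is to replace every 2x2 group of chroma samples with its average. The sketch below does exactly that on a plane stored as a list of rows. Averaging is my assumption here (an encoder could use a different filter), the function name is hypothetical, and the plane is assumed to have even width and height.

```python
def downsample_2x2(plane):
    """Average each 2x2 group of samples in a 2-D list (even dimensions assumed)."""
    out = []
    for y in range(0, len(plane), 2):
        row = []
        for x in range(0, len(plane[0]), 2):
            total = (plane[y][x] + plane[y][x + 1]
                     + plane[y + 1][x] + plane[y + 1][x + 1])
            row.append((total + 2) // 4)   # rounded integer average
        out.append(row)
    return out


cb = [[10, 20, 30, 40],
      [10, 20, 30, 40],
      [0, 0, 100, 100],
      [0, 0, 100, 104]]
small_cb = downsample_2x2(cb)

# Sample counts for a 4x4 image: Y stays at 16, Cb and Cr drop to 4 each.
before = 3 * 16
after = 16 + 2 * len(small_cb) * len(small_cb[0])
```

Here `after / before` works out to 0.5, which is the 50% saving quoted above for downsampling both chroma planes in both directions.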
Figure 5. JPEG lossy compression steps 3 and 4.
This step seems fairly obvious. Each of the three YCbCr colour planes is encoded separately but using the same scheme. Eight by eight pixel blocks (64 pixels in total) were chosen as the size for computational simplicity. Other block sizes have been used for other encoding schemes. For example, the Cinepak codec used by Apple's QuickTime software works on 4x4 blocks (using vector quantisation rather than a DCT) to give very quick, but often poorer quality, encoding and decoding. Older professional video capture cards made by Radius use 16x16 blocks because they can supply the extra processing power required. If the width or height of the image is not a multiple of 8, it is padded out to the required size, with the extra space being added on the right and bottom of the image. When the image is decoded, these pad pixels are chopped off.
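The block splitting and padding can be sketched as follows. The standard does not say what values the pad pixels should have, so repeating the last row and column is purely my assumption (it tends to avoid creating an artificial sharp edge), and `split_into_blocks` is a hypothetical helper name.

```python
def split_into_blocks(plane, size=8):
    """Pad a 2-D list on the right and bottom, then cut it into size x size blocks.

    Pad pixels repeat the nearest edge pixel (an assumption, not a rule
    from the standard). Blocks are returned left to right, top to bottom.
    """
    height, width = len(plane), len(plane[0])
    padded_h = -(-height // size) * size   # round up to a multiple of size
    padded_w = -(-width // size) * size
    padded = [
        [plane[min(y, height - 1)][min(x, width - 1)] for x in range(padded_w)]
        for y in range(padded_h)
    ]
    blocks = []
    for by in range(0, padded_h, size):
        for bx in range(0, padded_w, size):
            blocks.append([row[bx:bx + size] for row in padded[by:by + size]])
    return blocks


# A 10 pixel wide, 9 pixel high test plane where each value encodes its position.
plane = [[x + 10 * y for x in range(10)] for y in range(9)]
blocks = split_into_blocks(plane)
```

The 10x9 plane is padded to 16x16 and comes out as four 8x8 blocks. A decoder would simply crop the result back to the original 10x9 size.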
The 64 data points are DCT encoded using the equation:
The coefficient values can be converted from floating point to integers at this point, but it's probably more efficient to do it after the coefficients are scaled in the quantisation stage (step 7).
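For reference, the forward transform can be written directly from the textbook 8x8 DCT definition: each coefficient F(u,v) is 1/4 C(u) C(v) times the sum over the block of f(x,y) cos((2x+1)u pi/16) cos((2y+1)v pi/16), where C(0) = 1/sqrt(2) and C(k) = 1 otherwise. The sketch below is a deliberately slow, literal version of that formula, assuming this standard normalisation. Real encoders use fast factorised algorithms, and usually subtract 128 from each sample first, which this sketch leaves out.

```python
import math


def dct_8x8(block):
    """Forward 2-D DCT of an 8x8 block given as a list of 8 rows of 8 numbers.

    Returns coeffs[v][u], where u is horizontal and v is vertical frequency.
    """
    def c(k):
        return 1 / math.sqrt(2) if k == 0 else 1.0

    coeffs = [[0.0] * 8 for _ in range(8)]
    for v in range(8):
        for u in range(8):
            total = 0.0
            for y in range(8):
                for x in range(8):
                    total += (block[y][x]
                              * math.cos((2 * x + 1) * u * math.pi / 16)
                              * math.cos((2 * y + 1) * v * math.pi / 16))
            coeffs[v][u] = 0.25 * c(u) * c(v) * total
    return coeffs


flat = [[100] * 8 for _ in range(8)]
coeffs = dct_8x8(flat)
```

For a completely flat block, all of the energy ends up in the DC coefficient `coeffs[0][0]` (8 times the average sample value with this scaling) and every other coefficient is zero, apart from floating point noise.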
Figure 6. JPEG lossy compression steps 5 and 6.
The coefficients are unwrapped in the infamous 'zig-zag' pattern, which just converts the 8x8 array into a single 64 element array with the lowest frequency (and usually most significant) coefficients occurring first.
These coefficients are then scaled. The DC component, which is the scaled average intensity value of its 8x8 block rather than of the entire image (or YCbCr plane, in the case of a colour image), is coded separately from the other 63 coefficients. The quantisation factors used in scaling the coefficients are not actually dictated in the JPEG specification. The encoding algorithm can use whatever scale factors it likes, as these are included with the final data. The JPEG standard has a generic set of quantisation factors which are often used.
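The zig-zag order is easy to generate by walking the anti-diagonals of the block and alternating direction on each one. The sketch below is one way of doing it (the function names are mine). A real codec would normally just use a fixed 64 entry lookup table.

```python
def zigzag_order(n=8):
    """Return the (row, col) positions of an n x n block in zig-zag order."""
    order = []
    for s in range(2 * n - 1):               # s = row + col, one diagonal at a time
        rows = range(max(0, s - n + 1), min(s, n - 1) + 1)
        if s % 2 == 0:
            rows = reversed(rows)             # even diagonals run up and to the right
        order.extend((r, s - r) for r in rows)
    return order


def zigzag_scan(block):
    """Flatten a square block into a list in zig-zag order."""
    return [block[r][c] for r, c in zigzag_order(len(block))]


# Label each position with its row-major index so the order is easy to read.
block = [[r * 8 + c for c in range(8)] for r in range(8)]
scan = zigzag_scan(block)
```

With the block labelled this way, `scan` starts 0, 1, 8, 16, 9, 2, 3, 10, ... and finishes at 63, so the DC term comes first and the highest frequency term comes last.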
Quantisation achieves two goals:
Figure 7. JPEG lossy compression steps 7 to 10.
Next, coefficients which have been scaled to near zero values are actually given zero values, in the hope that there will be many zero coefficients and that the data will therefore compress well. This is the second lossy part of the compression process, where we deliberately and irretrievably lose information.
The major lossy part of the JPEG process is quantisation, where each coefficient is divided by its own scale factor. The larger this number is, the more compression will be applied to the image data. This step is where the intended quality of the final image can be specified. Although it is possible to calculate and encode custom quantisation tables, in practice the default ISO JPEG tables are used most of the time.
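Here is a small sketch of the scaling and rounding. The table is the example luminance table usually quoted from the JPEG standard, reproduced from memory, so treat the exact numbers as an assumption and check them against the specification before relying on them. The function names are hypothetical, and Python's built-in `round` is used for simplicity.

```python
# Example luminance quantisation table as commonly quoted from the standard
# (assumed values; indexed as LUMA_TABLE[v][u]).
LUMA_TABLE = [
    [16, 11, 10, 16, 24, 40, 51, 61],
    [12, 12, 14, 19, 26, 58, 60, 55],
    [14, 13, 16, 24, 40, 57, 69, 56],
    [14, 17, 22, 29, 51, 87, 80, 62],
    [18, 22, 37, 56, 68, 109, 103, 77],
    [24, 35, 55, 64, 81, 104, 113, 92],
    [49, 64, 78, 87, 103, 121, 120, 101],
    [72, 92, 95, 98, 112, 100, 103, 99],
]


def quantise(coeffs, table=LUMA_TABLE):
    """Divide each DCT coefficient by its scale factor and round to an integer."""
    return [[int(round(coeffs[v][u] / table[v][u])) for u in range(8)]
            for v in range(8)]


def dequantise(levels, table=LUMA_TABLE):
    """Decoder side: multiply each quantised level by its scale factor."""
    return [[levels[v][u] * table[v][u] for u in range(8)] for v in range(8)]


coeffs = [[0.0] * 8 for _ in range(8)]
coeffs[0][0] = 800.0    # DC term
coeffs[0][1] = -30.0    # low frequency term
coeffs[7][7] = 40.0     # highest frequency term
levels = quantise(coeffs)
restored = dequantise(levels)
```

Notice that the high frequency coefficient of 40 is divided by 99 and rounds to zero, so it is gone for good, while the low frequency value of -30 comes back as -33. That rounding error is the information JPEG throws away.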
The resulting quantised integer coefficients are then Huffman encoded to squeeze that extra bit of compression out of the image. Huffman compression is lossless of course. Header information is added to the data, including the quantisation tables (the scale factors) and the Huffman tables. Do a bit of formatting and out squirts a JPEG file!
The JPEG compression scheme can be described as symmetrical, in that the method used for compression, when reversed, is the method for decompression.
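To show why lots of repeated values (especially zeros) help, here is a generic Huffman code builder. It is only a sketch of the general technique: a real JPEG encoder codes combined run/size symbols and stores its tables in a specific format, none of which is attempted here, and the function name is my own.

```python
import heapq
from collections import Counter


def huffman_code(symbols):
    """Build a Huffman code for a sequence of symbols.

    Returns a dict mapping each distinct symbol to a string of '0'/'1' bits.
    """
    counts = Counter(symbols)
    if len(counts) == 1:                      # degenerate case: one symbol only
        return {next(iter(counts)): "0"}
    # Heap entries are (count, tie_breaker, partial code table).
    heap = [(n, i, {sym: ""}) for i, (sym, n) in enumerate(counts.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        n1, _, codes1 = heapq.heappop(heap)   # two least frequent subtrees
        n2, _, codes2 = heapq.heappop(heap)
        merged = {s: "0" + code for s, code in codes1.items()}
        merged.update({s: "1" + code for s, code in codes2.items()})
        heapq.heappush(heap, (n1 + n2, tie, merged))
        tie += 1
    return heap[0][2]


# Made-up quantised values: mostly zeros, as you would hope after quantisation.
values = [0] * 10 + [1] * 3 + [-1] * 2 + [5]
code = huffman_code(values)
total_bits = sum(len(code[v]) for v in values)
```

The most common value gets a 1 bit code and the rare ones get 3 bits, so the 16 values fit in 25 bits instead of the 32 bits needed with a fixed 2 bit code.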
The main difference in decoding is the use of the Inverse Discrete Cosine Transform, which is:
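In code, the inverse transform is the mirror image of the forward one: each sample f(x,y) is 1/4 times the sum over all coefficients of C(u) C(v) F(u,v) cos((2x+1)u pi/16) cos((2y+1)v pi/16), with C(0) = 1/sqrt(2) and C(k) = 1 otherwise. The sketch below is a literal, unoptimised version that assumes this usual normalisation. If the encoder subtracted 128 from each sample, the decoder would add it back afterwards.

```python
import math


def idct_8x8(coeffs):
    """Inverse 2-D DCT of an 8x8 coefficient block indexed as coeffs[v][u].

    Returns block[y][x] as floating point sample values.
    """
    def c(k):
        return 1 / math.sqrt(2) if k == 0 else 1.0

    block = [[0.0] * 8 for _ in range(8)]
    for y in range(8):
        for x in range(8):
            total = 0.0
            for v in range(8):
                for u in range(8):
                    total += (c(u) * c(v) * coeffs[v][u]
                              * math.cos((2 * x + 1) * u * math.pi / 16)
                              * math.cos((2 * y + 1) * v * math.pi / 16))
            block[y][x] = 0.25 * total
    return block


dc_only = [[0.0] * 8 for _ in range(8)]
dc_only[0][0] = 800.0
block = idct_8x8(dc_only)
```

A block with only a DC coefficient decodes to a flat patch of a single grey level, which is why the DC values alone already give a blocky thumbnail of the image.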
There is a progressive mode available as part of the JPEG standard which, like interlaced GIF images, allows a quick 'preview' of the image to be viewed. The standard JPEG image data is arranged with DC components and 8x8 DCT coefficient blocks running left to right, top to bottom through the image.
The progressive mode allows the DC components to be sent first, followed by the DCT coefficients in low-frequency to high-frequency order. This enables the decoder to reproduce a low quality version of the image quickly, before successive (higher frequency) coefficients are received and decoded. The image data can be organised into two or more 'strips' of DCT information, resulting in two or more possible preview images.
Figure 8. Two step progressive JPEG decoding.
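The idea of sending the coefficients in 'strips' can be sketched by cutting each block's zig-zag ordered coefficient list into bands. The band boundaries below are made up for illustration, and the function names are hypothetical. A decoder that has only received the first few bands simply treats the missing coefficients as zero.

```python
def split_into_scans(zigzag_coeffs, bands=((0, 1), (1, 6), (6, 64))):
    """Cut one block's 64 zig-zag ordered coefficients into frequency bands.

    The default band boundaries are hypothetical, chosen only for illustration.
    """
    return [zigzag_coeffs[start:end] for start, end in bands]


def rebuild_from_scans(scans, total=64):
    """Rebuild a block from the scans received so far, padding with zeros."""
    received = [value for scan in scans for value in scan]
    return received + [0] * (total - len(received))


coeffs = list(range(64, 0, -1))          # stand-in for one block's coefficients
scans = split_into_scans(coeffs)
preview = rebuild_from_scans(scans[:1])  # DC only: the first, crudest preview
better = rebuild_from_scans(scans[:2])   # DC plus the lowest frequencies
final = rebuild_from_scans(scans)        # everything received
```

Each extra scan fills in more of the 64 slots, so the same inverse DCT can be rerun on the partly filled block to refine the displayed image.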
The principal advantage of this mode is the ability to quickly view a low quality version of the image. The disadvantages are:
There are some questions you can try to answer on JPEG encoding.
How did you find this 'lecture'? Was it too hard? Too easy? Was there anything in particular you would like graphically illustrated? If you have any sensible comments, email me and let me know.
References:
William B. Pennebaker, Joan L. Mitchell. JPEG Still Image Compression Standard. Van Nostrand Reinhold, 1993.
G. K. Wallace. The JPEG still picture compression standard. Communications of the ACM. ACM, 1991.
Keith Jack. Video Demystified, 2nd ed. HighText Interactive, 1996.