The MPEG compression technique is described here as long it is necessary to understand the Java
implementation problems.
A MPEG "film" is a sequence of three kinds of frames:
The I-frames are intra coded , i.e. they can be reconstructed without any reference to other frames. The P-frames are forward predicted from the last I-frame or P-frame , i.e. it is impossible to reconstruct them without the data of another frame (I or P). The B-frames are both, forward predicted and backward predicted from the last/next I-frame or P-frame , i.e. there are two other frames necessary to reconstruct them. P-frames and B-frames are referred to as inter coded frames. That means a |
As an example the frame sequence above is transfered in the following order: I P B B B P B B B . The only task of the decoder is to reorder the reconstructed frames. To support this an ascending frame number comes with each frame (modulo 1024).
Imagine an I-frame showing a triangle on white background! A following P-frame shows the same triangle but at another position. Prediction means to supply a motion vector which declares how to move the triangle on I-frame to obtain the triangle in P-frame. This motion vector is part of the MPEG stream and it is divided in a horizontal and a vertical part. These parts can be positive or negative. A positive value means motion to the right or motion downwards , respectively. A negative value means motion to the left or motion upwards , respectively. The parts of the motion vector are in an range of -64 ... +63. So the referred area can be up to 64x64 pixels away. |
But this model assumes that every change between frames can be expressed as a simple displacement of pixels. But the figure to the right shows this isn't true. The red rectangle is shifted and rotated by 5° to the right. So a simple displacement of the red rectangle will cause a prediction error . Therefore the MPEG stream contains a matrix for compensating this prediction error. |
Thus, the reconstruction of inter coded frames goes ahead in two steps:
Note that the prediction error compensation requires less bytes than the whole frame because the white parts are zero and can be discarded from MPEG stream. Furthermore the DCT compression (see later in this chapter) is applied to the prediction error which decreases its memory size.
Note also the different meanings of the two + - signs. The first means adding the motion vector to the x-, y- coordinates of each pixel . The second means adding an error value to the color value of the appropriate pixel .
The motion vector isn't valid for the whole frame. Instead of this the frame is divided into macro blocks of 16x16 pixels. Every macro block has its own motion vector. Of course, this does not avoid contradictory motion but it minimizes its probability. And if contradictory motion occurs? One of the greatest misunderstandings of the MPEG compression technique is to assume that all macro blocks of P-frames are predicted. If the prediction error is to big the coder can decide to intra-code a macro block . Similarly the macro blocks in B-frames can be forward predicted or backward predicted or forward and backward predicted or intra-coded. |
Every macro block contains 4 luminance blocks and 2 chrominance blocks . Every block has a dimension of 8x8 values. The luminance blocks contain information of the brightness of every pixel in macro block . The chrominance blocks contain color information. Because of some properties of the human eye it isn't necessary to give color information for every pixel. Instead 4 pixels are related to one color value. This color value is divided into two parts. The first is in Cb color block the second is in Cr color block. The color information is to be applied as shown in the picture to the left. Depending on the kind of macro block the blocks contain pixel information or prediction error information as mentioned above. In any case the information is compressed using the discrete cosine transform (DCT). |
As mentioned above the 8x8 block values are coded by means of the discrete cosine transform . To explain this regard the figure to the right. This shall be an enlargement of an 8x8 area of a black-white frame. |
|
The normal way is to determine the brightness of each of the 64 pixels and to scale them to some limits, say from 0 to 255* , whereby "0" means "black" and "255" means "white". We can also represent the values in form of an 8x8 bar diagram. Normally the values are processed line by line. That requires 64 byte of storage. |
* (In MPEG a range from -256 to 255 is used.)
But you can define all the 64 values by only 5 integers if you apply the following formula called discrete cosine transform (DCT)
Where f(x,y)
is the brightness of the pixel at position [x,y]
. The result is F
an 8x8 array, too:
|
But as you can see: almost all values are equal to zero. Because the non-zero values are concentrated at the upper left corner the matrix is transfered to the receiver in zigzag scan order. That would result in: 700 90 90 -89 0 100 0 0 0 .... 0 Of course, the zeros are not transferred. An End-Of-Block sign is coded instead. |
The decoder can reconstruct the pixel values by the following formula called inverse discrete cosine transform (IDCT):
Where F(u,v)
is the transform matrix value at position [u,v]
. The results are exactly the original pixel values. Therefore the MPEG compression could be regarded as loss-less. But that isn't true, because the transformed values are quantized. That means they are (integer) divided by a certain value greater or equal 8 because the DCT supplies values up to 2047. To reduce them under the byte length at least the quantization value 8 is applied. The decoder multiplies the result by the same value. Of course the result differs from the original value. But again because of some properties of the human eye the error isn't visible. In MPEG there is a quantization matrix which defines a different quantization value for every transform value depending on its position.
No, it wasn't! The DCT always tends to compute zeros. This effect is assisted by the quantization which zeros small values. To understand this one must recognize the essence of the DCT. To do this let's begin from the opposite side.
Let us apply the IDCT to a matrix only containing one value of 700 at the upper left corner: | The result of the IDCT is | And the bar diagram looks like | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
|
Of course, the picture is an grey colored square. The value at the upper left corner is called the DC value . This is the abbreviation for direct current and refers to a similar phenomenon in the theory of alternating current where an alternating current can have a direct component. In DCT the DC value determines the average brightness in block. All other values describe the variation around this DC value. Therefore they are sometimes referred as to AC values (from "alternating current").
Now let's add an AC value of 100 |
The result of the IDCT is |
And the bar diagram looks like | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
|
The resulting picture looks like As you can see the values vary around the DC value of 87. Furthermore if you regard the shape of the bar diagram you'll see a curve like a half cosine line. It is said the picture has a frequency of 1 in X-direction. Imagine a car that drives with constant speed from left to right along the "bar diagram street" parallel to X-Axis. In contrast to the DC example the car is shaked with a certain frequency but only if it follows the X-Axis. In Y direction it moves along the same height during the whole way. This behaviour is documented in the transform matrix because the only AC value of 100 appears in X direction.
Now let's consider what happens if we place the AC value of 100 at the next position |
The result of the IDCT is |
And the bar diagram looks like | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
|
The resulting picture looks like . The shape of the bar diagram shows a cosine line, too. But now we see a full period, i.e. the frequency is as twice as high as in the first example. This behaviour would continue if we replace the AC-value step by step to the right. Every step increases the frequency of the cosine wave.
But what happens if we place both AC values |
The result of the IDCT is |
And the bar diagram looks like | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
|
The resulting picture looks like .Regarding the shape of the bar diagram you can see a mix of both, the first and the second cosine wave. Indeed, the resulting AC value is simply an addition of the cosine lines.
Now let's add a AC value at the other direction |
The result of the IDCT is |
And the bar diagram looks like | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
|
The resulting picture looks like Now the values vary in Y direction, too. The principle is: The higher the index of the AC value the greater is the frequency.
Now as a last example let's place an AC value at the opposite side of the DC value. We already know what it means: The highest possible frequency of 8 is applied in both, the X- and the Y- direction.
What is to be expected? | The result of the IDCT is |
And the bar diagram looks like | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
|
Because of the high frequency the neighbouring values differ numerously. The picture shows a checker-like appearance . Note that this shall be a 8x8 pixel enlargement in a real picture! How often does it happen? We can hope that such a case is very seldom. And that's why the DCT computes in almost every case zeros for the higher frequencies.