Image Processing Transform Coding Using the Residual Quadtree (RQT)

In HEVC, each picture is divided into coding tree blocks (CTBs). A CTB is a square block and represents the root of a quadtree, i.e., the coding tree. The CTB size may range from 8×8 to 64×64 luma samples, but typically 64×64 is used. Each CTB can be further split into smaller square blocks called coding blocks (CBs). After the CTB is split recursively into CBs, each CB is further divided into prediction blocks (PBs) and transform blocks (TBs). The partitioning of a CB into TBs is also carried out recursively using a quadtree. The corresponding structure, the residual quadtree (RQT), allows TB sizes from 4×4 up to 32×32 luma samples. The figure below shows an example where a CB includes 10 TBs, labeled with the letters a to j, together with the corresponding block partitioning. The individual TBs are processed in alphabetical order, which follows a recursive Z-scan with depth-first traversal.

The quadtree approach allows the transform to adapt to the varying space-frequency characteristics of the residual signal. Larger transform blocks, which have larger spatial support, provide better frequency resolution, whereas smaller transform blocks, which have smaller spatial support, provide better spatial resolution. The trade-off between spatial and frequency resolution is chosen by the encoder control, for example using Lagrangian optimization techniques.
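The recursive Z-scan over the TBs of a CB can be sketched in Python as follows. This is a minimal illustration, not reference-software code: it assumes a hypothetical nested-list representation of the RQT (`None` for an unsplit TB, a list of four subtrees ordered top-left, top-right, bottom-left, bottom-right) and a hypothetical 32×32 CB with 10 TBs, which is not necessarily the partitioning shown in the figure. The function name `z_scan_leaves` is also hypothetical.

```python
from string import ascii_lowercase


def z_scan_leaves(tree, x=0, y=0, size=32):
    """Yield (x, y, size) for every TB leaf in recursive Z-scan order.

    `tree` is None for a leaf (unsplit block) or a list of four subtrees
    ordered top-left, top-right, bottom-left, bottom-right.
    """
    if tree is None:
        yield (x, y, size)
        return
    half = size // 2
    offsets = ((0, 0), (half, 0), (0, half), (half, half))  # Z order
    for (dx, dy), child in zip(offsets, tree):
        # Depth-first: finish the whole child subtree before the next child.
        yield from z_scan_leaves(child, x + dx, y + dy, half)


# Hypothetical 32x32 CB with 10 TBs: the top-left 16x16 quadrant is split
# into four 8x8 blocks, and its first 8x8 block is split into four 4x4 TBs.
example = [
    [[None] * 4, None, None, None],  # top-left 16x16
    None,                            # top-right 16x16 TB
    None,                            # bottom-left 16x16 TB
    None,                            # bottom-right 16x16 TB
]

for label, (x, y, s) in zip(ascii_lowercase, z_scan_leaves(example)):
    print(label, x, y, s)
```

Because the traversal is depth-first, the four 4×4 TBs inside the first 8×8 block (a to d) are visited before the remaining 8×8 TBs of the same 16×16 quadrant (e to g), and only then are the other 16×16 quadrants (h to j) processed.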

Parameter Signaling

The RQT is defined by three parameters: the maximum depth of the tree, the minimum allowed transform size and the maximum allowed transform size. The minimum and maximum transform sizes can vary within the range from 4×4 to 32×32 samples, which correspond to the supported block transforms mentioned in the previous section. The maximum allowed depth of the RQT restricts the number of subdivisions. A maximum depth equal to zero means that a CB cannot be split any further and thus the associated CB contains only one TB.
All these parameters interact and influence the subdivision of the RQT. Consider a case in which the root CB size is 64×64, the maximum depth is equal to zero, and the maximum transform size is equal to 32×32. In this case, the CB has to be subdivided at least once, since otherwise it would lead to a 64×64 TB, which is not allowed; the maximum transform size thus takes precedence over the maximum depth. The RQT parameters, i.e., the maximum RQT depth and the minimum and maximum transform sizes, are transmitted in the bitstream at the sequence parameter set (SPS) level. For the RQT depth, different values can be specified and signaled for intra- and inter-coded CUs.
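The interaction of the three parameters can be sketched as a small function that lists the TB sizes reachable inside one CB. This is a hedged sketch, assuming the split semantics of the final HEVC specification: a split is forced while the block is larger than the maximum transform size, every split (forced or optional) increases the depth by one, and an optional split is allowed only while the block is larger than the minimum transform size and the depth is below the maximum depth. The intra N×N and inter-specific implicit splits of HEVC are ignored, and the function name `reachable_tb_sizes` is hypothetical.

```python
def reachable_tb_sizes(cb_size, max_depth, min_tb=4, max_tb=32):
    """Return the sorted TB sizes that can occur inside one CB.

    Assumption (HEVC-style semantics): a split is forced while the block
    exceeds max_tb, every split increments the depth, and an optional
    split is allowed only while size > min_tb and depth < max_depth.
    """
    sizes = set()

    def visit(size, depth):
        forced = size > max_tb
        optional = size > min_tb and depth < max_depth
        if not forced:
            sizes.add(size)  # a TB of this size is allowed here
        if forced or optional:
            visit(size // 2, depth + 1)

    visit(cb_size, 0)
    return sorted(sizes)


print(reachable_tb_sizes(64, 0))  # 64x64 CB, max depth 0, max TB 32x32
print(reachable_tb_sizes(32, 3))  # 32x32 CB, TB sizes down to 4x4
```

With these assumptions, `reachable_tb_sizes(64, 0)` returns `[32]`, matching the 64×64 example above: the forced split happens even though the maximum depth is zero, and no further optional split is possible.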

Fast Encoder Control

To determine the optimal partitioning of a CU into TUs, the encoder has to exhaustively evaluate all possible RQT structures, i.e., all possible TU partitionings of the given CU. Since the number of possible RQT structures grows exponentially with the maximum allowed tree depth, the encoder complexity (e.g., runtime) required to find the TU partitioning that is optimal in the rate-distortion (RD) sense increases exponentially with the RQT depth. This would limit the practical use of the RQT approach in transform coding. Therefore, in addition to the exhaustive search performed by the HM reference encoder software, we developed a fast RQT encoder control that limits the number of candidates evaluated. It reduces the encoder runtime at the cost of slightly lower coding efficiency and works as follows.
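To give a sense of how quickly the search space grows, the number of distinct RQT structures can be counted with a simple recursion: a block is either kept as one TB or split into four quadrants, each of which is partitioned independently with one less level of depth available. This sketch assumes that the minimum and maximum TB sizes do not cut the tree short; the function name `count_rqt_structures` is hypothetical.

```python
def count_rqt_structures(max_depth):
    """Number of distinct TB partitionings of one block when at most
    `max_depth` quadtree splits are allowed along any path.

    TB size limits are ignored (assumed not to restrict the tree).
    """
    if max_depth == 0:
        return 1  # only the unsplit block
    sub = count_rqt_structures(max_depth - 1)
    # Either keep the block whole, or split it and partition each of the
    # four quadrants independently.
    return 1 + sub ** 4


for d in range(5):
    print(d, count_rqt_structures(d))
```

For a 32×32 CB with TB sizes down to 4×4 (maximum depth 3), this gives 83522 possible partitionings; with a maximum depth of 4, the count exceeds 4.8 × 10^19. Because each additional level raises the previous count to the fourth power, the growth is in fact doubly exponential.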

The encoder starts at the RQT root, corresponding to the maximum possible TB size, and continues the evaluation at the next RQT level, corresponding to the next smaller TB size, until either an early-termination criterion is fulfilled or the maximum allowed RQT depth is reached. The early-termination criterion checks whether the absolute values of all unquantized transform coefficients are below a certain threshold. If so, the evaluation stops at the current level, and smaller TB sizes are not considered. The threshold depends on the QP and is expressed relative to the quantizer step size: it is higher for small QP values and lower for large QP values, so that the percentage reduction in encoder runtime is approximately the same over the whole QP range. For QP values below 24, the threshold is 125% of the quantizer step size; for QP values above 48, it is 50%; and for QP values from 24 to 48, it decreases linearly from 125% to 50%.
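The QP-dependent threshold and the top-down search with early termination can be sketched as follows. This is a simplified, hypothetical illustration rather than the HM implementation: it assumes the usual HEVC approximation Qstep ≈ 2^((QP-4)/6) for the quantizer step size, uses a separable orthonormal DCT-II as the block transform, and replaces the real RD cost with a toy cost (`toy_rd_cost`) made of quantization distortion plus an assumed λ times a crude rate estimate. Split-flag signaling costs and the maximum TB size constraint are ignored, and all function names are hypothetical.

```python
import math


def qstep(qp):
    # Assumed HEVC-style quantizer step size: 1.0 at QP 4, doubling every 6 QP.
    return 2.0 ** ((qp - 4) / 6.0)


def threshold_factor(qp):
    # Early-termination threshold as a fraction of the quantizer step size.
    if qp < 24:
        return 1.25
    if qp > 48:
        return 0.50
    # Linear transition from 125% at QP 24 down to 50% at QP 48.
    return 1.25 - 0.75 * (qp - 24) / 24.0


def dct2(block):
    # Separable orthonormal 2-D DCT-II of a square block (list of lists).
    n = len(block)
    c = [[(math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n))
          * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
          for i in range(n)] for k in range(n)]
    tmp = [[sum(c[k][i] * block[i][j] for i in range(n)) for j in range(n)]
           for k in range(n)]  # C * X
    return [[sum(tmp[k][j] * c[l][j] for j in range(n)) for l in range(n)]
            for k in range(n)]  # (C * X) * C^T


def early_terminate(coeffs, qp):
    # True if all unquantized coefficient magnitudes are below the threshold.
    t = threshold_factor(qp) * qstep(qp)
    return all(abs(v) < t for row in coeffs for v in row)


def toy_rd_cost(coeffs, qp):
    # HYPOTHETICAL RD cost: quantization error (equal to the sample-domain
    # error, since the DCT is orthonormal) plus lambda times a crude rate proxy.
    q = qstep(qp)
    lam = 0.57 * 2.0 ** ((qp - 12) / 3.0)  # assumed lambda model
    dist, nonzero = 0.0, 0
    for row in coeffs:
        for v in row:
            level = round(v / q)
            dist += (v - level * q) ** 2
            nonzero += level != 0
    return dist + lam * (1 + 4 * nonzero)  # assumed: 1 bit per TB, 4 per level


def search_rqt(block, qp, max_depth, min_tb=4, depth=0):
    """Return (cost, tree); tree is a TB size (leaf) or four subtrees in Z-order."""
    n = len(block)
    coeffs = dct2(block)
    leaf_cost = toy_rd_cost(coeffs, qp)
    # Stop at the depth limit, at the smallest TB size, or on early termination.
    if depth >= max_depth or n <= min_tb or early_terminate(coeffs, qp):
        return leaf_cost, n
    h = n // 2
    split_cost, subtrees = 0.0, []
    for y, x in ((0, 0), (0, h), (h, 0), (h, h)):
        sub = [row[x:x + h] for row in block[y:y + h]]
        cost, tree = search_rqt(sub, qp, max_depth, min_tb, depth + 1)
        split_cost += cost
        subtrees.append(tree)
    # Keep the split only if it lowers the (toy) cost.
    if split_cost < leaf_cost:
        return split_cost, subtrees
    return leaf_cost, n
```

For an all-zero 32×32 residual, `search_rqt` stops at the root because every coefficient falls below the threshold, so no smaller TB sizes are evaluated. In this sketch the early-termination test only prunes the recursion; whenever it does not fire, the split decision is still made by comparing the (toy) costs of the unsplit and split alternatives.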
