HEVC Core Coding Techniques, Part 1: Coding Partitioning of the Picture

Overview of the High Efficiency Video Coding (HEVC) Standard, Part 4


IV. HEVC Video Coding Techniques


As in all prior ITU-T and ISO/IEC JTC 1 video coding standards since H.261 [2], 
the HEVC design follows the classic block-based hybrid video coding approach 
(as depicted in Fig. 1). The basic source-coding algorithm is a hybrid
of interpicture prediction to exploit temporal statistical dependences,
intrapicture prediction to exploit spatial statistical dependences, and transform 
coding of the prediction residual signals to further exploit spatial statistical 
dependences. There is no single coding element in the HEVC design that provides
the majority of its significant improvement in compression efficiency in relation 
to prior video coding standards. It is, rather, a plurality of smaller improvements 
that add up to the significant gain.
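Like every video coding standard since H.261, HEVC keeps the classic block-based hybrid video coding approach (see Fig. 1);
the basic source-coding algorithm uses interpicture prediction for temporal statistical dependences, intrapicture prediction for spatial statistical dependences,
and transform coding of the prediction residual to further remove spatial statistical dependences;
no single coding element accounts for most of HEVC's compression gain over earlier standards: the gain is the sum of many smaller improvements.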



Fig. 1. Typical HEVC video encoder (with decoder modeling elements shaded in light gray).


A. Sampled Representation of Pictures


For representing color video signals, HEVC typically uses a tristimulus YCbCr color 
space with 4:2:0 sampling (although extension to other sampling formats is 
straightforward, and is planned to be defined in a subsequent version). This separates 
a color representation into three components called Y, Cb, and Cr. The Y component is 
also called luma, and represents brightness. The two chroma components Cb and Cr represent
the extent to which the color deviates from gray toward blue and red, respectively. 
Because the human visual system is more sensitive to luma than chroma, the 4:2:0 sampling 
structure is typically used, in which each chroma component has one fourth of the number 
of samples of the luma component (half the number of samples in both the horizontal 
and vertical dimensions). Each sample for each component is typically represented with 8 
or 10 b of precision, and the 8-b case is the more typical one. In the remainder of 
this paper, we focus our attention on the typical use: YCbCr components with 4:2:0
sampling and 8 b per sample for the representation of the encoded input and decoded 
output video signal.
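To represent color video, HEVC typically uses the tristimulus YCbCr color space with 4:2:0 sampling;
Y is luma and represents brightness;
Cb and Cr are chroma and represent how far the color deviates from gray toward blue and red, respectively;
each sample is typically represented with 8 or 10 bits, with 8 bits being the more common case;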


The video pictures are typically progressively sampled with rectangular picture sizes W×H, 
where W is the width and H is the height of the picture in terms of luma samples.
Each chroma component array, with 4:2:0 sampling, is then W/2×H/2. Given such a video 
signal, the HEVC syntax partitions the pictures further as described below.
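Video pictures are typically progressively sampled, with picture size W×H,
where W and H are the width and height of the picture in luma samples;
with 4:2:0 sampling, each chroma component array is W/2×H/2.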
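
The 4:2:0 arithmetic above can be checked with a short sketch. The function names are illustrative, and the storage model (whole bytes per sample, so 10-bit samples take 2 bytes) is an assumption, not something the text specifies.

```python
def plane_sizes_420(width, height):
    """Return ((luma_w, luma_h), (chroma_w, chroma_h)) for 4:2:0 sampling."""
    if width % 2 or height % 2:
        raise ValueError("this sketch assumes even luma dimensions")
    return (width, height), (width // 2, height // 2)


def frame_bytes_420(width, height, bit_depth=8):
    """Raw size of one picture, assuming each sample is stored in whole bytes."""
    (lw, lh), (cw, ch) = plane_sizes_420(width, height)
    samples = lw * lh + 2 * cw * ch      # one luma plane plus Cb and Cr
    bytes_per_sample = (bit_depth + 7) // 8
    return samples * bytes_per_sample


luma, chroma = plane_sizes_420(1920, 1080)
print(luma, chroma, frame_bytes_420(1920, 1080))
```

Each chroma plane holds one fourth of the luma samples, so a 4:2:0 picture carries 1.5 samples per luma position in total.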

B. Division of the Picture into Coding Tree Units (CTUs)

A picture is partitioned into coding tree units (CTUs), which each contain luma CTBs 
and chroma CTBs. A luma CTB covers a rectangular picture area of L×L samples of the luma
component and the corresponding chroma CTBs cover each L/2×L/2 samples of each of 
the two chroma components. The value of L may be equal to 16, 32, or 64 as determined by
an encoded syntax element specified in the SPS. Compared with the traditional macroblock 
using a fixed array size of 16×16 luma samples, as used by all previous ITU-T and
ISO/IEC JTC 1 video coding standards since H.261 (that was standardized in 1990), 
HEVC supports variable-size CTBs selected according to needs of encoders in terms of 
memory and computational requirements. The support of larger CTBs than in previous 
standards is particularly beneficial when encoding high-resolution video content. 
The luma CTB and the two chroma CTBs together with the associated syntax form
a CTU. The CTU is the basic processing unit used in the standard to specify the decoding 
process.
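A picture is partitioned into coding tree units (CTUs), each containing a luma CTB and chroma CTBs;
each luma CTB covers an L×L block of luma samples, and each of the two corresponding chroma CTBs covers L/2×L/2 samples;
L may be 16, 32, or 64, as signaled by a syntax element in the SPS;
unlike the traditional fixed 16×16 macroblock, HEVC supports variable-size CTBs, chosen according to the encoder's memory and computational requirements;
support for CTBs larger than in earlier standards is particularly beneficial for high-resolution video;
a CTU consists of one luma CTB, two chroma CTBs, and the associated syntax;
the CTU is the basic processing unit used to specify the decoding process;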
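
As a rough illustration of how the choice of L affects the number of CTUs, the sketch below counts the CTU grid for a given picture size. The helper name `ctu_grid` is hypothetical. Partial CTUs at the right and bottom edges are counted as full CTUs, consistent with the boundary handling described later in this section.

```python
def ctu_grid(width, height, ctb_size):
    """Return (columns, rows, total) of CTUs covering a width x height picture."""
    if ctb_size not in (16, 32, 64):
        raise ValueError("L must be 16, 32, or 64")
    cols = -(-width // ctb_size)   # ceiling division
    rows = -(-height // ctb_size)
    return cols, rows, cols * rows


for size in (16, 32, 64):
    print(size, ctu_grid(1920, 1080, size))
```

For 1920×1080, moving from L = 16 to L = 64 reduces the number of top-level units by roughly a factor of 16.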

C. Division of the CTB into CBs

The blocks specified as luma and chroma CTBs can be directly used as CBs or can be 
further partitioned into multiple CBs. Partitioning is achieved using tree structures. 
The tree partitioning in HEVC is generally applied simultaneously to both luma and chroma, 
although exceptions apply when certain minimum sizes are reached for chroma.
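The luma and chroma CTBs can be used directly as CBs,
or be further partitioned into multiple CBs;
partitioning uses tree structures;
the tree partitioning in HEVC is generally applied to luma and chroma simultaneously, with exceptions once chroma reaches certain minimum sizes;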


The CTU contains a quadtree syntax that allows for splitting the CBs to a selected 
appropriate size based on the signal characteristics of the region that is covered 
by the CTB. The quadtree splitting process can be iterated until the size for a
luma CB reaches a minimum allowed luma CB size that is selected by the encoder using 
syntax in the SPS and is always 8×8 or larger (in units of luma samples).
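The CTU contains a quadtree syntax that lets the region covered by the CTB be split into CBs of a size suited to its signal characteristics;
quadtree splitting can be iterated until the luma CB reaches the minimum allowed luma CB size, which is always 8×8 or larger;
the encoder selects this minimum size using syntax in the SPS;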
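
The recursive quadtree split can be sketched as follows. This is a minimal sketch, not the encoder's actual decision process: `want_split` is a hypothetical callback standing in for whatever the encoder uses to judge the signal characteristics, and the z-order traversal of the quadrants is an assumption of this sketch.

```python
def split_ctb(x, y, size, min_cb_size, want_split):
    """Return the leaf CBs of a quadtree as (x, y, size) tuples."""
    if size > min_cb_size and want_split(x, y, size):
        half = size // 2
        leaves = []
        for dy in (0, half):          # top row of quadrants first
            for dx in (0, half):      # left quadrant before right
                leaves += split_ctb(x + dx, y + dy, half, min_cb_size, want_split)
        return leaves
    return [(x, y, size)]             # not split: this block is a CB


# Toy decision: keep splitting only the block at the CTB's top-left corner.
cbs = split_ctb(0, 0, 64, 8, lambda x, y, size: x == 0 and y == 0)
print(cbs)
```

The recursion stops either when the callback declines to split or when the minimum allowed luma CB size is reached, whichever comes first.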


The boundaries of the picture are defined in units of the minimum allowed luma CB size. 
As a result, at the right and bottom edges of the picture, some CTUs may cover regions
that are partly outside the boundaries of the picture. This condition is detected by 
the decoder, and the CTU quadtree is implicitly split as necessary to reduce the CB size 
to the point where the entire CB will fit into the picture.
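Picture boundaries are defined in units of the minimum allowed luma CB size;
as a result, at the right and bottom edges of the picture, some CTUs may extend partly beyond the picture boundary;
the decoder detects this condition, and the CTU quadtree is implicitly split as far as needed for each entire CB to fit inside the picture;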
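
A sketch of the implicit split at the picture edge, assuming the picture dimensions are multiples of the minimum CB size. The function name is illustrative, and blocks that lie fully inside the picture are simply kept whole here; in a real bitstream they could still be split further by explicit flags.

```python
def implicit_boundary_split(x, y, size, pic_w, pic_h, min_cb):
    """Split a block only as far as needed so every CB lies inside the picture."""
    if x >= pic_w or y >= pic_h:
        return []                                 # entirely outside: nothing coded
    if x + size <= pic_w and y + size <= pic_h:
        return [(x, y, size)]                     # fits: no forced split needed
    if size <= min_cb:
        raise ValueError("picture size must be a multiple of the minimum CB size")
    half = size // 2
    leaves = []
    for dy in (0, half):
        for dx in (0, half):
            leaves += implicit_boundary_split(x + dx, y + dy, half,
                                              pic_w, pic_h, min_cb)
    return leaves


# Bottom-left CTU of a 1920x1080 picture with 64x64 CTBs: only 56 rows are inside.
print(implicit_boundary_split(0, 1024, 64, 1920, 1080, 8))
```

For 1080 rows and L = 64, the last CTU row is covered by CBs of size 32, 16, and 8 stacked toward the bottom edge.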


Fig. 3. Modes for splitting a CB into PBs, subject to certain size constraints.
For intrapicture-predicted CBs, only M × M and M/2×M/2 are supported.


D. Prediction Blocks (PBs) and Prediction Units (PUs)

The prediction mode for the CU is signaled as being intra or inter, according to 
whether it uses intrapicture (spatial) prediction or interpicture (temporal) prediction.
The prediction mode of a CU is signaled as intra or inter, according to whether it uses intrapicture or interpicture prediction;

When the prediction mode is signaled as intra, the PB size, which is the block size 
at which the intrapicture prediction mode is established, is the same as the CB size 
for all block sizes except for the smallest CB size that is allowed in the
bitstream. For the latter case, a flag is present that indicates whether the CB is 
split into four PB quadrants that each have their own intrapicture prediction mode. 
The reason for allowing this split is to enable distinct intrapicture prediction
mode selections for blocks as small as 4×4 in size. When the luma intrapicture 
prediction operates with 4×4 blocks, the chroma intrapicture prediction also uses 
4×4 blocks (each covering the same picture region as four 4×4 luma blocks).
The actual region size at which the intrapicture prediction operates (which is distinct 
from the PB size, at which the intrapicture prediction mode is established) depends on the
residual coding partitioning that is described as follows.
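In intra mode, the PB size (the block size at which the intrapicture prediction mode is established) equals the CB size for all block sizes
except the smallest CB size allowed in the bitstream;
in that case, a flag indicates whether the CB is split into four PB quadrants, each with its own intrapicture prediction mode;
the purpose of this split is to allow distinct intrapicture prediction modes for blocks as small as 4×4;
when luma intrapicture prediction operates on 4×4 blocks, chroma intrapicture prediction also uses 4×4 blocks, each covering the same picture region as four 4×4 luma blocks;
the region size at which intrapicture prediction actually operates (which is distinct from the PB size) depends on the residual coding partitioning described below;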

When the prediction mode is signaled as inter, it is specified whether the luma 
and chroma CBs are split into one, two, or four PBs. The splitting into four PBs 
is allowed only when the CB size is equal to the minimum allowed CB size, using an
equivalent type of splitting as could otherwise be performed at the CB level of the 
design rather than at the PB level.
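In inter mode, it is specified whether the luma and chroma CBs are split into one, two, or four PBs;
splitting into four PBs is allowed only when the CB size equals the minimum allowed CB size;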

When a CB is split into four PBs, each PB covers a quadrant of the CB. When a CB is 
split into two PBs, six types of this splitting are possible. The partitioning 
possibilities for interpicture-predicted CBs are depicted in Fig. 3. The upper
partitions illustrate the cases of not splitting the CB of size M×M, of splitting 
the CB into two PBs of size M×M/2 or M/2×M, or splitting it into four PBs of 
size M/2×M/2.
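When a CB is split into four PBs, each PB covers one quadrant of the CB;
when a CB is split into two PBs, six types of split are possible;
the partitioning possibilities for interpicture-predicted CBs are shown in Fig. 3;
the upper partitions in Fig. 3 show the unsplit M×M CB, the split into two PBs of size M×M/2 or M/2×M,
and the split into four PBs of size M/2×M/2;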

The lower four partition types in Fig. 3 are referred to as
asymmetric motion partitioning (AMP), and are only allowed
when M is 16 or larger for luma. One PB of the asymmetric
partition has the height or width M/4 and width or height
M, respectively, and the other PB fills the rest of the CB by
having a height or width of 3M/4 and width or height M.
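The lower four partition types in Fig. 3 are called asymmetric motion partitioning (AMP) and are allowed only when M is 16 or larger for luma;
one PB of an asymmetric partition has height (or width) M/4 and width (or height) M,
and the other PB fills the rest of the CB, with height (or width) 3M/4 and width (or height) M;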
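
The eight inter partition shapes can be tabulated in a few lines. The mode labels below (2Nx2N, 2NxnU, and so on) follow the customary naming for these partitions and do not appear in the text above; treat them as assumptions. Sizes are given as (width, height) in luma samples.

```python
AMP_MODES = {"2NxnU", "2NxnD", "nLx2N", "nRx2N"}


def pb_sizes(mode, m):
    """Return the (width, height) of each PB for an M x M CB in the given mode."""
    if mode in AMP_MODES and m < 16:
        raise ValueError("AMP is only allowed when M is 16 or larger")
    half, quarter = m // 2, m // 4
    table = {
        "2Nx2N": [(m, m)],                               # not split
        "2NxN":  [(m, half), (m, half)],                 # two PBs of M x M/2
        "Nx2N":  [(half, m), (half, m)],                 # two PBs of M/2 x M
        "NxN":   [(half, half)] * 4,                     # four quadrants
        "2NxnU": [(m, quarter), (m, 3 * quarter)],       # thin PB on top
        "2NxnD": [(m, 3 * quarter), (m, quarter)],       # thin PB at the bottom
        "nLx2N": [(quarter, m), (3 * quarter, m)],       # thin PB on the left
        "nRx2N": [(3 * quarter, m), (quarter, m)],       # thin PB on the right
    }
    return table[mode]


for mode in ("2Nx2N", "2NxN", "2NxnU", "nRx2N"):
    print(mode, pb_sizes(mode, 32))
```

In every mode the PB areas add up to M×M, which is a quick sanity check on the table.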

Each interpicture-predicted PB is assigned one or two motion
vectors and reference picture indices. To minimize worst-case
memory bandwidth, PBs of luma size 4×4 are not allowed
for interpicture prediction, and PBs of luma sizes 4×8 and
8×4 are restricted to unipredictive coding. The interpicture
prediction process is further described as follows.
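Each interpicture-predicted PB is assigned one or two motion vectors and reference picture indices;
to minimize worst-case memory bandwidth, luma PBs of size 4×4 are not allowed for interpicture prediction,
and luma PBs of size 4×8 and 8×4 are restricted to unipredictive coding (biprediction is not allowed for them);
the interpicture prediction process is described in more detail in a later section.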
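
The size restrictions on inter PBs amount to a small lookup. This is an illustrative helper, not part of any real codec API; the labels "uni" and "bi" are shorthand chosen for this sketch.

```python
def inter_pb_modes(width, height):
    """Return the set of inter prediction types allowed for a luma PB size."""
    if (width, height) == (4, 4):
        return set()                      # 4x4 inter PBs are not allowed
    if (width, height) in ((4, 8), (8, 4)):
        return {"uni"}                    # restricted to uniprediction
    return {"uni", "bi"}


for size in ((4, 4), (8, 4), (8, 8)):
    print(size, sorted(inter_pb_modes(*size)))
```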


The luma and chroma PBs, together with the associated
prediction syntax, form the PU.
A PU consists of the luma and chroma PBs together with the associated prediction syntax;


Fig. 4. Subdivision of a CTB into CBs [and transform block (TBs)].
Solid lines indicate CB boundaries and dotted lines indicate TB boundaries.
(a) CTB with its partitioning. 
(b) Corresponding quadtree.

E. Tree-Structured Partitioning Into Transform Blocks and Units

For residual coding, a CB can be recursively partitioned
into transform blocks (TBs). The partitioning is signaled by a
residual quadtree.
For residual coding, a CB can be recursively partitioned into transform blocks (TBs);
the partitioning is signaled by a residual quadtree;


Only square CB and TB partitioning is specified, where a
block can be recursively split into quadrants, as illustrated in
Fig. 4. For a given luma CB of size M×M, a flag signals
whether it is split into four blocks of size M/2×M/2. If
further splitting is possible, as signaled by a maximum depth
of the residual quadtree indicated in the SPS, each quadrant
is assigned a flag that indicates whether it is split into four
quadrants. The leaf node blocks resulting from the residual
quadtree are the transform blocks that are further processed
by transform coding. The encoder indicates the maximum and
minimum luma TB sizes that it will use. Splitting is implicit
when the CB size is larger than the maximum TB size. Not
splitting is implicit when splitting would result in a luma TB
size smaller than the indicated minimum. The chroma TB size
is half the luma TB size in each dimension, except when the
luma TB size is 4×4, in which case a single 4×4 chroma TB
is used for the region covered by four 4×4 luma TBs. In the
case of intrapicture-predicted CUs, the decoded samples of the
nearest-neighboring TBs (within or outside the CB) are used
as reference data for intrapicture prediction.
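Only square CB and TB partitioning is specified;
as shown in Fig. 4, a block can be recursively split into quadrants;
for a luma CB of size M×M, a flag signals whether it is split into four blocks of size M/2×M/2;
if further splitting is possible, as determined by the maximum residual quadtree depth indicated in the SPS,
each quadrant is assigned a flag indicating whether it is split into four quadrants in turn;
the leaf blocks of the residual quadtree are the transform blocks, which are then processed by transform coding;
the encoder indicates the maximum and minimum luma TB sizes it will use;
when the CB size is larger than the maximum TB size, splitting is implicit;
when splitting would produce a luma TB smaller than the indicated minimum, not splitting is implicit;
the chroma TB size is normally half the luma TB size in each dimension,
except when the luma TB size is 4×4, in which case a single 4×4 chroma TB covers the region of four 4×4 luma TBs;
for intrapicture-predicted CUs, the decoded samples of the nearest-neighboring TBs (within or outside the CB) serve as reference data for intrapicture prediction;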
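
The implicit-split rules of the residual quadtree can be summarized as a decision function. This is a simplified sketch under stated assumptions: the parameter names are hypothetical, the depth limit is treated as a plain counter, and special cases of the actual syntax are ignored.

```python
def rqt_split(size, depth, max_tb_size, min_tb_size, max_depth):
    """Classify a luma block in the residual quadtree.

    Returns "split (implied)", "no split (implied)", or "flag signaled".
    """
    if size > max_tb_size:
        return "split (implied)"          # block is larger than any allowed TB
    if size // 2 < min_tb_size or depth >= max_depth:
        return "no split (implied)"       # splitting would go below the minimum
    return "flag signaled"                # the encoder chooses and sends a flag


def chroma_tb_size(luma_tb_size):
    """Chroma TB size for 4:2:0: half the luma size, except for 4x4 luma TBs."""
    return 4 if luma_tb_size == 4 else luma_tb_size // 2


print(rqt_split(64, 0, 32, 4, 3), "|", rqt_split(16, 1, 32, 4, 3))
print(chroma_tb_size(16), chroma_tb_size(4))
```

Only the middle case costs bits in the stream; in the other two the decoder infers the outcome from the size limits alone.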


In contrast to previous standards, the HEVC design allows
a TB to span across multiple PBs for interpicture-predicted
CUs to maximize the potential coding efficiency benefits of
the quadtree-structured TB partitioning.
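In contrast to previous standards, HEVC allows a TB to span multiple PBs in interpicture-predicted CUs,
to maximize the potential coding efficiency benefits of the quadtree-structured TB partitioning;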

F. Slices and Tiles

Slices are a sequence of CTUs that are processed in the
order of a raster scan. A picture may be split into one or
several slices as shown in Fig. 5(a) so that a picture is a
collection of one or more slices. Slices are self-contained in
the sense that, given the availability of the active sequence
and picture parameter sets, their syntax elements can be parsed
from the bitstream and the values of the samples in the area of
the picture that the slice represents can be correctly decoded
(except with regard to the effects of in-loop filtering near the
edges of the slice) without the use of any data from other slices
in the same picture. This means that prediction within the
picture (e.g., intrapicture spatial signal prediction or prediction
of motion vectors) is not performed across slice boundaries.
Some information from other slices may, however, be needed
to apply the in-loop filtering across slice boundaries. Each slice
can be coded using different coding types as follows.
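A slice is a sequence of CTUs processed in raster-scan order;
as shown in Fig. 5(a), a picture may be split into one or several slices, so a picture is a collection of one or more slices;
slices are self-contained: given the active SPS and PPS, their syntax elements can be parsed from the bitstream and the samples in the slice's picture area can be correctly decoded
without any data from other slices in the same picture (except for the effects of in-loop filtering near the slice edges);
this means that prediction within the picture is not performed across slice boundaries;
some information from other slices may, however, be needed to apply in-loop filtering across slice boundaries;
each slice can be coded using one of the following coding types: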

Fig. 5. Subdivision of a picture into 
(a) slices and 
(b) tiles. 
(c) Illustration of wavefront parallel processing.

1) I slice:

A slice in which all CUs of the slice are coded
using only intrapicture prediction.
All CUs of the slice are coded using only intrapicture prediction.

2) P slice:

In addition to the coding types of an I slice,
some CUs of a P slice can also be coded using interpicture
prediction with at most one motion-compensated
prediction signal per PB (i.e., uniprediction). P slices
only use reference picture list 0.
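In addition to the coding types of an I slice, some CUs of a P slice can be coded using interpicture prediction with at most one motion-compensated prediction signal per PB (uniprediction); P slices use only reference picture list 0;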

3) B slice:

In addition to the coding types available in a
P slice, some CUs of the B slice can also be coded
using interpicture prediction with at most two motion-compensated
prediction signals per PB (i.e., biprediction).
B slices use both reference picture list 0 and list 1.
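In addition to the coding types of a P slice, some CUs of a B slice can be coded using interpicture prediction with at most two motion-compensated prediction signals per PB (biprediction); B slices use both reference picture list 0 and list 1;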

The main purpose of slices is resynchronization after data
losses. Furthermore, slices are often restricted to use a maximum
number of bits, e.g., for packetized transmission. Therefore,
slices may often contain a highly varying number of
CTUs per slice in a manner dependent on the activity in the
video scene. In addition to slices, HEVC also defines tiles,
which are self-contained and independently decodable rectangular
regions of the picture. The main purpose of tiles is to
enable the use of parallel processing architectures for encoding
and decoding. Multiple tiles may share header information by
being contained in the same slice. Alternatively, a single tile
may contain multiple slices. A tile consists of a rectangular
arranged group of CTUs (typically, but not necessarily, with
all of them containing about the same number of CTUs), as
shown in Fig. 5(b).
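The main purpose of slices is resynchronization after data losses;
slices are often restricted to a maximum number of bits, e.g., for packetized transmission;
therefore, the number of CTUs per slice can vary widely depending on the activity in the video scene;
in addition to slices, HEVC defines tiles, which are self-contained, independently decodable rectangular regions of the picture;
the main purpose of tiles is to enable parallel processing architectures for encoding and decoding;
multiple tiles may share header information by being contained in the same slice;
alternatively, a single tile may contain multiple slices;
a tile consists of a rectangular group of CTUs (typically, but not necessarily, with all tiles containing about the same number of CTUs),
as shown in Fig. 5(b);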
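
One simple way to obtain tiles with about the same number of CTUs is an even integer split of the CTU grid. The sketch below illustrates that idea only; the function name and the rounding rule are assumptions of this sketch, not taken from the text above.

```python
def even_split(total_ctus, parts):
    """Split a row or column of CTUs into `parts` spans of near-equal size."""
    if not 1 <= parts <= total_ctus:
        raise ValueError("need between 1 and total_ctus parts")
    bounds = [(i * total_ctus) // parts for i in range(parts + 1)]
    return [bounds[i + 1] - bounds[i] for i in range(parts)]


# A 30 x 17 CTU grid (1920x1080 with 64x64 CTBs) cut into 4 x 2 tiles.
col_widths = even_split(30, 4)
row_heights = even_split(17, 2)
print(col_widths, row_heights)
print([[w * h for w in col_widths] for h in row_heights])  # CTUs per tile
```

Because the grid dimensions rarely divide evenly, tile sizes differ by at most one CTU per dimension under this scheme.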

To assist with the granularity of data packetization, dependent
slices are additionally defined. Finally, with WPP, a
slice is divided into rows of CTUs. The decoding of each
row can be begun as soon as a few decisions that are needed
for prediction and adaptation of the entropy coder have been
made in the preceding row. This supports parallel processing
of rows of CTUs by using several processing threads in
the encoder or decoder (or both). An example is shown in
Fig. 5(c). For design simplicity, WPP is not allowed to be
used in combination with tiles (although these features could,
in principle, work properly together).
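To assist with the granularity of data packetization, dependent slices are additionally defined;
finally, with wavefront parallel processing (WPP), a slice is divided into rows of CTUs;
decoding of a row can begin as soon as a few decisions needed for prediction and entropy-coder adaptation have been made in the preceding row, before that row is fully decoded;
this supports parallel processing of CTU rows using several processing threads in the encoder or decoder (or both), as shown in Fig. 5(c);
for design simplicity, WPP is not allowed to be used in combination with tiles (although the two could, in principle, work together);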
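
The wavefront idea can be modeled with a toy schedule. The assumptions here are mine, not the text's: every CTU takes one time unit, there are enough threads for every row, and a CTU may start once its left neighbor and the CTU above and to the right are finished (a two-CTU lag between rows). The text only says "a few decisions" in the preceding row are needed.

```python
def wavefront_start_times(rows, cols):
    """Earliest start time of each CTU under the toy WPP dependency model."""
    start = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            ready = 0
            if c > 0:                                   # left neighbor, same row
                ready = max(ready, start[r][c - 1] + 1)
            if r > 0:                                   # above-right in previous row
                above_right = min(c + 1, cols - 1)
                ready = max(ready, start[r - 1][above_right] + 1)
            start[r][c] = ready
    return start


rows, cols = 17, 30                                     # 1920x1080 with 64x64 CTBs
times = wavefront_start_times(rows, cols)
print("wavefront steps:", times[-1][-1] + 1, "sequential steps:", rows * cols)
```

Under this model the CTU at row r and column c starts at step c + 2r, so the whole picture finishes in far fewer steps than a purely sequential pass, at the cost of a ramp-up and ramp-down at the start and end of the picture.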

