Overview of the High Efficiency Video Coding (HEVC) Standard (Part 2)

II. HEVC Coding Design and Feature Highlights

The HEVC standard is designed to achieve multiple goals, including coding efficiency, 
ease of transport system integration and data loss resilience, as well as 
implementability using parallel processing architectures. The following subsections
briefly describe the key elements of the design by which these goals are achieved, 
and the typical encoder operation that would generate a valid bitstream. More details 
about the associated syntax and the decoding process of the different elements are 
provided in Sections III and IV.


A. Video Coding Layer
The video coding layer of HEVC employs the same hybrid approach (inter-/intrapicture 
prediction and 2-D transform coding) used in all video compression standards since H.261.
Fig. 1 depicts the block diagram of a hybrid video encoder, which could create 
a bitstream conforming to the HEVC standard.


Fig. 1. Typical HEVC video encoder (with decoder modeling elements shaded in light gray).

An encoding algorithm producing an HEVC compliant bitstream would typically proceed 
as follows. Each picture is split into block-shaped regions, with the exact block 
partitioning being conveyed to the decoder. The first picture of a video sequence 
(and the first picture at each clean random access point into a video sequence) is 
coded using only intrapicture prediction (that uses some prediction of data spatially 
from region-to-region within the same picture, but has no dependence on other pictures). 
For all remaining pictures of a sequence or between random access points, interpicture
temporally predictive coding modes are typically used for most blocks. The encoding 
process for interpicture prediction consists of choosing motion data comprising the 
selected reference picture and motion vector (MV) to be applied for predicting the 
samples of each block. The encoder and decoder generate identical interpicture 
prediction signals by applying motion compensation (MC) using the MV and mode decision
data, which are transmitted as side information.


The residual signal of the intra- or interpicture prediction, which is the difference 
between the original block and its prediction, is transformed by a linear spatial 
transform. The transform coefficients are then scaled, quantized, entropy coded,
and transmitted together with the prediction information.


The encoder duplicates the decoder processing loop (see gray-shaded boxes in Fig. 1) 
such that both will generate identical predictions for subsequent data. Therefore, 
the quantized transform coefficients are constructed by inverse scaling and are then 
inverse transformed to duplicate the decoded approximation of the residual signal. 
The residual is then added to the prediction, and the result of that addition may
then be fed into one or two loop filters to smooth out artifacts induced by block-wise 
processing and quantization. The final picture representation (that is a duplicate of 
the output of the decoder) is stored in a decoded picture buffer to be used for
the prediction of subsequent pictures. In general, the order of encoding or decoding 
processing of pictures often differs from the order in which they arrive from the source; 
necessitating a distinction between the decoding order (i.e., bitstream order)
and the output order (i.e., display order) for a decoder.
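The reconstruction loop described above can be sketched as follows. This is a deliberately minimal illustration: the transform is omitted, the quantizer is a bare uniform one, and the in-loop filters are skipped. The point is only that the encoder reconstructs exactly as the decoder will, so both form subsequent predictions from identical data.

```python
# Toy sketch of the encoder's duplicated decoding loop (illustrative only;
# real HEVC applies 2-D integer transforms, URQ, and in-loop filters).

def quantize(coeffs, qstep):
    # Uniform quantization: map each value to an integer level.
    return [round(c / qstep) for c in coeffs]

def dequantize(levels, qstep):
    # Inverse scaling: the only reconstruction the decoder can perform.
    return [l * qstep for l in levels]

def encode_block(original, prediction, qstep):
    # Residual = original - prediction (transform omitted for brevity).
    residual = [o - p for o, p in zip(original, prediction)]
    levels = quantize(residual, qstep)
    # The encoder reconstructs exactly as the decoder will, so that
    # subsequent predictions are formed from identical data.
    recon = [p + r for p, r in zip(prediction, dequantize(levels, qstep))]
    return levels, recon

def decode_block(levels, prediction, qstep):
    return [p + r for p, r in zip(prediction, dequantize(levels, qstep))]

original   = [104, 98, 101, 99]
prediction = [100, 100, 100, 100]
levels, enc_recon = encode_block(original, prediction, qstep=2)
dec_recon = decode_block(levels, prediction, qstep=2)
assert enc_recon == dec_recon   # encoder and decoder stay in lock-step
```

Note that `enc_recon` differs from `original` (quantization is lossy), but it is bit-identical to `dec_recon`, which is what keeps prediction drift from accumulating.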

Video material to be encoded by HEVC is generally expected to be input as progressive 
scan imagery (either due to the source video originating in that format or resulting from
deinterlacing prior to encoding). No explicit coding features are present in the HEVC 
design to support the use of interlaced scanning, as interlaced scanning is no longer 
used for displays and is becoming substantially less common for distribution.
However, a metadata syntax has been provided in HEVC to allow an encoder to indicate 
that interlace-scanned video has been sent by coding each field (i.e., the even or 
odd numbered lines of each video frame) of interlaced video as a separate
picture or that it has been sent by coding each interlaced frame as an HEVC coded picture. 
This provides an efficient method of coding interlaced video without burdening decoders 
with a need to support a special decoding process for it.


The various features involved in hybrid video coding using HEVC are highlighted as follows.


1) Coding tree units and coding tree block (CTB) structure:

The core of the coding layer in previous standards was the macroblock, containing 
a 16×16 block of luma samples and, in the usual case of 4:2:0 color sampling, two
corresponding 8×8 blocks of chroma samples; whereas the analogous structure in HEVC 
is the coding tree unit (CTU), which has a size selected by the encoder and can be 
larger than a traditional macroblock. The CTU consists of a luma CTB and the 
corresponding chroma CTBs and syntax elements. The size L×L of a luma CTB can be 
chosen as L = 16, 32, or 64 samples, with the larger sizes typically enabling better 
compression. HEVC then supports a partitioning of the CTBs into smaller blocks using 
a tree structure and quadtree-like signaling [8].
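The quadtree-style partitioning can be sketched as a recursive split of the luma CTB. Here the split decision comes from a caller-supplied predicate; in the real bitstream it is conveyed by coded split flags, but the recursion structure is the same.

```python
# Illustrative sketch of quadtree partitioning of a luma CTB into blocks.
# `should_split` stands in for the split flags signaled in the bitstream.

def partition_ctb(x, y, size, min_size, should_split):
    """Return the list of (x, y, size) leaf blocks for one CTB, in z-order."""
    if size > min_size and should_split(x, y, size):
        half = size // 2
        blocks = []
        for dy in (0, half):
            for dx in (0, half):
                blocks += partition_ctb(x + dx, y + dy, half,
                                        min_size, should_split)
        return blocks
    return [(x, y, size)]

# Example: split the 64x64 CTB once, then split only its top-left quadrant.
split = lambda x, y, size: size == 64 or (size == 32 and (x, y) == (0, 0))
cbs = partition_ctb(0, 0, 64, 8, split)
# -> four 16x16 blocks in the top-left quadrant plus three 32x32 blocks
```

This yields seven leaf blocks for the example predicate; deeper trees simply recurse further, down to the minimum size.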

2) Coding units (CUs) and coding blocks (CBs):
 
The quadtree syntax of the CTU specifies the size and positions of its luma and chroma CBs. 
The root of the quadtree is associated with the CTU. Hence, the size of the luma CTB is the 
largest supported size for a luma CB. The splitting of a CTU into luma and chroma CBs
is signaled jointly. One luma CB and ordinarily two chroma CBs, together with associated 
syntax, form a coding unit (CU). A CTB may contain only one CU or may be split to form 
multiple CUs, and each CU has an associated partitioning into prediction units (PUs) and a
tree of transform units (TUs).

3) Prediction units (PUs) and prediction blocks (PBs):

The decision whether to code a picture area using interpicture or intrapicture prediction 
is made at the CU level. A PU partitioning  structure has its root at the CU level. 
Depending on the basic prediction-type decision,  the luma and chroma CBs can then be 
further split in size and predicted from luma and chroma prediction blocks (PBs). 
HEVC supports variable PB sizes from 64×64 down to 4×4 samples.

4) TUs and transform blocks:

The prediction residual is coded using block transforms. A TU tree structure has
its root at the CU level. The luma CB residual may be identical to the luma transform block 
(TB) or may be further split into smaller luma TBs. The same applies to the chroma TBs. 
Integer basis functions similar to those of a discrete cosine transform (DCT) are defined 
for the square TB sizes 4×4, 8×8, 16×16, and 32×32. For the 4×4 transform of luma 
intrapicture prediction residuals, an integer transform derived from a form of discrete 
sine transform (DST) is alternatively specified.

5) Motion vector signaling:

Advanced motion vector prediction (AMVP) is used, including derivation of several
most probable candidates based on data from adjacent PBs and the reference picture. 
A merge mode for MV coding can also be used, allowing the inheritance of MVs from 
temporally or spatially neighboring PBs. Moreover, compared to H.264/MPEG-4 AVC, 
improved skipped and direct motion inference are also specified.

6) Motion compensation:

Quarter-sample precision is used for the MVs, and 7-tap or 8-tap filters are used for
interpolation of fractional-sample positions (compared to six-tap filtering of half-sample 
positions followed by linear interpolation for quarter-sample positions in H.264/MPEG-4 AVC).
Similar to H.264/MPEG-4 AVC, multiple reference pictures are used. For each PB, either
one or two motion vectors can be transmitted, resulting either in unipredictive or 
bipredictive coding, respectively. As in H.264/MPEG-4 AVC, a scaling and offset
operation may be applied to the prediction signal(s) in a manner known as weighted prediction.
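As a sketch of the fractional-sample interpolation, the snippet below applies HEVC's 8-tap half-sample luma filter in one dimension (the standard applies it separably in 2-D, and uses 7-tap filters for the quarter-sample positions). The tap values sum to 64, so the result is normalized with a rounded right shift by 6.

```python
# Sketch of half-sample luma interpolation with HEVC's 8-tap filter,
# shown in one dimension only.

HALF_PEL_TAPS = [-1, 4, -11, 40, 40, -11, 4, -1]   # sums to 64

def interp_half(samples, i):
    """Half-sample value between samples[i] and samples[i + 1]."""
    acc = 0
    for k, tap in enumerate(HALF_PEL_TAPS):
        acc += tap * samples[i - 3 + k]       # window of 8 integer samples
    return (acc + 32) >> 6                    # round and normalize by 64

row = [10] * 8
assert interp_half(row, 3) == 10              # a flat signal is preserved
```

Because the filter is longer and has no cascaded half-then-quarter stage, it preserves detail at fractional positions better than the H.264/MPEG-4 AVC scheme it replaces.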

7) Intrapicture prediction:

The decoded boundary samples of adjacent blocks are used as reference data for spatial
prediction in regions where interpicture prediction is not performed. Intrapicture 
prediction supports 33 directional modes (compared to eight such modes in H.264/MPEG-4 AVC), 
plus planar (surface fitting) and DC (flat) prediction modes. The selected intrapicture
prediction modes are encoded by deriving most probable modes (e.g., prediction directions) 
based on those of previously decoded neighboring PBs.
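The DC (flat) mode is the simplest of the 35 modes and makes a compact illustration of boundary-sample prediction: the block is filled with the average of the decoded reference samples above and to the left. The directional and planar modes, omitted here, extrapolate those same boundary samples instead of averaging them.

```python
# Sketch of intrapicture DC (flat) prediction from decoded boundary samples.

def dc_predict(top, left):
    """Predict an NxN block from N top and N left reference samples."""
    n = len(top)
    dc = (sum(top) + sum(left) + n) // (2 * n)   # rounded integer average
    return [[dc] * n for _ in range(n)]

pred = dc_predict(top=[100, 102, 104, 106], left=[98, 98, 100, 100])
# every sample of the 4x4 prediction block is the rounded mean, 101
```

All other intra modes consume the same reference samples; only the rule that maps them into the block interior changes.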

8) Quantization control:

As in H.264/MPEG-4 AVC, uniform reconstruction quantization (URQ) is used in HEVC, 
with quantization scaling matrices supported for the various transform block sizes.
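A minimal sketch of URQ follows. The step size roughly doubles for every increase of 6 in QP; the floating-point 2^((QP-4)/6) mapping below is illustrative, since the standard implements it with integer scaling tables (optionally weighted per frequency by the scaling matrices).

```python
# Sketch of uniform reconstruction quantization (URQ). Reconstruction
# values are uniformly spaced multiples of the quantizer step size.

def qstep(qp):
    # Illustrative QP-to-step mapping: doubles every 6 QP values.
    return 2.0 ** ((qp - 4) / 6.0)

def quantize(coeff, qp):
    return int(round(coeff / qstep(qp)))

def reconstruct(level, qp):
    return level * qstep(qp)

level = quantize(100.0, qp=22)     # qstep(22) = 8.0, so level = 12
approx = reconstruct(level, qp=22) # 96.0, within half a step of 100
```

The "uniform" in URQ refers to the reconstruction side: decoded values sit on an evenly spaced grid, while encoders remain free to bias which level they pick.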


9) Entropy coding:

Context adaptive binary arithmetic coding (CABAC) is used for entropy coding. This is 
similar to the CABAC scheme in H.264/MPEG-4 AVC, but has undergone several improvements 
to improve its throughput speed (especially for parallel-processing architectures) and 
its compression performance, and to reduce its context memory requirements.

10) In-loop deblocking filtering:

A deblocking filter similar to the one used in H.264/MPEG-4 AVC is operated within 
the interpicture prediction loop. However, the design is simplified in regard to its 
decision-making and filtering processes, and is made more friendly to parallel processing.

11) Sample adaptive offset (SAO):

A nonlinear amplitude mapping is introduced within the interpicture prediction
loop after the deblocking filter. Its goal is to better reconstruct the original 
signal amplitudes by using a look-up table that is described by a few additional
parameters that can be determined by histogram analysis at the encoder side.
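The band-offset flavor of SAO can be sketched as follows: each 8-bit sample amplitude falls into one of 32 bands of width 8, and a small set of signaled offsets is added to samples in selected bands. The offset values below are made up; a real encoder would derive them, for example, from a histogram of the reconstruction error.

```python
# Sketch of SAO band offsets: classify each sample by amplitude band and
# add the offset signaled for that band (hypothetical offset values).

def sao_band_filter(samples, band_offsets):
    """band_offsets: dict mapping band index (amplitude >> 3) to an offset."""
    out = []
    for s in samples:
        band = s >> 3                        # 32 bands of width 8 (8-bit)
        s = s + band_offsets.get(band, 0)
        out.append(min(255, max(0, s)))      # clip to the 8-bit range
    return out

filtered = sao_band_filter([16, 17, 200, 255], {2: 3, 25: -2})
```

SAO also defines an edge-offset mode (offsets chosen by local gradient category rather than amplitude), which this sketch omits.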

B. High-Level Syntax Architecture

A number of design aspects new to the HEVC standard improve flexibility for operation 
over a variety of applications and network environments and improve robustness to data
losses. However, the high-level syntax architecture used in the H.264/MPEG-4 AVC standard 
has generally been retained, including the following features.

1) Parameter set structure:

Parameter sets contain information that can be shared for the decoding of several regions
of the decoded video. The parameter set structure provides a robust mechanism for conveying 
data that are essential to the decoding process. The concepts of sequence and picture 
parameter sets from H.264/MPEG-4 AVC are augmented by a new video parameter set (VPS)
structure.

2) NAL unit syntax structure:

Each syntax structure is placed into a logical data packet called a network abstraction 
layer (NAL) unit. Using the content of a two-byte NAL unit header, it is possible to
readily identify the purpose of the associated payload data.
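The two-byte header packs four fixed-width fields, which a decoder can parse with a few shifts and masks:

```python
# Parse the two-byte HEVC NAL unit header:
#   forbidden_zero_bit (1) | nal_unit_type (6) | nuh_layer_id (6) |
#   nuh_temporal_id_plus1 (3)

def parse_nal_header(b0, b1):
    assert (b0 >> 7) == 0, "forbidden_zero_bit must be 0"
    nal_unit_type = (b0 >> 1) & 0x3F
    nuh_layer_id = ((b0 & 0x01) << 5) | (b1 >> 3)
    nuh_temporal_id_plus1 = b1 & 0x07
    return nal_unit_type, nuh_layer_id, nuh_temporal_id_plus1

# Bytes 0x40 0x01 give NAL unit type 32 (a VPS), layer 0, temporal id 1.
assert parse_nal_header(0x40, 0x01) == (32, 0, 1)
```

Because the type field sits at a fixed position, middleboxes and packetizers can classify a NAL unit's payload without decoding any of it.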

3) Slices:

A slice is a data structure that can be decoded independently from other slices of the 
same picture, in terms of entropy coding, signal prediction, and residual signal 
reconstruction. A slice can either be an entire picture or a region of a picture. 
One of the main purposes of slices is resynchronization in the event of data losses. 
In the case of packetized transmission, the maximum number of payload bits within a slice 
is typically restricted, and the number of CTUs in the slice is often varied to minimize 
the packetization overhead while keeping the size of each packet within this bound.
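The packetization trade-off described above can be sketched as a greedy packing: CTUs are appended to the current slice until the next one would exceed the payload budget, so the CTU count per slice varies while every slice stays within the bound. The CTU byte sizes and the 1400-byte budget below are hypothetical figures.

```python
# Sketch of varying the CTU count per slice to respect a payload budget.

def pack_ctus_into_slices(ctu_sizes, max_payload_bytes):
    """ctu_sizes: coded size in bytes of each CTU, in scan order.
    Returns a list of slices, each a list of CTU indices."""
    slices, current, used = [], [], 0
    for i, size in enumerate(ctu_sizes):
        if current and used + size > max_payload_bytes:
            slices.append(current)           # close the slice, start a new one
            current, used = [], 0
        current.append(i)
        used += size
    if current:
        slices.append(current)
    return slices

# e.g., a 1400-byte budget for Ethernet-sized packets (hypothetical sizes)
slices = pack_ctus_into_slices([600, 500, 400, 900, 300, 200], 1400)
```

Fewer, fuller slices mean less header overhead; the budget keeps each slice inside one transport packet so a loss costs at most one slice.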

4) Supplemental enhancement information (SEI) and video usability information (VUI) metadata:

The syntax includes support for various types of metadata known as SEI and VUI. 
Such data provide information about the timing of the video pictures, the proper 
interpretation of the color space used in the video signal, 3-D stereoscopic frame 
packing information, other display hint information, and so on.

C. Parallel Decoding Syntax and Modified Slice Structuring

Finally, four new features are introduced in the HEVC standard to enhance the parallel 
processing capability or modify the structuring of slice data for packetization purposes. 
Each of them may have benefits in particular application contexts, and it is generally 
up to the implementer of an encoder or decoder to determine whether and how to take 
advantage of these features.

1) Tiles:

The option to partition a picture into rectangular regions called tiles has been specified. 
The main purpose of tiles is to increase the capability for parallel processing rather 
than provide error resilience. Tiles are independently decodable regions of a picture 
that are encoded with some shared header information. Tiles can additionally be used for 
the purpose of spatial random access to local regions of video pictures. A typical
tile configuration of a picture consists of segmenting the picture into rectangular 
regions with approximately equal numbers of CTUs in each tile. Tiles provide parallelism 
at a more coarse level of granularity (picture/subpicture), and no sophisticated 
synchronization of threads is necessary for their use.
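The typical configuration of near-equal tiles can be sketched by splitting the CTU grid into columns and rows whose counts differ by at most one (HEVC signals the column and row boundaries; uniform spacing is one common choice):

```python
# Sketch of a typical tile grid: split a W x H CTU picture into tiles of
# approximately equal CTU counts along each dimension.

def tile_spans(n_ctus, n_tiles):
    """Split n_ctus into n_tiles near-equal spans; returns the span widths."""
    base, extra = divmod(n_ctus, n_tiles)
    return [base + (1 if i < extra else 0) for i in range(n_tiles)]

# A 1080p picture with 64x64 CTUs is 30 x 17 CTUs; use a 3 x 2 tile grid.
cols = tile_spans(30, 3)   # CTU widths per tile column
rows = tile_spans(17, 2)   # CTU heights per tile row
assert sum(cols) == 30 and sum(rows) == 17
```

Each of the six resulting rectangles can then be handed to a separate thread, with no synchronization needed until the tiles are stitched back together.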

2) Wavefront parallel processing:

When wavefront parallel processing (WPP) is enabled, a slice is divided into rows of CTUs. 
The first row is processed in an ordinary way, the second row can begin to be processed 
after only two CTUs have been processed in the first row, the third row can begin to be 
processed after only two CTUs have been processed in the second row, and so on. The 
context models of the entropy coder in each row are inferred from those in the preceding
row with a two-CTU processing lag. WPP provides a form of processing parallelism at a rather 
fine level of granularity, i.e., within a slice. WPP may often provide better compression 
performance than tiles (and avoid some visual artifacts that may be induced by using tiles).
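The two-CTU lag yields a simple dependency schedule, sketched below under the simplifying assumption of one thread per CTU row and unit processing time per CTU: row r may start CTU c only once row r - 1 has finished CTU c + 1.

```python
# Sketch of the WPP schedule: earliest finish times per CTU, assuming one
# unit of work per CTU and one thread per CTU row.

def wpp_finish_times(n_rows, n_cols):
    """finish[r][c] = time at which CTU (r, c) completes."""
    finish = [[0] * n_cols for _ in range(n_rows)]
    for r in range(n_rows):
        for c in range(n_cols):
            deps = []
            if c > 0:
                deps.append(finish[r][c - 1])     # previous CTU in this row
            if r > 0:                              # two-CTU lag behind the
                deps.append(finish[r - 1][min(c + 1, n_cols - 1)])  # row above
            finish[r][c] = max(deps, default=0) + 1
    return finish

f = wpp_finish_times(3, 6)
# The last CTU finishes at time 10, versus 18 for a serial raster scan.
assert f[2][5] < 3 * 6
```

Because each row is still entropy-coded with contexts inherited from the row above, WPP pays much less compression cost than tiles, which reset contexts at every tile boundary.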

3) Dependent slice segments:

A structure called a dependent slice segment allows data associated with a particular 
wavefront entry point or tile to be carried in a separate NAL unit, and thus potentially 
makes that data available to a system for fragmented packetization with lower latency than 
if it were all coded together in one slice. A dependent slice segment for a wavefront
entry point can only be decoded after at least part of the decoding process of another 
slice segment has been performed. Dependent slice segments are mainly useful in low-delay 
encoding, where other parallel tools might penalize compression performance.

In the following two sections, a more detailed description of the key features is given.
