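Taking H.264 as an example:
H.264 adapts to transmission over different networks mainly because it introduces a layered structure, split into the Video Coding Layer (VCL) and the Network Abstraction Layer (NAL), which separates compression coding from network transport.
Data compressed by the H.264 algorithm is packaged into NAL packets through the NAL-VCL interface.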
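The basic unit of the NAL is the NALU (NAL unit). From top to bottom, the VCL is structured as follows:
A picture is divided into slices so that the coded data can fit the maximum transmission unit (MTU) of different transport networks.
Slices are grouped so that each group's data is independent of the other groups. This serves specific purposes, such as limiting error propagation to protect picture quality, or separating foreground from background so that they can be coded differently.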
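Each NALU consists of header information plus VCL data, and one NALU carries one slice:
RBSP stands for Raw Byte Sequence Payload.
SODB stands for String Of Data Bits, the raw data bitstream.
The padding bits, rbsp_trailing_bits, are appended so that the bitstream is byte-aligned.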
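The byte alignment described above can be sketched in Python. This is a minimal illustration, assuming the SODB is given as a string of '0'/'1' characters; the function name sodb_to_rbsp is hypothetical. The trailing bits are a single 1 (the stop bit) followed by as many 0 bits as are needed to reach a byte boundary.

```python
def sodb_to_rbsp(bits: str) -> bytes:
    """Append rbsp_trailing_bits to a SODB given as a string of '0'/'1'."""
    # Stop bit: a single 1 marks the end of the payload bits.
    padded = bits + "1"
    # Alignment bits: pad with 0s up to the next byte boundary.
    padded += "0" * (-len(padded) % 8)
    # Pack every group of 8 bits into one byte.
    return bytes(int(padded[i:i + 8], 2) for i in range(0, len(padded), 8))


# 5 payload bits + 1 stop bit + 2 alignment bits = 1011 0100
print(sodb_to_rbsp("10110").hex())  # b4
```

Note that a SODB that is already a whole number of bytes still gains one extra byte, because the stop bit is always appended.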
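The H.264 NAL header occupies one byte (8 bits). There are only 32 NAL types (0-31), which need just 5 bits, leaving 3 bits over. Leaving those 3 bits unused in every one of a large number of NAL units would add up to considerable waste, so they are put to use as well:
Bit 1: a 1-bit forbidden bit (forbidden_zero_bit). When the network detects a bit error in the unit, it can set this bit to 1 so that the receiver can conveniently discard the unit.
Bits 2-3: a 2-bit priority field (NRI, nal_ref_idc). Priority decreases in the order 11, 10, 01, 00; when the decoder is busy, it decodes starting from the highest priority.
Bits 4-8: the 5-bit NAL type (nal_unit_type).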
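The one-byte layout above can be unpacked with a few shifts and masks. The following is a minimal Python sketch; the function name parse_h264_nal_header is hypothetical, and the sample byte 0x67 is an illustrative value that is not taken from the bitstream analysed later.

```python
def parse_h264_nal_header(byte: int) -> dict:
    """Split the one-byte H.264 NAL header into its three fields."""
    return {
        "forbidden_zero_bit": (byte >> 7) & 0x1,  # bit 1
        "nal_ref_idc": (byte >> 5) & 0x3,         # bits 2-3 (NRI)
        "nal_unit_type": byte & 0x1F,             # bits 4-8
    }


# 0x67 = 0110 0111: F = 0, NRI = 11 (highest priority), type = 7
print(parse_h264_nal_header(0x67))
```

In H.264, type 7 is the sequence parameter set, so 0x67 is the header byte commonly seen at the start of an SPS NALU.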
The 32 NAL types are as follows:
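The HEVC and AVC NAL headers differ in three main ways:
01. The AVC header occupies one byte, while the HEVC header occupies two bytes, which is enough to support the HEVC extensions for scalable coding, multiview coding and 3D video coding.
02. In AVC the video parameters are carried in PPS and SPS NAL packets; HEVC adds the VPS (video parameter set), which stores the profile, level and so on.
03. The HEVC NAL header adds an identifier for the temporal layer the NAL belongs to and drops NRI, moving that information into nal_unit_type.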
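The two-byte HEVC NAL header consists of four fields:
01. A 1-bit forbidden bit F. Unlike in AVC, its role is to prevent bit patterns that could be interpreted as MPEG-2 start codes in legacy MPEG-2 systems environments.
02. A 6-bit type field NAL_TYPE (nal_unit_type), giving 64 types; the 32 added values (32-63) are used for non-VCL units.
03. A 6-bit Layer_ID (nuh_layer_id) identifying which layer the current NAL belongs to. In the scalable extension, for example, it jointly labels the spatial and quality scalability layers; in the 3D extension it labels the view and depth.
04. A 3-bit TID (nuh_temporal_id_plus1, i.e. temporal_id + 1) indicating which temporal sub-layer the HEVC access unit belongs to.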
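To make the 1 + 6 + 6 + 3 bit layout concrete, here is a minimal Python sketch that packs the four fields into the two header bytes. The function name build_hevc_nal_header is hypothetical, and the defaults (layer 0, temporal_id 0) are assumptions that correspond to a single-layer stream without temporal sub-layers.

```python
def build_hevc_nal_header(nal_unit_type: int, layer_id: int = 0,
                          temporal_id: int = 0) -> bytes:
    """Pack F (1 bit), type (6), layer id (6), temporal_id + 1 (3)."""
    if not (0 <= nal_unit_type < 64 and 0 <= layer_id < 64
            and 0 <= temporal_id < 7):
        raise ValueError("field out of range")
    forbidden_zero_bit = 0  # always 0 in a conforming header
    value = ((forbidden_zero_bit << 15)
             | (nal_unit_type << 9)
             | (layer_id << 3)
             | (temporal_id + 1))  # the header stores temporal_id + 1
    return value.to_bytes(2, "big")


print(build_hevc_nal_header(32).hex())  # 4001 (type 32, the first non-VCL value)
```

Because the stored value is temporal_id + 1, the last three bits are never all zero, so the second header byte can never be 0x00.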
First, compress the BasketballDrill test sequence with HM.
The configuration is as follows:
#======== File I/O ===============
InputFile : C:\Users\梁昊霖\Desktop\HM-16.20\BasketballDrill_832x480_50.yuv
InputBitDepth : 8 # Input bitdepth
InputChromaFormat : 420 # Ratio of luminance to chrominance samples
FrameRate : 50 # Frame Rate per second
FrameSkip : 0 # Number of frames to be skipped in input
SourceWidth : 832 # Input frame width
SourceHeight : 480 # Input frame height
FramesToBeEncoded : 5 # Number of frames to be coded
Level : 3.1
Low-delay mode is used, i.e. every frame except the first is a B frame.
#======== File I/O =====================
BitstreamFile : 50LP.bin
ReconFile : 50LP.yuv
#======== Profile ================
Profile : main
#======== Unit definition ================
MaxCUWidth : 64 # Maximum coding unit width in pixel
MaxCUHeight : 64 # Maximum coding unit height in pixel
MaxPartitionDepth : 4 # Maximum coding unit depth
QuadtreeTULog2MaxSize : 5 # Log2 of maximum transform size for
# quadtree-based TU coding (2...6)
QuadtreeTULog2MinSize : 2 # Log2 of minimum transform size for
# quadtree-based TU coding (2...6)
QuadtreeTUMaxDepthInter : 3
QuadtreeTUMaxDepthIntra : 3
#======== Coding Structure =============
IntraPeriod : -1 # Period of I-Frame ( -1 = only first)
DecodingRefreshType : 0 # Random Access 0:none, 1:CRA, 2:IDR, 3:Recovery Point SEI
GOPSize : 4 # GOP Size (number of B slice = GOPSize-1)
ReWriteParamSetsFlag : 1 # Write parameter sets with every IRAP
IntraQPOffset : -1
LambdaFromQpEnable : 1 # see JCTVC-X0038 for suitable parameters for IntraQPOffset, QPoffset, QPOffsetModelOff, QPOffsetModelScale when enabled
# Type POC QPoffset QPOffsetModelOff QPOffsetModelScale CbQPoffset CrQPoffset QPfactor tcOffsetDiv2 betaOffsetDiv2 temporal_id #ref_pics_active #ref_pics reference pictures predict deltaRPS #ref_idcs reference idcs
Frame1: B 1 5 -6.5 0.2590 0 0 1.0 0 0 0 4 4 -1 -5 -9 -13 0
Frame2: B 2 4 -6.5 0.2590 0 0 1.0 0 0 0 4 4 -1 -2 -6 -10 1 -1 5 1 1 1 0 1
Frame3: B 3 5 -6.5 0.2590 0 0 1.0 0 0 0 4 4 -1 -3 -7 -11 1 -1 5 0 1 1 1 1
Frame4: B 4 1 0.0 0.0 0 0 1.0 0 0 0 4 4 -1 -4 -8 -12 1 -1 5 0 1 1 1 1
#=========== Motion Search =============
FastSearch : 1 # 0:Full search 1:TZ search
SearchRange : 64 # (0: Search range is a Full frame)
BipredSearchRange : 4 # Search range for bi-prediction refinement
HadamardME : 1 # Use of hadamard measure for fractional ME
FEN : 1 # Fast encoder decision
FDM : 1 # Fast Decision for Merge RD cost
#======== Quantization =============
QP : 32 # Quantization parameter(0-51)
MaxDeltaQP : 0 # CU-based multi-QP optimization
MaxCuDQPDepth : 0 # Max depth of a minimum CuDQP for sub-LCU-level delta QP
DeltaQpRD : 0 # Slice-based multi-QP optimization
RDOQ : 1 # RDOQ
RDOQTS : 1 # RDOQ for transform skip
SliceChromaQPOffsetPeriodicity: 0 # Used in conjunction with Slice Cb/Cr QpOffsetIntraOrPeriodic. Use 0 (default) to disable periodic nature.
SliceCbQpOffsetIntraOrPeriodic: 0 # Chroma Cb QP Offset at slice level for I slice or for periodic inter slices as defined by SliceChromaQPOffsetPeriodicity. Replaces offset in the GOP table.
SliceCrQpOffsetIntraOrPeriodic: 0 # Chroma Cr QP Offset at slice level for I slice or for periodic inter slices as defined by SliceChromaQPOffsetPeriodicity. Replaces offset in the GOP table.
#=========== Deblock Filter ============
LoopFilterOffsetInPPS : 1 # Dbl params: 0=varying params in SliceHeader, param = base_param + GOP_offset_param; 1 (default) =constant params in PPS, param = base_param)
LoopFilterDisable : 0 # Disable deblocking filter (0=Filter, 1=No Filter)
LoopFilterBetaOffset_div2 : 0 # base_param: -6 ~ 6
LoopFilterTcOffset_div2 : 0 # base_param: -6 ~ 6
DeblockingFilterMetric : 0 # blockiness metric (automatically configures deblocking parameters in bitstream). Applies slice-level loop filter offsets (LoopFilterOffsetInPPS and LoopFilterDisable must be 0)
#=========== Misc. ============
InternalBitDepth : 8 # codec operating bit-depth
#=========== Coding Tools =================
SAO : 1 # Sample adaptive offset (0: OFF, 1: ON)
AMP : 1 # Asymmetric motion partitions (0: OFF, 1: ON)
TransformSkip : 1 # Transform skipping (0: OFF, 1: ON)
TransformSkipFast : 1 # Fast Transform skipping (0: OFF, 1: ON)
SAOLcuBoundary : 0 # SAOLcuBoundary using non-deblocked pixels (0: OFF, 1: ON)
#============ Slices ================
SliceMode : 0 # 0: Disable all slice options.
# 1: Enforce maximum number of LCU in a slice,
# 2: Enforce maximum number of bytes in a slice
# 3: Enforce maximum number of tiles in a slice
SliceArgument : 1500 # Argument for 'SliceMode'.
# If SliceMode==1 it represents max. SliceGranularity-sized blocks per slice.
# If SliceMode==2 it represents max. bytes per slice.
# If SliceMode==3 it represents max. tiles per slice.
LFCrossSliceBoundaryFlag : 1 # In-loop filtering, including ALF and DB, is across or not across slice boundary.
# 0:not across, 1: across
#============ PCM ================
PCMEnabledFlag : 0 # 0: No PCM mode
PCMLog2MaxSize : 5 # Log2 of maximum PCM block size.
PCMLog2MinSize : 3 # Log2 of minimum PCM block size.
PCMInputBitDepthFlag : 1 # 0: PCM bit-depth is internal bit-depth. 1: PCM bit-depth is input bit-depth.
PCMFilterDisableFlag : 0 # 0: Enable loop filtering on I_PCM samples. 1: Disable loop filtering on I_PCM samples.
#============ Tiles ================
TileUniformSpacing : 0 # 0: the column boundaries are indicated by TileColumnWidth array, the row boundaries are indicated by TileRowHeight array
# 1: the column and row boundaries are distributed uniformly
NumTileColumnsMinus1 : 0 # Number of tile columns in a picture minus 1
TileColumnWidthArray : 2 3 # Array containing tile column width values in units of CTU (from left to right in picture)
NumTileRowsMinus1 : 0 # Number of tile rows in a picture minus 1
TileRowHeightArray : 2 # Array containing tile row height values in units of CTU (from top to bottom in picture)
LFCrossTileBoundaryFlag : 1 # In-loop filtering is across or not across tile boundary.
# 0:not across, 1: across
#============ WaveFront ================
WaveFrontSynchro : 0 # 0: No WaveFront synchronisation (WaveFrontSubstreams must be 1 in this case).
# >0: WaveFront synchronises with the LCU above and to the right by this many LCUs.
#=========== Quantization Matrix =================
ScalingList : 0 # ScalingList 0 : off, 1 : default, 2 : file read
ScalingListFile : scaling_list.txt # Scaling List file name. If the file does not exist, use Default Matrix.
#============ Lossless ================
TransquantBypassEnable : 0 # Value of PPS flag.
CUTransquantBypassFlagForce: 0 # Force transquant bypass mode, when transquant_bypass_enable_flag is enabled
#============ Rate Control ======================
RateControl : 0 # Rate control: enable rate control
TargetBitrate : 1000000 # Rate control: target bitrate, in bps
KeepHierarchicalBit : 2 # Rate control: 0: equal bit allocation; 1: fixed ratio bit allocation; 2: adaptive ratio bit allocation
LCULevelRateControl : 1 # Rate control: 1: LCU level RC; 0: picture level RC
RCLCUSeparateModel : 1 # Rate control: use LCU level separate R-lambda model
InitialQP : 0 # Rate control: initial QP
RCForceIntraQP : 0 # Rate control: force intra QP to be equal to initial QP
### DO NOT ADD ANYTHING BELOW THIS LINE ###
### DO NOT DELETE THE EMPTY LINE BELOW ###
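Next, analyze the bitstream in the encoded binary file:
Only the first frame is an I frame, and it has the highest quality; all remaining frames are B frames.
Among the B frames, a higher-quality B frame appears every 4 frames: the Frame4 entry of the GOP table has the smallest QPoffset (1, against 5, 4 and 5 for Frame1 to Frame3), and the table repeats with the period set in the configuration file: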
GOPSize : 4 # GOP Size (number of B slice = GOPSize-1)
(Number of B slices = GOP size - 1, because the first slice is an I slice?) Does a GOP not necessarily end with an I frame?
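The 4-frame periodicity can be checked against the GOP table in the configuration. The sketch below is a simplified illustration, not HM's actual code: it only maps each picture order count (POC) to its GOP table entry and reads the QPoffset column (5, 4, 5, 1 for Frame1 to Frame4). It deliberately ignores the QPOffsetModelOff and QPOffsetModelScale terms, so it shows the relative pattern rather than the final QP. The names GOP_SIZE, QP_OFFSET and gop_entry are hypothetical.

```python
GOP_SIZE = 4
# QPoffset column of Frame1..Frame4 in the configuration's GOP table.
QP_OFFSET = {1: 5, 2: 4, 3: 5, 4: 1}


def gop_entry(poc: int) -> int:
    """Return which FrameN entry of the GOP table a B frame with this POC uses."""
    if poc <= 0:
        raise ValueError("POC 0 is the I frame and has no GOP table entry")
    return (poc - 1) % GOP_SIZE + 1


for poc in range(1, 9):
    entry = gop_entry(poc)
    print(f"POC {poc}: Frame{entry}, QPoffset {QP_OFFSET[entry]}")
```

POC 4 and POC 8 both map to Frame4, the entry with the smallest offset. A smaller offset means a lower QP and therefore finer quantization, which is consistent with one better-quality B frame in every group of four.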
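An I frame contains only I slices, a P frame only P slices, and a B frame only B slices.
An I slice contains only I macroblocks; a P slice can contain both P and I macroblocks, and likewise a B slice can contain both B and I macroblocks. (In HEVC the corresponding block is the coding unit rather than the macroblock.)
Frames 4 and 5 both contain intra-coded blocks (I macroblocks), and the higher the intra share, the higher the quality of the B frame.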
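View the bitstream in hexadecimal:
The boxed column gives the starting address. Each address unit holds one byte, i.e. two hexadecimal digits. Take EF: E is 14 in decimal, or 1110 in binary, and F is 15 in decimal, or 1111 in binary, so [1110 1111] is stored in a single address unit.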
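The same conversion can be done in Python, which helps when reading a hex dump by eye. This is a small sketch; the helper name byte_to_nibbles is hypothetical.

```python
def byte_to_nibbles(byte: int) -> str:
    """Format one byte as two 4-bit groups, one per hexadecimal digit."""
    high = byte >> 4   # first hex digit, e.g. E = 14
    low = byte & 0xF   # second hex digit, e.g. F = 15
    return f"{high:04b} {low:04b}"


print(byte_to_nibbles(0xEF))  # 1110 1111
```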
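With other, packet-based transport protocols, one UDP packet is one NAL unit, so the decoder can easily detect NAL boundaries and decode. In the byte stream format, however, NAL units are written out as a continuous stream of bytes, and the decoder cannot tell where each NAL starts and ends. A start code prefix, 0x000001, is therefore defined for synchronization; it is marked with red boxes in the figure above.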
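Locating NAL units by their start code prefix can be sketched as follows. This is a minimal Python illustration with a hypothetical function name, split_byte_stream. It assumes the whole stream fits in memory, and it does not remove emulation prevention bytes from the payload. The sample bytes after each two-byte header are made up for the example.

```python
from typing import List

START_CODE = b"\x00\x00\x01"


def split_byte_stream(stream: bytes) -> List[bytes]:
    """Split a byte stream into NAL units at each 00 00 01 prefix."""
    units = []
    pos = stream.find(START_CODE)
    while pos != -1:
        start = pos + len(START_CODE)
        nxt = stream.find(START_CODE, start)
        end = len(stream) if nxt == -1 else nxt
        # Zero bytes before the next prefix (as in 00 00 00 01) are
        # padding between units, not part of the NAL unit itself.
        units.append(stream[start:end].rstrip(b"\x00"))
        pos = nxt
    return units


sample = bytes.fromhex("00000001" "40010c" "00000001" "420101" "000001" "4401c1")
for unit in split_byte_stream(sample):
    print(unit.hex())
```

Stripping trailing zero bytes is safe here because a NAL unit ends with the RBSP stop bit, so its last byte is never 0x00.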
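NAL units are separated by 0x000001, and the NAL header immediately follows the start code prefix. For example, 40 01 in binary is 0100 0000 0000 0001; compare it with the NAL unit header structure:
The NAL_TYPE bits are the bracketed ones in 0[100 000]0 0000 0001. Converting the bits in [ ] to decimal gives 32, which corresponds to the VPS NAL type. In the same way, the next two NALUs turn out to be SPS and PPS in turn.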
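The manual decoding above can be reproduced with a short Python sketch. The function name parse_hevc_nal_header and the lookup table are hypothetical. Only 40 01 is quoted from the hex dump; 42 01 and 44 01 are assumed here as the headers an SPS (type 33) and a PPS (type 34) would have with layer 0 and temporal_id 0.

```python
def parse_hevc_nal_header(header: bytes) -> dict:
    """Split the two-byte HEVC NAL header into its four fields."""
    value = int.from_bytes(header[:2], "big")
    return {
        "forbidden_zero_bit": value >> 15,
        "nal_unit_type": (value >> 9) & 0x3F,
        "nuh_layer_id": (value >> 3) & 0x3F,
        "nuh_temporal_id_plus1": value & 0x7,
    }


# Names of the three parameter set types discussed in the text.
PARAMETER_SETS = {32: "VPS", 33: "SPS", 34: "PPS"}

for text in ("4001", "4201", "4401"):
    fields = parse_hevc_nal_header(bytes.fromhex(text))
    nal_type = fields["nal_unit_type"]
    print(text, nal_type, PARAMETER_SETS.get(nal_type, "other"))
```

For 40 01 this yields type 32 with layer 0 and a temporal_id of 0 (stored as 1), matching the bracketed bits read off by hand.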