What is Hierarchical B-Frame Mode, or B-pyramid (note that in my opinion B-pyramid is a bad term)?
If there is a run of B-frames and some B-frames in the run are used as references by other B-frames in the same run, then this mode is called Hierarchical B-Frame Coding or B-pyramid.
The following figure, taken from the paper “ANALYSIS OF HIERARCHICAL B PICTURES AND MCTF” by Heiko Schwarz, Detlev Marpe, and Thomas Wiegand, illustrates the concept of B-pyramid:
Let’s display the first GOP from the above figure slightly differently:
So, some geometric form is revealed but not a pyramid. Therefore, in my opinion the term B-pyramid is bad.
According to the results of the above-mentioned paper “ANALYSIS OF HIERARCHICAL B PICTURES AND MCTF”, using Hierarchical B-Frames commonly improves coding efficiency (e.g. on Football CIF 30Hz, the improvement is about 0.5 dB in Y-PSNR).
Pros: better exploitation of temporal redundancy.
Cons: long coding latency (not suitable for low-latency applications)
For each frame we check all four of the following conditions:
Current frame is B
Previous frame (in decoding order) is also B (i.e. the number of successive B-frames is greater than one)
Previous RefIdc (nal_ref_idc) is non-zero (i.e. the previous B-frame is used for reference)
POC of the current B-frame is smaller than that of the previous one
If all of the above conditions are met, then B-pyramid is detected.
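The four conditions above can be sketched as a small check over frames listed in decoding order. This is only an illustrative sketch: the `Frame` record, its field names and the function `has_b_pyramid` are hypothetical, and the `order` field is assumed to hold the POC (a PTS would work the same way).

```python
from dataclasses import dataclass


@dataclass
class Frame:
    is_b: bool        # True if the picture's slice type is B
    nal_ref_idc: int  # nal_ref_idc taken from the NAL header of its slices
    order: int        # display-order key: POC here (hypothetical field name)


def has_b_pyramid(frames):
    """Return True if some frame (in decoding order) meets all four conditions."""
    prev = None
    for cur in frames:
        if (prev is not None
                and cur.is_b                 # 1. current frame is B
                and prev.is_b                # 2. previous frame is also B
                and prev.nal_ref_idc != 0    # 3. previous B-frame is a reference
                and cur.order < prev.order):  # 4. current POC < previous POC
            return True
        prev = cur
    return False
```

For example, the decoding order I0 P8 B4 B2 B6, with B4 marked as a reference, triggers the check at B2.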
If the elementary stream is encapsulated in an MPEG-TS container then we can use PTS instead of POC and so avoid the derivation of POC, which is a tricky process in the case of pic_order_cnt_type=1. Indeed, to get POC it’s necessary to dive into the SPS to get log2_max_pic_order_cnt_lsb_minus4 if pic_order_cnt_type=0, or a dozen other parameters in the case of pic_order_cnt_type=1.
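To give an idea of what POC derivation involves even in the simpler case, here is a rough sketch of the wrap-around handling for pic_order_cnt_type=0. It is a simplification under stated assumptions: frame pictures only, no delta_pic_order_cnt_bottom, no memory management operation 5. The function name `poc_type0` is hypothetical; `prev_msb` and `prev_lsb` are assumed to come from the previous reference picture in decoding order (both 0 right after an IDR).

```python
def poc_type0(pic_order_cnt_lsb, prev_msb, prev_lsb,
              log2_max_pic_order_cnt_lsb_minus4):
    """Return (poc, msb) for a frame picture with pic_order_cnt_type == 0."""
    max_lsb = 1 << (log2_max_pic_order_cnt_lsb_minus4 + 4)
    half = max_lsb // 2
    if pic_order_cnt_lsb < prev_lsb and prev_lsb - pic_order_cnt_lsb >= half:
        msb = prev_msb + max_lsb   # the LSB counter wrapped forward
    elif pic_order_cnt_lsb > prev_lsb and pic_order_cnt_lsb - prev_lsb > half:
        msb = prev_msb - max_lsb   # the LSB belongs to the previous wrap period
    else:
        msb = prev_msb
    return msb + pic_order_cnt_lsb, msb
```

Only the LSB part is transmitted in the slice header, which is why the SPS parameter and the state of the previous reference picture are both needed.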
When the elementary stream is encapsulated in an MPEG-TS container we look for video frame boundaries to pick up the PTS. We get the PTS from the PES header, and the start of each frame must be indicated by an AUD (nal_type=9) in the transport packet payload. Note that if the PES headers carry only a PTS (no DTS) then DTS = PTS, so there is no reordering and no B-pyramid can exist in such a case. Picture data (or slice data in the case of multiple slices per picture) is contained in NAL units with nal_type = 1 or 5 (IDR). It is possible that slice data is absent from the current transport packet and appears only in the next video packet or the one after it (e.g. if the SPS is too long).
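Picking up the PTS from a PES header can be sketched as below. The sketch assumes that `pes` starts at the PES start code of a video stream (so the optional header with PTS_DTS_flags is present) and that the header is not split across transport packets; the function name `pes_pts` is hypothetical.

```python
def pes_pts(pes: bytes):
    """Return the 33-bit PTS of a PES packet, or None if no PTS is signalled."""
    if len(pes) < 9 or pes[0:3] != b"\x00\x00\x01":
        return None                      # not the start of a PES packet
    pts_dts_flags = pes[7] >> 6          # '10' = PTS only, '11' = PTS and DTS
    if not (pts_dts_flags & 0x2) or len(pes) < 14:
        return None
    p = pes[9:14]                        # 5 bytes: 33 PTS bits plus marker bits
    return ((((p[0] >> 1) & 0x07) << 30)
            | (p[1] << 22)
            | ((p[2] >> 1) << 15)
            | (p[3] << 7)
            | (p[4] >> 1))
```

The PTS is expressed in 90 kHz ticks, so a value of 90000 corresponds to one second.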
Once a NAL unit with nal_type 1 or 5 is found, we need to extract nal_ref_idc from the NAL header and the first two parameters from the slice header: first_mb_in_slice and slice_type.
The NAL unit of each slice consists of:
a start code (0x000001 or 0x00000001), a NAL header (1 byte), the slice header and the slice data.
nal_type = nal_header & 0x1f
nal_ref_idc = ( nal_header & 0x60 )>>5
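The two expressions above can be wrapped into a tiny helper. This is a minimal sketch; the function name `parse_nal_header` is hypothetical, and `nal_header` is assumed to be the single byte that follows the start code.

```python
def parse_nal_header(nal_header: int):
    """Split the one-byte H.264 NAL header into (nal_ref_idc, nal_type)."""
    nal_ref_idc = (nal_header & 0x60) >> 5   # 2 bits: 0 means "not used for reference"
    nal_type = nal_header & 0x1F             # 5 bits: 1 = non-IDR slice, 5 = IDR slice
    return nal_ref_idc, nal_type
```

For instance, the header byte 0x65 gives nal_ref_idc 3 and nal_type 5 (an IDR slice), while 0x01 gives nal_ref_idc 0 and nal_type 1 (a non-reference slice).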
To determine first_mb_in_slice and slice_type we need to read the first byte of the slice header, slh[0], and perform the following operations:
Get the first bit: first_bit = slh[0]>>7. Both fields are Exp-Golomb coded, and the single bit ‘1’ is the code of first_mb_in_slice = 0.
If first_bit==1 (i.e. first_mb_in_slice = 0) then the current slice is the first slice in a picture and it actually is the start of picture data (in such a case the next step is to determine whether the slice type is B or not).
If first_bit==0 (i.e. first_mb_in_slice > 0) then the current slice is not the first one in a picture and the picture type has already been determined.
If first_bit==1 then we have to determine whether the slice type is B or not. The slice type code corresponding to B has two possible values, 1 or 6. The Exp-Golomb bit representation of 1 is ‘010’ and that of 6 is ‘00111’.
Hence if the current slice is the first slice in a picture (i.e. the MSbit is ‘1’) and the picture type is B, then one of the following two bit patterns is transmitted at the start of the first byte slh[0] of the slice header:
1010 or 100111
Based on the above patterns we derive the following rules to determine whether the picture type is B or not:
if ( slh[0]>>4 ) == 0xA then the current slice is the first slice and the picture type is B
if ( slh[0] & 0xFC ) == 0x9C then the current slice is the first slice and the picture type is B
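The two rules can be cross-checked against a generic Exp-Golomb reader. The sketch below is only illustrative and all function names are hypothetical. It relies on the fact that the leading bit ‘1’ is the Exp-Golomb code of first_mb_in_slice = 0, and it assumes the buffer holds enough bytes for both fields.

```python
def get_bit(data: bytes, pos: int) -> int:
    """Return the bit at bit position pos (MSB first)."""
    return (data[pos >> 3] >> (7 - (pos & 7))) & 1


def read_ue(data: bytes, pos: int):
    """Decode one unsigned Exp-Golomb value; return (value, next_pos)."""
    zeros = 0
    while get_bit(data, pos) == 0:       # count leading zero bits
        zeros += 1
        pos += 1
    pos += 1                             # skip the terminating '1'
    suffix = 0
    for _ in range(zeros):               # read as many suffix bits as zeros
        suffix = (suffix << 1) | get_bit(data, pos)
        pos += 1
    return (1 << zeros) - 1 + suffix, pos


def first_mb_and_slice_type(slh: bytes):
    """Decode the first two slice-header fields with the generic reader."""
    first_mb, pos = read_ue(slh, 0)
    slice_type, _ = read_ue(slh, pos)
    return first_mb, slice_type


def is_first_slice_of_b_picture(slh0: int) -> bool:
    """Shortcut from the text: only the first slice-header byte is inspected."""
    return (slh0 >> 4) == 0xA or (slh0 & 0xFC) == 0x9C
```

The shortcut avoids a bit reader entirely, which is convenient when only the first byte of the slice header is at hand.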
For each frame we check all four of the following conditions:
Current frame is B
Previous frame (in decoding order) is also B (i.e. the number of successive B-frames is greater than one)
Previous RefIdc (nal_ref_idc) is non-zero (i.e. the previous frame is used for reference)
PTS of the current B-frame is smaller than that of the previous one
If all of the above conditions are met, then B-pyramid is detected.
In an MP4 file, the ‘stco’ and ‘stsz’ tables in the metadata (together with ‘stsc’, which maps samples to chunks) let us access all access units successively in decoding order.
For each access unit we skip over non-VCL units (e.g. SEI) until the first slice-data NAL unit is found (nal_type=1 or 5).
Then we read the NAL header (to determine nal_ref_idc) and the following byte (which corresponds to the first byte of the slice header) to determine the slice type (B or not B). Slice type and nal_ref_idc are determined exactly as described in the previous section. Alternatively, whether nal_ref_idc is non-zero can be derived from the ‘sdtp’ box, provided that this box is present in the metadata (note that signalling the ‘sdtp’ box is not mandatory).
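Skipping to the first slice NAL unit of an access unit can be sketched as follows. One assumption goes beyond the text above: in MP4 samples each NAL unit is preceded by a length field instead of a start code, and its size (commonly 4 bytes) is taken from lengthSizeMinusOne in the ‘avcC’ configuration. The function name `first_slice_nal` and the parameter `length_size` are hypothetical.

```python
def first_slice_nal(sample: bytes, length_size: int = 4):
    """Return (nal_header, slh0) of the first slice NAL unit, or None."""
    pos = 0
    while pos + length_size <= len(sample):
        nal_len = int.from_bytes(sample[pos:pos + length_size], "big")
        start = pos + length_size
        if nal_len == 0 or start + nal_len > len(sample):
            break                                # malformed or truncated sample
        nal_type = sample[start] & 0x1F
        if nal_type in (1, 5) and nal_len >= 2:  # slice data (non-IDR or IDR)
            return sample[start], sample[start + 1]
        pos = start + nal_len                    # skip AUD, SEI, SPS, PPS, ...
    return None
```

The returned pair is exactly what the previous section needs: the NAL header byte for nal_ref_idc and the first slice-header byte for the B-slice test.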
With the ‘ctts’ table in the metadata we derive the PTS of each access unit (if ‘ctts’ is not present then PTS = DTS and no B-pyramid can exist in such a stream).
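Deriving the PTS of each access unit can be sketched as below. The sketch assumes the ‘stts’ table (not mentioned above, it gives the decoding time deltas) and the ‘ctts’ table have already been parsed into lists of (sample_count, value) pairs; the function name `sample_times` is hypothetical and the times are in track timescale units.

```python
def sample_times(stts, ctts):
    """Return a list of (dts, pts) per sample, in decoding order.

    stts: [(sample_count, sample_delta), ...]
    ctts: [(sample_count, sample_offset), ...]; empty if the box is absent
    """
    # Expand the run-length coded composition offsets to one entry per sample.
    offsets = [offset for count, offset in ctts for _ in range(count)]
    times = []
    dts = 0
    index = 0
    for count, delta in stts:
        for _ in range(count):
            offset = offsets[index] if index < len(offsets) else 0
            times.append((dts, dts + offset))   # PTS = DTS + composition offset
            dts += delta
            index += 1
    return times
```

With an empty `ctts` list every PTS equals its DTS, which is the no-reordering case in which no B-pyramid can exist.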
For each frame we check all four of the following conditions:
Current frame is B
Previous frame (in decoding order) is also B (i.e. the number of successive B-frames is greater than one)
Previous RefIdc (nal_ref_idc) is non-zero (i.e. the previous frame is used for reference)
PTS of the current B-frame is smaller than that of the previous one
If all of the above conditions are met, then B-pyramid is detected.