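This post covers three related concepts: GOP, scenecut, and keyframe. They are fundamental to understanding frame patterns (the coding structure), and they matter when setting encoder parameters, optimizing decoding performance, and configuring streaming.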
##Overview
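GOP: Group of Pictures
The concept itself needs little explanation. There are two types, closed GOP and open GOP; both are covered later in this post.
scenecut
A tool for detecting scene changes, informally known as adaptive I-frame selection. Details follow later in this post.
keyframe
The familiar key frame; in x264 it can be treated as equivalent to an IDR frame.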
scenecut: scene detection and adaptive I-frame selection
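scenecut literally means a scene change, and what it ultimately produces is an adaptive I-frame decision. The criterion for declaring a scene change is itself worked backwards from the result: it is purely a bit-cost decision. That is, when the gap between coding the current frame as a P-frame and coding it as an I-frame is smaller than some threshold, the frame is (preferentially) coded as an I-frame. (This is the core idea of an encoder: every algorithm applied during encoding is judged by the final coding performance.)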
For details, see akupenguin's post of 2007-01-22:
1) Encode as (a really fast approximation of) a P-frame and an I-frame. This is the fast decision stage.

    if ((keyframe-distance) > keyint) then
        set IDR-frame
    else if (1 - (P-frame bits) / (I-frame bits) < (scenecut / 100) * (keyframe-distance) / keyint)
        if ((keyframe-distance) >= minkeyint) then
            set IDR-frame
        else
            set I-frame
    else
        set P-frame
    //! keyframe-distance: distance from the previous keyframe; the larger it is, the more the decision leans toward an I-frame (a linear relation is not really reasonable)

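First, --keyint sets the maximum keyframe interval; once the distance exceeds it, the frame is coded as an IDR frame. Nothing surprising there.
Second, when the scenecut condition is met (a scene change has arrived), --min-keyint sets the minimum keyframe interval: if the distance has not yet reached it, the frame is coded as an ordinary I-frame, otherwise as an IDR frame. (As an aside, if an ordinary I-frame is inserted, that GOP ends up containing two I-frames.)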
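The decision rule described above can be written as a small C function. This is a minimal sketch of the quoted pseudocode only, not x264's implementation; the names `frame_type_t` and `decide_frame_type`, and the choice to pass the estimated P-frame and I-frame bit counts as doubles, are assumptions made for illustration.

```c
/* Hypothetical names for illustration; not part of x264's API. */
typedef enum { FRAME_P, FRAME_I, FRAME_IDR } frame_type_t;

/* Applies the quoted rule to one frame.
 * p_bits, i_bits: estimated cost of coding the frame as P or as I (i_bits > 0)
 * distance:       frames since the previous keyframe
 * keyint:         maximum keyframe interval (--keyint)
 * min_keyint:     minimum keyframe interval (--min-keyint)
 * scenecut:       scenecut setting, e.g. 40 */
static frame_type_t decide_frame_type(double p_bits, double i_bits,
                                      int distance, int keyint,
                                      int min_keyint, int scenecut)
{
    if (distance > keyint)
        return FRAME_IDR;                      /* maximum interval exceeded */

    /* Fraction of bits saved by coding as P instead of I. */
    double saving = 1.0 - p_bits / i_bits;
    /* Threshold grows linearly with the distance from the last keyframe. */
    double threshold = (scenecut / 100.0) * distance / keyint;

    if (saving < threshold)                    /* P is not much cheaper: scenecut */
        return distance >= min_keyint ? FRAME_IDR : FRAME_I;

    return FRAME_P;
}
```

With keyint 250, min-keyint 25 and scenecut 40, a frame 100 frames after the last keyframe whose P estimate is 30% of its I estimate stays a P-frame, one whose P estimate is 90% of its I estimate becomes an IDR frame, and a frame only 10 frames after the last keyframe whose P estimate is 99% of its I estimate becomes an ordinary I-frame.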
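About the formula:
1) The default scenecut is 40 (percent). When the keyframe distance equals keyint, so that the distance factor is 1, a scenecut is declared when P-frame bits > I-frame bits * 60%. Put differently, at a setting of 40 the I-frame may use up to 2/3 more bits than the P-frame.
2) The decision depends on the distance from the previous keyframe: the larger the distance, the more readily the frame is made an I-frame or a keyframe.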
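The 60% and 2/3 figures follow directly from the inequality in the pseudocode. A short derivation, writing $P$ and $I$ for the estimated P-frame and I-frame bits, $s$ for the scenecut value, $d$ for the keyframe distance and $K$ for keyint (these symbols are introduced here only for convenience):

$$
1 - \frac{P}{I} < \frac{s}{100}\cdot\frac{d}{K}
\quad\Longleftrightarrow\quad
P > \left(1 - \frac{s}{100}\cdot\frac{d}{K}\right) I
$$

With $s = 40$ and $d = K$ this gives $P > 0.6\,I$, that is $I < P / 0.6 = \tfrac{5}{3}P$, so the I-frame may cost at most $2/3$ more than the P-frame. At half the maximum distance, $d = K/2$, the threshold halves to $0.2$ and the condition tightens to $P > 0.8\,I$, so the I-frame may cost at most 25% more.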
2) Encode for real. This is the actual encoding stage.
Inspecting the results
keyframe
Use ffprobe to check the frame attribute 'key_frame'.
scenecut
Use -f lavfi to apply a filter.
Latest x264 code:

    static int scenecut_internal( x264_t *h, x264_mb_analysis_t *a, x264_frame_t **frames,
                                  int p0, int p1, int real_scenecut )
    {
        /* Excerpt: the declarations of icost, pcost, f_bias, i_gop_size and res,
         * and the rest of the function, are omitted here. */

        /* Upper and lower bounds of the bias, derived from --scenecut. */
        float f_thresh_max = h->param.i_scenecut_threshold / 100.0;
        float f_thresh_min = f_thresh_max * 0.25;

        /* Fixed keyframe interval: no ramp between the two bounds. */
        if( h->param.i_keyint_min == h->param.i_keyint_max )
            f_thresh_min = f_thresh_max;

        if( i_gop_size <= h->param.i_keyint_min / 4 || h->param.b_intra_refresh )
            f_bias = f_thresh_min / 4;
        else if( i_gop_size <= h->param.i_keyint_min )
            f_bias = f_thresh_min * i_gop_size / h->param.i_keyint_min;
        else
        {
            f_bias = f_thresh_min
                     + ( f_thresh_max - f_thresh_min )
                     * ( i_gop_size - h->param.i_keyint_min )
                     / ( h->param.i_keyint_max - h->param.i_keyint_min );
        }

        /* Scenecut when the P cost is not sufficiently below the I cost. */
        res = pcost >= (1.0 - f_bias) * icost;
        /* ... */
    }

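The implementation shows that even when the current distance is smaller than --min-keyint, a frame can still be detected as a scenecut. It also handles cases such as --keyint == --min-keyint, which makes it more robust.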
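To see how the threshold moves as the GOP grows, the bias computation from the excerpt can be isolated into a standalone function. This is a sketch that mirrors the excerpt's arithmetic; the function names `scenecut_bias` and `is_scenecut` and their plain-int parameters are assumptions for illustration that stand in for the `h->param` fields, and neither is an x264 API.

```c
/* Hypothetical standalone version of the bias computation in the excerpt.
 * scenecut:      value of i_scenecut_threshold (e.g. 40)
 * keyint_min:    value of i_keyint_min
 * keyint_max:    value of i_keyint_max
 * gop_size:      frames since the last keyframe
 * intra_refresh: nonzero when b_intra_refresh is set */
static float scenecut_bias(int scenecut, int keyint_min, int keyint_max,
                           int gop_size, int intra_refresh)
{
    float thresh_max = scenecut / 100.0;
    float thresh_min = thresh_max * 0.25;

    if (keyint_min == keyint_max)
        thresh_min = thresh_max;          /* fixed interval: no ramp */

    if (gop_size <= keyint_min / 4 || intra_refresh)
        return thresh_min / 4;            /* very short GOP: smallest bias */
    if (gop_size <= keyint_min)
        return thresh_min * gop_size / keyint_min;   /* ramp up to thresh_min */

    /* Linear ramp from thresh_min at keyint_min to thresh_max at keyint_max. */
    return thresh_min + (thresh_max - thresh_min)
                      * (gop_size - keyint_min)
                      / (keyint_max - keyint_min);
}

/* A frame is flagged as a scenecut when pcost >= (1 - bias) * icost. */
static int is_scenecut(int pcost, int icost, float bias)
{
    return pcost >= (1.0 - bias) * icost;
}
```

With scenecut 40, a minimum interval of 25 and a maximum of 250, the bias is 0.025 for a GOP of 5 frames, about 0.1 at 25 frames and about 0.4 at 250 frames, so a scenecut is declared more readily the longer the current GOP has run.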
##Open GOP and closed GOP:
2.7.5 Open GOP
Figure 2.12 shows an example of an open GOP structure. A closed GOP with an IBBP pattern starts with an I frame whereas an open GOP with the same pattern may start with a B frame. Unlike the closed GOP, both I and P frames can be used for forward or backward prediction. In addition, the last P frame in a previous GOP is referenced by B frames in the current GOP. This GOP structure is commonly employed in Apple’s HTTP live streaming (HLS). It ends with a P frame, just like a closed GOP. However, unlike a closed GOP, the open GOP fully exploits the last P frame, which is used as a reference for four B frames. As a consequence, fewer P frames may be employed when compared to closed GOP structures, giving rise to a slight improvement in compression efficiency. Note that the I frame now serves as a reference for more frames (5 frames), possibly as many as the P frame. Hence, interprediction is improved over the closed GOP and both I and P frames may be buffered by the decoder for the same period of time (i.e., a time interval corresponding to 5 frames).
For the same number of B frames in an IBBP GOP, two P frames are used for an open GOP compared to three in a closed GOP, giving rise to a smaller GOP length of 9 for the open GOP. The drawback of an open GOP is that it is no longer self-contained and hence, cannot be decoded independently. This will not apply to the first GOP of the video, which will start with an I frame. Alternative frame patterns of IBP and IBBBP confirm that an additional P frame can be omitted for the open GOP structure, thereby reducing its length by 1 compared to the closed GOP ([IBPBPBPBP] vs P[BIBPBPBP] and [IBBBPBBBP] vs P[BBBIBBBP]).
Another example of an open IBBP GOP structure is shown in Figure 2.13. Again, only two P frames are required for a GOP of length 9. This structure starts with an I frame, just like a closed GOP. In this case, the I frame is used as a reference for four B frames, including two from the previous GOP. Thus, the GOP need not end with a P frame. For the final GOP of the video, the last two B frames (i.e., B-5 and B-6) are not encoded.
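To summarize, the differences between open GOP and closed GOP are:
In a closed GOP the I-frame is used only for forward prediction; in an open GOP it is used for both forward and backward prediction ==> the I-frame is referenced by more frames (see Figure 2.12).
The last P-frame is better exploited <== because the B-frames of the next GOP use it for forward prediction (see Figure 2.12).
In an open GOP the first frame may also be an I-frame, in which case the last frame of the GOP need not be a P-frame (a closed GOP always ends with a P-frame); see Figure 2.13.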
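As a rough illustration of this summary, a GOP written as a display-order string, like the patterns in the quoted passage, can be classified by looking at its first and last frames. This is a minimal sketch under the quoted text's model only (a closed GOP starts with I and ends with P). The function names `gop_looks_open` and `count_frames` and the string representation are assumptions for illustration; a real bitstream needs its actual reference lists to settle the question.

```c
#include <string.h>

/* Hypothetical helper: pattern is one GOP in display order, e.g. "IBBPBBP".
 * Returns 1 if the pattern can only belong to an open GOP under the model
 * in the quoted text, 0 if it is consistent with a closed GOP,
 * and -1 for an empty or NULL pattern. */
static int gop_looks_open(const char *pattern)
{
    if (pattern == NULL || pattern[0] == '\0')
        return -1;

    size_t n = strlen(pattern);
    /* Leading B-frames need a reference from the previous GOP. */
    if (pattern[0] == 'B')
        return 1;
    /* Trailing B-frames need a reference from the next GOP. */
    if (pattern[n - 1] == 'B')
        return 1;
    return 0;
}

/* Counts how many frames of the given type ('I', 'P' or 'B') appear. */
static int count_frames(const char *pattern, char type)
{
    int count = 0;
    for (; *pattern != '\0'; ++pattern)
        if (*pattern == type)
            ++count;
    return count;
}
```

For the patterns in the quoted passage, `gop_looks_open("IBPBPBPBP")` returns 0 and `gop_looks_open("BIBPBPBP")` returns 1. The strings "BBIBBPBBP" and "IBBPBBPBBP" are not taken from the figures; they are constructed here from the description of the IBBP case, and `count_frames` reports two P-frames in the first and three in the second, each with six B-frames.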