Setting keyframes in ffmpeg

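This article covers three related concepts: GOP, scenecut, and keyframe. They are fundamental to understanding frame patterns (coding structures), and they matter when choosing encoder parameters, optimizing decoding performance, and configuring streaming.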

##Overview

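GOP: Group of Pictures
The concept itself needs little introduction. There are two types, closed GOP and open GOP; see the section on open and closed GOPs for details.
scenecut
A tool for detecting scene changes, informally known as adaptive I-frame selection. Details follow in the scenecut section.
keyframe
The familiar key frame; in x264 it can be treated as equivalent to an IDR frame.

scenecut: scene detection and adaptive I-frame selection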

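Literally, scenecut means a scene change; its end result is an adaptive I-frame decision. The criterion for declaring a scene change is worked backwards from the result and is purely a bit-cost decision: if the saving from coding the current frame as a P-frame rather than an I-frame falls below some threshold, the frame is (preferentially) coded as an I-frame. (This is a core idea of the encoder: every algorithm applied during encoding is judged by the resulting coding performance.)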

For details, see akupenguin's post of 2007-01-22:
1) Encode as (a really fast approximation of) a P-frame and an I-frame. This is the fast decision stage.

if ((keyframe-distance) > keyint) then
    set IDR-frame
else if (1 - (P-frame bits) / (I-frame bits) < (scenecut / 100) * (keyframe-distance) / keyint)
    if ((keyframe-distance) >= minkeyint) then
        set IDR-frame
    else
        set I-frame
else
    set P-frame
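//! keyframe-distance: distance from the previous keyframe; the larger it is, the more the decision leans toward an I-frame (a linear relationship is not really reasonable)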
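First, --keyint sets the maximum distance between keyframes; once the distance exceeds it, the frame is coded as an IDR frame. No surprises there.
Second, when the scenecut condition is met, a scene change has been detected. --min-keyint sets the minimum keyframe distance: if the distance has not yet reached it, the frame is coded as an ordinary I-frame, otherwise as an IDR frame. (As an aside, inserting an ordinary I-frame means this GOP now contains two I-frames.)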
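The two flags above can also be passed to x264 through ffmpeg's libx264 wrapper. The following is a minimal sketch, assuming an ffmpeg build with libx264; the helper names `x264_keyframe_params` and `ffmpeg_command` are hypothetical, and the numeric values are only illustrative. It builds (but does not run) a command line that forwards `keyint`, `min-keyint`, and `scenecut` via `-x264-params`.

```python
def x264_keyframe_params(keyint=250, min_keyint=25, scenecut=40):
    """Format keyint/min-keyint/scenecut as an -x264-params string."""
    if min_keyint > keyint:
        raise ValueError("min_keyint must not exceed keyint")
    return f"keyint={keyint}:min-keyint={min_keyint}:scenecut={scenecut}"


def ffmpeg_command(src, dst, **keyframe_options):
    """Build (not run) an ffmpeg command that encodes src to dst with libx264."""
    return [
        "ffmpeg", "-i", src,
        "-c:v", "libx264",
        "-x264-params", x264_keyframe_params(**keyframe_options),
        dst,
    ]


if __name__ == "__main__":
    # Request a keyframe every 48 frames and switch scenecut detection off.
    command = ffmpeg_command("input.mp4", "output.mp4",
                             keyint=48, min_keyint=48, scenecut=0)
    print(" ".join(command))
```

Returning the argument list rather than a shell string keeps the sketch safe to hand to `subprocess.run` without quoting issues.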
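About the formula:
1) The default scenecut is 40 (percent). When the distance equals keyint, the condition reduces to P-frame bits > 60% of I-frame bits, which counts as a scenecut. In other words, at a setting of 40 the I-frame may use at most 2/3 more bits than the P-frame (1/0.6 ≈ 1.67).
2) The threshold also scales with the distance from the previous keyframe: the larger the distance, the more readily the frame is coded as an I-frame or keyframe.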
2) Encode for real. This is the actual encoding stage.
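The fast decision in step 1 can be written out as a small function. This is a sketch of the quoted pseudocode only, not of x264's current implementation; the function name `choose_frame_type` and the default values are assumptions made for illustration.

```python
def choose_frame_type(p_bits, i_bits, distance,
                      keyint=250, min_keyint=25, scenecut=40):
    """Return "IDR", "I" or "P" following the quoted 2007 pseudocode.

    p_bits / i_bits: estimated cost of the frame coded as P or as I.
    distance: number of frames since the previous keyframe.
    """
    if i_bits <= 0:
        raise ValueError("i_bits must be positive")
    if distance > keyint:
        return "IDR"                      # maximum keyframe interval exceeded
    saving = 1.0 - p_bits / i_bits        # how much cheaper the P-frame is
    threshold = (scenecut / 100.0) * distance / keyint
    if saving < threshold:                # P barely cheaper than I: scenecut
        return "IDR" if distance >= min_keyint else "I"
    return "P"


if __name__ == "__main__":
    # The same 30% saving is a scenecut late in the GOP but not halfway through.
    print(choose_frame_type(700, 1000, distance=250))
    print(choose_frame_type(700, 1000, distance=125))
```

Because the threshold grows linearly with the distance, a frame whose P-frame cost is 70% of its I-frame cost is kept as a P-frame at distance 125 but promoted at distance 250.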

Inspecting the information

keyframe
Use ffprobe to read each frame's `key_frame` attribute.
scenecut
Use a filter through -f lavfi.
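Both inspections can be scripted. The sketch below assumes `ffprobe` is on the PATH; the helper names are hypothetical. The first command lists per-frame `key_frame` flags as JSON. The second uses the `select` filter's `scene` score through `-f lavfi`; that score is ffmpeg's own measure and is separate from x264's scenecut decision, so the 0.4 threshold is only an illustrative assumption.

```python
import json
import subprocess


def keyframe_command(path):
    """ffprobe command that prints key_frame and pict_type for every video frame."""
    return ["ffprobe", "-v", "error", "-select_streams", "v:0",
            "-show_entries", "frame=key_frame,pict_type", "-of", "json", path]


def scene_detect_command(path, threshold=0.4):
    """ffprobe command that lists frames whose lavfi scene score exceeds threshold."""
    # The comma inside gt(...) is escaped so the filter graph parser keeps it.
    graph = f"movie={path},select=gt(scene\\,{threshold})"
    return ["ffprobe", "-v", "error", "-f", "lavfi", "-i", graph,
            "-show_entries", "frame=best_effort_timestamp_time", "-of", "json"]


def keyframe_indices(ffprobe_json):
    """Return the positions of frames flagged key_frame=1 in ffprobe's JSON output."""
    frames = json.loads(ffprobe_json).get("frames", [])
    return [i for i, frame in enumerate(frames)
            if int(frame.get("key_frame", 0)) == 1]


def list_keyframes(path):
    """Run ffprobe on path and return the indices of its keyframes."""
    result = subprocess.run(keyframe_command(path),
                            capture_output=True, text=True, check=True)
    return keyframe_indices(result.stdout)
```

Splitting command construction from parsing means the parser can be checked against saved ffprobe output without the tool installed.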
The latest x264 code

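static int scenecut_internal( x264_t *h,
                              x264_mb_analysis_t *a,
                              x264_frame_t **frames,
                              int p0,
                              int p1,
                              int real_scenecut )
{
    /* ... excerpt: the code that sets up icost, pcost, f_bias, i_gop_size and res is omitted ... */

    /* The upper threshold comes from the scenecut setting; the lower one is a quarter of it. */
    float f_thresh_max = h->param.i_scenecut_threshold / 100.0;
    float f_thresh_min = f_thresh_max * 0.25;

    /* With a fixed keyframe interval there is no range to interpolate over. */
    if( h->param.i_keyint_min == h->param.i_keyint_max )
        f_thresh_min = f_thresh_max;
    /* Very short GOP (or intra refresh): use the smallest bias. */
    if( i_gop_size <= h->param.i_keyint_min / 4 || h->param.b_intra_refresh )
        f_bias = f_thresh_min / 4;
    /* Below the minimum interval: scale the bias up linearly to f_thresh_min. */
    else if( i_gop_size <= h->param.i_keyint_min )
        f_bias = f_thresh_min * i_gop_size / h->param.i_keyint_min;
    /* Between minimum and maximum interval: interpolate towards f_thresh_max. */
    else
    {
        f_bias = f_thresh_min
                 + ( f_thresh_max - f_thresh_min )
                 * ( i_gop_size - h->param.i_keyint_min )
                 / ( h->param.i_keyint_max - h->param.i_keyint_min );
    }
    /* Scenecut when the P-frame cost comes within f_bias of the I-frame cost. */
    res = pcost >= (1.0 - f_bias) * icost;

    /* ... excerpt: the rest of the function is omitted ... */
}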
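The implementation shows that even when the current distance is below --min-keyint there is still some chance of a scenecut being detected, and that cases such as --keyint == --min-keyint are handled explicitly, which makes it more robust.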
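To see how the bias changes across a GOP, the excerpt can be transcribed into Python. This is a sketch of the quoted lines only: the names `scenecut_bias` and `is_scenecut` are hypothetical, Python floats are double precision rather than C `float`, and the guard against a zero interval range is an addition that is not in the excerpt.

```python
def scenecut_bias(gop_size, keyint_min, keyint_max, scenecut_threshold,
                  intra_refresh=False):
    """Transcription of the f_bias computation from the excerpt."""
    thresh_max = scenecut_threshold / 100.0
    thresh_min = thresh_max * 0.25
    if keyint_min == keyint_max:
        thresh_min = thresh_max
    # keyint_min / 4 is an integer division in the C code.
    if gop_size <= keyint_min // 4 or intra_refresh:
        return thresh_min / 4
    if gop_size <= keyint_min:
        return thresh_min * gop_size / keyint_min
    span = keyint_max - keyint_min
    if span == 0:
        # Added guard (not in the excerpt): avoid dividing by zero.
        return thresh_max
    return thresh_min + (thresh_max - thresh_min) * (gop_size - keyint_min) / span


def is_scenecut(pcost, icost, bias):
    """The final comparison: P cost within `bias` of the I cost."""
    return pcost >= (1.0 - bias) * icost


if __name__ == "__main__":
    for gop_size in (5, 10, 25, 100, 250):
        print(gop_size, round(scenecut_bias(gop_size, 25, 250, 40), 4))
```

With a minimum interval of 25, a maximum of 250, and a threshold of 40, the bias is small but non-zero for very short GOPs, which is why a scenecut can still fire before the minimum interval is reached.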

##Open GOP and closed GOP

2.7.5 Open GOP

Figure 2.12 shows an example of an open GOP structure. A closed GOP with an IBBP pattern starts with an I frame whereas an open GOP with the same pattern may start with a B frame. Unlike the closed GOP, both I and P frames can be used for forward or backward prediction. In addition, the last P frame in a previous GOP is referenced by B frames in the current GOP. This GOP structure is commonly employed in Apple’s HTTP live streaming (HLS). It ends with a P frame, just like a closed GOP. However, unlike a closed GOP, the open GOP fully exploits the last P frame, which is used as a reference for four B frames. As a consequence, fewer P frames may be employed when compared to closed GOP structures, giving rise to a slight improvement in compression efficiency. Note that the I frame now serves as a reference for more frames (5 frames), possibly as many as the P frame. Hence, interprediction is improved over the closed GOP and both I and P frames may be buffered by the decoder for the same period of time (i.e., a time interval corresponding to 5 frames).
For the same number of B frames in an IBBP GOP, two P frames are used for an open GOP compared to three in a closed GOP, giving rise to a smaller GOP length of 9 for the open GOP. The drawback of an open GOP is that it is no longer self-contained and hence, cannot be decoded independently. This will not apply to the first GOP of the video, which will start with an I frame. Alternative frame patterns of IBP and IBBBP confirm that an additional P frame can be omitted for the open GOP structure, thereby reducing its length by 1 compared to the closed GOP ([IBPBPBPBP] vs P[BIBPBPBP] and [IBBBPBBBP] vs P[BBBIBBBP]).


Another example of an open IBBP GOP structure is shown in Figure 2.13. Again, only two P frames are required for a GOP of length 9. This structure starts with an I frame, just like a closed GOP. In this case, the I frame is used as a reference for four B frames, including two from the previous GOP. Thus, the GOP need not end with a P frame. For the final GOP of the video, the last two B frames (i.e., B-5 and B-6) are not encoded.
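To summarize, the differences between open GOP and closed GOP are:

In a closed GOP the I-frame is used only for forward prediction; in an open GOP it is used for both forward and backward prediction, so more frames reference the I-frame (see Figure 2.12).
The last P-frame is better exploited, because B-frames of the next GOP use it for forward prediction (see Figure 2.12).
In an open GOP the first frame can also be an I-frame, in which case the last frame need not be a P-frame, whereas a closed GOP always ends with a P-frame (see Figure 2.13).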
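The summary above suggests a rough check on a frame pattern written in display order. The sketch below is a simplification under stated assumptions: it looks only at frame types at the GOP boundary (a leading or trailing B-frame must reference a neighbouring GOP), whereas a real stream would be classified from its actual reference structure. The function name `classify_gop` is hypothetical.

```python
def classify_gop(display_order):
    """Classify one GOP, given its frame types in display order, e.g. "IBBPBBP".

    Heuristic: a GOP that begins or ends with a B-frame needs a reference
    frame from a neighbouring GOP, so it is treated as open.
    """
    frames = display_order.upper()
    if not frames or set(frames) - set("IPB"):
        raise ValueError("expected a non-empty string of I, P and B")
    if "I" not in frames:
        raise ValueError("a GOP must contain an I-frame")
    leading_b = frames[0] == "B"
    trailing_b = frames[-1] == "B"
    return "open" if leading_b or trailing_b else "closed"


if __name__ == "__main__":
    # Patterns taken from the quoted text, without the neighbouring GOP's P-frame.
    for pattern in ("IBPBPBPBP", "BIBPBPBP", "IBBBPBBBP", "BBBIBBBP"):
        print(pattern, classify_gop(pattern))
```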
 
