FFmpeg Audio/Video File Format Probing

1. Background

Users reported that the MP3 file for one course plays on Android devices but not on Apple devices.

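2. Testing and analysis

With the link to this MP3 file in hand, testing gave the following results:

Cannot play: Safari, the WeChat in-app browser, iOS AVPlayer

Can play: Google Chrome, ffplay, VLC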

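Downloading the file and inspecting its format showed that it is actually an MP4 file, not an MP3 file. After changing the file extension to MP4, AVPlayer plays it normally.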

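This raises a question: when the file extension is wrong, why can some players still play the file correctly?

A first guess:

These players do not rely only on the file extension to decide the format. They probe the real format of the audio/video file and parse it with the correct demuxer.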
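To make the guess concrete, here is a minimal sketch of content-based detection. It is not FFmpeg code: `sniff_header` and `sniff_result` are hypothetical names, and the checks cover only the most common signatures (an `ftyp` box for MP4, an `ID3` tag or an MPEG audio frame sync for MP3).

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper, not an FFmpeg API: guess a container type from the
 * first bytes of a file instead of trusting its extension. */
typedef enum { SNIFF_UNKNOWN, SNIFF_MP3, SNIFF_MP4 } sniff_result;

static sniff_result sniff_header(const uint8_t *buf, size_t len)
{
    /* MP4/MOV: bytes 4..7 of the first box are usually "ftyp". */
    if (len >= 8 && memcmp(buf + 4, "ftyp", 4) == 0)
        return SNIFF_MP4;
    /* MP3 with an ID3v2 tag starts with "ID3". */
    if (len >= 3 && memcmp(buf, "ID3", 3) == 0)
        return SNIFF_MP3;
    /* Bare MPEG audio: 11-bit frame sync, all bits set. */
    if (len >= 2 && buf[0] == 0xFF && (buf[1] & 0xE0) == 0xE0)
        return SNIFF_MP3;
    return SNIFF_UNKNOWN;
}
```

A file named input.mp3 whose first box is `ftyp` would be reported as SNIFF_MP4 by this sketch, which matches the behaviour observed with ffplay and VLC.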

ffplay is a player built on FFmpeg, so reading the FFmpeg source code reveals how it probes audio/video formats.

3. The `avformat_open_input` function

avformat_open_input is the key function that opens a media file and initializes the AVFormatContext structure. Here, filename is the local path of the MP3 file.

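/*
ps: pointer to a pointer to the AVFormatContext structure
filename: file name, /var/containers/Bundle/Application/A5CD4766-08A9-4A3D-AC75-0FC4B3B7EF13/IJKMediaDemo.app/input.mp3
fmt: AVInputFormat; if the caller supplies one, the actual format is not probed. NULL here.
options: dictionary; can be ignored here.
*/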

int avformat_open_input(AVFormatContext **ps, const char *filename,
                        AVInputFormat *fmt, AVDictionary **options)

4. The `init_input` function

init_input is an internal function called by avformat_open_input, and it is the key function for probing the audio/video format.

/*
s: pointer to the AVFormatContext structure
filename: file name, /var/containers/Bundle/Application/A5CD4766-08A9-4A3D-AC75-0FC4B3B7EF13/IJKMediaDemo.app/input.mp3
options: dictionary; can be ignored here.
*/
static int init_input(AVFormatContext *s, const char *filename,
                      AVDictionary **options)

After this function returns, the iformat field (an AVInputFormat structure) of the AVFormatContext structure already holds the file's actual format:

[Image: 1588925437675.jpg]

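5. FFmpeg's audio/video probing logic

Inside init_input, the input file's AVProbeData structure is matched in turn against every AVInputFormat in FFmpeg's linked list of registered formats. Each candidate is scored along the three dimensions below, and the AVInputFormat with the highest score is returned. A best score of 25 (AVPROBE_SCORE_RETRY) or lower counts as a failed probe while more data can still be read.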

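Matching process

1. Extension match: no file read is needed. av_match_ext compares the extension of the input media with the extensions of the AVInputFormat; a match scores 50.

#define AVPROBE_SCORE_EXTENSION 50 ///< score for file extension

2. Media type match: no file read is needed. av_match_name compares the mime_type of the input media with the mime_type of the AVInputFormat; a match scores 75.

#define AVPROBE_SCORE_MIME 75 ///< score for file mime type

3. Format match on the file's header data: the read_probe function of each AVInputFormat parses the header data; a successful parse typically scores 100, the maximum.

#define AVPROBE_SCORE_MAX 100 ///< maximum score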

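For this MP3 file, the read_probe pass matches in the mov_probe function in mov.c, which returns 100. That AVInputFormat is then assigned to the iformat field of the AVFormatContext structure, and probing is complete.

The MP4 probing logic is shown below. It parses the box structure of the MP4 file.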
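The scoring described above can be sketched in plain C. This is a simplified model, not FFmpeg's implementation: `probe_data`, `input_format`, `score_format`, `probe_best` and the two toy probes are hypothetical names, each format has a single extension, and the sketch simply takes the best of the three kinds of evidence. FFmpeg's real code handles further details, such as ties between formats and ID3 tags, that are ignored here.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SCORE_EXTENSION 50   /* mirrors AVPROBE_SCORE_EXTENSION */
#define SCORE_MIME      75   /* mirrors AVPROBE_SCORE_MIME */
#define SCORE_MAX       100  /* mirrors AVPROBE_SCORE_MAX */

/* Hypothetical, simplified stand-ins for AVProbeData and AVInputFormat. */
typedef struct {
    const char    *filename;
    const char    *mime_type;              /* may be NULL */
    const uint8_t *buf;
    size_t         buf_size;
} probe_data;

typedef struct {
    const char *name;
    const char *extension;                 /* single extension, for brevity */
    const char *mime_type;                 /* may be NULL */
    int (*read_probe)(const probe_data *); /* may be NULL */
} input_format;

static int ext_matches(const char *filename, const char *ext)
{
    const char *dot = strrchr(filename, '.');
    return dot && ext && strcmp(dot + 1, ext) == 0;
}

/* Score one format: best of content, extension and MIME evidence. */
static int score_format(const input_format *fmt, const probe_data *pd)
{
    int score = fmt->read_probe ? fmt->read_probe(pd) : 0;
    if (ext_matches(pd->filename, fmt->extension) && score < SCORE_EXTENSION)
        score = SCORE_EXTENSION;
    if (pd->mime_type && fmt->mime_type &&
        strcmp(pd->mime_type, fmt->mime_type) == 0 && score < SCORE_MIME)
        score = SCORE_MIME;
    return score;
}

/* Return the highest-scoring format, or NULL if nothing scores above zero. */
static const input_format *probe_best(const input_format *fmts, size_t n,
                                      const probe_data *pd, int *best_score)
{
    const input_format *best = NULL;
    *best_score = 0;
    for (size_t i = 0; i < n; i++) {
        int score = score_format(&fmts[i], pd);
        if (score > *best_score) {
            *best_score = score;
            best = &fmts[i];
        }
    }
    return best;
}

/* Toy content probes: "ftyp" at offset 4 for MP4, "ID3" at offset 0 for MP3. */
static int toy_mov_probe(const probe_data *pd)
{
    return (pd->buf_size >= 8 && memcmp(pd->buf + 4, "ftyp", 4) == 0) ? SCORE_MAX : 0;
}

static int toy_mp3_probe(const probe_data *pd)
{
    return (pd->buf_size >= 3 && memcmp(pd->buf, "ID3", 3) == 0) ? SCORE_MAX : 0;
}
```

With a file named input.mp3 whose data starts with an `ftyp` box, the toy "mp3" format only earns the extension score in this model, while the toy "mov" format earns the maximum score from the content probe and wins.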

static int mov_probe(AVProbeData *p)
{
    int64_t offset;
    uint32_t tag;
    int score = 0;
    int moov_offset = -1;

    /* check file header */
    offset = 0;
    for (;;) {
        /* ignore invalid offset */
        if ((offset + 8) > (unsigned int)p->buf_size)
            break;
        tag = AV_RL32(p->buf + offset + 4);
        switch(tag) {
        /* check for obvious tags */
        case MKTAG('m','o','o','v'):
            moov_offset = offset + 4;
        case MKTAG('m','d','a','t'):
        case MKTAG('p','n','o','t'): /* detect movs with preview pics like ew.mov and april.mov */
        case MKTAG('u','d','t','a'): /* Packet Video PVAuthor adds this and a lot of more junk */
        case MKTAG('f','t','y','p'):
            if (AV_RB32(p->buf+offset) < 8 &&
                (AV_RB32(p->buf+offset) != 1 ||
                 offset + 12 > (unsigned int)p->buf_size ||
                 AV_RB64(p->buf+offset + 8) == 0)) {
                score = FFMAX(score, AVPROBE_SCORE_EXTENSION);
            } else if (tag == MKTAG('f','t','y','p') &&
                       (   AV_RL32(p->buf + offset + 8) == MKTAG('j','p','2',' ')
                        || AV_RL32(p->buf + offset + 8) == MKTAG('j','p','x',' ')
                    )) {
                score = FFMAX(score, 5);
            } else {
                score = AVPROBE_SCORE_MAX;
            }
            offset = FFMAX(4, AV_RB32(p->buf+offset)) + offset;
            break;
        /* those are more common words, so rate then a bit less */
        case MKTAG('e','d','i','w'): /* xdcam files have reverted first tags */
        case MKTAG('w','i','d','e'):
        case MKTAG('f','r','e','e'):
        case MKTAG('j','u','n','k'):
        case MKTAG('p','i','c','t'):
            score  = FFMAX(score, AVPROBE_SCORE_MAX - 5);
            offset = FFMAX(4, AV_RB32(p->buf+offset)) + offset;
            break;
        case MKTAG(0x82,0x82,0x7f,0x7d):
        case MKTAG('s','k','i','p'):
        case MKTAG('u','u','i','d'):
        case MKTAG('p','r','f','l'):
            /* if we only find those cause probedata is too small at least rate them */
            score  = FFMAX(score, AVPROBE_SCORE_EXTENSION);
            offset = FFMAX(4, AV_RB32(p->buf+offset)) + offset;
            break;
        default:
            offset = FFMAX(4, AV_RB32(p->buf+offset)) + offset;
        }
    }
    if(score > AVPROBE_SCORE_MAX - 50 && moov_offset != -1) {
        /* moov atom in the header - we should make sure that this is not a
         * MOV-packed MPEG-PS */
        offset = moov_offset;

        while(offset < (p->buf_size - 16)){ /* Sufficient space */
               /* We found an actual hdlr atom */
            if(AV_RL32(p->buf + offset     ) == MKTAG('h','d','l','r') &&
               AV_RL32(p->buf + offset +  8) == MKTAG('m','h','l','r') &&
               AV_RL32(p->buf + offset + 12) == MKTAG('M','P','E','G')){
                av_log(NULL, AV_LOG_WARNING, "Found media data tag MPEG indicating this is a MOV-packed MPEG-PS.\n");
                /* We found a media handler reference atom describing an
                 * MPEG-PS-in-MOV, return a
                 * low score to force expanding the probe window until
                 * mpegps_probe finds what it needs */
                return 5;
            }else
                /* Keep looking */
                offset+=2;
        }
    }

    return score;
}
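mov_probe steps from box to box by reading a 32-bit big-endian size followed by a four-character type. The sketch below isolates that walk so it can be tried on a small buffer. `read_be32` and `list_boxes` are hypothetical helpers, not FFmpeg functions, and the special size values 0 and 1 are not handled.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Read a 32-bit big-endian integer, which is how box sizes are stored. */
static uint32_t read_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Hypothetical helper: walk the top-level boxes in buf and copy up to
 * max_types four-character type codes into types[]. Returns the number of
 * type codes stored. The walk stops at the first box whose size is smaller
 * than its 8-byte header or larger than the remaining data. */
static size_t list_boxes(const uint8_t *buf, size_t len,
                         char types[][5], size_t max_types)
{
    size_t offset = 0, count = 0;
    while (len - offset >= 8 && count < max_types) {
        uint32_t size = read_be32(buf + offset);
        memcpy(types[count], buf + offset + 4, 4);
        types[count][4] = '\0';
        count++;
        if (size < 8 || size > len - offset)
            break;
        offset += size;
    }
    return count;
}
```

For a buffer that holds a 16-byte `ftyp` box followed by an 8-byte `moov` box, the helper stores "ftyp" and "moov" and returns 2. Those are two of the tags that mov_probe rates with AVPROBE_SCORE_MAX.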

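6. Thoughts

In general, the backend transcodes audio and video of different formats into a fixed format such as mp4 or aac. In that case it is more efficient for the client-side player to specify the AVInputFormat directly, because probing is then skipped: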

 is->iformat =  av_find_input_format("mp4");
 err = avformat_open_input(&ic, is->filename, is->iformat, &ffp->format_opts);

Internally, av_find_input_format matches against fmt->name:

AVInputFormat *av_find_input_format(const char *short_name)
{
    AVInputFormat *fmt = NULL;
    while ((fmt = av_iformat_next(fmt))) // walk the linked list of input formats
        if (av_match_name(short_name, fmt->name))
            return fmt;
    return NULL;
}
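The lookup for "mp4" succeeds because a demuxer's name can be a comma-separated list; the MOV/MP4 demuxer is registered as "mov,mp4,m4a,3gp,3g2,mj2". Below is a minimal sketch of that kind of matching. `match_name` and `equal_nocase` are hypothetical re-implementations for illustration, not the actual body of av_match_name.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Case-insensitive comparison of the first n characters. */
static int equal_nocase(const char *a, const char *b, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (tolower((unsigned char)a[i]) != tolower((unsigned char)b[i]))
            return 0;
    return 1;
}

/* Hypothetical sketch of the idea behind av_match_name: return 1 if name
 * equals one entry of the comma-separated list names, else 0. */
static int match_name(const char *name, const char *names)
{
    if (!name || !names)
        return 0;
    size_t name_len = strlen(name);
    while (*names) {
        const char *comma = strchr(names, ',');
        size_t entry_len = comma ? (size_t)(comma - names) : strlen(names);
        if (entry_len == name_len && equal_nocase(name, names, name_len))
            return 1;
        if (!comma)
            break;
        names = comma + 1;
    }
    return 0;
}
```

Under this sketch, "mp4" and "mov" both match the list above, while "mp3" and the prefix "mp" do not.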

Recommended reading:
雷霄骅 (Lei Xiaohua): A Brief Analysis of the FFmpeg Source Code: avformat_open_input()