FFmpeg Beginner's Study Notes (2): The Video Stream Decoding Process

Video Decoding Process

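Before decoding the video stream in a media file, let's first look at how streaming media data is played back. This flow also lays out the steps of the video decoding process.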

[Figure: video decoding process]

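Audio/video playback consists of these stages: protocol parsing -> demuxing -> decoding -> audio/video synchronization -> playback. A local file does not need the protocol parsing step.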

The corresponding data transformation is: media file -> streams -> packets -> frames.

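In the figure, blue blocks represent concrete data.

Purple blocks represent data formats.

Orange blocks represent the protocol that the data corresponds to.

White blocks represent the operations performed.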

In FFmpeg, the video stream data is obtained from a media file as shown in the figure below:

[Figure: obtaining the video stream]

Decoding in Detail

Next, we follow the flow above and use FFmpeg to decode the video stream. The code is:

extern "C" {
#include "libavcodec/avcodec.h"
#include "libavformat/avformat.h"
#include "libswscale/swscale.h"
#include "libavutil/imgutils.h"
}
#include <cstdio>
using namespace std;

int main() {
    int ret = 0;
    // Path of the input file
    const char* filePath = "target.mp4";

    // Declare the variables we need
    AVFormatContext* fmtCtx = NULL;
    AVCodecContext* codecCtx = NULL;
    AVCodecParameters* avCodecPara = NULL;
    const AVCodec* codec = NULL;    // FFmpeg 5.0+ returns const AVCodec* from avcodec_find_decoder

    // Packet
    AVPacket* pkt = NULL;
    // Frame
    AVFrame* frame = NULL;

    do {
        //----------------- Create the AVFormatContext -------------------
        // It describes the composition and basic information of the media file or stream
        fmtCtx = avformat_alloc_context();
        //----------------- Open the local file -------------------
        ret = avformat_open_input(&fmtCtx, filePath, NULL, NULL);
        if (ret < 0) {
            printf("cannot open file\n");
            break;
        }
        //----------------- Read the stream information -------------------
        ret = avformat_find_stream_info(fmtCtx, NULL);
        if (ret < 0) {
            printf("Cannot find stream information\n");
            break;
        }

        // Loop over the streams in the file until a video stream is found, and remember its index
        int videoIndex = -1;
        for (unsigned int i = 0; i < fmtCtx->nb_streams; i++) {
            if (fmtCtx->streams[i]->codecpar->codec_type == AVMEDIA_TYPE_VIDEO) {
                videoIndex = (int)i;
                break;
            }
        }

        // videoIndex still being -1 means there is no video stream
        if (videoIndex == -1) {
            printf("cannot find video stream\n");
            break;
        }

        // Print the stream information
        av_dump_format(fmtCtx, 0, filePath, 0);
        //----------------- Find the decoder -------------------
        avCodecPara = fmtCtx->streams[videoIndex]->codecpar;
        codec = avcodec_find_decoder(avCodecPara->codec_id);   // assign the outer variable instead of shadowing it
        if (codec == NULL) {
            printf("cannot find decoder\n");
            break;
        }
        // Create the decoder context and fill it from the stream's codec parameters
        codecCtx = avcodec_alloc_context3(codec);
        if (codecCtx == NULL) {
            printf("cannot allocate codec context\n");
            break;
        }
        ret = avcodec_parameters_to_context(codecCtx, avCodecPara);
        if (ret < 0) {
            printf("parameters to context fail\n");
            break;
        }

        //----------------- Open the decoder -------------------
        ret = avcodec_open2(codecCtx, codec, NULL);
        if (ret < 0) {
            printf("cannot open decoder\n");
            break;
        }

        //----------------- Allocate the AVPacket and AVFrame -------------------
        pkt = av_packet_alloc();
        frame = av_frame_alloc();
        if (pkt == NULL || frame == NULL) {
            printf("cannot allocate packet or frame\n");
            break;
        }

        //----------------- Read packets -------------------
        int i = 0;      // number of decoded video frames
        while (av_read_frame(fmtCtx, pkt) >= 0) {   // reads one packet (for video, usually one compressed frame) into pkt
            // Only handle packets that belong to the video stream
            if (pkt->stream_index == videoIndex) {
                // Send the packet to the decoder
                ret = avcodec_send_packet(codecCtx, pkt);
                if (ret == 0) {
                    // One packet can yield zero, one or several frames.
                    // E.g. with H.264 B-frames the decoder reorders frames into display order,
                    // so it may return nothing for one packet and two frames for a later one.
                    while (avcodec_receive_frame(codecCtx, frame) == 0) {
                        // The decoded image is available here in frame->data
                        // It can be displayed with OpenCV, OpenGL or SDL,
                        // or saved to a file (a file header must be added)
                        i++;
                    }
                }
            }
            av_packet_unref(pkt);   // reset pkt for the next read
        }

        // The decoder still buffers frames: send a NULL packet to flush them out
        ret = avcodec_send_packet(codecCtx, NULL);
        if (ret == 0) {
            while (avcodec_receive_frame(codecCtx, frame) == 0) {
                i++;
            }
        }

        printf("There are %d frames int total.\n", i);
    } while (0);

    //----------------- Release resources -------------------
    avcodec_free_context(&codecCtx);   // closes the decoder and frees the context (avcodec_close is deprecated)
    avformat_close_input(&fmtCtx);
    av_packet_free(&pkt);
    av_frame_free(&frame);

    return 0;
}

Output:

Input #0, mov,mp4,m4a,3gp,3g2,mj2, from 'target.mp4':
  Metadata:
    major_brand     : isom
    minor_version   : 512
    compatible_brands: isomiso2avc1mp41
    encoder         : Lavf58.48.100
  Duration: 00:03:10.36, start: 0.000000, bitrate: 773 kb/s
    Stream #0:0(und): Video: h264 (High) (avc1 / 0x31637661), yuv420p(tv, bt709), 1280x720, 442 kb/s, 25 fps, 25 tbr, 90k tbn, 50 tbc (default)
    Metadata:
      handler_name    : VideoHandler
    Stream #0:1(und): Audio: aac (LC) (mp4a / 0x6134706D), 48000 Hz, stereo, fltp, 325 kb/s (default)
    Metadata:
      handler_name    : SoundHandler
There are 4759 frames int total.

4759 frames ÷ 25 fps + 0.0 s (start time) = 190.36 s ⇒ 00:03:10.36, which matches the Duration printed by av_dump_format.
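The arithmetic above can be reproduced with a small helper. This is a sketch that assumes a constant frame rate and a start time of 0, as in this file. It works in whole hundredths of a second to avoid floating-point rounding. `formatDuration` is a hypothetical name, not an FFmpeg function. The Duration that av_dump_format prints actually comes from the container, so this comparison is only a consistency check.

```cpp
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical helper: turn a frame count at a constant integer frame rate
// into the HH:MM:SS.cc form used by av_dump_format for "Duration".
// Assumes the stream starts at 0 and every frame lasts exactly 1/fps seconds.
std::string formatDuration(int64_t frames, int64_t fps) {
    // Total time in hundredths of a second, rounded to the nearest one.
    int64_t centis = (frames * 100 + fps / 2) / fps;
    int64_t hours = centis / 360000;
    int64_t minutes = (centis / 6000) % 60;
    int64_t seconds = (centis / 100) % 60;
    int64_t hundredths = centis % 100;
    char buf[32];
    std::snprintf(buf, sizeof(buf), "%02lld:%02lld:%02lld.%02lld",
                  (long long)hours, (long long)minutes,
                  (long long)seconds, (long long)hundredths);
    return std::string(buf);
}
```

For example, `formatDuration(4759, 25)` yields the same `00:03:10.36` as the Duration line in the output above.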

FFmpeg treats JPG and PNG image files as video streams with a single frame, so the same decoding process can also be used to read image data.

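Code Walkthrough

Structures

The structures used in the code above are AVFormatContext, AVCodecParameters, AVCodecContext, AVCodec, AVPacket and AVFrame.

AVFormatContext has already been covered, so it is not described again here.

AVCodecParameters and AVCodecContext

In newer FFmpeg versions, AVStream.codecpar (struct AVCodecParameters) replaces AVStream.codec (struct AVCodecContext). The old field was deprecated in FFmpeg 3.1 and removed in 5.0. AVCodecParameters was split out of AVCodecContext. It contains no functions, only the parameters that describe the stream and that the decoder needs.

AVCodecContext is still an indispensable structure for encoding and decoding.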

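// An excerpt of the more important fields
typedef struct AVCodecParameters {
    enum AVMediaType codec_type;    // media type (video, audio, ...)
    enum AVCodecID codec_id;        // identifies the specific codec
    int64_t bit_rate;               // average bit rate
    
    int sample_rate;                // sample rate (audio)
    int channels;                   // number of channels (audio; superseded by ch_layout in newer versions)
    uint64_t channel_layout;        // channel layout (audio; superseded by ch_layout in newer versions)
    
    int width, height;              // width and height (video)
    int format;                     // pixel format (video) / sample format (audio)
    ...
} AVCodecParameters;

typedef struct AVCodecContext {
    // AVCodecContext also has counterparts of the fields in AVCodecParameters
    const struct AVCodec *codec;    // the codec in use (H.264, MPEG-2, ...)
    
    enum AVSampleFormat sample_fmt; // sample format (audio)
    enum AVPixelFormat pix_fmt;     // pixel format (video)
    ...
} AVCodecContext;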

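avcodec_parameters_to_context copies the parameters from AVCodecParameters into AVCodecContext.

AVCodec

AVCodec is the codec structure. Each instance describes one specific codec (a decoder or an encoder).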

// An excerpt of the more important fields
typedef struct AVCodec {
    const char *name;       // short codec name (e.g. "h264")
    const char *long_name;  // full codec name (e.g. "H.264 / AVC / MPEG-4 AVC / MPEG-4 part 10")
    enum AVMediaType type;  // media type: video, audio or subtitle
    enum AVCodecID id;      // identifies the specific codec
    
    const AVRational *supported_framerates; // supported frame rates (video only)
    const enum AVPixelFormat *pix_fmts;     // supported pixel formats (video only)
    
    const int *supported_samplerates;       // supported sample rates (audio only)
    const enum AVSampleFormat *sample_fmts; // supported sample formats (audio only)
    const uint64_t *channel_layouts;        // supported channel layouts (audio only)
    ...
} AVCodec;

AVPacket

AVPacket is the structure that holds data before decoding (compressed data), i.e. a packet.

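// An excerpt of the more important fields
typedef struct AVPacket {
    AVBufferRef *buf;   // reference to the buffer that holds the data pointed to by data
    uint8_t *data;      // compressed (encoded) data
    int size;           // size of data in bytes
    int64_t pts;        // presentation timestamp
    int64_t dts;        // decoding timestamp
    int stream_index;   // index of the audio/video stream this packet belongs to
    ...
} AVPacket;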

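AVPacket Memory Management

An AVPacket does not contain the compressed data itself. It references a data buffer through its data pointer.

Several AVPackets can share the same data buffer (AVBufferRef, AVBuffer):
av_read_frame(pFormatCtx, packet);  // read a packet
av_packet_ref(dst_pkt, packet);     // dst_pkt and packet now share the same buffer; refcount + 1
av_packet_unref(dst_pkt);           // drop dst_pkt's reference to the buffer; refcount - 1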
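To make the sharing concrete, here is a toy model, not FFmpeg code. `std::shared_ptr` plays the role of `AVBufferRef`. The hypothetical `ToyPacket`, `toyPacketRef`, `toyPacketUnref` and `toyReadPacket` mirror what `av_read_frame()`, `av_packet_ref()` and `av_packet_unref()` do to the reference count. FFmpeg itself keeps its own atomic counter inside `AVBuffer`.

```cpp
#include <cstdint>
#include <memory>
#include <vector>

// Hypothetical stand-in for AVPacket: the payload lives in a shared buffer,
// and the packet only holds a reference (buf) plus a raw pointer into it (data).
struct ToyPacket {
    std::shared_ptr<std::vector<uint8_t>> buf;  // plays the role of AVPacket::buf
    uint8_t* data = nullptr;                    // plays the role of AVPacket::data
    int size = 0;                               // plays the role of AVPacket::size
};

// Like av_packet_ref(dst, src): dst shares src's buffer, refcount + 1.
void toyPacketRef(ToyPacket& dst, const ToyPacket& src) {
    dst.buf = src.buf;
    dst.data = src.data;
    dst.size = src.size;
}

// Like av_packet_unref(pkt): drop this packet's reference, refcount - 1.
// The buffer itself is freed only when the last reference is dropped.
void toyPacketUnref(ToyPacket& pkt) {
    pkt.buf.reset();
    pkt.data = nullptr;
    pkt.size = 0;
}

// Like av_read_frame filling a packet with a freshly allocated buffer.
ToyPacket toyReadPacket(int size) {
    ToyPacket pkt;
    pkt.buf = std::make_shared<std::vector<uint8_t>>(size, 0);
    pkt.data = pkt.buf->data();
    pkt.size = size;
    return pkt;
}
```

After `toyPacketRef`, both packets point at the same bytes and the count is 2. After `toyPacketUnref(dst)`, it drops back to 1 and the data survives for the original packet.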

AVFrame

AVFrame is the structure that holds data after decoding (raw data), i.e. a frame.

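// An excerpt of the more important fields
typedef struct AVFrame {
    uint8_t *data[AV_NUM_DATA_POINTERS];    // decoded raw data (YUV or RGB for video, PCM for audio)
    int linesize[AV_NUM_DATA_POINTERS];     // size in bytes of one "row" of each plane in data. Note: it may be larger than width x bytes per pixel because of alignment padding.
    int width, height;  // frame width and height (1920x1080, 1280x720, ...)
    int format;         // format of the decoded data (YUV420P, YUV422P, RGB24, ...)
    int key_frame;      // whether this is a key frame
    enum AVPictureType pict_type;   // picture type (I, B, P, ...)
    AVRational sample_aspect_ratio; // sample (pixel) aspect ratio, e.g. 1:1; not the display aspect ratio such as 16:9
    int64_t pts;        // presentation timestamp
    int coded_picture_number;       // picture number in coding order
    int display_picture_number;     // picture number in display order
    
    int nb_samples;     // number of audio samples (per channel)
    ...
} AVFrame;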
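Because `linesize` can be larger than the visible width, pixel (x, y) of a plane must be addressed with `linesize`, not `width`. The sketch below assumes an 8-bit plane such as the Y plane of a yuv420p frame. `Plane`, `pixelAt` and `packPlane` are hypothetical names that stand in for `frame->data[0]` and `frame->linesize[0]`.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical view of one 8-bit image plane, e.g. frame->data[0] with frame->linesize[0].
struct Plane {
    const uint8_t* data;  // start of the plane
    int linesize;         // bytes per row, including padding at the end of each row
    int width;            // visible samples per row (width <= linesize for 8-bit planes)
    int height;           // number of rows
};

// Read the sample at column x, row y. Using width instead of linesize here
// would land (linesize - width) bytes further off on every successive row.
uint8_t pixelAt(const Plane& p, int x, int y) {
    return p.data[y * p.linesize + x];
}

// Copy only the visible samples into a tightly packed buffer (rows of exactly
// width bytes), dropping the padding bytes at the end of each row.
std::vector<uint8_t> packPlane(const Plane& p) {
    std::vector<uint8_t> out(static_cast<std::size_t>(p.width) * p.height);
    for (int y = 0; y < p.height; ++y)
        for (int x = 0; x < p.width; ++x)
            out[static_cast<std::size_t>(y) * p.width + x] = pixelAt(p, x, y);
    return out;
}
```

A helper like `packPlane` is what you need before writing a plane to a raw file, because the padding bytes are not part of the image.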

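Functions

avcodec_find_decoder

avcodec_find_decoder finds the decoder that matches a codec ID.

AVCodec *avcodec_find_decoder(enum AVCodecID id);           // find a decoder by ID
AVCodec *avcodec_find_decoder_by_name(const char *name);    // find a decoder by name
/* Decoders have encoder counterparts, with matching lookup functions */
AVCodec *avcodec_find_encoder(enum AVCodecID id);           // find an encoder by ID
AVCodec *avcodec_find_encoder_by_name(const char *name);    // find an encoder by name
/* In FFmpeg 5.0 and later, all four functions return const AVCodec * */

Parameters:

enum AVCodecID id: the decoder ID, which can be read from AVCodecParameters (codec_id)

return:

Returns an AVCodec pointer, or NULL if no matching decoder is found.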

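avcodec_alloc_context3

avcodec_alloc_context3 allocates an AVCodecContext and sets its fields to default values suited to the given codec.

AVCodecContext *avcodec_alloc_context3(const AVCodec *codec);

Parameters:

const AVCodec *codec: the codec; the codec's private data is allocated and initialized with its defaults

return:

Returns an AVCodecContext pointer, or NULL if allocation fails. Free the context with avcodec_free_context.

What does the 3 in avcodec_alloc_context3 mean? It is a version suffix. This is the third variant of the function, after the since-removed avcodec_alloc_context() and avcodec_alloc_context2(). avcodec_open2 is named the same way.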

avcodec_parameters_to_context

avcodec_parameters_to_context copies the fields of AVCodecParameters into AVCodecContext.

int avcodec_parameters_to_context(AVCodecContext *codec,
                                  const AVCodecParameters *par){
    // copy the fields of par into codec
    codec->codec_type = par->codec_type;
    codec->codec_id   = par->codec_id;
    codec->codec_tag  = par->codec_tag;
    ...
}

Parameters:

AVCodecContext *codec: the AVCodecContext to fill in
const AVCodecParameters *par: the AVCodecParameters that provides the values

return:

Returns a value ≥ 0 on success and a negative value on failure.

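avcodec_open2

avcodec_open2 initializes the AVCodecContext to use the given audio or video codec (here, a decoder).

int avcodec_open2(AVCodecContext *avctx, const AVCodec *codec, AVDictionary **options){
    ...
    avctx->codec = codec;
    ...
}

Parameters:

AVCodecContext *avctx: the AVCodecContext, already set up
const AVCodec *codec: the codec to open in the AVCodecContext; the context then uses it for decoding
AVDictionary **options: codec options; NULL is fine in most cases

return:

Returns 0 on success and a negative value on failure.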

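av_read_frame

av_read_frame reads the (encoded) audio/video data, i.e. one AVPacket, from the streams. It splits the contents of the file into packets and returns one packet per call.

int av_read_frame(AVFormatContext *s, AVPacket *pkt);

Parameters:

AVFormatContext *s: the AVFormatContext
AVPacket *pkt: receives the packet; it references the data buffer through its data pointer and does not store the data itself. Release it with av_packet_unref once it is no longer needed.

return:

Returns 0 on success, or a negative value on error or at end of file.

Why is the function called av_read_frame rather than av_read_packet? When FFmpeg was first designed, it had no concept of a packet, only of frames before encoding and frames after encoding, which were hard to tell apart. The packet concept came later. Out of habit, or for backward compatibility, the old name was kept.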

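avcodec_send_packet

avcodec_send_packet sends one packet to the decoder as input.

int avcodec_send_packet(AVCodecContext *avctx, const AVPacket *avpkt);

Parameters:

AVCodecContext *avctx: the AVCodecContext; the decoder must have been opened with avcodec_open2
const AVPacket *avpkt: the packet to decode; NULL (or a packet with data NULL and size 0) signals the end of the stream and starts flushing

return:

Returns 0 on success, or a negative error code on failure:

AVERROR(EAGAIN): input is not accepted in the current state; read output with avcodec_receive_frame first, then resend the packet
AVERROR_EOF: the decoder has been flushed, and no new packets can be sent to it
AVERROR(EINVAL): the codec is not opened, or it is an encoder
AVERROR(ENOMEM): failed to add the packet to the internal queue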

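avcodec_receive_frame

avcodec_receive_frame returns decoded audio/video data (raw data such as YUV or PCM) from the decoder.

int avcodec_receive_frame(AVCodecContext *avctx, AVFrame *frame);

Parameters:

AVCodecContext *avctx: the AVCodecContext
AVFrame *frame: the frame that receives the decoded audio/video data

return:

Returns 0 on success. Any other value means no frame was returned:

AVERROR(EAGAIN): no output is available in the current state; new input must be sent before more frames can be returned
AVERROR_EOF: the decoder has been fully flushed, and there will be no more output frames
AVERROR(EINVAL): the codec is not opened, or it is an encoder

You do not need to call av_frame_unref on the frame before each avcodec_receive_frame call. The function always unreferences the frame internally before writing new data into it.

Strictly speaking, any error other than AVERROR(EAGAIN) and AVERROR_EOF should be treated as fatal, and the program should exit.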
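The send/receive contract, including the EAGAIN and EOF cases and the final flush, can be simulated without FFmpeg. `ToyDecoder` below is a hypothetical stand-in, not libavcodec. It holds back one packet before producing output, loosely like a decoder that reorders B-frames, and it uses its own `Status` enum instead of `AVERROR` codes. The driver loop has the same shape as the decoding loop in this article.

```cpp
#include <cstddef>
#include <deque>

// Stand-ins for the return codes 0, AVERROR(EAGAIN) and AVERROR_EOF.
enum class Status { Ok, Again, Eof };

// Hypothetical toy decoder (not libavcodec). It holds back delay_ packets
// before producing output. A "packet" is an int, and its "frame" is the same int.
class ToyDecoder {
public:
    // Like avcodec_send_packet(); pkt == nullptr starts flushing.
    Status send(const int* pkt) {
        if (draining_) return Status::Eof;                  // already flushed
        if (pkt == nullptr) { draining_ = true; return Status::Ok; }
        if (queue_.size() > delay_) return Status::Again;   // output must be read first
        queue_.push_back(*pkt);
        return Status::Ok;
    }

    // Like avcodec_receive_frame().
    Status receive(int& frame) {
        bool ready = queue_.size() > delay_ || (draining_ && !queue_.empty());
        if (ready) {
            frame = queue_.front();
            queue_.pop_front();
            return Status::Ok;
        }
        return draining_ ? Status::Eof : Status::Again;
    }

private:
    std::deque<int> queue_;
    std::size_t delay_ = 1;
    bool draining_ = false;
};

// Same shape as the article's loop: send each packet, receive until Again,
// then flush with a null packet and receive until Eof.
int decodeAll(const int* packets, int count) {
    ToyDecoder dec;
    int frames = 0, frame = 0;
    for (int i = 0; i < count; ++i) {
        if (dec.send(&packets[i]) != Status::Ok) return -1;  // cannot happen with this loop
        while (dec.receive(frame) == Status::Ok) ++frames;
    }
    dec.send(nullptr);                                   // flush
    while (dec.receive(frame) == Status::Ok) ++frames;   // drain the buffered frames
    return frames;
}

// The same loop without the flush: frames still inside the decoder are lost.
int decodeWithoutFlush(const int* packets, int count) {
    ToyDecoder dec;
    int frames = 0, frame = 0;
    for (int i = 0; i < count; ++i) {
        if (dec.send(&packets[i]) != Status::Ok) return -1;
        while (dec.receive(frame) == Status::Ok) ++frames;
    }
    return frames;
}
```

With five packets, `decodeAll` counts five frames but `decodeWithoutFlush` counts only four. This is the same reason the article sends a NULL packet after the read loop.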

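Functions for Releasing Resources

avcodec_free_context(&codecCtx);   // closes the decoder and frees the context; avcodec_close alone would leak the context
avformat_close_input(&fmtCtx);
av_packet_free(&pkt);
av_frame_free(&frame);

Only the resources we allocate ourselves need to be freed manually. FFmpeg releases everything else, and freeing it a second time causes errors (a double free).

For example, AVCodecParameters belongs to AVFormatContext, so we do not free it ourselves. avformat_close_input releases it, and calling avcodec_parameters_free on it would free it twice.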

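Displaying Video Frames with OpenCV

I am also learning image processing with OpenCV at the moment. OpenCV is also simpler to set up than SDL, OpenGL or ANativeWindow, so this example uses it to display the video frames.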

extern "C" {
#include "libavcodec/avcodec.h"
#include "libavformat/avformat.h"
#include "libswscale/swscale.h"
#include "libavutil/imgutils.h"
}
#include <cstdio>
using namespace std;

#include <opencv2/opencv.hpp>
using namespace cv;

int main() {
    int ret = 0;
    // Path of the input file
    const char* filePath = "target.mp4";

    // Declare the variables we need
    AVFormatContext* fmtCtx = NULL;
    AVPacket* pkt = NULL;
    AVCodecContext* codecCtx = NULL;
    AVCodecParameters* avCodecPara = NULL;
    const AVCodec* codec = NULL;

    // Frames, allocated up front
    AVFrame* rgbFrame = av_frame_alloc();
    AVFrame* yuvFrame = av_frame_alloc();

    do {
        ...
        //----------------- Set up the pixel format conversion -------------------
        struct SwsContext* img_ctx = sws_getContext(
            codecCtx->width, codecCtx->height, codecCtx->pix_fmt, // source width, height and pixel format
            codecCtx->width, codecCtx->height, AV_PIX_FMT_BGR24,  // destination width, height and pixel format
            SWS_BICUBIC, NULL, NULL, NULL);
        if (img_ctx == NULL) {
            printf("cannot create SwsContext\n");
            break;
        }

        // Size in bytes of one image, computed from the pixel format, width and height
        int numBytes = av_image_get_buffer_size(AV_PIX_FMT_BGR24, codecCtx->width, codecCtx->height, 1);
        unsigned char* out_buffer = (unsigned char*)av_malloc(numBytes * sizeof(unsigned char));

        // Point rgbFrame->data/linesize into out_buffer, laid out as BGR24
        av_image_fill_arrays(rgbFrame->data, rgbFrame->linesize, out_buffer, AV_PIX_FMT_BGR24, codecCtx->width, codecCtx->height, 1);

        // Create the OpenCV Mat (3 channels, 8 bits each)
        Mat img = Mat(Size(codecCtx->width, codecCtx->height), CV_8UC3);

        while (av_read_frame(fmtCtx, pkt) >= 0) {
            if (pkt->stream_index == videoIndex) {
                ret = avcodec_send_packet(codecCtx, pkt);
                if (ret == 0) {
                    while (avcodec_receive_frame(codecCtx, yuvFrame) == 0) {
                        // The video stream in this mp4 file is H.264 with yuv420p frames;
                        // sws_scale converts each frame to BGR24, OpenCV's channel order
                        sws_scale(img_ctx,
                            (const uint8_t* const*)yuvFrame->data,
                            yuvFrame->linesize,
                            0,
                            codecCtx->height,
                            rgbFrame->data,
                            rgbFrame->linesize);

                        img.data = rgbFrame->data[0];

                        imshow("img", img);
                        waitKey(40);    // 40 ms per frame matches the stream's 25 fps
                    }
                }
            }
            av_packet_unref(pkt);   // reset pkt's contents
        }
        sws_freeContext(img_ctx);   // free the SwsContext only after all frames are converted
        av_free(out_buffer);        // free the BGR buffer allocated with av_malloc
        ...
    } while (0);

    //----------------- Release resources -------------------
    av_packet_free(&pkt);
    avcodec_free_context(&codecCtx);
    avformat_close_input(&fmtCtx);
    // codec points to a static decoder description owned by FFmpeg: never free it

    av_frame_free(&rgbFrame);
    av_frame_free(&yuvFrame);

    return 0;
}

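Code Walkthrough

Structures

The libswscale library handles video scaling and color conversion, i.e. conversion between image color spaces and pixel formats. The SwsContext structure is used throughout the conversion and holds the parameters it needs.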

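// An excerpt of the more important fields (from the library's internal definition; applications normally use SwsContext only through pointers)
typedef struct SwsContext {
    int srcW;                   // width of the luma/alpha planes of the source image
    int srcH;                   // height of the luma/alpha planes of the source image
    int dstH;                   // height of the luma/alpha planes of the destination image
    int dstW;                   // width of the luma/alpha planes of the destination image
    int chrSrcW;                // width of the chroma planes of the source image
    int chrSrcH;                // height of the chroma planes of the source image
    int chrDstW;                // width of the chroma planes of the destination image
    int chrDstH;                // height of the chroma planes of the destination image
    enum AVPixelFormat dstFormat;   // destination pixel format, e.g. YUV420P, YUV444P, RGB, RGBA, GRAY
    enum AVPixelFormat srcFormat;   // source pixel format
    int needAlpha;              // whether an alpha (transparency) channel is needed
    int flags;                  // flags selecting the scaling algorithm, optimizations and subsampling
    ...
} SwsContext;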

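Functions

sws_getContext

sws_getContext creates and returns an SwsContext.

struct SwsContext *sws_getContext(int srcW, int srcH, enum AVPixelFormat srcFormat,
                                  int dstW, int dstH, enum AVPixelFormat dstFormat,
                                  int flags, SwsFilter *srcFilter,
                                  SwsFilter *dstFilter, const double *param);

Parameters:

int srcW: width of the source image
int srcH: height of the source image
enum AVPixelFormat srcFormat: pixel format of the source image, e.g. YUV420P, YUV444P, RGB, RGBA, GRAY
int dstW: width of the destination image
int dstH: height of the destination image
enum AVPixelFormat dstFormat: pixel format of the destination image
int flags: the algorithm used for scaling/interpolation
SwsFilter *srcFilter, SwsFilter *dstFilter: source and destination filters for chroma/luminance filtering; usually NULL
const double *param: extra parameters for the scaler (used by some algorithms); usually NULL

return:

Returns a pointer to the SwsContext, or NULL on error. Free it with sws_freeContext when done.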

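int flags: the scaling algorithm

#define SWS_FAST_BILINEAR     1        // fast bilinear scaling
#define SWS_BILINEAR          2        // bilinear scaling
#define SWS_BICUBIC           4        // bicubic scaling
#define SWS_X                 8        // experimental
#define SWS_POINT          0x10        // nearest neighbor
#define SWS_AREA           0x20        // area averaging
#define SWS_BICUBLIN       0x40        // bicubic for luma, bilinear for chroma
#define SWS_GAUSS          0x80        // Gaussian
#define SWS_SINC          0x100        // sinc
#define SWS_LANCZOS       0x200        // Lanczos
#define SWS_SPLINE        0x400        // natural bicubic spline

Choose the algorithm that fits your needs. The article "How to choose a scaling algorithm in swscale" compares them.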
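To see what the flag changes, here is a toy 1-D upscaler. It compares nearest-neighbour sampling (the idea behind `SWS_POINT`) with linear interpolation (the idea behind `SWS_BILINEAR`, applied in one direction only). It is an illustration under simplifying assumptions, not swscale's implementation, which uses its own filter coefficients and fixed-point arithmetic. The function names are hypothetical.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <vector>

// Source position of output sample j when resizing n samples to m samples,
// with pixel centers aligned (a common convention; swscale's exact mapping may differ).
static double srcPos(int j, int n, int m) {
    return (j + 0.5) * n / m - 0.5;
}

// Nearest neighbour (the idea behind SWS_POINT): copy the closest source sample.
std::vector<uint8_t> scalePoint(const std::vector<uint8_t>& in, int m) {
    int n = static_cast<int>(in.size());
    std::vector<uint8_t> out(m);
    for (int j = 0; j < m; ++j) {
        int i = static_cast<int>(std::lround(srcPos(j, n, m)));
        out[j] = in[std::max(0, std::min(i, n - 1))];
    }
    return out;
}

// Linear interpolation (the idea behind SWS_BILINEAR, in one dimension):
// blend the two nearest source samples according to their distance.
std::vector<uint8_t> scaleLinear(const std::vector<uint8_t>& in, int m) {
    int n = static_cast<int>(in.size());
    std::vector<uint8_t> out(m);
    for (int j = 0; j < m; ++j) {
        double x = std::max(0.0, std::min(srcPos(j, n, m), double(n - 1)));
        int i0 = static_cast<int>(std::floor(x));
        int i1 = std::min(i0 + 1, n - 1);
        double t = x - i0;
        out[j] = static_cast<uint8_t>(std::lround(in[i0] * (1.0 - t) + in[i1] * t));
    }
    return out;
}
```

Upscaling the row `{0, 100}` to 4 samples gives `{0, 0, 100, 100}` with point sampling (hard, blocky edge) and `{0, 25, 75, 100}` with linear interpolation (smooth ramp). The smoother algorithms cost more computation per output pixel.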

There is another function that returns an SwsContext: sws_getCachedContext. It checks whether the SwsContext passed in already matches the given parameters. If it does, the function returns that same pointer for reuse. Otherwise, it frees the old context and creates and returns a new SwsContext for the given parameters.
struct SwsContext *sws_getCachedContext(struct SwsContext *context,
                                        int srcW, int srcH, enum AVPixelFormat srcFormat,
                                        int dstW, int dstH, enum AVPixelFormat dstFormat,
                                        int flags, SwsFilter *srcFilter,
                                        SwsFilter *dstFilter, const double *param);
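The reuse rule described above can be modelled as follows. This is a hypothetical sketch: `ScaleParams`, `ToyScaler` and `getCachedScaler` are invented names. It compares only the size, format and flags parameters, while the real function also compares the filters and `param`. A common pattern is to call `sws_getCachedContext` once per frame so that a resolution change in the middle of a stream is handled automatically.

```cpp
// Hypothetical parameter set, standing in for the size/format/flags arguments.
struct ScaleParams {
    int srcW, srcH, srcFormat;
    int dstW, dstH, dstFormat;
    int flags;
    bool operator==(const ScaleParams& o) const {
        return srcW == o.srcW && srcH == o.srcH && srcFormat == o.srcFormat &&
               dstW == o.dstW && dstH == o.dstH && dstFormat == o.dstFormat &&
               flags == o.flags;
    }
};

// Hypothetical stand-in for SwsContext, assumed to be expensive to create.
struct ToyScaler {
    ScaleParams params;
};

// Same reuse rule as described for sws_getCachedContext: return the existing
// context when the parameters match, otherwise free it and create a new one.
ToyScaler* getCachedScaler(ToyScaler* ctx, const ScaleParams& p) {
    if (ctx != nullptr && ctx->params == p) return ctx;  // reuse
    delete ctx;                                           // like sws_freeContext
    return new ToyScaler{p};                              // like sws_getContext
}
```

Passing the returned pointer back in on the next call is what makes the caching work. The caller still frees the final context once at the end.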

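av_image_fill_arrays

This function sets the data pointers and linesizes for an image with the given pixel format, width and height, pointing them into the buffer src.

int av_image_fill_arrays(uint8_t *dst_data[4], int dst_linesize[4],
                         const uint8_t *src,
                         enum AVPixelFormat pix_fmt, int width, int height, int align);

Parameters:

uint8_t *dst_data[4]: the data pointers to fill in
int dst_linesize[4]: the linesizes to fill in for the image
const uint8_t *src: the buffer that contains, or will later contain, the actual image data
enum AVPixelFormat pix_fmt: pixel format of the image
int width: width of the image
int height: height of the image
int align: the alignment used for the linesizes in src; 1 means no padding, i.e. a tightly packed buffer

return:

Returns the size in bytes required for src on success, or a negative error code on failure.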
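The size reported by `av_image_get_buffer_size()` and the linesizes written by `av_image_fill_arrays()` follow from the pixel format layout. The sketch below reproduces that arithmetic for the two formats used in this article. BGR24 is one packed plane with 3 bytes per pixel. YUV420P is a full-size Y plane plus U and V planes at half width and half height. The names are hypothetical. For `align = 1`, which is what the article uses, this should agree with the library. For larger values I am assuming each linesize is rounded up to a multiple of `align`; that is my reading of the library's behaviour, not something stated in the article.

```cpp
#include <cstdint>
#include <vector>

// Round x up to the next multiple of align (same idea as FFmpeg's FFALIGN macro).
static int alignUp(int x, int align) {
    return (x + align - 1) / align * align;
}

// Hypothetical description of one plane: bytes per row (linesize) and number of rows.
struct PlaneLayout {
    int linesize;
    int rows;
};

// Packed BGR24: one plane, 3 bytes per pixel (B, G, R interleaved).
std::vector<PlaneLayout> layoutBGR24(int w, int h, int align) {
    return std::vector<PlaneLayout>{PlaneLayout{alignUp(w * 3, align), h}};
}

// Planar YUV420P: a full-size Y plane, then U and V planes subsampled by 2
// horizontally and vertically (odd sizes round up).
std::vector<PlaneLayout> layoutYUV420P(int w, int h, int align) {
    int cw = (w + 1) / 2;
    int ch = (h + 1) / 2;
    return std::vector<PlaneLayout>{
        PlaneLayout{alignUp(w, align), h},
        PlaneLayout{alignUp(cw, align), ch},
        PlaneLayout{alignUp(cw, align), ch}};
}

// Total bytes of all planes, i.e. how large a buffer such as out_buffer must be.
int64_t bufferSize(const std::vector<PlaneLayout>& planes) {
    int64_t total = 0;
    for (const PlaneLayout& p : planes) total += int64_t(p.linesize) * p.rows;
    return total;
}
```

For the 1280x720 stream in this article, a BGR24 buffer needs 1280 × 3 × 720 = 2,764,800 bytes, exactly twice the 1,382,400 bytes of a YUV420P frame.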

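sws_scale

sws_scale converts the source image into a destination image with the properties set in the SwsContext. Each srcSlice must consist of consecutive rows of the image, and slices must be passed in sequential order.

int sws_scale(struct SwsContext *c, const uint8_t *const srcSlice[],
              const int srcStride[], int srcSliceY, int srcSliceH,
              uint8_t *const dst[], const int dstStride[]);

Parameters:

struct SwsContext *c: the SwsContext created earlier
const uint8_t *const srcSlice[]: array of pointers to the source planes; e.g. in yuv420p the Y, U and V data are stored separately and can be viewed as three uint8_t arrays
const int srcStride[]: the stride (bytes per row, i.e. linesize) of each source plane
int srcSliceY: the row of the source image at which the slice starts
int srcSliceH: the height of the source slice, i.e. the number of rows to process
uint8_t *const dst[]: pointers to the destination planes
const int dstStride[]: the stride of each destination plane

return:

Returns the height of the output slice.

In an RGB image the R, G and B values are interleaved pixel by pixel (a packed format), so they cannot be split into three separate arrays. All the data is stored in data[0], which is why the Mat can be filled directly with img.data = rgbFrame->data[0].

YUV data such as yuv420p is planar: data[0] holds only the Y plane, while U and V are in data[1] and data[2]. Assigning a Mat the same way (img.data = ...data[0]) therefore produces a garbled image or an error.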
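The difference between planar and packed layouts is easiest to see in a conversion loop. The sketch below converts a YUV420P image (three planes, each with its own stride) into one packed BGR24 plane. It assumes BT.601 limited-range coefficients. It is a textbook per-pixel formula for illustration, not what `sws_scale` does internally (swscale uses optimized fixed-point code). The stream in this article is also tagged bt709, whose coefficients differ. `yuv420pToBgr24` is a hypothetical name.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>

// Round and clamp a computed channel value into 0..255.
static uint8_t toByte(double v) {
    return static_cast<uint8_t>(std::min(255.0, std::max(0.0, std::round(v))));
}

// Convert planar YUV420P (three planes, like yuvFrame->data[0..2] with
// yuvFrame->linesize[0..2]) into packed BGR24 (one plane, like rgbFrame->data[0]
// with rgbFrame->linesize[0]) using BT.601 limited-range coefficients.
void yuv420pToBgr24(const uint8_t* const src[3], const int srcStride[3],
                    uint8_t* dst, int dstStride, int width, int height) {
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            // Each U and V sample is shared by a 2x2 block of Y samples.
            double Y = src[0][y * srcStride[0] + x] - 16.0;
            double U = src[1][(y / 2) * srcStride[1] + x / 2] - 128.0;
            double V = src[2][(y / 2) * srcStride[2] + x / 2] - 128.0;
            // In the packed output, B, G and R of one pixel sit next to each other.
            uint8_t* px = dst + y * dstStride + x * 3;
            px[0] = toByte(1.164 * Y + 2.018 * U);              // B
            px[1] = toByte(1.164 * Y - 0.391 * U - 0.813 * V);  // G
            px[2] = toByte(1.164 * Y + 1.596 * V);              // R
        }
    }
}
```

The input needs three pointers and three strides, but the output needs only one of each. That single packed plane is why `rgbFrame->data[0]` alone is enough for the OpenCV Mat.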

References

WeChat official account: 八小时程序员

FFmpeg4入门05：解码视频流过程 (FFmpeg4 Getting Started 05: The Video Stream Decoding Process)
