FFmpeg: using a custom AVIOContext to write the output to a buffer

Preface

In one of our projects we needed FFmpeg, the powerful audio/video processing toolkit, to transcode audio. The requirement: the transcoded result must not touch disk; it should go straight into a memory buffer for the next module to consume.

Ours is a C++ codebase that passes audio data around via in-memory buffers, so calling the FFmpeg C API directly is the simplest and most direct way to implement this.

Although FFmpeg examples are everywhere, ones that actually use the API to read and write memory buffers are rare. Starting from doc/examples/transcoding.c, I added the ability to write the output to a buffer. In testing, the effective audio duration in my buffer was consistently wrong, and the file was about twice as large as the one produced by the command line. Tracking down and fixing these two problems took me more than a week.

Terminology

Before walking through the code, a quick glossary of the relevant terms will help:

  • AVFormatContext: the container-level context that holds the audio/video streams and their metadata
  • Muxer: writes encoded data, in the form of AVPackets, into a container file of the specified format
  • Demuxer: reads an audio/video file and splits it into AVPackets
  • codec: a compression/decompression algorithm, either lossy or lossless
  • encoder/decoder: the components that perform encoding and decoding
  • container: a container format, i.e. the convention by which multiple streams (audio, video, subtitles, ...) are stored together in one file
  • transcoding: converting audio/video from one codec to another
  • transmuxing: converting audio/video from one container format to another, e.g. MP4 to MKV
  • sample rate: samples per second, measured in hertz; e.g. 48 kHz is 48000 samples/s
  • bit rate: the number of bits transferred per second
  • sample format: the number of bits used to represent one sample, e.g. 16-bit

FFmpeg architecture

[Figure 1: FFmpeg processing pipeline]
The overall flow:
The input file is first demuxed, separating out the audio, video, and subtitle streams. The encoded data is then fed, packet by packet, to a decoder, which decompresses each packet (according to its codec_id) into frames. Those frames can optionally be filtered (channel splitting, downsampling, and so on). Next, the frames are encoded back into packets with the target codec_id. Finally, the muxer packages these packets and writes them to the output file. That is FFmpeg's complete audio/video processing pipeline.

For a more detailed flow chart, see the diagram in Lei Xiaohua's slides:
[Figure 2: layered FFmpeg architecture, from Lei Xiaohua's slides]
A brief explanation:

  • protocol layer: handles the actual reading and writing; FFmpeg supports many protocols, e.g. local files, HTTP, RTMP
  • format layer: via demuxers and muxers, reads/writes container metadata and splits or combines the individual streams (audio, video, subtitles, ...)
  • codec layer: via decoders and encoders, converts between compressed packets and uncompressed frames
  • filter (pixel) layer: applies filters and transformations to the frame data

Common operations

Decoding

As shown below, video.mp4 is an MP4 container holding an AAC-encoded audio stream and an H.264-encoded video stream. An AVFormatContext instance reads the file and separates the audio stream (AAC AVStream) from the video stream (H.264 AVStream). From each AVStream it then reads data chunks (AVPacket) and decodes them with the matching decoder (AAC/H.264), producing uncompressed AVFrames. This is the typical decoding path.
[Figure 3: decoding pipeline]

Transcoding

Transcoding converts media from one codec to another. In the example below, the video stream is converted from H.264 to H.265:

ffmpeg -i video.mp4 -c:v libx265 video_h265.mp4

[Figure 4: transcoding pipeline]

Transmuxing

Transmuxing converts media from one container format to another. The command line is below; no decoding or encoding is involved, the original packets are copied as-is into the new container.

ffmpeg -i video.mp4 -c copy video.mkv

[Figure 5: transmuxing pipeline]

Building the FFmpeg examples

FFmpeg ships many API examples in the doc/examples directory of its source tree. Below are some problems I ran into while compiling a few of them:

Notes for building avio_reading.c locally:

○ the link order of avformat, avcodec, avutil matters; see the Makefile for reference
○ undefined references to swr_convert, swr_init, swr_close, etc.: add -lswresample
○ undefined references to sin, cos, log, etc.: add -lm
○ pthread-related undefined references: add -lpthread
○ compression-related link errors: add -lz
○ lzma_*-related link errors: add -llzma

gcc -Wall -o avio_reading avio_reading.c -I./rpm/usr/local/include -L./rpm/usr/local/lib -lavformat -lavcodec -lavutil -lswresample -lm -lpthread -lz -llzma

filtering_audio.c is similar:

gcc -Wall -o filtering_audio doc/examples/filtering_audio.c -I./rpm/usr/local/include -L./rpm/usr/local/lib -lavfilter -lavformat -lavcodec -lavutil -lswresample -lswscale -lm -lpthread -lz -llzma

To see FFmpeg's av_log() output, set the log level in main():

int main(int argc, char **argv)
{
    av_log_set_level(AV_LOG_DEBUG);
}

Code walkthrough

(Our use case deals with audio, so the walkthrough uses audio as the example.)
transcoding.c implements the typical transcoding pipeline: input, decode, filter, encode, plus output to a file.

Building on transcoding.c, I changed the output-related logic so that it writes to a designated buffer instead of a file. Two problems came up:

  1. After writing the buffer out to a file, the duration was always wrong, much longer than the actual audio.
  2. The file size also differed from the command-line result, on the large side.

After investigation, the causes were:

  1. When using FFmpeg's custom-I/O interface to redirect the output, you must provide both a write() and a seek() callback. The effective audio duration only becomes known after all data has been written into the buffer; the muxer then uses seek() to jump back to the file header and updates the duration at the offset where it is stored. In other words, the write() and seek() callbacks are both indispensable!
  2. When dumping the buffer to a file, I did not limit the write to the valid data length; the whole buffer was written out, leaving a long run of zero bytes at the end of the audio file.

After fixing both issues, the code now behaves as expected!

The full code is attached at the very end!


References

  • ffmpeg-libav-tutorial : https://github.com/leandromoreira/ffmpeg-libav-tutorial/blob/master/README.md
  • ffmpeg AVIOContext 自定义 IO 及 seek https://segmentfault.com/a/1190000021378256
  • ffmpeg documentation: https://ffmpeg.org/ffmpeg.html — worth re-reading; clear and easy to follow
  • FFmpeg development examples (Lei Xiaohua's site, dedicated to audio/video processing): http://leixiaohua1020.github.io/#ffmpeg-development-examples
  • FFmpeg slides with many clear examples: https://slhck.info/ffmpeg-encoding-course/#/
  • Werner Robitza’s must read about rate control: https://slhck.info/posts/
  • ffmpeg custom I/O https://mw.gl/posts/ffmpeg_custom_io/

Thanks

Modified transcoding.c source code

#include <stdio.h>
#include <string.h>
#include <libavcodec/avcodec.h>
#include <libavformat/avformat.h>
#include <libavfilter/buffersink.h>
#include <libavfilter/buffersrc.h>
#include <libavutil/channel_layout.h>
#include <libavutil/opt.h>

struct buffer_data
{
    uint8_t *buf;
    size_t size;
    uint8_t *ptr;
    size_t room; ///< size left in the buffer
};

static AVFormatContext *ifmt_ctx;
static AVFormatContext *ofmt_ctx;
struct buffer_data bd_zm = {0};
AVIOContext *avio_ctx_zm = NULL;

typedef struct FilteringContext {
    AVFilterContext *buffersink_ctx;
    AVFilterContext *buffersrc_ctx;
    AVFilterGraph *filter_graph;

    AVPacket *enc_pkt;
    AVFrame *filtered_frame;
} FilteringContext;
static FilteringContext *filter_ctx;

typedef struct StreamContext {
    AVCodecContext *dec_ctx;
    AVCodecContext *enc_ctx;

    AVFrame *dec_frame;
} StreamContext;
static StreamContext *stream_ctx;

static int iowrite_to_buffer(void *opaque, uint8_t *buf, int buf_size)
{
    struct buffer_data *bd = (struct buffer_data *)opaque;
    while (buf_size > bd->room)
    {
        int64_t offset = bd->ptr - bd->buf;
        bd->buf = av_realloc_f(bd->buf, 2, bd->size);
        if (!bd->buf)
            return AVERROR(ENOMEM);
        bd->size = bd->size*2;
        bd->ptr = bd->buf + offset;
        bd->room = bd->size - offset;
    }
    /* copy buffer data to buffer_data buffer */
    memcpy(bd->ptr, buf, buf_size);
    bd->ptr += buf_size;
    bd->room -= buf_size;
    printf("write packet pkt_size:%d used_buf_size:%zu buf_size:%zu buf_room:%zu\n", buf_size, (size_t)(bd->ptr - bd->buf), bd->size, bd->room);
    return buf_size;
}
// whence: SEEK_SET, SEEK_CUR, SEEK_END (like fseek) and AVSEEK_SIZE
// https://segmentfault.com/a/1190000021378256
static int64_t seek_buffer(void *ptr, int64_t pos, int whence)
{
    struct buffer_data *bd = (struct buffer_data *)ptr;
    int64_t ret = -1;

    switch (whence)
    {
    case AVSEEK_SIZE:
        ret = bd->size;
        break;
    case SEEK_SET:
        bd->ptr = bd->buf + pos;
        // bd->room = bd->size - pos;
        ret = pos;
        break;
    }
    printf("whence=%d , offset=%ld , buffer_size=%zu, buffer_room=%zu\n", whence, (long)pos, bd->size, bd->room);
    return ret;
}
static int ioread_from_buffer(void *opaque, uint8_t *buf, int buf_size)
{
    struct buffer_data *bd = (struct buffer_data *)opaque;
    buf_size = FFMIN(buf_size, bd->room); // must be bd->room, not bd->size

    if (!buf_size)
        return AVERROR_EOF;

    /* copy internal buffer data to buf */
    memcpy(buf, bd->ptr, buf_size);
    bd->ptr += buf_size;
    bd->room -= buf_size; // keep room in sync with ptr so repeated reads cannot overrun
    printf("ptr:%p size:%zu\n", bd->ptr, bd->size);
    return buf_size;
}

static int open_input_file(const char *filename)
{
    int ret;
    unsigned int i;

    ifmt_ctx = NULL;
    if ((ret = avformat_open_input(&ifmt_ctx, filename, NULL, NULL)) < 0) {
        av_log(NULL, AV_LOG_ERROR, "Cannot open input file\n");
        return ret;
    }
    if ((ret = avformat_find_stream_info(ifmt_ctx, NULL)) < 0) {
        av_log(NULL, AV_LOG_ERROR, "Cannot find stream information\n");
        return ret;
    }
    stream_ctx = av_calloc(ifmt_ctx->nb_streams, sizeof(*stream_ctx));
    if (!stream_ctx)
        return AVERROR(ENOMEM);

    for (i = 0; i < ifmt_ctx->nb_streams; i++) {
        AVStream *stream = ifmt_ctx->streams[i];
        const AVCodec *dec = avcodec_find_decoder(stream->codecpar->codec_id);
        AVCodecContext *codec_ctx;
        if (!dec) {
            av_log(NULL, AV_LOG_ERROR, "Failed to find decoder for stream #%u\n", i);
            return AVERROR_DECODER_NOT_FOUND;
        }
        codec_ctx = avcodec_alloc_context3(dec);
        if (!codec_ctx) {
            av_log(NULL, AV_LOG_ERROR, "Failed to allocate the decoder context for stream #%u\n", i);
            return AVERROR(ENOMEM);
        }
        ret = avcodec_parameters_to_context(codec_ctx, stream->codecpar);
        if (ret < 0) {
            av_log(NULL, AV_LOG_ERROR, "Failed to copy decoder parameters to input decoder context "
                   "for stream #%u\n", i);
            return ret;
        }
        /* Reencode video & audio and remux subtitles etc. */
        if (codec_ctx->codec_type == AVMEDIA_TYPE_VIDEO
                || codec_ctx->codec_type == AVMEDIA_TYPE_AUDIO) {
            if (codec_ctx->codec_type == AVMEDIA_TYPE_VIDEO)
                codec_ctx->framerate = av_guess_frame_rate(ifmt_ctx, stream, NULL);
            /* Open decoder */
            ret = avcodec_open2(codec_ctx, dec, NULL);
            if (ret < 0) {
                av_log(NULL, AV_LOG_ERROR, "Failed to open decoder for stream #%u\n", i);
                return ret;
            }
        }
        stream_ctx[i].dec_ctx = codec_ctx;

        stream_ctx[i].dec_frame = av_frame_alloc();
        if (!stream_ctx[i].dec_frame)
            return AVERROR(ENOMEM);
    }
    av_dump_format(ifmt_ctx, 0, filename, 0);
    return 0;
}

// static int open_output_file(const char *filename)
static int open_output_file(void)
{
    uint8_t *avio_ctx_buffer_zm = NULL;
    size_t avio_ctx_buffer_size_zm = 4096;
    const size_t bd_zm_buf_size = 1024;

    AVStream *out_stream;
    AVStream *in_stream;
    AVCodecContext *dec_ctx, *enc_ctx;
    const AVCodec *encoder;
    int ret;
    unsigned int i;
    //zhimo starts
    const AVOutputFormat *output_format = av_guess_format("wav", NULL, NULL);
    ofmt_ctx = NULL;
    // avformat_alloc_output_context2(&ofmt_ctx, output_format, NULL, filename);
    avformat_alloc_output_context2(&ofmt_ctx, output_format, NULL, NULL);
    if (!ofmt_ctx) {
        av_log(NULL, AV_LOG_ERROR, "Could not create output context\n");
        return AVERROR_UNKNOWN;
    }
    bd_zm.ptr = bd_zm.buf = av_malloc(bd_zm_buf_size);
    if (!bd_zm.buf) {
        ret = AVERROR(ENOMEM);
        return ret;
    }
    bd_zm.size = bd_zm.room = bd_zm_buf_size;
    
    avio_ctx_buffer_zm = av_malloc(avio_ctx_buffer_size_zm);
    if (!avio_ctx_buffer_zm)
    {
        av_log(NULL, AV_LOG_ERROR, "allocate buffer error\n");
        return AVERROR(ENOMEM);
    }
    avio_ctx_zm = avio_alloc_context(avio_ctx_buffer_zm, avio_ctx_buffer_size_zm,
                                     1, &bd_zm, NULL, iowrite_to_buffer, seek_buffer);
    if (!avio_ctx_zm)
    {
        av_log(NULL, AV_LOG_ERROR, "allocate avio context error\n");
        return AVERROR(ENOMEM);
    }
    ofmt_ctx->pb = avio_ctx_zm;
    if (!ofmt_ctx->pb) {
        av_log(NULL, AV_LOG_ERROR, "output format context pb is empty\n");
        return AVERROR(ENOMEM);
    }
    // ofmt_ctx->flags |= AVFMT_FLAG_CUSTOM_IO;
    ofmt_ctx->oformat = output_format;
    
    //zhimo ends
    av_log(NULL, AV_LOG_DEBUG, "gets here1\n");
    for (i = 0; i < ifmt_ctx->nb_streams; i++) {
        out_stream = avformat_new_stream(ofmt_ctx, NULL);
        if (!out_stream) {
            av_log(NULL, AV_LOG_ERROR, "Failed allocating output stream\n");
            return AVERROR_UNKNOWN;
        }

        in_stream = ifmt_ctx->streams[i];
        dec_ctx = stream_ctx[i].dec_ctx;
        if (dec_ctx->codec_type == AVMEDIA_TYPE_VIDEO
                || dec_ctx->codec_type == AVMEDIA_TYPE_AUDIO) {
            /* in this example, we choose transcoding to same codec */
            encoder = avcodec_find_encoder(AV_CODEC_ID_PCM_S16LE);  //dec_ctx->codec_id
            if (!encoder) {
                av_log(NULL, AV_LOG_FATAL, "Necessary encoder not found\n");
                return AVERROR_INVALIDDATA;
            }
            enc_ctx = avcodec_alloc_context3(encoder);
            if (!enc_ctx) {
                av_log(NULL, AV_LOG_FATAL, "Failed to allocate the encoder context\n");
                return AVERROR(ENOMEM);
            }
            /* In this example, we transcode to same properties (picture size,
             * sample rate etc.). These properties can be changed for output
             * streams easily using filters */
            if (dec_ctx->codec_type == AVMEDIA_TYPE_VIDEO) {
                enc_ctx->height = dec_ctx->height;
                enc_ctx->width = dec_ctx->width;
                enc_ctx->sample_aspect_ratio = dec_ctx->sample_aspect_ratio;
                /* take first format from list of supported formats */
                if (encoder->pix_fmts)
                    enc_ctx->pix_fmt = encoder->pix_fmts[0];
                else
                    enc_ctx->pix_fmt = dec_ctx->pix_fmt;
                /* video time_base can be set to whatever is handy and supported by encoder */
                enc_ctx->time_base = av_inv_q(dec_ctx->framerate);
            } else {
                enc_ctx->sample_rate = dec_ctx->sample_rate;
                enc_ctx->channel_layout = AV_CH_LAYOUT_MONO; //dec_ctx->channel_layout;
                enc_ctx->channels = 1; //av_get_channel_layout_nb_channels(enc_ctx->channel_layout);
                /* take first format from list of supported formats */
                enc_ctx->sample_fmt = AV_SAMPLE_FMT_S16; //encoder->sample_fmts[0];
                enc_ctx->time_base = (AVRational){1, enc_ctx->sample_rate};
            }

            if (ofmt_ctx->oformat->flags & AVFMT_GLOBALHEADER)
                enc_ctx->flags |= AV_CODEC_FLAG_GLOBAL_HEADER;

            /* Third parameter can be used to pass settings to encoder */
            ret = avcodec_open2(enc_ctx, encoder, NULL);
            if (ret < 0) {
                av_log(NULL, AV_LOG_ERROR, "Cannot open video encoder for stream #%u\n", i);
                return ret;
            }
            ret = avcodec_parameters_from_context(out_stream->codecpar, enc_ctx);
            if (ret < 0) {
                av_log(NULL, AV_LOG_ERROR, "Failed to copy encoder parameters to output stream #%u\n", i);
                return ret;
            }

            out_stream->time_base = enc_ctx->time_base;
            stream_ctx[i].enc_ctx = enc_ctx;
        } else if (dec_ctx->codec_type == AVMEDIA_TYPE_UNKNOWN) {
            av_log(NULL, AV_LOG_FATAL, "Elementary stream #%d is of unknown type, cannot proceed\n", i);
            return AVERROR_INVALIDDATA;
        } else {
            /* if this stream must be remuxed */
            ret = avcodec_parameters_copy(out_stream->codecpar, in_stream->codecpar);
            if (ret < 0) {
                av_log(NULL, AV_LOG_ERROR, "Copying parameters for stream #%u failed\n", i);
                return ret;
            }
            out_stream->time_base = in_stream->time_base;
        }

    }
    // av_dump_format(ofmt_ctx, 0, filename, 1);
    av_dump_format(ofmt_ctx, 0, NULL, 1);

    // if (!(ofmt_ctx->oformat->flags & AVFMT_NOFILE)) {
    //     ret = avio_open(&ofmt_ctx->pb, filename, AVIO_FLAG_WRITE);
    //     if (ret < 0) {
    //         av_log(NULL, AV_LOG_ERROR, "Could not open output file '%s'", filename);
    //         return ret;
    //     }
    // }

    /* init muxer, write output file header */
    ret = avformat_write_header(ofmt_ctx, NULL);
    if (ret < 0) {
        av_log(NULL, AV_LOG_ERROR, "Error occurred when opening output file\n");
        return ret;
    }
    return 0;
}

static int init_filter(FilteringContext* fctx, AVCodecContext *dec_ctx,
        AVCodecContext *enc_ctx, const char *filter_spec)
{
    char args[512];
    int ret = 0;
    const AVFilter *buffersrc = NULL;
    const AVFilter *buffersink = NULL;
    AVFilterContext *buffersrc_ctx = NULL;
    AVFilterContext *buffersink_ctx = NULL;
    AVFilterInOut *outputs = avfilter_inout_alloc();
    AVFilterInOut *inputs  = avfilter_inout_alloc();
    AVFilterGraph *filter_graph = avfilter_graph_alloc();

    if (!outputs || !inputs || !filter_graph) {
        ret = AVERROR(ENOMEM);
        goto end;
    }

    if (dec_ctx->codec_type == AVMEDIA_TYPE_VIDEO) {
        buffersrc = avfilter_get_by_name("buffer");
        buffersink = avfilter_get_by_name("buffersink");
        if (!buffersrc || !buffersink) {
            av_log(NULL, AV_LOG_ERROR, "filtering source or sink element not found\n");
            ret = AVERROR_UNKNOWN;
            goto end;
        }

        snprintf(args, sizeof(args),
                "video_size=%dx%d:pix_fmt=%d:time_base=%d/%d:pixel_aspect=%d/%d",
                dec_ctx->width, dec_ctx->height, dec_ctx->pix_fmt,
                dec_ctx->time_base.num, dec_ctx->time_base.den,
                dec_ctx->sample_aspect_ratio.num,
                dec_ctx->sample_aspect_ratio.den);

        ret = avfilter_graph_create_filter(&buffersrc_ctx, buffersrc, "in",
                args, NULL, filter_graph);
        if (ret < 0) {
            av_log(NULL, AV_LOG_ERROR, "Cannot create buffer source\n");
            goto end;
        }

        ret = avfilter_graph_create_filter(&buffersink_ctx, buffersink, "out",
                NULL, NULL, filter_graph);
        if (ret < 0) {
            av_log(NULL, AV_LOG_ERROR, "Cannot create buffer sink\n");
            goto end;
        }

        ret = av_opt_set_bin(buffersink_ctx, "pix_fmts",
                (uint8_t*)&enc_ctx->pix_fmt, sizeof(enc_ctx->pix_fmt),
                AV_OPT_SEARCH_CHILDREN);
        if (ret < 0) {
            av_log(NULL, AV_LOG_ERROR, "Cannot set output pixel format\n");
            goto end;
        }
    } else if (dec_ctx->codec_type == AVMEDIA_TYPE_AUDIO) {
        buffersrc = avfilter_get_by_name("abuffer");
        buffersink = avfilter_get_by_name("abuffersink");
        if (!buffersrc || !buffersink) {
            av_log(NULL, AV_LOG_ERROR, "filtering source or sink element not found\n");
            ret = AVERROR_UNKNOWN;
            goto end;
        }

        if (!dec_ctx->channel_layout)
            dec_ctx->channel_layout =
                av_get_default_channel_layout(dec_ctx->channels);
        snprintf(args, sizeof(args),
                "time_base=%d/%d:sample_rate=%d:sample_fmt=%s:channel_layout=0x%"PRIx64,
                dec_ctx->time_base.num, dec_ctx->time_base.den, dec_ctx->sample_rate,
                av_get_sample_fmt_name(dec_ctx->sample_fmt),
                dec_ctx->channel_layout);
        ret = avfilter_graph_create_filter(&buffersrc_ctx, buffersrc, "in",
                args, NULL, filter_graph);
        if (ret < 0) {
            av_log(NULL, AV_LOG_ERROR, "Cannot create audio buffer source\n");
            goto end;
        }

        ret = avfilter_graph_create_filter(&buffersink_ctx, buffersink, "out",
                NULL, NULL, filter_graph);
        if (ret < 0) {
            av_log(NULL, AV_LOG_ERROR, "Cannot create audio buffer sink\n");
            goto end;
        }

        ret = av_opt_set_bin(buffersink_ctx, "sample_fmts",
                (uint8_t*)&enc_ctx->sample_fmt, sizeof(enc_ctx->sample_fmt),
                AV_OPT_SEARCH_CHILDREN);
        if (ret < 0) {
            av_log(NULL, AV_LOG_ERROR, "Cannot set output sample format\n");
            goto end;
        }

        ret = av_opt_set_bin(buffersink_ctx, "channel_layouts",
                (uint8_t*)&enc_ctx->channel_layout,
                sizeof(enc_ctx->channel_layout), AV_OPT_SEARCH_CHILDREN);
        if (ret < 0) {
            av_log(NULL, AV_LOG_ERROR, "Cannot set output channel layout\n");
            goto end;
        }

        ret = av_opt_set_bin(buffersink_ctx, "sample_rates",
                (uint8_t*)&enc_ctx->sample_rate, sizeof(enc_ctx->sample_rate),
                AV_OPT_SEARCH_CHILDREN);
        if (ret < 0) {
            av_log(NULL, AV_LOG_ERROR, "Cannot set output sample rate\n");
            goto end;
        }
    } else {
        ret = AVERROR_UNKNOWN;
        goto end;
    }

    /* Endpoints for the filter graph. */
    outputs->name       = av_strdup("in");
    outputs->filter_ctx = buffersrc_ctx;
    outputs->pad_idx    = 0;
    outputs->next       = NULL;

    inputs->name       = av_strdup("out");
    inputs->filter_ctx = buffersink_ctx;
    inputs->pad_idx    = 0;
    inputs->next       = NULL;

    if (!outputs->name || !inputs->name) {
        ret = AVERROR(ENOMEM);
        goto end;
    }

    if ((ret = avfilter_graph_parse_ptr(filter_graph, filter_spec,
                    &inputs, &outputs, NULL)) < 0)
        goto end;

    if ((ret = avfilter_graph_config(filter_graph, NULL)) < 0)
        goto end;

    /* Fill FilteringContext */
    fctx->buffersrc_ctx = buffersrc_ctx;
    fctx->buffersink_ctx = buffersink_ctx;
    fctx->filter_graph = filter_graph;

end:
    avfilter_inout_free(&inputs);
    avfilter_inout_free(&outputs);

    return ret;
}

static int init_filters(void)
{
    const char *filter_spec;
    unsigned int i;
    int ret;
    filter_ctx = av_malloc_array(ifmt_ctx->nb_streams, sizeof(*filter_ctx));
    if (!filter_ctx)
        return AVERROR(ENOMEM);

    for (i = 0; i < ifmt_ctx->nb_streams; i++) {
        filter_ctx[i].buffersrc_ctx  = NULL;
        filter_ctx[i].buffersink_ctx = NULL;
        filter_ctx[i].filter_graph   = NULL;
        if (!(ifmt_ctx->streams[i]->codecpar->codec_type == AVMEDIA_TYPE_AUDIO
                || ifmt_ctx->streams[i]->codecpar->codec_type == AVMEDIA_TYPE_VIDEO))
            continue;


        if (ifmt_ctx->streams[i]->codecpar->codec_type == AVMEDIA_TYPE_VIDEO)
            filter_spec = "null"; /* passthrough (dummy) filter for video */
        else
            filter_spec = "channelsplit=channel_layout=stereo:channels=FR"; /* split stereo, keep only the front-right channel */
        ret = init_filter(&filter_ctx[i], stream_ctx[i].dec_ctx,
                stream_ctx[i].enc_ctx, filter_spec);
        if (ret)
            return ret;

        filter_ctx[i].enc_pkt = av_packet_alloc();
        if (!filter_ctx[i].enc_pkt)
            return AVERROR(ENOMEM);

        filter_ctx[i].filtered_frame = av_frame_alloc();
        if (!filter_ctx[i].filtered_frame)
            return AVERROR(ENOMEM);
    }
    return 0;
}

static int encode_write_frame(unsigned int stream_index, int flush)
{
    StreamContext *stream = &stream_ctx[stream_index];
    FilteringContext *filter = &filter_ctx[stream_index];
    AVFrame *filt_frame = flush ? NULL : filter->filtered_frame;
    AVPacket *enc_pkt = filter->enc_pkt;
    int ret;

    av_log(NULL, AV_LOG_INFO, "Encoding frame\n");
    /* encode filtered frame */
    av_packet_unref(enc_pkt);

    ret = avcodec_send_frame(stream->enc_ctx, filt_frame);

    if (ret < 0)
        return ret;

    while (ret >= 0) {
        ret = avcodec_receive_packet(stream->enc_ctx, enc_pkt);

        if (ret == AVERROR(EAGAIN) || ret == AVERROR_EOF)
            return 0;

        /* prepare packet for muxing */
        enc_pkt->stream_index = stream_index;
        av_packet_rescale_ts(enc_pkt,
                             stream->enc_ctx->time_base,
                             ofmt_ctx->streams[stream_index]->time_base);

        av_log(NULL, AV_LOG_DEBUG, "Muxing frame\n");
        av_log(NULL, AV_LOG_DEBUG, "interleaved_write enc_pkt: size=%d, stream_index=%d, duration=%ld\n", enc_pkt->size, enc_pkt->stream_index, enc_pkt->duration);
        /* mux encoded frame */
        ret = av_interleaved_write_frame(ofmt_ctx, enc_pkt);
        
    }

    return ret;
}

static int filter_encode_write_frame(AVFrame *frame, unsigned int stream_index)
{
    FilteringContext *filter = &filter_ctx[stream_index];
    int ret;

    av_log(NULL, AV_LOG_INFO, "Pushing decoded frame to filters\n");
    /* push the decoded frame into the filtergraph */
    ret = av_buffersrc_add_frame_flags(filter->buffersrc_ctx,
            frame, 0);
    if (ret < 0) {
        av_log(NULL, AV_LOG_ERROR, "Error while feeding the filtergraph\n");
        return ret;
    }

    /* pull filtered frames from the filtergraph */
    while (1) {
        av_log(NULL, AV_LOG_INFO, "Pulling filtered frame from filters\n");
        ret = av_buffersink_get_frame(filter->buffersink_ctx,
                                      filter->filtered_frame);
        if (ret < 0) {
            /* if no more frames for output - returns AVERROR(EAGAIN)
             * if flushed and no more frames for output - returns AVERROR_EOF
             * rewrite retcode to 0 to show it as normal procedure completion
             */
            if (ret == AVERROR(EAGAIN) || ret == AVERROR_EOF)
                ret = 0;
            break;
        }

        filter->filtered_frame->pict_type = AV_PICTURE_TYPE_NONE;
        ret = encode_write_frame(stream_index, 0);
        av_frame_unref(filter->filtered_frame);
        if (ret < 0)
            break;
    }

    return ret;
}

static int flush_encoder(unsigned int stream_index)
{
    if (!(stream_ctx[stream_index].enc_ctx->codec->capabilities &
                AV_CODEC_CAP_DELAY))
        return 0;

    av_log(NULL, AV_LOG_INFO, "Flushing stream #%u encoder\n", stream_index);
    return encode_write_frame(stream_index, 1);
}

int main(int argc, char **argv)
{
    av_log_set_level(AV_LOG_DEBUG);
    int ret;
    AVPacket *packet = NULL;
    unsigned int stream_index;
    unsigned int i;

    if (argc != 3) {
        av_log(NULL, AV_LOG_ERROR, "Usage: %s <input file> <output file>\n", argv[0]);
        return 1;
    }
    const char *file_name = argv[2]; // dump the buffer to this file afterwards to check the result
    if ((ret = open_input_file(argv[1])) < 0)
        goto end;
    if ((ret = open_output_file()) < 0)  // output goes to the in-memory buffer instead of a file named argv[2]
        goto end;
    if ((ret = init_filters()) < 0)
        goto end;
    if (!(packet = av_packet_alloc()))
        goto end;

    /* read all packets */
    unsigned int packet_count = 0, frame_count = 0;
    while (1) {
        if ((ret = av_read_frame(ifmt_ctx, packet)) < 0)
            break;
        ++packet_count;
        stream_index = packet->stream_index;
        av_log(NULL, AV_LOG_DEBUG, "Demuxer gave frame of stream_index %u\n",
                stream_index);

        if (filter_ctx[stream_index].filter_graph) {
            StreamContext *stream = &stream_ctx[stream_index];

            av_log(NULL, AV_LOG_DEBUG, "Going to reencode&filter the frame\n");

            av_packet_rescale_ts(packet,
                                 ifmt_ctx->streams[stream_index]->time_base,
                                 stream->dec_ctx->time_base);
            ret = avcodec_send_packet(stream->dec_ctx, packet);
            if (ret < 0) {
                av_log(NULL, AV_LOG_ERROR, "Decoding failed\n");
                break;
            }
            frame_count = 0;
            while (ret >= 0) {
                ret = avcodec_receive_frame(stream->dec_ctx, stream->dec_frame);
                if (ret == AVERROR_EOF || ret == AVERROR(EAGAIN))
                    break;
                else if (ret < 0)
                    goto end;
                ++frame_count;
                stream->dec_frame->pts = stream->dec_frame->best_effort_timestamp;
                ret = filter_encode_write_frame(stream->dec_frame, stream_index);
                if (ret < 0)
                    goto end;
            }
            av_log(NULL, AV_LOG_DEBUG, "%%%% packet_count: %d, corresponding frame_count: %d \n", packet_count, frame_count);
        } else {
            /* remux this frame without reencoding */
            av_packet_rescale_ts(packet,
                                 ifmt_ctx->streams[stream_index]->time_base,
                                 ofmt_ctx->streams[stream_index]->time_base);

            ret = av_interleaved_write_frame(ofmt_ctx, packet);
            if (ret < 0)
                goto end;
        }
        av_packet_unref(packet);
    }

    /* flush filters and encoders */
    for (i = 0; i < ifmt_ctx->nb_streams; i++) {
        /* flush filter */
        if (!filter_ctx[i].filter_graph)
            continue;
        ret = filter_encode_write_frame(NULL, i);
        if (ret < 0) {
            av_log(NULL, AV_LOG_ERROR, "Flushing filter failed\n");
            goto end;
        }

        /* flush encoder */
        ret = flush_encoder(i);
        if (ret < 0) {
            av_log(NULL, AV_LOG_ERROR, "Flushing encoder failed\n");
            goto end;
        }
    }

    av_write_trailer(ofmt_ctx);
    //zhimo starts: write the buffer to a file
    FILE *pFile;
    pFile = fopen(file_name, "wb");

    if (pFile) {
        /* write only the valid bytes; dumping the whole buffer left trailing
         * zero bytes compared with the ffmpeg command-line output */
        fwrite(bd_zm.buf, bd_zm.size - bd_zm.room, 1, pFile);
        fclose(pFile);
        puts("Wrote to file!");
    }
    else {
        puts("Something went wrong opening the output file.");
    }
    //zhimo ends
end:
    
    for (i = 0; i < ifmt_ctx->nb_streams; i++) {
        avcodec_free_context(&stream_ctx[i].dec_ctx);
        if (ofmt_ctx && ofmt_ctx->nb_streams > i && ofmt_ctx->streams[i] && stream_ctx[i].enc_ctx)
            avcodec_free_context(&stream_ctx[i].enc_ctx);
        if (filter_ctx && filter_ctx[i].filter_graph) {
            avfilter_graph_free(&filter_ctx[i].filter_graph);
            av_packet_free(&filter_ctx[i].enc_pkt);
            av_frame_free(&filter_ctx[i].filtered_frame);
        }
        av_frame_free(&stream_ctx[i].dec_frame);
    }
    av_free(filter_ctx);
    av_free(stream_ctx);
    avformat_close_input(&ifmt_ctx);
    // if (ofmt_ctx && !(ofmt_ctx->oformat->flags & AVFMT_NOFILE))
    //     avio_closep(&ofmt_ctx->pb);
    avformat_free_context(ofmt_ctx);
    if (avio_ctx_zm)
        av_freep(&avio_ctx_zm->buffer);
    avio_context_free(&avio_ctx_zm);
    av_packet_free(&packet);

    av_free(bd_zm.buf);
    if (ret < 0)
        av_log(NULL, AV_LOG_ERROR, "Error occurred: %s\n", av_err2str(ret));

    return ret ? 1 : 0;
}
