Android Audio/Video Series, Part 4: Playing an mp3 with FFmpeg + AudioTrack

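The previous article integrated the FFmpeg shared libraries into Android Studio. This one uses FFmpeg + AudioTrack to play an mp3 file. The main goal is to get familiar with basic FFmpeg usage, along with some JNI and C++ basics.

Let's get started.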

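I. What steps does it take to play an audio or video file?

Audio comes in many formats, such as mp3 and aac, and so does video, such as mp4 and rmvb.

Formats like mp3 and mp4 are actually container formats.

Container format

Stream metadata + audio data + video data

You can think of a container loosely as an archive file.

So the first step is to demux (unpack) the container.

Demuxing

This yields the individual streams together with their metadata: resolution, duration, size, bit rate, sample rate, width and height, aspect ratio, and so on.

Separating the audio and video streams

Audio stream (compressed audio data) -> audio decoding -> audio samples (PCM) -> speaker playback
Video stream (compressed video data) -> video decoding -> raw video frames -> display

A flowchart makes this easier to follow.

(Flowchart by 红橙Darren.)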
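
As a rough illustration of the demuxing step, here is a minimal C++ sketch that splits an interleaved packet sequence into one queue per stream. The Packet struct and the demux function are hypothetical stand-ins written for this example; they are not FFmpeg types, and a real demuxer also parses container headers and timestamps.

```cpp
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical stand-in for a compressed packet read from a container.
struct Packet {
    int stream_index;                // which stream the packet belongs to
    std::vector<std::uint8_t> data;  // compressed payload
};

// Split an interleaved packet sequence into one queue per stream,
// preserving the order of the packets within each stream.
std::map<int, std::vector<Packet>> demux(const std::vector<Packet>& container) {
    std::map<int, std::vector<Packet>> streams;
    for (const Packet& p : container) {
        streams[p.stream_index].push_back(p);
    }
    return streams;
}
```

Each per-stream queue would then go to its own decoder, which is the role of the audio and video decoding stages in the pipeline above.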

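II. Processing an mp3 with FFmpeg

A quick look at some important FFmpeg structures:
AVFormatContext:

A data structure used throughout the whole process; many functions take it as a parameter. It holds the format information of the input file, such as the number of streams and the streams themselves.

AVStream:

The structure that stores the information of each video or audio stream.

AVCodecContext:

Holds the detailed codec information of a stream, such as the video width and height and the codec type.

AVCodec:

The structure that describes a codec, including the functions that encoding and decoding call.

AVPacket:

Holds data after demuxing and before decoding (still compressed), plus additional information such as timestamps and duration.

AVFrame:

Holds the raw data decoded from an AVPacket.

FFmpeg API call order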

1. av_register_all()

void av_register_all(void);
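Declared in libavformat/avformat.h. Calling it registers all supported file formats and codecs. Since FFmpeg 4.0 it is deprecated and no longer needed.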

    av_register_all();

2. avformat_open_input

int avformat_open_input(AVFormatContext **ps, const char *url, AVInputFormat *fmt, AVDictionary **options);
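Opens an input file and reads its header.

    int open_input_result = avformat_open_input(&pFormatContext,url,NULL,NULL);
    if (open_input_result != 0){
        LOGE("format open input error: %s", av_err2str(open_input_result));
        goto _av_resource_destry; // error handling, e.g. release resources and report back to the Java layer
    }

The first parameter is the address of an AVFormatContext pointer; the context is allocated and filled in from the opened file. The last two parameters select a specific input format and pass extra options for opening the file (see the documentation); NULL is fine for both here.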
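
The goto-based cleanup works, but in C++ the same guarantee can also be expressed with RAII. Below is a minimal sketch of that idea. FakeFormatContext, fake_open and fake_close are hypothetical stand-ins for AVFormatContext, avformat_open_input and avformat_close_input, so the example compiles without FFmpeg; it is an alternative pattern, not the approach this article's code uses.

```cpp
#include <memory>

// Hypothetical stand-ins for AVFormatContext and its open/close functions.
struct FakeFormatContext { int stream_count; };
static int g_open_contexts = 0;  // number of contexts that are currently open

FakeFormatContext* fake_open(bool succeed) {
    if (!succeed) return nullptr;
    ++g_open_contexts;
    return new FakeFormatContext{1};
}

void fake_close(FakeFormatContext* ctx) {
    if (ctx == nullptr) return;
    --g_open_contexts;
    delete ctx;
}

// Deleter type so that unique_ptr calls fake_close automatically.
struct ContextCloser {
    void operator()(FakeFormatContext* ctx) const { fake_close(ctx); }
};
using ContextPtr = std::unique_ptr<FakeFormatContext, ContextCloser>;

// Returns 0 on success and a negative value on failure. The context is
// closed on every return path without an explicit cleanup label.
int play(bool open_ok, bool decode_ok) {
    ContextPtr ctx(fake_open(open_ok));
    if (!ctx) return -1;        // nothing was opened, nothing to release
    if (!decode_ok) return -2;  // ctx is closed by its destructor
    return 0;                   // ctx is closed here as well
}
```

With this pattern an early return cannot skip the cleanup, which is the property the goto label is meant to provide.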

3. avformat_find_stream_info

int avformat_find_stream_info(AVFormatContext *ic, AVDictionary **options);

    formatFindStreamInfoRes = avformat_find_stream_info(pFormatContext, NULL);
    if (formatFindStreamInfoRes < 0) {
        LOGE("format find stream info error: %s", av_err2str(formatFindStreamInfoRes));
        goto _av_resource_destry;
    }

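avformat_open_input only reads the file header and does not necessarily fill in the stream information, so avformat_find_stream_info has to be called to obtain it. The function reads some packets to determine the parameters of every stream and fills in pFormatContext->streams. It does not change the logical file position; the packets it examined may be buffered and are handed to the decoder later.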

4. av_find_best_stream

int av_find_best_stream(AVFormatContext *ic, enum AVMediaType type, int wanted_stream_nb, int related_stream, AVCodec **decoder_ret, int flags);

Finds the index of the audio stream; the audio is processed through this index later.

    audioStramIndex = av_find_best_stream(pFormatContext, AVMediaType::AVMEDIA_TYPE_AUDIO, -1, -1,NULL, 0);
    if (audioStramIndex < 0) {
        LOGE("format audio stream error:");
        goto _av_resource_destry;
    }

To pick out the video stream instead, pass the video type. The available types are:

enum AVMediaType {
    AVMEDIA_TYPE_UNKNOWN = -1,  ///< Usually treated as AVMEDIA_TYPE_DATA
    AVMEDIA_TYPE_VIDEO,
    AVMEDIA_TYPE_AUDIO,
    AVMEDIA_TYPE_DATA,          ///< Opaque data information usually continuous
    AVMEDIA_TYPE_SUBTITLE,
    AVMEDIA_TYPE_ATTACHMENT,    ///< Opaque data information usually sparse
    AVMEDIA_TYPE_NB
};
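
To make the idea of selecting a stream by type concrete, here is a simplified sketch. MediaType and find_first_stream are hypothetical names for this example. The real av_find_best_stream does more than this (it also weighs which stream is the "best" candidate), so treat this only as the basic lookup.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical mirror of the media types listed above.
enum class MediaType { Unknown = -1, Video, Audio, Data, Subtitle, Attachment };

// Return the index of the first stream of the wanted type, or -1 if none.
int find_first_stream(const std::vector<MediaType>& streams, MediaType wanted) {
    for (std::size_t i = 0; i < streams.size(); ++i) {
        if (streams[i] == wanted) return static_cast<int>(i);
    }
    return -1;
}
```

A negative result plays the same role as the audioStramIndex < 0 check above: there is no stream of that type to decode.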

5. Find the decoder: avcodec_find_decoder

AVCodec *avcodec_find_decoder(enum AVCodecID id);
Finds a decoder; it needs a codec ID.

// audioStramIndex was obtained in the previous step; with the audio stream's index we can get the audio codec parameters from pFormatContext
pCodecParameters = pFormatContext->streams[audioStramIndex]->codecpar;
pCodec = avcodec_find_decoder(pCodecParameters->codec_id);

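pFormatContext->streams is a pointer to pointers (AVStream **) that can be treated as an array. Indexing it with the audio stream index gives the audio stream; likewise, indexing it with a video stream index gives the AVStream of the video.

AVStream carries a lot of information. The codec ID needed here is reached like this: AVStream->codecpar->codec_id, where codecpar is an AVCodecParameters.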

6. Open the decoder

int avcodec_open2(AVCodecContext *avctx, const AVCodec *codec, AVDictionary **options);

The AVCodecContext argument has to be created first.

avcodec_alloc_context3

    // allocate an AVCodecContext filled with default values
    pCodecContext = avcodec_alloc_context3(pCodec);
    if (pCodecContext == NULL){
        LOGE("avcodec_alloc_context3 error");
        goto _av_resource_destry;
    }

avcodec_parameters_to_context

    // copy pCodecParameters into the codec context
    codecParametersToContextRes = avcodec_parameters_to_context(pCodecContext,pCodecParameters);
    if(codecParametersToContextRes <0){
        LOGE("avcodec_parameters_to_context error");
        goto _av_resource_destry;
    }

With the AVCodecContext prepared, open the decoder.

avcodec_open2

    codecOpenRes = avcodec_open2(pCodecContext,pCodec,NULL);
    if (codecOpenRes != 0) {
        LOGE("codec audio open error: %s", av_err2str(codecOpenRes));
        goto _av_resource_destry;
    }

avcodec_open2 initializes the AVCodecContext to use the given codec. The sample code from the API documentation looks like this:

 * avcodec_register_all();
 * av_dict_set(&opts, "b", "2.5M", 0);
 * codec = avcodec_find_decoder(AV_CODEC_ID_H264);
 * if (!codec)
 *     exit(1);
 *
 * context = avcodec_alloc_context3(codec);
 *
 * if (avcodec_open2(context, codec, opts) < 0)
 *     exit(1);

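Opening the decoder leaves us with an initialized AVCodecContext, which is used next to decode the data.

7. Read packets in a loop: av_read_frame

int av_read_frame(AVFormatContext *s, AVPacket *pkt);
It needs an AVPacket to receive the next packet of compressed data.

pPacket = av_packet_alloc();
while (av_read_frame(pFormatContext,pPacket) >=0){
    // ... steps 8 and 9 ...
}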

8. Send data to the decoder: avcodec_send_packet

int avcodec_send_packet(AVCodecContext *avctx, const AVPacket *avpkt);
Both arguments are already available, so just call it.

int codecSendPacketRes = avcodec_send_packet(pCodecContext,pPacket);

9. Get the decoded data from the decoder: avcodec_receive_frame

int avcodec_receive_frame(AVCodecContext *avctx, AVFrame *frame);
It needs an AVFrame to receive the decoded data.

pFrame = av_frame_alloc();

int codecReceiveFrameRes = avcodec_receive_frame(pCodecContext,pFrame);
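
The snippets in steps 7 to 9 receive at most one frame per packet. With a send/receive style API a decoder may produce zero or several frames for one packet, so a common pattern is to keep receiving until the decoder asks for more input. The sketch below shows that control flow with a hypothetical MockDecoder; kAgain is a stand-in for the "need more input" result, and none of these names are FFmpeg's.

```cpp
#include <deque>
#include <vector>

constexpr int kAgain = -1;  // stand-in for a "send more input first" result

// Hypothetical decoder that mimics the send/receive contract: each packet
// carries a number of frames, and receive_frame hands them out one by one.
class MockDecoder {
public:
    int send_packet(int frames_in_packet) {
        for (int i = 0; i < frames_in_packet; ++i) pending_.push_back(next_id_++);
        return 0;
    }
    // Returns 0 and writes a frame id, or kAgain when more input is needed.
    int receive_frame(int* frame_id) {
        if (pending_.empty()) return kAgain;
        *frame_id = pending_.front();
        pending_.pop_front();
        return 0;
    }
private:
    std::deque<int> pending_;
    int next_id_ = 0;
};

// Send every packet, then drain all frames it produced before sending more.
std::vector<int> decode_all(MockDecoder& dec, const std::vector<int>& packets) {
    std::vector<int> frames;
    for (int frames_in_packet : packets) {
        if (dec.send_packet(frames_in_packet) != 0) continue;
        int id = 0;
        while (dec.receive_frame(&id) == 0) frames.push_back(id);
    }
    return frames;
}
```

For mp3 one packet usually maps to one frame, which is why the single-receive version plays; the inner while loop is the more general shape.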

At this point we have one frame of decoded data in an AVFrame. How do we play it? It has to be converted first.

10. swr_convert

Requires #include "libswresample/swresample.h"

int swr_convert(struct SwrContext *s, uint8_t **out, int out_count, const uint8_t **in , int in_count);
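This function converts the audio samples and writes them into output buffers. The first parameter is a SwrContext that we have to initialize ourselves, the second receives the output, and the last three can be derived from the AVFrame. So the SwrContext has to be set up first.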

swr_alloc_set_opts

struct SwrContext *swr_alloc_set_opts(struct SwrContext *s, int64_t out_ch_layout, enum AVSampleFormat out_sample_fmt, int out_sample_rate, int64_t in_ch_layout, enum AVSampleFormat in_sample_fmt, int in_sample_rate, int log_offset, void *log_ctx);
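This function creates the SwrContext. It takes quite a few parameters. The returned context still has to be initialized with swr_init before it is used, as the complete code at the end does.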

    // parameter declarations
    #define AUDIO_SAMPLE_RATE 44100

    int64_t out_ch_layout;
    int out_sample_rate;
    enum AVSampleFormat out_sample_fmt;
    int64_t in_ch_layout;
    enum AVSampleFormat in_sample_fmt;
    int in_sample_rate;
    int swrInitRes;

    // assign the values
    out_ch_layout = AV_CH_LAYOUT_STEREO;
    out_sample_fmt = AVSampleFormat::AV_SAMPLE_FMT_S16;
    out_sample_rate = AUDIO_SAMPLE_RATE;
    in_ch_layout = pCodecContext->channel_layout;
    in_sample_fmt = pCodecContext->sample_fmt;
    in_sample_rate = pCodecContext->sample_rate;

    swrContext = swr_alloc_set_opts(NULL, out_ch_layout, out_sample_fmt,
                                    out_sample_rate, in_ch_layout, in_sample_fmt, in_sample_rate, 0, NULL);
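
The output rate here is fixed at 44100 Hz while the input rate comes from the file, so the two can differ. When they do, the number of output samples per frame is no longer equal to the number of input samples. The sketch below shows the arithmetic for an upper bound. max_output_samples is a hypothetical helper for this example, and it ignores any samples the resampler may still be holding internally.

```cpp
#include <cstdint>

// Upper bound on output samples per channel when converting in_samples
// from in_rate to out_rate, rounded up. Resampler delay is ignored here.
constexpr std::int64_t max_output_samples(std::int64_t in_samples,
                                          int in_rate, int out_rate) {
    return (in_samples * out_rate + in_rate - 1) / in_rate;
}
```

For example, 1152 input samples at 48000 Hz become at most 1059 samples at 44100 Hz, while a 44100 Hz input keeps all 1152.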

Back to the data conversion with swr_convert:

      // convert the samples into the output buffer; requires libswresample/swresample.h
       swr_convert(swrContext, &resampleOutBuffer, pFrame->nb_samples,
           (const uint8_t **) pFrame->data, pFrame->nb_samples);
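
Decoders often output planar samples (one buffer per channel), while 16-bit PCM playback expects interleaved samples. To show what the format part of such a conversion involves, here is a minimal sketch that turns planar float samples into interleaved signed 16-bit samples. planar_float_to_interleaved_s16 is a hypothetical function for this example; it assumes all channels have the same length, does no sample-rate conversion, and is not how libswresample is implemented.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Convert planar float samples (one vector per channel, nominal range
// [-1, 1]) into interleaved signed 16-bit samples: L0 R0 L1 R1 ...
std::vector<std::int16_t> planar_float_to_interleaved_s16(
        const std::vector<std::vector<float>>& planes) {
    std::vector<std::int16_t> out;
    if (planes.empty()) return out;
    const std::size_t channels = planes.size();
    const std::size_t samples = planes[0].size();
    out.resize(channels * samples);
    for (std::size_t i = 0; i < samples; ++i) {
        for (std::size_t ch = 0; ch < channels; ++ch) {
            // Clamp first so out-of-range input cannot overflow int16_t.
            float v = std::max(-1.0f, std::min(1.0f, planes[ch][i]));
            out[i * channels + ch] =
                static_cast<std::int16_t>(std::lrint(v * 32767.0f));
        }
    }
    return out;
}
```

The interleaved result is the layout that a byte[] written to a stereo, 16-bit AudioTrack is expected to have.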

memcpy

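swr_convert leaves the converted samples in a native buffer, resampleOutBuffer. To hand them to Java they have to be copied with memcpy into the memory that backs a Java byte array (a jbyte is a Java byte). jPcmData must point to that memory before the copy; it is obtained from the byte array further below.

jbyte *jPcmData;   // must point at the byte array's elements before memcpy
int dataSize;      // number of bytes to copy
memcpy(jPcmData, resampleOutBuffer, dataSize);

dataSize is computed with av_get_channel_layout_nb_channels and av_samples_get_buffer_size.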

av_get_channel_layout_nb_channels

Returns the number of channels in a channel layout.

outChannels = av_get_channel_layout_nb_channels(out_ch_layout); // number of channels

av_samples_get_buffer_size

Computes the final data size from the number of channels, the sample format and the frame size.

    dataSize = av_samples_get_buffer_size(NULL, outChannels, pCodecParameters->frame_size,out_sample_fmt, 0);
    // resampleOutBuffer, used above, is allocated with exactly this size
    resampleOutBuffer = (uint8_t *) malloc(dataSize);
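
The size calculation boils down to simple arithmetic. Here is a sketch of it without any alignment padding. pcm_buffer_bytes is a hypothetical helper for this example, not an FFmpeg function.

```cpp
#include <cstddef>

// Bytes needed for interleaved PCM: samples per channel x channels x
// bytes per sample. No alignment padding is applied in this sketch.
constexpr std::size_t pcm_buffer_bytes(std::size_t samples_per_channel,
                                       std::size_t channels,
                                       std::size_t bytes_per_sample) {
    return samples_per_channel * channels * bytes_per_sample;
}
```

An MPEG-1 Layer III frame holds 1152 samples per channel, so stereo 16-bit output needs 1152 x 2 x 2 = 4608 bytes per frame.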

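The copied data consists of jbyte values (Java bytes), and AudioTrack's write method takes a byte[]:
public int write(@NonNull byte[] audioData, int offsetInBytes, int sizeInBytes) { return write(audioData, offsetInBytes, sizeInBytes, WRITE_BLOCKING); }

So the PCM data has to end up in a jbyteArray. Create the array, get a native pointer to its elements, copy the data into it, and release the pointer so the data is written back:

jbyteArray jPcmDataArray = env->NewByteArray(dataSize);
// get a native pointer to the array's elements (this may be a copy)
jPcmData = env->GetByteArrayElements(jPcmDataArray, NULL);
// copy the converted PCM into the array's elements
memcpy(jPcmData, resampleOutBuffer, dataSize);
// mode 0: write the contents back to the jbyteArray and release the native pointer
env->ReleaseByteArrayElements(jPcmDataArray, jPcmData, 0);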

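That completes the pipeline: the FFmpeg API decodes the audio into PCM data, which is then converted into a form AudioTrack accepts.
Then call AudioTrack's write method to play it.

env->CallIntMethod(audioTrack, jAudioTrackWriteMid, jPcmDataArray, 0, dataSize);

Creating the AudioTrack is covered next.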

III. Creating the AudioTrack

// kept in global variables for now; to be refactored later
jmethodID jAudioTrackWriteMid;
jobject audioTrack;

/**
 * Creates the Java AudioTrack.
 * @param env the JNI environment
 * @return the new AudioTrack object
 */
jobject initAudioTrack(JNIEnv *env){
    jclass jAudioTrackClass = env->FindClass("android/media/AudioTrack");
    // constructors are looked up under the special name "<init>"
    jmethodID jAudioTrackCMid = env->GetMethodID(jAudioTrackClass,"<init>","(IIIIII)V");

    //  public static final int STREAM_MUSIC = 3;
    int streamType = 3;
    int sampleRateInHz = 44100;
    // public static final int CHANNEL_OUT_STEREO = (CHANNEL_OUT_FRONT_LEFT | CHANNEL_OUT_FRONT_RIGHT);
    int channelConfig = (0x4 | 0x8);
    // public static final int ENCODING_PCM_16BIT = 2;
    int audioFormat = 2;
    // getMinBufferSize(int sampleRateInHz, int channelConfig, int audioFormat)
    jmethodID jGetMinBufferSizeMid = env->GetStaticMethodID(jAudioTrackClass, "getMinBufferSize", "(III)I");
    int bufferSizeInBytes = env->CallStaticIntMethod(jAudioTrackClass, jGetMinBufferSizeMid, sampleRateInHz, channelConfig, audioFormat);
    // public static final int MODE_STREAM = 1;
    int mode = 1;

    // create the AudioTrack
    jobject jAudioTrack = env->NewObject(jAudioTrackClass,jAudioTrackCMid, streamType, sampleRateInHz, channelConfig, audioFormat, bufferSizeInBytes, mode);

    // play method
    jmethodID jPlayMid = env->GetMethodID(jAudioTrackClass,"play","()V");
    env->CallVoidMethod(jAudioTrack,jPlayMid);

    // write method
    jAudioTrackWriteMid = env->GetMethodID(jAudioTrackClass, "write", "([BII)I");

    return jAudioTrack;

}

Initializing the AudioTrack means calling Java code from C; it is equivalent to new AudioTrack(...) in the Java layer. Initialize it once before playback, then pass the PCM data to its write method to play.

    // create the Java AudioTrack
    audioTrack = initAudioTrack(env);
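
The strings "(IIIIII)V", "(III)I" and "([BII)I" above are JNI method descriptors: the parameter types in parentheses, followed by the return type. I stands for int, [B for byte[] and V for void. The small sketch below assembles such descriptors so the pattern is easier to see. jni_descriptor is a hypothetical helper for this example and is not part of JNI.

```cpp
#include <string>
#include <vector>

// Build a JNI method descriptor from already-encoded type codes,
// e.g. "I" for int, "[B" for byte[], "V" for void.
std::string jni_descriptor(const std::vector<std::string>& params,
                           const std::string& ret) {
    std::string d = "(";
    for (const std::string& p : params) d += p;
    d += ")";
    d += ret;
    return d;
}
```

Six "I" parameters with return type "V" give the constructor descriptor used above, and "[B", "I", "I" with return type "I" give the descriptor of write.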

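IV. Complete code

#include <jni.h>          // JNIEnv, jobject, jbyteArray
#include <android/log.h>  // __android_log_print
#include <cstdlib>        // malloc, free
#include <cstring>        // memcpy

// FFmpeg is written in C, so its headers need C linkage
extern "C"{
#include "libavformat/avformat.h"
#include "libswresample/swresample.h"
};

using namespace std;
#define TAG "JNI_TAG"
#define LOGE(...) __android_log_print(ANDROID_LOG_ERROR,TAG,__VA_ARGS__)
#define AUDIO_SAMPLE_RATE 44100

// kept in global variables for now; to be refactored later
jmethodID jAudioTrackWriteMid;
jobject audioTrack;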

/**
 * Creates the Java AudioTrack.
 * @param env the JNI environment
 * @return the new AudioTrack object
 */
jobject initAudioTrack(JNIEnv *env){
    jclass jAudioTrackClass = env->FindClass("android/media/AudioTrack");
    // constructors are looked up under the special name "<init>"
    jmethodID jAudioTrackCMid = env->GetMethodID(jAudioTrackClass,"<init>","(IIIIII)V");

    //  public static final int STREAM_MUSIC = 3;
    int streamType = 3;
    int sampleRateInHz = 44100;
    // public static final int CHANNEL_OUT_STEREO = (CHANNEL_OUT_FRONT_LEFT | CHANNEL_OUT_FRONT_RIGHT);
    int channelConfig = (0x4 | 0x8);
    // public static final int ENCODING_PCM_16BIT = 2;
    int audioFormat = 2;
    // getMinBufferSize(int sampleRateInHz, int channelConfig, int audioFormat)
    jmethodID jGetMinBufferSizeMid = env->GetStaticMethodID(jAudioTrackClass, "getMinBufferSize", "(III)I");
    int bufferSizeInBytes = env->CallStaticIntMethod(jAudioTrackClass, jGetMinBufferSizeMid, sampleRateInHz, channelConfig, audioFormat);
    // public static final int MODE_STREAM = 1;
    int mode = 1;

    // create the AudioTrack
    jobject jAudioTrack = env->NewObject(jAudioTrackClass,jAudioTrackCMid, streamType, sampleRateInHz, channelConfig, audioFormat, bufferSizeInBytes, mode);

    // play method
    jmethodID jPlayMid = env->GetMethodID(jAudioTrackClass,"play","()V");
    env->CallVoidMethod(jAudioTrack,jPlayMid);

    // write method
    jAudioTrackWriteMid = env->GetMethodID(jAudioTrackClass, "write", "([BII)I");

    return jAudioTrack;

}


extern "C"
JNIEXPORT void JNICALL
Java_com_lanshifu_ffmpegdemo_player_MusicPlayer_nativePlay(JNIEnv *env, jobject instance,
                                                           jstring url_) {
    const char *url = env->GetStringUTFChars(url_, 0);

    AVFormatContext *pFormatContext = NULL;
    AVCodecParameters *pCodecParameters = NULL;
    AVCodec *pCodec = NULL;
    int formatFindStreamInfoRes = 0;
    int audioStramIndex = 0;
    AVCodecContext *pCodecContext = NULL;
    int codecParametersToContextRes = -1;
    int codecOpenRes = -1;
    AVPacket *pPacket = NULL;
    AVFrame *pFrame = NULL;
    int index = 0;

    int outChannels;
    int dataSize;

    uint8_t *resampleOutBuffer;
    jbyte *jPcmData;
    SwrContext *swrContext = NULL;

    int64_t out_ch_layout;
    int out_sample_rate;
    enum AVSampleFormat out_sample_fmt;
    int64_t in_ch_layout;
    enum AVSampleFormat in_sample_fmt;
    int in_sample_rate;
    int swrInitRes;


    ///1. Register all components; in older FFmpeg versions muxers, demuxers and codecs are only usable after this call
    av_register_all();
    ///2. Open the file
    int open_input_result = avformat_open_input(&pFormatContext,url,NULL,NULL);
    if (open_input_result != 0){
        LOGE("format open input error: %s", av_err2str(open_input_result));
        goto _av_resource_destry;
    }

    ///3. Fill the stream information into pFormatContext
    formatFindStreamInfoRes = avformat_find_stream_info(pFormatContext, NULL);
    if (formatFindStreamInfoRes < 0) {
        LOGE("format find stream info error: %s", av_err2str(formatFindStreamInfoRes));
        goto _av_resource_destry;
    }

    ///4. Find the index of the audio stream; the audio is processed through this index later
    audioStramIndex = av_find_best_stream(pFormatContext, AVMediaType::AVMEDIA_TYPE_AUDIO, -1, -1,NULL, 0);
    if (audioStramIndex < 0) {
        LOGE("format audio stream error:");
        goto _av_resource_destry;
    }


    ///5. Find the decoder
    // audioStramIndex was obtained in the previous step; with the audio stream's index we can get the audio codec parameters from pFormatContext
    pCodecParameters = pFormatContext->streams[audioStramIndex]->codecpar;
    pCodec = avcodec_find_decoder(pCodecParameters->codec_id);

    LOGE("sample rate: %d", pCodecParameters->sample_rate);
    LOGE("channels: %d", pCodecParameters->channels);
    LOGE("format: %d", pCodecParameters->format);

    if (pCodec == NULL) {
        LOGE("codec find audio decoder error");
        goto _av_resource_destry;
    }

    ///6. Open the decoder
    // allocate an AVCodecContext filled with default values
    pCodecContext = avcodec_alloc_context3(pCodec);
    if (pCodecContext == NULL){
        LOGE("avcodec_alloc_context3 error");
        goto _av_resource_destry;
    }
    // copy pCodecParameters into the codec context
    codecParametersToContextRes = avcodec_parameters_to_context(pCodecContext,pCodecParameters);
    if(codecParametersToContextRes <0){
        LOGE("avcodec_parameters_to_context error");
        goto _av_resource_destry;
    }
    // initialize the codec context with the decoder
    codecOpenRes = avcodec_open2(pCodecContext,pCodec,NULL);
    if (codecOpenRes != 0) {
        LOGE("codec audio open error: %s", av_err2str(codecOpenRes));
        goto _av_resource_destry;
    }

    /// pCodecContext is now fully initialized and can be used to decode the data

    pPacket = av_packet_alloc();
    pFrame = av_frame_alloc();

    /// create the Java AudioTrack
    audioTrack = initAudioTrack(env);

    // ---------- resampling: build the swrContext parameters, start ----------
    out_ch_layout = AV_CH_LAYOUT_STEREO;
    out_sample_fmt = AVSampleFormat::AV_SAMPLE_FMT_S16;
    out_sample_rate = AUDIO_SAMPLE_RATE;
    in_ch_layout = pCodecContext->channel_layout;
    in_sample_fmt = pCodecContext->sample_fmt;
    in_sample_rate = pCodecContext->sample_rate;
    swrContext = swr_alloc_set_opts(NULL, out_ch_layout, out_sample_fmt,
                                    out_sample_rate, in_ch_layout, in_sample_fmt, in_sample_rate, 0, NULL);
    if (swrContext == NULL) {
        // report the error
        LOGE("swr_alloc_set_opts error");
        goto _av_resource_destry;
    }
    swrInitRes = swr_init(swrContext);
    if (swrInitRes < 0) {
        LOGE("swr_init error");
        goto _av_resource_destry;
    }
    // ---------- resampling: build the swrContext parameters, end ----------


    // dataSize is the size of the final output that is handed to the player
    outChannels = av_get_channel_layout_nb_channels(out_ch_layout); // number of channels
    dataSize = av_samples_get_buffer_size(NULL, outChannels, pCodecParameters->frame_size,out_sample_fmt, 0);
    resampleOutBuffer = (uint8_t *) malloc(dataSize);

    // play frame by frame in a while loop
    while (av_read_frame(pFormatContext,pPacket) >=0){
        // a packet holds compressed data that is decoded into PCM
        // skip packets that do not belong to the audio stream
        if (pPacket->stream_index != audioStramIndex) {
            av_packet_unref(pPacket); // release the skipped packet's data as well
            continue;
        }

        // send the compressed packet to the decoder
        int codecSendPacketRes = avcodec_send_packet(pCodecContext,pPacket);
        if (codecSendPacketRes == 0){
            // receive the decoded data into pFrame
            int codecReceiveFrameRes = avcodec_receive_frame(pCodecContext,pFrame);
            if(codecReceiveFrameRes == 0){
                index++;

                // convert the samples into resampleOutBuffer; requires libswresample/swresample.h
                swr_convert(swrContext, &resampleOutBuffer, pFrame->nb_samples,
                            (const uint8_t **) pFrame->data, pFrame->nb_samples);

                // create the Java byte[] and get a native pointer to its elements
                jbyteArray jPcmDataArray = env->NewByteArray(dataSize);
                jPcmData = env->GetByteArrayElements(jPcmDataArray, NULL);
                // copy the PCM data into the array's elements
                memcpy(jPcmData, resampleOutBuffer, dataSize);
                // mode 0: write the contents back to the jbyteArray and release the native pointer
                env->ReleaseByteArrayElements(jPcmDataArray, jPcmData, 0);

                ///public int write(@NonNull byte[] audioData, int offsetInBytes, int sizeInBytes) {}
                env->CallIntMethod(audioTrack, jAudioTrackWriteMid, jPcmDataArray, 0, dataSize);

                LOGE("decoded frame %d, dataSize = %d", index , dataSize);

                // drop the local reference to jPcmDataArray so the Java GC can reclaim it
                env->DeleteLocalRef(jPcmDataArray);

            }
        }

        // release the data referenced by the packet and the frame
        av_packet_unref(pPacket);
        av_frame_unref(pFrame);
    }

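    // the output buffer is no longer needed once the loop has finished
    free(resampleOutBuffer);

    _av_resource_destry:
    /// av_packet_free: 1. unreferences the data, 2. frees the pPacket struct, 3. sets pPacket = NULL
    // the free functions below accept NULL pointers, so they are safe on every error path
    av_frame_free(&pFrame);
    av_packet_free(&pPacket);
    swr_free(&swrContext);
    avcodec_free_context(&pCodecContext);
    if (pFormatContext != NULL){
        // closes the input, frees the context and sets the pointer to NULL
        avformat_close_input(&pFormatContext);
    }

    env->ReleaseStringUTFChars(url_, url);
}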

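V. Summary

This article covered two topics:

The basic workflow of the FFmpeg API
Calling Java's AudioTrack from C++; the code comments explain it, so it was not covered in detail

There are still plenty of details worth thinking about, for example:
noise when playing an mp3, memory that keeps growing during playback, and code that needs to be encapsulated

Reference: https://www.jianshu.com/p/d8300535bbf0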