FFmpeg audio encoding: encoding PCM to AAC

Extracting the test files:

s16 format

ffmpeg -i buweishui.aac -ar 48000 -ac 2 -f s16le 48000_2_s16le.pcm

flt format

ffmpeg -i buweishui.aac -ar 48000 -ac 2 -f f32le 48000_2_f32le.pcm

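The ffmpeg command line writes raw PCM in packed (interleaved) layout only, so if the encoder requires fltp input, the data must be converted to planar before encoding.

Program flow for encoding PCM to AAC with FFmpeg

Read PCM data from a local file, encode it as AAC, and write the encoded AAC data to a local file.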


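Key functions:

avcodec_find_encoder: finds a registered encoder by its AVCodecID.
avcodec_alloc_context3: allocates an AVCodecContext.
avcodec_open2: opens the encoder.
avcodec_send_frame: sends an uncompressed AVFrame to the encoder.
avcodec_receive_packet: retrieves an encoded AVPacket; the caller is responsible for releasing the received packet.
av_frame_get_buffer: allocates new buffers for an audio or video frame. Before calling it, the following fields must be set on the AVFrame: format (pixel format for video, sample format for audio), nb_samples (number of samples, audio only), channel_layout (audio only), and width/height (video only).
av_frame_make_writable: ensures the AVFrame is writable. If the frame is not writable (for example because the encoder still holds a reference to its buffers), av_frame_make_writable() allocates new buffers and copies the frame data into them, so that writing the next input does not conflict with data the encoder has cached. The cost is that, in the worst case, the whole frame is copied just before you overwrite it.
av_samples_fill_arrays: fills the data pointers and linesize of an audio frame from a buffer of samples.

Flushing the encoder:

The usual way to flush an encoder is to call avcodec_send_frame(ctx, NULL) once (it returns success), then call avcodec_receive_packet() repeatedly until it returns AVERROR_EOF, which drains all buffered packets. The call that returns AVERROR_EOF carries no valid data; it is only an end-of-stream marker.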

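PCM sample formats

PCM (Pulse Code Modulation) audio data is a raw stream of uncompressed audio samples: standard digital audio obtained from an analog signal by sampling, quantization, and encoding.

Six parameters describe PCM data:

Sample Rate: the sampling frequency, e.g. 8 kHz (telephone), 44.1 kHz (CD), 48 kHz (DVD).
Sample Size: the number of quantization bits, usually 16.
Number of Channels: the channel count. Common audio is either stereo or mono; stereo has a left and a right channel. Less common types such as surround sound also exist.
Sign: whether the samples are signed. For one-byte samples, the signed range is -128 to 127 and the unsigned range is 0 to 255. Signed 16-bit samples range from -32768 to 32767.
Byte Ordering: little-endian or big-endian. Little-endian is the usual choice.
Integer Or Floating Point: most PCM formats use integers, while applications that need higher precision use floating point (float samples nominally lie in [-1.0, 1.0]).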
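
The first three parameters fix the data rate of a raw PCM stream. As a rough sketch (the helper names pcm_bytes_per_second and pcm_duration_ms are made up for this illustration and are not part of FFmpeg), the size and duration of a PCM file can be estimated like this:

```c
#include <stdint.h>

/* Hypothetical helper: bytes needed for one second of PCM audio.
 * bits_per_sample is assumed to be a multiple of 8 (8, 16, 24, 32, 64). */
uint64_t pcm_bytes_per_second(uint32_t sample_rate, uint32_t bits_per_sample,
                              uint32_t channels) {
    return (uint64_t) sample_rate * (bits_per_sample / 8) * channels;
}

/* Hypothetical helper: duration in milliseconds of total_bytes of raw PCM.
 * Returns 0 if the parameters describe an empty stream. */
uint64_t pcm_duration_ms(uint64_t total_bytes, uint32_t sample_rate,
                         uint32_t bits_per_sample, uint32_t channels) {
    uint64_t bps = pcm_bytes_per_second(sample_rate, bits_per_sample, channels);
    return bps ? total_bytes * 1000 / bps : 0;
}
```

For the test files above, 48 kHz stereo s16le comes to 192000 bytes per second and 48 kHz stereo f32le to 384000 bytes per second.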

PCM playback tools:

ffplay

Example usage:

# Play PCM data in f32le format, 2 channels, 48000 Hz sample rate
ffplay -f f32le -ac 2 -ar 48000 pcm_audio

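Audacity: a free, open-source, cross-platform audio editor.
Adobe Audition: import the raw data; when opening the file you need to choose the sample rate, format, and byte order.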


PCM formats supported by FFmpeg

Run ffmpeg -formats to list the audio and video formats ffmpeg supports; the PCM formats can be filtered out of that list.

ffmpeg -formats | grep PCM

 DE alaw            PCM A-law
 DE f32be           PCM 32-bit floating-point big-endian
 DE f32le           PCM 32-bit floating-point little-endian
 DE f64be           PCM 64-bit floating-point big-endian
 DE f64le           PCM 64-bit floating-point little-endian
 DE mulaw           PCM mu-law
 DE s16be           PCM signed 16-bit big-endian
 DE s16le           PCM signed 16-bit little-endian
 DE s24be           PCM signed 24-bit big-endian
 DE s24le           PCM signed 24-bit little-endian
 DE s32be           PCM signed 32-bit big-endian
 DE s32le           PCM signed 32-bit little-endian
 DE s8              PCM signed 8-bit
 DE u16be           PCM unsigned 16-bit big-endian
 DE u16le           PCM unsigned 16-bit little-endian
 DE u24be           PCM unsigned 24-bit big-endian
 DE u24le           PCM unsigned 24-bit little-endian
 DE u32be           PCM unsigned 32-bit big-endian
 DE u32le           PCM unsigned 32-bit little-endian
 DE u8              PCM unsigned 8-bit
 DE vidc            PCM Archimedes VIDC

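s means signed, u means unsigned, and f means floating point.

be means big-endian and le means little-endian.

Packed versus planar PCM data in FFmpeg

Most audio and video data in FFmpeg can be stored in one of two layouts, packed or planar. For stereo audio, packed interleaves the samples of the two channels, while planar stores each channel separately. With L and R each denoting one sample of the left and right channel, the layouts look like this: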

Packed: L R L R L R L R
Planar: L L L L ... R R R R...

Packed formats

AV_SAMPLE_FMT_U8, ///< unsigned 8 bits 
AV_SAMPLE_FMT_S16, ///< signed 16 bits 
AV_SAMPLE_FMT_S32, ///< signed 32 bits 
AV_SAMPLE_FMT_FLT, ///< float 
AV_SAMPLE_FMT_DBL, ///< double

Packed data lives only in uint8_t *data[0] of the AVFrame.

The samples are laid out as follows:

LRLRLR ...

Planar formats

Planar is the sample layout FFmpeg uses internally for audio. Every planar format name ends with the letter P.

AV_SAMPLE_FMT_U8P,         ///< unsigned 8 bits, planar
AV_SAMPLE_FMT_S16P,        ///< signed 16 bits, planar
AV_SAMPLE_FMT_S32P,        ///< signed 32 bits, planar
AV_SAMPLE_FMT_FLTP,        ///< float, planar
AV_SAMPLE_FMT_DBLP,        ///< double, planar
AV_SAMPLE_FMT_S64,         ///< signed 64 bits (packed; it appears here only because of its position in the enum)
AV_SAMPLE_FMT_S64P,        ///< signed 64 bits, planar

plane 0: LLLLLLLLLLLLLLLLLLLLLLLLLL...

plane 1: RRRRRRRRRRRRRRRRRRRR....

plane 0 corresponds to uint8_t *data[0];

plane 1 corresponds to uint8_t *data[1];

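FFmpeg's default (native) AAC encoder does not accept AV_SAMPLE_FMT_S16 input; it supports only AV_SAMPLE_FMT_FLTP. This format is planar with float samples: each channel is stored separately, for example the left channel in data[0] and the right channel in data[1].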
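
If only an s16le file is at hand, its samples have to be both rescaled to float and split into planes before they can go to this encoder. The following is a minimal sketch, not FFmpeg code: the function name s16_packed_to_flt_planar is invented here, the 1/32768 scale is a common convention rather than something the article specifies, and the input is assumed to be already in host byte order.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: convert packed signed 16-bit samples to planar float.
 * planar must hold channels * nb_samples floats; the plane of channel c
 * starts at planar + c * nb_samples. Output values lie in [-1.0, 1.0). */
void s16_packed_to_flt_planar(const int16_t *packed, float *planar,
                              int nb_samples, int channels) {
    for (int i = 0; i < nb_samples; i++) {
        for (int c = 0; c < channels; c++) {
            /* packed index: sample i, channel c; planar index: plane c, sample i */
            planar[(size_t) c * nb_samples + i] =
                packed[(size_t) i * channels + c] / 32768.0f;
        }
    }
}
```

With two channels, the first nb_samples floats of the output are the left plane and the next nb_samples floats are the right plane, which is the same arrangement the f32le_convert_to_fltp function in the listing produces.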

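In FFmpeg, audio data after decoding and before encoding is held in an AVFrame.

Packed formats: frame.data[0] (or frame.extended_data[0]) contains all of the audio data.
Planar formats: frame.data[i] (or frame.extended_data[i]) holds the data of channel i, counting from 0. The AVFrame.data array has a fixed size of 8, so for more than 8 channels the channel data must be read through frame.extended_data.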

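Additional notes

Planar is FFmpeg's internal storage layout; the raw audio files we actually work with are all packed.
Different decoders output different sample formats. In testing, AAC decoded to the floating-point AV_SAMPLE_FMT_FLTP format, while MP3 decoded to AV_SAMPLE_FMT_S16P (the mp3 file used was 16-bit). The actual format can be read from the format field of the decoded AVFrame or the sample_fmt field of the codec's AVCodecContext.
Whether the data is planar or packed directly determines how it must be written to a file, so always check the sample format before touching the data.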
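
As an illustration of the last point, planar float data has to be interleaved before it is written to a raw file that ffplay can read with -f f32le. This is a sketch under assumptions: planar_to_packed_float is a made-up name, and planes[c] merely stands in for the per-channel pointers (such as AVFrame.data[c]) of a planar frame.

```c
#include <stddef.h>

/* Hypothetical helper: interleave planar float channels into one packed buffer.
 * planes[c] points at the nb_samples samples of channel c;
 * packed must hold nb_samples * channels floats. */
void planar_to_packed_float(const float *const *planes, float *packed,
                            int nb_samples, int channels) {
    for (int i = 0; i < nb_samples; i++) {
        for (int c = 0; c < channels; c++) {
            packed[(size_t) i * channels + c] = planes[c][i];
        }
    }
}
```

The packed buffer can then be written with a single fwrite of nb_samples * channels floats. On a little-endian host the result is f32le data; a big-endian host would produce f32be instead.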

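PCM byte order

Any discussion of byte order involves two CPU camps: the PowerPC family associated with Motorola, and Intel's x86 family. PowerPC stores data big-endian, while x86 stores data little-endian. So what exactly are big endian and little endian?

Big endian stores the most significant byte (MSB) at the lowest address; little endian stores the least significant byte (LSB) at the lowest address.

For example, the 32-bit integer 0x1234ABCD can be stored in two ways:

Byte order	Bytes in memory (low to high address)	Notes
Big Endian (BE)	0x12 0x34 0xAB 0xCD	0x12, the most significant byte (MSB), comes first
Little Endian (LE)	0xCD 0xAB 0x34 0x12	0xCD, the least significant byte (LSB), comes first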
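
The two byte orders in the table can be reproduced with shifts, which work the same on any host. The sketch below is illustrative only (store_be32, store_le32 and load_s16le are names invented here); the last helper shows how one s16le PCM sample could be decoded without depending on the CPU's own byte order.

```c
#include <stdint.h>

/* Store a 32-bit value with the most significant byte at the lowest address. */
void store_be32(uint8_t out[4], uint32_t v) {
    out[0] = (uint8_t) (v >> 24);
    out[1] = (uint8_t) (v >> 16);
    out[2] = (uint8_t) (v >> 8);
    out[3] = (uint8_t) v;
}

/* Store a 32-bit value with the least significant byte at the lowest address. */
void store_le32(uint8_t out[4], uint32_t v) {
    out[0] = (uint8_t) v;
    out[1] = (uint8_t) (v >> 8);
    out[2] = (uint8_t) (v >> 16);
    out[3] = (uint8_t) (v >> 24);
}

/* Decode one signed 16-bit little-endian sample from two bytes. */
int16_t load_s16le(const uint8_t p[2]) {
    int v = p[0] | (p[1] << 8);                 /* 0 .. 65535 */
    return (int16_t) (v >= 0x8000 ? v - 0x10000 : v);
}
```

Calling store_be32 with 0x1234ABCD yields the bytes 12 34 AB CD, and store_le32 yields CD AB 34 12, matching the table.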

PCM-to-AAC encoding: code implementation

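#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#include <libavcodec/avcodec.h>

#include <libavutil/channel_layout.h>
#include <libavutil/common.h>
#include <libavutil/frame.h>
#include <libavutil/samplefmt.h>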

/* Check whether the encoder supports the given sample format */
static int check_sample_fmt(const AVCodec *codec, enum AVSampleFormat sample_fmt) {
    const enum AVSampleFormat *p = codec->sample_fmts;

    if (!p) // the codec does not list its sample formats
        return 1;
    while (*p != AV_SAMPLE_FMT_NONE) { // the list is terminated by AV_SAMPLE_FMT_NONE
        if (*p == sample_fmt)
            return 1;
        p++;
    }
    return 0;
}

/* Check whether the encoder supports the given sample rate */
static int check_sample_rate(const AVCodec *codec, const int sample_rate) {
    const int *p = codec->supported_samplerates;
    if (!p) // not every codec lists its supported sample rates
        return 1;
    while (*p != 0) { // the list is terminated by 0, e.g. aac_sample_rates in libfdk-aacenc.c
        printf("%s support %dhz\n", codec->name, *p);
        if (*p == sample_rate)
            return 1;
        p++;
    }
    return 0;
}

/* Check whether the encoder supports the given channel layout (for reference only) */
static int check_channel_layout(const AVCodec *codec, const uint64_t channel_layout) {
    // not every codec lists its supported channel layouts
    const uint64_t *p = codec->channel_layouts;
    if (!p) {
        printf("the codec %s no set channel_layouts\n", codec->name);
        return 1;
    }
    while (*p != 0) { // the list is terminated by 0, e.g. aac_channel_layout in libfdk-aacenc.c
        printf("%s support channel_layout %llu\n", codec->name, (unsigned long long) *p);
        if (*p == channel_layout)
            return 1;
        p++;
    }
    return 0;
}


/* Build the 7-byte ADTS header (no CRC) for one AAC frame of aac_length bytes */
static void get_adts_header(AVCodecContext *ctx, uint8_t *adts_header, int aac_length) {
    uint8_t freq_idx = 0;    //0: 96000 Hz  3: 48000 Hz 4: 44100 Hz
    switch (ctx->sample_rate) {
        case 96000:
            freq_idx = 0;
            break;
        case 88200:
            freq_idx = 1;
            break;
        case 64000:
            freq_idx = 2;
            break;
        case 48000:
            freq_idx = 3;
            break;
        case 44100:
            freq_idx = 4;
            break;
        case 32000:
            freq_idx = 5;
            break;
        case 24000:
            freq_idx = 6;
            break;
        case 22050:
            freq_idx = 7;
            break;
        case 16000:
            freq_idx = 8;
            break;
        case 12000:
            freq_idx = 9;
            break;
        case 11025:
            freq_idx = 10;
            break;
        case 8000:
            freq_idx = 11;
            break;
        case 7350:
            freq_idx = 12;
            break;
        default:
            freq_idx = 4;
            break;
    }
    uint8_t chanCfg = ctx->channels;
    uint32_t frame_length = aac_length + 7;
    adts_header[0] = 0xFF;
    adts_header[1] = 0xF1;
    adts_header[2] = ((ctx->profile) << 6) + (freq_idx << 2) + (chanCfg >> 2);
    adts_header[3] = (((chanCfg & 3) << 6) + (frame_length >> 11));
    adts_header[4] = ((frame_length & 0x7FF) >> 3);
    adts_header[5] = (((frame_length & 7) << 5) + 0x1F);
    adts_header[6] = 0xFC;
}

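/*
 * Send one frame to the encoder and write every packet it returns to output,
 * each prefixed with an ADTS header. Pass frame == NULL to flush the encoder.
 */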
static int encode(AVCodecContext *ctx, AVFrame *frame, AVPacket *pkt, FILE *output) {
    int ret;

    /* send the frame for encoding */
    ret = avcodec_send_frame(ctx, frame);
    if (ret < 0) {
        fprintf(stderr, "Error sending the frame to the encoder\n");
        return -1;
    }

    /* read all the available output packets (in general there may be any number of them) */
    // Encoding works like decoding: send once, then receive repeatedly until AVERROR(EAGAIN) or AVERROR_EOF
    while (ret >= 0) {
        ret = avcodec_receive_packet(ctx, pkt);
        if (ret == AVERROR(EAGAIN) || ret == AVERROR_EOF) {
            return 0;
        } else if (ret < 0) {
            fprintf(stderr, "Error encoding audio frame\n");
            return -1;
        }
        uint8_t aac_header[7];
        get_adts_header(ctx, aac_header, pkt->size);

        size_t len = 0;
        len = fwrite(aac_header, 1, 7, output);
        if (len != 7) {
            fprintf(stderr, "fwrite aac_header failed\n");
            return -1;
        }
        len = fwrite(pkt->data, 1, pkt->size, output);
        if (len != pkt->size) {
            fprintf(stderr, "fwrite aac data failed\n");
            return -1;
        }
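        /* avcodec_receive_packet() starts by calling av_packet_unref() on pkt, so the data
         * written above would be released on the next call anyway; unref it here as soon as
         * we are done with it. For the same reason pkt must not be put into a queue directly:
         * allocate a new packet and transfer the buffer to it with av_packet_move_ref().
         */
        av_packet_unref(pkt);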
    }
    return -1;
}

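/*
 * Convert packed f32le samples to planar float (fltp). Only 2 channels are supported.
 */
void f32le_convert_to_fltp(float *f32le, float *fltp, int nb_samples) {
    float *fltp_l = fltp;   // left channel
    float *fltp_r = fltp + nb_samples;   // right channel
    for (int i = 0; i < nb_samples; i++) {
        fltp_l[i] = f32le[i * 2];     // packed pairs: (0 1) (2 3) ...
        fltp_r[i] = f32le[i * 2 + 1];   // try commenting out one channel and listen to the result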
    }
}

int main(int argc, char **argv) {
    char *in_pcm_file = NULL;
    char *out_aac_file = NULL;
    FILE *infile = NULL;
    FILE *outfile = NULL;
    const AVCodec *codec = NULL;
    AVCodecContext *codec_ctx = NULL;
    AVFrame *frame = NULL;
    AVPacket *pkt = NULL;
    int ret = 0;
    int force_codec = 0;     // force a specific encoder
    char *codec_name = NULL;

    if (argc < 3) {
        fprintf(stderr, "Usage: %s <input.pcm> <output.aac> [codec_name], argc:%d\n",
                argv[0], argc);
        return 0;
    }
    in_pcm_file = argv[1];      // input PCM file
    out_aac_file = argv[2];     // output AAC file

    enum AVCodecID codec_id = AV_CODEC_ID_AAC;

    if (4 == argc) {
        if (strcmp(argv[3], "libfdk_aac") == 0) {
            force_codec = 1;     // force libfdk_aac
            codec_name = "libfdk_aac";
        } else if (strcmp(argv[3], "aac") == 0) {
            force_codec = 1;
            codec_name = "aac";
        }
    }
    if (force_codec)
        printf("force codec name: %s\n", codec_name);
    else
        printf("default codec name: %s\n", "aac");

    if (force_codec == 0) { // no encoder was forced
        codec = avcodec_find_encoder(codec_id); // lookup by ID returns the default AAC encoder (aacenc.c)
    } else {
        // look up the encoder by name (the name field of AVCodec)
        codec = avcodec_find_encoder_by_name(codec_name);
    }
    if (!codec) {
        fprintf(stderr, "Codec not found\n");
        exit(1);
    }

    codec_ctx = avcodec_alloc_context3(codec);
    if (!codec_ctx) {
        fprintf(stderr, "Could not allocate audio codec context\n");
        exit(1);
    }
    codec_ctx->codec_id = codec_id;
    codec_ctx->codec_type = AVMEDIA_TYPE_AUDIO;
    codec_ctx->bit_rate = 128 * 1024;
    codec_ctx->channel_layout = AV_CH_LAYOUT_STEREO;
    codec_ctx->sample_rate = 48000;
    codec_ctx->channels = av_get_channel_layout_nb_channels(codec_ctx->channel_layout);
    codec_ctx->profile = FF_PROFILE_AAC_LOW;

    if (strcmp(codec->name, "aac") == 0) {
        codec_ctx->sample_fmt = AV_SAMPLE_FMT_FLTP;
    } else if (strcmp(codec->name, "libfdk_aac") == 0) {
        codec_ctx->sample_fmt = AV_SAMPLE_FMT_S16;
    } else {
        codec_ctx->sample_fmt = AV_SAMPLE_FMT_FLTP;
    }

    /* Check that the encoder supports the chosen parameters */
    if (!check_sample_fmt(codec, codec_ctx->sample_fmt)) {
        fprintf(stderr, "Encoder does not support sample format %s\n",
                av_get_sample_fmt_name(codec_ctx->sample_fmt));
        exit(1);
    }
    if (!check_sample_rate(codec, codec_ctx->sample_rate)) {
        fprintf(stderr, "Encoder does not support sample rate %d\n", codec_ctx->sample_rate);
        exit(1);
    }
    if (!check_channel_layout(codec, codec_ctx->channel_layout)) {
        fprintf(stderr, "Encoder does not support channel layout %llu\n",
                (unsigned long long) codec_ctx->channel_layout);
        exit(1);
    }

    printf("\n\nAudio encode config\n");
    printf("bit_rate:%lldkbps\n", (long long) (codec_ctx->bit_rate / 1024));
    printf("sample_rate:%d\n", codec_ctx->sample_rate);
    printf("sample_fmt:%s\n", av_get_sample_fmt_name(codec_ctx->sample_fmt));
    printf("channels:%d\n", codec_ctx->channels);
    // frame_size is only set once avcodec_open2() has run
    printf("1 frame_size:%d\n", codec_ctx->frame_size);
    /* Open the encoder, binding it to the codec context */
    if (avcodec_open2(codec_ctx, codec, NULL) < 0) {
        fprintf(stderr, "Could not open codec\n");
        exit(1);
    }
    printf("2 frame_size:%d\n\n", codec_ctx->frame_size); // how many samples per channel to send each time
    // Open the input and output files
    infile = fopen(in_pcm_file, "rb");
    if (!infile) {
        fprintf(stderr, "Could not open %s\n", in_pcm_file);
        exit(1);
    }
    outfile = fopen(out_aac_file, "wb");
    if (!outfile) {
        fprintf(stderr, "Could not open %s\n", out_aac_file);
        exit(1);
    }

    /* packet for holding encoded output */
    pkt = av_packet_alloc();
    if (!pkt) {
        fprintf(stderr, "could not allocate the packet\n");
        exit(1);
    }

    /* frame containing input raw audio */
    frame = av_frame_alloc();
    if (!frame) {
        fprintf(stderr, "Could not allocate audio frame\n");
        exit(1);
    }
    /* The amount of data sent to the encoder each time is determined by three things:
     *  (1) frame_size (samples per channel in each frame);
     *  (2) sample_fmt (sample format);
     *  (3) channel_layout (channel layout).
     */
    frame->nb_samples = codec_ctx->frame_size;
    frame->format = codec_ctx->sample_fmt;
    frame->channel_layout = codec_ctx->channel_layout;
    frame->channels = av_get_channel_layout_nb_channels(frame->channel_layout);
    printf("frame nb_samples:%d\n", frame->nb_samples);
    printf("frame sample_fmt:%d\n", frame->format);
    printf("frame channel_layout:%llu\n\n", (unsigned long long) frame->channel_layout);
    /* Allocate the buffers for the frame */
    ret = av_frame_get_buffer(frame, 0);
    if (ret < 0) {
        fprintf(stderr, "Could not allocate audio data buffers\n");
        exit(1);
    }
    // Bytes per frame = bytes per sample * number of channels * samples per frame
    int frame_bytes = av_get_bytes_per_sample(frame->format) \
     * frame->channels \
     * frame->nb_samples;
    printf("frame_bytes %d\n", frame_bytes);
    uint8_t *pcm_buf = (uint8_t *) malloc(frame_bytes);
    if (!pcm_buf) {
        printf("pcm_buf malloc failed\n");
        return 1;
    }
    uint8_t *pcm_temp_buf = (uint8_t *) malloc(frame_bytes);
    if (!pcm_temp_buf) {
        printf("pcm_temp_buf malloc failed\n");
        return 1;
    }
    int64_t pts = 0;
    printf("start enode\n");
    for (;;) {
        memset(pcm_buf, 0, frame_bytes);
        size_t read_bytes = fread(pcm_buf, 1, frame_bytes, infile);
        if (read_bytes == 0) { // fread returns a size_t, which is never negative
            printf("read file finish\n");
            break;
        }

        /* Make sure the frame is writable. If the encoder still holds a reference to the
         * frame's buffers, a fresh copy is made so that the data written below cannot
         * conflict with the data kept by the encoder.
         */
        ret = av_frame_make_writable(frame);
        if (ret != 0)
            printf("av_frame_make_writable failed, ret = %d\n", ret);

        if (AV_SAMPLE_FMT_S16 == frame->format) {
            // Fill the frame with the PCM data just read; the layout (planar or packed) must match the frame format
            ret = av_samples_fill_arrays(frame->data, frame->linesize,
                                         pcm_buf, frame->channels,
                                         frame->nb_samples, frame->format, 0);
        } else {
            // The local file holds packed f32le data, so convert it to planar float before filling the frame
            memset(pcm_temp_buf, 0, frame_bytes);
            f32le_convert_to_fltp((float *) pcm_buf, (float *) pcm_temp_buf, frame->nb_samples);
            ret = av_samples_fill_arrays(frame->data, frame->linesize,
                                         pcm_temp_buf, frame->channels,
                                         frame->nb_samples, frame->format, 0);
        }

        // Set pts in units of 1/sample_rate, starting at 0; seconds = pts / sample_rate
        frame->pts = pts;
        pts += frame->nb_samples;
        ret = encode(codec_ctx, frame, pkt, outfile);
        if (ret < 0) {
            printf("encode failed\n");
            break;
        }
    }

    /* Flush the encoder */
    encode(codec_ctx, NULL, pkt, outfile);

    // Close the files
    fclose(infile);
    fclose(outfile);

    // Free memory
    if (pcm_buf) {
        free(pcm_buf);
    }
    if (pcm_temp_buf) {
        free(pcm_temp_buf);
    }
    av_frame_free(&frame);
    av_packet_free(&pkt);
    avcodec_free_context(&codec_ctx);
    printf("main finish, please enter Enter and exit\n");
    getchar();
    return 0;
}

# Test example
48000_2_f32le.pcm out-encode_audio.aac

default codec name: aac
aac support 96000hz
aac support 88200hz
aac support 64000hz
aac support 48000hz
the codec aac no set channel_layouts


Audio encode config
bit_rate:128kbps
sample_rate:48000
sample_fmt:fltp
channels:2
1 frame_size:0
2 frame_size:1024

frame nb_samples:1024
frame sample_fmt:8
frame channel_layout:3

frame_bytes 8192
start enode
read file finish
main finish, please enter Enter and exit
[aac @ 0x7fafaa80be00] Qavg: 297.421
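
The bit packing done by get_adts_header in the listing can also be tried without linking FFmpeg. The following is a standalone sketch, not the article's code: the names adts_freq_index, adts_write_header and adts_frame_length are invented for this illustration, and unlike the listing it rejects an unknown sample rate instead of falling back to the 44100 Hz index. It assumes the profile value is the MPEG-4 audio object type minus 1 (1 for AAC LC) and that the channel count is used directly as the channel configuration, which only holds for 1 to 6 channels.

```c
#include <stddef.h>
#include <stdint.h>

/* Return the ADTS sampling_frequency_index for sample_rate, or -1 if unknown. */
int adts_freq_index(int sample_rate) {
    static const int rates[] = {96000, 88200, 64000, 48000, 44100, 32000, 24000,
                                22050, 16000, 12000, 11025, 8000, 7350};
    for (size_t i = 0; i < sizeof(rates) / sizeof(rates[0]); i++) {
        if (rates[i] == sample_rate)
            return (int) i;
    }
    return -1;
}

/* Fill a 7-byte ADTS header (no CRC) for a payload of payload_size bytes.
 * Returns 0 on success, -1 if a parameter does not fit the header fields. */
int adts_write_header(uint8_t h[7], int profile, int sample_rate,
                      int channels, int payload_size) {
    int freq_idx = adts_freq_index(sample_rate);
    if (freq_idx < 0 || profile < 0 || profile > 3 ||
        channels < 1 || channels > 6 ||
        payload_size < 0 || payload_size > 0x1FFF - 7)
        return -1;
    uint32_t frame_length = (uint32_t) payload_size + 7; /* header + payload, 13 bits */
    h[0] = 0xFF;                                          /* syncword, high 8 bits */
    h[1] = 0xF1;                                          /* syncword low 4 bits, MPEG-4, no CRC */
    h[2] = (uint8_t) ((profile << 6) | (freq_idx << 2) | (channels >> 2));
    h[3] = (uint8_t) (((channels & 3) << 6) | (frame_length >> 11));
    h[4] = (uint8_t) ((frame_length >> 3) & 0xFF);
    h[5] = (uint8_t) (((frame_length & 7) << 5) | 0x1F);  /* buffer fullness 0x7FF, high bits */
    h[6] = 0xFC;                                          /* buffer fullness low bits, 1 raw block */
    return 0;
}

/* Read the 13-bit frame length (header + payload) back out of a header. */
int adts_frame_length(const uint8_t h[7]) {
    return ((h[3] & 0x03) << 11) | (h[4] << 3) | (h[5] >> 5);
}
```

For an AAC LC, 48000 Hz, stereo frame with a 341-byte payload this gives the header bytes FF F1 4C 80 2B 9F FC, and reading the length back returns 348.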