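Audio and Video: Recording, Playing, Encoding, and Decoding Audio with FFmpeg (Part 1)

Preface: This article shows how to record, play, encode, and decode audio with FFmpeg from Qt, a cross-platform C++ framework for building graphical applications.
For video, see Part 2: 音视频-FFmpeg视频录制、播放、编码和解码（下）

I. Installing and Using Qt

This article uses macOS as the development environment. For Windows, see 【秒懂音视频开发】04_Windows开发环境搭建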

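1. Installing FFmpeg

On macOS, FFmpeg can be installed directly with Homebrew:

brew install ffmpeg

Check the version:

ffmpeg -version

2. Installing Qt

Install Qt with brew install. On the machine used here it ends up in /usr/local/Cellar/qt (Homebrew on Apple Silicon uses the /opt/homebrew prefix instead of /usr/local).

brew install qt

Install Qt Creator with brew install --cask. It ends up in /usr/local/Caskroom/qt-creator.

brew install --cask qt-creator

3. Integrating FFmpeg into a Qt project

Add the following to wxq_qt_ffmpeg_demo.pro: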

mac: {
    FFMPEG_HOME = /usr/local/Cellar/ffmpeg/5.1
    # Print the values while qmake runs (qmake comments start with #, not //)
    message($${FFMPEG_HOME})
    message($$(PATH))
    # Header search path
    INCLUDEPATH += $${FFMPEG_HOME}/include
    # Library search path and the FFmpeg libraries to link
    LIBS += -L$${FFMPEG_HOME}/lib \
            -lavcodec \
            -lavdevice \
            -lavfilter \
            -lavformat \
            -lavutil \
            -lpostproc \
            -lswscale \
            -lswresample
}

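II. Qt Signals and Slots

Signal: a notification emitted by an object. For example, clicking a button emits a clicked signal.
Slot: a function that handles a signal, often called a slot function.

// Example: close the current window when the button is clicked.
// When btn emits the clicked signal, the close function of this is called.
connect(btn, &QPushButton::clicked, this, &MainWindow::close);

MainWindow::MainWindow(QWidget *parent)
    : QMainWindow(parent)
    , ui(new Ui::MainWindow)
{
    ui->setupUi(this);
    QPushButton *btn = new QPushButton;
    btn->setText("Log in");
    btn->setFixedSize(100, 30);
    btn->setParent(this);

    // Close the window by connecting a signal to a slot:
    // btn emits the clicked signal,
    // the QMainWindow receives it and calls its close slot.
    connect(btn, &QPushButton::clicked, this, &QMainWindow::close);
}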
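
To make the signal and slot idea concrete outside of Qt, here is a minimal sketch of the same observer pattern in standard C++. It is not how Qt implements signals and slots (Qt relies on generated meta-object code); the Signal class and its connect and fire members are hypothetical names chosen for illustration.

```cpp
#include <cassert>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical, simplified stand-in for a Qt signal: it stores slots
// (callables) and calls each of them when the signal fires.
class Signal {
public:
    // Register a slot; loosely analogous to QObject::connect.
    void connect(std::function<void()> slot) {
        assert(slot != nullptr); // an empty std::function cannot be called
        slots_.push_back(std::move(slot));
    }

    // Call every connected slot in connection order; loosely analogous to emit.
    void fire() const {
        for (const auto &slot : slots_) slot();
    }

private:
    std::vector<std::function<void()>> slots_;
};
```

A button's clicked signal would correspond to one Signal object, and each connect call adds one handler that runs every time fire is called.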

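III. Audio

1. Introduction

The usual way to digitize audio is pulse code modulation (PCM). The process is: sampling → quantization → encoding.

1.1 Sample rate:

The number of samples captured per second is called the sample rate (also sampling frequency or sampling rate). For example, a sample rate of 44.1 kHz means 44100 samples are captured every second.

1.2 Bit depth (also sample precision or sample size):

The number of bits used to store the value of one sample. The higher the bit depth, the more precisely the amplitude is represented. CDs use a bit depth of 16 bits, which can represent 65536 (2^16) distinct values. DVDs use 24 bits, and most telephone equipment uses 8 bits.

1.3 Channels

Mono produces one stream of waveform data; stereo (two channels) produces two.

How large is 1 minute of stereo PCM data at a 44.1 kHz sample rate and 16-bit depth?

sample rate * bit depth * channels * duration
44100 * 16 * 2 * 60 / 8 = 10584000 bytes ≈ 10.09 MiB

1.4 Bit rate:

The number of bits transmitted or processed per unit of time. The unit is bits per second (bit/s or bps), with the usual multiples: kilobits per second (Kbit/s or Kbps), megabits per second (Mbit/s or Mbps), gigabits per second (Gbit/s or Gbps), and terabits per second (Tbit/s or Tbps).

What is the bit rate of stereo PCM data at a 44.1 kHz sample rate and 16-bit depth?

sample rate * bit depth * channels
44100 * 16 * 2 = 1411200 bps = 1411.2 Kbps

For uncompressed PCM, a higher sample rate and a higher bit depth generally mean better audio quality. Since the bit rate is the product of the two (times the channel count), a higher bit rate likewise generally means better quality.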
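
The two formulas above can be sketched as a pair of small helpers. This is an illustrative sketch; the function names pcmBitRate and pcmByteSize are hypothetical and do not come from FFmpeg or Qt.

```cpp
#include <cassert>
#include <cstdint>

// Bit rate of uncompressed PCM in bits per second:
// sample rate * bit depth * channel count.
std::int64_t pcmBitRate(int sampleRate, int bitDepth, int channels) {
    assert(sampleRate > 0 && bitDepth > 0 && channels > 0);
    return static_cast<std::int64_t>(sampleRate) * bitDepth * channels;
}

// Size in bytes of `seconds` seconds of uncompressed PCM:
// bit rate * duration / 8.
std::int64_t pcmByteSize(int sampleRate, int bitDepth, int channels, int seconds) {
    assert(seconds >= 0);
    return pcmBitRate(sampleRate, bitDepth, channels) * seconds / 8;
}
```

For CD-quality stereo, pcmBitRate(44100, 16, 2) is 1411200 bits per second, and pcmByteSize(44100, 16, 2, 60) is 10584000 bytes for one minute.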

2. Converting audio with ffmpeg

ffmpeg -i y831.wav y8.mp3

Output:

wxq@wangxueqideMBP Desktop % ffmpeg -i y831.wav y8.mp3
ffmpeg version 5.1 Copyright (c) 2000-2022 the FFmpeg developers
  built with Apple clang version 13.1.6 (clang-1316.0.21.2.5)
  configuration: --prefix=/usr/local/Cellar/ffmpeg/5.1 --enable-shared --enable-pthreads --enable-version3 --cc=clang --host-cflags= --host-ldflags= --enable-ffplay --enable-gnutls --enable-gpl --enable-libaom --enable-libbluray --enable-libdav1d --enable-libmp3lame --enable-libopus --enable-librav1e --enable-librist --enable-librubberband --enable-libsnappy --enable-libsrt --enable-libtesseract --enable-libtheora --enable-libvidstab --enable-libvmaf --enable-libvorbis --enable-libvpx --enable-libwebp --enable-libx264 --enable-libx265 --enable-libxml2 --enable-libxvid --enable-lzma --enable-libfontconfig --enable-libfreetype --enable-frei0r --enable-libass --enable-libopencore-amrnb --enable-libopencore-amrwb --enable-libopenjpeg --enable-libspeex --enable-libsoxr --enable-libzmq --enable-libzimg --disable-libjack --disable-indev=jack --enable-videotoolbox
  libavutil      57. 28.100 / 57. 28.100
  libavcodec     59. 37.100 / 59. 37.100
  libavformat    59. 27.100 / 59. 27.100
  libavdevice    59.  7.100 / 59.  7.100
  libavfilter     8. 44.100 /  8. 44.100
  libswscale      6.  7.100 /  6.  7.100
  libswresample   4.  7.100 /  4.  7.100
  libpostproc    56.  6.100 / 56.  6.100
Guessed Channel Layout for Input Stream #0.0 : mono
Input #0, wav, from 'y831.wav':
  Duration: 00:00:04.13, bitrate: 1536 kb/s
  Stream #0:0: Audio: pcm_f32le ([3][0][0][0] / 0x0003), 48000 Hz, mono, flt, 1536 kb/s
Stream mapping:
  Stream #0:0 -> #0:0 (pcm_f32le (native) -> mp3 (libmp3lame))
Press [q] to stop, [?] for help
Output #0, mp3, to 'y8.mp3':
  Metadata:
    TSSE            : Lavf59.27.100
  Stream #0:0: Audio: mp3, 48000 Hz, mono, fltp
    Metadata:
      encoder         : Lavc59.37.100 libmp3lame
size=      33kB time=00:00:04.15 bitrate=  64.8kbits/s speed= 180x    
video:0kB audio:33kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 0.709411%

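3. ffprobe: inspecting audio and video parameters

ffprobe reports the parameters of an audio or video file.

# Show the sample rate, bit rate, duration, and other details of an MP3 file
ffprobe y8.mp3

Output: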

Input #0, mp3, from 'y8.mp3':
  Metadata:
    encoder         : Lavf59.27.100
  Duration: 00:00:04.18, start: 0.023021, bitrate: 64 kb/s
  Stream #0:0: Audio: mp3, 48000 Hz, mono, fltp, 64 kb/s

4. ffplay: playing audio and video

ffplay plays audio and video files.

ffplay y8.mp3

Note: ffplay -hide_banner y8.mp3 suppresses the version banner.

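IV. Recording and Playing from the Command Line

1. Listing available devices

List the device formats available on the current platform:

ffmpeg -devices

Output: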

wxq@wangxueqideMBP Desktop % ffmpeg -devices
Devices:
 D. = Demuxing supported
 .E = Muxing supported
 --
  E audiotoolbox    AudioToolbox output device
 D  avfoundation    AVFoundation input device
 D  lavfi           Libavfilter virtual input device
  E sdl,sdl2        SDL2 output device
 D  x11grab         X11 screen capture, using XCB

2. Listing the devices supported by avfoundation

On macOS the input format is avfoundation, not dshow (which is the Windows equivalent).

ffmpeg -f avfoundation -list_devices true -i ''

Output:

[AVFoundation indev @ 0x7ff2dd205200] AVFoundation video devices:
[AVFoundation indev @ 0x7ff2dd205200] [0] FaceTime高清摄像头（内建）
[AVFoundation indev @ 0x7ff2dd205200] [1] Capture screen 0
[AVFoundation indev @ 0x7ff2dd205200] AVFoundation audio devices:
[AVFoundation indev @ 0x7ff2dd205200] [0] 外置麦克风
[AVFoundation indev @ 0x7ff2dd205200] [1] MacBook Pro麦克风

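3. Recording from a specific device

Recording parameters: pcm_f32le, 48000 Hz

# :1 selects audio device 1, i.e. [1] MacBook Pro麦克风
ffmpeg -f avfoundation -i :1 out.mp3

Recording starts as soon as the command runs. Press Ctrl + C to stop.

Output (this log was captured from a run that used device :0):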

Input #0, avfoundation, from ':0':
  Duration: N/A, start: 317843.718292, bitrate: 1536 kb/s
  Stream #0:0: Audio: pcm_f32le, 48000 Hz, mono, flt, 1536 kb/s
Stream mapping:
  Stream #0:0 -> #0:0 (pcm_f32le (native) -> mp3 (libmp3lame))
Press [q] to stop, [?] for help
Output #0, mp3, to 'out.mp3':
  Metadata:
    TSSE            : Lavf59.27.100
  Stream #0:0: Audio: mp3, 48000 Hz, mono, fltp
    Metadata:
      encoder         : Lavc59.37.100 libmp3lame
size=      35kB time=00:00:04.58 bitrate=  62.1kbits/s speed=0.997x    
video:0kB audio:34kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 0.670856%
Exiting normally, received signal 2.

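4. Playing a PCM recording from the command line

Note: raw PCM has no header, so the sample rate, sample format, and channel count passed to ffplay must match the recording (here pcm_f32le at 48000 Hz). Set -ac to the channel count the device actually delivered; for a mono recording like the one logged above, that would be -ac 1.

ffplay -ar 48000 -ac 2 -f f32le 08_15_11_58_59.pcm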

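V. Recording and Playing Audio in Code

1. Recording in code

1.1 The main steps for implementing recording are:

Register devices
Get the input format object
Open the device
Capture data
Release resources

1.2 Steps in detail:

1.2.1 Registering devices

The device registration code only needs to run once during the lifetime of the program.

// Initialize libavdevice and register all input and output devices
avdevice_register_all();

1.2.2 Getting the input format object

The format name and device name differ between Windows and macOS, so conditional compilation is used to keep the code cross-platform.

// For now the format name and device name are hard-coded as macros
#ifdef Q_OS_WIN
    // Format name
    #define FMT_NAME "dshow"
    // Device name
    #define DEVICE_NAME "audio=麦克风阵列 (Realtek(R) Audio)"
#else
    #define FMT_NAME "avfoundation"
    #define DEVICE_NAME ":0"
#endif

1.2.3 Core recording code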

void showSpec(AVFormatContext *ctx) {
    // Get the input stream
    AVStream *stream = ctx->streams[0];
    // Get the audio parameters
    AVCodecParameters *params = stream->codecpar;
    // Channel count (deprecated since FFmpeg 5.1 in favor of ch_layout.nb_channels)
    qDebug() << params->channels;
    // Sample rate
    qDebug() << params->sample_rate;
    // Sample format
    qDebug() << params->format;
    // Bytes occupied by one sample of one channel
    qDebug() << av_get_bytes_per_sample((AVSampleFormat) params->format);
}

// run() is called automatically when the thread is started (start()).
// The code in run() executes on the worker thread,
// so time-consuming work belongs here.
void AudioThread::run() {
    qDebug() << this << "started----------";

    // Get the input format object (FFmpeg 5 returns a const pointer)
    const AVInputFormat *fmt = av_find_input_format(FMT_NAME);
    if (!fmt) {
        qDebug() << "failed to get the input format object" << FMT_NAME;
        return;
    }

    // Format context (used later to operate the device)
    AVFormatContext *ctx = nullptr;
    // Open the device
    int ret = avformat_open_input(&ctx, DEVICE_NAME, fmt, nullptr);
    if (ret < 0) {
        char errbuf[1024];
        av_strerror(ret, errbuf, sizeof (errbuf));
        qDebug() << "failed to open the device" << errbuf;
        return;
    }

    // Print the parameters of the recording device
    showSpec(ctx);

    // File name: FILEPATH + timestamp + ".pcm"
    QString filename = FILEPATH;

    filename += QDateTime::currentDateTime().toString("MM_dd_HH_mm_ss");
    filename += ".pcm";
    QFile file(filename);

    // Open the file
    // WriteOnly: creates the file if it does not exist, truncates it if it does
    if (!file.open(QFile::WriteOnly)) {
        qDebug() << "failed to open the file" << filename;
        // Close the device
        avformat_close_input(&ctx);
        return;
    }

    // Packet that receives the captured data
    AVPacket *pkt = av_packet_alloc();
    while (!isInterruptionRequested()) {
        // Keep capturing data
        ret = av_read_frame(ctx, pkt);

        if (ret == 0) { // Read succeeded
            // Write the data to the file
            file.write((const char *) pkt->data, pkt->size);
        } else if (ret == AVERROR(EAGAIN)) { // Resource temporarily unavailable
            continue;
        } else { // Any other error
            char errbuf[1024];
            av_strerror(ret, errbuf, sizeof (errbuf));
            qDebug() << "av_read_frame error" << errbuf << ret;
            break;
        }

        // Required: releases the resources held inside pkt
        av_packet_unref(pkt);
    }

    // Release resources
    // Close the file
    file.close();

    // Free the packet
    av_packet_free(&pkt);

    // Close the device
    avformat_close_input(&ctx);

    qDebug() << this << "finished normally----------";
}

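2. Playing the recording in code

2.1 Initializing the subsystem

SDL is divided into several subsystems:

Video: display and window management
Audio: audio device management
Joystick: game controller input
Timers: timers
...

Only audio is needed here, so it is enough to initialize the Audio subsystem with SDL_Init.

// Initialize the Audio subsystem
if (SDL_Init(SDL_INIT_AUDIO)) {
    // A non-zero return value means failure
    qDebug() << "SDL_Init Error" << SDL_GetError();
    return;
}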

2.2 Opening the audio device

/* Macro definitions */
// Sample rate
#define SAMPLE_RATE 48000
// Sample format
#define SAMPLE_FORMAT AUDIO_F32LSB
// Bits per sample
#define SAMPLE_SIZE SDL_AUDIO_BITSIZE(SAMPLE_FORMAT)
// Channel count
#define CHANNELS 2
// Number of sample frames in the audio buffer
#define SAMPLES 1024

// Holds the audio data read from the file and its length
struct AudioBuffer {
    int len = 0;
    int pullLen = 0;
    Uint8 *data = nullptr;
};

// Audio parameters
SDL_AudioSpec spec;
// Sample rate
spec.freq = SAMPLE_RATE;
// Sample format (f32le)
spec.format = SAMPLE_FORMAT;
// Channel count
spec.channels = CHANNELS;
// Number of sample frames in the audio buffer (must be a power of 2)
spec.samples = SAMPLES;
// Callback
spec.callback = pull_audio_data;
// Argument passed to the callback
AudioBuffer buffer;
spec.userdata = &buffer;

// Open the audio device
if (SDL_OpenAudio(&spec, nullptr)) {
    qDebug() << "SDL_OpenAudio Error" << SDL_GetError();
    // Shut down all initialized subsystems
    SDL_Quit();
    return;
}

2.3 Opening the file

#define FILENAME "/Users/wxq/Desktop/08_15_11_58_59.pcm"

// Open the file
QFile file(FILENAME);
if (!file.open(QFile::ReadOnly)) {
    qDebug() << "failed to open the file" << FILENAME;
    // Close the audio device
    SDL_CloseAudio();
    // Shut down all initialized subsystems
    SDL_Quit();
    return;
}

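2.4 Starting playback

// Bytes per sample frame (one sample for every channel)
#define BYTES_PER_SAMPLE ((SAMPLE_SIZE * CHANNELS) / 8)
// Size of the file buffer in bytes
#define BUFFER_SIZE (SAMPLES * BYTES_PER_SAMPLE)

// Start playback (0 means unpause)
SDL_PauseAudio(0);

// Holds the data read from the file
Uint8 data[BUFFER_SIZE];

while (!isInterruptionRequested()) {
    // The callback has not consumed the previous chunk yet: keep waiting
    if (buffer.len > 0) continue;

    int len = (int) file.read((char *) data, BUFFER_SIZE);

    // The whole file has been read (or reading failed)
    if (len <= 0) {
        // Number of sample frames in the last chunk handed to the device
        int samples = buffer.pullLen / BYTES_PER_SAMPLE;
        int ms = samples * 1000 / SAMPLE_RATE;
        // Give the device time to play them before leaving the loop
        SDL_Delay(ms);
        break;
    }

    // Data was read: publish the pointer before the length,
    // because the callback treats len > 0 as "data is ready"
    buffer.data = data;
    buffer.len = len;
}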
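
The delay at the end of the loop converts a byte count into a playing time. A minimal sketch of that arithmetic in isolation follows; the helper name playbackMs is hypothetical, and bytesPerFrame stands for bytes per sample multiplied by the channel count.

```cpp
#include <cassert>

// Milliseconds needed to play `byteCount` bytes of interleaved PCM.
// bytesPerFrame = bytes per sample * channel count.
int playbackMs(int byteCount, int bytesPerFrame, int sampleRate) {
    assert(byteCount >= 0 && bytesPerFrame > 0 && sampleRate > 0);
    int frames = byteCount / bytesPerFrame; // whole sample frames only
    return frames * 1000 / sampleRate;      // integer division truncates
}
```

With the macros used in this section (f32le, 2 channels, 48000 Hz), a full 8192-byte buffer holds 1024 sample frames, which play for 21 ms after truncation.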

2.5 The callback function

// userdata: SDL_AudioSpec.userdata
// stream: the audio buffer that must be filled with audio data
// len: size of the audio buffer in bytes (SDL_AudioSpec.samples * bytes per sample frame)
void pull_audio_data(void *userdata, Uint8 *stream, int len) {
    // Clear stream (silence)
    SDL_memset(stream, 0, len);

    // Get the buffer information
    AudioBuffer *buffer = (AudioBuffer *) userdata;
    if (buffer->len <= 0) return;

    // Take the smaller of len and buffer->len so no read goes past the end of the data
    buffer->pullLen = (len > buffer->len) ? buffer->len : len;

    // Fill in the data
    SDL_MixAudio(stream,
                 buffer->data,
                 buffer->pullLen,
                 SDL_MIX_MAXVOLUME);
    buffer->data += buffer->pullLen;
    buffer->len -= buffer->pullLen;
}
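
The clamping logic of the callback can be exercised without SDL. The sketch below is an illustration under stated assumptions: PcmBuffer and pullPcm are hypothetical names, and a plain memcpy stands in for SDL_MixAudio, which mixes into the destination rather than copying.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for the AudioBuffer used by the callback.
struct PcmBuffer {
    const std::uint8_t *data = nullptr; // next unread byte
    int len = 0;                        // bytes still unread
};

// Fill `stream` (capacity `len` bytes) from `buffer` the way the callback does:
// zero the destination first, then copy at most min(len, buffer.len) bytes.
// Returns the number of bytes copied.
int pullPcm(PcmBuffer &buffer, std::uint8_t *stream, int len) {
    assert(stream != nullptr && len >= 0);
    std::memset(stream, 0, static_cast<std::size_t>(len)); // silence by default
    int pulled = std::min(len, buffer.len);
    if (pulled <= 0) return 0;
    std::memcpy(stream, buffer.data, static_cast<std::size_t>(pulled));
    buffer.data += pulled; // advance past the consumed bytes
    buffer.len -= pulled;
    return pulled;
}
```

Pulling 4 bytes at a time from a 10-byte buffer yields 4, 4, 2, and then 0 bytes, with the unfilled tail of the destination left as silence.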

2.6 Releasing resources

// Close the file
file.close();
// Close the audio device
SDL_CloseAudio();
// Shut down all initialized subsystems
SDL_Quit();

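VI. Converting PCM to WAV

1. Converting PCM to WAV from the command line

ffmpeg -ar 48000 -ac 2 -f f32le -i out.pcm out.wav

Note that the WAV file produced by this command has a 78-byte header. Compared with the plain 44-byte header, it contains an extra 34-byte LIST chunk.

Adding the output option -bitexact removes the LIST chunk.

ffmpeg -ar 48000 -ac 2 -f f32le -i out.pcm -bitexact out.wav

Output: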

Input #0, f32le, from 'out.pcm':
  Duration: 00:00:02.98, bitrate: 3072 kb/s
  Stream #0:0: Audio: pcm_f32le, 48000 Hz, stereo, flt, 3072 kb/s
Stream mapping:
  Stream #0:0 -> #0:0 (pcm_f32le (native) -> pcm_s16le (native))
Press [q] to stop, [?] for help
Output #0, wav, to 'out.wav':
  Metadata:
    ISFT            : Lavf59.27.100
  Stream #0:0: Audio: pcm_s16le ([1][0][0][0] / 0x0001), 48000 Hz, stereo, s16, 1536 kb/s
    Metadata:
      encoder         : Lavc59.37.100 pcm_s16le
size=     559kB time=00:00:02.98 bitrate=1536.2kbits/s speed= 387x        
video:0kB audio:559kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 0.013626%

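2. Converting PCM to WAV in code

Prepending a 44-byte WAV header to the PCM data turns it into a WAV file.

2.1 Structure of the WAV header

The WAV header is laid out roughly as follows: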

#define AUDIO_FORMAT_PCM 1
#define AUDIO_FORMAT_FLOAT 3

// WAV file header (44 bytes)
struct WAVHeader {
    // id of the RIFF chunk
    uint8_t riffChunkId[4] = {'R', 'I', 'F', 'F'};
    // Data size of the RIFF chunk: total file length minus 8 bytes
    uint32_t riffChunkDataSize;

    // "WAVE"
    uint8_t format[4] = {'W', 'A', 'V', 'E'};

    /* fmt chunk */
    // id of the fmt chunk
    uint8_t fmtChunkId[4] = {'f', 'm', 't', ' '};
    // Data size of the fmt chunk: 16 when storing PCM data
    uint32_t fmtChunkDataSize = 16;
    // Audio encoding: 1 means integer PCM, 3 means floating point
    uint16_t audioFormat = AUDIO_FORMAT_PCM;
    // Channel count
    uint16_t numChannels;
    // Sample rate
    uint32_t sampleRate;
    // Byte rate = sampleRate * blockAlign
    uint32_t byteRate;
    // Bytes per sample frame = bitsPerSample * numChannels >> 3
    uint16_t blockAlign;
    // Bit depth
    uint16_t bitsPerSample;

    /* data chunk */
    // id of the data chunk
    uint8_t dataChunkId[4] = {'d', 'a', 't', 'a'};
    // Data size of the data chunk: total length of the audio data,
    // i.e. total file length minus the header length (usually 44)
    uint32_t dataChunkDataSize;
};
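
Writing a struct like this to disk byte for byte works when the compiler adds no padding and the host is little-endian, which holds on common desktop platforms. As a hedged alternative that does not depend on either, the sketch below serializes the same 44 bytes field by field; makeWavHeader and putLE are hypothetical helper names, not part of FFmpeg or Qt.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <cstring>

// Write the low `bytes` bytes of `value` in little-endian order starting at p.
static void putLE(std::uint8_t *p, std::uint32_t value, int bytes) {
    for (int i = 0; i < bytes; ++i) p[i] = static_cast<std::uint8_t>(value >> (8 * i));
}

// Build a canonical 44-byte WAV header.
// audioFormat: 1 = integer PCM, 3 = IEEE float. dataSize: PCM payload in bytes.
std::array<std::uint8_t, 44> makeWavHeader(std::uint16_t audioFormat,
                                           std::uint16_t numChannels,
                                           std::uint32_t sampleRate,
                                           std::uint16_t bitsPerSample,
                                           std::uint32_t dataSize) {
    assert(numChannels > 0 && bitsPerSample > 0 && bitsPerSample % 8 == 0);
    std::array<std::uint8_t, 44> h{};
    std::uint32_t blockAlign = numChannels * bitsPerSample / 8u;
    std::memcpy(&h[0], "RIFF", 4);
    putLE(&h[4], dataSize + 36, 4);            // total file size minus 8
    std::memcpy(&h[8], "WAVE", 4);
    std::memcpy(&h[12], "fmt ", 4);
    putLE(&h[16], 16, 4);                      // data size of the fmt chunk
    putLE(&h[20], audioFormat, 2);
    putLE(&h[22], numChannels, 2);
    putLE(&h[24], sampleRate, 4);
    putLE(&h[28], sampleRate * blockAlign, 4); // byte rate
    putLE(&h[32], blockAlign, 2);
    putLE(&h[34], bitsPerSample, 2);
    std::memcpy(&h[36], "data", 4);
    putLE(&h[40], dataSize, 4);
    return h;
}
```

The returned array can be written to the output file before the PCM bytes, in the same place where a struct-based header would be written.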

2.2 Core implementation of the PCM to WAV conversion

The conversion is wrapped in the pcm2wav function of the FFmpegs class.

#include <QFile>
#include <QDebug>

class FFmpegs {
public:
    FFmpegs();
    static void pcm2wav(WAVHeader &header,
                        const char *pcmFilename,
                        const char *wavFilename);
};

void FFmpegs::pcm2wav(WAVHeader &header,
                      const char *pcmFilename,
                      const char *wavFilename) {
    // Derive the dependent header fields
    header.blockAlign = header.bitsPerSample * header.numChannels >> 3;
    header.byteRate = header.sampleRate * header.blockAlign;

    // Open the pcm file
    QFile pcmFile(pcmFilename);
    if (!pcmFile.open(QFile::ReadOnly)) {
        qDebug() << "failed to open the file" << pcmFilename;
        return;
    }
    header.dataChunkDataSize = pcmFile.size();
    header.riffChunkDataSize = header.dataChunkDataSize
                               + sizeof (WAVHeader) - 8;

    // Open the wav file
    QFile wavFile(wavFilename);
    if (!wavFile.open(QFile::WriteOnly)) {
        qDebug() << "failed to open the file" << wavFilename;

        pcmFile.close();
        return;
    }

    // Write the header
    wavFile.write((const char *) &header, sizeof (WAVHeader));

    // Copy the pcm data
    char buf[1024];
    qint64 size;
    while ((size = pcmFile.read(buf, sizeof (buf))) > 0) {
        wavFile.write(buf, size);
    }

    // Close the files
    pcmFile.close();
    wavFile.close();
}

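2.3 Calling the function

// Fill in the WAV header; the values must describe the PCM data being wrapped.
// For f32le data such as the recording made earlier, audioFormat would be
// AUDIO_FORMAT_FLOAT and bitsPerSample would be 32.
WAVHeader header;
header.numChannels = 2;
header.sampleRate = 44100;
header.bitsPerSample = 16;
// Call the function
FFmpegs::pcm2wav(header, "F:/in.pcm", "F:/out.wav");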

VII. Playing WAV

A WAV file can be played directly with ffplay, without the extra parameters that raw PCM needs, because the WAV header already contains the audio parameters.

ffplay in.wav
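
To illustrate why no extra parameters are needed, here is a minimal sketch that recovers the audio parameters from a header. It assumes the canonical 44-byte layout described in the previous chapter (no LIST chunk or other extra chunks before the data chunk); WavInfo, getLE, and parseWavHeader are hypothetical names, and real players walk the chunk list instead of relying on fixed offsets.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Audio parameters stored in a canonical 44-byte WAV header.
struct WavInfo {
    std::uint16_t audioFormat = 0;   // 1 = integer PCM, 3 = IEEE float
    std::uint16_t channels = 0;
    std::uint32_t sampleRate = 0;
    std::uint16_t bitsPerSample = 0;
    std::uint32_t dataSize = 0;      // size of the PCM payload in bytes
};

// Read a little-endian unsigned integer of 1 to 4 bytes.
static std::uint32_t getLE(const std::uint8_t *p, int bytes) {
    assert(bytes >= 1 && bytes <= 4);
    std::uint32_t value = 0;
    for (int i = 0; i < bytes; ++i) value |= static_cast<std::uint32_t>(p[i]) << (8 * i);
    return value;
}

// Returns true and fills `info` if `h` starts with a canonical 44-byte header.
bool parseWavHeader(const std::uint8_t *h, std::size_t size, WavInfo &info) {
    if (h == nullptr || size < 44) return false;
    if (std::memcmp(h, "RIFF", 4) != 0 || std::memcmp(h + 8, "WAVE", 4) != 0 ||
        std::memcmp(h + 12, "fmt ", 4) != 0 || std::memcmp(h + 36, "data", 4) != 0) {
        return false; // not the fixed layout this sketch understands
    }
    info.audioFormat = static_cast<std::uint16_t>(getLE(h + 20, 2));
    info.channels = static_cast<std::uint16_t>(getLE(h + 22, 2));
    info.sampleRate = getLE(h + 24, 4);
    info.bitsPerSample = static_cast<std::uint16_t>(getLE(h + 34, 2));
    info.dataSize = getLE(h + 40, 4);
    return true;
}
```

A file whose header carries a LIST chunk before the data chunk is rejected by this sketch, because the data chunk id is then no longer at offset 36.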

1. Initializing the subsystem

// Initialize the Audio subsystem
if (SDL_Init(SDL_INIT_AUDIO)) {
    qDebug() << "SDL_Init error:" << SDL_GetError();
    return;
}

2. Loading the WAV file

// Holds the PCM data of the WAV file and its length
struct AudioBuffer {
    Uint32 len = 0;
    int pullLen = 0;
    Uint8 *data = nullptr;
};

// PCM data inside the WAV file
Uint8 *data;
// Size of the PCM data in bytes
Uint32 len;
// Audio parameters
SDL_AudioSpec spec;

// Load the wav file
if (!SDL_LoadWAV(FILENAME, &spec, &data, &len)) {
    qDebug() << "SDL_LoadWAV error:" << SDL_GetError();
    // Shut down all subsystems
    SDL_Quit();
    return;
}

// Callback
spec.callback = pull_audio_data;
// userdata passed to the callback
AudioBuffer buffer;
buffer.len = len;
buffer.data = data;
spec.userdata = &buffer;

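3. Opening the audio device

// Open the device
if (SDL_OpenAudio(&spec, nullptr)) {
    qDebug() << "SDL_OpenAudio error:" << SDL_GetError();
    // Free the file data
    SDL_FreeWAV(data);
    // Shut down all subsystems
    SDL_Quit();
    return;
}

4. Starting playback

// Start playback (0 means unpause)
SDL_PauseAudio(0);

while (!isInterruptionRequested()) {
    // The callback has not consumed all the data yet: keep waiting
    if (buffer.len > 0) continue;
    // Size of one sample frame in bytes
    int size = spec.channels * SDL_AUDIO_BITSIZE(spec.format) / 8;
    // Number of sample frames in the last chunk handed to the device
    int samples = buffer.pullLen / size;
    // Playing time of that last chunk
    int ms = samples * 1000 / spec.freq;
    SDL_Delay(ms);
    break;
}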

5. The callback function

// Called by the audio device whenever it needs data (called many times)
void pull_audio_data(void *userdata,
                     // PCM data must be written into stream
                     Uint8 *stream,
                     // Requested size in bytes (samples * bits per sample * channels / 8)
                     int len
                    ) {
    // Clear stream (silence)
    SDL_memset(stream, 0, len);

    AudioBuffer *buffer = (AudioBuffer *) userdata;

    // No data left to play
    if (buffer->len <= 0) return;

    // Take the smaller of len and buffer->len
    buffer->pullLen = (len > (int) buffer->len) ? buffer->len : len;

    // Fill in the data
    SDL_MixAudio(stream,
                 buffer->data,
                 buffer->pullLen,
                 SDL_MIX_MAXVOLUME);
    buffer->data += buffer->pullLen;
    buffer->len -= buffer->pullLen;
}

6. Releasing resources

// Free the WAV file data
SDL_FreeWAV(data);

// Close the device
SDL_CloseAudio();

// Shut down all subsystems
SDL_Quit();

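VIII. Audio Resampling

1. What is audio resampling

Audio resampling converts audio A into audio B, where the parameters of A and B (sample rate, sample format, channel count) are not all the same. For example:

1.1 Parameters of audio A

Sample rate: 48000
Sample format: f32le
Channels: 1

1.2 Parameters of audio B

Sample rate: 44100
Sample format: s16le
Channels: 2

2. Why audio resampling is needed

Here is one classic use of audio resampling.

Some audio encoders require the raw PCM input to have specific parameters, for example 44100_s16le_2, while the PCM you have may be 48000_f32le_1. In that case the 48000_f32le_1 data must first be converted to 44100_s16le_2, and only then can the encoder encode the converted PCM.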
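
Two of the three parameter changes in that example, sample format and channel count, can be sketched by hand. The function below is a naive illustration and not what libswresample does internally; it deliberately leaves the sample rate untouched, since rate conversion needs interpolation and filtering. The name f32MonoToS16Stereo is hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>
#include <vector>

// Convert mono f32 samples to interleaved stereo s16 samples.
// Each input sample is clamped to [-1.0, 1.0], scaled to the 16-bit range,
// and duplicated into the left and right channels.
std::vector<std::int16_t> f32MonoToS16Stereo(const std::vector<float> &in) {
    std::vector<std::int16_t> out;
    out.reserve(in.size() * 2);
    for (float sample : in) {
        assert(!std::isnan(sample)); // NaN has no meaningful integer value
        float clamped = std::max(-1.0f, std::min(1.0f, sample));
        auto value = static_cast<std::int16_t>(std::lround(clamped * 32767.0f));
        out.push_back(value); // left
        out.push_back(value); // right
    }
    return out;
}
```

Out-of-range input such as 2.0f is clipped to full scale rather than wrapping around, which is why the clamp comes before the cast.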

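3. Resampling from the command line

The following command converts 44100_s16le_2 to 48000_f32le_1.

ffmpeg -ar 44100 -ac 2 -f s16le -i 44100_s16le_2.pcm -ar 48000 -ac 1 -f f32le 48000_f32le_1.pcm

4. Resampling in code

Audio resampling uses two libraries:

swresample
avutil

4.1 Function declarations

To make the resampling feature more general, it is designed as the following functions: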

// Audio parameters
typedef struct {
    const char *filename;
    int sampleRate;
    AVSampleFormat sampleFmt;
    int chLayout;
} ResampleAudioSpec;

class FFmpegs {
public:
    static void resampleAudio(ResampleAudioSpec &in,
                              ResampleAudioSpec &out);

    static void resampleAudio(const char *inFilename,
                              int inSampleRate,
                              AVSampleFormat inSampleFmt,
                              int inChLayout,

                              const char *outFilename,
                              int outSampleRate,
                              AVSampleFormat outSampleFmt,
                              int outChLayout);
};

// Headers of the two libraries named above
extern "C" {
#include <libswresample/swresample.h>
#include <libavutil/avutil.h>
}

// Turn an error code into a message stored in errbuf
#define ERROR_BUF(ret) \
    char errbuf[1024]; \
    av_strerror(ret, errbuf, sizeof (errbuf));

void FFmpegs::resampleAudio(ResampleAudioSpec &in,
                            ResampleAudioSpec &out) {
    resampleAudio(in.filename, in.sampleRate, in.sampleFmt, in.chLayout,
                  out.filename, out.sampleRate, out.sampleFmt, out.chLayout);
}

4.2 Calling the function

// Input parameters
ResampleAudioSpec in;
in.filename = "F:/44100_s16le_2.pcm";
in.sampleFmt = AV_SAMPLE_FMT_S16;
in.sampleRate = 44100;
in.chLayout = AV_CH_LAYOUT_STEREO;

// Output parameters
ResampleAudioSpec out;
out.filename = "F:/48000_f32le_1.pcm";
out.sampleFmt = AV_SAMPLE_FMT_FLT;
out.sampleRate = 48000;
out.chLayout = AV_CH_LAYOUT_MONO;

// Resample the audio
FFmpegs::resampleAudio(in, out);

4.3 Function implementation

To simplify the cleanup code, the function uses goto, so all the variables it needs are defined up front.

// Files
QFile inFile(inFilename);
QFile outFile(outFilename);

// Input buffer
// Pointer to the buffer
uint8_t **inData = nullptr;
// Size of the buffer
int inLinesize = 0;
// Channel count
int inChs = av_get_channel_layout_nb_channels(inChLayout);
// Size of one sample frame in bytes
int inBytesPerSample = inChs * av_get_bytes_per_sample(inSampleFmt);
// Number of sample frames in the buffer
int inSamples = 1024;
// Number of bytes read from the file
int len = 0;

// Output buffer
// Pointer to the buffer
uint8_t **outData = nullptr;
// Size of the buffer
int outLinesize = 0;
// Channel count
int outChs = av_get_channel_layout_nb_channels(outChLayout);
// Size of one sample frame in bytes
int outBytesPerSample = outChs * av_get_bytes_per_sample(outSampleFmt);
// Number of sample frames in the buffer (AV_ROUND_UP rounds up)
int outSamples = av_rescale_rnd(outSampleRate, inSamples, inSampleRate, AV_ROUND_UP);

/*
 inSampleRate     inSamples
 ------------- = -----------
 outSampleRate    outSamples

 outSamples = outSampleRate * inSamples / inSampleRate
 */

// Return value of the FFmpeg calls
int ret = 0;
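
The proportion in the comment above, rounded up, decides how many sample frames the output buffer must hold. A minimal sketch of that calculation follows; rescaleRoundUp is a hypothetical helper that mirrors the rounding of the call above for small non-negative values, and it is not FFmpeg's implementation (av_rescale_rnd also guards against intermediate overflow).

```cpp
#include <cassert>
#include <cstdint>

// Smallest integer greater than or equal to a * b / c.
// Assumes a * b fits in 64 bits, which holds for sample rates and buffer sizes.
std::int64_t rescaleRoundUp(std::int64_t a, std::int64_t b, std::int64_t c) {
    assert(a >= 0 && b >= 0 && c > 0);
    return (a * b + c - 1) / c;
}
```

Converting 1024 input frames from 44100 Hz to 48000 Hz needs room for rescaleRoundUp(48000, 1024, 44100), which is 1115 frames; in the other direction, rescaleRoundUp(44100, 1024, 48000) is 941.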

4.4 Creating the resampling context

// Create the resampling context
SwrContext *ctx = swr_alloc_set_opts(nullptr,
                                     // Output parameters
                                     outChLayout, outSampleFmt, outSampleRate,
                                     // Input parameters
                                     inChLayout, inSampleFmt, inSampleRate,
                                     0, nullptr);
if (!ctx) {
    qDebug() << "swr_alloc_set_opts error";
    goto end;
}

4.5 Initializing the resampling context

// Initialize the resampling context (ret was already declared in 4.3)
ret = swr_init(ctx);
if (ret < 0) {
    ERROR_BUF(ret);
    qDebug() << "swr_init error:" << errbuf;
    goto end;
}

4.6 Creating the buffers

// Create the input buffer
ret = av_samples_alloc_array_and_samples(
          &inData,
          &inLinesize,
          inChs,
          inSamples,
          inSampleFmt,
          1);
if (ret < 0) {
    ERROR_BUF(ret);
    qDebug() << "av_samples_alloc_array_and_samples error:" << errbuf;
    goto end;
}

// Create the output buffer
ret = av_samples_alloc_array_and_samples(
          &outData,
          &outLinesize,
          outChs,
          outSamples,
          outSampleFmt,
          1);
if (ret < 0) {
    ERROR_BUF(ret);
    qDebug() << "av_samples_alloc_array_and_samples error:" << errbuf;
    goto end;
}

4.7 Reading the file data

// Open the files
if (!inFile.open(QFile::ReadOnly)) {
    qDebug() << "file open error:" << inFilename;
    goto end;
}
if (!outFile.open(QFile::WriteOnly)) {
    qDebug() << "file open error:" << outFilename;
    goto end;
}

// Read the file data
// inData[0] == *inData
while ((len = inFile.read((char *) inData[0], inLinesize)) > 0) {
    // Number of sample frames actually read
    inSamples = len / inBytesPerSample;

    // Resample (the return value is the number of converted sample frames)
    ret = swr_convert(ctx,
                      outData, outSamples,
                      (const uint8_t **) inData, inSamples
                     );

    if (ret < 0) {
        ERROR_BUF(ret);
        qDebug() << "swr_convert error:" << errbuf;
        goto end;
    }

    // Write the converted data to the output file
    // outData[0] == *outData
    outFile.write((char *) outData[0], ret * outBytesPerSample);
}

4.8 Flushing the output buffer

// Drain any converted samples that are still buffered inside the resampler
while ((ret = swr_convert(ctx,
                          outData, outSamples,
                          nullptr, 0)) > 0) {
    outFile.write((char *) outData[0], ret * outBytesPerSample);
}

4.9 Releasing resources

end:
    // Release resources
    // Close the files
    inFile.close();
    outFile.close();

    // Free the input buffer
    if (inData) {
        av_freep(&inData[0]);
    }
    av_freep(&inData);

    // Free the output buffer
    if (outData) {
        av_freep(&outData[0]);
    }
    av_freep(&outData);

    // Free the resampling context
    swr_free(&ctx);

IX. AAC Encoding

1. Listing FFmpeg's AAC codecs

ffmpeg -codecs | grep aac

Output:

ffmpeg version 5.1 Copyright (c) 2000-2022 the FFmpeg developers
  built with Apple clang version 13.1.6 (clang-1316.0.21.2.5)
  configuration: --prefix=/usr/local/Cellar/ffmpeg/5.1 --enable-shared --enable-pthreads --enable-version3 --cc=clang --host-cflags= --host-ldflags= --enable-ffplay --enable-gnutls --enable-gpl --enable-libaom --enable-libbluray --enable-libdav1d --enable-libmp3lame --enable-libopus --enable-librav1e --enable-librist --enable-librubberband --enable-libsnappy --enable-libsrt --enable-libtesseract --enable-libtheora --enable-libvidstab --enable-libvmaf --enable-libvorbis --enable-libvpx --enable-libwebp --enable-libx264 --enable-libx265 --enable-libxml2 --enable-libxvid --enable-lzma --enable-libfontconfig --enable-libfreetype --enable-frei0r --enable-libass --enable-libopencore-amrnb --enable-libopencore-amrwb --enable-libopenjpeg --enable-libspeex --enable-libsoxr --enable-libzmq --enable-libzimg --disable-libjack --disable-indev=jack --enable-videotoolbox
  libavutil      57. 28.100 / 57. 28.100
  libavcodec     59. 37.100 / 59. 37.100
  libavformat    59. 27.100 / 59. 27.100
  libavdevice    59.  7.100 / 59.  7.100
  libavfilter     8. 44.100 /  8. 44.100
  libswscale      6.  7.100 /  6.  7.100
  libswresample   4.  7.100 /  4.  7.100
  libpostproc    56.  6.100 / 56.  6.100
 DEAIL. aac                  AAC (Advanced Audio Coding) (decoders: aac aac_fixed aac_at ) (encoders: aac aac_at )
 D.AIL. aac_latm             AAC LATM (Advanced Audio Coding LATM syntax)

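2. Encoding from the command line

-vbr

Enables VBR mode (Variable Bit Rate).
When VBR mode is enabled, the -b:a option is ignored, but the -profile:a option still applies.

ffmpeg -i out.wav -c:a libfdk_aac -vbr 1 out.aac

Note that libfdk_aac is only available when FFmpeg was built with --enable-libfdk-aac. The build whose configuration is printed above does not include that flag, and its encoder list shows only aac and aac_at.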

3. Encoding in code

3.1 Variable definitions

// Encoder (FFmpeg 5 returns const AVCodec pointers)
const AVCodec *codec = nullptr;
// Context
AVCodecContext *ctx = nullptr;

// Holds the data before encoding
AVFrame *frame = nullptr;
// Holds the data after encoding
AVPacket *pkt = nullptr;

// Return value of the FFmpeg calls
int ret = 0;

// Input file
QFile inFile(in.filename);
// Output file
QFile outFile(outFilename);

3.2 Getting the encoder

The code below gets FFmpeg's default AAC encoder (which is not libfdk_aac).

const AVCodec *codec1 = avcodec_find_encoder(AV_CODEC_ID_AAC);

const AVCodec *codec2 = avcodec_find_encoder_by_name("aac");

// true
qDebug() << (codec1 == codec2);

// aac
qDebug() << codec1->name;

What we ultimately want, however, is libfdk_aac.

// Get the fdk-aac encoder
codec = avcodec_find_encoder_by_name("libfdk_aac");
if (!codec) {
    qDebug() << "encoder libfdk_aac not found";
    return;
}

3.3 Checking the sample format

Next, check whether the encoder supports the current sample format.

// Check the sample format
if (!check_sample_fmt(codec, in.sampleFmt)) {
    qDebug() << "Encoder does not support sample format"
             << av_get_sample_fmt_name(in.sampleFmt);
    return;
}

The check function check_sample_fmt is implemented as follows.

// Check whether the encoder codec supports the sample format sample_fmt
static int check_sample_fmt(const AVCodec *codec,
                            enum AVSampleFormat sample_fmt) {
    // sample_fmts is an array terminated by AV_SAMPLE_FMT_NONE
    const enum AVSampleFormat *p = codec->sample_fmts;
    while (*p != AV_SAMPLE_FMT_NONE) {
        if (*p == sample_fmt) return 1;
        p++;
    }
    return 0;
}

3.4 Creating the context

The 3 at the end of avcodec_alloc_context3 indicates that this is the third version of the API, replacing the earlier avcodec_alloc_context and avcodec_alloc_context2.

// Create the context
ctx = avcodec_alloc_context3(codec);
if (!ctx) {
    qDebug() << "avcodec_alloc_context3 error";
    return;
}

// Set the parameters
ctx->sample_fmt = in.sampleFmt;
ctx->sample_rate = in.sampleRate;
ctx->channel_layout = in.chLayout;
// Bit rate
ctx->bit_rate = 32000;
// Profile
ctx->profile = FF_PROFILE_AAC_HE_V2;

3.5 Opening the encoder

// Open the encoder
ret = avcodec_open2(ctx, codec, nullptr);
if (ret < 0) {
    ERROR_BUF(ret);
    qDebug() << "avcodec_open2 error" << errbuf;
    goto end;
}

To set options specific to libfdk_aac (such as vbr), pass them through the options parameter instead of nullptr.

AVDictionary *options = nullptr;
av_dict_set(&options, "vbr", "1", 0);
ret = avcodec_open2(ctx, codec, &options);
// Entries the encoder did not consume are left in the dictionary; free it afterwards
av_dict_free(&options);

3.6 Creating the AVFrame

The AVFrame holds the data before encoding.

// Create the AVFrame
frame = av_frame_alloc();
if (!frame) {
    qDebug() << "av_frame_alloc error";
    goto end;
}

// Number of sample frames (determined by frame_size)
frame->nb_samples = ctx->frame_size;
// Sample format
frame->format = ctx->sample_fmt;
// Channel layout
frame->channel_layout = ctx->channel_layout;
// Allocate the buffer inside the AVFrame
ret = av_frame_get_buffer(frame, 0);
if (ret < 0) {
    ERROR_BUF(ret);
    qDebug() << "av_frame_get_buffer error" << errbuf;
    goto end;
}

3.7 Creating the AVPacket

// Create the AVPacket
pkt = av_packet_alloc();
if (!pkt) {
    qDebug() << "av_packet_alloc error";
    goto end;
}

3.8 Opening the files

// Open the files
if (!inFile.open(QFile::ReadOnly)) {
    qDebug() << "file open error" << in.filename;
    goto end;
}
if (!outFile.open(QFile::WriteOnly)) {
    qDebug() << "file open error" << outFilename;
    goto end;
}

3.9 Encoding

// frame->linesize[0] is the size of the buffer
// Read the file data
while ((ret = inFile.read((char *) frame->data[0],
                          frame->linesize[0])) > 0) {
    // The last read may not fill the frame's buffer completely
    if (ret < frame->linesize[0]) {
        // Channel count
        int chs = av_get_channel_layout_nb_channels(frame->channel_layout);
        // Bytes per sample of one channel
        int bytes = av_get_bytes_per_sample((AVSampleFormat) frame->format);
        // Set the number of sample frames that are actually valid
        frame->nb_samples = ret / (chs * bytes);
    }

    // Encode
    if (encode(ctx, frame, pkt, outFile) < 0) {
        goto end;
    }
}

// Flush the encoder
encode(ctx, nullptr, pkt, outFile);

The encode function does the actual encoding. It is implemented as follows.

// Audio encoding
// Returns a negative number if an error occurred along the way
// Returns 0 if encoding completed normally
static int encode(AVCodecContext *ctx,
                  AVFrame *frame,
                  AVPacket *pkt,
                  QFile &outFile) {
    // Send the data to the encoder
    int ret = avcodec_send_frame(ctx, frame);
    if (ret < 0) {
        ERROR_BUF(ret);
        qDebug() << "avcodec_send_frame error" << errbuf;
        return ret;
    }

    while (true) {
        // Get the encoded data from the encoder
        ret = avcodec_receive_packet(ctx, pkt);
        // EAGAIN: the encoder needs more input (send another frame)
        // EOF: the encoder has been fully flushed
        if (ret == AVERROR(EAGAIN) || ret == AVERROR_EOF) {
            return 0;
        } else if (ret < 0) { // Any other error
            ERROR_BUF(ret);
            qDebug() << "avcodec_receive_packet error" << errbuf;
            return ret;
        }

        // Write the encoded data to the file
        outFile.write((char *) pkt->data, pkt->size);

        // Release the resources held by pkt
        av_packet_unref(pkt);
    }

    return 0;
}

3.10 Releasing resources

end:
    // Close the files
    inFile.close();
    outFile.close();
    // Release resources
    av_frame_free(&frame);
    av_packet_free(&pkt);
    avcodec_free_context(&ctx);

3.11 Calling the function

AudioEncodeSpec in;
in.filename = "F:/in.pcm";
in.sampleRate = 44100;
in.sampleFmt = AV_SAMPLE_FMT_S16;
in.chLayout = AV_CH_LAYOUT_STEREO;
FFmpegs::aacEncode(in, "F:/out.aac");

X. AAC Decoding

1. Decoding from the command line

ffmpeg -c:a libfdk_aac -i out.aac -f s16le out.pcm

2. Decoding in code

2.1 Function declaration

// Parameters of the decoded PCM
typedef struct {
    const char *filename;
    int sampleRate;
    AVSampleFormat sampleFmt;
    int chLayout;
} AudioDecodeSpec;

class FFmpegs {
public:
    FFmpegs();

    static void aacDecode(const char *inFilename,
                          AudioDecodeSpec &out);
};

2.2 Function implementation

The body of aacDecode is built up step by step in sections 2.3 to 2.12 below. It uses the AudioDecodeSpec struct and the FFmpegs class declared in 2.1.

2.3 Getting the decoder

// Get the decoder
codec = avcodec_find_decoder_by_name("libfdk_aac");
if (!codec) {
    qDebug() << "decoder libfdk_aac not found";
    return;
}

2.4 Initializing the parser context

// Initialize the parser context
parserCtx = av_parser_init(codec->id);
if (!parserCtx) {
    qDebug() << "av_parser_init error";
    return;
}

2.5 Creating the context

// Create the context
ctx = avcodec_alloc_context3(codec);
if (!ctx) {
    qDebug() << "avcodec_alloc_context3 error";
    goto end;
}

2.6 Creating the AVPacket

// Create the AVPacket
pkt = av_packet_alloc();
if (!pkt) {
    qDebug() << "av_packet_alloc error";
    goto end;
}

2.7 Creating the AVFrame

// Create the AVFrame
frame = av_frame_alloc();
if (!frame) {
    qDebug() << "av_frame_alloc error";
    goto end;
}

2.8 Opening the decoder

// Open the decoder
ret = avcodec_open2(ctx, codec, nullptr);
if (ret < 0) {
    ERROR_BUF(ret);
    qDebug() << "avcodec_open2 error" << errbuf;
    goto end;
}

2.9 Opening the files

// Open the files
if (!inFile.open(QFile::ReadOnly)) {
    qDebug() << "file open error:" << inFilename;
    goto end;
}
if (!outFile.open(QFile::WriteOnly)) {
    qDebug() << "file open error:" << out.filename;
    goto end;
}

2.10 Decoding

The actual decoding is done in the decode function.

static int decode(AVCodecContext *ctx,
                  AVPacket *pkt,
                  AVFrame *frame,
                  QFile &outFile) {
    // Send the compressed data to the decoder
    int ret = avcodec_send_packet(ctx, pkt);
    if (ret < 0) {
        ERROR_BUF(ret);
        qDebug() << "avcodec_send_packet error" << errbuf;
        return ret;
    }

    while (true) {
        // Get the decoded data
        ret = avcodec_receive_frame(ctx, frame);
        // EAGAIN: the decoder needs more input; EOF: the decoder has been flushed
        if (ret == AVERROR(EAGAIN) || ret == AVERROR_EOF) {
            return 0;
        } else if (ret < 0) {
            ERROR_BUF(ret);
            qDebug() << "avcodec_receive_frame error" << errbuf;
            return ret;
        }
        // Write the decoded data to the file
        outFile.write((char *) frame->data[0], frame->linesize[0]);
    }
}

2.11 Setting the output parameters

// Set the output parameters
out.sampleRate = ctx->sample_rate;
out.sampleFmt = ctx->sample_fmt;
out.chLayout = ctx->channel_layout;

2.12 Releasing resources

end:
    inFile.close();
    outFile.close();
    av_frame_free(&frame);
    av_packet_free(&pkt);
    av_parser_close(parserCtx);
    avcodec_free_context(&ctx);

2.13 Calling the function

AudioDecodeSpec out;
out.filename = "F:/out.pcm";
FFmpegs::aacDecode("F:/in.aac", out);
// 44100
qDebug() << out.sampleRate;
// s16
qDebug() << av_get_sample_fmt_name(out.sampleFmt);
// 2
qDebug() << av_get_channel_layout_nb_channels(out.chLayout);

Note: this concludes the introduction to recording, playing, encoding, and decoding audio with FFmpeg.

XI. LAME Code

1. Audio conversion: caf to mp3

#pragma mark - Convert caf to mp3
// Convert caf to mp3
-(void)transformCafToMP3{
    // The caf file must be recorded with two channels; otherwise the sound is wrong after conversion to MP3.
    // Settings used on the caf recording side:
    NSMutableDictionary * recordSetting = [NSMutableDictionary dictionary];
    [recordSetting setValue :[NSNumber numberWithInt:kAudioFormatLinearPCM] forKey:AVFormatIDKey];// audio format
    [recordSetting setValue:[NSNumber numberWithFloat:8000.0] forKey:AVSampleRateKey];// sample rate
    [recordSetting setValue:[NSNumber numberWithInt:2] forKey:AVNumberOfChannelsKey];// channel count, must be 2 here
    [recordSetting setValue :[NSNumber numberWithInt:16] forKey: AVLinearPCMBitDepthKey];// linear PCM bit depth
    [recordSetting setValue:[NSNumber numberWithInt:AVAudioQualityMin] forKey:AVEncoderAudioQualityKey];// audio quality

    // Code on the mp3 conversion side:
    NSString *urlStr = [NSSearchPathForDirectoriesInDomains(NSDocumentDirectory, NSUserDomainMask, YES) lastObject];

    cafUrlStr = [urlStr stringByAppendingPathComponent:kRecordAudioCafFile];// path of the caf file
    mp3UrlStr = [urlStr stringByAppendingPathComponent:kRecordAudioMP3file];// path where the mp3 file is stored

    @try {
        int read, write;

        FILE *pcm = fopen([cafUrlStr cStringUsingEncoding:1], "rb");  // source: the audio file to convert
        fseek(pcm, 4*1024, SEEK_CUR);                                   // skip file header
        FILE *mp3 = fopen([mp3UrlStr cStringUsingEncoding:1], "wb");  // output: the generated mp3 file

        const int PCM_SIZE = 8192;
        const int MP3_SIZE = 8192;
        short int pcm_buffer[PCM_SIZE*2];
        unsigned char mp3_buffer[MP3_SIZE];

        // Configure the encoder: same sample rate as the recording, default VBR mode
        lame_t lame = lame_init();
        lame_set_in_samplerate(lame, 8000.0);
        lame_set_VBR(lame, vbr_default);
        lame_init_params(lame);

        do {
            // Read up to PCM_SIZE stereo sample frames (2 shorts each)
            read = (int)fread(pcm_buffer, 2*sizeof(short int), PCM_SIZE, pcm);
            if (read == 0)
                // End of input: flush what the encoder still holds
                write = lame_encode_flush(lame, mp3_buffer, MP3_SIZE);
            else
                write = lame_encode_buffer_interleaved(lame, pcm_buffer, read, mp3_buffer, MP3_SIZE);

            fwrite(mp3_buffer, write, 1, mp3);

        } while (read != 0);

        lame_close(lame);
        fclose(mp3);
        fclose(pcm);
    }
    @catch (NSException *exception) {
        NSLog(@"%@",[exception description]);
    }
    @finally {

    }
}

Note: This article is for personal notes and study only. For the original, see 秒懂音视频开发.
For the source code, see CoderMJLee/audio-video-dev-tutorial