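A detailed guide to cross-platform audio recording with ffmpeg

  • ffmpeg development basics
  • Recording on Linux
  • Recording on Windows
  • Sample code
  • Engineering problems encountered
    • How do I encode and save the recording as an MP3 file?
    • Why does the build keep reporting "undefined reference to xxxx"?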

ffmpeg development basics

For this part, please refer to another post of mine: https://blog.csdn.net/FakeTaoZero/article/details/104001059

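Recording on Linux

On Linux, recording today is mostly done through either ALSA or PulseAudio. Most recording tutorials online use ALSA, but PulseAudio works as well. I only started working with this area recently and read up on it for this post, so some details may be inaccurate. Below is a brief account of how the two relate, which starts with the history of the Linux audio driver stack:

  • The early Linux audio driver was OSS. Its biggest problem was exclusive access to the sound card: once one application was playing or recording, another application either could not play or record at all, or it would interrupt the first one. As Linux evolved, ALSA emerged as the new driver technology
  • In theory ALSA can solve the exclusive-access problem, because it introduces logical devices that enable features such as automatic mixing. But it is quite low-level, and a single ALSA device is still exclusive; you can only work around this indirectly, by creating a logical device that refers to the physical one

PulseAudio, by contrast, is a sound server, a solution born out of the contention for the sound card. The idea is that PulseAudio controls the sound card and every other application connects to the PulseAudio server as a client. This layer of indirection hands mixing, recording, and similar jobs over to PulseAudio. For example, NetEase Cloud Music on Linux plays audio through PulseAudio; you can verify this by killing pulseaudio while music is playing, after which the sound stops. On some server systems, however, PulseAudio is not installed or started by default, so an application built on PulseAudio must make sure the pulseaudio process is running. Many Linux desktop environments do use PulseAudio.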

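ffmpeg supports both alsa and pulseaudio, but the corresponding libraries have to be compiled in at build time; otherwise the alsa/pulse entries may be missing from the device list (for example the output of ffmpeg -devices). So how do you confirm whether they were compiled in? Before building ffmpeg you run the ./configure script, which detects the available dependencies automatically and, when it finishes, prints a summary of the components enabled for this build. Check whether alsa/pulse appear under the demuxer/device entries there.

If you know the basics of ffmpeg development, you know that the AVInputFormat struct describes the input format and is obtained with av_find_input_format. For recording on Linux, pass "alsa" or "pulse" to it; the sound card name is then passed to avformat_open_input.

There are a few ways to get the sound card name. For ALSA you can use the arecord tool, and for PulseAudio the pacmd tool, as shown below: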

tao@tao-PC:~$ arecord -l
**** CAPTURE 硬體裝置清單 ****
card 0: PCH [HDA Intel PCH], device 0: ALC887-VD Analog [ALC887-VD Analog]
  子设备: 1/1
  子设备 #0: subdevice #0
card 0: PCH [HDA Intel PCH], device 2: ALC887-VD Alt Analog [ALC887-VD Alt Analog]
  子设备: 1/1
  子设备 #0: subdevice #0

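Two devices are listed here, each in the form: card {number1}: xxxx, device {number2}: xxxx.
These two numbers are what matter: whichever device we choose, its sound card name under ALSA is "hw:{number1},{number2}".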
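
The naming rule above can be written as a tiny helper. This is only a sketch: alsa_hw_name is a hypothetical function name of my own, not part of ffmpeg or ALSA.

```cpp
#include <string>

// Build the ALSA device name "hw:<card>,<device>" from the two numbers
// printed by `arecord -l`. alsa_hw_name is a hypothetical helper.
std::string alsa_hw_name(int card, int device)
{
    return "hw:" + std::to_string(card) + "," + std::to_string(device);
}
```

For the listing above, card 0 / device 2 would give "hw:0,2", which is the string handed to avformat_open_input as the device name.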

tao@tao-PC:~$ pacmd list-sources
2 source(s) available.
  * index: 0
  name: 
  driver: 
  flags: DECIBEL_VOLUME LATENCY DYNAMIC_LATENCY
  state: SUSPENDED
  suspend cause: IDLE 
  priority: 1030
  volume: front-left: 65536 / 100% / 0.00 dB,   front-right: 65536 / 100% / 0.00 dB
          balance 0.00
  base volume: 65536 / 100% / 0.00 dB
  volume steps: 65537
  muted: no
  current latency: 0.00 ms
  max rewind: 0 KiB
  sample spec: s16le 2ch 48000Hz
  channel map: front-left,front-right
               立体声
  used by: 0
  linked by: 0
  configured latency: 0.00 ms; range is 0.50 .. 341.33 ms
  monitor_of: 0
  card: 0 
  module: 7
  properties:
    device.description = "Monitor of 内置音频 数字立体声(IEC958)"
    device.class = "monitor"
    alsa.card = "0"
    alsa.card_name = "HDA Intel PCH"
    alsa.long_card_name = "HDA Intel PCH at 0xf7d00000 irq 27"
    alsa.driver_name = "snd_hda_intel"
    device.bus_path = "pci-0000:00:1b.0"
    sysfs.path = "/devices/pci0000:00/0000:00:1b.0/sound/card0"
    device.bus = "pci"
    device.vendor.id = "8086"
    device.vendor.name = "Intel Corporation"
    device.product.id = "8c20"
    device.product.name = "8 Series/C220 Series Chipset High Definition Audio Controller"
    device.form_factor = "internal"
    device.string = "0"
    module-udev-detect.discovered = "1"
    device.icon_name = "audio-card-pci"


    index: 1
  name: 
  driver: 
  flags: HARDWARE HW_MUTE_CTRL HW_VOLUME_CTRL DECIBEL_VOLUME LATENCY DYNAMIC_LATENCY
  state: SUSPENDED
  suspend cause: IDLE 
  priority: 9039
  volume: front-left: 9996 /  15% / -49.00 dB,   front-right: 9996 /  15% / -49.00 dB
          balance 0.00
  base volume: 6554 /  10% / -60.00 dB
  volume steps: 65537
  muted: no
  current latency: 0.00 ms
  max rewind: 0 KiB
  sample spec: s16le 2ch 48000Hz
  channel map: front-left,front-right
               立体声
  used by: 0
  linked by: 0
  configured latency: 0.00 ms; range is 0.50 .. 341.33 ms
  card: 0 
  module: 7
  properties:
    alsa.resolution_bits = "16"
    device.api = "alsa"
    device.class = "sound"
    alsa.class = "generic"
    alsa.subclass = "generic-mix"
    alsa.name = "ALC887-VD Analog"
    alsa.id = "ALC887-VD Analog"
    alsa.subdevice = "0"
    alsa.subdevice_name = "subdevice #0"
    alsa.device = "0"
    alsa.card = "0"
    alsa.card_name = "HDA Intel PCH"
    alsa.long_card_name = "HDA Intel PCH at 0xf7d00000 irq 27"
    alsa.driver_name = "snd_hda_intel"
    device.bus_path = "pci-0000:00:1b.0"
    sysfs.path = "/devices/pci0000:00/0000:00:1b.0/sound/card0"
    device.bus = "pci"
    device.vendor.id = "8086"
    device.vendor.name = "Intel Corporation"
    device.product.id = "8c20"
    device.product.name = "8 Series/C220 Series Chipset High Definition Audio Controller"
    device.form_factor = "internal"
    device.string = "front:0"
    device.buffering.buffer_size = "65536"
    device.buffering.fragment_size = "32768"
    device.access_mode = "mmap+timer"
    device.profile.name = "analog-stereo"
    device.profile.description = "模拟立体声"
    device.description = "内置音频 模拟立体声"
    alsa.mixer_name = "Realtek ALC887-VD"
    alsa.components = "HDA:10ec0887,10438576,00100302"
    module-udev-detect.discovered = "1"
    device.icon_name = "audio-card-pci"
  ports:
    analog-input-front-mic: 前麦克风 (priority 8500, latency offset 0 usec, available: no)
      properties:
        device.icon_name = "audio-input-microphone"
    analog-input-rear-mic: 后麦克风 (priority 8200, latency offset 0 usec, available: no)
      properties:
        device.icon_name = "audio-input-microphone"
    analog-input-linein: 输入插孔 (priority 8100, latency offset 0 usec, available: no)
      properties:

  active port: 

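For pulse, we only care about the two index values: pass in the index of whichever source you want to use.

Besides the sound card name, alsa and pulse each have their own AVOptions for setting the capture format. They can be listed with ffmpeg -h demuxer=alsa or ffmpeg -h demuxer=pulse; the options for both are shown below: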

Demuxer alsa [ALSA audio input]:
ALSA demuxer AVOptions:
  -sample_rate               .D......  (from 1 to INT_MAX) (default 48000)
  -channels                  .D......  (from 1 to INT_MAX) (default 2)

Demuxer pulse [Pulse audio input]:
Pulse demuxer AVOptions:
  -server                 .D...... set PulseAudio server
  -name                   .D...... set application name (default "Lavf57.83.100")
  -stream_name            .D...... set stream description (default "record")
  -sample_rate               .D...... set sample rate in Hz (from 1 to INT_MAX) (default 48000)
  -channels                  .D...... set number of audio channels (from 1 to INT_MAX) (default 2)
  -frame_size                .D...... set number of bytes per frame (from 1 to INT_MAX) (default 1024)
  -fragment_size             .D...... set buffering size, affects latency and cpu usage (from -1 to INT_MAX) (default -1)
  -wallclock                 .D...... set the initial pts using the current time (from -1 to 1) (default 1)

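That tells us which options exist. For alsa, though, the values need care as well, because the defaults are not necessarily usable: an ALSA hw: device does not resample automatically (pulse does). The values accepted by the two alsa options can be queried with the following command: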

tao@tao-PC:~$ arecord -D hw:0,0 --dump-hw-params
正在录音 WAVE 'stdin' : Unsigned 8 bit, 频率8000Hz, Mono
HW Params of device "hw:0,0":
--------------------
ACCESS:  MMAP_INTERLEAVED RW_INTERLEAVED
FORMAT:  S16_LE S32_LE
SUBFORMAT:  STD
SAMPLE_BITS: [16 32]
FRAME_BITS: [32 64]
CHANNELS: 2
RATE: [44100 192000]
PERIOD_TIME: (83 185760)
PERIOD_SIZE: [16 8192]
PERIOD_BYTES: [128 65536]
PERIODS: [2 32]
BUFFER_TIME: (166 371520)
BUFFER_SIZE: [32 16384]
BUFFER_BYTES: [128 65536]
TICK_TIME: ALL
--------------------
arecord: set_params:1299: 样本格式不可用
Available formats:
- S16_LE
- S32_LE

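By specifying the sound card name we get the hardware parameters of that card, including CHANNELS (the channel count) and RATE (the sample rate, i.e. sample_rate).
The sample rate is shown as a range, but that does not mean every value in the range is valid. In my experience only a few specific values work,
such as 44100/48000/192000, and any other value ends up as whichever of those three is closest. You can check this from the ffmpeg command line (replace <sample_rate> with the value to test):
ffmpeg -f alsa -sample_rate <sample_rate> -i hw:0,0 -t 10 test.wav -v trace
The command output shows the sample rate that was finally used.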
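
The "closest of a few fixed values" behaviour I observed can be modelled roughly as follows. This is only a sketch of the observation, not what ALSA does internally: the rate list is an assumption taken from my own card, and nearest_observed_rate is a hypothetical name.

```cpp
#include <array>
#include <cstdlib>

// Rates that worked on the card used in this post; other hardware may
// accept a different set. This list is an assumption, not an ALSA constant.
constexpr std::array<int, 3> kObservedRates = {44100, 48000, 192000};

// Return the observed rate closest to the requested one.
// On a tie the lower rate wins, because only a strictly smaller
// distance replaces the current best.
int nearest_observed_rate(int requested)
{
    int best = kObservedRates[0];
    for (int rate : kObservedRates) {
        if (std::abs(rate - requested) < std::abs(best - requested)) {
            best = rate;
        }
    }
    return best;
}
```

Under this model, asking for 16000 would land on 44100, which is why the sample code further down captures at 44100 and resamples to 16000 in software.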

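Recording on Windows

On Windows, the most widely used and simplest recording API is waveIn/waveOut. DirectShow (dshow) sits above it, but its API is considerably harder to use. ffmpeg uses dshow, and as with recording on Linux, the AVInputFormat is obtained with av_find_input_format("dshow").

The dshow AVOptions are as follows: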

Demuxer dshow [DirectShow capture]:
dshow indev AVOptions:
  -video_size         .D....... set video size given a string such as 640x480 or hd720.
  -pixel_format          .D....... set video pixel format (default none)
  -framerate              .D....... set video frame rate
  -sample_rate               .D....... set audio sample rate (from 0 to INT_MAX) (default 0)
  -sample_size               .D....... set audio sample size (from 0 to 16) (default 0)
  -channels                  .D....... set number of audio channels, such as 1 or 2 (from 0 to INT_MAX) (default 0)
  -audio_buffer_size         .D....... set audio device buffer latency size in milliseconds (default is the device's default) (from 0 to INT_MAX) (default 0)
  -list_devices          .D....... list available devices (default false)
  -list_options          .D....... list available options for specified device (default false)
  -video_device_number         .D....... set video device number for devices with same name (starts at 0) (from 0 to INT_MAX) (default 0)
  -audio_device_number         .D....... set audio device number for devices with same name (starts at 0) (from 0 to INT_MAX) (default 0)
  -crossbar_video_input_pin_number         .D....... set video input pin number for crossbar device (from -1 to INT_MAX) (default -1)
  -crossbar_audio_input_pin_number         .D....... set audio input pin number for crossbar device (from -1 to INT_MAX) (default -1)
  -show_video_device_dialog     .D....... display property dialog for video capture device (default false)
  -show_audio_device_dialog     .D....... display property dialog for audio capture device (default false)
  -show_video_crossbar_connection_dialog     .D....... display property dialog for crossbar connecting pins filter on video device (default false)
  -show_audio_crossbar_connection_dialog     .D....... display property dialog for crossbar connecting pins filter on audio device (default false)
  -show_analog_tv_tuner_dialog     .D....... display property dialog for analog tuner filter (default false)
  -show_analog_tv_tuner_audio_dialog     .D....... display property dialog for analog tuner audio filter (default false)
  -audio_device_load      .D....... load audio capture filter device (and properties) from file
  -audio_device_save      .D....... save audio capture filter device (and properties) to file
  -video_device_load      .D....... load video capture filter device (and properties) from file
  -video_device_save      .D....... save video capture filter device (and properties) to file

The corresponding sound card name can be listed with the following command:

PS C:\Users\Tao> ffmpeg -list_devices true -f dshow -i dummy
ffmpeg version N-94652-g808a6717e0 Copyright (c) 2000-2019 the FFmpeg developers
  built with gcc 9.1.1 (GCC) 20190807
  configuration: --disable-static --enable-shared --enable-gpl --enable-version3 --enable-sdl2 --enable-fontconfig --enable-gnutls --enable-iconv --enable-libass --enable-libdav1d --enable-libbluray --enable-libfreetype --enable-libmp3lame --enable-libopencore-amrnb --enable-libopencore-amrwb --enable-libopenjpeg --enable-libopus --enable-libshine --enable-libsnappy --enable-libsoxr --enable-libtheora --enable-libtwolame --enable-libvpx --enable-libwavpack --enable-libwebp --enable-libx264 --enable-libx265 --enable-libxml2 --enable-libzimg --enable-lzma --enable-zlib --enable-gmp --enable-libvidstab --enable-libvorbis --enable-libvo-amrwbenc --enable-libmysofa --enable-libspeex --enable-libxvid --enable-libaom --enable-libmfx --enable-amf --enable-ffnvcodec --enable-cuvid --enable-d3d11va --enable-nvenc --enable-nvdec --enable-dxva2 --enable-avisynth --enable-libopenmpt
  libavutil      56. 33.100 / 56. 33.100
  libavcodec     58. 55.101 / 58. 55.101
  libavformat    58. 31.104 / 58. 31.104
  libavdevice    58.  9.100 / 58.  9.100
  libavfilter     7. 58.101 /  7. 58.101
  libswscale      5.  6.100 /  5.  6.100
  libswresample   3.  6.100 /  3.  6.100
  libpostproc    55.  6.100 / 55.  6.100
[dshow @ 000001a0e0fc02c0] DirectShow video devices (some may be both video and audio devices)
[dshow @ 000001a0e0fc02c0] Could not enumerate video devices (or none found).
[dshow @ 000001a0e0fc02c0] DirectShow audio devices
[dshow @ 000001a0e0fc02c0]  "麦克风阵列 (Realtek High Definition Audio)"
[dshow @ 000001a0e0fc02c0]     Alternative name "@device_cm_{33D9A762-90C8-11D0-BD43-00A0C911CE86}\wave_{28679435-B3DB-4479-90B4-8EBC4EB7C18E}"
dummy: Immediate exit requested

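Note: if the Chinese text is garbled when you run the command on Windows, run chcp 65001 first to switch the terminal code page to UTF-8, then run the command again.
Here the sound card name is "麦克风阵列 (Realtek High Definition Audio)", but the string actually passed in has to be u8"audio=麦克风阵列 (Realtek High Definition Audio)".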
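
A small helper makes the "audio=" prefix hard to forget. This is a sketch; dshow_audio_input is a hypothetical name, and the friendly name is whatever -list_devices printed on your machine.

```cpp
#include <string>

// DirectShow audio inputs are addressed as "audio=<friendly name>".
// The friendly name must be UTF-8 encoded when it is passed to ffmpeg.
// dshow_audio_input is a hypothetical helper, not an ffmpeg function.
std::string dshow_audio_input(const std::string& friendly_name)
{
    return "audio=" + friendly_name;
}
```

The result is what goes into avformat_open_input as the device name when the input format is "dshow".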

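Sample code

The code below is a simple Linux recording example. It converts 44.1 kHz stereo raw audio to 16 kHz mono and saves the result as a WAV file. To save another file type, change the encoder. To use the code on Windows, just change the devname and "alsa" arguments to their Windows equivalents and it will record there as well.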

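#include <chrono>
#include <string>
extern "C" {
#include <libavcodec/avcodec.h>
#include <libavdevice/avdevice.h>
#include <libavformat/avformat.h>
#include <libavutil/channel_layout.h>
#include <libswresample/swresample.h>
}

#include <cstdio>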

void call_back(void* avcl, int level, const char* fmt, va_list vl)
{
    vfprintf(stdout, fmt, vl);
}

int main()
{
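    av_log_set_level(AV_LOG_TRACE);
    av_log_set_callback(call_back);
    avdevice_register_all();
    // The next two calls are only needed before FFmpeg 4.0; they are
    // deprecated since then and removed in 5.0.
    avcodec_register_all();
    av_register_all();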

    AVInputFormat* in_fmt = nullptr;
    AVFormatContext* in_ctx = nullptr;
    AVFormatContext* out_ctx = nullptr;
    AVCodecContext* in_decode_ctx = nullptr;
    AVCodecContext* out_encode_ctx = nullptr;
    SwrContext* conv_ctx = nullptr;
    AVDictionary* opt_dict = nullptr;
    std::string devname = u8"hw:0,0";   // name of the device to open
    std::string filename = "test.wav";  // name of the file to save
    int stream_idx = -1;

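    // Input format and options
    {
        in_fmt = av_find_input_format("alsa");
        in_ctx = avformat_alloc_context();

        av_dict_set(&opt_dict, "sample_rate", "44100", 0);
        // sample_size is a dshow option; the alsa demuxer does not consume it
        av_dict_set(&opt_dict, "sample_size", "16", 0);
        av_dict_set(&opt_dict, "channels", "2", 0);
    }

    // Open the device
    {
        if (avformat_open_input(&in_ctx, devname.c_str(), in_fmt, &opt_dict) < 0)
        {
            fprintf(stderr, "failed to open %s\n", devname.c_str());
            av_dict_free(&opt_dict);
            return 1;
        }
        // Options that were not consumed are left in the dictionary
        av_dict_free(&opt_dict);
        avformat_find_stream_info(in_ctx, nullptr);
    }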

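    // Find the audio stream
    {
        for (unsigned int i = 0; i < in_ctx->nb_streams; i++)
        {
            if (in_ctx->streams[i]->codecpar->codec_type == AVMediaType::AVMEDIA_TYPE_AUDIO)
            {
                stream_idx = static_cast<int>(i);
                break;
            }
        }
        if (stream_idx < 0)
        {
            fprintf(stderr, "no audio stream found\n");
            avformat_close_input(&in_ctx);
            return 1;
        }
        av_dump_format(in_ctx, stream_idx, devname.c_str(), 0);
    }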

    // Decoder setup
    {
        AVCodec* codec = avcodec_find_decoder(in_ctx->streams[stream_idx]->codecpar->codec_id);
        in_decode_ctx = avcodec_alloc_context3(codec);
        avcodec_parameters_to_context(in_decode_ctx, in_ctx->streams[stream_idx]->codecpar);
        in_decode_ctx->channel_layout = av_get_default_channel_layout(2);
        avcodec_open2(in_decode_ctx, codec, nullptr);
    }

    // Encoder setup
    {
        AVCodec* codec = avcodec_find_encoder(AVCodecID::AV_CODEC_ID_PCM_S16LE);
        out_encode_ctx = avcodec_alloc_context3(codec);
        out_encode_ctx->sample_rate = 16000;
        out_encode_ctx->channels = 1;
        out_encode_ctx->channel_layout = av_get_channel_layout("mono");
        out_encode_ctx->sample_fmt = AVSampleFormat::AV_SAMPLE_FMT_S16;
        out_encode_ctx->time_base = AVRational{ 1, 16000 };
        avcodec_open2(out_encode_ctx, codec, nullptr);
    }

    // Output context setup
    {
        AVCodec* codec = avcodec_find_encoder(AVCodecID::AV_CODEC_ID_PCM_S16LE);
        avformat_alloc_output_context2(&out_ctx, nullptr, nullptr, filename.c_str());
        AVStream* out_stream = avformat_new_stream(out_ctx, codec);
        // Copy the encoder parameters to the output stream
        avcodec_parameters_from_context(out_stream->codecpar, out_encode_ctx);
        out_stream->time_base = AVRational{ 1, 16000 };
        av_dump_format(out_ctx, 0, filename.c_str(), 1);
        // AVFMT_NOFILE is a flag of the output format, not of the context
        if (!(out_ctx->oformat->flags & AVFMT_NOFILE))
        {
            avio_open2(&out_ctx->pb, filename.c_str(), AVIO_FLAG_WRITE, nullptr, nullptr);
        }
    }

    // Write the file header
    {
        avformat_write_header(out_ctx, nullptr);
    }

    // Resampler setup
    {
        conv_ctx = swr_alloc_set_opts(
            nullptr,
            AV_CH_LAYOUT_MONO,
            AVSampleFormat::AV_SAMPLE_FMT_S16,
            16000,
            AV_CH_LAYOUT_STEREO,
            AVSampleFormat::AV_SAMPLE_FMT_S16,
            44100,
            0,
            nullptr);
        swr_init(conv_ctx);
    }

    AVPacket* in_pkt = nullptr;
    AVPacket* out_pkt = nullptr;
    AVFrame* raw_frm = nullptr;
    AVFrame* conv_frm = nullptr;

    {
        in_pkt = av_packet_alloc();
        out_pkt = av_packet_alloc();
        raw_frm = av_frame_alloc();
        conv_frm = av_frame_alloc();
    }

    {
        int ret = 0;
        auto start_time = std::chrono::steady_clock::now();
        while (av_read_frame(in_ctx, in_pkt) == 0)
        {
            // Decode; each later step runs only if the previous one succeeded
            ret = avcodec_send_packet(in_decode_ctx, in_pkt);
            if (ret == 0)
            {
                ret = avcodec_receive_frame(in_decode_ctx, raw_frm);
            }
            if (ret == 0)
            {
                // Describe the frame that will hold the resampled data
                conv_frm->pts = raw_frm->pts;
                conv_frm->sample_rate = 16000;
                conv_frm->channels = 1;
                conv_frm->channel_layout = av_get_channel_layout("mono");
                conv_frm->format = AVSampleFormat::AV_SAMPLE_FMT_S16;
                // Resample
                ret = swr_convert_frame(conv_ctx, conv_frm, raw_frm);
            }
            if (ret == 0)
            {
                // Encode the resampled data
                ret = avcodec_send_frame(out_encode_ctx, conv_frm);
            }
            if (ret == 0)
            {
                ret = avcodec_receive_packet(out_encode_ctx, out_pkt);
            }
            if (ret == 0)
            {
                // Write the encoded data
                ret = av_write_frame(out_ctx, out_pkt);
            }
            av_packet_unref(in_pkt);
            av_packet_unref(out_pkt);
            av_frame_unref(raw_frm);
            av_frame_unref(conv_frm);
            auto end_time = std::chrono::steady_clock::now();
            // Stop after recording for 20 s
            if (std::chrono::duration_cast<std::chrono::seconds>(end_time - start_time).count() >= 20)
            {
                break;
            }
        }

        {
            av_write_trailer(out_ctx);
        }

        {
            swr_free(&conv_ctx);
        }

        {
            av_packet_free(&in_pkt);
            av_packet_free(&out_pkt);
            av_frame_free(&raw_frm);
            av_frame_free(&conv_frm);
        }

        {
            avcodec_free_context(&in_decode_ctx);
            avcodec_free_context(&out_encode_ctx);
        }

        {
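            // An opened input must be closed, not just freed
            avformat_close_input(&in_ctx);
            // Close the output file before freeing the output context
            if (!(out_ctx->oformat->flags & AVFMT_NOFILE))
            {
                avio_closep(&out_ctx->pb);
            }
            avformat_free_context(out_ctx);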
        }
    }
}
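
As a side note on the 44.1 kHz to 16 kHz conversion: the number of samples that comes out of the resampler for a given input chunk follows from the rate ratio, rounded up. The sketch below does that arithmetic with plain integers. resampled_count is a hypothetical helper; FFmpeg's own resampling example does a similar calculation with av_rescale_rnd and AV_ROUND_UP and additionally accounts for the resampler's internal delay, which this sketch ignores.

```cpp
#include <cstdint>

// Upper bound on the number of output samples produced when in_samples
// samples at in_rate are resampled to out_rate (resampler delay ignored).
// Integer ceiling division: ceil(in_samples * out_rate / in_rate).
std::int64_t resampled_count(std::int64_t in_samples, int in_rate, int out_rate)
{
    return (in_samples * out_rate + in_rate - 1) / in_rate;
}
```

For example, one second of input (44100 samples) maps to exactly 16000 output samples, while a 1024-sample chunk maps to at most 372.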

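Engineering problems encountered

How do I encode and save the recording as an MP3 file?

  • First run ffmpeg -encoders to find the encoder for MP3; its name is libmp3lame
  • Run ffmpeg -h encoder=libmp3lame to see the sample rates, sample formats, and channel layouts the encoder supports, and set the parameters accordingly
  • If the raw audio's parameters differ from the encoder's, convert with an SwrContext before feeding the encoder
  • The encoder may report that frame_size and the number of samples do not match. In that case you need a FIFO queue; for an implementation, see this example: https://github.com/FFmpeg/FFmpeg/blob/master/doc/examples/transcode_aac.c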
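
To show what the FIFO in the last step is for, here is a minimal standard-library sketch of the idea: samples go in with whatever chunk size the resampler produced and come out in chunks of exactly frame_size. SampleFifo is a hypothetical stand-in of my own; in real code the linked example uses FFmpeg's AVAudioFifo, and frame_size would come from the opened encoder context.

```cpp
#include <cstddef>
#include <cstdint>
#include <deque>
#include <vector>

// Minimal mono S16 sample FIFO. SampleFifo is a hypothetical stand-in
// for AVAudioFifo, used only to illustrate the regrouping.
class SampleFifo {
public:
    explicit SampleFifo(std::size_t frame_size) : frame_size_(frame_size) {}

    // Append a chunk of any length.
    void push(const std::vector<std::int16_t>& samples)
    {
        buffer_.insert(buffer_.end(), samples.begin(), samples.end());
    }

    // Move exactly frame_size samples into out.
    // Returns false (and leaves out untouched) if not enough are buffered.
    bool pop_frame(std::vector<std::int16_t>& out)
    {
        if (buffer_.size() < frame_size_) {
            return false;
        }
        const auto end = buffer_.begin() + static_cast<std::ptrdiff_t>(frame_size_);
        out.assign(buffer_.begin(), end);
        buffer_.erase(buffer_.begin(), end);
        return true;
    }

    std::size_t size() const { return buffer_.size(); }

private:
    std::size_t frame_size_;
    std::deque<std::int16_t> buffer_;
};
```

In the recording loop, each resampled chunk would be pushed, and the encoder would be fed once per successful pop_frame. Whatever is left in the FIFO at the end still has to be handled (padded or sent as a short final frame, depending on the encoder).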

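Why does the build keep reporting "undefined reference to xxxx"?

This is caused by the order in which the libraries are linked. The ffmpeg libraries depend on one another, so a library has to appear on the link line before the libraries it depends on (for example -lavdevice -lavformat -lavcodec -lswresample -lavutil). Adjusting the link order resolves the error.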
