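For this part, please refer to my other blog post: https://blog.csdn.net/FakeTaoZero/article/details/104001059
On Linux, recording is currently done mostly through either ALSA or PulseAudio. Most recording tutorials online use ALSA, but PulseAudio works as well. I only started working with this area recently and looked into it for that reason, so what follows may not be entirely accurate. Below is a brief account of how the two relate, which really has to start from the Linux audio driver stack:
PulseAudio, by contrast, is a sound server, a solution that grew out of the problem of applications contending for the sound card. The idea is that PulseAudio controls the sound card while other applications connect to the PulseAudio server as clients. This layer of indirection hands mixing, recording and similar functions over to PulseAudio. For example, NetEase Cloud Music on Linux plays audio through PulseAudio, which you can verify by killing pulseaudio during playback: the sound stops. However, on some server systems PulseAudio is not installed or started by default, so before using an application built on PulseAudio you must make sure the pulseaudio process is running. Many Linux desktop environments do use PulseAudio.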
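FFmpeg supports both ALSA and PulseAudio, but the relevant libraries have to be compiled in at build time; otherwise the alsa/pulse entries may be missing from the device list. So how do you confirm whether they were compiled in? Before building FFmpeg you run the ./configure script, which automatically detects the dependency libraries and, once finished, prints a summary of the components enabled for this build. Look under the demuxer/device entries of that summary and check whether alsa/pulse are present.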
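For an already built binary, the same check can be made against a device listing. The sketch below is only an illustration: it assumes the listing text comes from `ffmpeg -devices` and that each device row has the form ` DE alsa   ALSA audio output`, where `D` in the first column means demuxing (input) is supported. The function name `has_input_device` is hypothetical, not part of FFmpeg.

```cpp
#include <sstream>
#include <string>

// Returns true if `listing` (text in the style printed by `ffmpeg -devices`)
// contains `name` as a device that supports demuxing, i.e. capture.
// Assumed row layout: "<flags> <name> <description>", flags being D, E or DE.
bool has_input_device(const std::string& listing, const std::string& name)
{
    std::istringstream lines(listing);
    std::string line;
    while (std::getline(lines, line)) {
        std::istringstream fields(line);
        std::string flags, device;
        if (!(fields >> flags >> device))
            continue;  // blank line or a line with a single word
        // Skip legend rows such as "D. = Demuxing supported".
        if (flags.find_first_not_of("DE") != std::string::npos)
            continue;
        if (device == name && flags.find('D') != std::string::npos)
            return true;
    }
    return false;
}
```

If the function returns false for "alsa" or "pulse", the build most likely lacks that input device and FFmpeg needs to be reconfigured with the corresponding development library installed.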
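Anyone familiar with the basics of FFmpeg development knows that AVInputFormat is the structure describing an input format and that it can be obtained with av_find_input_format. For recording on Linux, the argument is "alsa" or "pulse", and the device name is then passed to avformat_open_input.
There are several ways to find the device name. For ALSA you can use the arecord tool, and for PulseAudio you can use the pacmd tool. Examples of both follow: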
tao@tao-PC:~$ arecord -l
**** CAPTURE 硬體裝置清單 ****
card 0: PCH [HDA Intel PCH], device 0: ALC887-VD Analog [ALC887-VD Analog]
子设备: 1/1
子设备 #0: subdevice #0
card 0: PCH [HDA Intel PCH], device 2: ALC887-VD Alt Analog [ALC887-VD Alt Analog]
子设备: 1/1
子设备 #0: subdevice #0
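Two devices are listed here. Each entry has the form: card {number 1}: xxxx, device {number 2}: xxxx.
These two numbers are what matter. Whichever device we choose, its ALSA device name is "hw:{number 1},{number 2}".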
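The mapping from an `arecord -l` entry to a device name can be sketched as follows. This is a minimal illustration that assumes the "card N: ..., device M: ..." line layout shown above; the helper names `alsa_hw_name` and `alsa_hw_name_from_line` are made up for this example.

```cpp
#include <cstdio>
#include <string>

// Builds the ALSA device name "hw:<card>,<device>" from the two numbers.
std::string alsa_hw_name(int card, int device)
{
    return "hw:" + std::to_string(card) + "," + std::to_string(device);
}

// Extracts the card and device numbers from one device row of `arecord -l`
// and returns the matching name, or "" if the line is not a device row.
std::string alsa_hw_name_from_line(const std::string& line)
{
    int card = -1;
    int device = -1;
    if (std::sscanf(line.c_str(), "card %d:", &card) != 1)
        return "";
    const std::string marker = ", device ";
    const std::size_t pos = line.find(marker);
    if (pos == std::string::npos)
        return "";
    if (std::sscanf(line.c_str() + pos + marker.size(), "%d", &device) != 1)
        return "";
    return alsa_hw_name(card, device);
}
```

For the second entry in the listing above (card 0, device 2) this yields "hw:0,2", which is the string passed to avformat_open_input.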
tao@tao-PC:~$ pacmd list-sources
2 source(s) available.
* index: 0
name:
driver:
flags: DECIBEL_VOLUME LATENCY DYNAMIC_LATENCY
state: SUSPENDED
suspend cause: IDLE
priority: 1030
volume: front-left: 65536 / 100% / 0.00 dB, front-right: 65536 / 100% / 0.00 dB
balance 0.00
base volume: 65536 / 100% / 0.00 dB
volume steps: 65537
muted: no
current latency: 0.00 ms
max rewind: 0 KiB
sample spec: s16le 2ch 48000Hz
channel map: front-left,front-right
立体声
used by: 0
linked by: 0
configured latency: 0.00 ms; range is 0.50 .. 341.33 ms
monitor_of: 0
card: 0
module: 7
properties:
device.description = "Monitor of 内置音频 数字立体声(IEC958)"
device.class = "monitor"
alsa.card = "0"
alsa.card_name = "HDA Intel PCH"
alsa.long_card_name = "HDA Intel PCH at 0xf7d00000 irq 27"
alsa.driver_name = "snd_hda_intel"
device.bus_path = "pci-0000:00:1b.0"
sysfs.path = "/devices/pci0000:00/0000:00:1b.0/sound/card0"
device.bus = "pci"
device.vendor.id = "8086"
device.vendor.name = "Intel Corporation"
device.product.id = "8c20"
device.product.name = "8 Series/C220 Series Chipset High Definition Audio Controller"
device.form_factor = "internal"
device.string = "0"
module-udev-detect.discovered = "1"
device.icon_name = "audio-card-pci"
index: 1
name:
driver:
flags: HARDWARE HW_MUTE_CTRL HW_VOLUME_CTRL DECIBEL_VOLUME LATENCY DYNAMIC_LATENCY
state: SUSPENDED
suspend cause: IDLE
priority: 9039
volume: front-left: 9996 / 15% / -49.00 dB, front-right: 9996 / 15% / -49.00 dB
balance 0.00
base volume: 6554 / 10% / -60.00 dB
volume steps: 65537
muted: no
current latency: 0.00 ms
max rewind: 0 KiB
sample spec: s16le 2ch 48000Hz
channel map: front-left,front-right
立体声
used by: 0
linked by: 0
configured latency: 0.00 ms; range is 0.50 .. 341.33 ms
card: 0
module: 7
properties:
alsa.resolution_bits = "16"
device.api = "alsa"
device.class = "sound"
alsa.class = "generic"
alsa.subclass = "generic-mix"
alsa.name = "ALC887-VD Analog"
alsa.id = "ALC887-VD Analog"
alsa.subdevice = "0"
alsa.subdevice_name = "subdevice #0"
alsa.device = "0"
alsa.card = "0"
alsa.card_name = "HDA Intel PCH"
alsa.long_card_name = "HDA Intel PCH at 0xf7d00000 irq 27"
alsa.driver_name = "snd_hda_intel"
device.bus_path = "pci-0000:00:1b.0"
sysfs.path = "/devices/pci0000:00/0000:00:1b.0/sound/card0"
device.bus = "pci"
device.vendor.id = "8086"
device.vendor.name = "Intel Corporation"
device.product.id = "8c20"
device.product.name = "8 Series/C220 Series Chipset High Definition Audio Controller"
device.form_factor = "internal"
device.string = "front:0"
device.buffering.buffer_size = "65536"
device.buffering.fragment_size = "32768"
device.access_mode = "mmap+timer"
device.profile.name = "analog-stereo"
device.profile.description = "模拟立体声"
device.description = "内置音频 模拟立体声"
alsa.mixer_name = "Realtek ALC887-VD"
alsa.components = "HDA:10ec0887,10438576,00100302"
module-udev-detect.discovered = "1"
device.icon_name = "audio-card-pci"
ports:
analog-input-front-mic: 前麦克风 (priority 8500, latency offset 0 usec, available: no)
properties:
device.icon_name = "audio-input-microphone"
analog-input-rear-mic: 后麦克风 (priority 8200, latency offset 0 usec, available: no)
properties:
device.icon_name = "audio-input-microphone"
analog-input-linein: 输入插孔 (priority 8100, latency offset 0 usec, available: no)
properties:
active port:
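For pulse, we only care about the two index values: pass in the index of whichever source you want to use.
Once we have the device name, alsa and pulse each have their own AVOptions for configuring the capture format. These can be listed with ffmpeg -h demuxer=alsa or ffmpeg -h demuxer=pulse. The options for both are shown below: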
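Picking the index values out of the `pacmd list-sources` output can be sketched like this. It assumes the layout shown above, where each source starts with an "index: N" line and the default source is additionally prefixed with "*". The function name `pulse_source_indexes` is hypothetical.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Collects the source indexes from text in the style of `pacmd list-sources`.
std::vector<int> pulse_source_indexes(const std::string& listing)
{
    std::vector<int> indexes;
    std::istringstream lines(listing);
    std::string line;
    while (std::getline(lines, line)) {
        std::istringstream fields(line);
        std::string word;
        fields >> word;
        if (word == "*")   // marker in front of the default source
            fields >> word;
        int index = 0;
        if (word == "index:" && (fields >> index))
            indexes.push_back(index);
    }
    return indexes;
}
```

Following the text above, the chosen index is then converted to a string, for example with std::to_string, and passed as the device name when the "pulse" input format is opened.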
Demuxer alsa [ALSA audio input]:
ALSA demuxer AVOptions:
-sample_rate .D...... (from 1 to INT_MAX) (default 48000)
-channels .D...... (from 1 to INT_MAX) (default 2)
Demuxer pulse [Pulse audio input]:
Pulse demuxer AVOptions:
-server .D...... set PulseAudio server
-name .D...... set application name (default "Lavf57.83.100")
-stream_name .D...... set stream description (default "record")
-sample_rate .D...... set sample rate in Hz (from 1 to INT_MAX) (default 48000)
-channels .D...... set number of audio channels (from 1 to INT_MAX) (default 2)
-frame_size .D...... set number of bytes per frame (from 1 to INT_MAX) (default 1024)
-fragment_size .D...... set buffering size, affects latency and cpu usage (from -1 to INT_MAX) (default -1)
-wallclock .D...... set the initial pts using the current time (from -1 to 1) (default 1)
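The listing above tells us which options exist. For alsa, however, the option values also need attention, because the defaults are not guaranteed to work: an ALSA hw device does not resample automatically (pulse does). The values accepted for the two alsa options can be inspected with the following command: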
tao@tao-PC:~$ arecord -D hw:0,0 --dump-hw-params
正在录音 WAVE 'stdin' : Unsigned 8 bit, 频率8000Hz, Mono
HW Params of device "hw:0,0":
--------------------
ACCESS: MMAP_INTERLEAVED RW_INTERLEAVED
FORMAT: S16_LE S32_LE
SUBFORMAT: STD
SAMPLE_BITS: [16 32]
FRAME_BITS: [32 64]
CHANNELS: 2
RATE: [44100 192000]
PERIOD_TIME: (83 185760)
PERIOD_SIZE: [16 8192]
PERIOD_BYTES: [128 65536]
PERIODS: [2 32]
BUFFER_TIME: (166 371520)
BUFFER_SIZE: [32 16384]
BUFFER_BYTES: [128 65536]
TICK_TIME: ALL
--------------------
arecord: set_params:1299: 样本格式不可用
Available formats:
- S16_LE
- S32_LE
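By specifying the device name, we obtain the hardware parameters of that device. The entries of interest are CHANNELS, the channel count, and RATE, the sample rate (sample_rate).
The sample rate is shown as a range, but that does not mean every value inside the range is valid. In my experience only a few specific values actually work,
for example 44100/48000/192000, and any other value is replaced by whichever of those three is closest. You can verify this with the ffmpeg command line, as follows:
ffmpeg -f alsa -sample_rate <sample_rate> -i hw:0,0 -t 10 test.wav -v trace
The command output shows the sample rate that was finally used.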
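The snapping behaviour described above can be modelled with a few lines of code. This is only a sketch of the observation reported in this post, not ALSA's actual negotiation logic, and the function name `nearest_supported_rate` is invented for illustration.

```cpp
#include <cstdlib>
#include <vector>

// Returns the entry of `supported` closest to `requested`.
// With an empty list the requested rate is returned unchanged.
int nearest_supported_rate(int requested, const std::vector<int>& supported)
{
    int best = supported.empty() ? requested : supported.front();
    for (int rate : supported) {
        if (std::abs(rate - requested) < std::abs(best - requested))
            best = rate;
    }
    return best;
}
```

With the three rates mentioned above, a request for 16000 Hz would end up at 44100 Hz under this model, which is why the recording example later in this post captures at 44.1 kHz and resamples to 16 kHz in software.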
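On Windows, the most widely used and simplest recording API at present is waveIn/waveOut. On top of it there is also DirectShow (dshow), whose API is rather difficult. FFmpeg uses dshow. As with recording on Linux, the AVInputFormat is obtained with av_find_input_format("dshow").
The AVOptions of dshow are as follows: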
Demuxer dshow [DirectShow capture]:
dshow indev AVOptions:
-video_size .D....... set video size given a string such as 640x480 or hd720.
-pixel_format .D....... set video pixel format (default none)
-framerate .D....... set video frame rate
-sample_rate .D....... set audio sample rate (from 0 to INT_MAX) (default 0)
-sample_size .D....... set audio sample size (from 0 to 16) (default 0)
-channels .D....... set number of audio channels, such as 1 or 2 (from 0 to INT_MAX) (default 0)
-audio_buffer_size .D....... set audio device buffer latency size in milliseconds (default is the device's default) (from 0 to INT_MAX) (default 0)
-list_devices .D....... list available devices (default false)
-list_options .D....... list available options for specified device (default false)
-video_device_number .D....... set video device number for devices with same name (starts at 0) (from 0 to INT_MAX) (default 0)
-audio_device_number .D....... set audio device number for devices with same name (starts at 0) (from 0 to INT_MAX) (default 0)
-crossbar_video_input_pin_number .D....... set video input pin number for crossbar device (from -1 to INT_MAX) (default -1)
-crossbar_audio_input_pin_number .D....... set audio input pin number for crossbar device (from -1 to INT_MAX) (default -1)
-show_video_device_dialog .D....... display property dialog for video capture device (default false)
-show_audio_device_dialog .D....... display property dialog for audio capture device (default false)
-show_video_crossbar_connection_dialog .D....... display property dialog for crossbar connecting pins filter on video device (default false)
-show_audio_crossbar_connection_dialog .D....... display property dialog for crossbar connecting pins filter on audio device (default false)
-show_analog_tv_tuner_dialog .D....... display property dialog for analog tuner filter (default false)
-show_analog_tv_tuner_audio_dialog .D....... display property dialog for analog tuner audio filter (default false)
-audio_device_load .D....... load audio capture filter device (and properties) from file
-audio_device_save .D....... save audio capture filter device (and properties) to file
-video_device_load .D....... load video capture filter device (and properties) from file
-video_device_save .D....... save video capture filter device (and properties) to file
The corresponding device name can be found with the following command:
PS C:\Users\Tao> ffmpeg -list_devices true -f dshow -i dummy
ffmpeg version N-94652-g808a6717e0 Copyright (c) 2000-2019 the FFmpeg developers
built with gcc 9.1.1 (GCC) 20190807
configuration: --disable-static --enable-shared --enable-gpl --enable-version3 --enable-sdl2 --enable-fontconfig --enable-gnutls --enable-iconv --enable-libass --enable-libdav1d --enable-libbluray --enable-libfreetype --enable-libmp3lame --enable-libopencore-amrnb --enable-libopencore-amrwb --enable-libopenjpeg --enable-libopus --enable-libshine --enable-libsnappy --enable-libsoxr --enable-libtheora --enable-libtwolame --enable-libvpx --enable-libwavpack --enable-libwebp --enable-libx264 --enable-libx265 --enable-libxml2 --enable-libzimg --enable-lzma --enable-zlib --enable-gmp --enable-libvidstab --enable-libvorbis --enable-libvo-amrwbenc --enable-libmysofa --enable-libspeex --enable-libxvid --enable-libaom --enable-libmfx --enable-amf --enable-ffnvcodec --enable-cuvid --enable-d3d11va --enable-nvenc --enable-nvdec --enable-dxva2 --enable-avisynth --enable-libopenmpt
libavutil 56. 33.100 / 56. 33.100
libavcodec 58. 55.101 / 58. 55.101
libavformat 58. 31.104 / 58. 31.104
libavdevice 58. 9.100 / 58. 9.100
libavfilter 7. 58.101 / 7. 58.101
libswscale 5. 6.100 / 5. 6.100
libswresample 3. 6.100 / 3. 6.100
libpostproc 55. 6.100 / 55. 6.100
[dshow @ 000001a0e0fc02c0] DirectShow video devices (some may be both video and audio devices)
[dshow @ 000001a0e0fc02c0] Could not enumerate video devices (or none found).
[dshow @ 000001a0e0fc02c0] DirectShow audio devices
[dshow @ 000001a0e0fc02c0] "麦克风阵列 (Realtek High Definition Audio)"
[dshow @ 000001a0e0fc02c0] Alternative name "@device_cm_{33D9A762-90C8-11D0-BD43-00A0C911CE86}\wave_{28679435-B3DB-4479-90B4-8EBC4EB7C18E}"
dummy: Immediate exit requested
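Note: if the Chinese text in the command output is garbled on Windows, run chcp 65001 first to switch the terminal code page to UTF-8, then run the command again.
Here the device name is shown as "麦克风阵列 (Realtek High Definition Audio)". When passing it as an argument, however, we actually have to pass u8"audio=麦克风阵列 (Realtek High Definition Audio)".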
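The "audio=" prefix can be attached with a small helper, sketched below. The struct `CaptureTarget` and the function `dshow_audio_target` are hypothetical names used only for illustration; the one thing taken from the text is that the listed name has to be prefixed with "audio=" and opened with the "dshow" input format.

```cpp
#include <string>

// Pairs the input format name with the string given to avformat_open_input.
struct CaptureTarget {
    std::string format;  // argument for av_find_input_format
    std::string url;     // argument for avformat_open_input
};

// Builds the dshow target for an audio device name as printed by
// `ffmpeg -list_devices true -f dshow -i dummy`. The name must be UTF-8.
CaptureTarget dshow_audio_target(const std::string& device_name)
{
    return CaptureTarget{"dshow", "audio=" + device_name};
}
```

Keeping the format name and the device string together makes it harder to mix a Linux device name with the Windows input format, or the other way round.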
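The code below is a simple example of recording on Linux. It also converts the raw audio from 44.1 kHz stereo to 16 kHz mono and saves the result as a WAV file. To save to another file type, change the encoder. To use the code on Windows, only devname and the "alsa" format name need to be replaced with their Windows counterparts.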
#include <chrono>
#include <string>
extern "C" {
#include <libavcodec/avcodec.h>
#include <libavdevice/avdevice.h>
#include <libavformat/avformat.h>
#include <libavutil/avutil.h>
#include <libswresample/swresample.h>
}
#include <cstdio>

// Written against the FFmpeg 3.x/4.x API used in this post. The *_register_all
// calls and the channels/channel_layout fields are deprecated or removed in
// later releases.

// Log callback. A custom callback has to apply the log level filter itself.
void call_back(void* avcl, int level, const char* fmt, va_list vl)
{
    if (level > av_log_get_level())
        return;
    vfprintf(stdout, fmt, vl);
}

int main()
{
    av_log_set_level(AV_LOG_TRACE);
    av_log_set_callback(call_back);
    avdevice_register_all();
    avcodec_register_all();
    av_register_all();

    AVInputFormat* in_fmt = nullptr;
    AVFormatContext* in_ctx = nullptr;
    AVFormatContext* out_ctx = nullptr;
    AVCodecContext* in_decode_ctx = nullptr;
    AVCodecContext* out_encode_ctx = nullptr;
    SwrContext* conv_ctx = nullptr;
    AVDictionary* opt_dict = nullptr;
    std::string devname = "hw:0,0";    // device to open
    std::string filename = "test.wav"; // output file name
    int stream_idx = -1;

    // Input format and options. The alsa demuxer only exposes sample_rate and
    // channels, so the sample_size entry of the original is dropped.
    {
        in_fmt = av_find_input_format("alsa");
        in_ctx = avformat_alloc_context();
        av_dict_set(&opt_dict, "sample_rate", "44100", 0);
        av_dict_set(&opt_dict, "channels", "2", 0);
    }
    // Open the device
    {
        if (avformat_open_input(&in_ctx, devname.c_str(), in_fmt, &opt_dict) < 0)
            return 1;
        av_dict_free(&opt_dict);
        avformat_find_stream_info(in_ctx, nullptr);
    }
    // Find the audio stream
    {
        for (size_t i = 0; i < in_ctx->nb_streams; i++)
        {
            if (in_ctx->streams[i]->codecpar->codec_type == AVMediaType::AVMEDIA_TYPE_AUDIO)
            {
                stream_idx = static_cast<int>(i);
                break;
            }
        }
        if (stream_idx < 0)
            return 1;
        av_dump_format(in_ctx, stream_idx, devname.c_str(), 0);
    }
    // Decoder setup
    {
        AVCodec* codec = avcodec_find_decoder(in_ctx->streams[stream_idx]->codecpar->codec_id);
        in_decode_ctx = avcodec_alloc_context3(codec);
        avcodec_parameters_to_context(in_decode_ctx, in_ctx->streams[stream_idx]->codecpar);
        in_decode_ctx->channel_layout = av_get_default_channel_layout(2);
        avcodec_open2(in_decode_ctx, codec, nullptr);
    }
    // Encoder setup
    {
        AVCodec* codec = avcodec_find_encoder(AVCodecID::AV_CODEC_ID_PCM_S16LE);
        out_encode_ctx = avcodec_alloc_context3(codec);
        out_encode_ctx->sample_rate = 16000;
        out_encode_ctx->channels = 1;
        out_encode_ctx->channel_layout = av_get_channel_layout("mono");
        out_encode_ctx->sample_fmt = AVSampleFormat::AV_SAMPLE_FMT_S16;
        out_encode_ctx->time_base = AVRational{ 1, 16000 };
        avcodec_open2(out_encode_ctx, codec, nullptr);
    }
    // Output context setup
    {
        AVCodec* codec = avcodec_find_encoder(AVCodecID::AV_CODEC_ID_PCM_S16LE);
        avformat_alloc_output_context2(&out_ctx, nullptr, nullptr, filename.c_str());
        AVStream* out_stream = avformat_new_stream(out_ctx, codec);
        // Associate the encoder with the output context
        avcodec_parameters_from_context(out_stream->codecpar, out_encode_ctx);
        out_stream->time_base = AVRational{ 1, 16000 };
        av_dump_format(out_ctx, 0, filename.c_str(), 1);
        // AVFMT_NOFILE is a flag of the output format, not of the context
        if (!(out_ctx->oformat->flags & AVFMT_NOFILE))
        {
            avio_open2(&out_ctx->pb, filename.c_str(), AVIO_FLAG_WRITE, nullptr, nullptr);
        }
    }
    // Write the file header
    {
        avformat_write_header(out_ctx, nullptr);
    }
    // Resampler setup: 44.1 kHz stereo S16 in, 16 kHz mono S16 out
    {
        conv_ctx = swr_alloc_set_opts(
            nullptr,
            AV_CH_LAYOUT_MONO,
            AVSampleFormat::AV_SAMPLE_FMT_S16,
            16000,
            AV_CH_LAYOUT_STEREO,
            AVSampleFormat::AV_SAMPLE_FMT_S16,
            44100,
            0,
            nullptr);
        swr_init(conv_ctx);
    }
    AVPacket* in_pkt = nullptr;
    AVPacket* out_pkt = nullptr;
    AVFrame* raw_frm = nullptr;
    AVFrame* conv_frm = nullptr;
    {
        in_pkt = av_packet_alloc();
        out_pkt = av_packet_alloc();
        raw_frm = av_frame_alloc();
        conv_frm = av_frame_alloc();
    }
    {
        int ret = 0;
        auto start_time = std::chrono::steady_clock::now();
        while (av_read_frame(in_ctx, in_pkt) == 0)
        {
            // Decode
            ret = avcodec_send_packet(in_decode_ctx, in_pkt);
            if (ret == 0)
                ret = avcodec_receive_frame(in_decode_ctx, raw_frm);
            if (ret == 0)
            {
                // Describe the frame that will hold the resampled data
                conv_frm->pts = raw_frm->pts;
                conv_frm->sample_rate = 16000;
                conv_frm->channels = 1;
                conv_frm->channel_layout = av_get_channel_layout("mono");
                conv_frm->format = AVSampleFormat::AV_SAMPLE_FMT_S16;
                // Resample
                ret = swr_convert_frame(conv_ctx, conv_frm, raw_frm);
            }
            // Encode the resampled data
            if (ret == 0)
                ret = avcodec_send_frame(out_encode_ctx, conv_frm);
            if (ret == 0)
                ret = avcodec_receive_packet(out_encode_ctx, out_pkt);
            if (ret == 0)
            {
                // Write the encoded data
                ret = av_write_frame(out_ctx, out_pkt);
            }
            av_packet_unref(in_pkt);
            av_packet_unref(out_pkt);
            av_frame_unref(raw_frm);
            av_frame_unref(conv_frm);
            auto end_time = std::chrono::steady_clock::now();
            // Stop after recording for 20 s
            if (std::chrono::duration_cast<std::chrono::seconds>(end_time - start_time).count() >= 20)
            {
                break;
            }
        }
        {
            av_write_trailer(out_ctx);
        }
        {
            swr_free(&conv_ctx);
        }
        {
            av_packet_free(&in_pkt);
            av_packet_free(&out_pkt);
            av_frame_free(&raw_frm);
            av_frame_free(&conv_frm);
        }
        {
            avcodec_free_context(&in_decode_ctx);
            avcodec_free_context(&out_encode_ctx);
        }
        {
            // An opened input must be closed with avformat_close_input
            avformat_close_input(&in_ctx);
            if (!(out_ctx->oformat->flags & AVFMT_NOFILE))
                avio_closep(&out_ctx->pb);
            avformat_free_context(out_ctx);
        }
    }
}
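As a sanity check on the conversion from 44.1 kHz stereo to 16 kHz mono, the expected output size can be worked out by hand. The sketch below is plain arithmetic with hypothetical helper names. It does not call libswresample, which buffers samples internally, so the count produced by a single swr_convert_frame call may differ slightly.

```cpp
#include <cstdint>

// Number of output samples per channel produced from `in_samples` input
// samples when converting from `in_rate` to `out_rate`, rounded up.
int64_t resampled_sample_count(int64_t in_samples, int in_rate, int out_rate)
{
    return (in_samples * out_rate + in_rate - 1) / in_rate;
}

// Size in bytes of interleaved PCM data.
int64_t pcm_bytes(int64_t samples_per_channel, int channels, int bytes_per_sample)
{
    return samples_per_channel * channels * bytes_per_sample;
}
```

Under these assumptions, 20 s of 16 kHz mono S16 audio is 16000 * 20 * 1 * 2 = 640000 bytes of sample data, so test.wav should be a little larger than that once the WAV header is included.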
This is caused by the order in which the libraries are linked. The FFmpeg libraries depend on one another, so adjusting the link order resolves it.