ffmpeg audio mixing (combining multiple audio streams into one) command


The ffmpeg command can do this with the amix filter.

Official documentation

http://ffmpeg.org/ffmpeg-filters.html 
6.8 amix 
Mixes multiple audio inputs into a single output.

Note that this filter only supports float samples (the amerge and pan audio filters support many formats). If the amix input has integer samples then aresample will be automatically inserted to perform the conversion to float samples.

For example

ffmpeg -i INPUT1 -i INPUT2 -i INPUT3 -filter_complex amix=inputs=3:duration=first:dropout_transition=3 OUTPUT 
will mix 3 input audio streams to a single output with the same duration as the first input and a dropout transition time of 3 seconds.

It accepts the following parameters:

inputs 
The number of inputs. If unspecified, it defaults to 2.

duration 
How to determine the end-of-stream.

longest 
The duration of the longest input. (default)

shortest 
The duration of the shortest input.

first 
The duration of the first input.

dropout_transition 
The transition time, in seconds, for volume renormalization when an input stream ends. The default value is 2 seconds.
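
A minimal sketch of the same filter with the default parameters spelled out; the file names are placeholders, not from the original post:

ffmpeg -i a.wav -i b.wav -filter_complex amix=inputs=2:duration=longest mixed.wav

Since inputs=2 and duration=longest are the defaults, -filter_complex amix alone behaves the same way.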

Example

An example of mixing the audio currently playing on the computer into a file (the text after "audio=" is the audio device name; on Windows you can list audio device names with ffmpeg -f dshow -list_devices 1 -i dummy):

ffmpeg.exe -re -i 1234.mp4 -f dshow -i audio="立体声混音 (Realtek High Definition " -filter_complex amix=inputs=2:duration=first:dropout_transition=0 -t 10 out.mp4 -y


PCM mixing works by adding the two sets of samples together; the summed values must stay within the range representable by the PCM bit depth. MixFrames is hard-coded to int16_t (see AudioFrame for details), so the WebRTC mixer only supports 16-bit PCM audio.
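
The idea can be illustrated with a short standalone sketch (not WebRTC code; the function name is made up for illustration): sum the two 16-bit sample streams in a wider integer type and clamp the result back into the int16_t range.

#include <algorithm>
#include <cstddef>
#include <cstdint>

// Mix two 16-bit PCM buffers of equal length into |out|.
// The sum is computed in 32 bits and clamped to the int16_t range,
// so overflow turns into saturation instead of wrap-around.
void MixPcm16(const int16_t* a, const int16_t* b, int16_t* out, size_t samples) {
  for (size_t i = 0; i < samples; ++i) {
    int32_t sum = static_cast<int32_t>(a[i]) + static_cast<int32_t>(b[i]);
    sum = std::max(-32768, std::min(32767, sum));
    out[i] = static_cast<int16_t>(sum);
  }
}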

PCM operations include mono-to-stereo, stereo-to-mono, muting, and volume adjustment.
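
For reference, the simplest of these operations look roughly like the following sketch (not the actual WebRTC AudioFrameOperations implementation): mono-to-stereo duplicates each sample into both channels, and volume adjustment scales each sample with clamping.

#include <cstddef>
#include <cstdint>

// Mono to stereo: duplicate each mono sample into left and right.
// |stereo| must have room for 2 * samples values.
void MonoToStereoPcm(const int16_t* mono, int16_t* stereo, size_t samples) {
  for (size_t i = 0; i < samples; ++i) {
    stereo[2 * i] = mono[i];
    stereo[2 * i + 1] = mono[i];
  }
}

// Volume adjustment with a Q8 fixed-point gain (256 == unity),
// clamped to the int16_t range.
void ApplyGainPcm(int16_t* pcm, size_t samples, int gain_q8) {
  for (size_t i = 0; i < samples; ++i) {
    int32_t v = (static_cast<int32_t>(pcm[i]) * gain_q8) >> 8;
    if (v > 32767) v = 32767;
    else if (v < -32768) v = -32768;
    pcm[i] = static_cast<int16_t>(v);
  }
}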

WebRTC's mixing function lives in webrtc/modules/audio_conference_mixer/source/audio_conference_mixer_impl.cc; it is the function below.

// Mix |frame| into |mixed_frame|, with saturation protection and upmixing.
// These effects are applied to |frame| itself prior to mixing. Assumes that
// |mixed_frame| always has at least as many channels as |frame|. Supports
// stereo at most.
//
// TODO(andrew): consider not modifying |frame| here.
void MixFrames(AudioFrame* mixed_frame, AudioFrame* frame, bool use_limiter) {
  assert(mixed_frame->num_channels_ >= frame->num_channels_);
  if (use_limiter) {
    // Divide by two to avoid saturation in the mixing.
    // This is only meaningful if the limiter will be used.
    *frame >>= 1;
  }
  if (mixed_frame->num_channels_ > frame->num_channels_) {
    // We only support mono-to-stereo.
    assert(mixed_frame->num_channels_ == 2 &&
           frame->num_channels_ == 1);
    AudioFrameOperations::MonoToStereo(frame);
  }

  *mixed_frame += *frame;
}

The last statement is where the actual mixing happens: it calls AudioFrame's overloaded operator+=, i.e. it performs the operation below, clamping the summed samples to the int16_t range.
The file path is webrtc/modules/interface/module_common_types.h.

inline AudioFrame& AudioFrame::operator+=(const AudioFrame& rhs) {
  ...

  if (speech_type_ != rhs.speech_type_) speech_type_ = kUndefined;

  if (noPrevData) {
    memcpy(data_, rhs.data_,
           sizeof(int16_t) * rhs.samples_per_channel_ * num_channels_);
  } else {
    // IMPROVEMENT this can be done very fast in assembly
    for (int i = 0; i < samples_per_channel_ * num_channels_; i++) {
      int32_t wrapGuard =
          static_cast<int32_t>(data_[i]) + static_cast<int32_t>(rhs.data_[i]);
      if (wrapGuard < -32768) {
        data_[i] = -32768;
      } else if (wrapGuard > 32767) {
        data_[i] = 32767;
      } else {
        data_[i] = (int16_t)wrapGuard;
      }
    }
  }
  energy_ = 0xffffffff;
  return *this;
}

The final range check can also be written with macros:

#define MIXER_MAX(x,y) ((x)>(y)? (x):(y))
#define MIXER_MIN(x,y) ((x)<(y)? (x):(y))
#define MIXER_CLIP3(a,b,x) (MIXER_MAX(a,MIXER_MIN(x,b)))  /* clip x between a and b */
#define MIXER_CLIP(x)  MIXER_CLIP3(-32768,32767,x)

for (int i = 0; i < samples_per_channel_ * num_channels_; i++) {
    int32_t wrapGuard =
        static_cast<int32_t>(data_[i]) + static_cast<int32_t>(rhs.data_[i]);
    data_[i] = (int16_t)MIXER_CLIP(wrapGuard);
}

The Android sources do it as shown below; using bit shifts should be more efficient. I am only inferring from theory that this beats the branching version; I have not benchmarked it.

static inline int16_t clamp16(int32_t sample)
{
    // If (sample>>15) and (sample>>31) differ, the value does not fit in
    // 16 bits: force it to 0x7FFF (32767) for positive overflow, or
    // 0x7FFF ^ -1 = 0x8000 (-32768) for negative overflow.
    if ((sample>>15) ^ (sample>>31))
        sample = 0x7FFF ^ (sample>>31);
    return sample;
}
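
For illustration, the branch-reduced clamp can be dropped into the same kind of mixing loop as the sketch above (names are placeholders):

#include <cstddef>
#include <cstdint>

// Mix two 16-bit PCM buffers using the Android-style clamp16 quoted above.
void MixPcm16WithClamp16(const int16_t* a, const int16_t* b, int16_t* out, size_t samples) {
  for (size_t i = 0; i < samples; ++i) {
    out[i] = clamp16(static_cast<int32_t>(a[i]) + static_cast<int32_t>(b[i]));
  }
}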

