PCM Mixing

Mixing

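PCM mixing works by adding two sets of samples together. The sum must stay within the range that the PCM bit depth can represent. MixFrames is hard-coded to int16_t (see AudioFrame for details), so the mixing code inside WebRTC does not support PCM audio at any bit depth other than 16 bits.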
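The add-then-limit idea can be sketched on raw 16-bit buffers as follows. This is only a minimal illustration: `MixSample` and `MixBuffers` are hypothetical helper names, not WebRTC APIs, and the sketch assumes both buffers hold the same number of interleaved samples.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not a WebRTC API): add two 16-bit samples in a wider
// type, then saturate the sum back into the int16_t range.
inline int16_t MixSample(int16_t a, int16_t b) {
  int32_t sum = static_cast<int32_t>(a) + static_cast<int32_t>(b);
  if (sum > INT16_MAX) return INT16_MAX;
  if (sum < INT16_MIN) return INT16_MIN;
  return static_cast<int16_t>(sum);
}

// Hypothetical helper: mix |src| into |dst| in place. Both buffers are
// assumed to hold the same number of interleaved samples.
inline void MixBuffers(std::vector<int16_t>& dst,
                       const std::vector<int16_t>& src) {
  assert(dst.size() == src.size());
  for (std::size_t i = 0; i < dst.size(); ++i) {
    dst[i] = MixSample(dst[i], src[i]);
  }
}
```

The sum is computed in int32_t because adding two int16_t values can need 17 bits; saturating afterwards avoids the wrap-around that a plain 16-bit addition would produce.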

PCM operations include mono-to-stereo conversion, stereo-to-mono conversion, muting, and volume adjustment.

The mixing function in WebRTC lives in webrtc/modules/audio_conference_mixer/source/audio_conference_mixer_impl.cc. It is the function shown below.

// Mix |frame| into |mixed_frame|, with saturation protection and upmixing.
// These effects are applied to |frame| itself prior to mixing. Assumes that
// |mixed_frame| always has at least as many channels as |frame|. Supports
// stereo at most.
//
// TODO(andrew): consider not modifying |frame| here.
void MixFrames(AudioFrame* mixed_frame, AudioFrame* frame, bool use_limiter) {
  assert(mixed_frame->num_channels_ >= frame->num_channels_);
  if (use_limiter) {
    // Divide by two to avoid saturation in the mixing.
    // This is only meaningful if the limiter will be used.
    *frame >>= 1;
  }
  if (mixed_frame->num_channels_ > frame->num_channels_) {
    // We only support mono-to-stereo.
    assert(mixed_frame->num_channels_ == 2 &&
           frame->num_channels_ == 1);
    AudioFrameOperations::MonoToStereo(frame);
  }

  *mixed_frame += *frame;
}
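To make the three steps of MixFrames concrete (optional halving, mono-to-stereo upmix, saturating add), here is a rough sketch on plain interleaved buffers. Every name in it (`HalveSamples`, `MonoToStereoSketch`, `MixFramesSketch`) is hypothetical and only stands in for the AudioFrame operations; unlike the original, the sketch takes the incoming frame by value so the caller's copy is not modified.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for "*frame >>= 1": shift every sample right by one
// bit. Right-shifting a negative value is arithmetic on mainstream compilers
// (and guaranteed since C++20).
inline void HalveSamples(std::vector<int16_t>& samples) {
  for (int16_t& s : samples) s = static_cast<int16_t>(s >> 1);
}

// Hypothetical stand-in for the mono-to-stereo upmix: copy each mono sample
// into the left and right slots of an interleaved stereo buffer.
inline std::vector<int16_t> MonoToStereoSketch(
    const std::vector<int16_t>& mono) {
  std::vector<int16_t> stereo;
  stereo.reserve(mono.size() * 2);
  for (int16_t s : mono) {
    stereo.push_back(s);
    stereo.push_back(s);
  }
  return stereo;
}

// Hypothetical sketch of the MixFrames flow. |frame| is passed by value, so
// the caller's buffer stays untouched.
inline void MixFramesSketch(std::vector<int16_t>& mixed, int mixed_channels,
                            std::vector<int16_t> frame, int frame_channels,
                            bool use_limiter) {
  assert(mixed_channels >= frame_channels);
  if (use_limiter) HalveSamples(frame);
  if (mixed_channels > frame_channels) {
    // Only mono-to-stereo is handled, as in the original.
    assert(mixed_channels == 2 && frame_channels == 1);
    frame = MonoToStereoSketch(frame);
  }
  assert(mixed.size() == frame.size());
  for (std::size_t i = 0; i < mixed.size(); ++i) {
    int32_t sum = static_cast<int32_t>(mixed[i]) + frame[i];
    if (sum > 32767) sum = 32767;
    if (sum < -32768) sum = -32768;
    mixed[i] = static_cast<int16_t>(sum);
  }
}
```

With halving enabled, two full-scale inputs (32767 each, 16383 after the shift) sum to 32766, which still fits in int16_t. That is why the original comment says the division by two avoids saturation when two frames are mixed.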

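The last line is the heart of the mix. It calls AudioFrame's overloaded operator+=, which performs the operation below and clamps each summed sample to the int16_t range.
File path: webrtc/modules/interface/module_common_types.h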

inline AudioFrame& AudioFrame::operator+=(const AudioFrame& rhs) {
  ...

  if (speech_type_ != rhs.speech_type_) speech_type_ = kUndefined;

  if (noPrevData) {
    memcpy(data_, rhs.data_,
           sizeof(int16_t) * rhs.samples_per_channel_ * num_channels_);
  } else {
    // IMPROVEMENT this can be done very fast in assembly
    for (int i = 0; i < samples_per_channel_ * num_channels_; i++) {
      int32_t wrapGuard =
          static_cast<int32_t>(data_[i]) + static_cast<int32_t>(rhs.data_[i]);
      if (wrapGuard < -32768) {
        data_[i] = -32768;
      } else if (wrapGuard > 32767) {
        data_[i] = 32767;
      } else {
        data_[i] = (int16_t)wrapGuard;
      }
    }
  }
  energy_ = 0xffffffff;
  return *this;
}

The final range check can also be written with macros:

#define MIXER_MAX(x,y) ((x)>(y)? (x):(y))
#define MIXER_MIN(x,y) ((x)<(y)? (x):(y))
#define MIXER_CLIP3(a,b,x) (MIXER_MAX(a,MIXER_MIN(x,b)))  /* clip x between a and b */
#define MIXER_CLIP(x)  MIXER_CLIP3(-32768,32767,x)

for (int i = 0; i < samples_per_channel_ * num_channels_; i++) {
    int32_t wrapGuard =
        static_cast<int32_t>(data_[i]) + static_cast<int32_t>(rhs.data_[i]);
    data_[i] = (int16_t)MIXER_CLIP(wrapGuard);
}
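As an alternative to the macros, the same clipping can be sketched as an inline function. Function-like macros such as MIXER_MAX evaluate their arguments more than once, which is harmless for a plain variable but surprising for expressions with side effects; a function avoids that. `ClipToInt16` and `MixInPlace` are hypothetical names used only for this illustration.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>

// Hypothetical inline replacement for MIXER_CLIP: clip x to [-32768, 32767].
inline int16_t ClipToInt16(int32_t x) {
  return static_cast<int16_t>(
      std::max<int32_t>(-32768, std::min<int32_t>(x, 32767)));
}

// Hypothetical helper: mix |count| samples of |src| into |dst| in place.
inline void MixInPlace(int16_t* dst, const int16_t* src, std::size_t count) {
  for (std::size_t i = 0; i < count; ++i) {
    int32_t wrapGuard =
        static_cast<int32_t>(dst[i]) + static_cast<int32_t>(src[i]);
    dst[i] = ClipToInt16(wrapGuard);
  }
}
```

With a C++17 compiler, `std::clamp(x, -32768, 32767)` from `<algorithm>` expresses the same clip in a single call.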

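The Android source code writes it like this. Using shifts should be somewhat more efficient, but I am only inferring from theory that it beats the comparisons; I have not benchmarked the two.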

static inline int16_t clamp16(int32_t sample)
{
    if ((sample>>15) ^ (sample>>31))
        sample = 0x7FFF ^ (sample>>31);
    return sample;
}
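The trick relies on the fact that `sample>>15` and `sample>>31` are equal (both 0 or both -1) exactly when the value fits in 16 bits. When they differ, `0x7FFF ^ (sample>>31)` yields 32767 for a positive overflow and -32768 for a negative one. The sketch below checks that the shift version agrees with a comparison-based clamp; it tests correctness only and says nothing about speed. `ClampWithCompares` and `ClampsAgree` are hypothetical helpers, and the code assumes arithmetic right shift of negative values, as clamp16 itself does.

```cpp
#include <cstdint>

// Same logic as the clamp16 snippet above, with an explicit cast on return.
static inline int16_t clamp16(int32_t sample) {
  if ((sample >> 15) ^ (sample >> 31))
    sample = 0x7FFF ^ (sample >> 31);
  return static_cast<int16_t>(sample);
}

// Hypothetical comparison-based reference implementation.
static inline int16_t ClampWithCompares(int32_t sample) {
  if (sample > 32767) return 32767;
  if (sample < -32768) return -32768;
  return static_cast<int16_t>(sample);
}

// Hypothetical check: returns true when both versions agree for every value
// in [lo, hi]. The loop counter is 64-bit so it cannot overflow at hi.
static inline bool ClampsAgree(int32_t lo, int32_t hi) {
  for (int64_t v = lo; v <= hi; ++v) {
    int32_t s = static_cast<int32_t>(v);
    if (clamp16(s) != ClampWithCompares(s)) return false;
  }
  return true;
}
```

Calling `ClampsAgree(-70000, 70000)` covers every sum that two int16_t samples can produce, since such a sum always lies in [-65536, 65534].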
