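Talking Tom Cat is a classic mobile game: the cute Tom cat repeats what you say in a different pitch.
I previously used SoundTouch, a well-known open-source library that can change the tempo and pitch of audio. For details, see:
https://blog.csdn.net/wkw1125/article/details/63807128
As today's dominant mobile platform, Android naturally has to support changing the playback speed and pitch of media, so I wanted to see how it implements this.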
The API is defined in MediaPlayer.java:
public class MediaPlayer extends PlayerBase
implements SubtitleController.Listener
, VolumeAutomation
, AudioRouting
{
...
public native void setPlaybackParams(@NonNull PlaybackParams params);
public native PlaybackParams getPlaybackParams();
...
}
PlaybackParams is defined as follows:
public final class PlaybackParams implements Parcelable {
...
private float mPitch = 1.0f; // pitch
private float mSpeed = 1.0f; // playback speed
public PlaybackParams() {
}
private PlaybackParams(Parcel in) {
mSet = in.readInt();
mAudioFallbackMode = in.readInt();
mAudioStretchMode = in.readInt();
mPitch = in.readFloat();
if (mPitch < 0.f) {
mPitch = 0.f;
}
mSpeed = in.readFloat();
}
...
public PlaybackParams setPitch(float pitch) { // pitch-shift API
if (pitch < 0.f) {
throw new IllegalArgumentException("pitch must not be negative");
}
mPitch = pitch;
mSet |= SET_PITCH;
return this;
}
public float getPitch() {
if ((mSet & SET_PITCH) == 0) {
throw new IllegalStateException("pitch not set");
}
return mPitch;
}
public PlaybackParams setSpeed(float speed) { // speed-change API
mSpeed = speed;
mSet |= SET_SPEED;
return this;
}
public float getSpeed() {
if ((mSet & SET_SPEED) == 0) {
throw new IllegalStateException("speed not set");
}
return mSpeed;
}
...
}
Below is a helper I wrote that changes the pitch through the MediaPlayer API:
public void setPlayTone(TONE_TYPE toneType) {
float pitch = 1.0f;
switch (toneType) {
case TONE_INDEX_ORANGE:
pitch = 1.23705f;
break;
case TONE_INDEX_PURPLE:
pitch = 0.7695f;
break;
case TONE_INDEX_BLUE: //original voice
pitch = 1.0f;
break;
case TONE_INDEX_NONE: //original voice
pitch = 1.0f;
break;
default:
Log.e(TAG, "no such specific tonetype found!!");
return;
}
// mMediaPlayer is the MediaPlayer instance
PlaybackParams params = mMediaPlayer.getPlaybackParams().setPitch(pitch);
mMediaPlayer.setPlaybackParams(params);
}
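MediaPlayer's setPlaybackParams() is a native method, so the call goes down into the native layer and then through a long call chain. The complete flow is as follows:
Once the flow reaches AudioTrack, the actual processing logic begins: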
status_t AudioTrack::setPlaybackRate(const AudioPlaybackRate &playbackRate)
{
...
// adjust the sample rate???
const uint32_t effectiveRate = adjustSampleRate(mSampleRate, playbackRate.mPitch);
// adjust the playback speed???
const float effectiveSpeed = adjustSpeed(playbackRate.mSpeed, playbackRate.mPitch);
// adjust the pitch???
const float effectivePitch = adjustPitch(playbackRate.mPitch);
// a temporary AudioPlaybackRate holding the adjusted speed and pitch
AudioPlaybackRate playbackRateTemp = playbackRate;
playbackRateTemp.mSpeed = effectiveSpeed;
playbackRateTemp.mPitch = effectivePitch;
...
mPlaybackRate = playbackRate; // stored locally: the original parameters passed in
mProxy->setPlaybackRate(playbackRateTemp); // passed to AudioFlinger: the temporary parameters
// the adjusted sample rate is passed to AudioFlinger as well
mProxy->setSampleRate(effectiveRate); // FIXME: not quite "atomic" with setPlaybackRate
...
}
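This was confusing at first. It looks as if the incoming speed and pitch are used to compute a temporary set of sample rate, speed and pitch, and those three temporary values are what get passed to AudioFlinger. Let's look at the three calculations first: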
static const bool kFixPitch = true; // enable pitch fix
static inline uint32_t adjustSampleRate(uint32_t sampleRate, float pitch)
{
return kFixPitch ? (sampleRate * pitch + 0.5) : sampleRate;
}
static inline float adjustSpeed(float speed, float pitch)
{
return kFixPitch ? speed / max(pitch, AUDIO_TIMESTRETCH_PITCH_MIN_DELTA) : speed;
}
static inline float adjustPitch(float pitch)
{
return kFixPitch ? AUDIO_TIMESTRETCH_PITCH_NORMAL : pitch;
}
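Take TONE_INDEX_PURPLE as an example: pitch = 0.7695f < 1, which lowers the pitch, i.e. lowers the frequency.
For 44100 Hz audio, effectiveRate = sampleRate * pitch + 0.5 = 44100 * 0.7695 + 0.5 = 33935.45, which is truncated to uint32_t, so the sample rate is lowered to 33935 Hz.
effectiveSpeed does not depend on the sample rate: effectiveSpeed = speed / max(pitch, AUDIO_TIMESTRETCH_PITCH_MIN_DELTA) = 1 / max(0.7695, 0.0001) ≈ 1.2995, so playback runs at about 1.3x normal speed.
effectivePitch is also independent of the sample rate: effectivePitch = AUDIO_TIMESTRETCH_PITCH_NORMAL = 1.
So once the pitch setting reaches AudioFlinger, pitch is always 1; what actually controls the pitch are the adjusted speed (≈1.3) and sample rate (33935). To be honest, this design cost me a lot of time. I assumed from the start that AudioFlinger would use pitch somewhere and dug through it for ages without finding anything; only after tracing back into AudioTrack did I realize that pitch had already done its job there.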
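The TONE_INDEX_PURPLE arithmetic can be double-checked with a standalone copy of the three helpers. This is a sketch, not the AOSP file: the constant values (0.0001 for AUDIO_TIMESTRETCH_PITCH_MIN_DELTA, 1.0 for AUDIO_TIMESTRETCH_PITCH_NORMAL) are assumed from the numbers used above, and the explicit cast stands in for the implicit conversion in the original return statement.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>

// Assumed values of the AOSP macros used by AudioTrack.
constexpr float kPitchMinDelta = 0.0001f; // AUDIO_TIMESTRETCH_PITCH_MIN_DELTA
constexpr float kPitchNormal = 1.0f;      // AUDIO_TIMESTRETCH_PITCH_NORMAL
constexpr bool kFixPitch = true;          // enable pitch fix

// Declared source rate: scaling it by pitch makes the resampler shift the pitch.
uint32_t adjustSampleRate(uint32_t sampleRate, float pitch) {
    return kFixPitch ? static_cast<uint32_t>(sampleRate * pitch + 0.5) : sampleRate;
}

// Time-stretch speed that compensates for the duration change caused by resampling.
float adjustSpeed(float speed, float pitch) {
    return kFixPitch ? speed / std::max(pitch, kPitchMinDelta) : speed;
}

// With kFixPitch, the time-stretcher itself is always asked for normal pitch.
float adjustPitch(float pitch) {
    return kFixPitch ? kPitchNormal : pitch;
}
```

For the raised-pitch case, TONE_INDEX_ORANGE (pitch 1.23705) goes the other way: adjustSampleRate(44100, 1.23705f) gives 54554 Hz and adjustSpeed(1.0f, 1.23705f) gives roughly 0.808, i.e. the audio is first slowed down.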
Next, processing crosses a process boundary, from media_server to audio_server, and enters AudioFlinger:
AudioFlinger::PlaybackThread::mixer_state AudioFlinger::MixerThread::prepareTracks_l(
Vector< sp
MixerThread does nothing itself; it passes the parameters straight to AudioMixer:
void AudioMixer::setParameter(int name, int target, int param, void *value)
{
LOG_ALWAYS_FATAL_IF(!exists(name), "invalid name: %d", name);
const std::shared_ptr
First, setPlaybackRate:
bool AudioMixer::Track::setPlaybackRate(const AudioPlaybackRate &playbackRate)
{
...
mPlaybackRate = playbackRate; // save the parameters
// create a TimestretchBufferProvider (if none exists yet) and pass in the speed/pitch parameters
if (mTimestretchBufferProvider.get() == nullptr) {
// TODO: Remove MONO_HACK. Resampler sees #channels after the downmixer
// but if none exists, it is the channel count (1 for mono).
const int timestretchChannelCount = mDownmixerBufferProvider.get() != nullptr
? mMixerChannelCount : channelCount;
mTimestretchBufferProvider.reset(new TimestretchBufferProvider(timestretchChannelCount,
mMixerInFormat, sampleRate, playbackRate));
reconfigureBufferProviders();
} else {
static_cast<TimestretchBufferProvider*>(mTimestretchBufferProvider.get())
->setPlaybackRate(playbackRate);
}
return true;
}
status_t TimestretchBufferProvider::setPlaybackRate(const AudioPlaybackRate &playbackRate)
{
mPlaybackRate = playbackRate; // save the parameters
mFallbackFailErrorShown = false;
sonicSetSpeed(mSonicStream, mPlaybackRate.mSpeed); // what on earth is this?
//TODO: pitch is ignored for now
//TODO: optimize: if parameters are the same, don't do any extra computation.
mAudioPlaybackRateValid = isAudioPlaybackRateValid(mPlaybackRate); // check whether the parameters are valid
return OK;
}
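The speed/pitch parameters are saved once in AudioMixer::Track, and again in the newly created TimestretchBufferProvider, which also checks them for validity. In between there is a call to sonicSetSpeed() that receives the speed (1.3). Following the included headers, this function is defined under external/sonic/, which means it comes from a third-party open-source library.
A quick search showed that libsonic is yet another open-source library with essentially the same functionality as SoundTouch. Since the playback speed is set into it here, the audio data will presumably be passed through this library for time-stretching (speed change without pitch change) during playback. For more on libsonic, see: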
https://blog.csdn.net/u010339039/article/details/89196814
https://www.jianshu.com/p/2f9939111681
Next, setResampler:
bool AudioMixer::Track::setResampler(uint32_t trackSampleRate, uint32_t devSampleRate)
{
if (trackSampleRate != devSampleRate || mResampler.get() != nullptr) {
if (sampleRate != trackSampleRate) {
sampleRate = trackSampleRate; // save the sample rate
if (mResampler.get() == nullptr) { // create an AudioResampler if none exists yet
...
mResampler.reset(AudioResampler::create(
mMixerInFormat,
resamplerChannelCount,
devSampleRate, quality));
}
return true;
}
}
return false;
}
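Note the meaning of the two parameters: trackSampleRate is the sample rate of the current AudioTrack, and devSampleRate is the sample rate of the output device, so resampling is only needed when they differ (or when a resampler already exists).
The key logic of this function is to store the new trackSampleRate in the member sampleRate and, if no AudioResampler has been created yet, create one and store it in mResampler.
In our example, after setting TONE_INDEX_PURPLE the sample rate is 33935, while most Android devices today output at 48000 Hz, so resampling is clearly needed and an AudioResampler is created.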
After the sample rate is set, invalidate() is called:
// Called when track info changes and a new process hook should be determined.
void invalidate() {
mHook = &AudioMixer::process__validate;
}
The comment says it all: when the track info changes, a new hook has to be selected.
void AudioMixer::process__validate()
{
// TODO: fix all16BitsStereNoResample logic to
// either properly handle muted tracks (it should ignore them)
// or remove altogether as an obsolete optimization.
bool all16BitsStereoNoResample = true;
bool resampling = false;
bool volumeRamp = false;
mEnabled.clear();
mGroups.clear();
for (const auto &pair : mTracks) { // 逐个Track进行更新
const int name = pair.first;
const std::shared_ptr
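This function does three things:
1. Update the hook of each Track;
2. Update the process hook of AudioMixer itself;
3. Call AudioMixer's newly selected hook.
The two functions that select the hooks, AudioMixer::hook_t AudioMixer::Track::getTrackHook() and AudioMixer::process_hook_t AudioMixer::getProcessHook(), are both simple switch statements, so I won't go into them.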
Continuing with TONE_INDEX_PURPLE, the Track hook ends up as void AudioMixer::Track::track__genericResample(), and the AudioMixer hook as void AudioMixer::process__genericResampling().
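The invalidate()/process__validate pairing is a lazy-dispatch pattern: invalidate() only points the hook at a validation routine, and the next process() call selects the specialised hook, stores it, and runs it. The sketch below is a heavily simplified illustration of that idea; MiniMixer and all of its members are hypothetical names, not AOSP classes, and the single "needs resampling" flag stands in for the real per-track analysis.

```cpp
#include <cassert>
#include <string>

// Hypothetical, simplified model of AudioMixer's lazy hook selection.
class MiniMixer {
public:
    // Like invalidate(): defer choosing a hook until the next process() call.
    void invalidate() { mHook = &MiniMixer::processValidate; }

    // Changing a track property invalidates the current hook.
    void setNeedsResample(bool needs) {
        mNeedsResample = needs;
        invalidate();
    }

    // Called once per mix cycle; dispatches through the current hook.
    void process() { (this->*mHook)(); }

    std::string lastPath;   // which processing path ran last (for demonstration)
    int validateCount = 0;  // how many times a hook had to be re-selected

private:
    using ProcessHook = void (MiniMixer::*)();

    void processValidate() {
        ++validateCount;
        // Select the specialised hook once, then run it for this cycle.
        mHook = mNeedsResample ? &MiniMixer::processGenericResampling
                               : &MiniMixer::processNoResampling;
        process();
    }
    void processGenericResampling() { lastPath = "genericResampling"; }
    void processNoResampling() { lastPath = "noResampling"; }

    bool mNeedsResample = false;
    ProcessHook mHook = &MiniMixer::processValidate;
};
```

The point of the design is that the relatively expensive selection runs only after something changes; on every other cycle process() is a single indirect call.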
Let's start with void AudioMixer::process__genericResampling():
// generic code with resampling
void AudioMixer::process__genericResampling()
{
ALOGVV("process__genericResampling\n");
int32_t * const outTemp = mOutputTemp.get(); // naked ptr
size_t numFrames = mFrameCount;
for (const auto &pair : mGroups) { // process each group of tracks that share the same output buffer
const auto &group = pair.second;
const std::shared_ptr<Track> &t1 = mTracks[group[0]];
// clear temp buffer
memset(outTemp, 0, sizeof(*outTemp) * t1->mMixerChannelCount * mFrameCount);
for (const int name : group) {
const std::shared_ptr<Track> &t = mTracks[name];
int32_t *aux = NULL;
if (CC_UNLIKELY(t->needs & NEEDS_AUX)) { // does the track also feed an auxiliary effect send buffer?
aux = t->auxBuffer;
}
// this is a little goofy, on the resampling case we don't
// acquire/release the buffers because it's done by
// the resampler.
if (t->needs & NEEDS_RESAMPLE) { // tracks that need resampling: the hook resamples and applies volume and other processing
(t.get()->*t->hook)(outTemp, numFrames, mResampleTemp.get() /* naked ptr */, aux); // note the arguments: two temporary buffers created earlier plus the aux send buffer
} else { // tracks that don't need resampling
size_t outFrames = 0;
while (outFrames < numFrames) {
t->buffer.frameCount = numFrames - outFrames;
t->bufferProvider->getNextBuffer(&t->buffer); // read the track's input buffer directly
t->mIn = t->buffer.raw;
// t->mIn == nullptr can happen if the track was flushed just after having
// been enabled for mixing.
if (t->mIn == nullptr) break;
(t.get()->*t->hook)(
outTemp + outFrames * t->mMixerChannelCount, t->buffer.frameCount,
mResampleTemp.get() /* naked ptr */,
aux != nullptr ? aux + outFrames : nullptr); // call the track hook for volume and other processing, without resampling
outFrames += t->buffer.frameCount;
t->bufferProvider->releaseBuffer(&t->buffer);
}
}
}
convertMixerFormat(t1->mainBuffer, t1->mMixerFormat,
outTemp, t1->mMixerInFormat, numFrames * t1->mMixerChannelCount); // finally convert the processed data into the mixer output format (t1->mMixerFormat)
}
}
Next, the hook function of a single track:
void AudioMixer::Track::track__genericResample(
int32_t* out, size_t outFrameCount, int32_t* temp, int32_t* aux)
{
ALOGVV("track__genericResample\n");
mResampler->setSampleRate(sampleRate); // finally, the resampling rate we set earlier (33935) is used
// ramp gain - resample to temp buffer and scale/mix in 2nd step
if (aux != NULL) { // the track also feeds an auxiliary effect send buffer
// always resample with unity gain when sending to auxiliary buffer to be able
// to apply send level after resampling
mResampler->setVolume(UNITY_GAIN_FLOAT, UNITY_GAIN_FLOAT);
memset(temp, 0, outFrameCount * mMixerChannelCount * sizeof(int32_t));
mResampler->resample(temp, outFrameCount, bufferProvider); // resample first
if (CC_UNLIKELY(volumeInc[0]|volumeInc[1]|auxInc)) { // then apply volume
volumeRampStereo(out, outFrameCount, temp, aux); // volume ramp
} else {
volumeStereo(out, outFrameCount, temp, aux); // constant volume
}
} else {
if (CC_UNLIKELY(volumeInc[0]|volumeInc[1])) {
mResampler->setVolume(UNITY_GAIN_FLOAT, UNITY_GAIN_FLOAT);
memset(temp, 0, outFrameCount * MAX_NUM_CHANNELS * sizeof(int32_t));
mResampler->resample(temp, outFrameCount, bufferProvider); // resample
volumeRampStereo(out, outFrameCount, temp, aux);
}
// constant gain
else {
mResampler->setVolume(mVolume[0], mVolume[1]);
mResampler->resample(out, outFrameCount, bufferProvider); // resample (the resampler applies the constant volume)
}
}
}
Volume handling isn't the focus here; let's concentrate on the resampling logic:
size_t AudioResamplerOrder1::resample(int32_t* out, size_t outFrameCount,
AudioBufferProvider* provider) {
// should never happen, but we overflow if it does
// ALOG_ASSERT(outFrameCount < 32767);
// select the appropriate resampler
switch (mChannelCount) {
case 1:
return resampleMono16(out, outFrameCount, provider); // mono
case 2:
return resampleStereo16(out, outFrameCount, provider); // stereo
default:
LOG_ALWAYS_FATAL("invalid channel count: %d", mChannelCount);
return 0;
}
}
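AudioResamplerOrder1 is one of the AudioResampler subclasses. It implements linear (first-order) interpolation: cheap to compute but relatively damaging to audio quality, i.e. a low-quality resampler. Higher-quality implementations include AudioResamplerCubic, AudioResamplerSinc, AudioResamplerDyn and AudioResamplerQTI. The algorithms themselves are beyond the scope of this article, so I'll use AudioResamplerOrder1 to follow the code flow. Note that it picks a different routine depending on the channel count, and this resampler supports at most two channels.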
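To make "linear interpolation" concrete, here is a simplified mono sketch, not the AOSP code: a fixed-point phase accumulator advances by inRate/outRate per output frame, its integer part selects the input sample, and its fractional part interpolates toward the next sample. The 30 fractional phase bits and all function names are assumptions for illustration; the real AudioResamplerOrder1 works on 16-bit stereo, applies volume, and pulls input from an AudioBufferProvider.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

constexpr int kPhaseBits = 30;                             // assumed fractional phase bits
constexpr uint64_t kPhaseOne = uint64_t(1) << kPhaseBits;  // 1.0 in phase units

// Phase step per output frame: inRate / outRate in fixed point.
uint64_t phaseIncrement(uint32_t inRate, uint32_t outRate) {
    return (kPhaseOne * inRate) / outRate;
}

// Hypothetical helper: resample mono float samples from inRate to outRate
// by linear interpolation between neighbouring input samples.
std::vector<float> resampleLinear(const std::vector<float>& in,
                                  uint32_t inRate, uint32_t outRate) {
    std::vector<float> out;
    if (in.size() < 2 || outRate == 0) return out;
    const uint64_t inc = phaseIncrement(inRate, outRate);
    if (inc == 0) return out;                                // would never advance
    for (uint64_t phase = 0;; phase += inc) {
        const uint64_t idx = phase >> kPhaseBits;            // integer part: input index
        if (idx + 1 >= in.size()) break;                     // need two neighbours
        const float frac = float(phase & (kPhaseOne - 1)) / float(kPhaseOne);
        out.push_back(in[idx] + (in[idx + 1] - in[idx]) * frac);
    }
    return out;
}
```

With inRate = 33935 and outRate = 48000, one second of input yields about 48000 output frames. Because the input rate enters only through the phase increment, a lower declared input rate means smaller steps through the input, more output frames per input frame, and therefore lower pitch.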
Here is the stereo resampling function:
size_t AudioResamplerOrder1::resampleStereo16(int32_t* out, size_t outFrameCount,
AudioBufferProvider* provider) {
...
while (outputIndex < outputSampleCount) {
// buffer is empty, fetch a new one
while (mBuffer.frameCount == 0) {
mBuffer.frameCount = inFrameCount;
provider->getNextBuffer(&mBuffer); // read from TimestretchBufferProvider; per the guess above, this data should already be time-stretched (speed changed, pitch unchanged)
if (mBuffer.raw == NULL) {
goto resampleStereo16_exit;
}
...
}
...
#ifdef ASM_ARM_RESAMP1 // asm optimisation for ResamplerOrder1
if (inputIndex + 2 < mBuffer.frameCount) {
int32_t* maxOutPt;
int32_t maxInIdx;
maxOutPt = out + (outputSampleCount - 2); // 2 because 2 frames per loop
maxInIdx = mBuffer.frameCount - 2;
AsmStereo16Loop(in, maxOutPt, maxInIdx, outputIndex, out, inputIndex, vl, vr,
phaseFraction, phaseIncrement); // the actual resampling
}
#endif // ASM_ARM_RESAMP1
...
}
...
}
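The innermost resampling routine, AsmStereo16Loop(), is actually written in assembly (it is used when ASM_ARM_RESAMP1 is defined); looks like I really need to make time to learn assembly. That's it for the resampling logic. At first glance this class never seems to use the sample rate we passed in (33935), unlike the higher-quality resamplers, but it does use it indirectly: AudioResampler::setSampleRate() converts the input rate into mPhaseIncrement (input rate / output rate in fixed point), and that phaseIncrement is what the loop steps by. Its low quality comes from the linear interpolation itself.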
We still need to look at provider->getNextBuffer(&mBuffer); my guess is that the buffer obtained here already contains time-stretched data.
status_t TimestretchBufferProvider::getNextBuffer(
AudioBufferProvider::Buffer *pBuffer)
{
...
do {
mBuffer.frameCount = mPlaybackRate.mSpeed == AUDIO_TIMESTRETCH_SPEED_NORMAL
? outputDesired : outputDesired * mPlaybackRate.mSpeed + 1;
status_t res = mTrackBufferProvider->getNextBuffer(&mBuffer); // read a buffer from upstream
...
// time-stretch the data
dstAvailable = min(mLocalBufferFrameCount - mRemaining, outputDesired);
size_t srcAvailable = mBuffer.frameCount;
processFrames((uint8_t*)mLocalBufferData + mRemaining * mFrameSize, &dstAvailable,
mBuffer.raw, &srcAvailable); // processFrames() performs the time stretch
// release all data consumed
mBuffer.frameCount = srcAvailable;
mTrackBufferProvider->releaseBuffer(&mBuffer);
} while (dstAvailable == 0); // try until we get output data or upstream provider fails.
...
return OK;
}
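The loop above sizes its upstream request from the speed: to produce outputDesired frames at speed s it asks for roughly outputDesired * s + 1 source frames. A sketch of just that expression, assuming AUDIO_TIMESTRETCH_SPEED_NORMAL is 1.0f; the function name is hypothetical.

```cpp
#include <cassert>
#include <cstddef>

constexpr float kSpeedNormal = 1.0f; // assumed value of AUDIO_TIMESTRETCH_SPEED_NORMAL

// Hypothetical helper mirroring the frameCount expression in getNextBuffer():
// at normal speed request exactly outputDesired frames, otherwise
// outputDesired * speed plus one extra frame (the float-to-size_t conversion truncates).
size_t sourceFramesToRequest(size_t outputDesired, float speed) {
    if (speed == kSpeedNormal) {
        return outputDesired;
    }
    return static_cast<size_t>(outputDesired * speed + 1);
}
```

At our speed of about 1.3, a request for 1024 output frames pulls 1332 source frames, consistent with sonic consuming input faster than it produces output when speeding up.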
void TimestretchBufferProvider::processFrames(void *dstBuffer, size_t *dstFrames,
const void *srcBuffer, size_t *srcFrames)
{
ALOGV("processFrames(%zu %zu) remaining(%zu)", *dstFrames, *srcFrames, mRemaining);
// Note dstFrames is the required number of frames.
if (!mAudioPlaybackRateValid) { // our speed is 1.3 and pitch is 1, so mAudioPlaybackRateValid is certainly true and the else branch is taken
...
} else {
switch (mFormat) {
case AUDIO_FORMAT_PCM_FLOAT:
...
case AUDIO_FORMAT_PCM_16_BIT:
if (sonicWriteShortToStream(mSonicStream, (short*)srcBuffer, *srcFrames) != 1) { // first write the data into libsonic for time-stretching (speed change without pitch change)
ALOGE("sonicWriteShortToStream cannot realloc");
*srcFrames = 0; // cannot consume all of srcBuffer
}
*dstFrames = sonicReadShortFromStream(mSonicStream, (short*)dstBuffer, *dstFrames); // read back the time-stretched data
break;
...
}
}
}
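Sure enough, data obtained through TimestretchBufferProvider is first time-stretched by libsonic (speed changed, pitch preserved). The Track hook then resamples it, and after resampling the data is converted into the mixer's output format, ready for mixing later: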
void AudioMixer::convertMixerFormat(void *out, audio_format_t mixerOutFormat,
void *in, audio_format_t mixerInFormat, size_t sampleCount)
{
switch (mixerInFormat) {
case AUDIO_FORMAT_PCM_FLOAT:
switch (mixerOutFormat) {
case AUDIO_FORMAT_PCM_FLOAT:
memcpy(out, in, sampleCount * sizeof(float)); // MEMCPY. TODO optimize out
break;
case AUDIO_FORMAT_PCM_16_BIT:
memcpy_to_i16_from_float((int16_t*)out, (float*)in, sampleCount);
break;
default:
LOG_ALWAYS_FATAL("bad mixerOutFormat: %#x", mixerOutFormat);
break;
}
break;
case AUDIO_FORMAT_PCM_16_BIT:
switch (mixerOutFormat) {
case AUDIO_FORMAT_PCM_FLOAT:
memcpy_to_float_from_q4_27((float*)out, (const int32_t*)in, sampleCount);
break;
case AUDIO_FORMAT_PCM_16_BIT:
memcpy_to_i16_from_q4_27((int16_t*)out, (const int32_t*)in, sampleCount);
break;
default:
LOG_ALWAYS_FATAL("bad mixerOutFormat: %#x", mixerOutFormat);
break;
}
break;
default:
LOG_ALWAYS_FATAL("bad mixerInFormat: %#x", mixerInFormat);
break;
}
}
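In the 16-bit path the mixer accumulates into int32_t samples in Q4.27 fixed point (27 fractional bits, so 1.0 is 1 << 27), as the memcpy_to_*_from_q4_27 names suggest. A minimal per-sample sketch of what those two conversions involve, assuming that layout; the function names are hypothetical, and the rounding and saturation details of the real audio_utils helpers may differ.

```cpp
#include <cassert>
#include <cstdint>

constexpr int kQ4_27FracBits = 27; // Q4.27: sign + 4 integer bits + 27 fractional bits

// Q4.27 -> float: divide by 2^27.
float floatFromQ4_27(int32_t v) {
    return static_cast<float>(v) / static_cast<float>(1 << kQ4_27FracBits);
}

// Q4.27 -> int16 (Q0.15): drop 27 - 15 = 12 fractional bits, then saturate.
// The right shift of a negative value is arithmetic on mainstream compilers
// (and guaranteed since C++20), i.e. it truncates toward negative infinity.
int16_t i16FromQ4_27(int32_t v) {
    int32_t s = v >> (kQ4_27FracBits - 15);
    if (s > INT16_MAX) s = INT16_MAX;
    if (s < INT16_MIN) s = INT16_MIN;
    return static_cast<int16_t>(s);
}
```

The 4 integer bits give headroom so that several tracks can be summed without overflow before the final saturation to 16 bits.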
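That completes the whole pitch-shifting flow. To sum up, the core processing consists of two steps:
1. TimestretchBufferProvider time-stretches the data (changes speed without changing pitch) with effectiveSpeed = speed / max(pitch, AUDIO_TIMESTRETCH_PITCH_MIN_DELTA), so raising the pitch first slows the audio down, and lowering the pitch first speeds it up.
2. AudioMixer::Track resamples the data with effectiveRate = sampleRate * pitch + 0.5, so raising the pitch raises the declared sample rate and lowering the pitch lowers it. This is the opposite direction of the speed change, so the pitch is shifted while the playback speed is restored to the original.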
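Why the two steps cancel out in time but not in pitch can be checked with a short derivation. The notation is assumed here, not taken from the AOSP code: $N$ input frames at track rate $f$ (duration $T = N/f$), device rate $d$, requested speed $s$ and pitch $p$, and a test tone of frequency $\nu$. libsonic is idealised as a perfect time-stretcher and the resampler as ideal, and the $\max(p, \cdot)$ clamp is ignored.

```latex
\begin{aligned}
&\text{Time-stretch at } s' = s/p: && N_1 = \frac{N}{s'} = \frac{N p}{s} \text{ frames, tone still } \nu \text{ at rate } f \\
&\text{Resample from } f p \text{ to } d: && N_2 = N_1 \cdot \frac{d}{f p} = \frac{N d}{f s} \text{ frames} \\
&\text{Playback duration at } d: && T_2 = \frac{N_2}{d} = \frac{N}{f s} = \frac{T}{s} \\
&\text{Tone period of } f/\nu \text{ samples, read at rate } f p: && \nu_2 = \frac{f p}{f/\nu} = p\,\nu
\end{aligned}
```

With $s = 1$ and $p = 0.7695$ (TONE_INDEX_PURPLE), the duration stays $T$ and every frequency is scaled by 0.7695; the device rate $d$ drops out of both results.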