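The previous section introduced the TDStretch member function processSamples and sketched roughly what it does. In terms of control flow, processSamples is called from the TDStretch member function putSamples, which recalls our earlier analysis of SoundTouch::putSamples. TDStretch::putSamples is implemented as follows: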
// Adds 'numsamples' pcs of samples from the 'samples' memory position into
// the input of the object.
void TDStretch::putSamples(const SAMPLETYPE *samples, uint nSamples)
{
    // Add the samples into the input buffer
    inputBuffer.putSamples(samples, nSamples);
    // Process the samples in input buffer
    processSamples();
}
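It first copies nSamples samples from samples into inputBuffer, then calls processSamples to process them. This member function is the core of the TDStretch class, so the rest of this section analyzes its implementation in detail.
// Processes as many processing frames of the samples 'inputBuffer', store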
// the result into 'outputBuffer'
void TDStretch::processSamples()
{
    int ovlSkip, offset;
    int temp;
    while ((int)inputBuffer.numSamples() >= sampleReq)
    {
        // If tempo differs from the normal ('SCALE'), scan for the best overlapping
        // position
        offset = seekBestOverlapPosition(inputBuffer.ptrBegin());
        // Mix the samples in the 'inputBuffer' at position of 'offset' with the
        // samples in 'midBuffer' using sliding overlapping
        // ... first partially overlap with the end of the previous sequence
        // (that's in 'midBuffer')
        overlap(outputBuffer.ptrEnd((uint)overlapLength), inputBuffer.ptrBegin(), (uint)offset);
        outputBuffer.putSamples((uint)overlapLength);
        // ... then copy sequence samples from 'inputBuffer' to output:
        temp = (seekLength / 2 - offset);
        // length of sequence
        temp = (seekWindowLength - 2 * overlapLength);
        // crosscheck that we don't have buffer overflow...
        if ((int)inputBuffer.numSamples() < (offset + temp + overlapLength * 2))
        {
            continue; // just in case, shouldn't really happen
        }
        outputBuffer.putSamples(inputBuffer.ptrBegin() + channels * (offset + overlapLength), (uint)temp);
        // Copies the end of the current sequence from 'inputBuffer' to
        // 'midBuffer' for being mixed with the beginning of the next
        // processing sequence and so on
        assert((offset + temp + overlapLength * 2) <= (int)inputBuffer.numSamples());
        memcpy(pMidBuffer, inputBuffer.ptrBegin() + channels * (offset + temp + overlapLength),
               channels * sizeof(SAMPLETYPE) * overlapLength);
        // Remove the processed samples from the input buffer. Update
        // the difference between integer & nominal skip step to 'skipFract'
        // in order to prevent the error from accumulating over time.
        skipFract += nominalSkip; // real skip size
        ovlSkip = (int)skipFract; // rounded to integer skip
        skipFract -= ovlSkip;     // maintain the fraction part, i.e. real vs. integer skip
        inputBuffer.receiveSamples((uint)ovlSkip);
    }
}
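The skip bookkeeping at the end of the loop can be looked at in isolation. The following is a minimal sketch, not SoundTouch code: the helper name integerSkips and its parameters are made up for illustration, and it only reproduces the three skipFract statements to show how the fractional remainder is carried from one pass to the next.

```cpp
#include <vector>

// Hypothetical helper (not part of SoundTouch): returns the integer skip
// sizes produced over 'iterations' loop passes for a given nominal skip.
std::vector<int> integerSkips(double nominalSkip, int iterations)
{
    std::vector<int> skips;
    double skipFract = 0.0;            // fractional remainder carried between passes
    for (int k = 0; k < iterations; ++k)
    {
        skipFract += nominalSkip;      // real-valued skip for this pass
        int ovlSkip = (int)skipFract;  // truncated to a whole number of samples
        skipFract -= ovlSkip;          // keep the remainder for the next pass
        skips.push_back(ovlSkip);
    }
    return skips;
}
```

With a nominal skip of 2.5 samples, four passes yield skips of 2, 3, 2, 3, so exactly 10 samples are consumed and the rounding error does not build up.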
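First, sampleReq is the parameter computed in the previous section: the number of input samples needed before a stretch step can run. The loop checks whether inputBuffer holds at least sampleReq samples. If it does, it calls seekBestOverlapPosition(inputBuffer.ptrBegin()) to find the position in the input buffer that is most similar to the end of the previous sequence. Here is the implementation of seekBestOverlapPosition: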
// Seeks for the optimal overlap-mixing position.
int TDStretch::seekBestOverlapPosition(const SAMPLETYPE *refPos)
{
    if (channels == 2)
    {
        // stereo sound
        if (bQuickSeek)
        {
            return seekBestOverlapPositionStereoQuick(refPos);
        }
        else
        {
            return seekBestOverlapPositionStereo(refPos);
        }
    }
    else
    {
        // mono sound
        if (bQuickSeek)
        {
            return seekBestOverlapPositionMonoQuick(refPos);
        }
        else
        {
            return seekBestOverlapPositionMono(refPos);
        }
    }
}
Again taking mono as the example to keep things simple: the function checks the flag bQuickSeek and calls either seekBestOverlapPositionMonoQuick or seekBestOverlapPositionMono.
// Seeks for the optimal overlap-mixing position. The 'mono' version of the
// routine
//
// The best position is determined as the position where the two overlapped
// sample sequences are 'most alike', in terms of the highest cross-correlation
// value over the overlapping period
int TDStretch::seekBestOverlapPositionMonoQuick(const SAMPLETYPE *refPos)
{
    int j;
    int bestOffs;
    double bestCorr, corr;
    int scanCount, corrOffset, tempOffset;
    // Slopes the amplitude of the 'midBuffer' samples
    precalcCorrReferenceMono();
    bestCorr = FLT_MIN;
    bestOffs = _scanOffsets[0][0];
    corrOffset = 0;
    tempOffset = 0;
    // Scans for the best correlation value using four-pass hierarchical search.
    //
    // The look-up table 'scans' has hierarchical position adjusting steps.
    // In first pass the routine searches for the highest correlation with
    // relatively coarse steps, then rescans the neighbourhood of the highest
    // correlation with better resolution and so on.
    for (scanCount = 0; scanCount < 4; scanCount ++)
    {
        j = 0;
        while (_scanOffsets[scanCount][j])
        {
            tempOffset = corrOffset + _scanOffsets[scanCount][j];
            if (tempOffset >= seekLength) break;
            // Calculates correlation value for the mixing position corresponding
            // to 'tempOffset'
            corr = (double)calcCrossCorrMono(refPos + tempOffset, pRefMidBuffer);
            // heuristic rule to slightly favour values close to mid of the range
            double tmp = (double)(2 * tempOffset - seekLength) / seekLength;
            corr = ((corr + 0.1) * (1.0 - 0.25 * tmp * tmp));
            // Checks for the highest correlation value
            if (corr > bestCorr)
            {
                bestCorr = corr;
                bestOffs = tempOffset;
            }
            j ++;
        }
        corrOffset = bestOffs;
    }
    // clear cross correlation routine state if necessary (is so e.g. in MMX
    // routines).
    clearCrossCorrState();
    return bestOffs;
}
and
// Seeks for the optimal overlap-mixing position. The 'mono' version of the
// routine
//
// The best position is determined as the position where the two overlapped
// sample sequences are 'most alike', in terms of the highest cross-correlation
// value over the overlapping period
int TDStretch::seekBestOverlapPositionMono(const SAMPLETYPE *refPos)
{
    int bestOffs;
    double bestCorr, corr;
    int tempOffset;
    const SAMPLETYPE *compare;
    // Slopes the amplitude of the 'midBuffer' samples
    precalcCorrReferenceMono();
    bestCorr = FLT_MIN;
    bestOffs = 0;
    // Scans for the best correlation value by testing each possible position
    // over the permitted range.
    for (tempOffset = 0; tempOffset < seekLength; tempOffset ++)
    {
        compare = refPos + tempOffset;
        // Calculates correlation value for the mixing position corresponding
        // to 'tempOffset'
        corr = (double)calcCrossCorrMono(pRefMidBuffer, compare);
        // heuristic rule to slightly favour values close to mid of the range
        double tmp = (double)(2 * tempOffset - seekLength) / seekLength;
        corr = ((corr + 0.1) * (1.0 - 0.25 * tmp * tmp));
        // Checks for the highest correlation value
        if (corr > bestCorr)
        {
            bestCorr = corr;
            bestOffs = tempOffset;
        }
    }
    // clear cross correlation routine state if necessary (is so e.g. in MMX
    // routines).
    clearCrossCorrState();
    return bestOffs;
}
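The coarse-to-fine idea behind the Quick variant can be illustrated on its own. The sketch below is an assumption-laden illustration, not SoundTouch code: the table kScanOffsets is a made-up three-pass table (the real _scanOffsets is not shown in this article), the score callback stands in for the weighted correlation, and the negative-offset guard is an extra safety check added here.

```cpp
#include <functional>
#include <limits>

// Hypothetical 3-pass offset table (NOT the real SoundTouch _scanOffsets).
// Each row is terminated by 0, like the table scanned in the Quick variant.
static const int kScanOffsets[3][8] = {
    { 16, 32, 48, 64, 80, 96, 112, 0 },  // coarse pass over the whole range
    { -8, -4,  4,  8,  0,  0,   0, 0 },  // refine around the best offset so far
    { -2, -1,  1,  2,  0,  0,   0, 0 }   // final fine pass
};

// Returns the offset with the highest score found by the hierarchical scan.
// 'score' is a stand-in for the weighted cross-correlation at an offset.
int hierarchicalSearch(const std::function<double(int)> &score, int seekLength)
{
    int bestOffs = kScanOffsets[0][0];
    double bestScore = std::numeric_limits<double>::lowest();
    int centre = 0;                                // offsets are relative to this
    for (int pass = 0; pass < 3; ++pass)
    {
        for (int j = 0; kScanOffsets[pass][j] != 0; ++j)
        {
            int candidate = centre + kScanOffsets[pass][j];
            if (candidate >= seekLength) break;    // same early exit as the Quick variant
            if (candidate < 0) continue;           // extra guard, an assumption of this sketch
            double s = score(candidate);
            if (s > bestScore)
            {
                bestScore = s;
                bestOffs = candidate;
            }
        }
        centre = bestOffs;                         // next pass scans around the current best
    }
    return bestOffs;
}
```

This evaluates far fewer offsets than the exhaustive loop, but it only finds the true maximum when the coarse pass lands close enough to it; the exhaustive variant has no such limitation.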
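At first glance the two functions look very different, but they are essentially the same. Let us start with TDStretch::seekBestOverlapPositionMono, which is the straightforward implementation, again using the floating-point version as the example. Note that it calls precalcCorrReferenceMono(), which is implemented as follows: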
// Slopes the amplitude of the 'midBuffer' samples so that cross correlation
// is faster to calculate
void TDStretch::precalcCorrReferenceMono()
{
    int i;
    float temp;
    for (i=0 ; i < (int)overlapLength ;i ++)
    {
        temp = (float)i * (float)(overlapLength - i);
        pRefMidBuffer[i] = (float)(pMidBuffer[i] * temp);
    }
}
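This can be read as a window function W[i] = i * (overlapLength - i), for i in [0, overlapLength). temp is a symmetric quadratic in i with its vertex at (overlapLength/2, overlapLength^2/4) and zeros at (0, 0) and (overlapLength, 0), so pRefMidBuffer[i] = pMidBuffer[i] * W[i]. Next, look at calcCrossCorrMono, the function that computes the cross-correlation coefficient: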
double TDStretch::calcCrossCorrMono(const float *mixingPos, const float *compare) const
{
    double corr;
    double norm;
    int i;
    corr = norm = 0;
    for (i = 1; i < overlapLength; i ++)
    {
        corr += mixingPos[i] * compare[i];
        norm += mixingPos[i] * mixingPos[i];
    }
    if (norm < 1e-9) norm = 1.0; // to avoid div by zero
    return corr / sqrt(norm);
}
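Recall the formula for the normalized cross-correlation coefficient.
E denotes summation over n, and the lag L = 0, ±1, ±2, ...
Rxy(L) = E(x(n)y(n-L)) = E(x(n+L)y(n))
Ryx(L) = E(y(n)x(n-L)) = E(y(n+L)x(n))
Pxy(L) = Rxy(L) / Sqrt(Rxx(0)Ryy(0)), where Rxx(0) = E(x(n)x(n)) and Ryy(0) = E(y(n)y(n))
The value of Pxy lies in [-1, 1].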
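For comparison with calcCrossCorrMono, here is a minimal sketch of the textbook quantity. It assumes the standard normalization by the zero-lag autocorrelations Rxx(0) and Ryy(0) and treats samples outside either sequence as zero; the function name normalizedCrossCorr is made up for this illustration and is not a SoundTouch routine.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Normalized cross-correlation of x and y at lag L:
//   Pxy(L) = Rxy(L) / sqrt(Rxx(0) * Ryy(0)),  Rxy(L) = sum over n of x[n+L] * y[n]
// Samples outside either vector are treated as zero.
double normalizedCrossCorr(const std::vector<double> &x,
                           const std::vector<double> &y, int lag)
{
    double rxy = 0.0, rxx = 0.0, ryy = 0.0;
    for (std::size_t n = 0; n < x.size(); ++n) rxx += x[n] * x[n];  // energy of x
    for (std::size_t n = 0; n < y.size(); ++n) ryy += y[n] * y[n];  // energy of y
    for (std::size_t n = 0; n < y.size(); ++n)
    {
        long idx = (long)n + lag;
        if (idx < 0 || idx >= (long)x.size()) continue;  // x[n+L] is out of range
        rxy += x[idx] * y[n];
    }
    double denom = std::sqrt(rxx * ryy);
    if (denom < 1e-12) return 0.0;  // silent input: define the result as 0
    return rxy / denom;
}
```

A signal correlated with itself at lag 0 gives 1, and with its negation gives -1. Note that calcCrossCorrMono divides by the energy of only one of its two arguments, so its result is not confined to [-1, 1]; it is only used to rank candidate offsets against each other.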
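The computation in calcCrossCorrMono clearly differs in form from this textbook cross-correlation coefficient. My own reading is as follows. pMidBuffer holds the middle part where two discrete signals overlap, and the usual way to make the overlapped part smooth is: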
[Diagram: x[n] ramps down across the overlap region while y[n] ramps up, giving a cross-fade from x[n] to y[n].]
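The overlapping parts of y[n] and x[n] should follow this pattern to give a reasonably smooth transition. TDStretch::seekBestOverlapPositionMono is built around this kind of overlap, only with considerable optimization. That is why TDStretch::processSamples() contains: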
memcpy(pMidBuffer, inputBuffer.ptrBegin() + channels * (offset + temp + overlapLength),channels * sizeof(SAMPLETYPE) * overlapLength);
pMidBuffer is filled directly from x[n], and compare is x[n+overlapLength]. To make this easier to follow, seekBestOverlapPositionMono can be rewritten in the following unoptimized form:
int i, j, bestoffset = 0;
double corr, norm, tmp, bestcorr = FLT_MIN;
for (i = 0; i < seekLength; i++)
{
    // reset the accumulators for every candidate offset i
    corr = 0;
    norm = 0;
    for (j = 0; j < overlapLength; j++)
    {
        // previous sequence, weighted by the falling ramp
        double mixingPos = inputBuffer[j] * (overlapLength - j);
        // candidate sequence at offset i, weighted by the rising ramp
        double compare = inputBuffer[i + j] * j;
        corr += compare * mixingPos;
        norm += mixingPos * mixingPos;
    }
    if (norm < 1e-9) norm = 1.0; // to avoid div by zero
    corr = corr / sqrt(norm);
    tmp = (double)(2 * i - seekLength) / seekLength;
    corr = ((corr + 0.1) * (1.0 - 0.25 * tmp * tmp));
    if (corr > bestcorr)
    {
        // found new best offset candidate
        bestcorr = corr;
        bestoffset = i;
    }
}
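Notice that inside the j loop, inputBuffer[j] * (overlapLength - j) and the factor j do not depend on i. To improve performance, the product inputBuffer[j] * j * (overlapLength - j) can be computed once, outside the i loop. seekBestOverlapPositionMono is exactly this optimized structure.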
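In other words, y[m] = x[m] * w[m] * w[N-m], where w[N-m] is the mirror image of w[m]. This weighted signal is then cross-correlated with x[n], and the most similar position is chosen as the overlap position.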
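The hoisting argument can be checked numerically. The sketch below is illustrative only: the function names corrInline, precalcReference, and corrHoisted are made up here, only the un-normalized correlation sum is compared, and the caller is assumed to guarantee offset + mid.size() <= in.size().

```cpp
#include <cstddef>
#include <vector>

// Inline form: both ramps are applied inside the inner loop for every offset.
// Assumes offset + mid.size() <= in.size().
double corrInline(const std::vector<double> &mid,
                  const std::vector<double> &in, int offset)
{
    const int N = (int)mid.size();
    double corr = 0.0;
    for (int j = 0; j < N; ++j)
        corr += (mid[j] * (N - j)) * (in[offset + j] * j);
    return corr;
}

// Hoisted form, step 1: fold the window j * (N - j) into 'mid' once,
// in the same way precalcCorrReferenceMono fills pRefMidBuffer.
std::vector<double> precalcReference(const std::vector<double> &mid)
{
    const int N = (int)mid.size();
    std::vector<double> ref(N);
    for (int j = 0; j < N; ++j)
        ref[j] = mid[j] * (double)j * (double)(N - j);
    return ref;
}

// Hoisted form, step 2: a plain dot product per candidate offset.
double corrHoisted(const std::vector<double> &ref,
                   const std::vector<double> &in, int offset)
{
    double corr = 0.0;
    for (std::size_t j = 0; j < ref.size(); ++j)
        corr += ref[j] * in[offset + j];
    return corr;
}
```

Both forms produce the same sum for every offset, but the hoisted form does the window multiplications once per search instead of once per candidate offset.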
tmp = (double)(2 * i - seekLength) / seekLength;
corr = ((corr + 0.1) * (1.0 - 0.25 * tmp * tmp));
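This is easy to understand once you plot (1.0 - 0.25 * tmp * tmp): it is a deliberate correction applied to corr. tmp runs from -1 at offset 0 to almost 1 at the end of the seek range, so the weight is 1.0 at the middle of the seek range and falls to 0.75 at its ends. Offsets near the middle therefore receive a larger corr, which pulls the chosen position toward the centre.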
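A few sample values make the shape of the weight concrete. This is a small sketch: the helper name centreWeight is made up here and only reproduces the two expressions quoted above.

```cpp
// Weight that the heuristic applies to the correlation at 'offset',
// for offsets in [0, seekLength).
double centreWeight(int offset, int seekLength)
{
    double tmp = (double)(2 * offset - seekLength) / seekLength;  // -1 at offset 0, 0 at the middle
    return 1.0 - 0.25 * tmp * tmp;                                // parabola peaking at the middle
}
```

For seekLength = 100 the weight is 0.75 at offset 0, 0.9375 at offsets 25 and 75, and 1.0 at offset 50, so a candidate at the edge of the seek range needs a clearly higher raw correlation to beat one near the middle.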
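With that, most of the SoundTouch (ST) source code has been analyzed. The next section will extract the algorithm and refine it, which amounts to a summary.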