SoundTouch Audio Processing Library: Source Code Analysis and Algorithm Extraction (7)

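The previous section covered the TDStretch member function processSamples and gave a rough overview of it. In the processing flow, TDStretch::putSamples calls processSamples, which naturally recalls the earlier analysis of SoundTouch::putSamples. TDStretch::putSamples is implemented as follows: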
// Adds 'numsamples' pcs of samples from the 'samples' memory position into
// the input of the object.
void TDStretch::putSamples(const SAMPLETYPE *samples, uint nSamples)
{
    // Add the samples into the input buffer
    inputBuffer.putSamples(samples, nSamples);
    // Process the samples in input buffer
    processSamples();
}
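It first copies nSamples samples from samples into inputBuffer, then calls processSamples to process them. processSamples is the core of the TDStretch class, so let's analyze its implementation in detail.
// Processes as many processing frames of the samples 'inputBuffer', store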
// the result into 'outputBuffer'
void TDStretch::processSamples()
{
    int ovlSkip, offset;
    int temp;

    while ((int)inputBuffer.numSamples() >= sampleReq)
    {
        // If tempo differs from the normal ('SCALE'), scan for the best overlapping
        // position
        offset = seekBestOverlapPosition(inputBuffer.ptrBegin());

        // Mix the samples in the 'inputBuffer' at position of 'offset' with the
        // samples in 'midBuffer' using sliding overlapping
        // ... first partially overlap with the end of the previous sequence
        // (that's in 'midBuffer')
        overlap(outputBuffer.ptrEnd((uint)overlapLength), inputBuffer.ptrBegin(), (uint)offset);
        outputBuffer.putSamples((uint)overlapLength);

        // ... then copy sequence samples from 'inputBuffer' to output:
        temp = (seekLength / 2 - offset);

        // length of sequence
        temp = (seekWindowLength - 2 * overlapLength);

        // crosscheck that we don't have buffer overflow...
        if ((int)inputBuffer.numSamples() < (offset + temp + overlapLength * 2))
        {
            continue;    // just in case, shouldn't really happen
        }

        outputBuffer.putSamples(inputBuffer.ptrBegin() + channels * (offset + overlapLength), (uint)temp);

        // Copies the end of the current sequence from 'inputBuffer' to
        // 'midBuffer' for being mixed with the beginning of the next
        // processing sequence and so on
        assert((offset + temp + overlapLength * 2) <= (int)inputBuffer.numSamples());
        memcpy(pMidBuffer, inputBuffer.ptrBegin() + channels * (offset + temp + overlapLength),
            channels * sizeof(SAMPLETYPE) * overlapLength);

        // Remove the processed samples from the input buffer. Update
        // the difference between integer & nominal skip step to 'skipFract'
        // in order to prevent the error from accumulating over time.
        skipFract += nominalSkip;   // real skip size
        ovlSkip = (int)skipFract;   // rounded to integer skip
        skipFract -= ovlSkip;       // maintain the fraction part, i.e. real vs. integer skip
        inputBuffer.receiveSamples((uint)ovlSkip);
    }
}
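The last three statements of the loop are easy to skim past. nominalSkip is generally not an integer, so each step advances inputBuffer by the integer part of the accumulated skip and carries the fraction forward, which keeps the total advance within one sample of the ideal. A minimal sketch of just that bookkeeping, using a hypothetical helper integerSkips() that is not part of SoundTouch:

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper mirroring the skipFract bookkeeping at the end of
// processSamples(): each step advances by the integer part of the accumulated
// nominal skip, and the fractional remainder is carried to the next step.
std::vector<int> integerSkips(double nominalSkip, int steps)
{
    assert(nominalSkip >= 0.0 && steps >= 0);
    std::vector<int> skips;
    double skipFract = 0.0;
    for (int k = 0; k < steps; ++k)
    {
        skipFract += nominalSkip;           // real skip size
        int ovlSkip = (int)skipFract;       // rounded down to an integer skip
        skipFract -= ovlSkip;               // keep only the fractional part
        skips.push_back(ovlSkip);
    }
    return skips;
}
```

For nominalSkip = 2.5 the per-step skips alternate 2, 3, 2, 3, so after four steps the input has advanced exactly 10 samples, and rounding error never accumulates.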
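First, sampleReq is the parameter computed in the previous section: the number of input samples required for one processing (stretching) step. The loop checks whether inputBuffer holds at least sampleReq samples; if it does, it calls seekBestOverlapPosition(inputBuffer.ptrBegin()) to find the most similar point in the input buffer. Here is the implementation of seekBestOverlapPosition: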
// Seeks for the optimal overlap-mixing position.
int TDStretch::seekBestOverlapPosition(const SAMPLETYPE *refPos)
{
    if (channels == 2)
    {
        // stereo sound
        if (bQuickSeek)
        {
            return seekBestOverlapPositionStereoQuick(refPos);
        }
        else
        {
            return seekBestOverlapPositionStereo(refPos);
        }
    }
    else
    {
        // mono sound
        if (bQuickSeek)
        {
            return seekBestOverlapPositionMonoQuick(refPos);
        }
        else
        {
            return seekBestOverlapPositionMono(refPos);
        }
    }
}
Again taking mono as the example for ease of understanding: depending on the flag bQuickSeek, it calls either seekBestOverlapPositionMonoQuick or seekBestOverlapPositionMono.
// Seeks for the optimal overlap-mixing position. The 'mono' version of the
// routine
//
// The best position is determined as the position where the two overlapped
// sample sequences are 'most alike', in terms of the highest cross-correlation
// value over the overlapping period
int TDStretch::seekBestOverlapPositionMonoQuick(const SAMPLETYPE *refPos)
{
    int j;
    int bestOffs;
    double bestCorr, corr;
    int scanCount, corrOffset, tempOffset;

    // Slopes the amplitude of the 'midBuffer' samples
    precalcCorrReferenceMono();

    bestCorr = FLT_MIN;
    bestOffs = _scanOffsets[0][0];
    corrOffset = 0;
    tempOffset = 0;

    // Scans for the best correlation value using four-pass hierarchical search.
    //
    // The look-up table 'scans' has hierarchical position adjusting steps.
    // In first pass the routine searches for the highest correlation with
    // relatively coarse steps, then rescans the neighbourhood of the highest
    // correlation with better resolution and so on.
    for (scanCount = 0;scanCount < 4; scanCount ++)
    {
        j = 0;
        while (_scanOffsets[scanCount][j])
        {
            tempOffset = corrOffset + _scanOffsets[scanCount][j];
            if (tempOffset >= seekLength) break;

            // Calculates correlation value for the mixing position corresponding
            // to 'tempOffset'
            corr = (double)calcCrossCorrMono(refPos + tempOffset, pRefMidBuffer);
            // heuristic rule to slightly favour values close to mid of the range
            double tmp = (double)(2 * tempOffset - seekLength) / seekLength;
            corr = ((corr + 0.1) * (1.0 - 0.25 * tmp * tmp));

            // Checks for the highest correlation value
            if (corr > bestCorr)
            {
                bestCorr = corr;
                bestOffs = tempOffset;
            }
            j ++;
        }
        corrOffset = bestOffs;
    }
    // clear cross correlation routine state if necessary (is so e.g. in MMX routines).
    clearCrossCorrState();

    return bestOffs;
}

// Seeks for the optimal overlap-mixing position. The 'mono' version of the
// routine
//
// The best position is determined as the position where the two overlapped
// sample sequences are 'most alike', in terms of the highest cross-correlation
// value over the overlapping period
int TDStretch::seekBestOverlapPositionMono(const SAMPLETYPE *refPos)
{
    int bestOffs;
    double bestCorr, corr;
    int tempOffset;
    const SAMPLETYPE *compare;

    // Slopes the amplitude of the 'midBuffer' samples
    precalcCorrReferenceMono();

    bestCorr = FLT_MIN;
    bestOffs = 0;

    // Scans for the best correlation value by testing each possible position
    // over the permitted range.
    for (tempOffset = 0; tempOffset < seekLength; tempOffset ++)
    {
        compare = refPos + tempOffset;

        // Calculates correlation value for the mixing position corresponding
        // to 'tempOffset'
        corr = (double)calcCrossCorrMono(pRefMidBuffer, compare);
        // heuristic rule to slightly favour values close to mid of the range
        double tmp = (double)(2 * tempOffset - seekLength) / seekLength;
        corr = ((corr + 0.1) * (1.0 - 0.25 * tmp * tmp));

        // Checks for the highest correlation value
        if (corr > bestCorr)
        {
            bestCorr = corr;
            bestOffs = tempOffset;
        }
    }
    // clear cross correlation routine state if necessary (is so e.g. in MMX routines).
    clearCrossCorrState();
    return bestOffs;
}
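The Quick variant replaces the exhaustive scan over all seekLength offsets with a coarse-to-fine search: each pass scans a short list of offsets from _scanOffsets around the best position found so far. The sketch below reproduces that control flow with a hypothetical hierarchicalSearch() and a caller-supplied score function. The offset table used in the test is made up for illustration and is not SoundTouch's actual _scanOffsets; unlike the quoted code, the sketch also skips negative offsets and does not assume the offsets are sorted.

```cpp
#include <cassert>
#include <functional>
#include <limits>
#include <vector>

// Hypothetical coarse-to-fine search in the spirit of
// seekBestOverlapPositionMonoQuick(). passes[0] holds absolute coarse
// positions; later passes hold relative steps around the best offset so far.
int hierarchicalSearch(const std::function<double(int)> &score, int seekLength,
                       const std::vector<std::vector<int>> &passes)
{
    assert(!passes.empty() && !passes[0].empty());
    int bestOffs = passes[0][0];
    double bestScore = -std::numeric_limits<double>::infinity();
    int corrOffset = 0;  // pass 0 offsets are absolute
    for (const std::vector<int> &steps : passes)
    {
        for (int step : steps)
        {
            int tempOffset = corrOffset + step;
            if (tempOffset < 0 || tempOffset >= seekLength) continue;  // stay in range
            double s = score(tempOffset);
            if (s > bestScore)
            {
                bestScore = s;
                bestOffs = tempOffset;
            }
        }
        corrOffset = bestOffs;  // the next pass refines around the current best
    }
    return bestOffs;
}
```

With a made-up table of 10 coarse positions (step 20) followed by two refinement passes of 4 steps each, 18 score evaluations locate the peak of a smooth score instead of 200. The trade-off is that a narrow peak falling between coarse grid points can be missed, which is presumably why the exhaustive version is kept as an option.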
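At first glance the two functions look very different, but they are really much the same. Let's start with TDStretch::seekBestOverlapPositionMono, the straightforward implementation, again using the floating-point build as the example. Note the call to precalcCorrReferenceMono(), which is implemented as follows: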
// Slopes the amplitude of the 'midBuffer' samples so that cross correlation
// is faster to calculate
void TDStretch::precalcCorrReferenceMono()
{
    int i;
    float temp;

    for (i=0 ; i < (int)overlapLength ;i ++)
    {
        temp = (float)i * (float)(overlapLength - i);
        pRefMidBuffer[i] = (float)(pMidBuffer[i] * temp);
    }
}
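This can be read as a new window function W[i] = i * (overlapLength - i) for i = 0 ... overlapLength-1: temp is a symmetric quadratic in i with its vertex at (overlapLength/2, overlapLength^2/4) that crosses the x-axis at (0, 0) and (overlapLength, 0). The loop then computes pRefMidBuffer[i] = pMidBuffer[i] * W[i]. Next, here is calcCrossCorrMono, which computes the cross-correlation coefficient: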
double TDStretch::calcCrossCorrMono(const float *mixingPos, const float *compare) const
{
    double corr;
    double norm;
    int i;

    corr = norm = 0;
    for (i = 1; i < overlapLength; i ++)
    {
        corr += mixingPos[i] * compare[i];
        norm += mixingPos[i] * mixingPos[i];
    }

    if (norm < 1e-9) norm = 1.0;    // to avoid div by zero
    return corr / sqrt(norm);
}
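The two routines can be sketched in isolation on plain std::vector buffers, which makes the effect of the parabolic window easy to check by hand. weightReference() and crossCorrMono() are hypothetical stand-ins for precalcCorrReferenceMono() and calcCrossCorrMono(). One detail worth noticing in the quoted code: seekBestOverlapPositionMono passes pRefMidBuffer as the first argument, so norm is the same for every offset, whereas the Quick variant passes refPos + tempOffset first, so there the normalizer is the energy of the candidate input segment.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Sketch of precalcCorrReferenceMono(): weight the reference (midBuffer)
// with the parabolic window W[i] = i * (N - i), where N = overlapLength.
std::vector<float> weightReference(const std::vector<float> &mid)
{
    const int n = (int)mid.size();
    std::vector<float> ref(n);
    for (int i = 0; i < n; ++i)
        ref[i] = mid[i] * ((float)i * (float)(n - i));
    return ref;
}

// Sketch of calcCrossCorrMono(): dot product divided by the square root of
// the energy of the FIRST argument only. Index 0 is skipped, as in the
// original; the windowed reference is zero there anyway.
double crossCorrMono(const float *mixingPos, const float *compare, int overlapLength)
{
    double corr = 0.0, norm = 0.0;
    for (int i = 1; i < overlapLength; ++i)
    {
        corr += (double)mixingPos[i] * compare[i];
        norm += (double)mixingPos[i] * mixingPos[i];
    }
    if (norm < 1e-9) norm = 1.0;  // avoid division by zero
    return corr / std::sqrt(norm);
}
```

For a constant reference {1, 1, 1, 1} the window gives {0, 3, 4, 3}, and correlating that with itself yields 34 / sqrt(34) = sqrt(34).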
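Recall the formula for the normalized cross-correlation coefficient, where the sums run over n and the lag is L = 0, ±1, ±2, ...:
$$R_{xy}(L) = \sum_n x(n)\,y(n-L) = \sum_n x(n+L)\,y(n)$$
$$R_{yx}(L) = \sum_n y(n)\,x(n-L) = \sum_n y(n+L)\,x(n)$$
$$P_{xy}(L) = \frac{R_{xy}(L)}{\sqrt{R_{xx}(0)\,R_{yy}(0)}}$$
Here R_xx(0) and R_yy(0) are the zero-lag autocorrelations, i.e. the energies of x and y. P_xy always lies in [-1, 1].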
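For comparison, here is a direct (and deliberately unoptimized) implementation of the textbook coefficient, normalizing by both energies. normalizedCrossCorr() is a hypothetical helper; samples outside either vector are treated as zero, and the lag convention follows R_xy(L) = Σ x(n+L) y(n).

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Textbook normalized cross-correlation at lag L:
// P_xy(L) = sum_n x(n+L) * y(n) / sqrt(Rxx(0) * Ryy(0)),
// with samples outside the vectors treated as zero.
double normalizedCrossCorr(const std::vector<double> &x, const std::vector<double> &y, int lag)
{
    double rxy = 0.0;
    for (int n = 0; n < (int)y.size(); ++n)
    {
        int k = n + lag;
        if (k >= 0 && k < (int)x.size())
            rxy += x[k] * y[n];
    }
    double rxx = 0.0, ryy = 0.0;  // zero-lag autocorrelations (energies)
    for (double v : x) rxx += v * v;
    for (double v : y) ryy += v * v;
    double denom = std::sqrt(rxx * ryy);
    return denom > 0.0 ? rxy / denom : 0.0;
}
```

Identical sequences give 1 at lag 0, a negated copy gives -1, and a copy delayed by one sample in x gives 1 at lag 1.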
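Its computation clearly differs in form from the conventional cross-correlation coefficient: calcCrossCorrMono divides only by the square root of the energy of its first argument, sqrt(Σ mixingPos[i]^2), rather than by sqrt(Rxx(0)Ryy(0)). Here is how I understand it. pMidBuffer holds the middle part where two discrete signals overlap, and to make the overlapped region transition smoothly the usual approach is: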
(Figure: over the overlap region, x[n] fades out linearly from full amplitude to zero while y[n] fades in linearly from zero to full amplitude, and the two are summed.)
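The cross-fade in the figure can be sketched as a plain linear fade. This is an illustration under an assumption: the body of TDStretch::overlap() is not quoted in this part of the series, so linearCrossFade() below is a hypothetical helper showing the general technique rather than SoundTouch's exact code.

```cpp
#include <cassert>
#include <vector>

// Linear cross-fade over N samples: the tail of the previous sequence (x)
// fades out with weight (N - i) / N while the new input (y) fades in with
// weight i / N; the two weights always sum to 1.
std::vector<float> linearCrossFade(const std::vector<float> &x, const std::vector<float> &y)
{
    assert(x.size() == y.size());
    const int n = (int)x.size();
    std::vector<float> out(n);
    for (int i = 0; i < n; ++i)
        out[i] = (x[i] * (float)(n - i) + y[i] * (float)i) / (float)n;
    return out;
}
```

Because the weights sum to 1 at every sample, a signal that is identical in x and y passes through the overlap unchanged; the quality of the transition then depends only on how similar x and y are, which is exactly what the offset search tries to maximize.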
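The overlapping parts of x[n] and y[n] should look like this to get a smooth transition. The mixing itself is done by overlap() in processSamples(), while TDStretch::seekBestOverlapPositionMono searches, in a heavily optimized way, for the position at which this overlap fits best. The tail that will fade out is saved for the next step in TDStretch::processSamples():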
memcpy(pMidBuffer, inputBuffer.ptrBegin() + channels * (offset + temp + overlapLength),channels * sizeof(SAMPLETYPE) * overlapLength);
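So pMidBuffer is taken directly from x[n], and compare corresponds to x[n+overlapLength]. To make seekBestOverlapPositionMono easier to follow, it can be rewritten in the following unoptimized form: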
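// Illustrative, unoptimized rewrite of seekBestOverlapPositionMono().
// refPos points at the start of inputBuffer; mixingPos and compare are scratch
// arrays of overlapLength samples (uses FLT_MIN from <cfloat>, sqrt from <cmath>).
int bestOffs = 0;
double bestCorr = FLT_MIN;
for (int i = 0; i < seekLength; i++)
{
   double corr = 0, norm = 0;            // reset for every candidate offset
   for (int j = 0; j < overlapLength; j++)
   {
      mixingPos[j] = pMidBuffer[j] * (overlapLength - j);   // fade-out weight, independent of i
      compare[j] = refPos[i + j] * j;                        // fade-in weight
      corr += compare[j] * mixingPos[j];
      norm += mixingPos[j] * mixingPos[j];
   }
   if (norm < 1e-9) norm = 1.0;          // avoid division by zero
   corr = corr / sqrt(norm);
   // heuristic rule to slightly favour offsets close to the middle of the range
   double tmp = (double)(2 * i - seekLength) / seekLength;
   corr = (corr + 0.1) * (1.0 - 0.25 * tmp * tmp);
   if (corr > bestCorr)
   {
      // found new best offset candidate
      bestCorr = corr;
      bestOffs = i;
   }
}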
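Note that inside the j loop, the weights (overlapLength - j) and j do not depend on i, and neither does pMidBuffer[j]. To improve performance, the product pMidBuffer[j] * j * (overlapLength - j) can therefore be computed once outside the i loop; seekBestOverlapPositionMono, together with precalcCorrReferenceMono, is exactly this optimized structure.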
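This can thus be viewed as y[m] = x[m]*w[m]*w[N-m] with the linear ramp w[m] = m, where w[N-m] is the mirror image of w[m]. y is then cross-correlated with x[n] to find the most similar position, which becomes the overlap position.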
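The optimization can be checked numerically. The hypothetical functions corrNaive() and corrPrecomputed() below compute the unnormalized correlation for every offset, first applying the two weights inside the offset loop, then with the i-independent product mid[j] * j * (N - j) hoisted out, as precalcCorrReferenceMono() does.

```cpp
#include <cassert>
#include <vector>

// Naive: fade-out weight (N - j) and fade-in weight j applied inside the i loop.
std::vector<double> corrNaive(const std::vector<double> &mid, const std::vector<double> &input, int seekLength)
{
    const int n = (int)mid.size();
    assert((int)input.size() >= seekLength + n - 1);
    std::vector<double> result(seekLength);
    for (int i = 0; i < seekLength; ++i)
    {
        double corr = 0.0;
        for (int j = 0; j < n; ++j)
            corr += (mid[j] * (n - j)) * (input[i + j] * j);
        result[i] = corr;
    }
    return result;
}

// Optimized: the window product is computed once, leaving one multiply per term.
std::vector<double> corrPrecomputed(const std::vector<double> &mid, const std::vector<double> &input, int seekLength)
{
    const int n = (int)mid.size();
    assert((int)input.size() >= seekLength + n - 1);
    std::vector<double> ref(n);
    for (int j = 0; j < n; ++j)
        ref[j] = mid[j] * j * (n - j);  // same role as pRefMidBuffer
    std::vector<double> result(seekLength);
    for (int i = 0; i < seekLength; ++i)
    {
        double corr = 0.0;
        for (int j = 0; j < n; ++j)
            corr += ref[j] * input[i + j];
        result[i] = corr;
    }
    return result;
}
```

Hoisting removes two multiplications and a subtraction from every inner-loop term, and that saving is repeated seekLength times because the inner loop runs once per candidate offset.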
tmp = (double)(2 * i - seekLength) / seekLength;
corr = ((corr + 0.1) * (1.0 - 0.25 * tmp * tmp));
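Plotting (1.0 - 0.25 * tmp * tmp) makes this easy to understand. It is a deliberate correction applied to corr: tmp runs from -1 at offset 0 to almost +1 at the end of the search range, so the factor is 1.0 at the middle of the range (offset seekLength/2) and falls to 0.75 at the ends. Offsets closer to the middle of the search range therefore get a higher score, pulling the chosen best position toward the middle.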
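A small sketch of that correction makes the numbers concrete. midRangeWeight() and correctedCorr() are hypothetical names; the arithmetic is copied from the seek loops above.

```cpp
#include <cassert>

// Weight applied to each candidate offset: tmp maps [0, seekLength) onto
// [-1, 1), and the parabola 1 - 0.25 * tmp^2 peaks at the middle of the range.
double midRangeWeight(int offset, int seekLength)
{
    double tmp = (double)(2 * offset - seekLength) / seekLength;
    return 1.0 - 0.25 * tmp * tmp;
}

// Correction used in the seek loops: shift by +0.1, then apply the weight.
double correctedCorr(double corr, int offset, int seekLength)
{
    return (corr + 0.1) * midRangeWeight(offset, seekLength);
}
```

Because the ends are scaled by 0.75, an offset at the edge of the range needs (corr + 0.1) about a third larger than an offset at the middle to win: a raw 0.5 at offset 0 scores about 0.45 and loses to a raw 0.4 at offset seekLength/2, which scores 0.5.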
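    At this point most of the SoundTouch (ST) source has been analyzed. The next section will extract and refine the algorithm, which is essentially a summary.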
