How to Implement Audio/Video Synchronization (live555)

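In live555, video and audio are encoded separately, so how are the two kept in sync?
If the video and audio timestamps can both be kept aligned with NTP time, audio/video synchronization follows.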

Network Time Protocol (NTP) is a networking protocol for clock synchronization between computer systems over packet-switched, variable-latency data networks.

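How does live555 implement this mechanism?
The overall approach is:

  • The RTSP server uses the Sender Report (SR) of the RTCP protocol to send an NTP Timestamp to the RTSP client
  • The RTSP client (the receiver of the data) maps the RTP timestamps of the audio and video streams onto the absolute time carried in RTCP (the NTP Timestamp), which synchronizes A/V
    This absolute time is the time elapsed from Jan 1 1900 00:00:00 to the present.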
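
To make the NTP Timestamp format concrete, here is a minimal sketch of how a sender could turn a Unix time (seconds and microseconds since 1970) into the two 32-bit words carried in a Sender Report. The names NtpTimestamp and unixToNtp are hypothetical and are not part of live555; only the constant 0x83AA7E80 (the number of seconds between 1900 and 1970) comes from the live555 code quoted later in this article.

```cpp
#include <cassert>
#include <cstdint>

// Seconds between Jan 1 1900 and Jan 1 1970 (2208988800).
constexpr uint32_t kNtpUnixOffset = 0x83AA7E80u;

// Hypothetical container for the 64-bit NTP timestamp of a Sender Report.
struct NtpTimestamp {
    uint32_t msw;  // most significant word: whole seconds since 1900
    uint32_t lsw;  // least significant word: fraction of a second, in units of 1/2^32
};

// Hypothetical helper: convert Unix seconds + microseconds to NTP format.
NtpTimestamp unixToNtp(uint32_t unixSec, uint32_t unixUsec) {
    assert(unixUsec < 1000000u);
    NtpTimestamp ntp;
    ntp.msw = unixSec + kNtpUnixOffset;
    // fraction = usec * 2^32 / 10^6, computed in 64 bits to avoid overflow
    ntp.lsw = static_cast<uint32_t>(
        (static_cast<uint64_t>(unixUsec) << 32) / 1000000u);
    return ntp;
}
```

For example, a Unix time of 0 s and 500000 us would map to msw = 0x83AA7E80 and lsw = 0x80000000 (exactly half a second).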

First, look at the timestamp code as it behaves before any synchronization is applied:

void RTPReceptionStats::noteIncomingPacket(u_int16_t seqNum, 
                                           u_int32_t rtpTimestamp,
                                           unsigned timestampFrequency,
                                           Boolean useForJitterCalculation,
                                           struct timeval& resultPresentationTime,
                                           Boolean& resultHasBeenSyncedUsingRTCP,
                                           unsigned packetSize) 
{
    ...

    // Record the inter-packet delay
    struct timeval timeNow;
    gettimeofday(&timeNow, NULL);

    ...

    // Return the 'presentation time' that corresponds to "rtpTimestamp":
    if (fSyncTime.tv_sec == 0 && fSyncTime.tv_usec == 0) 
    {
        // This is the first timestamp that we've seen, so use the current
        // 'wall clock' time as the synchronization time.  (This will be
        // corrected later when we receive RTCP SRs.)
        fSyncTimestamp = rtpTimestamp; // the first RTP timestamp
        fSyncTime      = timeNow; // use the current system time as the initial reference time
    }

    int timestampDiff = rtpTimestamp - fSyncTimestamp;

    // Note: This works even if the timestamp wraps around
    // (as long as "int" is 32 bits)

    // Divide this by the timestamp frequency to get real time:
    double timeDiff = timestampDiff/(double)timestampFrequency;

    // Add this to the 'sync time' to get our result:
    unsigned const million = 1000000;
    unsigned seconds, uSeconds;

    if (timeDiff >= 0.0) 
    {
        // Compute the presentation time
        seconds  = fSyncTime.tv_sec  + (unsigned)(timeDiff);
        uSeconds = fSyncTime.tv_usec + (unsigned)((timeDiff - (unsigned)timeDiff)*million);

        if (uSeconds >= million) 
        {
            uSeconds -= million;
            ++seconds;
        }
    } 
    else 
    {
        timeDiff = -timeDiff;
        seconds  = fSyncTime.tv_sec  - (unsigned)(timeDiff);
        uSeconds = fSyncTime.tv_usec - (unsigned)((timeDiff - (unsigned)timeDiff)*million);
        if ((int)uSeconds < 0) 
        {
            uSeconds += million;
            --seconds;
        }
    }

    resultPresentationTime.tv_sec  = seconds;
    resultPresentationTime.tv_usec = uSeconds;
    resultHasBeenSyncedUsingRTCP   = fHasBeenSynchronized;

    // Save these as the new synchronization timestamp & time:
    fSyncTimestamp = rtpTimestamp;
    fSyncTime      = resultPresentationTime;

    fPreviousPacketRTPTimestamp = rtpTimestamp;
}

Two member variables matter here: fSyncTimestamp and fSyncTime:

class RTPReceptionStats {
...

private:
  // Used to convert from RTP timestamp to 'wall clock' time:
  Boolean fHasBeenSynchronized;
  u_int32_t fSyncTimestamp;
  struct timeval fSyncTime;
};
  • fSyncTimestamp
    RTP timestamp; by default, the rtpTimestamp of frame N becomes the fSyncTimestamp for frame N+1
  • fSyncTime
    'wall clock' time; by default, the 'wall clock' time of frame N becomes the fSyncTime for frame N+1

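In essence, RTPReceptionStats::noteIncomingPacket does one thing:
it converts an RTP timestamp into 'wall clock' time.

When the first RTP packet arrives, the system time is taken as the first 'wall clock' time.
After that, whenever the RTP timestamp changes, the change is converted into real time: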

int timestampDiff = rtpTimestamp - fSyncTimestamp;
 // Divide this by the timestamp frequency to get real time: 
double timeDiff = timestampDiff/(double)timestampFrequency;

That change is then applied to the 'wall clock' time, for example:

seconds = fSyncTime.tv_sec + (unsigned)(timeDiff); 
uSeconds = fSyncTime.tv_usec + (unsigned)((timeDiff - (unsigned)timeDiff)*million);

As the logic above shows, the result depends entirely on the accuracy of the system time; there is no correction mechanism at all.
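
The conversion described above can be sketched as a standalone function. This is an illustrative rewrite under stated assumptions, not the live555 implementation: WallClock and rtpToWallClock are hypothetical names, and the arithmetic uses 64-bit integer microseconds instead of the double used in the original. The signed 32-bit difference plays the same role as the "int timestampDiff" in the excerpt, so the result stays correct when the RTP timestamp wraps around.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for struct timeval.
struct WallClock {
    int64_t sec;
    int64_t usec;
};

// Hypothetical helper: map an RTP timestamp to wall-clock time, given a
// reference pair (syncTimestamp, syncTime) and the RTP clock rate in Hz.
WallClock rtpToWallClock(uint32_t rtpTimestamp, uint32_t syncTimestamp,
                         WallClock syncTime, unsigned timestampFrequency) {
    assert(timestampFrequency > 0);
    // Unsigned subtraction followed by a signed cast handles wraparound
    // (and packets that are slightly older than the reference).
    int32_t diff = static_cast<int32_t>(rtpTimestamp - syncTimestamp);
    int64_t total = syncTime.sec * 1000000LL + syncTime.usec
                  + diff * 1000000LL / timestampFrequency;
    WallClock result;
    result.sec = total / 1000000LL;
    result.usec = total % 1000000LL;
    if (result.usec < 0) {  // normalize if the total ended up negative
        result.usec += 1000000LL;
        --result.sec;
    }
    return result;
}
```

With a 90 kHz video clock, a packet whose RTP timestamp is 45000 ticks past the reference lands exactly 0.5 s after syncTime. Note that the reference pair itself is still whatever the receiver guessed on the first packet, which is why a correction step is needed.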

Where does live555 perform the time correction?
The RTSP client (the receiver of the data) takes the Sender Report delivered over RTCP and uses its NTP Timestamp and RTP timestamp to correct fSyncTimestamp and fSyncTime.

[Figure: Part of Sender Report RTCP Packet]

The correction code is as follows:

void RTPReceptionStats::noteIncomingSR(u_int32_t ntpTimestampMSW,
                                       u_int32_t ntpTimestampLSW,
                                       u_int32_t rtpTimestamp) 
{
    fLastReceivedSR_NTPmsw = ntpTimestampMSW;
    fLastReceivedSR_NTPlsw = ntpTimestampLSW;

    gettimeofday(&fLastReceivedSR_time, NULL);

    // Use this SR to update time synchronization information:
    // ntpTimestampMSW : NTP timestamp, most significant word (upper 32 bits of the 64-bit NTP timestamp)
    fSyncTimestamp      = rtpTimestamp;
    fSyncTime.tv_sec    = ntpTimestampMSW - 0x83AA7E80; // 1/1/1900 -> 1/1/1970

    // ntpTimestampLSW  : NTP timestamp, least significant word (lower 32 bits of the 64-bit NTP timestamp)
    double microseconds = (ntpTimestampLSW * 15625.0) / 0x04000000; // 10^6/2^32
    fSyncTime.tv_usec   = (unsigned)(microseconds + 0.5);
}

With Sender Reports, the video and audio times are each corrected promptly, which keeps audio and video in sync.
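
To illustrate why this yields synchronization, the following sketch keeps one reference pair per stream, taken from that stream's latest Sender Report, and maps any RTP timestamp onto the shared NTP-derived timeline. SyncPoint and presentationTimeUsec are hypothetical names used only for this example; the NTP-to-Unix arithmetic follows noteIncomingSR above, and the 90000 Hz and 8000 Hz clock rates are assumed typical values for video and audio.

```cpp
#include <cassert>
#include <cstdint>

// Seconds between Jan 1 1900 and Jan 1 1970, as used in noteIncomingSR.
constexpr uint32_t kNtpUnixOffset = 0x83AA7E80u;

// Hypothetical per-stream state captured from the latest Sender Report.
struct SyncPoint {
    uint32_t rtpTimestamp;  // RTP timestamp field of the SR
    uint32_t ntpMsw;        // NTP timestamp, most significant word
    uint32_t ntpLsw;        // NTP timestamp, least significant word
    unsigned frequency;     // RTP clock rate of this stream, in Hz
};

// Hypothetical helper: presentation time in microseconds since the Unix epoch.
int64_t presentationTimeUsec(const SyncPoint& sp, uint32_t rtpTimestamp) {
    assert(sp.frequency > 0);
    int64_t syncSec = static_cast<int64_t>(sp.ntpMsw) - kNtpUnixOffset;
    // Fraction to microseconds: lsw * 10^6 / 2^32, rounded to nearest.
    int64_t syncUsec =
        static_cast<int64_t>((sp.ntpLsw * 15625.0) / 0x04000000 + 0.5);
    // Signed difference, so RTP timestamp wraparound is tolerated.
    int32_t diff = static_cast<int32_t>(rtpTimestamp - sp.rtpTimestamp);
    return syncSec * 1000000LL + syncUsec + diff * 1000000LL / sp.frequency;
}
```

Suppose the video SR pairs RTP timestamp 123456 with NTP-derived time 1000.0 s, and the audio SR pairs RTP timestamp 40000 with 1000.5 s. A video frame at 123456 + 90000 and an audio frame at 40000 + 4000 then both map to 1001.0 s, so a player that schedules by these presentation times plays them together even though the two RTP timestamp spaces are unrelated.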

References:

https://en.wikipedia.org/wiki/Network_Time_Protocol
RTP: A Transport Protocol for Real-Time Applications (RFC 3550)
