1 Player Overview
The player is built on the DirectShow framework. Internally it consists of a source filter and the corresponding audio and video decoder filters, and it implements online playback of MPEG-4 video and AAC audio streams delivered over RTSP/RTP.
2 Related Technologies
2.1 Introduction to DirectShow
DirectShow is a low-level application programming interface (API) developed by Microsoft for building high-performance multimedia applications, and it is one of the core members of the DirectX family. DirectShow itself is an architecture that controls and processes multimedia data through filters, either supplied with the system or written by the developer. The architecture defines how multimedia data streams are processed and controlled inside each filter and between filters. Every filter has input pins, output pins, or both.
The filter is the basic building block of DirectShow and the smallest functional module in a filter graph. DirectShow splits the processing of multimedia data into separate stages, each handled by a corresponding filter, so different filters can be combined to process the media data as required. By function, filters fall into three broad categories:
(1) Source filters. A source filter obtains the raw media data; possible sources include media files on a local disk or on the network, capture cards, and so on.
(2) Transform filters. A transform filter takes the data it receives from other filters, processes it, and passes the result on to the next filter. Codecs are typical transform filters.
(3) Rendering filters. A rendering filter performs the final processing of the data it receives, such as saving the media data to a file, sending it over the network, displaying video, or playing back audio.
Above the DirectShow system sits the application. The application builds the filter graph required for the functionality it wants and then uses the Filter Graph Manager to control the whole data-processing pipeline. While the filter graph is running, DirectShow can catch various events and forward them to the application as messages, which is how the application and the DirectShow system interact.
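As a generic illustration (not the player's own graph, which is described in section 3), the following is a minimal sketch of how an application drives DirectShow, assuming a local file is rendered with Intelligent Connect; COM error handling is omitted for brevity:
#include <dshow.h>
// Build a filter graph, run it, and have graph events posted to a window.
void PlayFile(HWND hwnd, const wchar_t *path)
{
    CoInitialize(NULL);
    IGraphBuilder *pGraph = NULL;
    CoCreateInstance(CLSID_FilterGraph, NULL, CLSCTX_INPROC_SERVER,
                     IID_IGraphBuilder, (void**)&pGraph);
    IMediaControl *pControl = NULL;
    IMediaEventEx *pEvent = NULL;
    pGraph->QueryInterface(IID_IMediaControl, (void**)&pControl);
    pGraph->QueryInterface(IID_IMediaEventEx, (void**)&pEvent);
    // Let the Filter Graph Manager assemble source/transform/rendering filters.
    pGraph->RenderFile(path, NULL);
    // Graph events (e.g. EC_COMPLETE) are delivered to hwnd as WM_APP + 1 messages.
    pEvent->SetNotifyWindow((OAHWND)hwnd, WM_APP + 1, 0);
    pControl->Run();   // start streaming data through the graph
    // ... the application's message loop handles playback and graph events ...
    pControl->Release();
    pEvent->Release();
    pGraph->Release();
    CoUninitialize();
}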
2.2 Introduction to RTP/RTCP
The Real-time Transport Protocol (RTP) is a transport protocol for multimedia data streams on the Internet. It was published as RFC 1889 in 1996 by the AVT working group of the IETF (Internet Engineering Task Force); the group continued to improve the document and in July 2003 issued RFC 3550, which replaces RFC 1889. RTP embodies application-level framing, a key idea in modern protocol design, allowing its users to understand, adjust, and even define the packetization scheme for continuous media, and it is widely used to carry real-time media such as VoIP and video. The RTP specification comprises two closely related sub-protocols: (1) RTP itself, which carries data with real-time characteristics; and (2) RTCP (the RTP Control Protocol), which monitors QoS and conveys information about the session participants.
RTP normally runs on top of UDP: it takes the multimedia bitstream from the layer above (such as MPEG-4 video), assembles it into RTP packets, and hands them down to UDP. Functionally it corresponds roughly to the OSI session layer, providing synchronization and sequencing services, so RTP is well suited to carrying continuous data such as video and audio and can adapt, to some extent, to delay and errors introduced by the network. RTCP, the real-time control protocol, manages control information such as monitoring network delay and bandwidth; when the bandwidth available to the transmitted media changes, the receiver notifies the sender, and canonical identifiers and coding parameters are distributed, so that transmission quality can be controlled. In addition, if the underlying network supports multicast, RTP can deliver data to multiple destinations via multicast.
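For reference, the following sketch shows the 12-byte RTP fixed header defined in RFC 3550 and how it can be parsed from a received datagram (field layout only; header extensions and CSRC lists are ignored here):
#include <stdint.h>
// Fields of the RTP fixed header (RFC 3550, section 5.1).
struct RtpHeader
{
    uint8_t  version;        // always 2
    bool     padding;        // P bit
    bool     extension;      // X bit
    uint8_t  csrcCount;      // CC: number of CSRC identifiers that follow
    bool     marker;         // M bit, e.g. "last packet of a video frame" for MPEG-4
    uint8_t  payloadType;    // PT: identifies the codec / packetization
    uint16_t sequenceNumber; // increments by one per packet; used for loss/reordering
    uint32_t timestamp;      // sampling instant (90 kHz clock for video)
    uint32_t ssrc;           // synchronization source identifier
};
// Parse the fixed header from a raw UDP payload; returns false if it is too short.
bool ParseRtpHeader(const uint8_t *p, int len, RtpHeader *h)
{
    if (len < 12)
        return false;
    h->version        = p[0] >> 6;
    h->padding        = (p[0] & 0x20) != 0;
    h->extension      = (p[0] & 0x10) != 0;
    h->csrcCount      = p[0] & 0x0F;
    h->marker         = (p[1] & 0x80) != 0;
    h->payloadType    = p[1] & 0x7F;
    h->sequenceNumber = (uint16_t)((p[2] << 8) | p[3]);
    h->timestamp      = ((uint32_t)p[4] << 24) | (p[5] << 16) | (p[6] << 8) | p[7];
    h->ssrc           = ((uint32_t)p[8] << 24) | (p[9] << 16) | (p[10] << 8) | p[11];
    return true;
}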
3 Basic Implementation of the Key Techniques
3.1 Implementation of the Data-Receiving Filter
Receiving the data amounts to writing a source filter. In this player the source filter class is CRtspFilter; it uses the push model and derives from CSource. The filter exposes two output pin instances (one for video, one for audio); the pin class is CFilterOutputPin, which derives from CSourceStream. The main APIs that have to be implemented are the following:
// APIs for negotiating the media type:
HRESULT SetMediaType(const CMediaType *pMediaType);
HRESULT CheckMediaType(const CMediaType *pMediaType);
HRESULT GetMediaType(int iPosition, CMediaType *pmt);
// Receiving and delivering the data:
HRESULT FillBuffer(IMediaSample *pms);
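As a rough sketch (the class layout is assumed; only the members referenced by the code fragments in this article are shown), the filter and pin declarations look roughly like this:
// Sketch of the filter/pin layout; member names follow the fragments below.
class CRtspFilter : public CSource
{
public:
    // Frame access used by the pins in FillBuffer(); these assemble complete
    // frames from the RTP packet queues (see section 3.2).
    long GetVideoFrame(unsigned char *pBuf, long lBufLen, unsigned int *pnTimeStamp);
    long GetAudioFrame(unsigned char *pBuf, long lBufLen, unsigned int *pnTimeStamp);
    // RTP/RTSP receiver object holding the per-stream timestamp deltas
    // (m_recv, m_nVideoTimeDiff and m_nAudioTimeDiff appear in FillBuffer below);
    // its exact type is not shown in the article.
};
class CFilterOutputPin : public CSourceStream
{
public:
    HRESULT SetMediaType(const CMediaType *pMediaType);
    HRESULT CheckMediaType(const CMediaType *pMediaType);
    HRESULT GetMediaType(int iPosition, CMediaType *pmt);
    HRESULT FillBuffer(IMediaSample *pms);
private:
    CRtspFilter   *m_pParentFilter;  // owning filter
    ElementaryType m_pType;          // media-type helper (see the listings in section 3.3)
    REFERENCE_TIME m_rtSampleTime;   // running sample time used for SetTime()
};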
The FillBuffer process is roughly as follows:
// Inside CFilterOutputPin::FillBuffer(IMediaSample *pms):
{
    // Buffer supplied by the allocator for this sample.
    PBYTE pData = NULL;
    pms->GetPointer(&pData);
    long lDataLen = pms->GetSize();
    long lActualDataSize = 0;
    unsigned int nTimeStamp = 0;
    // Pull one complete frame (video VOP or audio frame) from the receiver.
    if (m_pType.IsVideo())
        lActualDataSize = m_pParentFilter->GetVideoFrame(pData, lDataLen, &nTimeStamp);
    else
        lActualDataSize = m_pParentFilter->GetAudioFrame(pData, lDataLen, &nTimeStamp);
    if (lActualDataSize > 0)
    {
        pms->SetActualDataLength(lActualDataSize);
        REFERENCE_TIME tSampleStart = m_rtSampleTime;
        //if (tSampleStart < 0)
        //    pms->SetPreroll(true);
        REFERENCE_TIME tSampleEnd = 0;
        // Advance the sample time by the RTP timestamp delta of this stream.
        if (m_pType.IsVideo())
            tSampleEnd = tSampleStart + m_pParentFilter->m_recv.m_nVideoTimeDiff; //(m_pParentFilter->m_recv.m_nVideoTimeDiff * UNITS)/90000;
        else
            tSampleEnd = tSampleStart + m_pParentFilter->m_recv.m_nAudioTimeDiff;
        m_rtSampleTime = tSampleEnd;
        pms->SetTime(&tSampleStart, &tSampleEnd);
        pms->SetSyncPoint(TRUE);
    }
    else
        Sleep(200);   // no complete frame available yet; back off briefly
}
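The media-type callbacks listed above can simply delegate to the ElementaryType helper (the m_pType member used in FillBuffer; its listings appear in section 3.3). The following is a minimal sketch of that delegation, assuming this is indeed how the pin is wired up:
// Hypothetical glue code: the pin forwards media-type queries to ElementaryType.
HRESULT CFilterOutputPin::GetMediaType(int iPosition, CMediaType *pmt)
{
    if (iPosition < 0)
        return E_INVALIDARG;
    // ElementaryType::GetType() enumerates the types it can expose
    // (MPEG-4 video as MP4V/XVID variants, AAC audio, ...).
    return m_pType.GetType(pmt, iPosition) ? S_OK : VFW_S_NO_MORE_ITEMS;
}
HRESULT CFilterOutputPin::CheckMediaType(const CMediaType *pMediaType)
{
    CMediaType mt;
    // Accept any type that ElementaryType can also produce.
    for (int i = 0; m_pType.GetType(&mt, i); i++)
    {
        if (mt == *pMediaType)
            return S_OK;
    }
    return VFW_E_TYPE_NOT_ACCEPTED;
}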
3.2 Implementation of Network Data Transmission
(1) Playback control is implemented with the RTSP protocol: DESCRIBE obtains the SDP description of the stream, SETUP negotiates the RTP/RTCP transport, PLAY starts streaming, and TEARDOWN ends the session (a sketch of this exchange is given at the end of this subsection).
(2) The media data is carried over RTP. Because the streaming server fragments each kind of encoded data into packets according to codec-specific rules, the client has to reassemble the received packets accordingly. The key part of the reassembly logic for MPEG-4 video is as follows:
// Build a VOP (video object plane) from consecutive RTP packets.
// Sketch of the core loop, e.g. inside CRtspFilter::GetVideoFrame():
// pFrame is the output frame buffer, nOffSet counts the bytes written so far,
// and nTimeStamp is an output parameter.
CRtpPacket *packet = NULL;
*nTimeStamp = 0;
while (1)
{
    packet = m_VideoQueue->GetReadPacket();
    if (packet == NULL)
    {
        Sleep(50);
        packet = m_VideoQueue->GetNextPacket();
        if (packet == NULL)
            continue;
    }
    // parse the raw RTP packet (header fields, payload pointer)
    packet->ParseRawData();
    // a different timestamp means the packet belongs to the next VOP
    if (*nTimeStamp != 0 && *nTimeStamp != packet->m_TimeStamp)
    {
        // put the packet back so it is read again for the next frame
        m_VideoQueue->ReSetReadPacket();
        break;
    }
    *nTimeStamp = packet->m_TimeStamp;
    memcpy(pFrame + nOffSet, packet->m_payload, packet->m_PayloadLength);
    nOffSet += packet->m_PayloadLength;
    // marker bit set: this is the last (or only) RTP packet of the VOP
    if (packet->m_HasMarker)
        break;
}
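As noted in (1) above, playback control uses RTSP (RFC 2326). A minimal sketch of the request sequence the client issues over its TCP control connection follows; the URL, ports and session id are illustrative placeholders, not the player's actual values:
// Illustrative RTSP exchange; rtsp://server/stream and "12345678" are placeholders.
const char *describe =
    "DESCRIBE rtsp://server/stream RTSP/1.0\r\n"
    "CSeq: 1\r\n"
    "Accept: application/sdp\r\n\r\n";                          // reply carries the SDP (codecs, payload formats)
const char *setupVideo =
    "SETUP rtsp://server/stream/trackID=0 RTSP/1.0\r\n"
    "CSeq: 2\r\n"
    "Transport: RTP/AVP;unicast;client_port=5000-5001\r\n\r\n"; // RTP on 5000, RTCP on 5001
const char *play =
    "PLAY rtsp://server/stream RTSP/1.0\r\n"
    "CSeq: 3\r\n"
    "Session: 12345678\r\n"
    "Range: npt=0.000-\r\n\r\n";                                 // server starts pushing RTP packets
const char *teardown =
    "TEARDOWN rtsp://server/stream RTSP/1.0\r\n"
    "CSeq: 4\r\n"
    "Session: 12345678\r\n\r\n";                                 // stop streaming and release the session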
3.3 Implementation of Data Decoding and Playback
For decoding, the player uses the decoders provided by ffdshow: the decoder filter is connected downstream of the receiving filter, and a renderer filter is connected after that, so the received data can be played back.
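A minimal sketch of how the graph can be assembled, assuming the source filter object has already been created; the ffdshow decoders and the renderers are inserted automatically by Intelligent Connect when Render() is called on each output pin:
// Add the RTSP source filter and let the graph builder complete the chain
// (decoder + renderer) behind every output pin.
HRESULT BuildPlaybackGraph(IGraphBuilder *pGraph, IBaseFilter *pRtspSource)
{
    HRESULT hr = pGraph->AddFilter(pRtspSource, L"RTSP Source");
    if (FAILED(hr))
        return hr;
    IEnumPins *pEnum = NULL;
    pRtspSource->EnumPins(&pEnum);
    IPin *pPin = NULL;
    while (pEnum->Next(1, &pPin, NULL) == S_OK)
    {
        PIN_DIRECTION dir;
        pPin->QueryDirection(&dir);
        if (dir == PINDIR_OUTPUT)
            pGraph->Render(pPin);   // Intelligent Connect inserts decoder + renderer
        pPin->Release();
    }
    pEnum->Release();
    return S_OK;
}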
#include "ElemType.h" ElementaryType::ElementaryType() : m_cDecoderSpecific(0), m_tFrame(0), m_depth(0) { } ElementaryType::~ElementaryType() { } // void ElementaryType::SetCodeType(const char *psType) { if(psType == NULL) return; if(strncmp(psType,"MP4V-ES",sizeof("MP4V-ES")) == 0) m_type = Video_Mpeg4; else if(strncmp(psType,"MP4A-LATM",sizeof("MP4A-LATM")) == 0) m_type = Audio_AAC; else m_type = First_Video; } void ElementaryType::SetFrameSize(unsigned short nWidth,unsigned short nHeight) { m_cx = nWidth; m_cy = nHeight; } bool ElementaryType::IsVideo() { return (m_type > First_Video) && (m_type < First_Audio); } bool ElementaryType::SetType(const CMediaType* pmt) { if (m_mtChosen != *pmt) { m_mtChosen = *pmt; int idx = 0; CMediaType mtCompare; while (GetType(&mtCompare, idx)) { if (mtCompare == *pmt) { // if (m_type == Audio_AAC) // m_pHandler = new CoreAACHandler(); // else if ((m_type == Video_Mpeg4) && (idx == 0)) // m_pHandler = new DivxHandler(m_pDecoderSpecific, m_cDecoderSpecific); //// else if ((m_type == Video_H264) && (*pmt->FormatType() != FORMAT_MPEG2Video)) //// m_pHandler = new H264ByteStreamHandler(m_pDecoderSpecific, m_cDecoderSpecific); // else // m_pHandler = new NoChangeHandler(); return true; } idx++; } return false; } return true; } bool ElementaryType::GetType(CMediaType* pmt, int nType) { switch (m_type) { case Video_H264: break; case Video_Mpeg4: //case Video_H263: return GetType_Mpeg4V(pmt, nType); break; case Video_FOURCC: if (nType == 0) return GetType_FOURCC(pmt); break; case Audio_AAC: if (nType == 0) return GetType_AAC(pmt); break; case Audio_SAMR: case Audio_SAMW: if (nType == 0) return true;//GetType_SAMR(pmt); break; case Audio_WAVEFORMATEX: if (nType == 0) return GetType_WAVEFORMATEX(pmt); break; } return false; } bool ElementaryType::GetType_Mpeg4V(CMediaType* pmt, int n) { DWORD fourcc; if (n == 0) fourcc = DWORD('V4PM'); else if (n == 1) fourcc = DWORD('XVID'); else if (n == 2) fourcc = DWORD('DVIX'); else return false; pmt->InitMediaType(); pmt->SetType(&MEDIATYPE_Video); FOURCCMap divx(fourcc); pmt->SetSubtype(&divx); pmt->SetFormatType(&FORMAT_VideoInfo); m_cDecoderSpecific = 0; VIDEOINFOHEADER* pVI = (VIDEOINFOHEADER*)pmt->AllocFormatBuffer(sizeof(VIDEOINFOHEADER) + m_cDecoderSpecific); ZeroMemory(pVI, sizeof(VIDEOINFOHEADER)); pVI->bmiHeader.biSize = sizeof(BITMAPINFOHEADER); pVI->bmiHeader.biPlanes = 1; pVI->bmiHeader.biBitCount = 24; pVI->bmiHeader.biWidth = m_cx; pVI->bmiHeader.biHeight = m_cy; pVI->bmiHeader.biSizeImage = DIBSIZE(pVI->bmiHeader); pVI->bmiHeader.biCompression = fourcc; //pVI->AvgTimePerFrame = 1251079;//m_tFrame; //? 
// BYTE* pDecSpecific = (BYTE*)(pVI+1); // //m_pDecoderSpecific = new BYTE[33]; // BYTE temp[33] = {0x00,0x00,0x01,0xB0,0x08,0x00,0x00,0x01,0xB5,0x0E,0xA0,0x20,0x20,0x2F,0x00,0x00,0x01,0x00,0x00,0x00,0x01,0x20,0x00,0xC7,0x88,0xBA,0x98,0x50,0x58,0x41,0x21,0x46,0x3F}; // CopyMemory(pDecSpecific, temp, m_cDecoderSpecific); return true; } bool ElementaryType::GetType_FOURCC(CMediaType* pmt) { pmt->InitMediaType(); pmt->SetType(&MEDIATYPE_Video); pmt->SetFormatType(&FORMAT_VideoInfo); VIDEOINFOHEADER* pVI = (VIDEOINFOHEADER*)pmt->AllocFormatBuffer(sizeof(VIDEOINFOHEADER)); ZeroMemory(pVI, sizeof(VIDEOINFOHEADER)); pVI->bmiHeader.biSize = sizeof(BITMAPINFOHEADER); pVI->bmiHeader.biPlanes = 1; pVI->bmiHeader.biBitCount = (WORD)m_depth; pVI->bmiHeader.biWidth = m_cx; pVI->bmiHeader.biHeight = m_cy; pVI->bmiHeader.biSizeImage = DIBSIZE(pVI->bmiHeader); pVI->bmiHeader.biCompression = BI_RGB; pVI->AvgTimePerFrame = m_tFrame; pmt->SetSampleSize(pVI->bmiHeader.biSizeImage); if (m_fourcc == DWORD('rle ')) { pmt->SetSubtype(&MEDIASUBTYPE_QTRle); } else { FOURCCMap fcc(m_fourcc); pmt->SetSubtype(&fcc); pVI->bmiHeader.biCompression = m_fourcc; } return true; } // static const int ElementaryType::SamplingFrequencies[] = { 96000, 88200, 64000, 48000, 44100, 32000, 24000, 22050, 16000, 12000, 11025, 8000, 7350, 0, 0, 0, }; bool ElementaryType::GetType_AAC(CMediaType* pmt) { const int WAVE_FORMAT_AAC = 0x00ff; // set for Free AAC Decoder faad pmt->InitMediaType(); pmt->SetType(&MEDIATYPE_Audio); FOURCCMap faad(WAVE_FORMAT_AAC); pmt->SetSubtype(&faad); pmt->SetFormatType(&FORMAT_WaveFormatEx); m_cDecoderSpecific = 2; WAVEFORMATEX* pwfx = (WAVEFORMATEX*)pmt->AllocFormatBuffer(sizeof(WAVEFORMATEX) + m_cDecoderSpecific); ZeroMemory(pwfx, sizeof(WAVEFORMATEX)); pwfx->cbSize = WORD(m_cDecoderSpecific); m_pDecoderSpecific = new BYTE[m_cDecoderSpecific]; m_pDecoderSpecific[0] = 0x14; m_pDecoderSpecific[1] = 0x88; CopyMemory((pwfx+1), m_pDecoderSpecific, m_cDecoderSpecific); // parse decoder-specific info to get rate/channels //long samplerate = ((m_pDecoderSpecific[0] & 0x7) << 1) + ((m_pDecoderSpecific[1] & 0x80) >> 7); pwfx->nSamplesPerSec = 12000;//SamplingFrequencies[samplerate]; pwfx->nBlockAlign = 1; pwfx->wBitsPerSample = 16; pwfx->wFormatTag = WAVE_FORMAT_AAC; pwfx->nChannels = 1;//(m_pDecoderSpecific[1] & 0x78) >> 3; return true; } bool ElementaryType::GetType_WAVEFORMATEX(CMediaType* pmt) { // the dec-specific info is a waveformatex WAVEFORMATEX* pwfx = (WAVEFORMATEX*)(BYTE*)m_pDecoderSpecific; if ((m_cDecoderSpecific < sizeof(WAVEFORMATEX)) || (m_cDecoderSpecific < int(sizeof(WAVEFORMATEX) + pwfx->cbSize))) { return false; } pmt->InitMediaType(); pmt->SetType(&MEDIATYPE_Audio); FOURCCMap subtype(pwfx->wFormatTag); pmt->SetSubtype(&subtype); pmt->SetFormatType(&FORMAT_WaveFormatEx); int cLen = pwfx->cbSize + sizeof(WAVEFORMATEX); WAVEFORMATEX* pwfxMT = (WAVEFORMATEX*)pmt->AllocFormatBuffer(cLen); CopyMemory(pwfxMT, pwfx, cLen); return true; }
The corresponding class declaration, ElemType.h:
#ifndef __H_ElementaryType__
#define __H_ElementaryType__

#include <streams.h>

class ElementaryType
{
public:
    ElementaryType();
    ~ElementaryType();

    void SetCodeType(const char *psType);
    bool IsVideo();
    bool GetType(CMediaType* pmt, int nType);
    bool SetType(const CMediaType* pmt);
    void SetFrameSize(unsigned short nWidth, unsigned short nHeight);

    // these are the types we currently understand
    enum eESType
    {
        First_Video = 0,
        Video_Mpeg4,
        Video_H264,
        Video_H263,
        Video_FOURCC,
        First_Audio,
        Audio_AAC = First_Audio,
        Audio_SAMR,
        Audio_SAMW,
        Audio_WAVEFORMATEX,
    };

private:
    bool GetType_H264(CMediaType* pmt);
    bool GetType_H264ByteStream(CMediaType* pmt);
    bool GetType_Mpeg4V(CMediaType* pmt, int n);
    bool GetType_AAC(CMediaType* pmt);
    bool GetType_SAMR(CMediaType* pmt);
    bool GetType_SAMW(CMediaType* pmt);
    bool GetType_WAVEFORMATEX(CMediaType* pmt);
    //bool ParseDescriptor(Atom* patmESD);
    bool GetType_FOURCC(CMediaType* pmt);

private:
    eESType m_type;
    BYTE* m_pDecoderSpecific;
    long m_cDecoderSpecific;
    unsigned short m_cx;
    unsigned short m_cy;
    static const int SamplingFrequencies[];
    REFERENCE_TIME m_tFrame;
    // fourcc and bit depth -- for uncompressed or RLE formats
    DWORD m_fourcc;
    int m_depth;
    CMediaType m_mtChosen;
    //FormatHandler* m_pHandler;
};

#endif
#include "Queue.h" CRtpQueue::CRtpQueue(unsigned int MaxSize) { m_head = m_tail = 0; m_length = 0; m_max = MaxSize; m_Packets = new CRtpPacket[MaxSize]; } //Delete a queue CRtpQueue::~CRtpQueue() { if(m_Packets) delete []m_Packets; } /** * @brief delete a queue */ void CRtpQueue::QueueFlush() { m_head = m_tail = 0; m_length = 0; } //Get length of queue bool CRtpQueue::IsFull(unsigned int tail) { if(((m_tail + 1) % m_max == m_head)) return true; else if((m_tail + 1) % m_max < m_head) { if((tail + 1) % m_max > m_head) return true; } return false; } /** * @brief push tail of queue * @param m_queue queue * @param data node data */ bool CRtpQueue::PushPacket(unsigned char *pBuff,unsigned int nDataLen,unsigned short nNum) { unsigned int tail = nNum % m_max; if(IsFull(tail)) return false; if((m_head <= m_tail && m_head <= tail) || (m_tail <= m_head && tail <= m_head)) { CRtpPacket *packet = &m_Packets[tail]; //LockQueue(); memset(packet->m_packet,0,sizeof(packet->m_packet)); memcpy(packet->m_packet,pBuff,nDataLen); packet->m_PacketLength = nDataLen; packet->m_HasData = 1; //m_tail是表示最大值 if((m_head <= m_tail && m_tail <= tail) || (m_tail <= tail && tail <= m_head)) m_tail = (tail + 1) % m_max; else { return true; } } else //滞后的包 return true; //UnlockQueue(); //Wakeup(); return true; } /** * @brief pop data at the head of queue * @param m_queue queue * @return queue node data */ CRtpPacket* CRtpQueue::GetReadPacket() { CRtpPacket *packet = NULL; if(m_head != m_tail && (m_max + m_tail - m_head) % m_max > 30) //not empty { //LockQueue(); packet = &m_Packets[m_head]; if(!packet->m_HasData) return NULL; m_head = (m_head + 1) % m_max; if(packet) packet->m_HasData = 0; //m_length--; //UnlockQueue(m_queue); } else { return packet; } return packet; } CRtpPacket* CRtpQueue::GetNextPacket() { CRtpPacket *packet = NULL; if(m_head != m_tail && (m_max + m_tail - m_head) % m_max > 30) //not empty { //LockQueue(); packet = &m_Packets[m_head]; m_head = (m_head + 1) % m_max; if(packet) { if(!packet->m_HasData) return NULL; else packet->m_HasData = 0; } //m_length--; //UnlockQueue(m_queue); } return packet; } // void CRtpQueue::ReSetReadPacket() { m_head = (m_head - 1) % m_max; CRtpPacket *packet = &m_Packets[m_head]; if(packet) packet->m_HasData = 1; } void CRtpQueue::LockQueue() { //pthread_mutex_lock(&m_mutex); } void CRtpQueue::UnlockQueue() { //pthread_mutex_unlock(&m_mutex); } int CRtpQueue::Wakeup() { return 0;//pthread_cond_broadcast(&m_cond); } int CRtpQueue::Wait() { //pthread_cond_wait(&m_cond,&m_mutex); //pthread_mutex_unlock(&m_mutex); return 0; } /* timeout unit : millisecond */ int CRtpQueue::WaitTimeout(int timeout) { if(timeout < 0) timeout = 0; if(0 == timeout) { //pthread_cond_wait(&m_cond, &m_mutex); //pthread_mutex_unlock(&m_mutex); } else { /*struct timespec abstime; struct timeval tv; long s; gettimeofday(&tv, NULL); s = timeout * 1000 + tv.tv_usec; tv.tv_sec += s / 1000000; tv.tv_usec = s % 1000000; abstime.tv_sec = tv.tv_sec; abstime.tv_nsec = tv.tv_usec * 1000; //pthread_cond_timedwait(&m_cond, &m_mutex, &abstime); //pthread_mutex_unlock(&m_mutex);*/ } return 0; }