1. Overview
The RTSP/RTP protocol suite consists of three sub-protocols: RTSP, RTP, and RTCP.
2. RTSP
The Real Time Streaming Protocol (RTSP) is a network control protocol designed for use in entertainment and communications systems to control streaming media servers. The protocol is used for establishing and controlling media sessions between end points.
RTSP uses TCP to maintain an end-to-end connection.
The default transport layer port number is 554. The basic RTSP requests are:
OPTIONS, DESCRIBE, SETUP, TEARDOWN, PLAY, PAUSE, GET_PARAMETER, SET_PARAMETER.
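As a rough illustration of what such a request looks like on the wire, the sketch below formats an RTSP/1.0 request: a request line, a CSeq header, optional extra headers, and a terminating empty line. The helper name `build_rtsp_request` and the URL `rtsp://example.com:554/stream` are hypothetical and used only for this example.

```python
def build_rtsp_request(method, url, cseq, headers=None):
    """Format one RTSP/1.0 request as bytes ready to write to a TCP socket."""
    lines = [f"{method} {url} RTSP/1.0", f"CSeq: {cseq}"]
    for name, value in (headers or {}).items():
        lines.append(f"{name}: {value}")
    # Header lines are separated by CRLF; an empty line ends the header block.
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")


# Hypothetical server URL, for illustration only.
request = build_rtsp_request("OPTIONS", "rtsp://example.com:554/stream", 1)
print(request.decode("ascii"))
```

The same helper could format the other methods listed above; only the method name and the extra headers change.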
3. RTP
The Real-time Transport Protocol (RTP) defines a standardized packet format for delivering audio and video over IP networks.
RTP is used in conjunction with the RTP Control Protocol (RTCP). While RTP carries the media streams (e.g., audio and video), RTCP is used to monitor transmission statistics and quality of service (QoS) and aids synchronization of multiple streams. RTP is originated and received on even port numbers and the associated RTCP communication uses the next higher odd port number.
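The even/odd port convention described above can be sketched as a small helper. The function name `rtcp_port_for` is hypothetical, and rejecting odd RTP ports is an assumption of this sketch rather than something every implementation enforces.

```python
def rtcp_port_for(rtp_port: int) -> int:
    """Return the RTCP port paired with an even-numbered RTP port."""
    # Assumption: the RTP port must be even and leave room for port + 1.
    if not 0 < rtp_port < 65535 or rtp_port % 2 != 0:
        raise ValueError("RTP port must be an even number between 2 and 65534")
    return rtp_port + 1


print(rtcp_port_for(5004))
```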
RTP packet header
| bit offset | 0-1     | 2 | 3 | 4-7 | 8 | 9-15 | 16-31           |
| 0          | Version | P | X | CC  | M | PT   | Sequence Number |
| 32         | Timestamp                                            |
| 64         | SSRC identifier                                      |
| 96         | CSRC identifiers                                     |
| 96+32×CC   | Profile-specific extension header ID | Extension header length |
| 128+32×CC  | Extension header                                     |
The RTP header has a minimum size of 12 bytes. After the header, optional header extensions may be present. This is followed by the RTP payload, the format of which is determined by the particular class of application. The fields in the header are as follows:
· Version: (2 bits) Indicates the version of the protocol. Current version is 2.
· P (Padding): (1 bit) Indicates whether there are extra padding bytes at the end of the RTP packet. Padding might be used to fill up a block of a certain size, for example as required by an encryption algorithm. The last byte of the padding contains the number of padding bytes that were added (including itself).
· X (Extension): (1 bit) Indicates presence of an Extension header between standard header and payload data. This is application or profile specific.
· CC (CSRC Count): (4 bits) Contains the number of CSRC identifiers (defined below) that follow the fixed header.
· M (Marker): (1 bit) Used at the application level and defined by a profile. If it is set, it means that the current data has some special relevance for the application.
· PT (Payload Type): (7 bits) Indicates the format of the payload and determines its interpretation by the application. This is specified by an RTP profile. For example, see RTP Profile for audio and video conferences with minimal control (RFC 3551).
· Sequence Number: (16 bits) The sequence number is incremented by one for each RTP data packet sent and is to be used by the receiver to detect packet loss and to restore packet sequence. The RTP does not specify any action on packet loss; it is left to the application to take appropriate action. For example, video applications may play the last known frame in place of the missing frame. According to RFC 3550, the initial value of the sequence number should be random to make known-plaintext attacks on encryption more difficult. RTP provides no guarantee of delivery, but the presence of sequence numbers makes it possible to detect missing packets.
· Timestamp: (32 bits) Used to enable the receiver to play back the received samples at appropriate intervals. When several media streams are present, the timestamps are independent in each stream, and may not be relied upon for media synchronization. The granularity of the timing is application specific. For example, an audio application that samples data once every 125 µs (8 kHz, a common sample rate in digital telephony) could use that value as its clock resolution. The clock granularity is one of the details that is specified in the RTP profile for an application.
· SSRC: (32 bits) Synchronization source identifier uniquely identifies the source of a stream. The synchronization sources within the same RTP session will be unique.
· CSRC: (32 bits each) Contributing source IDs enumerate the contributing sources of a stream that has been generated from multiple sources. The CC field gives the number of entries.
· Extension header: (optional) The first 32-bit word contains a profile-specific identifier (16 bits) and a length specifier (16 bits) that gives the length of the extension (EHL = extension header length) in 32-bit units, not counting that first 32-bit word.
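The field layout above can be sketched as a small parser. This is a minimal illustration under the layout described in this section, not a complete RTP implementation; the names `RtpPacket` and `parse_rtp` are hypothetical.

```python
import struct
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class RtpPacket:
    version: int
    padding: bool
    extension: bool
    marker: bool
    payload_type: int
    sequence_number: int
    timestamp: int
    ssrc: int
    csrc: List[int]
    extension_id: Optional[int]
    extension_data: bytes
    payload: bytes


def parse_rtp(packet: bytes) -> RtpPacket:
    """Split one RTP packet into its header fields and payload."""
    if len(packet) < 12:
        raise ValueError("shorter than the 12-byte fixed header")
    # Two flag bytes, 16-bit sequence number, 32-bit timestamp, 32-bit SSRC,
    # all in network byte order.
    b0, b1, seq, timestamp, ssrc = struct.unpack_from("!BBHII", packet, 0)
    csrc_count = b0 & 0x0F
    offset = 12

    # CC 32-bit CSRC identifiers follow the fixed header.
    if len(packet) < offset + 4 * csrc_count:
        raise ValueError("truncated CSRC list")
    csrc = [int.from_bytes(packet[offset + 4 * i:offset + 4 * i + 4], "big")
            for i in range(csrc_count)]
    offset += 4 * csrc_count

    extension_id, extension_data = None, b""
    if b0 & 0x10:  # X bit: a 16-bit ID and a 16-bit length in 32-bit words
        if len(packet) < offset + 4:
            raise ValueError("truncated extension header")
        extension_id, words = struct.unpack_from("!HH", packet, offset)
        offset += 4
        if len(packet) < offset + 4 * words:
            raise ValueError("truncated extension data")
        extension_data = packet[offset:offset + 4 * words]
        offset += 4 * words

    end = len(packet)
    if b0 & 0x20:  # P bit: the last byte counts the padding bytes, itself included
        pad = packet[-1]
        if pad == 0 or pad > end - offset:
            raise ValueError("invalid padding length")
        end -= pad

    return RtpPacket(
        version=b0 >> 6,
        padding=bool(b0 & 0x20),
        extension=bool(b0 & 0x10),
        marker=bool(b1 & 0x80),
        payload_type=b1 & 0x7F,
        sequence_number=seq,
        timestamp=timestamp,
        ssrc=ssrc,
        csrc=csrc,
        extension_id=extension_id,
        extension_data=extension_data,
        payload=packet[offset:end],
    )


# A 12-byte header (V=2, PT=96) followed by a 3-byte payload.
print(parse_rtp(bytes.fromhex("806000010000000200000003") + b"abc"))
```

The parser only splits the packet; interpreting the payload still depends on the payload type and the RTP profile in use.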
Example of an RTP header:
     0  1  2  3  4  5  6  7  8  9  a  b  c  d  e  f
00h: 80 60 C6 C6 01 49 A5 10 00 00 00 00 00 00 01 BA
10h: 44 14 9C C8 E4 01 00 5F 6B F8 00 00 01 E0 5F 8C
=> 0x80 = 1000 0000b => Version=10b (2), P=0, X=0, CC=0000b (0)
=> 0x60 = 0110 0000b => M=0, PT=1100000b (0x60 = 96)
=> 0xC6C6 => Sequence Number=50886 (decimal)
=> 0x0149A510 => Timestamp=21603600 (decimal)
=> 0x00000000 => SSRC=0x00000000
=> CC=0, so there are no CSRC identifiers and the payload starts at offset 0Ch. The bytes 0x000001BA are the first payload bytes, not a CSRC; they match the MPEG program stream pack start code.
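The arithmetic in this example can be checked with a few lines of bit manipulation. This is only a quick verification sketch over the two dump lines shown above; since CC is 0, it treats everything from byte 12 onward as payload.

```python
# The 32 bytes from the hex dump above.
dump = bytes.fromhex(
    "8060C6C60149A51000000000000001BA"
    "44149CC8E401005F6BF8000001E05F8C"
)

version = dump[0] >> 6               # top 2 bits of byte 0
padding = (dump[0] >> 5) & 1         # P bit
extension = (dump[0] >> 4) & 1       # X bit
csrc_count = dump[0] & 0x0F          # low 4 bits of byte 0
marker = dump[1] >> 7                # top bit of byte 1
payload_type = dump[1] & 0x7F        # low 7 bits of byte 1
sequence_number = int.from_bytes(dump[2:4], "big")
timestamp = int.from_bytes(dump[4:8], "big")
ssrc = int.from_bytes(dump[8:12], "big")
# The payload begins after the fixed header and csrc_count CSRC entries.
payload = dump[12 + 4 * csrc_count:]

print(version, padding, extension, csrc_count, marker, payload_type)
print(sequence_number, timestamp, hex(ssrc), payload[:4].hex())
```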
References:
· RTP Payload Format for H.264 Video
· RTP Payload Format for MPEG-4 Audio/Visual Streams
4. RTCP
The RTP Control Protocol (RTCP) is a sister protocol of the Real-time Transport Protocol (RTP).
RTCP provides out-of-band statistics and control information for an RTP flow. It partners RTP in the delivery and packaging of multimedia data, but does not transport any media streams itself. Typically RTP will be sent on an even-numbered UDP port, with RTCP messages being sent over the next higher odd-numbered port. The primary function of RTCP is to provide feedback on the quality of service (QoS) in media distribution by periodically sending statistics information to participants in a streaming multimedia session.
RTCP gathers statistics for a media connection and information such as transmitted octet and packet counts, lost packet counts, jitter, and round-trip delay time. An application may use this information to control quality of service parameters, perhaps by limiting flow, or using a different codec.
RTCP distinguishes several types of packets: sender report, receiver report, source description, and bye. In addition, the protocol is extensible and allows application-specific RTCP packets. A standards-based extension of RTCP is the Extended Report packet type introduced by RFC 3611.
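To make the packet types concrete, the sketch below splits a compound RTCP datagram into individual packets and names each one. It relies on the RTCP common header from RFC 3550 (2-bit version, padding bit, 5-bit count, 8-bit packet type, 16-bit length in 32-bit words minus one), which this section does not spell out; the function name `split_rtcp_compound` is hypothetical.

```python
import struct

# Packet type numbers from RFC 3550 (200-204) and RFC 3611 (207).
RTCP_TYPES = {200: "SR", 201: "RR", 202: "SDES", 203: "BYE", 204: "APP", 207: "XR"}


def split_rtcp_compound(data: bytes):
    """Return a list of (type name, count field, raw packet) tuples."""
    packets = []
    offset = 0
    while offset < len(data):
        if len(data) - offset < 4:
            raise ValueError("truncated RTCP header")
        b0, packet_type, length_words = struct.unpack_from("!BBH", data, offset)
        # The length field counts 32-bit words minus one, header included.
        size = 4 * (length_words + 1)
        if b0 >> 6 != 2:
            raise ValueError("unexpected RTCP version")
        if offset + size > len(data):
            raise ValueError("truncated RTCP packet")
        name = RTCP_TYPES.get(packet_type, f"unknown({packet_type})")
        packets.append((name, b0 & 0x1F, data[offset:offset + size]))
        offset += size
    return packets


# An empty receiver report followed by a BYE for one source (SSRC 1).
rr = bytes([0x80, 201, 0, 1]) + b"\x00\x00\x00\x01"
bye = bytes([0x81, 203, 0, 1]) + b"\x00\x00\x00\x01"
for name, count, raw in split_rtcp_compound(rr + bye):
    print(name, count, raw.hex())
```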
Appendix 1. RTSP Interleaved Frame
Reference: Real Time Streaming Protocol (RTSP)
RFC 2326, Section 10.12, Embedded (Interleaved) Binary Data
Interleaved binary data SHOULD only be used if RTSP is carried over TCP.
Stream data such as RTP packets is encapsulated by an ASCII dollar sign (24 hexadecimal), followed by a one-byte channel identifier, followed by the length of the encapsulated binary data as a binary, two-byte integer in network byte order. The stream data follows immediately afterwards, without a CRLF, but including the upper-layer protocol headers. Each $ block contains exactly one upper-layer protocol data unit, e.g., one RTP packet.
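The framing just described ('$', one-byte channel, two-byte big-endian length, then the data) can be sketched as a pair of helpers. The names `pack_interleaved` and `unpack_interleaved` are hypothetical, and the sketch assumes the buffer holds only '$' frames; a real client also has to handle RTSP text responses arriving on the same TCP connection.

```python
import struct


def pack_interleaved(channel: int, data: bytes) -> bytes:
    """Wrap one upper-layer packet (e.g. one RTP packet) in a '$' frame."""
    # '$' (0x24), one-byte channel, two-byte length in network byte order.
    return b"$" + struct.pack("!BH", channel, len(data)) + data


def unpack_interleaved(buffer: bytes):
    """Return (frames, remainder); frames is a list of (channel, data)."""
    frames = []
    offset = 0
    while len(buffer) - offset >= 4:
        if buffer[offset] != 0x24:
            raise ValueError("expected '$' at the start of a frame")
        channel, length = struct.unpack_from("!BH", buffer, offset + 1)
        if len(buffer) - offset - 4 < length:
            break  # incomplete frame: wait for more bytes from the socket
        frames.append((channel, buffer[offset + 4:offset + 4 + length]))
        offset += 4 + length
    return frames, buffer[offset:]


stream = pack_interleaved(0, b"rtp") + pack_interleaved(1, b"rtcp!") + b"$\x00\x00"
frames, rest = unpack_interleaved(stream)
print(frames, rest)
```

Returning the unconsumed remainder lets the caller prepend it to the next chunk read from the socket, since TCP does not preserve frame boundaries.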
The channel identifier is defined in the Transport header with the interleaved parameter (Section 12.39).
When the transport choice is RTP, RTCP messages are also interleaved by the server over the TCP connection. As a default, RTCP packets are sent on the first available channel higher than the RTP channel. The client MAY explicitly request RTCP packets on another channel. This is done by specifying two channels in the interleaved parameter of the Transport header (Section 12.39).
Example (Transport header of a SETUP request):
Transport: RTP/AVP/TCP;interleaved=0-1
Here the channel identifier is 0 for RTP and 1 for RTCP.
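A minimal sketch of reading the channel pair out of a Transport header value follows. The function name `interleaved_channels` is hypothetical, and when only one channel is given the sketch assumes that the "first available channel higher than the RTP channel" is simply the next number.

```python
def interleaved_channels(transport: str):
    """Return (rtp_channel, rtcp_channel), or None if not interleaved."""
    for param in transport.split(";"):
        name, _, value = param.strip().partition("=")
        if name.lower() == "interleaved":
            first, _, second = value.partition("-")
            rtp_channel = int(first)
            # Assumption: with a single channel given, RTCP uses the next one.
            rtcp_channel = int(second) if second else rtp_channel + 1
            return rtp_channel, rtcp_channel
    return None


print(interleaved_channels("RTP/AVP/TCP;interleaved=0-1"))
```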