1. Introduction
The Real Time Streaming Protocol (RTSP) is a network control protocol designed for use in entertainment and communications systems to control streaming media servers. The protocol is used for establishing and controlling media sessions between end points. Clients of media servers issue VCR-style commands, such as play, record and pause, to facilitate real-time control of the media streaming from the server to a client (Video On Demand) or from a client to the server (Voice Recording).
The transmission of streaming data itself is not a task of RTSP. Most RTSP servers use the Real-time Transport Protocol (RTP) in conjunction with Real-time Control Protocol (RTCP) for media stream delivery. However, some vendors implement proprietary transport protocols. The RTSP server software from RealNetworks, for example, also used RealNetworks' proprietary Real Data Transport (RDT).
RTSP was developed by RealNetworks, Netscape[1] and Columbia University, with the first draft submitted to IETF in 1996.[2] It was standardized by the Multiparty Multimedia Session Control Working Group (MMUSIC WG) of the Internet Engineering Task Force (IETF) and published as RFC 2326 in 1998.[3] RTSP 2.0 was published as RFC 7826 in 2016 as a replacement for RTSP 1.0. RTSP 2.0 is based on RTSP 1.0 but is not backwards compatible other than in the basic version negotiation mechanism.
RTSP is similar in syntax and operation to HTTP/1.1. Unlike SIP and H.323, the purpose of RTSP is to access existing media files over the network and to control the playback of the media. The typical communication is between a client (running RealPlayer, for example) and a streaming media server. Commands include the ability to pause and play media files from the remote server.
RTSP is a control channel protocol between media client and media server; the data channel uses a different protocol, normally RTP/RTCP.
RTSP typically runs over TCP, while RTP/RTCP usually runs over UDP. The ports for RTP/RTCP packets are negotiated dynamically by the client and server using RTSP.
2. RTSP messages
2.1 RTSP message format
RTSP is text based and uses the ISO 10646 character set in UTF-8 encoding. Lines are terminated by CRLF, and an empty line separates the message headers from the body.
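The framing rule above can be sketched in a few lines of Python. The function name split_rtsp_message is hypothetical, and the sketch assumes the whole message is already in memory and that header names can be treated case-insensitively, as in HTTP.

```python
def split_rtsp_message(raw: bytes):
    """Split a raw RTSP message into (start_line, headers, body)."""
    # The first empty line (CRLF CRLF) ends the header section.
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("utf-8").split("\r\n")
    start_line = lines[0]
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        # Normalise header names to lower case for easy lookup.
        headers[name.strip().lower()] = value.strip()
    # Content-Length, when present, gives the body size in bytes.
    length = int(headers.get("content-length", 0))
    return start_line, headers, body[:length]
```

A real client would read from a socket and wait until Content-Length bytes of body have arrived; that buffering is left out here.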
The following chart shows the format of an RTSP message.
The first line is called the start-line. In a request message from client to server, the start-line carries the RTSP method; in a response message from server to client, it carries the RTSP status code returned in reply to the method.
The following chart shows the format of the start-line in a request message:
The following chart shows the format of the start-line in a response message:
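In RFC 2326 a request start-line has the form "Method SP Request-URI SP RTSP-Version" and a response start-line has the form "RTSP-Version SP Status-Code SP Reason-Phrase". A minimal sketch of telling the two apart, assuming a well-formed line (parse_start_line is a hypothetical helper name):

```python
def parse_start_line(line: str) -> dict:
    """Classify an RTSP start-line as a request or a response."""
    first, second, third = line.split(" ", 2)
    if first.startswith("RTSP/"):
        # Response: RTSP-Version SP Status-Code SP Reason-Phrase
        return {"kind": "response", "version": first,
                "status": int(second), "reason": third}
    # Request: Method SP Request-URI SP RTSP-Version
    return {"kind": "request", "method": first,
            "uri": second, "version": third}
```

Splitting at most twice keeps a multi-word reason phrase such as "Invalid Parameter" in one piece.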
2.1.1 RTSP method
Nine methods are commonly used during a transaction:
OPTIONS: it represents a request for information about the communication options available on the request/response chain identified by the Request-URI. This method allows the client to determine the options and/or requirements associated with a resource, or the capabilities of a server, without implying a resource action or initiating a resource retrieval.
DESCRIBE: it retrieves the description of a presentation or media object identified by the request URL from a server. It may use the Accept header to specify the description formats that the client understands.
ANNOUNCE: when sent from client to server, it posts the description of a presentation or media object identified by the request URL to the server. When sent from server to client, it updates the session description in real time.
SETUP: it specifies the transport mechanism to be used for the streamed media identified by the URI.
PLAY: it tells the server to start sending data via the mechanism specified in SETUP.
PAUSE: it requests that stream delivery be interrupted temporarily.
TEARDOWN: it stops the stream delivery for the given URI, freeing the resources associated with it.
GET_PARAMETER: it retrieves the value of a parameter of a presentation or stream specified in the URI.
SET_PARAMETER: it sets the value of a parameter for a presentation or stream specified by the URI.
2.1.2 RTSP method examples:
OPTIONS
An OPTIONS request asks the server which request types it will accept.
C->S: OPTIONS rtsp://example.com/media.mp4 RTSP/1.0
CSeq: 1
Require: implicit-play
Proxy-Require: gzipped-messages
S->C: RTSP/1.0 200 OK
CSeq: 1
Public: DESCRIBE, SETUP, TEARDOWN, PLAY, PAUSE
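A client can turn the Public header of the reply into a set and check it before issuing a method. The sketch below assumes the header value has already been extracted as a string; supported_methods is a hypothetical helper name.

```python
def supported_methods(public_header: str) -> set:
    """Turn a Public header value into a set of method names."""
    return {name.strip() for name in public_header.split(",") if name.strip()}


# Value taken from the OPTIONS reply shown above.
methods = supported_methods("DESCRIBE, SETUP, TEARDOWN, PLAY, PAUSE")
can_pause = "PAUSE" in methods      # the server lists PAUSE
can_record = "RECORD" in methods    # RECORD is not listed in this reply
```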
DESCRIBE
A DESCRIBE request includes an RTSP URL (rtsp://..) and the types of reply data that the client can handle. The reply includes the presentation description, typically in Session Description Protocol (SDP) format. Among other things, the presentation description lists the media streams controlled with the aggregate URL. In the typical case, there is one media stream each for audio and video.
C->S: DESCRIBE rtsp://example.com/media.mp4 RTSP/1.0
CSeq: 2
S->C: RTSP/1.0 200 OK
CSeq: 2
Content-Base: rtsp://example.com/media.mp4
Content-Type: application/sdp
Content-Length: 460
m=video 0 RTP/AVP 96
a=control:streamid=0
a=range:npt=0-7.741000
a=length:npt=7.741000
a=rtpmap:96 MP4V-ES/5544
a=mimetype:string;"video/MP4V-ES"
a=AvgBitRate:integer;304018
a=StreamName:string;"hinted video track"
m=audio 0 RTP/AVP 97
a=control:streamid=1
a=range:npt=0-7.712000
a=length:npt=7.712000
a=rtpmap:97 mpeg4-generic/32000/2
a=mimetype:string;"audio/mpeg4-generic"
a=AvgBitRate:integer;65790
a=StreamName:string;"hinted audio track"
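To find the per-stream URLs needed for SETUP, a client reads the m= sections of the description and their a=control attributes. The following is a rough sketch, not a full SDP parser: parse_sdp_media and stream_url are hypothetical names, and joining Content-Base and the control value with a slash is a simplification of proper relative-URL resolution.

```python
def parse_sdp_media(sdp: str) -> list:
    """Return one dict per m= section with its a= attributes."""
    streams = []
    for line in sdp.splitlines():
        key, _, value = line.strip().partition("=")
        if key == "m":
            # m=<media> <port> <proto> <format list>
            media, port, proto, fmt = value.split(" ", 3)
            streams.append({"media": media, "port": port,
                            "proto": proto, "format": fmt, "attrs": {}})
        elif key == "a" and streams:
            # Attributes after an m= line belong to that media section.
            name, _, attr_value = value.partition(":")
            streams[-1]["attrs"][name] = attr_value
    return streams


def stream_url(content_base: str, control: str) -> str:
    """Simplified join of the Content-Base URL and a control value."""
    return content_base.rstrip("/") + "/" + control
```

With the description above, the video stream's control value "streamid=0" and the Content-Base give rtsp://example.com/media.mp4/streamid=0, which matches the URL used in the SETUP example.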
SETUP
A SETUP request specifies how a single media stream must be transported. This must be done before a PLAY request is sent. The request contains the media stream URL and a transport specifier. This specifier typically includes a local port for receiving RTP data (audio or video), and another for RTCP data (meta information). The server reply usually confirms the chosen parameters and fills in the missing parts, such as the server's chosen ports. Each media stream must be configured using SETUP before an aggregate play request may be sent.
C->S: SETUP rtsp://example.com/media.mp4/streamid=0 RTSP/1.0
CSeq: 3
Transport: RTP/AVP;unicast;client_port=8000-8001
S->C: RTSP/1.0 200 OK
CSeq: 3
Transport: RTP/AVP;unicast;client_port=8000-8001;server_port=9000-9001;ssrc=1234ABCD
Session: 12345678
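The server's Transport reply is what tells the client which ports to use, so it is worth parsing. A minimal sketch, assuming a single transport specification in the header (parse_transport is a hypothetical name):

```python
def parse_transport(value: str) -> dict:
    """Split a Transport header into spec, flags and parameters."""
    parts = value.split(";")
    result = {"spec": parts[0], "flags": [], "params": {}}
    for part in parts[1:]:
        name, sep, val = part.partition("=")
        if not sep:
            # Bare tokens such as "unicast" carry no value.
            result["flags"].append(name)
        elif name.endswith("_port"):
            # Port pairs look like "8000-8001" (RTP port, RTCP port).
            low, _, high = val.partition("-")
            result["params"][name] = (int(low), int(high or low))
        else:
            result["params"][name] = val
    return result
```

For the reply above this yields client ports (8000, 8001) and server ports (9000, 9001), the first of each pair for RTP and the second for RTCP.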
PLAY
A PLAY request will cause one or all media streams to be played. Play requests can be stacked by sending multiple PLAY requests. The URL may be the aggregate URL (to play all media streams), or a single media stream URL (to play only that stream). A range can be specified. If no range is specified, the stream is played from the beginning to the end, or, if the stream is paused, it is resumed at the point where it was paused.
C->S: PLAY rtsp://example.com/media.mp4 RTSP/1.0
CSeq: 4
Range: npt=5-20
Session: 12345678
S->C: RTSP/1.0 200 OK
CSeq: 4
Session: 12345678
RTP-Info: url=rtsp://example.com/media.mp4/streamid=0;seq=9810092;rtptime=3450012
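The Range header in this example uses normal play time (npt) in seconds. A small sketch of reading such a value follows; it only handles the plain seconds form, not the hh:mm:ss or "now" forms, and parse_npt_range is a hypothetical name.

```python
def parse_npt_range(value: str) -> tuple:
    """Parse 'npt=<start>-<end>' into (start, end) in seconds."""
    unit, _, span = value.partition("=")
    if unit.strip() != "npt":
        raise ValueError("only npt ranges are handled in this sketch")
    start, _, end = span.partition("-")
    # A missing side means the range is open-ended there.
    return (float(start) if start else None,
            float(end) if end else None)
```

For "npt=5-20" the result is (5.0, 20.0), i.e. play from second 5 to second 20.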
PAUSE
A PAUSE request temporarily halts one or all media streams, so that they can later be resumed with a PLAY request. The request contains an aggregate or media stream URL. A range parameter on a PAUSE request specifies when to pause. When the range parameter is omitted, the pause occurs immediately and indefinitely.
C->S: PAUSE rtsp://example.com/media.mp4 RTSP/1.0
CSeq: 5
Session: 12345678
S->C: RTSP/1.0 200 OK
CSeq: 5
Session: 12345678
ANNOUNCE
The ANNOUNCE method serves two purposes:
When sent from client to server, ANNOUNCE posts the description of a presentation or media object identified by the request URL to a server. When sent from server to client, ANNOUNCE updates the session description in real time. If a new media stream is added to a presentation (e.g., during a live presentation), the whole presentation description should be sent again, rather than just the additional components, so that components can be deleted.
C->S: ANNOUNCE rtsp://example.com/media.mp4 RTSP/1.0
CSeq: 7
Date: 23 Jan 1997 15:35:06 GMT
Session: 12345678
Content-Type: application/sdp
Content-Length: 332
v=0
o=mhandley 2890844526 2890845468 IN IP4 126.16.64.4
s=SDP Seminar
i=A Seminar on the session description protocol
u=http://www.cs.ucl.ac.uk/staff/M.Handley/sdp.03.ps
e=mjh@isi.edu (Mark Handley)
c=IN IP4 224.2.17.12/127
t=2873397496 2873404696
a=recvonly
m=audio 3456 RTP/AVP 0
m=video 2232 RTP/AVP 31
S->C: RTSP/1.0 200 OK
CSeq: 7
TEARDOWN
A TEARDOWN request is used to terminate the session. It stops all media streams and frees all session-related data on the server.
C->S: TEARDOWN rtsp://example.com/media.mp4 RTSP/1.0
CSeq: 8
Session: 12345678
S->C: RTSP/1.0 200 OK
CSeq: 8
GET_PARAMETER
The GET_PARAMETER request retrieves the value of a parameter of a presentation or stream specified in the URI. The content of the reply and response is left to the implementation. GET_PARAMETER with no entity body may be used to test client or server liveness ("ping").
S->C: GET_PARAMETER rtsp://example.com/media.mp4 RTSP/1.0
CSeq: 9
Content-Type: text/parameters
Session: 12345678
Content-Length: 15
packets_received
jitter
C->S: RTSP/1.0 200 OK
CSeq: 9
Content-Length: 46
Content-Type: text/parameters
packets_received: 10
jitter: 0.3838
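The text/parameters bodies in this exchange are simple "name" or "name: value" lines. A sketch of reading them, assuming one parameter per line (parse_text_parameters is a hypothetical name):

```python
def parse_text_parameters(body: str) -> dict:
    """Parse a text/parameters body into a name -> value mapping."""
    params = {}
    for line in body.splitlines():
        if not line.strip():
            continue
        name, sep, value = line.partition(":")
        # A request lists bare names; a reply carries "name: value".
        params[name.strip()] = value.strip() if sep else None
    return params
```

Values are kept as strings because their meaning is left to the implementation.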
SET_PARAMETER
This method requests to set the value of a parameter for a presentation or stream specified by the URI.
C->S: SET_PARAMETER rtsp://example.com/media.mp4 RTSP/1.0
CSeq: 10
Content-length: 20
Content-type: text/parameters
barparam: barstuff
S->C: RTSP/1.0 451 Invalid Parameter
CSeq: 10
Content-length: 10
Content-type: text/parameters
barparam
2.1.3 RTSP status-code
The first digit of the Status-code defines the class of response.
1**: Informational - Request received, continuing process;
2**: Success - The action was successfully received, understood, and accepted;
3**: Redirection - Further action must be taken in order to complete the request;
4**: Client Error - The request contains bad syntax or cannot be fulfilled;
5**: Server Error - The server failed to fulfill an apparently valid request.
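Because the class is carried by the first digit, integer division by 100 is enough to classify a status code. A short sketch (STATUS_CLASSES and status_class are hypothetical names):

```python
STATUS_CLASSES = {
    1: "Informational",
    2: "Success",
    3: "Redirection",
    4: "Client Error",
    5: "Server Error",
}


def status_class(code: int) -> str:
    """Map a status code to its class using the first digit."""
    return STATUS_CLASSES.get(code // 100, "Unknown")
```

For example, the 451 reply in the SET_PARAMETER example falls in the Client Error class.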
2.1.4 RTSP header
CSeq: It specifies the sequence number for an RTSP request-response pair. For every RTSP request containing the given sequence number, there will be a corresponding response having the same number.
Content-Length: This field contains the length of the message body in bytes, i.e., the content that follows the empty line (double CRLF) after the last header.
Transport: This field indicates which transport protocol is to be used and configures its parameters.
Session: This request and response header field identifies an RTSP session started by the media server in a SETUP response and concluded by TEARDOWN on the presentation URL.
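The CSeq rule can be sketched as a small bookkeeping class that hands out sequence numbers and matches each response to its request. The class name PendingRequests and the lower-case header dictionary are assumptions made for this illustration.

```python
class PendingRequests:
    """Track outstanding requests by their CSeq number."""

    def __init__(self):
        self._next_cseq = 1
        self._pending = {}

    def register(self, method: str) -> int:
        """Reserve the next CSeq for a request and remember its method."""
        cseq = self._next_cseq
        self._next_cseq += 1
        self._pending[cseq] = method
        return cseq

    def resolve(self, response_headers: dict) -> str:
        """Return the method that a response answers, by its CSeq."""
        cseq = int(response_headers["cseq"])
        return self._pending.pop(cseq)
```

Matching by CSeq rather than by arrival order also works when several requests are in flight at once.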
2.2 RTSP conversation
The following chart describes a standard RTSP conversation.
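The examples in section 2.1.2 follow the usual order of a simple playback session: OPTIONS, DESCRIBE, SETUP, PLAY, PAUSE, TEARDOWN. The sketch below only builds the client requests in that order. The helper names are hypothetical, the client ports are the ones from the SETUP example, and the session argument stands in for the value that the server returns in its SETUP response.

```python
def build_request(method: str, uri: str, cseq: int, headers=None) -> str:
    """Serialise one RTSP/1.0 request without a body."""
    lines = [f"{method} {uri} RTSP/1.0", f"CSeq: {cseq}"]
    lines += [f"{name}: {value}" for name, value in (headers or {}).items()]
    # An empty line terminates the header section.
    return "\r\n".join(lines) + "\r\n\r\n"


def typical_conversation(url: str, track_url: str, session: str) -> list:
    """Return the client requests of one simple session, in order."""
    steps = [
        ("OPTIONS", url, {}),
        ("DESCRIBE", url, {"Accept": "application/sdp"}),
        ("SETUP", track_url,
         {"Transport": "RTP/AVP;unicast;client_port=8000-8001"}),
        ("PLAY", url, {"Session": session}),
        ("PAUSE", url, {"Session": session}),
        ("TEARDOWN", url, {"Session": session}),
    ]
    # CSeq increases by one with each request.
    return [build_request(method, uri, cseq, headers)
            for cseq, (method, uri, headers) in enumerate(steps, start=1)]
```

A real client would send each request only after reading the previous response, and would issue one SETUP per media stream.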
The attachment contains a sample packet capture of an RTSP conversation.