from: http://blog.csdn.net/onlycoder_net/article/details/76702432
v=0
// SDP version number; always 0, as specified by RFC 4566.
o=- 7017624586836067756 2 IN IP4 127.0.0.1
// RFC 4566: o=<username> <sess-id> <sess-version> <nettype> <addrtype> <unicast-address>
// username is "-" when not used; 7017624586836067756 is the id of the whole session, and 2 is the
// session version. If the SDP is regenerated during the session (e.g. after changing codecs),
// sess-id stays the same and sess-version is incremented by 1.
s=-
// Session name; "-" when there is none.
t=0 0
// The two values are the start and stop times of the session; 0 0 means no restriction.
a=group:BUNDLE audio video data
// Media that share a single transport channel. Without this line, audio, video and data would
// each be sent on their own UDP port.
a=msid-semantic: WMS h1aZ20mbQB0GSsq0YxLfJmiYWE9CBfGch97C
// WMS is short for WebRTC Media Stream. This line declares that the client can carry several
// streams at once; one stream can contain several tracks. When it is present, the a=ssrc lines
// below usually carry msid, mslabel and similar attributes.
m=audio 9 UDP/TLS/RTP/SAVPF 111 103 104 9 0 8 106 105 13 126
// m=audio: the session contains audio. 9 is the nominal transport port, which WebRTC generally
// no longer uses (ICE negotiates the real transport); a port of 0 would mean audio is not
// transmitted. UDP/TLS/RTP/SAVPF is the transport profile: RTP packets carried over UDP and keyed
// via DTLS (TLS), with SAVPF adding the secure RTCP-based feedback mechanism that controls the
// session. The trailing list 111 103 104 9 0 8 106 105 13 126 gives the audio payload types this
// session supports; the lines further down describe each of them in detail.
c=IN IP4 0.0.0.0
// The IP address you would use to send or receive audio. WebRTC transports media via ICE and
// does not use this address.
a=rtcp:9 IN IP4 0.0.0.0
// Address and port for RTCP; not used in WebRTC either.
a=ice-ufrag:khLS
a=ice-pwd:cxLzteJaJBou3DspNaPsJhlQ
// The two lines above are the credentials (username fragment and password) used for security
// checks during ICE negotiation.
a=fingerprint:sha-256 FA:14:42:3B:C7:97:1B:E8:AE:0C2:71:03:05:05:16:8F:B9:C7:98:E9:60:43:4B:5B:2C:28:EE:5C:8F3:17
// The line above is the certificate fingerprint needed to authenticate the DTLS handshake.
a=setup:actpass
// The line above says this client can act as either client or server in the DTLS handshake;
// see RFC 4145 and RFC 4572.
a=mid:audio
// The media identifier referenced in the BUNDLE line above.
a=extmap:1 urn:ietf:params:rtp-hdrext:ssrc-audio-level
// The line above asks for the audio-level RTP header extension; see RFC 6464.
a=sendrecv
// The line above declares bidirectional media; the other possible values are recvonly,
// sendonly and inactive.
a=rtcp-mux
// The line above declares that RTP and RTCP packets are multiplexed over the same port.
// The following lines supplement the m=audio line, giving each payload type its codec,
// sample rate, channel count and so on.
a=rtpmap:111 opus/48000/2
a=rtcp-fb:111 transport-cc
// The line above says the opus payload supports RTCP-based congestion control; see
// https://tools.ietf.org/html/draft-holmer-rmcat-transport-wide-cc-extensions-01
a=fmtp:111 minptime=10;useinbandfec=1
// Optional parameters for opus: minptime=10 sets the minimum packet time to 10 ms;
// useinbandfec=1 enables opus's built-in in-band FEC.
a=rtpmap:103 ISAC/16000
a=rtpmap:104 ISAC/32000
a=rtpmap:9 G722/8000
a=rtpmap:0 PCMU/8000
a=rtpmap:8 PCMA/8000
a=rtpmap:106 CN/32000
a=rtpmap:105 CN/16000
a=rtpmap:13 CN/8000
a=rtpmap:126 telephone-event/8000
a=ssrc:18509423 cname:sTjtznXLCNH7nbRw
// cname identifies a media source. The ssrc may change when a collision occurs, but the cname
// stays stable; it also appears in RTCP SDES packets and is used for audio/video synchronization.
a=ssrc:18509423 msid:h1aZ20mbQB0GSsq0YxLfJmiYWE9CBfGch97C 15598a91-caf9-4fff-a28f-3082310b2b7a
// The line above ties the ssrc to the WebRTC MediaStream and AudioTrack: the first value after
// msid is the stream-id, the second the track-id.
a=ssrc:18509423 mslabel:h1aZ20mbQB0GSsq0YxLfJmiYWE9CBfGch97C
a=ssrc:18509423 label:15598a91-caf9-4fff-a28f-3082310b2b7a
m=video 9 UDP/TLS/RTP/SAVPF 100 101 107 116 117 96 97 99 98
// See m=audio above; the meaning is analogous.
c=IN IP4 0.0.0.0
a=rtcp:9 IN IP4 0.0.0.0
a=ice-ufrag:khLS
a=ice-pwd:cxLzteJaJBou3DspNaPsJhlQ
a=fingerprint:sha-256 FA:14:42:3B:C7:97:1B:E8:AE:0C2:71:03:05:05:16:8F:B9:C7:98:E9:60:43:4B:5B:2C:28:EE:5C:8F3:17
a=setup:actpass
a=mid:video
a=extmap:2 urn:ietf:params:rtp-hdrext:toffset
a=extmap:3 http://www.webrtc.org/experiments/rtp-hdrext/abs-send-time
a=extmap:4 urn:3gpp:video-orientation
a=extmap:5 http://www.ietf.org/id/draft-hol ... de-cc-extensions-01
a=extmap:6 http://www.webrtc.org/experiments/rtp-hdrext/playout-delay
a=sendrecv
a=rtcp-mux
a=rtcp-rsize
a=rtpmap:100 VP8/90000
a=rtcp-fb:100 ccm fir
// ccm is short for "codec control using RTCP feedback messages", i.e. the encoder can be steered
// via RTCP; fir is Full Intra Request: the receiver asks the sender to send a complete intra frame.
a=rtcp-fb:100 nack
// Packet-loss retransmission (NACK) is supported; see RFC 4585.
a=rtcp-fb:100 nack pli
// NACK at picture level (Picture Loss Indication), i.e. key-frame recovery on loss; see RFC 4585.
a=rtcp-fb:100 goog-remb
// The sender's bitrate can be capped via RTCP (goog-remb, Receiver Estimated Maximum Bitrate).
a=rtcp-fb:100 transport-cc
// Same as for opus above.
a=rtpmap:101 VP9/90000
a=rtcp-fb:101 ccm fir
a=rtcp-fb:101 nack
a=rtcp-fb:101 nack pli
a=rtcp-fb:101 goog-remb
a=rtcp-fb:101 transport-cc
a=rtpmap:107 H264/90000
a=rtcp-fb:107 ccm fir
a=rtcp-fb:107 nack
a=rtcp-fb:107 nack pli
a=rtcp-fb:107 goog-remb
a=rtcp-fb:107 transport-cc
a=fmtp:107 level-asymmetry-allowed=1;packetization-mode=1;profile-level-id=42e01f
// Optional additional parameters for the H264 payload.
a=rtpmap:116 red/90000
// RED redundant coding for FEC. When this line is present the RTP payload type is usually 116;
// otherwise each codec's native payload type is used.
a=rtpmap:117 ulpfec/90000
// ULP FEC is supported; see RFC 5109.
a=rtpmap:96 rtx/90000
a=fmtp:96 apt=100
// The two lines above define the RTP payload type for VP8 retransmission (RTX) packets.
a=rtpmap:97 rtx/90000
a=fmtp:97 apt=101
a=rtpmap:99 rtx/90000
a=fmtp:99 apt=107
a=rtpmap:98 rtx/90000
a=fmtp:98 apt=116
a=ssrc-group:FID 3463951252 1461041037
// In WebRTC, retransmission packets use a different ssrc from regular packets; in the line above
// the first value is the ssrc of the regular RTP packets, the second that of the retransmissions.
a=ssrc:3463951252 cname:sTjtznXLCNH7nbRw
a=ssrc:3463951252 msid:h1aZ20mbQB0GSsq0YxLfJmiYWE9CBfGch97C ead4b4e9-b650-4ed5-86f8-6f5f5806346d
a=ssrc:3463951252 mslabel:h1aZ20mbQB0GSsq0YxLfJmiYWE9CBfGch97C
a=ssrc:3463951252 label:ead4b4e9-b650-4ed5-86f8-6f5f5806346d
a=ssrc:1461041037 cname:sTjtznXLCNH7nbRw
a=ssrc:1461041037 msid:h1aZ20mbQB0GSsq0YxLfJmiYWE9CBfGch97C ead4b4e9-b650-4ed5-86f8-6f5f5806346d
a=ssrc:1461041037 mslabel:h1aZ20mbQB0GSsq0YxLfJmiYWE9CBfGch97C
a=ssrc:1461041037 label:ead4b4e9-b650-4ed5-86f8-6f5f5806346d
m=application 9 DTLS/SCTP 5000
c=IN IP4 0.0.0.0
a=ice-ufrag:khLS
a=ice-pwd:cxLzteJaJBou3DspNaPsJhlQ
a=fingerprint:sha-256 FA:14:42:3B:C7:97:1B:E8:AE:0C2:71:03:05:05:16:8F:B9:C7:98:E9:60:43:4B:5B:2C:28:EE:5C:8F3:17
a=setup:actpass
a=mid:data
a=sctpmap:5000 webrtc-datachannel 1024
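As a small illustration of how mechanical the format is, here is a minimal sketch that pulls the payload-type-to-codec mapping out of an SDP blob like the one above; ParseRtpmaps is a hypothetical helper, not WebRTC's parser:

#include <cstdio>
#include <map>
#include <sstream>
#include <string>

std::map<int, std::string> ParseRtpmaps(const std::string& sdp) {
  std::map<int, std::string> codecs;
  std::istringstream lines(sdp);
  std::string line;
  while (std::getline(lines, line)) {
    // Every rtpmap line has the shape "a=rtpmap:<pt> <codec>/<clock>[/<channels>]".
    const std::string prefix = "a=rtpmap:";
    if (line.compare(0, prefix.size(), prefix) != 0) continue;
    std::istringstream rest(line.substr(prefix.size()));
    int pt = 0;
    std::string codec;
    if (rest >> pt >> codec) codecs[pt] = codec;
  }
  return codecs;
}

int main() {
  auto codecs = ParseRtpmaps("a=rtpmap:111 opus/48000/2\r\na=rtpmap:0 PCMU/8000\r\n");
  for (const auto& kv : codecs)
    std::printf("%d -> %s\n", kv.first, kv.second.c_str());  // 0 -> PCMU/8000, 111 -> opus/48000/2
}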
A reposted article:
SDP—Session Description Protocol
The Session Description Protocol, defined by RFC 2327 [1], was developed by the IETF MMUSIC working group. It is more of a description syntax than a protocol, in that it does not provide a full-range media negotiation capability. The original purpose of SDP was to describe multicast sessions set up over the Internet's multicast backbone, the MBONE. The first application of SDP was by the experimental Session Announcement Protocol (SAP) [2] used to post and retrieve announcements of MBONE sessions. SAP messages carry an SDP message body, and this usage was the template for SIP's use of SDP. Even though it was designed for multicast, SDP has been applied to the more general problem of describing general multimedia sessions established using SIP.
As seen in the examples of Chapter 3, SDP contains the following information about the media session:
IP Address (IPv4 address or host name);
Port number (used by UDP or TCP for transport);
Media type (audio, video, interactive whiteboard, and so forth);
Media encoding scheme (PCM A-Law, MPEG II video, and so forth).
In addition, SDP contains information about the following:
Subject of the session;
Start and stop times;
Contact information about the session.
Like SIP, SDP uses text coding. An SDP message is composed of a series of lines, called fields, whose names are abbreviated by a single lower-case letter, and are in a required order to simplify parsing. The set of mandatory SDP fields is shown in Table 2.1. The complete set is shown in Table 7.1.
Field | Name | Mandatory/Optional
---|---|---
v= | Protocol version number | m
o= | Owner/creator and session identifier | m
s= | Session name | m
i= | Session information | o
u= | Uniform Resource Identifier | o
e= | Email address | o
p= | Phone number | o
c= | Connection information | m
b= | Bandwidth information | o
t= | Time session starts and stops | m
r= | Repeat times | o
z= | Time zone corrections | o
k= | Encryption key | o
a= | Attribute lines | o
m= | Media information | o
a= | Media attributes | o
SDP was not designed to be easily extensible, and parsing rules are strict. The only way to extend or add new capabilities to SDP is to define a new attribute type. However, unknown attribute types can be silently ignored. An SDP parser must not ignore an unknown field, a missing mandatory field, or an out-of-sequence line. An example SDP message containing many of the optional fields is shown here:
v=0
o=johnston 2890844526 2890844526 IN IP4 43.32.1.5
s=SIP Tutorial
i=This broadcast will cover this new IETF protocol
u=http://www.digitalari.com/sip
e=Alan Johnston [email protected]
p=+1-314-555-3333 (Daytime Only)
c=IN IP4 225.45.3.56/236
b=CT:144
t=2877631875 2879633673
m=audio 49172 RTP/AVP 0
a=rtpmap:0 PCMU/8000
m=video 23422 RTP/AVP 31
a=rtpmap:31 H261/90000
The general form of an SDP message is:
x=parameter1 parameter2 ... parameterN
The line begins with a single lower-case letter x. There are never any spaces between the letter and the =, and there is exactly one space between each parameter. Each field has a defined number of parameters. Each line ends with a CRLF. The individual fields will now be discussed in detail.
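To make that rule concrete, here is a minimal sketch of splitting a single SDP line into its type letter and value; SplitSdpLine is a hypothetical helper, not part of any SIP/SDP library:

#include <optional>
#include <string>
#include <utility>

// Splits "x=value" into {'x', "value"}; rejects anything that does not match
// the strict form (single lower-case letter, then '=', with no space before it).
std::optional<std::pair<char, std::string>> SplitSdpLine(const std::string& line) {
  if (line.size() < 2 || line[1] != '=' || line[0] < 'a' || line[0] > 'z')
    return std::nullopt;
  return std::make_pair(line[0], line.substr(2));
}
// SplitSdpLine("v=0") yields {'v', "0"}; a malformed "v =0" is rejected.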
The v= field contains the SDP version number. Because the current version of SDP is 0, a valid SDP message will always begin with v=0.
The o= field contains information about the originator of the session and session identifiers. This field is used to uniquely identify the session. The field contains:
o=username session-id version network-type address-type address
The username parameter contains the originator's login or host, or - if none. The session-id parameter is a Network Time Protocol (NTP) [3] timestamp or a random number used to ensure uniqueness. The version is a numeric field that is increased for each change to the session, and is also recommended to be an NTP timestamp. The network-type is always IN for Internet. The address-type parameter is either IP4 or IP6 for IPv4 or IPv6; the address itself may be in dotted-decimal form or a fully qualified host name.
The s= field contains a name for the session. It can contain any non-zero number of characters. The optional i= field contains information about the session. It can contain any number of characters.
The optional u= field contains a uniform resource indicator (URI) with more information about the session.
The optional e= field contains an e-mail address of the host of the session. If a display name is used, the e-mail address is enclosed in <>. The optional p= field contains a phone number. The phone number should be given in globalized format, beginning with a +, then the country code, a space or -, then the local number. Either spaces or - are permitted as spacers in SDP. A comment may be present in ().
The c= field contains information about the media connection. The field contains:
c=network-type address-type connection-address
The network-type parameter is defined as IN for the Internet. The address type is defined as IP4 for IPv4 addresses, IP6 for IPv6 addresses. The connection-address is the IP address that will be sending the media packets, which could be either multicast or unicast. If multicast, the connection-address field contains:
connection-address=base-multicast-address/ttl/number-of-addresses
where ttl is the time-to-live value, and number-of-addresses indicates how many contiguous multicast addresses are included starting with the base-multicast-address.
The optional b= field contains information about the bandwidth required. It is of the form:
b=modifier:bandwidth-value
The modifier is either CT for conference total or AS for application specific. CT is used for a multicast session to specify the total bandwidth that can be used by all participants in the session. AS is used to specify the bandwidth of a single site. The bandwidth-value parameter is the specified number of kilobits per second.
The t= field contains the start time and stop time of the session.
t=start-time stop-time
The times are specified using NTP timestamps. For a scheduled session, a stop-time of zero indicates that the session goes on indefinitely. A start-time and stop-time of zero for a scheduled session indicates that it is permanent. The optional r= field contains repeat times, which can be specified either as NTP timestamps or in days (d), hours (h), or minutes (m). The optional z= field contains time zone offsets. This field is used if a recurring session spans a change from daylight-saving to standard time, or vice versa.
The optional k= field contains the encryption key to be used for the media session. The field contains:
k=method:encryption-key
The method parameter can be clear, base64, uri, or prompt. If the method is prompt, the key will not be carried in SDP; instead, the user will be prompted as they join the encrypted session. Otherwise, the key is sent in the encryption-key parameter.
The optional m= field contains information about the type of media session. The field contains:
m=media port transport format-list
The media parameter is either audio, video, application, data, telephone-event, or control. The port parameter contains the port number. The transport parameter contains the transport protocol, which is either RTP/AVP or udp. (RTP/AVP stands for Real-time Transport Protocol [4] / audio video profiles [5], which is described in Section 7.3.) The format-list contains more information about the media. Usually, it contains media payload types defined in RTP audio video profiles. More than one media payload type can be listed, allowing multiple alternative codecs for the media session. For example, the following media field lists four codecs:
m=audio 49430 RTP/AVP 0 6 8 99
One of these four codecs can be used for the audio media session. If the intention is to establish three audio channels, three separate media fields would be used. For non-RTP media, Internet media types should be listed in the format-list. For example,
m=application 52341 udp wb
could be used to specify the application/wb media type.
The optional a= field contains attributes of the preceding media session. This field can be used to extend SDP to provide more information about the media. If not fully understood by an SDP user, the attribute field can be ignored. There can be one or more attribute fields for each media payload type listed in the media field. For the RTP/AVP example in Section 7.1.10, the following four attribute fields could follow the media field:
a=rtpmap:0 PCMU/8000
a=rtpmap:6 DVI4/16000
a=rtpmap:8 PCMA/8000
a=rtpmap:99 iLBC/8000
Other attributes are shown in Table 7.2. Full details of the use of these attributes are in the standard document [1]. The details of the iLBC (Internet Low Bit Rate) Codec are in [6].
Attribute | Name
---|---
a=rtpmap: | RTP/AVP list
a=cat: | Category of the session
a=keywds: | Keywords of session
a=tool: | Name of tool used to create SDP
a=ptime: | Length of time in milliseconds for each packet
a=recvonly | Receive only mode
a=sendrecv | Send and receive mode
a=sendonly | Send only mode
a=orient: | Orientation for whiteboard sessions
a=type: | Type of conference
a=charset: | Character set used for subject and information fields
a=sdplang: | Language for the session description
a=lang: | Default language for the session
a=framerate: | Maximum video frame rate in frames per second
a=quality: | Suggests quality of encoding
a=fmtp: | Format parameters
a=mid: | Media identification grouping
a=direction: | Direction for symmetric media
a=rtcp: | Explicit RTCP port (and address)
a=inactive | Inactive mode
The use of SDP with SIP is given in the SDP Offer Answer RFC 3264 [7]. The default message body type in SIP is application/sdp. The calling party lists the media capabilities that they are willing to receive in SDP in either an INVITE or in an ACK. The called party lists their media capabilities in the 200 OK response to the INVITE. More generally, offers or answers may be in INVITEs, PRACKs, or UPDATEs or in reliably sent 18x or 200 responses to these methods.
Because SDP was developed with scheduled multicast sessions in mind, many of the fields have little or no meaning in the context of dynamic sessions established using SIP. In order to maintain compatibility with the SDP protocol, however, all required fields are included. A typical SIP use of SDP includes the version, origin, subject, time, connection, and one or more media and attribute fields as shown in Table 2.1. The origin, subject, and time fields are not used by SIP but are included for compatibility. In the SDP standard, the subject field is a required field and must contain at least one character, suggested to be s=- if there is no subject. The time field is usually set to t=0 0.
SIP uses the connection, media, and attribute fields to set up sessions between user agents. Because the type of media session and codec to be used are part of the connection negotiation, SIP can use SDP to specify multiple alternative media types and to selectively accept or decline those media types. When multiple media codecs are listed, the caller and called party's media fields must be aligned—that is, there must be the same number, and they must be listed in the same order. The offer answer specification, RFC 3264 [7], recommends that an attribute containing a=rtpmap: be used for each media field [7]. A media stream is declined by setting the port number to zero for the corresponding media field in the SDP response. In the following example, the caller Tesla wants to set up an audio and video call with two possible audio codecs and a video codec in the SDP carried in the initial INVITE:
v=0
o=Tesla 2890844526 2890844526 IN IP4 lab.high-voltage.org
s=-
c=IN IP4 100.101.102.103
t=0 0
m=audio 49170 RTP/AVP 0 8
a=rtpmap:0 PCMU/8000
a=rtpmap:8 PCMA/8000
m=video 49172 RTP/AVP 32
a=rtpmap:32 MPV/90000
The codecs are referenced by the RTP/AVP profile numbers 0, 8, and 32. The called party Marconi answers the call, chooses the second codec for the first media field and declines the second media field, only wanting a PCM A-Law audio session.
v=0
o=Marconi 2890844526 2890844526 IN IP4 tower.radio.org
s=-
c=IN IP4 200.201.202.203
t=0 0
m=audio 60000 RTP/AVP 8
a=rtpmap:8 PCMA/8000
m=video 0 RTP/AVP 32
If this audio-only call is not acceptable, then Tesla would send an ACK then a BYE to cancel the call. Otherwise, the audio session would be established and RTP packets exchanged. As this example illustrates, unless the number and order of media fields is maintained, the calling party would not know for certain which media sessions were being accepted and declined by the called party.
One party in a call can temporarily place the other on hold (i.e., suspending the media packet sending). This is done by sending an INVITE with identical SDP to that of the original INVITE but with a=sendonly attribute present. The call is made active again by sending another INVITE with the a=sendrecv attribute present. (Note that older RFC 2543 compliant UAs may initiate hold using c=0.0.0.0.) For further examples of SDP use with SIP, see the SDP Offer Answer Examples document [8].
from:https://blog.csdn.net/voipmaker/article/details/6111629
1.1 The crypto attribute
a = crypto:<tag> <crypto-suite> <key-params> [<session-params>]
a=crypto:1 AES_CM_128_HMAC_SHA1_80 inline:d0RmdmcmVCspeEc3QGZiNWpVLFJhQX1cfHAwJSoj|2^20|1:32
tag: used to choose one crypto attribute in the offer/answer exchange
crypto-suite: identifies the encryption and authentication algorithms
key-params: of the form method:info; only one method, "inline", is currently defined, meaning the key itself is the info
session-params: optional session parameters
Reference: https://tools.ietf.org/html/rfc4568#section-4
1.2 The ssrc attribute
a = ssrc:<ssrc-id> <attribute>:<value>
a=ssrc:2 cname:stream_1_cname
a=ssrc:2 label:video_track_id_1
attribute can be: cname (uniquely identifies a client; a client has exactly one cname)
msid
mslabel
label
fmtp
Reference: https://tools.ietf.org/html/rfc5576#section-4
Note: for the label attribute, see https://www.packetizer.com/rfc/rfc4574/
1.3 The ssrc-group attribute
a=ssrc-group:<semantics> <ssrc-id> ...
a=ssrc-group:FEC 2 3
semantics: FID (flow identification), FEC (forward error correction), or SIM (used for simulcast).
FID: indicates that only one codec is in use at any one time; note that an FID group should not use the same port/IP. A practical application of FID is implementing retransmission.
ssrc-id: one or more values, listing every ssrc in the group
Reference: https://tools.ietf.org/html/rfc5576#section-4
Note: for RTX see https://tools.ietf.org/html/rfc4588
1.4 The rtpmap attribute
a=rtpmap:<payload type> <encoding name>/<clock rate> [/<encoding parameters>]
a=rtpmap:120 VP8/90000
payload type: the RTP payload type number
encoding name: the codec name
encoding parameters: for audio this is typically the channel count
(Note: there are two FEC payload types, ulpfec and flexfec; their references are:
ulpfec: https://tools.ietf.org/html/rfc5109
flexfec: https://tools.ietf.org/html/draft-ietf-payload-flexible-fec-scheme-05)
Reference: https://tools.ietf.org/html/rfc4566
1.5 The MediaContentDirection attributes
a=sendrecv
a=recvonly
a=sendonly
a=inactive
Reference: https://tools.ietf.org/html/rfc4566
1.6 The ice-ufrag and ice-pwd attributes
a=ice-ufrag:<ufrag>
a=ice-pwd:<pwd>
a=ice-ufrag:ufrag_video
a=ice-pwd:pwd_video
Username fragment and password used for ICE connectivity checks (hole punching).
Reference: https://tools.ietf.org/html/rfc5245#section-15.4
1.7 The candidate attribute
a=candidate:<foundation> <component-id> <transport> <priority> <connection-address> <port> typ <candidate-type> [raddr <rel-addr>] [rport <rel-port>]
a=candidate:a0+B/4 1 udp 2130706432 74.125.224.39 3457 typ relay generation 2
foundation: used to tell whether two candidates are of the same type, have the same base address and the same STUN server
component-id: starts at 1 and increments; must be 1 for RTP and 2 for RTCP
priority: the candidate priority; RFC 5245 recommends computing it as priority = (2^24)*(type preference) + (2^8)*(local preference) + (256 - component-id)
cand-type: one of "host", "srflx", "prflx", "relay"; srflx is server reflexive, prflx is peer reflexive, relay is a relayed candidate; these are, presumably, the four ways a connection can be made
rel-addr: as I currently understand it, the related STUN or TURN server address
rel-port: the related port
Reference: https://tools.ietf.org/html/rfc5245
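As a worked example of that formula, a small sketch; IcePriority is a hypothetical helper, and the preference values below are illustrative defaults, not mandated constants:

#include <cstdint>
#include <cstdio>

uint32_t IcePriority(uint32_t type_pref,      // 0..126, e.g. 126 for host, 0 for relay
                     uint32_t local_pref,     // 0..65535
                     uint32_t component_id) { // 1 = RTP, 2 = RTCP
  return (type_pref << 24) | (local_pref << 8) | (256 - component_id);
}

int main() {
  // A host candidate for the RTP component with maximum local preference:
  std::printf("%u\n", IcePriority(126, 65535, 1));  // prints 2130706431
}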
1.8 The rtcp attribute
a=rtcp:<port> <nettype> <addrtype> <connection-address>
a=rtcp:2347 IN IP4 74.125.127.126
Explicitly signals the RTCP address and port.
Reference: https://tools.ietf.org/id/draft-ietf-mmusic-sdp4nat-00.txt
1.9 The msid-semantic attribute
a=msid-semantic: <semantics> <msid-list>
a=msid-semantic: WMS local_stream_1
WMS stands for WebRTC Media Streams
local_stream_1 is the msid (whose role, presumably, is to correspond to the ssrc)
Reference: https://tools.ietf.org/html/draft-alvestrand-rtcweb-msid-02#section-3
1.10 The msid attribute
a=msid:<identifier> [<appdata>]
a=msid: local_stream_1
The value of the "msid" attribute consists of an identifier and an optional "appdata" field.
This attribute allows endpoints to associate RTP streams that are described in different media descriptions with the same MediaStream, and to carry an identifier for each MediaStreamTrack in its "appdata" field.
Reference: https://tools.ietf.org/html/draft-ietf-mmusic-msid-16#page-10
Note: in WebRTC, the second parameter of SdpSerialize must be set to true for this attribute to appear; serializing directly with jsep's ToString will not produce it.
1.11 The group attribute
a=group:<semantics> <identification-tag> ...
a=group:BUNDLE
"a=group" lines are used to group together several "m" lines that are identified by their "mid" attribute.
There MAY be several "a=group" lines in a session description, and they can use the same or different semantics.
References:
https://tools.ietf.org/html/rfc5888
https://tools.ietf.org/html/draft-ietf-mmusic-sdp-bundle-negotiation-39
1.12 The bundle-only attribute
a=bundle-only
a=bundle-only
Used together with the group attribute; indicates that different media share the same port.
1.13 The rtcp-fb attribute
a=rtcp-fb:<payload> <param>
a=rtcp-fb:96 ccm fir
Reference: https://tools.ietf.org/html/rfc4585
1.14 The rtcp-rsize attribute
a=rtcp-rsize
a=rtcp-rsize
Reference: https://tools.ietf.org/html/rfc5506
1.15 The fingerprint attribute
a=fingerprint:<hash-func> <fingerprint>
a=fingerprint:SHA-1 4A:AD:B9:B1:3F:82:18:3B:54:02:12:DF:3E:5D:49:6B:19:E5:7C:AB
Reference: https://tools.ietf.org/html/rfc4572#page-7
1.16 The extmap attribute
a=extmap:<id> <uri>
a=extmap:8 http://www.webrtc.org/experiments/rtp-hdrext/video-timing
Declares an RTP header extension. It has three properties:
1. asymmetry (an extension can be marked recvonly or sendonly);
2. mutually exclusive alternatives (the answer may choose one of several extensions the offer lists under the same id; such ids must be in 4096-4351);
3. a session may signal multiple header extensions.
Reference: https://tools.ietf.org/html/rfc5285
1.17 The fmtp attribute
a=fmtp:<payload> <param>
a=fmtp:97 apt=96
Carries codec-specific parameters (param) for the given payload type.
Reference: https://tools.ietf.org/html/rfc4566
1.18 The mid attribute
a=mid:<identification-tag>
a=mid:audio
The name of the media section, used to look up a specific media.
1.19 The setup attribute
a=setup:<role>
a=setup:active
Indicates this endpoint's role when establishing the connection: actively connecting, passively accepting, and so on.
1.20 Core session fields
v=0
Reference: https://tools.ietf.org/html/rfc4566
o=<username> <session id> <version> <network type> <address type> <address>
o=- 18446744069414584320 18446462598732840960 IN IP4 127.0.0.1
Reference: https://tools.ietf.org/html/rfc4566
s=<session name>
Reference: https://tools.ietf.org/html/rfc4566
m=<media> <port> <transport> <format list>
m=audio 2345 RTP/SAVPF 111 103 104
Reference: https://tools.ietf.org/html/rfc4566
b= (bandwidth / transmission rate)
Reference: https://tools.ietf.org/html/rfc4566
For offer/answer, see:
https://tools.ietf.org/html/rfc3264#page-8
Notes:
1. The answer MUST contain exactly the same number of "m=" lines as the offer.
2. If the answerer has no media formats in common for a particular offered stream, the answerer MUST reject that media stream by setting the port to zero.
3. Rejecting in an answer: to reject a media stream, set the port of that media to 0. One case needs attention: a=bundle-only, preceded by an a=group:BUNDLE line, means several media streams share one port, and such a media section may legitimately have its port set to 0.
On codec matching:
1. For both audio and video, the codec names are compared; if the payload type is less than or equal to 95, the ids are compared as well (payload types up to 95 are static).
2. For audio, the clockrate, bitrate and channels must all match, or one of them must be 0.
3. For video, if the codec is H264, the profile-level-id is also compared.
from: https://blog.csdn.net/myiloveuuu/article/details/78998183
SDP structure
from:https://blog.piasy.com/2018/10/14/WebRTC-API-Overview/index.html
First, let's get clear on the basic structure of SDP.
Broadly, a WebRTC SDP is made up of several parts: each block that starts with m= is called an m section, and that first line is the m line; it is followed by many a lines that describe the properties of that media. We call each kind of media data a "media", and every media has its own m section in the SDP.
WebRTC SDPs come in three types.
Any discussion of SDP has to mention its two "plans", the two SDP formats for expressing multiple media streams of the same kind. Examples of multiple streams: screen capture plus camera, or several cameras (viewpoints).
With Plan B, streams of the same kind share a single m line and are distinguished by msid; with Unified Plan, every stream gets its own m line, so two video streams mean two video m lines.
The WebRTC standard adopted Unified Plan and the WebRTC code already supports it, so we only look at the Unified Plan API.
References: Plan B, Unified Plan, Unified Plan vs Plan B.
In the WebRTC source, Plan B corresponds to the PC Stream/Sender/Receiver API, and Unified Plan to the Track/Transceiver API.
Next, let's sort out the key concepts in the media data exchange path:
Taking video as an example, data is captured by the sender's Capturer and handed to the Source, then to the local Track, where it forks: one branch goes to a local Sink for preview, the other is sent to the receiver by the Transceiver; the receiver's Track then hands the data to its Sink for rendering.
Creating and destroying the Capturer is entirely the APP layer's responsibility; it only needs to be hooked up to a Source. Creating a Source requires calling a PC Factory interface, as does creating a Track (which also takes the Source as a parameter). Sinks are likewise created and destroyed by the APP layer and simply added to a Track. Creating a Transceiver requires calling a PC interface.
Now let's look at the PC Factory and PC interfaces.
With the default build options, rtc_use_builtin_sw_codecs = false, so USE_BUILTIN_SW_CODECS is not defined and CreatePeerConnectionFactory has only one overload: it takes three threads, an adm, audio/video encoder/decoder factories, an AudioMixer and an AudioProcessing.
Creating the PC object takes an RTCConfiguration and a PeerConnectionDependencies: the former holds the various configuration options, the latter holds customizable interface implementations such as PortAllocator, AsyncResolverFactory, RTCCertificateGeneratorInterface and SSLCertificateVerifier.
For now, Android/iOS support for the dependencies has not caught up; users of such advanced features are not afraid of doing their own wrapping at the native layer, but they would have to rebuild the wheels that already exist in WebRTC's Java/ObjC code.
These are the interfaces for creating Audio/Video Sources and Tracks mentioned earlier.
Preparation-related:
Related to establishing the P2P connection:
Note: the RTCOfferAnswerOptions passed to CreateOffer/CreateAnswer contain offer_to_receive_X fields that exist for Plan B compatibility; once set, the SDP will contain audio/video m lines even without AddTrack. With Unified Plan these fields should not be set; instead, call AddTrack/AddTransceiver/CreateDataChannel in advance to state whether you need audio/video/data.
Other interfaces:
Note: SetBitrateAllocationStrategy is not exposed on either Android or iOS. Android exposes a SetBitrate interface; iOS does not, but the encoder's output bitrate can be limited via RTCRtpSender setParameters.
The callback interface PeerConnectionObserver:
The signaling state transitions are stable -> have-local-offer -> stable, or stable -> have-remote-offer -> stable; see SPEC 4.3 State Definitions for the details. Next, let's take a closer look at the transceiver.
The m section of an SDP contains a line a=mid:, which defines the id of that media, called the mid; take the following pair of offer and answer, for example:
# offer
...
a=group:BUNDLE 0 1 2
...
m=video 9 UDP/TLS/RTP/SAVPF 100 96 97 98 99 101 127 124 125
...
a=mid:0
...
m=audio 9 UDP/TLS/RTP/SAVPF 111 103 104 9 102 0 8 106 105 13 110 112 113 126
...
a=mid:1
...
m=application 9 DTLS/SCTP 5000
...
a=mid:2
...
# answer
...
a=group:BUNDLE 0 1 2
...
m=video 9 UDP/TLS/RTP/SAVPF 100 96 97 98 99 101 127 124
...
a=mid:0
...
m=audio 9 UDP/TLS/RTP/SAVPF 103 104 9 102 0 8 106 105 13 110 112 113 126
...
a=mid:1
...
m=application 9 DTLS/SCTP 5000
...
a=mid:2
...
There are three media here: video, audio and application, with mids 0, 1 and 2; application is the media type of the DataChannel.
Notice that the same kind of media has the same mid in the offer and the answer; in other words, for either endpoint, the same media it sends and receives shares one mid.
In the WebRTC standard, a transceiver is exactly the combination of the sender and receiver that share one mid; it carries fields such as media type, mid, direction, sender and receiver. The direction takes one of the values kSendRecv, kSendOnly, kRecvOnly, kInactive.
What we add with AddTrack is a local track, i.e. a stream to send; the first AddTrack creates a transceiver whose direction defaults to kSendRecv. Although CreateOffer lets us control whether to receive via the offer_to_receive_X fields of RTCOfferAnswerOptions, those are legacy fields and we should avoid them. So how do we control a transceiver's direction? With the AddTransceiver interface.
To create a kSendOnly transceiver, pass in the track and set direction to kSendOnly in the RtpTransceiverInit, or pass only the media type and the init structure and AddTrack later. To create a kRecvOnly transceiver, pass only the media type and the init structure, and do not AddTrack.
When does a transceiver get associated with an m section of the SDP? The offer side, when creating the offer, builds an m section per existing transceiver and records each transceiver's m section index in the SDP, so that SetLocalDescription can give the transceiver the right mid. The answer side, in SetRemoteDescription (on the offer sent by the other end), checks each m section: if it has a recv direction, an existing transceiver is looked up by media type; if one is found it is associated, otherwise a kRecvOnly transceiver is created (at that point the offer can only have been kSendOnly; a media that is neither sent nor received would not appear in the SDP, so the only possible response is kRecvOnly).
To sum up: on both the offer side and the answer side, only media you need to send require a transceiver with a send direction added in advance; receive-only media need no transceiver added beforehand (one added in advance would not be used anyway). A sketch of the two AddTransceiver patterns follows.
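For illustration, a minimal sketch of the two patterns in C++, assuming an already-created PeerConnectionInterface pc and a local_video_track; the names follow api/peer_connection_interface.h at the time of writing, but treat the exact signatures as approximate rather than authoritative:

// Send-only video: pass the local track plus an init structure with an
// explicit direction, instead of relying on the kSendRecv default.
webrtc::RtpTransceiverInit send_init;
send_init.direction = webrtc::RtpTransceiverDirection::kSendOnly;
auto send_result = pc->AddTransceiver(local_video_track, send_init);

// Receive-only audio: pass only the media type plus the init structure,
// and do not AddTrack afterwards.
webrtc::RtpTransceiverInit recv_init;
recv_init.direction = webrtc::RtpTransceiverDirection::kRecvOnly;
auto recv_result = pc->AddTransceiver(cricket::MEDIA_TYPE_AUDIO, recv_init);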
This article is an original by Piasy, published on https://blog.piasy.com; please read the original to support the author: https://blog.piasy.com/2019/01/01/WebRTC-RTP-Mux-Demux/
In the earlier post on adding video rotation to janus-pp-rec I briefly introduced parts of the RTP protocol, focusing on the video-orientation RTP header extension. This time we dig deeper into RTP and look at how H.264 video data is packetized and depacketized.
Let's start with the RFCs relevant to RTP H.264; what follows summarizes two of them: RTP: A Transport Protocol for Real-Time Applications, and RTP Payload Format for H.264 Video.
The header has a fixed 12-byte part, plus optional csrc and ext data (covered in more detail in the janus-pp-rec post):
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|V=2|P|X|  CC   |M|     PT      |       sequence number         |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                           timestamp                           |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|           synchronization source (SSRC) identifier            |
+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
|            contributing source (CSRC) identifiers             |
|                             ....                              |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
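A minimal sketch of reading the fixed 12-byte header shown above; ParseFixedHeader is a hypothetical helper, far simpler than WebRTC's RtpPacket::ParseBuffer:

#include <cstddef>
#include <cstdint>
#include <optional>

struct RtpHeader {
  uint8_t version, payload_type, csrc_count;
  bool padding, extension, marker;
  uint16_t sequence_number;
  uint32_t timestamp, ssrc;
};

std::optional<RtpHeader> ParseFixedHeader(const uint8_t* p, size_t len) {
  if (len < 12) return std::nullopt;
  RtpHeader h;
  h.version = p[0] >> 6;       // must be 2
  h.padding = (p[0] >> 5) & 1;
  h.extension = (p[0] >> 4) & 1;
  h.csrc_count = p[0] & 0x0F;  // number of 32-bit CSRC words that follow
  h.marker = p[1] >> 7;
  h.payload_type = p[1] & 0x7F;
  h.sequence_number = static_cast<uint16_t>((p[2] << 8) | p[3]);
  h.timestamp = (uint32_t{p[4]} << 24) | (p[5] << 16) | (p[6] << 8) | p[7];
  h.ssrc = (uint32_t{p[8]} << 24) | (p[9] << 16) | (p[10] << 8) | p[11];
  if (h.version != 2 || len < 12 + 4u * h.csrc_count) return std::nullopt;
  return h;
}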
After the header comes the payload; its length is whatever remains of the packet after the header (minus any padding indicated by the P bit). The payload format is defined separately by each profile, and the profile's payload type value is negotiated via SDP.
Now let's look at the H.264 payload format.
The first byte of an H.264 payload has the same layout as a NAL header; its type values are defined as follows:
Table 1. Summary of NAL unit types and the corresponding packet
types
NAL Unit Packet Packet Type Name Section
Type Type
-------------------------------------------------------------
0 reserved -
1-23 NAL unit Single NAL unit packet 5.6
24 STAP-A Single-time aggregation packet 5.7.1
25 STAP-B Single-time aggregation packet 5.7.1
26 MTAP16 Multi-time aggregation packet 5.7.2
27 MTAP24 Multi-time aggregation packet 5.7.2
28 FU-A Fragmentation unit 5.8
29 FU-B Fragmentation unit 5.8
30-31 reserved -
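Since the first payload byte shares the NAL header layout (F, NRI, Type), the packet type above can be read with a few bit operations; a small illustrative sketch (Classify is a hypothetical helper, not WebRTC code):

#include <cstdint>

enum class H264PacketType { kSingleNalu, kStapA, kFuA, kOther };

H264PacketType Classify(uint8_t first_payload_byte) {
  const uint8_t type = first_payload_byte & 0x1F;  // low 5 bits: packet type
  // bool forbidden = first_payload_byte & 0x80;   // F bit, must be 0
  // uint8_t nri = (first_payload_byte >> 5) & 3;  // NRI, importance hint
  if (type >= 1 && type <= 23) return H264PacketType::kSingleNalu;
  if (type == 24) return H264PacketType::kStapA;
  if (type == 28) return H264PacketType::kFuA;
  return H264PacketType::kOther;
}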
H.264 payloads can be packetized in three modes: Single NAL unit mode (0), Non-interleaved mode (1), and Interleaved mode (2). The types each mode allows are listed below:
Table 3. Summary of allowed NAL unit types for each packetization
mode (yes = allowed, no = disallowed, ig = ignore)
Payload Packet Single NAL Non-Interleaved Interleaved
Type Type Unit Mode Mode Mode
-------------------------------------------------------------
0 reserved ig ig ig
1-23 NAL unit yes yes no
24 STAP-A no yes no
25 STAP-B no no yes
26 MTAP16 no no yes
27 MTAP24 no no yes
28 FU-A no yes yes
29 FU-B no no yes
30-31 reserved ig ig ig
Note: when encoding H.264, WebRTC on iOS uses Non-interleaved mode for both baseline and high profile, and WebRTC on Android does the same.
So WebRTC actually uses only three packet types: NAL unit, STAP-A, and FU-A. Let's look at these three in turn.
If the type is in [1, 23], the RTP packet contains exactly one NALU:
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|F|NRI|  Type   |                                               |
+-+-+-+-+-+-+-+-+                                               |
|                                                               |
|               Bytes 2..n of a single NAL unit                 |
|                                                               |
|                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                               :...OPTIONAL RTP padding        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Figure 2. RTP payload format for single NAL unit packet
To cope with the huge MTU difference between wired and wireless networks, the RTP payload format defines an aggregation strategy:
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|F|NRI|  Type   |                                               |
+-+-+-+-+-+-+-+-+                                               |
|                                                               |
|             one or more aggregation units                     |
|                                                               |
|                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                               :...OPTIONAL RTP padding        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Figure 3. RTP payload format for aggregation packets
A STAP-A example:
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                          RTP Header                           |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|STAP-A NAL HDR |         NALU 1 Size           | NALU 1 HDR    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                         NALU 1 Data                           |
:                                                               :
+               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|               |         NALU 2 Size           | NALU 2 HDR    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                         NALU 2 Data                           |
:                                                               :
|                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                               :...OPTIONAL RTP padding        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Figure 7. An example of an RTP packet including an STAP-A
containing two single-time aggregation units
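A sketch of walking the aggregation units inside a STAP-A payload as laid out in Figure 7: after the one-byte STAP-A NAL header come repeated pairs of a 16-bit big-endian size and a NALU. SplitStapA is a hypothetical helper, not WebRTC's implementation:

#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

std::vector<std::pair<const uint8_t*, size_t>> SplitStapA(
    const uint8_t* payload, size_t len) {
  std::vector<std::pair<const uint8_t*, size_t>> nalus;
  size_t offset = 1;  // skip the STAP-A NAL header byte
  while (offset + 2 <= len) {
    const size_t nalu_size = (payload[offset] << 8) | payload[offset + 1];
    offset += 2;
    if (offset + nalu_size > len) return {};  // malformed packet
    nalus.emplace_back(payload + offset, nalu_size);
    offset += nalu_size;
  }
  return nalus;
}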
Implementing packet fragmentation at the application layer, instead of relying on the fragmentation mechanisms of the layers below, has two benefits:
Every fragment carries a sequence number, and the RTP packets of one fragmented NALU must have in-order, consecutive sequence numbers, with no RTP packets of other data interleaved among them. An FU may only fragment a NALU; STAPs and MTAPs must not be fragmented, and FUs must not be nested. FU-A has no DON; FU-B has a DON.
The FU-A format is as follows:
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| FU indicator  |   FU header   |                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+                               |
|                                                               |
|                         FU payload                            |
|                                                               |
|                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                               :...OPTIONAL RTP padding        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Figure 14. RTP payload format for FU-A
The FU header format is as follows:
+---------------+
|0|1|2|3|4|5|6|7|
+-+-+-+-+-+-+-+-+
|S|E|R| Type |
+---------------+
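A sketch of the FU-A bookkeeping implied by the two headers above: the S and E bits mark the first and last fragment, and the original NAL header byte is rebuilt from the FU indicator's F/NRI bits plus the FU header's Type. ParseFuA is a hypothetical helper:

#include <cstdint>

struct FuA {
  bool start, end;
  uint8_t nal_header;  // reconstructed first byte of the original NALU
};

FuA ParseFuA(uint8_t fu_indicator, uint8_t fu_header) {
  FuA fu;
  fu.start = fu_header & 0x80;  // S bit: first fragment of the NALU
  fu.end = fu_header & 0x40;    // E bit: last fragment of the NALU
  fu.nal_header = (fu_indicator & 0xE0) | (fu_header & 0x1F);
  return fu;
}
// The receiver prepends fu.nal_header to the concatenated FU payloads of
// the fragments from S through E to recover the complete NAL unit.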
With the theory covered, let's see how WebRTC implements it: the logic that packs video data into RTP packets lives in the RTPSenderVideo::SendVideo function.
RTPSenderVideo::SendVideo
Packetization boils down to computing how many packets one frame needs and how much payload each packet carries; for that we need to know, for each packetization mode, the maximum payload of each packet (packet size minus header size).
First, compute a packet's maximum capacity, i.e. the space available for the RTP header plus payload, with the overhead of FEC and retransmission excluded:
// Maximum size of packet including rtp headers.
// Extra space left in case packet will be resent using fec or rtx.
int packet_capacity = rtp_sender_->MaxRtpPacketSize() - fec_packet_overhead -
(rtp_sender_->RtxStatus() ? kRtxHeaderSize : 0);
rtp_sender_->MaxRtpPacketSize defaults to 1460, but is set to 1200 when video needs to be sent. (why ???)
Next, four packet templates are prepared:
- single_packet: for NAL unit and STAP-A packets;
- first_packet: for the first packet of an FU-A;
- middle_packet: for the middle packets of an FU-A;
- last_packet: for the final packet of an FU-A.
The preparation consists of:
- RTPSender::AllocatePacket sets the ssrc and csrcs fields, reserves space for the AbsoluteSendTime, TransmissionOffset and TransportSequenceNumber extensions, and sets the PlayoutDelayLimits and RtpMid extensions as needed;
- RTPSenderVideo::SendVideo sets the payload_type, rtp_timestamp and capture_time_ms fields;
- AddRtpHeaderExtensions sets the VideoOrientation, VideoContentTypeExtension, VideoTimingExtension and RtpGenericFrameDescriptorExtension extensions as needed;
- first_packet, middle_packet and last_packet are all copied from single_packet, so the code only calls AddRtpHeaderExtensions to set their extensions.
These templates serve two purposes: they can be used directly when packetizing later, and they tell us exactly how much space the RTP header needs, just as the comment in the code says:
Simplest way to estimate how much extensions would occupy is to set them.
Knowing how much space each packet type's header needs, we know the maximum payload each packet can carry (these values populate the fields of RtpPacketizer::PayloadSizeLimits; the sketch after this list restates the relationships):
- max_payload_len: the maximum payload space: the packet's maximum capacity minus the middle-packet header size;
- single_packet_reduction_len: when everything is packed into a single packet, the payload space is max_payload_len minus a further discount, the difference between the single-packet and middle-packet header sizes; i.e. the maximum capacity minus the single-packet header size;
- first_packet_reduction_len: when several packets are needed, the first packet's payload space is also discounted from max_payload_len, by the difference between the first-packet and middle-packet header sizes;
- last_packet_reduction_len: likewise, the last packet's payload space is discounted from max_payload_len by the difference between the last-packet and middle-packet header sizes.
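A minimal sketch restating those relationships; only the PayloadSizeLimits field names come from WebRTC's RtpPacketizer, while ComputeLimits and the header-size parameters are illustrative assumptions:

struct PayloadSizeLimits {
  int max_payload_len = 0;
  int single_packet_reduction_len = 0;
  int first_packet_reduction_len = 0;
  int last_packet_reduction_len = 0;
};

// packet_capacity is the value computed above; the header sizes are the
// per-template RTP header sizes measured from the packet templates.
PayloadSizeLimits ComputeLimits(int packet_capacity, int single_header,
                                int first_header, int middle_header,
                                int last_header) {
  PayloadSizeLimits limits;
  limits.max_payload_len = packet_capacity - middle_header;
  limits.single_packet_reduction_len = single_header - middle_header;
  limits.first_packet_reduction_len = first_header - middle_header;
  limits.last_packet_reduction_len = last_header - middle_header;
  return limits;
}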
With the templates ready and the limits known, an RtpPacketizer is created. Its NumPackets interface tells us how many packets this frame needs to be packed into, and its NextPacket call assembles each packet. Calling NextPacket is not quite the end of it: RTPSender::AssignSequenceNumber still has to be called to assign the sequence number, and if VideoTimingExtension is to be set, packetization_finish_time_ms must be set as well. Finally, the packet goes through FEC processing, or is sent out directly as an RTP packet via RTPSenderVideo::SendVideoPacket.
When the video codec is H.264, the RtpPacketizer implementation class is RtpPacketizerH264; next, let's look at the H.264 packetization logic.
RtpPacketizerH264
When constructed, RtpPacketizerH264 uses the contents of RTPFragmentationHeader to build input_fragments_, an array of RtpPacketizerH264::Fragment; each Fragment holds a pointer to the first byte of a NALU's payload and the NALU's length.
RTPFragmentationHeader is really just the per-NALU information of this frame: each payload's offset within the buffer and its length. This information is generated right after the encoder outputs its data, by scanning the whole buffer for NALU start codes (001 or 0001) and recording each NALU's offset and length. The Android implementation is in VideoEncoderWrapper::ParseFragmentationHeader in sdk/android/src/jni/videoencoderwrapper.cc, and the iOS one is in H264CMSampleBufferToAnnexBBuffer in sdk/objc/components/video_codec/nalu_rewriter.cc.
The H.264 spec allows one picture to be split into multiple NALUs, but from what I observed of the data an iPhone 6 encodes, non-key frames have only one NALU while key frames have two, with SPS and PPS prepended, so a key frame ends up with four NALUs.
With input_fragments_ in hand, GeneratePackets iterates over it and, for each Fragment, runs different packetization logic depending on packetization_mode:
- If the mode is SingleNalUnit, one PacketUnit is generated for this Fragment (which is in fact a single NALU);
- If it is NonInterleaved (the mode the WebRTC native SDKs actually use), check whether this Fragment fits into a single packet, first computing how much data a single packet can hold:
int single_packet_capacity = limits_.max_payload_len;
if (input_fragments_.size() == 1)
single_packet_capacity -= limits_.single_packet_reduction_len;
else if (i == 0)
single_packet_capacity -= limits_.first_packet_reduction_len;
else if (i + 1 == input_fragments_.size())
single_packet_capacity -= limits_.last_packet_reduction_len;
What remains of max_payload_len after the applicable reductions is single_packet_capacity. If fragment_len > single_packet_capacity, the Fragment cannot fit into a single packet, so fragmentation is needed, i.e. PacketizeFuA is called; otherwise it fits into one packet, so aggregation is possible, i.e. PacketizeStapA is called.
PacketizeFuA works out how to split one Fragment into multiple packets and generates each PacketUnit. The splitting logic is implemented in the SplitAboutEqually function, which handles quite a few edge cases; the general idea is to put the data into as few packets as possible while keeping the packet sizes as close to each other as possible. The PacketUnits it generates all have their aggregated field set to false.
PacketizeStapA instead works out how many Fragments can go into one packet. It also generates one PacketUnit per Fragment, but increments num_packets_left_ only once; of the PacketUnits it generates, all have aggregated set to true except the last one, which is false.
When GeneratePackets finishes, num_packets_left_ has been computed, i.e. how many RTP packets this frame needs, and the PacketUnit array is ready.
Afterwards, RTPSenderVideo::SendVideo calls NextPacket num_packets_left_ times to actually assemble each RTP packet. Let's look at the logic of NextPacket:
- If the current PacketUnit has both first_fragment and last_fragment set to true, the payload is copied in directly. Such a packet may be a SingleNalUnit packet, but may also be a NonInterleaved STAP-A packet: in NonInterleaved mode, a Fragment that fits into one packet is packed as STAP-A, and if only one PacketUnit was generated, its first_fragment and last_fragment are both true.
- If the unit's aggregated field is true, NextAggregatePacket is called to build a STAP-A packet. It consumes successive PacketUnits in a loop whose exit condition is !packet->aggregated or packet->last_fragment; since among the series of PacketUnits destined for one packet only the last has last_fragment set to true (this is arranged in PacketizeStapA), the loop exits correctly.
- If the aggregated field is false, NextFragmentPacket is called to build an FU-A packet.
And with that, we have walked through the whole logic of packing H.264 into RTP packets; time for a long, relieved breath :)
Having seen how packetization is implemented, let's look at how depacketization works. It is slightly more complicated than packetization; the crux is that packets may arrive out of order (retransmission after loss can also be regarded as a kind of reordering).
Depacketization consists of two big steps: first, parse out the RTP header and payload; then parse the payload and, according to the packetization mode, invert the packetization to recover one complete frame of data. The former is implemented in RtpPacket::ParseBuffer, called from Call::DeliverRtp; the latter is more involved, because it has to handle reordering, and its logical starting point is the RtpVideoStreamReceiver::ReceivePacket function.
RtpPacket::ParseBuffer
ParseBuffer has three tasks: parse the fixed header fields such as payload_type_, sequence_number_, timestamp_ and ssrc_; locate the header extensions; and locate the payload within the buffer.
RtpVideoStreamReceiver::ReceivePacket
First, an RtpDepacketizer matching the payload type is created to parse the payload content. The H.264 parsing logic is implemented in RtpDepacketizerH264::Parse, whose main task is to find the position and size of the actual data: FU-A NALUs are handled in ParseFuaNalu, the other two kinds in ProcessStapAOrSingleNalu. Then the actual data of the RTP header extensions, such as VideoOrientation, is parsed.
Finally a VCMPacket is constructed and put into the packet buffer via PacketBuffer::InsertPacket.
PacketBuffer::InsertPacket
PacketBuffer encapsulates the logic for handling out-of-order arrival of RTP packets. The broad idea:
- store the packet in the data_buffer_ array, and record some attributes for its sequence number in the sequence_buffer_ array;
- run FindFrames to look for complete frames among the packets received so far;
- hand complete frames to RtpFrameReferenceFinder::ManageFrame, which makes sure a frame is decodable before calling it back out into the subsequent decoding stage.
PacketBuffer::FindFrames
Every received packet triggers FindFrames, which searches onward from the sequence number of the packet just received: the end of a frame is detected via the packet->is_last_packet_in_frame flag; for the start of a frame, VP8/VP9 rely on the frame_begin flag (i.e. packet->is_first_packet_in_frame), whereas H.264 relies on a change of timestamp.
RtpFrameReferenceFinder::ManageFrame
The frames parsed out of the payloads are complete frames, but not necessarily decodable: H.264 has forward references (a P frame needs the preceding I frame to decode) and backward references (a B frame needs the preceding I/P frame and the following P frame to decode), so a frame can only be released after all of its reference frames have arrived.
Although PacketBuffer solves the out-of-order arrival of RTP packets and outputs complete frames one by one, it does not guarantee that the frames themselves are in order, so RtpFrameReferenceFinder is still needed to handle frames arriving out of order.
I won't expand on the code details of RtpFrameReferenceFinder here; interested readers can read it on their own.
And with that we have walked through the logic of unpacking H.264 from RTP packets too; time for another long, relieved breath :)
Finally, let's summarize the data structures involved in WebRTC's RTP packing and unpacking:
- RtpPacket: the RTP packet data structure, defining the standard header fields, header extensions, data buffer and so on;
- RtpPacketToSend: the structure used for packing on the sending side, derived from RtpPacket, adding wrappers for setting extension headers;
- RtpPacketReceived: the structure used for unpacking on the receiving side, also derived from RtpPacket, adding wrappers for reading extension headers.
Last of all, one more thing to share: the sequence number comparison algorithm.
Because sequence numbers can wrap around, they cannot be compared directly; an RFC defines the comparison algorithm: Serial Number Arithmetic.
The RFC first defines serial numbers: n-bit unsigned integers, with the lowest serial number 0 and the highest 2^n - 1. Serial numbers have no maximum or minimum value, and each serial number needs at least n bits of storage.
It then defines serial number addition: for a valid serial number s in [0, 2^n - 1], adding m gives (s + m) % (2^n), where the addition and modulo are the ordinary operations.
Finally it defines the comparison algorithm (for rigor the RFC introduces two auxiliary ordinary integers, which we omit here for simplicity):
- s and s+m (with m an ordinary positive integer) are equal only when m is 0. In other words, given two serial number values alone, equality cannot be decided; but usually we do not need equality, only ordering.
- Serial number s1 is less than s2 when (s1 < s2 && s2 - s1 < 2^(n-1)) || (s1 > s2 && s1 - s2 > 2^(n-1)): smaller by less than half the range, or larger by more than half the range. For example, with n=3: 2-1 < 4, so 1 is less than 2; 7-2 > 4, so 7 is less than 2.
- Serial number s1 is greater than s2 when (s1 < s2 && s2 - s1 > 2^(n-1)) || (s1 > s2 && s1 - s2 < 2^(n-1)): smaller by more than half the range, or larger by less than half the range. For example, with n=3: 7-2 > 4, so 2 is greater than 7; 2-1 < 4, so 2 is greater than 1.
A careful reader might bring up an example: which of 7 and 3 is larger? They actually cannot be ordered, just as 3 and 3 cannot be shown equal. The RFC deliberately declines to define ordering for such pairs, because it genuinely is hard to define.
WebRTC's implementation lives mainly in rtc_base/numerics/sequence_number_util.h and rtc_base/numerics/mod_ops.h:
template <typename T>
inline bool AheadOf(T a, T b) {
  static_assert(std::is_unsigned<T>::value,
                "Type must be an unsigned integer.");
  return a != b && AheadOrAt(a, b);
}
template <typename T, T M>
inline typename std::enable_if<(M == 0), bool>::type AheadOrAt(T a, T b) {
  static_assert(std::is_unsigned<T>::value,
                "Type must be an unsigned integer.");
  const T maxDist = std::numeric_limits<T>::max() / 2 + T(1);
  if (a - b == maxDist)
    return b < a;
  return ForwardDiff(b, a) < maxDist;
}
template <typename T, T M>
inline typename std::enable_if<(M == 0), T>::type ForwardDiff(T a, T b) {
  static_assert(std::is_unsigned<T>::value,
                "Type must be an unsigned integer.");
  return b - a;
}
In essence, by exploiting the overflow of unsigned subtraction, it unifies the two OR'd cases the RFC defines, and resolves the case the RFC leaves undefined as an ordinary value comparison.
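For illustration, a small usage sketch of AheadOf with 16-bit RTP sequence numbers, including the wrap-around case (assuming, as in sequence_number_util.h, that these helpers live in namespace webrtc):

#include <cassert>
#include <cstdint>

void SequenceNumberExamples() {
  assert(webrtc::AheadOf<uint16_t>(2, 1));      // 2 is newer than 1
  assert(webrtc::AheadOf<uint16_t>(0, 65535));  // 0 is newer: the counter wrapped
  assert(!webrtc::AheadOf<uint16_t>(65535, 0));
}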