from: http://blog.csdn.net/onlycoder_net/article/details/76702432
v=0
// SDP version number; always 0, as specified by RFC 4566.
o=- 7017624586836067756 2 IN IP4 127.0.0.1
// RFC 4566: o=<username> <sess-id> <sess-version> <nettype> <addrtype> <unicast-address>
// username is "-" when not used; 7017624586836067756 is the id of the whole session, and 2 is the
// session version. If the SDP is regenerated during the session (e.g. after changing codecs),
// sess-id stays the same and sess-version is incremented by 1.
s=-
// Session name; "-" when there is none.
t=0 0
// The two values are the start and stop times of the session; 0 0 means no restriction.
a=group:BUNDLE audio video data
// Media that share a single transport channel. Without this line, audio, video and data would
// each be sent on their own UDP port.
a=msid-semantic: WMS h1aZ20mbQB0GSsq0YxLfJmiYWE9CBfGch97C
// WMS is short for WebRTC Media Stream. This line declares that the client can carry several
// streams at once; one stream can contain several tracks. When it is present, the a=ssrc lines
// below usually carry msid, mslabel and similar attributes.
m=audio 9 UDP/TLS/RTP/SAVPF 111 103 104 9 0 8 106 105 13 126
// m=audio: the session contains audio. 9 is the nominal transport port, which WebRTC generally
// no longer uses (ICE negotiates the real transport); a port of 0 would mean audio is not
// transmitted. UDP/TLS/RTP/SAVPF is the transport profile: RTP packets carried over UDP and keyed
// via DTLS (TLS), with SAVPF adding the secure RTCP-based feedback mechanism that controls the
// session. The trailing list 111 103 104 9 0 8 106 105 13 126 gives the audio payload types this
// session supports; the lines further down describe each of them in detail.
c=IN IP4 0.0.0.0
// The IP address you would use to send or receive audio. WebRTC transports media via ICE and
// does not use this address.
a=rtcp:9 IN IP4 0.0.0.0
// Address and port for RTCP; not used in WebRTC either.
a=ice-ufrag:khLS
a=ice-pwd:cxLzteJaJBou3DspNaPsJhlQ
// The two lines above are the credentials (username fragment and password) used for security
// checks during ICE negotiation.
a=fingerprint:sha-256 FA:14:42:3B:C7:97:1B:E8:AE:0C2:71:03:05:05:16:8F:B9:C7:98:E9:60:43:4B:5B:2C:28:EE:5C:8F3:17
// The line above is the certificate fingerprint needed to authenticate the DTLS handshake.
a=setup:actpass
// The line above says this client can act as either client or server in the DTLS handshake;
// see RFC 4145 and RFC 4572.
a=mid:audio
// The media identifier referenced in the BUNDLE line above.
a=extmap:1 urn:ietf:params:rtp-hdrext:ssrc-audio-level
// The line above asks for the audio-level RTP header extension; see RFC 6464.
a=sendrecv
// The line above declares bidirectional media; the other possible values are recvonly,
// sendonly and inactive.
a=rtcp-mux
// The line above declares that RTP and RTCP packets are multiplexed over the same port.
// The following lines supplement the m=audio line, giving each payload type its codec,
// sample rate, channel count and so on.
a=rtpmap:111 opus/48000/2
a=rtcp-fb:111 transport-cc
// The line above says the opus payload supports RTCP-based congestion control; see
// https://tools.ietf.org/html/draft-holmer-rmcat-transport-wide-cc-extensions-01
a=fmtp:111 minptime=10;useinbandfec=1
// Optional parameters for opus: minptime=10 sets the minimum packet time to 10 ms;
// useinbandfec=1 enables opus's built-in in-band FEC.
a=rtpmap:103 ISAC/16000
a=rtpmap:104 ISAC/32000
a=rtpmap:9 G722/8000
a=rtpmap:0 PCMU/8000
a=rtpmap:8 PCMA/8000
a=rtpmap:106 CN/32000
a=rtpmap:105 CN/16000
a=rtpmap:13 CN/8000
a=rtpmap:126 telephone-event/8000
a=ssrc:18509423 cname:sTjtznXLCNH7nbRw
// cname identifies a media source. The ssrc may change when a collision occurs, but the cname
// stays stable; it also appears in RTCP SDES packets and is used for audio/video synchronization.
a=ssrc:18509423 msid:h1aZ20mbQB0GSsq0YxLfJmiYWE9CBfGch97C 15598a91-caf9-4fff-a28f-3082310b2b7a
// The line above ties the ssrc to the WebRTC MediaStream and AudioTrack: the first value after
// msid is the stream-id, the second the track-id.
a=ssrc:18509423 mslabel:h1aZ20mbQB0GSsq0YxLfJmiYWE9CBfGch97C
a=ssrc:18509423 label:15598a91-caf9-4fff-a28f-3082310b2b7a
m=video 9 UDP/TLS/RTP/SAVPF 100 101 107 116 117 96 97 99 98
// See m=audio above; the meaning is analogous.
c=IN IP4 0.0.0.0
a=rtcp:9 IN IP4 0.0.0.0
a=ice-ufrag:khLS
a=ice-pwd:cxLzteJaJBou3DspNaPsJhlQ
a=fingerprint:sha-256 FA:14:42:3B:C7:97:1B:E8:AE:0C2:71:03:05:05:16:8F:B9:C7:98:E9:60:43:4B:5B:2C:28:EE:5C:8F3:17
a=setup:actpass
a=mid:video
a=extmap:2 urn:ietf:params:rtp-hdrext:toffset
a=extmap:3 http://www.webrtc.org/experiments/rtp-hdrext/abs-send-time
a=extmap:4 urn:3gpp:video-orientation
a=extmap:5 http://www.ietf.org/id/draft-hol ... de-cc-extensions-01
a=extmap:6 http://www.webrtc.org/experiments/rtp-hdrext/playout-delay
a=sendrecv
a=rtcp-mux
a=rtcp-rsize
a=rtpmap:100 VP8/90000
a=rtcp-fb:100 ccm fir
// ccm is short for "codec control using RTCP feedback messages", i.e. the encoder can be steered
// via RTCP; fir is Full Intra Request: the receiver asks the sender to send a complete intra frame.
a=rtcp-fb:100 nack
// Packet-loss retransmission (NACK) is supported; see RFC 4585.
a=rtcp-fb:100 nack pli
// NACK at picture level (Picture Loss Indication), i.e. key-frame recovery on loss; see RFC 4585.
a=rtcp-fb:100 goog-remb
// The sender's bitrate can be capped via RTCP (goog-remb, Receiver Estimated Maximum Bitrate).
a=rtcp-fb:100 transport-cc
// Same as for opus above.
a=rtpmap:101 VP9/90000
a=rtcp-fb:101 ccm fir
a=rtcp-fb:101 nack
a=rtcp-fb:101 nack pli
a=rtcp-fb:101 goog-remb
a=rtcp-fb:101 transport-cc
a=rtpmap:107 H264/90000
a=rtcp-fb:107 ccm fir
a=rtcp-fb:107 nack
a=rtcp-fb:107 nack pli
a=rtcp-fb:107 goog-remb
a=rtcp-fb:107 transport-cc
a=fmtp:107 level-asymmetry-allowed=1;packetization-mode=1;profile-level-id=42e01f
// Optional additional parameters for the H264 payload.
a=rtpmap:116 red/90000
// RED redundant coding for FEC. When this line is present the RTP payload type is usually 116;
// otherwise each codec's native payload type is used.
a=rtpmap:117 ulpfec/90000
// ULP FEC is supported; see RFC 5109.
a=rtpmap:96 rtx/90000
a=fmtp:96 apt=100
// The two lines above define the RTP payload type for VP8 retransmission (RTX) packets.
a=rtpmap:97 rtx/90000
a=fmtp:97 apt=101
a=rtpmap:99 rtx/90000
a=fmtp:99 apt=107
a=rtpmap:98 rtx/90000
a=fmtp:98 apt=116
a=ssrc-group:FID 3463951252 1461041037
// In WebRTC, retransmission packets use a different ssrc from regular packets; in the line above
// the first value is the ssrc of the regular RTP packets, the second that of the retransmissions.
a=ssrc:3463951252 cname:sTjtznXLCNH7nbRw
a=ssrc:3463951252 msid:h1aZ20mbQB0GSsq0YxLfJmiYWE9CBfGch97C ead4b4e9-b650-4ed5-86f8-6f5f5806346d
a=ssrc:3463951252 mslabel:h1aZ20mbQB0GSsq0YxLfJmiYWE9CBfGch97C
a=ssrc:3463951252 label:ead4b4e9-b650-4ed5-86f8-6f5f5806346d
a=ssrc:1461041037 cname:sTjtznXLCNH7nbRw
a=ssrc:1461041037 msid:h1aZ20mbQB0GSsq0YxLfJmiYWE9CBfGch97C ead4b4e9-b650-4ed5-86f8-6f5f5806346d
a=ssrc:1461041037 mslabel:h1aZ20mbQB0GSsq0YxLfJmiYWE9CBfGch97C
a=ssrc:1461041037 label:ead4b4e9-b650-4ed5-86f8-6f5f5806346d
m=application 9 DTLS/SCTP 5000
c=IN IP4 0.0.0.0
a=ice-ufrag:khLS
a=ice-pwd:cxLzteJaJBou3DspNaPsJhlQ
a=fingerprint:sha-256 FA:14:42:3B:C7:97:1B:E8:AE:0C2:71:03:05:05:16:8F:B9:C7:98:E9:60:43:4B:5B:2C:28:EE:5C:8F3:17
a=setup:actpass
a=mid:data
a=sctpmap:5000 webrtc-datachannel 1024
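As a small illustration of how mechanical the format is, here is a minimal sketch that pulls the payload-type-to-codec mapping out of an SDP blob like the one above; ParseRtpmaps is a hypothetical helper, not WebRTC's parser:

#include <cstdio>
#include <map>
#include <sstream>
#include <string>

std::map<int, std::string> ParseRtpmaps(const std::string& sdp) {
  std::map<int, std::string> codecs;
  std::istringstream lines(sdp);
  std::string line;
  while (std::getline(lines, line)) {
    // Every rtpmap line has the shape "a=rtpmap:<pt> <codec>/<clock>[/<channels>]".
    const std::string prefix = "a=rtpmap:";
    if (line.compare(0, prefix.size(), prefix) != 0) continue;
    std::istringstream rest(line.substr(prefix.size()));
    int pt = 0;
    std::string codec;
    if (rest >> pt >> codec) codecs[pt] = codec;
  }
  return codecs;
}

int main() {
  auto codecs = ParseRtpmaps("a=rtpmap:111 opus/48000/2\r\na=rtpmap:0 PCMU/8000\r\n");
  for (const auto& kv : codecs)
    std::printf("%d -> %s\n", kv.first, kv.second.c_str());  // 0 -> PCMU/8000, 111 -> opus/48000/2
}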
A reposted article:
SDP—Session Description Protocol
The Session Description Protocol, defined by RFC 2327 [1], was developed by the IETF MMUSIC working group. It is more of a description syntax than a protocol, in that it does not provide a full-range media negotiation capability. The original purpose of SDP was to describe multicast sessions set up over the Internet's multicast backbone, the MBONE. The first application of SDP was by the experimental Session Announcement Protocol (SAP) [2] used to post and retrieve announcements of MBONE sessions. SAP messages carry an SDP message body, and this usage was the template for SIP's use of SDP. Even though it was designed for multicast, SDP has been applied to the more general problem of describing general multimedia sessions established using SIP.
As seen in the examples of Chapter 3, SDP contains the following information about the media session:
IP Address (IPv4 address or host name);
Port number (used by UDP or TCP for transport);
Media type (audio, video, interactive whiteboard, and so forth);
Media encoding scheme (PCM A-Law, MPEG II video, and so forth).
In addition, SDP contains information about the following:
Subject of the session;
Start and stop times;
Contact information about the session.
Like SIP, SDP uses text coding. An SDP message is composed of a series of lines, called fields, whose names are abbreviated by a single lower-case letter, and are in a required order to simplify parsing. The set of mandatory SDP fields is shown in Table 2.1. The complete set is shown in Table 7.1.
Field | Name | Mandatory/Optional
---|---|---
v= | Protocol version number | m
o= | Owner/creator and session identifier | m
s= | Session name | m
i= | Session information | o
u= | Uniform Resource Identifier | o
e= | Email address | o
p= | Phone number | o
c= | Connection information | m
b= | Bandwidth information | o
t= | Time session starts and stops | m
r= | Repeat times | o
z= | Time zone corrections | o
k= | Encryption key | o
a= | Attribute lines | o
m= | Media information | o
a= | Media attributes | o
SDP was not designed to be easily extensible, and parsing rules are strict. The only way to extend or add new capabilities to SDP is to define a new attribute type. However, unknown attribute types can be silently ignored. An SDP parser must not ignore an unknown field, a missing mandatory field, or an out-of-sequence line. An example SDP message containing many of the optional fields is shown here:
v=0
o=johnston 2890844526 2890844526 IN IP4 43.32.1.5
s=SIP Tutorial
i=This broadcast will cover this new IETF protocol
u=http://www.digitalari.com/sip
e=Alan Johnston [email protected]
p=+1-314-555-3333 (Daytime Only)
c=IN IP4 225.45.3.56/236
b=CT:144
t=2877631875 2879633673
m=audio 49172 RTP/AVP 0
a=rtpmap:0 PCMU/8000
m=video 23422 RTP/AVP 31
a=rtpmap:31 H261/90000
The general form of an SDP message is:
x=parameter1 parameter2 ... parameterN
The line begins with a single lower-case letter x. There are never any spaces between the letter and the =, and there is exactly one space between each parameter. Each field has a defined number of parameters. Each line ends with a CRLF. The individual fields will now be discussed in detail.
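To make that rule concrete, here is a minimal sketch of splitting a single SDP line into its type letter and value; SplitSdpLine is a hypothetical helper, not part of any SIP/SDP library:

#include <optional>
#include <string>
#include <utility>

// Splits "x=value" into {'x', "value"}; rejects anything that does not match
// the strict form (single lower-case letter, then '=', with no space before it).
std::optional<std::pair<char, std::string>> SplitSdpLine(const std::string& line) {
  if (line.size() < 2 || line[1] != '=' || line[0] < 'a' || line[0] > 'z')
    return std::nullopt;
  return std::make_pair(line[0], line.substr(2));
}
// SplitSdpLine("v=0") yields {'v', "0"}; a malformed "v =0" is rejected.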
The v= field contains the SDP version number. Because the current version of SDP is 0, a valid SDP message will always begin with v=0.
The o= field contains information about the originator of the session and session identifiers. This field is used to uniquely identify the session. The field contains:
o=username session-id version network-type address-type address
The username parameter contains the originator's login or host, or - if none. The session-id parameter is a Network Time Protocol (NTP) [3] timestamp or a random number used to ensure uniqueness. The version is a numeric field that is increased for each change to the session, and is also recommended to be an NTP timestamp. The network-type is always IN for Internet. The address-type parameter is either IP4 or IP6 for IPv4 or IPv6; the address itself may be in dotted-decimal form or a fully qualified host name.
The s= field contains a name for the session. It can contain any non-zero number of characters. The optional i= field contains information about the session. It can contain any number of characters.
The optional u= field contains a uniform resource indicator (URI) with more information about the session.
The optional e= field contains an e-mail address of the host of the session. If a display name is used, the e-mail address is enclosed in <>. The optional p= field contains a phone number. The phone number should be given in globalized format, beginning with a +, then the country code, a space or -, then the local number. Either spaces or - are permitted as spacers in SDP. A comment may be present in ().
The c= field contains information about the media connection. The field contains:
c=network-type address-type connection-address
The network-type parameter is defined as IN for the Internet. The address type is defined as IP4 for IPv4 addresses, IP6 for IPv6 addresses. The connection-address is the IP address that will be sending the media packets, which could be either multicast or unicast. If multicast, the connection-address field contains:
connection-address=base-multicast-address/ttl/number-of-addresses
where ttl is the time-to-live value, and number-of-addresses indicates how many contiguous multicast addresses are included starting with the base-multicast-address.
The optional b= field contains information about the bandwidth required. It is of the form:
b=modifier:bandwidth-value
The modifier is either CT for conference total or AS for application specific. CT is used for a multicast session to specify the total bandwidth that can be used by all participants in the session. AS is used to specify the bandwidth of a single site. The bandwidth-value parameter is the specified number of kilobits per second.
The t= field contains the start time and stop time of the session.
t=start-time stop-time
The times are specified using NTP timestamps. For a scheduled session, a stop-time of zero indicates that the session goes on indefinitely. A start-time and stop-time of zero for a scheduled session indicates that it is permanent. The optional r= field contains repeat times, which can be specified either as NTP timestamps or in days (d), hours (h), or minutes (m). The optional z= field contains time zone offsets. This field is used if a recurring session spans a change from daylight-saving to standard time, or vice versa.
The optional k= field contains the encryption key to be used for the media session. The field contains:
k=method:encryption-key
The method parameter can be clear, base64, uri, or prompt. If the method is prompt, the key will not be carried in SDP; instead, the user will be prompted as they join the encrypted session. Otherwise, the key is sent in the encryption-key parameter.
The optional m= field contains information about the type of media session. The field contains:
m=media port transport format-list
The media parameter is either audio, video, application, data, telephone-event, or control. The port parameter contains the port number. The transport parameter contains the transport protocol, which is either RTP/AVP or udp. (RTP/AVP stands for Real-time Transport Protocol [4] / audio video profiles [5], which is described in Section 7.3.) The format-list contains more information about the media. Usually, it contains media payload types defined in RTP audio video profiles. More than one media payload type can be listed, allowing multiple alternative codecs for the media session. For example, the following media field lists four codecs:
m=audio 49430 RTP/AVP 0 6 8 99
One of these four codecs can be used for the audio media session. If the intention is to establish three audio channels, three separate media fields would be used. For non-RTP media, Internet media types should be listed in the format-list. For example,
m=application 52341 udp wb
could be used to specify the application/wb media type.
The optional a= field contains attributes of the preceding media session. This field can be used to extend SDP to provide more information about the media. If not fully understood by an SDP user, the attribute field can be ignored. There can be one or more attribute fields for each media payload type listed in the media field. For the RTP/AVP example in Section 7.1.10, the following four attribute fields could follow the media field:
a=rtpmap:0 PCMU/8000
a=rtpmap:6 DVI4/16000
a=rtpmap:8 PCMA/8000
a=rtpmap:99 iLBC/8000
Other attributes are shown in Table 7.2. Full details of the use of these attributes are in the standard document [1]. The details of the iLBC (Internet Low Bit Rate) Codec are in [6].
Attribute | Name
---|---
a=rtpmap: | RTP/AVP list
a=cat: | Category of the session
a=keywds: | Keywords of session
a=tool: | Name of tool used to create SDP
a=ptime: | Length of time in milliseconds for each packet
a=recvonly | Receive only mode
a=sendrecv | Send and receive mode
a=sendonly | Send only mode
a=orient: | Orientation for whiteboard sessions
a=type: | Type of conference
a=charset: | Character set used for subject and information fields
a=sdplang: | Language for the session description
a=lang: | Default language for the session
a=framerate: | Maximum video frame rate in frames per second
a=quality: | Suggests quality of encoding
a=fmtp: | Format parameters
a=mid: | Media identification grouping
a=direction: | Direction for symmetric media
a=rtcp: | Explicit RTCP port (and address)
a=inactive | Inactive mode
The use of SDP with SIP is given in the SDP Offer Answer RFC 3264 [7]. The default message body type in SIP is application/sdp. The calling party lists the media capabilities that they are willing to receive in SDP in either an INVITE or in an ACK. The called party lists their media capabilities in the 200 OK response to the INVITE. More generally, offers or answers may be in INVITEs, PRACKs, or UPDATEs or in reliably sent 18x or 200 responses to these methods.
Because SDP was developed with scheduled multicast sessions in mind, many of the fields have little or no meaning in the context of dynamic sessions established using SIP. In order to maintain compatibility with the SDP protocol, however, all required fields are included. A typical SIP use of SDP includes the version, origin, subject, time, connection, and one or more media and attribute fields as shown in Table 2.1. The origin, subject, and time fields are not used by SIP but are included for compatibility. In the SDP standard, the subject field is a required field and must contain at least one character, suggested to be s=- if there is no subject. The time field is usually set to t=0 0.
SIP uses the connection, media, and attribute fields to set up sessions between user agents. Because the type of media session and codec to be used are part of the connection negotiation, SIP can use SDP to specify multiple alternative media types and to selectively accept or decline those media types. When multiple media codecs are listed, the caller and called party's media fields must be aligned—that is, there must be the same number, and they must be listed in the same order. The offer answer specification, RFC 3264 [7], recommends that an attribute containing a=rtpmap: be used for each media field [7]. A media stream is declined by setting the port number to zero for the corresponding media field in the SDP response. In the following example, the caller Tesla wants to set up an audio and video call with two possible audio codecs and a video codec in the SDP carried in the initial INVITE:
v=0
o=Tesla 2890844526 2890844526 IN IP4 lab.high-voltage.org
s=-
c=IN IP4 100.101.102.103
t=0 0
m=audio 49170 RTP/AVP 0 8
a=rtpmap:0 PCMU/8000
a=rtpmap:8 PCMA/8000
m=video 49172 RTP/AVP 32
a=rtpmap:32 MPV/90000
The codecs are referenced by the RTP/AVP profile numbers 0, 8, and 32. The called party Marconi answers the call, chooses the second codec for the first media field and declines the second media field, only wanting a PCM A-Law audio session.
v=0
o=Marconi 2890844526 2890844526 IN IP4 tower.radio.org
s=-
c=IN IP4 200.201.202.203
t=0 0
m=audio 60000 RTP/AVP 8
a=rtpmap:8 PCMA/8000
m=video 0 RTP/AVP 32
If this audio-only call is not acceptable, then Tesla would send an ACK then a BYE to cancel the call. Otherwise, the audio session would be established and RTP packets exchanged. As this example illustrates, unless the number and order of media fields is maintained, the calling party would not know for certain which media sessions were being accepted and declined by the called party.
One party in a call can temporarily place the other on hold (i.e., suspending the media packet sending). This is done by sending an INVITE with identical SDP to that of the original INVITE but with a=sendonly attribute present. The call is made active again by sending another INVITE with the a=sendrecv attribute present. (Note that older RFC 2543 compliant UAs may initiate hold using c=0.0.0.0.) For further examples of SDP use with SIP, see the SDP Offer Answer Examples document [8].
from:https://blog.csdn.net/voipmaker/article/details/6111629
1.1 The crypto attribute
a = crypto:<tag> <crypto-suite> <key-params> [<session-params>]
a=crypto:1 AES_CM_128_HMAC_SHA1_80 inline:d0RmdmcmVCspeEc3QGZiNWpVLFJhQX1cfHAwJSoj|2^20|1:32
tag: used to choose one crypto attribute in the offer/answer exchange
crypto-suite: identifies the encryption and authentication algorithms
key-params: of the form method:info; only one method, "inline", is currently defined, meaning the key itself is the info
session-params: optional session parameters
Reference: https://tools.ietf.org/html/rfc4568#section-4
1.2 The ssrc attribute
a = ssrc:<ssrc-id> <attribute>:<value>
a=ssrc:2 cname:stream_1_cname
a=ssrc:2 label:video_track_id_1
attribute can be: cname (uniquely identifies a client; a client has exactly one cname)
msid
mslabel
label
fmtp
Reference: https://tools.ietf.org/html/rfc5576#section-4
Note: for the label attribute, see https://www.packetizer.com/rfc/rfc4574/
1.3 The ssrc-group attribute
a=ssrc-group:<semantics> <ssrc-id> ...
a=ssrc-group:FEC 2 3
semantics: FID (flow identification), FEC (forward error correction), or SIM (used for simulcast).
FID: indicates that only one codec is in use at any one time; note that an FID group should not use the same port/IP. A practical application of FID is implementing retransmission.
ssrc-id: one or more values, listing every ssrc in the group
Reference: https://tools.ietf.org/html/rfc5576#section-4
Note: for RTX see https://tools.ietf.org/html/rfc4588
1.4 The rtpmap attribute
a=rtpmap:<payload type> <encoding name>/<clock rate> [/<encoding parameters>]
a=rtpmap:120 VP8/90000
payload type: the RTP payload type number
encoding name: the codec name
encoding parameters: for audio this is typically the channel count
(Note: there are two FEC payload types, ulpfec and flexfec; their references are:
ulpfec: https://tools.ietf.org/html/rfc5109
flexfec: https://tools.ietf.org/html/draft-ietf-payload-flexible-fec-scheme-05)
Reference: https://tools.ietf.org/html/rfc4566
1.5 The MediaContentDirection attributes
a=sendrecv
a=recvonly
a=sendonly
a=inactive
Reference: https://tools.ietf.org/html/rfc4566
1.6 The ice-ufrag and ice-pwd attributes
a=ice-ufrag:<ufrag>
a=ice-pwd:<pwd>
a=ice-ufrag:ufrag_video
a=ice-pwd:pwd_video
Username fragment and password used for ICE connectivity checks (hole punching).
Reference: https://tools.ietf.org/html/rfc5245#section-15.4
1.7 The candidate attribute
a=candidate:<foundation> <component-id> <transport> <priority> <connection-address> <port> typ <candidate-type> [raddr <rel-addr>] [rport <rel-port>]
a=candidate:a0+B/4 1 udp 2130706432 74.125.224.39 3457 typ relay generation 2
foundation: used to tell whether two candidates are of the same type, have the same base address and the same STUN server
component-id: starts at 1 and increments; must be 1 for RTP and 2 for RTCP
priority: the candidate priority; RFC 5245 recommends computing it as priority = (2^24)*(type preference) + (2^8)*(local preference) + (256 - component-id)
cand-type: one of "host", "srflx", "prflx", "relay"; srflx is server reflexive, prflx is peer reflexive, relay is a relayed candidate; these are, presumably, the four ways a connection can be made
rel-addr: as I currently understand it, the related STUN or TURN server address
rel-port: the related port
Reference: https://tools.ietf.org/html/rfc5245
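As a worked example of that formula, a small sketch; IcePriority is a hypothetical helper, and the preference values below are illustrative defaults, not mandated constants:

#include <cstdint>
#include <cstdio>

uint32_t IcePriority(uint32_t type_pref,      // 0..126, e.g. 126 for host, 0 for relay
                     uint32_t local_pref,     // 0..65535
                     uint32_t component_id) { // 1 = RTP, 2 = RTCP
  return (type_pref << 24) | (local_pref << 8) | (256 - component_id);
}

int main() {
  // A host candidate for the RTP component with maximum local preference:
  std::printf("%u\n", IcePriority(126, 65535, 1));  // prints 2130706431
}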
1.8 The rtcp attribute
a=rtcp:<port> <nettype> <addrtype> <connection-address>
a=rtcp:2347 IN IP4 74.125.127.126
Explicitly signals the RTCP address and port.
Reference: https://tools.ietf.org/id/draft-ietf-mmusic-sdp4nat-00.txt
1.9 The msid-semantic attribute
a=msid-semantic: <semantics> <msid-list>
a=msid-semantic: WMS local_stream_1
WMS stands for WebRTC Media Streams
local_stream_1 is the msid (whose role, presumably, is to correspond to the ssrc)
Reference: https://tools.ietf.org/html/draft-alvestrand-rtcweb-msid-02#section-3
1.10 The msid attribute
a=msid:<identifier> [<appdata>]
a=msid: local_stream_1
The value of the "msid" attribute consists of an identifier and an optional "appdata" field.
This attribute allows endpoints to associate RTP streams that are described in different media descriptions with the same MediaStream, and to carry an identifier for each MediaStreamTrack in its "appdata" field.
Reference: https://tools.ietf.org/html/draft-ietf-mmusic-msid-16#page-10
Note: in WebRTC, the second parameter of SdpSerialize must be set to true for this attribute to appear; serializing directly with jsep's ToString will not produce it.
1.11 The group attribute
a=group:<semantics> <identification-tag> ...
a=group:BUNDLE
"a=group" lines are used to group together several "m" lines that are identified by their "mid" attribute.
There MAY be several "a=group" lines in a session description, and they can use the same or different semantics.
References:
https://tools.ietf.org/html/rfc5888
https://tools.ietf.org/html/draft-ietf-mmusic-sdp-bundle-negotiation-39
1.12 The bundle-only attribute
a=bundle-only
a=bundle-only
Used together with the group attribute; indicates that different media share the same port.
1.13 The rtcp-fb attribute
a=rtcp-fb:<payload> <param>
a=rtcp-fb:96 ccm fir
Reference: https://tools.ietf.org/html/rfc4585
1.14 The rtcp-rsize attribute
a=rtcp-rsize
a=rtcp-rsize
Reference: https://tools.ietf.org/html/rfc5506
1.15 The fingerprint attribute
a=fingerprint:<hash-func> <fingerprint>
a=fingerprint:SHA-1 4A:AD:B9:B1:3F:82:18:3B:54:02:12:DF:3E:5D:49:6B:19:E5:7C:AB
Reference: https://tools.ietf.org/html/rfc4572#page-7
1.16 The extmap attribute
a=extmap:<id> <uri>
a=extmap:8 http://www.webrtc.org/experiments/rtp-hdrext/video-timing
Declares an RTP header extension. It has three properties:
1. asymmetry (an extension can be marked recvonly or sendonly);
2. mutually exclusive alternatives (the answer may choose one of several extensions the offer lists under the same id; such ids must be in 4096-4351);
3. a session may signal multiple header extensions.
Reference: https://tools.ietf.org/html/rfc5285
1.17 The fmtp attribute
a=fmtp:<payload> <param>
a=fmtp:97 apt=96
Carries codec-specific parameters (param) for the given payload type.
Reference: https://tools.ietf.org/html/rfc4566
1.18 The mid attribute
a=mid:<identification-tag>
a=mid:audio
The name of the media section, used to look up a specific media.
1.19 The setup attribute
a=setup:<role>
a=setup:active
Indicates this endpoint's role when establishing the connection: actively connecting, passively accepting, and so on.
1.20 Core session fields
v=0
Reference: https://tools.ietf.org/html/rfc4566
o=<username> <session id> <version> <network type> <address type> <address>
o=- 18446744069414584320 18446462598732840960 IN IP4 127.0.0.1
Reference: https://tools.ietf.org/html/rfc4566
s=<session name>
Reference: https://tools.ietf.org/html/rfc4566
m=<media> <port> <transport> <format list>
m=audio 2345 RTP/SAVPF 111 103 104
Reference: https://tools.ietf.org/html/rfc4566
b= (bandwidth / transmission rate)
Reference: https://tools.ietf.org/html/rfc4566
For offer/answer, see:
https://tools.ietf.org/html/rfc3264#page-8
Notes:
1. The answer MUST contain exactly the same number of "m=" lines as the offer.
2. If the answerer has no media formats in common for a particular offered stream, the answerer MUST reject that media stream by setting the port to zero.
3. Rejecting in an answer: to reject a media stream, set the port of that media to 0. One case needs attention: a=bundle-only, preceded by an a=group:BUNDLE line, means several media streams share one port, and such a media section may legitimately have its port set to 0.
On codec matching:
1. For both audio and video, the codec names are compared; if the payload type is less than or equal to 95, the ids are compared as well (payload types up to 95 are static).
2. For audio, the clockrate, bitrate and channels must all match, or one of them must be 0.
3. For video, if the codec is H264, the profile-level-id is also compared.
from: https://blog.csdn.net/myiloveuuu/article/details/78998183
SDP structure
from:https://blog.piasy.com/2018/10/14/WebRTC-API-Overview/index.html
First, let's get clear on the basic structure of SDP.
Broadly, a WebRTC SDP is made up of several parts: each block that starts with m= is called an m section, and that first line is the m line; it is followed by many a lines that describe the properties of that media. We call each kind of media data a "media", and every media has its own m section in the SDP.
WebRTC SDPs come in three types.
Any discussion of SDP has to mention its two "plans", the two SDP formats for expressing multiple media streams of the same kind. Examples of multiple streams: screen capture plus camera, or several cameras (viewpoints).
With Plan B, streams of the same kind share a single m line and are distinguished by msid; with Unified Plan, every stream gets its own m line, so two video streams mean two video m lines.
The WebRTC standard adopted Unified Plan and the WebRTC code already supports it, so we only look at the Unified Plan API.
References: Plan B, Unified Plan, Unified Plan vs Plan B.
In the WebRTC source, Plan B corresponds to the PC Stream/Sender/Receiver API, and Unified Plan to the Track/Transceiver API.
Next, let's sort out the key concepts in the media data exchange path:
Taking video as an example, data is captured by the sender's Capturer and handed to the Source, then to the local Track, where it forks: one branch goes to a local Sink for preview, the other is sent to the receiver by the Transceiver; the receiver's Track then hands the data to its Sink for rendering.
Creating and destroying the Capturer is entirely the APP layer's responsibility; it only needs to be hooked up to a Source. Creating a Source requires calling a PC Factory interface, as does creating a Track (which also takes the Source as a parameter). Sinks are likewise created and destroyed by the APP layer and simply added to a Track. Creating a Transceiver requires calling a PC interface.
Now let's look at the PC Factory and PC interfaces.
With the default build options, rtc_use_builtin_sw_codecs = false, so USE_BUILTIN_SW_CODECS is not defined and CreatePeerConnectionFactory has only one overload: it takes three threads, an adm, audio/video encoder/decoder factories, an AudioMixer and an AudioProcessing.
Creating the PC object takes an RTCConfiguration and a PeerConnectionDependencies: the former holds the various configuration options, the latter holds customizable interface implementations such as PortAllocator, AsyncResolverFactory, RTCCertificateGeneratorInterface and SSLCertificateVerifier.
For now, Android/iOS support for the dependencies has not caught up; users of such advanced features are not afraid of doing their own wrapping at the native layer, but they would have to rebuild the wheels that already exist in WebRTC's Java/ObjC code.
These are the interfaces for creating Audio/Video Sources and Tracks mentioned earlier.
Preparation-related:
Related to establishing the P2P connection:
Note: the RTCOfferAnswerOptions passed to CreateOffer/CreateAnswer contain offer_to_receive_X fields that exist for Plan B compatibility; once set, the SDP will contain audio/video m lines even without AddTrack. With Unified Plan these fields should not be set; instead, call AddTrack/AddTransceiver/CreateDataChannel in advance to state whether you need audio/video/data.
Other interfaces:
Note: SetBitrateAllocationStrategy is not exposed on either Android or iOS. Android exposes a SetBitrate interface; iOS does not, but the encoder's output bitrate can be limited via RTCRtpSender setParameters.
The callback interface PeerConnectionObserver:
The signaling state transitions are stable -> have-local-offer -> stable, or stable -> have-remote-offer -> stable; see SPEC 4.3 State Definitions for the details. Next, let's take a closer look at the transceiver.
The m section of an SDP contains a line a=mid:, which defines the id of that media, called the mid; take the following pair of offer and answer, for example:
# offer
...
a=group:BUNDLE 0 1 2
...
m=video 9 UDP/TLS/RTP/SAVPF 100 96 97 98 99 101 127 124 125
...
a=mid:0
...
m=audio 9 UDP/TLS/RTP/SAVPF 111 103 104 9 102 0 8 106 105 13 110 112 113 126
...
a=mid:1
...
m=application 9 DTLS/SCTP 5000
...
a=mid:2
...
# answer
...
a=group:BUNDLE 0 1 2
...
m=video 9 UDP/TLS/RTP/SAVPF 100 96 97 98 99 101 127 124
...
a=mid:0
...
m=audio 9 UDP/TLS/RTP/SAVPF 103 104 9 102 0 8 106 105 13 110 112 113 126
...
a=mid:1
...
m=application 9 DTLS/SCTP 5000
...
a=mid:2
...
There are three media here: video, audio and application, with mids 0, 1 and 2; application is the media type of the DataChannel.
Notice that the same kind of media has the same mid in the offer and the answer; in other words, for either endpoint, the same media it sends and receives shares one mid.
In the WebRTC standard, a transceiver is exactly the combination of the sender and receiver that share one mid; it carries fields such as media type, mid, direction, sender and receiver. The direction takes one of the values kSendRecv, kSendOnly, kRecvOnly, kInactive.
What we add with AddTrack is a local track, i.e. a stream to send; the first AddTrack creates a transceiver whose direction defaults to kSendRecv. Although CreateOffer lets us control whether to receive via the offer_to_receive_X fields of RTCOfferAnswerOptions, those are legacy fields and we should avoid them. So how do we control a transceiver's direction? With the AddTransceiver interface.
To create a kSendOnly transceiver, pass in the track and set direction to kSendOnly in the RtpTransceiverInit, or pass only the media type and the init structure and AddTrack later. To create a kRecvOnly transceiver, pass only the media type and the init structure, and do not AddTrack.
When does a transceiver get associated with an m section of the SDP? The offer side, when creating the offer, builds an m section per existing transceiver and records each transceiver's m section index in the SDP, so that SetLocalDescription can give the transceiver the right mid. The answer side, in SetRemoteDescription (on the offer sent by the other end), checks each m section: if it has a recv direction, an existing transceiver is looked up by media type; if one is found it is associated, otherwise a kRecvOnly transceiver is created (at that point the offer can only have been kSendOnly; a media that is neither sent nor received would not appear in the SDP, so the only possible response is kRecvOnly).
To sum up: on both the offer side and the answer side, only media you need to send require a transceiver with a send direction added in advance; receive-only media need no transceiver added beforehand (one added in advance would not be used anyway). A sketch of the two AddTransceiver patterns follows.
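For illustration, a minimal sketch of the two patterns in C++, assuming an already-created PeerConnectionInterface pc and a local_video_track; the names follow api/peer_connection_interface.h at the time of writing, but treat the exact signatures as approximate rather than authoritative:

// Send-only video: pass the local track plus an init structure with an
// explicit direction, instead of relying on the kSendRecv default.
webrtc::RtpTransceiverInit send_init;
send_init.direction = webrtc::RtpTransceiverDirection::kSendOnly;
auto send_result = pc->AddTransceiver(local_video_track, send_init);

// Receive-only audio: pass only the media type plus the init structure,
// and do not AddTrack afterwards.
webrtc::RtpTransceiverInit recv_init;
recv_init.direction = webrtc::RtpTransceiverDirection::kRecvOnly;
auto recv_result = pc->AddTransceiver(cricket::MEDIA_TYPE_AUDIO, recv_init);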
This article is an original by Piasy, published on https://blog.piasy.com; please read the original to support the author: https://blog.piasy.com/2019/01/01/WebRTC-RTP-Mux-Demux/
In the earlier post on adding video rotation to janus-pp-rec I briefly introduced parts of the RTP protocol, focusing on the video-orientation RTP header extension. This time we dig deeper into RTP and look at how H.264 video data is packetized and depacketized.
Let's start with the RFCs relevant to RTP H.264; what follows summarizes two of them: RTP: A Transport Protocol for Real-Time Applications, and RTP Payload Format for H.264 Video.
The header has a fixed 12-byte part, plus optional csrc and ext data (covered in more detail in the janus-pp-rec post):
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|V=2|P|X|  CC   |M|     PT      |       sequence number         |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                           timestamp                           |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|           synchronization source (SSRC) identifier            |
+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
|            contributing source (CSRC) identifiers             |
|                             ....                              |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
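A minimal sketch of reading the fixed 12-byte header shown above; ParseFixedHeader is a hypothetical helper, far simpler than WebRTC's RtpPacket::ParseBuffer:

#include <cstddef>
#include <cstdint>
#include <optional>

struct RtpHeader {
  uint8_t version, payload_type, csrc_count;
  bool padding, extension, marker;
  uint16_t sequence_number;
  uint32_t timestamp, ssrc;
};

std::optional<RtpHeader> ParseFixedHeader(const uint8_t* p, size_t len) {
  if (len < 12) return std::nullopt;
  RtpHeader h;
  h.version = p[0] >> 6;       // must be 2
  h.padding = (p[0] >> 5) & 1;
  h.extension = (p[0] >> 4) & 1;
  h.csrc_count = p[0] & 0x0F;  // number of 32-bit CSRC words that follow
  h.marker = p[1] >> 7;
  h.payload_type = p[1] & 0x7F;
  h.sequence_number = static_cast<uint16_t>((p[2] << 8) | p[3]);
  h.timestamp = (uint32_t{p[4]} << 24) | (p[5] << 16) | (p[6] << 8) | p[7];
  h.ssrc = (uint32_t{p[8]} << 24) | (p[9] << 16) | (p[10] << 8) | p[11];
  if (h.version != 2 || len < 12 + 4u * h.csrc_count) return std::nullopt;
  return h;
}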
After the header comes the payload; its length is whatever remains of the packet after the header (minus any padding indicated by the P bit). The payload format is defined separately by each profile, and the profile's payload type value is negotiated via SDP.
Now let's look at the H.264 payload format.
The first byte of an H.264 payload has the same layout as a NAL header; its type values are defined as follows:
Table 1. Summary of NAL unit types and the corresponding packet
types
NAL Unit Packet Packet Type Name Section
Type Type
-------------------------------------------------------------
0 reserved -
1-23 NAL unit Single NAL unit packet 5.6
24 STAP-A Single-time aggregation packet 5.7.1
25 STAP-B Single-time aggregation packet 5.7.1
26 MTAP16 Multi-time aggregation packet 5.7.2
27 MTAP24 Multi-time aggregation packet 5.7.2
28 FU-A Fragmentation unit 5.8
29 FU-B Fragmentation unit 5.8
30-31 reserved -
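Since the first payload byte shares the NAL header layout (F, NRI, Type), the packet type above can be read with a few bit operations; a small illustrative sketch (Classify is a hypothetical helper, not WebRTC code):

#include <cstdint>

enum class H264PacketType { kSingleNalu, kStapA, kFuA, kOther };

H264PacketType Classify(uint8_t first_payload_byte) {
  const uint8_t type = first_payload_byte & 0x1F;  // low 5 bits: packet type
  // bool forbidden = first_payload_byte & 0x80;   // F bit, must be 0
  // uint8_t nri = (first_payload_byte >> 5) & 3;  // NRI, importance hint
  if (type >= 1 && type <= 23) return H264PacketType::kSingleNalu;
  if (type == 24) return H264PacketType::kStapA;
  if (type == 28) return H264PacketType::kFuA;
  return H264PacketType::kOther;
}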
H.264 payloads can be packetized in three modes: Single NAL unit mode (0), Non-interleaved mode (1), and Interleaved mode (2). The types each mode allows are listed below:
Table 3. Summary of allowed NAL unit types for each packetization
mode (yes = allowed, no = disallowed, ig = ignore)
Payload Packet Single NAL Non-Interleaved Interleaved
Type Type Unit Mode Mode Mode
-------------------------------------------------------------
0 reserved ig ig ig
1-23 NAL unit yes yes no
24 STAP-A no yes no
25 STAP-B no no yes
26 MTAP16 no no yes
27 MTAP24 no no yes
28 FU-A no yes yes
29 FU-B no no yes
30-31 reserved ig ig ig
Note: when encoding H.264, WebRTC on iOS uses Non-interleaved mode for both baseline and high profile, and WebRTC on Android does the same.
So WebRTC actually uses only three packet types: NAL unit, STAP-A, and FU-A. Let's look at these three in turn.
If the type is in [1, 23], the RTP packet contains exactly one NALU:
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|F|NRI|  Type   |                                               |
+-+-+-+-+-+-+-+-+                                               |
|                                                               |
|               Bytes 2..n of a single NAL unit                 |
|                                                               |
|                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                               :...OPTIONAL RTP padding        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Figure 2. RTP payload format for single NAL unit packet
To cope with the huge MTU difference between wired and wireless networks, the RTP payload format defines an aggregation strategy:
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|F|NRI|  Type   |                                               |
+-+-+-+-+-+-+-+-+                                               |
|                                                               |
|             one or more aggregation units                     |
|                                                               |
|                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                               :...OPTIONAL RTP padding        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Figure 3. RTP payload format for aggregation packets
A STAP-A example:
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                          RTP Header                           |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|STAP-A NAL HDR |         NALU 1 Size           | NALU 1 HDR    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                         NALU 1 Data                           |
:                                                               :
+               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|               |         NALU 2 Size           | NALU 2 HDR    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                         NALU 2 Data                           |
:                                                               :
|                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                               :...OPTIONAL RTP padding        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Figure 7. An example of an RTP packet including an STAP-A
containing two single-time aggregation units
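A sketch of walking the aggregation units inside a STAP-A payload as laid out in Figure 7: after the one-byte STAP-A NAL header come repeated pairs of a 16-bit big-endian size and a NALU. SplitStapA is a hypothetical helper, not WebRTC's implementation:

#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

std::vector<std::pair<const uint8_t*, size_t>> SplitStapA(
    const uint8_t* payload, size_t len) {
  std::vector<std::pair<const uint8_t*, size_t>> nalus;
  size_t offset = 1;  // skip the STAP-A NAL header byte
  while (offset + 2 <= len) {
    const size_t nalu_size = (payload[offset] << 8) | payload[offset + 1];
    offset += 2;
    if (offset + nalu_size > len) return {};  // malformed packet
    nalus.emplace_back(payload + offset, nalu_size);
    offset += nalu_size;
  }
  return nalus;
}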
Implementing packet fragmentation at the application layer, instead of relying on the fragmentation mechanisms of the layers below, has two benefits:
Every fragment carries a sequence number, and the RTP packets of one fragmented NALU must have in-order, consecutive sequence numbers, with no RTP packets of other data interleaved among them. An FU may only fragment a NALU; STAPs and MTAPs must not be fragmented, and FUs must not be nested. FU-A has no DON; FU-B has a DON.
The FU-A format is as follows:
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| FU indicator  |   FU header   |                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+                               |
|                                                               |
|                         FU payload                            |
|                                                               |
|                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                               :...OPTIONAL RTP padding        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Figure 14. RTP payload format for FU-A
The FU header format is as follows:
+---------------+
|0|1|2|3|4|5|6|7|
+-+-+-+-+-+-+-+-+
|S|E|R| Type |
+---------------+
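A sketch of the FU-A bookkeeping implied by the two headers above: the S and E bits mark the first and last fragment, and the original NAL header byte is rebuilt from the FU indicator's F/NRI bits plus the FU header's Type. ParseFuA is a hypothetical helper:

#include <cstdint>

struct FuA {
  bool start, end;
  uint8_t nal_header;  // reconstructed first byte of the original NALU
};

FuA ParseFuA(uint8_t fu_indicator, uint8_t fu_header) {
  FuA fu;
  fu.start = fu_header & 0x80;  // S bit: first fragment of the NALU
  fu.end = fu_header & 0x40;    // E bit: last fragment of the NALU
  fu.nal_header = (fu_indicator & 0xE0) | (fu_header & 0x1F);
  return fu;
}
// The receiver prepends fu.nal_header to the concatenated FU payloads of
// the fragments from S through E to recover the complete NAL unit.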
With the theory covered, let's see how WebRTC implements it: the logic that packs video data into RTP packets lives in the RTPSenderVideo::SendVideo function.
RTPSenderVideo::SendVideo
Packetization boils down to computing how many packets one frame needs and how much payload each packet carries; for that we need to know, for each packetization mode, the maximum payload of each packet (packet size minus header size).
First, compute a packet's maximum capacity, i.e. the space available for the RTP header plus payload, with the overhead of FEC and retransmission excluded:
// Maximum size of packet including rtp headers.
// Extra space left in case packet will be resent using fec or rtx.
int packet_capacity = rtp_sender_->MaxRtpPacketSize() - fec_packet_overhead -
(rtp_sender_->RtxStatus() ? kRtxHeaderSize : 0);
rtp_sender_->MaxRtpPacketSize defaults to 1460, but is set to 1200 when video needs to be sent. (why ???)
Next, four packet templates are prepared:
- single_packet: for NAL unit and STAP-A packets;
- first_packet: for the first packet of an FU-A;
- middle_packet: for the middle packets of an FU-A;
- last_packet: for the final packet of an FU-A.
The preparation consists of:
- RTPSender::AllocatePacket sets the ssrc and csrcs fields, reserves space for the AbsoluteSendTime, TransmissionOffset and TransportSequenceNumber extensions, and sets the PlayoutDelayLimits and RtpMid extensions as needed;
- RTPSenderVideo::SendVideo sets the payload_type, rtp_timestamp and capture_time_ms fields;
- AddRtpHeaderExtensions sets the VideoOrientation, VideoContentTypeExtension, VideoTimingExtension and RtpGenericFrameDescriptorExtension extensions as needed;
- first_packet, middle_packet and last_packet are all copied from single_packet, so the code only calls AddRtpHeaderExtensions to set their extensions.
These templates serve two purposes: they can be used directly when packetizing later, and they tell us exactly how much space the RTP header needs, just as the comment in the code says:
Simplest way to estimate how much extensions would occupy is to set them.
Knowing how much space each packet type's header needs, we know the maximum payload each packet can carry (these values populate the fields of RtpPacketizer::PayloadSizeLimits; the sketch after this list restates the relationships):
- max_payload_len: the maximum payload space: the packet's maximum capacity minus the middle-packet header size;
- single_packet_reduction_len: when everything is packed into a single packet, the payload space is max_payload_len minus a further discount, the difference between the single-packet and middle-packet header sizes; i.e. the maximum capacity minus the single-packet header size;
- first_packet_reduction_len: when several packets are needed, the first packet's payload space is also discounted from max_payload_len, by the difference between the first-packet and middle-packet header sizes;
- last_packet_reduction_len: likewise, the last packet's payload space is discounted from max_payload_len by the difference between the last-packet and middle-packet header sizes.
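A minimal sketch restating those relationships; only the PayloadSizeLimits field names come from WebRTC's RtpPacketizer, while ComputeLimits and the header-size parameters are illustrative assumptions:

struct PayloadSizeLimits {
  int max_payload_len = 0;
  int single_packet_reduction_len = 0;
  int first_packet_reduction_len = 0;
  int last_packet_reduction_len = 0;
};

// packet_capacity is the value computed above; the header sizes are the
// per-template RTP header sizes measured from the packet templates.
PayloadSizeLimits ComputeLimits(int packet_capacity, int single_header,
                                int first_header, int middle_header,
                                int last_header) {
  PayloadSizeLimits limits;
  limits.max_payload_len = packet_capacity - middle_header;
  limits.single_packet_reduction_len = single_header - middle_header;
  limits.first_packet_reduction_len = first_header - middle_header;
  limits.last_packet_reduction_len = last_header - middle_header;
  return limits;
}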
With the templates ready and the limits known, an RtpPacketizer is created. Its NumPackets interface tells us how many packets this frame needs to be packed into, and its NextPacket call assembles each packet. Calling NextPacket is not quite the end of it: RTPSender::AssignSequenceNumber still has to be called to assign the sequence number, and if VideoTimingExtension is to be set, packetization_finish_time_ms must be set as well. Finally, the packet goes through FEC processing, or is sent out directly as an RTP packet via RTPSenderVideo::SendVideoPacket.
When the video codec is H.264, the RtpPacketizer implementation class is RtpPacketizerH264; next, let's look at the H.264 packetization logic.
RtpPacketizerH264
When constructed, RtpPacketizerH264 uses the contents of RTPFragmentationHeader to build input_fragments_, an array of RtpPacketizerH264::Fragment; each Fragment holds a pointer to the first byte of a NALU's payload and the NALU's length.
RTPFragmentationHeader is really just the per-NALU information of this frame: each payload's offset within the buffer and its length. This information is generated right after the encoder outputs its data, by scanning the whole buffer for NALU start codes (001 or 0001) and recording each NALU's offset and length. The Android implementation is in VideoEncoderWrapper::ParseFragmentationHeader in sdk/android/src/jni/videoencoderwrapper.cc, and the iOS one is in H264CMSampleBufferToAnnexBBuffer in sdk/objc/components/video_codec/nalu_rewriter.cc.
The H.264 spec allows one picture to be split into multiple NALUs, but from what I observed of the data an iPhone 6 encodes, non-key frames have only one NALU while key frames have two, with SPS and PPS prepended, so a key frame ends up with four NALUs.
With input_fragments_ in hand, GeneratePackets iterates over it and, for each Fragment, runs different packetization logic depending on packetization_mode:
- If the mode is SingleNalUnit, one PacketUnit is generated for this Fragment (which is in fact a single NALU);
- If it is NonInterleaved (the mode the WebRTC native SDKs actually use), check whether this Fragment fits into a single packet, first computing how much data a single packet can hold:
int single_packet_capacity = limits_.max_payload_len;
if (input_fragments_.size() == 1)
single_packet_capacity -= limits_.single_packet_reduction_len;
else if (i == 0)
single_packet_capacity -= limits_.first_packet_reduction_len;
else if (i + 1 == input_fragments_.size())
single_packet_capacity -= limits_.last_packet_reduction_len;
What remains of max_payload_len after the applicable reductions is single_packet_capacity. If fragment_len > single_packet_capacity, the Fragment cannot fit into a single packet, so fragmentation is needed, i.e. PacketizeFuA is called; otherwise it fits into one packet, so aggregation is possible, i.e. PacketizeStapA is called.
PacketizeFuA works out how to split one Fragment into multiple packets and generates each PacketUnit. The splitting logic is implemented in the SplitAboutEqually function, which handles quite a few edge cases; the general idea is to put the data into as few packets as possible while keeping the packet sizes as close to each other as possible. The PacketUnits it generates all have their aggregated field set to false.
PacketizeStapA instead works out how many Fragments can go into one packet. It also generates one PacketUnit per Fragment, but increments num_packets_left_ only once; of the PacketUnits it generates, all have aggregated set to true except the last one, which is false.
When GeneratePackets finishes, num_packets_left_ has been computed, i.e. how many RTP packets this frame needs, and the PacketUnit array is ready.
Afterwards, RTPSenderVideo::SendVideo calls NextPacket num_packets_left_ times to actually assemble each RTP packet. Let's look at the logic of NextPacket:
- If the current PacketUnit has both first_fragment and last_fragment set to true, the payload is copied in directly. Such a packet may be a SingleNalUnit packet, but may also be a NonInterleaved STAP-A packet: in NonInterleaved mode, a Fragment that fits into one packet is packed as STAP-A, and if only one PacketUnit was generated, its first_fragment and last_fragment are both true.
- If the unit's aggregated field is true, NextAggregatePacket is called to build a STAP-A packet. It consumes successive PacketUnits in a loop whose exit condition is !packet->aggregated or packet->last_fragment; since among the series of PacketUnits destined for one packet only the last has last_fragment set to true (this is arranged in PacketizeStapA), the loop exits correctly.
- If the aggregated field is false, NextFragmentPacket is called to build an FU-A packet.
And with that, we have walked through the whole logic of packing H.264 into RTP packets; time for a long, relieved breath :)
Having seen how packetization is implemented, let's look at how depacketization works. It is slightly more complicated than packetization; the crux is that packets may arrive out of order (retransmission after loss can also be regarded as a kind of reordering).
Depacketization consists of two big steps: first, parse out the RTP header and payload; then parse the payload and, according to the packetization mode, invert the packetization to recover one complete frame of data. The former is implemented in RtpPacket::ParseBuffer, called from Call::DeliverRtp; the latter is more involved, because it has to handle reordering, and its logical starting point is the RtpVideoStreamReceiver::ReceivePacket function.
RtpPacket::ParseBuffer
ParseBuffer has three tasks: parse the fixed header fields such as payload_type_, sequence_number_, timestamp_ and ssrc_; locate the header extensions; and locate the payload within the buffer.
RtpVideoStreamReceiver::ReceivePacket
First, an RtpDepacketizer matching the payload type is created to parse the payload content. The H.264 parsing logic is implemented in RtpDepacketizerH264::Parse, whose main task is to find the position and size of the actual data: FU-A NALUs are handled in ParseFuaNalu, the other two kinds in ProcessStapAOrSingleNalu. Then the actual data of the RTP header extensions, such as VideoOrientation, is parsed.
Finally a VCMPacket is constructed and put into the packet buffer via PacketBuffer::InsertPacket.
PacketBuffer::InsertPacket
PacketBuffer encapsulates the logic for handling out-of-order arrival of RTP packets. The broad idea:
- store the packet in the data_buffer_ array, and record some attributes for its sequence number in the sequence_buffer_ array;
- run FindFrames to look for complete frames among the packets received so far;
- hand complete frames to RtpFrameReferenceFinder::ManageFrame, which makes sure a frame is decodable before calling it back out into the subsequent decoding stage.
PacketBuffer::FindFrames
Every received packet triggers FindFrames, which searches onward from the sequence number of the packet just received: the end of a frame is detected via the packet->is_last_packet_in_frame flag; for the start of a frame, VP8/VP9 rely on the frame_begin flag (i.e. packet->is_first_packet_in_frame), whereas H.264 relies on a change of timestamp.
RtpFrameReferenceFinder::ManageFrame
The frames parsed out of the payloads are complete frames, but not necessarily decodable: H.264 has forward references (a P frame needs the preceding I frame to decode) and backward references (a B frame needs the preceding I/P frame and the following P frame to decode), so a frame can only be released after all of its reference frames have arrived.
Although PacketBuffer solves the out-of-order arrival of RTP packets and outputs complete frames one by one, it does not guarantee that the frames themselves are in order, so RtpFrameReferenceFinder is still needed to handle frames arriving out of order.
I won't expand on the code details of RtpFrameReferenceFinder here; interested readers can read it on their own.
And with that we have walked through the logic of unpacking H.264 from RTP packets too; time for another long, relieved breath :)
Finally, let's summarize the data structures involved in WebRTC's RTP packing and unpacking:
- RtpPacket: the RTP packet data structure, defining the standard header fields, header extensions, data buffer and so on;
- RtpPacketToSend: the structure used for packing on the sending side, derived from RtpPacket, adding wrappers for setting extension headers;
- RtpPacketReceived: the structure used for unpacking on the receiving side, also derived from RtpPacket, adding wrappers for reading extension headers.
Last of all, one more thing to share: the sequence number comparison algorithm.
Because sequence numbers can wrap around, they cannot be compared directly; an RFC defines the comparison algorithm: Serial Number Arithmetic.
The RFC first defines serial numbers: n-bit unsigned integers, with the lowest serial number 0 and the highest 2^n - 1. Serial numbers have no maximum or minimum value, and each serial number needs at least n bits of storage.
It then defines serial number addition: for a valid serial number s in [0, 2^n - 1], adding m gives (s + m) % (2^n), where the addition and modulo are the ordinary operations.
Finally it defines the comparison algorithm (for rigor the RFC introduces two auxiliary ordinary integers, which we omit here for simplicity):
- s and s+m (with m an ordinary positive integer) are equal only when m is 0. In other words, given two serial number values alone, equality cannot be decided; but usually we do not need equality, only ordering.
- Serial number s1 is less than s2 when (s1 < s2 && s2 - s1 < 2^(n-1)) || (s1 > s2 && s1 - s2 > 2^(n-1)): smaller by less than half the range, or larger by more than half the range. For example, with n=3: 2-1 < 4, so 1 is less than 2; 7-2 > 4, so 7 is less than 2.
- Serial number s1 is greater than s2 when (s1 < s2 && s2 - s1 > 2^(n-1)) || (s1 > s2 && s1 - s2 < 2^(n-1)): smaller by more than half the range, or larger by less than half the range. For example, with n=3: 7-2 > 4, so 2 is greater than 7; 2-1 < 4, so 2 is greater than 1.
A careful reader might bring up an example: which of 7 and 3 is larger? They actually cannot be ordered, just as 3 and 3 cannot be shown equal. The RFC deliberately declines to define ordering for such pairs, because it genuinely is hard to define.
WebRTC's implementation lives mainly in rtc_base/numerics/sequence_number_util.h and rtc_base/numerics/mod_ops.h:
template <typename T>
inline bool AheadOf(T a, T b) {
  static_assert(std::is_unsigned<T>::value,
                "Type must be an unsigned integer.");
  return a != b && AheadOrAt(a, b);
}
template <typename T, T M>
inline typename std::enable_if<(M == 0), bool>::type AheadOrAt(T a, T b) {
  static_assert(std::is_unsigned<T>::value,
                "Type must be an unsigned integer.");
  const T maxDist = std::numeric_limits<T>::max() / 2 + T(1);
  if (a - b == maxDist)
    return b < a;
  return ForwardDiff(b, a) < maxDist;
}
template <typename T, T M>
inline typename std::enable_if<(M == 0), T>::type ForwardDiff(T a, T b) {
  static_assert(std::is_unsigned<T>::value,
                "Type must be an unsigned integer.");
  return b - a;
}
In essence, by exploiting the overflow of unsigned subtraction, it unifies the two OR'd cases the RFC defines, and resolves the case the RFC leaves undefined as an ordinary value comparison.
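For illustration, a small usage sketch of AheadOf with 16-bit RTP sequence numbers, including the wrap-around case (assuming, as in sequence_number_util.h, that these helpers live in namespace webrtc):

#include <cassert>
#include <cstdint>

void SequenceNumberExamples() {
  assert(webrtc::AheadOf<uint16_t>(2, 1));      // 2 is newer than 1
  assert(webrtc::AheadOf<uint16_t>(0, 65535));  // 0 is newer: the counter wrapped
  assert(!webrtc::AheadOf<uint16_t>(65535, 0));
}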