NALU 头由一个字节组成, 它的语法如下:
+---------------+
|0|1|2|3|4|5|6|7|
+-+-+-+-+-+-+-+-+
|F|NRI| Type |
+---------------+
F: 1 个比特.
forbidden_zero_bit. 在 H.264 规范中规定了这一位必须为 0.
NRI: 2 个比特.
nal_ref_idc. 取 00 ~ 11, 似乎指示这个 NALU 的重要性, 如 00 的 NALU 解码器可以丢弃它而不影响图像的回放. 不过一般情况下不太关心
这个属性.
Type: 5 个比特.
nal_unit_type. 这个 NALU 单元的类型. 简述如下:
0 没有定义
1-23 NAL单元 单个 NAL 单元包.
24 STAP-A 单一时间的组合包
24 STAP-B 单一时间的组合包
26 MTAP16 多个时间的组合包
27 MTAP24 多个时间的组合包
28 FU-A 分片的单元
29 FU-B 分片的单元
30-31 没有定义
2. 打包模式
下面是 RFC 3550 中规定的 RTP 头的结构.
0 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|V=2|P|X| CC |M| PT | sequence number |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| timestamp |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| synchronization source (SSRC) identifier |
+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
| contributing source (CSRC) identifiers |
| .... |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
负载类型 Payload type (PT): 7 bits
序列号 Sequence number (SN): 16 bits
时间戳 Timestamp: 32 bits
H.264 Payload 格式定义了三种不同的基本的负载(Payload)结构. 接收端可能通过 RTP Payload
的第一个字节来识别它们. 这一个字节类似 NALU 头的格式, 而这个头结构的 NAL 单元类型字段
则指出了代表的是哪一种结构,
这个字节的结构如下, 可以看出它和 H.264 的 NALU 头结构是一样的.
+---------------+
|0|1|2|3|4|5|6|7|
+-+-+-+-+-+-+-+-+
|F|NRI| Type |
+---------------+
字段 Type: 这个 RTP payload 中 NAL 单元的类型. 这个字段和 H.264 中类型字段的区别是, 当 type
的值为 24 ~ 31 表示这是一个特别格式的 NAL 单元, 而 H.264 中, 只取 1~23 是有效的值.
24 STAP-A 单一时间的组合包
24 STAP-B 单一时间的组合包
26 MTAP16 多个时间的组合包
27 MTAP24 多个时间的组合包
28 FU-A 分片的单元
29 FU-B 分片的单元
30-31 没有定义
可能的结构类型分别有:
1. 单一 NAL 单元模式
即一个 RTP 包仅由一个完整的 NALU 组成. 这种情况下 RTP NAL 头类型字段和原始的 H.264的
NALU 头类型字段是一样的.
2. 组合封包模式
即可能是由多个 NAL 单元组成一个 RTP 包. 分别有4种组合方式: STAP-A, STAP-B, MTAP16, MTAP24.
那么这里的类型值分别是 24, 25, 26 以及 27.
3. 分片封包模式
用于把一个 NALU 单元封装成多个 RTP 包. 存在两种类型 FU-A 和 FU-B. 类型值分别是 28 和 29.
2.1 单一 NAL 单元模式
对于 NALU 的长度小于 MTU 大小的包, 一般采用单一 NAL 单元模式.
对于一个原始的 H.264 NALU 单元常由 [Start Code] [NALU Header] [NALU Payload] 三部分组成, 其中 Start Code 用于标示这是一个
NALU 单元的开始, 必须是 "00 00 00 01" 或 "00 00 01", NALU 头仅一个字节, 其后都是 NALU 单元内容.
打包时去除 "00 00 01" 或 "00 00 00 01" 的开始码, 把其他数据封包的 RTP 包即可.
0 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|F|NRI| type | |
+-+-+-+-+-+-+-+-+ |
| |
| Bytes 2..n of a Single NAL unit |
| |
| +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| :...OPTIONAL RTP padding |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
如有一个 H.264 的 NALU 是这样的:
[00 00 00 01 67 42 A0 1E 23 56 0E 2F ... ]
这是一个序列参数集 NAL 单元. [00 00 00 01] 是四个字节的开始码, 67 是 NALU 头, 42 开始的数据是 NALU 内容.
封装成 RTP 包将如下:
[ RTP Header ] [ 67 42 A0 1E 23 56 0E 2F ]
即只要去掉 4 个字节的开始码就可以了.
2.2 组合封包模式
其次, 当 NALU 的长度特别小时, 可以把几个 NALU 单元封在一个 RTP 包中.
0 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| RTP Header |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|STAP-A NAL HDR | NALU 1 Size | NALU 1 HDR |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| NALU 1 Data |
: :
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| | NALU 2 Size | NALU 2 HDR |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| NALU 2 Data |
: :
| +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| :...OPTIONAL RTP padding |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
2.3 Fragmentation Units (FUs).
而当 NALU 的长度超过 MTU 时, 就必须对 NALU 单元进行分片封包. 也称为 Fragmentation Units (FUs).
0 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| FU indicator | FU header | |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |
| |
| FU payload |
| |
| +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| :...OPTIONAL RTP padding |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Figure 14. RTP payload format for FU-A
The FU indicator octet has the following format:
+---------------+
|0|1|2|3|4|5|6|7|
+-+-+-+-+-+-+-+-+
|F|NRI| Type |
+---------------+
The FU header has the following format:
+---------------+
|0|1|2|3|4|5|6|7|
+-+-+-+-+-+-+-+-+
|S|E|R| Type |
+---------------+
0 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
header |V=2|P| RC | PT=SR=200 | length |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| SSRC of sender |
+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
sender | NTP timestamp, most significant word |
info +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| NTP timestamp, least significant word |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| RTP timestamp |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| sender's packet count |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| sender's octet count |
+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
report | SSRC_1 (SSRC of first source) |
block +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1 | fraction lost | cumulative number of packets lost |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| extended highest sequence number received |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| interarrival jitter |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| last SR (LSR) |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| delay since last SR (DLSR) |
+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
report | SSRC_2 (SSRC of second source) |
block +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
2 : ... :
+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
| profile-specific extensions |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
The sender report packet consists of three sections, possibly
followed by a fourth profile-specific extension section if defined.
The first section, the header, is 8 octets long. The fields have the
following meaning:
version (V): 2 bits
Identifies the version of RTP, which is the same in RTCP packets
as in RTP data packets. The version defined by this specification
is two (2).
padding (P): 1 bit
If the padding bit is set, this individual RTCP packet contains
some additional padding octets at the end which are not part of
the control information but are included in the length field. The
last octet of the padding is a count of how many padding octets
should be ignored, including itself (it will be a multiple of
four). Padding may be needed by some encryption algorithms with
fixed block sizes. In a compound RTCP packet, padding is only
required on one individual packet because the compound packet is
encrypted as a whole for the method in Section 9.1. Thus, padding
MUST only be added to the last individual packet, and if padding
is added to that packet, the padding bit MUST be set only on that
packet. This convention aids the header validity checks described
in Appendix A.2 and allows detection of packets from some early
implementations that incorrectly set the padding bit on the first
individual packet and add padding to the last individual packet.
reception report count (RC): 5 bits
The number of reception report blocks contained in this packet. A
value of zero is valid.
packet type (PT): 8 bits
Contains the constant 200 to identify this as an RTCP SR packet.
length: 16 bits
The length of this RTCP packet in 32-bit words minus one,
including the header and any padding. (The offset of one makes
zero a valid length and avoids a possible infinite loop in
scanning a compound RTCP packet, while counting 32-bit words
avoids a validity check for a multiple of 4.)
SSRC: 32 bits
The synchronization source identifier for the originator of this
SR packet.
The second section, the sender information, is 20 octets long and is
present in every sender report packet. It summarizes the data
transmissions from this sender. The fields have the following
meaning:
NTP timestamp: 64 bits
Indicates the wallclock time (see Section 4) when this report was
sent so that it may be used in combination with timestamps
returned in reception reports from other receivers to measure
round-trip propagation to those receivers. Receivers should
expect that the measurement accuracy of the timestamp may be
limited to far less than the resolution of the NTP timestamp. The
measurement uncertainty of the timestamp is not indicated as it
may not be known. On a system that has no notion of wallclock
time but does have some system-specific clock such as "system
uptime", a sender MAY use that clock as a reference to calculate
relative NTP timestamps. It is important to choose a commonly
used clock so that if separate implementations are used to produce
the individual streams of a multimedia session, all
implementations will use the same clock. Until the year 2036,
relative and absolute timestamps will differ in the high bit so
(invalid) comparisons will show a large difference; by then one
hopes relative timestamps will no longer be needed. A sender that
has no notion of wallclock or elapsed time MAY set the NTP
timestamp to zero.
RTP timestamp: 32 bits
Corresponds to the same time as the NTP timestamp (above), but in
the same units and with the same random offset as the RTP
timestamps in data packets. This correspondence may be used for
intra- and inter-media synchronization for sources whose NTP
timestamps are synchronized, and may be used by media-independent
receivers to estimate the nominal RTP clock frequency. Note that
in most cases this timestamp will not be equal to the RTP
timestamp in any adjacent data packet. Rather, it MUST be
calculated from the corresponding NTP timestamp using the
relationship between the RTP timestamp counter and real time as
maintained by periodically checking the wallclock time at a
sampling instant.
sender's packet count: 32 bits
The total number of RTP data packets transmitted by the sender
since starting transmission up until the time this SR packet was
generated. The count SHOULD be reset if the sender changes its
SSRC identifier.
sender's octet count: 32 bits
The total number of payload octets (i.e., not including header or
padding) transmitted in RTP data packets by the sender since
starting transmission up until the time this SR packet was
generated. The count SHOULD be reset if the sender changes its
SSRC identifier. This field can be used to estimate the average
payload data rate.
The third section contains zero or more reception report blocks
depending on the number of other sources heard by this sender since
the last report. Each reception report block conveys statistics on
the reception of RTP packets from a single synchronization source.
Receivers SHOULD NOT carry over statistics when a source changes its
SSRC identifier due to a collision. These statistics are:
SSRC_n (source identifier): 32 bits
The SSRC identifier of the source to which the information in this
reception report block pertains.
fraction lost: 8 bits
The fraction of RTP data packets from source SSRC_n lost since the
previous SR or RR packet was sent, expressed as a fixed point
number with the binary point at the left edge of the field. (That
is equivalent to taking the integer part after multiplying the
loss fraction by 256.) This fraction is defined to be the number
of packets lost divided by the number of packets expected, as
defined in the next paragraph. An implementation is shown in
Appendix A.3. If the loss is negative due to duplicates, the
fraction lost is set to zero. Note that a receiver cannot tell
whether any packets were lost after the last one received, and
that there will be no reception report block issued for a source
if all packets from that source sent during the last reporting
interval have been lost.
cumulative number of packets lost: 24 bits
The total number of RTP data packets from source SSRC_n that have
been lost since the beginning of reception. This number is
defined to be the number of packets expected less the number of
packets actually received, where the number of packets received
includes any which are late or duplicates. Thus, packets that
arrive late are not counted as lost, and the loss may be negative
if there are duplicates. The number of packets expected is
defined to be the extended last sequence number received, as
defined next, less the initial sequence number received. This may
be calculated as shown in Appendix A.3.
extended highest sequence number received: 32 bits
The low 16 bits contain the highest sequence number received in an
RTP data packet from source SSRC_n, and the most significant 16
bits extend that sequence number with the corresponding count of
sequence number cycles, which may be maintained according to the
algorithm in Appendix A.1. Note that different receivers within
the same session will generate different extensions to the
sequence number if their start times differ significantly.
interarrival jitter: 32 bits
An estimate of the statistical variance of the RTP data packet
interarrival time, measured in timestamp units and expressed as an
unsigned integer. The interarrival jitter J is defined to be the
mean deviation (smoothed absolute value) of the difference D in
packet spacing at the receiver compared to the sender for a pair
of packets. As shown in the equation below, this is equivalent to
the difference in the "relative transit time" for the two packets;
the relative transit time is the difference between a packet's RTP
timestamp and the receiver's clock at the time of arrival,
measured in the same units.
If Si is the RTP timestamp from packet i, and Ri is the time of
arrival in RTP timestamp units for packet i, then for two packets
i and j, D may be expressed as
D(i,j) = (Rj - Ri) - (Sj - Si) = (Rj - Sj) - (Ri - Si)
The interarrival jitter SHOULD be calculated continuously as each
data packet i is received from source SSRC_n, using this
difference D for that packet and the previous packet i-1 in order
of arrival (not necessarily in sequence), according to the formula
J(i) = J(i-1) + (|D(i-1,i)| - J(i-1))/16
Whenever a reception report is issued, the current value of J is
sampled.
The jitter calculation MUST conform to the formula specified here
in order to allow profile-independent monitors to make valid
interpretations of reports coming from different implementations.
This algorithm is the optimal first-order estimator and the gain
parameter 1/16 gives a good noise reduction ratio while
maintaining a reasonable rate of convergence [22]. A sample
implementation is shown in Appendix A.8. See Section 6.4.4 for a
discussion of the effects of varying packet duration and delay
before transmission.
last SR timestamp (LSR): 32 bits
The middle 32 bits out of 64 in the NTP timestamp (as explained in
Section 4) received as part of the most recent RTCP sender report
(SR) packet from source SSRC_n. If no SR has been received yet,
the field is set to zero.
delay since last SR (DLSR): 32 bits
The delay, expressed in units of 1/65536 seconds, between
receiving the last SR packet from source SSRC_n and sending this
reception report block. If no SR packet has been received yet
from SSRC_n, the DLSR field is set to zero.
Let SSRC_r denote the receiver issuing this receiver report.
Source SSRC_n can compute the round-trip propagation delay to
SSRC_r by recording the time A when this reception report block is
received. It calculates the total round-trip time A-LSR using the
last SR timestamp (LSR) field, and then subtracting this field to
leave the round-trip propagation delay as (A - LSR - DLSR). This
is illustrated in Fig. 2. Times are shown in both a hexadecimal
representation of the 32-bit fields and the equivalent floating-
point decimal representation. Colons indicate a 32-bit field
divided into a 16-bit integer part and 16-bit fraction part.
This may be used as an approximate measure of distance to cluster
receivers, although some links have very asymmetric delays.
[10 Nov 1995 11:33:25.125 UTC] [10 Nov 1995 11:33:36.5 UTC]
n SR(n) A=b710:8000 (46864.500 s)
---------------------------------------------------------------->
v ^
ntp_sec =0xb44db705 v ^ dlsr=0x0005:4000 ( 5.250s)
ntp_frac=0x20000000 v ^ lsr =0xb705:2000 (46853.125s)
(3024992005.125 s) v ^
r v ^ RR(n)
---------------------------------------------------------------->
|<-DLSR->|
(5.250 s)
A 0xb710:8000 (46864.500 s)
DLSR -0x0005:4000 ( 5.250 s)
LSR -0xb705:2000 (46853.125 s)
-------------------------------
delay 0x0006:2000 ( 6.125 s)