The Bandwidth value in mp4dash-generated MPD files and its effect on client bitrate selection

Observation

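While setting up a DASH video system (server and player), I found that the MPD files generated by MP4Box and by mp4dash report different video bitrates (the bandwidth attribute).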

The video bitrates reported by ffmpeg for each resolution (kbps):

  • 1920x1080 (1080p): 3988.49
  • 1280x720 (720p): 1983.08
  • 896x504 (480p): 1131.42
  • 640x360 (360p): 676.67
  • 256x144 (144p): 147.76

The video bitrates in the MPD file generated by MP4Box (kbps):

  • 1920x1080 (1080p): 3988.497
  • 1280x720 (720p): 1983.089
  • 896x504 (480p): 1131.432
  • 640x360 (360p): 676.676
  • 256x144 (144p): 147.765

The MP4Box values essentially match the bitrates of the ffmpeg encoding output (for details see Installing GPAC (MP4Box) on Ubuntu | Building a DASH video system with MP4Box).

The video bitrates in the MPD file generated by mp4dash (kbps):

  • 1920x1080 (1080p): 16079.970
  • 1280x720 (720p): 7753.362
  • 896x504 (480p): 4310.870
  • 640x360 (360p): 2391.544
  • 256x144 (144p): 402.408

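Every bitrate in the mp4dash-generated MPD is higher than the corresponding ffmpeg/MP4Box value, by a factor of roughly 2.7 (144p) to 4.0 (1080p).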
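The size of the gap can be checked directly from the numbers listed above. The short Python sketch below only divides the mp4dash values by the ffmpeg values; the dictionary names are illustrative.

```python
# Bitrates in kbps, copied from the lists above.
ffmpeg_kbps = {
    "1080p": 3988.49,
    "720p": 1983.08,
    "480p": 1131.42,
    "360p": 676.67,
    "144p": 147.76,
}
mp4dash_kbps = {
    "1080p": 16079.970,
    "720p": 7753.362,
    "480p": 4310.870,
    "360p": 2391.544,
    "144p": 402.408,
}

# Ratio of the bandwidth declared by mp4dash to the average bitrate from ffmpeg.
ratios = {name: mp4dash_kbps[name] / ffmpeg_kbps[name] for name in ffmpeg_kbps}

for name, ratio in ratios.items():
    print(f"{name}: {ratio:.2f}x")
```

The ratio shrinks as the resolution drops, so the gap is not a fixed scaling factor.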


Cause

Fortunately, others have reported similar issues on the Bento4 GitHub project, and the project author answered them. See:

  • mp4dash: Definition of required bandwidth
  • How Bandwidth is calculated and written in Manifest file
  • HLS fmp4 “BANDWIDTH” calculation

2016.08.11:

The required bandwidth calculation is somewhat complicated. What this value represents is the bandwidth value for which, if the throughput remains constant as that value there should never be an underflow situation. The client is only required to buffer @minBufferTime worth of data. In theory, a precise calculation for this would require looking at every frame, and taking the possible frame reordering into account. But the current method isn’t quite that complicated. It looks at the minBufferTime value and the individual segment sizes, and finds a value for which the client buffer would never go empty. This is better than just taking the average segment bitrate (which would be wrong, since there are often peaks), but not quite as precise as looking at individual frames.

2017.06.29:

The reason you are seeing a different value for the bandwidth in the MPD and from ffmpeg/mediainfo is because the MPD value is computed in a different way, in order to comply with the specification. The bandwidth reported by ffmpeg or mediainfo is the average bandwidth for the stream (number of bytes divided by duration), whereas for the MPD the calculation is done by segment, and also based on the buffer model, that takes the minBufferTime into account. To be more specific, the value is an indication to the player that if it starts playing after having buffered ‘minBufferTime’, and if the network bandwidth is exactly the value in the MPD, then the buffer will never completely empty. So if your encoder creates segments for which the average bitrate for the segment is higher, or if within a segment you have a higher bitrate at some point of the the segment than other points, you will see the MPD bandwidth value be somewhat different from the stream’s average bandwidth.

2018.01.02:

The bandwidth calculation for DASH streams is actually not that straightforward. The current version of the packager uses method that should be fairly close to what’s expected: it is based on the minBufferTime value for the MPD, and the size of the frames found in the media. The rule is that the bandwidth value should be such that if a player had exactly that constant bandwidth, and respected the minBufferTime value, it would never underflow. So the peak media bandwidth isn’t really a good indicator, nor is the average bandwidth. Bento4 looks at a variable buffer over time, and tries to compute what value of the bandwidth would be able to guarantee never to underflow. If you change the minBufferTime value, the bandwidth calculation will change.

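I do not understand all the details, and the calculation seems to have changed slightly between versions (segment sizes in the 2016 answer, frame sizes in the 2018 answer). The gist is that mp4dash takes both minBufferTime and the size of the video data into account when computing bandwidth for the MPD.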

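The rule is: track how the buffer level changes over time and, given minBufferTime, compute the constant player bandwidth that guarantees the buffer never empties. In other words, if the player's bandwidth is constantly equal to the bandwidth value in the MPD and playback starts once minBufferTime worth of media has been buffered, the buffer never runs empty. bandwidth is therefore neither the peak bitrate of the stream nor its average bitrate.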
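To illustrate the idea, here is a minimal Python sketch of a segment-level buffer model. It is not Bento4's actual code. It assumes that the client downloads at a constant rate, that playback may start at any segment after waiting minBufferTime, and that each segment must be fully downloaded by the time it starts playing. The function name required_bandwidth and the sample segment sizes are made up for illustration.

```python
def required_bandwidth(segment_bits, segment_durations, min_buffer_time):
    """Smallest constant bandwidth (bits/s) that never underflows
    in this simplified, segment-granularity model."""
    if min_buffer_time <= 0:
        raise ValueError("min_buffer_time must be positive")
    required = 0.0
    count = len(segment_bits)
    for start in range(count):       # playback may start at any segment
        bits_needed = 0.0            # bits downloaded since `start`
        elapsed = 0.0                # media time played since `start`
        for k in range(start, count):
            bits_needed += segment_bits[k]
            # Segment k starts playing min_buffer_time + elapsed seconds
            # after the download began, so it must have arrived by then.
            deadline = min_buffer_time + elapsed
            required = max(required, bits_needed / deadline)
            elapsed += segment_durations[k]
    return required


# Hypothetical stream: four 2-second segments, the third one is a peak.
segment_bits = [2_000_000, 2_000_000, 8_000_000, 2_000_000]
segment_durations = [2.0, 2.0, 2.0, 2.0]

average = sum(segment_bits) / sum(segment_durations)
print("average bitrate:", average)
print("required, MBT = 2 s:", required_bandwidth(segment_bits, segment_durations, 2.0))
print("required, MBT = 4 s:", required_bandwidth(segment_bits, segment_durations, 4.0))
```

In this toy stream the required bandwidth exceeds the average bitrate for both minBufferTime values, and it drops when minBufferTime grows. This matches the author's remark that changing minBufferTime changes the computed bandwidth.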

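Note that mp4dash has a --min-buffer-time= option (see mp4dash); by default the value is computed automatically. Overriding the default with this option also changes the bandwidth values in the MPD file.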


One More Thing

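Having written this far, I became curious: how does the bandwidth value in the MPD actually affect the client's bitrate decision?

To find out, I went to the DASH-IF guidelines, Guidelines for Implementation: DASH-IF Interoperability Points, which cover exactly this.

First, the description of minBufferTime and bandwidth in the MPD (p. 40, section 3.2.8):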

The MPD contains a pair of values for a bandwidth and buffering description, namely the Minimum Buffer Time ($MBT$) expressed by the value of MPD@minBufferTime and bandwidth ($BW$) expressed by the value of Representation@bandwidth. The following holds:

  • the value of the minimum buffer time does not provide any instructions to the client on how long to buffer the media. The value however describes how much buffer a client should have under ideal network conditions. As such, $MBT$ is not describing the burstiness or jitter in the network, it is describing the burstiness or jitter in the content encoding. Together with the $BW$ value, it is a property of the content. Using the “leaky bucket” model, it is the size of the bucket that makes $BW$ true, given the way the content is encoded.
  • The minimum buffer time provides information that for each Stream Access Point (and in the case of DASH-IF therefore each start of the Media Segment), the property of the stream: If the Representation (starting at any segment) is delivered over a constant bitrate channel with bitrate equal to value of the $BW$ attribute, then each presentation time $PT$ is available at the client latest at time with a delay of at most $PT + MBT$.
  • In the absence of any other guidance, the $MBT$ should be set to the maximum GOP size (coded video sequence) of the content, which quite often is identical to the maximum segment duration for the live profile or the maximum subsegment duration for the On-Demand profile. The $MBT$ may be set to a smaller value than maximum (sub)segment duration, but should not be set to a higher value.

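A few points are worth noting:

  • minBufferTime does not tell the client how much media to buffer; it tells the client how much buffer it should hold under ideal network conditions. It describes the bitrate jitter produced by the encoding, not network jitter. My understanding is that when the network bandwidth is constant and equal to bandwidth, playback does not stall as long as the client starts with minBufferTime worth of buffer.
  • In general, minBufferTime should be less than or equal to the maximum segment duration (live) or subsegment duration (on-demand).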

Next, how the client selects a suitable bitrate level (p. 41, section 3.2.8):

A DASH client decides downloading the next segment based on the following status information:

  • the currently available buffer in the media pipeline, $buffer$
  • the currently estimated download rate, $rate$
  • the value of the attribute @minBufferTime, $MBT$
  • the set of values of the @bandwidth attribute for each Representation $i$, $BW[i]$

The task of the client is to select a suitable Representation $i$.
The relevant issue is that starting from a SAP (Stream Access Point) on, the DASH client can continue to playout the data. This means that at the current time it does have $buffer$ data in the buffer. Based on this model the client can download a Representation $i$ for which $BW[i] \le rate*buffer/MBT$ without emptying the buffer.
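The condition quoted above, BW[i] ≤ rate*buffer/MBT, can be sketched in Python as follows. The guideline only states which Representations can be downloaded without emptying the buffer. Picking the highest one that qualifies, and returning None when none qualifies, are assumptions made here for illustration. The function name select_representation and the bitrate ladder are hypothetical.

```python
def select_representation(bandwidths, rate, buffer, mbt):
    """Return the index of the highest-bandwidth Representation that satisfies
    BW[i] <= rate * buffer / MBT, or None if no Representation qualifies.

    bandwidths and rate share one unit (e.g. kbps); buffer and mbt are seconds.
    """
    budget = rate * buffer / mbt
    candidates = [i for i, bw in enumerate(bandwidths) if bw <= budget]
    if not candidates:
        return None  # assumption: the guideline does not cover this case
    return max(candidates, key=lambda i: bandwidths[i])


# Hypothetical ladder in kbps, with MBT = 2 s.
ladder = [500, 1000, 2000, 4000]

print(select_representation(ladder, rate=1500, buffer=2.0, mbt=2.0))  # buffer == MBT
print(select_representation(ladder, rate=1500, buffer=4.0, mbt=2.0))  # larger buffer
print(select_representation(ladder, rate=1500, buffer=1.0, mbt=2.0))  # smaller buffer
```

With the download rate fixed, a larger buffer raises the budget and a smaller buffer lowers it, so the same client moves up or down the ladder as its buffer changes.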

Note that the guideline gives a formula for how the client selects a bitrate level (Representation):
$$BW[i] \le rate*buffer/MBT \tag{1}$$

How should we read this inequality? First rearrange it (assuming $rate > 0$ and $MBT > 0$):
$$\frac{BW[i]}{rate} \le \frac{buffer}{MBT} \tag{2}$$

There are three cases, depending on where 1 falls relative to the two ratios:
$$1 \le \frac{BW[i]}{rate} \le \frac{buffer}{MBT} \tag{3}$$

$$\frac{BW[i]}{rate} \le 1 \le \frac{buffer}{MBT} \tag{4}$$

$$\frac{BW[i]}{rate} \le \frac{buffer}{MBT} \le 1 \tag{5}$$

Case (4) is the easiest to analyze. It is equivalent to:
$$buffer \ge MBT, \quad rate \ge BW[i] \tag{4}$$

This is easy to interpret: when the client's buffer is at least $MBT$ and its bandwidth is at least $BW[i]$, it can select Representation $i$, the one whose bandwidth is $BW[i]$.

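The client's goal is to pick the highest possible video bitrate [1] (that is, the Representation with the highest $BW$) without causing stalls. Case (4) is the safest situation: the buffered video will not be drained by bitrate fluctuations within the segments, and the network bandwidth is high enough to download each new segment within its playback duration. Under this model, choosing that bitrate will not cause a stall.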

With this special case covered, return to cases (3) and (5). By the same reasoning, (3) implies $buffer \ge MBT$ and $rate \le BW[i]$, while (5) implies $buffer \le MBT$ and $rate \ge BW[i]$. In both cases one factor protects against stalls while the other risks causing them. How can the selection still be stall-free? The protective factor must outweigh the risky one. There are two factors, the client buffer and the network bandwidth: when the buffer is small, the bandwidth must be high enough to offset it, and when the bandwidth is low, the buffer must be large enough to offset it. This is the full meaning of inequality (1).
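As a concrete illustration with made-up numbers (assumptions, not taken from the measurements in this article), let $MBT = 2$ s and $BW[i] = 2000$ kbps. A client with $rate = 1600$ kbps and $buffer = 3$ s falls under case (3):

$$1 \le \frac{2000}{1600} = 1.25 \le \frac{3}{2} = 1.5$$

Here the extra buffer compensates for the bandwidth shortfall. A client with $rate = 5000$ kbps and $buffer = 1$ s falls under case (5):

$$\frac{2000}{5000} = 0.4 \le \frac{1}{2} = 0.5 \le 1$$

Here the extra bandwidth compensates for the small buffer. With $rate = 1600$ kbps and $buffer = 2.2$ s, however, $\frac{BW[i]}{rate} = 1.25 > \frac{buffer}{MBT} = 1.1$, so inequality (2) fails and Representation $i$ does not qualify.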

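Finally, back to the question: how does bandwidth in the MPD affect the client's bitrate selection? From the analysis above, raising the bandwidth values in the MPD makes a client that follows rule (1) tend to select Representations with a lower actual bitrate. That makes stalls easier to avoid, but it also lowers video quality. Overall, mp4dash's approach has both advantages and drawbacks.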
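To make this concrete, the Python sketch below applies rule (1) to the two sets of bandwidth values listed at the start of this article. The download rates of 5000 and 2000 kbps and the assumption that buffer equals MBT for both MPDs are illustrative; in practice the two tools may also write different minBufferTime values. The function name pick is hypothetical.

```python
# Bandwidth values in kbps from the two MPDs above, ordered highest first.
LADDER = ["1080p", "720p", "480p", "360p", "144p"]
mp4box_kbps = [3988.497, 1983.089, 1131.432, 676.676, 147.765]
mp4dash_kbps = [16079.970, 7753.362, 4310.870, 2391.544, 402.408]


def pick(bandwidths, rate, buffer_over_mbt=1.0):
    """Return the first (highest) rung with BW[i] <= rate * buffer / MBT."""
    budget = rate * buffer_over_mbt
    for name, bw in zip(LADDER, bandwidths):
        if bw <= budget:
            return name
    return None  # assumption: no rung qualifies


for rate in (5000.0, 2000.0):  # hypothetical download rates in kbps
    print(rate, "MP4Box MPD ->", pick(mp4box_kbps, rate))
    print(rate, "mp4dash MPD ->", pick(mp4dash_kbps, rate))
```

At the same download rate, the client reading the mp4dash MPD lands two rungs lower than the client reading the MP4Box MPD, even though the underlying video files are the same.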


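  1. For simplicity, video quality is assumed here to increase with video bitrate. In practice the relationship is not that simple, because quality gains diminish as the bitrate rises.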
