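Network Model
A congestion control algorithm has to resolve congestion in network transmission while using the available bandwidth as efficiently as possible. Based on research into real networks, BBR simplifies the network into the following model.
Abstract model:
A network path behaves like a pipe with one narrowest point. When the sending rate exceeds what this narrowest point can carry, a queue starts to build at the bottleneck; when the queue outgrows the bottleneck buffer, packets are dropped.
Key concepts:
- BtlBw: the bottleneck bandwidth, i.e. the narrowest point of the pipe, comparable to its smallest diameter.
- RTprop: the round-trip propagation time, i.e. the time one packet takes to make a round trip through the pipe when there is no queuing at all (the absence of queuing is the key point), comparable to the length of the pipe.
- BDP (bandwidth-delay product): the amount of data that has been sent but not yet acknowledged (called inflight) at the moment the pipe is exactly full. BDP = BtlBw * RTprop. For example, with BtlBw = 10 and RTprop = 6, BDP = 60: 30 units are still travelling through the pipe and another 30 have already arrived, but the sender has not yet received acks for any of these 60 units, because each ack needs 3 time units to travel back.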
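The BDP arithmetic in the example can be written down directly. This is a minimal sketch in abstract units (data units per tick, ticks per round trip); the function names are made up for illustration, and the forward/return split assumes a symmetric path.

```c
#include <stdint.h>

/* Bandwidth-delay product: the amount of data in flight when the pipe is
 * exactly full. btl_bw is in data units per tick, rt_prop in ticks. */
uint64_t bdp_units(uint64_t btl_bw, uint64_t rt_prop)
{
    return btl_bw * rt_prop;
}

/* Data still travelling toward the receiver, assuming the forward trip takes
 * half of rt_prop (integer division, so an odd rt_prop is rounded down). */
uint64_t forward_in_pipe(uint64_t btl_bw, uint64_t rt_prop)
{
    return btl_bw * (rt_prop / 2);
}
```

With BtlBw = 10 and RTprop = 6, `bdp_units` gives 60 and `forward_in_pipe` gives 30; the remaining 30 units have arrived but their acks are still on the way back.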
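Three regimes:
- app limited: little data is being sent and inflight is below the BDP. The sender can still raise its sending rate (bursts may even exceed BtlBw) without building a queue, and behaviour is governed by RTprop alone.
- bandwidth limited: the pipe is full and inflight begins to exceed the BDP. The node at the bottleneck has a buffer, so any further data queues up there, and the delivery rate is now capped by BtlBw.
- buffer limited: once the bottleneck buffer is full, packets start to be dropped.
Loss-based congestion control algorithms act in the third regime, which is clearly late: by then the RTT is already high.
Leonard Kleinrock showed that the optimal operating point is at the BDP line (the boundary between the first two regimes), but Jeffrey M. Jaffe proved that no distributed algorithm can converge to that point.
BBR, proposed by a team at Google, operates in the second regime, close to the BDP line. In other words, it needs a small amount of queuing in the bottleneck buffer in order to detect BtlBw.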
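The three regimes can be told apart by comparing inflight with the BDP and with the bottleneck buffer size. The sketch below is only an illustration of that comparison: the type and function names are hypothetical, and a real sender cannot observe the buffer size directly.

```c
#include <stdint.h>

typedef enum {
    REGIME_APP_LIMITED,       /* inflight < BDP: no queue, RTT == RTprop */
    REGIME_BANDWIDTH_LIMITED, /* queue is building at the bottleneck */
    REGIME_BUFFER_LIMITED     /* queue exceeds the buffer: packets are lost */
} regime_t;

/* Classify the operating regime. bottleneck_buffer is the buffer capacity at
 * the bottleneck, in the same units as inflight and bdp. */
regime_t classify_regime(uint64_t inflight, uint64_t bdp, uint64_t bottleneck_buffer)
{
    if (inflight < bdp) {
        return REGIME_APP_LIMITED;
    }
    if (inflight <= bdp + bottleneck_buffer) {
        return REGIME_BANDWIDTH_LIMITED;
    }
    return REGIME_BUFFER_LIMITED;
}
```

With a BDP of 60 and a buffer of 40, an inflight of 30 is app limited, 100 is bandwidth limited, and 101 is buffer limited. Loss-based algorithms only react at that last boundary.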
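The BBR Algorithm
BBR does two things:
- Periodically probe the bottleneck bandwidth BtlBw
To probe the bottleneck bandwidth, the sender has to enter the second regime and let a small queue build in the buffer. BBR therefore uses a pacing_gain, e.g. pacing_gain_cycle = { 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.25, 0.75 }, to probe upward for BtlBw periodically. To clear the queue quickly, each upward probe (1.25) is immediately followed by a reduction (0.75), each lasting roughly one RTT. When the link bandwidth suddenly increases, the BtlBw estimate grows exponentially as 1.25^n, so it tracks the new bandwidth very quickly.
- Periodically probe RTprop
To probe RTprop, the sender has to enter the first regime: the bottleneck queue must be drained and the pipe must not be full, i.e. inflight < BDP.
Probing RTprop therefore reduces throughput, so it runs on a comparatively long period, typically 10 s.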
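The claim that the estimate grows as 1.25^n can be checked with a small calculation. The sketch below assumes an idealized case in which every upward probe raises the bandwidth estimate by the full probe gain; `probes_needed` is a hypothetical helper, not part of any BBR implementation.

```c
/* Number of upward probes needed for the estimate to reach target_bw,
 * assuming each probe multiplies the estimate by probe_gain.
 * Returns -1 if the inputs cannot make progress. */
int probes_needed(double current_bw, double target_bw, double probe_gain)
{
    int n = 0;

    if (probe_gain <= 1.0 || current_bw <= 0.0) {
        return -1;
    }
    while (current_bw < target_bw) {
        current_bw *= probe_gain;
        n++;
    }
    return n;
}
```

Under this assumption, a link whose capacity doubles (say from 10 to 20) is matched after four probes, since 1.25^3 is about 1.95 and 1.25^4 is about 2.44. With one probe per eight-slot gain cycle, that is on the order of a few dozen round trips.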
Send Window
send window = cwnd_gain * BDP = cwnd_gain * BtlBw * RTprop
When probing BtlBw, cwnd_gain is increased so that a small queue builds in the buffer;
when probing RTprop, cwnd_gain is reduced so that the buffer drains.
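Assuming the send window is the BDP scaled by cwnd_gain, which is how BBR is usually described, the effect of the gain can be sketched as follows. The function name and the choice of units (bytes per second, microseconds) are assumptions for this example; it is a simplified stand-in, not picoquic's BBRInflight.

```c
#include <stdint.h>

/* Target send window in bytes: cwnd_gain * BtlBw * RTprop.
 * btl_bw_bytes_per_sec is the bottleneck bandwidth estimate,
 * rt_prop_us the round-trip propagation time in microseconds. */
uint64_t send_window_bytes(double cwnd_gain, uint64_t btl_bw_bytes_per_sec,
    uint64_t rt_prop_us)
{
    double bdp = (double)btl_bw_bytes_per_sec * (double)rt_prop_us / 1000000.0;

    return (uint64_t)(cwnd_gain * bdp);
}
```

For a 10 Mbit/s bottleneck (1,250,000 bytes/s) and a 40 ms RTprop, the BDP is 50,000 bytes. A gain of 1.0 keeps the window at one BDP, while a gain of 1.5 allows 75,000 bytes in flight, i.e. up to half a BDP of queue.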
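Four Phases of Congestion Control
- Startup: covers the app limited regime and part of the bandwidth limited regime. TCP congestion control calls this slow start.
- Drain: inflight > BDP; the queue is drained and RTprop is measured. TCP congestion control calls this congestion avoidance.
- Bottleneck bandwidth probing: cwnd_gain is increased, the buffer queues up, and BtlBw is measured. TCP congestion control calls this the congestion phase.
- Loss recovery: the buffer queue is full and the link starts dropping packets, so cwnd_gain is reduced to stop the queue from overflowing. (BBR keeps the bottleneck buffer from filling up and dropping packets, so it has no such phase.)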
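How BBR moves between its own states (Startup, Drain, ProbeBW, ProbeRTT) can be summarized as a small transition function. This is a sketch under simplifying assumptions: the enum, struct, and function names are invented for the example, and each condition is reduced to a boolean signal that a real implementation would derive from its measurements.

```c
typedef enum { ST_STARTUP, ST_DRAIN, ST_PROBE_BW, ST_PROBE_RTT } bbr_phase_t;

typedef struct {
    int filled_pipe;     /* bandwidth stopped growing during Startup */
    int queue_drained;   /* inflight has fallen back to the BDP */
    int rt_prop_expired; /* the RTprop estimate is older than the probe period */
    int probe_rtt_done;  /* the RTprop measurement has finished */
} bbr_signals_t;

/* Return the phase that follows cur, given the current signals. */
bbr_phase_t next_phase(bbr_phase_t cur, const bbr_signals_t* s)
{
    /* An expired RTprop estimate pre-empts every other phase. */
    if (cur != ST_PROBE_RTT && s->rt_prop_expired) {
        return ST_PROBE_RTT;
    }
    switch (cur) {
    case ST_STARTUP:
        return s->filled_pipe ? ST_DRAIN : ST_STARTUP;
    case ST_DRAIN:
        return s->queue_drained ? ST_PROBE_BW : ST_DRAIN;
    case ST_PROBE_RTT:
        if (!s->probe_rtt_done) {
            return ST_PROBE_RTT;
        }
        /* Resume where the flow left off. */
        return s->filled_pipe ? ST_PROBE_BW : ST_STARTUP;
    default:
        return ST_PROBE_BW;
    }
}
```

A flow normally runs Startup, then Drain, then stays in ProbeBW, leaving it only for a short ProbeRTT when the RTprop estimate expires.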
Code Analysis
The analysis is based on the open-source picoquic code.
Startup phase:
void BBREnterStartup(picoquic_bbr_state_t* bbr_state)
{
    bbr_state->state = picoquic_bbr_alg_startup;
    /* Ramp up the sending rate quickly during startup */
    bbr_state->pacing_gain = BBR_HIGH_GAIN;
    bbr_state->cwnd_gain = BBR_HIGH_GAIN;
}
Drain phase:
void BBRCheckDrain(picoquic_bbr_state_t* bbr_state, uint64_t bytes_in_transit, uint64_t current_time)
{
    /* Startup -> Drain: inflight has reached the BDP, the pipe is full and a queue has formed in the buffer */
    if (bbr_state->state == picoquic_bbr_alg_startup && bbr_state->filled_pipe) {
        BBREnterDrain(bbr_state);
    }
    /* Drain -> ProbeBW: inflight <= BDP, the buffer no longer holds a queue */
    if (bbr_state->state == picoquic_bbr_alg_drain && bytes_in_transit <= BBRInflight(bbr_state, 1.0)) {
        BBREnterProbeBW(bbr_state, current_time); /* we estimate queue is drained */
    }
}
void BBREnterDrain(picoquic_bbr_state_t* bbr_state)
{
    bbr_state->state = picoquic_bbr_alg_drain;
    /* Entering Drain: lower the sending rate */
    bbr_state->pacing_gain = 1.0 / BBR_HIGH_GAIN; /* pace slowly */
    bbr_state->cwnd_gain = BBR_HIGH_GAIN; /* maintain cwnd */
}
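The filled_pipe flag tested in BBRCheckDrain is set by BBRCheckFullPipe, which is not listed here. As a rough sketch of how such a check can work, the code below follows the rule described in the BBR IETF draft: the pipe is considered full once the bandwidth estimate has failed to grow by at least 25% for three consecutive rounds. The struct and function names are invented for the example, and picoquic's actual implementation may differ.

```c
#include <stdint.h>

typedef struct {
    uint64_t full_bw;  /* estimate at the last significant increase */
    int full_bw_count; /* consecutive rounds without significant growth */
    int filled_pipe;   /* set once the pipe is judged to be full */
} full_pipe_state_t;

/* Call once per ack. Only the first ack of a round that is not
 * application limited is taken into account. */
void check_full_pipe(full_pipe_state_t* s, uint64_t btl_bw, int round_start,
    int app_limited)
{
    if (s->filled_pipe || !round_start || app_limited) {
        return;
    }
    /* btl_bw >= 1.25 * full_bw, written with integers */
    if (btl_bw * 4 >= s->full_bw * 5) {
        s->full_bw = btl_bw;
        s->full_bw_count = 0;
        return;
    }
    s->full_bw_count++;
    if (s->full_bw_count >= 3) {
        s->filled_pipe = 1;
    }
}
```

Fed with per-round estimates of 100, 200, 400, 420, 430, 440, the flag is set on the last sample: the estimate stopped growing by 25% after reaching 400, and 440 is the third round in a row below the 500 threshold.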
Bottleneck bandwidth probing phase:
/* Decide whether to move on to the next pacing_gain */
int BBRIsNextCyclePhase(picoquic_bbr_state_t* bbr_state, uint64_t prior_in_flight, uint64_t packets_lost, uint64_t current_time)
{
    /* Each step of the cycle must last longer than RTprop */
    int is_full_length = (current_time - bbr_state->cycle_stamp) > bbr_state->rt_prop;
    if (bbr_state->pacing_gain != 1.0) {
        if (bbr_state->pacing_gain > 1.0) {
            /* Probing up ends once packets are lost (buffer limited regime) or inflight has reached the probing target */
            is_full_length &=
                (packets_lost > 0 ||
                    prior_in_flight >= BBRInflight(bbr_state, bbr_state->pacing_gain));
        }
        else { /* (BBR.pacing_gain < 1) */
            /* Probing down ends once inflight is back at the BDP, i.e. the queue has drained */
            is_full_length &= prior_in_flight <= BBRInflight(bbr_state, 1.0);
        }
    }
    return is_full_length;
}
/* Move on to the next pacing_gain; pacing_gain_cycle = { 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.25, 0.75 } */
void BBRAdvanceCyclePhase(picoquic_bbr_state_t* bbr_state, uint64_t current_time)
{
    bbr_state->cycle_stamp = current_time;
    bbr_state->cycle_index++;
    /* One full cycle has completed */
    if (bbr_state->cycle_index >= BBR_GAIN_CYCLE_LEN) {
        int start = bbr_state->cycle_start;
        /* The bottleneck bandwidth grew: start the next cycle later in the table so the upward probe comes sooner */
        if (bbr_state->btl_bw_increased) {
            bbr_state->btl_bw_increased = 0;
            start++;
            if (start > BBR_GAIN_CYCLE_MAX_START) {
                start = BBR_GAIN_CYCLE_MAX_START;
            }
        }
        else if (start > 0) {
            /* No growth: step the start index back toward the origin */
            start--;
        }
        bbr_state->cycle_index = start;
        bbr_state->cycle_start = start;
    }
    bbr_state->pacing_gain = bbr_pacing_gain_cycle[bbr_state->cycle_index];
}
void BBREnterProbeBW(picoquic_bbr_state_t* bbr_state, uint64_t current_time)
{
    int start = 0;
    bbr_state->state = picoquic_bbr_alg_probe_bw;
    bbr_state->pacing_gain = 1.0;
    bbr_state->cwnd_gain = 1.5;
    /* Start cycling through pacing_gain */
    if (bbr_state->rt_prop > PICOQUIC_TARGET_RENO_RTT) {
        start = (int)(bbr_state->rt_prop / PICOQUIC_TARGET_RENO_RTT);
        if (start > BBR_GAIN_CYCLE_MAX_START) {
            start = BBR_GAIN_CYCLE_MAX_START;
        }
    }
    else {
        start = 2;
    }
    bbr_state->cycle_index = start;
    bbr_state->cycle_start = start;
    bbr_state->btl_bw_increased = 1;
    BBRAdvanceCyclePhase(bbr_state, current_time);
}
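To see how cycle_start changes the shape of the gain cycle, the index logic of BBRAdvanceCyclePhase can be replayed in isolation. The sketch below copies that logic into a standalone function; the names are hypothetical, and the value 5 for the maximum start index is an assumption, since BBR_GAIN_CYCLE_MAX_START is not shown in the listing.

```c
#define GAIN_CYCLE_LEN 8
#define GAIN_CYCLE_MAX_START 5 /* assumed value, not taken from the listing */

static const double gain_cycle[GAIN_CYCLE_LEN] = {
    1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.25, 0.75
};

typedef struct {
    int cycle_index;
    int cycle_start;
    int btl_bw_increased;
} cycle_state_t;

/* Advance to the next slot of the cycle and return its pacing gain. */
double advance_cycle(cycle_state_t* c)
{
    c->cycle_index++;
    if (c->cycle_index >= GAIN_CYCLE_LEN) {
        int start = c->cycle_start;
        if (c->btl_bw_increased) {
            /* Bandwidth grew: skip one more steady slot next time. */
            c->btl_bw_increased = 0;
            start++;
            if (start > GAIN_CYCLE_MAX_START) {
                start = GAIN_CYCLE_MAX_START;
            }
        }
        else if (start > 0) {
            start--;
        }
        c->cycle_index = start;
        c->cycle_start = start;
    }
    return gain_cycle[c->cycle_index];
}
```

Starting from the state BBREnterProbeBW sets up for a short path (index 2, start 2, btl_bw_increased set), successive calls return 1.0, 1.0, 1.0, 1.25, 0.75 and then wrap to index 3. The larger the start index, the fewer steady slots separate two upward probes.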
/* Track the round count using the "delivered" counter. The value carried per
 * packet is the delivered count when this packet was sent. If it is greater
 * than next_round_delivered, it means that the packet was sent at or after
 * the beginning of the round, and thus that at least one RTT has elapsed
 * for this round. */
void BBRUpdateBtlBw(picoquic_bbr_state_t* bbr_state, picoquic_path_t* path_x)
{
    uint64_t bandwidth_estimate = path_x->bandwidth_estimate;
    /* During startup, never let the sample fall below half of the maximum estimate, so the rate keeps ramping up quickly */
    if (bbr_state->state == picoquic_bbr_alg_startup &&
        bandwidth_estimate < (path_x->max_bandwidth_estimate / 2)) {
        bandwidth_estimate = path_x->max_bandwidth_estimate / 2;
    }
    /* Each packet records the 'delivered' count at the time it was sent. If the value carried by the acked packet has reached next_round_delivered, the packet belongs to a new round */
    if (path_x->delivered_last_packet >= bbr_state->next_round_delivered)
    {
        bbr_state->next_round_delivered = path_x->delivered;
        bbr_state->round_count++;
        bbr_state->round_start = 1;
    }
    else {
        bbr_state->round_start = 0;
    }
    /* Make room for the bandwidth value of the new round */
    if (bbr_state->round_start) {
        /* Forget the oldest BW round, shift by 1, compute the max BTL_BW for
         * the remaining rounds, set current round max to current value */
        bbr_state->btl_bw = 0;
        for (int i = BBR_BTL_BW_FILTER_LENGTH - 2; i >= 0; i--) {
            uint64_t b = bbr_state->btl_bw_filter[i];
            bbr_state->btl_bw_filter[i + 1] = b;
            if (b > bbr_state->btl_bw) {
                bbr_state->btl_bw = b;
            }
        }
        bbr_state->btl_bw_filter[0] = 0;
    }
    /* The bottleneck bandwidth is the largest acked bitrate seen in the window */
    if (bandwidth_estimate > bbr_state->btl_bw_filter[0]) {
        bbr_state->btl_bw_filter[0] = bandwidth_estimate;
        if (bandwidth_estimate > bbr_state->btl_bw) {
            bbr_state->btl_bw = bandwidth_estimate;
            bbr_state->btl_bw_increased = 1;
        }
    }
}
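The filter in BBRUpdateBtlBw is a windowed maximum: one slot per round, with the oldest round dropped whenever a new one starts. The standalone sketch below isolates that idea so its expiry behaviour can be checked. The names are hypothetical, and the window of 10 rounds is an assumption, since BBR_BTL_BW_FILTER_LENGTH is not shown in the listing.

```c
#include <stdint.h>

#define FILTER_LEN 10 /* assumed window length, in rounds */

typedef struct {
    uint64_t slot[FILTER_LEN]; /* slot[0] is the current round */
    uint64_t max;              /* maximum over the whole window */
} max_filter_t;

/* Start a new round: drop the oldest slot, shift the rest by one and
 * recompute the maximum over what remains. */
void filter_new_round(max_filter_t* f)
{
    f->max = 0;
    for (int i = FILTER_LEN - 2; i >= 0; i--) {
        uint64_t b = f->slot[i];
        f->slot[i + 1] = b;
        if (b > f->max) {
            f->max = b;
        }
    }
    f->slot[0] = 0;
}

/* Record a bandwidth sample in the current round. */
void filter_add_sample(max_filter_t* f, uint64_t sample)
{
    if (sample > f->slot[0]) {
        f->slot[0] = sample;
    }
    if (sample > f->max) {
        f->max = sample;
    }
}
```

If one round reports 100 and every later round reports 50, the maximum stays at 100 for the next nine rounds and drops to 50 on the tenth. This is why a bandwidth decrease is only reflected after the window has turned over, while an increase is picked up immediately.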
Periodic RTprop probing:
void BBRCheckProbeRTT(picoquic_bbr_state_t* bbr_state, picoquic_path_t* path_x, uint64_t bytes_in_transit, uint64_t current_time)
{
    /* The RTprop estimate has expired (rt_prop_expired): enter RTprop probing */
    if (bbr_state->state != picoquic_bbr_alg_probe_rtt &&
        bbr_state->rt_prop_expired &&
        !bbr_state->idle_restart) {
        BBREnterProbeRTT(bbr_state);
        bbr_state->prior_cwnd = BBRSaveCwnd(bbr_state, path_x);
        bbr_state->probe_rtt_done_stamp = 0;
    }
    /* While probing, measure RTprop */
    if (bbr_state->state == picoquic_bbr_alg_probe_rtt) {
        BBRHandleProbeRTT(bbr_state, path_x, bytes_in_transit, current_time);
        bbr_state->idle_restart = 0;
    }
}
void BBREnterProbeRTT(picoquic_bbr_state_t* bbr_state)
{
    bbr_state->state = picoquic_bbr_alg_probe_rtt;
    /* Shrink the send window to one BDP so the queue starts to drain */
    bbr_state->pacing_gain = 1.0;
    bbr_state->cwnd_gain = 1.0;
}
void BBRHandleProbeRTT(picoquic_bbr_state_t* bbr_state, picoquic_path_t * path_x, uint64_t bytes_in_transit, uint64_t current_time)
{
#if 0
    /* Ignore low rate samples during ProbeRTT: */
    C.app_limited =
        (BW.delivered + bytes_in_transit) ? 0 : 1;
#endif
    /* Once inflight is down to about 4 packets, start timing the RTprop measurement */
    if (bbr_state->probe_rtt_done_stamp == 0 &&
        bytes_in_transit <= BBR_MIN_PIPE_CWND(path_x->send_mtu)) {
        bbr_state->probe_rtt_done_stamp =
            current_time + BBR_PROBE_RTT_DURATION;
        bbr_state->probe_rtt_round_done = 0;
        bbr_state->next_round_delivered = path_x->delivered;
    }
    else if (bbr_state->probe_rtt_done_stamp != 0) {
        if (bbr_state->round_start) {
            bbr_state->probe_rtt_round_done = 1;
        }
        if (bbr_state->probe_rtt_round_done &&
            current_time > bbr_state->probe_rtt_done_stamp) {
            bbr_state->rt_prop_stamp = current_time;
            BBRRestoreCwnd(bbr_state, path_x);
            BBRExitProbeRTT(bbr_state, current_time);
        }
    }
}
/* This will use one way samples if available */
/* Should augment that with common RTT filter to suppress jitter */
void BBRUpdateRTprop(picoquic_bbr_state_t* bbr_state, uint64_t rtt_sample, uint64_t current_time)
{
    bbr_state->rt_prop_expired =
        current_time > bbr_state->rt_prop_stamp + BBR_PROBE_RTT_INTERVAL;
    /* A new minimum RTT sample, or any sample once the old value has expired, updates RTprop */
    if (rtt_sample <= bbr_state->rt_prop || bbr_state->rt_prop_expired) {
        bbr_state->rt_prop = rtt_sample;
        bbr_state->rt_prop_stamp = current_time;
    }
    else {
        uint64_t delta = rtt_sample - bbr_state->rt_prop;
        if (20 * delta < bbr_state->rt_prop) {
            bbr_state->rt_prop_stamp = current_time;
        }
    }
}
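The expiry behaviour of the RTprop estimate is easiest to follow on a short timeline. The sketch below keeps only the core of BBRUpdateRTprop, a minimum that is replaced once it is older than the probe interval; it leaves out the branch that refreshes the timestamp for samples within 5% of the minimum. The names are hypothetical, and the 10 s interval is taken from the description earlier in the text rather than from the value of BBR_PROBE_RTT_INTERVAL.

```c
#include <stdint.h>

#define PROBE_RTT_INTERVAL_US 10000000ull /* 10 s, assumed */

typedef struct {
    uint64_t rt_prop; /* current minimum RTT estimate, microseconds */
    uint64_t stamp;   /* time at which rt_prop was last set */
    int expired;      /* set when the estimate is older than the interval */
} rtprop_filter_t;

/* Feed one RTT sample taken at time now_us. */
void rtprop_update(rtprop_filter_t* r, uint64_t sample_us, uint64_t now_us)
{
    r->expired = now_us > r->stamp + PROBE_RTT_INTERVAL_US;
    if (sample_us <= r->rt_prop || r->expired) {
        r->rt_prop = sample_us;
        r->stamp = now_us;
    }
}
```

Starting from an empty filter (rt_prop set to UINT64_MAX), a 50 ms sample is accepted, a later 60 ms sample is ignored, and a 40 ms sample at t = 3 s becomes the new minimum. A 70 ms sample arriving just after t = 13 s replaces it, because the 40 ms value is by then more than 10 s old.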
BBR spends most of its time probing the bottleneck bandwidth BtlBw and the path length RTprop.
On every ack it checks both the pacing_gain cycle and the RTprop probing period:
void BBRUpdateModelAndState(picoquic_bbr_state_t* bbr_state, picoquic_path_t* path_x,
    uint64_t rtt_sample, uint64_t bytes_in_transit, uint64_t packets_lost, uint64_t current_time)
{
    /* Bottleneck bandwidth probing */
    BBRUpdateBtlBw(bbr_state, path_x);
    BBRCheckCyclePhase(bbr_state, packets_lost, current_time);
    BBRCheckFullPipe(bbr_state, path_x->last_bw_estimate_path_limited);
    BBRCheckDrain(bbr_state, bytes_in_transit, current_time);
    /* RTprop probing */
    BBRUpdateRTprop(bbr_state, rtt_sample, current_time);
    BBRCheckProbeRTT(bbr_state, path_x, bytes_in_transit, current_time);
}
Evaluating BBR
To be completed.