[Cloud Networking Notes] Congestion

Reference paper: Data Center TCP (DCTCP)

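Traditional TCP congestion control

Traditional TCP congestion control is coarse: slow start, additive increase, multiplicative decrease, as illustrated here: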

[Figure 1]
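
To make the three phases concrete, here is a minimal sketch of how the window might evolve round by round. It is a simplification rather than real TCP: the function name `aimd_trace`, the `ssthresh` default, and the idea of injecting losses at chosen rounds are all assumptions made for illustration, and timeouts and fast retransmit are ignored.

```python
def aimd_trace(rounds, ssthresh=16.0, loss_rounds=frozenset()):
    """Return the congestion window (in segments) at the start of each RTT round.

    Hypothetical helper: losses are injected at the round indices in
    `loss_rounds` instead of being produced by a real network.
    """
    cwnd = 1.0
    trace = []
    for r in range(rounds):
        trace.append(cwnd)
        if r in loss_rounds:
            # Multiplicative decrease: halve the window when a loss is seen.
            ssthresh = max(cwnd / 2.0, 1.0)
            cwnd = ssthresh
        elif cwnd < ssthresh:
            # Slow start: double the window every round, capped at ssthresh.
            cwnd = min(cwnd * 2.0, ssthresh)
        else:
            # Additive increase: grow by one segment per round.
            cwnd += 1.0
    return trace


if __name__ == "__main__":
    # One loss in round 5 produces the familiar sawtooth.
    print(aimd_trace(8, ssthresh=8.0, loss_rounds={5}))
    # → [1.0, 2.0, 4.0, 8.0, 9.0, 10.0, 5.0, 6.0]
```

A single loss wipes out half the window regardless of how mild the congestion was, which is exactly the behaviour the quiz question below criticises.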

What is the problem?

See Question 15 of Cloud Networking Quiz 2:

Question 15
What problems does TCP’s reaction to loss pose?

  1. Loss is a poor signal of persistent congestion. Losses can also occur due to other transient factors.
  2. Multiplicative decrease can be too aggressive – perhaps the sender’s rate is only marginally larger than the available capacity.
  3. Waiting until a loss before reacting to congestion means waiting until buffer occupancy is large. This increases queuing latencies.

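First, TCP treats packet loss alone as evidence of congestion, which is problematic: packets are lost for many reasons, not only because of congestion.
Second, multiplicative decrease is too aggressive. That may be tolerable on the Internet, but in a data center it cuts throughput sharply.
Finally, by the time a packet is dropped the queue is already full, so queueing delay has already risen sharply. An example later in these notes shows how large that delay can be.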

Communication in data centers

The Partition/Aggregate pattern

Workload characteristics

Traffic in a data center falls roughly into two classes:

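  1. Query Traffic
    Queries are small flows, but they generally need very low latency, so they are delay-sensitive: lightweight, but they must be fast.
  2. Background Traffic
    Something like an undercurrent: for example, workers updating their data generate background traffic. These are generally large flows with looser latency requirements but very high volume, so they are throughput-sensitive.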

In summary, throughput-sensitive large flows, delay sensitive short flows and bursty query traffic, co-exist in a data center network.

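TCP performance impairments in data centers

1. Switches

2. Incast

As shown in panel (a) of the figure below, incast occurs when a large burst of small flows converges on the same port.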

[Figure 2: panels (a) and (b)]

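3. Queue buildup

The cause of queue buildup is shown in panel (b) of the figure above: small flows and large flows converge on the same port. The small flows suffer badly. They carry little data and should finish quickly, yet they end up stuck behind the large flows.

Why does queueing delay have such a large impact? Consider two questions from the Coursera course: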

Question 16
Assume that data travels over fiber at a speed of 2c/3, where c is the speed of light in a vacuum. What is the (one-way) propagation delay across 300 meters of fiber running across a data center floor? (This might seem annoying, but the point of such questions is to make sure you have a better sense of these timescales.)

Answer: 1.5 microseconds

Question 17
Assuming a line rate of 10Gbps, what time elapses between a packet arriving in a buffer with five packets already queued (each packet being 9000 bytes in size), and reaching the head of the queue (i.e., just before its bytes start to get sent across the wire)?

Answer: 36 microseconds

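Queueing delay clearly dominates: 36 microseconds of waiting behind just five packets, against 1.5 microseconds of propagation, a factor of 24.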
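
The arithmetic behind the two answers can be checked with a short script. This is only a sketch: the function names are made up for illustration, and the propagation figure comes out marginally above 1.5 because the code uses the exact value of c rather than the rounded 3×10^8 m/s.

```python
C_VACUUM = 299_792_458.0  # speed of light in a vacuum, m/s


def propagation_delay_us(distance_m, velocity_factor=2 / 3):
    """One-way propagation delay in microseconds over fiber at velocity_factor * c."""
    return distance_m / (velocity_factor * C_VACUUM) * 1e6


def queueing_delay_us(packets_ahead, packet_bytes, line_rate_bps):
    """Time in microseconds to drain the packets already queued ahead of a new arrival."""
    bits_ahead = packets_ahead * packet_bytes * 8
    return bits_ahead / line_rate_bps * 1e6


if __name__ == "__main__":
    prop = propagation_delay_us(300)               # Question 16
    queue = queueing_delay_us(5, 9000, 10e9)       # Question 17
    print(f"propagation: {prop:.2f} us")
    print(f"queueing:    {queue:.2f} us")
    print(f"ratio:       {queue / prop:.1f}x")
```

Running it gives roughly 1.5 and 36 microseconds, so five jumbo frames in the buffer cost about as much as two dozen trips across the data center floor.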

4. Buffer pressure

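The DCTCP algorithm

The congestion estimate α

Unlike TCP, which halves the congestion window as soon as it detects congestion, DCTCP reduces the window in proportion to the extent of congestion.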

$$ \mathrm{cwnd} \leftarrow \mathrm{cwnd} \times \left(1 - \frac{\alpha}{2}\right) $$

For example, if congestion is low and α=0, cwnd is unchanged; if congestion is very high and α=1, cwnd is halved.
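
A minimal sketch of the window rule, with hypothetical function names. These notes do not say how α is obtained; the `update_alpha` step below follows the referenced DCTCP paper, which keeps a running average of the fraction F of ECN-marked packets per window, α ← (1 − g)·α + g·F, with g = 1/16 as the weight used there. Treat that part as an assumption brought in from the paper, not something stated above.

```python
def update_alpha(alpha, marked_fraction, g=1 / 16):
    """Running estimate of congestion (assumption: EWMA as in the DCTCP paper).

    marked_fraction is the share of packets in the last window that carried
    an ECN congestion mark, between 0.0 and 1.0.
    """
    return (1 - g) * alpha + g * marked_fraction


def dctcp_cwnd(cwnd, alpha):
    """Shrink the window in proportion to the congestion estimate alpha."""
    return cwnd * (1 - alpha / 2)


if __name__ == "__main__":
    cwnd = 100.0
    for alpha in (0.0, 0.5, 1.0):
        print(f"alpha={alpha}: cwnd {cwnd} -> {dctcp_cwnd(cwnd, alpha)}")
    # alpha=0.0 leaves the window at 100.0, alpha=0.5 gives 75.0,
    # and alpha=1.0 gives 50.0, the same halving as ordinary TCP.
```

The design point is that light marking leads to a gentle cut, and only sustained, heavy marking makes DCTCP back off as hard as loss-based TCP does.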
