This article is a summary of the TCP timestamp option.

1 Summary

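The timestamp is an optional field in the TCP header. It gives the TCP/IP stack two capabilities:

More accurate RTT measurements, especially when packets are lost -- RTTM
Reliability in extreme cases (it solves the sequence number wraparound problem) -- PAWS
Because the option solves sequence number wraparound, the TIME_WAIT duration can in principle be shortened to one RTO. This is essentially how tcp_tw_recycle works. Enabling tcp_tw_recycle is not recommended: turning on timestamps and tcp_tw_recycle together activates the per-host mechanism, which breaks behind NAT and similar devices. See: https://www.jianshu.com/p/5405b02bb09f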

2 Details

2.1 TCP Timestamps Option (TSopt)

  Kind: 8

     Length: 10 bytes

      +-------+-------+---------------------+---------------------+
      |Kind=8 |  10   |   TS Value (TSval)  |TS Echo Reply (TSecr)|
      +-------+-------+---------------------+---------------------+
          1       1              4                     4

     The Timestamps option carries two four-byte timestamp fields.
     The Timestamp Value field (TSval) contains the current value of
     the timestamp clock of the TCP sending the option.

     The Timestamp Echo Reply field (TSecr) is only valid if the ACK
     bit is set in the TCP header; if it is valid, it echos a
     timestamp value that was sent by the remote TCP in the TSval field
     of a Timestamps option.  When TSecr is not valid, its value
     must be zero.  The TSecr value will generally be from the most
     recent Timestamp option that was received; however, there are
     exceptions that are explained below.
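
The 10-byte layout above can be sketched in a few lines of Python. This is only an illustration of the wire format: the function names `build_tsopt` and `parse_tsopt` are made up for this example and do not come from any TCP stack.

```python
import struct
from typing import Tuple

TCPOPT_TIMESTAMP = 8    # Kind
TCPOLEN_TIMESTAMP = 10  # Length: 1 (kind) + 1 (length) + 4 (TSval) + 4 (TSecr)


def build_tsopt(tsval: int, tsecr: int) -> bytes:
    """Encode a Timestamps option in network byte order (hypothetical helper)."""
    return struct.pack(
        "!BBII",
        TCPOPT_TIMESTAMP,
        TCPOLEN_TIMESTAMP,
        tsval & 0xFFFFFFFF,  # both fields are 32-bit unsigned values
        tsecr & 0xFFFFFFFF,
    )


def parse_tsopt(option: bytes) -> Tuple[int, int]:
    """Decode a Timestamps option and return (TSval, TSecr) (hypothetical helper)."""
    if len(option) != TCPOLEN_TIMESTAMP:
        raise ValueError("Timestamps option must be exactly 10 bytes")
    kind, length, tsval, tsecr = struct.unpack("!BBII", option)
    if kind != TCPOPT_TIMESTAMP or length != TCPOLEN_TIMESTAMP:
        raise ValueError("not a Timestamps option")
    return tsval, tsecr


if __name__ == "__main__":
    raw = build_tsopt(tsval=123456, tsecr=0)  # TSecr is 0 when ACK is not set
    print(raw.hex(), parse_tsopt(raw))
```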

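2.2 Negotiation during connection setup

Timestamps is a bidirectional option: if either side does not enable it, both sides stop using it.
For example, the client sends a SYN carrying the timestamp option, but the server has not enabled the option.
The server's SYN-ACK then carries no timestamp option, and the client's subsequent ACK carries none either.
And of course, if the client's SYN carries no timestamp in the first place, timestamps are disabled in both directions.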
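
The negotiation rule can be written as a tiny truth table. The sketch below is only a model of the behaviour described above; `negotiate_timestamps` and its parameter names are hypothetical.

```python
def negotiate_timestamps(client_sends_ts: bool, server_supports_ts: bool) -> dict:
    """Model which handshake segments carry the timestamp option (illustrative only)."""
    # The server echoes the option only if the SYN carried it AND the server has it enabled.
    synack_has_ts = client_sends_ts and server_supports_ts
    return {
        "syn": client_sends_ts,
        "syn_ack": synack_has_ts,
        # The client keeps using the option only if the SYN-ACK carried it.
        "ack": synack_has_ts,
        "enabled": synack_has_ts,
    }


if __name__ == "__main__":
    for client in (True, False):
        for server in (True, False):
            print(client, server, negotiate_timestamps(client, server))
```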

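2.3 How it works

At its core, tcp_timestamps records the time at which each segment is sent. The basic steps are:

When sending data, the sender places a timestamp (the send time) in the segment.
After receiving the segment, the receiver returns that timestamp to the sender in the corresponding ACK (echo back).
When the sender receives the ACK, it subtracts the timestamp in the ACK from the current time now to get an accurate RTT. In practice RTT fluctuation has to be taken into account, which is why the RTTM (Round-Trip Time Measurement) mechanism exists.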
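
The subtraction in the last step can be sketched as follows. The sketch assumes the timestamp clock is a 32-bit counter, which matches the 4-byte TSval field; the function name `rtt_from_echo` and the tick values are made up for illustration.

```python
MASK32 = 0xFFFFFFFF


def rtt_from_echo(now_ticks: int, tsecr: int) -> int:
    """Elapsed ticks between sending a segment and receiving the ACK that echoes it.

    The subtraction is done modulo 2**32 so the result stays correct when the
    32-bit timestamp clock wraps around between the send and the ACK.
    """
    return (now_ticks - tsecr) & MASK32


if __name__ == "__main__":
    print(rtt_from_echo(now_ticks=1050, tsecr=1000))      # ordinary case
    print(rtt_from_echo(now_ticks=5, tsecr=0xFFFFFFFB))   # clock wrapped in between
```

The result is in ticks of the sender's timestamp clock; converting to milliseconds depends on the clock granularity, which the option itself does not carry.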

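2.4 The role of timestamps in accurate RTT calculation

How is RTT calculated without timestamps?

1. When the TCP layer sends an SKB, it records the send time in skb->when.
2. When the TCP layer receives the acknowledgment for that SKB, it computes the RTT as now - skb->when.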

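This mechanism runs into trouble when packets are lost. For example:

1. The TCP layer first sends the SKB at send_time1 and retransmits it at send_time2.
2. The TCP layer receives the acknowledgment for the SKB at recv_time.

Should the RTT be (recv_time - send_time1) or (recv_time - send_time2)?
Neither is acceptable, because there is no way to tell whether the ACK received at recv_time acknowledges the original transmission or the retransmission. The TCP stack can therefore only take RTT samples from segments that were not retransmitted. Under heavy loss (for example, when an entire window is lost), there may be no segments left that qualify for RTT sampling, and the SRTT and RTO computed afterwards can drift badly.
The timestamp option solves this cleanly: the TSecr value carried in an ACK is always the send-side timestamp of the segment that triggered that ACK. RTT can be computed accurately whether or not the segment was retransmitted (provided TSecr follows the RTTM rules).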
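
The ambiguity can be made concrete with a small sketch. Both functions and the time values are hypothetical; they only model the reasoning above and are not taken from the Linux stack.

```python
from typing import List, Optional


def rtt_without_timestamp(recv_time: int, send_times: List[int]) -> Optional[int]:
    """Without timestamps, only a segment that was sent exactly once yields a sample.

    If the segment was retransmitted, the ACK could belong to any of the
    transmissions, so no sample is taken (None).
    """
    if len(send_times) != 1:
        return None
    return recv_time - send_times[0]


def rtt_with_timestamp(recv_time: int, tsecr: int) -> int:
    """With timestamps, the ACK names the transmission it answers via TSecr."""
    return recv_time - tsecr


if __name__ == "__main__":
    send_time1, send_time2, recv_time = 0, 300, 350
    # Retransmitted segment: no usable sample without timestamps.
    print(rtt_without_timestamp(recv_time, [send_time1, send_time2]))
    # The ACK echoes the retransmission's timestamp, so the sample is unambiguous.
    print(rtt_with_timestamp(recv_time, tsecr=send_time2))
```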

Timestamps do more than fix RTT calculation; they also supply the information that the PAWS mechanism relies on.

2.4.1 The RTTM algorithm

RTTM lays down several rules for computing RTT from TSecr, listed below.
(To preserve the original meaning, the RFC's own wording is quoted.)

a.  A TSecr value received in a segment is used to update the
    averaged RTT measurement only if the segment acknowledges
    some new data
b.  The data-sender TCP must measure the effective RTT, including the additional
    time due to delayed ACKs. Thus, when delayed ACKs are in use, the receiver should
    reply with the TSval field from the earliest unacknowledged segment
c.  An ACK for an out-of-order segment should therefore contain the
    timestamp from the most recent segment that advanced the window
d.  The timestamp from the latest segment (which filled the hole) must be echoed
        That is, when ACKing retransmitted data, the reply should echo the TSval from the retransmitted segment

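If any of these special cases is unclear, please go straight to the RFC, which explains them with examples.

Finally, beyond the TSecr rules above, computing the RTO involves some further calculation; see RFC 7323 and the RTO computation it builds on (RFC 6298).
For example, for each RTT sample R: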

RTTVAR = (1 - beta) * RTTVAR + beta * |SRTT - R|
SRTT = (1 - alpha) * SRTT + alpha * R
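
The two update formulas can be exercised with a short sketch. The constants alpha = 1/8 and beta = 1/4, the initialisation from the first sample, and the RTO = SRTT + 4 * RTTVAR step are not stated in this article; they are taken from RFC 6298 and should be read as assumptions here. Clock granularity and the minimum RTO are left out for brevity, and the class name `RttEstimator` is made up.

```python
from typing import Optional


class RttEstimator:
    """Minimal SRTT/RTTVAR tracker (sketch; constants assumed from RFC 6298)."""

    def __init__(self, alpha: float = 1 / 8, beta: float = 1 / 4) -> None:
        self.alpha = alpha
        self.beta = beta
        self.srtt: Optional[float] = None
        self.rttvar: Optional[float] = None

    def update(self, r: float) -> float:
        """Feed one RTT sample R and return the resulting RTO."""
        if self.srtt is None:
            # First sample: SRTT = R, RTTVAR = R / 2.
            self.srtt = r
            self.rttvar = r / 2
        else:
            # RTTVAR must be updated first, using the old SRTT.
            self.rttvar = (1 - self.beta) * self.rttvar + self.beta * abs(self.srtt - r)
            self.srtt = (1 - self.alpha) * self.srtt + self.alpha * r
        return self.srtt + 4 * self.rttvar


if __name__ == "__main__":
    est = RttEstimator()
    for sample_ms in (100, 200, 120):
        print(sample_ms, est.update(sample_ms))
```

The order of the two assignments matters: RTTVAR uses the SRTT from before the current sample is folded in, which is why it comes first in the formulas above.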

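2.5 The role of timestamps in PAWS

PAWS -- Protect Against Wrapped Sequence numbers
Its purpose is to handle the problem that, at high bandwidth, TCP sequence numbers may be reused within a connection.

PAWS also depends on timestamps. It assumes that within one TCP flow, the timestamps of all segments received in order are monotonically non-decreasing. Under normal conditions the timestamps on segments sent in order by each TCP flow do indeed increase this way.
The variables are defined first; the PAWS procedure is described after them.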

Per-Connection State Variables
    TS.Recent:       Latest received Timestamp
    Last.ACK.sent:   Last ACK field sent

Option Fields in Current Segment
    SEG.TSval:   TSval field from TSopt in current segment.
    SEG.TSecr:   TSecr field from TSopt in current segment.

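TS.Recent holds the latest timestamp among all segments that arrived in order. It is updated only when
SEG.SEQ <= Last.ACK.sent < SEG.SEQ + SEG.LEN (new data has been acknowledged in order);
otherwise TS.Recent keeps its previous value.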
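
The update condition also produces the echo behaviour required by the RTTM rules in 2.4.1. The sketch below is a simplified receiver model: the class `Receiver` and its fields are hypothetical, sequence numbers are plain integers without wraparound, and the additional SEG.TSval check from the RFC is omitted.

```python
from typing import Dict, Tuple


class Receiver:
    """Toy receiver tracking TS.Recent and Last.ACK.sent (illustrative only)."""

    def __init__(self, initial_seq: int = 0) -> None:
        self.ts_recent = 0
        self.last_ack_sent = initial_seq
        self.rcv_nxt = initial_seq
        self.out_of_order: Dict[int, int] = {}  # seq -> length of buffered segments

    def on_segment(self, seq: int, length: int, tsval: int) -> None:
        # Update rule from the text: SEG.SEQ <= Last.ACK.sent < SEG.SEQ + SEG.LEN
        if seq <= self.last_ack_sent < seq + length:
            self.ts_recent = tsval
        if seq == self.rcv_nxt:
            self.rcv_nxt += length
            # Deliver any buffered segments that are now contiguous.
            while self.rcv_nxt in self.out_of_order:
                self.rcv_nxt += self.out_of_order.pop(self.rcv_nxt)
        elif seq > self.rcv_nxt:
            self.out_of_order[seq] = length

    def send_ack(self) -> Tuple[int, int]:
        """Return (ACK number, TSecr) for the ACK being sent."""
        self.last_ack_sent = self.rcv_nxt
        return self.rcv_nxt, self.ts_recent


if __name__ == "__main__":
    r = Receiver()
    r.on_segment(0, 100, tsval=1)
    r.on_segment(100, 100, tsval=2)
    print(r.send_ack())  # (200, 1): delayed ACK echoes the earliest unacknowledged segment
    r.on_segment(300, 100, tsval=4)
    print(r.send_ack())  # (200, 1): out-of-order segment does not change the echo
    r.on_segment(200, 100, tsval=5)
    print(r.send_ack())  # (400, 5): the segment that filled the hole is echoed
```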

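Suppose three segments are *first* sent at times A, B and C (A < B < C), and that A and C carry the same sequence number.
Segment A gets stuck in the network for some reason, so the sender retransmits it at time A2.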

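The main problem PAWS addresses is this:
    The receiver gets A2, then goes on to acknowledge segment B, and next expects segment C.
    If the original segment A now arrives (it was delayed in the network, not actually lost),
    it has the same sequence number as C. Without extra protection the data stream would be corrupted and delivery would no longer be reliable.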

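PAWS's approach: if a received TCP segment has a timestamp lower than TS.Recent, the segment is dropped.
So when segment A reaches the receiver, the receiver's TS.Recent should already be the timestamp from segment B.
Since A < B, segment A is dropped. When the genuinely valid segment C arrives, B < C, so it is accepted normally.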
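
The check itself is one comparison, but because TSval is a 32-bit counter that can wrap, "lower than" has to be evaluated modulo 2**32. A minimal sketch under that assumption follows; the names `ts_before` and `paws_reject` and the timestamp values are made up for this example.

```python
MASK32 = 0xFFFFFFFF
HALF32 = 0x80000000


def ts_before(a: int, b: int) -> bool:
    """True if 32-bit timestamp a is older than b, allowing for wraparound."""
    return 0 < ((b - a) & MASK32) < HALF32


def paws_reject(seg_tsval: int, ts_recent: int) -> bool:
    """PAWS test: drop the segment if its timestamp is older than TS.Recent."""
    return ts_before(seg_tsval, ts_recent)


if __name__ == "__main__":
    # Timestamps for the A / B / C scenario above (arbitrary tick values).
    ts_a, ts_b, ts_c = 100, 200, 300
    ts_recent = ts_b  # B was the last in-order segment
    print("old A rejected:", paws_reject(ts_a, ts_recent))
    print("C rejected:", paws_reject(ts_c, ts_recent))
```

A segment whose timestamp equals TS.Recent is not rejected, since several segments sent within one clock tick share the same TSval.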

More details on PAWS

1. It is recommended that RST segments NOT carry timestamps, and that
RST segments be acceptable regardless of their timestamp.

2. PAWS is defined strictly within a single connection; the last timestamp
TS.Recent is kept in the connection control block, and
discarded when a connection is closed.

3. An additional mechanism could be added to the TCP, a per-host
cache of the last timestamp received from any connection.
This value could then be used in the PAWS mechanism to reject
old duplicate segments from earlier incarnations of the
connection, if the timestamp clock can be guaranteed to have
ticked at least once since the old connection was open.

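Point 3 shows that if the PAWS mechanism is applied per host, it covers the case TIME-WAIT guards against, where segments from a previous connection are taken as valid data in the next one. There is then no need to wait 2*MSL to leave TIME-WAIT; waiting long enough for an RTO, so that a retransmission of the final ACK can be handled, is sufficient. Setting the tcp_tw_recycle parameter activates this mechanism.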


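Linux therefore implemented the following mechanism:

When both timestamp and tw_recycle are enabled, per-host PAWS is turned on,
so that TCP flows in TIME-WAIT can be recycled quickly.

But does this really solve TIME-WAIT, the headache of so many people? The answer is no!
The public Internet contains far too many NAT devices, and with per-host PAWS the assumption that timestamps increase monotonically cannot be guaranteed: two real machines sharing one NAT address do not have synchronized timestamps (and even identical clocks would not help; NAT is the natural enemy of per-host PAWS).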
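
The failure mode is easy to reproduce in a toy model. The sketch below is not the Linux implementation: `PerHostPaws` is a hypothetical class that keeps one last-seen timestamp per source IP, ignores timestamp wraparound and any expiry of cached entries, and uses documentation IP addresses.

```python
from typing import Dict


class PerHostPaws:
    """Toy per-host timestamp cache (simplified model, not kernel code)."""

    def __init__(self) -> None:
        self.last_ts: Dict[str, int] = {}  # source IP -> last timestamp seen

    def accept_syn(self, src_ip: str, tsval: int) -> bool:
        last = self.last_ts.get(src_ip)
        if last is not None and tsval < last:
            # Looks like an old duplicate from an earlier connection: drop it.
            return False
        self.last_ts[src_ip] = tsval
        return True


if __name__ == "__main__":
    server = PerHostPaws()
    nat_ip = "203.0.113.7"  # two clients share this public address
    # Client 1 has been up for a long time, so its timestamp clock is large.
    print("client 1:", server.accept_syn(nat_ip, tsval=500_000))
    # Client 2 booted recently; its smaller timestamp looks "old" to the server.
    print("client 2:", server.accept_syn(nat_ip, tsval=1_000))
```

From the server's point of view both SYNs come from the same host, so the second client's perfectly valid SYN is silently dropped.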

3 References

Documentation: ip-sysctl.txt
RFC 1323: TCP Extensions for High Performance
RFC 7323: TCP Extensions for High Performance
SACK
What benefit is conferred by TCP timestamp?

TCP-timestamp