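I won't explain TCP window size, MSS, and RTT in detail here; there are plenty of explanations online. The best way to learn or review them, though, is to read TCP/IP Illustrated, Volume 1: The Protocols, starting from Chapter 17. It covers them thoroughly and explains the reasons behind many observed behaviors. Below we look, from a practical angle, at how to change the default values of these TCP attributes on Linux. On some systems, changing a default does not necessarily take effect, because that depends on the protocol stack implementation. MSS and window, for example, involve both the TCP client and the server: each side announces its own MSS in its SYN and advertises its own window, so the values actually in use depend on both ends. When I have time I may post the relevant protocol stack code.
The route command can set the MSS, window, and initial RTT for TCP connections established over a particular route. Note that these values belong to the route: the protocol stack looks up the route for the destination address and takes the defaults from it. man route describes the three options as follows: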
mss M  set the TCP Maximum Segment Size (MSS) for connections over this
       route to M bytes. The default is the device MTU minus headers,
       or a lower MTU when path mtu discovery occurred. This setting
       can be used to force smaller TCP packets on the other end when
       path mtu discovery does not work (usually because of misconfigured
       firewalls that block ICMP Fragmentation Needed)

window W
       set the TCP window size for connections over this route to W
       bytes. This is typically only used on AX.25 networks and with
       drivers unable to handle back to back frames.

irtt I set the initial round trip time (irtt) for TCP connections over
       this route to I milliseconds (1-12000). This is typically only
       used on AX.25 networks. If omitted the RFC 1122 default of 300ms
       is used.
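The "device MTU minus headers" default can be checked with simple arithmetic. The sketch below assumes IPv4 with no IP options and a base TCP header with no options (20 bytes each); the helper name default_mss is made up for illustration. For an Ethernet MTU of 1500 it gives the same value as the mss 1460 option that appears in the tcpdump capture later in this post.

```python
IPV4_HEADER_BYTES = 20  # IPv4 header without options (assumption)
TCP_HEADER_BYTES = 20   # TCP header without options (assumption)


def default_mss(mtu: int) -> int:
    """Return the IPv4 TCP MSS implied by a link MTU (illustrative helper)."""
    mss = mtu - IPV4_HEADER_BYTES - TCP_HEADER_BYTES
    if mss <= 0:
        raise ValueError("MTU too small to carry a TCP segment")
    return mss


if __name__ == "__main__":
    # Ethernet MTU as used on eth0 in this post's setup (assumed to be 1500).
    print(default_mss(1500))
```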
root@baohua-VirtualBox:/home/baohua# route -ne
Kernel IP routing table
Destination     Gateway         Genmask         Flags   MSS Window  irtt Iface
0.0.0.0         10.0.2.2        0.0.0.0         UG        0 0          0 eth0
10.0.2.0        0.0.0.0         255.255.255.0   U         0 0          0 eth0
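Next, visit NetEase (www.163.com) with Firefox and capture the packets with tcpdump.
The SYN packet sent from the local host to NetEase has a window size of 29200; the packets NetEase sends back have a window size of 65535.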
root@baohua-VirtualBox:/home/baohua# tcpdump -i eth0
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth0, link-type EN10MB (Ethernet), capture size 65535 bytes
12:32:55.710239 IP promote.cache-dns.local.64498 > dns86.hzbn.net.domain: 29954+ A? www.163.com. (29)
12:32:55.710380 IP promote.cache-dns.local.64498 > promote.cache-dns.local.domain: 29954+ A? www.163.com. (29)
12:32:55.710493 IP promote.cache-dns.local.56991 > dns86.hzbn.net.domain: 39748+ AAAA? www.163.com. (29)
12:32:55.722128 IP dns86.hzbn.net.domain > promote.cache-dns.local.64498: 29954 5/5/5 CNAME www.163.com.lxdns.com., CNAME 163.xdwscache.ourglb0.com., A 61.144.14.109, A 116.6.73.241, A 61.144.14.108 (315)
12:32:55.722211 IP promote.cache-dns.local.domain > promote.cache-dns.local.64498: 29954 5/5/5 CNAME www.163.com.lxdns.com., CNAME 163.xdwscache.ourglb0.com., A 61.144.14.108, A 61.144.14.109, A 116.6.73.241 (315)
12:32:55.756964 IP dns86.hzbn.net.domain > promote.cache-dns.local.56991: 39748 2/1/0 CNAME www.163.com.lxdns.com., CNAME 163.xdwscache.ourglb0.com. (147)
12:32:55.757551 IP promote.cache-dns.local.42743 > 61.144.14.109.http: Flags [S], seq 1989910246, win 29200, options [mss 1460,sackOK,TS val 11087 ecr 0,nop,wscale 7], length 0
12:32:55.760889 IP 61.144.14.109.http > promote.cache-dns.local.42743: Flags [S.], seq 309568001, ack 1989910247, win 65535, options [mss 1460], length 0
12:32:55.760956 IP promote.cache-dns.local.42743 > 61.144.14.109.http: Flags [.], ack 1, win 29200, length 0
12:32:55.761276 IP promote.cache-dns.local.42743 > 61.144.14.109.http: Flags [P.], seq 1:793, ack 1, win 29200, length 792
12:32:55.762672 IP 61.144.14.109.http > promote.cache-dns.local.42743: Flags [.], ack 793, win 65535, length 0
12:32:55.862197 IP 61.144.14.109.http > promote.cache-dns.local.42743: Flags [P.], seq 1:380, ack 793, win 65535, length 379
12:32:55.862226 IP promote.cache-dns.local.42743 > 61.144.14.109.http: Flags [.], ack 380, win 30016, length 0
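To read the handshake in this capture more easily, the win, mss, and wscale values can be pulled out of the two SYN lines. The following is only a sketch: the function names are illustrative, and the regular expression is a rough match for tcpdump's default one-line output (it tolerates a missing space after the field name), not a full parser. It also encodes the RFC 7323 rule that window scaling is used only when both SYN segments carry a wscale option; here the SYN-ACK from the server has none, so the win values in this capture are plain byte counts.

```python
import re

SYN_LINE = (
    "12:32:55.757551 IP promote.cache-dns.local.42743 > 61.144.14.109.http: "
    "Flags [S], seq 1989910246, win 29200, "
    "options [mss 1460,sackOK,TS val 11087 ecr 0,nop,wscale 7], length 0"
)
SYNACK_LINE = (
    "12:32:55.760889 IP 61.144.14.109.http > promote.cache-dns.local.42743: "
    "Flags [S.], seq 309568001, ack 1989910247, win 65535, "
    "options [mss 1460], length 0"
)


def parse_tcp_fields(line):
    """Extract win, mss and wscale from one tcpdump line (None if absent)."""
    fields = {}
    for name in ("win", "mss", "wscale"):
        match = re.search(r"\b" + name + r"\s*(\d+)", line)
        fields[name] = int(match.group(1)) if match else None
    return fields


def scaling_in_effect(syn, synack):
    """Scaling applies only if both SYN segments carried a wscale option."""
    return syn["wscale"] is not None and synack["wscale"] is not None


def effective_window(win, shift, scaling):
    """Window in bytes for a non-SYN segment; SYN windows are never scaled."""
    return win << shift if scaling else win


if __name__ == "__main__":
    syn = parse_tcp_fields(SYN_LINE)
    synack = parse_tcp_fields(SYNACK_LINE)
    scaling = scaling_in_effect(syn, synack)
    print(syn, synack, scaling)
    # Last ACK of the capture advertised win 30016 from the local host.
    print(effective_window(30016, syn["wscale"], scaling))
```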
Run the following commands to change the window size of the default route to 1024. For MSS and RTT, see man route.
route del default
route add default gw 10.0.2.2 window 1024 dev eth0
root@baohua-VirtualBox:/home/baohua# route -ne
Kernel IP routing table
Destination     Gateway         Genmask         Flags   MSS Window  irtt Iface
0.0.0.0         10.0.2.2        0.0.0.0         UG        0 1024       0 eth0
10.0.2.0        0.0.0.0         255.255.255.0   U         0 0          0 eth0
Also disable TCP window scaling:
echo 0 > /proc/sys/net/ipv4/tcp_window_scaling
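Before capturing, it can be worth confirming that the setting was actually written. A minimal sketch for reading a value back from /proc/sys/net/ipv4 follows; the function name and the base parameter are illustrative, and base exists only so the helper can be pointed at another directory. It returns None when the file cannot be read, for example on a non-Linux system.

```python
from pathlib import Path


def read_ipv4_sysctl(name, base="/proc/sys/net/ipv4"):
    """Return the contents of a sysctl file as text, or None if unreadable."""
    try:
        return (Path(base) / name).read_text().strip()
    except OSError:
        return None


if __name__ == "__main__":
    # "0" means window scaling is disabled, "1" means it is enabled.
    print("tcp_window_scaling =", read_ipv4_sysctl("tcp_window_scaling"))
```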
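Visit NetEase with Firefox again and capture the packets with tcpdump.
This time the SYN packet sent from the local host to NetEase has a window size of 1024; the packets NetEase sends back still have a window size of 65535.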
11:22:22.506528 IP promote.cache-dns.local.33561 > dns86.hzbn.net.domain: 10805+ A? www.163.com. (29)
11:22:22.513131 IP dns86.hzbn.net.domain > promote.cache-dns.local.33561: 10805 5/5/5 CNAME www.163.com.lxdns.com., CNAME 163.xdwscache.ourglb0.com., A 61.144.14.109, A 116.6.73.241, A 61.144.14.108 (315)
11:22:22.513918 IP promote.cache-dns.local.5787 > dns86.hzbn.net.domain: 39013+ AAAA? www.163.com. (29)
11:22:22.521723 IP dns86.hzbn.net.domain > promote.cache-dns.local.5787: 39013 2/1/0 CNAME www.163.com.lxdns.com., CNAME 163.xdwscache.ourglb0.com. (147)
11:22:22.522192 IP promote.cache-dns.local.59900 > 61.144.14.109.http: Flags [S], seq 4075016443, win 1024, options [mss 1460,sackOK,TS val 142808 ecr 0,nop,wscale 0], length 0
11:22:22.528431 IP 61.144.14.109.http > promote.cache-dns.local.59900: Flags [S.], seq 256320001, ack 4075016444, win 65535, options [mss 1460], length 0
11:22:22.528490 IP promote.cache-dns.local.59900 > 61.144.14.109.http: Flags [.], ack 1, win 1024, length 0
11:22:22.602996 IP promote.cache-dns.local.1723 > dns86.hzbn.net.domain: 17734+ A? img1.cache.netease.com. (40)
11:22:22.603107 IP promote.cache-dns.local.41580 > dns86.hzbn.net.domain: 28128+ AAAA? img1.cache.netease.com. (40)
11:22:22.603819 IP promote.cache-dns.local.1319 > dns86.hzbn.net.domain: 64592+ A? img5.cache.netease.com. (40)
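A 1024-byte window is smaller than one 1460-byte MSS, so the server can have at most 1024 bytes in flight toward the local host per round trip. The rough arithmetic below estimates the RTT from the SYN and SYN-ACK timestamps in this capture and derives the resulting upper bound on receive throughput, which comes out on the order of 160 KB/s. It is a back-of-the-envelope sketch, assuming the advertised window stays at 1024 for the whole connection and ignoring every other delay; the function names are illustrative.

```python
from datetime import datetime


def handshake_rtt(syn_time, synack_time):
    """RTT estimate in seconds from two tcpdump timestamps (HH:MM:SS.ffffff)."""
    fmt = "%H:%M:%S.%f"
    delta = datetime.strptime(synack_time, fmt) - datetime.strptime(syn_time, fmt)
    return delta.total_seconds()


def window_limited_throughput(window_bytes, rtt_seconds):
    """Upper bound in bytes per second: one window in flight per round trip."""
    return window_bytes / rtt_seconds


if __name__ == "__main__":
    # Timestamps of the SYN and SYN-ACK lines in the capture above.
    rtt = handshake_rtt("11:22:22.522192", "11:22:22.528431")
    print("rtt (s):", rtt)
    print("bound with window 1024 (B/s):", window_limited_throughput(1024, rtt))
    print("bound with window 65535 (B/s):", window_limited_throughput(65535, rtt))
```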
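As an addendum, here are three ways to change the initial TCP options on Linux:
1. System-wide: modify the proc files under /proc/sys/net/ipv4.
2. Per route: change the TCP options on a specific route, as shown above.
3. Per socket: use setsockopt().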
#include <sys/socket.h>

int getsockopt(int sockfd, int level, int optname,
               void *optval, socklen_t *optlen);
int setsockopt(int sockfd, int level, int optname,
               const void *optval, socklen_t optlen);

/* The option names below come from <netinet/tcp.h>. */
/*
 * User-settable options (used with setsockopt).
 */
#define TCP_NODELAY              1  /* Don't delay send to coalesce packets */
#define TCP_MAXSEG               2  /* Set maximum segment size */
#define TCP_CORK                 3  /* Control sending of partial frames */
#define TCP_KEEPIDLE             4  /* Start keepalives after this period */
#define TCP_KEEPINTVL            5  /* Interval between keepalives */
#define TCP_KEEPCNT              6  /* Number of keepalives before death */
#define TCP_SYNCNT               7  /* Number of SYN retransmits */
#define TCP_LINGER2              8  /* Life time of orphaned FIN-WAIT-2 state */
#define TCP_DEFER_ACCEPT         9  /* Wake up listener only when data arrive */
#define TCP_WINDOW_CLAMP        10  /* Bound advertised window */
#define TCP_INFO                11  /* Information about this connection. */
#define TCP_QUICKACK            12  /* Block/reenable quick ACKs. */
#define TCP_CONGESTION          13  /* Congestion control algorithm. */
#define TCP_MD5SIG              14  /* TCP MD5 Signature (RFC2385) */
#define TCP_COOKIE_TRANSACTIONS 15  /* TCP Cookie Transactions */
#define TCP_THIN_LINEAR_TIMEOUTS 16 /* Use linear timeouts for thin streams */
#define TCP_THIN_DUPACK         17  /* Fast retrans. after 1 dupack */
#define TCP_USER_TIMEOUT        18  /* How long for loss retry before timeout */
#define TCP_REPAIR              19  /* TCP sock is under repair right now */
#define TCP_REPAIR_QUEUE        20  /* Set TCP queue to repair */
#define TCP_FASTOPEN            23  /* Enable FastOpen on listeners */
#define TCP_TIMESTAMP           24  /* TCP time stamp */
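As a rough illustration of the third method, the sketch below sets TCP_MAXSEG and TCP_WINDOW_CLAMP on a single socket through Python's socket module, which passes the call through to setsockopt(). The function name make_tcp_socket and the values 1400 and 65535 are arbitrary choices for the example. TCP_WINDOW_CLAMP is Linux-specific, so it is applied only where Python exposes the constant. Note that getsockopt() on a socket that has not connected yet may report a kernel default rather than the value just requested.

```python
import socket


def make_tcp_socket(mss=None, window_clamp=None):
    """Create a TCP socket and apply per-socket options before connect()."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    if mss is not None:
        # Set before connect() so it can influence the MSS option in the SYN.
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_MAXSEG, mss)
    if window_clamp is not None and hasattr(socket, "TCP_WINDOW_CLAMP"):
        # Linux-only option that bounds the advertised receive window.
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_WINDOW_CLAMP, window_clamp)
    return sock


if __name__ == "__main__":
    s = make_tcp_socket(mss=1400, window_clamp=65535)
    try:
        print("TCP_MAXSEG:", s.getsockopt(socket.IPPROTO_TCP, socket.TCP_MAXSEG))
        if hasattr(socket, "TCP_WINDOW_CLAMP"):
            print("TCP_WINDOW_CLAMP:",
                  s.getsockopt(socket.IPPROTO_TCP, socket.TCP_WINDOW_CLAMP))
    finally:
        s.close()
```

Unlike the route and /proc settings, this affects only the one socket, so it needs no root privileges and leaves other connections untouched.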