Linux TCP Kernel Parameters Explained in One Article

Contents

    • 1. Parameter scope
    • 2. Parameter overview
      • net.core.netdev_max_backlog
      • net.ipv4.tcp_max_syn_backlog
      • net.ipv4.tcp_syn_retries
      • net.ipv4.tcp_synack_retries
      • net.core.somaxconn
      • net.ipv4.tcp_fin_timeout
      • net.ipv4.tcp_tw_reuse
      • net.ipv4.tcp_tw_recycle
      • net.ipv4.tcp_keepalive_time
      • net.ipv4.tcp_max_tw_buckets
      • net.ipv4.tcp_retries1
      • net.ipv4.tcp_retries2

Parameter tuning is unavoidable in Linux operations. Web applications such as Nginx and Tomcat in particular need many TCP kernel parameters adjusted, and so do distributed databases such as TDengine, TiDB, and MatrixDB.

The most common parameters are:

net.core.somaxconn
net.core.netdev_max_backlog
net.ipv4.tcp_max_syn_backlog
net.ipv4.tcp_retries2
net.ipv4.tcp_syn_retries
net.ipv4.tcp_synack_retries
net.ipv4.tcp_tw_reuse
net.ipv4.tcp_tw_recycle
net.ipv4.tcp_keepalive_time
net.ipv4.tcp_fin_timeout
net.ipv4.tcp_max_tw_buckets
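
Before changing any of these parameters it helps to see their current values. The following is a minimal Python sketch, assuming a Linux host where each dotted sysctl name maps to a file under /proc/sys (dots become path separators). The helper names sysctl_path, read_sysctl, and report are hypothetical and chosen only for this illustration.

```python
from pathlib import Path
from typing import Dict, List, Optional

# The parameters listed above.
PARAMS: List[str] = [
    "net.core.somaxconn",
    "net.core.netdev_max_backlog",
    "net.ipv4.tcp_max_syn_backlog",
    "net.ipv4.tcp_retries2",
    "net.ipv4.tcp_syn_retries",
    "net.ipv4.tcp_synack_retries",
    "net.ipv4.tcp_tw_reuse",
    "net.ipv4.tcp_tw_recycle",
    "net.ipv4.tcp_keepalive_time",
    "net.ipv4.tcp_fin_timeout",
    "net.ipv4.tcp_max_tw_buckets",
]


def sysctl_path(name: str, root: str = "/proc/sys") -> Path:
    """Map a dotted sysctl name to its file under /proc/sys."""
    return Path(root).joinpath(*name.split("."))


def read_sysctl(name: str, root: str = "/proc/sys") -> Optional[str]:
    """Return the current value as text, or None if the file is missing."""
    try:
        return sysctl_path(name, root).read_text().strip()
    except OSError:
        return None


def report(names: List[str] = PARAMS, root: str = "/proc/sys") -> Dict[str, Optional[str]]:
    """Collect the current value of every parameter in names."""
    return {name: read_sysctl(name, root) for name in names}
```

Calling report() on a recent kernel would be expected to return None for net.ipv4.tcp_tw_recycle, since that parameter no longer exists there.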

1. Parameter scope

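Descriptions of these parameters are easy to find, whether in the man pages or on Baidu. How the parameters relate to one another, and at which point each one takes effect, is rarely explained.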

To understand the scope of each parameter and how they relate, start with the basic stages of a network connection.

Take a simple Web request as an example (details unrelated to these parameters are omitted):

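1. The network interface receives the request and hands it to the system socket
If the kernel processes packets more slowly than the interface receives them, packets would be dropped, so the system provides a buffer that holds the packets it cannot process yet. The size of this buffer is controlled by net.core.netdev_max_backlog.
When the kernel processes packets faster than the interface receives them, net.core.netdev_max_backlog has no effect.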
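
One way to judge whether this buffer is too small is to look at the per-CPU counters in /proc/net/softnet_stat. The sketch below assumes the usual layout of that file: one line per CPU, hexadecimal columns, where the first column counts processed packets, the second counts packets dropped because the input queue was full, and the third counts time squeezes. The function names and the SAMPLE text are made up for illustration.

```python
from typing import Dict, List


def parse_softnet_stat(text: str) -> List[Dict[str, int]]:
    """Return one dict per CPU line with the first three counters decoded from hex."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        processed, dropped, time_squeeze = (int(f, 16) for f in fields[:3])
        rows.append(
            {"processed": processed, "dropped": dropped, "time_squeeze": time_squeeze}
        )
    return rows


def total_dropped(text: str) -> int:
    """Sum the dropped counter across all CPUs."""
    return sum(row["dropped"] for row in parse_softnet_stat(text))


# Invented sample with two CPUs; real files have more columns per line.
SAMPLE = """\
0000a1b2 00000000 00000003 00000000 00000000
0000ffff 0000000a 00000000 00000000 00000000
"""

if __name__ == "__main__":
    print(total_dropped(SAMPLE))
```

On a real host the text would come from reading /proc/net/softnet_stat. A dropped total that keeps growing suggests that net.core.netdev_max_backlog is worth raising.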

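2. The socket receives the request and the TCP three-way handshake begins
a. The socket receives a SYN and replies with SYN+ACK; the socket state is SYN_RECEIVED
The maximum number of connections a socket can hold in the SYN_RECEIVED state is controlled by net.ipv4.tcp_max_syn_backlog.
After sending the SYN+ACK, if no ACK arrives, the socket retransmits the SYN+ACK; the number of retries is controlled by net.ipv4.tcp_synack_retries.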


The client that sends the SYN likewise limits how many times the SYN is retransmitted; this is controlled by net.ipv4.tcp_syn_retries.

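b. The socket receives the ACK and the connection is established; the socket state is ESTABLISHED
Once established, a connection that the application cannot accept promptly is placed in a queue.
The maximum number of connections in this queue is controlled by net.core.somaxconn, which caps the backlog passed to listen().
Therefore net.ipv4.tcp_max_syn_backlog should be >= net.core.somaxconn.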
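
The relationship between the two queues can be checked with a few lines of Python. This is a sketch under two assumptions: that the kernel silently caps the backlog an application passes to listen() at net.core.somaxconn, and that the rule of thumb from this section (tcp_max_syn_backlog >= somaxconn) is the one being checked. The function names are hypothetical.

```python
from typing import List


def effective_accept_queue(listen_backlog: int, somaxconn: int) -> int:
    """Accept-queue length actually used: the listen() backlog capped at somaxconn."""
    return min(listen_backlog, somaxconn)


def check_queue_settings(
    listen_backlog: int, somaxconn: int, tcp_max_syn_backlog: int
) -> List[str]:
    """Return human-readable warnings for settings that work against each other."""
    warnings = []
    if listen_backlog > somaxconn:
        warnings.append(
            f"listen() backlog {listen_backlog} is capped at somaxconn {somaxconn}"
        )
    if tcp_max_syn_backlog < somaxconn:
        warnings.append(
            f"tcp_max_syn_backlog {tcp_max_syn_backlog} is below somaxconn {somaxconn}"
        )
    return warnings
```

For example, an application that requests a backlog of 1024 on a host where somaxconn is 128 ends up with an accept queue of 128, so raising only the application setting has no effect.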

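3. Data transfer completes and the TCP four-way close begins
a. The server receives the client's close request (FIN) and replies with ACK; the socket state is CLOSE_WAIT
b. After sending all remaining data, the server sends its own FIN; the socket state is LAST_ACK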


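When the client receives the FIN, it replies with ACK and its socket state becomes TIME_WAIT
How long the client waits for that FIN (in FIN_WAIT_2, once the connection is orphaned) is controlled by net.ipv4.tcp_fin_timeout.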

c. The server receives the client's ACK and closes the socket; the state is CLOSED

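4. The socket releases its resources
The maximum number of sockets allowed in the TIME_WAIT state is net.ipv4.tcp_max_tw_buckets; beyond that number, TIME_WAIT sockets are destroyed immediately.
net.ipv4.tcp_tw_reuse determines whether TIME_WAIT sockets may be reused for new connections.
net.ipv4.tcp_tw_recycle determined whether TIME_WAIT sockets were recycled quickly; this parameter was removed in kernel 4.12.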
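
To see how close a host is to the TIME_WAIT limit, the socket states can be counted from /proc/net/tcp. The sketch below assumes the usual layout of that file: a header line, then one line per IPv4 socket whose fourth whitespace-separated field is the state as a hex code (06 is TIME_WAIT, 0A is LISTEN). IPv6 sockets live in /proc/net/tcp6 and are not covered here. The function names and the SAMPLE text are invented for illustration.

```python
from collections import Counter

# Kernel TCP state codes as they appear in the "st" column.
TCP_STATES = {
    "01": "ESTABLISHED", "02": "SYN_SENT", "03": "SYN_RECV",
    "04": "FIN_WAIT1", "05": "FIN_WAIT2", "06": "TIME_WAIT",
    "07": "CLOSE", "08": "CLOSE_WAIT", "09": "LAST_ACK",
    "0A": "LISTEN", "0B": "CLOSING",
}


def count_states(proc_net_tcp_text: str) -> Counter:
    """Count sockets per TCP state from the text of /proc/net/tcp."""
    counts = Counter()
    for line in proc_net_tcp_text.splitlines()[1:]:  # first line is the header
        fields = line.split()
        if len(fields) < 4:
            continue
        counts[TCP_STATES.get(fields[3].upper(), "UNKNOWN")] += 1
    return counts


def time_wait_headroom(proc_net_tcp_text: str, max_tw_buckets: int) -> int:
    """How many more TIME_WAIT sockets fit before tcp_max_tw_buckets is reached."""
    return max_tw_buckets - count_states(proc_net_tcp_text)["TIME_WAIT"]


# Invented sample: one listener and two TIME_WAIT sockets.
SAMPLE = """\
  sl  local_address rem_address   st tx_queue rx_queue tr tm->when retrnsmt   uid  timeout inode
   0: 0100007F:1F90 00000000:0000 0A 00000000:00000000 00:00000000 00000000     0        0 20001 1
   1: 0100007F:1F90 0100007F:C350 06 00000000:00000000 03:00000EA6 00000000     0        0 0 3
   2: 0100007F:1F90 0100007F:C351 06 00000000:00000000 03:00000EA6 00000000     0        0 0 3
"""
```

On a real host the text would be read from /proc/net/tcp and the limit from /proc/sys/net/ipv4/tcp_max_tw_buckets.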

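The whole process above is also bounded by the maximum number of connections, which is set by the following parameters and files:
Parameters: fs.nr_open, nofile
Files: /etc/security/limits.conf, /etc/systemd/system.conf, /etc/systemd/user.conf, /etc/security/limits.d/20-nproc.conf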
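
The limits that actually apply to a running process can be read from inside that process. This is a minimal sketch, assuming a Unix host where Python's standard resource module is available and where fs.nr_open is exposed at /proc/sys/fs/nr_open. The helper names are hypothetical.

```python
import resource
from typing import Optional, Tuple


def nofile_limits() -> Tuple[int, int]:
    """Return the (soft, hard) open-file limits of the current process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def describe(limit: int) -> str:
    """Render a limit, showing RLIM_INFINITY as 'unlimited'."""
    return "unlimited" if limit == resource.RLIM_INFINITY else str(limit)


def read_nr_open(path: str = "/proc/sys/fs/nr_open") -> Optional[int]:
    """Return fs.nr_open, or None if it cannot be read on this host."""
    try:
        with open(path) as f:
            return int(f.read().strip())
    except (OSError, ValueError):
        return None


if __name__ == "__main__":
    soft, hard = nofile_limits()
    print("nofile soft:", describe(soft), "hard:", describe(hard))
    print("fs.nr_open:", read_nr_open())
```

Each TCP connection consumes one file descriptor, so the soft nofile limit is the figure a server process reaches first.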

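For all TCP connections, the number of retransmission attempts, and therefore how long TCP keeps trying, is controlled by these two parameters:
net.ipv4.tcp_retries1
net.ipv4.tcp_retries2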

For long-lived TCP connections with keepalive enabled, the idle time before keepalive probes are sent is controlled by net.ipv4.tcp_keepalive_time.
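
The system-wide value only applies to sockets that have keepalive switched on, and an application can override it per socket. The following Python sketch assumes a Linux host, where the standard socket module exposes TCP_KEEPIDLE as the per-socket counterpart of net.ipv4.tcp_keepalive_time. The function name enable_keepalive is hypothetical.

```python
import socket


def enable_keepalive(sock: socket.socket, idle_seconds: int) -> bool:
    """Turn on keepalive for sock and, where supported, set its idle time.

    Returns True if the per-socket idle time was applied, False if the
    platform lacks TCP_KEEPIDLE and the system-wide default stays in force.
    """
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    if hasattr(socket, "TCP_KEEPIDLE"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle_seconds)
        return True
    return False


if __name__ == "__main__":
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        print("per-socket idle time applied:", enable_keepalive(s, 600))
    finally:
        s.close()
```

Setting the value per socket leaves other services on the same host unaffected, which is often preferable to lowering the global default.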

2. Parameter overview

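Below is a brief description of each parameter above (all taken from kernel.org). Later articles will examine each parameter in detail and give recommended settings for different applications.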

net.core.netdev_max_backlog

Maximum number of packets, queued on the INPUT side, when the interface receives packets faster than kernel can process them.

net.ipv4.tcp_max_syn_backlog

Maximal number of remembered connection requests (SYN_RECV), which have not received an acknowledgment from connecting client.

This is a per-listener limit.

The minimal value is 128 for low memory machines, and it will increase in proportion to the memory of machine.

If server suffers from overload, try increasing this number.

Remember to also check /proc/sys/net/core/somaxconn. A SYN_RECV request socket consumes about 304 bytes of memory.

net.ipv4.tcp_syn_retries

Number of times initial SYNs for an active TCP connection attempt will be retransmitted. Should not be higher than 127. Default value is 6, which corresponds to 63 seconds till the last retransmission with the current initial RTO of 1 second. With this the final timeout for an active TCP connection attempt will happen after 127 seconds.

net.ipv4.tcp_synack_retries

Number of times SYNACKs for a passive TCP connection attempt will be retransmitted. Should not be higher than 255. Default value is 5, which corresponds to 31 seconds till the last retransmission with the current initial RTO of 1 second. With this the final timeout for a passive TCP connection will happen after 63 seconds.
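
The figures quoted for tcp_syn_retries and tcp_synack_retries follow from exponential backoff: each retransmission waits twice as long as the previous one, starting from the initial RTO of 1 second. The sketch below reproduces that arithmetic. It is a simplified model that ignores any upper cap on the RTO, which does not come into play for the default values, and the function name retransmit_schedule is hypothetical.

```python
from typing import List, Tuple


def retransmit_schedule(retries: int, initial_rto: float = 1.0) -> Tuple[List[float], float]:
    """Return (retransmission times, final timeout) in seconds after the first send.

    Models plain exponential backoff: the wait doubles after every attempt.
    """
    times = []
    elapsed = 0.0
    rto = initial_rto
    for _ in range(retries):
        elapsed += rto          # wait one RTO, then retransmit
        times.append(elapsed)
        rto *= 2                # back off for the next attempt
    final_timeout = elapsed + rto  # the last wait expires with no reply
    return times, final_timeout


if __name__ == "__main__":
    for name, retries in (("tcp_syn_retries", 6), ("tcp_synack_retries", 5)):
        times, final = retransmit_schedule(retries)
        print(name, "last retransmission at", times[-1], "s, timeout at", final, "s")
```

With 6 retries the last retransmission falls at 63 seconds and the attempt times out at 127 seconds; with 5 retries the figures are 31 and 63 seconds, matching the descriptions above.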

net.core.somaxconn

Limit of socket listen() backlog, known in userspace as SOMAXCONN. Defaults to 4096. (Was 128 before linux-5.4) See also tcp_max_syn_backlog for additional tuning for TCP sockets.

net.ipv4.tcp_fin_timeout

The length of time an orphaned (no longer referenced by any application) connection will remain in the FIN_WAIT_2 state before it is aborted at the local end. While a perfectly valid “receive only” state for an un-orphaned connection, an orphaned connection in FIN_WAIT_2 state could otherwise wait forever for the remote to close its end of the connection.

Cf. tcp_max_orphans

Default: 60 seconds

net.ipv4.tcp_tw_reuse

Enable reuse of TIME-WAIT sockets for new connections when it is safe from protocol viewpoint.

0 - disable

1 - global enable

2 - enable for loopback traffic only

It should not be changed without advice/request of technical experts.

Default: 2

net.ipv4.tcp_tw_recycle

Enable fast recycling of TIME_WAIT sockets. Enabling this option is not recommended since this causes problems when working with NAT (Network Address Translation).

net.ipv4.tcp_tw_recycle was removed in Linux 4.12 (2017).

net.ipv4.tcp_keepalive_time

How often TCP sends out keepalive messages when keepalive is enabled. Default: 2 hours.

net.ipv4.tcp_max_tw_buckets

Maximal number of timewait sockets held by system simultaneously. If this number is exceeded time-wait socket is immediately destroyed and warning is printed. This limit exists only to prevent simple DoS attacks, you must not lower the limit artificially, but rather increase it (probably, after increasing installed memory), if network conditions require more than default value.

net.ipv4.tcp_retries1

This value influences the time, after which TCP decides, that something is wrong due to unacknowledged RTO retransmissions, and reports this suspicion to the network layer. See tcp_retries2 for more details.

RFC 1122 recommends at least 3 retransmissions, which is the default.

net.ipv4.tcp_retries2

This value influences the timeout of an alive TCP connection, when RTO retransmissions remain unacknowledged. Given a value of N, a hypothetical TCP connection following exponential backoff with an initial RTO of TCP_RTO_MIN would retransmit N times before killing the connection at the (N+1)th RTO.

The default value of 15 yields a hypothetical timeout of 924.6 seconds and is a lower bound for the effective timeout. TCP will effectively time out at the first RTO which exceeds the hypothetical timeout.

RFC 1122 recommends at least 100 seconds for the timeout, which corresponds to a value of at least 8.
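
The 924.6 second figure can be reproduced by summing N+1 exponentially growing RTOs. The sketch below assumes the usual kernel constants TCP_RTO_MIN = 200 ms and TCP_RTO_MAX = 120 s, neither of which is stated in the text above, and the function name hypothetical_timeout is chosen only for this illustration.

```python
# Assumed kernel constants, in seconds (not given in the quoted description).
TCP_RTO_MIN = 0.2
TCP_RTO_MAX = 120.0


def hypothetical_timeout(
    retries2: int, rto_min: float = TCP_RTO_MIN, rto_max: float = TCP_RTO_MAX
) -> float:
    """Sum the N+1 RTOs of a connection that starts at rto_min and doubles each time.

    Each RTO is capped at rto_max; the connection is killed when the
    (N+1)th RTO expires.
    """
    total = 0.0
    rto = rto_min
    for _ in range(retries2 + 1):
        total += min(rto, rto_max)
        rto *= 2
    return total


if __name__ == "__main__":
    print(round(hypothetical_timeout(15), 1))  # default tcp_retries2
    print(round(hypothetical_timeout(8), 1))   # smallest value reaching 100 s
```

Under these assumptions a value of 15 gives 924.6 seconds, and 8 is the smallest value whose timeout reaches the 100 seconds recommended by RFC 1122 (7 gives only 51 seconds).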
