nginx: Connection refused

Problem description

A large number of Connection refused errors suddenly appeared in the nginx error log:

2020/03/19 09:52:53 [error] 20117#20117: *7403411764 connect() failed (111: Connection refused) while connecting to upstream, client: xxx.xxx.xxx.xxx, server: , request: "POST /post/result/lol?type=Bet HTTP/1.1", upstream: "http://xxx.xxx.xxx.xxx/post/result/lol?type=Bet", host: "xxx.xxx.xxx.xxx"
2020/03/19 09:52:53 [error] 20117#20117: *7403411774 connect() failed (111: Connection refused) while connecting to upstream, client: xxx.xxx.xxx.xxx, server: , request: "POST /post/result/csgo?type=RollingBet HTTP/1.1", upstream: "http://xxx.xxx.xxx.xxx/post/result/csgo?type=RollingBet", host: "xxx.xxx.xxx.xxx"
2020/03/19 09:52:54 [error] 20116#20116: *7403411815 connect() failed (111: Connection refused) while connecting to upstream, client: xxx.xxx.xxx.xxx, server: , request: "POST /post/result/lol?type=Bet HTTP/1.1", upstream: "http://xxx.xxx.xxx.xxx/post/result/lol?type=Bet", host: "xxx.xxx.xxx.xxx"

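My first guess was that the upstream server node had gone down, but on inspection it was running normally. The errors broke out suddenly and in large numbers, and the nginx and server monitoring both showed a sharp spike in the connection count. This suggests that under high load the system slowed down, requests timed out or failed, and TIME_WAIT connections piled up.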
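
To see how abrupt such a burst is, the refused connections can be counted per minute straight from the error log. The following is a minimal sketch, not the exact method used here: the function name `count_refused_per_minute` is made up for illustration, and it assumes the default nginx error-log timestamp format shown in the excerpt above.

```shell
# Count "(111: Connection refused)" upstream errors per minute.
# Reads nginx error-log lines on stdin; assumes each line starts with
# a "YYYY/MM/DD HH:MM:SS" timestamp, as in the default error_log format.
count_refused_per_minute() {
    awk '/\(111: Connection refused\)/ {
             minute = $1 " " substr($2, 1, 5)   # date plus HH:MM
             n[minute]++
         }
         END { for (m in n) print m, n[m] }' | sort
}

# Example (the log path is an assumption; adjust it to your installation):
# count_refused_per_minute < /var/log/nginx/error.log
```

Each output line is a minute followed by the number of refused upstream connections in that minute, so a sudden jump stands out immediately.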

Locating the problem

  • Check the TCP connection states
# netstat -n | awk '/^tcp/ {++S[$NF]} END {for(a in S) print a, S[a]}'
TIME_WAIT 35423
CLOSE_WAIT 23602
SYN_SENT 62
FIN_WAIT1 61
FIN_WAIT2 259
ESTABLISHED 7543
SYN_RECV 3
CLOSING 35
LAST_ACK 507

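The number of connections in the WAIT states (TIME_WAIT and CLOSE_WAIT) is far too high. After a TCP connection is closed, the side that closed it keeps the socket in TIME_WAIT for a while before the port is released. Under heavy concurrency a large number of TIME_WAIT connections build up; if they are not released in time they tie up a large number of ports and server resources, and many new connections end up being refused.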
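
One rough way to judge whether the TIME_WAIT backlog is large is to compare it with the size of the local ephemeral port range. The sketch below is only an illustration under stated assumptions: the function names are hypothetical, the port range is read from the standard Linux path `/proc/sys/net/ipv4/ip_local_port_range`, and the comparison is approximate because only outbound connections (for example nginx to the upstream) consume ephemeral ports, while the netstat count also includes inbound ones.

```shell
# Print how many ports a "LOW HIGH" range on stdin contains.
ephemeral_port_count() {
    awk '{ print $2 - $1 + 1 }'
}

# Count sockets in TIME_WAIT from `netstat -n`-style output on stdin.
time_wait_count() {
    awk '/^tcp/ && $NF == "TIME_WAIT" { n++ } END { print n + 0 }'
}

# Example on a Linux host:
# ephemeral_port_count < /proc/sys/net/ipv4/ip_local_port_range
# netstat -n | time_wait_count
```

If the second number approaches or exceeds the first, port pressure is a plausible contributor and worth investigating further.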

  • Modify the kernel parameters
# vim /etc/sysctl.conf
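# How long an orphaned socket stays in FIN_WAIT2, in seconds (default 60).
net.ipv4.tcp_fin_timeout = 30
# TCP timestamps protect against sequence-number wraparound; default 1 (on), and tcp_tw_reuse depends on it.
net.ipv4.tcp_timestamps = 1
# Allow TIME-WAIT sockets to be reused for new outgoing connections; 0 disables it.
net.ipv4.tcp_tw_reuse = 1
# Fast recycling of TIME-WAIT sockets (default 0). Breaks clients behind NAT; removed in Linux 4.12.
net.ipv4.tcp_tw_recycle = 1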
  • Apply the configuration
/sbin/sysctl -p
  • Verify the result
# netstat -n | awk '/^tcp/ {++S[$NF]} END {for(a in S) print a, S[a]}'
TIME_WAIT 2521
CLOSE_WAIT 13602

The number of connections in the WAIT states dropped, and nginx stopped reporting Connection refused.
