nginx worker processes hang while a CPU sits at 100% load
Scenario:
    # TCP connection states
    [root@ ~]# netstat -nat | awk '{print $6}' | grep -v 'Foreign' | grep -v 'established)' | sort | uniq -c | sort -rn
       3010 TIME_WAIT
        537 ESTABLISHED
         65 SYN_RECV
         45 FIN_WAIT2
         20 CLOSE_WAIT
         18 FIN_WAIT1
          9 LISTEN
          3 LAST_ACK
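The two `grep -v` filters in that pipeline only exist to drop the two header lines that `netstat -nat` prints. A slightly tighter sketch skips them by line number instead; it assumes the usual two-line Linux `netstat` header, and `count_tcp_states` is a hypothetical helper name.

```shell
# Count TCP connections per state from `netstat -nat` output on stdin.
# NR > 2 skips the "Active Internet connections" and "Proto Recv-Q ..." header
# lines, replacing the two grep -v filters; column 6 is the State column.
count_tcp_states() {
  awk 'NR > 2 { print $6 }' | sort | uniq -c | sort -rn
}

# Usage on a live host:
#   netstat -nat | count_tcp_states
```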
    # nginx processes
    # CPU usage, state R (running) and accumulated CPU time: two workers clearly differ from the other nginx workers
    [root@ ~]# ps axu | grep nginx
    root     15878  0.0  0.0 27352  1444 ?  Ss  Jul30   0:00 nginx: master process /use/local/nginx/sbin/nginx -c /use/local/nginx/conf/nginx.conf
    nobody   15879  0.9  0.3 36432 11756 ?  S   Jul30   7:42 nginx: worker process
    nobody   15881 73.2  0.3 36060 11448 ?  R   Jul30 626:41 nginx: worker process
    nobody   15883 72.9  0.3 36116 11428 ?  R   Jul30 624:24 nginx: worker process
    nobody   15884  0.8  0.3 36372 11776 ?  S   Jul30   7:34 nginx: worker process
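To pick out the spinning workers without eyeballing `ps axu | grep nginx`, here is a sketch using standard procps `ps` options (`-C` selects by command name, `-o` picks columns, `--sort` orders rows); the helper names and the 50% threshold are hypothetical. Note that `ps` reports %CPU as CPU time divided by the process's lifetime, which is why it shows about 73% here while `top` shows close to 100%.

```shell
# List nginx processes busiest first, without a header line (procps ps):
# -C selects by command name, --sort=-pcpu orders by %CPU descending.
show_nginx_cpu() {
  ps -C nginx -o pid=,stat=,pcpu=,time=,args= --sort=-pcpu
}

# Print the PIDs of processes that are runnable (STAT starting with R) and
# above a %CPU threshold (default 50), reading "PID STAT %CPU ..." lines.
find_spinning_workers() {
  awk -v t="${1:-50}" '$2 ~ /^R/ && $3 + 0 > t + 0 { print $1 }'
}

# Usage:
#   show_nginx_cpu | find_spinning_workers 50
```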
    # Memory usage: nothing wrong here
    [root@ ~]# free -m
                 total       used       free     shared    buffers     cached
    Mem:          3018       2605        412          0        508       1251
    -/+ buffers/cache:        845       2172
    Swap:         2047          0       2047
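The `-/+ buffers/cache` row is what supports the "nothing wrong" verdict: subtracting buffers and page cache from `used` leaves the memory that processes actually hold. A minimal sketch of that arithmetic, assuming the older procps `free` layout shown above (newer procps releases replace the `buffers`/`cached` columns with `buff/cache` and `available`, so the column positions would differ); `app_used_mb` is a hypothetical name.

```shell
# Application memory (MB) = used - buffers - cached, taken from the "Mem:" row
# of old-style `free -m` output (columns: total used free shared buffers cached).
app_used_mb() {
  awk '/^Mem:/ { print $3 - $6 - $7 }'
}

# Usage on a host with the old procps layout:
#   free -m | app_used_mb
```

For the numbers above this gives 2605 - 508 - 1251 = 846 MB, within a megabyte of the 845 that `free` prints (each column is rounded to MB separately), out of 3018 MB total.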
    # nginx error log: no output
    [root@ ~]# tail -f /use/local/nginx/logs/error.log
    # top, with per-CPU details
    top - 14:19:35 up 308 days, 21:47,  1 user,  load average: 2.22, 2.59, 2.62
    Tasks:  86 total,   3 running,  82 sleeping,   1 stopped,   0 zombie
    Cpu0  :  2.1%us,  2.1%sy,  0.0%ni, 89.0%id,  0.0%wa,  0.0%hi,  6.7%si,  0.0%st
    Cpu1  :100.0%us,  0.0%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
    Cpu2  :100.0%us,  0.0%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
    Cpu3  :  0.0%us,  0.4%sy,  0.0%ni, 99.3%id,  0.0%wa,  0.0%hi,  0.4%si,  0.0%st
    Mem:   3090600k total,  2701600k used,   389000k free,   521064k buffers
    Swap:  2096472k total,        0k used,  2096472k free,  1294384k cached

      PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
    15883 nobody    25   0 36116  11m 1520 R 99.8  0.4 635:52.34 nginx: worker process
    15881 nobody    25   0 36060  11m 1520 R 99.5  0.4 638:09.22 nginx: worker process
    13116 nobody    15   0 36616  11m 1512 S  3.7  0.4   0:07.63 nginx: worker process
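Cpu1 and Cpu2 sit at 100.0%us with 0.0%sy, matching the two workers in state R; time spent almost entirely in user space suggests the workers are looping inside nginx itself rather than in system calls. A sketch for confirming which CPU each worker is on, using the `psr` column of procps `ps` (the processor the process is currently assigned to, numbered from 0 like top's Cpu0..Cpu3); the function names are hypothetical, and the gdb/strace lines in the comments assume those tools are installed.

```shell
# Output columns: PID PSR STAT %CPU, no header.
nginx_cpu_map() {
  ps -C nginx -o pid=,psr=,stat=,pcpu=
}

# Report which CPU each runnable (R) nginx process is on.
report_busy_cpus() {
  awk '$3 ~ /^R/ { printf "pid %s on cpu%s\n", $1, $2 }'
}

# Usage:
#   nginx_cpu_map | report_busy_cpus
# Then, to see where a spinning worker is stuck:
#   gdb -p 15881 -batch -ex bt    # print the worker's current call stack
#   strace -c -p 15881            # syscall summary on Ctrl-C; almost none fits a user-space loop
```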
Official explanation of the bug:
This looks very similar to this problem, fixed in 1.1.1/1.0.7:

    *) Bugfix: nginx hogged CPU if all servers in an upstream were marked as "down".

> there was only one server in upstream,which marked 'backup'. after some
> test,i found this is the reason.

Yes, thank you for report. This is somewhat known issue, 'backup' handling needs attention.

Maxim Dounin
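The trigger described in the bug report (an upstream whose only server is marked `backup`, or whose servers are all marked `down`) can be searched for in the configuration. A heuristic sketch in awk: it assumes one directive per line, an `upstream NAME {` line opening each block, and it does not follow `include` directives, so it is a quick check rather than a real nginx config parser. `find_risky_upstreams` is a hypothetical name.

```shell
# Print upstream blocks in which every "server" line is marked "backup" or
# "down", leaving no usable primary server.
# Assumes one directive per line and that "upstream NAME {" opens the block.
find_risky_upstreams() {
  awk '
    /^[ \t]*upstream[ \t]/ {
      name = $2; sub(/[{].*/, "", name)    # tolerate "upstream NAME{"
      in_up = 1; total = 0; usable = 0
      next
    }
    in_up && /^[ \t]*server[ \t]/ {
      total++
      # a server with neither flag can act as a normal (primary) peer
      if ($0 !~ /[ \t](backup|down)([ \t]|;)/) usable++
    }
    in_up && /[}]/ {
      if (total > 0 && usable == 0) print name
      in_up = 0
    }
  ' "$@"
}

# Usage:
#   find_risky_upstreams /use/local/nginx/conf/nginx.conf
```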
Solution:
Restart nginx at a convenient time; that clears the spinning workers.
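The restart can be scripted so that a configuration typo cannot turn it into an outage. A minimal sketch, assuming the binary and config paths from the `ps` output above and using only standard nginx command-line switches (`-t` to test the configuration, `-s stop`, `-c`); `restart_nginx` is a hypothetical helper. `nginx -s stop` only sends the signal and returns, so the loop gives the old master time to exit before the new one binds the listening ports.

```shell
# Paths taken from the ps output above; override NGINX/CONF for your install.
NGINX=${NGINX:-/use/local/nginx/sbin/nginx}
CONF=${CONF:-/use/local/nginx/conf/nginx.conf}

restart_nginx() {
  # Refuse to restart if the configuration does not pass nginx's own test.
  "$NGINX" -t -c "$CONF" || return 1
  # Fast shutdown of the running master and its workers.
  "$NGINX" -s stop -c "$CONF" || return 1
  # Wait up to about 10 s for the old master to exit. The [n] keeps pgrep
  # from matching a shell whose own command line contains this pattern.
  waited=0
  while pgrep -f '[n]ginx: master process' >/dev/null 2>&1 && [ "$waited" -lt 10 ]; do
    sleep 1
    waited=$((waited + 1))
  done
  # Start a fresh master, which forks new workers.
  "$NGINX" -c "$CONF"
}

# Usage (as root, at a quiet moment):
#   restart_nginx
```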
Reference:
http://bbs.linuxtone.org/thread-17226-1-1.html