k8s calico-node error: listen tcp 127.0.0.1:9099: bind: address already in use. The cause: an overly diligent runsvdir

On a k8s master node, the calico-node pod reported this error:

felix/health.go 246: Health endpoint failed, trying to restart it... error=listen tcp 127.0.0.1:9099: bind: address already in use

# ss -ltnp | grep 9099

LISTEN 0 128 127.0.0.1:9099 *:* users:(("calico-node",pid=5886,fd=7))

The port is already in use, held by a calico-node process (PID 5886).
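The check above can be scripted. This is a minimal sketch, not part of the original troubleshooting: the helper name `pids_on_port` is hypothetical, and it assumes the column layout of `ss -ltnp` shown above (state in column 1, local address in column 4).

```shell
# Print the PID of every process listening on the given TCP port.
# Reads `ss -ltnp` output on stdin; prints one PID per line.
pids_on_port() {
    port="$1"
    awk -v port="$port" '
        # Match only LISTEN rows whose local address ends in ":<port>".
        $1 == "LISTEN" && $4 ~ (":" port "$") {
            line = $0
            # A socket can be shared, so collect every pid=NNN on the row.
            while (match(line, /pid=[0-9]+/)) {
                print substr(line, RSTART + 4, RLENGTH - 4)
                line = substr(line, RSTART + RLENGTH)
            }
        }'
}
```

Usage: `ss -ltnp | pids_on_port 9099`. The `users:(...)` column is only filled in when ss runs with enough privileges to see the owning process.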

# ps -ef | grep felix

root 5886 23986 0 14:09 ? 00:00:01 calico-node -felix

root 9744 9616 0 14:52 ? 00:00:00 runsv felix

root 9753 9744 4 14:52 ? 00:00:02 calico-node -felix

root 11026 36596 0 14:53 pts/0 00:00:00 grep --color=auto felix

root 23986 23584 0 Jul17 ? 00:00:00 runsv felix

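Killing `calico-node -felix` does not help: it is respawned immediately.

A search turned up this explanation: felix is managed by runsv, and runsv restarts felix.

So felix is being started by runsv. But killing runsv does not help either: it too is respawned immediately, and the port stays occupied.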
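When every killed process comes straight back, it helps to follow the parent PIDs upward and see who is supervising whom. The following is a rough sketch under stated assumptions: the helper name `ancestry` is hypothetical, and it assumes the standard `ps -ef` layout (PID in column 2, PPID in column 3, command from column 8 onward).

```shell
# Print a PID and its chain of ancestors, one "PID command" pair per line.
# Reads `ps -ef` output on stdin.
ancestry() {
    awk -v pid="$1" '
        {
            parent[$2] = $3
            c = $8
            for (i = 9; i <= NF; i++) c = c " " $i
            cmd[$2] = c
        }
        END {
            # Follow PPID links until the parent is missing from the listing.
            # The counter guards against a malformed listing that loops.
            n = 0
            while ((pid in parent) && n++ < 64) {
                print pid, cmd[pid]
                pid = parent[pid]
            }
        }'
}
```

Usage: `ps -ef | ancestry 5886`. On the first listing above, the chain starts at 5886 (`calico-node -felix`) and continues to its parent 23986 (`runsv felix`); the parent of that runsv, PID 23584, is not included in the grep output.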

【Solution】
# ps -ef | grep runsv
root      7724  7309  0 15:31 ?        00:00:00 /usr/local/bin/runsvdir -P /etc/service/enabled
root      7899  7724  0 15:31 ?        00:00:00 runsv monitor-addresses
root      7900  7724  0 15:31 ?        00:00:00 runsv allocate-tunnel-addrs
root      7901  7724  0 15:31 ?        00:00:00 runsv node-status-reporter
root      7902  7724  0 15:31 ?        00:00:00 runsv bird
root      7903  7724  0 15:31 ?        00:00:00 runsv bird6
root      7904  7724  0 15:31 ?        00:00:00 runsv confd
root      7905  7724  0 15:31 ?        00:00:00 runsv cni
root      8472  7724  0 16:12 ?        00:00:00 runsv felix
root      8508 23584  0 16:12 ?        00:00:00 runsv felix
root     11678 32170  0 16:33 pts/1    00:00:00 grep --color=auto runsv
root     23584 23565  0 Jul17 ?        00:00:31 /usr/local/bin/runsvdir -P /etc/service/enabled
root     23987 23584  0 Jul17 ?        00:00:00 runsv monitor-addresses
root     23988 23584  0 Jul17 ?        00:00:00 runsv allocate-tunnel-addrs
root     23989 23584  0 Jul17 ?        00:00:00 runsv node-status-reporter
root     23990 23584  0 Jul17 ?        00:00:00 runsv bird
root     23991 23584  0 Jul17 ?        00:00:00 runsv bird6
root     23992 23584  0 Jul17 ?        00:00:00 runsv confd
root     23993 23584  0 Jul17 ?        00:00:00 runsv cni
 

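The listing shows two runsvdir processes: a recent one (PID 7724) and an older one started on Jul17 (PID 23584), each supervising its own `runsv felix`. In the end the ringleader turned out to be runsvdir: once runsvdir was killed as well, the remaining runsv processes scattered at once.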
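A quick way to spot this situation is to list every runsvdir together with its start time; more than one supervisor for the same service directory is the warning sign. This is only a sketch: the helper name `list_runsvdir` is hypothetical, and it assumes `ps -ef` prints the start time as a single field in column 5, as in the listing above.

```shell
# Print "PID STIME" for every runsvdir process.
# Reads `ps -ef` output on stdin.
list_runsvdir() {
    # Column 8 is the executable; match "runsvdir" with or without a path.
    # The grep line from the listing is skipped because its column 8 is "grep".
    awk '$8 ~ /(^|\/)runsvdir$/ { print $2, $5 }'
}
```

Usage: `ps -ef | list_runsvdir`. On the listing above this prints PID 7724 with start time 15:31 and PID 23584 with start time Jul17. The post does not say which signal was used for the kill, so that choice is left open here.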

Oddly, there is no runsvdir in /usr/local/bin/ on the host. How does k8s manage that?
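One likely explanation, offered as an assumption since the post does not confirm it: runsvdir runs inside the calico-node container, so the path that ps prints is relative to the container's root filesystem, not the host's. On Linux, a process's own view of the filesystem is exposed under `/proc/<pid>/root`, which allows a check from the host. The helper name `path_in_proc_root` below is hypothetical.

```shell
# Build the host-side path that corresponds to PATH as seen by process PID.
# Relies on the Linux /proc/<pid>/root link; usually needs root to follow.
path_in_proc_root() {
    pid="$1"
    path="$2"
    printf '/proc/%s/root%s\n' "$pid" "$path"
}
```

Usage: `ls -l "$(path_in_proc_root 23584 /usr/local/bin/runsvdir)"`. If the file exists there but not under the host's /usr/local/bin/, the binary comes from the container image. `readlink /proc/23584/exe` gives a second view of the same thing.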
