Symptom
For a few individual business instances, it is impossible to get into the container via kubelet exec.
Cause
The port kubelet listens on to handle container attach/exec (this port is bound at random and can fall inside the port range Kubernetes reserves for NodePort services) conflicts with the nodePort allocated to a LoadBalancer service created later. The iptables rules kube-proxy generates for that service then break normal access to the kubelet port.
For how kubectl exec works under the hood, see: https://www.cnblogs.com/gaorong/p/11873114.html
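A quick way to confirm this kind of conflict on a node is to compare the port kubelet's streaming server bound at random with the cluster's NodePort range. The sketch below is an assumption about how one might check, not output from the affected node; the NodePort range defaults to 30000-32767 when the apiserver flag is absent.

# ports kubelet (and its in-process CRI shim) is listening on
ss -tlnp | grep kubelet
# NodePort range configured on the apiserver, if the flag is set explicitly
ps -ef | grep kube-apiserver | grep -o 'service-node-port-range=[^ ]*'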
Two cases
Note: in both cases the precondition is that the kubelet port conflicts with the port of a NodePort-type service.
Case 1: a NodePort service with no endpoints
For a NodePort service that has no endpoints, kube-proxy adds an iptables rule that rejects local access to the target port (the service's nodePort), so requests that go through kubelet to exec into the container are rejected.
- kubelet listens on a random port and communicates with the CRI shim to obtain the exec URL.
- Meanwhile, the kube-proxy error log shows:
E1217 14:12:45.561106 40804 proxier.go:1054] can't open "nodePort for default/traefik-1566557461:http" (:32445/tcp), skipping this nodePort: listen tcp :32445: bind: address already in use
E1217 14:12:46.072375 40804 proxier.go:1054] can't open "nodePort for default/traefik-1566557461:http" (:32445/tcp), skipping this nodePort: listen tcp :32445: bind: address already in use
The errors above appear because port 32445 is already held by kubelet.
However, kube-proxy still creates an iptables rule in the filter table (reached from the INPUT chain) that rejects access to this port.
- kubelet logs:
./kubelet.log.20201216:1095:E1216 17:28:07.041162 39680 server.go:676] Error while proxying request: error dialing backend: dial tcp 127.0.0.1:32445: connect: connection refused
./kubelet.log.20201216:1096:E1216 17:28:08.129135 39680 server.go:676] Error while proxying request: error dialing backend: dial tcp 127.0.0.1:32445: connect: connection refused
Curling this port fails in the same way.
- kube-proxy code logic: for a NodePort service, as soon as it has no endpoints, a REJECT iptables rule is added for its nodePort (the resulting rule is shown after the code). This was introduced to prevent connections from hanging in CLOSE_WAIT: https://github.com/kubernetes/kubernetes/issues/43212
// Capture nodeports. If we had more than 2 rules it might be
// worthwhile to make a new per-service chain for nodeport rules, but
// with just 2 rules it ends up being a waste and a cognitive burden.
if svcInfo.NodePort != 0 {
    ......
    if hasEndpoints {
        ......
    } else {
        // No endpoints.
        writeLine(proxier.filterRules,
            "-A", string(kubeExternalServicesChain),
            "-m", "comment", "--comment", fmt.Sprintf(`"%s has no endpoints"`, svcNameString),
            "-m", "addrtype", "--dst-type", "LOCAL",
            "-m", protocol, "-p", protocol,
            "--dport", strconv.Itoa(svcInfo.NodePort),
            "-j", "REJECT",
        )
    }
}
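For the conflicting port in case 1, the arguments passed to writeLine above translate into roughly the following rule (reconstructed from the code, not copied from the affected node), and it can be checked on the node with iptables:

# rule emitted by the code above for the no-endpoints service on the conflicting port
-A KUBE-EXTERNAL-SERVICES -m comment --comment "default/traefik-1566557461:http has no endpoints" -m addrtype --dst-type LOCAL -m tcp -p tcp --dport 32445 -j REJECT

# confirm on the node; KUBE-EXTERNAL-SERVICES hangs off INPUT in the filter table,
# which is why kubelet's dial to 127.0.0.1:32445 comes back as "connection refused"
iptables -t filter -S KUBE-EXTERNAL-SERVICES | grep 32445
iptables -t filter -S INPUT | grep KUBE-EXTERNAL-SERVICES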
Case 2: a NodePort service with endpoints
A NodePort service created after kubelet has bound the port also makes the kubelet port unusable. No REJECT rule is created in the filter table this time, but the NAT rules kube-proxy creates intercept requests to the kubelet port and DNAT them straight into the business pod.
- kubelet ports:
root@:/home/test# netstat -nalp | grep kubelet
tcp 0 0 127.0.0.1:10248 0.0.0.0:* LISTEN 13344/kubelet
tcp 0 0 127.0.0.1:61001 0.0.0.0:* LISTEN 13344/kubelet
- Create a NodePort service that pins nodePort to 61001 (a sketch of such a manifest follows the output below):
root@:/home/test# kubectl get svc
NAME         TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)          AGE
kubernetes   ClusterIP   10.96.0.1       <none>        443/TCP          31d
my-nginx     NodePort    10.97.114.243   <none>        9090:61001/TCP   113m
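A manifest along these lines would reproduce the setup; the selector, label, and targetPort are assumptions (they are not shown above), and pinning nodePort to 61001 only works because the cluster's --service-node-port-range covers it:

# hypothetical reproduction manifest; labels and targetPort are assumed
kubectl apply -f - <<'EOF'
apiVersion: v1
kind: Service
metadata:
  name: my-nginx
spec:
  type: NodePort
  selector:
    app: my-nginx        # assumed pod label
  ports:
  - name: http
    port: 9090
    targetPort: 80       # assumed container port (nginx default)
    nodePort: 61001      # deliberately the port kubelet already bound
EOF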
- kube-proxy again reports the address-in-use error:
E1218 04:32:34.955415 1 proxier.go:1254] can't open "nodePort for default/my-nginx:http" (:61001/tcp), skipping this nodePort: listen tcp4 :61001: bind: address already in use
E1218 05:32:34.999420 1 proxier.go:1254] can't open "nodePort for default/my-nginx:http" (:61001/tcp), skipping this nodePort: listen tcp4 :61001: bind: address already in use
- kubectl exec into the container lands on the business nginx directly and gets a 404 back:
root@:~# kubectl exec -it sysctl-modify-78fd5486b-bxg7r sh
kubectl exec [POD] [COMMAND] is DEPRECATED and will be removed in a future version. Use kubectl exec [POD] -- [COMMAND] instead.
error: unable to upgrade connection:
<html>
<head><title>404 Not Found</title></head>
<body>
<center><h1>404 Not Found</h1></center>
<hr><center>nginx/1.18.0</center>
</body>
</html>
root@:~#
If the business pod does not respond to the request at all, kubelet reports a request timeout instead.
- The cause of the behaviour above is the iptables rules below, which DNAT requests originally destined for kubelet's port 61001 into the business pod (how the chain is wired is sketched after the output):
Chain KUBE-NODEPORTS (1 references)
pkts bytes target prot opt in out source destination
0 0 KUBE-MARK-MASQ tcp -- * * 0.0.0.0/0 0.0.0.0/0 /* default/my-nginx:http */ tcp dpt:61001
0 0 KUBE-SVC-SV7AMNAGZFKZEMQ4 tcp -- * * 0.0.0.0/0 0.0.0.0/0 /* default/my-nginx:http */ tcp dpt:61001
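KUBE-NODEPORTS is reached from KUBE-SERVICES in the nat table, which is hooked into both PREROUTING and OUTPUT, so even kubelet's own locally generated connection to 127.0.0.1:61001 gets matched and DNATed before it can reach the kubelet listener. A rough way to follow the chain down to the DNAT target; the KUBE-SVC chain name comes from the output above, while the KUBE-SEP name is only a placeholder:

# how KUBE-NODEPORTS is wired: PREROUTING/OUTPUT -> KUBE-SERVICES -> KUBE-NODEPORTS
iptables -t nat -S PREROUTING | grep KUBE-SERVICES
iptables -t nat -S OUTPUT | grep KUBE-SERVICES
iptables -t nat -S KUBE-SERVICES | grep KUBE-NODEPORTS
# the service chain jumps to one KUBE-SEP-* chain per endpoint; that chain holds the DNAT to the pod IP
iptables -t nat -S KUBE-SVC-SV7AMNAGZFKZEMQ4
iptables -t nat -S KUBE-SEP-XXXXXXXXXXXXXXXX   # placeholder: substitute the chain name printed above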
Solutions
- Kubernetes limits NodePort services to a fixed port range, and other services on the host should stay out of that range to avoid accidents like this one. The Linux kernel's net.ipv4.ip_local_port_range parameter constrains the range from which random ports are picked.
Setting net.ipv4.ip_local_port_range = 1024 20000 (the exact range still needs evaluation) confines kubelet's random ports to that range; kubelet has to be restarted so that it re-binds inside the new range (a sketch follows the reference below).
This also keeps other services from running into the same problem later. For our situation this is currently the best fit and the cheapest change.
Reference: https://github.com/kubernetes/kubernetes/issues/85418
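A minimal sketch of that change, assuming a systemd-managed kubelet and the still-to-be-evaluated 1024-20000 range:

# persist the random/ephemeral port range so it no longer overlaps the NodePort range
cat <<'EOF' > /etc/sysctl.d/99-ip-local-port-range.conf
net.ipv4.ip_local_port_range = 1024 20000
EOF
sysctl --system
# restart kubelet so its streaming server re-binds inside the new range
systemctl restart kubelet
# verify: the randomly bound kubelet/CRI-shim port should now be below 20000
ss -tlnp | grep kubelet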
- Stop using LoadBalancer-type services for load balancing. The impact of this would be fairly large.
- Patch the port kubelet's streaming server listens on. Future upstream changes could then affect us: this logic lives in dockershim, and newer kubelet versions remove dockershim from kubelet. Besides, this would not solve port conflicts for other components.