1. Problem description
While deploying a k8s cluster with kubeadm, some step went wrong, leaving the kubelet's port 10250 serving on the wrong protocol and address, as shown below:
[root@k8s-slave2 ~]# netstat -ntpl | grep 10250
tcp 0 0 127.0.0.1:10250 0.0.0.0:* LISTEN 52577/kubelet
The kubelet service status also shows the port bound to 127.0.0.1:
[root@k8s-slave2 ~]# systemctl status kubelet
● kubelet.service - kubelet: The Kubernetes Node Agent
Loaded: loaded (/usr/lib/systemd/system/kubelet.service; enabled; vendor preset: disabled)
Drop-In: /usr/lib/systemd/system/kubelet.service.d
└─10-kubeadm.conf, 20-etcd-service-manager.conf
Active: active (running) since 二 2022-10-04 15:04:47 CST; 4 days ago
Docs: https://kubernetes.io/docs/
Main PID: 52577 (kubelet)
Tasks: 16
Memory: 51.4M
CGroup: /system.slice/kubelet.service
└─52577 /usr/bin/kubelet --address=127.0.0.1 --pod-manifest-path=/etc/kubernetes/manifests --cgroup-driver=systemd --network-plugin=cni --pod-infra-cont...
10月 05 08:46:43 k8s-slave2 kubelet[52577]: I1005 08:46:43.352978 52577 topology_manager.go:200] "Topology Admit Handler"
10月 05 08:46:43 k8s-slave2 kubelet[52577]: I1005 08:46:43.514029 52577 reconciler.go:221] "operationExecutor.VerifyControllerAttachedVolume started for volume ...
10月 05 08:46:44 k8s-slave2 kubelet[52577]: map[string]interface {}{"cniVersion":"0.3.1", "hairpinMode":true, "ipMasq":false, "ipam":map[string]interface {}{"rang...
10月 05 11:13:17 k8s-slave2 kubelet[52577]: {"cniVersion":"0.3.1","hairpinMode":true,"ipMasq":false,"ipam":{"ranges":[[{"subnet":"10.244.1.0/24"}]],"routes":[{"ds...
10月 05 11:13:17 k8s-slave2 kubelet[52577]: I1005 11:13:17.399294 52577 reconciler.go:221] "operationExecutor.VerifyControllerAttachedVolume started for volume ...
10月 05 11:13:17 k8s-slave2 kubelet[52577]: I1005 11:13:17.979964 52577 pod_container_deletor.go:79] "Container not found in pod's containers" contain...d0cda5a59"
10月 05 11:13:18 k8s-slave2 kubelet[52577]: map[string]interface {}{"cniVersion":"0.3.1", "hairpinMode":true, "ipMasq":false, "ipam":map[string]interface {}{"rang...
10月 07 09:51:53 k8s-slave2 kubelet[52577]: {"cniVersion":"0.3.1","hairpinMode":true,"ipMasq":false,"ipam":{"ranges":[[{"subnet":"10.244.1.0/24"}]],"rou...go:187] fa
10月 07 09:51:56 k8s-slave2 kubelet[52577]: E1007 09:51:56.391455 52577 kubelet_node_status.go:460] "Error updating node status, will retry" err="error getting ...
10月 07 09:51:57 k8s-slave2 kubelet[52577]: E1007 09:51:57.185843 52577 controller.go:187] failed to update lease, error: Operation cannot be fulfille... try again
Hint: Some lines were ellipsized, use -l to show in full.
On a correctly deployed cluster, the kubelet's port 10250 should instead be:
- listening on tcp6, not tcp
- bound to ::, not 127.0.0.1
(On a default Linux configuration, a tcp6 listener on the :: wildcard also accepts IPv4 connections via IPv4-mapped addresses, so a single socket covers both.)
For example:
tcp6 0 0 :::10250 :::* LISTEN 3272/kubelet
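To tell the broken state from the healthy one at a glance, the protocol and local-address columns of the netstat output are what matter. A minimal sketch over sample lines (the strings below mirror the outputs above; on a real node you would pipe netstat itself):

```shell
# Sample lines mirroring the broken and healthy outputs shown above
broken='tcp        0      0 127.0.0.1:10250         0.0.0.0:*               LISTEN      52577/kubelet'
healthy='tcp6       0      0 :::10250                :::*                    LISTEN      3272/kubelet'

# Print protocol (field 1) and local address (field 4) of a netstat line
check() { echo "$1" | awk '{print $1, $4}'; }

check "$broken"    # prints: tcp 127.0.0.1:10250
check "$healthy"   # prints: tcp6 :::10250
```

On a live node the same extraction would be `netstat -ntpl | awk '/kubelet/ {print $1, $4}'`.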
2. About kubelet port 10250
A quick word on what port 10250 is for: it serves the kubelet's own API. Note the direction of traffic here: the kubelet initiates its regular connections to the apiserver itself (to receive the pods it should run and to report node resources and status), while port 10250 is the port the apiserver connects back to on each node. Fetching pod logs and running commands in containers — kubectl logs and kubectl exec — both go through the kubelet's port 10250.
If something is wrong with how port 10250 is served, fetching logs fails with errors like this:
[root@k8s-master ~]# kubectl logs kube-flannel-ds-9tfc8 -n kube-system
Error from server: Get "https://192.168.100.22:10250/containerLogs/kube-system/kube-flannel-ds-9tfc8/kube-flannel": dial tcp 192.168.100.22:10250: connect: connection refused
From this error it is clear that serving port 10250 on 127.0.0.1 cannot work: the apiserver has to reach the kubelet at the node's own address (here 192.168.100.22).
3. Fixing how port 10250 is served
So how do we get port 10250 listening correctly again?
The approach: go through the kubelet's various configuration files and find where the 127.0.0.1 address is set.
The kubelet configuration files and paths involved may include one or more of the following:
- /etc/kubernetes/kubelet.conf
- /var/lib/kubelet/
- /usr/lib/systemd/system/kubelet.service
- /usr/lib/systemd/system/kubelet.service.d/
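One way to find where 127.0.0.1 comes from is to grep the candidate paths above for the hard-coded flag. A sketch that recreates an offending drop-in in a temporary directory so it can run anywhere (on a real node you would point grep at /etc/kubernetes/ and the systemd directories instead):

```shell
# Illustrative only: rebuild a drop-in with the offending flag in a temp dir
dir=$(mktemp -d)
mkdir -p "$dir/kubelet.service.d"
cat > "$dir/kubelet.service.d/20-etcd-service-manager.conf" <<'EOF'
[Service]
ExecStart=
ExecStart=/usr/bin/kubelet --address=127.0.0.1 --pod-manifest-path=/etc/kubernetes/manifests
EOF

# Search for the hard-coded address the way you would on a real node
hits=$(grep -rn -- '--address=127.0.0.1' "$dir")
echo "$hits"
```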
The simplest way to narrow it down is systemctl status kubelet, which shows exactly which configuration files the service pulls in. That identifies the kubelet's effective configuration file as:
/usr/lib/systemd/system/kubelet.service.d/20-etcd-service-manager.conf
Its contents are shown below; note that it pins the kubelet's listen address to 127.0.0.1:
[root@k8s-slave2 ~]# cat /usr/lib/systemd/system/kubelet.service.d/20-etcd-service-manager.conf
[Service]
ExecStart=
# Replace "systemd" below with the cgroup driver of your container runtime.
# The default value in the kubelet is "cgroupfs".
ExecStart=/usr/bin/kubelet --address=127.0.0.1 --pod-manifest-path=/etc/kubernetes/manifests --cgroup-driver=systemd --pod-infra-container-image=registry.aliyuncs.com/google_containers/pause:3.2
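As a side note, the flag value can be pulled out of such an ExecStart line mechanically, e.g. when scripting a check across many nodes. A small sketch (the sed expression is just an illustration):

```shell
# The ExecStart line from the drop-in above (shortened for illustration)
exec_line='ExecStart=/usr/bin/kubelet --address=127.0.0.1 --pod-manifest-path=/etc/kubernetes/manifests --cgroup-driver=systemd'

# Pull out the value of the --address flag
addr=$(printf '%s\n' "$exec_line" | sed -n 's/.*--address=\([^ ]*\).*/\1/p')
echo "$addr"   # prints: 127.0.0.1
```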
A first attempt: change the address to the host's own IP, 192.168.100.22, restart the kubelet, and check port 10250 again:
[root@k8s-slave2 ~]# netstat -ntpl | grep kubelet
tcp 0 0 127.0.0.1:38362 0.0.0.0:* LISTEN 16424/kubelet
tcp 0 0 127.0.0.1:10248 0.0.0.0:* LISTEN 16424/kubelet
tcp 0 0 192.168.100.22:10250 0.0.0.0:* LISTEN 16424/kubelet
But this is still not right: the port is served over tcp rather than tcp6, and local clients also still need to reach 10250 on 127.0.0.1.
The fix that finally worked is to disable the drop-in file /usr/lib/systemd/system/kubelet.service.d/20-etcd-service-manager.conf altogether (here by renaming it out of the way). After restarting the kubelet, the port is served correctly:
[root@k8s-slave2 kubelet.service.d]# mv 20-etcd-service-manager.conf 20-etcd-service-manager.conf.bak
[root@k8s-slave2 kubelet.service.d]#
[root@k8s-slave2 kubelet.service.d]# systemctl daemon-reload
[root@k8s-slave2 kubelet.service.d]#
[root@k8s-slave2 kubelet.service.d]# systemctl restart kubelet
[root@k8s-slave2 kubelet.service.d]#
[root@k8s-slave2 kubelet.service.d]# netstat -ntpl | grep kubelet
tcp 0 0 127.0.0.1:39386 0.0.0.0:* LISTEN 18890/kubelet
tcp 0 0 127.0.0.1:10248 0.0.0.0:* LISTEN 18890/kubelet
tcp6 0 0 :::10250 :::* LISTEN 18890/kubelet
Checking the kubelet service status again, the earlier 127.0.0.1 setting is gone as well:
[root@k8s-slave2 kubelet.service.d]# systemctl status kubelet
● kubelet.service - kubelet: The Kubernetes Node Agent
Loaded: loaded (/usr/lib/systemd/system/kubelet.service; enabled; vendor preset: disabled)
Drop-In: /usr/lib/systemd/system/kubelet.service.d
└─10-kubeadm.conf
Active: active (running) since 日 2022-10-09 13:59:45 CST; 35min ago
Docs: https://kubernetes.io/docs/
Main PID: 18890 (kubelet)
Tasks: 15
Memory: 51.0M
CGroup: /system.slice/kubelet.service
└─18890 /usr/bin/kubelet --bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf --config=/var/lib/kubel...
10月 09 13:59:47 k8s-slave2 kubelet[18890]: I1009 13:59:47.143058 18890 reconciler.go:221] "operationExecutor.VerifyControllerAttachedVolume started for volume ...
10月 09 13:59:47 k8s-slave2 kubelet[18890]: I1009 13:59:47.143072 18890 reconciler.go:221] "operationExecutor.VerifyControllerAttachedVolume started for volume ...
10月 09 13:59:47 k8s-slave2 kubelet[18890]: I1009 13:59:47.143086 18890 reconciler.go:221] "operationExecutor.VerifyControllerAttachedVolume started for volume ...
10月 09 13:59:47 k8s-slave2 kubelet[18890]: I1009 13:59:47.143100 18890 reconciler.go:221] "operationExecutor.VerifyControllerAttachedVolume started for volume ...
10月 09 13:59:47 k8s-slave2 kubelet[18890]: I1009 13:59:47.143113 18890 reconciler.go:221] "operationExecutor.VerifyControllerAttachedVolume started for volume ...
10月 09 13:59:47 k8s-slave2 kubelet[18890]: I1009 13:59:47.143129 18890 reconciler.go:221] "operationExecutor.VerifyControllerAttachedVolume started for volume ...
10月 09 13:59:47 k8s-slave2 kubelet[18890]: I1009 13:59:47.143143 18890 reconciler.go:221] "operationExecutor.VerifyControllerAttachedVolume started for volume ...
10月 09 13:59:47 k8s-slave2 kubelet[18890]: I1009 13:59:47.143152 18890 reconciler.go:157] "Reconciler: start to sync state"
10月 09 13:59:48 k8s-slave2 kubelet[18890]: I1009 13:59:48.317492 18890 request.go:665] Waited for 1.071720304s due to client-side throttling, not pri...roxy/token
10月 09 14:27:07 k8s-slave2 kubelet[18890]: I1009 14:27:07.198435 18890 log.go:184] http: superfluous response.WriteHeader call from k8s.io/kubernetes...se.go:220)
Hint: Some lines were ellipsized, use -l to show in full.
Fetching pod logs on this node now works again:
[root@k8s-master ~]# kubectl logs kube-flannel-ds-8mwsd -n kube-system
I1003 11:37:08.049753 1 main.go:207] CLI flags config: {etcdEndpoints:http://127.0.0.1:4001,http://127.0.0.1:2379 etcdPrefix:/coreos.com/network etcdKeyfile: etcdCertfile: etcdCAFile: etcdUsername: etcdPassword: version:false kubeSubnetMgr:true kubeApiUrl: kubeAnnotationPrefix:flannel.alpha.coreos.com kubeConfigFile: iface:[ens33] ifaceRegex:[] ipMasq:true ifaceCanReach: subnetFile:/run/flannel/subnet.env publicIP: publicIPv6: subnetLeaseRenewMargin:60 healthzIP:0.0.0.0 healthzPort:0 iptablesResyncSeconds:5 iptablesForwardRules:true netConfPath:/etc/kube-flannel/net-conf.json setNodeNetworkUnavailable:true}
W1003 11:37:08.050009 1 client_config.go:614] Neither --kubeconfig nor --master was specified. Using the inClusterConfig. This might not work.
I1003 11:37:08.451790 1 kube.go:121] Waiting 10m0s for node controller to sync
I1003 11:37:08.451939 1 kube.go:402] Starting kube subnet manager
I1003 11:37:09.452166 1 kube.go:128] Node controller sync successful
I1003 11:37:09.452199 1 main.go:227] Created subnet manager: Kubernetes Subnet Manager - k8s-slave2
I1003 11:37:09.452206 1 main.go:230] Installing signal handlers
I1003 11:37:09.452354 1 main.go:463] Found network config - Backend type: vxlan
I1003 11:37:09.452652 1 match.go:248] Using interface with name ens33 and address 192.168.100.22
I1003 11:37:09.452676 1 match.go:270] Defaulting external address to interface address (192.168.100.22)
I1003 11:37:09.452733 1 vxlan.go:138] VXLAN config: VNI=1 Port=0 GBP=false Learning=false DirectRouting=false
I1003 11:37:09.472457 1 kube.go:351] Setting NodeNetworkUnavailable
I1003 11:37:09.481646 1 main.go:412] Current network or subnet (10.244.0.0/16, 10.244.2.0/24) is not equal to previous one (0.0.0.0/0, 0.0.0.0/0), trying to recycle old iptables rules
I1003 11:37:09.746603 1 iptables.go:255] Deleting iptables rule: -s 0.0.0.0/0 -d 0.0.0.0/0 -m comment --comment flanneld masq -j RETURN
I1003 11:37:09.747961 1 iptables.go:255] Deleting iptables rule: -s 0.0.0.0/0 ! -d 224.0.0.0/4 -m comment --comment flanneld masq -j MASQUERADE --random-fully
I1003 11:37:09.748691 1 iptables.go:255] Deleting iptables rule: ! -s 0.0.0.0/0 -d 0.0.0.0/0 -m comment --comment flanneld masq -j RETURN
I1003 11:37:09.749351 1 iptables.go:255] Deleting iptables rule: ! -s 0.0.0.0/0 -d 0.0.0.0/0 -m comment --comment flanneld masq -j MASQUERADE --random-fully
I1003 11:37:09.750317 1 main.go:341] Setting up masking rules
I1003 11:37:09.750945 1 main.go:362] Changing default FORWARD chain policy to ACCEPT
I1003 11:37:09.750995 1 main.go:375] Wrote subnet file to /run/flannel/subnet.env
I1003 11:37:09.751000 1 main.go:379] Running backend.
I1003 11:37:09.845326 1 vxlan_network.go:61] watching for new subnet leases
I1003 11:37:09.846711 1 main.go:400] Waiting for all goroutines to exit
I1003 11:37:09.847083 1 iptables.go:231] Some iptables rules are missing; deleting and recreating rules
I1003 11:37:09.847088 1 iptables.go:255] Deleting iptables rule: -s 10.244.0.0/16 -d 10.244.0.0/16 -m comment --comment flanneld masq -j RETURN
I1003 11:37:09.943307 1 iptables.go:231] Some iptables rules are missing; deleting and recreating rules
I1003 11:37:09.943324 1 iptables.go:255] Deleting iptables rule: -s 10.244.0.0/16 -m comment --comment flanneld forward -j ACCEPT
I1003 11:37:09.943450 1 iptables.go:255] Deleting iptables rule: -s 10.244.0.0/16 ! -d 224.0.0.0/4 -m comment --comment flanneld masq -j MASQUERADE --random-fully
I1003 11:37:09.944459 1 iptables.go:255] Deleting iptables rule: -d 10.244.0.0/16 -m comment --comment flanneld forward -j ACCEPT
I1003 11:37:09.945214 1 iptables.go:255] Deleting iptables rule: ! -s 10.244.0.0/16 -d 10.244.2.0/24 -m comment --comment flanneld masq -j RETURN
I1003 11:37:09.945335 1 iptables.go:243] Adding iptables rule: -s 10.244.0.0/16 -m comment --comment flanneld forward -j ACCEPT
I1003 11:37:09.946028 1 iptables.go:255] Deleting iptables rule: ! -s 10.244.0.0/16 -d 10.244.0.0/16 -m comment --comment flanneld masq -j MASQUERADE --random-fully
I1003 11:37:09.947474 1 iptables.go:243] Adding iptables rule: -s 10.244.0.0/16 -d 10.244.0.0/16 -m comment --comment flanneld masq -j RETURN
I1003 11:37:09.948330 1 iptables.go:243] Adding iptables rule: -d 10.244.0.0/16 -m comment --comment flanneld forward -j ACCEPT
I1003 11:37:10.044998 1 iptables.go:243] Adding iptables rule: -s 10.244.0.0/16 ! -d 224.0.0.0/4 -m comment --comment flanneld masq -j MASQUERADE --random-fully
I1003 11:37:10.047373 1 iptables.go:243] Adding iptables rule: ! -s 10.244.0.0/16 -d 10.244.2.0/24 -m comment --comment flanneld masq -j RETURN
I1003 11:37:10.049161 1 iptables.go:243] Adding iptables rule: ! -s 10.244.0.0/16 -d 10.244.0.0/16 -m comment --comment flanneld masq -j MASQUERADE --random-fully