Host network environment:
Host | Public IP | Private IP | Gateway |
master | 192.168.5.120 | 10.2.2.120 | 192.168.5.1 |
node1 | 192.168.5.121 | 10.2.2.121 | 192.168.5.1 |
node2 | 192.168.5.122 | 10.2.2.122 | 192.168.5.1 |
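Kubernetes version: v1.13.3
Installation method:
Reference: https://github.com/gjmzj/kubeasz/releases/tag/1.0.0rc1
For security, all services are bound to the private network segment (10.2.2.0/24),
so the hosts file is configured as follows: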
# cat hosts
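# Cluster deploy node: usually the node that runs the ansible scripts
# The NTP_ENABLED variable (=yes/no) controls whether chrony time synchronization is installed on the cluster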
[deploy]
10.2.2.120 NTP_ENABLED=no
# For the etcd cluster, provide NODE_NAME as shown below; note that an etcd cluster must have an odd number of nodes (1, 3, 5, 7, ...)
[etcd]
10.2.2.120 NODE_NAME=etcd1
[kube-master]
10.2.2.120
[kube-node]
10.2.2.121
10.2.2.122
......
After the installation finished, I checked the pods and logs of coredns, dashboard, and metrics-server:
# kubectl get po -o wide --all-namespaces=true
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
coredns-dc8bbbcf9-4rsfl 0/1 CrashLoopBackOff 18 55m 172.20.1.5 10.2.2.121
coredns-dc8bbbcf9-7rz2p 0/1 CrashLoopBackOff 18 55m 172.20.2.4 10.2.2.122
kubernetes-dashboard-6685cb584f-nvc8p 0/1 CrashLoopBackOff 20 55m 172.20.2.5 10.2.2.122
metrics-server-79558444c6-gtt4t 0/1 CrashLoopBackOff 6 9m27s 172.20.1.6 10.2.2.121
# route -n
Kernel IP routing table
Destination Gateway Genmask Flags Metric Ref Use Iface
0.0.0.0 192.168.5.1 0.0.0.0 UG 100 0 0 enp0s3
10.2.2.0 0.0.0.0 255.255.255.0 U 101 0 0 enp0s8
172.17.0.0 0.0.0.0 255.255.0.0 U 0 0 0 docker0
172.20.1.0 172.20.1.0 255.255.255.0 UG 0 0 0 flannel.1
172.20.2.0 172.20.2.0 255.255.255.0 UG 0 0 0 flannel.1
192.168.5.0 0.0.0.0 255.255.255.0 U 100 0 0 enp0s3
# kubectl logs metrics-server-79558444c6-56qmd -n kube-system
panic: Get https://10.68.0.1:443/api/v1/namespaces/kube-system/configmaps/extension-apiserver-authentication: dial tcp 10.68.0.1:443: connect: connection refused
goroutine 1 [running]:
main.main()
/go/src/github.com/kubernetes-incubator/metrics-server/cmd/metrics-server/metrics-server.go:39 +0x13b
# kubectl logs kubernetes-dashboard-6685cb584f-nvc8p -n kube-system
2019/03/11 13:19:06 Starting overwatch
2019/03/11 13:19:06 Using in-cluster config to connect to apiserver
2019/03/11 13:19:06 Using service account token for csrf signing
2019/03/11 13:19:06 No request provided. Skipping authorization
2019/03/11 13:19:06 Error while initializing connection to Kubernetes apiserver. This most likely means that the cluster is misconfigured (e.g., it has invalid apiserver certificates or service account's configuration) or the --apiserver-host param points to a server that does not exist. Reason: Get https://10.68.0.1:443/version: dial tcp 10.68.0.1:443: getsockopt: connection refused
Refer to our FAQ and wiki pages for more information: https://github.com/kubernetes/dashboard/wiki/FAQ
# kubectl logs coredns-dc8bbbcf9-7rz2p -n kube-system
E0311 13:46:20.106731 1 reflector.go:205] github.com/coredns/coredns/plugin/kubernetes/controller.go:318: Failed to list *v1.Namespace: Get https://10.68.0.1:443/api/v1/namespaces?limit=500&resourceVersion=0: dial tcp 10.68.0.1:443: connect: connection refused
Check the iptables rules:
# iptables-save |grep KUBE-SEP-VPBSGNC2TAY6H4RC
-A KUBE-SERVICES -d 10.68.0.1/32 -p tcp -m comment --comment "default/kubernetes:https cluster IP" -m tcp --dport 443 -j KUBE-SVC-NPX46M4PTMTKRN6Y
:KUBE-SEP-VPBSGNC2TAY6H4RC - [0:0]
-A KUBE-SEP-VPBSGNC2TAY6H4RC -s 192.168.5.120/32 -j KUBE-MARK-MASQ
-A KUBE-SEP-VPBSGNC2TAY6H4RC -p tcp -m tcp -j DNAT --to-destination 192.168.5.120:6443
-A KUBE-SVC-NPX46M4PTMTKRN6Y -j KUBE-SEP-VPBSGNC2TAY6H4RC
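Why does the rule --to-destination 192.168.5.120:6443 point at the public IP? My guess was that the generated iptables rule is wrong, which makes kube-apiserver unreachable.
kube-apiserver itself is bound to the private IP, 10.2.2.120:6443: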
# netstat -anp |grep LISTEN |grep 6443
tcp 0 0 10.2.2.120:6443 0.0.0.0:* LISTEN 10996/kube-apiserve
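The comparison above (DNAT target versus the address kube-apiserver actually listens on) can also be scripted. The following is only a minimal sketch: the helper name dnat_targets is made up for illustration, and the rule text is pasted from the iptables-save output above rather than read from a live system.

```python
import re


def dnat_targets(iptables_save_text):
    """Return every address:port that a DNAT rule forwards to."""
    return re.findall(r"-j DNAT --to-destination (\S+)", iptables_save_text)


# Two rules copied from the iptables-save output above.
rules = """\
-A KUBE-SEP-VPBSGNC2TAY6H4RC -s 192.168.5.120/32 -j KUBE-MARK-MASQ
-A KUBE-SEP-VPBSGNC2TAY6H4RC -p tcp -m tcp -j DNAT --to-destination 192.168.5.120:6443
"""

# Address taken from the netstat output above.
listen_address = "10.2.2.120:6443"

for target in dnat_targets(rules):
    status = "ok" if target == listen_address else "MISMATCH"
    print(f"{target} -> {status}")
```

With the rules shown here, the only DNAT target is 192.168.5.120:6443, which does not match the listen address, so the script reports a mismatch.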
Next, check the endpoint of the kubernetes service:
[root@k8s-master ansible]# kubectl get svc kubernetes
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
kubernetes ClusterIP 10.68.0.1 443/TCP 19h
[root@k8s-master ansible]# kubectl get ep kubernetes
NAME ENDPOINTS AGE
kubernetes 192.168.5.120:6443 19h
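The endpoint output shows that the endpoint address itself is wrong, and the iptables rule generated from it is wrong as a result.
This is odd: why does the endpoint not use the private IP? According to some references, kube-apiserver inspects /proc/net/route at startup to find the system's default gateway; if no default gateway is configured, startup fails. Once it finds the default gateway, it takes the IP of the interface on that network, together with the default port (6443), and assigns them to the kubernetes endpoint.
My master uses the public gateway (192.168.5.1) as its default gateway, so the endpoint address became 192.168.5.120:6443.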
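To make the default-gateway lookup concrete, here is a small sketch that finds the default route in text formatted like /proc/net/route. It is not kube-apiserver's actual code, only an illustration of the idea. The function names are my own, and SAMPLE is hand-built from the route -n output shown earlier (addresses in /proc/net/route are little-endian hex); it was not captured from the machine.

```python
import socket
import struct


def hex_to_ip(hex_le):
    """Convert a little-endian hex field from /proc/net/route to dotted form."""
    return socket.inet_ntoa(struct.pack("<L", int(hex_le, 16)))


def default_route(route_text):
    """Return (interface, gateway) of the first default route, or None."""
    for line in route_text.splitlines()[1:]:  # skip the header line
        fields = line.split()
        # Field 1 is the destination; 00000000 marks the default route.
        if len(fields) >= 3 and fields[1] == "00000000":
            return fields[0], hex_to_ip(fields[2])
    return None


# Assumed sample, reconstructed from the `route -n` output above.
SAMPLE = (
    "Iface\tDestination\tGateway\tFlags\tRefCnt\tUse\tMetric\tMask\tMTU\tWindow\tIRTT\n"
    "enp0s3\t00000000\t0105A8C0\t0003\t0\t0\t100\t00000000\t0\t0\t0\n"
    "enp0s8\t0002020A\t00000000\t0001\t0\t0\t101\t00FFFFFF\t0\t0\t0\n"
)

print(default_route(SAMPLE))
```

On this sample the default route goes out through enp0s3 via 192.168.5.1, which is the public interface; that is consistent with the endpoint ending up as 192.168.5.120.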
To fix this, it is enough to change the master's default gateway to the private gateway (10.2.2.1), as follows:
Host | Public IP | Private IP | Gateway |
master | 192.168.5.120 | 10.2.2.120 | 10.2.2.1 |
node1 | 192.168.5.121 | 10.2.2.121 | 192.168.5.1 |
node2 | 192.168.5.122 | 10.2.2.122 | 192.168.5.1 |
Then restart kube-apiserver and check the endpoint again; it is now correct:
[root@k8s-master ansible]# kubectl get ep kubernetes
NAME ENDPOINTS AGE
kubernetes 10.2.2.120:6443 19h
The iptables rules are now correct as well:
:KUBE-SVC-NPX46M4PTMTKRN6Y - [0:0]
-A KUBE-SERVICES -d 10.68.0.1/32 -p tcp -m comment --comment "default/kubernetes:https cluster IP" -m tcp --dport 443 -j KUBE-SVC-NPX46M4PTMTKRN6Y
-A KUBE-SVC-NPX46M4PTMTKRN6Y -j KUBE-SEP-26L3TDXW4RODGS5U
-A KUBE-SEP-26L3TDXW4RODGS5U -p tcp -m tcp -j DNAT --to-destination 10.2.2.120:6443
coredns, dashboard, and metrics-server can now reach kube-apiserver:
# kubectl get po -n kube-system
NAME READY STATUS RESTARTS AGE
coredns-dc8bbbcf9-4rsfl 1/1 Running 165 2d
coredns-dc8bbbcf9-7rz2p 1/1 Running 164 2d
kube-flannel-ds-amd64-pf4n9 1/1 Running 3 24h
kube-flannel-ds-amd64-r6l5q 1/1 Running 3 25h
kube-flannel-ds-amd64-ztgsm 1/1 Running 3 25h
kubernetes-dashboard-6685cb584f-8g8zh 1/1 Running 57 28h
metrics-server-79558444c6-l8qvh 1/1 Running 88 28h
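The drawback of changing the default route is that, with the master's default gateway set to the private gateway, the master can no longer reach the public network unless you find a way to give 10.2.2.1 public connectivity.
An alternative fix: add the --advertise-address startup flag to every kube-apiserver and set it to the private IP:
Flag description: --advertise-address # the IP address on which the API server is advertised to members of the cluster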
[root@k8s-master ~]# cat /etc/systemd/system/kube-apiserver.service
[Unit]
Description=Kubernetes API Server
Documentation=https://github.com/GoogleCloudPlatform/kubernetes
After=network.target
[Service]
ExecStart=/opt/kube/bin/kube-apiserver \
......
--bind-address=10.2.2.120 \
--advertise-address=10.2.2.120 \
......
After restarting kube-apiserver, check the endpoint again; it is now correct:
# kubectl get ep kubernetes
NAME ENDPOINTS AGE
kubernetes 10.2.2.120:6443 6d1h
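Whichever fix is used, the check is the same: the address in the kubernetes endpoint must fall inside the private segment. A minimal sketch of that check follows, assuming the 10.2.2.0/24 segment from this setup; the function name is hypothetical, and the endpoint strings are copied from the kubectl get ep output before and after the fix.

```python
import ipaddress

# Private segment that all services are bound to in this setup.
PRIVATE_NET = ipaddress.ip_network("10.2.2.0/24")


def endpoint_in_private_net(endpoint):
    """Return True if an 'ip:port' endpoint lies inside PRIVATE_NET."""
    host, _, _port = endpoint.rpartition(":")
    return ipaddress.ip_address(host) in PRIVATE_NET


# Endpoint before the fix, then after the fix.
for ep in ("192.168.5.120:6443", "10.2.2.120:6443"):
    print(ep, endpoint_in_private_net(ep))
```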
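Other issues:
1. If flannel has problems, you can add the -iface=enp0s8 flag in flannel.service to pin it to a specific interface. If flannel is deployed as a container, add the following to the yml file instead:
- args:
  - --iface=enp0s8
References: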
https://github.com/kubernetes/kubernetes/issues/57534
https://github.com/gjmzj/kubeasz/issues/479