Kubernetes with dual NICs: coredns, dashboard, and metrics-server cannot reach kube-apiserver

Host network environment:

Host     Public IP       Private IP    Gateway
master   192.168.5.120   10.2.2.120    192.168.5.1
node1    192.168.5.121   10.2.2.121    192.168.5.1
node2    192.168.5.122   10.2.2.122    192.168.5.1

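Kubernetes version: v1.13.3

Installation method:

See: https://github.com/gjmzj/kubeasz/releases/tag/1.0.0rc1

For security, all services are bound to the private network segment (10.2.2.0/24).

So the ansible hosts file is configured as follows:

# cat hosts 
# Deploy node: usually the node that runs the ansible playbooks
# The variable NTP_ENABLED (=yes/no) controls whether chrony time synchronization is installed in the cluster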
[deploy]
10.2.2.120 NTP_ENABLED=no

# For the etcd cluster, provide NODE_NAME as below; note that an etcd cluster must have an odd number of nodes (1, 3, 5, 7, ...)
[etcd]
10.2.2.120 NODE_NAME=etcd1

[kube-master]
10.2.2.120

[kube-node]
10.2.2.121
10.2.2.122

......

After the installation finished, I checked the pod status and the logs of coredns, dashboard, and metrics-server:

# kubectl get po -o wide --all-namespaces=true
NAME                                    READY   STATUS             RESTARTS   AGE     IP           NODE         NOMINATED NODE   READINESS GATES
coredns-dc8bbbcf9-4rsfl                 0/1     CrashLoopBackOff   18         55m     172.20.1.5   10.2.2.121   <none>           <none>
coredns-dc8bbbcf9-7rz2p                 0/1     CrashLoopBackOff   18         55m     172.20.2.4   10.2.2.122   <none>           <none>
kubernetes-dashboard-6685cb584f-nvc8p   0/1     CrashLoopBackOff   20         55m     172.20.2.5   10.2.2.122   <none>           <none>
metrics-server-79558444c6-gtt4t         0/1     CrashLoopBackOff   6          9m27s   172.20.1.6   10.2.2.121   <none>           <none>

# route -n
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
0.0.0.0         192.168.5.1     0.0.0.0         UG    100    0        0 enp0s3
10.2.2.0        0.0.0.0         255.255.255.0   U     101    0        0 enp0s8
172.17.0.0      0.0.0.0         255.255.0.0     U     0      0        0 docker0
172.20.1.0      172.20.1.0      255.255.255.0   UG    0      0        0 flannel.1
172.20.2.0      172.20.2.0      255.255.255.0   UG    0      0        0 flannel.1
192.168.5.0     0.0.0.0         255.255.255.0   U     100    0        0 enp0s3

# kubectl logs metrics-server-79558444c6-56qmd -n kube-system
panic: Get https://10.68.0.1:443/api/v1/namespaces/kube-system/configmaps/extension-apiserver-authentication: dial tcp 10.68.0.1:443: connect: connection refused

goroutine 1 [running]:
main.main()
	/go/src/github.com/kubernetes-incubator/metrics-server/cmd/metrics-server/metrics-server.go:39 +0x13b

# kubectl logs kubernetes-dashboard-6685cb584f-nvc8p -n kube-system
2019/03/11 13:19:06 Starting overwatch
2019/03/11 13:19:06 Using in-cluster config to connect to apiserver
2019/03/11 13:19:06 Using service account token for csrf signing
2019/03/11 13:19:06 No request provided. Skipping authorization
2019/03/11 13:19:06 Error while initializing connection to Kubernetes apiserver. This most likely means that the cluster is misconfigured (e.g., it has invalid apiserver certificates or service account's configuration) or the --apiserver-host param points to a server that does not exist. Reason: Get https://10.68.0.1:443/version: dial tcp 10.68.0.1:443: getsockopt: connection refused
Refer to our FAQ and wiki pages for more information: https://github.com/kubernetes/dashboard/wiki/FAQ

# kubectl logs coredns-dc8bbbcf9-7rz2p -n kube-system

E0311 13:46:20.106731       1 reflector.go:205] github.com/coredns/coredns/plugin/kubernetes/controller.go:318: Failed to list *v1.Namespace: Get https://10.68.0.1:443/api/v1/namespaces?limit=500&resourceVersion=0: dial tcp 10.68.0.1:443: connect: connection refused

Check the iptables rules:

# iptables-save |grep KUBE-SEP-VPBSGNC2TAY6H4RC

-A KUBE-SERVICES -d 10.68.0.1/32 -p tcp -m comment --comment "default/kubernetes:https cluster IP" -m tcp --dport 443 -j KUBE-SVC-NPX46M4PTMTKRN6Y
:KUBE-SEP-VPBSGNC2TAY6H4RC - [0:0]
-A KUBE-SEP-VPBSGNC2TAY6H4RC -s 192.168.5.120/32 -j KUBE-MARK-MASQ
-A KUBE-SEP-VPBSGNC2TAY6H4RC -p tcp -m tcp -j DNAT --to-destination 192.168.5.120:6443
-A KUBE-SVC-NPX46M4PTMTKRN6Y -j KUBE-SEP-VPBSGNC2TAY6H4RC

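Why does the --to-destination 192.168.5.120:6443 in this rule point at the public IP? My first guess was that the iptables rules had been generated incorrectly, which would explain why kube-apiserver could not be reached.

kube-apiserver itself is bound to the private IP, 10.2.2.120:6443: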

# netstat -anp |grep LISTEN |grep 6443
tcp        0      0 10.2.2.120:6443         0.0.0.0:*               LISTEN      10996/kube-apiserve
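
The mismatch can also be confirmed mechanically, by pulling every DNAT target out of the nat table and comparing it with the address kube-apiserver listens on. The following is a minimal shell sketch; the function name dnat_targets is made up for illustration, and it only reads iptables-save style text from stdin:

```shell
#!/bin/sh
# Hypothetical helper: print the --to-destination value of every DNAT rule
# found in iptables-save output read from stdin.
dnat_targets() {
  awk '{
    for (i = 1; i < NF; i++)
      if ($i == "--to-destination") print $(i + 1)
  }'
}

# Example against the rule shown above. On a live node you would pipe
# `iptables-save -t nat` into the function instead.
printf '%s\n' \
  '-A KUBE-SEP-VPBSGNC2TAY6H4RC -p tcp -m tcp -j DNAT --to-destination 192.168.5.120:6443' \
  | dnat_targets
```

Any target printed here that does not match the listening address from netstat is a rule that sends traffic to a socket nobody is listening on.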

Next, check the endpoint of the kubernetes service:

[root@k8s-master ansible]# kubectl get svc kubernetes
NAME         TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)   AGE
kubernetes   ClusterIP   10.68.0.1    <none>        443/TCP   19h
[root@k8s-master ansible]# kubectl get ep kubernetes
NAME         ENDPOINTS            AGE
kubernetes   192.168.5.120:6443   19h

The endpoint output shows that the endpoint address itself is wrong, so the iptables rules generated from it are wrong as well.

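This is odd: why does the endpoint not use the private IP address? According to some references, when kube-apiserver starts without an explicitly configured advertise address, it looks up the system's default gateway in /proc/net/route; if the system has no default gateway configured, startup fails. Once it has found the default gateway, it takes the IP of the NIC on that network, together with the default port (6443), and assigns them to the kubernetes endpoint.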
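
The lookup described above can be approximated in a few lines of shell. This is only a sketch of the idea, not kube-apiserver's actual code: it assumes the usual /proc/net/route layout (interface in column 1, hex destination in column 2, hex mask in column 8), handles IPv4 only, and uses an invented function name. The sample input is the master's routing table from the route -n output above, re-encoded by hand in /proc/net/route form (addresses are little-endian hex).

```shell
#!/bin/sh
# Hypothetical helper: print the interface that carries the IPv4 default
# route, reading text in /proc/net/route format from stdin.
default_route_iface() {
  awk 'NR > 1 && $2 == "00000000" && $8 == "00000000" { print $1; exit }'
}

# On a live host: default_route_iface < /proc/net/route
default_route_iface <<'EOF'
Iface   Destination Gateway  Flags RefCnt Use Metric Mask     MTU Window IRTT
enp0s3  00000000    0105A8C0 0003  0      0   100    00000000 0   0      0
enp0s8  0002020A    00000000 0001  0      0   101    00FFFFFF 0   0      0
EOF
```

For this table the function prints enp0s3, the public-side NIC, which matches the 192.168.5.120 address that ended up in the endpoint.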

My master host uses the public-side gateway (192.168.5.1) as its default gateway, so the endpoint address became 192.168.5.120:6443.

To fix this, it is enough to set the master's default gateway to the private gateway (10.2.2.1), as follows:

Host     Public IP       Private IP    Gateway
master   192.168.5.120   10.2.2.120    10.2.2.1
node1    192.168.5.121   10.2.2.121    192.168.5.1
node2    192.168.5.122   10.2.2.122    192.168.5.1
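
One way to make that change at runtime is with iproute2. The sketch below is an illustration under assumptions, not necessarily the procedure used here: the interface names enp0s3 and enp0s8 are taken from the route -n output above, and a change made this way does not survive a reboot unless it is also written into the distribution's network configuration. The wrapper name and the RUN variable are invented for the example; by default the commands are only printed.

```shell
#!/bin/sh
# Hypothetical wrapper. With RUN unset, the commands are only echoed;
# run with RUN= (empty) as root to actually apply them.
switch_default_gw() {
  old_gw=$1 old_if=$2 new_gw=$3 new_if=$4
  ${RUN-echo} ip route del default via "$old_gw" dev "$old_if"
  ${RUN-echo} ip route add default via "$new_gw" dev "$new_if"
}

# Dry run for the master in this article.
switch_default_gw 192.168.5.1 enp0s3 10.2.2.1 enp0s8
```

Deleting the old default route before adding the new one avoids ending up with two default routes at different metrics.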

Then restart kube-apiserver and check the endpoint again; it is now correct:

[root@k8s-master ansible]# kubectl get ep kubernetes
NAME         ENDPOINTS         AGE
kubernetes   10.2.2.120:6443   19h

The iptables rules are now correct as well:

:KUBE-SVC-NPX46M4PTMTKRN6Y - [0:0]
-A KUBE-SERVICES -d 10.68.0.1/32 -p tcp -m comment --comment "default/kubernetes:https cluster IP" -m tcp --dport 443 -j KUBE-SVC-NPX46M4PTMTKRN6Y
-A KUBE-SVC-NPX46M4PTMTKRN6Y -j KUBE-SEP-26L3TDXW4RODGS5U
-A KUBE-SEP-26L3TDXW4RODGS5U -p tcp -m tcp -j DNAT --to-destination 10.2.2.120:6443

coredns, dashboard, and metrics-server can now reach kube-apiserver:

# kubectl get po -n kube-system
NAME                                    READY   STATUS    RESTARTS   AGE
coredns-dc8bbbcf9-4rsfl                 1/1     Running   165        2d
coredns-dc8bbbcf9-7rz2p                 1/1     Running   164        2d
kube-flannel-ds-amd64-pf4n9             1/1     Running   3          24h
kube-flannel-ds-amd64-r6l5q             1/1     Running   3          25h
kube-flannel-ds-amd64-ztgsm             1/1     Running   3          25h
kubernetes-dashboard-6685cb584f-8g8zh   1/1     Running   57         28h
metrics-server-79558444c6-l8qvh         1/1     Running   88         28h

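The drawback of changing the default route is that, once the master's default gateway is the private gateway, the master can no longer reach the public network; you would have to find a way to give 10.2.2.1 access to the public network.

An alternative fix is to add the startup flag --advertise-address to every kube-apiserver, set to that host's private IP:

Flag description: --advertise-address  # the IP address on which the API server is advertised to members of the cluster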

[root@k8s-master ~]# cat /etc/systemd/system/kube-apiserver.service 
[Unit]
Description=Kubernetes API Server
Documentation=https://github.com/GoogleCloudPlatform/kubernetes
After=network.target

[Service]
ExecStart=/opt/kube/bin/kube-apiserver \
......
  --bind-address=10.2.2.120 \
  --advertise-address=10.2.2.120 \
......
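
Before restarting, it can be worth confirming which address the unit file will actually advertise. A small sketch, assuming the unit file path shown above; the function name is invented for illustration. With systemd, the restart itself would typically be systemctl daemon-reload followed by systemctl restart kube-apiserver.

```shell
#!/bin/sh
# Hypothetical helper: print the value of --advertise-address found in a
# systemd unit file read from stdin (prints nothing if the flag is absent).
advertise_address() {
  sed -n 's/.*--advertise-address=\([^ \\]*\).*/\1/p'
}

# On the master:
#   advertise_address < /etc/systemd/system/kube-apiserver.service
printf '%s\n' '  --bind-address=10.2.2.120 \' '  --advertise-address=10.2.2.120 \' \
  | advertise_address
```

If nothing is printed, the flag is missing and kube-apiserver will fall back to choosing an address on its own.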

After restarting kube-apiserver, check the endpoint again; it is now correct:

# kubectl get ep kubernetes
NAME         ENDPOINTS         AGE
kubernetes   10.2.2.120:6443   6d1h

Other issues:

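If flannel has problems, you can add the --iface=enp0s8 flag in flannel.service to pin it to a specific NIC. If flannel is deployed as a container, add the following to the yml file instead: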

      - args:
        - --iface=enp0s8

References:

https://github.com/kubernetes/kubernetes/issues/57534

https://github.com/gjmzj/kubeasz/issues/479