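I set up a three-node k8s cluster locally. Pods on the first two nodes could reach each other, but pods on the third node could not communicate with pods on the other two.
Checking the routing tables showed that the third node had no route for this cross-node pod traffic.
Routing details for hadoop002 follow. The relevant entry is the tunl0 route to 192.168.135.128/26 at the end of each listing (shown in bold in the original post). hadoop003 has no such route.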
[root@hadoop002 beh]# route
Kernel IP routing table
Destination Gateway Genmask Flags Metric Ref Use Iface
default gateway 0.0.0.0 UG 100 0 0 ens192
172.16.31.0 0.0.0.0 255.255.255.0 U 100 0 0 ens192
172.17.0.0 0.0.0.0 255.255.0.0 U 0 0 0 docker0
172.18.0.0 0.0.0.0 255.255.0.0 U 0 0 0 br-f33940ad6bcc
192.168.72.192 0.0.0.0 255.255.255.192 U 0 0 0 *
192.168.72.241 0.0.0.0 255.255.255.255 UH 0 0 0 cali835b424b828
192.168.72.243 0.0.0.0 255.255.255.255 UH 0 0 0 calid14de0a1fe6
192.168.72.244 0.0.0.0 255.255.255.255 UH 0 0 0 calibae9713a5c9
192.168.72.245 0.0.0.0 255.255.255.255 UH 0 0 0 calif15216f38d6
192.168.72.247 0.0.0.0 255.255.255.255 UH 0 0 0 cali07b42699ca8
192.168.72.253 0.0.0.0 255.255.255.255 UH 0 0 0 calied45b975889
192.168.135.128 hadoop001 255.255.255.192 UG 0 0 0 tunl0
[root@hadoop002 beh]# ip route
default via 172.16.31.254 dev ens192 proto static metric 100
172.16.31.0/24 dev ens192 proto kernel scope link src 172.16.31.122 metric 100
172.17.0.0/16 dev docker0 proto kernel scope link src 172.17.0.1
172.18.0.0/16 dev br-f33940ad6bcc proto kernel scope link src 172.18.0.1
blackhole 192.168.72.192/26 proto bird
192.168.72.241 dev cali835b424b828 scope link
192.168.72.243 dev calid14de0a1fe6 scope link
192.168.72.244 dev calibae9713a5c9 scope link
192.168.72.245 dev calif15216f38d6 scope link
192.168.72.247 dev cali07b42699ca8 scope link
192.168.72.253 dev calied45b975889 scope link
192.168.135.128/26 via 172.16.31.121 dev tunl0 proto bird onlink
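A quick way to compare nodes is to count the tunnel routes that bird has installed. The helper below is only a sketch: the function name is made up for illustration, and it assumes the routes look like the tunl0 line in the listing above.

```shell
# Count Calico tunnel routes (installed by bird on tunl0) in `ip route`
# output read from stdin. Prints 0 when there are none.
count_tunl0_routes() {
  # grep -c exits non-zero when nothing matches, so mask the exit status.
  grep -c 'dev tunl0 proto bird' || true
}

# Usage on a node:
#   ip route | count_tunl0_routes
```

On a healthy three-node cluster one would expect at least one such route per remote node, assuming every node has been allocated an address block. A node that prints 0 is in the state described for hadoop003.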
I tried adding the following two routes by hand. Neither attempt succeeded.
ip route add 172.16.31.121/23 dev tunl0
route add -net 192.168.135.128 gw hadoop001 metric 0 netmask 255.255.255.192 dev tunl0
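One likely reason the first command fails is that 172.16.31.121/23 is a host address with a prefix length attached rather than a network prefix, and the kernel typically rejects route destinations that have host bits set. The sketch below (the helper name is hypothetical) computes the network prefix an address belongs to, so a destination can be checked before it is passed to ip route.

```shell
# Print the network prefix for an IPv4 address/length pair.
# If the output differs from the input, the input had host bits set.
network_of() {
  local addr len ip mask net
  addr=${1%/*}
  len=${1#*/}
  # Split the dotted quad into four positional parameters.
  local IFS=.
  set -- $addr
  ip=$(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
  mask=$(( (0xFFFFFFFF << (32 - $len)) & 0xFFFFFFFF ))
  net=$(( ip & mask ))
  echo "$(( (net >> 24) & 255 )).$(( (net >> 16) & 255 )).$(( (net >> 8) & 255 )).$(( net & 255 ))/$len"
}

network_of 172.16.31.121/23     # prints 172.16.30.0/23
network_of 192.168.135.128/26   # prints 192.168.135.128/26
```

A manual route that mirrors the one bird installed on hadoop002 would be `ip route add 192.168.135.128/26 via 172.16.31.121 dev tunl0 onlink`. Even when such a route is accepted, it only papers over the problem, because the routes are meant to be maintained by bird once BGP peering works.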
After I deleted Calico's etcd data and reset k8s, all of the routes disappeared.
The calico-node log showed this error:
bird: BGP: Unexpected connect from unknown address
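I reset the cluster several more times, and in the end none of the nodes could reach each other, so I had no choice but to bring in calicoctl.
I compared the hadoop001 cluster with the dlw1 cluster, which is healthy, and found an anomaly: on hadoop001 the peers show odd addresses such as 172.18.0.1 instead of the actual host IPs. The calico-node logs gave further clues.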
[root@hadoop001 beh]# DATASTORE_TYPE=kubernetes KUBECONFIG=~/.kube/config ./calicoctl node status
Calico process is running.
IPv4 BGP status
+--------------+-------------------+-------+----------+---------+
| PEER ADDRESS | PEER TYPE | STATE | SINCE | INFO |
+--------------+-------------------+-------+----------+---------+
| 172.18.0.1 | node-to-node mesh | start | 07:16:12 | Connect |
| 172.19.0.1 | node-to-node mesh | start | 07:16:12 | Connect |
+--------------+-------------------+-------+----------+---------+
IPv6 BGP status
No IPv6 peers found.
-----------------------------divider--------------------------------
[root@dlw1 tbc]# DATASTORE_TYPE=kubernetes KUBECONFIG=~/.kube/config ./calicoctl node status
Calico process is running.
IPv4 BGP status
+--------------+-------------------+-------+------------+-------------+
| PEER ADDRESS | PEER TYPE | STATE | SINCE | INFO |
+--------------+-------------------+-------+------------+-------------+
| 172.16.40.2 | node-to-node mesh | up | 2018-11-03 | Established |
| 172.16.40.3 | node-to-node mesh | up | 2018-11-03 | Established |
+--------------+-------------------+-------+------------+-------------+
IPv6 BGP status
No IPv6 peers found.
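The difference between the two status tables can be checked mechanically. The following sketch (the function name is an illustration, not part of calicoctl) reads `calicoctl node status` output from stdin and prints every peer whose STATE column is not "up". It assumes the table layout shown above.

```shell
# Print the address of every BGP peer whose STATE is not "up".
unestablished_peers() {
  awk -F'|' '
    # Data rows have at least six |-separated fields; skip the header row.
    NF >= 6 && $2 !~ /PEER ADDRESS/ {
      addr = $2; state = $4
      gsub(/ /, "", addr); gsub(/ /, "", state)
      if (state != "up") print addr
    }'
}

# Usage:
#   DATASTORE_TYPE=kubernetes KUBECONFIG=~/.kube/config ./calicoctl node status | unestablished_peers
```

For the hadoop001 output above this would list 172.18.0.1 and 172.19.0.1, while for dlw1 it would print nothing.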
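The hadoop002 log shows the same thing. Note that 172.18.0.1, the address reported for hadoop002, belongs to the Docker bridge br-f33940ad6bcc in that node's routing table above, not to ens192: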
2018-11-06 07:27:35.639 [INFO][85] int_dataplane.go 574: Received *proto.HostMetadataUpdate update from calculation graph msg=hostname:"hadoop002" ipv4_addr:"172.18.0.1"
2018-11-06 07:27:35.639 [INFO][85] int_dataplane.go 574: Received *proto.HostMetadataUpdate update from calculation graph msg=hostname:"hadoop003" ipv4_addr:"172.19.0.1"
2018-11-06 07:27:35.639 [INFO][85] int_dataplane.go 574: Received *proto.HostMetadataUpdate update from calculation graph msg=hostname:"hadoop001" ipv4_addr:"172.16.31.121"
The dlw2 log, by contrast, shows the real host IPs:
2018-11-03 02:51:33.907 [INFO][197] syncer.go 473: Started receiving snapshot snapshotIndex=0x19a8
2018-11-03 02:51:33.908 [INFO][197] int_dataplane.go 574: Received *proto.HostMetadataUpdate update from calculation graph msg=hostname:"dlw1" ipv4_addr:"172.16.40.1"
2018-11-03 02:51:33.919 [INFO][197] int_dataplane.go 574: Received *proto.HostMetadataUpdate update from calculation graph msg=hostname:"dlw2" ipv4_addr:"172.16.40.2"
2018-11-03 02:51:33.919 [INFO][197] int_dataplane.go 574: Received *proto.HostMetadataUpdate update from calculation graph msg=hostname:"dlw3" ipv4_addr:"172.16.40.3"
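To pull the hostname and address pairs out of a calico-node log without reading it line by line, something like the following can be used. This is a sketch with a made-up helper name, and it assumes the HostMetadataUpdate line format shown in the logs above.

```shell
# Print "hostname ipv4_addr" for each HostMetadataUpdate line on stdin.
host_ips_from_log() {
  sed -n 's/.*HostMetadataUpdate.*hostname:"\([^"]*\)" ipv4_addr:"\([^"]*\)".*/\1 \2/p'
}

# Usage, listing only hosts whose address is outside the 172.16.31.x host network:
#   kubectl logs -n kube-system <calico-node-pod> -c calico-node | host_ips_from_log | grep -v ' 172\.16\.31\.'
```

The pod name is a placeholder, and the kube-system namespace and calico-node container name are assumptions about a typical manifest-based install. On the hadoop cluster the filtered output would show hadoop002 and hadoop003 with their bridge addresses.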
https://github.com/projectcalico/calico/issues/1941
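Following the issue linked above, I configured the IP detection policy in the calico-node yaml file, setting it to autodetect on a specified network interface. After calico-node was restarted, the network came up.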
- name: IP
  value: "autodetect"
- name: IP_AUTODETECTION_METHOD
  value: "interface=ens192"
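Before pinning autodetection to an interface, it is worth confirming that the interface really carries the host IP on every node. The helper below is a sketch with a hypothetical name. It reads `ip -4 -o addr show` output from stdin and prints the IPv4 address of the named interface.

```shell
# Print the IPv4 address (without prefix length) of interface $1,
# given `ip -4 -o addr show` output on stdin.
iface_ipv4() {
  awk -v ifc="$1" '$2 == ifc && $3 == "inet" { split($4, a, "/"); print a[1] }'
}

# Usage on each node:
#   ip -4 -o addr show | iface_ipv4 ens192
```

If the manifest is not at hand, one way to apply the same two variables is `kubectl set env daemonset/calico-node -n kube-system IP=autodetect IP_AUTODETECTION_METHOD=interface=ens192`, assuming the DaemonSet is named calico-node and lives in kube-system.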
[root@hadoop001 beh]# DATASTORE_TYPE=kubernetes KUBECONFIG=~/.kube/config ./calicoctl node status
Calico process is running.
IPv4 BGP status
+---------------+-------------------+-------+----------+-------------+
| PEER ADDRESS | PEER TYPE | STATE | SINCE | INFO |
+---------------+-------------------+-------+----------+-------------+
| 172.16.31.122 | node-to-node mesh | up | 09:51:28 | Established |
| 172.16.31.123 | node-to-node mesh | up | 09:51:28 | Established |
+---------------+-------------------+-------+----------+-------------+
IPv6 BGP status
No IPv6 peers found.