Troubleshooting Calico networking in Kubernetes

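I set up Kubernetes on three local nodes. Pods on the first two nodes could reach each other, but pods on the third node could not communicate with pods on the other two.

Checking the routing tables showed that the third node had never set up the routes needed for cross-node traffic.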

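Routing table on hadoop002. The route that matters is the tunl0 route to hadoop001's pod block, 192.168.135.128/26; hadoop003 has no such route.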

[root@hadoop002 beh]# route
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
default         gateway         0.0.0.0         UG    100    0        0 ens192
172.16.31.0     0.0.0.0         255.255.255.0   U     100    0        0 ens192
172.17.0.0      0.0.0.0         255.255.0.0     U     0      0        0 docker0
172.18.0.0      0.0.0.0         255.255.0.0     U     0      0        0 br-f33940ad6bcc
192.168.72.192  0.0.0.0         255.255.255.192 U     0      0        0 *
192.168.72.241  0.0.0.0         255.255.255.255 UH    0      0        0 cali835b424b828
192.168.72.243  0.0.0.0         255.255.255.255 UH    0      0        0 calid14de0a1fe6
192.168.72.244  0.0.0.0         255.255.255.255 UH    0      0        0 calibae9713a5c9
192.168.72.245  0.0.0.0         255.255.255.255 UH    0      0        0 calif15216f38d6
192.168.72.247  0.0.0.0         255.255.255.255 UH    0      0        0 cali07b42699ca8
192.168.72.253  0.0.0.0         255.255.255.255 UH    0      0        0 calied45b975889
192.168.135.128 hadoop001       255.255.255.192 UG    0      0        0 tunl0

[root@hadoop002 beh]# ip route
default via 172.16.31.254 dev ens192 proto static metric 100
172.16.31.0/24 dev ens192 proto kernel scope link src 172.16.31.122 metric 100
172.17.0.0/16 dev docker0 proto kernel scope link src 172.17.0.1
172.18.0.0/16 dev br-f33940ad6bcc proto kernel scope link src 172.18.0.1
blackhole 192.168.72.192/26 proto bird
192.168.72.241 dev cali835b424b828 scope link
192.168.72.243 dev calid14de0a1fe6 scope link
192.168.72.244 dev calibae9713a5c9 scope link
192.168.72.245 dev calif15216f38d6 scope link
192.168.72.247 dev cali07b42699ca8 scope link
192.168.72.253 dev calied45b975889 scope link
192.168.135.128/26 via 172.16.31.121 dev tunl0 proto bird onlink
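To make the comparison between nodes mechanical, here is a minimal Python sketch that sorts the Calico-related lines of `ip route` into the three kinds visible on hadoop002: the node's own IPAM block (the `blackhole ... proto bird` route), /32 routes to local pods on `cali*` interfaces, and other nodes' blocks reached over `tunl0`. It assumes Calico in IPIP mode with default interface names; the helper names are hypothetical. Notice that hadoop002 itself has a tunl0 route only via hadoop001 (172.16.31.121) and none via hadoop003 (172.16.31.123, the remaining node address in the final BGP status at the end of this post).

```python
import ipaddress

# Lines copied from the hadoop002 `ip route` output above.
SAMPLE = """\
default via 172.16.31.254 dev ens192 proto static metric 100
blackhole 192.168.72.192/26 proto bird
192.168.72.241 dev cali835b424b828 scope link
192.168.72.243 dev calid14de0a1fe6 scope link
192.168.135.128/26 via 172.16.31.121 dev tunl0 proto bird onlink
"""


def classify_calico_routes(text):
    """Sort `ip route` lines into local block, local pod, and remote block routes.

    Assumes Calico in IPIP mode: pods on cali* veths, remote blocks via tunl0.
    Lines that fit none of these patterns (e.g. the default route) are ignored.
    """
    result = {"local_blocks": [], "local_pods": [], "remote_blocks": {}}
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        dev = fields[fields.index("dev") + 1] if "dev" in fields else None
        if fields[0] == "blackhole" and "bird" in fields:
            # This node's own IPAM block; addresses without a /32 route are dropped.
            result["local_blocks"].append(ipaddress.ip_network(fields[1]))
        elif dev is not None and dev.startswith("cali"):
            # A pod running on this node, reached through its veth interface.
            result["local_pods"].append(ipaddress.ip_network(fields[0]))
        elif dev == "tunl0" and "via" in fields:
            # Another node's block, reached through the IPIP tunnel to that node.
            gateway = fields[fields.index("via") + 1]
            result["remote_blocks"][ipaddress.ip_network(fields[0])] = gateway
    return result


def missing_tunnel_peers(routes, peer_ips):
    """Return peer node IPs that have no tunl0 block route on this node."""
    return sorted(set(peer_ips) - set(routes["remote_blocks"].values()))


routes = classify_calico_routes(SAMPLE)
print(missing_tunnel_peers(routes, {"172.16.31.121", "172.16.31.123"}))
```

On a healthy mesh, every node that owns a block should show up as a tunl0 gateway on every other node, so the sketch flags 172.16.31.123 here. A node that simply has no block assigned yet would be flagged too, so treat the result as a hint rather than proof.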

I tried adding the following two routes by hand, but neither attempt succeeded:

ip route add 172.16.31.121/23 dev tunl0
route add -net 192.168.135.128 gw hadoop001 metric 0 netmask 255.255.255.192 dev tunl0
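One likely reason the first command fails: 172.16.31.121/23 is a host address with host bits set rather than a network prefix, and Linux refuses route prefixes like that (the /23 containing it is 172.16.30.0/23). It also points at the host subnet instead of hadoop001's pod block, 192.168.135.128/26. The second command's destination (192.168.135.128, netmask 255.255.255.192) is a valid /26, so its failure presumably has another cause; the BIRD-installed route on hadoop002 carries `onlink`, which suggests the gateway is not otherwise treated as reachable on tunl0 (an inference, not verified here). This Python sketch checks both destinations with the standard `ipaddress` module; `check_destination` is a hypothetical helper.

```python
import ipaddress


def check_destination(dest):
    """Check that a route destination is a proper network prefix.

    Returns the normalized prefix, or an error message if host bits are set.
    """
    try:
        return str(ipaddress.ip_network(dest, strict=True))
    except ValueError as exc:
        return f"invalid: {exc}"


# First attempt: host address 172.16.31.121 with a /23 mask.
print(check_destination("172.16.31.121/23"))
# The /23 network that actually contains 172.16.31.121.
print(ipaddress.ip_network("172.16.31.121/23", strict=False))
# Second attempt: `-net 192.168.135.128 netmask 255.255.255.192`.
print(check_destination("192.168.135.128/255.255.255.192"))
```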

After deleting Calico's data from etcd and resetting Kubernetes, all of these routes disappeared.

The calico-node logs showed the following error:

bird: BGP: Unexpected connect from unknown address
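This BIRD message generally means a TCP connection to the BGP port (179) arrived from an address that matches none of the configured neighbors. One plausible reading, given the addresses that turn up further down, is that the neighbors were configured with wrongly detected node addresses (such as 172.18.0.1) while real connections came from the hosts' ens192 addresses. The toy Python model below is not BIRD's code; `unknown_sources` and the scenario values are assumptions for illustration.

```python
import ipaddress


def unknown_sources(configured_neighbors, observed_sources):
    """Return connection sources that match no configured BGP neighbor.

    Toy model of BIRD's check: an incoming session is accepted only if its
    source address equals a configured neighbor address.
    """
    neighbors = {ipaddress.ip_address(n) for n in configured_neighbors}
    return [s for s in observed_sources if ipaddress.ip_address(s) not in neighbors]


# Assumed scenario: neighbors learned from wrongly detected bridge addresses,
# connections arriving from the hosts' real ens192 addresses.
configured = ["172.18.0.1", "172.19.0.1"]
observed = ["172.16.31.122", "172.16.31.123"]
print(unknown_sources(configured, observed))
```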

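After several more resets, none of the nodes could reach each other at all, so I had to bring in calicoctl.

Comparing the hadoop001 cluster with the dlw1 cluster, where networking worked normally, revealed an anomaly: hadoop001 listed its BGP peers under odd addresses such as 172.18.0.1 instead of the hosts' real IPs. Looking further into the calico-node logs turned up more clues.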

[root@hadoop001 beh]# DATASTORE_TYPE=kubernetes KUBECONFIG=~/.kube/config ./calicoctl node status
Calico process is running.

IPv4 BGP status
+--------------+-------------------+-------+----------+---------+
| PEER ADDRESS |     PEER TYPE     | STATE |  SINCE   |  INFO   |
+--------------+-------------------+-------+----------+---------+
| 172.18.0.1   | node-to-node mesh | start | 07:16:12 | Connect |
| 172.19.0.1   | node-to-node mesh | start | 07:16:12 | Connect |
+--------------+-------------------+-------+----------+---------+

IPv6 BGP status
No IPv6 peers found.

-----------------------------------------------------------------

[root@dlw1 tbc]# DATASTORE_TYPE=kubernetes KUBECONFIG=~/.kube/config ./calicoctl node status
Calico process is running.

IPv4 BGP status
+--------------+-------------------+-------+------------+-------------+
| PEER ADDRESS |     PEER TYPE     | STATE |   SINCE    |    INFO     |
+--------------+-------------------+-------+------------+-------------+
| 172.16.40.2  | node-to-node mesh | up    | 2018-11-03 | Established |
| 172.16.40.3  | node-to-node mesh | up    | 2018-11-03 | Established |
+--------------+-------------------+-------+------------+-------------+

IPv6 BGP status
No IPv6 peers found.
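With more than a couple of peers it helps to check this table mechanically. A minimal Python sketch, assuming the pipe-delimited layout shown above (columns PEER ADDRESS, PEER TYPE, STATE, SINCE, INFO); the helper names are hypothetical:

```python
HADOOP001_STATUS = """\
+--------------+-------------------+-------+----------+---------+
| PEER ADDRESS |     PEER TYPE     | STATE |  SINCE   |  INFO   |
+--------------+-------------------+-------+----------+---------+
| 172.18.0.1   | node-to-node mesh | start | 07:16:12 | Connect |
| 172.19.0.1   | node-to-node mesh | start | 07:16:12 | Connect |
+--------------+-------------------+-------+----------+---------+
"""


def parse_bgp_peers(table):
    """Parse the IPv4 peer table printed by `calicoctl node status`."""
    peers = []
    for line in table.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip border lines and anything outside the table
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        if cells[0] == "PEER ADDRESS":
            continue  # header row
        address, peer_type, state, since, info = cells
        peers.append({"address": address, "type": peer_type,
                      "state": state, "since": since, "info": info})
    return peers


def not_established(peers):
    """Return addresses of peers whose BGP session is not Established."""
    return [p["address"] for p in peers if p["info"] != "Established"]


print(not_established(parse_bgp_peers(HADOOP001_STATUS)))
```

On hadoop001 both stuck peers are the bridge-style addresses, while on dlw1 the same check would return an empty list.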

The hadoop002 logs show the same symptom:

2018-11-06 07:27:35.639 [INFO][85] int_dataplane.go 574: Received *proto.HostMetadataUpdate update from calculation graph msg=hostname:"hadoop002" ipv4_addr:"172.18.0.1"
2018-11-06 07:27:35.639 [INFO][85] int_dataplane.go 574: Received *proto.HostMetadataUpdate update from calculation graph msg=hostname:"hadoop003" ipv4_addr:"172.19.0.1"
2018-11-06 07:27:35.639 [INFO][85] int_dataplane.go 574: Received *proto.HostMetadataUpdate update from calculation graph msg=hostname:"hadoop001" ipv4_addr:"172.16.31.121"

whereas the logs on dlw2 show the real host IPs:

18-11-03 02:51:33.907 [INFO][197] syncer.go 473: Started receiving snapshot snapshotIndex=0x19a8
2018-11-03 02:51:33.908 [INFO][197] int_dataplane.go 574: Received *proto.HostMetadataUpdate update from calculation graph msg=hostname:"dlw1" ipv4_addr:"172.16.40.1"
2018-11-03 02:51:33.919 [INFO][197] int_dataplane.go 574: Received *proto.HostMetadataUpdate update from calculation graph msg=hostname:"dlw2" ipv4_addr:"172.16.40.2"
2018-11-03 02:51:33.919 [INFO][197] int_dataplane.go 574: Received *proto.HostMetadataUpdate update from calculation graph msg=hostname:"dlw3" ipv4_addr:"172.16.40.3"
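The same comparison can be run over these log lines: extract each hostname/ipv4_addr pair from the HostMetadataUpdate messages and flag any address outside the host subnet. A minimal Python sketch; the regular expression is written against the log format shown here and may need adjusting for other Calico versions, and the subnet 172.16.31.0/24 is taken from hadoop002's routing table above.

```python
import ipaddress
import re

# Matches the hostname/ipv4_addr pair in the HostMetadataUpdate log lines above.
HOST_RE = re.compile(r'hostname:"(?P<host>[^"]+)" ipv4_addr:"(?P<addr>[^"]+)"')

# Message tails of the hadoop002 log lines above.
HADOOP002_LOG = '''\
update from calculation graph msg=hostname:"hadoop002" ipv4_addr:"172.18.0.1"
update from calculation graph msg=hostname:"hadoop003" ipv4_addr:"172.19.0.1"
update from calculation graph msg=hostname:"hadoop001" ipv4_addr:"172.16.31.121"
'''


def node_addresses(log_text):
    """Map each hostname to the last ipv4_addr reported for it."""
    return {m["host"]: m["addr"] for m in HOST_RE.finditer(log_text)}


def outside_subnet(addresses, subnet):
    """Return hostnames whose reported address is not in the given subnet."""
    net = ipaddress.ip_network(subnet)
    return sorted(h for h, a in addresses.items() if ipaddress.ip_address(a) not in net)


print(outside_subnet(node_addresses(HADOOP002_LOG), "172.16.31.0/24"))
```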

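https://github.com/projectcalico/calico/issues/1941

Following this issue, I configured the IP detection policy in the calico-node YAML: keep autodetection, but restrict it to a specific network interface, ens192 (the interface that carries the host subnet 172.16.31.0/24). After restarting calico-node, the network came up.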

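# Added to the env list of the calico-node container in the calico-node DaemonSet
- name: IP
  value: "autodetect"
- name: IP_AUTODETECTION_METHOD
  value: "interface=ens192"   # only consider addresses on ens192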
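To see why pinning the interface helps, here is a simplified Python model of interface-based detection: take the first IPv4 address on an interface whose name matches the given regular expression. It is not Calico's implementation. The interface snapshot is rebuilt from hadoop002's routing table above (ens192 172.16.31.122, docker0 172.17.0.1, br-f33940ad6bcc 172.18.0.1), and the alphabetical scan order and full-match semantics are assumptions of this sketch. In this model an overly broad pattern lands on the Docker bridge address 172.18.0.1, the same address hadoop002 reported earlier, while `ens192` leaves only the host address.

```python
import ipaddress
import re

# Hypothetical interface snapshot for hadoop002, rebuilt from its routing table.
INTERFACES = {
    "br-f33940ad6bcc": ["172.18.0.1"],
    "docker0": ["172.17.0.1"],
    "ens192": ["172.16.31.122"],
}


def detect_by_interface(interfaces, pattern):
    """Return (interface, address) for the first IPv4 address on a matching interface.

    Simplified model of IP_AUTODETECTION_METHOD=interface=<regex>: interfaces are
    scanned in name order and the name must fully match the pattern.
    """
    regex = re.compile(pattern)
    for name in sorted(interfaces):
        if not regex.fullmatch(name):
            continue
        for addr in interfaces[name]:
            ip = ipaddress.ip_address(addr)
            if ip.version == 4 and not ip.is_loopback:
                return name, str(ip)
    return None


print(detect_by_interface(INTERFACES, r"ens192"))  # pinned to the host NIC
print(detect_by_interface(INTERFACES, r".*"))      # any interface at all
```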

[root@hadoop001 beh]# DATASTORE_TYPE=kubernetes KUBECONFIG=~/.kube/config ./calicoctl node status
Calico process is running.

IPv4 BGP status
+---------------+-------------------+-------+----------+-------------+
| PEER ADDRESS  |     PEER TYPE     | STATE |  SINCE   |    INFO     |
+---------------+-------------------+-------+----------+-------------+
| 172.16.31.122 | node-to-node mesh | up    | 09:51:28 | Established |
| 172.16.31.123 | node-to-node mesh | up    | 09:51:28 | Established |
+---------------+-------------------+-------+----------+-------------+

IPv6 BGP status
No IPv6 peers found.
