Rancher node: flannel "failed to add vxlanRoute", containers on different hosts cannot ping each other (debugging notes)

Symptom: with three machines f1, f2, and f3, pinging container IPs between f3 and either of the other two machines fails in both directions.

1) Check that the flanneld process is healthy, look at the flannel subnet configuration, and ping each other's subnet addresses to confirm the failure

cat /run/flannel/subnet.env

## Output
FLANNEL_NETWORK=10.42.0.0/16
FLANNEL_SUBNET=10.42.1.1/24
FLANNEL_MTU=1450
FLANNEL_IPMASQ=true

## e.g. on f2 and f3, try to ping f1
ping 10.42.1.1
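
To avoid copying the ping target by hand, the address can be read straight from subnet.env. The following is a minimal sketch, assuming the file has the KEY=VALUE layout shown above; the function name flannel_subnet_addr is hypothetical and not part of flannel.

```shell
# Hypothetical helper: print the address part of FLANNEL_SUBNET.
# The argument is the path to subnet.env and defaults to the file shown above.
flannel_subnet_addr() (
    env_file=${1:-/run/flannel/subnet.env}
    # First FLANNEL_SUBNET= line; fail if the file or the key is missing
    line=$(grep -m1 '^FLANNEL_SUBNET=' "$env_file") || exit 1
    value=${line#FLANNEL_SUBNET=}    # strip the key
    printf '%s\n' "${value%%/*}"     # strip the /prefix length
)
```

Running flannel_subnet_addr on f1 prints the address to use as the ping target from f2 and f3.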

2) Check the logs of the flanneld process (container)

docker ps | grep flanneld

docker logs 44465a91c2d3
I1015 01:53:31.040048       1 main.go:474] Determining IP address of default interface
I1015 01:53:31.040382       1 main.go:487] Using interface with name eth0 and address 10.186.24.202
I1015 01:53:31.040398       1 main.go:504] Defaulting external address to interface address (10.186.24.202)
I1015 01:53:31.066320       1 kube.go:130] Waiting 10m0s for node controller to sync
I1015 01:53:31.066375       1 kube.go:283] Starting kube subnet manager
I1015 01:53:32.066502       1 kube.go:137] Node controller sync successful
I1015 01:53:32.066531       1 main.go:234] Created subnet manager: Kubernetes Subnet Manager - finot2
I1015 01:53:32.066536       1 main.go:237] Installing signal handlers
I1015 01:53:32.066588       1 main.go:352] Found network config - Backend type: vxlan
I1015 01:53:32.066634       1 vxlan.go:119] VXLAN config: VNI=1 Port=0 GBP=false DirectRouting=false
I1015 01:53:32.124906       1 main.go:299] Wrote subnet file to /run/flannel/subnet.env
I1015 01:53:32.124922       1 main.go:303] Running backend.
I1015 01:53:32.124929       1 main.go:321] Waiting for all goroutines to exit
I1015 01:53:32.124943       1 vxlan_network.go:56] watching for new subnet leases
I1015 01:53:32.128063       1 iptables.go:114] Some iptables rules are missing; deleting and recreating rules
I1015 01:53:32.128078       1 iptables.go:136] Deleting iptables rule: -s 10.42.0.0/16 -j ACCEPT
I1015 01:53:32.128431       1 iptables.go:114] Some iptables rules are missing; deleting and recreating rules
I1015 01:53:32.128447       1 iptables.go:136] Deleting iptables rule: -s 10.42.0.0/16 -d 10.42.0.0/16 -j RETURN
I1015 01:53:32.129338       1 iptables.go:136] Deleting iptables rule: -d 10.42.0.0/16 -j ACCEPT
I1015 01:53:32.129754       1 iptables.go:136] Deleting iptables rule: -s 10.42.0.0/16 ! -d 224.0.0.0/4 -j MASQUERADE
I1015 01:53:32.130338       1 iptables.go:124] Adding iptables rule: -s 10.42.0.0/16 -j ACCEPT
I1015 01:53:32.130728       1 iptables.go:136] Deleting iptables rule: ! -s 10.42.0.0/16 -d 10.42.2.0/24 -j RETURN
I1015 01:53:32.131929       1 iptables.go:136] Deleting iptables rule: ! -s 10.42.0.0/16 -d 10.42.0.0/16 -j MASQUERADE
I1015 01:53:32.133119       1 iptables.go:124] Adding iptables rule: -d 10.42.0.0/16 -j ACCEPT
I1015 01:53:33.133267       1 iptables.go:124] Adding iptables rule: -s 10.42.0.0/16 -d 10.42.0.0/16 -j RETURN
I1015 01:53:33.225369       1 iptables.go:124] Adding iptables rule: -s 10.42.0.0/16 ! -d 224.0.0.0/4 -j MASQUERADE
I1015 01:53:33.227734       1 iptables.go:124] Adding iptables rule: ! -s 10.42.0.0/16 -d 10.42.2.0/24 -j RETURN
I1015 01:53:33.230012       1 iptables.go:124] Adding iptables rule: ! -s 10.42.0.0/16 -d 10.42.0.0/16 -j MASQUERADE
E1015 01:53:49.498557       1 vxlan_network.go:158] failed to add vxlanRoute (10.42.0.0/24 -> 10.42.0.0): invalid argument
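
In a long log the single error line is easy to miss. A small filter can keep only the non-informational lines. This is a sketch that assumes the log format shown above, where each line begins with a severity letter (I, W, E, or F) followed by the month and day; the function name flannel_log_errors is hypothetical.

```shell
# Hypothetical helper: read log lines on stdin and keep only warning (W),
# error (E) and fatal (F) lines, dropping informational (I) lines.
flannel_log_errors() {
    grep -E '^[WEF][0-9]{4} '
}
```

One way to use it is: docker logs 44465a91c2d3 2>&1 | flannel_log_errors. The 2>&1 is there so that output the container wrote to stderr is also piped into the filter.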

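3) Analysis

The last log line is an error: adding a route failed. List all network devices and their subnets, paying particular attention to 10.42.0.0/24.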

ip a

## Output
1: lo:  mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
2: eth0:  mtu 1500 qdisc mq state UP group default qlen 1000
   ......... (omitted)
3: docker0:  mtu 1500 qdisc noqueue state DOWN group default
    link/ether 02:42:5d:e6:cb:e2 brd ff:ff:ff:ff:ff:ff
    inet 172.17.0.1/16 scope global docker0
       valid_lft forever preferred_lft forever
    inet 10.42.0.1/16 scope global docker0
       valid_lft forever preferred_lft forever
4: flannel.1:  mtu 1450 qdisc noqueue state UNKNOWN group default
    link/ether 2a:de:09:69:5f:8c brd ff:ff:ff:ff:ff:ff
    inet 10.42.1.0/32 scope global flannel.1
       valid_lft forever preferred_lft forever
5: cni0:  mtu 1450 qdisc noqueue state UP group default qlen 1000
    link/ether 0a:58:0a:2a:01:01 brd ff:ff:ff:ff:ff:ff
    inet 10.42.1.1/24 scope global cni0
       valid_lft forever preferred_lft forever
6: vethb6725a8f@if3:  mtu 1450 qdisc noqueue master cni0 state UP group default
    link/ether 6a:3b:bb:50:1f:fa brd ff:ff:ff:ff:ff:ff link-netnsid 0
7: veth6fdbf195@if3:  mtu 1450 qdisc noqueue master cni0 state UP group default
    link/ether 3e:f9:73:06:28:8b brd ff:ff:ff:ff:ff:ff link-netnsid 1

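The third entry, docker0, holds the address 10.42.0.1/16, which overlaps the network declared by FLANNEL_NETWORK at the top (10.42.0.0/16). This conflict makes the route addition fail, so the overlay network cannot forward traffic.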
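
The overlap check done by eye above can also be scripted. The sketch below assumes IPv4 only, a shell with 64-bit arithmetic, and input in the one-line-per-address format printed by ip -o -4 addr show. The function names (ip_to_int, cidr_overlaps, find_conflicts) are hypothetical, and skipping flannel.*, cni*, and veth* devices is an assumption about which interfaces legitimately live inside the flannel network.

```shell
# Convert a dotted-quad IPv4 address to an integer.
ip_to_int() (
    IFS=.
    set -- $1    # split the address on dots into $1..$4
    echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
)

# Succeed if two CIDR blocks overlap, i.e. they agree on every bit
# covered by the shorter of the two prefixes.
cidr_overlaps() (
    a_ip=${1%/*}; a_len=${1#*/}
    b_ip=${2%/*}; b_len=${2#*/}
    len=$a_len
    if [ "$b_len" -lt "$len" ]; then len=$b_len; fi
    shift_bits=$(( 32 - len ))
    a=$(ip_to_int "$a_ip")
    b=$(ip_to_int "$b_ip")
    [ $(( a >> shift_bits )) -eq $(( b >> shift_bits )) ]
)

# Read `ip -o -4 addr show` output on stdin and print "device address/prefix"
# for every address that overlaps the given network on a non-flannel device.
find_conflicts() (
    network=$1
    while read -r idx dev fam cidr rest; do
        case "$dev" in
            flannel.*|cni*|veth*) continue ;;   # expected to be inside the network
        esac
        if cidr_overlaps "$cidr" "$network"; then
            echo "$dev $cidr"
        fi
    done
)
```

On an affected machine, ip -o -4 addr show | find_conflicts 10.42.0.0/16 would be expected to list the stray docker0 address, while printing nothing on a clean machine.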

Deleting the 10.42.0.1/16 address from the docker0 device (on machines f1 and f2) solved the problem.

ip addr del 10.42.0.1/16 dev docker0

# Verify
ip addr show docker0
3: docker0:  mtu 1500 qdisc noqueue state DOWN group default
    link/ether 02:42:cd:ff:50:c1 brd ff:ff:ff:ff:ff:ff
    inet 172.17.0.1/16 scope global docker0
       valid_lft forever preferred_lft forever
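
Beyond checking docker0, it can help to confirm that the route from the error message is now present. The sketch below assumes the route appears in ip route show output as a line starting with the destination subnet and containing "dev flannel.1". The function name has_flannel_route is hypothetical, and whether flanneld re-adds the route by itself or needs a restart is not recorded in these notes.

```shell
# Hypothetical helper: read `ip route show` output on stdin and succeed
# if there is a route for the given subnet through the flannel.1 device.
has_flannel_route() (
    subnet=$1
    while read -r dst rest; do
        case "$dst $rest" in
            "$subnet "*"dev flannel.1"*) exit 0 ;;   # found the route
        esac
    done
    exit 1   # no matching route
)
```

For example: ip route show | has_flannel_route 10.42.0.0/24 && echo "route present".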

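Postmortem: these machines once had Rancher 1.X installed.

Rancher 1.X adds a 10.42 address to the docker0 device for IPsec forwarding. For unknown reasons it was not fully cleaned up, and it conflicted with the default network range configured for flannel.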

Reference: https://github.com/coreos/flannel/issues/844

Reposted from: https://zhuanlan.zhihu.com/p/46804841
