Troubleshooting Calico network problems

Steps

Suppose that container A cannot reach container B:

Get the containers' interface information from Calico

Get the workloadEndpoint of the sending container from Calico:

$calicoctl get workloadendpoint --workload=. -o yaml
- apiVersion: v1
  kind: workloadEndpoint
  metadata:
    labels:
      calico/k8s_ns: 
      name: sshproxy-internal
      pod-template-hash: "3693247749"
      tenxcloud.com/appName: sshproxy-internal
      tenxcloud.com/svcName: sshproxy-internal
    name: eth0
    node: dev-slave-107
    orchestrator: k8s
    workload: .
  spec:
    interfaceName: cali69de609d5af
    ipNetworks:
    - 192.168.8.42/32
    mac: b2:21:5b:82:e1:27
    profiles:
    - k8s_ns.

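The output above shows that:

1. The container's IP is 192.168.8.42/32 and its MAC address is b2:21:5b:82:e1:27.
2. The container's interface on the node side is cali69de609d5af.
3. The container runs on dev-slave-107.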
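When these checks are scripted, the fields above can be pulled out of the `calicoctl get workloadendpoint ... -o yaml` output. The sketch below is a minimal line-based parser written only for the output layout shown above; it avoids a YAML library so it stays stdlib-only, and it will likely need adjusting for other calicoctl versions. The function name `parse_workload_endpoint` is hypothetical.

```python
import re


def parse_workload_endpoint(text):
    """Extract node, interfaceName, mac and ipNetworks from one workloadEndpoint
    printed by `calicoctl get workloadendpoint -o yaml`.

    Assumption: the indentation/field layout matches the output shown above.
    """
    fields = {}
    ip_networks = []
    in_ip_networks = False
    for raw in text.splitlines():
        line = raw.strip()
        # Items of the ipNetworks list look like "- 192.168.8.42/32".
        if in_ip_networks and line.startswith("- "):
            ip_networks.append(line[2:].strip())
            continue
        # Any other line ends the ipNetworks list (or starts it).
        in_ip_networks = line == "ipNetworks:"
        m = re.match(r"(node|interfaceName|mac):\s*(\S+)$", line)
        if m:
            fields[m.group(1)] = m.group(2)
    fields["ipNetworks"] = ip_networks
    return fields
```

Fed the sender's output above, it should return node `dev-slave-107`, interfaceName `cali69de609d5af`, mac `b2:21:5b:82:e1:27` and ipNetworks `['192.168.8.42/32']`.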

Get the information for the receiving container from Calico:

$calicoctl get workloadendpoint --workload=. -o yaml
- apiVersion: v1
  kind: workloadEndpoint
  metadata:
    labels:
      calico/k8s_ns: 
      name: sshproxy-cluster
      pod-template-hash: "162298777"
      tenxcloud.com/appName: sshproxy-cluster
      tenxcloud.com/svcName: sshproxy-cluster
    name: eth0
    node: dev-slave-140
    orchestrator: k8s
    workload: .
  spec:
    interfaceName: calie664becc2fd
    ipNetworks:
    - 192.168.60.173/32
    mac: da:ba:8d:7a:45:dc
    profiles:
    - k8s_ns.

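The output above shows that:

1. The container's IP is 192.168.60.173/32 and its MAC address is da:ba:8d:7a:45:dc.
2. The receiving container's cali interface on its node is calie664becc2fd.
3. The container runs on dev-slave-140.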

Check the sending container

Inside the container, check that the interface is correct and that its IP and MAC match what Calico reports:

sh-4.2# ip addr
...
3: eth0@if57:  mtu 1500 qdisc noqueue state UP
    link/ether b2:21:5b:82:e1:27 brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 192.168.8.42/32 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::b021:5bff:fe82:e127/64 scope link
       valid_lft forever preferred_lft forever
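The comparison can be automated with a small check on the `ip addr` text. This is a sketch that assumes it is given the output of `ip addr show eth0` run inside the container (so other interfaces are not mixed in); the helper name `check_container_interface` is hypothetical.

```python
import re


def check_container_interface(ip_addr_output, expected_ip, expected_mac):
    """Compare `ip addr show eth0` output from inside the container with the
    IP/MAC recorded in Calico's workloadEndpoint.

    Returns a list of problems; an empty list means the interface matches.
    """
    # MACs appear after "link/ether"; "inet " (not "inet6") lines carry IPv4.
    macs = re.findall(r"link/ether\s+([0-9a-f:]{17})", ip_addr_output)
    ips = re.findall(r"inet\s+(\S+)", ip_addr_output)
    problems = []
    if expected_mac.lower() not in macs:
        problems.append(f"MAC {expected_mac} not found, saw {macs}")
    if expected_ip not in ips:
        problems.append(f"IP {expected_ip} not found, saw {ips}")
    return problems
```

With the output above, `check_container_interface(out, "192.168.8.42/32", "b2:21:5b:82:e1:27")` should return an empty list.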

Check that the container's default route goes via 169.254.1.1 and that there are no other routes:

sh-4.2# ip route
default via 169.254.1.1 dev eth0
169.254.1.1 dev eth0  scope link
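A sketch of the same check in code: normalise the `ip route` lines from inside the container and compare them with the two routes shown above. The expected set is taken from this output; the function name and its default arguments are assumptions for illustration.

```python
def check_container_routes(ip_route_output, gateway="169.254.1.1", dev="eth0"):
    """Compare `ip route` output from inside the container with the two routes
    expected in a Calico workload (taken from the output above).

    Returns (missing, extra): routes that should exist but don't, and routes
    that exist but are not expected. Both empty means the table looks right.
    """
    # Collapse repeated whitespace so "dev eth0  scope link" compares equal.
    actual = {" ".join(line.split()) for line in ip_route_output.splitlines() if line.strip()}
    expected = {
        f"default via {gateway} dev {dev}",
        f"{gateway} dev {dev} scope link",
    }
    return sorted(expected - actual), sorted(actual - expected)
```

For the output above it should return `([], [])`; any extra route shows up in the second list.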

On the node, read the MAC address of the container's cali interface:

$ip link show cali69de609d5af
57: cali69de609d5af@if3:  mtu 1500 qdisc noqueue state UP mode DEFAULT
    link/ether ea:88:97:5f:06:d9 brd ff:ff:ff:ff:ff:ff link-netnsid 2

Check that the MAC address the container has recorded for 169.254.1.1 is the MAC of the cali interface on the node:

sh-4.2# ip neigh
169.254.1.1 dev eth0 lladdr ea:88:97:5f:06:d9 REACHABLE
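The two outputs above (`ip link show caliXXXX` on the node, `ip neigh` in the container) can be cross-checked like this. It is a sketch with a hypothetical helper name; if the neighbour entry has no `lladdr` (for example a `FAILED` entry), it reports a mismatch.

```python
import re


def gateway_mac_matches(ip_neigh_output, ip_link_output, gateway="169.254.1.1"):
    """True if the container's neighbour entry for the gateway resolves to the
    MAC of the host-side cali interface (from `ip link show caliXXXX`)."""
    link = re.search(r"link/ether\s+([0-9a-f:]{17})", ip_link_output)
    neigh = re.search(
        rf"^{re.escape(gateway)}\s+dev\s+\S+\s+lladdr\s+([0-9a-f:]{{17}})",
        ip_neigh_output,
        re.MULTILINE,
    )
    if link is None or neigh is None:
        # No MAC on one side (e.g. FAILED/INCOMPLETE neighbour entry).
        return False
    return link.group(1).lower() == neigh.group(1).lower()
```

For the outputs above both MACs are `ea:88:97:5f:06:d9`, so it should return `True`.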

Check the sending node

On the node, capture on cali69de609d5af with tcpdump to see whether packets sent from the container arrive:

$tcpdump -i cali69de609d5af
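An unfiltered capture is hard to read on a busy interface. Limiting it to the two container IPs from the workloadEndpoints keeps only the relevant packets, and the same filter is useful later on the receiving node's eth0. The helper below just builds that command line; its name is hypothetical, while `-nn`, `-i` and `host ... and host ...` are standard tcpdump / pcap-filter syntax. `shlex.join` needs Python 3.8+.

```python
import shlex


def tcpdump_cmd(interface, src_ip, dst_ip):
    """Build a tcpdump command that shows only traffic between the two
    containers on `interface`, with no name/port resolution (-nn)."""
    # Drop a /32 suffix if the address was copied straight from ipNetworks.
    src = src_ip.split("/")[0]
    dst = dst_ip.split("/")[0]
    return shlex.join(["tcpdump", "-nn", "-i", interface, f"host {src} and host {dst}"])
```

`tcpdump_cmd("cali69de609d5af", "192.168.8.42/32", "192.168.60.173/32")` returns `tcpdump -nn -i cali69de609d5af 'host 192.168.8.42 and host 192.168.60.173'`.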

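Check the node's routes: for the destination IP (the receiving container's address, 192.168.60.173), the next hop should be the IP of the node the receiving container runs on. The matching route may cover a whole address block rather than the single IP, as with the /26 below, which contains 192.168.60.173: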

$ip route
...
192.168.60.128/26 via 10.39.0.140 dev eth0  proto bird
...
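Finding the matching route by eye means doing a longest-prefix match. The sketch below does that over pasted `ip route` text using Python's `ipaddress` module; it is a simplification (it ignores metrics, multiple tables and route types such as `blackhole`), so `ip route get <ip>` on the node remains the authoritative kernel answer. The function name is hypothetical.

```python
import ipaddress


def lookup_route(ip_route_output, dst_ip):
    """Return the `ip route` line with the longest prefix matching dst_ip.

    Lines without a prefix length are host routes (/32); 'default' is 0.0.0.0/0.
    Lines whose first field is not a prefix (e.g. 'blackhole ...', '...')
    are skipped.
    """
    dst = ipaddress.ip_address(dst_ip.split("/")[0])
    best, best_len = None, -1
    for line in ip_route_output.splitlines():
        fields = line.split()
        if not fields:
            continue
        prefix = "0.0.0.0/0" if fields[0] == "default" else fields[0]
        try:
            net = ipaddress.ip_network(prefix, strict=False)
        except ValueError:
            continue
        if dst in net and net.prefixlen > best_len:
            best, best_len = " ".join(fields), net.prefixlen
    return best
```

On the sending node, the line shown above with `192.168.60.173` should return `192.168.60.128/26 via 10.39.0.140 dev eth0 proto bird`; on the receiving node, the same call over its routes should return the `192.168.60.173 dev calie664becc2fd scope link` host route.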

Check the node's iptables rules to confirm that packets from the container are forwarded out correctly.
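One practical way to do this is to dump the rules with packet counters (`iptables-save -c`, which prefixes each rule with `[packets:bytes]`), generate traffic, dump again, and look for rules whose counters move, especially DROP or REJECT rules. The sketch below filters such a dump for rules mentioning the cali interface. It assumes Calico's per-endpoint rules reference the interface name (chain naming differs between Calico versions); the helper name is hypothetical.

```python
import re


def rules_for_interface(iptables_save_output, interface):
    """Return (packets, bytes, rule) for each rule in `iptables-save -c`
    output whose text mentions `interface` (e.g. a cali interface name).

    Assumption: Calico rules/chains for an endpoint include its interface name.
    """
    hits = []
    for line in iptables_save_output.splitlines():
        # Rule lines look like "[12:720] -A CHAIN ..."; chain/table lines are skipped.
        m = re.match(r"\[(\d+):(\d+)\]\s+(-A\s+.*)$", line)
        if m and interface in m.group(3):
            hits.append((int(m.group(1)), int(m.group(2)), m.group(3)))
    return hits
```

Comparing the result of two dumps taken a few seconds apart shows which of these rules the container's packets are actually hitting.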

Check the receiving node

Capture on the receiving node's interface and check whether the packets from the sending node arrive:

$tcpdump -i eth0

Check the routes on the receiving node: the destination IP should map to the correct cali interface (calie664becc2fd):

$ip route
...
192.168.60.173 dev calie664becc2fd  scope link
...

Check the receiving node's iptables rules to confirm that the packets are accepted.

Check the receiving container

Capture on the receiving container's interface and check whether the packets arrive.
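Container images often lack tcpdump. One way around this, assuming a Docker runtime, is to find the container's PID with `docker inspect -f '{{.State.Pid}}'` and run the node's tcpdump inside only the container's network namespace with `nsenter -t <pid> -n` (with only `-n`, the binary is still taken from the node's filesystem). The helper below just builds those two commands; its name and the `abc123` container ID in the example are hypothetical.

```python
import shlex


def container_capture_cmds(container_id, src_ip, dst_ip):
    """Return (get_pid, capture): a command that prints the container's PID and
    a command that runs tcpdump on eth0 inside that PID's network namespace.

    Assumes Docker and that $PID is set from the first command's output.
    """
    src = src_ip.split("/")[0]
    dst = dst_ip.split("/")[0]
    get_pid = shlex.join(["docker", "inspect", "-f", "{{.State.Pid}}", container_id])
    capture = 'nsenter -t "$PID" -n ' + shlex.join(
        ["tcpdump", "-nn", "-i", "eth0", f"host {src} and host {dst}"]
    )
    return get_pid, capture
```

On the node this would be used as `PID=$(docker inspect -f '{{.State.Pid}}' abc123)` followed by the returned `nsenter ...` command.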

Finally

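Swap the sender and receiver and run the checks again, since the reply path can fail independently of the request path.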
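The whole procedure can be written down as an ordered checklist per direction, which makes the swap mechanical. This is only an illustrative summary of the steps above as data; the function name and the dict keys (`node`, `interfaceName`, `ip`) are assumptions.

```python
def checklist(sender, receiver):
    """Return the checks above, in order, as (where, what) pairs for traffic
    from `sender` to `receiver`. Each dict needs 'node', 'interfaceName', 'ip'."""
    s, r = sender, receiver
    return [
        (f"container on {s['node']}", f"ip addr / ip route / ip neigh match Calico ({s['ip']})"),
        (s["node"], f"tcpdump -i {s['interfaceName']} sees packets to {r['ip']}"),
        (s["node"], f"route to {r['ip']} has {r['node']}'s IP as next hop"),
        (s["node"], "iptables forwards the packet out"),
        (r["node"], "tcpdump -i eth0 sees the packet arrive"),
        (r["node"], f"route to {r['ip']} uses dev {r['interfaceName']}"),
        (r["node"], "iptables accepts the packet"),
        (f"container on {r['node']}", "tcpdump on eth0 sees the packet"),
    ]
```

Running `checklist(a, b) + checklist(b, a)` with the two endpoints from Calico gives both directions in one list.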

Original article: http://www.lijiaocn.com/%E9%97%AE%E9%A2%98/2017/08/18/calico-network-problem-resove.html
