K8S Calico IP-in-IP Network Mode: Communication Analysis

Calico IP-in-IP Communication Analysis

IP-in-IP Network Model

[Figure 1: Calico IP-in-IP network model]

How to Enable IP-in-IP

# Enable IP-in-IP mode with the CALICO_IPV4POOL_IPIP environment variable (on the calico-node container): "Always" turns IPIP on; set it to "Never" to turn it off
- name: CALICO_IPV4POOL_IPIP
  value: "Always"
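Besides Always and Never, CALICO_IPV4POOL_IPIP also accepts CrossSubnet. In that mode only traffic between nodes on different subnets is encapsulated. The Calico docs describe these CALICO_IPV4POOL_* variables as applying to the IP pool created at startup. For an existing pool, the equivalent setting is the IPPool field ipipMode.

The sketch below shows what each mode means for a given pair of nodes. The function name needs_ipip is hypothetical, and so is the fixed /24 used to decide "same subnet" (the lab hosts sit on 192.168.0.0/24). Calico itself uses each node's configured address and prefix.

```python
import ipaddress


def needs_ipip(mode: str, src_node: str, dst_node: str, prefix_len: int = 24) -> bool:
    """Return True if pod traffic from src_node to dst_node would be IPIP-encapsulated.

    Hypothetical helper; assumes both nodes share one fixed prefix length.
    """
    if mode == "Always":
        return True
    if mode == "Never":
        return False
    if mode == "CrossSubnet":
        # Encapsulate only when the destination node is outside the source node's subnet.
        src_net = ipaddress.ip_interface(f"{src_node}/{prefix_len}").network
        return ipaddress.ip_address(dst_node) not in src_net
    raise ValueError(f"unknown IPIP mode: {mode!r}")


if __name__ == "__main__":
    for mode in ("Always", "CrossSubnet", "Never"):
        print(mode, needs_ipip(mode, "192.168.0.11", "192.168.0.12"))
```

With the two lab hosts, CrossSubnet would not encapsulate at all, because both nodes are on 192.168.0.0/24. This is why the article uses Always to observe the tunnel.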

Test Workload YAML

Host            IP
k8s-master-1    192.168.0.11/24
k8s-node-1      192.168.0.12/24
apiVersion: v1
kind: Service
metadata:
  name: busybox
  namespace: devops
spec:
  selector:
    app: busybox
  type: NodePort
  ports:
  - name: http
    port: 8888
    protocol: TCP
    targetPort: 80
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: busybox
  namespace: devops
spec:
  replicas: 2
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app: busybox
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0
  template:
    metadata:
      name: busybox
      labels:
        app: busybox
    spec:
      affinity:  # keep the two busybox pods on different nodes
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - topologyKey: kubernetes.io/hostname
            labelSelector:
              matchExpressions:
              - key: app
                operator: In
                values:
                - busybox
      restartPolicy: Always
      containers:
      - command: ["/bin/sh","-c","mkdir -p /var/lib/www && httpd -f -v -p 80 -h /var/lib/www"]
        name: busybox
        image: docker.io/library/busybox:latest
        imagePullPolicy: IfNotPresent
        ports:
        - name: http
          containerPort: 80

IP-in-IP Communication Analysis (Cross-Host)

  1. View the pod information
╰─ kubectl get pods -n devops -o custom-columns=NAME:.metadata.name,IP:.status.podIP,HOST:.spec.nodeName
NAME                       IP              HOST
busybox-77649b9c55-7d27b   172.16.109.65   k8s-node-1
busybox-77649b9c55-r6bx9   172.16.196.1    k8s-master-1
  2. Exec into the busybox-77649b9c55-r6bx9 container on k8s-master-1 and check its routing table
/ # route -n
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
0.0.0.0         169.254.1.1     0.0.0.0         UG    0      0        0 eth0
169.254.1.1     0.0.0.0         255.255.255.255 UH    0      0        0 eth0

# Container network interfaces
/ # ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: tunl0@NONE: <NOARP> mtu 1480 qdisc noop qlen 1000
    link/ipip 0.0.0.0 brd 0.0.0.0
4: eth0@if7: <BROADCAST,MULTICAST,UP,LOWER_UP,M-DOWN> mtu 1480 qdisc noqueue  # the peer of this veth is interface index 7 on the host
    link/ether 86:9c:03:9e:db:9f brd ff:ff:ff:ff:ff:ff
    inet 172.16.196.1/32 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::849c:3ff:fe9e:db9f/64 scope link
       valid_lft forever preferred_lft forever

From the output above, the container busybox-77649b9c55-r6bx9 on k8s-master-1 has the default gateway 169.254.1.1. Yet no interface anywhere in the network carries this address.

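  • The routing table says 169.254.1.1 is the container's default gateway, but no interface owns this IP.
    When a packet's destination is not the local host, the kernel consults the routing table. Having found the gateway, it first resolves the gateway's MAC address with an ARP broadcast. It then sets the destination MAC of the outgoing frame to that MAC. The gateway's IP address itself never appears in any packet header. In other words, nobody cares what this IP address actually is, as long as something answers the ARP request for it with a MAC address.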

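  • In a Kubernetes Calico network, when a packet's destination is outside the local network, the container first broadcasts an ARP request for the gateway 169.254.1.1. The host-side end of its veth pair answers with its own MAC address, even though it does not hold that IP.
    All subsequent traffic then flows over this veth pair: proxy ARP is effectively used to spoof the ARP reply. This keeps ARP broadcasts from spreading beyond the host (limiting ARP broadcast attacks), and proxy ARP also makes cross-network access possible.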

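  • The neighbor entry for the gateway carries the MAC ee:ee:ee:ee:ee:ee, and something answers ARP for it, so at first glance it looks as if Calico hard-coded this MAC.
    Normally the kernel sends an ARP request asking the whole Layer 2 network who owns 169.254.1.1, and the device that owns that IP replies with its MAC address. The situation here is awkward: neither the container nor the host has this IP address. Not even the host interface calixxxxx has it, and that interface's MAC is just the placeholder ee:ee:ee:ee:ee:ee.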

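  • In fact, Calico relies on the proxy ARP feature of the interface. Proxy ARP is a variant of ARP: when an ARP request targets an address on a different subnet, the gateway device that receives it replies with its own MAC address.
    In the figure below, a PC sends an ARP request for the MAC address of server 8.8.8.8. The router (gateway) checks the request and sees that 8.8.8.8 is not on the local subnet (it is cross-subnet), so it replies with the MAC of its own interface. From then on, whenever the PC talks to the server it sets the destination MAC directly to MAC25 (the router interface's MAC in the figure).
    [Figure 2: proxy ARP example, a PC resolving 8.8.8.8 through its router]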
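The proxy ARP decision can be modeled in a few lines. This is a minimal, simplified sketch of Linux behavior, not kernel code: the host answers an ARP request on an interface when proxy_arp is enabled there and its own route to the target leaves through a different interface. Other kernel conditions are ignored on purpose, including IP forwarding being enabled, route types, and proxy_delay. The function names and the route subset are illustrative assumptions.

```python
import ipaddress
from typing import Dict, List, Optional, Tuple

# A route is (destination CIDR, outgoing interface); gateways do not matter here.
Route = Tuple[str, str]


def out_iface(routes: List[Route], target_ip: str) -> Optional[str]:
    """Longest-prefix match: the interface the host would use to reach target_ip."""
    addr = ipaddress.ip_address(target_ip)
    best = None
    for cidr, iface in routes:
        net = ipaddress.ip_network(cidr)
        if addr in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, iface)
    return best[1] if best else None


def proxy_arp_reply(in_iface: str, target_ip: str, routes: List[Route],
                    proxy_arp: Dict[str, bool], macs: Dict[str, str]) -> Optional[str]:
    """Simplified proxy ARP: reply with the receiving interface's own MAC when
    proxy_arp is on for it and the route to the target uses another interface."""
    if not proxy_arp.get(in_iface, False):
        return None
    iface = out_iface(routes, target_ip)
    if iface is None or iface == in_iface:
        return None
    return macs[in_iface]


# Subset of the k8s-master-1 host routing table shown in this article.
MASTER_ROUTES = [
    ("0.0.0.0/0", "ens160"),
    ("192.168.0.0/24", "ens160"),
    ("172.16.196.1/32", "cali12242800409"),
    ("172.16.196.2/32", "calif57336c1ec9"),
]
MACS = {"cali12242800409": "ee:ee:ee:ee:ee:ee", "ens160": "00:0c:29:b1:02:f0"}

if __name__ == "__main__":
    # busybox asks "who has 169.254.1.1?" on its veth. The host would route that
    # address via ens160 (default route), so cali12242800409 answers with its own MAC.
    print(proxy_arp_reply("cali12242800409", "169.254.1.1", MASTER_ROUTES,
                          {"cali12242800409": True}, MACS))
```

In this model it does not matter that no device owns 169.254.1.1. The reply depends only on proxy_arp being set and on the host having some other route (here the default route) to the target.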

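  3. Check the container's ARP table and the network interfaces of the k8s-master-1 host
# ARP neighbor table inside the busybox container on k8s-master-1
/ # ip neigh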
169.254.1.1 dev eth0 lladdr ee:ee:ee:ee:ee:ee used 0/0/0 probes 1 STALE

# Interfaces on the k8s-master-1 host itself
[root@k8s-master-1 ~]# ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: ens160: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
    link/ether 00:0c:29:b1:02:f0 brd ff:ff:ff:ff:ff:ff
    altname enp2s0
    inet 192.168.0.11/24 brd 192.168.0.255 scope global noprefixroute ens160
       valid_lft forever preferred_lft forever
    inet6 fe80::20c:29ff:feb1:2f0/64 scope link noprefixroute
       valid_lft forever preferred_lft forever
3: tunl0@NONE: <NOARP,UP,LOWER_UP> mtu 1480 qdisc noqueue state UNKNOWN group default qlen 1000
    link/ipip 0.0.0.0 brd 0.0.0.0
    inet 172.16.196.0/32 scope global tunl0
       valid_lft forever preferred_lft forever
4: kube-ipvs0: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN group default
    link/ether fa:89:72:a0:30:aa brd ff:ff:ff:ff:ff:ff
    inet 10.96.0.1/32 scope global kube-ipvs0
       valid_lft forever preferred_lft forever
    inet 10.96.192.105/32 scope global kube-ipvs0
       valid_lft forever preferred_lft forever
    inet 10.96.0.10/32 scope global kube-ipvs0
       valid_lft forever preferred_lft forever
7: cali12242800409@if4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1480 qdisc noqueue state UP group default  # host-side peer of the busybox container
    link/ether ee:ee:ee:ee:ee:ee brd ff:ff:ff:ff:ff:ff link-netns cni-d0ea7fe4-4514-1aed-cfd6-fcaf904b837a
    inet6 fe80::ecee:eeff:feee:eeee/64 scope link
       valid_lft forever preferred_lft forever
8: calif57336c1ec9@if4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1480 qdisc noqueue state UP group default  # host-side peer of another container (coredns)
    link/ether ee:ee:ee:ee:ee:ee brd ff:ff:ff:ff:ff:ff link-netns cni-dd44e8c7-309b-f336-2e19-5c0f0b0f83ab
    inet6 fe80::ecee:eeff:feee:eeee/64 scope link
       valid_lft forever preferred_lft forever
 
       
# Check the proxy ARP setting of calif57336c1ec9 (the interface queried by the command below)
[root@k8s-master-1 ~]# cat /proc/sys/net/ipv4/conf/calif57336c1ec9/proxy_arp
1
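  • The container's ARP request travels over the veth pair to its host-side peer calixxx. Because calixxx has proxy ARP enabled, it answers every ARP request from the container, so all of the container's frames are sent to calixxx. That puts them in the host network stack, which routes them to the next hop using the host routing table. You can confirm this with cat /proc/sys/net/ipv4/conf/calixxx/proxy_arp; the output is 1 for every cali interface.
  • Calico cleverly steers all workload traffic to the special gateway 169.254.1.1, which lands it on the host's calixxx device. In the end, traffic that would otherwise be a mix of Layer 2 and Layer 3 is all forwarded as Layer 3 traffic.
  • Answering ARP on the host through proxy ARP confines ARP broadcasts to the host. This prevents broadcast storms and avoids ARP table bloat.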
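To check the proxy ARP flag on every cali interface at once instead of one cat per interface, a small sketch could scan /proc/sys/net/ipv4/conf. The function cali_proxy_arp and the make_fake_conf_tree helper (for trying it off-cluster) are hypothetical. The only assumption about the system is the standard layout /proc/sys/net/ipv4/conf/<iface>/proxy_arp.

```python
import tempfile
from pathlib import Path


def cali_proxy_arp(conf_dir: str = "/proc/sys/net/ipv4/conf") -> dict:
    """Map each cali* interface under conf_dir to its proxy_arp value (as an int)."""
    result = {}
    for iface_dir in sorted(Path(conf_dir).glob("cali*")):
        flag = iface_dir / "proxy_arp"
        if flag.is_file():
            result[iface_dir.name] = int(flag.read_text().strip())
    return result


def make_fake_conf_tree(root: str, values: dict) -> None:
    """Hypothetical test helper: write root/<iface>/proxy_arp files."""
    for iface, value in values.items():
        iface_dir = Path(root) / iface
        iface_dir.mkdir(parents=True, exist_ok=True)
        (iface_dir / "proxy_arp").write_text(f"{value}\n")


if __name__ == "__main__":
    # Off-cluster demo with a fake tree; on a Calico node call cali_proxy_arp() directly.
    with tempfile.TemporaryDirectory() as tmp:
        make_fake_conf_tree(tmp, {"cali12242800409": 1, "ens160": 0})
        print(cali_proxy_arp(tmp))
```

On a Calico node, any cali interface that maps to 0 would mean the host no longer answers ARP for 169.254.1.1 on that pod's veth.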

busybox-77649b9c55-r6bx9 on k8s-master-1 pings busybox-77649b9c55-7d27b on k8s-node-1

# Neighbor (ARP) table of the busybox container on k8s-master-1 (empty at this point)
/ # ip neigh show

# Ping the busybox on k8s-node-1 from the busybox on k8s-master-1
/ # ping -c 1 172.16.109.65
PING 172.16.109.65 (172.16.109.65): 56 data bytes
64 bytes from 172.16.109.65: seq=0 ttl=62 time=1.603 ms

--- 172.16.109.65 ping statistics ---
1 packets transmitted, 1 packets received, 0% packet loss
round-trip min/avg/max = 1.603/1.603/1.603 ms

# ARP table after the ping
/ # arp -n
? (169.254.1.1) at ee:ee:ee:ee:ee:ee [ether]  on eth0

# Current interface addresses of the busybox on k8s-master-1
/ # ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: tunl0@NONE: <NOARP> mtu 1480 qdisc noop qlen 1000
    link/ipip 0.0.0.0 brd 0.0.0.0
4: eth0@if7: <BROADCAST,MULTICAST,UP,LOWER_UP,M-DOWN> mtu 1480 qdisc noqueue
    link/ether 86:9c:03:9e:db:9f brd ff:ff:ff:ff:ff:ff
    inet 172.16.196.1/32 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::849c:3ff:fe9e:db9f/64 scope link
       valid_lft forever preferred_lft forever

# Routing table on the k8s-master-1 host
[root@k8s-master-1 ~]# route -n
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
0.0.0.0         192.168.0.2     0.0.0.0         UG    100    0        0 ens160
172.16.109.64   192.168.0.12    255.255.255.192 UG    0      0        0 tunl0
172.16.196.0    0.0.0.0         255.255.255.192 U     0      0        0 *                # blackhole route for this node's /26 block: block addresses without a more specific /32 route are dropped
172.16.196.1    0.0.0.0         255.255.255.255 UH    0      0        0 cali12242800409
172.16.196.2    0.0.0.0         255.255.255.255 UH    0      0        0 calif57336c1ec9
192.168.0.0     0.0.0.0         255.255.255.0   U     100    0        0 ens160
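Every forwarding decision in the walk-through below is a longest-prefix match over this table. The following minimal sketch uses only the standard ipaddress module, with the routes transcribed from the output above (Genmask 255.255.255.192 is a /26). Labeling the "*" entry as "blackhole" follows the comment in the table. Metrics are ignored because no two routes share a prefix length.

```python
import ipaddress

# (destination prefix, gateway or None, interface), transcribed from `route -n` on k8s-master-1.
MASTER_ROUTES = [
    ("0.0.0.0/0", "192.168.0.2", "ens160"),
    ("172.16.109.64/26", "192.168.0.12", "tunl0"),
    ("172.16.196.0/26", None, "blackhole"),
    ("172.16.196.1/32", None, "cali12242800409"),
    ("172.16.196.2/32", None, "calif57336c1ec9"),
    ("192.168.0.0/24", None, "ens160"),
]


def route_lookup(routes, dst):
    """Return (gateway, interface) of the longest matching prefix for dst."""
    addr = ipaddress.ip_address(dst)
    best = None
    for prefix, gateway, iface in routes:
        net = ipaddress.ip_network(prefix)
        if addr in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, gateway, iface)
    if best is None:
        raise LookupError(f"no route to {dst}")
    return best[1], best[2]


if __name__ == "__main__":
    for dst in ("172.16.109.65", "172.16.196.2", "172.16.196.9", "8.8.8.8"):
        print(dst, route_lookup(MASTER_ROUTES, dst))
```

The remote pod 172.16.109.65 resolves to tunl0 with next hop 192.168.0.12. The local pod 172.16.196.2 resolves directly to its cali interface. An unused address in the local block, such as 172.16.196.9, hits the blackhole.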

When busybox-77649b9c55-r6bx9 on k8s-master-1 pings busybox-77649b9c55-7d27b on k8s-node-1, the end-to-end packet flow is as follows:

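  1. 172.16.109.65 is on a different subnet from 172.16.196.1. In any case, the container's eth0 has a /32 address, so every other destination is off-link. The frame must therefore be addressed to the MAC of the gateway 169.254.1.1. To resolve that MAC, the container sends an ARP request out of eth0. Because of the veth-pair property, it arrives at cali12242800409 on the host (eth0 in the container -> cali12242800409 on the host). cali12242800409 has proxy ARP enabled (ARP spoofing), so it replies to the container with ee:ee:ee:ee:ee:ee.
  2. With the MAC resolved, the container builds the packet: src: 172.16.196.1, dst: 172.16.109.65, src_mac: 86:9c:03:9e:db:9f, dst_mac: ee:ee:ee:ee:ee:ee. The container's route lookup hits the default route, so the packet is sent out of eth0. Through the veth pair, it arrives at the host's cali12242800409 interface, where a capture shows exactly these MAC addresses: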
[root@k8s-master-1 ~]# tcpdump -i cali12242800409 icmp -e -Nnnvl
dropped privs to tcpdump
tcpdump: listening on cali12242800409, link-type EN10MB (Ethernet), snapshot length 262144 bytes
19:07:34.159492 86:9c:03:9e:db:9f > ee:ee:ee:ee:ee:ee, ethertype IPv4 (0x0800), length 98: (tos 0x0, ttl 64, id 63022, offset 0, flags [DF], proto ICMP (1), length 84)
    172.16.196.1 > 172.16.109.65: ICMP echo request, id 29, seq 0, length 64
19:07:34.161751 ee:ee:ee:ee:ee:ee > 86:9c:03:9e:db:9f, ethertype IPv4 (0x0800), length 98: (tos 0x0, ttl 62, id 35773, offset 0, flags [none], proto ICMP (1), length 84)
    172.16.109.65 > 172.16.196.1: ICMP echo reply, id 29, seq 0, length 64
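  3. When the packet reaches cali12242800409 on k8s-master-1, the host routes it. It matches the rule 172.16.109.64 192.168.0.12 255.255.255.192 UG 0 0 0 tunl0, so the packet is handed to tunl0. tunl0 is an IP tunnel device: when an IP packet is routed into it, the Linux ipip driver encapsulates it inside an IP packet on the host network and sends it to the k8s-node-1 host. The capture below was taken on tunl0 of k8s-master-1 while the busybox on k8s-master-1 was pinging the busybox on k8s-node-1. Note that the inner TTL of the request is already 63 here, because the host decremented it when forwarding.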
[root@k8s-master-1 ~]# tcpdump -i tunl0 -eNnnvl
dropped privs to tcpdump
tcpdump: listening on tunl0, link-type RAW (Raw IP), snapshot length 262144 bytes
19:17:08.330978 ip: (tos 0x0, ttl 63, id 12220, offset 0, flags [DF], proto ICMP (1), length 84)
    172.16.196.1 > 172.16.109.65: ICMP echo request, id 32, seq 0, length 64
19:17:08.332651 ip: (tos 0x0, ttl 63, id 62324, offset 0, flags [none], proto ICMP (1), length 84)
    172.16.109.65 > 172.16.196.1: ICMP echo reply, id 32, seq 0, length 64
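  4. After tunl0, the packet is encapsulated with an extra outer IPv4 header (IP protocol 4, IPIP) whose source is 192.168.0.11 and destination is 192.168.0.12. This is a network-layer header, not a transport-layer one, so the IP length grows from 84 to 104 bytes. The packet then leaves through the host's physical NIC ens160. Capture on ens160 of the k8s-master-1 host: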
[root@k8s-master-1 ~]# tcpdump -l -eni ens160 | grep -i icmp
# or capture only IPIP packets (IP protocol 4)
[root@k8s-master-1 ~]# tcpdump -i ens160 "ip proto 4" -ennvv
dropped privs to tcpdump
tcpdump: listening on ens160, link-type EN10MB (Ethernet), snapshot length 262144 bytes
20:03:02.812252 00:0c:29:b1:02:f0 > 00:0c:29:90:fa:e2, ethertype IPv4 (0x0800), length 118: (tos 0x0, ttl 63, id 55134, offset 0, flags [DF], proto IPIP (4), length 104)
    192.168.0.11 > 192.168.0.12: (tos 0x0, ttl 63, id 24637, offset 0, flags [DF], proto ICMP (1), length 84)
    172.16.196.1 > 172.16.109.65: ICMP echo request, id 43, seq 0, length 64
20:03:02.815592 00:0c:29:90:fa:e2 > 00:0c:29:b1:02:f0, ethertype IPv4 (0x0800), length 118: (tos 0x0, ttl 63, id 16578, offset 0, flags [none], proto IPIP (4), length 104)
    192.168.0.12 > 192.168.0.11: (tos 0x0, ttl 63, id 383, offset 0, flags [none], proto ICMP (1), length 84)
    172.16.109.65 > 172.16.196.1: ICMP echo reply, id 43, seq 0, length 64

[Figure 3]

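  5. ens160 on k8s-master-1 sends the packet to the NIC (ens160) of k8s-node-1. On k8s-node-1 the ipip driver strips the outer header, so the inner packet appears on tunl0. The host then routes it through the 172.16.109.65/32 route to cali1b0be572c83, which passes it over the veth pair to the pod's eth0.
# Interfaces on k8s-node-1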
[root@k8s-node-1 ~]# ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: ens160: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
    link/ether 00:0c:29:90:fa:e2 brd ff:ff:ff:ff:ff:ff
    altname enp2s0
    inet 192.168.0.12/24 brd 192.168.0.255 scope global noprefixroute ens160
       valid_lft forever preferred_lft forever
    inet6 fe80::20c:29ff:fe90:fae2/64 scope link noprefixroute
       valid_lft forever preferred_lft forever
3: tunl0@NONE: <NOARP,UP,LOWER_UP> mtu 1480 qdisc noqueue state UNKNOWN group default qlen 1000
    link/ipip 0.0.0.0 brd 0.0.0.0
    inet 172.16.109.64/32 scope global tunl0
       valid_lft forever preferred_lft forever
4: kube-ipvs0: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN group default
    link/ether c6:db:ff:2e:c2:2b brd ff:ff:ff:ff:ff:ff
    inet 10.96.0.1/32 scope global kube-ipvs0
       valid_lft forever preferred_lft forever
    inet 10.96.192.105/32 scope global kube-ipvs0
       valid_lft forever preferred_lft forever
    inet 10.96.0.10/32 scope global kube-ipvs0
       valid_lft forever preferred_lft forever
7: cali1b0be572c83@if4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1480 qdisc noqueue state UP group default
    link/ether ee:ee:ee:ee:ee:ee brd ff:ff:ff:ff:ff:ff link-netns cni-e85bba11-5d8e-ec3a-6e51-c68d1d27cb9f
    inet6 fe80::ecee:eeff:feee:eeee/64 scope link
       valid_lft forever preferred_lft forever

# Routing table on k8s-node-1 (target pod: 172.16.109.65)
[root@k8s-node-1 ~]# route -n
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
0.0.0.0         192.168.0.2     0.0.0.0         UG    100    0        0 ens160
172.16.109.64   0.0.0.0         255.255.255.192 U     0      0        0 *                # blackhole route for this node's /26 block: block addresses without a more specific /32 route are dropped
172.16.109.65   0.0.0.0         255.255.255.255 UH    0      0        0 cali1b0be572c83
172.16.196.0    192.168.0.11    255.255.255.192 UG    0      0        0 tunl0
192.168.0.0     0.0.0.0         255.255.255.0   U     100    0        0 ens160

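Based on the analysis above, the communication flow is:

  1. busybox eth0 (k8s-master-1) -> calixxx -> tunl0 -> ens160 (k8s-master-1) <----> ens160 (k8s-node-1) -> tunl0 -> calixxx -> busybox eth0 (k8s-node-1)
  2. The route on k8s-master-1 whose next hop is the k8s-node-1 host (192.168.0.12) sends the packet to tunl0, which carries the IP packet to the k8s-node-1 host inside an IPIP packet.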
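The encapsulation step in this flow can be sketched with struct: prepend a 20-byte outer IPv4 header whose protocol field is 4 (IP-in-IP, RFC 2003). This is a minimal illustration, not the kernel's ipip driver. The helper names are hypothetical, the inner ICMP bytes are zero-filled stand-ins, and IP options and fragmentation are ignored. The lengths line up with the captures: an 84-byte inner IP packet, a 104-byte outer packet, and a 118-byte Ethernet frame once the 14-byte Ethernet header is added.

```python
import ipaddress
import struct

IPPROTO_ICMP = 1
IPPROTO_IPIP = 4
ETH_HEADER_LEN = 14


def ipv4_checksum(header: bytes) -> int:
    """RFC 1071 one's-complement checksum over an IPv4 header."""
    if len(header) % 2:
        header += b"\x00"
    total = sum(struct.unpack(f"!{len(header) // 2}H", header))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def ipv4_header(src: str, dst: str, payload_len: int, proto: int,
                ttl: int = 64, ident: int = 0, df: bool = True) -> bytes:
    """Build a 20-byte IPv4 header (no options) with a valid checksum."""
    hdr = struct.pack(
        "!BBHHHBBH4s4s",
        (4 << 4) | 5,                  # version 4, IHL 5 (20 bytes)
        0,                             # TOS
        20 + payload_len,              # total length
        ident,
        0x4000 if df else 0,           # DF flag, fragment offset 0
        ttl,
        proto,
        0,                             # checksum placeholder
        ipaddress.IPv4Address(src).packed,
        ipaddress.IPv4Address(dst).packed,
    )
    return hdr[:10] + struct.pack("!H", ipv4_checksum(hdr)) + hdr[12:]


def ipip_encapsulate(inner: bytes, outer_src: str, outer_dst: str, ttl: int = 64) -> bytes:
    """Wrap an IP packet in an outer IPv4 header with protocol 4."""
    return ipv4_header(outer_src, outer_dst, len(inner), IPPROTO_IPIP, ttl=ttl) + inner


def ipip_decapsulate(packet: bytes) -> bytes:
    """Strip the outer IPv4 header and return the inner IP packet."""
    if packet[9] != IPPROTO_IPIP:
        raise ValueError("not an IPIP packet")
    return packet[(packet[0] & 0x0F) * 4:]


if __name__ == "__main__":
    # Inner packet: pod to pod, 64 bytes of (zeroed stand-in) ICMP data.
    inner = ipv4_header("172.16.196.1", "172.16.109.65", 64, IPPROTO_ICMP, ttl=63) + bytes(64)
    outer = ipip_encapsulate(inner, "192.168.0.11", "192.168.0.12", ttl=63)
    print(len(inner), len(outer), ETH_HEADER_LEN + len(outer))
```

The 20-byte outer header is also why tunl0 and the cali interfaces use an MTU of 1480 instead of the 1500 on ens160.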

IP-in-IP Communication Analysis (Same Host)

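Look at the network state of k8s-master-1. Calico assigns each node a small address block (here 172.16.196.0/26) and allocates pod IPs from it. It also creates an inbound /32 route for each pod, pointing at that pod's cali interface.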
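The addressing can be checked with the standard ipaddress module. Below is a small sketch using the values from this node's outputs: the block 172.16.196.0/26, tunl0 at 172.16.196.0, and the two pods. The function names are hypothetical. The second function shows why a pod sees no neighbor on-link: its eth0 is configured as a /32.

```python
import ipaddress


def block_summary(block_cidr: str, tunl0_ip: str, pod_ips: list) -> dict:
    """Summarize a node's address block as seen in the outputs above."""
    block = ipaddress.ip_network(block_cidr)
    return {
        "addresses": block.num_addresses,
        "netmask": str(block.netmask),
        "tunl0_in_block": ipaddress.ip_address(tunl0_ip) in block,
        "pods_in_block": all(ipaddress.ip_address(ip) in block for ip in pod_ips),
        "pod_routes": [f"{ip}/32" for ip in pod_ips],
    }


def on_link_from_pod(pod_cidr: str, other_ip: str) -> bool:
    """True if other_ip is directly reachable on the subnet of an interface configured as pod_cidr."""
    return ipaddress.ip_address(other_ip) in ipaddress.ip_interface(pod_cidr).network


if __name__ == "__main__":
    print(block_summary("172.16.196.0/26", "172.16.196.0", ["172.16.196.1", "172.16.196.2"]))
    # busybox's eth0 is 172.16.196.1/32, so coredns (172.16.196.2) is not on-link.
    print(on_link_from_pod("172.16.196.1/32", "172.16.196.2"))
```

The /26 has 64 addresses (netmask 255.255.255.192, the Genmask shown by route -n). Because busybox's address is a /32, even a pod in the same block has to be reached through the 169.254.1.1 gateway.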

# Pods on k8s-master-1
╰─ kubectl get pods -A --field-selector spec.nodeName="k8s-master-1" -o custom-columns=NAME:.metadata.name,IP:.status.podIP,HOST:.spec.nodeName
NAME                       IP             HOST
busybox-77649b9c55-r6bx9   172.16.196.1   k8s-master-1
calico-node-q6cv6          192.168.0.11   k8s-master-1
coredns-7c445fd599-glfl5   172.16.196.2   k8s-master-1

# Interfaces on k8s-master-1
[root@k8s-master-1 ~]# ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: ens160: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
    link/ether 00:0c:29:b1:02:f0 brd ff:ff:ff:ff:ff:ff
    altname enp2s0
    inet 192.168.0.11/24 brd 192.168.0.255 scope global noprefixroute ens160
       valid_lft forever preferred_lft forever
    inet6 fe80::20c:29ff:feb1:2f0/64 scope link noprefixroute
       valid_lft forever preferred_lft forever
3: tunl0@NONE: <NOARP,UP,LOWER_UP> mtu 1480 qdisc noqueue state UNKNOWN group default qlen 1000
    link/ipip 0.0.0.0 brd 0.0.0.0
    inet 172.16.196.0/32 scope global tunl0
       valid_lft forever preferred_lft forever
4: kube-ipvs0: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN group default
    link/ether fa:89:72:a0:30:aa brd ff:ff:ff:ff:ff:ff
    inet 10.96.0.1/32 scope global kube-ipvs0
       valid_lft forever preferred_lft forever
    inet 10.96.192.105/32 scope global kube-ipvs0
       valid_lft forever preferred_lft forever
    inet 10.96.0.10/32 scope global kube-ipvs0
       valid_lft forever preferred_lft forever
7: cali12242800409@if4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1480 qdisc noqueue state UP group default  # host-side peer of busybox
    link/ether ee:ee:ee:ee:ee:ee brd ff:ff:ff:ff:ff:ff link-netns cni-d0ea7fe4-4514-1aed-cfd6-fcaf904b837a
    inet6 fe80::ecee:eeff:feee:eeee/64 scope link
       valid_lft forever preferred_lft forever
8: calif57336c1ec9@if4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1480 qdisc noqueue state UP group default  # host-side peer of coredns
    link/ether ee:ee:ee:ee:ee:ee brd ff:ff:ff:ff:ff:ff link-netns cni-dd44e8c7-309b-f336-2e19-5c0f0b0f83ab
    inet6 fe80::ecee:eeff:feee:eeee/64 scope link
       valid_lft forever preferred_lft forever


# Interfaces of the busybox on k8s-master-1
/ # ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: tunl0@NONE: <NOARP> mtu 1480 qdisc noop qlen 1000
    link/ipip 0.0.0.0 brd 0.0.0.0
4: eth0@if7: <BROADCAST,MULTICAST,UP,LOWER_UP,M-DOWN> mtu 1480 qdisc noqueue  # peer is cali12242800409 on k8s-master-1
    link/ether 86:9c:03:9e:db:9f brd ff:ff:ff:ff:ff:ff
    inet 172.16.196.1/32 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::849c:3ff:fe9e:db9f/64 scope link
       valid_lft forever preferred_lft forever

# Routing table of the busybox on k8s-master-1
/ # route -n
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
0.0.0.0         169.254.1.1     0.0.0.0         UG    0      0        0 eth0
169.254.1.1     0.0.0.0         255.255.255.255 UH    0      0        0 eth0

# Interfaces of the coredns container on k8s-master-1 (find its PID, then enter its network namespace from the host)
[root@k8s-master-1 ~]# crictl inspect 9e93a905b7d87 | grep -i pid
    "pid": 23210,
            "pid": 1
            "type": "pid"

[root@k8s-master-1 ~]# nsenter -t 23210 -n ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: tunl0@NONE: <NOARP> mtu 1480 qdisc noop state DOWN group default qlen 1000
    link/ipip 0.0.0.0 brd 0.0.0.0
4: eth0@if8: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1480 qdisc noqueue state UP group default  # peer is calif57336c1ec9 on k8s-master-1
    link/ether 8e:0a:84:0d:c2:e4 brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 172.16.196.2/32 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::8c0a:84ff:fe0d:c2e4/64 scope link
       valid_lft forever preferred_lft forever


# Routing table on k8s-master-1
[root@k8s-master-1 ~]# route -n
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
0.0.0.0         192.168.0.2     0.0.0.0         UG    100    0        0 ens160
172.16.109.64   192.168.0.12    255.255.255.192 UG    0      0        0 tunl0            # pods on k8s-node-1: next hop is the k8s-node-1 host IP, via tunl0
172.16.196.0    0.0.0.0         255.255.255.192 U     0      0        0 *                # blackhole route for this node's /26 block: block addresses without a more specific /32 route are dropped
172.16.196.1    0.0.0.0         255.255.255.255 UH    0      0        0 cali12242800409  # route to the busybox pod
172.16.196.2    0.0.0.0         255.255.255.255 UH    0      0        0 calif57336c1ec9  # route to the coredns pod
192.168.0.0     0.0.0.0         255.255.255.0   U     100    0        0 ens160

When busybox (172.16.196.1) on k8s-master-1 pings coredns (172.16.196.2) on the same node, the main flow is:

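  1. 172.16.196.1 and 172.16.196.2 are in the same /26 block. However, busybox's routing table has no route for 172.16.196.0/26, because its own address is a /32. It therefore resolves the MAC address of the default gateway 169.254.1.1. Because busybox's peer interface cali12242800409 has proxy ARP enabled, the reply is ee:ee:ee:ee:ee:ee. The container then builds the packet src_addr: 172.16.196.1 dst_addr: 172.16.196.2 src_mac: 86:9c:03:9e:db:9f dst_mac: ee:ee:ee:ee:ee:ee, which is delivered to cali12242800409 on the host.
  2. When cali12242800409 on the k8s-master-1 host receives the packet, the host looks up its routing table and matches 172.16.196.2 0.0.0.0 255.255.255.255 UH 0 0 0 calif57336c1ec9. It hands the packet to calif57336c1ec9 (the peer of the coredns container's eth0), which delivers it to coredns over the veth pair. The packet never touches tunl0 or ens160.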
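The two cases differ only in which route the source host matches, as this small hop-by-hop tracer shows. The pod routes are transcribed from the two hosts' route -n outputs. The default and blackhole routes are omitted for brevity, so an unknown destination raises LookupError. The HOSTS structure, the function names, and the hop labels are hypothetical.

```python
import ipaddress

# (prefix, interface, gateway) per host, transcribed from the route tables above.
HOSTS = {
    "k8s-master-1": {"ip": "192.168.0.11", "routes": [
        ("172.16.109.64/26", "tunl0", "192.168.0.12"),
        ("172.16.196.1/32", "cali12242800409", None),
        ("172.16.196.2/32", "calif57336c1ec9", None),
    ]},
    "k8s-node-1": {"ip": "192.168.0.12", "routes": [
        ("172.16.196.0/26", "tunl0", "192.168.0.11"),
        ("172.16.109.65/32", "cali1b0be572c83", None),
    ]},
}


def lookup(routes, dst):
    """Longest-prefix match returning (interface, gateway)."""
    addr = ipaddress.ip_address(dst)
    hits = [(ipaddress.ip_network(p), dev, gw) for p, dev, gw in routes
            if addr in ipaddress.ip_network(p)]
    if not hits:
        raise LookupError(f"no route to {dst}")
    _, dev, gw = max(hits, key=lambda hit: hit[0].prefixlen)
    return dev, gw


def trace(src_host, src_veth, dst_ip):
    """List the devices a pod-to-pod packet crosses, starting at the source pod."""
    hops = ["eth0(src pod)", src_veth]
    dev, gw = lookup(HOSTS[src_host]["routes"], dst_ip)
    if dev == "tunl0":
        # Cross-host: IPIP over the physical NICs, decapsulated on the peer host.
        peer = next(name for name, h in HOSTS.items() if h["ip"] == gw)
        hops += ["tunl0", f"ens160({src_host})", f"ens160({peer})", "tunl0"]
        dev, gw = lookup(HOSTS[peer]["routes"], dst_ip)
    return hops + [dev, "eth0(dst pod)"]


if __name__ == "__main__":
    print(trace("k8s-master-1", "cali12242800409", "172.16.196.2"))   # same host
    print(trace("k8s-master-1", "cali12242800409", "172.16.109.65"))  # cross host
```

In the same-host trace, the packet goes straight from one cali interface to the other through the host routing table. Only the cross-host trace passes through tunl0 and the physical NICs.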
