Kubernetes kube-proxy Forwarding Analysis

Environment

Node IP: 192.168.0.11
Service: Nginx with 3 replicas
Service CLUSTER-IP: 10.254.198.92
Service cluster port: 80
Service NodePort: 32110

How is traffic to a Service handled?

Step 1: Direct traffic into the KUBE-SERVICES chain

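A service created in Kubernetes is exposed through NodePort or ClusterIP, but the actual work is done by the pods behind it (for example 172.17.0.2 and 172.17.0.4 in this environment). kube-proxy is responsible for forwarding between those access points and the pods. In iptables mode, the KUBE-SERVICES chain in the nat table does this job. That chain is described in detail later; first, let's look at how traffic destined for a Service is directed into KUBE-SERVICES.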

When the node itself accesses the service through NodePort or ClusterIP, the traffic passes through the following main iptables tables and chains:

NAT OUTPUT
FILTER OUTPUT
NAT POSTROUTING

When an external client accesses the service through NodePort, the traffic passes through the following main iptables tables and chains:

NAT PREROUTING
FILTER FORWARD
NAT POSTROUTING

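Analysis:
These two kinds of access pass through the OUTPUT chain and the PREROUTING chain of the nat table respectively, so traffic can be intercepted at these two points and sent to the KUBE-SERVICES chain.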

Verification:
NAT OUTPUT chain configuration:

Chain OUTPUT (policy ACCEPT)
target     prot opt source               destination         
LOG        all  --  0.0.0.0/0            0.0.0.0/0            LOG flags 0 level 4 prefix "** NAT OUTPUT **"
KUBE-SERVICES  all  --  0.0.0.0/0            0.0.0.0/0            /* kubernetes service portals */
...

NAT PREROUTING chain configuration:

Chain PREROUTING (policy ACCEPT)
target     prot opt source               destination         
LOG        all  --  0.0.0.0/0            0.0.0.0/0            LOG flags 0 level 4 prefix "** NAT PREROUTING **"
KUBE-SERVICES  all  --  0.0.0.0/0            0.0.0.0/0            /* kubernetes service portals */
...

Step 2: The KUBE-SERVICES chain forwards the traffic

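(1) Traffic to the ClusterIP (10.254.198.92:80) and traffic to the NodePort are handled as two separate classes. The following two rules match ClusterIP traffic and NodePort traffic respectively.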

Chain KUBE-SERVICES (2 references)
target     prot opt source               destination         
...
KUBE-SVC-I64SNEMOLCWHJHS3  tcp  --  0.0.0.0/0            10.254.198.92        /* default/nginx-service-nodeport: cluster IP */ tcp dpt:80
KUBE-NODEPORTS  all  --  0.0.0.0/0            0.0.0.0/0            /* kubernetes service nodeports; NOTE: this must be the last rule in this chain */ ADDRTYPE match dst-type LOCAL
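
The two rules above can be read as a simple top-to-bottom decision. The following is a minimal Python sketch of that decision, not kube-proxy's implementation. The `Packet` type, the `kube_services_target` function and the `NODE_LOCAL_ADDRS` set are hypothetical names introduced for illustration, and the sketch assumes the node IP is the only local address.

```python
from dataclasses import dataclass
from typing import Optional

CLUSTER_IP = "10.254.198.92"
CLUSTER_PORT = 80
# Assumption: the node IP is the only address this sketch treats as local.
NODE_LOCAL_ADDRS = {"192.168.0.11"}


@dataclass(frozen=True)
class Packet:
    proto: str
    dst: str
    dport: int


def kube_services_target(pkt: Packet) -> Optional[str]:
    """Return the chain the packet would jump to from KUBE-SERVICES."""
    # First rule: tcp traffic addressed to the ClusterIP on port 80.
    if pkt.proto == "tcp" and pkt.dst == CLUSTER_IP and pkt.dport == CLUSTER_PORT:
        return "KUBE-SVC-I64SNEMOLCWHJHS3"
    # Last rule: any protocol whose destination address is local to the node.
    if pkt.dst in NODE_LOCAL_ADDRS:
        return "KUBE-NODEPORTS"
    # Neither rule matched, so the packet is not redirected here.
    return None


print(kube_services_target(Packet("tcp", "10.254.198.92", 80)))
print(kube_services_target(Packet("tcp", "192.168.0.11", 32110)))
```

Note that the NodePort rule in KUBE-SERVICES does not check the port. The port match (tcp dpt:32110) happens inside KUBE-NODEPORTS.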

(2) Traffic to the ClusterIP is processed further and is finally distributed to the backend pods.

Chain KUBE-SVC-I64SNEMOLCWHJHS3 (2 references)
target     prot opt source               destination         
KUBE-SEP-MMWJ6M2J72TU3J64  all  --  0.0.0.0/0            0.0.0.0/0            /* default/nginx-service-nodeport: */ statistic mode random probability 0.33332999982
KUBE-SEP-GRLEVIWNO4P37GSQ  all  --  0.0.0.0/0            0.0.0.0/0            /* default/nginx-service-nodeport: */ statistic mode random probability 0.50000000000
KUBE-SEP-74XRUOWV76LDS3ID  all  --  0.0.0.0/0            0.0.0.0/0            /* default/nginx-service-nodeport: */

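Analysis: there are 3 backend pods, and the rules above distribute traffic using the statistic match in random mode. The probabilities differ from rule to rule, but because the rules are evaluated in order, the resulting split is roughly even: the first rule takes about 1/3 of the traffic, the second takes 1/2 of the remaining 2/3 (again about 1/3), and the last rule takes everything that is left (about 1/3). Next, look at the rules in the sub-chain of one of the pods.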
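
The share of traffic each KUBE-SEP rule receives can be worked out numerically. This is a minimal sketch, assuming the rules are evaluated top to bottom and that a rule without a statistic match always matches. The function name `effective_shares` is hypothetical.

```python
def effective_shares(probabilities):
    """Turn per-rule match probabilities into overall traffic shares.

    Each rule only sees the traffic that earlier rules did not match.
    """
    shares = []
    remaining = 1.0
    for p in probabilities:
        shares.append(remaining * p)
        remaining *= 1.0 - p
    return shares


# Probabilities from the KUBE-SVC-I64SNEMOLCWHJHS3 listing; the last rule
# has no statistic match, so it is modelled as probability 1.0.
shares = effective_shares([0.33332999982, 0.50000000000, 1.0])
for chain, share in zip(
    ["KUBE-SEP-MMWJ6M2J72TU3J64", "KUBE-SEP-GRLEVIWNO4P37GSQ", "KUBE-SEP-74XRUOWV76LDS3ID"],
    shares,
):
    print(f"{chain}: {share:.5f}")
```

Each of the three shares comes out close to 1/3, so the differing probabilities in the listing do not mean the pods are loaded unevenly.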

Chain KUBE-SEP-MMWJ6M2J72TU3J64 (1 references)
target     prot opt source               destination         
KUBE-MARK-MASQ  all  --  172.17.0.2           0.0.0.0/0            /* default/nginx-service-nodeport: */
DNAT       tcp  --  0.0.0.0/0            0.0.0.0/0            /* default/nginx-service-nodeport: */ tcp to:172.17.0.2:80

Analysis: the DNAT rule shows that the traffic is forwarded to the pod at 172.17.0.2:80. The other two KUBE-SEP chains are configured in the same way.
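
The effect of the DNAT rule on a packet can be illustrated with a small model: only the destination address and port change, while the source is left alone. This is a sketch for illustration only. The `Packet` type and the `dnat` function are hypothetical names, not part of any iptables or Kubernetes API.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Packet:
    src: str
    dst: str
    dport: int


def dnat(pkt: Packet, to_ip: str, to_port: int) -> Packet:
    """Return a copy of the packet with its destination rewritten."""
    return replace(pkt, dst=to_ip, dport=to_port)


# A request from the node to the ClusterIP, before and after the
# "tcp to:172.17.0.2:80" rule in KUBE-SEP-MMWJ6M2J72TU3J64.
before = Packet(src="192.168.0.11", dst="10.254.198.92", dport=80)
after = dnat(before, "172.17.0.2", 80)
print(before)
print(after)
```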

(3) Traffic to the NodePort is processed further and is finally distributed to the backend pods.

Chain KUBE-NODEPORTS (1 references)
target     prot opt source               destination         
KUBE-MARK-MASQ  tcp  --  0.0.0.0/0            0.0.0.0/0            /* default/nginx-service-nodeport: */ tcp dpt:32110
KUBE-SVC-I64SNEMOLCWHJHS3  tcp  --  0.0.0.0/0            0.0.0.0/0            /* default/nginx-service-nodeport: */ tcp dpt:32110

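Analysis:
The first rule (KUBE-MARK-MASQ) marks the packet (MARK or 0x4000) and returns, after which the second rule is evaluated.
The second rule jumps to KUBE-SVC-I64SNEMOLCWHJHS3, the same chain that ClusterIP traffic goes through, which distributes the traffic to the backend pods: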

Chain KUBE-SVC-I64SNEMOLCWHJHS3 (2 references)
target     prot opt source               destination         
KUBE-SEP-MMWJ6M2J72TU3J64  all  --  0.0.0.0/0            0.0.0.0/0            /* default/nginx-service-nodeport: */ statistic mode random probability 0.33332999982
KUBE-SEP-GRLEVIWNO4P37GSQ  all  --  0.0.0.0/0            0.0.0.0/0            /* default/nginx-service-nodeport: */ statistic mode random probability 0.50000000000
KUBE-SEP-74XRUOWV76LDS3ID  all  --  0.0.0.0/0            0.0.0.0/0            /* default/nginx-service-nodeport: */
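
The "MARK or 0x4000" operation performed by KUBE-MARK-MASQ is a bitwise OR on the packet mark. A minimal sketch of that arithmetic follows. The function names are hypothetical, and the idea that a later rule tests this bit to decide on masquerading is an assumption about kube-proxy's usual behavior, since no such rule is shown in this article.

```python
MASQ_MARK = 0x4000  # bit set by KUBE-MARK-MASQ in this environment


def mark_for_masquerade(mark: int) -> int:
    """Set the 0x4000 bit while leaving any other mark bits untouched."""
    return mark | MASQ_MARK


def has_masquerade_mark(mark: int) -> bool:
    """Check whether the 0x4000 bit is present in the mark."""
    return (mark & MASQ_MARK) != 0


print(hex(mark_for_masquerade(0x0)))
print(has_masquerade_mark(mark_for_masquerade(0x1)))
```

Because OR only sets a bit, applying the mark twice gives the same result as applying it once.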

Overall flow diagram

[Image 1: overall kube-proxy forwarding flow]

Verification

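Verification works by inspecting the iptables logs to trace the traffic path and see how the packet changes. A LOG rule can be added to a specific chain with a command like the following: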

[root@hjdevelop ~]# iptables -I KUBE-NODEPORTS -s 192.168.0.0/16 -j LOG --log-prefix "** NAT KUBE-NODEPORTS **" -t nat 

NodePort access from the node itself

curl 192.168.0.11:32110

Jan 21 17:03:39 hjdevelop kernel: ** NAT OUTPUT **IN= OUT=lo SRC=192.168.0.11 DST=192.168.0.11 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=39079 DF PROTO=TCP SPT=48614 DPT=32110 WINDOW=43690 RES=0x00 SYN URGP=0 
Jan 21 17:03:39 hjdevelop kernel: ** NAT KUBE-SERVICES **IN= OUT=lo SRC=192.168.0.11 DST=192.168.0.11 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=39079 DF PROTO=TCP SPT=48614 DPT=32110 WINDOW=43690 RES=0x00 SYN URGP=0 
Jan 21 17:03:39 hjdevelop kernel: ** NAT KUBE-NODEPORTS **IN= OUT=lo SRC=192.168.0.11 DST=192.168.0.11 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=39079 DF PROTO=TCP SPT=48614 DPT=32110 WINDOW=43690 RES=0x00 SYN URGP=0 
Jan 21 17:03:39 hjdevelop kernel: ** NAT KUBE-MARK-MASQ  **IN= OUT=lo SRC=192.168.0.11 DST=192.168.0.11 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=39079 DF PROTO=TCP SPT=48614 DPT=32110 WINDOW=43690 RES=0x00 SYN URGP=0 
Jan 21 17:03:39 hjdevelop kernel: ** NAT KUBE-SVC **IN= OUT=lo SRC=192.168.0.11 DST=192.168.0.11 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=39079 DF PROTO=TCP SPT=48614 DPT=32110 WINDOW=43690 RES=0x00 SYN URGP=0 MARK=0x4000 
Jan 21 17:03:39 hjdevelop kernel: ** NAT KUBE-SEP-MMWJ6M2J72TU3IN= OUT=lo SRC=192.168.0.11 DST=192.168.0.11 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=39079 DF PROTO=TCP SPT=48614 DPT=32110 WINDOW=43690 RES=0x00 SYN URGP=0 MARK=0x4000 
Jan 21 17:03:39 hjdevelop kernel: ** Filter OUTPUT **IN= OUT=lo SRC=192.168.0.11 DST=172.17.0.2 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=39079 DF PROTO=TCP SPT=48614 DPT=80 WINDOW=43690 RES=0x00 SYN URGP=0 MARK=0x4000 
Jan 21 17:03:39 hjdevelop kernel: ** NAT POSTROUTING **IN= OUT=docker0 SRC=192.168.0.11 DST=172.17.0.2 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=39079 DF PROTO=TCP SPT=48614 DPT=80 WINDOW=43690 RES=0x00 SYN URGP=0 MARK=0x4000 

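The iptables log confirms the following order:
NAT OUTPUT -> KUBE-SERVICES -> KUBE-NODEPORTS -> KUBE-MARK-MASQ -> KUBE-SVC -> KUBE-SEP-MMWJ6M2J72TU3J64 (this step rewrites the destination IP to 172.17.0.2) -> Filter OUTPUT -> NAT POSTROUTING
A LOG prefix holds at most 29 characters, so the chain name appears in the log as KUBE-SEP-MMWJ6M2J72TU3, and the IN that follows it belongs to the IN= field.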
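
Reading the path out of the kernel log by eye is error-prone, so a small parser can help. The sketch below splits a log line into its LOG prefix and its KEY=VALUE fields. It assumes the line format seen in this article (the prefix sits between "kernel: " and the first "IN="), and `parse_log_line` is a hypothetical helper name. The two sample lines are copied from the log above.

```python
import re

# The prefix is everything between "kernel: " and the first "IN=".
LINE_RE = re.compile(r"kernel: (?P<prefix>.*?)(?P<fields>IN=.*)$")
FIELD_RE = re.compile(r"([A-Z]+)=(\S*)")


def parse_log_line(line):
    """Return (prefix, fields) for an iptables LOG line, or None if no match."""
    m = LINE_RE.search(line)
    if m is None:
        return None
    fields = dict(FIELD_RE.findall(m.group("fields")))
    return m.group("prefix").strip(), fields


SEP_LINE = (
    "Jan 21 17:03:39 hjdevelop kernel: ** NAT KUBE-SEP-MMWJ6M2J72TU3IN= OUT=lo "
    "SRC=192.168.0.11 DST=192.168.0.11 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=39079 DF "
    "PROTO=TCP SPT=48614 DPT=32110 WINDOW=43690 RES=0x00 SYN URGP=0 MARK=0x4000 "
)
FILTER_LINE = (
    "Jan 21 17:03:39 hjdevelop kernel: ** Filter OUTPUT **IN= OUT=lo "
    "SRC=192.168.0.11 DST=172.17.0.2 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=39079 DF "
    "PROTO=TCP SPT=48614 DPT=80 WINDOW=43690 RES=0x00 SYN URGP=0 MARK=0x4000 "
)

for line in (SEP_LINE, FILTER_LINE):
    prefix, fields = parse_log_line(line)
    print(prefix, fields["DST"], fields["DPT"])
```

Comparing DST and DPT between consecutive lines shows where the rewrite happened: the destination is still 192.168.0.11:32110 when the packet enters the KUBE-SEP chain and is 172.17.0.2:80 by the time it reaches Filter OUTPUT.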

ClusterIP access

Jan 21 17:08:36 hjdevelop kernel: ** NAT OUTPUT **IN= OUT=eth0 SRC=192.168.0.11 DST=10.254.198.92 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=23348 DF PROTO=TCP SPT=43130 DPT=80 WINDOW=28200 RES=0x00 SYN URGP=0 
Jan 21 17:08:36 hjdevelop kernel: ** NAT KUBE-SERVICES **IN= OUT=eth0 SRC=192.168.0.11 DST=10.254.198.92 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=23348 DF PROTO=TCP SPT=43130 DPT=80 WINDOW=28200 RES=0x00 SYN URGP=0 
Jan 21 17:08:36 hjdevelop kernel: ** NAT KUBE-SVC **IN= OUT=eth0 SRC=192.168.0.11 DST=10.254.198.92 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=23348 DF PROTO=TCP SPT=43130 DPT=80 WINDOW=28200 RES=0x00 SYN URGP=0 
Jan 21 17:08:36 hjdevelop kernel: ** NAT KUBE-SEP-74XRUOWV76LDSIN= OUT=eth0 SRC=192.168.0.11 DST=10.254.198.92 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=23348 DF PROTO=TCP SPT=43130 DPT=80 WINDOW=28200 RES=0x00 SYN URGP=0 
Jan 21 17:08:36 hjdevelop kernel: ** Filter OUTPUT **IN= OUT=eth0 SRC=192.168.0.11 DST=172.17.0.4 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=23348 DF PROTO=TCP SPT=43130 DPT=80 WINDOW=28200 RES=0x00 SYN URGP=0 
Jan 21 17:08:36 hjdevelop kernel: ** NAT POSTROUTING **IN= OUT=docker0 SRC=192.168.0.11 DST=172.17.0.4 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=23348 DF PROTO=TCP SPT=43130 DPT=80 WINDOW=28200 RES=0x00 SYN URGP=0 

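The iptables log confirms the following order:
NAT OUTPUT -> KUBE-SERVICES -> KUBE-SVC -> KUBE-SEP-74XRUOWV76LDS3ID (shortened to KUBE-SEP-74XRUOWV76LDS in the 29-character log prefix; this step rewrites the destination IP to 172.17.0.4) -> Filter OUTPUT -> NAT POSTROUTING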
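
In the logs above, the last chain name appears as KUBE-SEP-74XRUOWV76LDS followed directly by IN=. That is a shortened chain name running into the IN= field, because a LOG prefix is limited to 29 characters. The sketch below checks that length and maps the shortened name back to a full chain name. The helper names are hypothetical, and the list of known chains is taken from the KUBE-SVC listing in this article.

```python
MAX_LOG_PREFIX = 29  # maximum length of an iptables --log-prefix string

KNOWN_SEP_CHAINS = [
    "KUBE-SEP-MMWJ6M2J72TU3J64",
    "KUBE-SEP-GRLEVIWNO4P37GSQ",
    "KUBE-SEP-74XRUOWV76LDS3ID",
]


def fits_log_prefix(prefix: str) -> bool:
    """True if the prefix fits within the LOG prefix length limit."""
    return len(prefix) <= MAX_LOG_PREFIX


def expand_chain(fragment: str, chains=KNOWN_SEP_CHAINS):
    """Return the known chains whose names start with the logged fragment."""
    return [c for c in chains if c.startswith(fragment)]


logged = "** NAT KUBE-SEP-74XRUOWV76LDS"
print(len(logged), fits_log_prefix(logged))
print(fits_log_prefix("** NAT KUBE-SEP-74XRUOWV76LDS3ID **"))
print(expand_chain("KUBE-SEP-74XRUOWV76LDS"))
```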

NodePort access from an external network

Jan 21 17:12:34 hjdevelop kernel: ** NAT PREROUTING **IN=eth0 OUT= MAC=fa:16:3e:32:cd:fb:fa:16:3e:78:fa:aa:08:00 SRC=115.236.50.21 DST=192.168.0.11 LEN=52 TOS=0x00 PREC=0x00 TTL=116 ID=21602 DF PROTO=TCP SPT=60079 DPT=32110 WINDOW=65535 RES=0x00 SYN URGP=0 
Jan 21 17:12:34 hjdevelop kernel: ** NAT KUBE-SERVICES **IN=eth0 OUT= MAC=fa:16:3e:32:cd:fb:fa:16:3e:78:fa:aa:08:00 SRC=115.236.50.21 DST=192.168.0.11 LEN=52 TOS=0x00 PREC=0x00 TTL=116 ID=21602 DF PROTO=TCP SPT=60079 DPT=32110 WINDOW=65535 RES=0x00 SYN URGP=0 
Jan 21 17:12:34 hjdevelop kernel: ** NAT KUBE-NODEPORTS **IN=eth0 OUT= MAC=fa:16:3e:32:cd:fb:fa:16:3e:78:fa:aa:08:00 SRC=115.236.50.21 DST=192.168.0.11 LEN=52 TOS=0x00 PREC=0x00 TTL=116 ID=21602 DF PROTO=TCP SPT=60079 DPT=32110 WINDOW=65535 RES=0x00 SYN URGP=0 
Jan 21 17:12:34 hjdevelop kernel: ** NAT KUBE-MARK-MASQ  **IN=eth0 OUT= MAC=fa:16:3e:32:cd:fb:fa:16:3e:78:fa:aa:08:00 SRC=115.236.50.21 DST=192.168.0.11 LEN=52 TOS=0x00 PREC=0x00 TTL=116 ID=21602 DF PROTO=TCP SPT=60079 DPT=32110 WINDOW=65535 RES=0x00 SYN URGP=0 
Jan 21 17:12:34 hjdevelop kernel: ** NAT KUBE-SVC **IN=eth0 OUT= MAC=fa:16:3e:32:cd:fb:fa:16:3e:78:fa:aa:08:00 SRC=115.236.50.21 DST=192.168.0.11 LEN=52 TOS=0x00 PREC=0x00 TTL=116 ID=21602 DF PROTO=TCP SPT=60079 DPT=32110 WINDOW=65535 RES=0x00 SYN URGP=0 MARK=0x4000 
Jan 21 17:12:34 hjdevelop kernel: ** NAT KUBE-SEP-74XRUOWV76LDSIN=eth0 OUT= MAC=fa:16:3e:32:cd:fb:fa:16:3e:78:fa:aa:08:00 SRC=115.236.50.21 DST=192.168.0.11 LEN=52 TOS=0x00 PREC=0x00 TTL=116 ID=21602 DF PROTO=TCP SPT=60079 DPT=32110 WINDOW=65535 RES=0x00 SYN URGP=0 MARK=0x4000 
Jan 21 17:12:34 hjdevelop kernel: ** FILTER FORWARD **IN=eth0 OUT=docker0 MAC=fa:16:3e:32:cd:fb:fa:16:3e:78:fa:aa:08:00 SRC=115.236.50.21 DST=172.17.0.4 LEN=52 TOS=0x00 PREC=0x00 TTL=115 ID=21602 DF PROTO=TCP SPT=60079 DPT=80 WINDOW=65535 RES=0x00 SYN URGP=0 MARK=0x4000

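The iptables log confirms the following order:
NAT PREROUTING -> KUBE-SERVICES -> KUBE-NODEPORTS -> KUBE-MARK-MASQ -> KUBE-SVC -> KUBE-SEP-74XRUOWV76LDS3ID (shortened to KUBE-SEP-74XRUOWV76LDS in the 29-character log prefix; this step rewrites the destination IP to 172.17.0.4) -> Filter FORWARD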
