Node IP: 192.168.0.11
Service: Nginx with 3 replicas
Service ClusterIP: 10.254.198.92
Service cluster port: 80
Service NodePort: 32110
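A Kubernetes Service is exposed through a NodePort or a ClusterIP, but requests are actually served by the backend pods (here 172.17.0.2, 172.17.0.3 and 172.17.0.4). kube-proxy forwards traffic between the Service address and those pods. In iptables mode, this work is done by the KUBE-SERVICES chain in the nat table; its contents are covered in detail below. First, let's look at how traffic addressed to a Service gets into the KUBE-SERVICES chain.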
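When the Service is accessed from the node itself, through either the NodePort or the ClusterIP, the packet passes through these main iptables tables and chains:

Table | Chain |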
---|---|
NAT | OUTPUT |
FILTER | OUTPUT |
NAT | POSTROUTING |

When the Service is accessed from outside the node through the NodePort, the packet passes through these main tables and chains:

Table | Chain |
---|---|
NAT | PREROUTING |
FILTER | FORWARD |
NAT | POSTROUTING |
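
Analysis:
The two access patterns pass through the nat OUTPUT chain and the nat PREROUTING chain respectively. These two chains are therefore where the traffic can be intercepted and sent to KUBE-SERVICES. This also matches iptables' rules for DNAT, which is only valid in the nat table's PREROUTING and OUTPUT chains and in user-defined chains called only from them.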
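The reasoning above can be sketched in Python. The two traversal lists are copied from the tables above. The set of chains where DNAT is allowed follows the iptables DNAT target documentation. `dnat_hook` is a hypothetical helper, and the model ignores the raw and mangle tables.

```python
# Main (table, chain) hops, copied from the two tables above.
LOCAL_PATH = [("nat", "OUTPUT"), ("filter", "OUTPUT"), ("nat", "POSTROUTING")]
EXTERNAL_PATH = [("nat", "PREROUTING"), ("filter", "FORWARD"), ("nat", "POSTROUTING")]

# Built-in chains where the DNAT target is valid (per the iptables DNAT docs).
DNAT_CHAINS = {("nat", "PREROUTING"), ("nat", "OUTPUT")}


def dnat_hook(path):
    """Return the first (table, chain) on the path where DNAT may be applied."""
    for hop in path:
        if hop in DNAT_CHAINS:
            return hop
    return None


if __name__ == "__main__":
    print("local:", dnat_hook(LOCAL_PATH))        # local: ('nat', 'OUTPUT')
    print("external:", dnat_hook(EXTERNAL_PATH))  # external: ('nat', 'PREROUTING')
```

Both paths reach a DNAT-capable nat chain before any filter chain. That is why the jump to KUBE-SERVICES sits in exactly these two chains.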
Verification:

Configuration of the nat OUTPUT chain:

```
Chain OUTPUT (policy ACCEPT)
target prot opt source destination
LOG all -- 0.0.0.0/0 0.0.0.0/0 LOG flags 0 level 4 prefix "** NAT OUTPUT **"
KUBE-SERVICES all -- 0.0.0.0/0 0.0.0.0/0 /* kubernetes service portals */
...
```
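Configuration of the nat PREROUTING chain:

```
Chain PREROUTING (policy ACCEPT)
target prot opt source destination
LOG all -- 0.0.0.0/0 0.0.0.0/0 LOG flags 0 level 4 prefix "** NAT PREROUTING **"
KUBE-SERVICES all -- 0.0.0.0/0 0.0.0.0/0 /* kubernetes service portals */
...
```

In both chains, the LOG rule at the top was added for tracing in this experiment. The jump to KUBE-SERVICES is the rule kube-proxy installs.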
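(1) KUBE-SERVICES splits the traffic into two classes: traffic to the ClusterIP (10.254.198.92:80) and traffic to a NodePort. The two rules below match these classes respectively.

```
Chain KUBE-SERVICES (2 references)
target prot opt source destination
...
KUBE-SVC-I64SNEMOLCWHJHS3 tcp -- 0.0.0.0/0 10.254.198.92 /* default/nginx-service-nodeport: cluster IP */ tcp dpt:80
KUBE-NODEPORTS all -- 0.0.0.0/0 0.0.0.0/0 /* kubernetes service nodeports; NOTE: this must be the last rule in this chain */ ADDRTYPE match dst-type LOCAL
```

The KUBE-NODEPORTS rule matches any packet whose destination is a local address of the node (`ADDRTYPE match dst-type LOCAL`). As its comment notes, it must stay the last rule so that the more specific per-Service rules are evaluated first.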
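A minimal Python model of the two KUBE-SERVICES rules shown above, for reasoning about which branch a packet takes. It assumes the node's local addresses are just 192.168.0.11 and 127.0.0.1; a real node has more, such as the docker0 address. `kube_services` is a hypothetical helper, not part of kube-proxy.

```python
import ipaddress

CLUSTER_IP = ipaddress.ip_address("10.254.198.92")
CLUSTER_PORT = 80
# Assumption: the node's only local addresses (real nodes have more, e.g. docker0).
LOCAL_ADDRS = {ipaddress.ip_address("192.168.0.11"), ipaddress.ip_address("127.0.0.1")}


def kube_services(dst, dport, proto="tcp"):
    """Mimic the two KUBE-SERVICES rules above; return the jump target or None."""
    dst = ipaddress.ip_address(dst)
    # Rule for this Service: tcp to the ClusterIP on port 80.
    if proto == "tcp" and dst == CLUSTER_IP and dport == CLUSTER_PORT:
        return "KUBE-SVC-I64SNEMOLCWHJHS3"
    # Last rule: any protocol, destination is a local address (dst-type LOCAL).
    if dst in LOCAL_ADDRS:
        return "KUBE-NODEPORTS"
    # No match: the packet returns to the calling chain (OUTPUT or PREROUTING).
    return None


print(kube_services("10.254.198.92", 80))   # KUBE-SVC-I64SNEMOLCWHJHS3
print(kube_services("192.168.0.11", 32110))  # KUBE-NODEPORTS
```

The port check for 32110 is deliberately absent here. In the real rules, the NodePort is matched one level down, inside KUBE-NODEPORTS.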
(2) Traffic to the ClusterIP is processed further and finally distributed to the backend pods.

```
Chain KUBE-SVC-I64SNEMOLCWHJHS3 (2 references)
target prot opt source destination
KUBE-SEP-MMWJ6M2J72TU3J64 all -- 0.0.0.0/0 0.0.0.0/0 /* default/nginx-service-nodeport: */ statistic mode random probability 0.33332999982
KUBE-SEP-GRLEVIWNO4P37GSQ all -- 0.0.0.0/0 0.0.0.0/0 /* default/nginx-service-nodeport: */ statistic mode random probability 0.50000000000
KUBE-SEP-74XRUOWV76LDS3ID all -- 0.0.0.0/0 0.0.0.0/0 /* default/nginx-service-nodeport: */
```
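Analysis: there are 3 backend pods, and the rules spread traffic across them with the `statistic` match in `random` mode. The probabilities (0.333, 0.5, then an unconditional last rule) look uneven. However, the rules are evaluated in order, and each rule only sees the packets that the earlier rules did not take. Each pod's effective share is therefore 1/3, then 2/3 × 1/2 = 1/3, then the remaining 1/3: an even split. Next, let's look at the rules in one pod's sub-chain.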
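The arithmetic behind the even split can be checked with a short Python sketch. It assumes kube-proxy's usual scheme: with n endpoints, rule i (counting from 0) gets probability 1/(n - i). This reproduces the 1/3, 1/2, 1 sequence above. Exact fractions stand in for the printed 0.33332999982, and both helper names are hypothetical.

```python
from fractions import Fraction


def rule_probs(n):
    """Per-rule probabilities for n endpoints: rule i gets 1/(n - i).

    The last rule (i = n - 1) gets probability 1, like the rule without
    a statistic match in KUBE-SVC-I64SNEMOLCWHJHS3.
    """
    return [Fraction(1, n - i) for i in range(n)]


def effective_shares(probs):
    """Fraction of all traffic each rule receives when rules are tried in order."""
    remaining = Fraction(1)  # traffic not yet taken by an earlier rule
    shares = []
    for p in probs:
        taken = remaining * p
        shares.append(taken)
        remaining -= taken
    return shares


print(rule_probs(3))  # [Fraction(1, 3), Fraction(1, 2), Fraction(1, 1)]
print(effective_shares(rule_probs(3)))  # [Fraction(1, 3), Fraction(1, 3), Fraction(1, 3)]
```

The same scheme gives every endpoint 1/n for any n. That is why the printed per-rule probabilities grow toward the end of the chain.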
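```
Chain KUBE-SEP-MMWJ6M2J72TU3J64 (1 references)
target prot opt source destination
KUBE-MARK-MASQ all -- 172.17.0.2 0.0.0.0/0 /* default/nginx-service-nodeport: */
DNAT tcp -- 0.0.0.0/0 0.0.0.0/0 /* default/nginx-service-nodeport: */ tcp to:172.17.0.2:80
```

Analysis: the DNAT rule forwards the traffic to the pod at 172.17.0.2:80; the other two KUBE-SEP chains are configured the same way for their pods. The KUBE-MARK-MASQ rule only matches packets whose source is the pod itself (172.17.0.2). This is the case of a pod reaching itself through the Service (hairpin traffic), and such packets are marked so that they are masqueraded later.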
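The two KUBE-SEP rules can be modelled as a function on a packet. This is an illustrative sketch: the `Packet` type and the `kube_sep` helper are hypothetical, and only the fields the rules touch are modelled.

```python
from dataclasses import dataclass, replace

MASQ_MARK = 0x4000  # bit that KUBE-MARK-MASQ ORs into the packet mark


@dataclass(frozen=True)
class Packet:
    src: str
    dst: str
    dport: int
    mark: int = 0


def kube_sep(pkt, pod_ip, pod_port):
    """Mimic KUBE-SEP-MMWJ6M2J72TU3J64: hairpin mark, then DNAT to the pod."""
    if pkt.src == pod_ip:
        # Rule 1: the pod is calling its own Service (hairpin); mark for masquerade.
        pkt = replace(pkt, mark=pkt.mark | MASQ_MARK)
    # Rule 2: DNAT rewrites the destination to pod_ip:pod_port.
    return replace(pkt, dst=pod_ip, dport=pod_port)


p = kube_sep(Packet("192.168.0.11", "10.254.198.92", 80), "172.17.0.2", 80)
print(p)  # Packet(src='192.168.0.11', dst='172.17.0.2', dport=80, mark=0)
```

Only the destination changes for ordinary traffic. The source is left alone at this point, and any source rewriting happens later in POSTROUTING.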
(3) Traffic to the NodePort is processed further and finally distributed to the backend pods.

```
Chain KUBE-NODEPORTS (1 references)
target prot opt source destination
KUBE-MARK-MASQ tcp -- 0.0.0.0/0 0.0.0.0/0 /* default/nginx-service-nodeport: */ tcp dpt:32110
KUBE-SVC-I64SNEMOLCWHJHS3 tcp -- 0.0.0.0/0 0.0.0.0/0 /* default/nginx-service-nodeport: */ tcp dpt:32110
```
Analysis:
The first rule (KUBE-MARK-MASQ) marks the packet (MARK or 0x4000). When that chain returns, processing continues with the second rule.
The second rule jumps to KUBE-SVC-I64SNEMOLCWHJHS3, the same chain that ClusterIP traffic goes through, which distributes the traffic to the backend pods:

```
Chain KUBE-SVC-I64SNEMOLCWHJHS3 (2 references)
target prot opt source destination
KUBE-SEP-MMWJ6M2J72TU3J64 all -- 0.0.0.0/0 0.0.0.0/0 /* default/nginx-service-nodeport: */ statistic mode random probability 0.33332999982
KUBE-SEP-GRLEVIWNO4P37GSQ all -- 0.0.0.0/0 0.0.0.0/0 /* default/nginx-service-nodeport: */ statistic mode random probability 0.50000000000
KUBE-SEP-74XRUOWV76LDS3ID all -- 0.0.0.0/0 0.0.0.0/0 /* default/nginx-service-nodeport: */
```
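The mark handling can be sketched in Python. The 0x4000 bit, the port 32110 and the jump to KUBE-SVC come from the rules above. The `needs_masquerade` check is an assumption: it models kube-proxy's KUBE-POSTROUTING chain masquerading packets that carry the bit. That is how kube-proxy normally uses this mark, but that chain is not shown in this article. All function names are hypothetical.

```python
MASQ_MARK = 0x4000  # bit set by KUBE-MARK-MASQ ("MARK or 0x4000")
NODE_PORT = 32110


def kube_mark_masq(mark):
    """KUBE-MARK-MASQ: OR the 0x4000 bit into the packet mark, keeping other bits."""
    return mark | MASQ_MARK


def kube_nodeports(dport, mark):
    """Mimic KUBE-NODEPORTS for this Service; return (jump target or None, new mark)."""
    if dport != NODE_PORT:
        return None, mark  # no match: back to KUBE-SERVICES
    mark = kube_mark_masq(mark)  # rule 1: mark, then the chain returns
    return "KUBE-SVC-I64SNEMOLCWHJHS3", mark  # rule 2: same Service chain as ClusterIP


def needs_masquerade(mark):
    """Assumed KUBE-POSTROUTING test: masquerade if the 0x4000 bit is set."""
    return (mark & MASQ_MARK) == MASQ_MARK


target, mark = kube_nodeports(32110, 0)
print(target, hex(mark), needs_masquerade(mark))  # KUBE-SVC-I64SNEMOLCWHJHS3 0x4000 True
```

Using OR rather than overwriting the mark leaves any other bits set by other tools intact.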
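To verify this, trace the traffic path and the changes to the packet with iptables logging. A LOG rule can be added to a given chain with a command like the following:

```
[root@hjdevelop ~]# iptables -I KUBE-NODEPORTS -s 192.168.0.0/16 -j LOG --log-prefix "** NAT KUBE-NODEPORTS **" -t nat
```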
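The iptables LOG target documents `--log-prefix` as up to 29 characters long. This explains why some prefixes in the log output run straight into the `IN=` field. The quick Python check below assumes the author used a prefix such as the hypothetical `** NAT KUBE-SEP-MMWJ6M2J72TU3J64 **`; the exact command is not shown. Depending on the iptables version, an over-long prefix may be rejected instead of truncated.

```python
MAX_LOG_PREFIX = 29  # "up to 29 letters long" per the iptables LOG target docs


def logged_prefix(prefix):
    """Prefix as it reaches the kernel log if iptables truncates it to 29 chars."""
    return prefix[:MAX_LOG_PREFIX]


# Hypothetical full prefix; the article does not show the exact command used.
full = "** NAT KUBE-SEP-MMWJ6M2J72TU3J64 **"
print(len(full))  # 35
print(logged_prefix(full) + "IN=")  # ** NAT KUBE-SEP-MMWJ6M2J72TU3IN=
```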
Accessing the NodePort from the node itself with `curl 192.168.0.11:32110` produces the following log entries:

```
Jan 21 17:03:39 hjdevelop kernel: ** NAT OUTPUT **IN= OUT=lo SRC=192.168.0.11 DST=192.168.0.11 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=39079 DF PROTO=TCP SPT=48614 DPT=32110 WINDOW=43690 RES=0x00 SYN URGP=0
Jan 21 17:03:39 hjdevelop kernel: ** NAT KUBE-SERVICES **IN= OUT=lo SRC=192.168.0.11 DST=192.168.0.11 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=39079 DF PROTO=TCP SPT=48614 DPT=32110 WINDOW=43690 RES=0x00 SYN URGP=0
Jan 21 17:03:39 hjdevelop kernel: ** NAT KUBE-NODEPORTS **IN= OUT=lo SRC=192.168.0.11 DST=192.168.0.11 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=39079 DF PROTO=TCP SPT=48614 DPT=32110 WINDOW=43690 RES=0x00 SYN URGP=0
Jan 21 17:03:39 hjdevelop kernel: ** NAT KUBE-MARK-MASQ **IN= OUT=lo SRC=192.168.0.11 DST=192.168.0.11 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=39079 DF PROTO=TCP SPT=48614 DPT=32110 WINDOW=43690 RES=0x00 SYN URGP=0
Jan 21 17:03:39 hjdevelop kernel: ** NAT KUBE-SVC **IN= OUT=lo SRC=192.168.0.11 DST=192.168.0.11 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=39079 DF PROTO=TCP SPT=48614 DPT=32110 WINDOW=43690 RES=0x00 SYN URGP=0 MARK=0x4000
Jan 21 17:03:39 hjdevelop kernel: ** NAT KUBE-SEP-MMWJ6M2J72TU3IN= OUT=lo SRC=192.168.0.11 DST=192.168.0.11 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=39079 DF PROTO=TCP SPT=48614 DPT=32110 WINDOW=43690 RES=0x00 SYN URGP=0 MARK=0x4000
Jan 21 17:03:39 hjdevelop kernel: ** Filter OUTPUT **IN= OUT=lo SRC=192.168.0.11 DST=172.17.0.2 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=39079 DF PROTO=TCP SPT=48614 DPT=80 WINDOW=43690 RES=0x00 SYN URGP=0 MARK=0x4000
Jan 21 17:03:39 hjdevelop kernel: ** NAT POSTROUTING **IN= OUT=docker0 SRC=192.168.0.11 DST=172.17.0.2 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=39079 DF PROTO=TCP SPT=48614 DPT=80 WINDOW=43690 RES=0x00 SYN URGP=0 MARK=0x4000
```
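The logs confirm the following order:
OUTPUT -> KUBE-SERVICES -> KUBE-NODEPORTS -> KUBE-MARK-MASQ -> KUBE-SVC -> KUBE-SEP-MMWJ6M2J72TU3J64 (this step rewrites the destination to 172.17.0.2:80) -> filter OUTPUT -> nat POSTROUTING
In the log, the KUBE-SEP chain name appears cut off as `KUBE-SEP-MMWJ6M2J72TU3` because iptables limits LOG prefixes to 29 characters.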
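Reading these logs by hand is error-prone, so a small Python parser can list the hops and show where the destination changes. The sample lines are the last four entries of the NodePort log above, with the fields after `DPT` removed. `trace` and `rewrite_points` are hypothetical helpers. The KUBE-SEP entry is logged before the DNAT rule runs, so the new destination first shows up at the next hop (filter OUTPUT).

```python
import re

# Prefix (possibly truncated) up to the IN= field, then destination and port.
LOG_RE = re.compile(
    r"kernel: (?P<prefix>.*?)IN=.*?SRC=\S+ DST=(?P<dst>\S+).*?DPT=(?P<dpt>\d+)"
)

# Last four NodePort log entries from above, with the fields after DPT removed.
SAMPLE = [
    "Jan 21 17:03:39 hjdevelop kernel: ** NAT KUBE-SVC **IN= OUT=lo SRC=192.168.0.11 DST=192.168.0.11 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=39079 DF PROTO=TCP SPT=48614 DPT=32110",
    "Jan 21 17:03:39 hjdevelop kernel: ** NAT KUBE-SEP-MMWJ6M2J72TU3IN= OUT=lo SRC=192.168.0.11 DST=192.168.0.11 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=39079 DF PROTO=TCP SPT=48614 DPT=32110",
    "Jan 21 17:03:39 hjdevelop kernel: ** Filter OUTPUT **IN= OUT=lo SRC=192.168.0.11 DST=172.17.0.2 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=39079 DF PROTO=TCP SPT=48614 DPT=80",
    "Jan 21 17:03:39 hjdevelop kernel: ** NAT POSTROUTING **IN= OUT=docker0 SRC=192.168.0.11 DST=172.17.0.2 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=39079 DF PROTO=TCP SPT=48614 DPT=80",
]


def trace(lines):
    """Return (hop name, "dst:port") for every LOG line, in order."""
    hops = []
    for line in lines:
        m = LOG_RE.search(line)
        if m:
            hops.append((m["prefix"].strip("* "), f"{m['dst']}:{m['dpt']}"))
    return hops


def rewrite_points(hops):
    """Yield (hop, next hop, old dst, new dst) wherever the destination changes."""
    for (prev, old), (nxt, new) in zip(hops, hops[1:]):
        if old != new:
            yield prev, nxt, old, new


for point in rewrite_points(trace(SAMPLE)):
    print(point)
# ('NAT KUBE-SEP-MMWJ6M2J72TU3', 'Filter OUTPUT', '192.168.0.11:32110', '172.17.0.2:80')
```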
Accessing the ClusterIP (10.254.198.92:80) from the node itself produces the following log entries:

```
Jan 21 17:08:36 hjdevelop kernel: ** NAT OUTPUT **IN= OUT=eth0 SRC=192.168.0.11 DST=10.254.198.92 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=23348 DF PROTO=TCP SPT=43130 DPT=80 WINDOW=28200 RES=0x00 SYN URGP=0
Jan 21 17:08:36 hjdevelop kernel: ** NAT KUBE-SERVICES **IN= OUT=eth0 SRC=192.168.0.11 DST=10.254.198.92 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=23348 DF PROTO=TCP SPT=43130 DPT=80 WINDOW=28200 RES=0x00 SYN URGP=0
Jan 21 17:08:36 hjdevelop kernel: ** NAT KUBE-SVC **IN= OUT=eth0 SRC=192.168.0.11 DST=10.254.198.92 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=23348 DF PROTO=TCP SPT=43130 DPT=80 WINDOW=28200 RES=0x00 SYN URGP=0
Jan 21 17:08:36 hjdevelop kernel: ** NAT KUBE-SEP-74XRUOWV76LDSIN= OUT=eth0 SRC=192.168.0.11 DST=10.254.198.92 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=23348 DF PROTO=TCP SPT=43130 DPT=80 WINDOW=28200 RES=0x00 SYN URGP=0
Jan 21 17:08:36 hjdevelop kernel: ** Filter OUTPUT **IN= OUT=eth0 SRC=192.168.0.11 DST=172.17.0.4 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=23348 DF PROTO=TCP SPT=43130 DPT=80 WINDOW=28200 RES=0x00 SYN URGP=0
Jan 21 17:08:36 hjdevelop kernel: ** NAT POSTROUTING **IN= OUT=docker0 SRC=192.168.0.11 DST=172.17.0.4 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=23348 DF PROTO=TCP SPT=43130 DPT=80 WINDOW=28200 RES=0x00 SYN URGP=0
```
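The logs confirm the following order:
OUTPUT -> KUBE-SERVICES -> KUBE-SVC -> KUBE-SEP-74XRUOWV76LDS3ID (this step rewrites the destination to 172.17.0.4:80) -> filter OUTPUT -> nat POSTROUTING
Unlike NodePort access, this path skips KUBE-NODEPORTS and KUBE-MARK-MASQ, so the packet carries no 0x4000 mark.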
Accessing the NodePort from an external client (115.236.50.21) produces the following log entries:

```
Jan 21 17:12:34 hjdevelop kernel: ** NAT PREROUTING **IN=eth0 OUT= MAC=fa:16:3e:32:cd:fb:fa:16:3e:78:fa:aa:08:00 SRC=115.236.50.21 DST=192.168.0.11 LEN=52 TOS=0x00 PREC=0x00 TTL=116 ID=21602 DF PROTO=TCP SPT=60079 DPT=32110 WINDOW=65535 RES=0x00 SYN URGP=0
Jan 21 17:12:34 hjdevelop kernel: ** NAT KUBE-SERVICES **IN=eth0 OUT= MAC=fa:16:3e:32:cd:fb:fa:16:3e:78:fa:aa:08:00 SRC=115.236.50.21 DST=192.168.0.11 LEN=52 TOS=0x00 PREC=0x00 TTL=116 ID=21602 DF PROTO=TCP SPT=60079 DPT=32110 WINDOW=65535 RES=0x00 SYN URGP=0
Jan 21 17:12:34 hjdevelop kernel: ** NAT KUBE-NODEPORTS **IN=eth0 OUT= MAC=fa:16:3e:32:cd:fb:fa:16:3e:78:fa:aa:08:00 SRC=115.236.50.21 DST=192.168.0.11 LEN=52 TOS=0x00 PREC=0x00 TTL=116 ID=21602 DF PROTO=TCP SPT=60079 DPT=32110 WINDOW=65535 RES=0x00 SYN URGP=0
Jan 21 17:12:34 hjdevelop kernel: ** NAT KUBE-MARK-MASQ **IN=eth0 OUT= MAC=fa:16:3e:32:cd:fb:fa:16:3e:78:fa:aa:08:00 SRC=115.236.50.21 DST=192.168.0.11 LEN=52 TOS=0x00 PREC=0x00 TTL=116 ID=21602 DF PROTO=TCP SPT=60079 DPT=32110 WINDOW=65535 RES=0x00 SYN URGP=0
Jan 21 17:12:34 hjdevelop kernel: ** NAT KUBE-SVC **IN=eth0 OUT= MAC=fa:16:3e:32:cd:fb:fa:16:3e:78:fa:aa:08:00 SRC=115.236.50.21 DST=192.168.0.11 LEN=52 TOS=0x00 PREC=0x00 TTL=116 ID=21602 DF PROTO=TCP SPT=60079 DPT=32110 WINDOW=65535 RES=0x00 SYN URGP=0 MARK=0x4000
Jan 21 17:12:34 hjdevelop kernel: ** NAT KUBE-SEP-74XRUOWV76LDSIN=eth0 OUT= MAC=fa:16:3e:32:cd:fb:fa:16:3e:78:fa:aa:08:00 SRC=115.236.50.21 DST=192.168.0.11 LEN=52 TOS=0x00 PREC=0x00 TTL=116 ID=21602 DF PROTO=TCP SPT=60079 DPT=32110 WINDOW=65535 RES=0x00 SYN URGP=0 MARK=0x4000
Jan 21 17:12:34 hjdevelop kernel: ** FILTER FORWARD **IN=eth0 OUT=docker0 MAC=fa:16:3e:32:cd:fb:fa:16:3e:78:fa:aa:08:00 SRC=115.236.50.21 DST=172.17.0.4 LEN=52 TOS=0x00 PREC=0x00 TTL=115 ID=21602 DF PROTO=TCP SPT=60079 DPT=80 WINDOW=65535 RES=0x00 SYN URGP=0 MARK=0x4000
```
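The logs confirm the following order:
PREROUTING -> KUBE-SERVICES -> KUBE-NODEPORTS -> KUBE-MARK-MASQ -> KUBE-SVC -> KUBE-SEP-74XRUOWV76LDS3ID (this step rewrites the destination to 172.17.0.4:80) -> filter FORWARD
After DNAT the destination is a pod rather than the node, so the packet is routed through the filter FORWARD chain (IN=eth0, OUT=docker0) and its TTL drops from 116 to 115.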