Reference
proxy.go under the cmd directory
Uses Cobra to build the CLI and invokes the proxy server.
kubernetes/cmd/kube-proxy/proxy.go
func main() {
    rand.Seed(time.Now().UnixNano())

    // core call: build the kube-proxy Cobra command
    command := app.NewProxyCommand()

    // TODO: once we switch everything over to Cobra commands, we can go back to calling
    // utilflag.InitFlags() (by removing its pflag.Parse() call). For now, we have to set the
    // normalize func and add the go flag set by hand.
    pflag.CommandLine.SetNormalizeFunc(cliflag.WordSepNormalizeFunc)
    pflag.CommandLine.AddGoFlagSet(goflag.CommandLine)
    // utilflag.InitFlags()

    logs.InitLogs()
    defer logs.FlushLogs()

    if err := command.Execute(); err != nil {
        os.Exit(1)
    }
}
kubernetes/cmd/kube-proxy/app/server.go
func NewProxyCommand() *cobra.Command {
    .....
    cmd := &cobra.Command{
        ......
        Run: func(cmd *cobra.Command, args []string) {
            ......
            opts.Run()
        },
    }
    .....
    opts.config, err = opts.ApplyDefaults(opts.config)
    .....
}
ApplyDefaults fills in the default configuration. The init() functions under /pkg/proxy/apis register the defaulting functions with the SchemeBuilder; for kube-proxy this registers SetObjectDefaults_KubeProxyConfiguration, which in turn calls SetDefaults_KubeProxyConfiguration. Among these defaults, IPTables.SyncPeriod.Duration is 30s (this is the retry interval after a failed sync and can be configured with --iptables-sync-period, which specifies the maximum interval between iptables refreshes).
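A rough, self-contained sketch of the defaulting behaviour described above (the real types live in the kube-proxy config API group and the real function covers many more fields); the point is that only zero-valued fields get filled in:

package main

import (
    "fmt"
    "time"
)

// Simplified stand-ins for the real KubeProxyConfiguration types.
type Duration struct{ Duration time.Duration }

type IPTablesConfig struct {
    SyncPeriod    Duration // --iptables-sync-period
    MinSyncPeriod Duration // --iptables-min-sync-period
}

type KubeProxyConfiguration struct {
    IPTables IPTablesConfig
}

// setDefaults mirrors, in spirit, SetDefaults_KubeProxyConfiguration:
// only fields that are still zero receive a default value.
func setDefaults(obj *KubeProxyConfiguration) {
    if obj.IPTables.SyncPeriod.Duration == 0 {
        obj.IPTables.SyncPeriod = Duration{30 * time.Second}
    }
}

func main() {
    cfg := &KubeProxyConfiguration{}
    setDefaults(cfg)
    fmt.Println(cfg.IPTables.SyncPeriod.Duration) // 30s
}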
kubernetes/cmd/kube-proxy/app/server.go
func (o *Options) Run() error {
    defer close(o.errCh)
    if len(o.WriteConfigTo) > 0 {
        return o.writeConfigFile()
    }

    proxyServer, err := NewProxyServer(o)
    if err != nil {
        return err
    }

    if o.CleanupAndExit {
        return proxyServer.CleanupAndExit()
    }

    o.proxyServer = proxyServer
    return o.runLoop()
}
Note: the BoundedFrequencyRunner's sync function here is proxier.syncProxyRules. Its minimum interval is minSyncPeriod (configurable with --iptables-min-sync-period; presumably 0 when not set), its maximum interval is 1 hour, its burst is 2, and the retry interval after a failed sync is proxier.syncPeriod (default 30s, configurable with --iptables-sync-period, which specifies the maximum interval between iptables refreshes and must not be 0).
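A condensed sketch of this wiring (paraphrased and simplified from pkg/proxy/iptables/proxier.go and pkg/util/async, not the verbatim source):

package proxysketch

import (
    "time"

    "k8s.io/apimachinery/pkg/util/wait"
    "k8s.io/kubernetes/pkg/util/async"
)

// Proxier is trimmed down to the two fields relevant here.
type Proxier struct {
    syncPeriod time.Duration                 // --iptables-sync-period (default 30s)
    syncRunner *async.BoundedFrequencyRunner // rate-limits calls to syncProxyRules
}

func NewProxier(minSyncPeriod, syncPeriod time.Duration) *Proxier {
    p := &Proxier{syncPeriod: syncPeriod}
    burstSyncs := 2
    // fn = syncProxyRules, min interval = minSyncPeriod, max interval = 1h, burst = 2.
    p.syncRunner = async.NewBoundedFrequencyRunner("sync-runner",
        p.syncProxyRules, minSyncPeriod, time.Hour, burstSyncs)
    return p
}

func (p *Proxier) syncProxyRules() {
    // ... rewrite the iptables rules; on failure the real code schedules a
    // retry after syncPeriod via p.syncRunner.RetryAfter(p.syncPeriod).
}

// Sync requests an immediate (but rate-limited) resync; SyncLoop blocks and
// keeps the runner going, which is where s.Proxier.SyncLoop() below ends up.
func (p *Proxier) Sync()     { p.syncRunner.Run() }
func (p *Proxier) SyncLoop() { p.syncRunner.Loop(wait.NeverStop) }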
opts.Run()->
o.runLoop()->
o.proxyServer.Run()->
s.Proxier.SyncLoop()
When the ProxyCommand is built by NewProxyCommand, opts.Run() is invoked from the command's Run function, which is the entry point of the call chain above.
In NewProxier, the syncRunner is constructed with syncProxyRules bound as its handler. In addition, ipt.Monitor is called. Monitor detects when the given iptables tables have been flushed by an external tool (e.g. a firewall reload) by creating canary chains and polling for their deletion. (Specifically, it polls tables[0] at the given interval until the canary chain has been deleted from it, then waits a short additional time for the canaries to be deleted from the remaining tables as well; you can optimize the polling by listing a relatively empty table as tables[0].) When a flush is detected, reloadFunc is called so the caller can reload its own iptables rules. If Monitor is unable to create the canary chains (either initially or after a reload), it logs an error and stops monitoring. (This function should be called from a goroutine.)
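A rough sketch of how that monitor goroutine could be started (approximating cmd/kube-proxy/app; the canary chain name matches the KUBE-PROXY-CANARY chain visible in the iptables-save output below, but treat the table list and parameters as illustrative):

import (
    "time"

    "k8s.io/apimachinery/pkg/util/wait"
    utiliptables "k8s.io/kubernetes/pkg/util/iptables"
)

// startFlushMonitor starts the canary-based flush detector next to the proxier.
// ipt is the utiliptables.Interface used by the proxier, reload re-programs the
// rules (e.g. the proxier's sync function), period is the polling interval.
func startFlushMonitor(ipt utiliptables.Interface, reload func(), period time.Duration) {
    go ipt.Monitor(
        utiliptables.Chain("KUBE-PROXY-CANARY"), // canary chain created in every monitored table
        []utiliptables.Table{
            utiliptables.TableMangle, // tables[0]: a relatively empty table keeps polling cheap
            utiliptables.TableNAT,
            utiliptables.TableFilter,
        },
        reload,         // reloadFunc: called once an external flush is detected
        period,         // polling interval
        wait.NeverStop, // stop channel
    )
}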
Service Add/Update/Delete events record the service change into the proxier's serviceChanges and then call proxier.Sync().
Endpoints Add/Update/Delete events record the endpoints change into the proxier's endpointsChanges and then call proxier.Sync().
Node Add/Update/Delete events call proxier.syncProxyRules() directly.
So service/endpoints/node events all eventually lead to a call of syncProxyRules; a condensed sketch of the service path follows.
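The following paraphrases (not verbatim) the service event handlers in pkg/proxy/iptables/proxier.go; the endpoints handlers follow the same pattern with endpointsChanges:

// OnServiceAdd/Delete are thin wrappers around OnServiceUpdate.
func (proxier *Proxier) OnServiceAdd(service *v1.Service) {
    proxier.OnServiceUpdate(nil, service)
}

func (proxier *Proxier) OnServiceDelete(service *v1.Service) {
    proxier.OnServiceUpdate(service, nil)
}

func (proxier *Proxier) OnServiceUpdate(oldService, service *v1.Service) {
    // Record the change; only kick the runner once the initial state has synced.
    if proxier.serviceChanges.Update(oldService, service) && proxier.isInitialized() {
        proxier.Sync() // -> syncRunner.Run() -> (rate-limited) syncProxyRules
    }
}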
# Output of iptables-save -t filter (parts of the output omitted)
*filter
:INPUT ACCEPT [272947:113632328]
:FORWARD ACCEPT [0:0]
:OUTPUT ACCEPT [269953:157912430]
:KUBE-EXTERNAL-SERVICES - [0:0]
:KUBE-FIREWALL - [0:0]
:KUBE-FORWARD - [0:0]
:KUBE-KUBELET-CANARY - [0:0]
:KUBE-PROXY-CANARY - [0:0]
:KUBE-SERVICES - [0:0]
-A INPUT -j KUBE-FIREWALL
masqRule := []string{
    "-A", string(kubePostroutingChain),
    "-m", "comment", "--comment", `"kubernetes service traffic requiring SNAT"`,
    "-m", "mark", "--mark", proxier.masqueradeMark,
    "-j", "MASQUERADE",
}
# Where:
# kubePostroutingChain: KUBE-POSTROUTING
# masqueradeMark: determined by --iptables-masquerade-bit
masqueradeValue = 1 << uint(14)
masqueradeMark = fmt.Sprintf("%#08x/%#08x", masqueradeValue, masqueradeValue)
--iptables-masquerade-bit defaults to 14, so masqueradeMark defaults to the 0x4000/0x4000 mark.
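A tiny runnable check of that computation (note that Go's %#08x pads with zeros after the 0x prefix, while iptables later displays the mark without the leading zeros):

package main

import "fmt"

func main() {
    bit := 14 // --iptables-masquerade-bit
    masqueradeValue := 1 << uint(bit)
    masqueradeMark := fmt.Sprintf("%#08x/%#08x", masqueradeValue, masqueradeValue)
    fmt.Println(masqueradeMark) // 0x004000/0x004000; iptables listings show it as 0x4000/0x4000
}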
# Corresponding rule in the nat table
-A KUBE-POSTROUTING -m comment --comment "kubernetes service traffic requiring SNAT" -m mark --mark 0x4000/0x4000 -j MASQUERADE
# It can be viewed with:
iptables -t nat -L KUBE-POSTROUTING
writeLine(proxier.natRules, []string{
    "-A", string(KubeMarkMasqChain),
    "-j", "MARK", "--set-xmark", proxier.masqueradeMark,
}...)
Where:
KubeMarkMasqChain: KUBE-MARK-MASQ
proxier.masqueradeMark: defaults to 0x4000/0x4000
# It can be queried with:
iptables -t nat -L KUBE-MARK-MASQ
Before kube-proxy starts writing the per-service/per-endpoint rules, it first appends jump rules to the KUBE-SERVICES, KUBE-EXTERNAL-SERVICES and KUBE-FORWARD chains in the filter table, and jump rules to the KUBE-SERVICES, KUBE-NODEPORTS, KUBE-POSTROUTING and KUBE-MARK-MASQ chains in the nat table; it also adds the default rules to the KUBE-POSTROUTING and KUBE-MARK-MASQ chains.
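A hedged sketch of how these top-level chains and jump rules can be ensured via the utiliptables interface (the exact set of jump rules, their comments and their positions are driven by a table in the real source; the entries below are illustrative):

import utiliptables "k8s.io/kubernetes/pkg/util/iptables"

// jumpChains lists (table, custom chain, built-in source chain, comment);
// illustrative subset only.
var jumpChains = []struct {
    table    utiliptables.Table
    dstChain utiliptables.Chain
    srcChain utiliptables.Chain
    comment  string
}{
    {utiliptables.TableFilter, "KUBE-SERVICES", "INPUT", "kubernetes service portals"},
    {utiliptables.TableFilter, "KUBE-FORWARD", "FORWARD", "kubernetes forwarding rules"},
    {utiliptables.TableNAT, "KUBE-SERVICES", "PREROUTING", "kubernetes service portals"},
    {utiliptables.TableNAT, "KUBE-POSTROUTING", "POSTROUTING", "kubernetes postrouting rules"},
}

func ensureJumpChains(ipt utiliptables.Interface) error {
    for _, jc := range jumpChains {
        // Create the custom chain if it does not exist yet.
        if _, err := ipt.EnsureChain(jc.table, jc.dstChain); err != nil {
            return err
        }
        // Insert a jump from the built-in chain into the custom chain.
        args := []string{"-m", "comment", "--comment", jc.comment, "-j", string(jc.dstChain)}
        if _, err := ipt.EnsureRule(utiliptables.Prepend, jc.table, jc.srcChain, args...); err != nil {
            return err
        }
    }
    return nil
}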
# KUBE-POSTROUTING
# Add Rule
-A POSTROUTING -m comment --comment "kubernetes postrouting rules" -j KUBE-POSTROUTING
# Show Rule
iptables -t nat -L KUBE-POSTROUTING
# Show Result
Chain KUBE-POSTROUTING (1 references)
target prot opt source destination
MASQUERADE all -- anywhere anywhere /* kubernetes service traffic requiring SNAT */ mark match 0x4000/0x4000
-----------------------------------------------------------------------------------
# KUBE-MARK-MASQ
# Add Rule
-A KUBE-MARK-MASQ -j MARK --set-xmark 0x4000/0x4000
# Show Rule
iptables -t nat -L KUBE-MARK-MASQ
# Show Result
Chain KUBE-MARK-MASQ (63 references)
target prot opt source destination
MARK all -- anywhere anywhere MARK or 0x4000
A jump rule to the KUBE-SERVICES chain is added to the PREROUTING chain of the nat table: every packet is sent into KUBE-SERVICES for processing.
# Add Rule
-A PREROUTING -m comment --comment "kubernetes service portals" -j KUBE-SERVICES
# Show Rule
iptables -t nat -L PREROUTING
# Show Result
Chain PREROUTING (policy ACCEPT)
target prot opt source destination
KUBE-SERVICES all -- anywhere anywhere /* kubernetes service portals */
After matching this rule in the nat PREROUTING chain, the packet enters the KUBE-SERVICES chain.
kube-proxy iterates over every svcInfo and, depending on the cluster IP and on the --cluster-cidr and --masquerade-all flags, writes matching rules that jump to the KUBE-MARK-MASQ chain; the two variants below correspond to these flags (see the sketch after the rules).
# Add Rule (with --masquerade-all: mark all traffic to the cluster IP)
-A KUBE-SERVICES -d 10.68.81.30/32 -p tcp -m comment --comment "demo/s-test-mq:tcp-5672 cluster IP" -m tcp --dport 5672 -j KUBE-MARK-MASQ
# Add Rule (with --cluster-cidr=172.20.0.0/16: only mark traffic that does not come from a pod)
-A KUBE-SERVICES ! -s 172.20.0.0/16 -d 10.68.81.30/32 -p tcp -m comment --comment "demo/s-test-mq:tcp-5672 cluster IP" -m tcp --dport 5672 -j KUBE-MARK-MASQ
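The branch that produces these two variants looks roughly like this (simplified from syncProxyRules; args already holds the -d/-p/--dport match, and KubeMarkMasqChain is KUBE-MARK-MASQ):

if proxier.masqueradeAll {
    // --masquerade-all: SNAT-mark everything that hits the cluster IP.
    writeLine(proxier.natRules, append(args, "-j", string(KubeMarkMasqChain))...)
} else if len(proxier.clusterCIDR) > 0 {
    // --cluster-cidr set: only mark traffic that does not originate from the pod CIDR.
    writeLine(proxier.natRules, append(args, "! -s", proxier.clusterCIDR, "-j", string(KubeMarkMasqChain))...)
}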
The rule inside KUBE-MARK-MASQ marks every packet entering the chain with 0x4000/0x4000 (after this action the packet continues to be matched against the remaining rules).
# Show Rule
iptables -t nat -L KUBE-MARK-MASQ
# Show Result
Chain KUBE-MARK-MASQ (63 references)
target prot opt source destination
MARK all -- anywhere anywhere MARK or 0x4000
For every svcPort that has a cluster IP, kube-proxy builds a KUBE-SVC-HASH jump rule in KUBE-SERVICES, directing packets addressed to that svcPort into the corresponding KUBE-SVC-HASH chain.
# Add Rule
-A KUBE-SERVICES -d 10.68.249.16/32 -p tcp -m comment --comment "demo/netutil-2:tcp-8081 cluster IP" -m tcp --dport 8081 -j KUBE-SVC-QUZXUNUIPD3MZETI
# Show Rule
iptables -t nat -L KUBE-SERVICES
# Show Result
KUBE-SVC-QUZXUNUIPD3MZETI tcp -- anywhere 10.68.249.16 /* demo/netutil-2:tcp-8081 cluster IP */ tcp dpt:tproxy
For an external IP, if the address is present on a device on the current host, kube-proxy first opens a local port using the external IP, svcPort and protocol (to hold the port) and then writes the rules.
Inside KUBE-SERVICES, kube-proxy adds a KUBE-MARK-MASQ jump rule for every external IP, so that packets whose destination is the external IP are sent into KUBE-MARK-MASQ to be marked.
# Query
iptables-save -t nat | grep external
# Add rule
-A KUBE-SERVICES -d <public-IP>/32 -p tcp -m comment --comment "demo/hk-nginx-hello:tcp-80 external IP" -m tcp --dport 80 -j KUBE-MARK-MASQ
# Query
iptables -t nat -L KUBE-SERVICES | grep "external IP" | grep MARK
# Result
KUBE-MARK-MASQ tcp -- anywhere <hostname resolved for the IP> /* demo/hk-nginx-hello:tcp-80 external IP */ tcp dpt:http
Inside KUBE-SERVICES, kube-proxy also adds KUBE-SVC-HASH jump rules for every external IP.
# Add rule
-A KUBE-SERVICES -d <public-IP>/32 -p tcp -m comment --comment "demo/hk-nginx-hello:tcp-80 external IP" -m tcp --dport 80 -m physdev ! --physdev-is-in -m addrtype ! --src-type LOCAL -j KUBE-SVC-UEOQSLEZ4LUM4H7G
-A KUBE-SERVICES -d <public-IP>/32 -p tcp -m comment --comment "demo/hk-nginx-hello:tcp-80 external IP" -m tcp --dport 80 -m addrtype --dst-type LOCAL -j KUBE-SVC-UEOQSLEZ4LUM4H7G
# Show rule
iptables -t nat -L KUBE-SERVICES | grep "external IP" | grep -v MARK
# Non-local traffic is forwarded to the svc chain
KUBE-SVC-UEOQSLEZ4LUM4H7G tcp -- anywhere <hostname resolved for the IP> /* demo/hk-nginx-hello:tcp-80 external IP */ tcp dpt:http PHYSDEV match ! --physdev-is-in ADDRTYPE match src-type !LOCAL
# Locally generated traffic is forwarded to the svc chain
KUBE-SVC-UEOQSLEZ4LUM4H7G tcp -- anywhere <hostname resolved for the IP> /* demo/hk-nginx-hello:tcp-80 external IP */ tcp dpt:http ADDRTYPE match dst-type LOCAL
For every svcPort that has an LB IP, kube-proxy builds a KUBE-FW-HASH jump rule, directing packets addressed to LBIP:svcPort (protocol + port) into the KUBE-FW-HASH chain.
# Add Rule
-A KUBE-SERVICES -d <public-IP>/32 -p tcp -m comment --comment "demo/netutil-2:tcp-8081 loadbalancer IP" -m tcp --dport 8081 -j KUBE-FW-QUZXUNUIPD3MZETI
# Show Result
KUBE-FW-QUZXUNUIPD3MZETI tcp -- anywhere <public-IP> /* demo/netutil-2:tcp-8081 loadbalancer IP */ tcp dpt:tproxy
Every packet that enters the KUBE-FW-HASH chain is first sent into KUBE-MARK-MASQ and marked with 0x4000/0x4000.
The marked packet is then handled by the KUBE-SVC-QUZXUNUIPD3MZETI chain (if backends exist at this point, the packet is dispatched probabilistically to one of the KUBE-SEP-HASH chains).
# Show Rule
iptables -t nat -L KUBE-FW-QUZXUNUIPD3MZETI
# Show Result
Chain KUBE-FW-QUZXUNUIPD3MZETI (1 references)
target prot opt source destination
KUBE-MARK-MASQ all -- anywhere anywhere /* demo/netutil-2:tcp-8081 loadbalancer IP */
KUBE-SVC-QUZXUNUIPD3MZETI all -- anywhere anywhere /* demo/netutil-2:tcp-8081 loadbalancer IP */
KUBE-MARK-DROP all -- anywhere anywhere /* demo/netutil-2:tcp-8081 loadbalancer IP */
Packets for a service without endpoints do not get a KUBE-SVC-HASH match here; they fall through to the KUBE-MARK-DROP chain and are marked with 0x8000/0x8000 (these packets are later dropped in the filter table).
At the end of the KUBE-SERVICES chain, kube-proxy adds a jump to the KUBE-NODEPORTS chain, so packets that matched no ClusterIP or LB rule are processed by KUBE-NODEPORTS.
iptables -t nat -L KUBE-SERVICES| grep KUBE-NODEPORTS
kube-proxy adds rules in KUBE-NODEPORTS that check whether the destination port is a NodePort; matching packets are sent into KUBE-MARK-MASQ and marked for SNAT.
kube-proxy also adds rules in KUBE-NODEPORTS that send packets whose destination port is a NodePort into the corresponding KUBE-SVC-HASH chain.
iptables -t nat -L KUBE-NODEPORTS
Inside each KUBE-SVC-HASH chain, kube-proxy builds a probability-based KUBE-SEP-HASH jump rule for every endpoint of the svcPort, and builds the per-endpoint KUBE-SEP-HASH chains themselves (see the sketch after the listing for how the probabilities are chosen).
# Add Rule
-A KUBE-SVC-QUZXUNUIPD3MZETI -m statistic --mode random --probability 0.33333333349 -j KUBE-SEP-AMWXJCMBQ4RG26Q5
-A KUBE-SVC-QUZXUNUIPD3MZETI -m statistic --mode random --probability 0.50000000000 -j KUBE-SEP-2OQUDCMHKG5JUGZU
-A KUBE-SVC-QUZXUNUIPD3MZETI -j KUBE-SEP-Z2UJWNX76VNMX7FW
# Show Rule
iptables -t nat -L KUBE-SVC-QUZXUNUIPD3MZETI
# Show Result
Chain KUBE-SVC-QUZXUNUIPD3MZETI (2 references)
target prot opt source destination
KUBE-SEP-AMWXJCMBQ4RG26Q5 all -- anywhere anywhere statistic mode random probability 0.33333333349
KUBE-SEP-2OQUDCMHKG5JUGZU all -- anywhere anywhere statistic mode random probability 0.50000000000
KUBE-SEP-Z2UJWNX76VNMX7FW all -- anywhere anywhere
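How the probabilities above arise: with n endpoints, rule i (counting from 0) is only evaluated if the previous rules did not fire, so giving it conditional probability 1/(n-i) makes every endpoint equally likely overall; the last rule needs no statistic match. A small runnable sketch (iptables stores the probability as a binary fraction internally, hence digits like 0.33333333349 in the listing rather than 0.33333333333):

package main

import "fmt"

func main() {
    seps := []string{
        "KUBE-SEP-AMWXJCMBQ4RG26Q5",
        "KUBE-SEP-2OQUDCMHKG5JUGZU",
        "KUBE-SEP-Z2UJWNX76VNMX7FW",
    }
    n := len(seps)
    for i, sep := range seps {
        if i < n-1 {
            // Conditional probability 1/(n-i) => uniform 1/n over all endpoints.
            fmt.Printf("-A KUBE-SVC-QUZXUNUIPD3MZETI -m statistic --mode random --probability %.11f -j %s\n",
                1.0/float64(n-i), sep)
        } else {
            // Last endpoint: unconditional catch-all.
            fmt.Printf("-A KUBE-SVC-QUZXUNUIPD3MZETI -j %s\n", sep)
        }
    }
}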
# Add Rule
-A KUBE-SEP-AMWXJCMBQ4RG26Q5 -s 172.20.4.10/32 -j KUBE-MARK-MASQ
-A KUBE-SEP-AMWXJCMBQ4RG26Q5 -p tcp -m tcp -j DNAT --to-destination 172.20.4.10:8091
# Show Rule
iptables -t nat -L KUBE-SEP-AMWXJCMBQ4RG26Q5
# Show Result
Chain KUBE-SEP-AMWXJCMBQ4RG26Q5 (1 references)
target prot opt source destination
KUBE-MARK-MASQ all -- 172.20.4.10 anywhere
DNAT tcp -- anywhere anywhere tcp to:172.20.4.10:8091
The INPUT chain of the filter table is configured with jump rules to KUBE-SERVICES and KUBE-EXTERNAL-SERVICES (if KUBE-SERVICES does not match, KUBE-EXTERNAL-SERVICES is evaluated next).
iptables -t filter -L INPUT
Packets destined for certain IPs are rejected here: these IPs belong to Services that have no endpoints, and the destination may be a Cluster IP or an LB IP.
# Show Rules
iptables -t filter -L KUBE-SERVICES
# Show Result
Chain KUBE-SERVICES (3 references)
target prot opt source destination
REJECT tcp -- anywhere 10.68.236.66 /* default/myservice: has no endpoints */ tcp dpt:ssh reject-with icmp-port-unreachable
REJECT tcp -- anywhere 10.68.121.211 /* demo/test-lb:tcp-80 has no endpoints */ tcp dpt:http reject-with icmp-port-unreachable
REJECT tcp -- anywhere <public-IP> /* demo/test-lb:tcp-80 has no endpoints */ tcp dpt:http reject-with icmp-port-unreachable
REJECT tcp -- anywhere 10.68.220.136 /* demo/test3:tcp-80 has no endpoints */ tcp dpt:http reject-with icmp-port-unreachable
REJECT tcp -- anywhere <public-IP> /* demo/test3:tcp-80 has no endpoints */ tcp dpt:http reject-with icmp-port-unreachable
Packets destined for certain external IPs are rejected here: these IPs belong to Services that have no endpoints, and the destination may be an external IP or a node IP.
iptables -t filter -L KUBE-EXTERNAL-SERVICES
The FORWARD chain of the filter table is configured with jump rules to KUBE-FORWARD and KUBE-SERVICES (if KUBE-FORWARD does not match, KUBE-SERVICES is evaluated next).
iptables -t filter -L FORWARD
# Show Rules
iptables -t filter -L KUBE-FORWARD
# Show Result
Chain KUBE-FORWARD (1 references)
target prot opt source destination
DROP all -- anywhere anywhere ctstate INVALID
ACCEPT all -- anywhere anywhere /* kubernetes forwarding rules */ mark match 0x4000/0x4000
ACCEPT all -- 172.20.0.0/16 anywhere /* kubernetes forwarding conntrack pod source rule */ ctstate RELATED,ESTABLISHED
ACCEPT all -- anywhere 172.20.0.0/16 /* kubernetes forwarding conntrack pod destination rule */ ctstate RELATED,ESTABLISHED
As on the INPUT path, packets destined for Service IPs that have no endpoints (Cluster IP or LB IP) are rejected in the filter KUBE-SERVICES chain.
Jump rules to KUBE-SERVICES are also added to the OUTPUT chains of both the nat table and the filter table; locally generated packets traverse the nat chain first and the filter chain afterwards.
All such packets enter the KUBE-SERVICES chain of the nat table, where processing is identical to 6.1, i.e. entering the nat table from the PREROUTING chain.
Chain OUTPUT (policy ACCEPT)
target prot opt source destination
KUBE-SERVICES all -- anywhere anywhere /* kubernetes service portals */
The packets then enter the KUBE-SERVICES chain of the filter table, where processing is identical to 6.2.1, i.e. entering KUBE-SERVICES from the filter INPUT chain.
kube-proxy adds a jump rule to the KUBE-POSTROUTING chain in the POSTROUTING chain of the nat table.
-A POSTROUTING -m comment --comment "kubernetes postrouting rules" -j KUBE-POSTROUTING
In the KUBE-POSTROUTING chain, packets carrying the 0x4000/0x4000 mark are MASQUERADEd, i.e. SNATed.
# Add Rule
-A KUBE-POSTROUTING -m comment --comment "kubernetes service traffic requiring SNAT" -m mark --mark 0x4000/0x4000 -j MASQUERADE
# Show Rule
iptables -t nat -L KUBE-POSTROUTING
# Show Result
Chain KUBE-POSTROUTING (1 references)
target prot opt source destination
MASQUERADE all -- anywhere anywhere /* kubernetes service traffic requiring SNAT */ mark match 0x4000/0x4000
Overall inbound packet flow (entering the host)
Overall outbound packet flow (leaving the host)