Iptables-Mode Proxier
author: XiaoYang
Overview
kube-proxy provides three proxier implementations (userspace/iptables/ipvs). userspace is the early proxy mode, and the ipvs mode is still experimental. This article starts with the code and logic of the default, kernel-level iptables proxier; the other modes will be analyzed in dedicated articles.
The iptables-mode Proxier's service configuration and code involve a number of basic concepts such as clusterIP, nodePort, loadBalancer, Ingress, ClusterCIDR, onlyLocal, and ExternalIP. Before diving into the source, familiarize yourself with what these are for, where they are used, and how the types differ; the code will then be much easier to follow. You should also be comfortable with the tools the proxy depends on, such as netfilter, iptables, and conntrack. This article does not cover these basics in depth; consult the relevant references as needed.
Looking at the kube-proxy component from the framework level, ProxyServer.Run() ends by calling s.Proxier.SyncLoop(), which loops forever. With the default ProxyServer configuration the Proxier object is the iptables one (if proxyMode == proxyModeIPTables), so the iptables-mode Proxier.SyncLoop() is invoked; SyncLoop() periodically runs syncProxyRules() to synchronize services and endpoints with the iptables rules.
The iptables-mode proxier implements load balancing through the underlying netfilter/iptables rules. Service and endpoint change events, watched through the Informer mechanism, trigger synchronization of the iptables rules, as shown in the code-logic diagram below:
For the proxier source analysis below, we first get a structural overview from the proxier interface, its implementing class, and that class's method list. We then look in detail at the attributes defined when the proxier object is created: their values, types, and purposes. With that background we analyze the implementation of the proxier's methods, i.e. the proxy logic itself (the key logic is covered in the syncProxyRules() analysis). Finally we examine the low-level iptables runner, the part where the upper proxy logic ultimately invokes the iptables command to apply rules.
Proxier data structures and type definitions
ProxyProvider is the proxy provider interface; an implementation must provide the two key proxy methods Sync() and SyncLoop().
!FILENAME pkg/proxy/types.go:27
type ProxyProvider interface {
    // Sync immediately synchronizes the ProxyProvider's current state to proxy rules.
    Sync()
    // SyncLoop runs periodically.
    // It runs as a goroutine or as the application's main loop; it does not return.
    SyncLoop()
}
The iptables-mode Proxier is an implementation of the ProxyProvider interface. The Proxier class has quite a few attributes; for now let's go over their purposes via the comments and the struct definition, and look at them in detail when the proxier object is instantiated.
!FILENAME pkg/proxy/iptables/proxier.go:205
type Proxier struct {
    endpointsChanges *proxy.EndpointChangeTracker // endpoints change tracker
    serviceChanges *proxy.ServiceChangeTracker // service change tracker
    mu sync.Mutex // protects the fields below
    serviceMap proxy.ServiceMap // map of known services ①
    endpointsMap proxy.EndpointsMap // map of known endpoints ②
    portsMap map[utilproxy.LocalPort]utilproxy.Closeable // map of held local ports (closeable)
    endpointsSynced bool // endpoints sync state
    servicesSynced bool // services sync state
    initialized int32 // initialization state
    syncRunner *async.BoundedFrequencyRunner // bounded-frequency runner, used here to manage
                                             // calls to syncProxyRules
    iptables utiliptables.Interface // iptables command execution interface
    masqueradeAll bool
    masqueradeMark string // SNAT masquerade mark
    exec utilexec.Interface // exec command execution interface
    clusterCIDR string // cluster CIDR
    hostname string // host name
    nodeIP net.IP // node IP address
    portMapper utilproxy.PortOpener // opens and listens on TCP/UDP ports
    recorder record.EventRecorder // event recorder
    healthChecker healthcheck.Server // healthcheck server object
    healthzServer healthcheck.HealthzUpdater // healthz updater
    precomputedProbabilities []string // precomputed probability strings
    // iptables rule and chain data (filter/NAT)
    iptablesData *bytes.Buffer
    existingFilterChainsData *bytes.Buffer
    filterChains *bytes.Buffer
    filterRules *bytes.Buffer
    natChains *bytes.Buffer
    natRules *bytes.Buffer
    endpointChainsNumber int
    // node IP and port information
    nodePortAddresses []string
    networkInterfacer utilproxy.NetworkInterfacer // network interface abstraction
}
① ServiceMap and ServicePort definitions
!FILENAME pkg/proxy/service.go:229
type ServiceMap map[ServicePortName]ServicePort
//String() => "NS/SvcName:PortName"
ServicePortName{NamespacedName: svcName, Port: servicePort.Name}
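To make the "NS/SvcName:PortName" map key concrete, here is a minimal sketch of how ServicePortName stringifies (the types are simplified stand-ins for the real pkg/proxy and types packages, and the exact formatting of the real String() may differ slightly across versions):

```go
package main

import "fmt"

// Simplified stand-in for types.NamespacedName.
type NamespacedName struct{ Namespace, Name string }

// Simplified stand-in for proxy.ServicePortName, the key of ServiceMap.
type ServicePortName struct {
	NamespacedName
	Port string
}

// String renders "namespace/name:portName"; a sketch drops the
// ":portName" suffix when the port has no name.
func (spn ServicePortName) String() string {
	if spn.Port == "" {
		return fmt.Sprintf("%s/%s", spn.Namespace, spn.Name)
	}
	return fmt.Sprintf("%s/%s:%s", spn.Namespace, spn.Name, spn.Port)
}

func main() {
	spn := ServicePortName{NamespacedName{"default", "nginx"}, "http"}
	fmt.Println(spn.String()) // default/nginx:http
}
```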
ServiceSpec is the definition of service.spec; these are the spec fields a user can configure on a Service.
!FILENAME vendor/k8s.io/api/core/v1/types.go:3606
type ServiceSpec struct {
    Ports []ServicePort // service port list
    Selector map[string]string // pod selector
    ClusterIP string // VIP / portal
    Type ServiceType // service type
    ExternalIPs []string // external IP list, e.g. an external load balancer
    SessionAffinity ServiceAffinity // session affinity
    LoadBalancerIP string // LB IP when the service type is "LoadBalancer"
    LoadBalancerSourceRanges []string // cloud-provider restriction on client IP ranges
    ExternalName string
    ExternalTrafficPolicy ServiceExternalTrafficPolicyType
    HealthCheckNodePort int32
    PublishNotReadyAddresses bool
    SessionAffinityConfig *SessionAffinityConfig // session affinity configuration
}
The ServicePort struct and the ServicePort interface
!FILENAME vendor/k8s.io/api/core/v1/types.go:3563
type ServicePort struct {
    Name string
    Protocol Protocol
    Port int32
    TargetPort intstr.IntOrString
    NodePort int32
}
// ServicePort interface
type ServicePort interface {
    // String returns the service string, e.g. `IP:Port/Protocol`.
    String() string
    // ClusterIPString returns the cluster IP string.
    ClusterIPString() string
    // GetProtocol returns the protocol.
    GetProtocol() v1.Protocol
    // GetHealthCheckNodePort returns the health-check node port.
    GetHealthCheckNodePort() int
}
② EndpointsMap definition and the Endpoint interface
!FILENAME pkg/proxy/endpoints.go:181
type EndpointsMap map[ServicePortName][]Endpoint
type Endpoint interface {
    // String returns the endpoint string, in the format `IP:Port`.
    String() string
    // GetIsLocal reports whether the endpoint is on the local host.
    GetIsLocal() bool
    // IP returns the IP part.
    IP() string
    // Port returns the port part.
    Port() (int, error)
    // Equal reports whether two endpoints are equal.
    Equal(Endpoint) bool
}
The Endpoints struct and related definitions
!FILENAME vendor/k8s.io/api/core/v1/types.go:3710
type Endpoints struct {
metav1.TypeMeta
metav1.ObjectMeta
Subsets []EndpointSubset
}
type EndpointSubset struct {
    Addresses []EndpointAddress // list of EndpointAddress
    NotReadyAddresses []EndpointAddress
    Ports []EndpointPort // list of EndpointPort
}
type EndpointAddress struct {
IP string
Hostname string
NodeName *string
TargetRef *ObjectReference
}
type EndpointPort struct {
Name string
Port int32
Protocol Protocol
}
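To make the EndpointSubset layout concrete, the sketch below flattens subsets into `IP:Port` endpoint strings with the same cartesian Addresses × Ports expansion that the proxier's endpointsToEndpointsMap performs (the types here are simplified stand-ins, not the real v1 types, and `flatten` is our own helper name):

```go
package main

import "fmt"

// Simplified stand-in for v1.EndpointSubset: one set of addresses
// reachable on one set of ports.
type EndpointSubset struct {
	Addresses []string
	Ports     []int32
}

// flatten expands each subset's Addresses x Ports into "IP:Port"
// endpoint strings, mirroring how endpointsToEndpointsMap builds
// the per-ServicePortName endpoint lists.
func flatten(subsets []EndpointSubset) []string {
	var eps []string
	for _, ss := range subsets {
		for _, port := range ss.Ports {
			for _, addr := range ss.Addresses {
				eps = append(eps, fmt.Sprintf("%s:%d", addr, port))
			}
		}
	}
	return eps
}

func main() {
	subsets := []EndpointSubset{{Addresses: []string{"10.0.0.1", "10.0.0.2"}, Ports: []int32{80}}}
	fmt.Println(flatten(subsets)) // [10.0.0.1:80 10.0.0.2:80]
}
```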
The method list of the iptables-mode Proxier. For now, get a rough sense of each method's purpose from its name; the key methods are analyzed in depth in the logic sections later.
func (proxier *Proxier) precomputeProbabilities(numberOfPrecomputed int) {/*...*/}
func (proxier *Proxier) probability(n int) string{/*...*/}
func (proxier *Proxier) Sync(){/*...*/}
func (proxier *Proxier) SyncLoop(){/*...*/}
func (proxier *Proxier) setInitialized(value bool){/*...*/}
func (proxier *Proxier) isInitialized() bool{/*...*/}
func (proxier *Proxier) OnServiceAdd(service *v1.Service){/*...*/}
func (proxier *Proxier) OnServiceUpdate(oldService, service *v1.Service){/*...*/}
func (proxier *Proxier) OnServiceDelete(service *v1.Service){/*...*/}
func (proxier *Proxier) OnServiceSynced(){/*...*/}
func (proxier *Proxier) OnEndpointsAdd(endpoints *v1.Endpoints){/*...*/}
func (proxier *Proxier) OnEndpointsUpdate(oldEndpoints, endpoints *v1.Endpoints){/*...*/}
func (proxier *Proxier) OnEndpointsDelete(endpoints *v1.Endpoints) {/*...*/}
func (proxier *Proxier) OnEndpointsSynced() {/*...*/}
func (proxier *Proxier) deleteEndpointConnections(connectionMap []proxy.ServiceEndpoint){/*...*/}
func (proxier *Proxier) appendServiceCommentLocked(args []string, svcName string){/*...*/}
func (proxier *Proxier) syncProxyRules(){/*...*/}
Creating and running the Proxier object
The iptables Proxier construction method, NewProxier. Parts of the validation code are omitted below; we focus on the key construction steps.
!FILENAME pkg/proxy/iptables/proxier.go:281
func NewProxier(ipt utiliptables.Interface,
sysctl utilsysctl.Interface,
exec utilexec.Interface,
syncPeriod time.Duration,
minSyncPeriod time.Duration,
masqueradeAll bool,
masqueradeBit int,
clusterCIDR string,
hostname string,
nodeIP net.IP,
recorder record.EventRecorder,
healthzServer healthcheck.HealthzUpdater,
nodePortAddresses []string,
) (*Proxier, error) {
// ...summary of the elided parts...
// sysctl sets "net/ipv4/conf/all/route_localnet" for kernel iptables support
// sysctl sets "net/bridge/bridge-nf-call-iptables" for kernel iptables support
// generate the SNAT masquerade mark
// if the node IP is empty, kube-proxy's initial nodeIP is 127.0.0.1
// check that the cluster CIDR is non-empty; IPv6 validation
// ...
// healthcheck server object
healthChecker := healthcheck.NewServer(hostname, recorder, nil, nil)
// the proxier object
proxier := &Proxier{
    portsMap: make(map[utilproxy.LocalPort]utilproxy.Closeable),
    serviceMap: make(proxy.ServiceMap), // service map
    serviceChanges: proxy.NewServiceChangeTracker(newServiceInfo, &isIPv6, recorder), // service change tracker
    endpointsMap: make(proxy.EndpointsMap), // endpoints map
    endpointsChanges: proxy.NewEndpointChangeTracker(hostname, newEndpointInfo, &isIPv6, recorder), // endpoints change tracker
    iptables: ipt,
    masqueradeAll: masqueradeAll,
    masqueradeMark: masqueradeMark,
    exec: exec,
    clusterCIDR: clusterCIDR,
    hostname: hostname,
    nodeIP: nodeIP,
    portMapper: &listenPortOpener{}, // service port listener
    recorder: recorder,
    healthChecker: healthChecker,
    healthzServer: healthzServer,
    precomputedProbabilities: make([]string, 0, 1001),
    iptablesData: bytes.NewBuffer(nil), // iptables rule data
    existingFilterChainsData: bytes.NewBuffer(nil),
    filterChains: bytes.NewBuffer(nil),
    filterRules: bytes.NewBuffer(nil),
    natChains: bytes.NewBuffer(nil),
    natRules: bytes.NewBuffer(nil),
    nodePortAddresses: nodePortAddresses,
    networkInterfacer: utilproxy.RealNetwork{}, // network interface abstraction
}
burstSyncs := 2
klog.V(3).Infof("minSyncPeriod: %v, syncPeriod: %v, burstSyncs: %d", minSyncPeriod, syncPeriod, burstSyncs)
// the runner calls syncProxyRules at a bounded frequency to synchronize rules;
// the key logic lives in syncProxyRules(), which is detailed later
proxier.syncRunner = async.NewBoundedFrequencyRunner("sync-runner", proxier.syncProxyRules, minSyncPeriod, syncPeriod, burstSyncs)
return proxier, nil
}
proxier.SyncLoop() We know the proxy server's run ends with the call "s.Proxier.SyncLoop()"; here we look at the proxier's SyncLoop implementation in detail.
!FILENAME pkg/proxy/iptables/proxier.go:487
func (proxier *Proxier) SyncLoop() {
    if proxier.healthzServer != nil {
        proxier.healthzServer.UpdateTimestamp()
    }
    // proxier.syncRunner was set to async.NewBoundedFrequencyRunner(...) when the proxier was created ①
    proxier.syncRunner.Loop(wait.NeverStop)
}
① async.BoundedFrequencyRunner is a timer-driven loop runner for a func.
The proxier.syncRunner field defined in the Proxier struct has type async.BoundedFrequencyRunner.
!FILENAME pkg/util/async/bounded_frequency_runner.go:31
type BoundedFrequencyRunner struct {
    name string
    minInterval time.Duration // minimum interval between two runs
    maxInterval time.Duration // maximum interval between two runs
    run chan struct{} // request a single run
    mu sync.Mutex
    fn func() // the func to run
    lastRun time.Time // time of the most recent run
    timer timer // the timer
    limiter rateLimiter // limits the QPS of runs as configured
}
BoundedFrequencyRunner construction; the parameters control how the func is invoked.
!FILENAME pkg/util/async/bounded_frequency_runner.go:134
func NewBoundedFrequencyRunner(name string, fn func(), minInterval, maxInterval time.Duration, burstRuns int) *BoundedFrequencyRunner {
    timer := realTimer{Timer: time.NewTimer(0)} // ticks immediately
    <-timer.C() // consume the first tick so we don't run right away
    return construct(name, fn, minInterval, maxInterval, burstRuns, timer) // ->
}
// instance construction
func construct(name string, fn func(), minInterval, maxInterval time.Duration, burstRuns int, timer timer) *BoundedFrequencyRunner {
    if maxInterval < minInterval {
        panic(fmt.Sprintf("%s: maxInterval (%v) must be >= minInterval (%v)", name, maxInterval, minInterval))
    }
    if timer == nil {
        panic(fmt.Sprintf("%s: timer must be non-nil", name))
    }
    bfr := &BoundedFrequencyRunner{
        name: name,
        fn: fn, // the func to be invoked
        minInterval: minInterval,
        maxInterval: maxInterval,
        run: make(chan struct{}, 1),
        timer: timer,
    }
    if minInterval == 0 { // an unspecified minimum interval means runs are unlimited
        bfr.limiter = nullLimiter{}
    } else {
        // flow control implemented by TokenBucketRateLimiter; worth a deeper look
        // if you are interested, but not expanded on here
        qps := float32(time.Second) / float32(minInterval)
        bfr.limiter = flowcontrol.NewTokenBucketRateLimiterWithClock(qps, burstRuns, timer)
    }
    return bfr
}
proxier.syncRunner.Loop()
The timer-driven loop implementation.
func (bfr *BoundedFrequencyRunner) Loop(stop <-chan struct{}) {
    klog.V(3).Infof("%s Loop running", bfr.name)
    bfr.timer.Reset(bfr.maxInterval)
    for {
        select {
        case <-stop: // the stop channel closes the loop
            bfr.stop()
            klog.V(3).Infof("%s Loop stopping", bfr.name)
            return
        case <-bfr.timer.C(): // a timer tick triggers a run
            bfr.tryRun()
        case <-bfr.run: // an explicitly requested single run
            bfr.tryRun()
        }
    }
}
BoundedFrequencyRunner.tryRun() runs the func at the bounded frequency.
!FILENAME pkg/util/async/bounded_frequency_runner.go:211
func (bfr *BoundedFrequencyRunner) tryRun() {
    bfr.mu.Lock()
    defer bfr.mu.Unlock()
    // the limiter allows the func to run
    if bfr.limiter.TryAccept() {
        bfr.fn() // invoke the func
        bfr.lastRun = bfr.timer.Now() // record the run time
        bfr.timer.Stop()
        bfr.timer.Reset(bfr.maxInterval) // reset the next scheduled run
        klog.V(3).Infof("%s: ran, next possible in %v, periodic in %v", bfr.name, bfr.minInterval, bfr.maxInterval)
        return
    }
    // the limiter refused the run; compute the next run time
    elapsed := bfr.timer.Since(bfr.lastRun)    // elapsed: how long since the last run
    nextPossible := bfr.minInterval - elapsed  // nextPossible: earliest the next run may happen (minimum period)
    nextScheduled := bfr.maxInterval - elapsed // nextScheduled: latest the next run will happen (maximum period)
    klog.V(4).Infof("%s: %v since last run, possible in %v, scheduled in %v", bfr.name, elapsed, nextPossible, nextScheduled)
    if nextPossible < nextScheduled {
        // Set the timer for ASAP, but don't drain here. Assuming Loop is running,
        // it might get a delivery in the mean time, but that is OK.
        bfr.timer.Stop()
        bfr.timer.Reset(nextPossible)
        klog.V(3).Infof("%s: throttled, scheduling run in %v", bfr.name, nextPossible)
    }
}
Proxier service and endpoints change trackers
kube-proxy must promptly synchronize changes to services and endpoints. We saw earlier that the Proxier object has two attributes, serviceChanges and endpointsChanges, which track Service and Endpoints updates. Let's analyze the two related classes, ServiceChangeTracker and EndpointChangeTracker.
ServiceChangeTracker, the service change tracker
!FILENAME pkg/proxy/service.go:143
type ServiceChangeTracker struct {
    // lock protects items
    lock sync.Mutex
    // items is the map of pending service changes
    items map[types.NamespacedName]*serviceChange
    // makeServiceInfo lets the proxier customize the information kept per service
    makeServiceInfo makeServicePortFunc
    isIPv6Mode *bool // IPv6
    recorder record.EventRecorder // event recorder
}
// serviceChange holds an old/new pair of service state.
type serviceChange struct {
    previous ServiceMap
    current ServiceMap
}
// ServiceMap type definition
type ServiceMap map[ServicePortName]ServicePort
// ServicePortName type definition
type ServicePortName struct {
    types.NamespacedName
    Port string
}
// NamespacedName type definition
type NamespacedName struct {
    Namespace string
    Name string
}
// NewServiceChangeTracker instantiates a ServiceChangeTracker object.
func NewServiceChangeTracker(makeServiceInfo makeServicePortFunc, isIPv6Mode *bool, recorder record.EventRecorder) *ServiceChangeTracker {
    return &ServiceChangeTracker{
        items: make(map[types.NamespacedName]*serviceChange),
        makeServiceInfo: makeServiceInfo,
        isIPv6Mode: isIPv6Mode,
        recorder: recorder,
    }
}
ServiceChangeTracker.Update()
!FILENAME pkg/proxy/service.go:173
func (sct *ServiceChangeTracker) Update(previous, current *v1.Service) bool {
    svc := current
    if svc == nil {
        svc = previous
    }
    // previous == nil && current == nil is unexpected, we should return false directly.
    if svc == nil {
        return false
    }
    namespacedName := types.NamespacedName{Namespace: svc.Namespace, Name: svc.Name}
    sct.lock.Lock()
    defer sct.lock.Unlock()
    change, exists := sct.items[namespacedName]
    if !exists {
        change = &serviceChange{}
        change.previous = sct.serviceToServiceMap(previous)
        sct.items[namespacedName] = change
    }
    change.current = sct.serviceToServiceMap(current)
    // if change.previous equals change.current, there is no change
    if reflect.DeepEqual(change.previous, change.current) {
        delete(sct.items, namespacedName)
    }
    return len(sct.items) > 0
}
EndpointChangeTracker, the endpoints change tracker
!FILENAME pkg/proxy/endpoints.go:83
type EndpointChangeTracker struct {
    lock sync.Mutex
    // hostname is the host kube-proxy runs on.
    hostname string
    // items is the map of pending endpoints changes
    items map[types.NamespacedName]*endpointsChange
    makeEndpointInfo makeEndpointFunc
    isIPv6Mode *bool
    recorder record.EventRecorder
}
// endpointsChange holds an old/new pair of endpoints state.
type endpointsChange struct {
    previous EndpointsMap
    current EndpointsMap
}
// EndpointsMap type definition
type EndpointsMap map[ServicePortName][]Endpoint
// Endpoint interface
type Endpoint interface {
    // String returns the endpoint string, in the format `IP:Port`.
    String() string
    // GetIsLocal reports whether the endpoint is on the local host.
    GetIsLocal() bool
    // IP returns the IP part of the endpoint.
    IP() string
    // Port returns the port part of the endpoint.
    Port() (int, error)
    // Equal reports whether two endpoints are equal.
    Equal(Endpoint) bool
}
// NewEndpointChangeTracker instantiates an EndpointChangeTracker object.
func NewEndpointChangeTracker(hostname string, makeEndpointInfo makeEndpointFunc, isIPv6Mode *bool, recorder record.EventRecorder) *EndpointChangeTracker {
    return &EndpointChangeTracker{
        hostname: hostname,
        items: make(map[types.NamespacedName]*endpointsChange),
        makeEndpointInfo: makeEndpointInfo,
        isIPv6Mode: isIPv6Mode,
        recorder: recorder,
    }
}
EndpointChangeTracker.Update()
!FILENAME pkg/proxy/endpoints.go:116
func (ect *EndpointChangeTracker) Update(previous, current *v1.Endpoints) bool {
endpoints := current
if endpoints == nil {
endpoints = previous
}
// previous == nil && current == nil is unexpected, we should return false directly.
if endpoints == nil {
return false
}
namespacedName := types.NamespacedName{Namespace: endpoints.Namespace, Name: endpoints.Name}
ect.lock.Lock()
defer ect.lock.Unlock()
change, exists := ect.items[namespacedName]
if !exists {
change = &endpointsChange{}
change.previous = ect.endpointsToEndpointsMap(previous)
ect.items[namespacedName] = change
}
change.current = ect.endpointsToEndpointsMap(current)
// if change.previous equal to change.current, it means no change
if reflect.DeepEqual(change.previous, change.current) {
delete(ect.items, namespacedName)
}
return len(ect.items) > 0
}
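The net effect of Update() — the first change for an object snapshots its previous state, later changes only overwrite the current state, and a reverting change cancels out and is dropped — can be demonstrated with a stripped-down mock (hypothetical simplified types; the real trackers key by types.NamespacedName and store ServiceMap/EndpointsMap values):

```go
package main

import (
	"fmt"
	"reflect"
)

// change pairs the state before the first pending update with the latest state.
type change struct{ previous, current map[string]string }

// tracker is a minimal mock of the ServiceChangeTracker/EndpointChangeTracker
// pending-change map (no locking, plain string states).
type tracker struct{ items map[string]*change }

// Update records a state transition and reports whether any changes are pending.
func (t *tracker) Update(name string, previous, current map[string]string) bool {
	c, exists := t.items[name]
	if !exists {
		c = &change{previous: previous} // snapshot state at first change
		t.items[name] = c
	}
	c.current = current // later updates only move the current state
	if reflect.DeepEqual(c.previous, c.current) {
		delete(t.items, name) // the object reverted: nothing to sync
	}
	return len(t.items) > 0
}

func main() {
	t := &tracker{items: map[string]*change{}}
	old := map[string]string{"clusterIP": "10.0.0.1"}
	upd := map[string]string{"clusterIP": "10.0.0.2"}
	fmt.Println(t.Update("default/nginx", old, upd)) // true: change pending
	fmt.Println(t.Update("default/nginx", upd, old)) // false: reverted, cancels out
}
```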
Service and endpoints synchronization in the Proxier (add/delete/update) all go through the ChangeTracker's Update(), which in turn triggers syncProxyRules.
Proxier.OnServiceSynced(), the initial services sync
!FILENAME pkg/proxy/iptables/proxier.go:528
func (proxier *Proxier) OnServiceSynced() {
    proxier.mu.Lock()
    proxier.servicesSynced = true
    proxier.setInitialized(proxier.servicesSynced && proxier.endpointsSynced)
    proxier.mu.Unlock()
    // sync unconditionally, once, rather than periodically
    proxier.syncProxyRules()
}
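The initialized flag that gates event-driven syncs can be sketched as follows; this mirrors the atomic int32 pattern of setInitialized()/isInitialized() in proxier.go (the surrounding demo types and variable names are ours):

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// proxier holds the initialization flag as an int32 so it can be read
// without taking the mutex.
type proxier struct{ initialized int32 }

// setInitialized stores 1 when both caches have completed their initial sync.
func (p *proxier) setInitialized(value bool) {
	var i int32
	if value {
		i = 1
	}
	atomic.StoreInt32(&p.initialized, i)
}

// isInitialized reports whether syncProxyRules should react to events yet.
func (p *proxier) isInitialized() bool {
	return atomic.LoadInt32(&p.initialized) > 0
}

func main() {
	p := &proxier{}
	servicesSynced, endpointsSynced := true, false
	p.setInitialized(servicesSynced && endpointsSynced)
	fmt.Println(p.isInitialized()) // false: still waiting for the endpoints cache
	endpointsSynced = true
	p.setInitialized(servicesSynced && endpointsSynced)
	fmt.Println(p.isInitialized()) // true
}
```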
Service updates (add/delete/update) all call the Proxier.OnServiceUpdate() method.
!FILENAME pkg/proxy/iptables/proxier.go:513
func (proxier *Proxier) OnServiceAdd(service *v1.Service) {
    proxier.OnServiceUpdate(nil, service) // first argument: oldService is nil
}
// OnServiceUpdate covers Add / Delete as well
func (proxier *Proxier) OnServiceUpdate(oldService, service *v1.Service) {
    if proxier.serviceChanges.Update(oldService, service) && proxier.isInitialized() {
        proxier.syncRunner.Run() // request a single run
    }
}
func (proxier *Proxier) OnServiceDelete(service *v1.Service) {
    proxier.OnServiceUpdate(service, nil) // second argument: newService is nil
}
Proxier.OnEndpointsSynced(), the initial endpoints sync
!FILENAME pkg/proxy/iptables/proxier.go:552
func (proxier *Proxier) OnEndpointsSynced() {
proxier.mu.Lock()
proxier.endpointsSynced = true
proxier.setInitialized(proxier.servicesSynced && proxier.endpointsSynced)
proxier.mu.Unlock()
// Sync unconditionally - this is called once per lifetime.
proxier.syncProxyRules()
}
As with services, endpoints updates (add/delete/update) all call the Proxier.OnEndpointsUpdate() method.
!FILENAME pkg/proxy/iptables/proxier.go:538
func (proxier *Proxier) OnEndpointsAdd(endpoints *v1.Endpoints) {
    proxier.OnEndpointsUpdate(nil, endpoints) // first argument: oldEndpoints is nil
}
// OnEndpointsUpdate covers Add / Delete as well
func (proxier *Proxier) OnEndpointsUpdate(oldEndpoints, endpoints *v1.Endpoints) {
    if proxier.endpointsChanges.Update(oldEndpoints, endpoints) && proxier.isInitialized() {
        proxier.syncRunner.Run()
    }
}
func (proxier *Proxier) OnEndpointsDelete(endpoints *v1.Endpoints) {
    proxier.OnEndpointsUpdate(endpoints, nil) // second argument: newEndpoints is nil
}
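The add/delete-as-update convention used by these handlers can be shown with a toy dispatcher (hypothetical simplified Service type; the real handlers delegate to the change tracker instead of returning strings):

```go
package main

import "fmt"

// Simplified stand-in for v1.Service.
type Service struct{ Name string }

// onUpdate models the convention: Add is Update(nil, obj), Delete is
// Update(obj, nil), so one update path handles all three event kinds.
func onUpdate(oldSvc, newSvc *Service) string {
	switch {
	case oldSvc == nil && newSvc != nil:
		return "add " + newSvc.Name
	case oldSvc != nil && newSvc == nil:
		return "delete " + oldSvc.Name
	default:
		return "update " + newSvc.Name
	}
}

func main() {
	svc := &Service{Name: "nginx"}
	fmt.Println(onUpdate(nil, svc)) // add nginx
	fmt.Println(onUpdate(svc, nil)) // delete nginx
}
```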
~ To be continued in the next part ~