kube-proxy source code walkthrough

Version: Kubernetes 1.13

Preface

Kubernetes runs kube-proxy on every node as a network proxy that forwards traffic between services and pods. This article walks through the kube-proxy source code to see how that logic is actually implemented.

For background on how kube-proxy works, see:

https://kubernetes.io/docs/concepts/services-networking/service/

https://my.oschina.net/jxcdwangtao

Source directory layout:

Let's start with the directory layout. The kube-proxy code lives mainly in two directories: cmd/kube-proxy and pkg/proxy.

k8s.io/kubernetes/cmd/kube-proxy
.
├── BUILD
├── app
│   ├── BUILD
│   ├── conntrack.go    //definition of the nf_conntrack sysctl interface
│   ├── init_others.go    //config initialization on non-Windows systems; currently a no-op
│   ├── init_windows.go    //adds the windows-service flag to kube-proxy on Windows
│   ├── server.go        //NewProxyCommand, which builds the kube-proxy command; also the ProxyServer struct and its Run method
│   ├── server_others.go    //newProxyServer, which builds a ProxyServer on non-Windows systems
│   ├── server_others_test.go
│   ├── server_test.go
│   └── server_windows.go    //newProxyServer, which builds a ProxyServer on Windows
└── proxy.go    //kube-proxy's main function

k8s.io/kubernetes/pkg/proxy
.
├── BUILD
├── OWNERS
├── apis    //handling of kube-proxy configuration parameters: registration, defaulting of some fields, and validation
│   └── config
│       ├── BUILD
│       ├── OWNERS
│       ├── doc.go
│       ├── fuzzer
│       │   ├── BUILD
│       │   └── fuzzer.go
│       ├── register.go
│       ├── scheme
│       │   ├── BUILD
│       │   ├── scheme.go
│       │   └── scheme_test.go
│       ├── types.go
│       ├── v1alpha1
│       │   ├── BUILD
│       │   ├── defaults.go
│       │   ├── doc.go
│       │   ├── register.go
│       │   ├── zz_generated.conversion.go
│       │   ├── zz_generated.deepcopy.go
│       │   └── zz_generated.defaults.go
│       ├── validation
│       │   ├── BUILD
│       │   ├── validation.go
│       │   └── validation_test.go
│       └── zz_generated.deepcopy.go
├── config    //definitions of the ServiceHandler & EndpointsHandler interfaces, plus the ServiceConfig & EndpointsConfig structs and their Run methods
│   ├── BUILD
│   ├── OWNERS
│   ├── api_test.go
│   ├── config.go
│   ├── config_test.go
│   └── doc.go
├── doc.go
├── endpoints.go    //endpoints-related structs and methods
├── endpoints_test.go
├── healthcheck    //kube-proxy's healthz handling and health checks for services and endpoints
│   ├── BUILD
│   ├── OWNERS
│   ├── doc.go
│   ├── healthcheck.go
│   └── healthcheck_test.go
├── iptables    //implementation of the iptables proxy mode
│   ├── BUILD
│   ├── OWNERS
│   ├── proxier.go
│   └── proxier_test.go
├── ipvs    //implementation of the ipvs proxy mode
│   ├── BUILD
│   ├── OWNERS
│   ├── README.md
│   ├── graceful_termination.go
│   ├── ipset.go
│   ├── ipset_test.go
│   ├── netlink.go
│   ├── netlink_linux.go
│   ├── netlink_unsupported.go
│   ├── proxier.go
│   ├── proxier_test.go
│   └── testing
│       ├── BUILD
│       ├── fake.go
│       ├── fake_test.go
│       └── util.go
├── metrics    //kube-proxy metrics helpers (e.g. for Prometheus monitoring of kube-proxy)
│   ├── BUILD
│   └── metrics.go
├── service.go    //service-related structs and methods
├── service_test.go
├── types.go    //interfaces and structs for service ports & endpoints, plus the ProxyProvider definition (sketched right after this tree)
├── userspace    //implementation of the userspace proxy mode
│   ├── BUILD
│   ├── OWNERS
│   ├── loadbalancer.go
│   ├── port_allocator.go
│   ├── port_allocator_test.go
│   ├── proxier.go
│   ├── proxier_test.go
│   ├── proxysocket.go
│   ├── rlimit.go
│   ├── rlimit_windows.go
│   ├── roundrobin.go
│   └── roundrobin_test.go
├── util    
│   ├── BUILD
│   ├── endpoints.go
│   ├── endpoints_test.go
│   ├── network.go
│   ├── port.go
│   ├── port_test.go
│   ├── testing
│   │   ├── BUILD
│   │   └── fake.go
│   ├── utils.go
│   └── utils_test.go
├── winkernel    //kube-proxy implementation for Windows (kernel mode)
│   ├── BUILD
│   ├── OWNERS
│   ├── metrics.go
│   └── proxier.go
└── winuserspace    //userspace proxy mode on Windows
    ├── BUILD
    ├── loadbalancer.go
    ├── proxier.go
    ├── proxier_test.go
    ├── proxysocket.go
    ├── roundrobin.go
    ├── roundrobin_test.go
    └── types.go
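
types.go in the tree above defines ProxyProvider, the minimal contract every proxier (userspace, iptables, ipvs) satisfies. A paraphrased sketch of the 1.13-era definition (treat it as a sketch rather than a verbatim copy):

// k8s.io/kubernetes/pkg/proxy/types.go (paraphrased sketch)
// ProxyProvider is the interface implemented by every proxier.
type ProxyProvider interface {
   // Sync immediately synchronizes the proxier's current state to proxy rules.
   Sync()
   // SyncLoop runs periodic work; it is expected to run as the main loop of
   // the process and does not return.
   SyncLoop()
}

ProxyServer.Run, discussed later, only ever drives a proxier through these two methods.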

The main function

Kubernetes uses the Cobra framework, so the main functions of its components all look much the same: create the component's command, then Execute it.

k8s.io/kubernetes/cmd/kube-proxy/proxy.go:35
func main() {
   rand.Seed(time.Now().UnixNano())

   //Create the kube-proxy command
   command := app.NewProxyCommand()

   // TODO: once we switch everything over to Cobra commands, we can go back to calling
   // utilflag.InitFlags() (by removing its pflag.Parse() call). For now, we have to set the
   // normalize func and add the go flag set by hand.
   pflag.CommandLine.SetNormalizeFunc(utilflag.WordSepNormalizeFunc)
   pflag.CommandLine.AddGoFlagSet(goflag.CommandLine)
   // utilflag.InitFlags()
   logs.InitLogs()
   defer logs.FlushLogs()

   //Execute Kube-Proxy command
   if err := command.Execute(); err != nil {
      fmt.Fprintf(os.Stderr, "error: %v\n", err)
      os.Exit(1)
   }
}

NewProxyCommand: creating the kube-proxy command and its Run function

The main logic of NewProxyCommand:

  • Create the kube-proxy Options, initializing healthzPort to 10256 and CleanupIPVS to true (a sketch of NewOptions follows the code below)
  • Create the kube-proxy command. Executing the command in main() runs the command's Run function, which does the following:
    • If the version flag is set, print the kube-proxy version and exit
    • Print all kube-proxy flags and their values (unset flags are printed with their defaults)
    • Perform OS-specific initialization
    • Load the configured values from the config file, set HostnameOverride, and set the FeatureGate defaults
    • Validate all of the configured values
    • Call the Options' Run method
  • Apply defaults to some parameters; the parameters and their defaults are in k8s.io/kubernetes/pkg/proxy/apis/config/v1alpha1/defaults.go
  • Bind the kube-proxy flags to the Options' KubeProxyConfiguration
k8s.io/kubernetes/cmd/kube-proxy/app/server.go:351
func NewProxyCommand() *cobra.Command {
   //Create the kube-proxy Options, initializing healthzPort to 10256 and CleanupIPVS to true
   opts := NewOptions()

   //Create the kube-proxy command. Executing the command in main() runs this Run function, which:
   //  1. prints the kube-proxy version and exits if the version flag is set
   //  2. prints all flags and their values (unset flags show their defaults)
   //  3. performs OS-specific initialization
   //  4. loads values from the config file, sets HostnameOverride and the FeatureGate defaults
   //  5. validates all of the configured values
   //  6. calls the Options' Run method
   cmd := &cobra.Command{
      Use: "kube-proxy",
      Long: `The Kubernetes network proxy runs on each node. This
reflects services as defined in the Kubernetes API on each node and can do simple
TCP, UDP, and SCTP stream forwarding or round robin TCP, UDP, and SCTP forwarding across a set of backends.
Service cluster IPs and ports are currently found through Docker-links-compatible
environment variables specifying ports opened by the service proxy. There is an optional
addon that provides cluster DNS for these cluster IPs. The user must create a service
with the apiserver API to configure the proxy.`,
      Run: func(cmd *cobra.Command, args []string) {
         //If the version flag is set, print the kube-proxy version and exit
         verflag.PrintAndExitIfRequested()
         //Print all kube-proxy flags and their values (unset flags show their defaults)
         utilflag.PrintFlags(cmd.Flags())

         //Perform OS-specific initialization
         if err := initForOS(opts.WindowsService); err != nil {
            klog.Fatalf("failed OS init: %v", err)
         }

         //Load values from the config file, set HostnameOverride, and set the FeatureGate defaults
         if err := opts.Complete(); err != nil {
            klog.Fatalf("failed complete: %v", err)
         }

         //Validate all of the configured values
         if err := opts.Validate(args); err != nil {
            klog.Fatalf("failed validate: %v", err)
         }

         //Call the Options' Run method
         klog.Fatal(opts.Run())
      },
   }

   var err error
   //Apply defaults to some parameters; see k8s.io/kubernetes/pkg/proxy/apis/config/v1alpha1/defaults.go for the parameters and their defaults
   opts.config, err = opts.ApplyDefaults(opts.config)
   if err != nil {
      klog.Fatalf("unable to create flag defaults: %v", err)
   }

   //Bind the kube-proxy flags to the Options' KubeProxyConfiguration
   opts.AddFlags(cmd.Flags())

   cmd.MarkFlagFilename("config", "yaml", "yml", "json")

   return cmd
}
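
For reference, NewOptions is where healthzPort gets its 10256 default and CleanupIPVS is switched on. A paraphrased sketch of the 1.13-era constructor (the exact field set in the real tree may differ slightly):

// k8s.io/kubernetes/cmd/kube-proxy/app/server.go (paraphrased sketch)
func NewOptions() *Options {
   return &Options{
      config:      new(kubeproxyconfig.KubeProxyConfiguration),
      healthzPort: ports.ProxyHealthzPort, // 10256
      scheme:      scheme.Scheme,
      codecs:      scheme.Codecs,
      CleanupIPVS: true,
   }
}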

The Options.Run method

The Run function of the kube-proxy command ultimately calls Options.Run, which does just three things:

  • If --write-config-to is set, write all kube-proxy parameters and their values to that file, then return
  • Create a ProxyServer
  • Call the ProxyServer's Run method

So kube-proxy ultimately builds a ProxyServer and runs it.

k8s.io/kubernetes/cmd/kube-proxy/app/server.go:245
func (o *Options) Run() error {
   //If --write-config-to is set, write all kube-proxy parameters and their values to that file, then return
   if len(o.WriteConfigTo) > 0 {
      return o.writeConfigFile()
   }

   //Create a ProxyServer
   proxyServer, err := NewProxyServer(o)
   if err != nil {
      return err
   }

   //Call the ProxyServer's Run method
   return proxyServer.Run()
}

NewProxyServer: creating the ProxyServer

NewProxyServer simply calls newProxyServer to create a ProxyServer instance. The logic of newProxyServer:

  • If the kube-proxy config is nil, return an error immediately
  • Register the kube-proxy config under the name "kubeproxy.config.k8s.io" in configz
  • Check whether the IP given by --bind-address is IPv4 or IPv6 and set the protocol accordingly
  • Create the iptables and ipset interfaces, and the ipvs kernelHandler
  • Check whether ipvs mode can be used: it requires the ip_vs, ip_vs_rr, ip_vs_wrr and ip_vs_sh kernel modules, plus nf_conntrack_ipv4 on kernels older than 4.19 or nf_conntrack on 4.19 and newer
  • If --cleanup-iptables or --cleanup is true, build the ProxyServer directly from the iptables, ipset and ipvs interfaces and return. Note: --cleanup-iptables is deprecated; use --cleanup instead
  • Create the client that connects to the apiserver from kubeconfig and master, plus the eventClient and the event recorder
  • Create the healthz server from --healthz-bind-address and --healthz-port (default 0.0.0.0:10256) so that kube-proxy's health can be checked
  • Determine the proxy mode, i.e. the value of --proxy-mode; note that setting --proxy-mode=ipvs does not guarantee ipvs mode. If --proxy-mode is ipvs but the system cannot use ipvs, the mode falls back to iptables, and if iptables is also unavailable it falls back to userspace; likewise, --proxy-mode=iptables falls back to userspace when iptables is unavailable. If --proxy-mode is not set, the default is iptables (getProxyMode is sketched after the newProxyServer code below). The requirements are:
    • ipvs mode requires the ip_vs, ip_vs_rr, ip_vs_wrr and ip_vs_sh kernel modules, plus nf_conntrack_ipv4 on kernels older than 4.19 or nf_conntrack on 4.19 and newer
    • iptables mode requires iptables version 1.4.11 or later and /proc/sys/net/ipv4/conf/all/route_localnet to be present
  • Determine nodeIP, i.e. the value of --bind-address; if --bind-address is not specified, nodeIP is the node's actual IP address
  • Create the proxier for the chosen proxyMode and do the corresponding cleanup. The three modes are handled very similarly:
    • iptables mode
      • Call iptables.NewProxier to create proxierIPTables
      • Register metrics so kube-proxy can be monitored
      • Assign proxierIPTables to proxier, serviceEventHandler and endpointsEventHandler
      • Clean up the iptables rules and chains created by the userspace proxier
      • If the system supports ipvs mode, clean up the ipvs and iptables rules created by the ipvs proxier
    • ipvs mode
      • Call ipvs.NewProxier to create proxierIPVS
      • Register metrics so kube-proxy can be monitored
      • Assign proxierIPVS to proxier, serviceEventHandler and endpointsEventHandler
      • Clean up the iptables rules and chains created by the userspace and iptables proxiers
    • userspace mode
      • Call userspace.NewLoadBalancerRR to create the loadBalancer and assign it to endpointsEventHandler
      • Call userspace.NewProxier to create proxierUserspace
      • Assign proxierUserspace to proxier and serviceEventHandler
      • Clean up the iptables rules and chains created by the iptables proxier
      • If the system supports ipvs mode, clean up the ipvs and iptables rules created by the ipvs proxier
  • Return the ProxyServer
k8s.io/kubernetes/cmd/kube-proxy/app/server_others.go:55
func NewProxyServer(o *Options) (*ProxyServer, error) {
   return newProxyServer(o.config, o.CleanupAndExit, o.CleanupIPVS, o.scheme, o.master)
}

func newProxyServer(
   config *proxyconfigapi.KubeProxyConfiguration,
   cleanupAndExit bool,
   cleanupIPVS bool,
   scheme *runtime.Scheme,
   master string) (*ProxyServer, error) {

   //If the kube-proxy config is nil, return an error immediately
   if config == nil {
      return nil, errors.New("config is required")
   }

   //Register the kube-proxy config under the name "kubeproxy.config.k8s.io" in configz
   if c, err := configz.New(proxyconfigapi.GroupName); err == nil {
      c.Set(config)
   } else {
      return nil, fmt.Errorf("unable to register configz: %s", err)
   }

   //Check whether --bind-address is IPv4 or IPv6 and set the protocol accordingly
   protocol := utiliptables.ProtocolIpv4
   if net.ParseIP(config.BindAddress).To4() == nil {
      klog.V(0).Infof("IPv6 bind address (%s), assume IPv6 operation", config.BindAddress)
      protocol = utiliptables.ProtocolIpv6
   }

   var iptInterface utiliptables.Interface
   var ipvsInterface utilipvs.Interface
   var kernelHandler ipvs.KernelHandler
   var ipsetInterface utilipset.Interface
   var dbus utildbus.Interface

   // Create a iptables utils.
   execer := exec.New()

   //Create the iptables and ipset interfaces, and the ipvs kernelHandler
   dbus = utildbus.New()
   iptInterface = utiliptables.New(execer, dbus, protocol)
   kernelHandler = ipvs.NewLinuxKernelHandler()
   ipsetInterface = utilipset.New(execer)

   //Check whether ipvs mode can be used: requires ip_vs, ip_vs_rr, ip_vs_wrr and ip_vs_sh, plus nf_conntrack_ipv4 on kernels older than 4.19 or nf_conntrack on 4.19 and newer
   canUseIPVS, _ := ipvs.CanUseIPVSProxier(kernelHandler, ipsetInterface)

   //If ipvs mode can be used, create the ipvs interface
   if canUseIPVS {
      ipvsInterface = utilipvs.New(execer)
   }

   //If --cleanup-iptables or --cleanup is true, build the ProxyServer directly from the iptables, ipset and ipvs interfaces and return. Note: --cleanup-iptables is deprecated; use --cleanup instead
   // We omit creation of pretty much everything if we run in cleanup mode
   if cleanupAndExit {
      return &ProxyServer{
         execer:         execer,
         IptInterface:   iptInterface,
         IpvsInterface:  ipvsInterface,
         IpsetInterface: ipsetInterface,
         CleanupAndExit: cleanupAndExit,
      }, nil
   }

   //Create the client that connects to the apiserver from kubeconfig and master, plus the eventClient
   client, eventClient, err := createClients(config.ClientConnection, master)
   if err != nil {
      return nil, err
   }

   // Create event recorder
   hostname, err := utilnode.GetHostname(config.HostnameOverride)
   if err != nil {
      return nil, err
   }
   eventBroadcaster := record.NewBroadcaster()
   recorder := eventBroadcaster.NewRecorder(scheme, v1.EventSource{Component: "kube-proxy", Host: hostname})

   nodeRef := &v1.ObjectReference{
      Kind:      "Node",
      Name:      hostname,
      UID:       types.UID(hostname),
      Namespace: "",
   }

   //Create the healthz server from --healthz-bind-address and --healthz-port (default 0.0.0.0:10256) so that kube-proxy's health can be checked
   var healthzServer *healthcheck.HealthzServer
   var healthzUpdater healthcheck.HealthzUpdater
   if len(config.HealthzBindAddress) > 0 {
      healthzServer = healthcheck.NewDefaultHealthzServer(config.HealthzBindAddress, 2*config.IPTables.SyncPeriod.Duration, recorder, nodeRef)
      healthzUpdater = healthzServer
   }

   var proxier proxy.ProxyProvider
   var serviceEventHandler proxyconfig.ServiceHandler
   var endpointsEventHandler proxyconfig.EndpointsHandler

   //Determine the proxy mode, i.e. the value of --proxy-mode. Setting --proxy-mode=ipvs does not
   //guarantee ipvs mode: if the system cannot use ipvs, the mode falls back to iptables, and if
   //iptables is also unavailable it falls back to userspace; likewise iptables falls back to
   //userspace when unavailable. If --proxy-mode is not set, the default is iptables.
   //  1. ipvs mode requires ip_vs, ip_vs_rr, ip_vs_wrr and ip_vs_sh, plus nf_conntrack_ipv4 on
   //     kernels older than 4.19 or nf_conntrack on 4.19 and newer
   //  2. iptables mode requires iptables >= 1.4.11 and /proc/sys/net/ipv4/conf/all/route_localnet to be present
   proxyMode := getProxyMode(string(config.Mode), iptInterface, kernelHandler, ipsetInterface, iptables.LinuxKernelCompatTester{})
   //Determine nodeIP, i.e. the value of --bind-address; if --bind-address is not specified, nodeIP is the node's actual IP address
   nodeIP := net.ParseIP(config.BindAddress)
   if nodeIP.IsUnspecified() {
      nodeIP = utilnode.GetNodeIP(client, hostname)
   }
   //Create the proxier for the chosen proxyMode and do the corresponding cleanup:
   //  1. iptables mode: create proxierIPTables with iptables.NewProxier, register metrics, assign it to
   //     proxier/serviceEventHandler/endpointsEventHandler, clean up the userspace proxier's iptables
   //     rules and chains, and, if ipvs is supported, clean up the ipvs proxier's ipvs and iptables rules
   //  2. ipvs mode: create proxierIPVS with ipvs.NewProxier, register metrics, assign it to
   //     proxier/serviceEventHandler/endpointsEventHandler, and clean up the iptables rules and chains
   //     created by the userspace and iptables proxiers
   //  3. userspace mode: create the loadBalancer with userspace.NewLoadBalancerRR and assign it to
   //     endpointsEventHandler, create proxierUserspace with userspace.NewProxier and assign it to
   //     proxier/serviceEventHandler, clean up the iptables proxier's rules and chains, and, if ipvs is
   //     supported, clean up the ipvs proxier's ipvs and iptables rules
   if proxyMode == proxyModeIPTables {
      klog.V(0).Info("Using iptables Proxier.")
      if config.IPTables.MasqueradeBit == nil {
         // MasqueradeBit must be specified or defaulted.
         return nil, fmt.Errorf("unable to read IPTables MasqueradeBit from config")
      }

      // TODO this has side effects that should only happen when Run() is invoked.
      proxierIPTables, err := iptables.NewProxier(
         iptInterface,
         utilsysctl.New(),
         execer,
         config.IPTables.SyncPeriod.Duration,
         config.IPTables.MinSyncPeriod.Duration,
         config.IPTables.MasqueradeAll,
         int(*config.IPTables.MasqueradeBit),
         config.ClusterCIDR,
         hostname,
         nodeIP,
         recorder,
         healthzUpdater,
         config.NodePortAddresses,
      )
      if err != nil {
         return nil, fmt.Errorf("unable to create proxier: %v", err)
      }
      metrics.RegisterMetrics()
      proxier = proxierIPTables
      serviceEventHandler = proxierIPTables
      endpointsEventHandler = proxierIPTables
      // No turning back. Remove artifacts that might still exist from the userspace Proxier.
      klog.V(0).Info("Tearing down inactive rules.")
      // TODO this has side effects that should only happen when Run() is invoked.
      userspace.CleanupLeftovers(iptInterface)
      // IPVS Proxier will generate some iptables rules, need to clean them before switching to other proxy mode.
      // Besides, ipvs proxier will create some ipvs rules as well.  Because there is no way to tell if a given
      // ipvs rule is created by IPVS proxier or not.  Users should explicitly specify `--clean-ipvs=true` to flush
      // all ipvs rules when kube-proxy start up.  Users do this operation should be with caution.
      if canUseIPVS {
         ipvs.CleanupLeftovers(ipvsInterface, iptInterface, ipsetInterface, cleanupIPVS)
      }
   } else if proxyMode == proxyModeIPVS {
      klog.V(0).Info("Using ipvs Proxier.")
      proxierIPVS, err := ipvs.NewProxier(
         iptInterface,
         ipvsInterface,
         ipsetInterface,
         utilsysctl.New(),
         execer,
         config.IPVS.SyncPeriod.Duration,
         config.IPVS.MinSyncPeriod.Duration,
         config.IPVS.ExcludeCIDRs,
         config.IPTables.MasqueradeAll,
         int(*config.IPTables.MasqueradeBit),
         config.ClusterCIDR,
         hostname,
         nodeIP,
         recorder,
         healthzServer,
         config.IPVS.Scheduler,
         config.NodePortAddresses,
      )
      if err != nil {
         return nil, fmt.Errorf("unable to create proxier: %v", err)
      }
      metrics.RegisterMetrics()
      proxier = proxierIPVS
      serviceEventHandler = proxierIPVS
      endpointsEventHandler = proxierIPVS
      klog.V(0).Info("Tearing down inactive rules.")
      // TODO this has side effects that should only happen when Run() is invoked.
      userspace.CleanupLeftovers(iptInterface)
      iptables.CleanupLeftovers(iptInterface)
   } else {
      klog.V(0).Info("Using userspace Proxier.")
      // This is a proxy.LoadBalancer which NewProxier needs but has methods we don't need for
      // our config.EndpointsConfigHandler.
      loadBalancer := userspace.NewLoadBalancerRR()
      // set EndpointsConfigHandler to our loadBalancer
      endpointsEventHandler = loadBalancer

      // TODO this has side effects that should only happen when Run() is invoked.
      proxierUserspace, err := userspace.NewProxier(
         loadBalancer,
         net.ParseIP(config.BindAddress),
         iptInterface,
         execer,
         *utilnet.ParsePortRangeOrDie(config.PortRange),
         config.IPTables.SyncPeriod.Duration,
         config.IPTables.MinSyncPeriod.Duration,
         config.UDPIdleTimeout.Duration,
         config.NodePortAddresses,
      )
      if err != nil {
         return nil, fmt.Errorf("unable to create proxier: %v", err)
      }
      serviceEventHandler = proxierUserspace
      proxier = proxierUserspace

      // Remove artifacts from the iptables and ipvs Proxier, if not on Windows.
      klog.V(0).Info("Tearing down inactive rules.")
      // TODO this has side effects that should only happen when Run() is invoked.
      iptables.CleanupLeftovers(iptInterface)
      // IPVS Proxier will generate some iptables rules, need to clean them before switching to other proxy mode.
      // Besides, ipvs proxier will create some ipvs rules as well.  Because there is no way to tell if a given
      // ipvs rule is created by IPVS proxier or not.  Users should explicitly specify `--clean-ipvs=true` to flush
      // all ipvs rules when kube-proxy start up.  Users do this operation should be with caution.
      if canUseIPVS {
         ipvs.CleanupLeftovers(ipvsInterface, iptInterface, ipsetInterface, cleanupIPVS)
      }
   }

   iptInterface.AddReloadFunc(proxier.Sync)

   //return ProxyServer
   return &ProxyServer{
      Client:                 client,
      EventClient:            eventClient,
      IptInterface:           iptInterface,
      IpvsInterface:          ipvsInterface,
      IpsetInterface:         ipsetInterface,
      execer:                 execer,
      Proxier:                proxier,
      Broadcaster:            eventBroadcaster,
      Recorder:               recorder,
      ConntrackConfiguration: config.Conntrack,
      Conntracker:            &realConntracker{},
      ProxyMode:              proxyMode,
      NodeRef:                nodeRef,
      MetricsBindAddress:     config.MetricsBindAddress,
      EnableProfiling:        config.EnableProfiling,
      OOMScoreAdj:            config.OOMScoreAdj,
      ResourceContainer:      config.ResourceContainer,
      ConfigSyncPeriod:       config.ConfigSyncPeriod.Duration,
      ServiceEventHandler:    serviceEventHandler,
      EndpointsEventHandler:  endpointsEventHandler,
      HealthzServer:          healthzServer,
   }, nil
}
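
The proxyMode used above comes from getProxyMode in the same file. A paraphrased sketch of its fallback chain in the 1.13 tree (the helpers tryIPTablesProxy/tryIPVSProxy are named as in that tree; treat this as a sketch, not a verbatim copy):

// k8s.io/kubernetes/cmd/kube-proxy/app/server_others.go (paraphrased sketch)
func getProxyMode(proxyMode string, iptver iptables.IPTablesVersioner, khandle ipvs.KernelHandler, ipsetver ipvs.IPSetVersioner, kcompat iptables.KernelCompatTester) string {
   switch proxyMode {
   case proxyModeUserspace:
      return proxyModeUserspace
   case proxyModeIPTables:
      // Falls back to userspace if the iptables version or kernel support is insufficient.
      return tryIPTablesProxy(iptver, kcompat)
   case proxyModeIPVS:
      // Falls back to iptables (and then userspace) if the required ipvs kernel modules are missing.
      return tryIPVSProxy(iptver, khandle, ipsetver, kcompat)
   }
   klog.Warningf("Flag proxy-mode=%q unknown, assuming iptables proxy", proxyMode)
   return tryIPTablesProxy(iptver, kcompat)
}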

The ProxyServer.Run method

Finally, let's look at what ProxyServer's Run method does:

  • If --cleanup-iptables or --cleanup is true, simply remove the iptables & ipvs rules created by kube-proxy and return
  • If --oom-score-adj is set, apply the OOM score
  • If --resource-container is set, create a resource-only container named after that value and run inside it. Note: --resource-container is deprecated and will be removed
  • Start recording kube-proxy events
  • Start the healthzServer at the /healthz path, so kube-proxy's health can be checked
  • Start the metrics server: /proxyMode reports kube-proxy's proxy mode and /metrics exposes the other monitoring data, so a monitoring system (e.g. Prometheus) can scrape it
  • Apply the --conntrack-xxxx parameters to the corresponding nf_conntrack settings on the system (getConntrackMax is sketched after the Run code below)
  • Create serviceConfig and register the ProxyServer's ServiceEventHandler (i.e. the proxier for the active proxy mode) as its event handler; serviceConfig watches service add/update/delete events. The registered AddFunc ends up calling the proxier's OnServiceAdd, UpdateFunc calls OnServiceUpdate, and DeleteFunc calls OnServiceDelete. serviceConfig's Run method then waits for the services to sync and calls the proxier's OnServiceSynced (the handler interfaces are sketched right after this list)
  • Create endpointsConfig and register the ProxyServer's EndpointsEventHandler (i.e. the proxier for the active proxy mode) as its event handler; endpointsConfig watches endpoints add/update/delete events. The registered AddFunc ends up calling the proxier's OnEndpointsAdd, UpdateFunc calls OnEndpointsUpdate, and DeleteFunc calls OnEndpointsDelete. endpointsConfig's Run method then waits for the endpoints to sync and calls the proxier's OnEndpointsSynced
  • Write a "Starting kube-proxy" event to the kube-apiserver
  • Call the proxier's SyncLoop method
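
The ServiceEventHandler and EndpointsEventHandler registered below are the interfaces defined in pkg/proxy/config/config.go (see the directory tree earlier). In the 1.13 tree they look roughly like this (paraphrased sketch, not a verbatim copy):

// k8s.io/kubernetes/pkg/proxy/config/config.go (paraphrased sketch)
type ServiceHandler interface {
   OnServiceAdd(service *v1.Service)
   OnServiceUpdate(oldService, service *v1.Service)
   OnServiceDelete(service *v1.Service)
   // OnServiceSynced is called once the initial event handlers have all run
   // and the state is fully propagated to the local cache.
   OnServiceSynced()
}

type EndpointsHandler interface {
   OnEndpointsAdd(endpoints *v1.Endpoints)
   OnEndpointsUpdate(oldEndpoints, endpoints *v1.Endpoints)
   OnEndpointsDelete(endpoints *v1.Endpoints)
   OnEndpointsSynced()
}

Every proxier (iptables, ipvs, and userspace via its load balancer) implements these methods, which is why newProxyServer can assign the same proxier object to serviceEventHandler and endpointsEventHandler. Now the Run method itself:
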
func (s *ProxyServer) Run() error {
   // To help debugging, immediately log version
   klog.Infof("Version: %+v", version.Get())
   //If --cleanup-iptables or --cleanup is true, simply remove the iptables & ipvs rules created by kube-proxy and return
   // remove iptables rules and exit
   if s.CleanupAndExit {
      encounteredError := userspace.CleanupLeftovers(s.IptInterface)
      encounteredError = iptables.CleanupLeftovers(s.IptInterface) || encounteredError
      encounteredError = ipvs.CleanupLeftovers(s.IpvsInterface, s.IptInterface, s.IpsetInterface, s.CleanupIPVS) || encounteredError
      if encounteredError {
         return errors.New("encountered an error while tearing down rules.")
      }
      return nil
   }

   //If --oom-score-adj is set, apply the OOM score
   // TODO(vmarmol): Use container config for this.
   var oomAdjuster *oom.OOMAdjuster
   if s.OOMScoreAdj != nil {
      oomAdjuster = oom.NewOOMAdjuster()
      if err := oomAdjuster.ApplyOOMScoreAdj(0, int(*s.OOMScoreAdj)); err != nil {
         klog.V(2).Info(err)
      }
   }

   //If --resource-container is set, create a resource-only container named after that value and run inside it. Note: --resource-container is deprecated and will be removed
   if len(s.ResourceContainer) != 0 {
      // Run in its own container.
      if err := resourcecontainer.RunInResourceContainer(s.ResourceContainer); err != nil {
         klog.Warningf("Failed to start in resource-only container %q: %v", s.ResourceContainer, err)
      } else {
         klog.V(2).Infof("Running in resource-only container %q", s.ResourceContainer)
      }
   }

   //Start recording kube-proxy events
   if s.Broadcaster != nil && s.EventClient != nil {
      s.Broadcaster.StartRecordingToSink(&v1core.EventSinkImpl{Interface: s.EventClient.Events("")})
   }

   //Start the healthzServer at the /healthz path, so kube-proxy's health can be checked
   // Start up a healthz server if requested
   if s.HealthzServer != nil {
      s.HealthzServer.Run()
   }

   //Start the metrics server: /proxyMode reports kube-proxy's proxy mode and /metrics exposes the other monitoring data, so a monitoring system (e.g. Prometheus) can scrape it
   // Start up a metrics server if requested
   if len(s.MetricsBindAddress) > 0 {
      mux := mux.NewPathRecorderMux("kube-proxy")
      healthz.InstallHandler(mux)
      mux.HandleFunc("/proxyMode", func(w http.ResponseWriter, r *http.Request) {
         fmt.Fprintf(w, "%s", s.ProxyMode)
      })
      mux.Handle("/metrics", prometheus.Handler())
      if s.EnableProfiling {
         routes.Profiling{}.Install(mux)
      }
      configz.InstallHandler(mux)
      go wait.Until(func() {
         err := http.ListenAndServe(s.MetricsBindAddress, mux)
         if err != nil {
            utilruntime.HandleError(fmt.Errorf("starting metrics server failed: %v", err))
         }
      }, 5*time.Second, wait.NeverStop)
   }

   //Apply the --conntrack-xxxx parameters to the corresponding nf_conntrack settings on the system
   // Tune conntrack, if requested
   // Conntracker is always nil for windows
   if s.Conntracker != nil {
      max, err := getConntrackMax(s.ConntrackConfiguration)
      if err != nil {
         return err
      }
      if max > 0 {
         err := s.Conntracker.SetMax(max)
         if err != nil {
            if err != readOnlySysFSError {
               return err
            }
            // readOnlySysFSError is caused by a known docker issue (https://github.com/docker/docker/issues/24000),
            // the only remediation we know is to restart the docker daemon.
            // Here we'll send an node event with specific reason and message, the
            // administrator should decide whether and how to handle this issue,
            // whether to drain the node and restart docker.
            // TODO(random-liu): Remove this when the docker bug is fixed.
            const message = "DOCKER RESTART NEEDED (docker issue #24000): /sys is read-only: " +
               "cannot modify conntrack limits, problems may arise later."
            s.Recorder.Eventf(s.NodeRef, api.EventTypeWarning, err.Error(), message)
         }
      }

      if s.ConntrackConfiguration.TCPEstablishedTimeout != nil && s.ConntrackConfiguration.TCPEstablishedTimeout.Duration > 0 {
         timeout := int(s.ConntrackConfiguration.TCPEstablishedTimeout.Duration / time.Second)
         if err := s.Conntracker.SetTCPEstablishedTimeout(timeout); err != nil {
            return err
         }
      }

      if s.ConntrackConfiguration.TCPCloseWaitTimeout != nil && s.ConntrackConfiguration.TCPCloseWaitTimeout.Duration > 0 {
         timeout := int(s.ConntrackConfiguration.TCPCloseWaitTimeout.Duration / time.Second)
         if err := s.Conntracker.SetTCPCloseWaitTimeout(timeout); err != nil {
            return err
         }
      }
   }

   informerFactory := informers.NewSharedInformerFactory(s.Client, s.ConfigSyncPeriod)

   //Create serviceConfig and register the ProxyServer's ServiceEventHandler (the proxier for the
   //active proxy mode) as its event handler; serviceConfig watches service add/update/delete events,
   //which end up in the proxier's OnServiceAdd/OnServiceUpdate/OnServiceDelete. serviceConfig's Run
   //waits for the services to sync and then calls the proxier's OnServiceSynced.
   // Create configs (i.e. Watches for Services and Endpoints)
   // Note: RegisterHandler() calls need to happen before creation of Sources because sources
   // only notify on changes, and the initial update (on process start) may be lost if no handlers
   // are registered yet.
   serviceConfig := config.NewServiceConfig(informerFactory.Core().V1().Services(), s.ConfigSyncPeriod)
   serviceConfig.RegisterEventHandler(s.ServiceEventHandler)
   go serviceConfig.Run(wait.NeverStop)

   //Create endpointsConfig and register the ProxyServer's EndpointsEventHandler (the proxier for the
   //active proxy mode) as its event handler; endpointsConfig watches endpoints add/update/delete events,
   //which end up in the proxier's OnEndpointsAdd/OnEndpointsUpdate/OnEndpointsDelete. endpointsConfig's
   //Run waits for the endpoints to sync and then calls the proxier's OnEndpointsSynced.
   endpointsConfig := config.NewEndpointsConfig(informerFactory.Core().V1().Endpoints(), s.ConfigSyncPeriod)
   endpointsConfig.RegisterEventHandler(s.EndpointsEventHandler)
   go endpointsConfig.Run(wait.NeverStop)

   // This has to start after the calls to NewServiceConfig and NewEndpointsConfig because those
   // functions must configure their shared informer event handlers first.
   go informerFactory.Start(wait.NeverStop)

   //Write a "Starting kube-proxy" event to the kube-apiserver
   // Birth Cry after the birth is successful
   s.birthCry()

   //Call the proxier's SyncLoop method
   // Just loop forever for now...
   s.Proxier.SyncLoop()
   return nil
}
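
The Run method above calls getConntrackMax to decide the nf_conntrack table size. A hedged sketch of that computation, assuming the 1.13-era KubeProxyConntrackConfiguration fields MaxPerCore and Min (paraphrased, not a verbatim copy):

// k8s.io/kubernetes/cmd/kube-proxy/app/server.go (paraphrased sketch)
func getConntrackMax(config kubeproxyconfig.KubeProxyConntrackConfiguration) (int, error) {
   if config.MaxPerCore != nil && *config.MaxPerCore > 0 {
      floor := 0
      if config.Min != nil {
         floor = int(*config.Min)
      }
      // Scale --conntrack-max-per-core by the number of CPUs, but never go below --conntrack-min.
      scaled := int(*config.MaxPerCore) * goruntime.NumCPU()
      if scaled > floor {
         return scaled, nil
      }
      return floor, nil
   }
   // 0 means "leave the system setting alone"; Run only calls Conntracker.SetMax when the result is > 0.
   return 0, nil
}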

Summary

kube-proxy watches the cluster's service/endpoints resources and, depending on the proxy mode, has the proxier create the iptables/ipvs rules. kube-proxy currently ships three proxiers: userspace, iptables and ipvs. userspace was the default up to and including v1.1; v1.1 added the iptables mode, which replaced userspace as the default in v1.2 and has remained so since; v1.8 added the ipvs mode, which has clear advantages over iptables in both performance and the variety of load-balancing algorithms, and will likely become the default before long. How the iptables and ipvs modes actually build their rules is not covered here; I'll read and analyze that code in detail in later posts. The userspace mode is essentially deprecated, so I won't dig into it further.

Reposted from: https://my.oschina.net/u/3797264/blog/3023449
