[k8s source code analysis][kube-scheduler] scheduler/algorithmprovider: registering the default-scheduler

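1. Preface

If you repost this article, please credit the original source and respect the author's work!

This article analyzes how the default scheduler is registered and how it is used. It mainly involves two files: pkg/scheduler/factory/plugins.go and pkg/scheduler/algorithmprovider/defaults/defaults.go
Source: https://github.com/nicktming/kubernetes
Branch: tming-v1.13 (based on v1.13)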

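2. Registering the default scheduler

You have probably come across a file like the one below, whether or not you fully understood it. The rest of this article should help make sense of it.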

{
    "kind" : "Policy",
    "apiVersion" : "v1",
    "predicates" : [
      {"name" : "PodFitsHostPorts"},
      {"name" : "PodFitsResources"},
      {"name" : "NoDiskConflict"},
      {"name" : "MatchNodeSelector"},
      {"name" : "HostName"}
    ],
    "priorities" : [
      {"name" : "LeastRequestedPriority", "weight" : 1},
      {"name" : "BalancedResourceAllocation", "weight" : 1},
      {"name" : "ServiceSpreadingPriority", "weight" : 1},
      {"name" : "EqualPriority", "weight" : 1}
    ]
}

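When kube-scheduler has to schedule a pod and several nodes are available, how does it decide which node the pod goes to?
As is well known, kube-scheduler first runs the predicates to filter out the nodes that can run the pod (some nodes cannot, for example because of insufficient resources or node affinity). It then runs the priorities over the nodes that passed and picks the one with the highest score as the node the pod will run on.

The predicates section lists the predicate functions a node must pass, such as PodFitsHostPorts and PodFitsResources in the file above.
In the priorities section each function has a weight, and the pod's score on a node is the weighted sum of the scores returned by these functions.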
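The filter-then-score flow described above can be sketched in a few lines of Go. This is only an illustration, not the scheduler's code: the node, predicate and priority types and the pickNode function are hypothetical, scores are assumed to be non-negative, and ties go to the first node seen.

```go
package main

import "fmt"

// node is a hypothetical, simplified stand-in for a cluster node.
type node struct {
	name    string
	freeCPU int
}

// predicate reports whether a pod requesting cpu can run on n.
type predicate func(n node, cpu int) bool

// priority scores a node (higher is better) and carries a weight.
type priority struct {
	weight int
	score  func(n node) int
}

// pickNode keeps only the nodes that pass every predicate, then returns the
// name of the node with the highest weighted score ("" if no node passes).
func pickNode(nodes []node, cpu int, preds []predicate, pris []priority) string {
	best, bestScore := "", -1
	for _, n := range nodes {
		fits := true
		for _, p := range preds {
			if !p(n, cpu) {
				fits = false
				break
			}
		}
		if !fits {
			continue
		}
		total := 0
		for _, p := range pris {
			total += p.weight * p.score(n) // weighted sum over all priorities
		}
		if total > bestScore {
			best, bestScore = n.name, total
		}
	}
	return best
}

func main() {
	nodes := []node{{"n1", 1}, {"n2", 4}, {"n3", 8}}
	preds := []predicate{func(n node, cpu int) bool { return n.freeCPU >= cpu }}
	pris := []priority{{weight: 1, score: func(n node) int { return n.freeCPU }}}
	fmt.Println(pickNode(nodes, 2, preds, pris)) // n3
}
```

Here n1 is filtered out by the predicate, and n3 wins over n2 because its weighted score is higher.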

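Before looking at how the default scheduler is registered, we need to introduce pkg/scheduler/factory/plugins.go, because that file exists precisely to support scheduler registration.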

3. pkg/scheduler/factory/plugins.go

type PluginFactoryArgs struct {
    PodLister                      algorithm.PodLister
    ServiceLister                  algorithm.ServiceLister
    ControllerLister               algorithm.ControllerLister
    ReplicaSetLister               algorithm.ReplicaSetLister
    StatefulSetLister              algorithm.StatefulSetLister
    NodeLister                     algorithm.NodeLister
    PDBLister                      algorithm.PDBLister
    NodeInfo                       predicates.NodeInfo
    PVInfo                         predicates.PersistentVolumeInfo
    PVCInfo                        predicates.PersistentVolumeClaimInfo
    StorageClassInfo               predicates.StorageClassInfo
    VolumeBinder                   *volumebinder.VolumeBinder
    HardPodAffinitySymmetricWeight int32
}
type FitPredicateFactory func(PluginFactoryArgs) algorithm.FitPredicate
type PriorityFunctionFactory func(PluginFactoryArgs) algorithm.PriorityFunction
type PriorityFunctionFactory2 func(PluginFactoryArgs) (algorithm.PriorityMapFunction, algorithm.PriorityReduceFunction)

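FitPredicateFactory: builds a predicate function from PluginFactoryArgs
PriorityFunctionFactory: builds a priority function from PluginFactoryArgs (old style)
PriorityFunctionFactory2: builds a priority function from PluginFactoryArgs (new style, returned as a map function plus a reduce function)

3.1 Basic structure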

type PriorityConfigFactory struct {
    Function          PriorityFunctionFactory
    MapReduceFunction PriorityFunctionFactory2
    Weight            int
}

var (
    schedulerFactoryMutex sync.Mutex

    // maps that hold registered algorithm types
    fitPredicateMap        = make(map[string]FitPredicateFactory)
    mandatoryFitPredicates = sets.NewString()
    priorityFunctionMap    = make(map[string]PriorityConfigFactory)
    algorithmProviderMap   = make(map[string]AlgorithmProviderConfig)

    // Registered metadata producers
    priorityMetadataProducer  PriorityMetadataProducerFactory
    predicateMetadataProducer PredicateMetadataProducerFactory
)

const (
    // DefaultProvider defines the default algorithm provider name.
    DefaultProvider = "DefaultProvider"
)
type AlgorithmProviderConfig struct {
    FitPredicateKeys     sets.String
    PriorityFunctionKeys sets.String
}

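As shown above, the default scheduler (algorithm provider) is named DefaultProvider.
fitPredicateMap: a global variable that maps a predicate name to the FitPredicateFactory that builds that predicate function.
priorityFunctionMap: also a global variable; maps a priority name to the PriorityConfigFactory that builds that priority function.
algorithmProviderMap: also a global variable; maps a scheduler name (for example DefaultProvider) to the names of all its predicates and priorities (AlgorithmProviderConfig holds both sets of names).
mandatoryFitPredicates: a global variable that holds the names of the mandatory predicates.

3.2 Registering predicates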

// pkg/scheduler/factory/plugins.go

func RegisterFitPredicate(name string, predicate algorithm.FitPredicate) string {
    return RegisterFitPredicateFactory(name, func(PluginFactoryArgs) algorithm.FitPredicate { return predicate })
}
// Use a regular expression to check that the algorithm name is valid
var validName = regexp.MustCompile("^[a-zA-Z0-9]([-a-zA-Z0-9]*[a-zA-Z0-9])$")

func validateAlgorithmNameOrDie(name string) {
    if !validName.MatchString(name) {
        klog.Fatalf("Algorithm name %v does not match the name validation regexp \"%v\".", name, validName)
    }
}
func RegisterFitPredicateFactory(name string, predicateFactory FitPredicateFactory) string {
    schedulerFactoryMutex.Lock()
    defer schedulerFactoryMutex.Unlock()
    validateAlgorithmNameOrDie(name)
    fitPredicateMap[name] = predicateFactory
    return name
}

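This is straightforward: RegisterFitPredicate takes a predicate name and a predicate function, and registers a FitPredicateFactory that simply returns the predicate that was passed in. It then returns name.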
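To get a feel for which names the validName pattern accepts, here is a small standalone check. The regular expression is copied from the snippet above; isValidName is a hypothetical helper that returns the match result instead of calling klog.Fatalf, and the sample names are made up for illustration.

```go
package main

import (
	"fmt"
	"regexp"
)

// Same pattern as validName in plugins.go.
var validName = regexp.MustCompile("^[a-zA-Z0-9]([-a-zA-Z0-9]*[a-zA-Z0-9])$")

// isValidName is a hypothetical helper: it reports the match result
// rather than terminating the process on a bad name.
func isValidName(name string) bool {
	return validName.MatchString(name)
}

func main() {
	names := []string{"PodFitsResources", "my-predicate", "a", "-bad", "bad-", "no_underscore"}
	for _, name := range names {
		fmt.Printf("%q valid=%v\n", name, isValidName(name))
	}
}
```

Note that the group after the first character is not optional, so as written the pattern needs at least two characters: a single-character name such as "a" does not match. Names must also start and end with a letter or digit, and underscores are rejected.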

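Next is registering your own FitPredicateFactory with RegisterFitPredicateFactory. It does nothing extra: it just puts the factory into the map and returns name. RegisterMandatoryFitPredicate does one more step, which is adding name to mandatoryFitPredicates.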

func RegisterFitPredicateFactory(name string, predicateFactory FitPredicateFactory) string {
    schedulerFactoryMutex.Lock()
    defer schedulerFactoryMutex.Unlock()
    validateAlgorithmNameOrDie(name)
    fitPredicateMap[name] = predicateFactory
    return name
}
func RegisterMandatoryFitPredicate(name string, predicate algorithm.FitPredicate) string {
    schedulerFactoryMutex.Lock()
    defer schedulerFactoryMutex.Unlock()
    validateAlgorithmNameOrDie(name)
    fitPredicateMap[name] = func(PluginFactoryArgs) algorithm.FitPredicate { return predicate }
    mandatoryFitPredicates.Insert(name)
    return name
}
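The three registration functions follow one pattern: a mutex-protected global map, plus an extra set for mandatory names. The sketch below reproduces that pattern with simplified stand-in types; fitPredicate, factoryArgs, register, registerFactory, registerMandatory and registeredNames are all hypothetical names, not the scheduler's API.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// Simplified stand-ins for algorithm.FitPredicate, PluginFactoryArgs
// and FitPredicateFactory.
type fitPredicate func(pod, node string) bool
type factoryArgs struct{}
type fitPredicateFactory func(factoryArgs) fitPredicate

var (
	mu         sync.Mutex
	predicates = map[string]fitPredicateFactory{}
	mandatory  = map[string]bool{}
)

// registerFactory stores a factory under name and returns name.
func registerFactory(name string, f fitPredicateFactory) string {
	mu.Lock()
	defer mu.Unlock()
	predicates[name] = f
	return name
}

// register wraps a ready-made predicate in a factory that ignores its args.
func register(name string, p fitPredicate) string {
	return registerFactory(name, func(factoryArgs) fitPredicate { return p })
}

// registerMandatory also records name in the mandatory set. It writes to the
// map directly because it already holds mu, and sync.Mutex is not reentrant.
func registerMandatory(name string, p fitPredicate) string {
	mu.Lock()
	defer mu.Unlock()
	predicates[name] = func(factoryArgs) fitPredicate { return p }
	mandatory[name] = true
	return name
}

// registeredNames returns all registered names in sorted order.
func registeredNames() []string {
	mu.Lock()
	defer mu.Unlock()
	names := make([]string, 0, len(predicates))
	for name := range predicates {
		names = append(names, name)
	}
	sort.Strings(names)
	return names
}

func main() {
	register("AlwaysFits", func(pod, node string) bool { return true })
	registerMandatory("NodeNamed", func(pod, node string) bool { return node != "" })
	fmt.Println(registeredNames(), mandatory["NodeNamed"], mandatory["AlwaysFits"])
}
```

Returning name from every registration function is what lets callers collect the registered names into a set in a single expression.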

Next, let's see how the defaultPredicates function in pkg/scheduler/algorithmprovider/defaults/defaults.go registers predicates.

// pkg/scheduler/algorithmprovider/defaults/defaults.go

func defaultPredicates() sets.String {
    return sets.NewString(
        factory.RegisterFitPredicateFactory(
            predicates.NoVolumeZoneConflictPred,
            func(args factory.PluginFactoryArgs) algorithm.FitPredicate {
                return predicates.NewVolumeZonePredicate(args.PVInfo, args.PVCInfo, args.StorageClassInfo)
            },
        ),
        ...
        factory.RegisterMandatoryFitPredicate(predicates.CheckNodeConditionPred, predicates.CheckNodeConditionPredicate),
        factory.RegisterFitPredicate(predicates.PodToleratesNodeTaintsPred, predicates.PodToleratesNodeTaints),
        ...
    )
}

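As you can see, it calls RegisterFitPredicateFactory, RegisterMandatoryFitPredicate and RegisterFitPredicate. After these calls the global fitPredicateMap holds every registered predicate name together with the predicateFactory that builds the corresponding predicate function.
The return value of defaultPredicates() is the set of predicate names it registered, all of which are keys in fitPredicateMap.

3.3 Registering priorities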

// pkg/scheduler/factory/plugins.go
func RegisterPriorityConfigFactory(name string, pcf PriorityConfigFactory) string {
    schedulerFactoryMutex.Lock()
    defer schedulerFactoryMutex.Unlock()
    validateAlgorithmNameOrDie(name)
    priorityFunctionMap[name] = pcf
    return name
}
func RegisterPriorityFunction2(
    name string,
    mapFunction algorithm.PriorityMapFunction,
    reduceFunction algorithm.PriorityReduceFunction,
    weight int) string {
    return RegisterPriorityConfigFactory(name, PriorityConfigFactory{
        MapReduceFunction: func(PluginFactoryArgs) (algorithm.PriorityMapFunction, algorithm.PriorityReduceFunction) {
            return mapFunction, reduceFunction
        },
        Weight: weight,
    })
}

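RegisterPriorityFunction2 was introduced in a later version and takes a map function and a reduce function. To stay compatible with the older style, both styles are registered as a PriorityConfigFactory, which is what produces the priority function. The name is then returned.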
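The map/reduce split can be illustrated with a small sketch: the map function scores each node on its own, and an optional reduce function then adjusts the whole list of scores, for example by normalizing them. All types and functions here (hostPriority, mapFunc, reduceFunc, runPriority, normalizeTo10) are hypothetical simplifications, and the 0 to 10 range is an assumption made for the example.

```go
package main

import "fmt"

// hostPriority is a simplified stand-in for a per-node score.
type hostPriority struct {
	host  string
	score int
}

type mapFunc func(node string) hostPriority
type reduceFunc func(result []hostPriority)

// runPriority applies mapFn to every node, then reduceFn (if it is not nil)
// to the complete result list.
func runPriority(nodes []string, mapFn mapFunc, reduceFn reduceFunc) []hostPriority {
	result := make([]hostPriority, 0, len(nodes))
	for _, n := range nodes {
		result = append(result, mapFn(n))
	}
	if reduceFn != nil {
		reduceFn(result)
	}
	return result
}

// normalizeTo10 rescales the scores in place so that the maximum becomes 10.
func normalizeTo10(result []hostPriority) {
	maxScore := 0
	for _, r := range result {
		if r.score > maxScore {
			maxScore = r.score
		}
	}
	if maxScore == 0 {
		return
	}
	for i := range result {
		result[i].score = result[i].score * 10 / maxScore
	}
}

func main() {
	raw := map[string]int{"n1": 50, "n2": 200, "n3": 100}
	mapFn := func(node string) hostPriority { return hostPriority{node, raw[node]} }
	nodes := []string{"n1", "n2", "n3"}
	fmt.Println(runPriority(nodes, mapFn, normalizeTo10)) // [{n1 2} {n2 10} {n3 5}]
	fmt.Println(runPriority(nodes, mapFn, nil))           // [{n1 50} {n2 200} {n3 100}]
}
```

Passing nil as the reduce function simply skips the second step, which matches how a priority can be registered with a map function only.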

Next, let's see how the defaultPriorities function in pkg/scheduler/algorithmprovider/defaults/defaults.go registers priorities.

// pkg/scheduler/algorithmprovider/defaults/defaults.go

func defaultPriorities() sets.String {
    return sets.NewString(
        // spreads pods by minimizing the number of pods (belonging to the same service or replication controller) on the same node.
        factory.RegisterPriorityConfigFactory(
            "SelectorSpreadPriority",
            factory.PriorityConfigFactory{
                MapReduceFunction: func(args factory.PluginFactoryArgs) (algorithm.PriorityMapFunction, algorithm.PriorityReduceFunction) {
                    return priorities.NewSelectorSpreadPriority(args.ServiceLister, args.ControllerLister, args.ReplicaSetLister, args.StatefulSetLister)
                },
                Weight: 1,
            },
        ),
        ...
        factory.RegisterPriorityFunction2("ImageLocalityPriority", priorities.ImageLocalityPriorityMap, nil, 1),
    )
}

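This works the same way as for predicates: every registered priority ends up in the global priorityFunctionMap, and defaultPriorities() returns the names of the priorities it registered.

3.4 Registering a scheduler

Registering a scheduler requires the scheduler's name (name), the predicates it owns (predicateKeys) and the priorities it owns (priorityKeys).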

// pkg/scheduler/factory/plugins.go

func RegisterAlgorithmProvider(name string, predicateKeys, priorityKeys sets.String) string {
    schedulerFactoryMutex.Lock()
    defer schedulerFactoryMutex.Unlock()
    validateAlgorithmNameOrDie(name)
    algorithmProviderMap[name] = AlgorithmProviderConfig{
        FitPredicateKeys:     predicateKeys,
        PriorityFunctionKeys: priorityKeys,
    }
    return name
}

Next, let's see how the registerAlgorithmProvider function in pkg/scheduler/algorithmprovider/defaults/defaults.go performs the registration.

// pkg/scheduler/algorithmprovider/defaults/defaults.go
func registerAlgorithmProvider(predSet, priSet sets.String) {
    factory.RegisterAlgorithmProvider(factory.DefaultProvider, predSet, priSet)
    ...
}

This function stores the default scheduler in the global algorithmProviderMap, so the default scheduler's configuration can be looked up as algorithmProviderMap["DefaultProvider"].

//  pkg/scheduler/factory/plugins.go

func GetAlgorithmProvider(name string) (*AlgorithmProviderConfig, error) {
    schedulerFactoryMutex.Lock()
    defer schedulerFactoryMutex.Unlock()

    provider, ok := algorithmProviderMap[name]
    if !ok {
        return nil, fmt.Errorf("plugin %q has not been registered", name)
    }

    return &provider, nil
}

GetAlgorithmProvider looks up a scheduler by name, so GetAlgorithmProvider("DefaultProvider") returns the default scheduler's configuration.
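The register/get pair can be tried out in isolation with a minimal sketch. The stringSet type stands in for sets.String, and providerConfig, registerProvider and getProvider are hypothetical simplified counterparts of the functions above; name validation is left out.

```go
package main

import (
	"fmt"
	"sync"
)

// stringSet is a simplified stand-in for sets.String.
type stringSet map[string]struct{}

func newSet(keys ...string) stringSet {
	s := stringSet{}
	for _, k := range keys {
		s[k] = struct{}{}
	}
	return s
}

// providerConfig mirrors the shape of AlgorithmProviderConfig.
type providerConfig struct {
	predicateKeys stringSet
	priorityKeys  stringSet
}

var (
	mu        sync.Mutex
	providers = map[string]providerConfig{}
)

// registerProvider stores the two key sets under name and returns name.
func registerProvider(name string, preds, pris stringSet) string {
	mu.Lock()
	defer mu.Unlock()
	providers[name] = providerConfig{predicateKeys: preds, priorityKeys: pris}
	return name
}

// getProvider returns a pointer to a copy of the stored config,
// or an error if name was never registered.
func getProvider(name string) (*providerConfig, error) {
	mu.Lock()
	defer mu.Unlock()
	p, ok := providers[name]
	if !ok {
		return nil, fmt.Errorf("plugin %q has not been registered", name)
	}
	return &p, nil
}

func main() {
	registerProvider("DefaultProvider", newSet("PodFitsResources"), newSet("LeastRequestedPriority"))
	p, err := getProvider("DefaultProvider")
	fmt.Println(len(p.predicateKeys), len(p.priorityKeys), err)
	_, err = getProvider("Missing")
	fmt.Println(err)
}
```

Because the map stores struct values, the returned pointer refers to a copy of the struct. The sets inside it are Go maps, though, so they still share storage with the registered entry.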

3.5 Registering the default scheduler

So when is the registerAlgorithmProvider function in defaults called?
The answer is in the init function of pkg/scheduler/algorithmprovider/defaults/defaults.go.

// pkg/scheduler/algorithmprovider/defaults/defaults.go

func init() {
    ...
    registerAlgorithmProvider(defaultPredicates(), defaultPriorities())
    ...
}

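In other words, as soon as a program imports the package that contains pkg/scheduler/algorithmprovider/defaults/defaults.go, its init function runs at startup and registers the default scheduler in the global algorithmProviderMap.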
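The mechanism relied on here is Go's package initialization: init functions run after package-level variables are initialized and before main starts. The single-file sketch below shows the effect with a hypothetical registry; in the real code the init function lives in a separate package, and which file imports that package is not covered in this article.

```go
package main

import "fmt"

// registered is a hypothetical stand-in for algorithmProviderMap.
var registered = map[string]bool{}

// register is a hypothetical stand-in for RegisterAlgorithmProvider.
func register(name string) {
	registered[name] = true
}

// init runs automatically before main, so the registration is already in
// place by the time any other code looks the provider up.
func init() {
	register("DefaultProvider")
}

func main() {
	fmt.Println(registered["DefaultProvider"]) // true
}
```

Nothing in main calls register, yet the lookup succeeds. That is why no explicit registration call for the default provider appears in the scheduler's startup path.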

3.6 Using the default scheduler

When kube-scheduler starts, it calls the New function in pkg/scheduler/scheduler.go to create a Scheduler instance.

// pkg/scheduler/scheduler.go

// New returns a Scheduler
func New(client clientset.Interface,
    nodeInformer coreinformers.NodeInformer,
    podInformer coreinformers.PodInformer,
    pvInformer coreinformers.PersistentVolumeInformer,
    pvcInformer coreinformers.PersistentVolumeClaimInformer,
    replicationControllerInformer coreinformers.ReplicationControllerInformer,
    replicaSetInformer appsinformers.ReplicaSetInformer,
    statefulSetInformer appsinformers.StatefulSetInformer,
    serviceInformer coreinformers.ServiceInformer,
    pdbInformer policyinformers.PodDisruptionBudgetInformer,
    storageClassInformer storageinformers.StorageClassInformer,
    recorder record.EventRecorder,
    schedulerAlgorithmSource kubeschedulerconfig.SchedulerAlgorithmSource,
    stopCh <-chan struct{},
    opts ...func(o *schedulerOptions)) (*Scheduler, error) {
...
    source := schedulerAlgorithmSource
    switch {
    case source.Provider != nil:
        // Create the config from a named algorithm provider.
        sc, err := configurator.CreateFromProvider(*source.Provider)
        if err != nil {
            return nil, fmt.Errorf("couldn't create scheduler using provider %q: %v", *source.Provider, err)
        }
        config = sc
    case source.Policy != nil:
        // Create the config from a user specified policy source.
        policy := &schedulerapi.Policy{}
        switch {
        case source.Policy.File != nil:
            if err := initPolicyFromFile(source.Policy.File.Path, policy); err != nil {
                return nil, err
            }
        case source.Policy.ConfigMap != nil:
            if err := initPolicyFromConfigMap(client, source.Policy.ConfigMap, policy); err != nil {
                return nil, err
            }
        }
        sc, err := configurator.CreateFromConfig(*policy)
        if err != nil {
            return nil, fmt.Errorf("couldn't create scheduler from policy: %v", err)
        }
        config = sc
    default:
        return nil, fmt.Errorf("unsupported algorithm source: %v", source)
    }
...
}

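1. If the kube-scheduler startup configuration specifies a policy, meaning the user configures the predicates and priorities themselves through a policy file or a ConfigMap, execution goes into the case source.Policy != nil: branch. (This part is analyzed in the article on custom schedulers.)
2. If no policy is configured, execution goes into the case source.Provider != nil: branch, because *source.Provider is then DefaultProvider. configurator.CreateFromProvider(*source.Provider) then continues in pkg/scheduler/factory/factory.go, because configurator is a configFactory object at this point.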
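The branching on the algorithm source can be reduced to a small sketch. The algorithmSource and policySource types and the chooseSource function are hypothetical simplifications of the switch in New; they only return a description of the branch that was taken.

```go
package main

import (
	"errors"
	"fmt"
)

// policySource is a hypothetical stand-in for a user-supplied policy.
type policySource struct{ path string }

// algorithmSource mirrors the idea of two optional sources:
// a named provider or a policy. A nil pointer means "not set".
type algorithmSource struct {
	Provider *string
	Policy   *policySource
}

// chooseSource reports which branch would be taken. Provider is checked
// first, so it wins if both fields are set.
func chooseSource(src algorithmSource) (string, error) {
	switch {
	case src.Provider != nil:
		return "provider:" + *src.Provider, nil
	case src.Policy != nil:
		return "policy:" + src.Policy.path, nil
	default:
		return "", errors.New("unsupported algorithm source")
	}
}

func main() {
	name := "DefaultProvider"
	fmt.Println(chooseSource(algorithmSource{Provider: &name}))
	fmt.Println(chooseSource(algorithmSource{Policy: &policySource{path: "policy.json"}}))
	fmt.Println(chooseSource(algorithmSource{}))
}
```

Using pointer fields lets "not configured" be distinguished from an empty value, which is what the nil checks in the switch depend on.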

// pkg/scheduler/factory/factory.go

func (c *configFactory) CreateFromProvider(providerName string) (*Config, error) {
    klog.V(2).Infof("Creating scheduler from algorithm provider '%v'", providerName)
    provider, err := GetAlgorithmProvider(providerName)
    if err != nil {
        return nil, err
    }
    return c.CreateFromKeys(provider.FitPredicateKeys, provider.PriorityFunctionKeys, []algorithm.SchedulerExtender{})
}

This method calls GetAlgorithmProvider from pkg/scheduler/factory/plugins.go, which is how it obtains the configuration (predicates and priorities) of the default scheduler (DefaultProvider).
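The body of CreateFromKeys is not shown in this article. As a rough sketch of what turning keys back into functions could look like, the example below looks each name up in a registry and calls its factory with the shared arguments. Everything here is an assumption made for illustration: the fitPredicate, factoryArgs and factory types, the resolve function, and the two sample predicates are hypothetical.

```go
package main

import "fmt"

// Simplified stand-ins for the predicate, its factory and the factory args.
type fitPredicate func(node string) bool
type factoryArgs struct{ minNameLen int }
type factory func(factoryArgs) fitPredicate

// fitPredicateMap is a hypothetical registry with two sample entries.
var fitPredicateMap = map[string]factory{
	"AlwaysFits": func(factoryArgs) fitPredicate {
		return func(string) bool { return true }
	},
	"NameAtLeast": func(a factoryArgs) fitPredicate {
		return func(node string) bool { return len(node) >= a.minNameLen }
	},
}

// resolve looks up every key and invokes its factory with args.
// An unknown key is reported as an error.
func resolve(keys []string, args factoryArgs) (map[string]fitPredicate, error) {
	out := make(map[string]fitPredicate, len(keys))
	for _, k := range keys {
		f, ok := fitPredicateMap[k]
		if !ok {
			return nil, fmt.Errorf("predicate %q has not been registered", k)
		}
		out[k] = f(args)
	}
	return out, nil
}

func main() {
	preds, err := resolve([]string{"AlwaysFits", "NameAtLeast"}, factoryArgs{minNameLen: 3})
	fmt.Println(len(preds), err)
	fmt.Println(preds["NameAtLeast"]("n1"), preds["NameAtLeast"]("node-1"))
}
```

The point of storing factories rather than functions is visible here: the concrete predicate is only built once the arguments (listers and similar dependencies in the real code) are available.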

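4. Summary

This article analyzed how the default scheduler is registered and how it is used. It mainly involved two files: pkg/scheduler/factory/plugins.go and pkg/scheduler/algorithmprovider/defaults/defaults.go. It should also help with understanding how a custom scheduler registers its predicates and priorities, because a custom scheduler writes into the same global variables described above.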
