When multiple nodes in a cluster have enough resources to run a container, Kubernetes scores each node and deploys the container on the node with the highest score. This article discusses the scheduler's resource-related (CPU and memory) scoring algorithms and their configuration.
The Kubernetes scheduler has three resource-related algorithms. The first is least_requested. Its core idea is to keep each node's resource utilization as low as possible by spreading containers across as many nodes as possible. Its scoring code is as follows:
// The unused capacity is calculated on a scale of 0-10
// 0 being the lowest priority and 10 being the highest.
// The more unused resources the higher the score is.
func calculateUnusedScore(requested int64, capacity int64, node string) int64 {
	if capacity == 0 {
		return 0
	}
	if requested > capacity {
		glog.V(10).Infof("Combined requested resources %d from existing pods exceeds capacity %d on node %s",
			requested, capacity, node)
		return 0
	}
	return ((capacity - requested) * 10) / capacity
}
allocatableResources := nodeInfo.AllocatableResource()
totalResources := *podRequests
totalResources.MilliCPU += nodeInfo.NonZeroRequest().MilliCPU
totalResources.Memory += nodeInfo.NonZeroRequest().Memory
cpuScore := calculateUnusedScore(totalResources.MilliCPU, allocatableResources.MilliCPU, node.Name)
memoryScore := calculateUnusedScore(totalResources.Memory, allocatableResources.Memory, node.Name)
The final score is the average of cpuScore and memoryScore. As the code shows, the lower a node's resource utilization, the higher its score. Suppose a cluster has two nodes with 2 and 4 spare CPUs respectively, and we want to deploy a container that needs 1 CPU. All else being equal, under the least_requested algorithm the container will preferentially be placed on the node with 4 CPUs.
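Plugging the numbers from this example into the formula above can be sketched as follows (a trimmed version of calculateUnusedScore without the logging and node-name parameter; the millicore values are illustrative):

```go
package main

import "fmt"

// calculateUnusedScore applies the least_requested formula shown above:
// score = (capacity - requested) * 10 / capacity, in integer arithmetic.
func calculateUnusedScore(requested, capacity int64) int64 {
	if capacity == 0 || requested > capacity {
		return 0
	}
	return ((capacity - requested) * 10) / capacity
}

func main() {
	podRequest := int64(1000) // 1 CPU expressed in millicores
	fmt.Println("2-CPU node:", calculateUnusedScore(podRequest, 2000)) // (2000-1000)*10/2000 = 5
	fmt.Println("4-CPU node:", calculateUnusedScore(podRequest, 4000)) // (4000-1000)*10/4000 = 7
}
```

The 4-CPU node scores 7 versus 5 for the 2-CPU node, so it wins. Note the integer division: 30000/4000 rounds down from 7.5 to 7.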
The second algorithm, most_requested, does the opposite: its core idea is to pack containers onto the nodes with the highest resource utilization. This is easy to see from its code:
// The used capacity is calculated on a scale of 0-10
// 0 being the lowest priority and 10 being the highest.
// The more resources are used the higher the score is. This function
// is almost a reversed version of least_requested_priority.calculateUnusedScore
// (10 - calculateUnusedScore). The main difference is in rounding. It was added to
// keep the final formula clean and not to modify the widely used (by users
// in their default scheduling policies) calculateUnusedScore.
func calculateUsedScore(requested int64, capacity int64, node string) int64 {
	if capacity == 0 {
		return 0
	}
	if requested > capacity {
		glog.V(10).Infof("Combined requested resources %d from existing pods exceeds capacity %d on node %s",
			requested, capacity, node)
		return 0
	}
	return (requested * 10) / capacity
}
Suppose again that the cluster has two nodes with 2 and 4 spare CPUs respectively, and we want to deploy a container that needs 1 CPU. All else being equal, under the most_requested algorithm the container will preferentially be placed on the node with 2 CPUs.
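Running the same example through the most_requested formula can be sketched like this (again a trimmed version without logging; millicore values are illustrative):

```go
package main

import "fmt"

// calculateUsedScore applies the most_requested formula shown above:
// score = requested * 10 / capacity, in integer arithmetic.
func calculateUsedScore(requested, capacity int64) int64 {
	if capacity == 0 || requested > capacity {
		return 0
	}
	return (requested * 10) / capacity
}

func main() {
	podRequest := int64(1000) // 1 CPU expressed in millicores
	fmt.Println("2-CPU node:", calculateUsedScore(podRequest, 2000)) // 1000*10/2000 = 5
	fmt.Println("4-CPU node:", calculateUsedScore(podRequest, 4000)) // 1000*10/4000 = 2
}
```

Here the 2-CPU node scores 5 versus 2 for the 4-CPU node, the mirror image of the least_requested result.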
The third algorithm is balanced_resource_allocation. Its core idea is to keep each node's CPU and memory utilization in balance. The relevant code is as follows:
allocatableResources := nodeInfo.AllocatableResource()
totalResources := *podRequests
totalResources.MilliCPU += nodeInfo.NonZeroRequest().MilliCPU
totalResources.Memory += nodeInfo.NonZeroRequest().Memory
cpuFraction := fractionOfCapacity(totalResources.MilliCPU, allocatableResources.MilliCPU)
memoryFraction := fractionOfCapacity(totalResources.Memory, allocatableResources.Memory)
score := int(0)
if cpuFraction >= 1 || memoryFraction >= 1 {
	// if requested >= capacity, the corresponding host should never be preferred.
	score = 0
} else {
	// Upper and lower boundary of difference between cpuFraction and memoryFraction are -1 and 1
	// respectively. Multiplying the absolute value of the difference by 10 scales the value to
	// 0-10 with 0 representing well balanced allocation and 10 poorly balanced. Subtracting it from
	// 10 leads to the score which also scales from 0 to 10 while 10 representing well balanced.
	diff := math.Abs(cpuFraction - memoryFraction)
	score = int(10 - diff*10)
}
The code above first computes the CPU and memory utilization fractions, then takes their difference: the larger the difference, the lower the node's score.
Kubernetes tries to avoid nodes whose CPU or memory would be 100% utilized. When either utilization reaches 100%, the node scores 0, the lowest priority.
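The scoring step of balanced_resource_allocation can be sketched in isolation as follows (a minimal version assuming the utilization fractions have already been computed; the example fractions are illustrative):

```go
package main

import (
	"fmt"
	"math"
)

// balancedScore applies the balanced_resource_allocation scoring shown above:
// 10 minus 10x the absolute difference of the two utilization fractions,
// with a hard 0 when either resource would be fully (or over-) requested.
func balancedScore(cpuFraction, memoryFraction float64) int {
	if cpuFraction >= 1 || memoryFraction >= 1 {
		return 0
	}
	diff := math.Abs(cpuFraction - memoryFraction)
	return int(10 - diff*10)
}

func main() {
	fmt.Println(balancedScore(0.5, 0.5)) // perfectly balanced: 10
	fmt.Println(balancedScore(0.2, 0.7)) // 0.5 apart: 5
	fmt.Println(balancedScore(1.0, 0.3)) // CPU fully requested: 0
}
```

Note that only the imbalance matters, not the absolute utilization: a node at 50%/50% and a node at 90%/90% both score 10 under this algorithm, which is why it is combined with the other scoring functions rather than used alone.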
In addition, when a container does not declare its resource requests (the Requests field under Resources in the deployment spec), the scheduler treats it as requesting 100m of CPU and 200 MB of memory when scoring nodes (see non-zero.go):
// For each of these resources, a pod that doesn't request the resource explicitly
// will be treated as having requested the amount indicated below, for the purpose
// of computing priority only. This ensures that when scheduling zero-request pods, such
// pods will not all be scheduled to the machine with the smallest in-use request,
// and that when scheduling regular pods, such pods will not see zero-request pods as
// consuming no resources whatsoever. We chose these values to be similar to the
// resources that we give to cluster addon pods (#10653). But they are pretty arbitrary.
// As described in #11713, we use request instead of limit to deal with resource requirements.
const DefaultMilliCpuRequest int64 = 100             // 0.1 core
const DefaultMemoryRequest int64 = 200 * 1024 * 1024 // 200 MB
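The substitution itself can be sketched as below (the helper name getNonzeroRequests is hypothetical; the real logic in Kubernetes' non-zero.go works on the pod's resource list rather than raw integers):

```go
package main

import "fmt"

const (
	DefaultMilliCpuRequest int64 = 100               // 0.1 core
	DefaultMemoryRequest   int64 = 200 * 1024 * 1024 // 200 MB
)

// getNonzeroRequests (hypothetical name) substitutes the defaults for any
// resource the pod did not explicitly request, for scoring purposes only.
func getNonzeroRequests(milliCPU, memory int64) (int64, int64) {
	if milliCPU == 0 {
		milliCPU = DefaultMilliCpuRequest
	}
	if memory == 0 {
		memory = DefaultMemoryRequest
	}
	return milliCPU, memory
}

func main() {
	// A pod with no Requests set is scored as 100m CPU / 200 MB memory.
	cpu, mem := getNonzeroRequests(0, 0)
	fmt.Println(cpu, mem) // 100 209715200
}
```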
The Kubernetes scheduler kube-scheduler exposes the --algorithm-provider flag to configure the node-scoring algorithms. The default option, DefaultProvider, uses the least_requested algorithm, while ClusterAutoscalerProvider uses most_requested. Both options also apply the balanced_resource_allocation algorithm.