Troubleshooting the kube-batch "overused" problem

kube-batch is a batch scheduler built on Kubernetes, providing mechanisms for applications that want to run batch jobs in Kubernetes.

The default-scheduler schedules only one pod at a time, so I use kube-batch to schedule multiple Jobs and multiple pods together.

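Background

I created a distributed TensorFlow job consisting of four tasks: 2 ps and 2 workers. Each task is created as its own K8S Job with a parallelism of 1, that is, one Pod.

The configuration is as follows: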

- apiVersion: batch/v1
  kind: Job
  metadata:
    name: cyx2-worker-0
    annotations:
      scheduling.k8s.io/group-name: cyx2
  spec:
    template:
      spec:
        containers:
        - resources:
            limits:
              nvidia.com/gpu: "1"
            requests:
              cpu: "1"
              memory: 1Gi
- apiVersion: batch/v1
  kind: Job
  metadata:
    name: cyx2-ps-0
    annotations:
      scheduling.k8s.io/group-name: cyx2
  spec:
    template:
      spec:
        containers:  # no resources are set for the ps container

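All four K8S Jobs carry the same scheduling.k8s.io/group-name: cyx2 annotation, so they can be managed by a single PodGroup (a CRD created by kube-batch).

Problem

We want all four Jobs to run when there are enough resources, and none of them to run otherwise. That is not what happened: the cluster had enough resources, yet all four Jobs stayed pending.

Logs

Note: only the key log lines are shown.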

I1009 21:42:36.472045   21605 allocate.go:42] Enter Allocate ...
I1009 21:42:36.472224   21605 allocate.go:118] Binding Task  to node <192.168.47.52>
I1009 21:42:36.472399   21605 allocate.go:118] Binding Task  to node <192.168.47.52>
I1009 21:42:36.472426   21605 allocate.go:72] Queue  is overused, ignore it.
I1009 21:42:36.472431   21605 allocate.go:155] Leaving Allocate ..

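Here we can see that the scheduler has entered the resource allocation phase, but after scheduling only the 2 worker tasks it reports overused. Clearly this is where the problem lies.

Overused is a queue-level concept: it means resource usage has exceeded the total the queue is allowed to use. But I never configured a queue, so the default configuration must be the culprit. The only option was to read the source code.

Source code

I will skip the roundabout path I took to locate the code and go straight to the key part.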

kube-batch\pkg\scheduler\plugins\proportion\proportion.go

remaining := pp.totalResource.Clone()

// Calculates the deserved of each Queue.
attr.deserved.Add(remaining.Clone().Multi(float64(attr.weight) / float64(totalWeight)))

if !attr.deserved.LessEqual(attr.request) {
        attr.deserved = helpers.Min(attr.deserved, attr.request)
}

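The code does three things:
1. Computes the total resources of the cluster.
2. Sets each queue's deserved resources according to its weight. With a single queue, that is the entire cluster.
3. Compares the deserved resources with the requested resources and keeps the smaller of the two.

Because I did not set resource requests on the ps tasks, they contribute nothing to the queue's request, so the queue's deserved resources equal the total requested by the two workers. Once the two workers are scheduled, that amount is used up, and the queue is reported as overused.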

Solution

The fix is simple: set resource requests on the ps tasks as well.
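As an illustration, a ps Job with requests might look like the fragment below. The cpu and memory values are assumptions chosen to mirror the worker; the original post does not say which values were used, and the fragment omits the other container fields just as the configuration above does.

```yaml
- apiVersion: batch/v1
  kind: Job
  metadata:
    name: cyx2-ps-0
    annotations:
      scheduling.k8s.io/group-name: cyx2
  spec:
    template:
      spec:
        containers:
        - resources:
            requests:
              cpu: "1"      # assumed value
              memory: 1Gi   # assumed value
```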

QoS

Guaranteed: every container must set both a limit and a request (maximum and minimum) for CPU and memory. This is the strictest class.
1. Every Container in the Pod must have a memory limit and a memory request, and they must be the same.
2. Every Container in the Pod must have a CPU limit and a CPU request, and they must be the same.
Burstable: the Pod does not qualify as Guaranteed, but at least one container sets a CPU or memory request.
1. The Pod does not meet the criteria for QoS class Guaranteed.
2. At least one Container in the Pod has a memory or CPU request.
BestEffort: nothing is set at all; the Pod takes whatever resources it can get.
1. For a Pod to be given a QoS class of BestEffort, the Containers in the Pod must not have any memory or CPU limits or requests.
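The three rules can be written as a short classifier. This is a minimal Go sketch using a hypothetical Container struct, not the Kubernetes API types. Quantities are compared as plain strings, so "1" and "1000m" would not be treated as equal here. It also applies the Kubernetes default that an unset request takes the value of the limit.

```go
package main

import "fmt"

// Container is a hypothetical, simplified view of a container's
// CPU and memory settings. An empty string means "not set".
type Container struct {
	CPURequest, CPULimit string
	MemRequest, MemLimit string
}

// qosClass applies the Guaranteed / Burstable / BestEffort rules
// to the containers of one Pod.
func qosClass(containers []Container) string {
	anySet := false
	guaranteed := true
	for _, c := range containers {
		// An unset request defaults to the limit when a limit is given.
		cpuReq, memReq := c.CPURequest, c.MemRequest
		if cpuReq == "" {
			cpuReq = c.CPULimit
		}
		if memReq == "" {
			memReq = c.MemLimit
		}
		if cpuReq != "" || memReq != "" {
			anySet = true
		}
		// Guaranteed needs both limits set and equal to the requests.
		if c.CPULimit == "" || c.MemLimit == "" ||
			cpuReq != c.CPULimit || memReq != c.MemLimit {
			guaranteed = false
		}
	}
	if !anySet {
		return "BestEffort"
	}
	if guaranteed {
		return "Guaranteed"
	}
	return "Burstable"
}

func main() {
	// Worker from this post: CPU and memory requests only, no CPU or memory limits.
	worker := []Container{{CPURequest: "1", MemRequest: "1Gi"}}
	// ps from this post: nothing set.
	ps := []Container{{}}

	fmt.Println("worker:", qosClass(worker))
	fmt.Println("ps:", qosClass(ps))
}
```

Under these rules the worker Pods in this post are Burstable and the original ps Pods are BestEffort, which is exactly the case where the ps tasks add nothing to the queue's request.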
