k8s Pod Priority and Preemptive Scheduling

Introduction

Pod priority and Preemption

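When scheduling pods onto nodes, k8s lets you assign each pod a Priority so that pods have different levels of importance. The scheduler then places higher-priority pods ahead of lower-priority ones, and when there are not enough resources for a pending pod, preemption is triggered.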

Enabling Pod priority and Preemption

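  • Enabled by default since 1.11 (beta), and stable since 1.14.
  • On versions before 1.11, it must be enabled by passing --feature-gates=PodPriority=true to kube-scheduler. In those alpha releases (1.8 to 1.10) the same gate is also needed on kube-apiserver and kubelet.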

Example

Create the PriorityClass objects

apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: high-priority
value: 1000000
globalDefault: false
description: "This priority class should be used for Test pods only."
---
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: low-priority
value: 10000
globalDefault: false
description: "This priority class should be used for Test pods only."

The YAML above defines two priority classes, high-priority and low-priority, with values 1000000 and 10000 respectively.
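
To make the role of value and globalDefault more concrete, here is a small Python sketch of how a pod's priorityClassName might be resolved to an integer priority. It is only an illustration, not the Kubernetes implementation: the PriorityClass dataclass and the resolve_priority function are hypothetical names. The fallback rules it models (use the globalDefault class if one exists, otherwise 0, and reject an unknown class name) follow the documented behavior.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class PriorityClass:
    """Hypothetical stand-in for a scheduling.k8s.io/v1 PriorityClass."""
    name: str
    value: int
    global_default: bool = False


# The two classes defined in the YAML above.
CLASSES = [
    PriorityClass("high-priority", 1000000),
    PriorityClass("low-priority", 10000),
]


def resolve_priority(class_name: Optional[str], classes: List[PriorityClass]) -> int:
    """Return the integer priority for a pod's priorityClassName."""
    if class_name is not None:
        for pc in classes:
            if pc.name == class_name:
                return pc.value
        # A pod that names a PriorityClass which does not exist is rejected.
        raise ValueError(f"no PriorityClass named {class_name!r}")
    # No priorityClassName: use the globalDefault class if there is one, else 0.
    for pc in classes:
        if pc.global_default:
            return pc.value
    return 0


print(resolve_priority("high-priority", CLASSES))  # 1000000
print(resolve_priority("low-priority", CLASSES))   # 10000
print(resolve_priority(None, CLASSES))             # 0
```

Since both classes here set globalDefault: false, a pod without priorityClassName ends up with priority 0, below both of them.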

Create the Deployments

apiVersion: apps/v1 # for versions before 1.9.0 use apps/v1beta2
kind: Deployment
metadata:
  name: nginx-deploy-high
spec:
  selector:
    matchLabels:
      app: nginx
  replicas: 1 # tells deployment to run 1 pod matching the template
  template:
    metadata:
      labels:
        app: nginx
    spec:
      hostNetwork: true
      priorityClassName: high-priority
      containers:
      - name: nginx
        image: nginx:latest
        ports:
        - containerPort: 8088

apiVersion: apps/v1 # for versions before 1.9.0 use apps/v1beta2
kind: Deployment
metadata:
  name: nginx-deploy-low
spec:
  selector:
    matchLabels:
      app: nginx
  replicas: 1 # tells deployment to run 1 pod matching the template
  template:
    metadata:
      labels:
        app: nginx
    spec:
      hostNetwork: true
      priorityClassName: low-priority
      containers:
      - name: nginx
        image: nginx:latest
        ports:
        - containerPort: 8088


kubectl create -f ./nginx-deploy-low-priority.yaml
kubectl create -f ./nginx-deploy-high.yaml

About to try and schedule pod prometheus/nginx-deploy-high-76b56d5cc5-vpfjn
I1225 15:10:09.753527       1 scheduler.go:456] Attempting to schedule pod: prometheus/nginx-deploy-high-76b56d5cc5-vpfjn
I1225 15:10:09.753643       1 generic_scheduler.go:648] since alwaysCheckAllPredicates has not been set, the predicate evaluation is short circuited and there are chances of other predicates failing as well.
I1225 15:10:09.753696       1 factory.go:665] Unable to schedule prometheus/nginx-deploy-high-76b56d5cc5-vpfjn: no fit: 0/1 nodes are available: 1 node(s) didn't have free ports for the requested pod ports.; waiting
I1225 15:10:09.753741       1 factory.go:736] Updating pod condition for prometheus/nginx-deploy-high-76b56d5cc5-vpfjn to (PodScheduled==False, Reason=Unschedulable)
I1225 15:10:09.755568       1 generic_scheduler.go:318] Pod prometheus/nginx-deploy-high-76b56d5cc5-vpfjn is not eligible for more preemption.
I1225 15:10:09.755726       1 scheduling_queue.gkube
I1225 15:10:11.729743       1 generic_scheduler.go:1147] Node host108752172 is a potential node for preemption.
I1225 15:10:11.729916       1 generic_scheduler.go:648] since alwaysCheckAllPredicates has not been set, the predicate evaluation is short circuited and there are chances of other predicates failing as well.
I1225 15:10:11.730407       1 cache.go:309] Finished binding for pod ac27a286-4272-47a6-8677-735b23e981fa. Can be expired.
I1225 15:10:11.730627       1 scheduler.go:593] pod prometheus/nginx-deploy-high-76b56d5cc5-vpfjn is bound successfully on node host108752172, 1 nodes evaluated, 1 nodes were found feasible
I1225 15:10:12.066208       1 leaderelection.go:276] successfully renewed lease kube-system/kube-scheduler

Analysis

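Two Deployments were created with kubectl above: nginx-deploy-low first, then nginx-deploy-high. The log shows that when the scheduler tried to place nginx-deploy-high-76b56d5cc5-vpfjn, it found that port 8088 was already taken by the nginx-deploy-low pod (both pods set hostNetwork: true, so they compete for the same host port). Because nginx-deploy-high-76b56d5cc5-vpfjn has a higher Priority value than the low-priority pod, the scheduler logged "Node host108752172 is a potential node for preemption." and marked that node as preemptible. The running nginx-deploy-low pod was then evicted, leaving nginx-deploy-low with a Pending pod, while the nginx-deploy-high pod became Running.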
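
The sequence in the log can be mimicked with a toy model. The Python sketch below is an assumption-laden simplification, not kube-scheduler code: Pod, Node and schedule are hypothetical names, the only resource modeled is a single host port, and eviction and binding happen in one call, whereas the real scheduler first nominates the node and binds the pod on a later attempt (about two seconds later in the log above).

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Pod:
    name: str
    priority: int
    host_port: int


@dataclass
class Node:
    name: str
    pods: List[Pod] = field(default_factory=list)

    def port_free(self, port: int) -> bool:
        return all(p.host_port != port for p in self.pods)


def schedule(pod: Pod, node: Node) -> List[Pod]:
    """Bind pod to node, evicting lower-priority pods that hold its port.

    Returns the evicted pods; raises RuntimeError if preemption cannot help.
    """
    if node.port_free(pod.host_port):
        node.pods.append(pod)
        return []
    blockers = [p for p in node.pods if p.host_port == pod.host_port]
    # Preemption only helps if every blocker has strictly lower priority.
    if any(p.priority >= pod.priority for p in blockers):
        raise RuntimeError(f"{pod.name} is unschedulable")
    node.pods = [p for p in node.pods if p not in blockers]
    node.pods.append(pod)
    return blockers


node = Node("host108752172")
low = Pod("nginx-deploy-low", 10000, 8088)
high = Pod("nginx-deploy-high", 1000000, 8088)

schedule(low, node)             # low is created first and takes port 8088
evicted = schedule(high, node)  # high arrives and preempts it

print([p.name for p in evicted])    # ['nginx-deploy-low']
print([p.name for p in node.pods])  # ['nginx-deploy-high']
```

Reversing the creation order in this model makes the low-priority pod fail with "unschedulable" instead, because a pod can never preempt one of equal or higher priority.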

Summary

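  • If two pods are waiting in the scheduling queue, one with a higher priority and one with a lower priority, the scheduler schedules the higher-priority pod first. This case is not demonstrated here because it is hard to reproduce in the test environment.
  • If resources turn out to be insufficient during scheduling, the scheduler preempts the resources of lower-priority pods and gives them to the higher-priority pod.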
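
The first point can at least be illustrated outside a cluster. The following Python sketch models a pending queue that always hands out the highest-priority pod first. SchedulingQueue is a hypothetical class built on the standard heapq module; it only demonstrates the ordering rule and says nothing about how kube-scheduler's queue is actually implemented.

```python
import heapq
import itertools


class SchedulingQueue:
    """Hypothetical queue that pops the highest-priority pending pod first."""

    def __init__(self):
        self._heap = []
        self._counter = itertools.count()  # tie-breaker: arrival order

    def add(self, name: str, priority: int) -> None:
        # heapq is a min-heap, so negate the priority to pop the largest first.
        heapq.heappush(self._heap, (-priority, next(self._counter), name))

    def pop(self) -> str:
        _, _, name = heapq.heappop(self._heap)
        return name

    def __len__(self) -> int:
        return len(self._heap)


queue = SchedulingQueue()
queue.add("nginx-deploy-low", 10000)     # arrives first
queue.add("nginx-deploy-high", 1000000)  # arrives second

order = [queue.pop() for _ in range(len(queue))]
print(order)  # ['nginx-deploy-high', 'nginx-deploy-low']
```

Even though the low-priority pod entered the queue first, the high-priority pod is taken out first; pods with equal priority come out in arrival order.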
