Kubernetes: Pod Affinity Scheduling

Pod scheduling strategies

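  • Node selector: nodeSelector. You can also set nodeName to pick a specific node directly.
  • Affinity scheduling: nodeAffinity (node affinity), podAffinity (Pod affinity), podAntiAffinity (Pod anti-affinity)
  • Taints and tolerations: taint, toleration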
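
As a minimal sketch of the simplest strategy in the list above, the manifest below pins a Pod with nodeSelector. The Pod name and the disktype=ssd label are hypothetical and chosen only for illustration; the image is the one used in the later examples of this article.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: selector-demo          # hypothetical name
spec:
  # The Pod is scheduled only onto nodes that carry every label listed here.
  nodeSelector:
    disktype: ssd              # hypothetical label; a node must have disktype=ssd
  # Alternative: setting "nodeName: node-1" instead would place the Pod on
  # that node directly, without going through the scheduler.
  containers:
  - name: nginx
    image: daocloud.io/library/nginx
    imagePullPolicy: IfNotPresent
```

nodeSelector only supports exact label matches; the affinity rules described next allow richer expressions.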

Affinity scheduling (affinity)

1. Node affinity scheduling (nodeAffinity)

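Node affinity uses the labels on nodes to constrain which nodes a Pod can be scheduled onto. There are two kinds of node affinity:

  • requiredDuringSchedulingIgnoredDuringExecution: the scheduler places the Pod only on a node that satisfies the rule. This is commonly called hard affinity.
  • preferredDuringSchedulingIgnoredDuringExecution: the scheduler tries to find a node that satisfies the rule; if no node matches, it still schedules the Pod. This is called soft affinity.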
[root@master ~]# kubectl explain pod.spec.affinity
KIND:       Pod
VERSION:    v1

FIELD: affinity 

DESCRIPTION:
    If specified, the pod's scheduling constraints
    Affinity is a group of affinity scheduling rules.
    
FIELDS:
  nodeAffinity   # node affinity
    Describes node affinity scheduling rules for the pod.

  podAffinity    # Pod affinity
    Describes pod affinity scheduling rules (e.g. co-locate this pod in the same
    node, zone, etc. as some other pod(s)).

  podAntiAffinity        # Pod anti-affinity
    Describes pod anti-affinity scheduling rules (e.g. avoid putting this pod in
    the same node, zone, etc. as some other pod(s)).
[root@master ~]# kubectl explain pod.spec.affinity.nodeAffinity.requiredDuringSchedulingIgnoredDuringExecution
KIND:       Pod
VERSION:    v1

FIELD: requiredDuringSchedulingIgnoredDuringExecution 

DESCRIPTION:
    If the affinity requirements specified by this field are not met at
    scheduling time, the pod will not be scheduled onto the node. If the
    affinity requirements specified by this field cease to be met at some point
    during pod execution (e.g. due to an update), the system may or may not try
    to eventually evict the pod from its node.
    A node selector represents the union of the results of one or more label
    queries over a set of nodes; that is, it represents the OR of the selectors
    represented by the node selector terms.
    
FIELDS:
  nodeSelectorTerms     <[]NodeSelectorTerm> -required-
    Required. A list of node selector terms. The terms are ORed.

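nodeAffinity accepts multiple nodeSelectorTerms entries. At scheduling time a node only has to satisfy one of the nodeSelectorTerms to satisfy the nodeAffinity rule (the terms are ORed). Within a single term, a node must satisfy every rule listed under matchExpressions (the expressions are ANDed).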
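
The OR/AND behaviour described above can be sketched with a manifest like the following. The Pod name and the zone and disktype labels are hypothetical; only the field structure comes from the kubectl explain output.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: or-and-demo                  # hypothetical name
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        # Term 1: zone is foo AND disktype is ssd (expressions are ANDed)
        - matchExpressions:
          - key: zone
            operator: In
            values: ["foo"]
          - key: disktype
            operator: In
            values: ["ssd"]
        # Term 2: zone is bar (terms are ORed with each other)
        - matchExpressions:
          - key: zone
            operator: In
            values: ["bar"]
  containers:
  - name: nginx
    image: daocloud.io/library/nginx
    imagePullPolicy: IfNotPresent
```

A node labeled zone=bar qualifies through the second term alone, while a node labeled zone=foo qualifies only if it also carries disktype=ssd.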

[root@master ~]# kubectl explain pod.spec.affinity.nodeAffinity.requiredDuringSchedulingIgnoredDuringExecution.nodeSelectorTerms
KIND:       Pod
VERSION:    v1

FIELD: nodeSelectorTerms <[]NodeSelectorTerm>

DESCRIPTION:
    Required. A list of node selector terms. The terms are ORed.
    A null or empty node selector term matches no objects. The requirements of
    them are ANDed. The TopologySelectorTerm type implements a subset of the
    NodeSelectorTerm.
    
FIELDS:
  matchExpressions      <[]NodeSelectorRequirement>
    A list of node selector requirements by node's labels.

  matchFields   <[]NodeSelectorRequirement>
    A list of node selector requirements by node's fields.
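  • matchExpressions: matches on node labels. For example, with key zone, operator In (the value is in the given set), and values foo and bar, the Pod is scheduled only onto nodes whose zone label is foo or bar.

  • matchFields: matches on node fields rather than labels, using the same key/operator/values structure. For example, a requirement on the node's metadata.name field selects nodes by name.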
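
A minimal sketch of matchFields, assuming metadata.name as the field key (to my knowledge this is the field the scheduler accepts here, so treat other keys as unsupported unless the documentation for your version says otherwise). The Pod name is hypothetical; node-1 is the node name used elsewhere in this article.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: match-fields-demo            # hypothetical name
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchFields:
          # Select nodes by a field of the Node object, not by a label.
          - key: metadata.name
            operator: In
            values:
            - node-1
  containers:
  - name: nginx
    image: daocloud.io/library/nginx
    imagePullPolicy: IfNotPresent
```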

[root@master ~]# kubectl explain pod.spec.affinity.nodeAffinity.requiredDuringSchedulingIgnoredDuringExecution.nodeSelectorTerms.matchExpressions
KIND:       Pod
VERSION:    v1

FIELD: matchExpressions <[]NodeSelectorRequirement>

DESCRIPTION:
    A list of node selector requirements by node's labels.
    A node selector requirement is a selector that contains values, a key, and
    an operator that relates the key and values.
    
FIELDS:
  key    -required-
    The label key that the selector applies to.

  operator       -required-  # matching rule
    Represents a key's relationship to a set of values. Valid operators are In,
    NotIn, Exists, DoesNotExist. Gt, and Lt.
    
    Possible enum values:
     - `"DoesNotExist"`
     - `"Exists"`
     - `"Gt"`
     - `"In"`
     - `"Lt"`
     - `"NotIn"`

  values        <[]string>
    An array of string values. If the operator is In or NotIn, the values array
    must be non-empty. If the operator is Exists or DoesNotExist, the values
    array must be empty. If the operator is Gt or Lt, the values array must have
    a single element, which will be interpreted as an integer. This array is
    replaced during a strategic merge patch.

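The operator field tells Kubernetes which logical operator to use when interpreting a rule. It must be one of In, NotIn, Exists, DoesNotExist, Gt, or Lt.

The following operators can be used in the operator field of both nodeAffinity and podAffinity.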

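Operator      Behavior
In            The label value is in the supplied set of strings
NotIn         The label value is not in the supplied set of strings
Exists        A label with this key exists on the object
DoesNotExist  No label with this key exists on the object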

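The following operators can only be used with nodeAffinity.

Operator      Behavior
Gt            The supplied value is parsed as an integer, and the integer parsed from the node's label value must be greater than it
Lt            The supplied value is parsed as an integer, and the integer parsed from the node's label value must be less than it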
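Note:

The Gt and Lt operators do not work with non-integer values. If the given value does not parse as an integer, the Pod cannot be scheduled. Gt and Lt are also not available for podAffinity.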
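
To illustrate the operators above, here is a hedged sketch that combines Exists and Gt in one term. The Pod name and the gpu and cpu-cores labels are hypothetical. Note that values is a list of strings, so the integer for Gt is written in quotes, and Exists takes no values at all.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: operator-demo                # hypothetical name
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          # The node must carry a "gpu" label, whatever its value.
          - key: gpu                 # hypothetical label
            operator: Exists
          # The node's "cpu-cores" label, parsed as an integer, must be greater than 8.
          - key: cpu-cores           # hypothetical label, e.g. cpu-cores=16
            operator: Gt
            values: ["8"]
  containers:
  - name: nginx
    image: daocloud.io/library/nginx
    imagePullPolicy: IfNotPresent
```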

requiredDuringSchedulingIgnoredDuringExecution: hard affinity

[root@master pod]# cat pod-nodeaffinity.yml 
apiVersion: v1
kind: Pod
metadata:
 name: test-pod
 labels:
  app: nginx
spec:
 affinity:
  nodeAffinity:
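   requiredDuringSchedulingIgnoredDuringExecution: # hard affinity
    nodeSelectorTerms:     # match against node labels
     - matchExpressions:   # label match expressions
        - key: kubernetes/test-pod # label key that must exist on the node
          operator: "In"   # matching logic
          values:      # accepted values for the label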
           - "node-1"
           - "testing"
 containers:
  - name: nginx
    image: daocloud.io/library/nginx
    imagePullPolicy: IfNotPresent
    ports:
     - containerPort: 80
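  • The node must have a label with the key kubernetes/test-pod, and the value of that label must be node-1 or testing. No node carries this label yet, so the Pod stays Pending: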
[root@master pod]# kubectl apply -f pod-nodeaffinity.yml 
pod/test-pod created
[root@master pod]# kubectl get pod 
NAME       READY   STATUS    RESTARTS   AGE
test-pod   0/1     Pending   0          3s

[root@master pod]# kubectl describe pod test-pod 
Name:             test-pod
Namespace:        default
......
Events:
  Type     Reason            Age   From               Message
  ----     ------            ----  ----               -------
  Warning  FailedScheduling  11s   default-scheduler  0/3 nodes are available: 1 node(s) had untolerated taint {node-role.kubernetes.io/control-plane: }, 2 node(s) didn't match Pod's node affinity/selector. preemption: 0/3 nodes are available: 3 Preemption is not helpful for scheduling..

Add the label to any one of the worker nodes, and the Pod is scheduled:
[root@master ~]#  kubectl label node node-1 kubernetes/test-pod=node-1
node/node-1 labeled
[root@master ~]# kubectl get pod 
NAME       READY   STATUS              RESTARTS   AGE
test-pod   1/1     Running                 0     3m29s

preferredDuringSchedulingIgnoredDuringExecution: soft affinity

[root@master pod]# cat pod-nodeaffinity-pre.yml 
apiVersion: v1
kind: Pod
metadata:
 name: pod-2
 namespace: default
 labels:
  type: app
spec:
 affinity:
  nodeAffinity:
   preferredDuringSchedulingIgnoredDuringExecution:
     - preference:
        matchExpressions:
         - key: kubernetes/test-app
           operator: "In"
           values:
            - "type"
            - "type-2"
       weight: 1 # weight; the higher the value, the higher the priority
     - preference:
        matchExpressions:
         - key: zone
           operator: "In"
           values:
            - "foo"
            - "bar"
       weight: 10
 containers:
  - image: daocloud.io/library/nginx
    imagePullPolicy: IfNotPresent
    name: nginx-2
[root@master pod]# kubectl apply -f pod-nodeaffinity-pre.yml

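The node should preferably have a label with the key kubernetes/test-app and the value type or type-2, or a zone label with the value foo or bar. If no node has such a label, that is fine: the Pod is still scheduled, onto whichever node the scheduler's other criteria select. When several nodes match, the weights decide: a node matching the zone preference (weight 10) is favored over a node matching only the kubernetes/test-app preference (weight 1).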
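
As a worked illustration of how the weights add up, the fragment below repeats the two preferences from pod-2 and annotates the raw node-affinity score for three hypothetical nodes (node-a, node-b, and node-c and their labels are assumptions made up for this example). The scheduler combines this score with its other scoring criteria, so the numbers show relative preference only.

```yaml
# Raw node-affinity score = sum of the weights of every preference the node matches.
#   node-a  labels: zone=foo                               -> 10
#   node-b  labels: kubernetes/test-app=type               -> 1
#   node-c  labels: zone=bar, kubernetes/test-app=type-2   -> 10 + 1 = 11 (most preferred)
preferredDuringSchedulingIgnoredDuringExecution:
- weight: 1
  preference:
    matchExpressions:
    - key: kubernetes/test-app
      operator: In
      values: ["type", "type-2"]
- weight: 10
  preference:
    matchExpressions:
    - key: zone
      operator: In
      values: ["foo", "bar"]
```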

Using hard and soft affinity together

apiVersion: v1
kind: Pod
metadata:
  name: with-node-affinity
spec:
  containers:
  - name: with-node-affinity
    image: nginx
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: zone
            operator: In
            values:
            - dev
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 1
        preference:
          matchExpressions:
          - key: disktype
            operator: In
            values:
            - ssd

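  • This node affinity rule says that the Pod can only be placed on a node that has a label with the key zone and the value dev. Among the nodes that satisfy this condition, nodes that have a label with the key disktype and the value ssd are preferred.

  • If you specify both nodeSelector and nodeAffinity, both must be satisfied for the Pod to be scheduled onto a candidate node.

  • If you specify multiple nodeSelectorTerms in a nodeAffinity type, the Pod can be scheduled onto a node that satisfies any one of the nodeSelectorTerms.

  • If you specify multiple expressions in a single matchExpressions field of a nodeSelectorTerms entry, the Pod can be scheduled onto a node only if all of those expressions are satisfied.

  • If you remove or change the labels of the node on which the Pod is running, the Pod is not removed. Affinity selection takes effect only at the time the Pod is scheduled.

  • The weight field of preferredDuringSchedulingIgnoredDuringExecution ranges from 1 to 100; the higher the value, the higher the priority. For each node, the scheduler sums the weights of the preferences whose matchExpressions the node satisfies and factors that sum into its choice of node for the Pod.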
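
The second point in the list above (nodeSelector and nodeAffinity must both be satisfied) can be sketched as follows. The Pod name is hypothetical; the zone and disktype labels reuse the values from the example earlier in this section.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: selector-plus-affinity       # hypothetical name
spec:
  # Condition 1: the node must be labeled zone=dev.
  nodeSelector:
    zone: dev
  affinity:
    nodeAffinity:
      # Condition 2: the node must also be labeled disktype=ssd.
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: disktype
            operator: In
            values:
            - ssd
  containers:
  - name: nginx
    image: nginx
```

A node labeled only zone=dev or only disktype=ssd does not qualify; the Pod stays Pending until some node carries both labels.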
