Affinity and anti-affinity in k8s

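The default Kubernetes scheduler binds each new Pod to a target node in three stages: filtering (predicates), scoring (priorities), and selection. It is only the default scheduler for Pods, though. Out of the box it mainly checks that a node has enough resources and tries to spread load as evenly as possible.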

You can also run your own scheduler and select it for a Pod by setting spec.schedulerName in the Pod manifest.
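
As a minimal sketch of the spec.schedulerName field mentioned above: the Pod name and the scheduler name my-custom-scheduler are hypothetical and only illustrate where the field goes. The value has to match the name of a scheduler that is actually running in the cluster.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: custom-scheduled-pod            # hypothetical Pod name
spec:
  schedulerName: my-custom-scheduler    # hypothetical; must match a scheduler running in the cluster
  containers:
  - name: nginx
    image: nginx:latest
```

If no scheduler with that name is running, nothing binds the Pod to a node and it stays Pending. Omitting the field falls back to the default scheduler.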

1. Node affinity

NodeAffinity is the node-affinity scheduling policy. It is a newer, more expressive policy intended to supersede nodeSelector (which is still supported).

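There are two types of node affinity rules: hard affinity (required) and soft affinity (preferred). Hard affinity is a mandatory rule that must be satisfied when a Pod is scheduled; if no node satisfies it, the Pod stays in the Pending state. Soft affinity is a flexible constraint: it expresses a preference for running the Pod on a certain class of nodes, and the scheduler tries to honor it, but when it cannot be satisfied the scheduler falls back to a node that does not match the rule.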

1.1 nodeSelector

Originally, Kubernetes used nodeSelector to schedule Pods onto specific nodes: you label the nodes and have the Pod select on those labels, as follows:

Add a label to the node:
[root@node1 ~]# kubectl label node node2 app=web
node/node2 labeled
######
Define a Deployment that starts the Pods:
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: deploy
  labels:
    app: web
spec:
  replicas: 13
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      nodeSelector:
        app: web             # select nodes labeled app=web
      containers:
      - name: nginx-deploy
        image: nginx:latest
        imagePullPolicy: IfNotPresent
            
[root@node1 ~]# kubectl apply -f deploy-pod.yaml 
deployment.apps/deploy created
[root@node1 ~]# 
########
Check which node the Pods were started on:
[root@node1 ~]# kubectl get pod -o wide
NAME                            READY   STATUS    RESTARTS   AGE     IP              NODE    NOMINATED NODE   READINESS GATES
deploy-6cb97b569b-292kw         1/1     Running   0          11s     172.25.104.62   node2   <none>           <none>
deploy-6cb97b569b-2qbfm         1/1     Running   0          11s     172.25.104.52   node2   <none>           <none>
deploy-6cb97b569b-58px4         1/1     Running   0          11s     172.25.104.54   node2   <none>           <none>
deploy-6cb97b569b-7cmqv         1/1     Running   0          11s     172.25.104.56   node2   <none>           <none>
deploy-6cb97b569b-cmq74         1/1     Running   0          11s     172.25.104.57   node2   <none>           <none>
deploy-6cb97b569b-cpv8x         1/1     Running   0          11s     172.25.104.59   node2   <none>           <none>
deploy-6cb97b569b-d9hwz         1/1     Running   0          11s     172.25.104.63   node2   <none>           <none>
deploy-6cb97b569b-f2zwf         1/1     Running   0          11s     172.25.104.60   node2   <none>           <none>
deploy-6cb97b569b-f6hbl         1/1     Running   0          11s     172.25.104.61   node2   <none>           <none>
deploy-6cb97b569b-kz46f         1/1     Running   0          11s     172.25.104.58   node2   <none>           <none>
deploy-6cb97b569b-mjmnv         1/1     Running   0          11s     172.25.104.55   node2   <none>           <none>
deploy-6cb97b569b-nkdwm         1/1     Running   0          11s     172.25.104.51   node2   <none>           <none>
deploy-6cb97b569b-tg7qc         1/1     Running   0          11s     172.25.104.53   node2   <none>           <none>

1.2 Hard node affinity

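Setting nodeSelector on a Pod forces it onto a particular class of nodes by matching node labels, but it can only express simple equality-based label selectors. nodeAffinity supports the matchExpressions field, which allows more complex label selection.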

---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: deploy
  labels:
    app: web
spec:
  replicas: 6
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
      - name: nginx-deploy
        image: nginx:latest
        imagePullPolicy: IfNotPresent
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:  # hard rule
            nodeSelectorTerms:
            - matchExpressions:
              - key: app
                operator: In
                values:
                - web
######
Label the node:
[root@node1 ~]# kubectl label node node2 app=web
node/node2 labeled

######
Start the Pods; they all land on node2:
[root@node1 ~]# kubectl apply -f deploy-pod.yaml 
deployment.apps/deploy created
[root@node1 ~]# kubectl get pod -o wide
NAME                            READY   STATUS    RESTARTS   AGE     IP              NODE    NOMINATED NODE   READINESS GATES
deploy-66747445f7-25pwb         1/1     Running   0          14s     172.25.104.5    node2   <none>           <none>
deploy-66747445f7-4qdf7         1/1     Running   0          14s     172.25.104.10   node2   <none>           <none>
deploy-66747445f7-f24bq         1/1     Running   0          14s     172.25.104.20   node2   <none>           <none>
deploy-66747445f7-f9vbq         1/1     Running   0          14s     172.25.104.34   node2   <none>           <none>
deploy-66747445f7-gx4mq         1/1     Running   0          14s     172.25.104.28   node2   <none>           <none>
deploy-66747445f7-zwtc8         1/1     Running   0          14s     172.25.104.33   node2   <none>           <none>

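When defining node affinity, the requiredDuringSchedulingIgnoredDuringExecution field defines hard node affinity. Its nodeSelectorTerms list holds one or more nodeSelectorTerm objects that are combined with logical OR: during matching, satisfying any one of the nodeSelectorTerm entries is enough.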

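The IgnoredDuringExecution suffix in both preferredDuringSchedulingIgnoredDuringExecution and requiredDuringSchedulingIgnoredDuringExecution means that once a Pod has been scheduled onto a node under a node affinity rule, the scheduler does not evict it if the node's labels later change and no longer satisfy the rule. The rule only applies to newly created Pods.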

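A nodeSelectorTerm defines a single node-selector entry. It holds a list of one or more matchExpressions rules that are combined with logical AND, so a node's labels must satisfy every expression under the same nodeSelectorTerm to pass that entry. Within one expression, the values list gives the set of label values that the operator tests against. The following example uses a single In expression with two values, so a node labeled either app=server or app=web matches: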

---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: deploy
  labels:
    app: web
spec:
  replicas: 6
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
      - name: nginx-deploy
        image: nginx:latest
        imagePullPolicy: IfNotPresent
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:  # hard rule
            nodeSelectorTerms:
            - matchExpressions:
              - key: app
                operator: In
                values:
                - server
                - web
####### Label node2 and node3
[root@node1 ~]# kubectl label node node2 app=web
node/node2 labeled
[root@node1 ~]# kubectl label node node3 app=server
node/node3 labeled
######## Start the Pods; they are scheduled onto node2 and node3
[root@node1 ~]# kubectl get pod -o wide
NAME                            READY   STATUS    RESTARTS   AGE     IP              NODE    NOMINATED NODE   READINESS GATES
deploy-d78b4d4d9-5hb4k          1/1     Running   0          2m17s   172.25.104.47   node2   <none>           <none>
deploy-d78b4d4d9-l8tjk          1/1     Running   0          2m17s   172.25.135.61   node3   <none>           <none>
deploy-d78b4d4d9-mcvsk          1/1     Running   0          2m17s   172.25.135.60   node3   <none>           <none>
deploy-d78b4d4d9-mj7gk          1/1     Running   0          2m17s   172.25.104.43   node2   <none>           <none>
deploy-d78b4d4d9-r5xqn          1/1     Running   0          2m17s   172.25.104.45   node2   <none>           <none>
deploy-d78b4d4d9-zl684          1/1     Running   0          2m17s   172.25.104.46   node2   <none>           <none>
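
The OR/AND semantics described above can be sketched with two nodeSelectorTerms. This fragment goes under the Pod template's spec; the disktype label is a hypothetical example and is not used elsewhere in this article.

```yaml
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
      # Term 1: both expressions must hold (app=web AND disktype=ssd)
      - matchExpressions:
        - key: app
          operator: In
          values: ["web"]
        - key: disktype            # hypothetical label
          operator: In
          values: ["ssd"]
      # Term 2: ORed with term 1 (app=server on its own is enough)
      - matchExpressions:
        - key: app
          operator: In
          values: ["server"]
```

With this fragment a node qualifies if it is labeled app=web and disktype=ssd, or if it is labeled app=server.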

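Label selector expressions support the operators In, NotIn, Exists, DoesNotExist, Lt, and Gt:

In: the label's value is in the given list
NotIn: the label's value is not in the given list
Gt: the label's value, parsed as an integer, is greater than the given value
Lt: the label's value, parsed as an integer, is less than the given value
Exists: the label exists, whatever its value (the values field must be left empty)
DoesNotExist: the label does not exist (the values field must be left empty)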
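
A short sketch of the less common operators, again as a fragment under the Pod template's spec. The label keys gpu, cpu-count, and env are hypothetical and chosen only for illustration.

```yaml
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
      - matchExpressions:
        - key: gpu                 # hypothetical label; any value matches
          operator: Exists         # Exists / DoesNotExist take no values
        - key: cpu-count           # hypothetical label holding an integer, e.g. cpu-count=16
          operator: Gt
          values: ["8"]            # Gt / Lt take exactly one value, parsed as an integer
        - key: env                 # hypothetical label
          operator: NotIn          # nodes without the env label also match
          values: ["dev", "test"]
```

All three expressions sit in the same term, so a node must satisfy every one of them.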

1.3 Soft node affinity

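Soft node affinity gives node selection a flexible control: the Pod no longer "must" but "should" be placed on certain nodes, and when the condition cannot be met it can still be placed on nodes that do not match. Each preference also has a weight field that sets its priority, in the range 1 to 100; a higher number means a higher priority.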

---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: deploy
  labels:
    app: web
spec:
  replicas: 6
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
      - name: nginx-deploy
        image: nginx:latest
        imagePullPolicy: IfNotPresent
      affinity:
        nodeAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 60      # weight 60 for nodes labeled app=web
              preference:
                matchExpressions:
                - key: app
                  operator: In
                  values:
                    - web

            - weight: 40   # weight 40 for nodes labeled app=server
              preference:
                matchExpressions:
                - key: app
                  operator: In
                  values:
                    - server

##### Start the Pods; most of them land on the node labeled app=web
[root@node1 ~]# kubectl get pod -o wide  
NAME                            READY   STATUS    RESTARTS   AGE     IP              NODE    NOMINATED NODE   READINESS GATES
deploy-55bf777f76-5466z         1/1     Running   0          9m9s    172.25.104.50   node2   <none>           <none>
deploy-55bf777f76-62rrz         1/1     Running   0          9m9s    172.25.104.49   node2   <none>           <none>
deploy-55bf777f76-bf9bn         1/1     Running   0          9m9s    172.25.104.48   node2   <none>           <none>
deploy-55bf777f76-lx5pz         1/1     Running   0          9m9s    172.25.104.53   node2   <none>           <none>
deploy-55bf777f76-s78v5         1/1     Running   0          9m9s    172.25.135.62   node3   <none>           <none>
deploy-55bf777f76-t9cw9         1/1     Running   0          9m9s    172.25.104.63   node2   <none>           <none>
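
Hard and soft rules can also be combined in one nodeAffinity block. The sketch below is a fragment for the Pod template's spec; it assumes nodes carry the well-known kubernetes.io/os label and reuses the app=web label from the examples above. The weight of 50 is an arbitrary illustrative value.

```yaml
affinity:
  nodeAffinity:
    # Hard rule: only Linux nodes are candidates at all
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
      - matchExpressions:
        - key: kubernetes.io/os
          operator: In
          values: ["linux"]
    # Soft rule: among the candidates, prefer nodes labeled app=web
    preferredDuringSchedulingIgnoredDuringExecution:
    - weight: 50
      preference:
        matchExpressions:
        - key: app
          operator: In
          values: ["web"]
```

The hard rule filters the candidate nodes first; the soft rule then only influences the scoring of the nodes that remain.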

2. Pod affinity

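Pod affinity means that Pods meeting certain conditions run in the same topology domain (the same node when topologyKey is kubernetes.io/hostname), while anti-affinity scheduling requires that they do not run in the same domain.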

2.1 Hard pod affinity

Mandatory pod affinity is also defined with the requiredDuringSchedulingIgnoredDuringExecution field:

---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: deploy
  labels:
    app: web
spec:
  replicas: 6
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
      - name: nginx-deploy
        image: nginx:latest
        imagePullPolicy: IfNotPresent
      affinity:
        podAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: app
                operator: In
                values:
                - web
            topologyKey: kubernetes.io/hostname
                
###### Start the Pods
[root@node1 ~]# kubectl apply -f deploy-pod-Affinity.yaml 
deployment.apps/deploy created

###### As shown below, all Pods start on the same node
[root@node1 ~]# kubectl get po -o wide
NAME                      READY   STATUS    RESTARTS   AGE   IP               NODE    NOMINATED NODE   READINESS GATES
deploy-5b9d5b8b48-2h6qb   1/1     Running   0          6s    172.25.166.150   node1   <none>           <none>
deploy-5b9d5b8b48-8dbtf   1/1     Running   0          6s    172.25.166.151   node1   <none>           <none>
deploy-5b9d5b8b48-9b895   1/1     Running   0          6s    172.25.166.154   node1   <none>           <none>
deploy-5b9d5b8b48-cngp7   1/1     Running   0          6s    172.25.166.149   node1   <none>           <none>
deploy-5b9d5b8b48-qpp9n   1/1     Running   0          6s    172.25.166.152   node1   <none>           <none>
deploy-5b9d5b8b48-ww7jk   1/1     Running   0          6s    172.25.166.153   node1   <none>           <none>

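When scheduling the Pods created by the Deployment in this example, the scheduler first uses the label selector to find all Pods labeled app=web, then reads the value of the topologyKey label (here kubernetes.io/hostname) on the nodes they run on, and then looks up all nodes carrying one of those values, which completes node filtering. It then scores these nodes and picks the one that will run the new Pod. Because the selector matches the Deployment's own Pods and kubernetes.io/hostname is unique per node, every replica follows the first one onto the same node. The first replica has no existing Pod to match; the scheduler lets it through because it matches its own affinity term.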
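
The topologyKey does not have to be the hostname. As a sketch, the fragment below (for the Pod template's spec) asks for placement in the same zone as Pods labeled app=db. It assumes the nodes carry the well-known topology.kubernetes.io/zone label and that Pods labeled app=db already exist; both are assumptions for illustration, not part of the example above.

```yaml
affinity:
  podAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
    - labelSelector:
        matchExpressions:
        - key: app
          operator: In
          values: ["db"]                          # assumed label on the target Pods
      topologyKey: topology.kubernetes.io/zone    # co-locate per zone, not per node
```

Any node whose zone label matches the zone of a node running an app=db Pod is a candidate. By default the labelSelector only considers Pods in the same namespace as the Pod being scheduled.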

2.2 Soft pod affinity

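As with node affinity, Pods can use the preferredDuringSchedulingIgnoredDuringExecution field to define a flexible affinity rule. The scheduler does its best to satisfy the affinity constraint, but when it cannot be satisfied the Pod may still be scheduled onto other nodes. Below is an example manifest that uses soft pod affinity: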

apiVersion: apps/v1
kind: Deployment
metadata:
  name: app-affinity
spec:
  replicas: 5
  selector:
    matchLabels:
      app: myapp
  template:
    metadata:
      name: myapp
      labels:
        app: myapp
    spec:
      affinity:
        podAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 80
            podAffinityTerm:
              labelSelector:
                matchExpressions:
                - {key: app, operator: In, values: ["nginx"]}
              topologyKey: zone
          - weight: 20
            podAffinityTerm:
              labelSelector:
                matchExpressions:
                - {key: app, operator: In, values: ["apach"]}
              topologyKey: zone
      containers:
      - name: nginx
        image: nginx
###### Start the Pods
[root@node1 ~]# kubectl apply -f pod-soft.yaml 
deployment.apps/app-affinity created
##### Result:
[root@node1 ~]# kubectl get po -o wide
NAME                            READY   STATUS    RESTARTS   AGE   IP               NODE    NOMINATED NODE   READINESS GATES
app-affinity-66fbb677c7-2kwjf   1/1     Running   0          3s    172.25.166.158   node1   <none>           <none>
app-affinity-66fbb677c7-5thfw   1/1     Running   0          3s    172.25.166.155   node1   <none>           <none>
app-affinity-66fbb677c7-drdml   1/1     Running   0          3s    172.25.166.159   node1   <none>           <none>
app-affinity-66fbb677c7-qq9fn   1/1     Running   0          3s    172.25.166.156   node1   <none>           <none>
app-affinity-66fbb677c7-vq4jg   1/1     Running   0          3s    172.25.166.157   node1   <none>           <none>

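The manifest defines two affinity terms. One selects the zone label of the nodes running nginx Pods and carries the higher weight of 80; the other selects the zone label of the nodes running apach Pods and carries the lower weight of 20. The scheduler therefore sorts candidate nodes into four classes: zones that contain both nginx and apach Pods, zones that contain only nginx Pods, zones that contain only apach Pods, and all other zones. Since the weights of matching terms add up, these classes are preferred in that order.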

2.3 Pod anti-affinity

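podAffinity defines affinity constraints for a Pod; replacing it with podAntiAffinity defines anti-affinity constraints instead. Anti-affinity scheduling is typically used to spread the Pods of one application apart, or to place Pods of different security levels in different zones, racks, or nodes. The manifest below defines Pods that are created by the same Deployment but are mutually exclusive per node: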

---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx
spec:
  selector:
    matchLabels:
      app: nginx
  replicas: 4
  template:
    metadata:
      labels:
        app: nginx
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: app
                operator: In
                values:
                - nginx
            topologyKey: "kubernetes.io/hostname"
      containers:
      - name: nginx-server
        image: nginx:latest
###### Result: 4 replicas were requested but there are only three nodes, so one Pod cannot be scheduled; the other Pods each run on a different node.
[root@node1 yaml]# kubectl get pod -o wide
NAME                     READY   STATUS    RESTARTS   AGE     IP               NODE     NOMINATED NODE   READINESS GATES
nginx-86d6477c48-7mmt2   1/1     Running   0          2m18s   172.25.104.61    node2    <none>           <none>
nginx-86d6477c48-f2z2c   1/1     Running   0          2m18s   172.25.135.38    node3    <none>           <none>
nginx-86d6477c48-nv5x2   1/1     Running   0          2m18s   172.25.166.143   node1    <none>           <none>
nginx-86d6477c48-wsw4x   0/1     Pending   0          2m18s   <none>           <none>   <none>           <none>
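
If a Pending replica is not acceptable, the same constraint can be written as a soft rule. The following fragment is a sketch of how the affinity block in the Deployment above might look with preferred anti-affinity; the weight of 100 is an arbitrary choice, not taken from the original example.

```yaml
affinity:
  podAntiAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
    - weight: 100
      podAffinityTerm:
        labelSelector:
          matchExpressions:
          - key: app
            operator: In
            values: ["nginx"]
        topologyKey: kubernetes.io/hostname
```

With this variant the scheduler still tries to spread the replicas across nodes, but a fourth replica on a three-node cluster would be allowed to share a node instead of staying Pending.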
