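Preface:
Affinity gives finer-grained control over pod scheduling than NodeSelector, so in practice we have gradually replaced NodeSelector with it. It comes in two forms: node affinity and pod affinity.
1) Node affinity: it supports hard constraints like those of NodeSelector, and it also supports soft constraints (preferences) that carry weights.
2) Pod affinity: it schedules a pod according to the labels of pods already running on a node (rather than the labels of the node itself), so both a node condition and a pod condition have to match.
Some predefined labels are available and can be used directly: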
[root@DoM01 ~]# kubectl get node dom03 --show-labels
NAME STATUS ROLES AGE VERSION LABELS
dom03 Ready master 52d v1.15.2 beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=dom03,kubernetes.io/os=linux,node-role.kubernetes.io/master=
[root@DoM01 ~]# kubectl describe node dom01
Name: dom01
Roles: master
Labels: beta.kubernetes.io/arch=amd64
beta.kubernetes.io/os=linux
kubernetes.io/arch=amd64
kubernetes.io/hostname=dom01
kubernetes.io/os=linux
node-role.kubernetes.io/master=
……
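Note:
"beta.kubernetes.io/arch=amd64" and "beta.kubernetes.io/os=linux" are deprecated (the v1.14 release notes targeted their removal for v1.18); prefer kubernetes.io/arch and kubernetes.io/os.
The label names are self-explanatory, so they need no further explanation.
# kubectl label node <node-name> <key>=<value>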
[root@DoM01 ~]# kubectl label node don01 zone=east
node/don01 labeled
[root@DoM01 ~]# kubectl describe node don01
Name: don01
Roles: <none>
Labels: beta.kubernetes.io/arch=amd64
beta.kubernetes.io/os=linux
kubernetes.io/arch=amd64
kubernetes.io/hostname=don01
kubernetes.io/os=linux
zone=east
Note: to change the value of an existing label, add the --overwrite flag.
# kubectl label node don01 zone=south --overwrite
Note: to delete the label whose key is zone, append a minus sign to the key.
# kubectl label node don01 zone-
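The three label operations above (add, overwrite, delete) can be modeled in a few lines. The following is a minimal Python sketch of those semantics on a plain dictionary, not kubectl's actual implementation; the helper name apply_label and its error message are assumptions made for illustration.

```python
def apply_label(labels, arg, overwrite=False):
    """Apply one `kubectl label`-style argument to a copy of `labels`.

    Hypothetical helper: it only models the semantics described above.
    """
    updated = dict(labels)
    if arg.endswith("-"):
        # "zone-" removes the label whose key is "zone"
        updated.pop(arg[:-1], None)
        return updated
    key, value = arg.split("=", 1)
    if key in updated and updated[key] != value and not overwrite:
        # Changing an existing value requires the overwrite flag
        raise ValueError(f"label {key!r} already set; pass overwrite=True")
    updated[key] = value
    return updated


node = {"kubernetes.io/hostname": "don01"}
node = apply_label(node, "zone=east")                   # add
node = apply_label(node, "zone=south", overwrite=True)  # change
node = apply_label(node, "zone-")                       # delete
print(node)
```

Returning a copy instead of mutating the input keeps each step easy to inspect.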
Overview:
requiredDuringSchedulingIgnoredDuringExecution is a hard constraint: a node must satisfy the condition before the pod can be scheduled onto it (much like nodeSelector).
Example
apiVersion: v1
kind: Pod
metadata:
  name: nginxtest
  namespace: test
spec:
  affinity:
    # node affinity scheduling
    nodeAffinity:
      # hard constraint: must be satisfied at scheduling time
      requiredDuringSchedulingIgnoredDuringExecution:
        # the node selection terms start here
        nodeSelectorTerms:
        - matchExpressions:
          - key: zone
            operator: In
            values:
            - "east"
  containers:
  - name: nginxtest
    image: harbocto.boe.com.cn/public/nginx
[root@DoM01 test]# kubectl create -f nginx.yml
pod/nginxtest created
[root@DoM01 test]# kubectl get pod -n test -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
nginxtest 1/1 Running 0 91s 10.244.5.166 don03 <none> <none>
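Note: the pod was scheduled onto a node labeled zone=east (don03 in the output above; because the rule is a hard constraint, that node must carry zone=east).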
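To make the matching rule concrete, here is a minimal Python sketch of how a hard node affinity rule could be evaluated against node labels. It is an illustration under assumptions, not the scheduler's code: the function names and the node label data are hypothetical, and only the In, NotIn, Exists and DoesNotExist operators are modeled. Multiple nodeSelectorTerms are ORed, while the matchExpressions inside one term are ANDed.

```python
def expression_matches(labels, expr):
    """Evaluate one matchExpressions entry against a node's labels."""
    key, op = expr["key"], expr["operator"]
    values = expr.get("values", [])
    if op == "In":
        return key in labels and labels[key] in values
    if op == "NotIn":
        return key not in labels or labels[key] not in values
    if op == "Exists":
        return key in labels
    if op == "DoesNotExist":
        return key not in labels
    raise ValueError(f"operator not modeled in this sketch: {op}")


def node_is_eligible(labels, node_selector_terms):
    """A node passes if ANY term matches; a term matches if ALL its expressions do."""
    return any(
        all(expression_matches(labels, e) for e in term["matchExpressions"])
        for term in node_selector_terms
    )


# Hypothetical label data, for illustration only.
nodes = {
    "node-a": {"zone": "east"},
    "node-b": {"zone": "south"},
    "node-c": {},
}
terms = [{"matchExpressions": [{"key": "zone", "operator": "In", "values": ["east"]}]}]
eligible = [name for name, labels in nodes.items() if node_is_eligible(labels, terms)]
print(eligible)
```

If no node is eligible, a pod with such a hard constraint stays Pending rather than running on a non-matching node.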
[root@DoM01 test]# kubectl apply -f nginx.yml
The Pod "nginxtest" is invalid: spec: Forbidden: pod updates may not change fields other than `spec.containers[*].image`, `spec.initContainers[*].image`, `spec.activeDeadlineSeconds` or `spec.tolerations` (only additions to existing tolerations)
core.PodSpec{
……
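Note: the error above is shown to introduce the two important rules below. It also shows that the affinity of an existing pod cannot be changed in place; to apply new affinity rules, delete the pod and create it again.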
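Overview
preferredDuringSchedulingIgnoredDuringExecution is a soft constraint: the scheduler prefers nodes that satisfy the specified rules, and when there are several preferences each one can be given a weight.
Example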
apiVersion: v1
kind: Pod
metadata:
  name: nginxtest
  namespace: test
spec:
  affinity:
    nodeAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 60
        preference:
          matchExpressions:
          - key: zone
            operator: In
            values:
            - "east"
      - weight: 80
        preference:
          matchExpressions:
          - key: zone
            operator: In
            values:
            - "south"
  containers:
  - name: nginxtest
    image: harbocto.boe.com.cn/public/nginx
[root@DoM01 test]# kubectl create -f nginx.yml
pod/nginxtest created
[root@DoM01 test]# kubectl get pod -n test -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
nginxtest 1/1 Running 0 23s 10.244.7.190 don05 <none> <none>
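Note: although zone=east has the lower weight (60), the pod can still be scheduled onto a zone=east node; a soft constraint only influences node scoring and never blocks scheduling.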
[root@DoM01 test]# kubectl create -f nginx.yml
pod/nginxtest created
[root@DoM01 ~]# kubectl get pod -n test -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
nginxtest 1/1 Running 0 18m 10.244.3.33 don01 <none> <none>
[root@DoM01 ~]#
Note: this time the pod was scheduled onto a node labeled zone=south, which shows the effect of the higher weight (80).
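The effect of the weights can be sketched as a simple score: for each node, add up the weights of all preferences that the node satisfies, then favor the node with the highest total. This Python sketch uses hypothetical node labels and function names, models only the In operator, and ignores the other scoring factors that the real scheduler combines with this one.

```python
def preference_score(labels, preferred_terms):
    """Sum the weights of every preference whose expressions all match."""
    score = 0
    for term in preferred_terms:
        exprs = term["preference"]["matchExpressions"]
        # Only the In operator is modeled in this sketch.
        if all(labels.get(e["key"]) in e["values"] for e in exprs):
            score += term["weight"]
    return score


preferred = [
    {"weight": 60, "preference": {"matchExpressions": [
        {"key": "zone", "operator": "In", "values": ["east"]}]}},
    {"weight": 80, "preference": {"matchExpressions": [
        {"key": "zone", "operator": "In", "values": ["south"]}]}},
]

# Hypothetical label data, for illustration only.
nodes = {
    "node-a": {"zone": "east"},
    "node-b": {"zone": "south"},
    "node-c": {},
}
scores = {name: preference_score(labels, preferred) for name, labels in nodes.items()}
best = max(scores, key=scores.get)
print(scores, best)
```

A node that matches no preference scores zero here but remains a valid candidate, which is the difference from a hard constraint.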
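Note:
Pod affinity decides whether pod2 is deployed onto a given group of nodes (or a single node) based on the labels of pod1.
The key of the node label that defines this group is called the topologyKey.
pod2 therefore needs two pieces of information to determine affinity:
(1) the scope within which the rule applies (topologyKey); (2) which pod to be affine to (a selector on that pod's labels).
For topologyKey only the key is given, not a value: all nodes that share the same value for that key form one group, and affinity is evaluated within that group.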
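The role of topologyKey can be illustrated with a small Python sketch. Nodes that share the same value for the topologyKey label form one group, and a node is acceptable if its group already hosts a pod matching the selector. All names and the cluster data below are hypothetical, and the selector is simplified to exact key/value equality.

```python
def nodes_satisfying_pod_affinity(nodes, running_pods, selector, topology_key):
    """Return nodes whose topology group already hosts a matching pod.

    nodes:        {node_name: node_labels}
    running_pods: list of (node_name, pod_labels)
    selector:     {label_key: required_value}, simplified equality match
    """
    matching_groups = set()
    for node_name, pod_labels in running_pods:
        if all(pod_labels.get(k) == v for k, v in selector.items()):
            group = nodes[node_name].get(topology_key)
            if group is not None:
                matching_groups.add(group)
    return sorted(
        name for name, labels in nodes.items()
        if labels.get(topology_key) in matching_groups
    )


# Hypothetical cluster: two nodes in zone "east", one in zone "south".
nodes = {
    "n1": {"kubernetes.io/hostname": "n1", "zone": "east"},
    "n2": {"kubernetes.io/hostname": "n2", "zone": "east"},
    "n3": {"kubernetes.io/hostname": "n3", "zone": "south"},
}
running_pods = [("n1", {"security": "S1"})]
selector = {"security": "S1"}

same_host = nodes_satisfying_pod_affinity(nodes, running_pods, selector, "kubernetes.io/hostname")
same_zone = nodes_satisfying_pod_affinity(nodes, running_pods, selector, "zone")
print(same_host, same_zone)
```

With kubernetes.io/hostname as the key, each node is its own group, so only the node running the matching pod qualifies; with a zone key, every node in that zone qualifies. When no matching pod is running, the list is empty, which corresponds to a pod with a hard pod affinity rule staying Pending.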
Note:
Pod affinity likewise comes in two kinds:
"requiredDuringSchedulingIgnoredDuringExecution"
"preferredDuringSchedulingIgnoredDuringExecution"
# nginx-flag.yml
apiVersion: v1
kind: Pod
metadata:
  name: nginx-flag
  namespace: test
  labels:
    security: "S1"
    app: "nginx-flag"
spec:
  containers:
  - name: nginx-flag
    image: nginx
# nginx.yml
apiVersion: v1
kind: Pod
metadata:
  name: nginxtest
  namespace: test
spec:
  affinity:
    podAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
          - key: security
            operator: In
            values:
            - "S1"
        topologyKey: kubernetes.io/hostname
  containers:
  - name: nginxtest
    image: harbocto.boe.com.cn/public/nginx
[root@DoM01 test]# kubectl create -f nginx-flag.yml
[root@DoM01 test]# kubectl create -f nginx.yml
pod/nginxtest created
[root@DoM01 test]# kubectl get pod -n test -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
nginx-flag 1/1 Running 1 14m 10.244.7.191 don05 <none> <none>
nginxtest 1/1 Running 0 58s 10.244.7.192 don05 <none> <none>
Note: nginxtest was scheduled onto the node where nginx-flag runs (don05).
[root@DoM01 test]# kubectl create -f nginx.yml
pod/nginxtest created
[root@DoM01 test]# kubectl get pod -n test
NAME READY STATUS RESTARTS AGE
nginxtest 0/1 Pending 0 3s
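As shown above, nginxtest stays Pending as long as no pod labeled security=S1 is running. As shown below, once nginx-flag has started, nginxtest is scheduled onto the node that runs nginx-flag.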
[root@DoM01 test]# kubectl create -f nginx-flag.yml
pod/nginx-flag created
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
nginx-flag 1/1 Running 0 17s 10.244.5.169 don03 <none> <none>
nginxtest 1/1 Running 0 2m33s 10.244.5.168 don03 <none> <none>
apiVersion: v1
kind: Pod
metadata:
  name: nginxtest
  namespace: test
spec:
  affinity:
    podAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 20
        podAffinityTerm:
          labelSelector:
            matchExpressions:
            - key: security
              operator: In
              values:
              - "S1"
          topologyKey: kubernetes.io/hostname
      - weight: 80
        podAffinityTerm:
          labelSelector:
            matchExpressions:
            - key: security
              operator: In
              values:
              - "S2"
          topologyKey: kubernetes.io/hostname
  containers:
  - name: nginxtest
    image: harbocto.boe.com.cn/public/nginx
[root@DoM01 test]# kubectl create -f nginx.yml
pod/nginxtest created
[root@DoM01 test]# kubectl get pod -n test -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
nginx-flag 1/1 Running 0 56s 10.244.3.34 don01 <none> <none>
nginxtest 1/1 Running 0 8s 10.244.3.35 don01 <none> <none>
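As shown above, nginxtest was scheduled onto the node where nginx-flag runs.
Test:
Start another pod, nginx-flag-02, with the label security=S2. As defined above, nginxtest prefers pods labeled security=S2 with a weight of 80, but here it is nginx-flag-02 that ends up choosing the node where nginxtest runs.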
apiVersion: v1
kind: Pod
metadata:
  name: nginx-flag-02
  namespace: test
  labels:
    security: "S2"
    app: "nginx-flag"
spec:
  containers:
  - name: nginx-flag-02
    image: harbocto.boe.com.cn/public/nginx
Create nginx-flag-02 first; afterwards, delete nginxtest and start it again.
[root@DoM01 test]# kubectl create -f nginx-flag02.yml
pod/nginx-flag-02 created
[root@DoM01 test]# kubectl get pod -n test -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
nginx-flag 1/1 Running 0 3m18s 10.244.3.34 don01 <none> <none>
nginx-flag-02 1/1 Running 0 6s 10.244.3.36 don01 <none> <none>
nginxtest 1/1 Running 0 2m30s 10.244.3.35 don01 <none> <none>
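As shown above:
The pods unexpectedly stick together: nginx-flag-02 also started on don01 (judging by the node resources, without any affinity nginx-flag-02 should have started on don03). A likely explanation is that, when scoring nodes for a new pod, the scheduler also counts the preferred affinity terms of pods that are already running, here the weight-80 preference of nginxtest for security=S2.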
[root@DoM01 test]# kubectl delete -n test pod nginxtest
pod "nginxtest" deleted
[root@DoM01 test]# kubectl delete -n test pod nginx-flag-02
pod "nginx-flag-02" deleted
[root@DoM01 test]# kubectl create -f nginx-flag02.yml
pod/nginx-flag-02 created
[root@DoM01 test]# kubectl get pod -n test -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
nginx-flag 1/1 Running 0 5m36s 10.244.3.34 don01 <none> <none>
nginx-flag-02 1/1 Running 0 2s 10.244.5.191 don03 <none> <none>
[root@DoM01 test]# kubectl create -f nginx.yml
pod/nginxtest created
[root@DoM01 test]# kubectl get pod -n test -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
nginx-flag 1/1 Running 0 4h24m 10.244.3.34 don01 <none> <none>
nginx-flag-02 1/1 Running 0 4h19m 10.244.5.191 don03 <none> <none>
nginxtest 1/1 Running 0 14s 10.244.5.194 don03 <none> <none>
Note: as shown above, nginxtest was scheduled onto the node of nginx-flag-02, the pod with the higher weight.
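The outcome above can be reproduced with a small scoring sketch in Python. For each node, it adds the weight of every preferred term that is satisfied by a pod already running there (the topologyKey is kubernetes.io/hostname, so each node is its own group). The pod placement is taken from the output above; the function name and the equality-only selector are simplifying assumptions, and the real scheduler mixes this score with other factors.

```python
def pod_affinity_score(node_name, running_pods, preferred_terms):
    """Sum the weights of preferred terms matched by some pod on `node_name`."""
    labels_on_node = [labels for node, labels in running_pods if node == node_name]
    score = 0
    for weight, selector in preferred_terms:
        matched = any(
            all(pod_labels.get(k) == v for k, v in selector.items())
            for pod_labels in labels_on_node
        )
        if matched:
            score += weight
    return score


# Placement as shown in the output above.
running_pods = [
    ("don01", {"security": "S1", "app": "nginx-flag"}),  # nginx-flag
    ("don03", {"security": "S2", "app": "nginx-flag"}),  # nginx-flag-02
]
# (weight, simplified selector) pairs mirroring the two preferred terms.
preferred_terms = [(20, {"security": "S1"}), (80, {"security": "S2"})]

scores = {
    node: pod_affinity_score(node, running_pods, preferred_terms)
    for node in ("don01", "don03")
}
print(scores, max(scores, key=scores.get))
```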
apiVersion: v1
kind: Pod
metadata:
  name: nginx-flag-01
  namespace: test
  labels:
    security: "S1"
    app: "nginx-flag"
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: kubernetes.io/hostname
            operator: In
            values:
            - "don01"
  containers:
  - name: nginx-flag-01
    image: harbocto.boe.com.cn/public/nginx
---
apiVersion: v1
kind: Pod
metadata:
  name: nginx-flag-02
  namespace: test
  labels:
    security: "S2"
    app: "nginx-flag"
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: kubernetes.io/hostname
            operator: In
            values:
            - "don02"
  containers:
  - name: nginx-flag-02
    image: harbocto.boe.com.cn/public/nginx
---
apiVersion: v1
kind: Pod
metadata:
  name: nginx-flag-03
  namespace: test
  labels:
    security: "S3"
    app: "nginx-flag"
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: kubernetes.io/hostname
            operator: In
            values:
            - "don03"
  containers:
  - name: nginx-flag-03
    image: harbocto.boe.com.cn/public/nginx
---
apiVersion: v1
kind: Pod
metadata:
  name: nginx-flag-04
  namespace: test
  labels:
    security: "S4"
    app: "nginx-flag"
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: kubernetes.io/hostname
            operator: In
            values:
            - "don04"
  containers:
  - name: nginx-flag-04
    image: harbocto.boe.com.cn/public/nginx
apiVersion: v1
kind: Pod
metadata:
  name: nginxtest-02
  namespace: test
spec:
  affinity:
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
          - key: app
            operator: In
            values:
            - "nginx-flag"
        topologyKey: kubernetes.io/hostname
  containers:
  - name: nginxtest-02
    image: harbocto.boe.com.cn/public/nginx
[root@DoM01 test]# kubectl get pod -n test -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
nginx-flag-01 1/1 Running 0 15m 10.244.3.38 don01 <none> <none>
nginx-flag-02 1/1 Running 0 15m 10.244.4.35 don02 <none> <none>
nginx-flag-03 1/1 Running 0 15m 10.244.5.201 don03 <none> <none>
nginx-flag-04 1/1 Running 0 15m 10.244.6.27 don04 <none> <none>
nginxtest-02 1/1 Running 0 14m 10.244.7.210 don05 <none> <none>
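As shown above, nginxtest-02 was scheduled onto the only remaining node (don05), because every other node already runs a pod labeled app=nginx-flag.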
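Hard anti-affinity works as a filter in the opposite direction: a node is excluded if its topology group already hosts a pod that matches the selector. The Python sketch below replays the placement shown in the output above. The function name and the equality-only selector are simplifying assumptions, and each node is assumed to carry a kubernetes.io/hostname label equal to its name.

```python
def nodes_allowed_by_anti_affinity(nodes, running_pods, selector, topology_key):
    """Return nodes whose topology group hosts NO pod matching the selector."""
    blocked_groups = {
        nodes[node_name].get(topology_key)
        for node_name, pod_labels in running_pods
        if all(pod_labels.get(k) == v for k, v in selector.items())
    }
    return sorted(
        name for name, labels in nodes.items()
        if labels.get(topology_key) not in blocked_groups
    )


# Five nodes, don01..don05; hostname labels assumed equal to the node names.
nodes = {f"don0{i}": {"kubernetes.io/hostname": f"don0{i}"} for i in range(1, 6)}
# nginx-flag-01..04 pinned to don01..don04, as in the output above.
running_pods = [
    (f"don0{i}", {"app": "nginx-flag", "security": f"S{i}"}) for i in range(1, 5)
]

allowed = nodes_allowed_by_anti_affinity(
    nodes, running_pods, {"app": "nginx-flag"}, "kubernetes.io/hostname"
)
print(allowed)
```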
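topology
1) For required anti-affinity (requiredDuringScheduling), topologyKey must not be empty.
2) For preferred anti-affinity (preferredDuringScheduling), an empty topologyKey is interpreted as the combination of the following:
kubernetes.io/hostname
failure-domain.beta.kubernetes.io/zone
failure-domain.beta.kubernetes.io/region
3) If the admission controller LimitPodHardAntiAffinityTopology is enabled, required anti-affinity is restricted to the topologyKey kubernetes.io/hostname.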
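namespace restrictions
1) Position: the namespaces field sits at the same level as topologyKey.
2) namespaces not defined: the selector applies to the namespace of the pod that defines the affinity rule.
3) namespaces set to empty: older documentation described this as all namespaces; the PodAffinityTerm API reference, however, states that a null or empty list means "this pod's namespace".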