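Preface:
Affinity gives finer-grained control over pod scheduling than NodeSelector, so in practice we have gradually replaced NodeSelector with it. It comes in two forms: node affinity and pod affinity.
1) Node affinity: it supports hard constraints like NodeSelector, and additionally supports soft constraints with weights.
2) Pod affinity: it schedules a pod according to the labels of the pods already running on a node (rather than the node's own labels), so both the node and the pods on it have to match.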
1. Node Affinity
1.1 Built-in node labels
Nodes come with some preset labels that we can use directly.
- View the labels on a master
[root@DoM01 ~]# kubectl get node dom03 --show-labels
NAME STATUS ROLES AGE VERSION LABELS
dom03 Ready master 52d v1.15.2 beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=dom03,kubernetes.io/os=linux,node-role.kubernetes.io/master=
- That format is hard to read, so let's look at it with describe
[root@DoM01 ~]# kubectl describe node dom01
Name:               dom01
Roles:              master
Labels:             beta.kubernetes.io/arch=amd64
                    beta.kubernetes.io/os=linux
                    kubernetes.io/arch=amd64
                    kubernetes.io/hostname=dom01
                    kubernetes.io/os=linux
                    node-role.kubernetes.io/master=
……
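Notes:
"beta.kubernetes.io/arch=amd64" and "beta.kubernetes.io/os=linux" are deprecated (since v1.14, with removal originally targeted for v1.18); use "kubernetes.io/arch" and "kubernetes.io/os" instead.
The label names are self-explanatory, so they need no further explanation.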
1.2 Custom labels
1.2.1 Add a label to a node
- Syntax
# kubectl label node <node-name> <key>=<value>
- Example
[root@DoM01 ~]# kubectl label node don01 zone=east
node/don01 labeled
- Verify
As shown below, don01 now carries the label zone=east
[root@DoM01 ~]# kubectl describe node don01
Name:               don01
Roles:
Labels:             beta.kubernetes.io/arch=amd64
                    beta.kubernetes.io/os=linux
                    kubernetes.io/arch=amd64
                    kubernetes.io/hostname=don01
                    kubernetes.io/os=linux
                    zone=east
1.2.2 Modify a label
Note: add the --overwrite flag
# kubectl label node don01 zone=south --overwrite
1.2.3 Delete a label
Note: to delete the label whose key is zone, append a minus sign to the key.
# kubectl label node don01 zone-
1.3 Required
Overview:
requiredDuringSchedulingIgnoredDuringExecution is a hard constraint: a node must satisfy it before the pod can be scheduled there (much like nodeSelector).
Example
apiVersion: v1
kind: Pod
metadata:
  name: nginxtest
  namespace: test
spec:
  affinity:
    # this is node affinity scheduling
    nodeAffinity:
      # hard constraint: must be satisfied at scheduling time
      requiredDuringSchedulingIgnoredDuringExecution:
        # the node selection terms start here
        nodeSelectorTerms:
        - matchExpressions:
          - key: zone
            operator: In
            values:
            - "east"
  containers:
  - name: nginxtest
    image: harbocto.boe.com.cn/public/nginx
- Create the pod and check the result
[root@DoM01 test]# kubectl create -f nginx.yml
pod/nginxtest created
[root@DoM01 test]# kubectl get pod -n test -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
nginxtest 1/1 Running 0 91s 10.244.5.166 don03
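Note: as shown, the pod was scheduled onto a node labeled zone=east (don03 in the output above).
- Change the scheduling target
Edit the yml file so that the pod targets nodes with zone=south, then try to update the pod as follows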
[root@DoM01 test]# kubectl apply -f nginx.yml
The Pod "nginxtest" is invalid: spec: Forbidden: pod updates may not change fields other than `spec.containers[*].image`, `spec.initContainers[*].image`, `spec.activeDeadlineSeconds` or `spec.tolerations` (only additions to existing tolerations)
core.PodSpec{
……
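Note: the error above leads to the following two important rules.
- Rules
1) The affinity settings of an existing pod cannot be modified in place; the pod has to be deleted and recreated.
2) If a node's labels are changed afterwards so that the node no longer satisfies the rule, the change is ignored for pods already running there (this is the IgnoredDuringExecution part of the name), and the pod keeps running on that node.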
1.4 Preferred
Overview
preferredDuringSchedulingIgnoredDuringExecution is a soft constraint: the scheduler tries to satisfy the specified rules first, and when there are several preferences each one can be given a weight.
Example
apiVersion: v1
kind: Pod
metadata:
  name: nginxtest
  namespace: test
spec:
  affinity:
    nodeAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 60
        preference:
          matchExpressions:
          - key: zone
            operator: In
            values:
            - "east"
      - weight: 80
        preference:
          matchExpressions:
          - key: zone
            operator: In
            values:
            - "south"
  containers:
  - name: nginxtest
    image: harbocto.boe.com.cn/public/nginx
- Create the pod and check the result
[root@DoM01 test]# kubectl create -f nginx.yml
pod/nginxtest created
[root@DoM01 test]# kubectl get pod -n test -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
nginxtest 1/1 Running 0 23s 10.244.7.190 don05
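Note: although zone=east only has a weight of 60, the pod can still be scheduled onto such a node.
- Change the weight of east to 20, delete the pod and start it again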
[root@DoM01 test]# kubectl create -f nginx.yml
pod/nginxtest created
[root@DoM01 ~]# kubectl get pod -n test -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
nginxtest 1/1 Running 0 18m 10.244.3.33 don01
[root@DoM01 ~]#
Note: this time the pod was scheduled onto a node with zone=south, which gives a glimpse of what the weights do.
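The effect of the weights can be sketched in a few lines of Python. This is only a simplified model, not the scheduler's actual code: for each candidate node it adds up the weights of the preferred terms that the node matches. The function name preference_score and the node names and labels are hypothetical, and only the In operator is modelled. The real scheduler combines this sum with its other scoring functions, which is why a node with a lower affinity score can still be chosen.

```python
# Simplified model of preferredDuringSchedulingIgnoredDuringExecution scoring.
# Node names and labels below are made up for illustration.

def preference_score(node_labels, preferred_terms):
    """Sum the weights of all preferred terms whose expressions match the node."""
    score = 0
    for term in preferred_terms:
        # Only the "In" operator is modelled: the label value must be listed.
        if all(node_labels.get(expr["key"]) in expr["values"]
               for expr in term["matchExpressions"]):
            score += term["weight"]
    return score


preferred_terms = [
    {"weight": 60, "matchExpressions": [{"key": "zone", "values": ["east"]}]},
    {"weight": 80, "matchExpressions": [{"key": "zone", "values": ["south"]}]},
]

nodes = {
    "node-a": {"zone": "east"},
    "node-b": {"zone": "south"},
    "node-c": {},  # no zone label: still schedulable, it just scores 0
}

scores = {name: preference_score(labels, preferred_terms)
          for name, labels in nodes.items()}
print(scores)  # {'node-a': 60, 'node-b': 80, 'node-c': 0}
```

A node that matches no preference is not excluded; it simply contributes nothing from this rule to its overall score.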
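1.5 Notes
- If both nodeSelector and nodeAffinity are set, both must be satisfied.
- If multiple nodeSelectorTerms are set, satisfying any one of them is enough.
- If a single term contains multiple expressions under matchExpressions, all of them must be satisfied.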
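The three rules above combine AND and OR in a way that is easy to mix up, so here is a small Python model of them. It is a sketch under assumptions, not scheduler code: the function names are hypothetical, only the In operator is modelled, the "disk" label is invented for the example, and at least one term is assumed to be given.

```python
# Simplified model of how nodeSelector and required node affinity combine.

def expression_matches(node_labels, expr):
    # "In" operator: the node's label value must be one of the listed values.
    return node_labels.get(expr["key"]) in expr["values"]


def term_matches(node_labels, term):
    # All expressions inside one term must hold (AND).
    return all(expression_matches(node_labels, e)
               for e in term["matchExpressions"])


def node_is_eligible(node_labels, node_selector, node_selector_terms):
    # nodeSelector: every key/value pair must be present on the node.
    selector_ok = all(node_labels.get(k) == v
                      for k, v in node_selector.items())
    # nodeSelectorTerms: one matching term is enough (OR).
    affinity_ok = any(term_matches(node_labels, t)
                      for t in node_selector_terms)
    # nodeSelector and nodeAffinity must both be satisfied.
    return selector_ok and affinity_ok


terms = [
    {"matchExpressions": [{"key": "zone", "values": ["east"]},
                          {"key": "disk", "values": ["ssd"]}]},
    {"matchExpressions": [{"key": "zone", "values": ["south"]}]},
]

print(node_is_eligible({"zone": "east", "disk": "ssd"}, {}, terms))  # True
print(node_is_eligible({"zone": "east", "disk": "hdd"}, {}, terms))  # False
print(node_is_eligible({"zone": "south"}, {}, terms))                # True
print(node_is_eligible({"zone": "south"}, {"disk": "ssd"}, terms))   # False
```

The last call fails even though the second term matches, because the nodeSelector requirement is checked in addition to the affinity terms.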
2. Pod Affinity
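Notes:
Whether pod2 is deployed onto a given group of nodes (or a single node) is decided by the labels of pod1, which is already running there.
The key of the node label that defines such a group is called the topologyKey.
So pod2 needs two pieces of information to determine its affinity:
(1) the scope in which the rule applies (topologyKey); (2) which pods it is affine to (the labels of those pods).
For the topologyKey we specify only the key, not a value: all nodes that share the same value for that key form one group, and affinity is evaluated within that group.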
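How the topologyKey turns nodes into groups can be illustrated with a short Python sketch. It is a simplified model rather than the scheduler's implementation: the function name affinity_candidates, the node names and the zone values are hypothetical, and the pod selector is reduced to plain key/value equality.

```python
# Simplified model of required pod affinity with a topologyKey.
# Node names, labels and pod placement are hypothetical.

def affinity_candidates(nodes, pods, topology_key, selector):
    """Return the nodes whose topology group already runs a matching pod."""
    # Collect the topologyKey values (groups) of nodes running a matching pod.
    groups = set()
    for pod in pods:
        if all(pod["labels"].get(k) == v for k, v in selector.items()):
            node_labels = nodes[pod["node"]]
            if topology_key in node_labels:
                groups.add(node_labels[topology_key])
    # Every node whose topologyKey value is one of those groups is a candidate.
    return sorted(name for name, labels in nodes.items()
                  if labels.get(topology_key) in groups)


nodes = {
    "node-a": {"kubernetes.io/hostname": "node-a", "zone": "east"},
    "node-b": {"kubernetes.io/hostname": "node-b", "zone": "east"},
    "node-c": {"kubernetes.io/hostname": "node-c", "zone": "south"},
}
pods = [{"name": "nginx-flag", "node": "node-a", "labels": {"security": "S1"}}]

# Group by hostname: only the node that runs the matching pod qualifies.
print(affinity_candidates(nodes, pods, "kubernetes.io/hostname", {"security": "S1"}))
# ['node-a']

# Group by zone: every node in the same zone as the matching pod qualifies.
print(affinity_candidates(nodes, pods, "zone", {"security": "S1"}))
# ['node-a', 'node-b']
```

With kubernetes.io/hostname as the topologyKey each node is its own group, so "affine" means "on the same node"; a coarser key such as zone widens the group.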
2.1 Pod Affinity
Notes:
As with node affinity, there are two kinds:
"requiredDuringSchedulingIgnoredDuringExecution"
"preferredDuringSchedulingIgnoredDuringExecution"
2.1.1 required
- The reference (target) pod
apiVersion: v1
kind: Pod
metadata:
  name: nginx-flag
  namespace: test
  labels:
    security: "S1"
    app: "nginx-flag"
spec:
  containers:
  - name: nginx-flag
    image: nginx
- The pod with the affinity rule
apiVersion: v1
kind: Pod
metadata:
  name: nginxtest
  namespace: test
spec:
  affinity:
    podAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
          - key: security
            operator: In
            values:
            - "S1"
        topologyKey: kubernetes.io/hostname
  containers:
  - name: nginxtest
    image: harbocto.boe.com.cn/public/nginx
- Start the pods and check the result
[root@DoM01 test]# kubectl create -f nginx-flag.yml
[root@DoM01 test]# kubectl create -f nginx.yml
pod/nginxtest created
[root@DoM01 test]# kubectl get pod -n test -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
nginx-flag 1/1 Running 1 14m 10.244.7.191 don05
nginxtest 1/1 Running 0 58s 10.244.7.192 don05
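Note: nginxtest was scheduled onto the same node as nginx-flag (don05).
- No suitable node found
If nginxtest is started directly while nginx-flag is not running, the scheduler cannot find a suitable node and the pod stays in the Pending state.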
[root@DoM01 test]# kubectl create -f nginx.yml
pod/nginxtest created
[root@DoM01 test]# kubectl get pod -n test
NAME READY STATUS RESTARTS AGE
nginxtest 0/1 Pending 0 3s
As shown below, once nginx-flag has started, nginxtest is scheduled onto the node that runs nginx-flag.
[root@DoM01 test]# kubectl create -f nginx-flag.yml
pod/nginx-flag created
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
nginx-flag 1/1 Running 0 17s 10.244.5.169 don03
nginxtest 1/1 Running 0 2m33s 10.244.5.168 don03
2.1.2 preferred
- The reference (target) pod
Same as above.
- The pod with the affinity rule
Create nginx.yml as follows:
apiVersion: v1
kind: Pod
metadata:
  name: nginxtest
  namespace: test
spec:
  affinity:
    podAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 20
        podAffinityTerm:
          labelSelector:
            matchExpressions:
            - key: security
              operator: In
              values:
              - "S1"
          topologyKey: kubernetes.io/hostname
      - weight: 80
        podAffinityTerm:
          labelSelector:
            matchExpressions:
            - key: security
              operator: In
              values:
              - "S2"
          topologyKey: kubernetes.io/hostname
  containers:
  - name: nginxtest
    image: harbocto.boe.com.cn/public/nginx
- Start and check
[root@DoM01 test]# kubectl create -f nginx.yml
pod/nginxtest created
[root@DoM01 test]# kubectl get pod -n test -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
nginx-flag 1/1 Running 0 56s 10.244.3.34 don01
nginxtest 1/1 Running 0 8s 10.244.3.35 don01
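As shown above, nginxtest was scheduled onto the node where nginx-flag runs.
- Affinity is mutual
Test:
Start another pod, nginx-flag-02, with the label security=S2. From the manifest above, nginxtest has an affinity weight of 80 towards it, and it turns out that nginx-flag-02 itself will seek out nginxtest.
apiVersion: v1
kind: Pod
metadata:
  name: nginx-flag-02
  namespace: test
  labels:
    security: "S2"
    app: "nginx-flag"
spec:
  containers:
  - name: nginx-flag-02
    image: harbocto.boe.com.cn/public/nginx
With nginxtest still running, start nginx-flag-02: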
[root@DoM01 test]# kubectl create -f nginx-flag02.yml
pod/nginx-flag-02 created
[root@DoM01 test]# kubectl get pod -n test -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
nginx-flag 1/1 Running 0 3m18s 10.244.3.34 don01
nginx-flag-02 1/1 Running 0 6s 10.244.3.36 don01
nginxtest 1/1 Running 0 2m30s 10.244.3.35 don01
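As shown above:
The pods stick together: nginx-flag-02 also started on don01. (Judging by the node resources, without the affinity nginx-flag-02 should have started on don03.)
- Delete nginxtest, then delete nginx-flag-02. Start nginx-flag-02 again and, as expected, it is scheduled onto don03.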
[root@DoM01 test]# kubectl delete -n test pod nginxtest
pod "nginxtest" deleted
[root@DoM01 test]# kubectl delete -n test pod nginx-flag-02
pod "nginx-flag-02" deleted
[root@DoM01 test]# kubectl create -f nginx-flag02.yml
pod/nginx-flag-02 created
[root@DoM01 test]# kubectl get pod -n test -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
nginx-flag 1/1 Running 0 5m36s 10.244.3.34 don01
nginx-flag-02 1/1 Running 0 2s 10.244.5.191 don03
- Test the affinity weights
[root@DoM01 test]# kubectl create -f nginx.yml
pod/nginxtest created
[root@DoM01 test]# kubectl get pod -n test -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
nginx-flag 1/1 Running 0 4h24m 10.244.3.34 don01
nginx-flag-02 1/1 Running 0 4h19m 10.244.5.191 don03
nginxtest 1/1 Running 0 14s 10.244.5.194 don03
Note: as shown above, nginxtest was scheduled onto the node of nginx-flag-02, which has the higher weight.
2.2 Pod Anti Affinity
- Reference pods
Make copies of the reference pod from before so that they occupy four of the five nodes
apiVersion: v1
kind: Pod
metadata:
  name: nginx-flag-01
  namespace: test
  labels:
    security: "S1"
    app: "nginx-flag"
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: kubernetes.io/hostname
            operator: In
            values:
            - "don01"
  containers:
  - name: nginx-flag-01
    image: harbocto.boe.com.cn/public/nginx
---
apiVersion: v1
kind: Pod
metadata:
  name: nginx-flag-02
  namespace: test
  labels:
    security: "S2"
    app: "nginx-flag"
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: kubernetes.io/hostname
            operator: In
            values:
            - "don02"
  containers:
  - name: nginx-flag-02
    image: harbocto.boe.com.cn/public/nginx
---
apiVersion: v1
kind: Pod
metadata:
  name: nginx-flag-03
  namespace: test
  labels:
    security: "S3"
    app: "nginx-flag"
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: kubernetes.io/hostname
            operator: In
            values:
            - "don03"
  containers:
  - name: nginx-flag-03
    image: harbocto.boe.com.cn/public/nginx
---
apiVersion: v1
kind: Pod
metadata:
  name: nginx-flag-04
  namespace: test
  labels:
    security: "S4"
    app: "nginx-flag"
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: kubernetes.io/hostname
            operator: In
            values:
            - "don04"
  containers:
  - name: nginx-flag-04
    image: harbocto.boe.com.cn/public/nginx
- The pod with the anti-affinity rule
apiVersion: v1
kind: Pod
metadata:
  name: nginxtest-02
  namespace: test
spec:
  affinity:
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
          - key: app
            operator: In
            values:
            - "nginx-flag"
        topologyKey: kubernetes.io/hostname
  containers:
  - name: nginxtest-02
    image: harbocto.boe.com.cn/public/nginx
- Create and check
[root@DoM01 test]# kubectl get pod -n test -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
nginx-flag-01 1/1 Running 0 15m 10.244.3.38 don01
nginx-flag-02 1/1 Running 0 15m 10.244.4.35 don02
nginx-flag-03 1/1 Running 0 15m 10.244.5.201 don03
nginx-flag-04 1/1 Running 0 15m 10.244.6.27 don04
nginxtest-02 1/1 Running 0 14m 10.244.7.210 don05
As shown above, nginxtest-02 was scheduled onto the only remaining node.
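The outcome of this test can be reproduced with a small Python model of required anti-affinity. It is a sketch only, not the scheduler's code: the function name anti_affinity_candidates is hypothetical, only the In operator is modelled, and every node is assumed to carry the topologyKey label. The node and pod data mirror the five nodes and four reference pods used above.

```python
# Simplified model of required pod anti-affinity with a topologyKey.

def anti_affinity_candidates(nodes, pods, topology_key, key, values):
    """Return nodes whose topology group runs no pod with labels[key] in values."""
    # Collect the groups (topologyKey values) that already host a matching pod.
    blocked = set()
    for pod in pods:
        if pod["labels"].get(key) in values:
            group = nodes[pod["node"]].get(topology_key)
            if group is not None:
                blocked.add(group)
    # Assumes every node carries the topologyKey label.
    return sorted(name for name, labels in nodes.items()
                  if labels.get(topology_key) not in blocked)


# Five nodes, each labeled with its own hostname.
nodes = {f"don0{i}": {"kubernetes.io/hostname": f"don0{i}"} for i in range(1, 6)}

# Four reference pods pinned to don01..don04, all labeled app=nginx-flag.
pods = [{"node": f"don0{i}", "labels": {"app": "nginx-flag", "security": f"S{i}"}}
        for i in range(1, 5)]

result = anti_affinity_candidates(
    nodes, pods, "kubernetes.io/hostname", "app", ["nginx-flag"])
print(result)  # ['don05']
```

Because the rule is a hard one, a sixth pod with the same anti-affinity would find no candidate in this model once every node hosts a matching pod.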
2.3 Notes
topologyKey
1) For anti-affinity with requiredDuringScheduling, topologyKey must not be empty.
2) For anti-affinity with preferredDuringScheduling, an empty topologyKey is interpreted as the combination of:
kubernetes.io/hostname
failure-domain.beta.kubernetes.io/zone
failure-domain.beta.kubernetes.io/region
3) If the admission controller LimitPodHardAntiAffinityTopology is enabled, the topologyKey of required anti-affinity is restricted to kubernetes.io/hostname.
namespace restrictions
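1) Position: the namespaces field sits at the same level as topologyKey.
2) namespaces not defined: the label selector applies to the namespace of the pod that declares the affinity/anti-affinity rule.
3) namespaces set to an empty list: per the v1.15 API reference this is also treated as the declaring pod's own namespace, not as all namespaces.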