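Kubernetes uses a master/worker distributed architecture. It consists mainly of Master Nodes and Worker Nodes, together with the kubectl command-line client and other add-ons.
The full Pod creation flow is shown in the following sequence diagram:
[Sequence diagram: Pod creation flow]
nodeSelector is the simplest way to constrain where a Pod runs. It is a field of pod.spec, and the Pod is scheduled only onto nodes whose labels contain every key/value pair it lists.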
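To make the matching rule concrete, here is a minimal Python sketch of how a nodeSelector can be checked against a node's labels. It is an illustration of the rule, not the scheduler's actual code; the function name matches_node_selector and the trimmed label sets are assumptions made for this example.

```python
def matches_node_selector(node_selector, node_labels):
    """Return True when every key/value pair in node_selector is present in node_labels."""
    return all(node_labels.get(key) == value for key, value in node_selector.items())


# Illustrative label sets, trimmed to a few entries.
node1_labels = {
    "kubernetes.io/hostname": "node1.example.com",
    "kubernetes.io/os": "linux",
    "disktype": "ssd",
}
node2_labels = {
    "kubernetes.io/hostname": "node2.example.com",
    "kubernetes.io/os": "linux",
}

selector = {"disktype": "ssd"}
print(matches_node_selector(selector, node1_labels))  # True
print(matches_node_selector(selector, node2_labels))  # False
```

An empty selector matches every node, which corresponds to a Pod that sets no nodeSelector at all.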
Use --show-labels to view the labels of a given node:
[root@master ~]# kubectl get node node1.example.com --show-labels
NAME STATUS ROLES AGE VERSION LABELS
node1.example.com Ready 4d6h v1.23.1 beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=node1.example.com,kubernetes.io/os=linux
If no extra node labels have been added, you will see only the default labels shown above. Use the kubectl label node command to add a label to a given node:
[root@master ~]# kubectl label node node1.example.com disktype=ssd
node/node1.example.com labeled
[root@master ~]# kubectl get node node1.example.com --show-labels
NAME STATUS ROLES AGE VERSION LABELS
node1.example.com Ready 4d6h v1.23.1 beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,disktype=ssd,kubernetes.io/arch=amd64,kubernetes.io/hostname=node1.example.com,kubernetes.io/os=linux
You can also remove a label with kubectl label node by appending a - to the label key:
[root@master ~]# kubectl label node node1.example.com disktype-
node/node1.example.com unlabeled
[root@master ~]# kubectl get node node1.example.com --show-labels
NAME STATUS ROLES AGE VERSION LABELS
node1.example.com Ready 4d6h v1.23.1 beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=node1.example.com,kubernetes.io/os=linux
Create a test pod and use the nodeSelector field to bind it to a node:
[root@master ~]# kubectl label node node1.example.com disktype=ssd
node/node1.example.com labeled
[root@master ~]# kubectl get node node1.example.com --show-labels
NAME STATUS ROLES AGE VERSION LABELS
node1.example.com Ready 4d6h v1.23.1 beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,disktype=ssd,kubernetes.io/arch=amd64,kubernetes.io/hostname=node1.example.com,kubernetes.io/os=linux
[root@master ~]# vi test.yml
[root@master ~]# cat test.yml
apiVersion: v1
kind: Pod
metadata:
  name: test
  labels:
    env: test
spec:
  containers:
  - name: test
    image: nginx
    imagePullPolicy: IfNotPresent
  nodeSelector:
    disktype: ssd
Check which node the pod was scheduled to. The test pod is forced onto a node that has the disktype=ssd label.
[root@master ~]# kubectl get pod -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
httpd1-57c7b6f7cb-sk86h 1/1 Running 4 (173m ago) 2d 10.244.1.93 node2.example.com
nginx1-7cf8bc594f-8j8tv 1/1 Running 1 (173m ago) 3h40m 10.244.1.94 node2.example.com
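nodeAffinity is the node affinity scheduling policy, a more expressive policy designed as the successor to nodeSelector. There are currently two kinds of node affinity:
requiredDuringSchedulingIgnoredDuringExecution:
The rule must be satisfied for the pod to be scheduled onto a Node. This is a hard constraint.
preferredDuringSchedulingIgnoredDuringExecution:
The scheduler tries to place the pod on a Node that satisfies the rule but does not insist on it. This is a soft constraint. Each preference rule can carry a weight (1 to 100); the scheduler adds up the weights of the rules a node satisfies and favors nodes with a higher total.
IgnoredDuringExecution means:
If the labels on a node change while a pod is running there, so that the node no longer meets the pod's node affinity requirements, the system ignores the label change and the pod keeps running on that node.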
The operators supported by the nodeAffinity syntax are In, NotIn, Exists, DoesNotExist, Gt, and Lt.
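Keep the following in mind when writing nodeAffinity rules:
If both nodeSelector and nodeAffinity are defined, both must be satisfied before the pod can be scheduled onto a node.
If nodeAffinity specifies multiple nodeSelectorTerms, matching any one of them is enough.
If a single nodeSelectorTerms entry contains multiple matchExpressions, a node must satisfy all of them to run the pod.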
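The three rules above can be sketched in Python as follows. This is a simplified model for illustration, not the scheduler's implementation: it ignores matchFields, and the function names and the sample terms (including the gpu label) are assumptions made for this example.

```python
def match_expression(expr, labels):
    """Evaluate one matchExpressions entry against a node's labels."""
    key, op = expr["key"], expr["operator"]
    values = expr.get("values", [])
    if op == "In":
        return key in labels and labels[key] in values
    if op == "NotIn":
        return key not in labels or labels[key] not in values
    if op == "Exists":
        return key in labels
    if op == "DoesNotExist":
        return key not in labels
    if op in ("Gt", "Lt"):
        # Gt/Lt compare the label value with a single integer.
        try:
            label_value, bound = int(labels[key]), int(values[0])
        except (KeyError, ValueError, IndexError):
            return False
        return label_value > bound if op == "Gt" else label_value < bound
    raise ValueError(f"unknown operator: {op}")


def match_term(term, labels):
    """All matchExpressions inside one term must hold (logical AND)."""
    exprs = term.get("matchExpressions", [])
    # A term without expressions matches nothing.
    return bool(exprs) and all(match_expression(e, labels) for e in exprs)


def match_required(terms, labels):
    """Any one of the nodeSelectorTerms is enough (logical OR)."""
    return any(match_term(t, labels) for t in terms)


def node_is_eligible(node_selector, terms, labels):
    """nodeSelector and nodeAffinity must both be satisfied."""
    selector_ok = all(labels.get(k) == v for k, v in node_selector.items())
    affinity_ok = match_required(terms, labels) if terms else True
    return selector_ok and affinity_ok


terms = [
    {"matchExpressions": [
        {"key": "disktype", "operator": "In", "values": ["ssd"]},
        {"key": "name", "operator": "In", "values": ["test"]},
    ]},
    {"matchExpressions": [{"key": "gpu", "operator": "Exists"}]},
]
both = {"disktype": "ssd", "name": "test"}
only_ssd = {"disktype": "ssd"}
gpu_only = {"gpu": "yes"}

print(match_required(terms, both))      # True: first term fully matches
print(match_required(terms, only_ssd))  # False: first term needs both labels
print(match_required(terms, gpu_only))  # True: second term matches
print(node_is_eligible({"disktype": "ssd"}, terms, gpu_only))  # False: nodeSelector fails
```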
apiVersion: v1
kind: Pod
metadata:
  name: test1
  labels:
    app: nginx
spec:
  containers:
  - name: test1
    image: nginx
    imagePullPolicy: IfNotPresent
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:   # hard rule
        nodeSelectorTerms:
        - matchExpressions:
          - key: disktype
            values:
            - ssd
            operator: In
      preferredDuringSchedulingIgnoredDuringExecution:  # soft rule
      - weight: 10
        preference:
          matchExpressions:
          - key: name
            values:
            - test
            operator: In
Label node2 with disktype=ssd:
[root@master ~]# kubectl label node node2.example.com disktype=ssd
node/node2.example.com labeled
[root@master ~]# kubectl get node node2.example.com --show-labels
NAME STATUS ROLES AGE VERSION LABELS
node2.example.com Ready 4d7h v1.23.1 beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,disktype=ssd,kubernetes.io/arch=amd64,kubernetes.io/hostname=node2.example.com,kubernetes.io/os=linux
Add the name=test label to node1, then remove it, and check the result each time:
[root@master ~]# kubectl label node node1.example.com name=test
node/node1.example.com labeled
[root@master ~]# kubectl get node node1.example.com --show-labels
NAME STATUS ROLES AGE VERSION LABELS
node1.example.com Ready 4d7h v1.23.1 beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,disktype=ssd,kubernetes.io/arch=amd64,kubernetes.io/hostname=node1.example.com,kubernetes.io/os=linux,name=test
[root@master ~]# kubectl apply -f test.yml
pod/test created
[root@master ~]# kubectl get pod -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
httpd1-57c7b6f7cb-sk86h 1/1 Running 4 (178m ago) 2d 10.244.1.93 node2.example.com
nginx1-7cf8bc594f-8j8tv 1/1 Running 1 (178m ago) 3h45m 10.244.1.94 node2.example.com
test 1/1 Running 0 13s 10.244.1.95 node2.example.com
Remove the name=test label from node1:
[root@master ~]# kubectl label node node1.example.com name-
node/node1.example.com unlabeled
[root@master ~]# kubectl get pod -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
httpd1-57c7b6f7cb-sk86h 1/1 Running 4 (179m ago) 2d1h 10.244.1.93 node2.example.com
nginx1-7cf8bc594f-8j8tv 1/1 Running 1 (179m ago) 3h46m 10.244.1.94 node2.example.com
test 1/1 Running 0 44s 10.244.1.95 node2.example.com
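The test1 manifest above first requires a node that has the disktype=ssd label. If several nodes have that label, nodes that also have the name=test label are preferred. Because the preference is a soft rule, the scheduler may still place the pod on another node that has disktype=ssd.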
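The filter-then-prefer behaviour can be sketched in Python as below. This is a rough model under stated assumptions: only the In operator is handled, the helper names are invented for this example, and node3.example.com is a hypothetical node added to show the hard rule filtering a node out. The real scheduler also combines this score with other scoring factors, so the top-ranked node here is not guaranteed to be chosen.

```python
def expr_matches(expr, labels):
    """Only the In operator is modelled, which is all the test1 manifest uses."""
    return labels.get(expr["key"]) in expr["values"]


def term_matches(term, labels):
    """Every expression inside a term must match."""
    return all(expr_matches(e, labels) for e in term["matchExpressions"])


def affinity_score(preferred, labels):
    """Sum the weights of every preference whose expressions all match."""
    return sum(
        rule["weight"]
        for rule in preferred
        if term_matches(rule["preference"], labels)
    )


def rank_nodes(required_terms, preferred, nodes):
    """Filter by the hard rule, then order the survivors by soft score, highest first."""
    eligible = {
        name: labels
        for name, labels in nodes.items()
        if any(term_matches(t, labels) for t in required_terms)
    }
    return sorted(
        eligible,
        key=lambda name: affinity_score(preferred, eligible[name]),
        reverse=True,
    )


required_terms = [
    {"matchExpressions": [{"key": "disktype", "operator": "In", "values": ["ssd"]}]}
]
preferred = [
    {
        "weight": 10,
        "preference": {
            "matchExpressions": [{"key": "name", "operator": "In", "values": ["test"]}]
        },
    }
]
nodes = {
    "node1.example.com": {"disktype": "ssd", "name": "test"},
    "node2.example.com": {"disktype": "ssd"},
    "node3.example.com": {"name": "test"},  # hypothetical node without disktype=ssd
}

print(rank_nodes(required_terms, preferred, nodes))
# ['node1.example.com', 'node2.example.com']
```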
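Taints: keep Pods from being scheduled onto specific Nodes
Tolerations: allow Pods to be scheduled onto Nodes that carry Taints
Use cases:
Dedicated nodes: Nodes are grouped and managed by business line. By default nothing is scheduled onto them, and only Pods configured with a matching toleration may be placed there
Special hardware: some Nodes have SSDs or GPUs. By default nothing is scheduled onto them, and only Pods configured with a matching toleration may be placed there
Taint-based eviction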
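The relationship between taints and tolerations can be sketched in Python as below. It is a simplified model of the matching rules, written for illustration; the function names and the gpu=yes taint are assumptions made for this example, and PreferNoSchedule is treated as non-blocking because it is only a soft hint.

```python
def tolerates(toleration, taint):
    """Return True when a single toleration matches a single taint."""
    # An empty effect on the toleration matches every effect.
    if toleration.get("effect") and toleration["effect"] != taint["effect"]:
        return False
    # An empty key matches every key (used together with operator Exists).
    if toleration.get("key") and toleration["key"] != taint["key"]:
        return False
    operator = toleration.get("operator", "Equal")
    if operator == "Exists":
        return True
    if operator == "Equal":
        return toleration.get("value", "") == taint.get("value", "")
    return False


def can_schedule(tolerations, taints):
    """A pod may be placed only when every NoSchedule/NoExecute taint is tolerated."""
    blocking = [t for t in taints if t["effect"] in ("NoSchedule", "NoExecute")]
    return all(any(tolerates(tol, taint) for tol in tolerations) for taint in blocking)


gpu_taint = {"key": "gpu", "value": "yes", "effect": "NoSchedule"}

no_tolerations = []
exact = [{"key": "gpu", "operator": "Equal", "value": "yes", "effect": "NoSchedule"}]
any_value = [{"key": "gpu", "operator": "Exists"}]
wrong_value = [{"key": "gpu", "operator": "Equal", "value": "no", "effect": "NoSchedule"}]

print(can_schedule(no_tolerations, [gpu_taint]))  # False
print(can_schedule(exact, [gpu_taint]))           # True
print(can_schedule(any_value, [gpu_taint]))       # True
print(can_schedule(wrong_value, [gpu_taint]))     # False
```

A toleration only permits placement on a tainted node. It does not attract the pod there, so dedicated nodes usually combine a taint with a node label and a nodeSelector or nodeAffinity rule.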
#Taint
[root@master haproxy]# kubectl describe node master
Name: master.example.com
Roles: control-plane,master
Labels: beta.kubernetes.io/arch=amd64
beta.kubernetes.io/os=linux
kubernetes.io/arch=amd64
kubernetes.io/hostname=master.example.com
kubernetes.io/os=linux
node-role.kubernetes.io/control-plane=
node-role.kubernetes.io/master=
node.kubernetes.io/exclude-from-external-load-balancers=
Annotations: flannel.alpha.coreos.com/backend-data: {"VNI":1,"VtepMAC":"8e:50:ba:7a:30:2b"}
flannel.alpha.coreos.com/backend-type: vxlan
flannel.alpha.coreos.com/kube-subnet-manager: true
flannel.alpha.coreos.com/public-ip: 192.168.240.30
kubeadm.alpha.kubernetes.io/cri-socket: /var/run/dockershim.sock
node.alpha.kubernetes.io/ttl: 0
volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp: Sun, 19 Dec 2021 02:41:49 -0500
Taints: node-role.kubernetes.io/master:NoSchedule #Taints: keep Pods from being scheduled onto specific Nodes
Unschedulable: false
#Tolerations
[root@master ~]# kubectl describe pod httpd1-57c7b6f7cb-sk86h
Name: httpd1-57c7b6f7cb-sk86h
Namespace: default
·····
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s ##Tolerations allow Pods to be scheduled onto Nodes that carry Taints
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedCreatePodSandBox 12m kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "2126da4117ba6ce45ddff8a2e0b9de59bac65f05c7be343249d50edea2cacf37" network for pod "httpd1-57c7b6f7cb-sk86h": networkPlugin cni failed to set up pod "httpd1-57c7b6f7cb-sk86h_default" network: open /run/flannel/subnet.env: no such file or directory
Warning FailedCreatePodSandBox 12m kubelet Failed to create pod "best2001/httpd"
Normal Pulled 11m kubelet Successfully pulled image "best2001/httpd" in 16.175310708s
Normal Created 11m kubelet Created container httpd1
Normal Started 11m kubelet Started container httpd1
Adding a taint to a node
Format: kubectl taint node [node] key=value:[effect]
Example: kubectl taint node k8s-node1 gpu=yes:NoSchedule
Verify: kubectl describe node k8s-node1 | grep Taint
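[effect] can take the following values:
NoSchedule: Pods without a matching toleration are not scheduled onto the node
PreferNoSchedule: the scheduler tries to avoid the node but may still use it
NoExecute: new Pods are not scheduled onto the node, and Pods already running there without a matching toleration are evicted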
Add a tolerations field to the Pod configuration
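As a hedged illustration of what the tolerations field looks like, the Python sketch below builds a Pod manifest as a dictionary and prints it as JSON, a format that kubectl apply -f also accepts. The helper pod_with_toleration and the pod name test2 are hypothetical; the field names (tolerations, key, operator, value, effect) are those of the Pod spec.

```python
import json


def pod_with_toleration(name, image, key, effect, value=None):
    """Build a minimal Pod manifest that tolerates one taint."""
    toleration = {"key": key, "effect": effect}
    if value is None:
        # Exists tolerates the key regardless of the taint's value.
        toleration["operator"] = "Exists"
    else:
        # Equal tolerates the taint only when the value matches too.
        toleration["operator"] = "Equal"
        toleration["value"] = value
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "containers": [
                {"name": name, "image": image, "imagePullPolicy": "IfNotPresent"}
            ],
            "tolerations": [toleration],
        },
    }


# Tolerates a value-less taint such as disktype:NoSchedule.
manifest = pod_with_toleration("test2", "nginx", key="disktype", effect="NoSchedule")
print(json.dumps(manifest, indent=2))
```

Saving the printed JSON to a file and passing it to kubectl apply -f would create a pod that is allowed, though not required, to run on a node tainted with disktype:NoSchedule.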
#Add the taint disktype
[root@master haproxy]# kubectl taint node node1.example.com disktype:NoSchedule
node/node1.example.com tainted
#Check
[root@master haproxy]# kubectl describe node node1.example.com
Name: node1.example.com
Roles:
Labels: beta.kubernetes.io/arch=amd64
beta.kubernetes.io/os=linux
kubernetes.io/arch=amd64
kubernetes.io/hostname=node1.example.com
kubernetes.io/os=linux
Annotations: flannel.alpha.coreos.com/backend-data: {"VNI":1,"VtepMAC":"12:9e:43:99:21:bd"}
flannel.alpha.coreos.com/backend-type: vxlan
flannel.alpha.coreos.com/kube-subnet-manager: true
flannel.alpha.coreos.com/public-ip: 192.168.240.50
kubeadm.alpha.kubernetes.io/cri-socket: /var/run/dockershim.sock
node.alpha.kubernetes.io/ttl: 0
volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp: Sun, 19 Dec 2021 03:27:16 -0500
Taints: disktype:NoSchedule
#Test by creating a container
[root@master haproxy]# kubectl apply -f nginx.yml
deployment.apps/nginx1 created
service/nginx1 created
[root@master haproxy]# kubectl get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
nginx1-7cf8bc594f-8j8tv 1/1 Running 0 14s 10.244.1.92 node2.example.com #node1 carries a taint, so the new container runs on node2
Remove the taint:
kubectl taint node [node] key:[effect]-
[root@master haproxy]# kubectl taint node node1.example.com disktype-
node/node1.example.com untainted
[root@master haproxy]# kubectl describe node node1.example.com
Name: node1.example.com
Roles:
Labels: beta.kubernetes.io/arch=amd64
beta.kubernetes.io/os=linux
kubernetes.io/arch=amd64
kubernetes.io/hostname=node1.example.com
kubernetes.io/os=linux
Annotations: flannel.alpha.coreos.com/backend-data: {"VNI":1,"VtepMAC":"12:9e:43:99:21:bd"}
flannel.alpha.coreos.com/backend-type: vxlan
flannel.alpha.coreos.com/kube-subnet-manager: true
flannel.alpha.coreos.com/public-ip: 192.168.100.42
kubeadm.alpha.kubernetes.io/cri-socket: /var/run/dockershim.sock
node.alpha.kubernetes.io/ttl: 0
volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp: Sun, 19 Dec 2021 03:27:16 -0500
Taints: #the taint has been removed
Unschedulable: false