1. Introduction
The Scheduler is the Kubernetes scheduler. Its main job is to assign the Pods you define to nodes in the cluster. This sounds simple, but there are many things to consider:
Fairness: how to make sure every node gets a share of the workload
Efficient resource usage: cluster resources should be used as fully as possible
Performance: scheduling must be fast, so that large batches of Pods can be scheduled quickly
Flexibility: users should be able to control the scheduling logic according to their own needs
The Scheduler runs as a separate program. Once started, it keeps watching the API Server for Pods whose PodSpec.NodeName is empty, and for each such Pod it creates a binding that records which node the Pod should be placed on.
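As a rough sketch of what that binding is, this is roughly the Binding object the scheduler posts for a Pod (the Pod name nginx and the node name 192.168.100.30 are just example values from this cluster):
apiVersion: v1
kind: Binding
metadata:
  name: nginx
target:
  apiVersion: v1
  kind: Node
  name: 192.168.100.30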
2. The scheduling process
Scheduling happens in several stages: first, nodes that do not satisfy the Pod's requirements are filtered out (the predicate stage); then the remaining nodes are ranked by priority (the priority stage); finally the node with the highest priority is chosen. If any step fails, an error is returned immediately.
The predicate stage has a series of algorithms, including:
PodFitsResources: whether the node's remaining resources are larger than the resources the Pod requests
PodFitsHost: if the Pod specifies a NodeName, check whether the node's name matches it
PodFitsHostPorts: whether the ports already in use on the node conflict with the ports the Pod requests
PodSelectorMatches: filter out nodes whose labels do not match the labels the Pod specifies
NoDiskConflict: the volumes already mounted on the node must not conflict with the volumes the Pod specifies, unless both are read-only
If no node passes the predicate stage, the Pod stays in the Pending state and the scheduler keeps retrying until some node satisfies the conditions. If several nodes pass this stage, the priorities stage runs next and ranks the nodes by priority.
A priority is a set of key/value pairs: the key is the name of the priority function and the value is its weight (how important it is). The priority functions include:
LeastRequestedPriority: computes a score from the CPU and memory utilization of the node; the lower the utilization, the higher the score. In other words, this priority favors nodes with a lower proportion of requested resources.
BalancedResourceAllocation: the closer the CPU and memory utilization on a node are to each other, the higher the score. This should be used together with the one above, not on its own.
ImageLocalityPriority: favors nodes that already have the images the Pod needs; the larger the total size of the images already present, the higher the score. The scheduler runs all the priority functions, applies their weights, and combines the results to obtain the final ranking.
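As a rough worked example of that combination (the scores and weights below are made up purely for illustration): the final score of a node is approximately the weighted sum of the individual priority scores. If node A gets 8 from LeastRequestedPriority and 6 from BalancedResourceAllocation, both with weight 1, its total is 1*8 + 1*6 = 14; if node B gets 1*5 + 1*7 = 12, node A is chosen.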
3. Custom schedulers
Besides the scheduler that ships with Kubernetes, you can also write your own. The spec.schedulerName field selects which scheduler handles a Pod. For example, the Pod below is scheduled by my-scheduler instead of the default default-scheduler:
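A minimal sketch of such a Pod (the scheduler name my-scheduler and the nginx image are illustrative; the custom scheduler itself must already be running in the cluster):
apiVersion: v1
kind: Pod
metadata:
  name: nginx-custom-scheduler
  labels:
    app: nginx
spec:
  schedulerName: my-scheduler
  containers:
  - name: nginx
    image: nginx
    imagePullPolicy: IfNotPresent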
1. Node affinity
1) pod.spec.affinity.nodeAffinity
preferredDuringSchedulingIgnoredDuringExecution: soft policy
requiredDuringSchedulingIgnoredDuringExecution: hard policy
2) Operators for key/value matching (see the sketch after this list)
In: the label's value is in the given list
NotIn: the label's value is not in the given list
Gt: the label's value is greater than the given value
Lt: the label's value is less than the given value
Exists: the label exists
DoesNotExist: the label does not exist
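A small sketch of matchExpressions using Exists and Gt (the label keys disktype and gpu-count are made-up examples, not labels that exist on this cluster; for Gt/Lt the single value must be an integer):
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
      - matchExpressions:
        - key: disktype
          operator: Exists
        - key: gpu-count
          operator: Gt
          values:
          - "2"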
3) requiredDuringSchedulingIgnoredDuringExecution: hard policy
[root@k8s-master1 nodeaffinity]# kubectl get node --show-labels
NAME             STATUS   ROLES    AGE   VERSION   LABELS
192.168.100.30   Ready    <none>   20d   v1.15.3   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=192.168.100.30,kubernetes.io/os=linux
192.168.100.40   Ready    <none>   20d   v1.15.3   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=192.168.100.40,kubernetes.io/os=linux
# Try to schedule the pod onto 192.168.100.50, a node that does not exist (the pod stays Pending)
[root@k8s-master1 nodeaffinity]# vim ./required.yaml
apiVersion: v1
kind: Pod
metadata:
  name: nginx-required
  labels:
    app: nginx
spec:
  containers:
  - name: nginx
    image: nginx
    imagePullPolicy: IfNotPresent
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: kubernetes.io/hostname
            operator: In
            values:
            - 192.168.100.50
[root@k8s-master1 nodeaffinity]# kubectl create -f required.yaml
[root@k8s-master1 nodeaffinity]# kubectl get pod -o wide
NAME             READY   STATUS    RESTARTS   AGE     IP       NODE     NOMINATED NODE   READINESS GATES
nginx-required   0/1     Pending   0          5m15s   <none>   <none>   <none>           <none>
# Schedule the pod onto 192.168.100.30
[root@k8s-master1 nodeaffinity]# kubectl delete pod --all
[root@k8s-master1 nodeaffinity]# vim required.yaml
apiVersion: v1
kind: Pod
metadata:
  name: nginx-required
  labels:
    app: nginx
spec:
  containers:
  - name: nginx
    image: nginx
    imagePullPolicy: IfNotPresent
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: kubernetes.io/hostname
            operator: In
            values:
            - 192.168.100.30
[root@k8s-master1 nodeaffinity]# kubectl create -f required.yaml
[root@k8s-master1 nodeaffinity]# kubectl get pod -o wide
NAME             READY   STATUS    RESTARTS   AGE   IP           NODE             NOMINATED NODE   READINESS GATES
nginx-required   1/1     Running   0          6s    172.18.0.4   192.168.100.30   <none>           <none>
4) preferredDuringSchedulingIgnoredDuringExecution: soft policy (weight is the weight of the preference: the larger the number, the higher the weight; multiple preferences are allowed)
[root@k8s-master1 nodeaffinity]# vim preferred.yaml
apiVersion: v1
kind: Pod
metadata:
  name: nginx-required
  labels:
    app: nginx
spec:
  containers:
  - name: nginx
    image: nginx
    imagePullPolicy: IfNotPresent
  affinity:
    nodeAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 1
        preference:
          matchExpressions:
          - key: kubernetes.io/hostname
            operator: In
            values:
            - 192.168.100.50
[root@k8s-master1 nodeaffinity]# kubectl create -f preferred.yaml
[root@k8s-master1 nodeaffinity]# kubectl get pod -o wide
NAME             READY   STATUS    RESTARTS   AGE   IP           NODE             NOMINATED NODE   READINESS GATES
nginx-required   1/1     Running   0          57s   172.18.0.3   192.168.100.40   <none>           <none>
5) Combining the soft and hard policies
[root@k8s-master1 nodeaffinity]# vim required-preferred.yaml
apiVersion: v1
kind: Pod
metadata:
  name: nginx-required
  labels:
    app: nginx
spec:
  containers:
  - name: nginx
    image: nginx
    imagePullPolicy: IfNotPresent
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: kubernetes.io/hostname
            operator: In
            values:
            - 192.168.100.40
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 1
        preference:
          matchExpressions:
          - key: kubernetes.io/hostname
            operator: In
            values:
            - 192.168.100.50
[root@k8s-master1 nodeaffinity]# kubectl create -f required-preferred.yaml
[root@k8s-master1 nodeaffinity]# kubectl get pod -o wide
NAME             READY   STATUS    RESTARTS   AGE   IP           NODE             NOMINATED NODE   READINESS GATES
nginx-required   1/1     Running   0          99s   172.18.0.3   192.168.100.40   <none>           <none>
2. Pod affinity
1) pod.spec.affinity.podAffinity / podAntiAffinity
requiredDuringSchedulingIgnoredDuringExecution: hard policy
preferredDuringSchedulingIgnoredDuringExecution: soft policy
2) Comparison of the affinity and anti-affinity scheduling policies (a topology domain is a group of nodes that share the same value of the topologyKey label):
Scheduling policy   Matches labels on   Operators                                 Topology domain support   Scheduling target
nodeAffinity        nodes               In, NotIn, Exists, DoesNotExist, Gt, Lt   No                        the specified node(s)
podAffinity         pods                In, NotIn, Exists, DoesNotExist           Yes                       the same topology domain as the matching pod(s)
podAntiAffinity     pods                In, NotIn, Exists, DoesNotExist           Yes                       a different topology domain from the matching pod(s)
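The topologyKey decides how large a topology domain is. With topologyKey: kubernetes.io/hostname (used in the examples below) each node is its own domain, so podAffinity effectively means "same node". As a sketch only: on a cluster whose nodes carry a zone label such as failure-domain.beta.kubernetes.io/zone (not set on this cluster), the same rule with that key would mean "same zone":
        topologyKey: failure-domain.beta.kubernetes.io/zone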
3) Pod hard affinity policy (podAffinity)
[root@k8s-master1 nodeaffinity]# vim podAffinity.yaml
apiVersion: v1
kind: Pod
metadata:
  name: nginx-1
  labels:
    app: nginx
spec:
  containers:
  - name: nginx
    image: nginx
    imagePullPolicy: IfNotPresent
  affinity:
    podAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
          - key: app
            operator: In
            values:
            - nginx
        topologyKey: kubernetes.io/hostname
[root@k8s-master1 nodeaffinity]# kubectl create -f podAffinity.yaml
[root@k8s-master1 nodeaffinity]# kubectl get pod -o wide
NAME             READY   STATUS    RESTARTS   AGE   IP           NODE             NOMINATED NODE   READINESS GATES
nginx-1          1/1     Running   0          7s    172.18.0.4   192.168.100.40   <none>           <none>
nginx-required   1/1     Running   0          63m   172.18.0.3   192.168.100.40   <none>           <none>
4) Pod hard anti-affinity policy (podAntiAffinity)
[root@k8s-master1 nodeaffinity]# vim required-pod.yaml
apiVersion: v1
kind: Pod
metadata:
  name: nginx-3
  labels:
    app: nginx
spec:
  containers:
  - name: nginx
    image: hub.benet.com/xitong/nginx
    imagePullPolicy: IfNotPresent
  affinity:
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
          - key: app
            operator: In
            values:
            - nginx
        topologyKey: kubernetes.io/hostname
[root@k8s-master1 nodeaffinity]# kubectl create -f required-pod.yaml
[root@k8s-master1 nodeaffinity]# kubectl get pod -o wide
NAME                               READY   STATUS    RESTARTS   AGE   IP            NODE             NOMINATED NODE   READINESS GATES
nginx-3                            0/1     Pending   0          25s   <none>        <none>           <none>           <none>
nginx-deployment-58f4f857b-7td5x   1/1     Running   0          17m   172.17.85.2   192.168.100.30   <none>           <none>
nginx-deployment-58f4f857b-jcd2p   1/1     Running   0          18m   172.17.17.4   192.168.100.40   <none>           <none>
[root@k8s-master1 /]# kubectl delete deployment --all
[root@k8s-master1 /]# kubectl get pod
NAME READY STATUS RESTARTS AGE
nginx-3 1/1 Running 0 2m9s
1. Taints
Node affinity is a property of Pods (a preference or a hard requirement) that attracts Pods to a particular class of nodes. A taint is the opposite: it lets a node repel a particular class of Pods.
Taints and tolerations work together to keep Pods away from unsuitable nodes. One or more taints can be applied to a node, which means the node will not accept any Pod that does not tolerate those taints. Applying a toleration to a Pod means the Pod may (but is not required to) be scheduled onto nodes with matching taints.
2. Anatomy of a taint
The kubectl taint command sets a taint on a node. Once a node carries a taint, a mutually exclusive relationship exists between it and Pods: the node can refuse to schedule Pods, or even evict Pods that are already running on it. Each taint has the form:
key=value:effect
Every taint has a key and a value as its label, where the value may be empty; effect describes what the taint does. Currently effect supports the following three options (a generic command sketch follows the list):
NoSchedule: Kubernetes will not schedule Pods onto a node with this taint
PreferNoSchedule: Kubernetes tries to avoid scheduling Pods onto a node with this taint
NoExecute: Kubernetes will not schedule Pods onto a node with this taint, and evicts Pods already running on it
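A quick sketch of the command form for each effect (node1 and key1=value1 are placeholders, not values from this cluster; the real commands used on this cluster appear in the next section):
kubectl taint nodes node1 key1=value1:NoSchedule
kubectl taint nodes node1 key1=value1:PreferNoSchedule
kubectl taint nodes node1 key1=value1:NoExecute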
3. Setting, viewing, and removing taints
1) Setting a taint
[root@k8s-master1 ~]# kubectl get pod -o wide
NAME                      READY   STATUS    RESTARTS   AGE   IP            NODE             NOMINATED NODE   READINESS GATES
tomcat-6d6768b884-7c2pv   1/1     Running   0          34m   172.17.85.2   192.168.100.30   <none>           <none>
[root@k8s-master1 ~]# kubectl taint nodes 192.168.100.30 updata=chaoyue:NoExecute
[root@k8s-master1 ~]# kubectl get pod -o wide
NAME                      READY   STATUS    RESTARTS   AGE   IP            NODE             NOMINATED NODE   READINESS GATES
tomcat-6d6768b884-klkw8   1/1     Running   0          8s    172.17.17.4   192.168.100.40   <none>           <none>
2) In the node description, look for the Taints field
[root@k8s-master1 ~]# kubectl get node
NAME             STATUS   ROLES    AGE   VERSION
192.168.100.30   Ready    <none>   21d   v1.15.3
192.168.100.40   Ready    <none>   21d   v1.15.3
[root@k8s-master1 ~]# kubectl describe node 192.168.100.30
Taints: updata=chaoyue:NoExecute
3) Removing a taint
[root@k8s-master1 ~]# kubectl taint node 192.168.100.30 updata=chaoyue:NoExecute-
[root@k8s-master1 ~]# kubectl describe node 192.168.100.30
Taints:             <none>
4. Tolerations
A node with taints repels Pods according to the taints' effects (NoSchedule, PreferNoSchedule, NoExecute), so to some degree Pods will not be scheduled onto it. However, we can set a toleration on a Pod: a Pod with a matching toleration can tolerate the taint and may be scheduled onto a node that carries it.
1) Test
key, value, effect: must match the taint set on the node
operator: if set to Exists, the value field is ignored
tolerationSeconds: how long the Pod may keep running on the node once it would otherwise be evicted
[root@k8s-master1 ~]# kubectl taint node 192.168.100.30 updata=chaoyue:NoExecute
[root@k8s-master1 ~]# kubectl taint node 192.168.100.40 updata=chaoyue:NoExecute
[root@k8s-master1 ~]# kubectl get pod
NAME READY STATUS RESTARTS AGE
tomcat-6d6768b884-vh782 0/1 Pending 0 2m7s
[root@k8s-master1 ~]# vim tomcat.yaml
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: tomcat1
spec:
  replicas: 1
  selector:
    matchLabels:
      run: tomcat1
  template:
    metadata:
      labels:
        run: tomcat1
    spec:
      containers:
      - image: hub.benet.com/xitong/tomcat:7.1
        imagePullPolicy: IfNotPresent
        name: tomcat1
      tolerations:
      - key: "updata"
        operator: "Equal"
        value: "chaoyue"
        effect: "NoExecute"
        tolerationSeconds: 3600
[root@k8s-master1 ~]# kubectl apply -f tomcat.yaml
[root@k8s-master1 ~]# kubectl get pod
NAME READY STATUS RESTARTS AGE
tomcat-6d6768b884-vh782 0/1 Pending 0 3m13s
tomcat1-97cc4945b-hlk67 1/1 Running 0 5s
5. Toleration usage patterns
1) When no key is specified, all taint keys are tolerated:
tolerations:
- operator: "Exists"
2) When no effect is specified, all taint effects for that key are tolerated:
tolerations:
- key: "key"
  operator: "Exists"
3) When there are multiple masters, you can set the following to avoid wasting resources:
kubectl taint nodes 192.168.100.30 kubernetes.io/master=:PreferNoSchedule
1. pod.spec.nodeName schedules the Pod directly onto the specified node, skipping the Scheduler's scheduling policy entirely; this rule is a forced match
[root@k8s-master1 ~]# vim tomcat.yaml
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: tomcat
spec:
  replicas: 2
  template:
    metadata:
      labels:
        run: tomcat
    spec:
      nodeName: 192.168.100.30
      containers:
      - image: hub.benet.com/xitong/tomcat:7.1
        imagePullPolicy: IfNotPresent
        name: tomcat
[root@k8s-master1 ~]# kubectl apply -f tomcat.yaml
[root@k8s-master1 ~]# kubectl get pod -o wide
NAME                      READY   STATUS    RESTARTS   AGE   IP            NODE             NOMINATED NODE   READINESS GATES
tomcat-7bcfd659b7-22htc   1/1     Running   0          21s   172.17.85.6   192.168.100.30   <none>           <none>
tomcat-7bcfd659b7-6hp9z   1/1     Running   0          21s   172.17.85.4   192.168.100.30   <none>           <none>
tomcat-7bcfd659b7-mprcc   1/1     Running   0          21s   172.17.85.2   192.168.100.30   <none>           <none>
tomcat-7bcfd659b7-vx269   1/1     Running   0          21s   172.17.85.3   192.168.100.30   <none>           <none>
tomcat-7bcfd659b7-zpcgq   1/1     Running   0          21s   172.17.85.5   192.168.100.30   <none>           <none>
2. pod.spec.nodeSelector: selects nodes through the Kubernetes label-selector mechanism; the scheduler's policy matches the labels and then schedules the Pod onto a matching node; this match is also a hard constraint
[root@k8s-master1 ~]# vim tomcat.yaml
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: tomcat
spec:
  replicas: 2
  template:
    metadata:
      labels:
        run: tomcat
    spec:
      nodeSelector:
        disk: ssd
      containers:
      - image: hub.benet.com/xitong/tomcat:7.1
        imagePullPolicy: IfNotPresent
        name: tomcat
[root@k8s-master1 ~]# kubectl apply -f tomcat.yaml
[root@k8s-master1 ~]# kubectl get pod
NAME READY STATUS RESTARTS AGE
tomcat-79b7ddf6f4-8jj4t 0/1 Pending 0 6s
tomcat-79b7ddf6f4-hc2jv 0/1 Pending 0 6s
[root@k8s-master1 ~]# kubectl label node 192.168.100.40 disk=ssd
[root@k8s-master1 ~]# kubectl get pod -o wide
NAME                      READY   STATUS    RESTARTS   AGE     IP            NODE             NOMINATED NODE   READINESS GATES
tomcat-79b7ddf6f4-8jj4t   1/1     Running   0          5m18s   172.17.17.6   192.168.100.40   <none>           <none>
tomcat-79b7ddf6f4-hc2jv   1/1     Running   0          5m18s   172.17.17.4   192.168.100.40   <none>           <none>