Deploying k8s (16): Four Cluster Scheduling Strategies

I. Scheduling Overview

1. Introduction

		The Scheduler is the Kubernetes scheduler; its main job is to assign the pods you define to nodes in the cluster. That sounds simple, but there is a lot to consider:
			Fairness: ensure every node has a chance to be allocated resources
			Efficient resource use: maximize utilization of all cluster resources
			Performance: scheduling must be fast, completing large batches of pods as quickly as possible
			Flexibility: let users control the scheduling logic according to their own needs

		The Scheduler runs as a separate program. Once started, it stays connected to the API Server, watches for pods whose PodSpec.NodeName is empty, and creates a binding for each such pod recording which node it should be placed on.
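
		The binding the Scheduler creates is itself a small API object; conceptually it looks something like the following sketch (the pod and node names are illustrative):

```yaml
# Sketch of a Binding object: the scheduler posts something like this to
# the API Server; the kubelet on the target node then starts the pod.
apiVersion: v1
kind: Binding
metadata:
  name: nginx-pod            # the pod being bound (illustrative name)
target:
  apiVersion: v1
  kind: Node
  name: 192.168.100.30       # the node the scheduler chose
```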

2. The Scheduling Process

		Scheduling happens in several phases: first, nodes that do not satisfy the pod's requirements are filtered out (the predicate phase); then the surviving nodes are ranked (the priority phase); finally, the highest-priority node is selected. If any step fails, the error is returned immediately.

		Predicate has a series of algorithms available:
		PodFitsResources: are the node's remaining resources greater than the pod's requests?
		PodFitsHost: if the pod specifies a NodeName, does the node's name match it?
		PodFitsHostPorts: do the ports already in use on the node conflict with the ports the pod requests?
		PodSelectorMatches: filter out nodes that do not match the pod's specified labels
		NoDiskConflict: volumes already mounted on the node must not conflict with the volumes the pod specifies, unless both are read-only

		If no node survives the predicate phase, the pod stays Pending and the scheduler keeps retrying until some node satisfies the requirements. If several nodes pass this step, scheduling continues with the priorities phase, which ranks those nodes by priority.
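
		Note that PodFitsResources compares against the pod's declared requests, not its actual usage. A minimal sketch of a pod that participates in this check (the resource figures are illustrative):

```yaml
# Nodes whose allocatable resources, minus what existing pods already
# request, fall below these values are filtered out during predicate.
apiVersion: v1
kind: Pod
metadata:
  name: nginx-requests
spec:
  containers:
  - name: nginx
    image: nginx
    resources:
      requests:
        cpu: "500m"
        memory: "256Mi"
```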

		A priority consists of a series of key-value pairs: the key is the name of the priority item and the value is its weight (how important that item is). The priority items include:
		LeastRequestedPriority: weight is computed from CPU and memory utilization; the lower the utilization, the higher the weight. In other words, this item favors nodes with a lower resource-usage ratio.
		BalancedResourceAllocation: the closer a node's CPU and memory utilization are to each other, the higher the weight. This should be used together with the item above, not on its own.
		ImageLocalityPriority: favors nodes that already have the images the pod needs; the larger the total image size, the higher the weight. The algorithm evaluates all priority items and their weights to produce the final ranking.

3. Custom Schedulers

		Besides the scheduler that ships with Kubernetes, you can also write your own. The spec.schedulerName field selects which scheduler handles a pod by name; for example, a pod can ask to be scheduled by my-scheduler instead of the default default-scheduler:
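
		A minimal sketch of such a pod, assuming a custom scheduler named my-scheduler is actually deployed in the cluster:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: nginx-custom
spec:
  schedulerName: my-scheduler   # handled by the custom scheduler, not default-scheduler
  containers:
  - name: nginx
    image: nginx
```

		If no scheduler with that name is running, the pod simply stays Pending.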

II. Node Affinity

1. Node affinity

		1)pod.spec.nodeAffinity
			preferredDuringSchedulingIgnoredDuringExecution: soft policy
			requiredDuringSchedulingIgnoredDuringExecution: hard policy
		2)Operators for key/value matching
			In: the label's value is in the given list
			NotIn: the label's value is not in the given list
			Gt: the label's value is greater than the given value
			Lt: the label's value is less than the given value
			Exists: the label key exists
			DoesNotExist: the label key does not exist
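		As a sketch, several of these operators can be combined in a single matchExpressions block (the label keys gpu-count and disktype here are hypothetical):

```yaml
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
      - matchExpressions:
        - key: gpu-count
          operator: Gt        # label value parsed as an integer, must be > 2
          values:
          - "2"
        - key: disktype
          operator: Exists    # only the key must be present; no values allowed
```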
		3)requiredDuringSchedulingIgnoredDuringExecution: hard policy
			[root@k8s-master1 nodeaffinity]# kubectl get node --show-labels
				NAME             STATUS   ROLES    AGE   VERSION   LABELS
				192.168.100.30   Ready    <none>   20d   v1.15.3   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=192.168.100.30,kubernetes.io/os=linux
				192.168.100.40   Ready    <none>   20d   v1.15.3   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=192.168.100.40,kubernetes.io/os=linux
			# Try to create a pod on 192.168.100.50, a node that does not exist (the pod will stay Pending)
			[root@k8s-master1 nodeaffinity]# vim ./required.yaml 
				apiVersion: v1
				kind: Pod
				metadata:
				  name: nginx-required
				  labels:
				    app: nginx
				spec:
				  containers:
				   - name: nginx
				     image: nginx
				     imagePullPolicy: IfNotPresent
				  affinity:
				     nodeAffinity:
				       requiredDuringSchedulingIgnoredDuringExecution:
				         nodeSelectorTerms:
				         - matchExpressions:
				           - key: kubernetes.io/hostname
				             operator: In
				             values:
				             - 192.168.100.50
			[root@k8s-master1 nodeaffinity]# kubectl create -f required.yaml
			[root@k8s-master1 nodeaffinity]# kubectl get pod -o wide
				NAME             READY   STATUS    RESTARTS   AGE     IP       NODE     NOMINATED NODE   READINESS GATES
				nginx-required   0/1     Pending   0          5m15s   <none>   <none>   <none>           <none>
			# Now schedule the pod onto 192.168.100.30
			[root@k8s-master1 nodeaffinity]# kubectl delete pod --all
			[root@k8s-master1 nodeaffinity]# vim required.yaml 
				apiVersion: v1
				kind: Pod
				metadata:
				  name: nginx-required
				  labels:
				    app: nginx
				spec:
				  containers:
				   - name: nginx
				     image: nginx
				     imagePullPolicy: IfNotPresent
				  affinity:
				     nodeAffinity:
				       requiredDuringSchedulingIgnoredDuringExecution:
				         nodeSelectorTerms:
				         - matchExpressions:
				           - key: kubernetes.io/hostname
				             operator: In
				             values:
				             - 192.168.100.30
			[root@k8s-master1 nodeaffinity]# kubectl create -f required.yaml 
			[root@k8s-master1 nodeaffinity]# kubectl get pod -o wide
				NAME             READY   STATUS    RESTARTS   AGE   IP           NODE             NOMINATED NODE   READINESS GATES
				nginx-required   1/1     Running   0          6s    172.18.0.4   192.168.100.30   <none>           <none>
		4)preferredDuringSchedulingIgnoredDuringExecution: soft policy (weight: the higher the value, the higher the weight; multiple entries are allowed)
			[root@k8s-master1 nodeaffinity]# vim preferred.yaml 
				apiVersion: v1
				kind: Pod
				metadata:
				  name: nginx-required
				  labels:
				    app: nginx
				spec:
				  containers:
				   - name: nginx
				     image: nginx
				     imagePullPolicy: IfNotPresent
				  affinity:
				     nodeAffinity:
				       preferredDuringSchedulingIgnoredDuringExecution:
				        - weight: 1
				          preference:
				            matchExpressions:
				             - key: kubernetes.io/hostname
				               operator: In
				               values:
				               - 192.168.100.50
			[root@k8s-master1 nodeaffinity]# kubectl create -f preferred.yaml
			[root@k8s-master1 nodeaffinity]# kubectl get pod -o wide
				NAME             READY   STATUS    RESTARTS   AGE   IP           NODE             NOMINATED NODE   READINESS GATES
				nginx-required   1/1     Running   0          57s   172.18.0.3   192.168.100.40   <none>           <none>
		5)Nesting hard and soft policies together
			[root@k8s-master1 nodeaffinity]# vim required-perferred.yaml 
				apiVersion: v1
				kind: Pod
				metadata:
				  name: nginx-required
				  labels:
				    app: nginx
				spec:
				  containers:
				   - name: nginx
				     image: nginx
				     imagePullPolicy: IfNotPresent
				  affinity:
				     nodeAffinity:
				       requiredDuringSchedulingIgnoredDuringExecution:
				         nodeSelectorTerms:
				         - matchExpressions:
				           - key: kubernetes.io/hostname
				             operator: In
				             values:
				             - 192.168.100.40
				       preferredDuringSchedulingIgnoredDuringExecution:
				        - weight: 1
				          preference:
				            matchExpressions:
				             - key: kubernetes.io/hostname
				               operator: In
				               values:
				               - 192.168.100.50
			[root@k8s-master1 nodeaffinity]# kubectl create -f required-perferred.yaml
			[root@k8s-master1 nodeaffinity]# kubectl get pod -o wide
				NAME             READY   STATUS    RESTARTS   AGE   IP           NODE             NOMINATED NODE   READINESS GATES
				nginx-required   1/1     Running   0          99s   172.18.0.3   192.168.100.40   <none>           <none>

2. Pod affinity

		1)pod.spec.affinity.podAffinity/podAntiAffinity
			requiredDuringSchedulingIgnoredDuringExecution: hard policy
			preferredDuringSchedulingIgnoredDuringExecution: soft policy
		2)Comparison of affinity and anti-affinity scheduling policies (the topology domain is determined by topologyKey):
			Policy          	Matches		Operators							Topology domain	Scheduling target
			nodeAffinity    	node 		In,NotIn,Exists,DoesNotExist,Gt,Lt 	No				the specified node
			podAffinity     	pod 		In,NotIn,Exists,DoesNotExist 		Yes				same topology domain as the matching pod
			podAntiAffinity 	pod 		In,NotIn,Exists,DoesNotExist 		Yes				not in the same topology domain as the matching pod
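		The examples in this article all use topologyKey: kubernetes.io/hostname, which makes every node its own topology domain. With a zone label as the key, the domain becomes a whole zone instead; a sketch, assuming the nodes carry the beta zone label used by this Kubernetes version (v1.15):

```yaml
affinity:
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
    - labelSelector:
        matchExpressions:
        - key: app
          operator: In
          values:
          - nginx
      # pods matching app=nginx must land in different zones,
      # not merely on different nodes
      topologyKey: failure-domain.beta.kubernetes.io/zone
```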
		3)Pod hard policy (podAffinity)
			[root@k8s-master1 nodeaffinity]# vim podAffinity.yaml 
				apiVersion: v1
				kind: Pod
				metadata:
				  name: nginx-1
				  labels:
				    app: nginx
				spec:
				  containers:
				   - name: nginx
				     image: nginx
				     imagePullPolicy: IfNotPresent
				  affinity:
				     podAffinity:
				       requiredDuringSchedulingIgnoredDuringExecution:
				        - labelSelector:
				            matchExpressions:
				            - key: app
				              operator: In
				              values:
				               - nginx
				          topologyKey: kubernetes.io/hostname
			[root@k8s-master1 nodeaffinity]# kubectl create -f podAffinity.yaml
			[root@k8s-master1 nodeaffinity]# kubectl  get pod -o wide
				NAME             READY   STATUS    RESTARTS   AGE   IP           NODE             NOMINATED NODE   READINESS GATES
				nginx-1          1/1     Running   0          7s    172.18.0.4   192.168.100.40   <none>           <none>
				nginx-required   1/1     Running   0          63m   172.18.0.3   192.168.100.40   <none>           <none>
		4)Pod hard anti-affinity policy (podAntiAffinity)
			[root@k8s-master1 nodeaffinity]# vim required-pod.yaml 
				apiVersion: v1
				kind: Pod
				metadata:
				  name: nginx-3
				  labels:
				    app: nginx
				spec:
				  containers:
				   - name: nginx
				     image: hub.benet.com/xitong/nginx
				     imagePullPolicy: IfNotPresent
				  affinity:
				     podAntiAffinity:
				       requiredDuringSchedulingIgnoredDuringExecution:
				         - labelSelector:
				             matchExpressions:
				              - key: app
				                operator: In
				                values:
				                 - nginx
				           topologyKey: kubernetes.io/hostname
			[root@k8s-master1 nodeaffinity]# kubectl create -f required-pod.yaml
			[root@k8s-master1 nodeaffinity]# kubectl  get pod -o wide 
				NAME                               READY   STATUS    RESTARTS   AGE   IP            NODE             NOMINATED NODE   READINESS GATES
				nginx-3                            0/1     Pending   0          25s   <none>        <none>           <none>           <none>
				nginx-deployment-58f4f857b-7td5x   1/1     Running   0          17m   172.17.85.2   192.168.100.30   <none>           <none>
				nginx-deployment-58f4f857b-jcd2p   1/1     Running   0          18m   172.17.17.4   192.168.100.40   <none>           <none>
			[root@k8s-master1 /]# kubectl delete deployment --all
			[root@k8s-master1 /]# kubectl  get pod
				NAME                               READY   STATUS        RESTARTS   AGE
				nginx-3                            1/1     Running       0          2m9s

III. Taints and Tolerations

1. Taints

		Node affinity is a property of pods (a preference or a hard requirement) that attracts pods to a certain class of nodes. Taints are the opposite: they let a node repel a certain class of pods.

		Taints and tolerations work together to keep pods off unsuitable nodes. One or more taints can be applied to each node, meaning the node will not accept any pod that cannot tolerate those taints. Applying a toleration to a pod means the pod may (but is not required to) be scheduled onto nodes with matching taints.

2. Anatomy of a Taint

		The kubectl taint command sets a taint on a Node. Once a Node is tainted, a repelling relationship exists between it and Pods: the Node can refuse to schedule Pods, and can even evict Pods already running on it. Each taint has the form:
  			key=value:effect
  		Each taint has a key and a value as its label (the value may be empty). The effect describes what the taint does; the following three taint effects are currently supported:
  		NoSchedule: Kubernetes will not schedule Pods onto a Node with this taint
  		PreferNoSchedule: Kubernetes will try to avoid scheduling Pods onto a Node with this taint
  		NoExecute: Kubernetes will not schedule Pods onto the Node, and will also evict Pods already running on it
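
		The same taints appear declaratively in the Node object's spec; a sketch of what kubectl taint writes there (the second key, maintenance, is hypothetical):

```yaml
spec:
  taints:
  - key: updata
    value: chaoyue
    effect: NoSchedule        # new pods without a matching toleration stay away
  - key: maintenance
    value: ""
    effect: PreferNoSchedule  # scheduler avoids this node when it can
```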

3. Setting, Viewing, and Removing Taints

		1)Setting a taint
			[root@k8s-master1 ~]# kubectl  get pod -o wide
				NAME                      READY   STATUS    RESTARTS   AGE   IP            NODE             NOMINATED NODE   READINESS GATES
				tomcat-6d6768b884-7c2pv   1/1     Running   0          34m   172.17.85.2   192.168.100.30   <none>           <none>
			[root@k8s-master1 ~]# kubectl taint nodes 192.168.100.30 updata=chaoyue:NoExecute
			[root@k8s-master1 ~]# kubectl  get pod -o wide 
				NAME                      READY   STATUS    RESTARTS   AGE   IP            NODE             NOMINATED NODE   READINESS GATES
				tomcat-6d6768b884-klkw8   1/1     Running   0          8s    172.17.17.4   192.168.100.40   <none>           <none>
		2)In the node description, look at the Taints field
			[root@k8s-master1 ~]# kubectl  get node 
				NAME             STATUS   ROLES    AGE   VERSION
				192.168.100.30   Ready    <none>   21d   v1.15.3
				192.168.100.40   Ready    <none>   21d   v1.15.3
			[root@k8s-master1 ~]# kubectl describe node 192.168.100.30
				Taints:             updata=chaoyue:NoExecute	
		3)Removing a taint
			[root@k8s-master1 ~]# kubectl taint node 192.168.100.30 updata=chaoyue:NoExecute-
			[root@k8s-master1 ~]# kubectl  describe node 192.168.100.30
				Taints:             <none>

4. Tolerations

		A tainted Node repels Pods according to the taint's effect (NoSchedule, PreferNoSchedule, or NoExecute), so to some degree Pods will not be scheduled onto it. However, we can set a toleration on a Pod: a Pod with a matching toleration tolerates the taint and can still be scheduled onto the tainted Node.

		1)Test
			key, value, effect:	must be consistent with the taint set on the Node
			operator:			a value of Exists makes the toleration ignore the taint's value
			tolerationSeconds:	how long the Pod may keep running on the Node once it is marked for eviction
			
			[root@k8s-master1 ~]# kubectl taint node 192.168.100.30 updata=chaoyue:NoExecute
			[root@k8s-master1 ~]# kubectl taint node 192.168.100.40 updata=chaoyue:NoExecute
			[root@k8s-master1 ~]# kubectl  get pod
				NAME                      READY   STATUS    RESTARTS   AGE
				tomcat-6d6768b884-vh782   0/1     Pending   0          2m7s
			[root@k8s-master1 ~]# vim tomcat.yaml 
				apiVersion: extensions/v1beta1
				kind: Deployment
				metadata:
				  name: tomcat1
				spec:
				  replicas: 1
				  selector:
				    matchLabels:
				      run: tomcat1
				  template:
				    metadata:
				      labels:
				        run: tomcat1
				    spec:
				      containers:
				      - image: hub.benet.com/xitong/tomcat:7.1
				        imagePullPolicy: IfNotPresent
				        name: tomcat1
				      tolerations:
				      - key: "updata"
				        operator: "Equal"
				        value: "chaoyue"
				        effect: "NoExecute"
				        tolerationSeconds: 3600
			[root@k8s-master1 ~]# kubectl apply -f tomcat.yaml 
			[root@k8s-master1 ~]# kubectl  get pod
				NAME                      READY   STATUS    RESTARTS   AGE
				tomcat-6d6768b884-vh782   0/1     Pending   0          3m13s
				tomcat1-97cc4945b-hlk67   1/1     Running   0          5s

5. Toleration Usage Notes

		1)When no key is specified, the toleration matches every taint key:
			tolerations:
			- operator: "Exists"
		2)When no effect is specified, the toleration matches every effect for that key:
			tolerations:
			- key: "key"
			  operator: "Exists"
		3)When there are multiple masters, the following setting prevents their resources from being wasted:
			kubectl taint nodes 192.168.100.30 kubernetes.io/master=:PreferNoSchedule

IV. Scheduling to a Fixed Node

1. pod.spec.nodeName schedules the Pod directly onto the specified Node, skipping the Scheduler's policies entirely; this rule is a forced match

		[root@k8s-master1 ~]# vim tomcat.yaml 
			apiVersion: extensions/v1beta1
			kind: Deployment
			metadata:
			  name: tomcat
			spec:
			  replicas: 2
			  template:
			    metadata:
			      labels:
			        run: tomcat
			    spec:
			      nodeName: 192.168.100.30
			      containers:
			      - image: hub.benet.com/xitong/tomcat:7.1
			        imagePullPolicy: IfNotPresent
			        name: tomcat
		[root@k8s-master1 ~]# kubectl apply -f tomcat.yaml 
		[root@k8s-master1 ~]# kubectl  get pod -o wide
			NAME                      READY   STATUS    RESTARTS   AGE   IP            NODE             NOMINATED NODE   READINESS GATES
			tomcat-7bcfd659b7-22htc   1/1     Running   0          21s   172.17.85.6   192.168.100.30   <none>           <none>
			tomcat-7bcfd659b7-6hp9z   1/1     Running   0          21s   172.17.85.4   192.168.100.30   <none>           <none>
			tomcat-7bcfd659b7-mprcc   1/1     Running   0          21s   172.17.85.2   192.168.100.30   <none>           <none>
			tomcat-7bcfd659b7-vx269   1/1     Running   0          21s   172.17.85.3   192.168.100.30   <none>           <none>
			tomcat-7bcfd659b7-zpcgq   1/1     Running   0          21s   172.17.85.5   192.168.100.30   <none>           <none>

2. pod.spec.nodeSelector: selects nodes through the Kubernetes label-selector mechanism; the scheduler matches the labels and then schedules the Pod onto the target node. This is likewise a hard constraint

		[root@k8s-master1 ~]# vim tomcat.yaml 
			apiVersion: extensions/v1beta1
			kind: Deployment
			metadata:
			  name: tomcat
			spec:
			  replicas: 2
			  template:
			    metadata:
			      labels:
			        run: tomcat
			    spec:
			      nodeSelector:
			        disk: ssd
			      containers:
			      - image: hub.benet.com/xitong/tomcat:7.1
			        imagePullPolicy: IfNotPresent
			        name: tomcat
		[root@k8s-master1 ~]# kubectl apply -f tomcat.yaml 
		[root@k8s-master1 ~]# kubectl  get pod
			NAME                      READY   STATUS    RESTARTS   AGE
			tomcat-79b7ddf6f4-8jj4t   0/1     Pending   0          6s
			tomcat-79b7ddf6f4-hc2jv   0/1     Pending   0          6s
		[root@k8s-master1 ~]# kubectl  label node 192.168.100.40 disk=ssd
		[root@k8s-master1 ~]# kubectl  get pod -o wide 
			NAME                      READY   STATUS    RESTARTS   AGE     IP            NODE             NOMINATED NODE   READINESS GATES
			tomcat-79b7ddf6f4-8jj4t   1/1     Running   0          5m18s   172.17.17.6   192.168.100.40   <none>           <none>
			tomcat-79b7ddf6f4-hc2jv   1/1     Running   0          5m18s   172.17.17.4   192.168.100.40   <none>           <none>

