Kubernetes does not run containers directly. Instead it wraps one or more containers in an abstract resource object called a Pod, which is also the smallest schedulable unit in Kubernetes; in Kubernetes we therefore talk about Pods rather than the individual Docker containers we worked with before. The containers in a single Pod share the network namespace and storage resources, so they can talk to each other directly over the local loopback interface lo, while remaining isolated from one another in the Mount, User, PID and other namespaces. Although a Pod may contain several containers, as the smallest scheduling unit it should stay as small as possible, so a Pod usually holds one main container plus auxiliary (sidecar) containers such as Filebeat or a zabbix_agent client.
Pods exist mainly for tightly coupled applications, for example Nginx+PHP, an application plus an auxiliary container, or Nginx+Filebeat.
For these tightly coupled scenarios: as we know, containers are isolated from each other by namespaces, so for a Pod to support such applications, the containers inside it must be able to share efficiently. How, then, do the containers inside a Pod share the network?
1. How do the containers inside a Pod share the network?
Kubernetes solves this as follows: in every Pod it first starts a small infrastructure container, the infra container, and then lets the other containers join that container's network namespace. As a result, every container in the Pod sees exactly the same network view: the same network devices, IP address, MAC address and so on. This is how network sharing is achieved, and the Pod's IP address is simply the infra container's IP address.
The infra container is created from the pause image and the container itself is also named pause. It is tiny: the docker image is only a few hundred kB, and pause is a very small program written in C.
Testing the network
We start an nginx container and a centos container in the same Pod, then access nginx on port 80 from the centos container over the lo interface.
cat nginx_network_pod.yaml
apiVersion: v1
kind: Pod
metadata:
name: nginx-network-pod
spec:
containers:
- name: nginx-network
image: nginx:latest
imagePullPolicy: IfNotPresent
- name: centos-network
image: centos:latest
imagePullPolicy: IfNotPresent
command: [ "/bin/bash", "-ce", "tail -f /dev/null" ]
#Create the Pod resource object
kubectl apply -f manifests/pod/nginx_network_pod.yaml
#Check which Node the Pod was scheduled onto
k8sops@k8s-master01:~$ kubectl get pods -o wide | grep nginx-network-pod
nginx-network-pod 2/2 Running 0 53s 10.244.3.34 k8s-node01 <none> <none>
The Pod was scheduled onto node01. On that node we can list the containers started by Kubernetes: besides the nginx-network and centos-network containers there is also the infra container.
root@k8s-node01:/# docker ps | grep nginx-network
d3fea735ef5a 470671670cac "/bin/bash -ce 'tail…" 2 minutes ago Up 2 minutes k8s_centos-network_nginx-network-pod_default_c3acfee7-b262-4908-b083-67c5a4e50479_0
71f0554b5b6a 602e111c06b6 "nginx -g 'daemon of…" 2 minutes ago Up 2 minutes k8s_nginx-network_nginx-network-pod_default_c3acfee7-b262-4908-b083-67c5a4e50479_0
fb80158ec9ea registry.aliyuncs.com/google_containers/pause:3.2 "/pause" 2 minutes ago Up 2 minutes k8s_POD_nginx-network-pod_default_c3acfee7-b262-4908-b083-67c5a4e50479_0
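With the Docker runtime you can also confirm that the application containers joined the pause container's network namespace. A minimal check (a sketch; the ID is the nginx container from the docker ps output above):
docker inspect -f '{{.HostConfig.NetworkMode}}' 71f0554b5b6a
#this should print container:<id-of-the-pause-container>, showing that the nginx container reuses the pause container's network namespace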
#The infra container is started from the pause image, which is only 683kB
root@k8s-node01:/# docker images | grep pause
registry.aliyuncs.com/google_containers/pause 3.2 80d28bedfe5d 2 months ago 683kB
From the master node, enter the centos container and access nginx over the lo interface to test:
kubectl exec -it pods/nginx-network-pod -c centos-network -- /bin/bash
#ss shows that nginx's port 80 is listening in this network namespace
[root@nginx-network-pod /]# ss -anplt
State Recv-Q Send-Q Local Address:Port Peer Address:Port
LISTEN 0 128 0.0.0.0:80 0.0.0.0:*
#The containers are still isolated in the PID namespace, so ps aux shows PID 1 as the command we defined in the YAML rather than nginx
[root@nginx-network-pod /]# ps aux
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
root 1 0.0 0.0 23028 1396 ? Ss 05:34 0:00 /usr/bin/coreutils --coreutils-prog-shebang=tail /usr/bin/tail -f /dev/null
root 8 0.1 0.0 12028 3264 pts/0 Ss 05:37 0:00 /bin/bash
root 24 0.0 0.0 43960 3400 pts/0 R+ 05:37 0:00 ps aux
#Access the nginx service from this container over the lo interface
[root@nginx-network-pod /]# curl http://127.0.0.1 -I
HTTP/1.1 200 OK
Server: nginx/1.17.10
Date: Wed, 13 May 2020 05:37:28 GMT
Content-Type: text/html
Content-Length: 612
Last-Modified: Tue, 14 Apr 2020 14:19:26 GMT
Connection: keep-alive
ETag: "5e95c66e-264"
Accept-Ranges: bytes
#Check the OS release to confirm that this is the centos container, not the nginx container
[root@nginx-network-pod /]# cat /etc/redhat-release
CentOS Linux release 8.1.1911 (Core)
#Confirm the IP address and check whether it is shared with the Pod's pause container
[root@nginx-network-pod /]# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
3: eth0@if27: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UP group default
link/ether e2:3f:d5:97:57:7d brd ff:ff:ff:ff:ff:ff link-netnsid 0
inet 10.244.3.34/24 scope global eth0
valid_lft forever preferred_lft forever
#Exit the container and access the Nginx service via the Pod IP
[root@nginx-network-pod /]# exit
exit
k8sops@k8s-master01:~$ curl http://10.244.3.34 -I
HTTP/1.1 200 OK
Server: nginx/1.17.10
Date: Wed, 13 May 2020 05:40:47 GMT
Content-Type: text/html
Content-Length: 612
Last-Modified: Tue, 14 Apr 2020 14:19:26 GMT
Connection: keep-alive
ETag: "5e95c66e-264"
Accept-Ranges: bytes
2. How do containers in a Pod share storage?
Suppose a Pod has two containers, nginx and centos, and the centos container needs to read nginx's log files. Kubernetes mounts the nginx log directory out via a volume, backed by a directory on the host; the centos container then mounts that same volume into its own filesystem, so the two containers share the same files.
Implementing a multi-container Pod with a single YAML manifest
Let's write a YAML manifest and check that the networking and storage behave as described above.
The manifest below starts two containers, nginx and centos. The nginx container loops from 1 to 100, writing one number per second to /data/hello; when it reaches 100 the command exits, the Pod's default restart policy restarts the nginx container, and the loop starts again from 1. The nginx container mounts a volume named data at /data.
The centos container mounts the same data volume at /data and runs a command that tails /data/hello.
apiVersion: v1
kind: Pod
metadata:
name: nginx-volume-pod
spec:
containers:
- name: nginx-volume
image: nginx:latest
imagePullPolicy: IfNotPresent
command: [ "/bin/bash", "-ce", "for i in {1..100};do echo $i >> /data/hello;sleep 1;done" ]
volumeMounts:
- name: data
mountPath: /data
- name: centos-volume
image: centos:latest
imagePullPolicy: IfNotPresent
command: [ "/bin/bash", "-ce", "tail -f /data/hello" ]
volumeMounts:
- name: data
mountPath: /data
volumes:
- name: data
emptyDir: {}
#name: data — the name of the shared volume
#emptyDir: {} — an empty directory on the host; with this setting the kubelet creates the backing directory under /var/lib/kubelet/pods on the Node the Pod runs on
#Create the Pod resource
kubectl apply -f nginx-volume-pod.yaml
#Check which Node the Pod was scheduled onto
kubectl get pods -n default -o wide | grep nginx-volume-pod
nginx-volume-pod 2/2 Running 0 98s 10.244.3.36 k8s-node01 <none> <none>
The centos-volume container's logs show the output of the tail command we specified.
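To view those logs (using the Pod and container names from this example):
kubectl logs pods/nginx-volume-pod -c centos-volume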
#Enter the nginx-volume container and check that /data/ is mounted
k8sops@k8s-master01:~$ kubectl exec -it pods/nginx-volume-pod -c nginx-volume -- /bin/bash
root@nginx-volume-pod:/# ls -lrth /data/hello
-rw-r--r-- 1 root root 761 May 13 05:52 /data/hello
#Enter the centos-volume container and check that /data/ is mounted
k8sops@k8s-master01:~$ kubectl exec -it pods/nginx-volume-pod -c centos-volume -- /bin/bash
[root@nginx-volume-pod /]# ls /data/hello -lrth
-rw-r--r-- 1 root root 818 May 13 05:53 /data/hello
On the host we can also find the directory backing the volume and the file written into it.
On the Node where the Pod runs, go to /var/lib/kubelet/pods/; docker ps shows the Pod's UID as part of the container names, and the directory named after that UID contains the mounted volume, as sketched below.
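A minimal sketch of locating the emptyDir on the Node (the kubernetes.io~empty-dir layout is the standard kubelet directory layout; adjust if your kubelet uses a non-default root directory):
#on the master, get the Pod UID
kubectl get pods/nginx-volume-pod -o jsonpath='{.metadata.uid}'
#on the Node the Pod runs on, the data volume lives under the kubelet directory
ls /var/lib/kubelet/pods/<pod-uid>/volumes/kubernetes.io~empty-dir/data/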
Official documentation: https://kubernetes.io/docs/concepts/containers/images/
The image pull policy of a Pod is set with the imagePullPolicy field, which has three possible values: Always (always pull the image from the registry), IfNotPresent (pull only if the image is not already present on the node) and Never (never pull; only use a local image).
For public images the example below is enough, but pulling a private image requires authenticating against the registry; pulling from a private repository is covered at the end of this article.
apiVersion: v1
kind: Pod
metadata:
name: nginx
spec:
containers:
- name: nginx
image: nginx:latest
imagePullPolicy: IfNotPresent
#imagePullPolicy: Always
#imagePullPolicy: Never
To pull a private image you must authenticate against the registry (docker login). A Kubernetes cluster has many Nodes, so logging in on every Node by hand is clearly inconvenient. To solve this, Kubernetes can pull private images automatically: the registry credentials are stored in the cluster as a secret and handed to the kubelet.
1. Create the secret
Use the kubectl create command on the cluster's master node to create the secret.
kubectl create secret docker-registry aliyun-registry --docker-username=useranme --docker-password=password --docker-server=registry.cn-shanghai.aliyuncs.com
docker-registry: the secret type; aliyun-registry is the name of the secret being created
--docker-username: the registry account
--docker-password: the registry password
--docker-server: the registry address
--docker-email: an e-mail address (optional)
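You can verify the secret with standard kubectl commands, for example:
kubectl get secret aliyun-registry
kubectl get secret aliyun-registry -o yaml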
2. Reference the secret in the manifest
The field to use is imagePullSecrets, an optional list of references to secrets in the same namespace that can be used to pull any of the images used by this PodSpec. The name parameter under imagePullSecrets specifies the name of the secret to reference.
cat aliyun-registry.yaml
apiVersion: v1
kind: Pod
metadata:
name: busybox
spec:
imagePullSecrets:
- name: aliyun-registry
containers:
- name: busybox
image: <your private registry image address>
imagePullPolicy: IfNotPresent
command: [ "/bin/sh", "-c", "tail -f /etc/passwd" ]
3. Apply the manifest and check the Pod
kubectl apply -f aliyun-registry.yaml
kubectl get pods -o wide | grep busybox
busybox 1/1 Running 0 7m59s 10.244.3.44 k8s-node01 <none> <none>
Official documentation: https://kubernetes.io/docs/concepts/configuration/manage-compute-resources-container/
1. There are two kinds of Pod resource quotas: requests and limits.
A request is the amount of a resource reserved for the container (the scheduler uses it when placing the Pod); a limit is the maximum amount of that resource the container may use.
Memory can be written as M or Mi: 1M = 1000^2 bytes, 1Mi = 1024^2 bytes.
CPU can be written with the m suffix or as a plain number: 1000m = 1 CPU core, 500m = 0.5 CPU, 250m = 0.25 CPU.
Following the official example, we create a Pod with two containers, mysql and wordpress; refer to the explanation above for the request/limit parameters.
apiVersion: v1
kind: Pod
metadata:
name: frontend
spec:
containers:
- name: db
image: mysql
env:
- name: MYSQL_ROOT_PASSWORD
value: "password"
resources:
requests:
memory: "64Mi"
cpu: "250m"
limits:
memory: "128Mi"
cpu: "500m"
- name: wp
image: wordpress
resources:
requests:
memory: "64M"
cpu: "0.25"
limits:
memory: "128M"
cpu: "0.5"
kubectl apply -f limit_pod.yaml
kubectl get pods -o wide | grep frontend
frontend 2/2 Running 2 2m45s 10.244.3.45 k8s-node01 <none> <none>
kubectl describe pods/frontend
Official documentation: https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle/#restart-policy
Status      Description
Pending     The API Server has created the Pod, but one or more of its container images have not been created yet, which includes time spent downloading images.
Running     All containers in the Pod have been created, and at least one container is running, starting, or restarting.
Succeeded   All containers in the Pod exited successfully and will not be restarted.
Failed      All containers in the Pod have exited, and at least one of them exited with a failure.
Unknown     The Pod's status cannot be obtained for some reason, typically a network communication problem.
The status appears in the STATUS column:
k8sops@k8s-master01:~$ kubectl get pods -o wide -n nginx-ns
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
pod-demo-nginx 1/1 Running 1 14d 10.244.2.15 k8s-node02 <none> <none>
pod-demo-nginx02 2/2 Running 15 14d 10.244.2.18 k8s-node02 <none> <none>
The Pod restart policy is set with the restartPolicy field, which has three possible values: Always (always restart the container after it exits), OnFailure (restart only when the container exits with a non-zero status) and Never (never restart the container).
1. Write the manifest
cat restart_pod.yaml
apiVersion: v1
kind: Pod
metadata:
name: restart-pod
spec:
containers:
- name: restart-containers
image: nginx:latest
restartPolicy: Always
2. Create the Pod resource
kubectl apply -f restart_pod.yaml
3. Find the Node the Pod is running on
kubectl get pods -o wide | grep restart-pod
restart-pod 1/1 Running 1 5m58s 10.244.5.34 k8s-node03 <none> <none>
4. Go to that Node and kill the container there. We must not use kubectl delete pods/restart-pod on the master for this, because that would delete the Pod itself instead of exercising the restart policy.
#Find the corresponding container
root@k8s-node03:~# docker ps | grep restart
32c70e9b113e nginx "nginx -g 'daemon of…" 10 minutes ago Up 10 minutes k8s_restart-containers_restart-pod_default_c6ba7906-d5a7-47f8-b523-2a4ecddbc552_1
ea7e9d98da19 registry.aliyuncs.com/google_containers/pause:3.2 "/pause" 13 minutes ago Up 13 minutes k8s_POD_restart-pod_default_c6ba7906-d5a7-47f8-b523-2a4ecddbc552_0
#Stop the container by its ID
root@k8s-node03:~# docker stop 32c70e9b113e
32c70e9b113e
5. Back on the master you can observe the following sequence
#The Pod shows as Completed
k8sops@k8s-master01:~/manifests/pod$ kubectl get pods -o wide | grep restart-pod
restart-pod 0/1 Completed 2 14m 10.244.5.34 k8s-node03 <none> <none>
#The Pod is waiting (CrashLoopBackOff)
k8sops@k8s-master01:~/manifests/pod$ kubectl get pods -o wide | grep restart-pod
restart-pod 0/1 CrashLoopBackOff 2 15m 10.244.5.34 k8s-node03 <none> <none>
#The Pod is running again
k8sops@k8s-master01:~/manifests/pod$ kubectl get pods -o wide | grep restart-pod
restart-pod 1/1 Running 3 15m 10.244.5.34 k8s-node03 <none> <none>
6. Use kubectl describe to see more detailed event information.
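For example:
kubectl describe pods/restart-pod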
1. Write the resource manifest
The manifest below runs a Pod with a centos container that runs a script writing the numbers 1-300 to the /hello file, one number per second. When the script finishes (after 300 seconds, i.e. 5 minutes) the container exits. The restart policy is OnFailure; since this is a normal exit, the container will not be started again.
cat restart_pod.yaml
apiVersion: v1
kind: Pod
metadata:
name: restart-pod
spec:
containers:
- name: restart-containers
image: centos:centos7.6.1810
command: [ "/bin/bash", "-ce", "for i in {1..300};do echo $i >> /hello;sleep 1;done" ]
restartPolicy: OnFailure
2. Create the resource object
kubectl apply -f restart_pod.yaml
3. Check the Pod status
kubectl get pods -o wide | grep restart-pod
restart-pod 1/1 Running 0 11s 10.244.2.28 k8s-node02 <none> <none>
4. Enter the container
kubectl exec -it pods/restart-pod -- /bin/bash
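Once inside, you can check the file the script is writing to, for example:
tail /hello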
5. The container stops
After 5 minutes the script finishes, the container exits, and the Pod status changes to Completed.
Because the restart policy is OnFailure and the container exited normally, the container is not started again.
6. Testing an abnormal exit
Create the Pod again, then stop the container manually on the Node it was scheduled to.
kubectl apply -f restart_pod.yaml
#The restart-pod container was scheduled onto node2
kubectl get pods -o wide | grep restart-pod
restart-pod 1/1 Running 0 26s 10.244.2.29 k8s-node02 <none> <none>
#On node2, stop the container manually
root@k8s-node02:~# docker ps | grep restart
5943923ce8ab f1cb7c7d58b7 "/bin/bash -ce 'for …" 56 seconds ago Up 55 seconds k8s_restart-containers_restart-pod_default_ae58b877-36b7-49da-b984-1d7f2a9e42da_0
1fc5c7dcc18d registry.aliyuncs.com/google_containers/pause:3.2 "/pause" 58 seconds ago Up 56 seconds k8s_POD_restart-pod_default_ae58b877-36b7-49da-b984-1d7f2a9e42da_0
root@k8s-node02:~# docker stop 5943923ce8ab
5943923ce8ab
#Back on the master, watch the restart-pod status: the first check shows the Pod in the Error state, and the next check shows it running again, which is the Pod being pulled back up by the OnFailure policy after an abnormal exit
k8sops@k8s-master01:~/manifests/pod$ kubectl get pods -o wide | grep restart-pod
restart-pod 0/1 Error 0 81s 10.244.2.29 k8s-node02 <none> <none>
k8sops@k8s-master01:~/manifests/pod$ kubectl get pods -o wide | grep restart-pod
restart-pod 1/1 Running 1 84s 10.244.2.29 k8s-node02 <none> <none>
By default the kubelet uses the container's running state as the health signal; it cannot see the state of the application inside the container, for example a hung process. That can leave a Pod unable to serve requests while still counting as running, losing traffic. Health checks exist to make sure containers are genuinely healthy and alive.
A Pod checks container health with two kinds of probes: LivenessProbe (liveness probing) and ReadinessProbe (readiness probing).
A liveness probe checks the application in the container via HTTP, a shell command or TCP and reports the result to the kubelet. If the application is reported unhealthy, the kubelet restarts the Pod according to the restartPolicy defined in the Pod manifest.
A readiness probe also uses HTTP, a shell command or TCP to check whether the application is healthy and able to serve requests; if it is, the container is considered Ready, and only Pods that are Ready receive traffic.
For Pods managed by a Service, the Service-to-Pod association is based on whether the Pod is Ready. After a Pod starts, the application usually needs some time to initialize, for example loading configuration or data, or running some kind of warm-up. If client requests arrive before this finishes, responses will be slow and the user experience suffers. So instead of sending traffic to a Pod as soon as it starts, we wait until initialization is complete and the Pod turns Ready before letting it receive client requests.
If a container or Pod is not Ready, Kubernetes removes the Pod from the Service's backend Endpoints.
Both probe types introduced above, livenessProbe (liveness) and readinessProbe (readiness), support the following ways of checking container health: exec (run a command inside the container), httpGet (an HTTP GET request) and tcpSocket (a TCP connection attempt).
Each check can end in one of three results: Success, Failure, or Unknown (the check itself could not be carried out).
The exec check determines container health by running a user-defined command inside the target container: if the command's exit code is 0, the container is considered healthy. The spec.containers.livenessProbe.exec field defines this type of check and has a single attribute, command, which specifies the command to run. Below is an example manifest using a liveness exec probe:
1. Create the resource manifest
Create a Pod → run an Nginx container → start nginx → sleep for 60 seconds → delete nginx.pid
The livenessProbe exec command checks whether nginx.pid exists; if the probe returns a non-zero result, the container is restarted according to the restart policy.
The expectation is that about 60 seconds after the container becomes Ready, nginx.pid is deleted, the exec probe starts failing, and the restart policy kicks in.
cat ngx-health.yaml
apiVersion: v1
kind: Pod
metadata:
name: ngx-health
spec:
containers:
- name: ngx-liveness
image: nginx:latest
command:
- /bin/sh
- -c
- /usr/sbin/nginx; sleep 60; rm -rf /run/nginx.pid
livenessProbe:
exec:
        command: [ "/bin/sh", "-c", "test -e /run/nginx.pid" ]
restartPolicy: Always
2. Create the Pod resource
kubectl apply -f ngx-health.yaml
Wait for the Pod to become Ready.
3. Check the Pod's detailed information
#First check: the container in the Pod started successfully and the events look normal
kubectl describe pods/ngx-health | grep -A 10 Events
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled <unknown> default-scheduler Successfully assigned default/ngx-health to k8s-node03
Normal Pulling 12s kubelet, k8s-node03 Pulling image "nginx:latest"
Normal Pulled 6s kubelet, k8s-node03 Successfully pulled image "nginx:latest"
Normal Created 6s kubelet, k8s-node03 Created container ngx-liveness
Normal Started 5s kubelet, k8s-node03 Started container ngx-liveness
#Second check: the container's livenessProbe has failed
kubectl describe pods/ngx-health | grep -A 10 Events
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled <unknown> default-scheduler Successfully assigned default/ngx-health to k8s-node03
Normal Pulling 52s kubelet, k8s-node03 Pulling image "nginx:latest"
Normal Pulled 46s kubelet, k8s-node03 Successfully pulled image "nginx:latest"
Normal Created 46s kubelet, k8s-node03 Created container ngx-liveness
Normal Started 45s kubelet, k8s-node03 Started container ngx-liveness
Warning Unhealthy 20s (x3 over 40s) kubelet, k8s-node03 Liveness probe failed:
Normal Killing 20s kubelet, k8s-node03 Container ngx-liveness failed liveness probe, will be restarted
#Third check: the image has been pulled again, and the container is re-created and restarted
kubectl describe pods/ngx-health | grep -A 10 Events
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled <unknown> default-scheduler Successfully assigned default/ngx-health to k8s-node03
Warning Unhealthy 35s (x3 over 55s) kubelet, k8s-node03 Liveness probe failed:
Normal Killing 35s kubelet, k8s-node03 Container ngx-liveness failed liveness probe, will be restarted
Normal Pulling 4s (x2 over 67s) kubelet, k8s-node03 Pulling image "nginx:latest"
Normal Pulled 2s (x2 over 61s) kubelet, k8s-node03 Successfully pulled image "nginx:latest"
Normal Created 2s (x2 over 61s) kubelet, k8s-node03 Created container ngx-liveness
Normal Started 2s (x2 over 60s) kubelet, k8s-node03 Started container ngx-liveness
The wide output below shows the same thing: on the first check the Pod has been running for 22s with 0 restarts;
on the second check it has been running for 76s and has already been restarted once.
kubectl get pods -o wide | grep ngx-health
ngx-health 1/1 Running 0 22s 10.244.5.44 k8s-node03 <none> <none>
kubectl get pods -o wide | grep ngx-health
ngx-health 1/1 Running 1 76s 10.244.5.44 k8s-node03 <none> <none>
The second probe failure and the second restart:
kubectl describe pods/ngx-health | grep -A 10 Events
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled <unknown> default-scheduler Successfully assigned default/ngx-health to k8s-node03
Normal Pulled 58s (x2 over 117s) kubelet, k8s-node03 Successfully pulled image "nginx:latest"
Normal Created 58s (x2 over 117s) kubelet, k8s-node03 Created container ngx-liveness
Normal Started 58s (x2 over 116s) kubelet, k8s-node03 Started container ngx-liveness
Warning Unhealthy 31s (x6 over 111s) kubelet, k8s-node03 Liveness probe failed:
Normal Killing 31s (x2 over 91s) kubelet, k8s-node03 Container ngx-liveness failed liveness probe, will be restarted
Normal Pulling 0s (x3 over 2m3s) kubelet, k8s-node03 Pulling image "nginx:latest"
kubectl get pods -o wide | grep ngx-health
ngx-health 1/1 Running 2 2m13s 10.244.5.44 k8s-node03 <none> <none>
The httpGet check calls an HTTP GET against the container's IP address, a port and a path; if the response status code is >= 200 and < 400, the container is considered healthy. The spec.containers.livenessProbe.httpGet field defines this type of check, and its configurable fields include host, port, path, scheme and httpHeaders.
1. Create the resource manifest
Create a Pod → run an Nginx container → start nginx → sleep for 60 seconds → delete nginx.pid
The livenessProbe uses httpGet to request index.html in the nginx document root on port 80; the address defaults to the Pod IP and the scheme is HTTP. If the request fails, the container is restarted according to the restart policy.
cat ngx-health.yaml
apiVersion: v1
kind: Pod
metadata:
name: ngx-health
spec:
containers:
- name: ngx-liveness
image: nginx:latest
command:
- /bin/sh
- -c
- /usr/sbin/nginx; sleep 60; rm -rf /run/nginx.pid
livenessProbe:
httpGet:
path: /index.html
port: 80
scheme: HTTP
restartPolicy: Always
2. Create the Pod resource object
kubectl apply -f ngx-health.yaml
3. Check the Pod's state
#Container being created
kubectl get pods -o wide | grep ngx-health
ngx-health 0/1 ContainerCreating 0 7s <none> k8s-node02 <none> <none>
#Container running successfully
kubectl get pods -o wide | grep ngx-health
ngx-health 1/1 Running 0 19s 10.244.2.36 k8s-node02 <none> <none>
4. Check the Pod's detailed event information
The container image was pulled and the container started successfully:
kubectl describe pods/ngx-health | grep -A 10 Events
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled <unknown> default-scheduler Successfully assigned default/ngx-health to k8s-node02
Normal Pulling 30s kubelet, k8s-node02 Pulling image "nginx:latest"
Normal Pulled 15s kubelet, k8s-node02 Successfully pulled image "nginx:latest"
Normal Created 15s kubelet, k8s-node02 Created container ngx-liveness
Normal Started 14s kubelet, k8s-node02 Started container ngx-liveness
About 60 seconds after the container became Ready, the livenessProbe check fails; below you can see the image being pulled again:
kubectl describe pods/ngx-health | grep -A 15 Events
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled <unknown> default-scheduler Successfully assigned default/ngx-health to k8s-node02
Normal Pulled 63s kubelet, k8s-node02 Successfully pulled image "nginx:latest"
Normal Created 63s kubelet, k8s-node02 Created container ngx-liveness
Normal Started 62s kubelet, k8s-node02 Started container ngx-liveness
Normal Pulling 1s (x2 over 78s) kubelet, k8s-node02 Pulling image "nginx:latest"
After the image is pulled, the container is created and started again; note that the Age column has been reset:
kubectl describe pods/ngx-health | grep -A 15 Events
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled <unknown> default-scheduler Successfully assigned default/ngx-health to k8s-node02
Normal Pulling 18s (x2 over 95s) kubelet, k8s-node02 Pulling image "nginx:latest"
Normal Pulled 2s (x2 over 80s) kubelet, k8s-node02 Successfully pulled image "nginx:latest"
Normal Created 2s (x2 over 80s) kubelet, k8s-node02 Created container ngx-liveness
Normal Started 1s (x2 over 79s) kubelet, k8s-node02 Started container ngx-liveness
The wide output shows that the Pod has been restarted once:
kubectl get pods -o wide | grep ngx-health
ngx-health 0/1 Completed 0 96s 10.244.2.36 k8s-node02 <none> <none>
k8sops@k8s-master01:~/manifests/pod$ kubectl get pods -o wide | grep ngx-health
ngx-health 1/1 Running 1 104s 10.244.2.36 k8s-node02 <none> <none>
The container logs show the probe requests; by default the probe runs every 10 seconds:
kubectl logs -f pods/ngx-health
10.244.2.1 - - [15/May/2020:03:01:13 +0000] "GET /index.html HTTP/1.1" 200 612 "-" "kube-probe/1.18" "-"
10.244.2.1 - - [15/May/2020:03:01:23 +0000] "GET /index.html HTTP/1.1" 200 612 "-" "kube-probe/1.18" "-"
10.244.2.1 - - [15/May/2020:03:01:33 +0000] "GET /index.html HTTP/1.1" 200 612 "-" "kube-probe/1.18" "-"
10.244.2.1 - - [15/May/2020:03:01:43 +0000] "GET /index.html HTTP/1.1" 200 612 "-" "kube-probe/1.18" "-"
10.244.2.1 - - [15/May/2020:03:01:53 +0000] "GET /index.html HTTP/1.1" 200 612 "-" "kube-probe/1.18" "-"
10.244.2.1 - - [15/May/2020:03:02:03 +0000] "GET /index.html HTTP/1.1" 200 612 "-" "kube-probe/1.18" "-"
The tcpSocket check opens a TCP connection to the container's IP address and port; if the connection can be established, the container is considered healthy. Compared with the HTTP check it is more efficient and cheaper, but less precise, since a successful connection does not guarantee that the page is actually available. The spec.containers.livenessProbe.tcpSocket field defines this type of check and has two configurable attributes, host and port.
1. Create the resource manifest
apiVersion: v1
kind: Pod
metadata:
name: ngx-health
spec:
containers:
- name: ngx-liveness
image: nginx:latest
command:
- /bin/sh
- -c
- /usr/sbin/nginx; sleep 60; rm -rf /run/nginx.pid
livenessProbe:
tcpSocket:
port: 80
restartPolicy: Always
2. Create the resource object
kubectl apply -f ngx-health.yaml
3. Check the Pod's events
#The container was created and started successfully
kubectl describe pods/ngx-health | grep -A 15 Events
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled <unknown> default-scheduler Successfully assigned default/ngx-health to k8s-node02
Normal Pulling 19s kubelet, k8s-node02 Pulling image "nginx:latest"
Normal Pulled 9s kubelet, k8s-node02 Successfully pulled image "nginx:latest"
Normal Created 8s kubelet, k8s-node02 Created container ngx-liveness
Normal Started 8s kubelet, k8s-node02 Started container ngx-liveness
#About 60 seconds after the container became Ready, the Pod starts pulling the image again
kubectl describe pods/ngx-health | grep -A 15 Events
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled <unknown> default-scheduler Successfully assigned default/ngx-health to k8s-node02
Normal Pulled 72s kubelet, k8s-node02 Successfully pulled image "nginx:latest"
Normal Created 71s kubelet, k8s-node02 Created container ngx-liveness
Normal Started 71s kubelet, k8s-node02 Started container ngx-liveness
Normal Pulling 10s (x2 over 82s) kubelet, k8s-node02 Pulling image "nginx:latest"
#The wide output also shows that the Pod has entered the Completed state and is about to be restarted
kubectl get pods -o wide | grep ngx-health
ngx-health 0/1 Completed 0 90s 10.244.2.37 k8s-node02 <none> <none>
The sections above covered the two probe types used at different stages and the check methods they support. Both probe types also accept several auxiliary parameters, which appear in the manifest below: initialDelaySeconds (how long to wait before the first probe), periodSeconds (how often to probe), timeoutSeconds (how long to wait for a probe response), successThreshold (how many consecutive successes count as healthy) and failureThreshold (how many consecutive failures count as unhealthy).
The following example uses both a readinessProbe and a livenessProbe.
Readiness probe configuration: the first readiness probe runs after initialDelaySeconds and performs an HTTP GET for index.html in the container's document root; if it succeeds, the Pod is marked Ready. Probing then repeats at the interval set by periodSeconds, which is 10 seconds below, so the readiness probe runs every 10 seconds. If a Pod that had been taken out of service later passes the probe the number of times set by successThreshold, it is added back to the Service's backends.
Liveness probe configuration: the first liveness probe runs after initialDelaySeconds and uses tcpSocket to probe the container's port 80; the probe succeeds if the TCP connection can be established.
1. Resource manifest
cat nginx-health.yaml
#create namespace
apiVersion: v1
kind: Namespace
metadata:
name: nginx-health-ns
labels:
resource: nginx-ns
spec:
---
#create deploy and pod
apiVersion: apps/v1
kind: Deployment
metadata:
name: nginx-health-deploy
namespace: nginx-health-ns
labels:
resource: nginx-deploy
spec:
replicas: 3
revisionHistoryLimit: 10
selector:
matchLabels:
app: nginx-health
template:
metadata:
namespace: nginx-health-ns
labels:
app: nginx-health
spec:
restartPolicy: Always
containers:
- name: nginx-health-containers
image: nginx:1.17.1
imagePullPolicy: IfNotPresent
command:
- /bin/sh
- -c
- /usr/sbin/nginx; sleep 60; rm -rf /run/nginx.pid
readinessProbe:
initialDelaySeconds: 5
periodSeconds: 10
successThreshold: 1
timeoutSeconds: 3
failureThreshold: 1
httpGet:
path: /index.html
port: 80
scheme: HTTP
livenessProbe:
initialDelaySeconds: 15
periodSeconds: 3
successThreshold: 1
timeoutSeconds: 1
failureThreshold: 2
tcpSocket:
port: 80
resources:
requests:
memory: "64Mi"
cpu: "250m"
limits:
memory: "128Mi"
cpu: "500m"
---
#create service
apiVersion: v1
kind: Service
metadata:
name: nginx-health-svc
namespace: nginx-health-ns
labels:
resource: nginx-svc
spec:
clusterIP: 10.106.189.88
ports:
- port: 80
protocol: TCP
targetPort: 80
selector:
app: nginx-health
sessionAffinity: ClientIP
type: ClusterIP
2. Create the resource objects
kubectl apply -f nginx-health.yaml
namespace/nginx-health-ns created
deployment.apps/nginx-health-deploy created
service/nginx-health-svc created
3. Check the created resources
k8sops@k8s-master01:/$ kubectl get all -n nginx-health-ns -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
pod/nginx-health-deploy-6bcc8f7f74-6wc6t 1/1 Running 0 24s 10.244.3.50 k8s-node01 <none> <none>
pod/nginx-health-deploy-6bcc8f7f74-cns27 1/1 Running 0 24s 10.244.5.52 k8s-node03 <none> <none>
pod/nginx-health-deploy-6bcc8f7f74-rsxjj 1/1 Running 0 24s 10.244.2.42 k8s-node02 <none> <none>
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE SELECTOR
service/nginx-health-svc ClusterIP 10.106.189.88 <none> 80/TCP 25s app=nginx-health
NAME READY UP-TO-DATE AVAILABLE AGE CONTAINERS IMAGES SELECTOR
deployment.apps/nginx-health-deploy 3/3 3 3 25s nginx-health-containers nginx:1.17.1 app=nginx-health
NAME DESIRED CURRENT READY AGE CONTAINERS IMAGES SELECTOR
replicaset.apps/nginx-health-deploy-6bcc8f7f74 3 3 3 25s nginx-health-containers nginx:1.17.1 app=nginx-health,pod-template-hash=6bcc8f7f74
4. Check the Pod status: at this point none of the Pods is Ready, they show Completed and are about to be restarted
k8sops@k8s-master01:/$ kubectl get pods -n nginx-health-ns -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
nginx-health-deploy-6bcc8f7f74-6wc6t 0/1 Completed 0 64s 10.244.3.50 k8s-node01 <none> <none>
nginx-health-deploy-6bcc8f7f74-cns27 0/1 Completed 0 64s 10.244.5.52 k8s-node03 <none> <none>
nginx-health-deploy-6bcc8f7f74-rsxjj 0/1 Completed 0 64s 10.244.2.42 k8s-node02 <none> <none>
5. One Pod has already completed its restart and is Ready
kubectl get pods -n nginx-health-ns -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
nginx-health-deploy-6bcc8f7f74-6wc6t 1/1 Running 1 73s 10.244.3.50 k8s-node01 <none> <none>
nginx-health-deploy-6bcc8f7f74-cns27 0/1 Running 1 73s 10.244.5.52 k8s-node03 <none> <none>
nginx-health-deploy-6bcc8f7f74-rsxjj 0/1 Running 1 73s 10.244.2.42 k8s-node02 <none> <none>
6. All three Pods have completed their restart and are Ready
kubectl get pods -n nginx-health-ns -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
nginx-health-deploy-6bcc8f7f74-6wc6t 1/1 Running 1 85s 10.244.3.50 k8s-node01 <none> <none>
nginx-health-deploy-6bcc8f7f74-cns27 1/1 Running 1 85s 10.244.5.52 k8s-node03 <none> <none>
nginx-health-deploy-6bcc8f7f74-rsxjj 1/1 Running 1 85s 10.244.2.42 k8s-node02 <none> <none>
7. While the Pods restart, you can see the Service dynamically associating and de-associating the corresponding Pods.
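One way to watch this (a sketch; it uses the Service created above) is to watch the Service's Endpoints while the Pods restart:
kubectl get endpoints nginx-health-svc -n nginx-health-ns -w
#Pods that are not Ready disappear from the endpoints list and reappear once their readiness probe passes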
The scheduler uses the Kubernetes watch mechanism to discover Pods that have been created but not yet assigned to a Node. It then places each unscheduled Pod onto a suitable Node according to its scheduling principles.
This chapter covers the following scheduling mechanisms: the default kube-scheduler, nodeName, nodeSelector, taints and tolerations, and affinity.
Workflow of creating a Pod: the create pod request reaches the API server and the object is persisted to etcd (write etcd); the scheduler then picks a Node for the Pod and binds it (bind pod), and the binding is again written to etcd (write etcd); finally the kubelet on that Node starts the Pod's containers.
Official documentation: https://kubernetes.io/zh/docs/concepts/scheduling-eviction/kube-scheduler/
kube-scheduler is the default scheduler of a Kubernetes cluster and is part of the control plane (master).
For every newly created or not-yet-scheduled Pod, kube-scheduler selects an optimal Node to run it on. However, every container in a Pod has its own resource requirements, and so does the Pod as a whole, so before a Pod can be scheduled the cluster's Nodes have to be filtered according to those requirements.
In a cluster, the Nodes that satisfy a Pod's scheduling request are called feasible nodes. If no Node satisfies the Pod's resource request, the Pod stays unscheduled until the scheduler can find a suitable Node.
The scheduler first finds all feasible nodes for a Pod, scores them with a series of functions, and picks the highest-scoring Node to run the Pod. It then notifies kube-apiserver of the decision, a step called binding.
Factors considered when making the decision include individual and overall resource requests, hardware/software/policy constraints, affinity and anti-affinity requirements, data locality, interference between workloads, and so on.
kube-scheduler selects a Node for a Pod in two steps:
**Filtering:** the filtering step selects all Nodes that can satisfy the Pod's scheduling requirements. For example, the PodFitsResources filter checks whether a candidate Node has enough available resources for the Pod's resource request. The result is a list of feasible Nodes, usually more than one; if the list is empty, the Pod is unschedulable.
**Scoring:** after filtering, the scheduler picks the most suitable Node from the feasible list. It scores every feasible Node according to the scoring rules currently enabled, and kube-scheduler places the Pod on the Node with the highest score. If several Nodes tie for the highest score, one of them is chosen at random.
Official documentation: https://kubernetes.io/docs/reference/scheduling/policies/
The document above lists the filtering-stage policies (predicates) and the scoring-stage policies (priorities). For example, the scoring stage includes a ResourceAllocationPriority based on requestedToCapacity, and a priority that ranks nodes according to the scheduler.alpha.kubernetes.io/preferAvoidPods annotation.
The default configuration uses the kube-scheduler component. In the example below we start three Pods and see which Nodes they are assigned to.
1. Create the resource manifest
cat scheduler-pod.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
name: scheduler-deploy
spec:
replicas: 3
selector:
matchLabels:
app: scheduler-pod
template:
metadata:
labels:
app: scheduler-pod
spec:
containers:
- image: busybox:latest
name: scheduler-pod
command: [ "/bin/sh", "-c", "tail -f /etc/passwd" ]
2. Create the resource object with kubectl
kubectl apply -f scheduler-pod.yaml
3. Check where kube-scheduler placed the Pods
Two Pods landed on Node03 and one on Node02:
kubectl get pods -o wide | grep scheduler
scheduler-deploy-65d8f9c98-cqdm9 1/1 Running 0 111s 10.244.5.59 k8s-node03 <none> <none>
scheduler-deploy-65d8f9c98-d4t9p 1/1 Running 0 111s 10.244.5.58 k8s-node03 <none> <none>
scheduler-deploy-65d8f9c98-f8xxc 1/1 Running 0 111s 10.244.2.45 k8s-node02 <none> <none>
4. Check resource usage on the Nodes
Node01 has about 2.7G of memory available.
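A quick way to check this from the master (a sketch; kubectl top requires the metrics-server add-on):
kubectl describe node k8s-node01 | grep -A 6 'Allocated resources'
kubectl top nodes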
nodeName schedules the Pod onto the specified Node.
1. Create the resource manifest
cat nodeName-pod.yaml
apiVersion: v1
kind: Pod
metadata:
name: nodename-pod
spec:
nodeName: k8s-node02
containers:
- image: busybox:latest
name: nodename-containers
command: [ "/bin/sh", "-c", "tail -f /etc/passwd" ]
2. Create the Pod resource object
As shown below, nodename-pod was bound to k8s-node02.
kubectl get pods -o wide | grep name
nodename-pod 1/1 Running 0 25s 10.244.2.46 k8s-node02 <none> <none>
nodeSelector schedules Pods onto Nodes whose labels match, so you first label the Nodes and then select those labels in the Pod manifest.
First decide what each node is for and label it accordingly, for example dividing two nodes between different teams:
1. Label the Nodes
node02 goes to the development team, node03 to the big-data team.
#Add the labels
kubectl label nodes k8s-node02 team=development
kubectl label nodes k8s-node03 team=bigdata
#View the labels
kubectl get nodes -o wide --show-labels
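To show the team label as its own column (an optional convenience):
kubectl get nodes -L team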
2. Create the resource manifest
cat nodeSelector-pod.yaml
apiVersion: v1
kind: Pod
metadata:
name: nodeselector-pod
spec:
nodeSelector:
team: development
containers:
- image: busybox:latest
name: nodeselector-containers
command: [ "/bin/sh", "-c", "tail -f /etc/passwd" ]
3. Create the Pod resource object
kubectl apply -f nodeSelector-pod.yaml
4. Check which Node the Pod was assigned to
kubectl get pods -o wide | grep nodeselect
nodeselector-pod 1/1 Running 0 49s 10.244.2.47 k8s-node02 <none> <none>
5. Delete the labels
kubectl label nodes k8s-node02 team-
kubectl label nodes k8s-node03 team-
After the labels are removed, the Pod keeps running normally:
kubectl get pods -o wide | grep nodeselect
nodeselector-pod 1/1 Running 0 11m 10.244.2.47 k8s-node02 <none> <none>
Delete the Pod and create it again:
kubectl delete pods/nodeselector-pod
kubectl apply -f nodeSelector-pod.yaml
#The Pod now stays in Pending: no Node carries the label required by the manifest
kubectl get pods -o wide | grep nodeselect
nodeselector-pod 0/1 Pending 0 55s <none> <none> <none> <none>
#Event: none of the 6 nodes match the node selector
kubectl describe pods/nodeselector-pod | grep -A 10 Events
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling <unknown> default-scheduler 0/6 nodes are available: 6 node(s) didn't match node selector.
Warning FailedScheduling default-scheduler 0/6 nodes are available: 6 node(s) didn't match node selector.
Taints and tolerations (taint and toleration)
Official documentation: https://kubernetes.io/docs/concepts/scheduling-eviction/taint-and-toleration/#example-use-cases
Taints are a rather forceful mechanism: a Node can be tainted with one of three effects:
NoSchedule: kube-scheduler will no longer schedule Pods onto this node.
PreferNoSchedule: kube-scheduler will try to avoid scheduling Pods onto this node.
NoExecute: kube-scheduler will not schedule Pods onto this node, and Pods already running on it are evicted.
A typical use case for taints is dedicating nodes, for example nodes with special hardware such as GPUs that should be reserved for specific applications.
1. Add a taint
Taint k8s-node02 with the effect NoSchedule and the key/value type=calculate:
kubectl taint node k8s-node02 type=calculate:NoSchedule
2. View the taint
kubectl describe nodes k8s-node02 | grep Taints
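The output should look like the following (the same format appears again for node03 further below):
Taints: type=calculate:NoSchedule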
From now on, newly created Pods will not be scheduled onto the tainted k8s-node02.
3. Create the Pod resource manifest
We create 3 Pods and check whether any of them is scheduled onto the tainted Node:
cat taint-pod.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
name: taint-deploy
spec:
replicas: 3
selector:
matchLabels:
app: taint-pod
template:
metadata:
labels:
app: taint-pod
spec:
containers:
- image: busybox:latest
name: taint-pod
command: [ "/bin/sh", "-c", "tail -f /etc/passwd" ]
4. Check which Nodes the Pods were scheduled to
All three Pods below landed on Node03. The effect is not very obvious yet: we tainted Node02, but Node01 has not come into play.
kubectl apply -f taint-pod.yaml
kubectl get pods -o wide | grep taint
taint-deploy-748989f6d4-f7rbq 1/1 Running 0 41s 10.244.5.62 k8s-node03 <none> <none>
taint-deploy-748989f6d4-nzwjg 1/1 Running 0 41s 10.244.5.61 k8s-node03 <none> <none>
taint-deploy-748989f6d4-vzzdx 1/1 Running 0 41s 10.244.5.60 k8s-node03 <none> <none>
5. Scale the Deployment up
We scale to 9 Pods so that Pods also land on Node01, which shows the taint's effect more clearly:
kubectl scale --replicas=9 deploy/taint-deploy -n default
kubectl get pods -o wide | grep taint
taint-deploy-748989f6d4-4ls9d 1/1 Running 0 54s 10.244.5.65 k8s-node03 <none> <none>
taint-deploy-748989f6d4-794lh 1/1 Running 0 68s 10.244.5.63 k8s-node03 <none> <none>
taint-deploy-748989f6d4-bwh5p 1/1 Running 0 54s 10.244.5.66 k8s-node03 <none> <none>
taint-deploy-748989f6d4-ctknr 1/1 Running 0 68s 10.244.5.64 k8s-node03 <none> <none>
taint-deploy-748989f6d4-f7rbq 1/1 Running 0 2m27s 10.244.5.62 k8s-node03 <none> <none>
taint-deploy-748989f6d4-hf9sf 1/1 Running 0 68s 10.244.3.51 k8s-node01 <none> <none>
taint-deploy-748989f6d4-nzwjg 1/1 Running 0 2m27s 10.244.5.61 k8s-node03 <none> <none>
taint-deploy-748989f6d4-prg2f 1/1 Running 0 54s 10.244.3.52 k8s-node01 <none> <none>
taint-deploy-748989f6d4-vzzdx 1/1 Running 0 2m27s 10.244.5.60 k8s-node03 <none> <none>
Two of the Pods above were scheduled onto Node01; both Node03 and Node01 receive Pods, while the tainted Node02 receives none.
6. Remove the taint
To remove a taint you only need to specify the key and the effect:
kubectl taint node k8s-node02 type:NoSchedule-
After a Node is tainted, no ordinary Pod is scheduled onto it. But what if we want to schedule specific, dedicated Pods onto the tainted Node? We add a toleration to those Pods: Pods with a matching toleration may be scheduled onto the tainted Node.
Tolerations are configured with the tolerations field, which has the following sub-parameters: key, operator, value and effect (matching the taint to tolerate), plus the optional tolerationSeconds (how long the Pod may keep running on a node with a matching NoExecute taint).
1. Toleration example
#Taint node03
kubectl taint node k8s-node03 type=calculate:NoSchedule
#View the taint
kubectl describe nodes k8s-node03 | grep Taints
Taints: type=calculate:NoSchedule
2. The Pod resource manifest
cat taint-pod.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
name: taint-deploy
spec:
replicas: 3
selector:
matchLabels:
app: taint-pod
template:
metadata:
labels:
app: taint-pod
spec:
tolerations:
- key: "type"
operator: "Equal"
value: "calculate"
effect: "NoSchedule"
containers:
- image: busybox:latest
name: taint-pod
command: [ "/bin/sh", "-c", "tail -f /etc/passwd" ]
3. Create the Pod resource object
kubectl apply -f taint-pod.yaml
4. Check which Nodes the Pods were assigned to
Two Pods were scheduled onto Node03 and one onto Node02. A toleration allows a Pod to run on a tainted Node, but it does not force every tolerating Pod onto that Node, so kube-scheduler still schedules Pods onto the other Nodes as well.
kubectl get pods -o wide | grep taint
taint-deploy-9868f98d7-dkr4n 1/1 Running 0 17s 10.244.5.74 k8s-node03 <none> <none>
taint-deploy-9868f98d7-f762b 1/1 Running 0 17s 10.244.5.75 k8s-node03 <none> <none>
taint-deploy-9868f98d7-zg4hk 1/1 Running 0 3m22s 10.244.2.49 k8s-node02 <none> <none>
5. Add a NoExecute (evicting) taint to k8s-node01
#Add the taint
kubectl taint node k8s-node01 type=data:NoExecute
#Watch the Pods on Node01
kubectl get pods -o wide | grep k8s-node01
The Pods on Node01 are evicted and re-created on other Nodes.
kubectl explain pods.spec.affinity
Parameter documentation: kubectl explain pods.spec.affinity.nodeAffinity
Affinity scheduling greatly extends the ways a Pod can be placed; its main enhancements are a more expressive matching language (operators such as In, NotIn and Exists) and the distinction between hard requirements and soft preferences.
There are currently two node-affinity expressions: requiredDuringSchedulingIgnoredDuringExecution (a hard requirement that must be met for the Pod to be scheduled) and preferredDuringSchedulingIgnoredDuringExecution (a soft preference the scheduler tries, but is not required, to satisfy).
IgnoredDuringExecution means that if a Node's labels change while a Pod is running so that the Pod's node-affinity requirement is no longer met, the change is ignored and the Pod keeps running on that Node.
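A minimal nodeAffinity sketch (it reuses the team labels from the nodeSelector example above; requiredDuringSchedulingIgnoredDuringExecution expresses the hard requirement and preferredDuringSchedulingIgnoredDuringExecution the soft preference):
apiVersion: v1
kind: Pod
metadata:
  name: nodeaffinity-pod
spec:
  affinity:
    nodeAffinity:
      #hard requirement: the node must carry a team label with one of these values
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: team
            operator: In
            values:
            - development
            - bigdata
      #soft preference: among matching nodes, prefer team=development
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 1
        preference:
          matchExpressions:
          - key: team
            operator: In
            values:
            - development
  containers:
  - name: nodeaffinity-containers
    image: busybox:latest
    command: [ "/bin/sh", "-c", "tail -f /etc/passwd" ]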
Parameter documentation: kubectl explain pods.spec.affinity.podAffinity
Parameter documentation: kubectl explain pods.spec.affinity.podAntiAffinity
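Similarly, a minimal podAntiAffinity sketch (it keeps Pods away from nodes already running Pods labelled app=scheduler-pod, the label used in the kube-scheduler example above; topologyKey kubernetes.io/hostname makes the rule apply per node):
apiVersion: v1
kind: Pod
metadata:
  name: podantiaffinity-pod
spec:
  affinity:
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      #do not co-locate with Pods labelled app=scheduler-pod on the same node
      - labelSelector:
          matchExpressions:
          - key: app
            operator: In
            values:
            - scheduler-pod
        topologyKey: kubernetes.io/hostname
  containers:
  - name: podantiaffinity-containers
    image: busybox:latest
    command: [ "/bin/sh", "-c", "tail -f /etc/passwd" ]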