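Health Checks
Kubernetes, as an application-oriented cluster manager, has to make sure that containers are actually running correctly after they are deployed.
Container probing checks whether the application instance inside a container is working, and is a long-established way to protect service availability. If a probe shows that an instance is not in the expected state, Kubernetes takes that instance out of rotation so that it no longer receives business traffic.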
This section covers two kinds of probes that a Pod uses to check the health of its containers. Each of them supports the exec, tcpSocket, and httpGet methods:
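LivenessProbe
The liveness probe determines whether a container is healthy and tells the kubelet when a container is in an unhealthy state.
If the liveness probe finds the container unhealthy, the kubelet kills the container and then acts according to the container's restart policy. If a container defines no liveness probe, the kubelet treats the probe result as always "Success". The kubelet invokes the liveness probe periodically to diagnose the container's health.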
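ReadinessProbe
The readiness probe determines whether a container has finished starting and is ready to receive requests.
If the readiness probe fails (by default, after 3 consecutive failures), the Pod's Ready condition becomes False: the READY column shows 0/1 while STATUS stays Running (it does not change to Completed). The Endpoints controller then removes the entry containing that Pod's IP address from the Endpoints of the Service.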
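In short, livenessProbe decides whether the container is restarted, and readinessProbe decides whether requests are forwarded to the container.
A probe is a periodic diagnostic that the kubelet runs against a container. To perform the diagnostic, the kubelet calls a handler implemented by the container. There are three handler types: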
exec: runs a command inside the container; if the command exits with status 0, the container is considered healthy.
……
livenessProbe:        # both probes use the same syntax
  exec:
    command:
    - cat
    - /tmp/healthy
……
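tcpSocket: performs a TCP check against the container's IP address and port; if the port accepts the connection, the container is considered healthy.
……
livenessProbe:        # both probes use the same syntax
  tcpSocket:
    port: 8080
……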
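httpGet: sends an HTTP GET request to the container's IP address, port, and path; if the response status code is at least 200 and below 400, the container is considered healthy.
……
livenessProbe:        # both probes use the same syntax
  httpGet:
    path: /           # request path
    port: 80          # port number
    host: 127.0.0.1   # host to connect to; defaults to the Pod IP, so it is usually omitted
    scheme: HTTP      # protocol, HTTP or HTTPS
……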
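The success rules of the three handlers can be sketched in a few lines of Python. This only illustrates the criteria described above and is not how the kubelet is implemented; the function names exec_probe, tcp_probe, and http_status_is_healthy are made up for this sketch.

```python
import socket
import subprocess
import sys


def exec_probe(command):
    """exec handler: success means the command exits with status 0."""
    try:
        completed = subprocess.run(
            command, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
        )
    except OSError:
        # The command could not be started at all (for example, it does not exist).
        return False
    return completed.returncode == 0


def tcp_probe(host, port, timeout=1.0):
    """tcpSocket handler: success means a TCP connection can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def http_status_is_healthy(status_code):
    """httpGet handler: success means 200 <= status code < 400."""
    return 200 <= status_code < 400


if __name__ == "__main__":
    # The current Python interpreter stands in for a command such as `cat /tmp/healthy`.
    print(exec_probe([sys.executable, "-c", "raise SystemExit(0)"]))
    print(http_status_is_healthy(404))
```

A 404 response, for instance, falls outside the 200 to 399 range, so an httpGet probe that receives it counts as failed.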
The liveness and readiness probes are defined per container, under spec.containers in the Pod definition. Besides the three handler fields above, both probes share the following fields:
[root@k8s-master ~]# kubectl explain pod.spec.containers.livenessProbe/readinessProbe
KIND: Pod
VERSION: v1
RESOURCE: livenessProbe <Object> / readinessProbe <Object>
FIELDS:
exec <Object>
httpGet <Object>
tcpSocket <Object>
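initialDelaySeconds <integer> # seconds to wait after the container starts before the first probe
timeoutSeconds <integer> # probe timeout; default 1 second, minimum 1
periodSeconds <integer> # how often to probe; default 10 seconds, minimum 1
failureThreshold <integer> # consecutive failures before the probe is considered failed; default 3, minimum 1
successThreshold <integer> # consecutive successes before the probe is considered successful; default 1 (must be 1 for liveness probes)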
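As a rough worked example of how these fields combine: if every probe fails from the start, the container is marked failed at about initialDelaySeconds + (failureThreshold - 1) * periodSeconds after it starts. The helper below is a simplified model that ignores timeoutSeconds and any scheduling jitter in the kubelet. Its name is hypothetical, and it assumes that initialDelaySeconds defaults to 0, which the listing above does not state.

```python
def seconds_until_marked_failed(
    initial_delay_seconds=0, period_seconds=10, failure_threshold=3
):
    """Time from container start to the Nth consecutive failed probe.

    Assumes the first probe runs exactly at initial_delay_seconds, every
    probe fails, and probes are spaced exactly period_seconds apart.
    """
    return initial_delay_seconds + (failure_threshold - 1) * period_seconds


if __name__ == "__main__":
    # Defaults from the field listing (periodSeconds=10, failureThreshold=3).
    print(seconds_until_marked_failed())
    # initialDelaySeconds=1 and periodSeconds=3 with the default threshold.
    print(seconds_until_marked_failed(initial_delay_seconds=1, period_seconds=3))
```

Lowering periodSeconds or failureThreshold makes failures show up sooner, at the cost of reacting to short hiccups.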
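Each probe has one of three results:
Success: the container passed the diagnostic.
Failure: the container failed the diagnostic.
Unknown: the diagnostic itself failed, so no action is taken.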
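How the three results interact with failureThreshold and successThreshold can be sketched as a small state tracker. This is a simplified model for illustration, not the kubelet's code; the class name ProbeTracker and its fields are hypothetical, and it assumes that an unknown result leaves the running streak untouched.

```python
from dataclasses import dataclass

SUCCESS, FAILURE, UNKNOWN = "success", "failure", "unknown"


@dataclass
class ProbeTracker:
    """Tracks consecutive probe results for one container (simplified model)."""

    failure_threshold: int = 3
    success_threshold: int = 1
    healthy: bool = True   # current verdict
    last_result: str = ""  # most recent result that was not unknown
    streak: int = 0        # how many times in a row last_result was seen

    def observe(self, result: str) -> bool:
        """Record one probe result and return the current verdict."""
        if result == UNKNOWN:
            # Unknown: no action is taken and the verdict stays as it was.
            return self.healthy
        if result == self.last_result:
            self.streak += 1
        else:
            self.last_result, self.streak = result, 1
        if result == FAILURE and self.streak >= self.failure_threshold:
            self.healthy = False
        elif result == SUCCESS and self.streak >= self.success_threshold:
            self.healthy = True
        return self.healthy


if __name__ == "__main__":
    tracker = ProbeTracker(failure_threshold=3)
    for outcome in [FAILURE, FAILURE, UNKNOWN, FAILURE, SUCCESS]:
        print(outcome, tracker.observe(outcome))
```

In this model the verdict flips to unhealthy only on the third consecutive failure, and a single success is enough to flip it back when successThreshold is 1.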
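Liveness probes compared with readiness probes:
Liveness and readiness probes are two health check mechanisms. If neither is configured, Kubernetes applies the same default to both: it judges the container only by its main process, and treats the container as failed when that process exits with a non-zero return code.
The two probes are configured in exactly the same way and accept the same parameters. They differ in what happens when a probe fails: a failed liveness probe restarts the container, while a failed readiness probe marks the container as not ready so that it receives no requests forwarded by a Service.
Liveness and readiness probes run independently, with no dependency between them, so they can be used separately or together.
Use a liveness probe to decide whether a container needs to be restarted (self-healing); use a readiness probe to decide whether a container is ready to serve traffic.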
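The difference in failure handling can be made concrete with a toy model. The sketch below is not Kubernetes code: SimulatedPod, handle_probe_failure, and the IP addresses are all hypothetical, and it assumes a restart policy that allows the container to be restarted.

```python
from dataclasses import dataclass


@dataclass
class SimulatedPod:
    """Minimal stand-in for a Pod: its IP and its container restart count."""

    ip: str
    restart_count: int = 0


def handle_probe_failure(probe_kind, pod, service_endpoints):
    """Apply the consequence of a failed probe to the toy model.

    liveness  -> the container is restarted (restart count goes up)
    readiness -> the Pod IP is removed from the Service endpoints
    """
    if probe_kind == "liveness":
        pod.restart_count += 1
    elif probe_kind == "readiness":
        service_endpoints.discard(pod.ip)
    else:
        raise ValueError(f"unknown probe kind: {probe_kind!r}")


if __name__ == "__main__":
    pod = SimulatedPod(ip="10.0.0.5")
    endpoints = {"10.0.0.5", "10.0.0.6"}
    handle_probe_failure("readiness", pod, endpoints)
    print(sorted(endpoints), pod.restart_count)
    handle_probe_failure("liveness", pod, endpoints)
    print(sorted(endpoints), pod.restart_count)
```

A readiness failure changes only where traffic goes, while a liveness failure changes only the container's lifecycle, which is why the two probes can be combined freely.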
Examples
ReadinessProbe (readiness check)
[root@k8s-master ~]# vim readiness.yml
apiVersion: v1
kind: Pod
metadata:
  name: readiness-httpget-pod
  namespace: default
spec:
  containers:
  - name: readiness-httpget-container
    image: nginx
    imagePullPolicy: IfNotPresent
    readinessProbe:             # readiness probe
      httpGet:                  # httpGet handler
        port: 80
        path: /index1.html
      initialDelaySeconds: 1    # wait 1 second after the container starts before the first probe
      periodSeconds: 3          # probe every 3 seconds
[root@k8s-master ~]# kubectl create -f readiness.yml
# The image is being pulled
[root@k8s-master ~]# kubectl get pods
NAME READY STATUS RESTARTS AGE
readiness-httpget-pod 0/1 ContainerCreating 0 9s
# The image has been pulled; STATUS is Running, but the container is not READY
[root@k8s-master ~]# kubectl get pods
NAME READY STATUS RESTARTS AGE
readiness-httpget-pod 0/1 Running 0 108s
[root@k8s-master ~]# kubectl describe pod readiness-httpget-pod
…… (omitted) ……
Normal Pulled 2m49s kubelet, k8s-node2 Successfully pulled image "nginx"
Normal Created 2m49s kubelet, k8s-node2 Created container readiness-httpget-container
Normal Started 2m49s kubelet, k8s-node2 Started container readiness-httpget-container
# The probe fails because /index1.html does not exist
Warning Unhealthy 107s (x21 over 2m47s) kubelet, k8s-node2 Readiness probe failed: HTTP probe failed with statuscode: 404
# Enter the container and create index1.html
[root@k8s-master ~]# kubectl exec -it readiness-httpget-pod -- /bin/bash
root@readiness-httpget-pod:/# ls
bin docker-entrypoint.d home media proc sbin tmp
boot docker-entrypoint.sh lib mnt root srv usr
dev etc lib64 opt run sys var
root@readiness-httpget-pod:/# cd usr/share/nginx/html/
root@readiness-httpget-pod:/usr/share/nginx/html# ls
50x.html index.html
root@readiness-httpget-pod:/usr/share/nginx/html# echo "test" > index1.html
root@readiness-httpget-pod:/usr/share/nginx/html# exit
exit
# The container is now READY
[root@k8s-master ~]# kubectl get pods
NAME READY STATUS RESTARTS AGE
readiness-httpget-pod 1/1 Running 0 9m11s
LivenessProbe (liveness check)
exec method
[root@k8s-master ~]# vim liveness.yaml
apiVersion: v1
kind: Pod
metadata:
  name: liveness-exec-pod
  namespace: default
spec:
  containers:
  - name: liveness-exec-container
    image: nginx
    imagePullPolicy: IfNotPresent
    command: ["/bin/sh","-c","touch /tmp/live && sleep 60 && rm -rf /tmp/live && sleep 3600"]
    livenessProbe:
      exec:
        command: ["ls","/tmp/live"]   # probe command; exit status 0 means healthy, non-zero means unhealthy
      initialDelaySeconds: 1          # start probing 1 second after the container starts
      periodSeconds: 3                # probe every 3 seconds; after 3 consecutive failures the container is restarted
# Create the Pod
[root@k8s-master ~]# kubectl create -f liveness.yaml
pod/liveness-exec-pod created
# Check the Pod
[root@k8s-master ~]# kubectl get pods
NAME READY STATUS RESTARTS AGE
liveness-exec-pod 0/1 ContainerCreating 0 2s
[root@k8s-master ~]# kubectl get pods -o wide -w # RESTARTS keeps growing: the liveness check keeps failing, so the container keeps being restarted
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
liveness-exec-pod 1/1 Running 0 44s 10.244.2.7 k8s-node1 <none> <none>
liveness-exec-pod 1/1 Running 1 101s 10.244.2.7 k8s-node1 <none> <none>
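# RESTARTS counts container restarts. 60 seconds after the container starts, /tmp/live is deleted, and that file is exactly what the liveness probe checks;
# the trailing 3600-second sleep only keeps the container's main process from exiting. Once the file is gone the probe fails, so the kubelet kills and restarts the container (the Pod itself is not deleted), and the cycle repeats.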
[root@k8s-master ~]# kubectl describe pod liveness-exec-pod
…… (omitted) ……
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled <unknown> default-scheduler Successfully assigned default/liveness-exec-pod to k8s-node2
Normal Pulled 76s (x3 over 3m29s) kubelet, k8s-node2 Container image "nginx:1.8" already present on machine
Normal Created 76s (x3 over 3m29s) kubelet, k8s-node2 Created container liveness-exec-container
Normal Started 76s (x3 over 3m29s) kubelet, k8s-node2 Started container liveness-exec-container
Warning Unhealthy 15s kubelet, k8s-node2 Liveness probe errored: rpc error: code = Unknown desc = container not running (180426712bd129c122f4df1432ef573cf1f56627a92ae76bf6db42fc4054df95)
Warning BackOff 12s (x4 over 89s) kubelet, k8s-node2 Back-off restarting failed container
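A back-of-the-envelope estimate of when the first restart should be triggered in this exec example can be computed as follows. It assumes that the container's main process is still running after the file is removed, that probes fire exactly every periodSeconds starting at initialDelaySeconds, and that failureThreshold keeps its default of 3; it ignores image pull time, kubelet jitter, and the termination grace period. The function name and its parameters are hypothetical.

```python
import math


def first_restart_time(breaks_at, initial_delay, period, failure_threshold=3):
    """Seconds after container start at which the Nth consecutive failure occurs.

    breaks_at is the moment the health condition stops holding (here, the
    moment /tmp/live is deleted). Probes run at initial_delay, then every
    period seconds; a probe that fires at or after breaks_at fails.
    """
    # Index of the first probe that fires at or after breaks_at.
    k = max(0, math.ceil((breaks_at - initial_delay) / period))
    first_failure = initial_delay + k * period
    return first_failure + (failure_threshold - 1) * period


if __name__ == "__main__":
    # Values from liveness.yaml: file removed at 60 s, initialDelaySeconds=1, periodSeconds=3.
    print(first_restart_time(breaks_at=60, initial_delay=1, period=3))
```

With the manifest's values the model gives 67 seconds (probes fail at 61, 64, and 67 seconds). A real cluster will show the restart later than that because of the delays the model ignores.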
httpGet method
[root@k8s-master ~]# vim liveness_httpget.yml
apiVersion: v1
kind: Pod
metadata:
  name: liveness-httpget-pod
  namespace: default
spec:
  containers:
  - name: liveness-httpget-container
    image: nginx
    imagePullPolicy: IfNotPresent
    livenessProbe:
      httpGet:
        port: 80
        path: /index.html
        scheme: HTTP     # host is not set, so the probe targets the Pod IP, i.e. http://<Pod IP>:80/index.html
      initialDelaySeconds: 1
      periodSeconds: 3
      timeoutSeconds: 10
[root@k8s-master ~]# kubectl create -f liveness_httpget.yml
pod/liveness-httpget-pod created
[root@k8s-master ~]# kubectl get pods
NAME READY STATUS RESTARTS AGE
liveness-httpget-pod 1/1 Running 0 15s
# Delete index.html to make the liveness check fail
[root@k8s-master ~]# kubectl exec liveness-httpget-pod -it -- rm -rf /usr/share/nginx/html/index.html
# The liveness probe's request for /index.html no longer succeeds, so the kubelet kills and restarts the container
[root@k8s-master ~]# kubectl get pods
NAME READY STATUS RESTARTS AGE
liveness-httpget-pod 1/1 Running 1 104s
tcpSocket method
[root@k8s-master ~]# vim liveness_tcp.yml
apiVersion: v1
kind: Pod
metadata:
  name: liveness-tcp-pod
  namespace: default
spec:
  containers:
  - name: liveness-tcp-container
    image: nginx
    imagePullPolicy: IfNotPresent
    livenessProbe:
      tcpSocket:
        port: 8080
      initialDelaySeconds: 1
      periodSeconds: 3
      timeoutSeconds: 10
[root@k8s-master ~]# kubectl create -f liveness_tcp.yml
pod/liveness-tcp-pod created
[root@k8s-master ~]# kubectl get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
liveness-tcp-pod 1/1 Running 0 7s 10.244.2.10 k8s-node1 <none> <none>
# Nothing listens on port 8080, so the TCP connection can never be established and the kubelet keeps killing and restarting the container
[root@k8s-master ~]# kubectl get pods -o wide -w
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
liveness-tcp-pod 1/1 Running 1 13s 10.244.2.10 k8s-node1 <none> <none>
liveness-tcp-pod 1/1 Running 2 21s 10.244.2.10 k8s-node1 <none> <none>
liveness-tcp-pod 0/1 CrashLoopBackOff 2 29s 10.244.2.10 k8s-node1 <none> <none>
liveness-tcp-pod 1/1 Running 3 42s 10.244.2.10 k8s-node1 <none> <none>
liveness-tcp-pod 1/1 Running 4 50s 10.244.2.10 k8s-node1 <none> <none>
liveness-tcp-pod 0/1 CrashLoopBackOff 4 59s 10.244.2.10 k8s-node1 <none> <none>
Readiness check + liveness check
[root@k8s-master ~]# cat live+read.yml
apiVersion: v1
kind: Pod
metadata:
  name: liveness-readiness-pod
  namespace: default
spec:
  containers:
  - name: liveness-readiness-container
    image: nginx
    imagePullPolicy: IfNotPresent
    readinessProbe:
      httpGet:
        port: 80
        path: /index1.html
      initialDelaySeconds: 1
      periodSeconds: 3
    livenessProbe:
      httpGet:
        port: 80
        path: /index.html
      initialDelaySeconds: 1
      periodSeconds: 3
      timeoutSeconds: 10
# Readiness check: while it fails, the Pod stays not ready; the readiness probe keeps probing, and the Pod becomes ready once the check passes.
# Liveness check: when it fails, the kubelet kills and restarts the container; the liveness probe keeps probing and the cycle repeats until the check passes, after which the container keeps running.
[root@k8s-master ~]# kubectl create -f live+read.yml
pod/liveness-readiness-pod created
[root@k8s-master ~]# kubectl get pods -o wide -w
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
liveness-readiness-pod 0/1 Running 0 7s 10.244.2.12 k8s-node1 <none> <none>
# The readiness check has not passed yet (index1.html does not exist), so the Pod stays not ready
# Create the file
[root@k8s-master ~]# kubectl exec liveness-readiness-pod -it -- touch /usr/share/nginx/html/index1.html
# The readiness check now passes, so the Pod is ready
[root@k8s-master ~]# kubectl get pods -o wide -w
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
liveness-readiness-pod 1/1 Running 0 5m17s 10.244.2.12 k8s-node1 <none> <none>
# The Pod now keeps running normally, because both the readiness check and the liveness check pass.
# Next, delete index.html to make the liveness check fail
[root@k8s-master ~]# kubectl exec liveness-readiness-pod -it -- rm -rf /usr/share/nginx/html/index.html
[root@k8s-master ~]# kubectl get pods -o wide -w
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
liveness-readiness-pod 1/1 Running 0 7m27s 10.244.2.12 k8s-node1 <none> <none>
liveness-readiness-pod 0/1 Running 1 7m33s 10.244.2.12 k8s-node1 <none> <none>
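# The container has been restarted. The new container starts from the image again, so index.html is back and the liveness check passes; index1.html, which was created by hand, is gone as well, so READY stays 0/1 until that file is created again.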