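Kubernetes checks the health of the containers in a pod with two kinds of probes: LivenessProbe (liveness probing) and ReadinessProbe (readiness probing).
Both LivenessProbe and ReadinessProbe can be implemented in one of three ways, each shown in an example below: running a command inside the container (exec), opening a TCP connection (tcpSocket), or sending an HTTP GET request (httpGet).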
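Because both probe types accept the same handlers, one container can carry a livenessProbe and a readinessProbe side by side. The manifest below is a minimal sketch, not a recommended configuration: the pod name probe-sketch, the delay values, and the path / are assumptions made for illustration, and it relies on the stock nginx image answering on port 80.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: probe-sketch            # hypothetical name, chosen for this sketch
spec:
  containers:
  - name: nginx
    image: nginx
    ports:
    - containerPort: 80
    livenessProbe:              # on failure, kubelet restarts the container
      tcpSocket:
        port: 80
      initialDelaySeconds: 15   # assumed value
    readinessProbe:             # on failure, the pod is taken out of Service endpoints; no restart
      httpGet:
        path: /                 # assumed: stock nginx returns 200 here
        port: 80
      initialDelaySeconds: 5    # assumed value
```

The two probes differ only in what happens on failure, which is why the same handler syntax works under either key.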
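In the following example, the command "cat /tmp/health" is executed to determine whether the container is running normally.
Once the pod is running, the container creates the file /tmp/health and deletes it 10s later.
The initial probe delay of the LivenessProbe (initialDelaySeconds) is 15s, so the file is already gone when probing starts; the probe result is Fail, which causes kubelet to kill the container and restart it: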
vim liveness-exec.yaml
apiVersion: v1
kind: Pod
metadata:
  labels:
    test: liveness
  name: liveness-exec
spec:
  containers:
  - name: liveness
    image: busybox
    args: ["/bin/sh","-c","echo ok > /tmp/health; sleep 10; rm -rf /tmp/health; sleep 600"]
    livenessProbe:
      exec:
        command: ["cat","/tmp/health"]
      initialDelaySeconds: 15
      timeoutSeconds: 1
Create the pod:
[root@bogon ~]# kubectl create -f liveness-exec.yaml
pod/liveness-exec created
[root@bogon ~]# kubectl get pods
NAME                           READY   STATUS      RESTARTS   AGE
dapi-test-pod                  0/1     Completed   0          8h
dapi-test-pod-container-vars   1/1     Running     0          7h8m
dapi-test-pod-volume           1/1     Running     0          4h49m
liveness-exec                  1/1     Running     0          7s
View the details:
kubectl describe pods/liveness-exec
The events show that the container is being restarted:
Events:
  Type     Reason     Age                         From               Message
  ----     ------     ----                        ----               -------
  Normal   Scheduled  <unknown>                   default-scheduler  Successfully assigned default/liveness-exec to server01
  Normal   Pulled     8s (x3 over 2m31s)          kubelet, server01  Successfully pulled image "busybox"
  Normal   Created    8s (x3 over 2m31s)          kubelet, server01  Created container liveness
  Normal   Started    8s (x3 over 2m31s)          kubelet, server01  Started container liveness
  Warning  Unhealthy  <invalid> (x9 over 2m11s)   kubelet, server01  Liveness probe failed: cat: can't open '/tmp/health': No such file or directory
  Normal   Killing    <invalid> (x3 over 112s)    kubelet, server01  Container liveness failed liveness probe, will be restarted
  Normal   Pulling    <invalid> (x4 over 2m36s)   kubelet, server01  Pulling image "busybox"
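The restart cadence in these events depends on probe settings that the manifest leaves at their defaults. As a sketch, the livenessProbe stanza can be written with those defaults spelled out; the three added fields are not in the original manifest, and their values are the ones kubectl describe reports for a probe that sets only the delay and timeout (period=10s #success=1 #failure=3).

```yaml
livenessProbe:
  exec:
    command: ["cat", "/tmp/health"]
  initialDelaySeconds: 15   # wait 15s after the container starts before probing
  timeoutSeconds: 1         # a probe that takes longer than 1s counts as failed
  periodSeconds: 10         # default: run the probe every 10s
  successThreshold: 1       # default: one success marks the container healthy
  failureThreshold: 3       # default: restart after 3 consecutive failures
```

With failureThreshold at 3, each restart needs three failed probes in a row, which is consistent with the nine Unhealthy events recorded against three Killing events.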
In the following example, the health check is performed by opening a TCP connection to port 80 of the container:
vim pod-with-healthcheck.yaml
apiVersion: v1
kind: Pod
metadata:
  name: pod-with-healthcheck
spec:
  containers:
  - name: nginx
    image: nginx
    ports:
    - containerPort: 80
    livenessProbe:
      tcpSocket:
        port: 80
      initialDelaySeconds: 30
      timeoutSeconds: 1
Create the pod:
[root@bogon ~]# kubectl create -f pod-with-healthcheck.yaml
[root@bogon ~]# kubectl get pods
NAME                   READY   STATUS    RESTARTS   AGE
pod-with-healthcheck   1/1     Running   0          45s
View the details:
[root@bogon ~]# kubectl describe pod pod-with-healthcheck
Containers:
  nginx:
    Container ID:   docker://19f987e8146d926fd28e04d1fa9677d60cb80f20e42a36fc20fba346412113c5
    Image:          nginx
    Image ID:       docker-pullable://nginx@sha256:a93c8a0b0974c967aebe868a186e5c205f4d3bcb5423a56559f2f9599074bbcd
    Port:           80/TCP
    Host Port:      0/TCP
    State:          Running
      Started:      Sat, 27 Jun 2020 21:11:23 +0800
    Ready:          True
    Restart Count:  0
    Liveness:       tcp-socket :80 delay=30s timeout=1s period=10s #success=1 #failure=3
    Environment:    <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-btm7g (ro)
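The probe port does not have to be a number: it can also refer to a named container port, which keeps the probe in step if the port number changes later. The following is a sketch under assumptions: the port name http and the pod name pod-with-named-port are invented for illustration and do not appear in the manifest above.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: pod-with-named-port     # hypothetical name
spec:
  containers:
  - name: nginx
    image: nginx
    ports:
    - name: http                # hypothetical port name
      containerPort: 80
    livenessProbe:
      tcpSocket:
        port: http              # resolved to containerPort 80 through the name above
      initialDelaySeconds: 30
      timeoutSeconds: 1
```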
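In the following example, kubelet periodically sends an HTTP GET request to port 80 of the container, at the path /_status/healthz, to check the health of the containerized application: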
cat pod-http-get-action.yaml
apiVersion: v1
kind: Pod
metadata:
  name: pod-http-get-action
spec:
  containers:
  - name: nginx
    image: nginx
    ports:
    - containerPort: 80
    livenessProbe:
      httpGet:
        path: /_status/healthz
        port: 80
      initialDelaySeconds: 30
      timeoutSeconds: 1
View the details:
kubectl describe pod pod-http-get-action
Events:
  Type     Reason     Age                            From               Message
  ----     ------     ----                           ----               -------
  Normal   Scheduled  <unknown>                      default-scheduler  Successfully assigned default/pod-http-get-action to server01
  Normal   Pulling    <invalid> (x2 over 30s)        kubelet, server01  Pulling image "nginx"
  Normal   Killing    <invalid>                      kubelet, server01  Container nginx failed liveness probe, will be restarted
  Normal   Pulled     <invalid> (x2 over 29s)        kubelet, server01  Successfully pulled image "nginx"
  Normal   Created    <invalid> (x2 over 29s)        kubelet, server01  Created container nginx
  Normal   Started    <invalid> (x2 over 28s)        kubelet, server01  Started container nginx
  Warning  Unhealthy  <invalid> (x4 over <invalid>)  kubelet, server01  Liveness probe failed: HTTP probe failed with statuscode: 404
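The probe receives a 404 for /_status/healthz, so the container keeps failing its liveness check. The pod has already been restarted 3 times:
[root@bogon ~]# kubectl get pods
NAME                  READY   STATUS    RESTARTS   AGE
pod-http-get-action   1/1     Running   3          3m28s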
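The restarts come from the 404 reported in the events: the nginx container in this example does not serve /_status/healthz. As a sketch of a probe that should pass against the same image, the variant below points at / instead. That the stock nginx image returns 200 for / is an assumption about the image, and the pod name pod-http-get-ok is hypothetical.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: pod-http-get-ok         # hypothetical name
spec:
  containers:
  - name: nginx
    image: nginx
    ports:
    - containerPort: 80
    livenessProbe:
      httpGet:
        path: /                 # assumed to return 200 on the stock nginx image
        port: 80
      initialDelaySeconds: 30
      timeoutSeconds: 1
```

An HTTP probe counts as successful when the response status code is at least 200 and below 400, so a real application only needs its health endpoint to return a code in that range.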
Each probe type takes the two parameters initialDelaySeconds and timeoutSeconds, which have the following meanings: