Configure Liveness and Readiness Probes
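For production workloads, keeping services healthy and stable is the top priority. Handling a failed service promptly so that it does not affect the business, and recovering it quickly, has long been a pain point for development and operations teams. Kubernetes provides health checks for this: a service detected as failing is automatically taken out of rotation, and it can be recovered automatically by restarting it.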
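Liveness probe: determines whether the Container is still running. When the service crashes or deadlocks, for example, the kubelet kills the Container and then acts according to its restart policy. The restart happens in place on the same node; the Pod ends up on another machine only if it is evicted (for instance when the node runs out of resources, where the Pod's QoS class affects the eviction order) and a controller recreates it elsewhere.
Readiness probe: determines whether the service is ready to handle traffic. If the service has not finished loading or is misbehaving, the IP address of its Pod is removed from the Endpoints of the Service. In other words, while the service is not ready it is taken out of the Service's load balancer and receives no requests.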
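A probe check has one of three results: Success, Failure, or Unknown.
If the readiness probe fails, the failing instance is removed from the Service endpoints and external requests are no longer forwarded to it, which to some extent protects the correctness of the service still being provided. If the instance recovers on its own (after a transient network problem, for example), it is automatically added back to the Service endpoints and serves traffic again.
If a liveness probe is configured, it fails for the faulty Container as well, and the Container is killed. Following the configured restart policy, the kubelet restarts the Container on the node where it was running; a new Pod is created on another machine only when a controller replaces the Pod. Together, these mechanisms give the service high availability and automatic recovery.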
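Pod restart policy:
When a container in a pod exits (Exited state), the kubelet acts according to the restartPolicy set in the Pod spec. The available policies are:
Always: restart the container whenever it exits (the default).
OnFailure: restart the container only when it exits with a non-zero status.
Never: never restart the container.
Two points deserve emphasis:
The restart policy set on a pod applies to all containers in that pod.
Exited containers are restarted by the kubelet with an exponential back-off delay (10 seconds, 20 seconds, 40 seconds, ...) capped at five minutes. The delay is reset once the container has run successfully for ten minutes after a restart.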
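As a minimal sketch of where the restart policy is set (the Pod name restart-policy-demo and the failing command are hypothetical, chosen only for illustration), note that restartPolicy sits at the Pod spec level rather than on an individual container, which is why it applies to every container in the Pod:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: restart-policy-demo   # hypothetical name
spec:
  restartPolicy: OnFailure    # Pod-level field: Always (default), OnFailure, or Never
  containers:
  - name: task
    image: busybox
    # Exits non-zero after 5 seconds, so the kubelet restarts it with back-off
    command: ['sh', '-c', 'sleep 5; exit 1']
```

With a container that keeps failing like this one, the growing gaps between restarts should be visible in the Pod's events.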
liveness.yaml
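The container creates the file /tmp/healthy and deletes it 30 seconds later. The probe runs every 5 seconds and checks whether the file exists by executing cat /tmp/healthy. Once the file is gone, the container is considered unhealthy; after 3 consecutive probe failures (the default failureThreshold) the container is killed and recreated.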
apiVersion: v1
kind: Pod
metadata:
  labels:
    test: liveness
  name: liveness-exec-wfq
spec:
  containers:
  - name: liveness
    image: busybox
    args:
    - /bin/sh
    - -c
    - touch /tmp/healthy; sleep 30; rm -rf /tmp/healthy; sleep 600
    livenessProbe:
      exec:
        command:
        - cat
        - /tmp/healthy
      initialDelaySeconds: 5
      periodSeconds: 5
Example events recorded for the pod:
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 17m default-scheduler Successfully assigned liveness-exec-wfq to 10.0.1.16
Normal SuccessfulMountVolume 17m kubelet, 10.0.1.16 MountVolume.SetUp succeeded for volume "default-token-nqldz"
Normal Pulling 15m (x3 over 17m) kubelet, 10.0.1.16 pulling image "busybox"
Normal Pulled 15m (x3 over 17m) kubelet, 10.0.1.16 Successfully pulled image "busybox"
Normal Created 15m (x3 over 17m) kubelet, 10.0.1.16 Created container
Normal Started 15m (x3 over 17m) kubelet, 10.0.1.16 Started container
Warning Unhealthy 14m (x9 over 17m) kubelet, 10.0.1.16 Liveness probe failed: cat: can't open '/tmp/healthy': No such file or directory
Normal Killing 7m (x7 over 16m) kubelet, 10.0.1.16 Killing container with id docker://liveness:Container failed liveness probe.. Container will be killed and recreated.
Warning BackOff 2m (x27 over 10m) kubelet, 10.0.1.16 Back-off restarting failed container
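The liveness.yaml manifest relies on defaults for the probe fields it omits. As a sketch, here is the same probe with those defaults written out (the three extra values are the Kubernetes defaults; only the explicit spelling is added), which makes the "three failures" rule visible:

```yaml
livenessProbe:
  exec:
    command:
    - cat
    - /tmp/healthy
  initialDelaySeconds: 5   # wait 5s after the container starts before the first probe
  periodSeconds: 5         # probe every 5s
  timeoutSeconds: 1        # default: each probe must finish within 1s
  successThreshold: 1      # default: must be 1 for liveness probes
  failureThreshold: 3      # default: restart after 3 consecutive failures
```

With these values, the container should be killed on the order of 10 to 15 seconds after /tmp/healthy is deleted, once three consecutive probes have failed.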
liveness_http.yaml
apiVersion: v1
kind: Pod
metadata:
  labels:
    test: liveness
  name: liveness-http-wfq
spec:
  containers:
  - name: liveness
    image: googlecontainer/liveness
    args:
    - /server
    livenessProbe:
      httpGet:
        path: /healthz
        port: 8080
        httpHeaders:
        - name: X-Custom-Header
          value: Awesome
      initialDelaySeconds: 3
      periodSeconds: 3
readiness.yaml
apiVersion: v1
kind: Service
metadata:
  name: nginxsvc
  labels:
    app: nginx
spec:
  type: NodePort
  ports:
  - port: 80
    protocol: TCP
    name: http
  - port: 443
    protocol: TCP
    name: https
  selector:
    run: my-nginx
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-nginx
spec:
  selector:
    matchLabels:
      run: my-nginx
  replicas: 2
  strategy:
    rollingUpdate:
      maxSurge: 0
      maxUnavailable: 1
  template:
    metadata:
      labels:
        run: my-nginx
    spec:
      containers:
      - name: nginxhttps
        image: ymqytw/nginxhttps:1.5
        command: ["/home/auto-reload-nginx.sh"]
        ports:
        - containerPort: 443
        - containerPort: 80
        livenessProbe:
          httpGet:
            path: /index.html
            port: 80
          initialDelaySeconds: 30
          periodSeconds: 10
          successThreshold: 1
          failureThreshold: 3
          timeoutSeconds: 1
        readinessProbe:
          httpGet:
            path: /index.html
            port: 80
          initialDelaySeconds: 30
          periodSeconds: 10
          successThreshold: 1
          failureThreshold: 3
          timeoutSeconds: 1
Init containers
For example, an init container can first fetch the nginx configuration, and the nginx container then consumes that configuration.
init_container.yaml
apiVersion: v1
kind: Pod
metadata:
  name: myapp-pod
  labels:
    app: myapp
spec:
  containers:
  - name: myapp-container
    image: busybox
    command: ['sh', '-c', 'echo The app is running! && sleep 3600']
  initContainers:
  - name: init-myservice
    image: busybox
    command: ['sh', '-c', 'until nslookup myservice; do echo waiting for myservice; sleep 2; done;']
  - name: init-mydb
    image: busybox
    command: ['sh', '-c', 'until nslookup mydb; do echo waiting for mydb; sleep 2; done;']
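The manifest above only waits for two DNS names to resolve. For the scenario mentioned at the start of this section, where an init container fetches the nginx configuration and the nginx container consumes it, one possible sketch shares an emptyDir volume between the two containers. The Pod name, the volume name, and the URL http://config-server.example/default.conf are hypothetical placeholders, not part of the original setup:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: nginx-with-init        # hypothetical name
spec:
  initContainers:
  - name: fetch-config
    image: busybox
    # Download the config into the shared volume; the URL is a placeholder
    command: ['sh', '-c', 'wget -O /work-dir/default.conf http://config-server.example/default.conf']
    volumeMounts:
    - name: nginx-conf
      mountPath: /work-dir
  containers:
  - name: nginx
    image: nginx
    ports:
    - containerPort: 80
    volumeMounts:
    - name: nginx-conf         # same volume, mounted where nginx reads *.conf files
      mountPath: /etc/nginx/conf.d
  volumes:
  - name: nginx-conf
    emptyDir: {}
```

Because init containers run to completion before the app containers start, nginx only starts once the file is in place. If the download fails, the init container exits non-zero and is retried according to the Pod's restart policy.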