Kubernetes for R&D Engineers: Startup, Liveness, and Readiness Probes

The startup probe, liveness probe, and readiness probe each serve a different purpose and take effect in a different order.

Priority and purpose

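The startup probe lets the program in a Pod tell Kubernetes that its preparation work is done. Preparation here mainly means the preconditions for running the business logic, such as finishing the download of resource files or of an embedded database file. The liveness and readiness probes only start working after the startup probe has succeeded.
The liveness and readiness probes are independent of each other, so neither has priority over the other: once the startup probe reports Success, both begin checking at the same time. If either one fails, its own failure handling is carried out.
The liveness probe indicates whether the program is still alive. If the program is judged not alive, the kubelet kills the container, and the Pod's restartPolicy then decides whether the container is restarted.
The readiness probe indicates whether the program can serve requests. A program in a Pod usually serves traffic through a Service. If the readiness probe fails, the Pod is removed from the Service's endpoints, so traffic no longer reaches a Pod that cannot do its work; once the readiness probe succeeds again, the Pod is added back to the Service.
It may seem that the liveness and readiness probes are enough, so why have a startup probe at all? Because we often do not know how long the preparation work will take; limited network bandwidth, for example, can make a resource download very slow. Liveness or readiness probes that also have to tolerate this startup phase end up either inaccurate or slow to react. A startup probe absorbs the startup time, so the other probes can be kept sensitive.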
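As a rough illustration of how much time a startup probe grants, the sketch below computes an approximate upper bound from the probe settings. The helper name `startup_budget` is hypothetical, and the formula (initialDelaySeconds + failureThreshold × periodSeconds) is an approximation that ignores probe timeouts and kubelet timing jitter.

```shell
#!/bin/sh
# Approximate time (seconds) a container has to pass its startup probe:
# initialDelaySeconds + failureThreshold * periodSeconds.
# startup_budget is a hypothetical helper for illustration, not a kubectl feature.
startup_budget() {
  initial_delay=$1
  failure_threshold=$2
  period=$3
  echo $(( initial_delay + failure_threshold * period ))
}

# Settings of the startup probe used in the manifests in this article.
startup_budget 3 6 1     # prints 9
# A slow-starting program could be given a larger budget (assumed values).
startup_budget 0 30 10   # prints 300
```

Raising failureThreshold on the startup probe enlarges this budget without loosening the liveness or readiness settings.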

Startup and liveness probes

# startup_liveness.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: startup-liveness-deployment
spec:
  selector:
    matchLabels:
      app: startup-liveness
  template:
    metadata:
      labels:
        app: startup-liveness
    spec:
      containers:
      - name: startup-liveness-container
        image: busybox
        command: ["/bin/sh", "-c", "sleep 6; touch /tempdir/ready; sleep 3;touch /tempdir/keepalive; while true; do sleep 5;  done"]
        volumeMounts:
        - name:  probe-volume
          mountPath:  /tempdir
        startupProbe:
          exec:
            command:
            - cat
            - /tempdir/ready
          initialDelaySeconds: 3
          failureThreshold: 6
          periodSeconds: 1
          successThreshold: 1
        livenessProbe:
          exec:
            command:
            - cat
            - /tempdir/keepalive
          failureThreshold: 6
          periodSeconds: 1
          successThreshold: 1
      volumes:
      - name: probe-volume
        emptyDir: 
          medium: Memory
          sizeLimit: 1Gi

The logic of this manifest is shown in the figure below.
[Figure 1: flow of the startup and liveness probes in this manifest]
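Both probes in the manifest are exec probes: the command runs inside the container, and an exit status of 0 counts as success. The following is a minimal local imitation of that check, not the kubelet's implementation; `probe_file` is a hypothetical helper, and a temporary directory stands in for /tempdir.

```shell
#!/bin/sh
# Local imitation of an exec probe that runs `cat <file>`:
# exit status 0 is treated as Success, anything else as Failure.
# probe_file is a hypothetical helper for illustration.
probe_file() {
  if cat "$1" >/dev/null 2>&1; then
    echo Success
  else
    echo Failure
  fi
}

dir=$(mktemp -d)           # stands in for the /tempdir volume
probe_file "$dir/ready"    # prints Failure: the marker file does not exist yet
touch "$dir/ready"         # what the container does after its sleep
probe_file "$dir/ready"    # prints Success
rm -rf "$dir"
```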
Use the following command to see the events that occurred along the way:

kubectl describe pod 
Events:
  Type     Reason     Age               From               Message
  ----     ------     ----              ----               -------
  Normal   Scheduled  15s               default-scheduler  Successfully assigned default/startup-liveness-deployment-66f76576ff-9pnmj to ubuntub
  Normal   Pulling    15s               kubelet            Pulling image "busybox"
  Normal   Pulled     13s               kubelet            Successfully pulled image "busybox" in 2.603715682s (2.603722383s including waiting)
  Normal   Created    13s               kubelet            Created container startup-liveness-container
  Normal   Started    13s               kubelet            Started container startup-liveness-container
  Warning  Unhealthy  7s (x4 over 10s)  kubelet            Startup probe failed: cat: can't open '/tempdir/ready': No such file or directory
  Warning  Unhealthy  4s (x2 over 5s)   kubelet            Liveness probe failed: cat: can't open '/tempdir/keepalive': No such file or directory

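On its fourth check the startup probe still could not find /tempdir/ready, but the fifth check found it, and the liveness probe then took over. The gap between the first startup failure (10s ago) and the first liveness failure (5s ago) is 10s - 5s = 5s, which confirms that the startup probe ran about 4 to 5 times, since periodSeconds is 1 second.
The liveness probe still failed on its second check because /tempdir/keepalive had not been created yet. By the third check the marker file existed, so the probe succeeded and the container kept running without being restarted.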
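To make the timing easier to read, the event ages can be converted into seconds after the container started. This is only a sketch: `seconds_after_start` is a hypothetical helper, and the ages printed by kubectl are rounded, so the results are approximate.

```shell
#!/bin/sh
# Convert an event "Age" into seconds after the container started.
# seconds_after_start is a hypothetical helper for illustration.
seconds_after_start() {
  started_age=$1   # Age of the "Started" event
  event_age=$2     # Age of the event of interest
  echo $(( started_age - event_age ))
}

# Values read from the Events output above (Started: 13s ago).
echo "startup first failure:  $(seconds_after_start 13 10)s"   # 3s
echo "startup last failure:   $(seconds_after_start 13 7)s"    # 6s
echo "liveness first failure: $(seconds_after_start 13 5)s"    # 8s
echo "liveness last failure:  $(seconds_after_start 13 4)s"    # 9s
```

The startup failures fall at roughly 3s to 6s, which matches initialDelaySeconds: 3 and the `sleep 6` before /tempdir/ready is created. The liveness failures fall at roughly 8s and 9s, after the startup probe has succeeded and around the time /tempdir/keepalive is created at about 9s.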

Startup and readiness probes

# startup_readiness.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: startup-readiness-deployment
spec:
  selector:
    matchLabels:
      app: startup-readiness
  template:
    metadata:
      labels:
        app: startup-readiness
    spec:
      containers:
      - name: startup-readiness-container
        image: busybox
        command: ["/bin/sh", "-c", "sleep 6; touch /tempdir/ready; sleep 3;touch /tempdir/readiness; while true; do sleep 5; done"]
        volumeMounts:
        - name:  probe-volume
          mountPath:  /tempdir
        startupProbe:
          exec:
            command:
            - cat
            - /tempdir/ready
          initialDelaySeconds: 3
          failureThreshold: 6
          periodSeconds: 1
          successThreshold: 1
        readinessProbe:
          exec:
            command:
            - cat
            - /tempdir/readiness
          failureThreshold: 6
          periodSeconds: 1
          successThreshold: 1
      volumes:
      - name: probe-volume
        emptyDir: 
          medium: Memory
          sizeLimit: 1Gi

The flow is similar to the previous section.
[Figure 2: flow of the startup and readiness probes in this manifest]
The events are as follows:

Events:
  Type     Reason     Age              From               Message
  ----     ------     ----             ----               -------
  Normal   Scheduled  13s              default-scheduler  Successfully assigned default/startup-readiness-deployment-64cbcc9659-k7m5v to ubuntuc
  Normal   Pulling    13s              kubelet            Pulling image "busybox"
  Normal   Pulled     11s              kubelet            Successfully pulled image "busybox" in 2.10831058s (2.10831728s including waiting)
  Normal   Created    11s              kubelet            Created container startup-readiness-container
  Normal   Started    11s              kubelet            Started container startup-readiness-container
  Warning  Unhealthy  5s (x4 over 8s)  kubelet            Startup probe failed: cat: can't open '/tempdir/ready': No such file or directory
  Warning  Unhealthy  2s (x3 over 4s)  kubelet            Readiness probe failed: cat: can't open '/tempdir/readiness': No such file or directory

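This time the readiness probe only reported success on its fourth check.
The two experiments above show that the liveness probe and the readiness probe only start checking after the startup probe has reported success.
[Figure 3: liveness and readiness probes start after the startup probe succeeds]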
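The same conclusion can be checked against the event ages from both experiments. This is a small sketch with a hypothetical helper, `probes_waited_for_startup`; it relies only on the rounded ages shown in the two Events outputs.

```shell
#!/bin/sh
# Check that the first liveness/readiness failure happened after the last
# startup failure. A smaller Age means the event happened more recently.
# probes_waited_for_startup is a hypothetical helper for illustration.
probes_waited_for_startup() {
  last_startup_failure_age=$1
  first_other_failure_age=$2
  if [ "$first_other_failure_age" -lt "$last_startup_failure_age" ]; then
    echo yes
  else
    echo no
  fi
}

# Experiment 1: startup last failed 7s ago, liveness first failed 5s ago.
probes_waited_for_startup 7 5   # prints yes
# Experiment 2: startup last failed 5s ago, readiness first failed 4s ago.
probes_waited_for_startup 5 4   # prints yes
```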

Liveness and readiness probes

# liveness_readiness.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: liveness-readiness-deployment
spec:
  selector:
    matchLabels:
      app: liveness-readiness
  template:
    metadata:
      labels:
        app: liveness-readiness
    spec:
      containers:
      - name: liveness-readiness-container
        image: busybox
        command: ["/bin/sh", "-c", "sleep 3; touch /tempdir/keepalive; sleep 3;touch /tempdir/readiness; while true; do sleep 5; done"]
        volumeMounts:
        - name:  probe-volume
          mountPath:  /tempdir
        livenessProbe:
          exec:
            command:
            - cat
            - /tempdir/keepalive
          initialDelaySeconds: 3
          failureThreshold: 6
          periodSeconds: 1
          successThreshold: 1
        readinessProbe:
          exec:
            command:
            - cat
            - /tempdir/readiness
          failureThreshold: 6
          periodSeconds: 1
          successThreshold: 1
      volumes:
      - name: probe-volume
        emptyDir: 
          medium: Memory
          sizeLimit: 1Gi

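The Pod's events show that the liveness and readiness probes have been running for the same length of time (both first failed 6 seconds ago in the output below), which means they started checking at about the same time.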

Events:
  Type     Reason     Age              From               Message
  ----     ------     ----             ----               -------
  Normal   Scheduled  10s              default-scheduler  Successfully assigned default/liveness-readiness-deployment-f6db88747-znxsm to ubuntub
  Normal   Pulling    10s              kubelet            Pulling image "busybox"
  Normal   Pulled     8s               kubelet            Successfully pulled image "busybox" in 2.092699902s (2.092706902s including waiting)
  Normal   Created    8s               kubelet            Created container liveness-readiness-container
  Normal   Started    8s               kubelet            Started container liveness-readiness-container
  Warning  Unhealthy  5s (x2 over 6s)  kubelet            Liveness probe failed: cat: can't open '/tempdir/keepalive': No such file or directory
  Warning  Unhealthy  4s (x4 over 6s)  kubelet            Readiness probe failed: cat: can't open '/tempdir/readiness': No such file or directory

Probe flow

[Figure 4: overall probe flow]
