Kubernetes -- Pod Health Checks

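Contents

I. Basic Concepts of Pod Probes
    1. Pod status
    2. Determining Pod status more accurately
    3. Container probes
    4. Probe results

II. Using Liveness Probes
    1. Liveness probe example
    2. Liveness probe workflow
    3. Inspecting liveness probe details
    4. Advanced probe configuration
    5. Advanced probe configuration
    6. Liveness probe - HTTP
    7. Liveness probe - TCP

III. Using Readiness Probes
    1. Readiness probes
    2. Liveness probes versus readiness probes
    3. Creating an HTTP service
    4. Checking the Endpoint status
        1) Check the Service status; the endpoints are as follows:
        2) The Pod status is as follows:
        3) Now enter the first container and delete its index.html file
    5. Checking the state after the failure
        1) Check the Service status
        2) Recover the failed Pod
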

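I. Basic Concepts of Pod Probes

1. Pod status

1) A Pod's status information is defined in PodStatus. Its phase field holds the familiar values Pending, Running, Succeeded, Failed, and Unknown.

[Figure 1: Pod phase values]

2) In which of these states can a Pod actually serve requests?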

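2. Determining Pod status more accurately

Kubernetes uses its probe (Probes) mechanism to judge Pod status more accurately. A probe periodically checks the state of a running container and returns a result.

        1) Liveness probe.

                A liveness probe detects whether the container is still alive.

                If the probe fails, the kubelet kills the container and then tries to recover it according to the restart policy.

        2) Readiness probe.

                If the readiness probe fails, the kubelet considers the container not ready to serve requests,

                and the endpoints controller removes the Pod's address from the endpoints of every Service that matches the Pod.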

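3. Container probes

The kubelet periodically runs diagnostics against a container.

To run a diagnostic, the kubelet calls a Handler implemented by the container. There are three Handler types:

        1) ExecAction: runs the specified command inside the container.

            If the command exits with return code 0 (meaning it ran successfully), the diagnostic is considered successful.

        2) TCPSocketAction: performs a TCP check against the container's IP address on the specified port.

            If the port is open, the diagnostic is considered successful.

        3) HTTPGetAction: sends an HTTP GET request to the container's IP address on the specified port and path.

            If the response status code is ≥ 200 and < 400, the diagnostic is considered successful.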
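As a rough illustration of how the three Handler types map onto manifest fields, the sketch below writes a Pod manifest to a local file. The Pod name probe-handlers-demo, the container name web, the httpd image, port 80, and the /index.html path are assumptions chosen for illustration only; they are not taken from the original examples.

```shell
# Write an illustrative manifest to a local file (nothing is sent to a cluster here).
cat > probe-handlers-demo.yaml <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: probe-handlers-demo        # hypothetical name
spec:
  containers:
  - name: web                      # hypothetical container name
    image: httpd                   # assumed image that listens on port 80
    livenessProbe:
      tcpSocket:                   # TCPSocketAction: succeeds if the port accepts a connection
        port: 80
    readinessProbe:
      httpGet:                     # HTTPGetAction: succeeds on a status code >= 200 and < 400
        path: /index.html
        port: 80
    # ExecAction alternative (would replace tcpSocket or httpGet above):
    # livenessProbe:
    #   exec:
    #     command: ["cat", "/usr/local/apache2/htdocs/index.html"]
EOF

# To try it on a cluster (requires kubectl access):
# kubectl apply -f probe-handlers-demo.yaml
```

Each probe takes exactly one handler, so the exec variant is shown only as a commented alternative.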

4. Probe results

[Figure 2: probe results. Each probe returns one of three results: Success, Failure, or Unknown.]

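II. Using Liveness Probes

1. Liveness probe example

1) This example uses a liveness probe in ExecAction mode.

2) The livenessProbe field defines the liveness probe in detail:

        - The Handler is exec

        - It works by running the command cat /tmp/healthy

        - The initial delay and the probe period are both 5 seconds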

$ kubectl apply -f- <

Reference: Configure Liveness, Readiness and Startup Probes | Kubernetes
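The apply command above is truncated in the source, so the original manifest is not available. A manifest consistent with the description above and with the describe output shown later (Pod liveness-exec, label test=liveness, image busybox, probe cat /tmp/healthy with a 5-second delay and period) might look like the following sketch. The file name liveness-exec.yaml is an assumption.

```shell
# Write the manifest to a local file so it can be reviewed before applying.
cat > liveness-exec.yaml <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: liveness-exec
  labels:
    test: liveness
spec:
  containers:
  - name: liveness
    image: busybox
    args:
    - /bin/sh
    - -c
    # The file exists for the first 30 seconds, then disappears, so the probe starts failing.
    - touch /tmp/healthy; sleep 30; rm -rf /tmp/healthy; sleep 600
    livenessProbe:
      exec:
        command:
        - cat
        - /tmp/healthy
      initialDelaySeconds: 5   # wait 5 seconds before the first probe
      periodSeconds: 5         # probe every 5 seconds
EOF

# To create the Pod (requires kubectl access to a cluster):
# kubectl apply -f liveness-exec.yaml
```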

2. Liveness probe workflow

[Figure 3: liveness probe workflow]

$ kubectl get pods liveness-exec     # check that the Pod exists
NAME            READY   STATUS    RESTARTS     AGE
liveness-exec   1/1     Running   1 (5s ago)   80s

3. Inspecting liveness probe details

Use the describe command to view the Pod's details.

$ kubectl describe pods liveness-exec # show the current probe settings
Name:         liveness-exec
Namespace:    default
Priority:     0
Node:         k8s-worker1/192.168.147.103
Start Time:   Fri, 23 Sep 2022 02:14:50 +0000
Labels:       test=liveness
Annotations:  cni.projectcalico.org/containerID: 48099a7a7855d118ee67216855ffd4caf24ee712a8082556717b8b2bfe971081
              cni.projectcalico.org/podIP: 172.16.194.125/32
              cni.projectcalico.org/podIPs: 172.16.194.125/32
Status:       Running
IP:           172.16.194.125
IPs:
  IP:  172.16.194.125
Containers:
  liveness:
    Container ID:  docker://36377afbec966d304f1e9b0ef4eaac51f3c582fe1aaf7c60698315ca46395d89
    Image:         busybox
    Image ID:      docker-pullable://busybox@sha256:5acba83a746c7608ed544dc1533b87c737a0b0fb730301639a0179f9344b1678
    Port:          
    Host Port:     
    Args:
      /bin/sh
      -c
      touch /tmp/healthy; sleep 30; rm -rf /tmp/healthy; sleep 600
    State:          Running
      Started:      Fri, 23 Sep 2022 02:14:51 +0000
    Ready:          True
    Restart Count:  0
    Liveness:       exec [cat /tmp/healthy] delay=5s timeout=1s period=5s #success=1 #failure=3
    Environment:    
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-lrgdh (ro)
Conditions:
  Type              Status
  Initialized       True 
  Ready             True 
  ContainersReady   True 
  PodScheduled      True 
Volumes:
  kube-api-access-lrgdh:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       
    DownwardAPI:             true
QoS Class:                   BestEffort
Node-Selectors:              
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type    Reason     Age   From               Message
  ----    ------     ----  ----               -------
  Normal  Scheduled  24s   default-scheduler  Successfully assigned default/liveness-exec to k8s-worker1
  Normal  Pulling    24s   kubelet            Pulling image "busybox"
  Normal  Pulled     23s   kubelet            Successfully pulled image "busybox" in 793.091641ms
  Normal  Created    23s   kubelet            Created container liveness
  Normal  Started    23s   kubelet            Started container liveness
$ kubectl describe pods liveness-exec | grep -i 'liveness:.*exec' # filter for the probe settings
    Liveness:       exec [cat /tmp/healthy] delay=5s timeout=1s period=5s #success=1 #failure=3

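4. Advanced probe configuration

1) The describe output in the previous step shows several of the probe's settings.

2) delay=5s        the first probe runs 5 seconds after the container starts

3) timeout=1s     the container must respond to the probe within 1 second, otherwise the probe counts as failed

4) period=5s       the probe runs every 5 seconds

5) #success=1    one successful probe is enough for the probe to be considered successful

6) #failure=3       after 3 consecutive failed probes the container is considered failed and is restarted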
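To make the interaction of these settings concrete, the small calculation below estimates how long a container must keep failing before a restart is triggered. It is a rough approximation only: it ignores the probe timeout and the time the container needs to terminate, and the variable names are illustrative.

```shell
# Values copied from the describe output: period=5s, #failure=3.
period_seconds=5
failure_threshold=3

# Approximate span of consecutive failures needed before a restart is triggered.
restart_window=$(( period_seconds * failure_threshold ))
echo "restart after roughly ${restart_window}s of consecutive failures"
```

With the values above the estimate is 15 seconds; raising either the period or the failure threshold makes the probe more tolerant of short glitches at the cost of slower recovery.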

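5. Advanced probe configuration

These parameters can be set explicitly when the probe is defined. The following is a sample configuration.

It is functionally the same as the probe configured earlier.

[Figure 4: sample manifest with the advanced probe parameters]

Create: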

kubectl apply -f- <
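The command above is truncated and the sample manifest was an image, so neither is available. Based on the query output for this Pod (name liveness-exec3, container liveness, delay=5s timeout=3s period=5s #success=1 #failure=3), a manifest with every parameter spelled out might look like the sketch below. The image and container arguments are assumed to match the earlier liveness-exec example, and the file name is an assumption.

```shell
# Write the manifest to a local file so it can be reviewed before applying.
cat > liveness-exec3.yaml <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: liveness-exec3
  labels:
    test: liveness
spec:
  containers:
  - name: liveness
    image: busybox             # assumed: same image as the earlier example
    args:                      # assumed: same arguments as the earlier example
    - /bin/sh
    - -c
    - touch /tmp/healthy; sleep 30; rm -rf /tmp/healthy; sleep 600
    livenessProbe:
      exec:
        command:
        - cat
        - /tmp/healthy
      initialDelaySeconds: 5   # delay=5s
      timeoutSeconds: 3        # timeout=3s
      periodSeconds: 5         # period=5s
      successThreshold: 1      # #success=1
      failureThreshold: 3      # #failure=3
EOF

# To create the Pod (requires kubectl access to a cluster):
# kubectl apply -f liveness-exec3.yaml
```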

Query:

$ kubectl describe pods liveness-exec3 | grep -i 'liveness:.*'  # show the probe settings
  liveness:
    Liveness:       exec [cat /tmp/healthy] delay=5s timeout=3s period=5s #success=1 #failure=3

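6. Liveness probe - HTTP

1) An HTTP liveness probe periodically sends an HTTP GET request to the container.

    The probe definition specifies the request path, port, request headers, and so on.

2) The probe succeeds only when the response status code is ≥ 200 and < 400. In this example the probe starts failing after 10 seconds,

    and the kubelet then restarts the container.

3) Create a Pod with an HTTP liveness probe (one that is going to fail)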

$ kubectl apply -f- <
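The command above is truncated in the source. The describe output that follows shows Pod liveness-http with label test=liveness, image mirrorgooglecontainers/liveness, argument /server, and an http-get probe on :8080/healthz with a 3-second delay and period, so the manifest was probably close to the sketch below. The httpHeaders entry is an illustrative assumption (the text mentions request headers but does not show one), as is the file name.

```shell
# Write the manifest to a local file so it can be reviewed before applying.
cat > liveness-http.yaml <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: liveness-http
  labels:
    test: liveness
spec:
  containers:
  - name: liveness
    image: mirrorgooglecontainers/liveness
    args:
    - /server
    livenessProbe:
      httpGet:
        path: /healthz
        port: 8080
        httpHeaders:             # assumed example header, not confirmed by the source
        - name: Custom-Header
          value: Awesome
      initialDelaySeconds: 3     # delay=3s
      periodSeconds: 3           # period=3s
EOF

# To create the Pod (requires kubectl access to a cluster):
# kubectl apply -f liveness-http.yaml
```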

4) Query

$ kubectl get pods liveness-http 
NAME            READY   STATUS    RESTARTS   AGE
liveness-http   1/1     Running   0          19s
$ kubectl describe pods liveness-http     # view the Pod details
Name:         liveness-http
Namespace:    default
Priority:     0
Node:         k8s-worker1/192.168.147.103
Start Time:   Fri, 23 Sep 2022 02:50:33 +0000
Labels:       test=liveness
Annotations:  cni.projectcalico.org/containerID: 02fe605f301c9c1c1c5e704c2864efbf567f24a4b2239fdda0fddb608ef8122f
              cni.projectcalico.org/podIP: 172.16.194.127/32
              cni.projectcalico.org/podIPs: 172.16.194.127/32
Status:       Running
IP:           172.16.194.127
IPs:
  IP:  172.16.194.127
Containers:
  liveness:
    Container ID:  docker://ec0d01ecaeed511e3e108f037ec539006c4983e81085c1cf51178cc081bb06bd
    Image:         mirrorgooglecontainers/liveness
    Image ID:      docker-pullable://mirrorgooglecontainers/liveness@sha256:854458862be990608ad916980f9d3c552ac978ff70ceb0f90508858ec8fc4a62
    Port:          
    Host Port:     
    Args:
      /server
    State:          Running
      Started:      Fri, 23 Sep 2022 02:52:20 +0000
    Last State:     Terminated
      Reason:       Error
      Exit Code:    2
      Started:      Fri, 23 Sep 2022 02:51:46 +0000
      Finished:     Fri, 23 Sep 2022 02:52:03 +0000
    Ready:          True
    Restart Count:  3
    Liveness:       http-get http://:8080/healthz delay=3s timeout=1s period=3s #success=1 #failure=3   # probe type (http-get), port, probed path, and parameters
    Environment:    
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-mwpj8 (ro)
Conditions:
  Type              Status
  Initialized       True 
  Ready             True 
  ContainersReady   True 
  PodScheduled      True 
Volumes:
  kube-api-access-mwpj8:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       
    DownwardAPI:             true
QoS Class:                   BestEffort
Node-Selectors:              
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason     Age                 From               Message
  ----     ------     ----                ----               -------
  Normal   Scheduled  2m7s                default-scheduler  Successfully assigned default/liveness-http to k8s-worker1    # the node the Pod was scheduled to
  Normal   Pulled     111s                kubelet            Successfully pulled image "mirrorgooglecontainers/liveness" in 14.77337733s
  Normal   Pulled     73s                 kubelet            Successfully pulled image "mirrorgooglecontainers/liveness" in 21.122850752s
  Normal   Created    54s (x3 over 111s)  kubelet            Created container liveness
  Normal   Started    54s (x3 over 111s)  kubelet            Started container liveness
  Normal   Pulled     54s                 kubelet            Successfully pulled image "mirrorgooglecontainers/liveness" in 747.964898ms
  Warning  Unhealthy  37s (x9 over 100s)  kubelet            Liveness probe failed: HTTP probe failed with statuscode: 500  # failed: a healthy status code is >= 200 and < 400
  Normal   Killing    37s (x3 over 94s)   kubelet            Container liveness failed liveness probe, will be restarted    # restarted because the container is unhealthy
  Normal   Pulling    37s (x4 over 2m6s)  kubelet            Pulling image "mirrorgooglecontainers/liveness"

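7. Liveness probe - TCP

1) A TCP probe checks whether a connection can be established. This exercise deploys a telnet service, and the probe checks port 23.

2) The TCP probe's parameters are similar to those of the HTTP probe.

3) Create the TCP probe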

$ kubectl apply -f- <
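The command above is truncated in the source, and the image used for the telnet service is not shown anywhere in the output. The sketch below therefore only illustrates the shape of a TCP liveness probe on port 23 for a Pod named ubuntu. The image name is a hypothetical placeholder that must be replaced with an image that actually runs a telnet server, and the timing values and file name are assumptions as well.

```shell
# Write the manifest to a local file so it can be reviewed before applying.
cat > liveness-tcp.yaml <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: ubuntu
spec:
  containers:
  - name: ubuntu
    image: registry.example.com/ubuntu-telnetd:20.04   # hypothetical placeholder image
    ports:
    - containerPort: 23
    livenessProbe:
      tcpSocket:
        port: 23                 # succeeds if a TCP connection to port 23 can be opened
      initialDelaySeconds: 5     # assumed value
      periodSeconds: 10          # assumed value
EOF

# To create the Pod (requires kubectl access to a cluster and a real image):
# kubectl apply -f liveness-tcp.yaml
```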

4) Query

$ kubectl get pods ubuntu # check the status
NAME     READY   STATUS    RESTARTS   AGE
ubuntu   1/1     Running   0          60s

$ kubectl get pods -o wide ubuntu     # look up the Pod's IP
NAME     READY   STATUS    RESTARTS   AGE     IP              NODE          NOMINATED NODE   READINESS GATES
ubuntu   1/1     Running   0          2m13s   172.16.194.65   k8s-worker1              

5) Test

$ telnet 172.16.194.65
Trying 172.16.194.65...
Connected to 172.16.194.65.
Escape character is '^]'.
Ubuntu 20.04.3 LTS
ubuntu login: 

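III. Using Readiness Probes

1. Readiness probes

[Figure 5: readiness probe example]

In the manifest, the main difference between a liveness probe and a readiness probe is the keyword (livenessProbe versus readinessProbe).

1) A Pod being alive does not mean it can serve requests. After it is created,

    it usually still has to go through steps such as preparing data and installing and starting programs before it can serve requests.

2) A liveness probe indicates whether the Pod is alive,

    while a readiness probe indicates whether the container is fully prepared and can serve requests.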

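2. Liveness probes versus readiness probes

1) Like liveness probes, readiness probes

    can use any of the three methods ExecAction, TCPSocketAction, and HTTPGetAction.

2) A readiness probe detects and reports whether the Pod is ready to serve traffic.

    In real deployments, the readiness probe needs to be tied to the application itself.

                       Readiness probe                                    Liveness probe
When the check fails   Waits                                              Kills the container and restarts it according to the restart policy
Service                On failure, the Pod is removed from the endpoints  Endpoints are updated automatically once the restarted Pod is ready
Purpose                Whether the Pod is ready to serve                  Whether the Pod is alive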

3. Creating an HTTP service

Create an httpd Deployment and Service,

and add a readiness probe that checks whether the index.html file exists.

$ kubectl apply -f- <
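The command above is truncated in the source. The names used later (Deployment httpd-deployment with 3 replicas, label app=httpd, Service httpd-svc on port 8080 with endpoints on port 80, and the path /usr/local/apache2/htdocs/index.html) suggest something like the sketch below. The httpd image, the httpGet form of the readiness probe, the timing values, and the file name are assumptions; an exec probe that tests for the file would serve the same purpose.

```shell
# Write the Deployment and Service to a local file so they can be reviewed before applying.
cat > httpd-readiness.yaml <<'EOF'
apiVersion: apps/v1
kind: Deployment
metadata:
  name: httpd-deployment
spec:
  replicas: 3
  selector:
    matchLabels:
      app: httpd
  template:
    metadata:
      labels:
        app: httpd
    spec:
      containers:
      - name: httpd
        image: httpd               # assumed image
        ports:
        - containerPort: 80
        readinessProbe:
          httpGet:                 # assumed handler; fails once index.html is deleted
            path: /index.html
            port: 80
          initialDelaySeconds: 5   # assumed value
          periodSeconds: 5         # assumed value
---
apiVersion: v1
kind: Service
metadata:
  name: httpd-svc
spec:
  selector:
    app: httpd
  ports:
  - protocol: TCP
    port: 8080
    targetPort: 80
EOF

# To create both objects (requires kubectl access to a cluster):
# kubectl apply -f httpd-readiness.yaml
```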

4. Checking the Endpoint status

1) Check the Service status; the endpoints are as follows:

$ kubectl get deployments.apps httpd-deployment     # view the Deployment
NAME               READY   UP-TO-DATE   AVAILABLE   AGE
httpd-deployment   3/3     3            3           3m51s
$ kubectl get service httpd-svc     # view the Service
NAME        TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)    AGE
httpd-svc   ClusterIP   10.98.114.43           8080/TCP   4m8s
$ kubectl get endpoints | awk '(NR==1){print $0} /httpd-svc/{print $0}' # view the endpoints
NAME         ENDPOINTS                                           AGE
httpd-svc    172.16.126.6:80,172.16.194.67:80,172.16.194.68:80   8m56s

2) The Pod status is as follows:

$ kubectl get pod -l app=httpd -o wide     # list the Pods by label, with details
NAME                                READY   STATUS    RESTARTS   AGE   IP              NODE          NOMINATED NODE   READINESS GATES
httpd-deployment-564dc969bb-k4sbr   1/1     Running   0          12m   172.16.194.68   k8s-worker1              
httpd-deployment-564dc969bb-qhdd6   1/1     Running   0          12m   172.16.126.6    k8s-worker2              
httpd-deployment-564dc969bb-wxqt7   1/1     Running   0          12m   172.16.194.67   k8s-worker1              

3) Now enter the first container and delete its index.html file

$ kubectl exec -it httpd-deployment-564dc969bb-k4sbr -- /bin/sh    # open a shell in the Pod
# rm /usr/local/apache2/htdocs/index.html    # delete the index file

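5. Checking the state after the failure

1) Check the Service status

The Pod status is shown below. One Pod is no longer Ready (0/1), and its address has been removed from the Service's endpoints.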

$ kubectl get pods -l app=httpd -o wide
NAME                                READY   STATUS    RESTARTS   AGE   IP              NODE          NOMINATED NODE   READINESS GATES
httpd-deployment-564dc969bb-k4sbr   0/1     Running   0          22m   172.16.194.68   k8s-worker1              
httpd-deployment-564dc969bb-qhdd6   1/1     Running   0          22m   172.16.126.6    k8s-worker2              
httpd-deployment-564dc969bb-wxqt7   1/1     Running   0          22m   172.16.194.67   k8s-worker1   

2) Recover the failed Pod

$ kubectl delete pods httpd-deployment-564dc969bb-k4sbr    # after the Pod is deleted, the Deployment creates a replacement
pod "httpd-deployment-564dc969bb-k4sbr" deleted

$ kubectl get pods -l app=httpd    # check again: everything is back to normal
NAME                                READY   STATUS    RESTARTS   AGE
httpd-deployment-564dc969bb-fwvq8   1/1     Running   0          32s
httpd-deployment-564dc969bb-qhdd6   1/1     Running   0          28m
httpd-deployment-564dc969bb-wxqt7   1/1     Running   0          28m
$ kubectl get endpoints httpd-svc     # the endpoints have recovered as well
NAME        ENDPOINTS                                           AGE
httpd-svc   172.16.126.6:80,172.16.194.67:80,172.16.194.69:80   31m
