k8s Probes

In Kubernetes, you can configure health checks via probes to determine the status of each Pod.

1. Overview

In k8s, once a Pod is scheduled to a node, the kubelet runs the Pod's containers. If one or all of the Pod's containers terminate (the container's main process crashes), the kubelet restarts them. So even if the application itself does nothing special, running in Kubernetes gives it a degree of automatic self-healing.

Automatically restarting containers to keep applications running is one advantage of using Kubernetes. However, in some cases an application can malfunction even though its process has not crashed. By default, Kubernetes only checks whether a Pod's containers are running, but a running container does not necessarily mean a healthy application.

  • Kubernetes can use a liveness probe to check whether a container is still working;
  • a readiness probe ensures that only Pods that are ready to serve requests receive client traffic.

2. Probe Types

K8S provides three kinds of probes:

  • readinessProbe
Indicates whether the container is ready to serve requests (i.e., whether it has finished starting and is ready). Before the initial delay, the readiness state defaults to Failure; once the container has started and a probe succeeds, the state changes to Success. If no readiness probe is configured, the default state is Success.
Only a Pod whose state is Success is included in its Service and added to the Endpoints, meaning the Service may then forward incoming requests to it.
If the readiness probe fails, the Pod's status is updated, and the Endpoint Controller removes the endpoint of the Pod containing that container from the Service's Endpoints.
  • livenessProbe
Determines whether the container is alive (in the Running state). If the liveness probe finds the container unhealthy (you can configure how many consecutive failures count as unhealthy), the kubelet kills the container and acts according to the container's restart policy restartPolicy. If no liveness probe is configured, the default state is Success, i.e., the probe always returns Success.
  • startupProbe
Determines whether the application inside the container has started. If a startup probe is configured, all other probes are disabled until the startup probe reports Success; only after it succeeds do the other probes take effect. If the startup probe fails, the kubelet kills the container, which is then subject to its restart policy. If no startup probe is configured, the default state is Success.
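For a slow-starting application, a startup probe gives the container time to come up before liveness checks begin. A minimal sketch (the /healthz path and the 30-failure budget are illustrative assumptions, not values taken from the examples below):

```yaml
startupProbe:
  httpGet:
    path: /healthz        # hypothetical health endpoint
    port: 80
  failureThreshold: 30    # allow up to 30 * 10 s = 300 s for startup
  periodSeconds: 10
```

While the startup probe is running, the liveness and readiness probes are suspended, so a slow boot does not trigger a restart loop.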

The container restart policy restartPolicy has three possible values:
Always: always restart the container after it terminates; this is the default.
OnFailure: restart the container only when it exits abnormally (non-zero exit code).
Never: never restart the container after it terminates.

All three probes share the following parameters:

initialDelaySeconds: seconds to wait after the container starts before the probe is initiated. Default: 0.
periodSeconds: how often the probe runs. Default: 10.
timeoutSeconds: probe timeout in seconds. Default: 1.
successThreshold: minimum number of consecutive successes for the probe to be considered successful after having failed. Default: 1.
failureThreshold: number of consecutive failures before the probe is marked as failed. For a liveness probe this restarts the container; for a readiness probe it marks the Pod unready. Default: 3.
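Putting these parameters together, a fully specified probe against the nginx image used later might look like this (the timing values are illustrative, not recommendations):

```yaml
livenessProbe:
  httpGet:
    path: /
    port: 80
  initialDelaySeconds: 10   # wait 10 s after the container starts
  periodSeconds: 5          # probe every 5 s
  timeoutSeconds: 2         # a single probe fails after 2 s with no response
  successThreshold: 1       # one success marks the probe healthy again
  failureThreshold: 3       # three consecutive failures restart the container
```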

3. Probe Mechanisms

Each probe type supports three health-check methods: exec (run a command), httpGet, and tcpSocket. exec is the most general and fits most scenarios; tcpSocket suits plain TCP services; httpGet suits web services.

exec (custom health check): runs the specified command inside the container; if the command executes successfully with exit code 0, the probe succeeds.
httpGet: performs an HTTP GET against the container's IP address, port, and path; if the response status code is >= 200 and < 400, the container is considered healthy.
tcpSocket: attempts a TCP connection to the container's IP address and port; if the connection can be established, the container is considered healthy.
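An exec probe is useful when health cannot be expressed as an HTTP or TCP check. A common sketch is checking for a marker file that the application is assumed to create and maintain (the /tmp/healthy path is a hypothetical example):

```yaml
livenessProbe:
  exec:
    command:            # probe succeeds while `cat` exits with code 0
    - cat
    - /tmp/healthy      # hypothetical marker file written by the app
  initialDelaySeconds: 5
  periodSeconds: 5
```

If the application deletes the file (or stops refreshing it), `cat` exits non-zero, the probe fails, and the kubelet restarts the container after failureThreshold consecutive failures.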

A probe produces one of the following results:

  • Success: the check passed.
  • Failure: the check failed.
  • Unknown: the check could not be carried out.

4. Differences Between Probes

readinessProbe and livenessProbe can use the same probing methods; they differ only in how the Pod is handled when a probe fails.

When a livenessProbe fails, the container is killed, and a corresponding action is taken according to the Pod's restart policy.

When a readinessProbe fails, the Pod's IP:Port is removed from the corresponding Endpoint list, and it no longer receives traffic.

5. Examples

Start a Pod that uses both a livenessProbe and a readinessProbe:

apiVersion: v1
kind: Pod
metadata:
  name: nginx
  namespace: test
spec:
  containers:
    - name: nginx
      image: nginx:latest
      livenessProbe:
        httpGet:
          path: /
          port: 80
        initialDelaySeconds: 30
        periodSeconds: 10
      readinessProbe:
        tcpSocket:
          port: 80
        initialDelaySeconds: 5
        periodSeconds: 5

kubectl get po -n test

[root@master ~]# kubectl get po  -n test  -o wide 
NAME    READY   STATUS    RESTARTS   AGE   IP             NODE    NOMINATED NODE   READINESS GATES
nginx   1/1     Running   0          19s   10.233.90.14   node1              


#### nginx log: a probe request arrives every 10 s
[root@master ~]# kubectl logs nginx -n test  
/docker-entrypoint.sh: /docker-entrypoint.d/ is not empty, will attempt to perform configuration
/docker-entrypoint.sh: Looking for shell scripts in /docker-entrypoint.d/
/docker-entrypoint.sh: Launching /docker-entrypoint.d/10-listen-on-ipv6-by-default.sh
10-listen-on-ipv6-by-default.sh: info: Getting the checksum of /etc/nginx/conf.d/default.conf
10-listen-on-ipv6-by-default.sh: info: Enabled listen on IPv6 in /etc/nginx/conf.d/default.conf
/docker-entrypoint.sh: Launching /docker-entrypoint.d/20-envsubst-on-templates.sh
/docker-entrypoint.sh: Launching /docker-entrypoint.d/30-tune-worker-processes.sh
/docker-entrypoint.sh: Configuration complete; ready for start up
2023/04/03 06:30:29 [notice] 1#1: using the "epoll" event method
2023/04/03 06:30:29 [notice] 1#1: nginx/1.23.3
2023/04/03 06:30:29 [notice] 1#1: built by gcc 10.2.1 20210110 (Debian 10.2.1-6) 
2023/04/03 06:30:29 [notice] 1#1: OS: Linux 3.10.0-1127.el7.x86_64
2023/04/03 06:30:29 [notice] 1#1: getrlimit(RLIMIT_NOFILE): 1048576:1048576
2023/04/03 06:30:29 [notice] 1#1: start worker processes
2023/04/03 06:30:29 [notice] 1#1: start worker process 30
2023/04/03 06:30:29 [notice] 1#1: start worker process 31
2023/04/03 06:30:29 [notice] 1#1: start worker process 32
2023/04/03 06:30:29 [notice] 1#1: start worker process 33
192.168.5.227 - - [03/Apr/2023:06:31:08 +0000] "GET / HTTP/1.1" 200 615 "-" "kube-probe/1.25" "-"
192.168.5.227 - - [03/Apr/2023:06:31:18 +0000] "GET / HTTP/1.1" 200 615 "-" "kube-probe/1.25" "-"
192.168.5.227 - - [03/Apr/2023:06:31:28 +0000] "GET / HTTP/1.1" 200 615 "-" "kube-probe/1.25" "-"
192.168.5.227 - - [03/Apr/2023:06:31:38 +0000] "GET / HTTP/1.1" 200 615 "-" "kube-probe/1.25" "-"

Testing the failure case:

apiVersion: v1
kind: Pod
metadata:
  name: nginx
  namespace: test
spec:
  nodeName: node1
  restartPolicy:  Always
  containers:
  - name: count
    image: nginx:latest
    imagePullPolicy: IfNotPresent
    livenessProbe:
      httpGet:
        port: 81    #### port changed to 81
      initialDelaySeconds: 30
      periodSeconds: 10
    readinessProbe:
      tcpSocket:
        port: 80
      initialDelaySeconds: 5
      periodSeconds: 5

After the Pod starts, its log shows the following:

2023/04/03 06:13:44 [notice] 32#32: gracefully shutting down
2023/04/03 06:13:44 [notice] 33#33: gracefully shutting down
2023/04/03 06:13:44 [notice] 32#32: exiting
2023/04/03 06:13:44 [notice] 33#33: exiting
2023/04/03 06:13:44 [notice] 33#33: exit
2023/04/03 06:13:44 [notice] 32#32: exit
2023/04/03 06:13:44 [notice] 31#31: gracefully shutting down
2023/04/03 06:13:44 [notice] 31#31: exiting
2023/04/03 06:13:44 [notice] 31#31: exit
2023/04/03 06:13:45 [notice] 30#30: gracefully shutting down
2023/04/03 06:13:45 [notice] 30#30: exiting
2023/04/03 06:13:45 [notice] 30#30: exit
2023/04/03 06:13:45 [notice] 1#1: signal 17 (SIGCHLD) received from 33
2023/04/03 06:13:45 [notice] 1#1: worker process 32 exited with code 0
2023/04/03 06:13:45 [notice] 1#1: worker process 33 exited with code 0
2023/04/03 06:13:45 [notice] 1#1: signal 29 (SIGIO) received
2023/04/03 06:13:45 [notice] 1#1: signal 17 (SIGCHLD) received from 32
2023/04/03 06:13:45 [notice] 1#1: signal 17 (SIGCHLD) received from 31
2023/04/03 06:13:45 [notice] 1#1: worker process 31 exited with code 0
2023/04/03 06:13:45 [notice] 1#1: signal 29 (SIGIO) received
2023/04/03 06:13:45 [notice] 1#1: signal 17 (SIGCHLD) received from 30
2023/04/03 06:13:45 [notice] 1#1: worker process 30 exited with code 0
2023/04/03 06:13:45 [notice] 1#1: exit

Check the Pod with kubectl get po -n test:

[root@master ~]# kubectl get po  -n test -o wide 
NAME    READY   STATUS             RESTARTS      AGE     IP             NODE    NOMINATED NODE   READINESS GATES
nginx   0/1     CrashLoopBackOff   5 (27s ago)   6m28s   10.233.90.12   node1              

[root@master ~]# kubectl describe po nginx -n test    #### describe shows the Pod details below, including "Liveness probe failed"

Name:             nginx
Namespace:        test
Priority:         0
Service Account:  default
Node:             node1/192.168.5.227
Start Time:       Mon, 03 Apr 2023 14:07:44 +0800
Labels:           
Annotations:      cni.projectcalico.org/containerID: fc89567be28e2010a6f0997482489b65649ac2136c92c603d5a5537dc2e7efbd
                  cni.projectcalico.org/podIP: 10.233.90.12/32
                  cni.projectcalico.org/podIPs: 10.233.90.12/32
Status:           Running
IP:               10.233.90.12
IPs:
  IP:  10.233.90.12
Containers:
  count:
    Container ID:   containerd://10de8e85609ac2d33cac17fa995a7a1c88f318ef10ddbb6086898fb32b216bb6
    Image:          nginx:latest
    Image ID:       docker.io/library/nginx@sha256:aa0afebbb3cfa473099a62c4b32e9b3fb73ed23f2a75a65ce1d4b4f55a5c2ef2
    Port:           
    Host Port:      
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Mon, 03 Apr 2023 14:18:51 +0800
      Finished:     Mon, 03 Apr 2023 14:19:45 +0800
    Ready:          False
    Restart Count:  7
    Liveness:       http-get http://:81/ delay=30s timeout=1s period=10s #success=1 #failure=3
    Readiness:      tcp-socket :80 delay=5s timeout=1s period=5s #success=1 #failure=3
    Environment:    
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-zpsdf (ro)
Conditions:
  Type              Status
  Initialized       True 
  Ready             False 
  ContainersReady   False 
  PodScheduled      True 
Volumes:
  kube-api-access-zpsdf:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       
    DownwardAPI:             true
QoS Class:                   BestEffort
Node-Selectors:              
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason     Age                     From     Message
  ----     ------     ----                    ----     -------
  Normal   Killing    10m (x3 over 12m)       kubelet  Container count failed liveness probe, will be restarted
  Normal   Pulled     10m (x4 over 13m)       kubelet  Container image "nginx:latest" already present on machine
  Normal   Started    10m (x4 over 13m)       kubelet  Started container count
  Warning  Unhealthy  9m57s (x10 over 12m)    kubelet  Liveness probe failed: Get "http://10.233.90.12:81/": dial tcp 10.233.90.12:81: connect: connection refused
  Normal   Created    8m36s (x6 over 13m)     kubelet  Created container count
  Warning  BackOff    3m22s (x19 over 7m36s)  kubelet  Back-off restarting failed container
[root@master ~]# 
