Learning Kubernetes: Pod Probes

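I. Pod Liveness Probes
  Many applications gradually become unavailable after running for a long time and can only be recovered by a restart. Kubernetes' container liveness probing detects this kind of problem and, based on the probe result and the restart policy, triggers the follow-up action. A liveness probe is configured per container; the kubelet uses it to decide when a container needs to be restarted, and the diagnostic itself is defined by a handler on the container. Kubernetes supports three handlers for probing:
  1) ExecAction: runs a command inside the container and diagnoses based on its exit status; this is called an exec probe. An exit status of 0 means the probe succeeded; any other value means the container is unhealthy
  2) TCPSocketAction: tries to open a connection to a TCP port of the container. If the port can be opened the container is healthy; otherwise it is unhealthy
  3) HTTPGetAction: sends an HTTP GET request to a given path on a given port of the container IP. A response code of 2XX or 3XX means success; anything else is a failure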
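  Liveness probe: determines whether the container is still "running". When this check fails, the kubelet kills the container and decides whether to restart it according to its restartPolicy. A container with no liveness probe defined defaults to the "Success" state
  Readiness probe: determines whether the container is ready to serve requests. A container that fails the check is treated as not ready, and the endpoints controller removes the Pod IP from the endpoint list of every Service object that matches this Pod; once the check passes again, the IP is added back to the endpoint list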
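
  To make the difference between the two probe types concrete, here is a minimal sketch of a single container that declares both. The Pod name and the choice of "/" as the probe path are assumptions made for illustration; the image is the one used in the HTTP and TCP sections of this article.

```yaml
# Hypothetical Pod for illustration; the name and probe paths are assumptions.
apiVersion: v1
kind: Pod
metadata:
  name: probe-both-demo
spec:
  containers:
  - name: web
    image: ikubernetes/myapp:v1   # serves HTTP on port 80
    ports:
    - name: http
      containerPort: 80
    # Failing this probe makes the kubelet kill the container; restartPolicy then applies.
    livenessProbe:
      httpGet:
        path: /
        port: http
    # Failing this probe only takes the Pod IP out of Service endpoints; nothing is restarted.
    readinessProbe:
      httpGet:
        path: /
        port: http
```

  In practice the two probes would usually point at different checks, since "still alive" and "able to take traffic" are different questions.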

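II. Container Restart Policy
  A Pod object may terminate because a container process crashes or because a container exceeds its resource limits, among other reasons. Whether it is then restarted depends on its restartPolicy attribute:
  1) Always: restart whenever the container terminates; this is the default
  2) OnFailure: restart only when the container exits with an error
  3) Never: never restart
  Note that restartPolicy applies to all containers in the Pod object, and it only controls restarting those containers on the same node. The first time a container needs a restart, it is restarted immediately; subsequent restarts are delayed by the kubelet, with the delay growing through 10, 20, 40, 80, 160, and 300 seconds, where 300 seconds is the maximum. In fact, once bound to a node, a Pod object is never rebound to another node: its containers are either restarted or terminated there, until the node fails or the Pod is deleted.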
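
  As a rough illustration of where restartPolicy is set, the sketch below uses a container that always fails. The Pod name, container name, and command are assumptions, not part of the original walkthrough.

```yaml
# Hypothetical Pod; the name and command are assumptions for illustration.
apiVersion: v1
kind: Pod
metadata:
  name: restart-policy-demo
spec:
  restartPolicy: OnFailure      # Pod-level field: applies to every container in the Pod
  containers:
  - name: worker
    image: busybox
    imagePullPolicy: IfNotPresent
    # Exits with a non-zero code after 10 seconds, so OnFailure restarts it.
    # The first restart is immediate; later ones are delayed 10s, 20s, 40s, 80s, 160s,
    # and then 300s for every further restart.
    command: ["/bin/sh", "-c", "sleep 10; exit 1"]
```

  Changing the value to Never should leave the container terminated after its first exit, and Always would additionally restart containers that exit with code 0.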

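III. Exec Probe
  An exec probe judges a container's health by running a user-defined command inside the target container. An exit status of 0 means the check "succeeded"; any other value means it "failed".

1) Write the YAML file for the exec probe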

]# cat exec.yaml 
apiVersion: v1
kind: Pod
metadata:
  name: exec-pod
  labels: 
    test: liveness-exec
spec:
  containers:
  - name: liveness-exec-demo
    image: busybox
    imagePullPolicy: IfNotPresent
    args: ["/bin/sh","-c"," touch /tmp/healthy; sleep 60; rm -rf /tmp/healthy; sleep 600"]
    livenessProbe:
      exec:
        command: ["test","-e","/tmp/healthy"]

]# kubectl apply -f exec.yaml 
pod/exec-pod created

2) Check whether the file exists inside the Pod

]# kubectl get pods -o wide 
NAME       READY   STATUS    RESTARTS   AGE   IP            NODE    NOMINATED NODE   READINESS GATES
exec-pod   1/1     Running   0          12s   10.244.1.42   node1   <none>           <none>

]# kubectl exec exec-pod -it -- /bin/sh
/ # ls /tmp/ -l 
total 0
-rw-r--r--    1 root     root             0 Aug  1 11:21 healthy
/ # exit

As shown here, the file does exist in the container

3) Check the Pod status again

]# kubectl get pods -o wide 
NAME       READY   STATUS    RESTARTS   AGE     IP            NODE    NOMINATED NODE   READINESS GATES
exec-pod   1/1     Running   1          2m37s   10.244.1.42   node1   <none>           <none>

]# kubectl exec exec-pod -it -- /bin/sh
/ # ls -l /tmp/
total 0

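Once the 60 seconds have passed, checking the Pod again shows that it has been restarted once, which means the liveness probe failed; entering the Pod again shows that the file has been deleted

4) View the Pod details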

]# kubectl describe pods exec-pod
Name:         exec-pod
Namespace:    default
Priority:     0
Node:         node1/172.16.2.101
Start Time:   Sat, 01 Aug 2020 19:21:03 +0800
Labels:       test=liveness-exec
Annotations:  Status:  Running
IP:           10.244.1.42
IPs:
  IP:  10.244.1.42
Containers:
  liveness-exec-demo:
    Container ID:  docker://e06871bd25c2b0d556821a2bd87de0e2f4862bb43bb90cb2b7e5fe2b6b740772
    Image:         busybox
    Image ID:      docker-pullable://busybox@sha256:4f47c01fa91355af2865ac10fef5bf6ec9c7f42ad2321377c21e844427972977
    Port:          <none>
    Host Port:     <none>
    Args:
      /bin/sh
      -c
       touch /tmp/healthy; sleep 60; rm -rf /tmp/healthy; sleep 600
    State:          Running
      Started:      Sat, 01 Aug 2020 19:24:57 +0800
    Last State:     Terminated
      Reason:       Error
      Exit Code:    137
      Started:      Sat, 01 Aug 2020 19:22:57 +0800
      Finished:     Sat, 01 Aug 2020 19:24:56 +0800
    Ready:          True
    Restart Count:  2
    Liveness:       exec [test -e /tmp/healthy] delay=0s timeout=1s period=10s #success=1 #failure=3
    Environment:    <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-47pch (ro)
Conditions:
  Type              Status
  Initialized       True 
  Ready             True 
  ContainersReady   True 
  PodScheduled      True 
Volumes:
  default-token-47pch:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-47pch
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason     Age                            From               Message
  ----     ------     ----                           ----               -------
  Normal   Scheduled  <unknown>                      default-scheduler  Successfully assigned default/exec-pod to node1
  Warning  Unhealthy  <invalid> (x6 over <invalid>)  kubelet, node1     Liveness probe failed:
  Normal   Killing    <invalid> (x2 over <invalid>)  kubelet, node1     Container liveness-exec-demo failed liveness probe, will be restarted
  Normal   Pulled     <invalid> (x3 over <invalid>)  kubelet, node1     Container image "busybox" already present on machine
  Normal   Created    <invalid> (x3 over <invalid>)  kubelet, node1     Created container liveness-exec-demo
  Normal   Started    <invalid> (x3 over <invalid>)  kubelet, node1     Started container liveness-exec-demo

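The Pod details show that the container's last state was Terminated with exit code 137 (128 + 9, meaning the process was killed with SIGKILL). The events show that the liveness probe failed, after which the container was restarted

IV. HTTP Probe
  An HTTP-based probe (HTTPGetAction) sends an HTTP request to the target container and judges the result by the response code: a 2XX or 3XX code means the check passed, anything else means it failed, and the container is then handled according to its restart policy.
  Configuration fields available for HTTP probes:
  host: the host to connect to; defaults to the Pod IP. The target host can also be set with a "Host:" header in httpHeaders
  port: the port to connect to; required
  httpHeaders <[]Object>: custom request headers
  path: the HTTP resource path to request, i.e. the URL path
  scheme: the protocol used for the connection; only HTTP or HTTPS, default HTTP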
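
  The fields above could be combined as in the following fragment. It is a sketch only: the header value www.example.com is a placeholder, and leaving host unset means the request goes to the Pod IP.

```yaml
# Fragment of a container spec; the header value is a hypothetical placeholder.
livenessProbe:
  httpGet:
    scheme: HTTP             # HTTP (the default) or HTTPS
    port: 80                 # required; a port number or the name of a container port
    path: /healthy.html
    httpHeaders:             # list of name/value pairs added to the probe request
    - name: Host
      value: www.example.com
```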

1) Write the YAML file for the HTTP probe

]# cat http.yaml 
apiVersion: v1
kind: Pod
metadata:
  labels:
    test: liveness
  name: liveness-http
spec:
  containers:
  - name: liveness-http-demo
    image: ikubernetes/myapp:v1
    imagePullPolicy: IfNotPresent
    ports:
    - name: http
      containerPort: 80
    lifecycle:
      postStart:
        exec:
          command: ["/bin/sh","-c"," echo Healthy > /usr/share/nginx/html/healthy.html"]
    livenessProbe:
      httpGet:
        path: /healthy.html
        port: http
        scheme: HTTP

]# kubectl apply -f http.yaml 
pod/liveness-http created

2) Check whether the file exists in the Pod

]# kubectl get pods -o wide
NAME            READY   STATUS    RESTARTS   AGE   IP            NODE    NOMINATED NODE   READINESS GATES
liveness-http   1/1     Running   0          39s   10.244.1.44   node1   <none>           <none>

]# kubectl exec liveness-http -it -- /bin/sh
/ # ls -l /usr/share/nginx/html/healthy.html 
-rw-r--r--    1 root     root             8 Aug  1 11:44 /usr/share/nginx/html/healthy.html
/ # cat /usr/share/nginx/html/healthy.html 
Healthy

3) Delete the file

/ # rm /usr/share/nginx/html/healthy.html 
/ # exit

4) Check the Pod status again

]# kubectl get pods -o wide 
NAME            READY   STATUS    RESTARTS   AGE    IP            NODE    NOMINATED NODE   READINESS GATES
liveness-http   1/1     Running   1          3m4s   10.244.1.44   node1   <none>           <none>

The Pod has been restarted once

5) View the Pod details

]# kubectl describe pods liveness-http
Name:         liveness-http
Namespace:    default
Priority:     0
Node:         node1/172.16.2.101
Start Time:   Sat, 01 Aug 2020 19:44:54 +0800
Labels:       test=liveness
Annotations:  Status:  Running
IP:           10.244.1.44
IPs:
  IP:  10.244.1.44
Containers:
  liveness-http-demo:
    Container ID:   docker://3549c5a13a1448260c00e138676c475b26c75fbf3417fe44ef546b3b89014037
    Image:          ikubernetes/myapp:v1
    Image ID:       docker-pullable://ikubernetes/myapp@sha256:9c3dc30b5219788b2b8a4b065f548b922a34479577befb54b03330999d30d513
    Port:           80/TCP
    Host Port:      0/TCP
    State:          Running
      Started:      Sat, 01 Aug 2020 19:47:43 +0800
    Last State:     Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Sat, 01 Aug 2020 19:44:54 +0800
      Finished:     Sat, 01 Aug 2020 19:47:42 +0800
    Ready:          True
    Restart Count:  1
    Liveness:       http-get http://:http/healthy.html delay=0s timeout=1s period=10s #success=1 #failure=3
    Environment:    <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-47pch (ro)
Conditions:
  Type              Status
  Initialized       True 
  Ready             True 
  ContainersReady   True 
  PodScheduled      True 
Volumes:
  default-token-47pch:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-47pch
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason     Age                            From               Message
  ----     ------     ----                           ----               -------
  Normal   Scheduled  <unknown>                      default-scheduler  Successfully assigned default/liveness-http to node1
  Normal   Pulled     <invalid> (x2 over <invalid>)  kubelet, node1     Container image "ikubernetes/myapp:v1" already present on machine
  Warning  Unhealthy  <invalid> (x3 over <invalid>)  kubelet, node1     Liveness probe failed: HTTP probe failed with statuscode: 404
  Normal   Killing    <invalid>                      kubelet, node1     Container liveness-http-demo failed liveness probe, will be restarted
  Normal   Created    <invalid> (x2 over <invalid>)  kubelet, node1     Created container liveness-http-demo
  Normal   Started    <invalid> (x2 over <invalid>)  kubelet, node1     Started container liveness-http-demo

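The Pod details show that the container's last state was Terminated and that it has been restarted once. The events show that the liveness probe failed with HTTP status code 404, after which the container was restarted

V. TCP Probe
  A TCP-based liveness probe (TCPSocketAction) tries to open a TCP connection to a specific port of the container; if the connection is established, the check passes. Compared with an HTTP probe it is more efficient and uses fewer resources, but it is less precise, because a successful connection does not necessarily mean the page resources are accessible. A TCP probe has the following fields:
  1) host: the target IP address to connect to; defaults to the Pod IP
  2) port: the target port to connect to; required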
  
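  Liveness probe behavior attributes
   initialDelaySeconds: how long after the container starts before the first probe runs; shown as the delay attribute. Defaults to 0 seconds, i.e. probing begins as soon as the container starts;
   timeoutSeconds: the probe timeout; shown as the timeout attribute. Defaults to 1s, which is also the minimum;
   periodSeconds: how often the probe runs; shown as the period attribute. Defaults to 10s, minimum 1s. Probing too often adds noticeable overhead to the Pod object, while probing too rarely delays the reaction to failures;
   successThreshold: while in the failed state, the minimum number of consecutive successes required before the probe counts as passing; shown as the #success attribute. Defaults to 1, which is also the minimum (and the only value allowed for liveness probes);
   failureThreshold: while in the successful state, the minimum number of consecutive failures required before the probe counts as failing; shown as the #failure attribute. Defaults to 3, minimum 1;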
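
  The following fragment sketches how these attributes sit next to the handler in a probe definition. The values are illustrative assumptions rather than recommendations, and the fragment assumes the container declares a port named http.

```yaml
# Fragment of a container spec; values are illustrative assumptions.
livenessProbe:
  httpGet:
    path: /healthy.html
    port: http
  initialDelaySeconds: 15   # wait 15s after the container starts before the first probe
  timeoutSeconds: 2         # an attempt fails if no response arrives within 2s
  periodSeconds: 5          # probe every 5s
  successThreshold: 1       # must be 1 for liveness probes
  failureThreshold: 3       # 3 consecutive failures mark the container unhealthy
  # Rough time to detect a broken app: periodSeconds * failureThreshold = 5s * 3 = 15s
```

  With these settings, kubectl describe would show the probe as delay=15s timeout=2s period=5s #success=1 #failure=3, following the same layout as the Liveness lines in the outputs of this article.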

1) Write the YAML file for the TCP probe

]# cat tcp.yaml 
apiVersion: v1
kind: Pod
metadata:
  labels:
    test: liveness
  name: liveness-tcp
spec:
  containers:
  - name: liveness-tcp-demo
    image: ikubernetes/myapp:v1
    imagePullPolicy: IfNotPresent
    ports:
    - name: http
      containerPort: 80
    livenessProbe:
      tcpSocket:
        port: http

]# kubectl apply -f tcp.yaml 
pod/liveness-tcp created

2) View the Pod information

]# kubectl get pods -o wide 
NAME           READY   STATUS    RESTARTS   AGE   IP            NODE    NOMINATED NODE   READINESS GATES
liveness-tcp   1/1     Running   0          8s    10.244.1.45   node1   <none>           <none>

]# kubectl exec liveness-tcp -it -- /bin/sh
/ # netstat -tunlp
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name    
tcp        0      0 0.0.0.0:80              0.0.0.0:*               LISTEN      1/nginx: master pro

3) Stop the service listening on port 80 in the Pod

/ # nginx -s stop 
2020/08/01 12:02:12 [notice] 13#13: signal process started
/ # command terminated with exit code 137

4) View the Pod information

]# kubectl get pods -o wide 
NAME           READY   STATUS    RESTARTS   AGE     IP            NODE    NOMINATED NODE   READINESS GATES
liveness-tcp   1/1     Running   1          3m16s   10.244.1.45   node1   <none>           <none>

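The Pod has been restarted once. Note, however, that this restart was not caused by a failed TCP probe: nginx runs as PID 1 in the container, so when it exits the container itself ends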
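
  To see a restart that really is triggered by the TCP probe, the main process has to stay alive while the probed port stays closed. The manifest below is one possible sketch of that situation; the Pod name, the container name, and port 8080 are assumptions chosen for illustration.

```yaml
# Hypothetical Pod; the names and port 8080 are assumptions for illustration.
apiVersion: v1
kind: Pod
metadata:
  name: liveness-tcp-fail
spec:
  containers:
  - name: sleeper
    image: busybox
    imagePullPolicy: IfNotPresent
    # The main process keeps running but never opens a listening socket.
    args: ["/bin/sh", "-c", "sleep 3600"]
    livenessProbe:
      tcpSocket:
        port: 8080          # nothing listens here, so each connection attempt fails
      periodSeconds: 10
      failureThreshold: 3
```

  After roughly three failed attempts the kubelet should kill and restart the container, and kubectl describe should list the failed liveness probes under Events even though the process never exited on its own.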

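VI. Readiness Probes
  After a Pod object starts, the containerized application usually needs some time to finish initializing, for example to load configuration files, and some programs even need a warm-up phase. If client requests are routed to it before this phase completes, users will be left waiting too long and their experience will suffer. A Pod object should therefore not handle client requests immediately after it starts; it should wait until the container has finished initializing and become "ready", especially when other Pod objects providing the same service already exist.
  Like liveness probing, a readiness probe is a periodic operation (every 10 seconds by default) that determines whether a container is ready, that is, whether it has finished initializing and can serve client requests. When the probe returns "success", it signals that the container is "ready".
  Like liveness probes, readiness probes support the Exec, HTTP GET, and TCP Socket methods, each defined in the same way. The difference lies in what a failure triggers: a readiness probe does not kill or restart the container; it marks the container as not ready and triggers the actions that depend on its readiness (for example, removing the Pod object from Service objects) so that no client requests reach this Pod object. Readiness probing also remains valuable while the application is running: for example, if Pod B, which Pod A depends on, becomes unavailable because of a network failure, Pod A should switch to the not-ready state so that it does not return incomplete responses to clients.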
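
  The link between readiness and Service endpoints can be sketched with a Service and a Pod that share a label. All names, the app: myapp label, and the "/" probe path are assumptions made for illustration.

```yaml
# Hypothetical manifests; names, label, and probe path are assumptions.
apiVersion: v1
kind: Service
metadata:
  name: myapp-svc
spec:
  selector:
    app: myapp              # matches the Pod label below
  ports:
  - port: 80
    targetPort: 80
---
apiVersion: v1
kind: Pod
metadata:
  name: myapp-ready-demo
  labels:
    app: myapp
spec:
  containers:
  - name: myapp
    image: ikubernetes/myapp:v1
    ports:
    - containerPort: 80
    readinessProbe:
      httpGet:
        path: /
        port: 80
      periodSeconds: 5
```

  Running kubectl get endpoints myapp-svc should list the Pod IP only while the readiness probe is passing; the container keeps running either way.
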
1) Write the YAML file for the exec readiness probe

]# cat exec-readiness.yaml 
apiVersion: v1
kind: Pod
metadata:
  name: exec-pod
  labels: 
    test: readiness-exec
spec:
  containers:
  - name: readiness-exec-demo
    image: busybox
    imagePullPolicy: IfNotPresent
    args: ["/bin/sh","-c","while true;do rm -f /tmp/ready; sleep 30; touch /tmp/ready; sleep 300;done"]
    readinessProbe:
      exec:
        command: ["test","-e","/tmp/ready"]
      initialDelaySeconds: 5
      periodSeconds: 5

]# kubectl apply -f exec-readiness.yaml 
pod/exec-pod created

2) Check the Pod status

]# kubectl get pods -o wide 
NAME       READY   STATUS    RESTARTS   AGE   IP            NODE    NOMINATED NODE   READINESS GATES
exec-pod   0/1     Running   0          7s    10.244.1.46   node1   <none>           <none>

]# kubectl exec exec-pod -it -- /bin/sh
/ # ls /tmp/
ready

3) Delete the file

/ # rm /tmp/ready 
/ # exit

4) Check the detailed Pod status again

]# kubectl get pods -o wide 
NAME       READY   STATUS    RESTARTS   AGE     IP            NODE    NOMINATED NODE   READINESS GATES
exec-pod   0/1     Running   0          3m36s   10.244.1.46   node1   <none>           <none>

]# kubectl describe pods exec-pod 
Name:         exec-pod
Namespace:    default
Priority:     0
Node:         node1/172.16.2.101
Start Time:   Sat, 01 Aug 2020 20:38:33 +0800
Labels:       test=readiness-exec
Annotations:  Status:  Running
IP:           10.244.1.46
IPs:
  IP:  10.244.1.46
Containers:
  readiness-exec-demo:
    Container ID:  docker://cd1b74a31ad2d0577a9bad6577a6fae620430a9ab5256a271aa908c4562fbd9f
    Image:         busybox
    Image ID:      docker-pullable://busybox@sha256:4f47c01fa91355af2865ac10fef5bf6ec9c7f42ad2321377c21e844427972977
    Port:          <none>
    Host Port:     <none>
    Args:
      /bin/sh
      -c
      while true;do rm -f /tmpo/ready; sleep 30; touch /tmp/ready; sleep 300;done
    State:          Running
      Started:      Sat, 01 Aug 2020 20:38:34 +0800
    Ready:          True
    Restart Count:  0
    Readiness:      exec [test -e /tmp/ready] delay=5s timeout=1s period=5s #success=1 #failure=3
    Environment:    <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-47pch (ro)
Conditions:
  Type              Status
  Initialized       True 
  Ready             True 
  ContainersReady   True 
  PodScheduled      True 
Volumes:
  default-token-47pch:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-47pch
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason     Age                            From               Message
  ----     ------     ----                           ----               -------
  Normal   Scheduled  <unknown>                      default-scheduler  Successfully assigned default/exec-pod to node1
  Normal   Pulled     <invalid>                      kubelet, node1     Container image "busybox" already present on machine
  Normal   Created    <invalid>                      kubelet, node1     Created container readiness-exec-demo
  Normal   Started    <invalid>                      kubelet, node1     Started container readiness-exec-demo
  Warning  Unhealthy  <invalid> (x5 over <invalid>)  kubelet, node1     Readiness probe failed:

As shown, once the file was deleted the readiness probe failed, but the container was not restarted.