k8s Troubleshooting

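    This post covers k8s troubleshooting at a fairly basic level, since I have only recently started with it. It covers three things: how to view the current runtime information of k8s objects; how to diagnose problems with services and containers; and how to investigate more complex problems such as pod scheduling.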

1. Check the system Events

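    When a resource object (pod, service, RC, node, namespace, deployment, and so on) misbehaves, for example a pod that does not run successfully after it is created, first look at the object's current runtime information, and in particular at the Events associated with it. Each event records the subject involved, when it first occurred, when it last occurred, how many times it occurred, and the reason.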

    k8s provides the following commands for viewing an object's runtime state:

kubectl describe pod xxxx
kubectl describe node xxxx

The output looks like this:
 

[root@centos ~]# kubectl  get pod
NAME                    READY   STATUS    RESTARTS   AGE
curl-5f8bff6547-rb4qk   1/1     Running   2          3d14h
redis-master-7j8cm      1/1     Running   2          3d14h
webapp-j7gd2            1/1     Running   3          3d21h
webapp-kzrn7            1/1     Running   3          3d14h
[root@centos ~]# kubectl describe pod webapp-j7gd2 
Name:               webapp-j7gd2
Namespace:          default
Priority:           0
PriorityClassName:  
Node:               node3/192.168.195.138
Start Time:         Mon, 08 Apr 2019 13:19:25 +0800
Labels:             app=webapp
Annotations:        
Status:             Running
IP:                 10.244.1.35
Controlled By:      ReplicationController/webapp
Containers:
  webapp:
    Container ID:   docker://e4dd5ec51e4d05456bd1605459a252085ad092c6be26e2becd5301114a470a33
    Image:          tomcat:9-jre8-alpine
    Image ID:       docker-pullable://tomcat@sha256:67fc2a0a54f9dfa7abda85a2900d721a55115dcae8ca7da560e65d15ca4c8aa7
    Port:           8080/TCP
    Host Port:      0/TCP
    State:          Running
      Started:      Thu, 11 Apr 2019 09:26:42 +0800
    Last State:     Terminated
      Reason:       Error
      Exit Code:    255
      Started:      Mon, 08 Apr 2019 21:52:27 +0800
      Finished:     Thu, 11 Apr 2019 09:25:55 +0800
    Ready:          True
    Restart Count:  3
    Environment:    
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-nx72w (ro)
Conditions:
  Type              Status
  Initialized       True 
  Ready             True 
  ContainersReady   True 
  PodScheduled      True 
Volumes:
  default-token-nx72w:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-nx72w
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:          

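The Events section on the last line is the important part. This pod is healthy, so nothing is listed there; if your pod is in an abnormal state, the error messages appear here. The messages are plain English and usually make the problem obvious at a glance. Typical causes are an image that cannot be pulled, no available node, and so on. If your pod lives in a namespace other than default, specify the namespace with the following command: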

kubectl describe pod xxx -n <your-namespace>
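
When the describe output is long, it can be easier to ask for the events alone. The following is a minimal sketch rather than an official recipe: events_for is a hypothetical helper name, and it assumes the events have not yet expired from the API server.

```shell
# Hypothetical helper (not part of kubectl): list the events recorded
# for one object, oldest first.
# Usage: events_for <object-name> [namespace]
events_for() {
  kubectl get events -n "${2:-default}" \
    --field-selector "involvedObject.name=$1" \
    --sort-by=.metadata.creationTimestamp
}
```

For example, events_for webapp-j7gd2 would query the default namespace, and events_for xxx <your-namespace> would query another one.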

2. Check the container logs

  To inspect the logs produced by the application inside a container, use the kubectl logs command, for example:

[root@centos ~]# kubectl  get pod
NAME                    READY   STATUS    RESTARTS   AGE
curl-5f8bff6547-rb4qk   1/1     Running   2          3d14h
redis-master-7j8cm      1/1     Running   2          3d14h
webapp-j7gd2            1/1     Running   3          3d21h
webapp-kzrn7            1/1     Running   3          3d14h
[root@centos ~]# kubectl logs webapp-j7gd2 
11-Apr-2019 01:26:45.108 INFO [main] org.apache.catalina.startup.VersionLoggerListener.log Server version name:   Apache Tomcat/9.0.17
11-Apr-2019 01:26:45.145 INFO [main] org.apache.catalina.startup.VersionLoggerListener.log Server built:          Mar 13 2019 15:55:27 UTC
11-Apr-2019 01:26:45.146 INFO [main] org.apache.catalina.startup.VersionLoggerListener.log Server version number: 9.0.17.0
11-Apr-2019 01:26:45.146 INFO [main] org.apache.catalina.startup.VersionLoggerListener.log OS Name:               Linux
11-Apr-2019 01:26:45.146 INFO [main] org.apache.catalina.startup.VersionLoggerListener.log OS Version:            3.10.0-957.el7.x86_64
11-Apr-2019 01:26:45.146 INFO [main] org.apache.catalina.startup.VersionLoggerListener.log Architecture:          amd64
11-Apr-2019 01:26:45.146 INFO [main] org.apache.catalina.startup.VersionLoggerListener.log Java Home:             /usr/lib/jvm/java-1.8-openjdk/jre
11-Apr-2019 01:26:45.147 INFO [main] org.apache.catalina.startup.VersionLoggerListener.log JVM Version:           1.8.0_201-b08
11-Apr-2019 01:26:45.147 INFO [main] org.apache.catalina.startup.VersionLoggerListener.log JVM Vendor:            Oracle Corporation
11-Apr-2019 01:26:45.147 INFO [main] org.apache.catalina.startup.VersionLoggerListener.log CATALINA_BASE:         /usr/local/tomcat
11-Apr-2019 01:26:45.147 INFO [main] org.apache.catalina.startup.VersionLoggerListener.log CATALINA_HOME:         /usr/local/tomcat
11-Apr-2019 01:26:45.148 INFO [main] org.apache.catalina.startup.VersionLoggerListener.log Command line argument: -Djava.util.logging.config.file=/usr/local/tomcat/conf/logging.properties
11-Apr-2019 01:26:45.148 INFO [main] org.apache.catalina.startup.VersionLoggerListener.log Command line argument: -Djava.util.logging.manager=org.apache.juli.ClassLoaderLogManager
11-Apr-2019 01:26:45.148 INFO [main] org.apache.catalina.startup.VersionLoggerListener.log Command line argument: -Djdk.tls.ephemeralDHKeySize=2048
11-Apr-2019 01:26:45.149 INFO [main] org.apache.catalina.startup.VersionLoggerListener.log Command line argument: -Djava.protocol.handler.pkgs=org.apache.catalina.webresources
11-Apr-2019 01:26:45.149 INFO [main] org.apache.catalina.startup.VersionLoggerListener.log Command line argument: -Dorg.apache.catalina.security.SecurityListener.UMASK=0027
11-Apr-2019 01:26:45.150 INFO [main] org.apache.catalina.startup.VersionLoggerListener.log Command line argument: -Dignore.endorsed.dirs=
11-Apr-2019 01:26:45.150 INFO [main] org.apache.catalina.startup.VersionLoggerListener.log Command line argument: -Dcatalina.base=/usr/local/tomcat
11-Apr-2019 01:26:45.150 INFO [main] org.apache.catalina.startup.VersionLoggerListener.log Command line argument: -Dcatalina.home=/usr/local/tomcat
11-Apr-2019 01:26:45.150 INFO [main] org.apache.catalina.startup.VersionLoggerListener.log Command line argument: -Djava.io.tmpdir=/usr/local/tomcat/temp
11-Apr-2019 01:26:45.151 INFO [main] org.apache.catalina.core.AprLifecycleListener.lifecycleEvent Loaded APR based Apache Tomcat Native library [1.2.21] using APR version [1.6.5].
11-Apr-2019 01:26:45.151 INFO [main] org.apache.catalina.core.AprLifecycleListener.lifecycleEvent APR capabilities: IPv6 [true], sendfile [true], accept filters [false], random [true].
11-Apr-2019 01:26:45.151 INFO [main] org.apache.catalina.core.AprLifecycleListener.lifecycleEvent APR/OpenSSL configuration: useAprConnector [false], useOpenSSL [true]
11-Apr-2019 01:26:45.160 INFO [main] org.apache.catalina.core.AprLifecycleListener.initializeSSL OpenSSL successfully initialized [OpenSSL 1.1.1b  26 Feb 2019]
11-Apr-2019 01:26:45.606 INFO [main] org.apache.coyote.AbstractProtocol.init Initializing ProtocolHandler ["http-nio-8080"]
11-Apr-2019 01:26:45.678 INFO [main] org.apache.coyote.AbstractProtocol.init Initializing ProtocolHandler ["ajp-nio-8009"]
11-Apr-2019 01:26:45.689 INFO [main] org.apache.catalina.startup.Catalina.load Server initialization in [2,071] milliseconds
11-Apr-2019 01:26:45.755 INFO [main] org.apache.catalina.core.StandardService.startInternal Starting service [Catalina]
11-Apr-2019 01:26:45.755 INFO [main] org.apache.catalina.core.StandardEngine.startInternal Starting Servlet engine: [Apache Tomcat/9.0.17]
11-Apr-2019 01:26:45.777 INFO [main] org.apache.catalina.startup.HostConfig.deployDirectory Deploying web application directory [/usr/local/tomcat/webapps/ROOT]
11-Apr-2019 01:26:46.985 INFO [main] org.apache.catalina.startup.HostConfig.deployDirectory Deployment of web application directory [/usr/local/tomcat/webapps/ROOT] has finished in [1,202] ms
11-Apr-2019 01:26:46.986 INFO [main] org.apache.catalina.startup.HostConfig.deployDirectory Deploying web application directory [/usr/local/tomcat/webapps/docs]
11-Apr-2019 01:26:47.071 INFO [main] org.apache.catalina.startup.HostConfig.deployDirectory Deployment of web application directory [/usr/local/tomcat/webapps/docs] has finished in [86] ms
11-Apr-2019 01:26:47.080 INFO [main] org.apache.catalina.startup.HostConfig.deployDirectory Deploying web application directory [/usr/local/tomcat/webapps/examples]
11-Apr-2019 01:26:48.100 INFO [main] org.apache.catalina.startup.HostConfig.deployDirectory Deployment of web application directory [/usr/local/tomcat/webapps/examples] has finished in [1,020] ms
11-Apr-2019 01:26:48.104 INFO [main] org.apache.catalina.startup.HostConfig.deployDirectory Deploying web application directory [/usr/local/tomcat/webapps/host-manager]
11-Apr-2019 01:26:48.169 INFO [main] org.apache.catalina.startup.HostConfig.deployDirectory Deployment of web application directory [/usr/local/tomcat/webapps/host-manager] has finished in [65] ms
11-Apr-2019 01:26:48.169 INFO [main] org.apache.catalina.startup.HostConfig.deployDirectory Deploying web application directory [/usr/local/tomcat/webapps/manager]
11-Apr-2019 01:26:48.227 INFO [main] org.apache.catalina.startup.HostConfig.deployDirectory Deployment of web application directory [/usr/local/tomcat/webapps/manager] has finished in [58] ms
11-Apr-2019 01:26:48.235 INFO [main] org.apache.coyote.AbstractProtocol.start Starting ProtocolHandler ["http-nio-8080"]
11-Apr-2019 01:26:48.302 INFO [main] org.apache.coyote.AbstractProtocol.start Starting ProtocolHandler ["ajp-nio-8009"]
11-Apr-2019 01:26:48.323 INFO [main] org.apache.catalina.startup.Catalina.start Server startup in [2,633] milliseconds

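    If a pod contains more than one container, use the -c flag to name the container whose logs you want, for example:

kubectl logs <pod-name> -c <container-name>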
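
For a container that has restarted (webapp-j7gd2 above shows Restart Count 3 and a Last State of Terminated), the current log only covers the latest start. A small sketch for reading what the previous instance wrote before it died; previous_logs is a hypothetical helper name and the default of 50 lines is an arbitrary assumption.

```shell
# Hypothetical helper: show the tail of the log written by the previous
# instance of a container, i.e. the one that ran before the last restart.
# Usage: previous_logs <pod> <container> [lines]
previous_logs() {
  kubectl logs "$1" -c "$2" --previous --tail="${3:-50}"
}
```

For example, previous_logs webapp-j7gd2 webapp 100 would request the last 100 lines of the terminated webapp container.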

You can of course also run docker logs directly on the node that hosts the container:

[root@node2 ~]# docker ps | grep web
6041a63c30ea        6097ab3c4283           "catalina.sh run"        25 hours ago        Up 25 hours                             k8s_webapp_webapp-kzrn7_default_7c476613-59f4-11e9-9a41-000c29f1f0e4_3
974390ced06b        k8s.gcr.io/pause:3.1   "/pause"                 25 hours ago        Up 25 hours                             k8s_POD_webapp-kzrn7_default_7c476613-59f4-11e9-9a41-000c29f1f0e4_7
[root@node2 ~]# docker logs 6041a63c30ea
11-Apr-2019 01:26:33.432 INFO [main] org.apache.catalina.startup.VersionLoggerListener.log Server version name:   Apache Tomcat/9.0.17
11-Apr-2019 01:26:33.526 INFO [main] org.apache.catalina.startup.VersionLoggerListener.log Server built:          Mar 13 2019 15:55:27 UTC
11-Apr-2019 01:26:33.526 INFO [main] org.apache.catalina.startup.VersionLoggerListener.log Server version number: 9.0.17.0
11-Apr-2019 01:26:33.526 INFO [main] org.apache.catalina.startup.VersionLoggerListener.log OS Name:               Linux
11-Apr-2019 01:26:33.527 INFO [main] org.apache.catalina.startup.VersionLoggerListener.log OS Version:            3.10.0-957.el7.x86_64
11-Apr-2019 01:26:33.527 INFO [main] org.apache.catalina.startup.VersionLoggerListener.log Architecture:          amd64
11-Apr-2019 01:26:33.527 INFO [main] org.apache.catalina.startup.VersionLoggerListener.log Java Home:             /usr/lib/jvm/java-1.8-openjdk/jre
11-Apr-2019 01:26:33.527 INFO [main] org.apache.catalina.startup.VersionLoggerListener.log JVM Version:           1.8.0_201-b08
11-Apr-2019 01:26:33.528 INFO [main] org.apache.catalina.startup.VersionLoggerListener.log JVM Vendor:            Oracle Corporation
11-Apr-2019 01:26:33.528 INFO [main] org.apache.catalina.startup.VersionLoggerListener.log CATALINA_BASE:         /usr/local/tomcat
11-Apr-2019 01:26:33.528 INFO [main] org.apache.catalina.startup.VersionLoggerListener.log CATALINA_HOME:         /usr/local/tomcat
11-Apr-2019 01:26:33.529 INFO [main] org.apache.catalina.startup.VersionLoggerListener.log Command line argument: -Djava.util.logging.config.file=/usr/local/tomcat/conf/logging.properties
11-Apr-2019 01:26:33.529 INFO [main] org.apache.catalina.startup.VersionLoggerListener.log Command line argument: -Djava.util.logging.manager=org.apache.juli.ClassLoaderLogManager
11-Apr-2019 01:26:33.529 INFO [main] org.apache.catalina.startup.VersionLoggerListener.log Command line argument: -Djdk.tls.ephemeralDHKeySize=2048
11-Apr-2019 01:26:33.529 INFO [main] org.apache.catalina.startup.VersionLoggerListener.log Command line argument: -Djava.protocol.handler.pkgs=org.apache.catalina.webresources
11-Apr-2019 01:26:33.530 INFO [main] org.apache.catalina.startup.VersionLoggerListener.log Command line argument: -Dorg.apache.catalina.security.SecurityListener.UMASK=0027
11-Apr-2019 01:26:33.530 INFO [main] org.apache.catalina.startup.VersionLoggerListener.log Command line argument: -Dignore.endorsed.dirs=
11-Apr-2019 01:26:33.530 INFO [main] org.apache.catalina.startup.VersionLoggerListener.log Command line argument: -Dcatalina.base=/usr/local/tomcat
11-Apr-2019 01:26:33.530 INFO [main] org.apache.catalina.startup.VersionLoggerListener.log Command line argument: -Dcatalina.home=/usr/local/tomcat
11-Apr-2019 01:26:33.530 INFO [main] org.apache.catalina.startup.VersionLoggerListener.log Command line argument: -Djava.io.tmpdir=/usr/local/tomcat/temp
11-Apr-2019 01:26:33.531 INFO [main] org.apache.catalina.core.AprLifecycleListener.lifecycleEvent Loaded APR based Apache Tomcat Native library [1.2.21] using APR version [1.6.5].
11-Apr-2019 01:26:33.539 INFO [main] org.apache.catalina.core.AprLifecycleListener.lifecycleEvent APR capabilities: IPv6 [true], sendfile [true], accept filters [false], random [true].
11-Apr-2019 01:26:33.540 INFO [main] org.apache.catalina.core.AprLifecycleListener.lifecycleEvent APR/OpenSSL configuration: useAprConnector [false], useOpenSSL [true]
11-Apr-2019 01:26:33.565 INFO [main] org.apache.catalina.core.AprLifecycleListener.initializeSSL OpenSSL successfully initialized [OpenSSL 1.1.1b  26 Feb 2019]
11-Apr-2019 01:26:34.291 INFO [main] org.apache.coyote.AbstractProtocol.init Initializing ProtocolHandler ["http-nio-8080"]
11-Apr-2019 01:26:34.374 INFO [main] org.apache.coyote.AbstractProtocol.init Initializing ProtocolHandler ["ajp-nio-8009"]
11-Apr-2019 01:26:34.378 INFO [main] org.apache.catalina.startup.Catalina.load Server initialization in [3,215] milliseconds
11-Apr-2019 01:26:34.467 INFO [main] org.apache.catalina.core.StandardService.startInternal Starting service [Catalina]
11-Apr-2019 01:26:34.468 INFO [main] org.apache.catalina.core.StandardEngine.startInternal Starting Servlet engine: [Apache Tomcat/9.0.17]
11-Apr-2019 01:26:34.507 INFO [main] org.apache.catalina.startup.HostConfig.deployDirectory Deploying web application directory [/usr/local/tomcat/webapps/ROOT]
11-Apr-2019 01:26:36.293 INFO [main] org.apache.catalina.startup.HostConfig.deployDirectory Deployment of web application directory [/usr/local/tomcat/webapps/ROOT] has finished in [1,786] ms
11-Apr-2019 01:26:36.294 INFO [main] org.apache.catalina.startup.HostConfig.deployDirectory Deploying web application directory [/usr/local/tomcat/webapps/docs]
11-Apr-2019 01:26:36.368 INFO [main] org.apache.catalina.startup.HostConfig.deployDirectory Deployment of web application directory [/usr/local/tomcat/webapps/docs] has finished in [73] ms
11-Apr-2019 01:26:36.377 INFO [main] org.apache.catalina.startup.HostConfig.deployDirectory Deploying web application directory [/usr/local/tomcat/webapps/examples]
11-Apr-2019 01:26:37.797 INFO [main] org.apache.catalina.startup.HostConfig.deployDirectory Deployment of web application directory [/usr/local/tomcat/webapps/examples] has finished in [1,420] ms
11-Apr-2019 01:26:37.802 INFO [main] org.apache.catalina.startup.HostConfig.deployDirectory Deploying web application directory [/usr/local/tomcat/webapps/host-manager]
11-Apr-2019 01:26:38.031 INFO [main] org.apache.catalina.startup.HostConfig.deployDirectory Deployment of web application directory [/usr/local/tomcat/webapps/host-manager] has finished in [228] ms
11-Apr-2019 01:26:38.032 INFO [main] org.apache.catalina.startup.HostConfig.deployDirectory Deploying web application directory [/usr/local/tomcat/webapps/manager]
11-Apr-2019 01:26:38.161 INFO [main] org.apache.catalina.startup.HostConfig.deployDirectory Deployment of web application directory [/usr/local/tomcat/webapps/manager] has finished in [128] ms
11-Apr-2019 01:26:38.183 INFO [main] org.apache.coyote.AbstractProtocol.start Starting ProtocolHandler ["http-nio-8080"]
11-Apr-2019 01:26:38.244 INFO [main] org.apache.coyote.AbstractProtocol.start Starting ProtocolHandler ["ajp-nio-8009"]
11-Apr-2019 01:26:38.290 INFO [main] org.apache.catalina.startup.Catalina.start Server startup in [3,911] milliseconds
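
The container names in the docker ps output above appear to follow the pattern k8s_<container>_<pod>_<namespace>_<pod UID>_<attempt>. Assuming that pattern holds on your nodes, a sketch like the following splits a name into its parts so you can tell which pod a Docker container belongs to; parse_k8s_container_name is a hypothetical helper.

```shell
# Hypothetical helper: split a kubelet-generated Docker container name
# into its underscore-separated fields. Runs in a subshell so the IFS
# and positional-parameter changes do not leak into the caller.
# Usage: parse_k8s_container_name <docker-container-name>
parse_k8s_container_name() (
  set -f          # disable globbing while the name is split
  IFS=_
  set -- $1       # $1=k8s $2=container $3=pod $4=namespace $5=uid $6=attempt
  echo "container=$2 pod=$3 namespace=$4 attempt=$6"
)
```

The split is safe because pod, container, and namespace names cannot contain underscores.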

3. Check the k8s service logs

If k8s is installed on Linux and its services are managed by systemd, the systemd journal captures the output of the service processes. You can view the service logs with systemctl status or journalctl:

[root@node2 ~]# systemctl status kubelet.service 
● kubelet.service - kubelet: The Kubernetes Node Agent
   Loaded: loaded (/etc/systemd/system/kubelet.service; enabled; vendor preset: disabled)
  Drop-In: /etc/systemd/system/kubelet.service.d
           └─10-kubeadm.conf
   Active: active (running) since Thu 2019-04-11 09:25:36 CST; 1 day 1h ago
     Docs: https://kubernetes.io/docs/
 Main PID: 7793 (kubelet)
    Tasks: 19
   Memory: 112.4M
   CGroup: /system.slice/kubelet.service
           └─7793 /usr/bin/kubelet --bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf --config=/v...

Apr 12 09:56:44 node2 kubelet[7793]: W0412 09:56:44.886746    7793 reflector.go:270] object-"kube-system"/"kube-proxy": watch of *v1.ConfigMa... (562273)
Apr 12 09:57:46 node2 kubelet[7793]: W0412 09:57:46.933029    7793 reflector.go:270] object-"kube-system"/"kube-flannel-cfg": watch of *v1.Co... (562359)
Apr 12 10:04:45 node2 kubelet[7793]: W0412 10:04:45.828641    7793 reflector.go:270] object-"kube-system"/"coredns": watch of *v1.ConfigMap e... (562964)
Apr 12 10:11:04 node2 kubelet[7793]: W0412 10:11:04.635497    7793 reflector.go:270] object-"kube-system"/"kube-flannel-cfg": watch of *v1.Co... (563510)
Apr 12 10:12:23 node2 kubelet[7793]: W0412 10:12:23.593624    7793 reflector.go:270] object-"kube-system"/"kube-proxy": watch of *v1.ConfigMa... (563619)
Apr 12 10:24:09 node2 kubelet[7793]: W0412 10:24:09.875061    7793 reflector.go:270] object-"kube-system"/"coredns": watch of *v1.ConfigMap e... (564637)
Apr 12 10:26:55 node2 kubelet[7793]: W0412 10:26:55.642788    7793 reflector.go:270] object-"kube-system"/"kube-proxy": watch of *v1.ConfigMa... (564886)
Apr 12 10:28:14 node2 kubelet[7793]: W0412 10:28:14.693489    7793 reflector.go:270] object-"kube-system"/"kube-flannel-cfg": watch of *v1.Co... (564992)
Apr 12 10:43:12 node2 kubelet[7793]: W0412 10:43:12.893306    7793 reflector.go:270] object-"kube-system"/"coredns": watch of *v1.ConfigMap e... (566287)
Apr 12 10:43:37 node2 kubelet[7793]: W0412 10:43:37.662130    7793 reflector.go:270] object-"kube-system"/"kube-proxy": watch of *v1.ConfigMa... (566320)
Hint: Some lines were ellipsized, use -l to show in full.

Or:

[root@centos ~]# journalctl -xeu kubelet
Apr 12 10:46:53 centos.master kubelet[9787]: E0412 10:46:53.510165    9787 kubelet.go:2266] node "centos.master" not found
Apr 12 10:46:53 centos.master kubelet[9787]: E0412 10:46:53.610691    9787 kubelet.go:2266] node "centos.master" not found
Apr 12 10:46:53 centos.master kubelet[9787]: E0412 10:46:53.711008    9787 kubelet.go:2266] node "centos.master" not found
Apr 12 10:46:53 centos.master kubelet[9787]: E0412 10:46:53.811468    9787 kubelet.go:2266] node "centos.master" not found
Apr 12 10:46:53 centos.master kubelet[9787]: I0412 10:46:53.883382    9787 kubelet_node_status.go:278] Setting node annotation to enable volume controlle
Apr 12 10:46:53 centos.master kubelet[9787]: E0412 10:46:53.912065    9787 kubelet.go:2266] node "centos.master" not found
Apr 12 10:46:53 centos.master kubelet[9787]: I0412 10:46:53.914043    9787 kubelet_node_status.go:72] Attempting to register node centos.master
Apr 12 10:46:53 centos.master kubelet[9787]: E0412 10:46:53.916659    9787 kubelet_node_status.go:94] Unable to register node "centos.master" with API se
Apr 12 10:46:54 centos.master kubelet[9787]: E0412 10:46:54.012363    9787 kubelet.go:2266] node "centos.master" not found
Apr 12 10:46:54 centos.master kubelet[9787]: E0412 10:46:54.113003    9787 kubelet.go:2266] node "centos.master" not found
Apr 12 10:46:54 centos.master kubelet[9787]: I0412 10:46:54.147210    9787 kubelet_node_status.go:278] Setting node annotation to enable volume controlle
Apr 12 10:46:54 centos.master kubelet[9787]: E0412 10:46:54.213291    9787 kubelet.go:2266] node "centos.master" not found
Apr 12 10:46:54 centos.master kubelet[9787]: E0412 10:46:54.313616    9787 kubelet.go:2266] node "centos.master" not found
Apr 12 10:46:54 centos.master kubelet[9787]: E0412 10:46:54.413970    9787 kubelet.go:2266] node "centos.master" not found
Apr 12 10:46:54 centos.master kubelet[9787]: E0412 10:46:54.514292    9787 kubelet.go:2266] node "centos.master" not found
Apr 12 10:46:54 centos.master kubelet[9787]: E0412 10:46:54.615167    9787 kubelet.go:2266] node "centos.master" not found
Apr 12 10:46:54 centos.master kubelet[9787]: E0412 10:46:54.715863    9787 kubelet.go:2266] node "centos.master" not found
Apr 12 10:46:54 centos.master kubelet[9787]: E0412 10:46:54.816154    9787 kubelet.go:2266] node "centos.master" not found
Apr 12 10:46:54 centos.master kubelet[9787]: E0412 10:46:54.916432    9787 kubelet.go:2266] node "centos.master" not found
Apr 12 10:46:55 centos.master kubelet[9787]: E0412 10:46:55.017040    9787 kubelet.go:2266] node "centos.master" not found
Apr 12 10:46:55 centos.master kubelet[9787]: E0412 10:46:55.117863    9787 kubelet.go:2266] node "centos.master" not found
Apr 12 10:46:55 centos.master kubelet[9787]: E0412 10:46:55.218694    9787 kubelet.go:2266] node "centos.master" not found
Apr 12 10:46:55 centos.master kubelet[9787]: E0412 10:46:55.319663    9787 kubelet.go:2266] node "centos.master" not found
Apr 12 10:46:55 centos.master kubelet[9787]: E0412 10:46:55.420254    9787 kubelet.go:2266] node "centos.master" not found
Apr 12 10:46:55 centos.master kubelet[9787]: E0412 10:46:55.521053    9787 kubelet.go:2266] node "centos.master" not found
Apr 12 10:46:55 centos.master kubelet[9787]: E0412 10:46:55.621575    9787 kubelet.go:2266] node "centos.master" not found
Apr 12 10:46:55 centos.master kubelet[9787]: E0412 10:46:55.722435    9787 kubelet.go:2266] node "centos.master" not found
Apr 12 10:46:55 centos.master kubelet[9787]: E0412 10:46:55.823464    9787 kubelet.go:2266] node "centos.master" not found
Apr 12 10:46:55 centos.master kubelet[9787]: E0412 10:46:55.924273    9787 kubelet.go:2266] node "centos.master" not found
Apr 12 10:46:56 centos.master kubelet[9787]: E0412 10:46:56.024392    9787 kubelet.go:2266] node "centos.master" not found
Apr 12 10:46:56 centos.master kubelet[9787]: E0412 10:46:56.125129    9787 kubelet.go:2266] node "centos.master" not found
Apr 12 10:46:56 centos.master kubelet[9787]: I0412 10:46:56.146767    9787 kubelet_node_status.go:278] Setting node annotation to enable volume controlle
Apr 12 10:46:56 centos.master kubelet[9787]: E0412 10:46:56.225839    9787 kubelet.go:2266] node "centos.master" not found
Apr 12 10:46:56 centos.master kubelet[9787]: E0412 10:46:56.326354    9787 kubelet.go:2266] node "centos.master" not found
Apr 12 10:46:56 centos.master kubelet[9787]: E0412 10:46:56.427552    9787 kubelet.go:2266] node "centos.master" not found
Apr 12 10:46:56 centos.master kubelet[9787]: E0412 10:46:56.528289    9787 kubelet.go:2266] node "centos.master" not found
Apr 12 10:46:56 centos.master kubelet[9787]: E0412 10:46:56.628843    9787 kubelet.go:2266] node "centos.master" not found
Apr 12 10:46:56 centos.master kubelet[9787]: E0412 10:46:56.729056    9787 kubelet.go:2266] node "centos.master" not found
Apr 12 10:46:56 centos.master kubelet[9787]: E0412 10:46:56.829340    9787 kubelet.go:2266] node "centos.master" not found
Apr 12 10:46:56 centos.master kubelet[9787]: E0412 10:46:56.929690    9787 kubelet.go:2266] node "centos.master" not found
Apr 12 10:46:57 centos.master kubelet[9787]: E0412 10:46:57.030373    9787 kubelet.go:2266] node "centos.master" not found
Apr 12 10:46:57 centos.master kubelet[9787]: E0412 10:46:57.131158    9787 kubelet.go:2266] node "centos.master" not found
Apr 12 10:46:57 centos.master kubelet[9787]: E0412 10:46:57.232373    9787 kubelet.go:2266] node "centos.master" not found
Apr 12 10:46:57 centos.master kubelet[9787]: E0412 10:46:57.333084    9787 kubelet.go:2266] node "centos.master" not found
Apr 12 10:46:57 centos.master kubelet[9787]: E0412 10:46:57.433269    9787 kubelet.go:2266] node "centos.master" not found

The kubelet service log above tells me that the node centos.master cannot be found.
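
When the journal repeats the same message many times a second, as it does above, collapsing duplicates makes the distinct problems easier to see. This is a rough sketch that assumes the kubelet log header looks like the lines shown above (severity letter plus date, time, PID, and source file, ending in "] "); summarize_kubelet_log is a hypothetical helper.

```shell
# Hypothetical helper: read kubelet journal lines on stdin, strip the
# journal prefix and the log header, then count each distinct message,
# most frequent first.
summarize_kubelet_log() {
  sed -E 's/^.*[IWEF][0-9]{4} [0-9:.]+ +[0-9]+ [^] ]+\] //' \
    | sort | uniq -c | sort -rn
}
```

A possible way to use it is journalctl -u kubelet --no-pager --since "10 minutes ago" | summarize_kubelet_log, which reduces the output above to a handful of lines with a count in front of each.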

 That covers the three basic moves. They are simple and only good for first-pass troubleshooting.

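  When you turn to the system service logs because a particular k8s object has a problem, you can search the logs using the object's name as the keyword. Most of the problems we run into day to day involve pods: a pod cannot be created, a pod stops right after it starts, the number of pod replicas cannot be increased, and so on. In those cases, first work out which node the pod is on, log in to that node, pull every kubelet log entry for that pod, and troubleshoot from there. For problems related to scaling pods or to an RC, the key clue is likely to be in the logs of kube-controller-manager and kube-scheduler.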
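
The first step of that procedure, finding the node a pod was scheduled to, can be sketched as follows. node_of_pod is a hypothetical helper name; it prints nothing if the pod has not been scheduled yet.

```shell
# Hypothetical helper: print the name of the node a pod is assigned to.
# Usage: node_of_pod <pod> [namespace]
node_of_pod() {
  kubectl get pod "$1" -n "${2:-default}" -o jsonpath='{.spec.nodeName}'
}
```

After logging in to that node, something like journalctl -u kubelet --no-pager | grep webapp-j7gd2 would pull the kubelet entries that mention the pod.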

   Also, kube-proxy is easy to overlook: even when it has stopped, pod status still looks normal, but access to some services will fail.
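
A quick way to check on kube-proxy, sketched under the assumption that the cluster was set up with kubeadm, where kube-proxy runs as pods in the kube-system namespace labelled k8s-app=kube-proxy. On other installations it may run as a systemd service instead, in which case systemctl status is the place to look. check_kube_proxy is a hypothetical helper name.

```shell
# Hypothetical helper: list the kube-proxy pods and the nodes they run on.
# Assumes the kubeadm convention of the k8s-app=kube-proxy label.
check_kube_proxy() {
  kubectl get pods -n kube-system -l k8s-app=kube-proxy -o wide
}
```

Every node should show one kube-proxy pod in the Running state; a missing or restarting pod points to the node where service access is likely to break.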

 
