Incident write-up: the Rancher UI suddenly became unreachable, along with the K8s cluster

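Our company runs Rancher as a single-node Docker deployment and uses Rancher to operate the k8s cluster. One day the Rancher UI would not open, and none of the deployed containers were usable either. I logged in to the server right away and found that the rancher container still existed, but trying to exec into it failed with the error "cannot exec in a stopped state: unknown". The container's logs, however, could still be read: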

E0712 15:47:03.730752       6 reflector.go:307] github.com/rancher/norman/controller/generic_controller.go:229: Failed to watch *v3.ProjectCatalog: Get https://127.0.0.1:6443/apis/management.cattle.io/v3/watch/projectcatalogs?allowWatchBookmarks=true&resourceVersion=155367341&timeout=30m0s&timeoutSeconds=574: dial tcp 127.0.0.1:6443: connect: connection refused
E0712 15:47:03.730790       6 reflector.go:307] github.com/rancher/norman/controller/generic_controller.go:229: Failed to watch *v3.Catalog: Get https://127.0.0.1:6443/apis/management.cattle.io/v3/watch/catalogs?allowWatchBookmarks=true&resourceVersion=155367339&timeout=30m0s&timeoutSeconds=404: dial tcp 127.0.0.1:6443: connect: connection refused
E0712 15:47:02.947639       6 reflector.go:307] github.com/rancher/norman/controller/generic_controller.go:229: Failed to watch *v3.KontainerDriver: Get https://127.0.0.1:6443/apis/management.cattle.io/v3/watch/kontainerdrivers?allowWatchBookmarks=true&resourceVersion=155367345&timeout=30m0s&timeoutSeconds=481: dial tcp 127.0.0.1:6443: connect: connection refused
E0712 15:47:03.730823       6 reflector.go:307] github.com/rancher/norman/controller/generic_controller.go:229: Failed to watch *v3.Pipeline: Get https://127.0.0.1:6443/apis/project.cattle.io/v3/watch/pipelines?allowWatchBookmarks=true&resourceVersion=155367348&timeout=30m0s&timeoutSeconds=568: dial tcp 127.0.0.1:6443: connect: connection refused
E0712 15:47:03.730842       6 reflector.go:307] github.com/rancher/norman/controller/generic_controller.go:229: Failed to watch *v1.Namespace: Get https://127.0.0.1:6443/api/v1/watch/namespaces?allowWatchBookmarks=true&resourceVersion=155367325&timeout=30m0s&timeoutSeconds=449: dial tcp 127.0.0.1:6443: connect: connection refused
E0712 15:47:02.947667       6 reflector.go:307] github.com/rancher/norman/controller/generic_controller.go:229: Failed to watch *v3.RKEK8sSystemImage: Get https://127.0.0.1:6443/apis/management.cattle.io/v3/watch/rkek8ssystemimages?allowWatchBookmarks=true&resourceVersion=155367347&timeout=30m0s&timeoutSeconds=504: dial tcp 127.0.0.1:6443: connect: connection refused
2021/07/12 15:47:03 [FATAL] k3s exited with: exit status 255
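
With a long log like the one above, it can help to reduce it to the endpoints that refuse connections and the fatal lines. The following is a minimal sketch, not part of the original troubleshooting session; the function names are hypothetical, and it assumes the container is named "rancher" as elsewhere in this post.

```shell
# Summarize which endpoints a log stream reports as refusing connections.
# Reads log text on stdin and prints "<count> <host:port>" per endpoint.
count_refused_endpoints() {
    grep 'connection refused' \
        | grep -oE 'dial tcp [^ ]+: connect' \
        | sed -E 's/^dial tcp (.*): connect$/\1/' \
        | sort | uniq -c | sed -E 's/^ +//'
}

# Print only the lines marked [FATAL] from a log stream on stdin.
fatal_lines() {
    grep -F '[FATAL]'
}

# Usage (assumed container name "rancher"):
#   docker logs --tail 200 rancher 2>&1 | count_refused_endpoints
#   docker logs --tail 200 rancher 2>&1 | fatal_lines
```

If every refused connection points at the same address, as it does here with 127.0.0.1:6443, that suggests a single failed component rather than a general network problem.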

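Two things stand out in the log: "k3s exited with: exit status 255" and "127.0.0.1:6443: connect: connection refused". Port 6443 is the kube-apiserver port, so the problem was most likely in the k8s cluster itself. I then searched for exit status 255 and found this on GitHub: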

[Image 1: screenshot of the GitHub issue]

Below it was a reply (auto-translated by the Chrome browser):
[Image 2: screenshot of the reply]

Another blog post described a situation quite similar to mine:
[Image 3: screenshot of the blog post]

And a reply under it:

[Image 4: screenshot of the reply]

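So k3s had probably crashed. I rebooted the machine, after which k3s was running normally again, but Rancher had not started, so I restarted the rancher Docker container: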

docker restart rancher

The restart failed because port 443 was already in use.

So I looked for the process occupying port 443:

netstat -tunlp|grep 443
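
One caveat with a plain "grep 443": it also matches any line that merely contains 443, such as a listener on 6443. A stricter filter could look like the sketch below. The function name is hypothetical, and it assumes the usual Linux net-tools column layout for TCP rows (local address in column 4, state in column 6, PID/program in column 7).

```shell
# Print "PID program" for every TCP listener bound to exactly the given port.
# Reads `netstat -tunlp` output on stdin.
listeners_on_port() {
    awk -v port="$1" '
        $6 == "LISTEN" {
            # The port is the last colon-separated field of the local
            # address, which also handles IPv6 forms such as ":::443".
            n = split($4, addr, ":")
            if (addr[n] == port) {
                split($7, owner, "/")
                sub(/:$/, "", owner[2])   # "nginx:" -> "nginx"
                print owner[1], owner[2]
            }
        }
    ' | sort -u
}

# Usage (run as root so that netstat can show PIDs):
#   netstat -tunlp | listeners_on_port 443
```

Matching on the last field of the local address keeps 6443 and 4430 out of the result, and the final sort removes the duplicate that appears when the same process listens on both IPv4 and IPv6.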

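The process holding port 443 turned out to be nginx, yet nginx had never been installed on this machine. So I used the PID to find out where that nginx was running from: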

cd /proc/92922/cwd

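That directory contained an nginx configuration, and nginx.conf was full of ingress-controller settings, so I guessed that this nginx belonged to the ingress-controller container and inspected that container: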

docker inspect ingress_controller 

[Image 5: screenshot of the docker inspect output]
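
Instead of guessing the container from the nginx configuration, a PID can often be tied to its container through /proc/<pid>/cgroup. This is only a sketch under an assumption: that the container runtime puts the full 64-character container ID into the cgroup path, which is common with Docker but depends on the cgroup version and driver. The function name is hypothetical.

```shell
# Extract the first 64-hex-digit container ID from cgroup text on stdin.
# Prints nothing if the process does not appear to run inside a container.
container_id_from_cgroup() {
    grep -oE '[0-9a-f]{64}' | head -n 1
}

# Usage with the PID from this post:
#   cid=$(container_id_from_cgroup < /proc/92922/cgroup)
#   docker inspect --format '{{.Name}}' "$cid"
```

This works even when the PID is a worker or child process rather than the container's main process, since all processes in a container share the same cgroup.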

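The inspect output confirmed that ingress_controller was indeed holding port 443. So I stopped ingress_controller first, then restarted rancher, and finally restarted ingress_controller: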

docker stop ingress_controller
docker restart rancher
docker restart ingress_controller
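
The three commands above can be wrapped so that each step runs only if the previous one succeeded. This is a hedged sketch rather than what was actually run: the function name and the DOCKER override (for example DOCKER=echo for a dry run) are hypothetical conventions introduced here.

```shell
# Stop the container that holds the port, restart the one that needs it,
# then bring the first one back. Stops at the first failing step.
#   $1: container currently holding the port
#   $2: container that needs the port
# DOCKER may be set to another command (e.g. "echo") for a dry run.
free_port_and_restart() {
    docker_cmd="${DOCKER:-docker}"
    $docker_cmd stop "$1" &&
        $docker_cmd restart "$2" &&
        $docker_cmd restart "$1"
}

# Usage with the container names from this post:
#   DOCKER=echo free_port_and_restart ingress_controller rancher   # dry run
#   free_port_and_restart ingress_controller rancher
```

Chaining with && keeps the order from the commands above while avoiding a restart of rancher if the blocking container could not be stopped.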

Problem solved.
