Troubleshooting: Rancher web UI inaccessible

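The server was running out of memory, so it was shut down, given more RAM, and restarted. After the reboot, the Rancher web UI could no longer be accessed.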


Solution:

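1. Delete the problematic Ingress rule. (Why this one? Because there was no other candidate. If you have many rules, delete them in reverse order, starting with the most recently created.)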
[root@i-5wa2ciao ~]# kubectl delete ingress -nkaishidongle    test-ingrress
ingress.extensions "test-ingrress" deleted

2. Recreate the ingress controller pods (delete them; their controller recreates them automatically)
[root@i-5wa2ciao ~]# kubectl delete po nginx-ingress-controller-89827  nginx-ingress-controller-pdvzj  nginx-ingress-controller-zd7fd  -ningress-nginx
pod "nginx-ingress-controller-89827" deleted
pod "nginx-ingress-controller-pdvzj" deleted
pod "nginx-ingress-controller-zd7fd" deleted

3. Verify
[root@i-5wa2ciao ~]# kubectl get po -ningress-nginx
NAME                                    READY   STATUS    RESTARTS   AGE
default-http-backend-598b7d7dbd-mbw6n   1/1     Running   0          41m
nginx-ingress-controller-d44jn          1/1     Running   0          9m29s
nginx-ingress-controller-dr5gr          1/1     Running   0          9m25s
nginx-ingress-controller-glf4x          1/1     Running   0          9m19s
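If there are many rules and you need to work backwards, a minimal shell sketch like the following can help: it lists Ingresses with the newest first and saves a copy of a rule before you delete it. The function names are hypothetical; the kubectl flags (--sort-by, --no-headers, -o yaml) are standard, and tac assumes GNU coreutils.

```shell
# Hypothetical helpers; only standard kubectl subcommands and flags are used.

# List all Ingresses sorted oldest first by creation time, then reverse
# the lines with tac so the most recently created rule is printed first.
list_ingresses_newest_first() {
  kubectl get ingress -A --sort-by=.metadata.creationTimestamp --no-headers | tac
}

# Save an Ingress manifest to ingress-<namespace>-<name>.yaml before deleting it,
# so the rule can be fixed and re-applied later.
backup_ingress() {
  local ns="$1" name="$2"
  kubectl get ingress -n "$ns" "$name" -o yaml > "ingress-${ns}-${name}.yaml"
}
```

For example, run list_ingresses_newest_first, then backup_ingress kaishidongle test-ingrress before the delete command in step 1. Before re-applying a saved manifest, it is safer to remove server-generated fields such as metadata.uid, metadata.resourceVersion and status.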

Troubleshooting process:

Check the Ingress rules and confirm that the rule for the Rancher domain exists:

[root@i-5wa2ciao ~]# kubectl get ingress -A
NAMESPACE       NAME            CLASS    HOSTS                   ADDRESS   PORTS     AGE
cattle-system   rancher                  merancher.enncloud.cn             80, 443   140d
kaishidongle    test-ingrress            lmnginx.enncloud.cn               80        101d
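To check only the Rancher rule, a small sketch like this prints the host(s) it serves; a NotFound error or empty output means the rule is missing. The namespace and name (cattle-system/rancher) come from the listing above; the function name is hypothetical.

```shell
# Print the host names configured in the Rancher Ingress rules.
rancher_ingress_hosts() {
  kubectl get ingress -n cattle-system rancher -o jsonpath='{.spec.rules[*].host}'
}
```

In this cluster the expected output would be merancher.enncloud.cn.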

Check the status of the ingress controller:

[root@i-5wa2ciao ~]# kubectl get po -A|grep ingress
ingress-nginx               default-http-backend-598b7d7dbd-mbw6n                      1/1     Running            0          7m49s
ingress-nginx               nginx-ingress-controller-89827                             0/1     CrashLoopBackOff   6          7m44s
ingress-nginx               nginx-ingress-controller-pdvzj                             0/1     CrashLoopBackOff   6          7m41s
ingress-nginx               nginx-ingress-controller-zd7fd                             0/1     CrashLoopBackOff   6          7m40s
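On a busier cluster, a sketch like the following narrows the listing down to ingress-nginx pods whose READY count is not full (for example 0/1 while in CrashLoopBackOff). It parses kubectl's default column layout with awk, so treat it as a convenience rather than a stable interface; the function name is hypothetical.

```shell
# Print name, READY and STATUS for ingress-nginx pods where ready containers != total containers.
not_ready_ingress_pods() {
  kubectl get pods -n ingress-nginx --no-headers |
    awk '{ split($2, r, "/"); if (r[1] != r[2]) print $1, $2, $3 }'
}
```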

Since the ingress controller pods are in CrashLoopBackOff, use kubectl describe to look for the error:

[root@i-5wa2ciao .kube]# kubectl describe po -ningress-nginx               nginx-ingress-controller-48288
 .........
Events:
  Type     Reason     Age                From               Message
  ----     ------     ----               ----               -------
  Normal   Scheduled  101s               default-scheduler  Successfully assigned ingress-nginx/nginx-ingress-controller-48288 to rancher-40-181
  Warning  Unhealthy  17s (x6 over 87s)  kubelet            Liveness probe failed: HTTP probe failed with statuscode: 500
  Normal   Killing    17s (x2 over 67s)  kubelet            Container nginx-ingress-controller failed liveness probe, will be restarted
  Warning  Unhealthy  11s (x8 over 91s)  kubelet            Readiness probe failed: HTTP probe failed with statuscode: 500
  Normal   Pulled     4s (x3 over 101s)  kubelet            Container image "rancher/nginx-ingress-controller:nginx-0.35.0-rancher2" already present on machine
  Normal   Created    4s (x3 over 101s)  kubelet            Created container nginx-ingress-controller
  Normal   Started    4s (x3 over 101s)  kubelet            Started container nginx-ingress-controller

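The events only show that the liveness and readiness probes return HTTP 500, which does not explain why. Check the ingress controller logs: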

I0616 09:39:18.759940       6 status.go:86] new leader elected: nginx-ingress-controller-48288
I0616 09:39:18.766025       6 status.go:208] runningAddresses: pod [nginx-ingress-controller-48288] on [rancher-40-181] is not ready
I0616 09:39:18.766039       6 status.go:208] runningAddresses: pod [nginx-ingress-controller-4knqx] on [rancher-40-185] is not ready
I0616 09:39:18.766044       6 status.go:208] runningAddresses: pod [nginx-ingress-controller-7kl82] on [rancher-40-179] is not ready
E0616 09:39:18.816189       6 controller.go:153] Unexpected failure reloading the backend:

-------------------------------------------------------------------------------
Error: exit status 1
2022/06/16 09:39:18 [emerg] 33#33: "proxy_http_version" directive is duplicate in /tmp/nginx-cfg111270477:554
nginx: [emerg] "proxy_http_version" directive is duplicate in /tmp/nginx-cfg111270477:554
nginx: configuration file /tmp/nginx-cfg111270477 test failed

-------------------------------------------------------------------------------
W0616 09:39:18.816207       6 queue.go:130] requeuing initial-sync, err 
-------------------------------------------------------------------------------
Error: exit status 1
2022/06/16 09:39:18 [emerg] 33#33: "proxy_http_version" directive is duplicate in /tmp/nginx-cfg111270477:554
nginx: [emerg] "proxy_http_version" directive is duplicate in /tmp/nginx-cfg111270477:554
nginx: configuration file /tmp/nginx-cfg111270477 test failed

-------------------------------------------------------------------------------
W0616 09:39:22.082672       6 controller.go:1163] SSL certificate for server "merancher.enncloud.cn" is about to expire (2022-06-20 08:01:06 +0000 UTC)
I0616 09:39:22.082752       6 controller.go:141] Configuration changes detected, backend reload required.
E0616 09:39:22.120857       6 controller.go:153] Unexpected failure reloading the backend:

-------------------------------------------------------------------------------
Error: exit status 1
2022/06/16 09:39:22 [emerg] 40#40: "proxy_http_version" directive is duplicate in /tmp/nginx-cfg838461768:554
nginx: [emerg] "proxy_http_version" directive is duplicate in /tmp/nginx-cfg838461768:554
nginx: configuration file /tmp/nginx-cfg838461768 test failed

-------------------------------------------------------------------------------
W0616 09:39:22.120873       6 queue.go:130] requeuing cattle-monitoring-system/pushprox-kube-proxy-client, err 
-------------------------------------------------------------------------------
Error: exit status 1
2022/06/16 09:39:22 [emerg] 40#40: "proxy_http_version" directive is duplicate in /tmp/nginx-cfg838461768:554
nginx: [emerg] "proxy_http_version" directive is duplicate in /tmp/nginx-cfg838461768:554
nginx: configuration file /tmp/nginx-cfg838461768 test failed

-------------------------------------------------------------------------------
W0616 09:39:25.416024       6 controller.go:1163] SSL certificate for server "merancher.enncloud.cn" is about to expire (2022-06-20 08:01:06 +0000 UTC)
I0616 09:39:25.416103       6 controller.go:141] Configuration changes detected, backend reload required.
E0616 09:39:25.452786       6 controller.go:153] Unexpected failure reloading the backend:

-------------------------------------------------------------------------------
Error: exit status 1
2022/06/16 09:39:25 [emerg] 48#48: "proxy_http_version" directive is duplicate in /tmp/nginx-cfg224385031:554
nginx: [emerg] "proxy_http_version" directive is duplicate in /tmp/nginx-cfg224385031:554
nginx: configuration file /tmp/nginx-cfg224385031 test failed
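The post does not show the command used to collect the log above. A likely way, sketched below under that assumption, is kubectl logs against one controller pod; --previous prints the output of the last terminated container, which is usually what you want for a pod in CrashLoopBackOff. The second helper keeps only the nginx [emerg] lines and drops exact repeats. Both function names are hypothetical.

```shell
# Logs of the previously terminated container of an ingress controller pod.
# Drop --previous if the container has not restarted yet (kubectl errors otherwise).
ingress_logs() {
  local pod="$1"
  kubectl logs -n ingress-nginx "$pod" --previous
}

# Keep only nginx "[emerg]" lines (fixed-string match) and remove exact duplicates.
ingress_config_errors() {
  ingress_logs "$1" | grep -F '[emerg]' | sort -u
}
```

For example: ingress_config_errors nginx-ingress-controller-48288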

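After some searching, I found the cause: an Ingress deployed earlier for one of the services was broken. First, it kept Ingresses deployed after it from taking effect. Then, when nginx restarted, the configuration it generated from the Ingress rules failed nginx's configuration test (the duplicate "proxy_http_version" error in the log), so the controller could not start and nginx stopped proxying every service.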
Reference: the online article "nginx ingress最后的倔强" ("nginx ingress's last stand")
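The post does not say which field of test-ingrress produced the duplicate. One common way to end up with a second proxy_http_version is an Ingress annotation that injects raw nginx directives, such as nginx.ingress.kubernetes.io/configuration-snippet; treat that as an assumption here. Rather than deleting rules blindly, a sketch like the following prints every Ingress whose manifest mentions a given directive. The function name is hypothetical.

```shell
# Print namespace/name of each Ingress whose YAML contains the given text.
find_ingress_with_directive() {
  local directive="$1"
  kubectl get ingress -A --no-headers -o custom-columns=NS:.metadata.namespace,NAME:.metadata.name |
    while read -r ns name; do
      # Fetch the full manifest (annotations included) and search it as a fixed string.
      if kubectl get ingress -n "$ns" "$name" -o yaml | grep -qF "$directive"; then
        echo "$ns/$name"
      fi
    done
}
```

For example: find_ingress_with_directive proxy_http_version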
Solution:

1. List the Ingress rules
[root@i-5wa2ciao ~]# kubectl get ingress -A
NAMESPACE       NAME            CLASS    HOSTS                   ADDRESS   PORTS     AGE
cattle-system   rancher                  merancher.enncloud.cn             80, 443   140d
kaishidongle    test-ingrress            lmnginx.enncloud.cn               80        101d

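2. Delete the problematic Ingress rule. (Why this one? Because there was no other candidate. If you have many rules, delete them in reverse order, starting with the most recently created.)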
[root@i-5wa2ciao ~]# kubectl delete ingress -nkaishidongle    test-ingrress
ingress.extensions "test-ingrress" deleted

3. Recreate the ingress controller pods (delete them; their controller recreates them automatically)
[root@i-5wa2ciao ~]# kubectl delete po nginx-ingress-controller-89827  nginx-ingress-controller-pdvzj  nginx-ingress-controller-zd7fd  -ningress-nginx
pod "nginx-ingress-controller-89827" deleted
pod "nginx-ingress-controller-pdvzj" deleted
pod "nginx-ingress-controller-zd7fd" deleted

4. Verify
[root@i-5wa2ciao ~]# kubectl get po -ningress-nginx
NAME                                    READY   STATUS    RESTARTS   AGE
default-http-backend-598b7d7dbd-mbw6n   1/1     Running   0          41m
nginx-ingress-controller-d44jn          1/1     Running   0          9m29s
nginx-ingress-controller-dr5gr          1/1     Running   0          9m25s
nginx-ingress-controller-glf4x          1/1     Running   0          9m19s
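Besides checking the pod list, the recovery can be verified with a sketch like the following: wait for the controller rollout to finish, then check the HTTP status code of the Rancher URL. It assumes the controller is the DaemonSet nginx-ingress-controller in ingress-nginx (the usual layout in RKE clusters and consistent with the pod names above, but confirm with kubectl get ds -n ingress-nginx); the function names are hypothetical.

```shell
# Block until every scheduled controller pod is updated and available (5 min limit).
wait_for_ingress_controller() {
  kubectl -n ingress-nginx rollout status daemonset/nginx-ingress-controller --timeout=300s
}

# Print only the HTTP status code for https://<host>/.
# -k skips TLS verification, which helps with self-signed certificates.
rancher_http_code() {
  curl -k -s -o /dev/null -w '%{http_code}' "https://$1/"
}
```

For example: wait_for_ingress_controller && rancher_http_code merancher.enncloud.cn. A 200 or a redirect code means nginx is proxying Rancher again.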

The Rancher page is accessible again.
